A minor shortcut or add-on can sometimes prove to be a godsend and a real productivity booster. So, here are some of my favourite tips and tricks that I have used and compiled in this article. Some may be fairly well known and some may be new, but I am sure they will come in handy the next time you work on a data analysis project.
Profiling is a process that helps us understand our data, and Pandas Profiling is a Python package that does exactly that. It is a simple and fast way to perform exploratory data analysis of a pandas DataFrame. The pandas
df.info() function is normally used as a first step in the EDA process. However, it gives only a very basic overview of the data and doesn’t help much with large data sets. Pandas Profiling, on the other hand, extends the pandas DataFrame with
df.profile_report() for quick data analysis. It displays a lot of information with a single line of code, in an interactive HTML report.
For a given dataset the pandas profiling package computes the following statistics:
pip install pandas-profiling
or
conda install -c anaconda pandas-profiling
Let’s use the age-old titanic dataset to demonstrate the capabilities of the versatile python profiler.
#importing the necessary packages
import pandas as pd
import pandas_profiling

# Deprecated: pre 2.0.0 version
df = pd.read_csv('titanic/train.csv')
pandas_profiling.ProfileReport(df)
Edit: A week after this article was published, Pandas-Profiling came out with a major upgrade, version 2.0.0. The syntax has changed a bit: the functionality is now exposed as a method on the pandas DataFrame itself, and the report has become more comprehensive. Below is the latest usage syntax:
To display the report in a Jupyter notebook, run:
#Pandas-Profiling 2.0.0
df.profile_report()
This single line of code is all that you need to display the data profiling report in a Jupyter notebook. The report is pretty detailed including charts wherever necessary.
The report can also be exported into an interactive HTML file with the following code.
profile = df.profile_report(title='Pandas Profiling Report')
profile.to_file(outputfile="Titanic data profiling.html")
Refer to the documentation for more details and examples.
Pandas has a built-in
.plot() function as part of the DataFrame class. However, the visualisations rendered with this function aren’t interactive, which makes it less appealing. On the other hand, the ease of plotting charts with the
pandas.DataFrame.plot() function cannot be ignored. What if we could plot interactive Plotly-like charts with pandas without having to make major modifications to the code? Well, you can do exactly that with the help of the Cufflinks library.
# Plotly is a pre-requisite before installing cufflinks
pip install plotly
pip install cufflinks

#importing Pandas
import pandas as pd
#importing plotly and cufflinks in offline mode
import cufflinks as cf
import plotly.offline
cf.go_offline()
cf.set_config_file(offline=False, world_readable=True)
Time to see the magic unfold with the Titanic dataset.
The visualisation on the right shows the static chart while the left chart is interactive and more detailed and all this without any major change in the syntax.
Click here for more examples.
Magic commands are a set of convenient functions in Jupyter Notebooks that are designed to solve some of the common problems in standard data analysis. You can see all available magics with the help of
Magic commands are of two kinds: line magics, which are prefixed by a single
% character and operate on a single line of input, and cell magics, which are associated with the double
%% prefix and operate on multiple lines of input. Magic functions are callable without having to type the initial % if set to 1.
Let’s look at some of them that might be useful in common data analysis tasks:
%pastebin uploads code to Pastebin and returns the URL. Pastebin is an online content hosting service where we can store plain text such as source code snippets, and the URL can then be shared with others. In fact, GitHub Gist is also akin to Pastebin, albeit with version control.
Consider a python script
file.py with the following content:
#file.py
def foo(x):
    return x
Using %pastebin in Jupyter Notebook generates a pastebin url.
%matplotlib inline function is used to render the static matplotlib plots within the Jupyter notebook. Try replacing the
inline part with
notebook to get zoom-able & resize-able plots, easily. Make sure the function is called before importing the matplotlib library.
%run function runs a python script inside a notebook.
%%writefile writes the contents of a cell to a file. Here the code will be written to a file named foo.py and saved in the current directory.
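For readers working outside a notebook, here is a rough plain-Python sketch of what the %%writefile and %run pair does. It is only an approximation built on the standard library, not how Jupyter implements the magics; the temporary directory and the `result` variable are my own additions for illustration.

```python
import runpy
import tempfile
from pathlib import Path

# Contents a %%writefile cell might hold (foo mirrors the earlier file.py example).
source = "def foo(x):\n    return x\n\nresult = foo(42)\n"

with tempfile.TemporaryDirectory() as workdir:
    script = Path(workdir) / "foo.py"
    # Roughly what `%%writefile foo.py` does: save the cell text to a file.
    script.write_text(source)
    # Roughly what `%run foo.py` does: execute the file and keep its variables.
    namespace = runpy.run_path(str(script))

print(namespace["result"])
```

In a notebook the variables defined by the script land directly in your session, whereas here they come back as a dictionary.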
The %%latex function renders the cell contents as LaTeX. It is useful for writing mathematical formulae and equations in a cell.
The interactive debugger is also a magic function but I have given it a category of its own. If you get an exception while running the code cell, type
%debug in a new line and run it. This opens an interactive debugging environment which brings you to the position where the exception has occurred. You can also check for values of variables assigned in the program and also perform operations here. To exit the debugger hit
If you want to produce aesthetically pleasing representations of your data structures, pprint is the go-to module. It is especially useful when printing dictionaries or JSON data. Let’s have a look at an example which uses both print and
pprint to display the output.
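A minimal sketch of the comparison, using a small nested dictionary that I made up purely for illustration (it is not taken from the Titanic data):

```python
from pprint import pformat, pprint

# Hypothetical nested record, invented for this example.
passenger = {
    "name": "Example Passenger",
    "ticket": {"class": 3, "fare": 7.25, "cabin": None},
    "relatives": [
        {"relation": "sibling", "aboard": True},
        {"relation": "parent", "aboard": False},
    ],
}

print(passenger)             # plain print: everything on one long line
pprint(passenger, width=60)  # pprint: wrapped and indented, keys sorted

# pformat returns the same layout as a string instead of printing it.
formatted = pformat(passenger, width=60)
```

The `width` argument controls where lines wrap, so it is worth lowering when the structure is deeply nested.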
You can use alert/note boxes in your Jupyter Notebooks to highlight something important or anything that needs to stand out. The colour of the box depends on the type of alert that is specified. Just add any or all of the following snippets to a Markdown cell that needs to be highlighted.
<div class="alert alert-block alert-info"> <b>Tip:</b> Use blue boxes (alert-info) for tips and notes. If it’s a note, you don’t have to include the word “Note”. </div>
<div class="alert alert-block alert-warning"> <b>Example:</b> Yellow Boxes are generally used to include additional examples or mathematical formulas. </div>
<div class="alert alert-block alert-success"> Use green box only when necessary like to display links to related content. </div>
<div class="alert alert-block alert-danger"> It is good to avoid red boxes but can be used to alert users to not delete some important part of code etc. </div>
Consider a cell of Jupyter Notebook containing the following lines of code:
In : 10+5
     11+6
Out : 17
It is a normal property of the cell that only the last output gets printed and for the others, we need to add the
print() function. Well, it turns out that we can print all the outputs just by adding the following snippet at the top of the notebook.
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
Now all the outputs get printed one after the other.
In : 10+5
     11+6
     12+7
Out : 15
Out : 17
Out : 19
To revert to the original setting :
InteractiveShell.ast_node_interactivity = "last_expr"
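To make the difference between the two settings concrete, here is a standard-library imitation of the behaviour. It is only a sketch and not IPython's actual implementation; `evaluate_cell` is a hypothetical helper name I chose for this example.

```python
import ast

def evaluate_cell(source, interactivity="last_expr"):
    """Run `source` and return the values a notebook cell would display."""
    tree = ast.parse(source)
    namespace, shown = {}, []
    last_index = len(tree.body) - 1
    for index, node in enumerate(tree.body):
        if isinstance(node, ast.Expr):
            # A bare expression: evaluate it and decide whether to show it.
            code = compile(ast.Expression(node.value), "<cell>", "eval")
            value = eval(code, namespace)
            wanted = interactivity == "all" or index == last_index
            if wanted and value is not None:
                shown.append(value)
        else:
            # Any other statement (assignment, import, ...) is just executed.
            code = compile(ast.Module([node], type_ignores=[]), "<cell>", "exec")
            exec(code, namespace)
    return shown

cell = "10+5\n11+6\n12+7"
print(evaluate_cell(cell))         # only the last expression
print(evaluate_cell(cell, "all"))  # every expression
```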
A typical way of running a Python script from the command line is:
python hello.py. However, if you add an additional
-i when running the same script, e.g.
python -i hello.py, it offers more advantages. Let’s see how.
import pdb
pdb.pm()
This will bring us to the position where the exception occurred, and we can then work on the code from there.
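The debugger itself is interactive, so it is hard to show in a snippet. As a non-interactive sketch of what post-mortem debugging gives you access to, the code below walks a traceback by hand to the frame where the exception was raised and reads its local variables. The `average` function is a made-up example with a deliberate bug.

```python
import sys

def average(values):
    # Deliberate bug: divides by zero when `values` is empty.
    return sum(values) / len(values)

try:
    average([])
except ZeroDivisionError:
    tb = sys.exc_info()[2]
    # Walk to the innermost frame, where the exception was raised.
    while tb.tb_next is not None:
        tb = tb.tb_next
    failing_function = tb.tb_frame.f_code.co_name
    failing_locals = dict(tb.tb_frame.f_locals)
    # In an interactive session, pdb.post_mortem(tb) would open the
    # debugger on this same frame instead.

print(failing_function, failing_locals)
```

Inside the debugger you would inspect the same information by typing the variable names at the prompt.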
Ctrl/Cmd + / comments out the selected lines in the cell automatically. Hitting the combination again will uncomment the same lines of code.
Have you ever accidentally deleted a cell in a Jupyter Notebook? If yes then here is a shortcut which can undo that delete action.
EDIT > Undo Delete Cells
In this article, I’ve listed the main tips I have gathered while working with Python and Jupyter Notebooks. I am sure they will be of use to you and that you will take something away from this article. Till then, happy coding!