Data Visualization with Microsoft Power BI, Python and Plotly

Learn to build interactive charts and a wide range of data visualizations using Microsoft Power BI, Python, and Plotly.

A wide range of professionals need data visualization skills to plot charts and uncover critical insights from datasets. From marketing and sales professionals to developers, analysts, and data scientists, many roles require the ability to model and present data as creative visuals that turn complex values into comparable, easy-to-understand charts. The basic charts in Power BI and other visualization software, such as bar, line, pie, and treemap charts, are often insufficient for representing data with complex information. Professionals don't rely on a few basic charts alone; they create custom charts to solve complex problems, and most custom or advanced visualization charts can be created by writing a few lines of Python code.

In this course, you will learn the following concepts and visualization charts using Python libraries such as Pandas, Matplotlib, and Seaborn (a short sketch follows the list):

  •    Installing Python packages and defining the path
  •    Creating a line chart with Matplotlib
  •    Adding labels and creating a dashed scatter plot
  •    Violin charts with Seaborn
  •    More on violin charts
  •    Strip plots
  •    Box plots
  •    Lmplot (Seaborn's linear-model/regression plot)
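
As a taste of the Seaborn lessons above, here is a minimal violin-chart sketch. It uses Seaborn's bundled tips dataset as a stand-in, since the course's own datasets aren't reproduced here:

import seaborn as sns
import matplotlib.pyplot as plt

# Load Seaborn's bundled "tips" dataset as stand-in example data.
tips = sns.load_dataset("tips")

# Violin chart: distribution of total bill per day, split by smoker status.
sns.violinplot(x="day", y="total_bill", hue="smoker", data=tips, split=True)
plt.title("Total bill distribution by day")
plt.show()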

Data visualization makes this work faster and more convenient. With the help of visual charts and graphs, we can easily find outliers, nulls, random values, distinct records, the format of dates, the sensibility of spatial data, string and character encodings, and much more.

Moreover, you will learn different charts for representing different kinds of data, such as categorical, numerical, spatial, and textual data (a Plotly sketch follows the list):

  •    Bar Charts (Horizontal and Vertical)
  •    Line Charts
  •    Pie Charts
  •    Donut Charts
  •    Scatter Charts
  •    Grouped Bar Chart (Horizontal and Vertical)
  •    Segmented Bar Chart (Horizontal and Vertical)
  •    Time Series Charts
  •    Sunburst Chart
  •    Candlestick Chart
  •    OHLC Charts
  •    Bubble Charts
  •    Dot Charts
  •    Multiple Line Charts and so on.
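
As a flavour of the Plotly side, here is a minimal sketch of one of the chart types above, a grouped bar chart. The quarterly sales figures are made up purely for illustration:

import plotly.graph_objects as go

# Hypothetical quarterly sales figures, for illustration only.
quarters = ["Q1", "Q2", "Q3", "Q4"]

fig = go.Figure(data=[
    go.Bar(name="Product A", x=quarters, y=[120, 150, 170, 160]),
    go.Bar(name="Product B", x=quarters, y=[100, 130, 145, 155]),
])

# barmode="group" places the bars side by side instead of stacking them.
fig.update_layout(barmode="group", title="Quarterly Sales (Grouped Bar Chart)")
fig.show()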

Data scientists often pay little attention to graphs and focus only on numerical calculations, which can at times be misleading. Data visualization is a crucial step in data analytics and data science for extracting meaningful insights, and in machine learning for building accurate models. The skills you learn in this course can be applied across domains, from data science and data analytics to business intelligence and machine learning.

What you'll learn:

  • Learn to create interactive charts with Plotly
  • Learn to build dummy datasets, such as a fake stock-market price simulator (see the sketch after this list)
  • Learn to create vertical and horizontal bar charts
  • Learn to create vertical and horizontal grouped and stacked bar charts
  • Learn to create scatter charts
  • Learn to create line charts
  • Learn to create time series charts
  • Learn to create pie, donut and sunburst charts
  • Learn to create multiple line charts
  • Learn to create bubble and dot charts
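
The course's own fake stock-market simulator isn't reproduced here, but one plausible minimal sketch is a random walk rendered as a Plotly candlestick chart (all numbers are simulated):

import numpy as np
import pandas as pd
import plotly.graph_objects as go

# Simulate 100 days of fake closing prices as a random walk (illustrative only).
rng = np.random.default_rng(seed=42)
dates = pd.date_range("2022-01-01", periods=100, freq="D")
close = 100 + np.cumsum(rng.normal(0, 1, size=100))

# Derive open/high/low around the closes so a candlestick chart can be drawn.
open_ = np.roll(close, 1)
open_[0] = close[0]
high = np.maximum(open_, close) + rng.uniform(0, 1, size=100)
low = np.minimum(open_, close) - rng.uniform(0, 1, size=100)

fig = go.Figure(go.Candlestick(x=dates, open=open_, high=high, low=low, close=close))
fig.update_layout(title="Simulated Stock Prices (Candlestick Chart)")
fig.show()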

 

#datavisualization #python #plotly #powerbi


Anil Sakhiya

Exploratory Data Analysis(EDA) with Python

Exploratory Data Analysis Tutorial | Basics of EDA with Python

Exploratory data analysis is used by data scientists to analyze and investigate data sets and summarize their main characteristics, often employing data visualization methods. It helps determine how best to manipulate data sources to get the answers you need, making it easier for data scientists to discover patterns, spot anomalies, test a hypothesis, or check assumptions. EDA is primarily used to see what data can reveal beyond the formal modeling or hypothesis testing task and provides a better understanding of data set variables and the relationships between them. It can also help determine if the statistical techniques you are considering for data analysis are appropriate or not.

🔹 Topics Covered:
00:00:00 Basics of EDA with Python
01:40:10 Multiple Variate Analysis
02:30:26 Outlier Detection
03:44:48 Cricket World Cup Analysis using Exploratory Data Analysis


Learning the basics of Exploratory Data Analysis using Python with Numpy, Matplotlib, and Pandas.

What is Exploratory Data Analysis(EDA)?

If we want to explain EDA in simple terms, it means trying to understand the given data much better, so that we can make some sense out of it.

We can find a more formal definition on Wikipedia.

In statistics, exploratory data analysis is an approach to analyzing data sets to summarize their main characteristics, often with visual methods. A statistical model can be used or not, but primarily EDA is for seeing what the data can tell us beyond the formal modeling or hypothesis testing task.

EDA in Python uses data visualization to draw meaningful patterns and insights. It also involves the preparation of data sets for analysis by removing irregularities in the data.

Based on the results of EDA, companies also make business decisions, which can have repercussions later.

  • If EDA is not done properly then it can hamper the further steps in the machine learning model building process.
  • If done well, it may improve the efficacy of everything we do next.

In this article, we'll cover the following topics:

  1. Data Sourcing
  2. Data Cleaning
  3. Univariate analysis
  4. Bivariate analysis
  5. Multivariate analysis

1. Data Sourcing

Data Sourcing is the process of finding and loading the data into our system. Broadly there are two ways in which we can find data.

  1. Private Data
  2. Public Data

Private Data

As the name suggests, private data is provided by private organizations, and there are some security and privacy concerns attached to it. This type of data is mainly used for an organization's internal analysis.

Public Data

This type of data is available to everyone. We can find it on government websites, public organizations' portals, and so on. Anyone can access this data; no special permissions or approvals are needed.

Public data is available from many open-data portals and repositories.

Data Sourcing, the very first step of EDA, is about accessing data and loading it into our system. The next step is cleaning the data.
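
As a minimal sketch, loading a public CSV with pandas takes only a couple of lines; the URL below is a hypothetical placeholder, not a real data source:

import pandas as pd

# Hypothetical URL; replace with the actual location of your public dataset.
url = "https://example.com/open-data/marketing_analysis.csv"
data = pd.read_csv(url)

# A quick first look at what we loaded.
print(data.shape)
print(data.head())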

2. Data Cleaning

After completing the Data Sourcing, the next step in the process of EDA is Data Cleaning. It is very important to get rid of the irregularities and clean the data after sourcing it into our system.

Irregularities in data come in different forms:

  • Missing Values
  • Incorrect Format
  • Incorrect Headers
  • Anomalies/Outliers

To perform the data cleaning we are using a sample data set, which can be found here.

We are using Jupyter Notebook for analysis.

First, let’s import the necessary libraries and store the data in our system for analysis.

# Import the useful libraries.
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

# Read the "Marketing Analysis" data set into data.
data = pd.read_csv("marketing_analysis.csv")

# Printing the data
data

Now, the data set looks like this,

If we observe the above dataset, the column headers in the first 2 rows are wrong; the correct data only starts from index number 1. So, we have to fix the first two rows.

This is called Fixing the Rows and Columns. Let’s ignore the first two rows and load the data again.

# Import the useful libraries.
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

# Read the file, skipping the first two rows as they are of no use.
data = pd.read_csv("marketing_analysis.csv", skiprows=2)

# Print the head of the data frame.
data.head()

Now, the dataset looks like this, and it makes more sense.

Dataset after fixing the rows and columns

Following are the steps to be taken while Fixing Rows and Columns:

  1. Delete Summary Rows and Columns in the Dataset.
  2. Delete Header and Footer Rows on every page.
  3. Delete Extra Rows like blank rows, page numbers, etc.
  4. We can merge different columns if it makes the data easier to understand.
  5. Similarly, we can split one column into multiple columns based on our requirements or understanding.
  6. Add column names; it is very important for the dataset to have them.

Now, if we observe the above dataset, the customerid column is of no importance to our analysis, and the jobedu column contains both job and education information.

So we'll drop the customerid column, split the jobedu column into two new columns, job and education, and then drop the jobedu column as well.

# Drop the customer id as it is of no use.
data.drop('customerid', axis = 1, inplace = True)

# Extract job & education into new columns from the "jobedu" column.
data['job'] = data["jobedu"].apply(lambda x: x.split(",")[0])
data['education'] = data["jobedu"].apply(lambda x: x.split(",")[1])

# Drop the "jobedu" column from the dataframe.
data.drop('jobedu', axis = 1, inplace = True)

# Printing the Dataset
data

Now, the dataset looks like this,

Dropping Customerid and jobedu columns and adding job and education columns

Missing Values

If there are missing values in the dataset, we need to handle them before doing any statistical analysis.

There are mainly three types of missing values.

  1. MCAR (Missing completely at random): These values do not depend on any other feature.
  2. MAR (Missing at random): These values may depend on some other features.
  3. MNAR (Missing not at random): These values are missing for a specific reason.

Let’s see which columns have missing values in the dataset.

# Checking the missing values
data.isnull().sum()

The output will be,

As we can see, three columns contain missing values. We can handle missing values by dropping the missing records or by imputing values.

Dropping the Missing Values

Let’s handle missing values in the age column.

# Dropping the records with age missing in data dataframe.
data = data[~data.age.isnull()].copy()

# Checking the missing values in the dataset.
data.isnull().sum()

Let’s check the missing values in the dataset now.

Now let's impute the missing values in the month column.

Since the month column is of object type, let's calculate the mode of that column and fill the missing values with it.

# Find the mode of month in data
month_mode = data.month.mode()[0]

# Fill the missing values with mode value of month in data.
data.month.fillna(month_mode, inplace = True)

# Let's see the null values in the month column.
data.month.isnull().sum()

Now output is,

# Mode of month is
'may, 2017'
# Null values in month column after imputing with mode
0

Now let's handle the missing values in the response column. Since response is our target column, imputing values into it would affect our analysis, so it is better to drop the records where it is missing.

#drop the records with response missing in data.
data = data[~data.response.isnull()].copy()
# Calculate the missing values in each column of data frame
data.isnull().sum()

Let’s check whether the missing values in the dataset have been handled or not,

All the missing values have been handled

We can also fill the missing values as 'NaN' so that they won't affect the outcome of any statistical analysis.
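
A quick sketch of why that works: pandas skips NaN when computing summary statistics by default, so leftover NaNs don't distort the results:

import numpy as np
import pandas as pd

# mean() ignores the NaN and averages the three observed values.
s = pd.Series([10, 20, np.nan, 40])
print(s.mean())   # 23.33...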

Handling Outliers

We have seen how to fix missing values, now let’s see how to handle outliers in the dataset.

Outliers are values that lie far away from the other data points.

There are two types of outliers:

  1. Univariate outliers: Univariate outliers are the data points whose values lie beyond the range of expected values based on one variable.
  2. Multivariate outliers: While plotting data, some values of one variable may not lie beyond the expected range, but when you plot the data with some other variable, these values may lie far from the expected value.

So, after understanding the causes of these outliers, we can handle them by dropping the records, imputing values, or leaving them as-is if that makes more sense.
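
As one illustration, here is a minimal sketch that flags univariate outliers in the age column using the common 1.5 × IQR rule (the 1.5 multiplier is a convention, not something prescribed by this article):

# Compute the interquartile range of age.
q1, q3 = data.age.quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Flag records whose age lies outside the expected range.
outliers = data[(data.age < lower) | (data.age > upper)]
print(len(outliers), "potential outliers outside", (lower, upper))

# One option among several: keep only the records inside the range.
data_no_outliers = data[(data.age >= lower) & (data.age <= upper)]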

Standardizing Values

To perform data analysis on a set of values, we have to make sure the values in a column are all on the same scale. For example, if the data contains the top speeds of different companies' cars, then the whole column should be in a single unit, such as metres per second or miles per hour.
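
Here is a minimal sketch of what standardizing might look like, using a hypothetical top-speed table recorded in mixed units (1 mph = 0.44704 m/s):

import pandas as pd

# Hypothetical car data recorded in mixed units, for illustration only.
cars = pd.DataFrame({
    "model": ["A", "B", "C"],
    "top_speed": [88.0, 150.0, 95.0],
    "unit": ["m/s", "mph", "m/s"],
})

# Convert every value to metres per second so the column is on one scale.
cars["top_speed_ms"] = cars.apply(
    lambda row: row.top_speed * 0.44704 if row.unit == "mph" else row.top_speed,
    axis=1,
)
print(cars)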

Now that we are clear on how to source and clean the data, let's see how we can analyze it.

3. Univariate Analysis

If we analyze data over a single variable/column from a dataset, it is known as Univariate Analysis.

Categorical Unordered Univariate Analysis:

An unordered variable is a categorical variable that has no defined order. If we take our data as an example, the job column in the dataset is divided into many sub-categories like technician, blue-collar, services, management, etc. There is no weight or measure given to any value in the ‘job’ column.

Now, let's analyze the job category using plots. Since job is a categorical variable, we will plot a bar plot.

# Let's calculate the percentage of each job status category.
data.job.value_counts(normalize=True)

#plot the bar graph of percentage job categories
data.job.value_counts(normalize=True).plot.barh()
plt.show()

The output looks like this,

From the above bar plot, we can infer that the dataset contains more blue-collar workers than any other category.

Categorical Ordered Univariate Analysis:

Ordered variables are those variables that have a natural rank of order. Some examples of categorical ordered variables from our dataset are:

  • Month: Jan, Feb, March……
  • Education: Primary, Secondary,……

Now, let's analyze the education variable from the dataset. Since we've already seen a bar plot, let's see what a pie chart looks like.

#calculate the percentage of each education category.
data.education.value_counts(normalize=True)

#plot the pie chart of education categories
data.education.value_counts(normalize=True).plot.pie()
plt.show()

The output will be,

From the above analysis, we can infer that most records in the dataset belong to secondary education, followed by tertiary and then primary. Also, a very small percentage are unknown.

This is how we perform univariate analysis on a categorical variable. If the column or variable is numerical, we analyze it by calculating its mean, median, standard deviation, etc. We can get those values using the describe function.

data.salary.describe()

The output will be,

4. Bivariate Analysis

If we analyze data by taking two variables/columns into consideration from a dataset, it is known as Bivariate Analysis.

a) Numeric-Numeric Analysis:

Analyzing the two numeric variables from a dataset is known as numeric-numeric analysis. We can analyze it in three different ways.

  • Scatter Plot
  • Pair Plot
  • Correlation Matrix

Scatter Plot

Let's take three columns, 'balance', 'age', and 'salary', from our dataset and see what we can infer by plotting scatter plots of balance against salary and balance against age.

#plot the scatter plot of balance and salary variable in data
plt.scatter(data.salary,data.balance)
plt.show()

#plot the scatter plot of balance and age variable in data
data.plot.scatter(x="age",y="balance")
plt.show()

Now, the scatter plots look like,

Pair Plot

Now, let’s plot Pair Plots for the three columns we used in plotting Scatter plots. We’ll use the seaborn library for plotting Pair Plots.

#plot the pair plot of salary, balance and age in data dataframe.
sns.pairplot(data = data, vars=['salary','balance','age'])
plt.show()

The Pair Plot looks like this,

Correlation Matrix

Since we cannot use more than two variables as the x-axis and y-axis in scatter and pair plots, it is difficult to see the relation between three numerical variables in a single graph. In those cases, we use the correlation matrix.

# Creating a matrix using age, salary, balance as rows and columns
data[['age','salary','balance']].corr()

#plot the correlation matrix of salary, balance and age in data dataframe.
sns.heatmap(data[['age','salary','balance']].corr(), annot=True, cmap = 'Reds')
plt.show()

First, we created a matrix using age, salary, and balance. After that, we are plotting the heatmap using the seaborn library of the matrix.

b) Numeric - Categorical Analysis

Analyzing the one numeric variable and one categorical variable from a dataset is known as numeric-categorical analysis. We analyze them mainly using mean, median, and box plots.

Let’s take salary and response columns from our dataset.

First, let's check the mean value using groupby.

#groupby the response to find the mean of the salary with response no & yes separately.
data.groupby('response')['salary'].mean()

The output will be,

There is not much of a difference between the mean salary for the yes and no responses.

Let’s calculate the median,

#groupby the response to find the median of the salary with response no & yes separately.
data.groupby('response')['salary'].median()

The output will be,

From both the mean and the median, it appears that the yes and no responses are the same irrespective of the person's salary. But is that really the case? Let's plot the box plot for them and check.

#plot the box plot of salary for yes & no responses.
sns.boxplot(x='response', y='salary', data=data)
plt.show()

The box plot looks like this,

As we can see, when we plot the Box Plot, it paints a very different picture compared to mean and median. The IQR for customers who gave a positive response is on the higher salary side.

This is how we analyze Numeric-Categorical variables, we use mean, median, and Box Plots to draw some sort of conclusions.

c) Categorical — Categorical Analysis

Since our target variable/column is the response, we'll see how different categories like education, marital status, etc., are associated with it. Instead of 'yes' and 'no', we will convert them into '1' and '0'; by doing that, we get the "response rate".

#create response_rate of numerical data type where response "yes"= 1, "no"= 0
data['response_rate'] = np.where(data.response=='yes',1,0)
data.response_rate.value_counts()

The output looks like this,

Let’s see how the response rate varies for different categories in marital status.

#plot the bar graph of marital status with average value of response_rate
data.groupby('marital')['response_rate'].mean().plot.bar()
plt.show()

The graph looks like this,

From the above graph, we can infer that members with single marital status respond positively more often in this data set. Similarly, we can plot the graphs for loan vs response rate, housing loan vs response rate, etc.

5. Multivariate Analysis

If we analyze data by taking more than two variables/columns into consideration from a dataset, it is known as Multivariate Analysis.

Let’s see how ‘Education’, ‘Marital’, and ‘Response_rate’ vary with each other.

First, we’ll create a pivot table with the three columns and after that, we’ll create a heatmap.

result = pd.pivot_table(data=data, index='education', columns='marital',values='response_rate')
print(result)

#create heat map of education vs marital vs response_rate
#(center=0.117 anchors the colormap near the overall response rate)
sns.heatmap(result, annot=True, cmap = 'RdYlGn', center=0.117)
plt.show()

The Pivot table and heatmap looks like this,

Based on the heatmap, we can infer that married people with primary education are the least likely to respond positively to the survey, and single people with tertiary education are the most likely to respond positively.

Similarly, we can plot the graphs for Job vs marital vs response, Education vs poutcome vs response, etc.

Conclusion

This is how we do Exploratory Data Analysis. EDA helps us look beyond the raw data: the more we explore the data, the more insights we draw from it. As data analysts, almost 80% of our time is spent understanding data and solving business problems through EDA.

Thank you for reading and Happy Coding!!!

#dataanalysis #python

Managing your Data with Microsoft’s Power BI

In a world where the sheer amount of data is often overwhelming, the ability to interrogate and organise data to make meaningful business decisions is more important than ever.

Microsoft created Power BI to enable everyday users to work with data more effectively rather than relying on IT staff or system admins.


Power BI is an analytics tool that uses natural language in a similar way to searching in Google. It will offer a real-time insight into your company’s performance by reporting on and encouraging the manipulation of your CRM data and in some cases, pulling together data from multiple sources to give you a richer snapshot.

To get in-depth knowledge of Power BI, you can enroll for a live demo of Power BI online training

Creating and personalising your very own reports, sharing them throughout your organisation… Power BI makes all these actions possible.

We have found that directors and executives are excited about what Power BI offers: an insight into their complete business performance in a way they have never had before, whether at their desk or on a mobile device.

One of the things that makes Power BI so special is how easy it is to use, giving everyone the information they need to make better business decisions.

One of the many ease-of-use features utilises Cortana technology.

‘Ask a question about your data’ gives you an instant, graphical report response to the data you seek and offers intelligent suggestions on what else might interest you.

The following are some of the reasons to choose Power BI; here's an overview of its features.

  • Creation of reports and interactive dashboards
  • Access to Power BI reports in your Dynamics 365 dashboards
  • Creation of reports based on multiple data sources
  • Reports publication and centralization on web service
  • Visualization of reports on different devices via Power BI Mobile
  •    Real-time reporting

Take your career to new heights of success with Power BI online training Hyderabad

#power bi training #power bi course #learn power bi #microsoft power bi training #power bi online course #power bi training in ameerpet

sophia tondon

Microsoft Power BI Consulting | Power BI Solutions in India

Hire top dedicated Microsoft Power BI consultants from ValueCoders, who leverage their expertise to address organizational challenges in large-scale data storage and seamless processing.

We have a team of dedicated Power BI consultants who help start-ups, SMEs, and enterprises analyse business data and get useful insights.

What are you waiting for? Contact us now!

No Freelancers, 100% Own Staff
Experienced Consultants
Continuous Monitoring
Lean Processes, Agile Mindset
Non-Disclosure Agreement
Up To 2X Less Time

#power bi service #power bi consultant #power bi consultants #power bi consulting #power bi developer #power bi development

sophia tondon

Hire Power BI Developer | Microsoft Power BI consultants in India

Hire our expert Power BI consultants to make the most of your business data. Our Power BI developers have deep knowledge of Microsoft Power BI data modeling, structuring, and analysis. 16+ Yrs Exp | 2500+ Clients | 450+ Team

Visit Website - https://www.valuecoders.com/hire-developers/hire-power-bi-developer-consultants

#power bi service #power bi consultant #power bi consultants #power bi consulting #power bi developer #power bi consulting services

Explore your JIRA Data with Power BI

JIRA Software provides bug tracking, issue tracking, and project management capabilities for teams and organizations. The JIRA content pack for Power BI helps you quickly import JIRA data so you can get an instant dashboard to analyze workloads, see how quickly you’re resolving issues, visualize velocity over time, and more. Power BI helps you quickly filter by project or component to generate new insights into your JIRA data.

To get in-depth knowledge of Power BI, you can enroll for a live demo of Power BI online training


To connect to the JIRA content pack, simply choose JIRA from the list of available content packs. You will be asked to provide your JIRA URL and credentials. Once the connection has completed, Power BI will automatically create an out-of-the-box dashboard, report, and dataset with data from JIRA.


The dashboard for the JIRA content pack shows you key metrics about your workloads, velocity over time, and the breakdown of your bugs and issues by status and assignee. Clicking on any of the dashboard tiles will open the JIRA content pack report, where you can interact and explore your data.


You can slice the data in the report using the ‘Item Type’ slicer at the top, so you can find exactly the items you are looking for. Power BI reports are interactive, so if you select a ‘Fix Version’, a ‘Component’, or an ‘Assignee’, the rest of the report page will update per the selection. Learn more from a Power BI online course.

The report has a second page that focuses on the velocity of creating and resolving issues, and also shows the list of recently resolved items.


After the initial import, the dashboard and the reports continue to update daily, so you are always seeing up-to-date data. You can control the refresh schedule on the dataset and configure it to refresh at the exact times you choose.

Take your career to new heights of success with Power BI online training Hyderabad

#power bi training #power bi course #learn power bi #power bi online training #microsoft power bi training #power bi online course