Descriptive Statistics in Python

Descriptive statistics summarize the central tendency, dispersion, and shape of a dataset’s distribution.

  1. Measure of central tendency
  2. Measure of spread/dispersion
  3. Measure of symmetry (saved for a future post)

Dataset

First, import the libraries needed for statistical plots and create a DataFrame from the dataset in the bmi.csv file.

This dataset contains Height, Weight, Age, BMI, and Gender columns. Let’s calculate descriptive statistics for this dataset.

The code used in this project is available as a Jupyter Notebook on GitHub.

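import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Jupyter magic command: render plots inside the notebook
%matplotlib inline
# Load the dataset into a DataFrame
df = pd.read_csv("bmi.csv")
# Display the DataFrame
df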

[Figure: DataFrame]

Measure of Central Tendency

A measure of central tendency describes the middle (center) value of the data.

Mean, median, and mode are measures of central tendency.

1. Mean

  • The mean is the average value of the dataset.
  • The mean is calculated by adding all values in the dataset and dividing the sum by the number of values.
  • The mean can be calculated only for numerical variables.

Formula to calculate mean

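$$\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{1}{n} \sum_{i=1}^{n} x_i$$

Here x_1, ..., x_n are the values and n is the number of values.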
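As a minimal sketch of the calculation described above, the snippet below computes the mean of a column by hand (sum divided by count) and then with pandas. The df_sample DataFrame and its values are made up for illustration and only stand in for bmi.csv; the column names follow the ones listed for the dataset.

```python
import pandas as pd

# Hypothetical sample rows standing in for bmi.csv (values are invented for illustration)
df_sample = pd.DataFrame({
    "Gender": ["Male", "Female", "Female", "Male"],
    "Height": [170.0, 160.0, 165.0, 180.0],
    "Weight": [70.0, 55.0, 60.0, 85.0],
})

# Mean by definition: sum of the values divided by the number of values
manual_mean = df_sample["Height"].sum() / len(df_sample["Height"])

# The same result using the built-in pandas method
pandas_mean = df_sample["Height"].mean()

# Mean of every numerical column; the non-numeric Gender column is skipped
column_means = df_sample.mean(numeric_only=True)

print(manual_mean, pandas_mean)
print(column_means)
```

Passing numeric_only=True reflects the point that the mean applies only to numerical variables: the Gender column is left out of the result.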
