When working with a large dataset, it is important to understand how its features relate to one another; that is a big part of data analysis. A relationship can involve two variables or several variables at once. In this article, I will show some simple techniques for presenting the relationships among multiple variables, using Python’s NumPy, pandas, Matplotlib, and Seaborn libraries.
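One compact way to summarize the relationships among several numeric variables at once is a correlation matrix, which lists the pairwise (Pearson, by default) correlation for every pair of columns. The sketch below uses pandas' `DataFrame.corr` on a tiny hypothetical frame: the values are made up for illustration and are not from the dataset used later, although the column names match three of its blood-pressure columns.

```python
import pandas as pd

# Hypothetical stand-in values (NOT real survey data), reusing three of the
# blood-pressure column names that appear in the dataset below.
df = pd.DataFrame({
    "BPXSY1": [110, 120, 130, 140, 150],
    "BPXSY2": [112, 118, 132, 138, 152],
    "BPXDI1": [70, 75, 72, 80, 78],
})

# Pearson correlation for every pair of columns in one call;
# the result is a symmetric DataFrame with 1.0 on the diagonal.
corr = df.corr()
print(corr.round(2))
```

Each off-diagonal cell is one two-variable relationship, so the whole matrix gives a first overview of how several variables move together before any plotting.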

First, import the necessary packages and load the dataset.

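%matplotlib inline
# Plotting libraries (the magic above renders plots inside a Jupyter notebook)
import matplotlib.pyplot as plt
import seaborn as sns
# Data-handling libraries
import pandas as pd
import numpy as np
# Load the dataset from a CSV file in the working directory
df = pd.read_csv("nhanes_2015_2016.csv")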

This dataset is too large to show in full here, so instead, here are its columns:

df.columns
#Output:
Index(['SEQN', 'ALQ101', 'ALQ110', 'ALQ130', 'SMQ020', 'RIAGENDR', 'RIDAGEYR',
       'RIDRETH1', 'DMDCITZN', 'DMDEDUC2', 'DMDMARTL', 'DMDHHSIZ', 'WTINT2YR',
       'SDMVPSU', 'SDMVSTRA', 'INDFMPIR', 'BPXSY1', 'BPXDI1', 'BPXSY2',
       'BPXDI2', 'BMXWT', 'BMXHT', 'BMXBMI', 'BMXLEG', 'BMXARML', 'BMXARMC',
       'BMXWAIST', 'HIQ210'],
      dtype='object')
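When a table is too large to display, `df.shape` gives its size as a (rows, columns) tuple, and `df.isna().sum()` counts the missing values in each column, which is worth knowing before looking at any relationships. Below is a minimal sketch on a hypothetical stand-in frame (made-up values, not rows from the real file) so that it runs without the CSV; with the real data, the same calls apply directly to `df`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in frame (NOT real survey rows) that reuses a few of
# the column names listed above and includes some missing values.
df = pd.DataFrame({
    "SEQN": [1, 2, 3, 4],
    "RIDAGEYR": [62, 53, 78, 56],
    "BPXSY1": [128.0, 146.0, np.nan, 132.0],
    "BMXBMI": [27.8, 30.8, 28.8, np.nan],
})

# (rows, columns): a compact answer to "how big is this dataset?"
n_rows, n_cols = df.shape
print(f"{n_rows} rows x {n_cols} columns")

# Number of missing (NaN) values in each column
missing = df.isna().sum()
print(missing)
```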

Now, let’s reduce the dataset to a few columns so that it is easier to handle and to display in this article.

# Keep only the columns used in the rest of this article
df = df[['SMQ020', 'RIAGENDR', 'RIDAGEYR', 'DMDCITZN',
         'DMDEDUC2', 'DMDMARTL', 'DMDHHSIZ', 'SDMVPSU',
         'BPXSY1', 'BPXDI1', 'BPXSY2', 'BPXDI2', 'RIDRETH1']]
# Preview the first five rows
df.head()
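The subset above is taken after the whole file has been read. As an alternative, `pd.read_csv` accepts a `usecols` parameter that keeps only the listed columns while parsing, which can save memory on a wide file. The sketch below assumes a small hypothetical in-memory CSV (via `io.StringIO`) in place of `nhanes_2015_2016.csv` so it runs on its own; with the real file you would pass the path and the 13 column names used above. Note that `usecols` ignores the order of the list, so the result is reindexed explicitly.

```python
import io

import pandas as pd

# Hypothetical in-memory CSV standing in for the real file (made-up values).
csv_text = io.StringIO(
    "SEQN,SMQ020,RIAGENDR,RIDAGEYR,BPXSY1,BPXDI1,BMXBMI\n"
    "1,1,1,62,128,70,27.8\n"
    "2,2,2,53,146,88,30.8\n"
)

# Columns to keep (a shortened version of the list used in the article)
wanted = ["RIDAGEYR", "SMQ020", "RIAGENDR", "BPXSY1", "BPXDI1"]

# usecols makes read_csv skip every other column while parsing
small = pd.read_csv(csv_text, usecols=wanted)

# usecols returns columns in file order, so put them in the order we listed
small = small[wanted]
print(small.head())
```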


How to Present the Relationships Amongst Multiple Variables in Python