Suppose you are exploring a dataset and you want to examine if two categorical variables are dependent on each other. The motivation could be a better understanding of the relationship between an outcome variable and a predictor, identification of dependent predictors, etc. In this case, a** Chi-square test** can be an effective statistical tool.

In this post, I will discuss how to do this test in Python (both from scratch and using SciPy) with examples on a popular HR analytics dataset — the IBM Employee Attrition & Performance dataset.


Chi-square test is a statistical hypothesis test to perform when the test statistic is Chi-square distributed under the null hypothesis and particularly the Chi-square test for independence is often used to examine independence between two categorical variables [1].

#scipy #data-science #python

Chi-Square Test for Independenc in Python from the IBM HR Dataset
2.95 GEEK