Suppose you are exploring a dataset and you want to examine if two categorical variables are dependent on each other. The motivation could be a better understanding of the relationship between an outcome variable and a predictor, identification of dependent predictors, etc. In this case, a** Chi-square test** can be an effective statistical tool.
In this post, I will discuss how to do this test in Python (both from scratch and using SciPy) with examples on a popular HR analytics dataset — the IBM Employee Attrition & Performance dataset.
Chi-square test is a statistical hypothesis test to perform when the test statistic is Chi-square distributed under the null hypothesis and particularly the Chi-square test for independence is often used to examine independence between two categorical variables [1].
#scipy #data-science #python