## Introduction

Picking the right machine learning algorithm is a decisive step, since it largely determines the model's performance. Performance is usually the dominant factor in model selection, and it is typically estimated with the k-fold cross-validation technique so that the estimate comes from data held out of training.
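As a concrete illustration, the sketch below estimates a classifier's mean performance with k-fold cross-validation using scikit-learn; the synthetic dataset and the logistic-regression model are placeholder assumptions, not part of the original tutorial.

```python
# A minimal sketch of estimating model performance with k-fold
# cross-validation; the dataset and classifier here are illustrative.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic binary-classification data (assumption for the example)
X, y = make_classification(n_samples=1000, random_state=1)

# 10-fold cross-validation: each fold is scored on held-out data
cv = KFold(n_splits=10, shuffle=True, random_state=1)
scores = cross_val_score(LogisticRegression(), X, y, cv=cv, scoring="accuracy")

print(f"Mean accuracy: {scores.mean():.3f} (+/- {scores.std():.3f})")
```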

The chosen model is usually the one with the higher mean performance. Nevertheless, that difference can sometimes arise from a statistical fluke. To address this concern, there are several **statistical hypothesis-testing** approaches for evaluating the difference in mean performance obtained from cross-validation. If the resulting **p-value** is below the chosen significance level (commonly 0.05), we can reject the null hypothesis that the two algorithms perform the same and conclude that the difference is statistically significant.
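As a simplified illustration of this decision rule, the sketch below runs a paired Student's t-test on the fold-by-fold scores of two classifiers with `scipy.stats.ttest_rel`. The dataset, the two models, and the 0.05 threshold are illustrative assumptions; later sections discuss why the 5x2-fold variant is preferable to this naive pairing.

```python
# A minimal sketch of a paired Student's t-test on two models'
# cross-validation scores; ttest_rel pairs the scores fold by fold.
from scipy.stats import ttest_rel
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=1)
cv = KFold(n_splits=10, shuffle=True, random_state=1)

# Score both candidate models on the same folds
scores_a = cross_val_score(LogisticRegression(), X, y, cv=cv)
scores_b = cross_val_score(DecisionTreeClassifier(random_state=1), X, y, cv=cv)

t_stat, p_value = ttest_rel(scores_a, scores_b)
alpha = 0.05  # significance level (assumption for the example)
if p_value <= alpha:
    print(f"p={p_value:.3f}: reject H0 -- the difference is significant")
else:
    print(f"p={p_value:.3f}: fail to reject H0 -- no significant difference")
```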

I usually include such a step in my pipeline, whether I am developing a new classification model or competing in one of Kaggle's competitions.

## Tutorial Objectives

- Understanding the differences between statistical hypothesis tests.
- Understanding why model selection based on the mean performance score alone can be misleading.
- Understanding why to use the paired Student's t-test rather than the original Student's t-test.
- Applying the advanced **5x2-fold** technique with the **MLxtend** library to compare algorithms based on the **p-value** (a minimal sketch follows this list).
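As a preview of that last objective, here is a minimal sketch of MLxtend's `paired_ttest_5x2cv`, which runs the 5x2-fold cross-validated paired t-test; the two classifiers and the synthetic dataset are illustrative assumptions.

```python
# A minimal sketch of the 5x2-fold paired t-test via MLxtend;
# the classifiers and data are placeholders for the example.
from mlxtend.evaluate import paired_ttest_5x2cv
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=1)

# 5 repetitions of 2-fold cross-validation, paired fold by fold
t_stat, p_value = paired_ttest_5x2cv(
    estimator1=LogisticRegression(),
    estimator2=DecisionTreeClassifier(random_state=1),
    X=X, y=y, random_seed=1,
)
print(f"t={t_stat:.3f}, p={p_value:.3f}")
```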

## Table of Contents

- What does statistical significance testing mean?
- Types of commonly used statistical hypothesis tests
- Extracting the best two models based on performance
- Steps to conduct hypothesis testing on the best two models
- Steps to apply the 5x2-fold procedure
- Comparing classifier algorithms
- Summary
- References

#statistics #machine-learning #python #classification-algorithms #hypothesis-testing