Hypothesis Testing is a broad subject that applies to many fields. In statistics, Hypothesis Testing uses sample data, often from more than one group, to decide whether an observed effect is statistically significant or could plausibly have arisen by chance.

This involves calculating a p-value and comparing it with a chosen threshold, the significance level or alpha (α). In Machine Learning, a hypothesis is a candidate function that maps the input (independent) features to the target. Learning means searching for the hypothesis that best approximates the true mapping from inputs to outputs.
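To make the Machine Learning sense of "hypothesis" concrete, here is a minimal Python sketch. For illustration only, it assumes that every hypothesis is a straight line h(x) = w·x + b and that the hypothesis space is a small hand-picked list of (w, b) pairs. The names `mse` and `best_hypothesis` and the toy data are hypothetical. Real learners search far larger spaces, usually by optimization rather than by enumerating every candidate.

```python
# Minimal sketch (assumption): each hypothesis is h(x) = w * x + b, and the
# hypothesis space is a tiny hand-picked list of (w, b) pairs.

def mse(h, xs, ys):
    """Mean squared error of hypothesis h on the data."""
    return sum((h(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def best_hypothesis(candidates, xs, ys):
    """Return the (w, b) pair whose linear function best fits the data."""
    return min(candidates, key=lambda wb: mse(lambda x: wb[0] * x + wb[1], xs, ys))

# Hypothetical toy data generated from y = 2x + 1
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
hypothesis_space = [(1, 0), (2, 1), (3, -1), (0, 5)]
print(best_hypothesis(hypothesis_space, xs, ys))  # (2, 1)
```

The candidate (2, 1) wins because the toy data were generated from y = 2x + 1, so its error is zero.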

By the end of this tutorial, you will know the following:

  • What a hypothesis is in Statistics vs Machine Learning
  • What a hypothesis space is
  • The process of forming a hypothesis

Hypothesis in Statistics

A Hypothesis is a falsifiable assumption about a result, meaning that some evidence could prove it wrong. We either reject a hypothesis or fail to reject it. We never accept a hypothesis in statistics, because statistics deals in probabilities and we are never 100% certain. Before starting the experiment, we define two hypotheses:

1. Null Hypothesis (H0): states that there is no significant effect

2. Alternative Hypothesis (H1): states that there is a significant effect

In statistics, we compare the P-value, calculated with an appropriate statistical test, against the significance level (alpha). The P-value is the probability of observing a result at least as extreme as the one we got, assuming the null hypothesis is true. A large P-value means the data are consistent with the null hypothesis. The effect is therefore not significant, and we fail to reject the null hypothesis.

In other words, a result like ours could easily have occurred by chance, so it has no statistical significance. A very small P-value, on the other hand, means that such an extreme result would rarely occur by chance if the null hypothesis were true. That is evidence against the null hypothesis.
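The following sketch shows how the size of the P-value behaves by computing an exact one-sided P-value for a coin-flipping experiment. The scenario and the function name `binomial_p_value` are assumptions made for illustration and are not part of the article's example. The scenario is 20 flips of a coin that the null hypothesis says is fair.

```python
from math import comb

def binomial_p_value(heads, flips, p_null=0.5):
    """One-sided P-value: probability of seeing at least `heads` heads in
    `flips` tosses if the null hypothesis (P(heads) = p_null) is true."""
    return sum(
        comb(flips, k) * p_null**k * (1 - p_null) ** (flips - k)
        for k in range(heads, flips + 1)
    )

# 10 heads out of 20 is what a fair coin predicts: large P-value.
print(round(binomial_p_value(10, 20), 3))  # 0.588
# 16 heads out of 20 would rarely happen by chance: small P-value.
print(round(binomial_p_value(16, 20), 4))  # 0.0059
```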

Significance Level

The Significance Level, alpha (α), is set before starting the experiment. It defines how much risk of error we tolerate, that is, the threshold below which a P-value counts as significant. A common choice is α = 0.05. This corresponds to a 95% confidence level and means we accept a 5% chance of being fooled by the test, i.e., rejecting a null hypothesis that is actually true. In other words, 0.05 acts as the threshold that the P-value is compared against. Similarly, a 99% confidence level corresponds to α = 0.01, or 1%.

P-Value

A statistical test is carried out on the sample data to compute the P-value, which is then compared with the significance level (alpha). If the P-value is less than alpha, we conclude that the effect is significant and reject the Null Hypothesis (which stated that there is no significant effect). If the P-value is greater than alpha, we do not have enough evidence of a significant effect, so we fail to reject the Null Hypothesis.
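The decision rule, together with the link between confidence level and alpha, can be sketched in a few lines of Python. The function names are hypothetical. The sketch follows this article's convention that a P-value strictly below alpha counts as significant, although some texts use "less than or equal to" instead.

```python
def alpha_from_confidence(confidence):
    """Significance level alpha = 1 - confidence level (e.g. 0.95 -> 0.05).
    Rounding removes floating-point noise such as 0.050000000000000044."""
    return round(1 - confidence, 10)

def decide(p_value, alpha=0.05):
    """Compare a P-value with alpha and return the test's conclusion."""
    if p_value < alpha:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"

print(alpha_from_confidence(0.95))  # 0.05
print(alpha_from_confidence(0.99))  # 0.01
print(decide(0.003))                # reject the null hypothesis
print(decide(0.30))                 # fail to reject the null hypothesis
```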

Because we can never be 100% sure, there is always a chance that the test is carried out correctly but its result is still misleading. We might reject the null hypothesis when it is actually true, which is a Type 1 error and occurs with probability alpha when the null is true. We might also fail to reject the null hypothesis when it is actually false, which is a Type 2 error.
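One way to see both error types is to simulate many experiments in which we know the truth. The sketch below makes several assumptions for illustration. Each experiment flips a simulated coin 100 times and applies a two-sided z-test with alpha = 0.05, using a normal approximation chosen for simplicity. The code then counts how often the test reaches the wrong conclusion. The helper names are hypothetical.

```python
import random
from math import erfc, sqrt

def z_test_p_value(heads, flips):
    """Two-sided P-value for H0: the coin is fair (normal approximation)."""
    z = (heads - flips * 0.5) / sqrt(flips * 0.25)
    return erfc(abs(z) / sqrt(2))  # equals P(|Z| >= |z|) for standard normal Z

def rejection_rate(true_p, trials=5000, flips=100, alpha=0.05, seed=0):
    """Fraction of simulated experiments in which H0 is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        heads = sum(rng.random() < true_p for _ in range(flips))
        if z_test_p_value(heads, flips) < alpha:
            rejections += 1
    return rejections / trials

# H0 is true (fair coin): every rejection is a Type 1 error.
type_1 = rejection_rate(true_p=0.5)
# H0 is false (coin lands heads 60% of the time): every non-rejection is a Type 2 error.
type_2 = 1 - rejection_rate(true_p=0.6)
print(f"Type 1 error rate: {type_1:.3f}")  # near alpha = 0.05
print(f"Type 2 error rate: {type_2:.3f}")
```

With only 100 flips, the test misses the biased coin close to half the time. A low Type 1 error rate therefore does not by itself guarantee a low Type 2 error rate. Larger samples reduce Type 2 errors.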

**Example**

Suppose you work for a vaccine manufacturer and your team develops a vaccine for Covid-19. To demonstrate the vaccine's efficacy, it must be statistically shown to be effective in humans. We therefore take two groups of people of equal size and similar characteristics. We give the vaccine to group A and a placebo to group B, then compare how many people in each group got infected.

We repeat this test multiple times to see whether group A developed significant immunity against Covid-19. We calculate the P-value for each test and find that it is always less than the significance level. Hence, we reject the null hypothesis and conclude that the vaccine has a significant effect.
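The article does not name the statistical test used in this example. One common choice for comparing infection rates between two groups is a two-proportion z-test, which the sketch below applies to made-up counts. Each group has 1,000 people, with 10 infections in group A and 50 in group B. The numbers and the function name are hypothetical and only show how a P-value could be obtained for a single trial.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_p_value(infected_a, n_a, infected_b, n_b):
    """One-sided P-value for H1: group A's infection rate is lower than group B's.
    H0: both groups share the same infection rate."""
    p_a = infected_a / n_a
    p_b = infected_b / n_b
    # Pooled infection rate, assuming H0 is true
    pooled = (infected_a + infected_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    return NormalDist().cdf(z)  # probability of a z this low or lower under H0

# Hypothetical trial: vaccine group A vs placebo group B
p = two_proportion_p_value(infected_a=10, n_a=1000, infected_b=50, n_b=1000)
alpha = 0.05
print(f"P-value = {p:.2e}")
print("reject H0" if p < alpha else "fail to reject H0")  # reject H0
```

A one-sided P-value is used because the alternative hypothesis is directional (the vaccine lowers infections). A two-sided test would double it.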


What is Hypothesis in Machine Learning? How to Form a Hypothesis?