ML: Student’s, Two-Sample & Paired Sample T-tests. Don’t Use It Blindly.

In a Z-test, we assume we know the standard deviation of the population. What if we don't? In that case, we estimate it with the standard deviation of the sample and, when the sample is large, keep going with the Z-test. What if we don't know the mean of the population? The population mean is what the test is about, so we hypothesize a value for it (the null hypothesis) and test the sample mean against it. When do we use a t-test? We use a t-test when the population standard deviation is unknown and the sample size is small. How small is small? A common rule of thumb is fewer than about 30 observations. We rely on the CLT (Central Limit Theorem), which works well when the sample size is large enough: the sampling distribution of the mean is then approximately Gaussian, and the sample standard deviation is a reliable stand-in for the population one. If the sample size is too small, this starts to break down, and the standardized sample mean no longer follows a Gaussian distribution. For roughly normal data, it follows a heavier-tailed distribution, the t-distribution, with n - 1 degrees of freedom.
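To make the "heavier tails" point concrete, here is a minimal simulation sketch using only the Python standard library. It draws samples from a standard normal population (so the null hypothesis is true), standardizes the sample mean with the sample standard deviation, and counts how often the result exceeds the usual Z cutoff of 1.96. The function name `false_positive_rate`, the number of trials, and the seed are illustrative choices, not something from the original post.

```python
import random
from math import sqrt

def false_positive_rate(n, trials=10000, z_crit=1.96, seed=0):
    """Share of trials where |statistic| > z_crit although the null is true."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    rejections = 0
    for _ in range(trials):
        # Sample from a population with true mean 0 and standard deviation 1.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        mean = sum(sample) / n
        # Sample standard deviation (n - 1 in the denominator).
        s = sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
        # Standardized sample mean, using s in place of the unknown sigma.
        stat = (mean - 0.0) / (s / sqrt(n))
        if abs(stat) > z_crit:
            rejections += 1
    return rejections / trials

small = false_positive_rate(5)    # small sample
large = false_positive_rate(100)  # large sample
print(f"n=5: {small:.3f}  n=100: {large:.3f}")
```

With a nominal 5% cutoff, the small-sample rate should come out well above 5%, while the large-sample rate should sit close to it. That gap is why the Z cutoff is too optimistic for small samples and the t-distribution's wider critical values are used instead.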

Notes: The assumption in the standard two-sample t-test is that the groups being compared come from populations with the same standard deviation; what we are trying to find out is whether the means differ. Classical one-way ANOVA makes the same equal-variance assumption, while variants such as Welch's t-test and Welch's ANOVA relax it. I will cover ANOVA in later posts.
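As a rough illustration of where the equal-standard-deviation assumption enters, the sketch below computes the pooled two-sample t statistic by hand. The helper name `pooled_t_statistic` and the two tiny data sets are made up for this example; in practice a statistics library would normally be used.

```python
from math import sqrt

def pooled_t_statistic(a, b):
    """Two-sample t statistic assuming both groups share one variance."""
    n_a, n_b = len(a), len(b)
    mean_a = sum(a) / n_a
    mean_b = sum(b) / n_b
    # Sums of squared deviations within each group.
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((x - mean_b) ** 2 for x in b)
    df = n_a + n_b - 2
    # Pooling is only justified if the two population variances are equal.
    pooled_var = (ss_a + ss_b) / df
    se = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / se, df

# Hypothetical data, for illustration only.
t_stat, df = pooled_t_statistic([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
print(t_stat, df)
```

The statistic is then compared against a t-distribution with `df` degrees of freedom. If the two groups clearly have different spreads, the pooled variance is no longer a sensible single estimate.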

#machine-learning #bioinformatics #data-science #two-sample-t-test #t-test

medium.com
