Here is a summary of how I was taught to assess the p-value in hopes of helping some other non-statistician out there.

P-value in Context

Let’s start with the context. When does the p-value even come into play? It comes up whenever you want your decisions to be backed by data. In Data Science, this is called Data-Driven Decision Making (DDDM). Data is collected, hypotheses are formed about what that data means, and the data is then run through a set of statistical calculations known as hypothesis testing. In the end, you have calculated values that help you assess how well the data supports your hypotheses. One of these calculated values is the p-value, or probability value.

Hypothesis Testing

Assume you have data on animal sightings in city streets. These sightings include foxes, coyotes, mice, cats, dogs, and even elephants! What is the probability of seeing an elephant walking down the street? As any good scientist does, you develop a hypothesis and test it. This is called hypothesis testing. In hypothesis testing, you have two opposing hypotheses. First is the null hypothesis, which effectively states that nothing significant is going on in the data; in this case, elephant sightings are not rare. Second is the alternative hypothesis, which states what the study is actually testing for in its calculations. Put simply, the alternative hypothesis states that something significant is occurring; in this case, sighting an elephant is rare and therefore is a significant event. If the data provides enough evidence against the null hypothesis, you reject it in favor of the alternative.
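To make the two hypotheses concrete, here is a minimal Python sketch of how such a test might be set up. Everything numeric in it is an assumption made up for illustration: the counts (60 sightings, 3 of them elephants), the reading of "not rare" as "about as common as each of the six animals" (a proportion of 1/6), and the conventional 0.05 cutoff. The helper name binom_cdf is also hypothetical. The p-value here is the probability of seeing this few elephants, or fewer, if the null hypothesis were true.

```python
from math import comb


def binom_cdf(k, n, p):
    """Return P(X <= k) for X ~ Binomial(n, p), summed exactly term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


# Hypothetical data, invented for this example only.
total_sightings = 60
elephant_sightings = 3

# Assumption: under the null hypothesis ("elephant sightings are not rare"),
# elephants show up about as often as each of the six animals, i.e. 1/6.
null_proportion = 1 / 6

# One-sided p-value: chance of 3 or fewer elephant sightings if the null is true.
p_value = binom_cdf(elephant_sightings, total_sightings, null_proportion)

alpha = 0.05  # conventional significance level, also an assumption
reject_null = p_value < alpha

print(f"p-value = {p_value:.4f}")
print("Reject the null hypothesis" if reject_null else "Fail to reject the null hypothesis")
```

With these made-up numbers the p-value falls below 0.05, so the null hypothesis would be rejected. A different choice of null proportion or cutoff could change that conclusion, which is why both are spelled out as assumptions.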

