Which functions to use, and when, for working with probability distributions in Python’s SciPy.

A large part of decision making in today’s businesses involves analyzing the available historical data efficiently with machine learning and AI algorithms. ML algorithms, in turn, rest on statistical analyses of those data points.

Probability distributions are a major consideration in many phases of statistical analysis.

One such phase is hypothesis testing in inferential statistics. Most statistical tests assume an underlying probability distribution, and that assumed distribution is what lets us quantify how confident we can be in an inference about the data.

Distributions can be broadly classified by the type of data they describe: discrete or continuous. If the variable takes discrete values, it may follow one of the following two fundamental distributions:

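1. _Binomial Distribution_: When the random variable counts how many times one of two possible outcomes (“Outcome A” rather than “Outcome B”) occurs in a fixed number of independent trials, each with the same probability. E.g., let’s say a random variable “X” is tied to the following

a. Event: tossing a coin

b. Outcome: either “Head” or “Tail”

Here “X”, the number of heads, follows a binomial distribution because every toss has exactly 2 outcomes. A single toss is the special case with one trial (a Bernoulli distribution).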
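As a minimal sketch of the coin example, assuming a fair coin (p = 0.5) and 10 tosses (both numbers are illustrative assumptions, not part of the example above), the binomial probabilities can be computed with `scipy.stats.binom`:

```python
from scipy.stats import binom

# Assumed for illustration: 10 tosses of a fair coin.
n_tosses = 10
p_head = 0.5

# pmf(k, n, p): probability of exactly k heads in n tosses.
p_three_heads = binom.pmf(3, n_tosses, p_head)

# A single toss is the n = 1 case: one head with probability p_head.
p_single_head = binom.pmf(1, 1, p_head)

print(f"P(exactly 3 heads in 10 tosses) = {p_three_heads:.4f}")
print(f"P(head on a single toss)        = {p_single_head:.4f}")
```

Changing `n_tosses` or `p_head` is enough to model a different number of tosses or a biased coin.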

2. _Poisson Distribution_: When the random variable counts how many times an event occurs in a fixed interval of time, given a known average rate. E.g., let’s say we need to find the probability of 5 people visiting the shop in the next ten minutes, knowing that on average 3 people visit the shop every ten minutes. Let’s say a random variable “Y” is tied to the following

a. _Event_: people visiting the shop

b. Outcome: could be any non-negative whole number of people.

Here “Y” follows a Poisson distribution since it could take many discrete values.
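The shop question can be answered directly with `scipy.stats.poisson`. The sketch below uses the numbers from the example (an average of 3 visitors per ten minutes); the variable names are only illustrative:

```python
from scipy.stats import poisson

mean_visits = 3  # average visitors per ten-minute window (from the example)

# pmf(k, mu): probability of exactly k visitors in one window.
p_exactly_5 = poisson.pmf(5, mean_visits)

# sf(k, mu) = P(Y > k), so sf(4, mu) is the probability of 5 or more visitors.
p_at_least_5 = poisson.sf(4, mean_visits)

print(f"P(exactly 5 visitors)  = {p_exactly_5:.4f}")
print(f"P(at least 5 visitors) = {p_at_least_5:.4f}")
```

The probability of exactly 5 visitors comes out at roughly 0.10. Note that "exactly 5" and "5 or more" are different questions, which is why the sketch shows both `pmf` and `sf`.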

If the variable takes continuous values, it may follow a “Normal” distribution.

3. _Normal Distribution_: E.g., suppose you want to find the probability that a person of a specific age has a height within a given range. You have the mean height of people at that age and the spread of heights around that mean (the standard deviation). Let’s say a random variable “Z” is tied to the following

a. Event: height of a person in a specific age group

b. Outcome: could be any real number.

Here “Z” follows a Normal distribution.
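A short sketch of the height example with `scipy.stats.norm` follows. The mean of 150 cm and standard deviation of 8 cm are hypothetical values chosen only for illustration. Because height is continuous, probabilities attach to ranges rather than to single values.

```python
from scipy.stats import norm

# Hypothetical values for one age group (not from the article).
mean_height_cm = 150.0
std_height_cm = 8.0

# "Freeze" the distribution so the parameters are set only once.
height = norm(loc=mean_height_cm, scale=std_height_cm)

# cdf(x) = P(Z <= x): probability of being at most 158 cm tall.
p_below_158 = height.cdf(158.0)

# Probability of a height between 142 cm and 158 cm (mean +/- 1 std).
p_between = height.cdf(158.0) - height.cdf(142.0)

print(f"P(height <= 158 cm)       = {p_below_158:.4f}")
print(f"P(142 cm <= height <= 158) = {p_between:.4f}")
```

With these assumed parameters the second value is the familiar "about 68% within one standard deviation" figure.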

Now that we know these basic probability distributions, let’s get hands-on with them in Python using “scipy.stats”, a SciPy module that contains a large number of probability distributions along with various statistical tests and functions.

This article guides you through the most commonly used and most basic distribution objects in that module:

  1. binom
  2. poisson
  3. norm
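The `binom`, `poisson`, and `norm` objects in `scipy.stats` share a common set of methods, so choosing a function mostly comes down to the question being asked. The sketch below uses arbitrary illustrative parameters:

```python
from scipy.stats import binom, norm, poisson

# Arbitrary illustrative parameters, frozen into distribution objects.
b = binom(n=10, p=0.5)
po = poisson(mu=3)
no = norm(loc=0, scale=1)

# "Exactly k?"    -> pmf (discrete distributions only)
p_exact = b.pmf(4)

# "Density at x?" -> pdf (continuous distributions only)
density = no.pdf(0.0)

# "At most k?"    -> cdf, i.e. P(X <= k)
p_at_most = b.cdf(4)

# "More than k?"  -> sf, i.e. P(X > k) = 1 - cdf(k)
p_more_than = b.sf(4)

# "Which value has probability q below it?" -> ppf (inverse of cdf)
upper_975 = no.ppf(0.975)

# "Give me simulated data" -> rvs
samples = po.rvs(size=5, random_state=0)

print(p_exact, density, p_at_most, p_more_than, upper_975, samples)
```

Discrete distributions expose `pmf` and continuous ones expose `pdf`; the remaining methods shown here are available on all three.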

Let’s take an example to work with a binomial distribution. Let’s say we have a problem statement like this:

“The probability of a customer buying a TV from the store is 0.25. There are 15 people in the store looking around for a TV. The customers are not interacting with each other.”

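We know that the above problem follows a binomial distribution because there is a fixed number of customers (15), each of whom independently either buys a TV or does not, with the same probability (0.25).

The following table represents the probability distribution of the number of customers, n (out of the 15), who buy a TV.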
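The probabilities for the TV example can be tabulated with a short sketch like the one below. It uses the numbers from the problem statement (15 customers, purchase probability 0.25); the column layout is just one possible presentation.

```python
import numpy as np
from scipy.stats import binom

n_customers = 15   # customers in the store (from the problem statement)
p_buy = 0.25       # probability that one customer buys a TV

ks = np.arange(n_customers + 1)           # possible counts: 0, 1, ..., 15
pmf = binom.pmf(ks, n_customers, p_buy)   # P(exactly n customers buy)
cdf = binom.cdf(ks, n_customers, p_buy)   # P(at most n customers buy)

print(f"{'n':>2}  {'P(X = n)':>9}  {'P(X <= n)':>10}")
for k, p_eq, p_le in zip(ks, pmf, cdf):
    print(f"{k:>2}  {p_eq:>9.4f}  {p_le:>10.4f}")

# Expected number of buyers is n * p.
expected_buyers = binom.mean(n_customers, p_buy)
print(f"Expected number of buyers: {expected_buyers}")
```

Passing an array of counts to `pmf` and `cdf` evaluates every row of the table in one call, which avoids writing a separate computation per value of n.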

