Often, you’ll have some level of intuition — or perhaps concrete evidence — to suggest that a set of observations has been generated by a particular statistical distribution. Similar phenomena to the one you are modelling may have been shown to be explained well by a certain distribution. The setup of the situation or problem you are investigating may naturally suggest a family of distributions to try. Or maybe you just want to have a bit of fun by fitting your data to some obscure model just to see what happens (if you are challenged on this, tell people you’re doing Exploratory Data Analysis and that you don’t like to be disturbed when you’re in your zone).

Now, there are many ways of estimating the parameters of your chosen model from the data you have. The simplest of these is the method of moments — an effective tool, but one not without its disadvantages (notably, these estimates are often biased).

Another method you may want to consider is Maximum Likelihood Estimation (MLE), which tends to produce better (ie less biased) estimates for model parameters. It’s a little more technical, but nothing that we can’t handle. Let’s see how it works.

What is likelihood?

The likelihood — more precisely, the likelihood function — is a function of the model parameters that measures how likely it is to obtain a given set of observations under those parameters. We treat the set of observations as fixed — they’ve happened, they’re in the past — and ask which set of model parameters would have made us most likely to observe them.
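As a quick illustration (a sketch, assuming the coin-flipping setup introduced below: 52 heads out of 100 flips, modelled as binomial), R’s built-in dbinom() can evaluate this likelihood at a few candidate parameter values:

```r
# Evaluate the binomial likelihood of observing 52 heads in 100 flips
# under a few candidate values of p (the probability of heads)
p_candidates <- c(0.3, 0.5, 0.52, 0.7)
likelihoods <- dbinom(52, size = 100, prob = p_candidates)
likelihoods
# p = 0.52 yields the largest of the four values: the observed data are
# most likely under that parameter — which is exactly the idea behind MLE
```

The data stay fixed throughout; only the parameter p changes from one evaluation to the next.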

A simple coin-flipping example

Let’s say we flipped a coin 100 times and observed 52 heads and 48 tails. We want to come up with a model that will predict the number of heads we’d get if we flipped the coin another 100 times.

Formalising the problem a bit, let’s think about the number of heads obtained from 100 coin flips. Given that:

  • there are only two possible outcomes (heads and tails),
  • there’s a fixed number of “trials” (100 coin flips), and that
  • there’s a fixed probability of “success” (ie getting a heads),

we might reasonably suggest that the situation could be modelled using a binomial distribution.
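Concretely, the binomial model says that the probability of seeing k heads in 100 flips is

```latex
P(X = k) = \binom{100}{k} \, p^{k} \, (1 - p)^{100 - k}
```

where p, the probability of heads on a single flip, is the unknown parameter we’ll be estimating.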

We can use R to set up the problem as follows (check out the Jupyter notebook used for this article for more detail):

# I don’t know about you but I’m feeling
set.seed(22)

# Generate an outcome, ie number of heads obtained, assuming a fair coin was used for the 100 flips
heads <- rbinom(1, 100, 0.5)
heads
# 52

(For the purposes of generating the data, we’ve used a 50/50 chance of getting heads/tails, although we are going to pretend that we don’t know this for the time being. For almost all real-world problems we don’t have access to this kind of information about the processes that generate the data we’re looking at — which is exactly why we are motivated to estimate these parameters!)
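To preview where this is heading, here’s a minimal sketch of the estimation step itself (the helper name negLogLik is mine, not part of the setup above): we minimise the negative log-likelihood of the 52-heads outcome using R’s built-in optimise(), which is equivalent to maximising the likelihood.

```r
# Negative log-likelihood of observing 52 heads in 100 flips,
# as a function of the unknown probability of heads p
negLogLik <- function(p) {
  -dbinom(52, size = 100, prob = p, log = TRUE)
}

# Minimising the negative log-likelihood over (0, 1) is the same as
# maximising the likelihood itself
fit <- optimise(negLogLik, interval = c(0, 1))
fit$minimum
# very close to 52/100 = 0.52, the sample proportion of heads
```

For the binomial model this has a closed-form answer — the sample proportion of successes — so the numerical optimiser is overkill here, but the same recipe carries over to models where no closed form exists.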


Maximum Likelihood Estimation in R