Metropolis Hastings Review

Very Short Introduction

Metropolis Hastings is a class of MCMC (Markov Chain Monte Carlo) sampling algorithms. Its most common use is drawing samples from a posterior distribution whose analytical form is intractable or impractical to sample from directly. This post follows the statistical background and the historical steps that led to the appearance of this algorithm.

Statistical Inference

Statistics is an inspiring discipline. The idea that any source of data, whether condo sizes in NY, annual income in Africa or the distribution of surnames in Japan, can be handled with the same set of mathematical tools and yield coherent conclusions must not be taken for granted. Just as physics uses rigorous equations to describe a wide range of real-world phenomena, statistics uses a fairly small set of analytical tools to clarify the behavior of real-world data.

One of the leading inference problems in statistics is estimating a distribution’s parameters, for example identifying the mean and standard deviation of a Gaussian. To work with distributions, one must first define a probability framework. There are two leading approaches for such frameworks:

· The frequentist

· The Bayesian

The frequentist uses no prior belief about the parameters; she sees probability as a long-term frequency. Each trial is a single experiment in an infinite sequence. She never assigns probability to parameters: a population mean, for example, is a fixed (though unknown) value, and since it is fixed we cannot assign it a probability. In the words of one statistician: “there is no probability to have soup today”, since today is not an infinite sequence of events.
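The long-run-frequency idea can be illustrated with a small simulation. This is only a sketch: the function name `running_frequency`, the success probability 0.3 and the seed are assumptions chosen for illustration, not part of the original discussion.

```python
import random

def running_frequency(p_true, n_trials, seed=0):
    """Simulate n_trials independent yes/no trials and return the
    relative frequency of successes."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    successes = sum(1 for _ in range(n_trials) if rng.random() < p_true)
    return successes / n_trials

# As the number of trials grows, the relative frequency settles
# near the fixed (but, in practice, unknown) value p_true.
for n in (10, 1_000, 100_000):
    print(n, running_frequency(0.3, n))
```

In this picture the value 0.3 is never random; only the outcome of each trial is.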

The Bayesian, on the other hand, holds a belief about the parameters before seeing the data. Data is gathered in order to update this belief. The belief is expressed in the prior distribution, and the likelihood is the probability of obtaining the data given particular parameter values. The product of these two functions is proportional to the posterior distribution. In the Bayesian world the parameters, and not only the data, are random variables. In contrast to the frequentist view, once the data is observed it is treated as fixed rather than as a random variable.
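The statement that the posterior is proportional to prior times likelihood can be made concrete on a grid of candidate parameter values. The sketch below assumes a Gaussian likelihood with known standard deviation and a Gaussian prior on the mean; the data values, the prior settings and the helper name `grid_posterior` are all hypothetical and chosen only for illustration.

```python
import math

def log_normal_pdf(x, mean, sd):
    """Log density of a Gaussian with the given mean and standard deviation."""
    return -0.5 * ((x - mean) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))

def grid_posterior(data, grid, prior_mean=0.0, prior_sd=10.0, noise_sd=1.0):
    """Return normalized posterior weights for a Gaussian mean on a grid."""
    log_post = []
    for mu in grid:
        log_prior = log_normal_pdf(mu, prior_mean, prior_sd)
        log_lik = sum(log_normal_pdf(x, mu, noise_sd) for x in data)
        log_post.append(log_prior + log_lik)  # log(prior * likelihood)
    # Subtract the maximum before exponentiating to avoid underflow,
    # then normalize so the weights sum to one.
    top = max(log_post)
    weights = [math.exp(lp - top) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]

data = [4.8, 5.1, 5.3, 4.9, 5.2]           # made-up observations
grid = [i * 0.01 for i in range(1001)]     # candidate means 0.00 .. 10.00
posterior = grid_posterior(data, grid)
posterior_mean = sum(mu * w for mu, w in zip(grid, posterior))
print(posterior_mean)
```

Normalizing by a plain sum works here because there is a single parameter and the grid is small; with many parameters the grid grows too large to enumerate.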

