Oh! What are we if not patterns?
It’s amazing how we perform so many sophisticated tasks easily and yet yearn to teach machines how to do them. Telling the difference between apples and oranges is quite trivial for us, but teaching it to something that only understands ‘0’ and ‘1’ is troublesome. Now, what would seem a strenuous task can be made easy (or at least feasible) using a very familiar mathematical formula. But the question is: is the formula intuitive?
Bayes’ Theorem is like the E=mc² of probability theory. Everyone has seen it, but only a few understand it. If I had to choose one adjective for it, it’d be revolutionary! It changed the course of many applications we had been fiddling with before. But before diving into the nitty-gritty of the algorithm, we need to set our fundamentals straight.
P.S.: To explain the concepts I have taken the example of binary classification for the sake of simplicity but without any loss of generality.
Marginal Probability
When we usually talk about the probability of an event, it is the marginal probability we are concerned with. In other words, it is the probability of an event irrespective of any other factor/event/circumstance. Basically, you ‘marginalize’ the other events, hence the name. It is denoted by **_P(A)_** and read as “probability of A”.
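Sticking with the apples-and-oranges example, a minimal sketch of marginal probability from counts (the data below is invented for illustration):

```python
# Marginal probability as a plain fraction of counts.
from collections import Counter

# Hypothetical basket of fruit -- made-up data.
fruits = ["apple", "orange", "apple", "apple", "orange", "apple"]
counts = Counter(fruits)

# P(apple): the fraction of apples overall, ignoring every other factor.
p_apple = counts["apple"] / len(fruits)
print(p_apple)  # 4/6 ≈ 0.667
```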
Conditional Probability
Conditional probability is when the occurrence of an event is wholly or partially affected by other event(s). Alternatively put, it is the probability of occurrence of an event A when another event B has already taken place. It is denoted by P(A|B) and read as “probability of A given B”.
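A quick sketch of how P(A|B) is computed from a table of counts, using an invented fruit-versus-colour dataset (the numbers are hypothetical):

```python
# Hypothetical joint counts of (fruit, colour) pairs -- invented numbers.
counts = {
    ("apple", "red"): 30,
    ("apple", "green"): 10,
    ("orange", "orange"): 55,
    ("orange", "green"): 5,
}
total = sum(counts.values())  # 100

# P(red): marginal probability of the colour, summed over fruits.
p_red = sum(v for (fruit, colour), v in counts.items() if colour == "red") / total

# P(apple, red): joint probability of fruit and colour together.
p_apple_and_red = counts[("apple", "red")] / total

# P(apple | red) = P(apple, red) / P(red)
p_apple_given_red = p_apple_and_red / p_red
print(p_apple_given_red)  # 1.0 -- in this toy table only apples are red
```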
Joint Probability
Joint probability is calculated when we are interested in the occurrence of two different events simultaneously. It is also often referred to as the probability of the intersection of two events. It is denoted by P(A, B) and read as “probability of A and B”.
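The intersection view can be worked out by direct enumeration. A small sketch on a fair six-sided die (an assumed example, not from the article):

```python
# Joint probability of two events on a fair die, by enumerating outcomes.
outcomes = range(1, 7)
even = {o for o in outcomes if o % 2 == 0}  # event A: roll is even -> {2, 4, 6}
gt3 = {o for o in outcomes if o > 3}        # event B: roll is > 3  -> {4, 5, 6}

# P(A, B) is the probability mass of the intersection of the two events.
p_joint = len(even & gt3) / 6  # |{4, 6}| / 6
print(p_joint)  # 1/3 ≈ 0.333
```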
Likelihood
There is a very subtle difference between likelihood and probability, and perhaps this is the very reason why people often consider them similar, if not the same.
To understand the difference between them, we first need to understand what a model is, more specifically, statistical model.
A model can be viewed as any process, relationship, equation, or approximation that helps us understand and describe the data.
Consider the below graph:
This could be a model, as it gives us a ‘description’ of what our data look like. We can see a relationship between the features (x and y) in the above graph, i.e., how the features vary with respect to each other.
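The probability-versus-likelihood distinction can be made concrete with a small sketch: probability fixes the model parameter and asks about data, while likelihood fixes the observed data and compares parameters. The coin-flip numbers below are invented for illustration:

```python
# Likelihood sketch: fix the observed data, vary the model parameter.
# Invented data: 7 heads observed in 10 flips.
from math import comb

heads, flips = 7, 10

def likelihood(p):
    """Binomial likelihood L(p) = C(10, 7) * p^7 * (1 - p)^3 for the fixed data."""
    return comb(flips, heads) * p**heads * (1 - p) ** (flips - heads)

# Comparing two candidate models for the same data:
print(likelihood(0.5))  # "fair coin" model
print(likelihood(0.7))  # "biased coin" model -- larger, so better supported by this data
```

Note that a likelihood function like this need not sum to 1 over the parameter p; that is one of the subtle ways it differs from a probability.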
#pattern-recognition #machine-learning #bayesian-statistics #algorithms #classification