My least favorite kind of data science interview question is probability. It’s just not something that I think about everyday, so the probability muscles always feel super rusty whenever I am forced to exercise them. But if you are hunting for a data job, it’s inevitable that you will run into one at some point — so let’s keep our probability skills fresh with some practice. As usual, we will use simulation (and Python code) to better visualize what’s going on.


The Question

You’re headed to Seattle. You want to know if you should bring an umbrella so you call 3 random friends who live there and ask each independently whether it’s raining or not. Each friend has a 2/3 chance of telling you the truth and a 1/3 chance of lying (so mean!). All 3 friends tell you “Yes, it’s raining”. What is the probability that it’s actually raining in Seattle?

The first time I saw this problem, I thought “only if all 3 of my friends lied to me then it would mean it’s not raining in Seattle”. Because as long as one of my friends was not lying, then one of the yeses would be true (implying rain).

probability rain = 1 - probability all lying

= 1 - (1/3)^3 = 0.963

But then I thought this seems too simple. How could this question gain such notoriety if it was this simple. Sadly my instincts were right and this question is more complicated than meets the eye.


It’s A Conditional Probability

My previous approach ignored the given condition. What condition you ask? The interviewer told us that all 3 of our friends answered yes. That’s relevant information that needs to be factored into our solution.

Image for post

To see why, imagine that it’s raining in Seattle. Our friends answered [yes, yes, yes] when we asked them whether it was raining. Let’s think through the possible outcomes in this state of the world. If they all told the truth, then that is consistent with the world state that we’re in (because it IS raining in Seattle).

What if they all lied? That’s impossible. In a state of the world where it is raining, our friends can’t have answered “yes” and lied (to lie, they would have had to have replied “no”). Is it possible that just one of them lied? That’s not possible either — all of them said “yes” and to lie in the state of the world where it’s raining requires a “no”. So the only possible result if it IS raining in Seattle is that our friends are all telling the truth.

#data-science #statistics #technology #data analysis

Conditional Probabilities, Seattle Rain, And Tricky Friends
1.55 GEEK