Suppose you download your bank transactions for 2020. What are the chances that a random transaction’s amount begins with 3? Since there are nine possible leading digits (0 cannot be a first digit), you’d logically guess 1/9, or about 11.1%. Surprisingly, this is wrong. The true probability is closer to 12.5%, and the probability that the first digit is 1 is just over 30%.

So where did this rule, known as Benford’s Law, come from, and how can we use it?

History

Although commonly known as Benford’s Law, like many famous laws, it’s not named after the first person to discover it. It was actually an astronomer named Simon Newcomb who noticed in the late 1800s that some pages of logarithm tables were far more worn than others, particularly the first few pages. His finding was later rediscovered by Frank Benford, who ran many more empirical tests to validate the theory.

In short, the law states that the leading digits of numbers in a “real” dataset do not occur with uniform probability. What do we mean by “real”? Here, we mean naturally occurring sets of numbers: bank account transactions, street addresses, mathematical constants. The probability that a number begins with the digit d (d = 1, 2, …, 9) is given by the following formula:

$$P(d) = \log_{10}\left(1 + \frac{1}{d}\right)$$
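
As a quick numerical check of the percentages quoted in the introduction, here is a minimal Python sketch that evaluates Benford’s formula, log10(1 + 1/d), for each leading digit. The names `benford_probability` and `benford` are illustrative choices, not part of any library.

```python
import math


def benford_probability(d: int) -> float:
    """Probability that the leading digit equals d under Benford's Law."""
    if not 1 <= d <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(1 + 1 / d)


# Expected frequency of each leading digit, 1 through 9
# (the dictionary name is a hypothetical choice for this sketch).
benford = {d: benford_probability(d) for d in range(1, 10)}

for d, p in benford.items():
    print(f"{d}: {p:.1%}")
```

Running this gives roughly 30.1% for a leading 1 and 12.5% for a leading 3, and the nine probabilities sum to 1.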

