In statistics, an outlier is a data point that differs significantly from other observations. An outlier may be due to variability in the measurement or it may indicate experimental error; the latter are sometimes excluded from the data set. An outlier can cause serious problems in statistical analyses.
In this new article we will explain what outliers are and why they are so important, we will see a practical step by step example in Python, 1, 2 and 3 dimensional displays and the use of a general purpose library.
It is interesting to see the translations of “outlier” — according to its context — in English:
That gives us an idea, doesn’t it?
That is, the outliers in our dataset will be the values that “escape the range where most samples are concentrated”. According to Wikipedia they are the samples that are distant from other observations.
And why are we interested in detecting these Outliers? Because they can considerably affect the results that a Machine Learning model can obtain… For bad… or for good! That’s why we have to detect them, and take them into account. For example in Linear Regression or Assembly algorithms can have a negative impact on your predictions.
Outliers can mean a number of things:
Many times it is easy to identify the outliers in graphs. Let’s see examples of outliers in 1, 2 and 3 dimensions.
#detection #outliers #anomaly #python