**Outliers**

Many machine learning algorithms are sensitive to the range and distribution of attribute values in the input data. Outliers in input data can skew and mislead the training process of machine learning algorithms resulting in longer training times, less accurate models and ultimately poorer results. Even before predictive models are prepared on training data, outliers can result in misleading representations and in turn misleading interpretations of collected data. Outliers can skew the summary distribution of attribute values in descriptive statistics like mean and standard deviation and in plots such as histograms and scatterplots, compressing the body of the data.
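The effect on descriptive statistics is easy to see on a toy sample. The sketch below uses made-up attribute values (an assumption for illustration, not data from any real problem) and compares the mean, median and standard deviation with and without a single extreme value.

```python
import statistics

# Hypothetical attribute values; the final value of `with_outlier` is an injected outlier.
clean = [10.0, 11.0, 9.0, 10.5, 9.5, 10.0, 11.5, 8.5]
with_outlier = clean + [100.0]


def summarize(values):
    """Return (mean, median, sample standard deviation) of a list of numbers."""
    return (
        statistics.mean(values),
        statistics.median(values),
        statistics.stdev(values),
    )


clean_mean, clean_median, clean_std = summarize(clean)
out_mean, out_median, out_std = summarize(with_outlier)

print(f"without outlier: mean={clean_mean:.2f} median={clean_median:.2f} std={clean_std:.2f}")
print(f"with outlier:    mean={out_mean:.2f} median={out_median:.2f} std={out_std:.2f}")
```

In this sample one extreme value doubles the mean and inflates the standard deviation many times over, while the median does not move.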

Finally, outliers can represent examples of data instances that are relevant to the problem, such as anomalies in the case of fraud detection and computer security.

**Outlier Modeling**
Outliers are extreme values that fall a long way outside of the other observations. For example, in a normal distribution, outliers may be values on the tails of the distribution.

The process of identifying outliers has many names in data mining and machine learning, such as outlier mining, outlier modeling, novelty detection and anomaly detection.

Outlier detection methods can be grouped as follows:

- **Extreme Value Analysis:** Determine the statistical tails of the underlying distribution of the data. For example, statistical methods like z-scores on univariate data.
- **Probabilistic and Statistical Models:** Determine unlikely instances from a probabilistic model of the data. For example, Gaussian mixture models optimized using expectation-maximization.
- **Linear Models:** Projection methods that model the data in lower dimensions using linear correlations. For example, with principal component analysis, data with large residual errors may be outliers.
- **Proximity-based Models:** Data instances that are isolated from the mass of the data as determined by cluster, density or nearest neighbor analysis.
- **Information Theoretic Models:** Outliers are detected as data instances that increase the complexity (minimum code length) of the dataset.
- **High-Dimensional Outlier Detection:** Methods that search subspaces for outliers, given the breakdown of distance-based measures in higher dimensions (curse of dimensionality).

Irad Ben-Gal proposes a taxonomy of outlier models as univariate or multivariate, and as parametric or nonparametric. This is a useful way to structure methods based on what is known about the data. For example:

- Are you concerned with outliers in one attribute or in more than one (univariate or multivariate methods)?
- Can you assume a statistical distribution from which the observations were sampled or not (parametric or nonparametric)?

**Get Started**
There are many methods for outlier detection, and much research has gone into them. Start by making some assumptions and designing experiments where you can clearly observe the effects of those assumptions against some performance or accuracy measure.

I recommend working through a stepped process: start with extreme value analysis, then move on to proximity methods and projection methods.

**Extreme Value Analysis**
You do not need to know advanced statistical methods to look for, analyze and filter out outliers from your data. Start out simple with extreme value analysis.

- Focus on univariate methods
- Visualize the data using scatterplots, histograms and box and whisker plots and look for extreme values
- Assume a distribution (Gaussian) and look for values more than 2 or 3 standard deviations from the mean, or more than 1.5 times the interquartile range below the first quartile or above the third quartile
- Filter out outlier candidates from the training dataset and assess your model's performance
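The standard deviation and quartile checks above can be sketched with nothing but the Python standard library. The function names, thresholds and sample values below are illustrative assumptions. The quartile rule is implemented as the common box-plot rule (1.5 times the interquartile range beyond the first or third quartile), and quartile estimates depend on the interpolation method used, so other tools may give slightly different fences.

```python
import statistics


def zscore_outliers(values, threshold=3.0):
    """Return values lying more than `threshold` sample standard deviations from the mean."""
    mean = statistics.mean(values)
    std = statistics.stdev(values)
    if std == 0:
        return []
    return [v for v in values if abs(v - mean) / std > threshold]


def iqr_outliers(values, k=1.5):
    """Return values more than k * IQR below the first or above the third quartile."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # default 'exclusive' method
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical univariate sample with one injected extreme value.
values = [10.0, 11.0, 9.0, 10.5, 9.5, 10.0, 11.5, 8.5, 10.2, 9.8, 10.1, 9.9, 100.0]

z_flagged = zscore_outliers(values)
iqr_flagged = iqr_outliers(values)

# Filter the candidates out before training, then compare model performance
# with and without them.
filtered = [v for v in values if v not in iqr_flagged]

print("z-score candidates:", z_flagged)
print("IQR candidates:", iqr_flagged)
print("kept:", len(filtered), "of", len(values))
```

One caveat worth knowing: in small samples the outlier itself inflates the standard deviation, so a z-score can never exceed (n - 1) / sqrt(n). With fewer than about a dozen points a threshold of 3 may flag nothing, and the quartile rule is often the safer first check.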

**Proximity Methods**
Once you have explored simpler extreme value methods, consider moving on to proximity-based methods.

- Use clustering methods to identify the natural clusters in the data (such as the k-means algorithm)
- Identify and mark the cluster centroids
- Identify data instances that are a fixed distance or percentage distance from cluster centroids
- Filter out outlier candidates from the training dataset and assess your model's performance
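A minimal sketch of the centroid-distance idea, assuming two-dimensional data and a fixed distance cutoff. The k-means here is a small hand-written version with a deterministic farthest-point initialization, chosen so the example is reproducible. The data, the function names and the cutoff of 2.0 are all hypothetical.

```python
import math


def kmeans(points, k, iterations=100):
    """Plain Lloyd's k-means on tuples of coordinates; returns (centroids, labels)."""
    # Deterministic initialization: take the first point, then repeatedly add
    # the point farthest from the centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )

    labels = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centroids are final
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, labels


def centroid_outliers(points, centroids, labels, max_distance):
    """Return points farther than `max_distance` from their own cluster centroid."""
    return [
        p for p, label in zip(points, labels)
        if math.dist(p, centroids[label]) > max_distance
    ]


# Two hypothetical tight clusters plus one point that belongs to neither.
points = [
    (1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (1.1, 1.3), (0.9, 0.9),
    (8.0, 8.0), (8.2, 7.9), (7.9, 8.3), (8.1, 8.1), (7.8, 7.7),
    (4.0, 4.5),
]

centroids, labels = kmeans(points, k=2)
flagged = centroid_outliers(points, centroids, labels, max_distance=2.0)
print("outlier candidates:", flagged)
```

Two limitations of this sketch: the outlier still pulls its cluster's centroid toward itself, and a very extreme point can end up with a centroid of its own, in which case its distance is zero and it goes unflagged. Inspecting cluster sizes alongside distances helps catch that case.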

**Projection Methods**
Projection methods are relatively simple to apply and quickly highlight extraneous values.

- Use projection methods to summarize your data to two dimensions (such as PCA, SOM or Sammon's mapping)
- Visualize the mapping and identify outliers by hand
- Use proximity measures from projected values or codebook vectors to identify outliers
- Filter out outlier candidates from the training dataset and assess your model's performance
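As a scaled-down illustration of a projection method, the sketch below applies PCA to two-dimensional data, projects it onto the first principal axis, and scores each point by its residual, meaning its perpendicular distance from that axis. This is a simplification of the steps above: it reduces 2D to 1D rather than many dimensions to two, and it uses the residual error instead of a visual inspection. The data, the function name and the cutoff of 1.5 are assumptions for illustration.

```python
import math


def principal_axis_residuals(points):
    """Perpendicular distance of each 2D point from the first principal axis."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]

    # Entries of the (unnormalized) 2x2 scatter matrix.
    sxx = sum(dx * dx for dx, _ in centered)
    syy = sum(dy * dy for _, dy in centered)
    sxy = sum(dx * dy for dx, dy in centered)

    # Closed-form angle of the leading eigenvector of a symmetric 2x2 matrix.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    ux, uy = math.cos(theta), math.sin(theta)

    # The residual is the component along the direction orthogonal to (ux, uy).
    return [abs(-uy * dx + ux * dy) for dx, dy in centered]


# Hypothetical data lying close to a line, plus one point well off it.
points = [
    (1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.9),
    (5.0, 5.1), (6.0, 5.8), (7.0, 7.1), (8.0, 8.0),
    (4.5, 8.5),
]

residuals = principal_axis_residuals(points)
flagged = [p for p, r in zip(points, residuals) if r > 1.5]
print("outlier candidates:", flagged)
```

Note that the flagged point has unremarkable x and y values on their own, which is why a univariate check would miss it and a projection catches it.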

**Methods Robust to Outliers**
An alternative strategy is to move to models that are robust to outliers. There are robust forms of regression that minimize the median of the squared errors rather than the mean (so-called robust regression), but they are more computationally intensive. There are also methods like decision trees that are robust to outliers.
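To see what robustness buys you, the sketch below compares ordinary least squares with the Theil-Sen estimator on a line that has one corrupted target value. Theil-Sen is a different robust estimator from the median-of-squared-errors approach mentioned above. It is used here only because it fits in a few lines: the slope is the median of all pairwise slopes. The data and function names are hypothetical.

```python
import statistics
from itertools import combinations


def ols_fit(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def theil_sen_fit(xs, ys):
    """Theil-Sen fit: median of pairwise slopes, then median of the intercepts."""
    slopes = [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2)
        if x2 != x1
    ]
    slope = statistics.median(slopes)
    intercept = statistics.median(y - slope * x for x, y in zip(xs, ys))
    return slope, intercept


# Points on the line y = 2x + 1, with the last target replaced by an outlier.
xs = [float(i) for i in range(10)]
ys = [2 * x + 1 for x in xs]
ys[-1] = 100.0

ols_slope, ols_intercept = ols_fit(xs, ys)
ts_slope, ts_intercept = theil_sen_fit(xs, ys)

print(f"OLS:       slope={ols_slope:.2f} intercept={ols_intercept:.2f}")
print(f"Theil-Sen: slope={ts_slope:.2f} intercept={ts_intercept:.2f}")
```

The least squares slope is dragged far from 2 by the single bad point, while the median-based fit recovers the underlying line. The price is computation: Theil-Sen examines every pair of points, which is consistent with the remark above that robust regression tends to be more expensive.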

You could spot check some methods that are robust to outliers. If there are significant model accuracy benefits, then there may be an opportunity to model and filter out outliers from your training data.
