# Tutorial: Uncertainty estimation with CatBoost

Understand why your model is uncertain and how to estimate its level of uncertainty. This tutorial details how to quantify both data and knowledge uncertainty in CatBoost.

This tutorial covers the following topics:

• What is predictive uncertainty and why should you care about it?
• What are the two sources of uncertainty?
• How to estimate uncertainty for regression problems using the CatBoost gradient boosting library

You can follow all steps using this Jupyter Notebook.

## What is uncertainty?

Machine learning has been widely applied to a range of tasks. However, in certain high-risk applications, such as autonomous driving, medical diagnostics, and financial forecasting, a mistake can lead to either a fatal outcome or large financial loss. In these applications, it is important to detect when the system makes a mistake and take safer actions. Furthermore, it is also desirable to collect these “failure scenarios”, label them, and teach the system to make the correct prediction through active learning.

Predictive uncertainty estimation can be used for detecting errors. Ideally, the model indicates a high level of uncertainty in situations where it is likely to make a mistake. That allows us to detect errors and take safer actions. Crucially, the choice of action can depend on why the model is uncertain. There are two main sources of uncertainty: data uncertainty (also known as aleatoric uncertainty) and knowledge uncertainty (also known as epistemic uncertainty). If our goal is to detect errors, it is not necessary to separate these two uncertainties. However, if our goal is active learning, then we would like to detect novel inputs, and _knowledge uncertainty_ can be used for that.

Data uncertainty arises due to the inherent complexity of the data, such as additive noise or overlapping classes. In these cases, the model knows that the input has attributes of multiple classes or that the target is noisy. Importantly, data uncertainty _cannot_ be reduced by collecting more training data.
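To make the last point concrete, here is a minimal sketch in plain Python (no CatBoost involved). It assumes a toy target `y = 2*x + noise` with Gaussian noise; the function name `residual_std` and all constants are illustrative choices, not part of the tutorial. It measures the spread of the target around the true function for a small and a large sample.

```python
import random
import statistics


def residual_std(n_samples, noise_std=1.0, seed=0):
    """Estimate the noise level of y = 2*x + noise from n_samples points."""
    rng = random.Random(seed)
    residuals = []
    for _ in range(n_samples):
        x = rng.uniform(0.0, 1.0)
        y = 2.0 * x + rng.gauss(0.0, noise_std)
        # Residual against the true function: the best any model could do.
        residuals.append(y - 2.0 * x)
    return statistics.pstdev(residuals)


# The estimated noise level stays near noise_std however much data we collect.
for n in (100, 10_000):
    print(n, round(residual_std(n), 3))
```

Even a perfect model would be left with this residual spread, so both sample sizes should give an estimate near `noise_std`. More data only makes the estimate of the noise level more precise; it does not make the noise smaller.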

Knowledge uncertainty arises when the model is given an input from a region that is either sparsely covered by the training data or far from the training data. In these cases, the model knows very little about this region and is likely to make a mistake. Unlike data uncertainty, knowledge uncertainty can be reduced by collecting more training data from a poorly understood region.
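One common way to expose knowledge uncertainty is to look at how much the members of an ensemble disagree. The sketch below is a toy illustration under stated assumptions: a bootstrap ensemble of simple linear fits stands in for an ensemble of boosted models, the training inputs lie in `[0, 1]`, and the helper names `fit_line` and `uncertainty_at` are hypothetical. It is not CatBoost's API.

```python
import random
import statistics


def fit_line(points):
    """Least-squares fit of y = a + b*x; returns (a, b, residual variance)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_mean, y_mean = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    b = sum((x - x_mean) * (y - y_mean) for x, y in points) / sxx
    a = y_mean - b * x_mean
    res_var = statistics.fmean([(y - (a + b * x)) ** 2 for x, y in points])
    return a, b, res_var


def uncertainty_at(x, ensemble):
    """Return (data uncertainty, knowledge uncertainty) at input x."""
    means = [a + b * x for a, b, _ in ensemble]
    # Data uncertainty: average noise variance estimated by the members.
    data_unc = statistics.fmean([v for _, _, v in ensemble])
    # Knowledge uncertainty: disagreement between the members' predictions.
    knowledge_unc = statistics.pvariance(means)
    return data_unc, knowledge_unc


rng = random.Random(0)
train = []
for _ in range(50):
    x = rng.uniform(0.0, 1.0)  # training inputs only cover [0, 1]
    train.append((x, 2.0 * x + rng.gauss(0.0, 0.5)))

# Each ensemble member is fitted on a bootstrap resample of the training set.
ensemble = [fit_line(rng.choices(train, k=len(train))) for _ in range(20)]

in_domain = uncertainty_at(0.5, ensemble)  # inside the training range
far_away = uncertainty_at(10.0, ensemble)  # far from any training data
print("in-domain:", in_domain)
print("far away: ", far_away)
```

In this toy setup the members agree closely at `x = 0.5` and diverge at `x = 10.0`, so the knowledge uncertainty is larger far from the training data, while the estimated data uncertainty is the same at both points because the noise model does not depend on `x`. Adding the two terms gives a total predictive variance, following the law of total variance.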

This tutorial post details how to quantify both data and knowledge uncertainty in CatBoost.
