Python Face Detection Tutorial for Beginners

In this tutorial you’ll learn:

  • What face detection is
  • How computers understand features in images
  • How to quickly analyze many different features to reach a decision
  • How to use a minimal Python solution for detecting human faces in images

Computer vision is an exciting and growing field. There are tons of interesting problems to solve! One of them is face detection: the ability of a computer to recognize that a photograph contains a human face, and tell you where it is located. In this article, you’ll learn about face detection with Python.

To detect any object in an image, it is necessary to understand how images are represented inside a computer, and how that object differs visually from any other object.

Once that is done, the process of scanning an image and looking for those visual cues needs to be automated and optimized. All these steps come together to form a fast and reliable computer vision algorithm.

What Is Face Detection?

Face detection is a type of computer vision technology that is able to identify people’s faces within digital images. This is very easy for humans, but computers need precise instructions. The images might contain many objects that aren’t human faces, like buildings, cars, animals, and so on.

It is distinct from other computer vision technologies that involve human faces, like facial recognition, analysis, and tracking.

Facial recognition involves identifying the face in the image as belonging to person X and not person Y. It is often used for biometric purposes, like unlocking your smartphone.

Facial analysis tries to understand something about people from their facial features, like determining their age, gender, or the emotion they are displaying.

Facial tracking is mostly present in video analysis and tries to follow a face and its features (eyes, nose, and lips) from frame to frame. The most popular applications are various filters available in mobile apps like Snapchat.

All of these problems have different technological solutions. This tutorial will focus on a traditional solution for the first challenge: face detection.

How Do Computers “See” Images?

The smallest element of an image is called a pixel, or a picture element. It is basically a dot in the picture. An image contains multiple pixels arranged in rows and columns.

You will often see the number of rows and columns expressed as the image resolution. For example, an Ultra HD TV has a resolution of 3840x2160, meaning it is 3840 pixels wide and 2160 pixels high.

But a computer does not understand pixels as dots of color. It only understands numbers. To convert colors to numbers, the computer uses various color models.

In color images, pixels are often represented in the RGB color model. RGB stands for Red Green Blue. Each pixel is a mix of those three colors. RGB is great at modeling all the colors humans perceive by combining various amounts of red, green, and blue.

Since a computer only understands numbers, every pixel is represented by three numbers, corresponding to the amounts of red, green, and blue present in that pixel.

In grayscale (black and white) images, each pixel is a single number, representing the amount of light, or intensity, it carries. In many applications, the range of intensities is from 0 (black) to 255 (white). Everything between 0 and 255 is various shades of gray.

If each grayscale pixel is a number, an image is nothing more than a matrix (or table) of numbers:

Example 3x3 image with pixel values and colors

In color images, there are three such matrices representing the red, green, and blue channels.
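You can see this representation directly in code. Here is a minimal sketch using NumPy (installed as a dependency of scikit-image); the pixel values are invented for illustration:

```python
import numpy as np

# A 3x3 grayscale "image": each entry is an intensity from 0 (black) to 255 (white)
grayscale = np.array([
    [  0, 128, 255],
    [ 64, 192,  32],
    [255,   0, 128],
], dtype=np.uint8)

print(grayscale.shape)  # (3, 3): three rows and three columns of pixels

# A color image carries three such matrices, one per channel (red, green, blue)
color = np.zeros((3, 3, 3), dtype=np.uint8)
print(color.shape)  # (3, 3, 3)
```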

What Are Features?

A feature is a piece of information in an image that is relevant to solving a certain problem. It could be something as simple as a single pixel value, or more complex like edges, corners, and shapes. You can combine multiple simple features into a complex feature.

Applying certain operations to an image produces information that could be considered features as well. Computer vision and image processing have a large collection of useful features and feature extracting operations.

Basically, any inherent or derived property of an image could be used as a feature to solve tasks.


Preparation

To run the code examples, you need to set up an environment with all the necessary libraries installed. The simplest way is to use conda.

You will need three libraries:

  1. scikit-image
  2. scikit-learn
  3. opencv

To create an environment in conda, run these commands in your shell:

$ conda create --name face-detection python=3.7
$ source activate face-detection
(face-detection)$ conda install scikit-learn
(face-detection)$ conda install -c conda-forge scikit-image
(face-detection)$ conda install -c menpo opencv3

If you are having problems installing OpenCV correctly and running the examples, try consulting their Installation Guide or the article on OpenCV Tutorials, Resources, and Guides.

Now you have all the packages necessary to practice what you learn in this tutorial.

Viola-Jones Object Detection Framework

This algorithm is named after two computer vision researchers who proposed the method in 2001: Paul Viola and Michael Jones.

They developed a general object detection framework that was able to provide competitive object detection rates in real time. It can be used to solve a variety of detection problems, but the main motivation comes from face detection.

The Viola-Jones algorithm has 4 main steps, and you’ll learn more about each of them in the sections that follow:

  1. Selecting Haar-like features
  2. Creating an integral image
  3. Running AdaBoost training
  4. Creating classifier cascades

Given an image, the algorithm looks at many smaller subregions and tries to find a face by looking for specific features in each subregion. It needs to check many different positions and scales because an image can contain many faces of various sizes. Viola and Jones used Haar-like features to detect faces.
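This scanning step can be pictured as a sliding window: every position of every window size is a candidate subregion. The sketch below only generates the candidate coordinates; the window sizes and step are arbitrary illustration values, not the ones Viola and Jones used:

```python
def subregions(image_width, image_height, min_size=24, step=8, scale=1.5):
    """Yield (x, y, size) for square windows at many positions and scales."""
    size = min_size
    while size <= min(image_width, image_height):
        for y in range(0, image_height - size + 1, step):
            for x in range(0, image_width - size + 1, step):
                yield (x, y, size)
        size = int(size * scale)  # grow the window to catch larger faces

# Even a tiny 96x96 image produces hundreds of candidate subregions
windows = list(subregions(96, 96))
print(len(windows))
```

The sheer number of windows is why the per-subregion work must be cheap, which is exactly what Haar-like features and integral images make possible.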

Haar-Like Features

All human faces share some similarities. If you look at a photograph showing a person’s face, you will see, for example, that the eye region is darker than the bridge of the nose. The cheeks are also brighter than the eye region. We can use these properties to help us understand if an image contains a human face.

A simple way to find out which region is lighter or darker is to sum up the pixel values of both regions and compare them. The sum of pixel values in the darker region will be smaller than the sum of pixel values in the lighter region. This can be accomplished using Haar-like features.

A Haar-like feature is represented by taking a rectangular part of an image and dividing that rectangle into multiple parts. They are often visualized as black and white adjacent rectangles:

Basic Haar-like rectangle features

In this image, you can see 4 basic types of Haar-like features:

  1. Horizontal feature with two rectangles
  2. Vertical feature with two rectangles
  3. Vertical feature with three rectangles
  4. Diagonal feature with four rectangles

The first two examples are useful for detecting edges. The third one detects lines, and the fourth one is good for finding diagonal features. But how do they work?

The value of the feature is calculated as a single number: the sum of pixel values in the black area minus the sum of pixel values in the white area. For uniform areas like a wall, this number would be close to zero and won’t give you any meaningful information.

To be useful, a Haar-like feature needs to give you a large number, meaning that the areas in the black and white rectangles are very different. There are known features that perform very well to detect human faces:

Haar-like feature applied on the eye region. (Image: Wikipedia)

In this example, the eye region is darker than the region below. You can use this property to find which areas of an image give a strong response (large number) for a specific feature:

Haar-like feature applied on the bridge of the nose. (Image: Wikipedia)

This example gives a strong response when applied to the bridge of the nose. You can combine many of these features to understand if an image region contains a human face.
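Computing such a feature value is just two sums and a subtraction. Here is a sketch of a horizontal two-rectangle feature on an invented 4x4 patch whose top half is dark (like the eye region) and bottom half is light:

```python
import numpy as np

# Invented patch: dark rows on top (eye region), light rows below (cheeks)
patch = np.array([
    [ 20,  30,  25,  20],
    [ 25,  20,  30,  25],
    [200, 210, 220, 205],
    [210, 205, 215, 220],
], dtype=np.int64)

def two_rectangle_feature(region):
    """Sum of the top (black) rectangle minus sum of the bottom (white) one."""
    half = region.shape[0] // 2
    return int(region[:half].sum() - region[half:].sum())

print(two_rectangle_feature(patch))                 # large magnitude: strong response
print(two_rectangle_feature(np.full((4, 4), 128)))  # 0: uniform area, no information
```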

As mentioned, the Viola-Jones algorithm calculates a lot of these features in many subregions of an image. This quickly becomes computationally expensive: it takes a lot of time using the limited resources of a computer.

To tackle this problem, Viola and Jones used integral images.

Integral Images

An integral image (also known as a summed-area table) is the name of both a data structure and an algorithm used to obtain this data structure. It is used as a quick and efficient way to calculate the sum of pixel values in an image or rectangular part of an image.

In an integral image, the value of each point is the sum of all pixels above and to the left, including the target pixel:

Calculating an integral image from pixel values

The integral image can be calculated in a single pass over the original image. This reduces summing the pixel intensities within a rectangle into only three operations with four numbers, regardless of rectangle size:

Calculate the sum of pixels in the orange rectangle.

The sum of pixels in the rectangle ABCD can be derived from the values of points A, B, C, and D, using the formula D - B - C + A. It is easier to understand this formula visually:

Calculating the sum of pixels step by step

You’ll notice that subtracting both B and C means that the area defined with A has been subtracted twice, so we need to add it back again.

Now you have a simple way to calculate the difference between the sums of pixel values of two rectangles. This is perfect for Haar-like features!
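Both ideas fit in a few lines of NumPy: two cumulative sums build the integral image in one pass, and the D - B - C + A lookup recovers any rectangle sum in constant time. The image and rectangle below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(100, 100))

# Integral image: entry (r, c) holds the sum of all pixels above and to the left
integral = image.cumsum(axis=0).cumsum(axis=1)

def rectangle_sum(ii, top, left, bottom, right):
    """Sum of image[top:bottom+1, left:right+1] via four lookups: D - B - C + A."""
    total = int(ii[bottom, right])            # D
    if top > 0:
        total -= int(ii[top - 1, right])      # B
    if left > 0:
        total -= int(ii[bottom, left - 1])    # C
    if top > 0 and left > 0:
        total += int(ii[top - 1, left - 1])   # A
    return total

# Constant-time result matches a direct (slow) sum over the same rectangle
print(rectangle_sum(integral, 10, 20, 40, 70) == int(image[10:41, 20:71].sum()))  # True
```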

But how do you decide which of these features and in what sizes to use for finding faces in images? This is solved by a machine learning algorithm called boosting. Specifically, you will learn about AdaBoost, short for Adaptive Boosting.


AdaBoost

Boosting is based on the following question: “Can a set of weak learners create a single strong learner?” A weak learner (or weak classifier) is defined as a classifier that is only slightly better than random guessing.

In face detection, this means that a weak learner can classify a subregion of an image as a face or not-face only slightly better than random guessing. A strong learner is substantially better at picking faces from non-faces.

The power of boosting comes from combining many (thousands) of weak classifiers into a single strong classifier. In the Viola-Jones algorithm, each Haar-like feature represents a weak learner. To decide the type and size of a feature that goes into the final classifier, AdaBoost checks the performance of all classifiers that you supply to it.

To calculate the performance of a classifier, you evaluate it on all subregions of all the images used for training. Some subregions will produce a strong response in the classifier. Those will be classified as positives, meaning the classifier thinks it contains a human face.

Subregions that don’t produce a strong response don’t contain a human face, in the classifier’s opinion. They will be classified as negatives.

The classifiers that performed well are given higher importance or weight. The final result is a strong classifier, also called a boosted classifier, that contains the best performing weak classifiers.

The algorithm is called adaptive because, as training progresses, it places more emphasis on the images that were incorrectly classified. The weak classifiers that perform better on these hard examples are weighted more strongly than others.

Let’s look at an example:

The blue and orange circles are samples that belong to different categories.

Imagine that you are supposed to classify blue and orange circles in the following image using a set of weak classifiers:

The first weak classifier classifies some of the blue circles correctly.

The first classifier you use captures some of the blue circles but misses the others. In the next iteration, you give more importance to the missed examples:

The missed blue samples are given more importance, indicated by size.

The second classifier that manages to correctly classify those examples will get a higher weight. Remember, if a weak classifier performs better, it will get a higher weight and thus a higher chance to be included in the final, strong classifier:

The second classifier captures the bigger blue circles.

Now you have managed to capture all of the blue circles, but incorrectly captured some of the orange circles. These incorrectly classified orange circles are given more importance in the next iteration:

The misclassified orange circles are given more importance, and others are reduced.

The final classifier manages to capture those orange circles correctly:

The third classifier captures the remaining orange circles.

To create a strong classifier, you combine all three classifiers to correctly classify all examples:

The final, strong classifier combines all three weak classifiers.
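You can reproduce this behavior with scikit-learn, which you installed earlier. Its AdaBoostClassifier combines many one-level decision trees (decision “stumps”, its default weak learner) into a single boosted classifier; the synthetic dataset here stands in for the blue and orange circles:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

# A synthetic two-class dataset standing in for the blue and orange circles
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Up to 50 weak learners (depth-1 decision stumps by default), reweighted adaptively
boosted = AdaBoostClassifier(n_estimators=50, random_state=0)
boosted.fit(X, y)

# The boosted ensemble fits the training data far better than any single stump
print(boosted.score(X, y))
```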

Using a variation of this process, Viola and Jones evaluated hundreds of thousands of classifiers that specialize in finding faces in images. But it would be computationally expensive to run all these classifiers on every region in every image, so they created something called a classifier cascade.

Cascading Classifiers

The definition of a cascade is a series of waterfalls coming one after another. A similar concept is used in computer science to solve a complex problem with simple units. The problem here is reducing the number of computations for each image.

To solve it, Viola and Jones turned their strong classifier (consisting of thousands of weak classifiers) into a cascade where each weak classifier represents one stage. The job of the cascade is to quickly discard non-faces and avoid wasting precious time and computations.

When an image subregion enters the cascade, it is evaluated by the first stage. If that stage evaluates the subregion as positive, meaning that it thinks it’s a face, the output of the stage is maybe.

If a subregion gets a maybe, it is sent to the next stage of the cascade. If that one gives a positive evaluation, then that’s another maybe, and the image is sent to the third stage:

A weak classifier in a cascade

This process is repeated until the image passes through all stages of the cascade. If all classifiers approve the image, it is finally classified as a human face and is presented to the user as a detection.

If, however, the first stage gives a negative evaluation, then the image is immediately discarded as not containing a human face. If it passes the first stage but fails the second stage, it is discarded as well. Basically, the image can get discarded at any stage of the classifier:

A cascade of n classifiers for face detection

This is designed so that non-faces get discarded very quickly, which saves a lot of time and computational resources. Since every classifier represents a feature of a human face, a positive detection basically says, “Yes, this subregion contains all the features of a human face.” But as soon as one feature is missing, it rejects the whole subregion.

To accomplish this effectively, it is important to put your best performing classifiers early in the cascade. In the Viola-Jones algorithm, the eyes and nose bridge classifiers are examples of best performing weak classifiers.
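The early-rejection logic is easy to express in plain Python. This is only a sketch: the stage functions below are made-up stand-ins for real boosted stages, checking invented keys on a dictionary instead of evaluating Haar-like features:

```python
def run_cascade(stages, subregion):
    """Return True only if every stage says "maybe"; reject on the first failure."""
    for stage in stages:
        if not stage(subregion):
            return False  # discarded immediately; later stages never run
    return True  # survived every stage: classified as a face

# Hypothetical stages, best performers first so non-faces fail early
stages = [
    lambda r: r.get("eyes_darker_than_cheeks", False),
    lambda r: r.get("bright_nose_bridge", False),
    lambda r: r.get("mouth_edge_present", False),
]

face_like = {"eyes_darker_than_cheeks": True, "bright_nose_bridge": True,
             "mouth_edge_present": True}
print(run_cascade(stages, face_like))  # True
print(run_cascade(stages, {}))         # False: rejected by the very first stage
```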

Now that you understand how the algorithm works, it is time to use it to detect faces with Python.

Using a Viola-Jones Classifier

Training a Viola-Jones classifier from scratch can take a long time. Fortunately, a pre-trained Viola-Jones classifier comes out-of-the-box with OpenCV! You will use that one to see the algorithm in action.

First, find and download an image that you would like to scan for the presence of human faces. Here’s an example:

Example stock photo for face detection (Image source)

Import OpenCV and load the image into memory:

import cv2 as cv

# Read image from your local file system
original_image = cv.imread('path/to/your-image.jpg')

# Convert color image to grayscale for Viola-Jones
grayscale_image = cv.cvtColor(original_image, cv.COLOR_BGR2GRAY)

Next, you need to load the Viola-Jones classifier. If you installed OpenCV from source, it will be in the folder where you installed the OpenCV library.

Depending on the version, the exact path might vary, but the folder name will be haarcascades, and it will contain multiple files. The one you need is called haarcascade_frontalface_alt.xml.

If for some reason, your installation of OpenCV did not get the pre-trained classifier, you can obtain it from the OpenCV GitHub repo:

# Load the classifier and create a cascade object for face detection
face_cascade = cv.CascadeClassifier('path/to/haarcascade_frontalface_alt.xml')

The face_cascade object has a method detectMultiScale(), which receives an image as an argument and runs the classifier cascade over the image. The term MultiScale indicates that the algorithm looks at subregions of the image in multiple scales, to detect faces of varying sizes:

detected_faces = face_cascade.detectMultiScale(grayscale_image)

The variable detected_faces now contains all the detections for the target image. To visualize the detections, you need to iterate over all detections and draw rectangles over the detected faces.

OpenCV’s rectangle() draws rectangles over images, and it needs to know the pixel coordinates of the top-left and bottom-right corner. The coordinates indicate the row and column of pixels in the image.

Luckily, detections are saved as pixel coordinates. Each detection is defined by its top-left corner coordinates and width and height of the rectangle that encompasses the detected face.

Adding the width to the column and the height to the row will give you the bottom-right corner of the rectangle:

for (column, row, width, height) in detected_faces:
    cv.rectangle(
        original_image,
        (column, row),
        (column + width, row + height),
        (0, 255, 0),
        2
    )

rectangle() accepts the following arguments:

  • The original image
  • The coordinates of the top-left point of the detection
  • The coordinates of the bottom-right point of the detection
  • The color of the rectangle (a tuple that defines the amount of red, green, and blue (0-255))
  • The thickness of the rectangle lines

Finally, you need to display the image:

cv.imshow('Image', original_image)
cv.waitKey(0)
cv.destroyAllWindows()

imshow() displays the image. waitKey() waits for a keystroke. Otherwise, imshow() would display the image and immediately close the window. Passing 0 as the argument tells it to wait indefinitely. Finally, destroyAllWindows() closes the window when you press a key.

Here’s the result:

Original image with detections

That’s it! You now have a ready-to-use face detector in Python.

If you really want to train the classifier yourself, scikit-image offers a tutorial with the accompanying code on their website.


Conclusion

Good work! You are now able to find faces in images.

In this tutorial, you have learned how to represent regions in an image with Haar-like features. These features can be calculated very quickly using integral images.

You have learned how AdaBoost finds the best performing Haar-like features from thousands of available features and turns them into a series of weak classifiers.

Finally, you have learned how to create a cascade of weak classifiers that can quickly and reliably distinguish faces from non-faces.

These steps illustrate many important elements of computer vision:

  • Finding useful features
  • Combining them to solve complex problems
  • Balancing between performance and managing computational resources

These ideas apply to object detection in general and will help you solve many real-world challenges. Good luck!

Data Science Vs Machine Learning Vs Artificial Intelligence

What is Data Science?

Data Science is an interdisciplinary field whose primary objective is the extraction of meaningful knowledge and insights from data. These insights are extracted with the help of various mathematical and Machine Learning-based algorithms. Hence, Machine Learning is a key element of Data Science.

Alongside Machine Learning, as the name suggests, “data” itself is the fuel for Data Science. Without the availability of appropriate data, key insights cannot be extracted from it. Both the volume and accuracy of data matters in this field, since the algorithms are designed to “learn” with “experience”, which comes through the data provided. Data Science involves the use of various types of data, from multiple sources. Some of the types of data are image data, text data, video data, time-dependent data, time-independent data, audio data, etc.

Data Science requires knowledge of multiple disciplines. As shown in the figure, it is a combination of Mathematics and Statistics, Computer Science skills and Domain Specific Knowledge. Without a mastery of all these sub-domains, the grasp on Data Science will be incomplete.

What is Machine Learning?

Machine Learning is a subset or a part of Artificial Intelligence. It primarily involves the scientific study of algorithmic, mathematical, and statistical models that perform a specific task by analyzing data, relying on patterns and inference drawn from the data rather than explicit step-by-step instructions. This also contributes to its alias, Pattern Recognition.

Its objective is to recognize patterns in a given data and draw inferences, which allows it to perform a similar task on similar but unseen data. These two separate sets of data are known as the “Training Set” and “Testing Set” respectively.

Machine Learning primarily finds its applications in solving complex problems that a normal procedure-oriented program cannot solve, or in places where explicitly programming every variable is not feasible.

As shown in the figure, Machine Learning is primarily of three types, namely: Supervised Learning, Unsupervised Learning and Reinforcement Learning.

  • Supervised Learning: This is the most commonly used form of machine learning and is widely used across the industry. In fact, most of the problems that are solved by Machine Learning belong to Supervised Learning. A learning problem is known as supervised learning when the data is in the form of feature-label pairs. In other words, the algorithm is trained on data where the ground truth is known. This is learning with a teacher. Two common types of supervised learning are:
    - Classification: A process where the dataset is categorized into discrete values or categories. For example, if the input to the algorithm is an image of a dog or a cat, ideally, a well-trained algorithm should be able to predict whether the input image is that of a dog or of a cat.
    - Regression: A process where the dataset has continuous-valued targets. That is, the output of the function is not a category but a continuous value. For example, algorithms that forecast the future price of the stock market would output a continuous value (like 34.84) for a given set of inputs.
  • Unsupervised Learning: This is a much lesser used, but quite important, learning technique. It is primarily used when the data is unlabeled, that is, without target values. In such learning, the algorithm has to analyze the data itself and bring out insights based on common traits or features in the dataset. This is learning without a teacher. Two common types of unsupervised learning are:
    - Clustering: A well-known unsupervised learning technique where similar data points are automatically grouped together by the algorithm based on common features or traits (e.g. color, values, similarity, difference).
    - Dimensionality Reduction: The datasets used for machine learning are often huge and of high dimension (more than three dimensions). One major problem with high-dimensional data is visualization: since we can visualize and understand only up to three dimensions, higher-dimensional data is difficult for human beings to interpret. In addition, more dimensions mean more features, which in turn mean a more complex model, which is often a curse for any machine learning model. The aim is to keep the simplest model that works best on a wide range of unseen data, so dimensionality reduction is an important part of working with high-dimensional data. One of the most common methods of dimensionality reduction is Principal Component Analysis (PCA).
  • Reinforcement Learning: This is a completely different approach to “learning” compared to the previous two categories. This class of learning algorithms primarily finds its applications in game AI, robotics, and automatic trading bots. Here, the machine is not provided with a huge amount of data. Instead, in a given scenario (playground), some parameters and constraints are defined and the algorithm is let loose. The only feedback given to the algorithm is a reward if it wins or performs a correct task, and a penalty if it loses or performs an incorrect task. Based on this minimal feedback, the algorithm learns over time how to do the correct task on its own.

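As a concrete illustration of supervised learning, here is a minimal scikit-learn sketch: the model learns from a labeled training set and is then scored on a held-out testing set, matching the “Training Set”/“Testing Set” split described above. The dataset and classifier are arbitrary choices for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Labeled data: feature-label pairs (flower measurements -> species)
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)             # learn from the training set

accuracy = model.score(X_test, y_test)  # evaluate on unseen data
print(accuracy)
```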
What is Artificial Intelligence?

Artificial Intelligence is a vast field made up of multidisciplinary subjects, which aims to artificially create “intelligence” to machines, similar to that displayed by humans and animals. The term is used to describe machines that mimic cognitive functions such as learning and problem-solving.

Artificial Intelligence can be broadly classified into three parts: Analytical AI, Human-Inspired AI, and Humanized AI.

  1. Analytical AI: It only has characteristics which are consistent with Cognitive Intelligence. It generates a cognitive representation of the world around it based on past experiences, which inspires future decisions.
  2. Human-Inspired AI: In addition to Cognitive Intelligence, this class of AI also has Emotional Intelligence. It has a deeper understanding of human emotions and thus a better understanding of the world around it. Both Cognitive Intelligence and Emotional Intelligence contribute to the decision making of Human-Inspired AI.
  3. Humanized AI: This is the most superior form of AI among the three. This form of AI incorporates Cognitive Intelligence, Emotional Intelligence, and Social Intelligence into its decision making. With a broader understanding of the world around it, this form of AI is able to make self-conscious and self-aware decisions and interactions with the external world.
How are they interrelated?

From the above introductions, it may seem that these fields are not related to each other. However, that is not the case: the three fields are more closely related to each other than it may seem.

Viewed as a Venn diagram, Artificial Intelligence, Machine Learning, and Data Science are overlapping sets, with Machine Learning being a subset of Artificial Intelligence, and Data Science having a significant chunk of itself under Artificial Intelligence and Machine Learning.

Artificial Intelligence is a much broader field, and it incorporates most of the other intelligence-related fields of study. Machine Learning, being a part of AI, deals with algorithmic learning and inference based on data. Data Science, finally, is primarily based on statistics and probability theory, with a significant contribution from Machine Learning; and since Machine Learning is a subset of Artificial Intelligence, AI is a part of Data Science as well.

Similarities: All three fields have one thing in common: Machine Learning. Each of them depends heavily on Machine Learning algorithms.

In Data Science, the purely statistical algorithms are limited to certain applications. In most cases, Data Scientists rely on Machine Learning techniques to extract inferences from data.
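As a toy illustration of the statistical side of this work, ordinary least squares can infer a linear trend from data with nothing but arithmetic and no "learning" loop. The data points below are invented for the example:

```python
# Toy example: inferring a linear trend with ordinary least squares,
# a purely statistical method. The data points are made up.

def least_squares(xs, ys):
    """Return (slope, intercept) minimizing the squared error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x, give the best-fit slope.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    return slope, mean_y - slope * mean_x

xs = [0, 1, 2, 3, 4]
ys = [1.0, 3.1, 4.9, 7.2, 8.8]   # roughly y = 2x + 1 with noise
slope, intercept = least_squares(xs, ys)
print(slope, intercept)          # close to 2 and 1
```

Methods like ARIMA build on exactly this kind of closed-form statistical estimation rather than on iterative training.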

The current technological advancement in Artificial Intelligence is heavily based on Machine Learning; the part of AI without Machine Learning is like a car without an engine. Stripped of the "learning" part, Artificial Intelligence reduces to Expert Systems and Search and Optimization algorithms.
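To make that concrete, here is a minimal sketch of such non-learning AI: breadth-first search over a hand-written graph, the kind of algorithm classic planners and expert systems are built on. The graph itself is an invented toy example:

```python
# Breadth-first search: classic "AI without learning".
# The graph is a toy adjacency list invented for illustration.
from collections import deque

def bfs_path(graph, start, goal):
    """Return a shortest path from start to goal, or None if unreachable."""
    queue = deque([[start]])   # queue of partial paths
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbour in graph.get(node, []):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None

graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"]}
print(bfs_path(graph, "A", "E"))  # -> ['A', 'B', 'D', 'E']
```

No parameters are fitted to data here; the behaviour is fully determined by the algorithm and the graph, which is precisely what distinguishes this style of AI from Machine Learning.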

## Differences Between the Three

Even though the three fields are significantly similar, there are a few key differences worth noting.


Since all three domains are interrelated, they share some applications, while others are unique to each. Most applications involve Machine Learning in some form or other; even so, each domain has certain applications of its own. A few of them are listed below:

  • Data Science: Applications in this domain depend on machine learning and mathematical algorithms, such as statistics- and probability-based methods.
    ◦ Time Series Forecasting: A very important application of Data Science, used across industry, primarily in the banking and stock market sectors. Machine Learning based algorithms exist for this task, but Data Scientists usually prefer the statistical approach.
    ◦ Recommendation Engines: A statistics-based approach to recommending products or services to a user based on data about his or her previous interests. As with forecasting, Machine Learning based algorithms that achieve similar or better results also exist.
  • Machine Learning: The applications of this domain are nearly limitless. Every industry has problems that can be partially or fully solved by Machine Learning techniques, and even Data Science and Artificial Intelligence roles make use of Machine Learning to solve a huge set of problems.
    ◦ Computer Vision: A sub-field of Machine Learning that deals with visual information. It finds applications in many industries, for example autonomous driving vehicles, medical imaging, and autonomous surveillance systems.
    ◦ Natural Language Processing: Like Computer Vision, this is a self-contained sub-field of research. Natural Language Processing (NLP), or Natural Language Understanding (NLU), deals with interpreting and understanding the meaning behind spoken or written language. Understanding the exact meaning of a sentence is difficult even for human beings; teaching a machine to do it is more challenging still. Major applications include intelligent chatbots, artificial voice assistants (Google Assistant, Siri, Alexa, etc.), spam detection, and hate speech detection.
  • Artificial Intelligence: Most current advancements and applications in this domain are based on a sub-field of Machine Learning known as Deep Learning, which artificially emulates the structure and function of the biological neuron. Since Deep Learning applications have already been covered under Machine Learning, the following are applications of Artificial Intelligence that do not primarily depend on Machine Learning.
    ◦ Game AI: An interesting application in which a machine automatically learns to play complex games well enough to challenge, and even beat, a human being. Google's DeepMind developed a Game AI called AlphaGo, which outperformed and beat the human world champion in 2017. Similarly, video game AIs have been developed to play Dota 2, Flappy Bird, and Mario. These models are built using algorithms such as Search and Optimization, Generative Models, and Reinforcement Learning.
    ◦ Search: Artificial Intelligence has found several applications in search engines, for example Google and Bing. The method of selecting results and the order in which they are displayed are based on algorithms developed in the field of Artificial Intelligence. These applications do contain Machine Learning techniques, but their older versions were built on algorithms like Google's proprietary PageRank, which were not based on "learning".
    ◦ Robotics: One of the major applications of Artificial Intelligence. Teaching robots to walk or run automatically (for example, Spot and Atlas) using Reinforcement Learning has been one of the biggest goals of companies like Boston Dynamics. In addition, humanoid robots like Sophia are a perfect example of AI being applied toward Humanized AI.

## Skill-set Required

Since the fields overlap to a significant degree, the skill-sets required to master them are largely the same. However, a few skills are uniquely associated with each field. Both are discussed below.

  • Mathematics: Each of these fields is math-heavy; mathematics provides their basic building blocks, and a strong math background is necessary to fully understand and master the algorithms. Not every branch of mathematics is required, though. The relevant ones are:
    ◦ Linear Algebra: All of these fields are based on data, which comes in huge volumes of rows and columns, and matrices are the easiest and most convenient way to represent and manipulate such data. A thorough knowledge of linear algebra and matrix operations is therefore necessary for all three fields.
    ◦ Calculus: Deep Learning, the sub-field of Machine Learning, is heavily dependent on calculus, more precisely on multivariate derivatives. In neural networks, the backpropagation algorithm requires many derivative calculations, which demands a thorough knowledge of calculus.
    ◦ Statistics: Since these fields deal with huge amounts of data, knowledge of statistics is imperative. Statistical methods for selecting and testing smaller, diverse samples are common to all three fields. Statistics finds its main application in Data Science, however, where many algorithms are purely statistical (e.g. the ARIMA algorithm used for time series analysis).
    ◦ Probability: For similar reasons, probability, and in particular the conditional probability of an event, is the basic building block of important Machine Learning algorithms like the Naive Bayes classifier. Probability theory is also very important for understanding Data Science algorithms.
  • Computer Science: There is no doubt that all of these fields belong to computer science, so a thorough knowledge of computer science algorithms is quite necessary.
    ◦ Search and Optimization Algorithms: Fundamental search algorithms like Breadth-First Search (BFS), Depth-First Search (DFS), and Bidirectional Search, along with route optimization algorithms, are quite important. These search and optimization algorithms find their use chiefly in the Artificial Intelligence field.
    ◦ Fuzzy Logic: Fuzzy Logic (FL) is a method of reasoning that resembles human reasoning: it imitates the way human beings make decisions, for example reaching a YES or NO decision based on a set of events or environmental conditions. Fuzzy Logic is primarily used in artificially intelligent systems.
    ◦ Basic Algorithms and Optimization: Not strictly a necessity, but good to have, since fundamental knowledge of algorithms (searching, sorting, recursion, etc.) and of optimization (space and time complexity) is useful in any computer science related field.
  • Programming Knowledge: The algorithms in these fields are implemented through programming, so thorough programming knowledge is a necessity. The most commonly used languages are discussed below.
    ◦ Python: The most commonly used programming language for all of these fields. It is used across industry and supports a plethora of open-source libraries for Machine Learning, Deep Learning, Artificial Intelligence, and Data Science. However, programming is not just about writing code, it is about writing proper Pythonic code; this is covered in detail in the article A Guide to Best Python Practices.
    ◦ R: The second most used programming language for such applications across industry. R excels over Python in statistical libraries and data visualization but lags significantly in Deep Learning libraries. R is therefore a preferred tool for Data Scientists.

## Job Market

Demand in the job market for each of these fields is very high. As Andrew Ng puts it, "AI is the new electricity." This rings true: the extended field of Artificial Intelligence is on the verge of revolutionizing every industry in ways that could not be anticipated earlier.

Hence, demand for jobs in the fields of Data Science and Machine Learning is quite high. There are more job openings worldwide than there are qualified engineers to fill them, and due to this supply-demand gap, the compensation companies offer for such roles exceeds that of almost any other domain.

The job scenario for each of the domains is discussed below:

  1. Data Science: The number of job postings with a Data Science profile is the highest among the three domains, and Data Scientists are handsomely paid for their work. Because the lines between the fields are blurred, the job description of a Data Scientist ranges from Time Series Forecasting to Computer Vision, covering essentially the entire domain. For further insight into the job aspect of Data Science, refer to the article What is Data Science.
  2. Machine Learning: Although the number of job postings with the profile "Machine Learning Engineer" is much smaller than for Data Scientists, it is still a significant field in terms of job availability. Moreover, someone skilled in Machine Learning is a good candidate for a Data Science role. Unlike Data Science, however, Machine Learning job descriptions primarily concern "learning" algorithms (including Deep Learning), and the industries involved range from Natural Language Processing to building Recommendation Engines.
  3. Artificial Intelligence: Job postings with the profile "Artificial Intelligence Developer" are quite rare; instead of "Artificial Intelligence", most companies write "Data Scientist" or "Machine/Deep Learning Engineer" in the job profile. Artificial Intelligence developers, in addition to finding jobs in the Machine Learning domain, mostly find work at robotics and AI R&D oriented companies like Boston Dynamics, DeepMind, and OpenAI.


## Conclusion

Data Science, Machine Learning, and Artificial Intelligence are like branches of the same tree. They overlap heavily, with no clear boundary between them, and they share skill-set requirements and applications. In many ways, they are different names for slightly different facets of AI.

Finally, it is worth mentioning that because the required skill-sets overlap so heavily, a well-skilled engineer is eligible to work in any of the three domains and can switch between them without any major retraining.