**Takeaways from this article**

  • In this post, we explain classification, regression and classification predictive modelling, along with the different types of classification and regression.
  • We look at why classification is important and how it is used.
  • We walk through a few classification algorithms and their implementations in Python.
  • We cover logistic regression, decision trees, random forests, support vector machines, k-nearest neighbours and neural networks.
  • We examine how each of these works and why it is so widely used.

Introduction

Classification is the process of assigning each example in a dataset to one of several classes or groups. It is a predictive modelling problem in which every class is given a label that distinguishes it from the other classes. Examples include classifying an email as spam or not spam, and recognising a handwritten character as one specific character rather than any other.
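In code, class labels are usually stored as integers. The sketch below assumes scikit-learn is installed and uses its `LabelEncoder` to turn text class names into integer labels; the list of email classes is made up purely for illustration.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical class names for five emails
email_classes = ["spam", "not spam", "spam", "spam", "not spam"]

encoder = LabelEncoder()
# Each distinct class gets its own integer label, assigned in sorted order
labels = encoder.fit_transform(email_classes)

print(encoder.classes_)  # ['not spam' 'spam']
print(labels)            # [1 0 1 1 0]

# The mapping can be reversed to recover the original class names
print(encoder.inverse_transform([0, 1]))
```

Sorting the class names before numbering them means the same set of classes always gets the same integers, regardless of the order in which they appear in the data.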

Classification algorithms are trained on many example inputs paired with their correct outputs, and the model learns the mapping from these pairs. The training data should therefore cover every kind of input, and every class, that the model may encounter in the test set or in the real world.
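To illustrate why coverage matters, here is a minimal sketch, assuming scikit-learn, that deliberately leaves one class out of the training data. The dataset is synthetic (`make_blobs`), and the choice of `LogisticRegression` and of which class to hide are arbitrary assumptions made for the demonstration.

```python
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Three groups of points, labelled 0, 1 and 2
X, y = make_blobs(n_samples=300, centers=3, random_state=1)

# Hold out a stratified test set so every class appears in it
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1, stratify=y
)

# Simulate incomplete training data: drop every class-2 example
mask = y_train != 2
clf = LogisticRegression()
clf.fit(X_train[mask], y_train[mask])

# The model only knows the classes it saw during training
print("Known classes:", clf.classes_)

# Every class-2 test point is therefore forced into class 0 or 1
preds = clf.predict(X_test[y_test == 2])
print("Predictions for class-2 test points:", sorted(set(preds.tolist())))
```

Because class 2 never appears during training, the model has no way to output it and misclassifies every such test point.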

Classification

The four most common types of classification are:

  • Binary classification
  • Multi-class classification
  • Multi-label classification
  • Imbalanced classification
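One way to see how these types differ is to look at the target labels each one produces. This sketch assumes scikit-learn and generates a synthetic target for each type; `type_of_target` is scikit-learn's helper for inspecting label formats, and the dataset sizes and the 99:1 class weighting are arbitrary choices.

```python
from collections import Counter

from sklearn.datasets import make_blobs, make_classification, make_multilabel_classification
from sklearn.utils.multiclass import type_of_target

# Binary: exactly two class labels
_, y_binary = make_blobs(n_samples=100, centers=2, random_state=1)
# Multi-class: more than two mutually exclusive labels
_, y_multiclass = make_blobs(n_samples=100, centers=3, random_state=1)
# Multi-label: each row may carry several labels at once (a 0/1 indicator matrix)
_, y_multilabel = make_multilabel_classification(n_samples=100, n_classes=3, random_state=1)
# Imbalanced: two classes, one far rarer than the other
_, y_imbalanced = make_classification(
    n_samples=1000, weights=[0.99], flip_y=0, random_state=1
)

print(type_of_target(y_binary))      # binary
print(type_of_target(y_multiclass))  # multiclass
print(type_of_target(y_multilabel))  # multilabel-indicator
print(y_multilabel.shape)            # (100, 3)
print(type_of_target(y_imbalanced))  # binary
print(Counter(y_imbalanced.tolist()))
```

Imbalanced classification is not a separate label format: its target is still binary (or multi-class), so it is the class counts, not the label type, that reveal the imbalance.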

**Binary classification**

As the name suggests, binary classification deals with tasks that have only two class labels. Examples include classifying an email as spam or not spam, and predicting whether a stock's price will go up or down (ignoring the possibility that it stays unchanged). The output of the classifier is one of two values, such as 0 or 1, yes or no, or normal or abnormal.

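Binary classification is typically modelled with the Bernoulli probability distribution: the model predicts the probability that an example belongs to class 1, and that probability is then thresholded to classify the example as 0 or 1. The Bernoulli distribution is a discrete distribution with exactly two possible outcomes, 0 and 1.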
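A minimal sketch of that idea, assuming scikit-learn: logistic regression (one of the algorithms listed below) outputs, for each example, the probability p that it belongs to class 1, which is the parameter of a Bernoulli distribution over the label. Thresholding p at 0.5 is the assumed decision rule here; for a binary `LogisticRegression` it matches what `predict` returns.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression

# Two classes of points, labelled 0 and 1
X, y = make_blobs(n_samples=560, centers=2, random_state=1)

clf = LogisticRegression().fit(X, y)

# predict_proba returns one column per class; column 1 is P(y = 1 | x),
# i.e. the Bernoulli parameter p for each example
p = clf.predict_proba(X)[:, 1]

# Thresholding p at 0.5 turns the probability into a hard 0/1 label
labels = (p > 0.5).astype(int)

print(p[:5].round(3))
print(labels[:5], y[:5])
print("Agrees with clf.predict:", np.array_equal(labels, clf.predict(X)))
```

Keeping the probability rather than only the hard label is useful when the threshold should move away from 0.5, for example when one kind of mistake is more costly than the other.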

Algorithms that are used to perform binary classification include the following:

  • Logistic regression
  • Decision trees
  • Support vector machine
  • Naïve Bayes
  • k-nearest neighbours (k-NN)
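As a rough comparison, the sketch below fits each of the listed algorithms to the same kind of synthetic two-class dataset used in the code that follows, and reports 5-fold cross-validated accuracy. It assumes scikit-learn; the specific estimators (for example `GaussianNB` for Naïve Bayes and `SVC` with default settings) and their hyperparameters are illustrative choices, not tuned ones.

```python
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic two-class dataset
X, y = make_blobs(n_samples=560, centers=2, random_state=1)

models = {
    "Logistic regression": LogisticRegression(),
    "Decision tree": DecisionTreeClassifier(random_state=1),
    "Support vector machine": SVC(),
    "Naive Bayes": GaussianNB(),
    "k-nearest neighbours": KNeighborsClassifier(n_neighbors=5),
}

scores = {}
for name, model in models.items():
    # Mean accuracy over 5 cross-validation folds
    scores[name] = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: {scores[name]:.3f}")
```

On well-separated data like this, all five should score close to 1.0, so the comparison mainly shows that the algorithms share the same fit/predict interface, not which one is best in general.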

Code to generate and visualise a synthetic binary classification dataset:

```python
from collections import Counter

from matplotlib import pyplot
from numpy import where
from sklearn.datasets import make_blobs

# Generate 560 two-dimensional points grouped around 2 centres (classes 0 and 1)
X, y = make_blobs(n_samples=560, centers=2, random_state=1)
print("Data has been generated ")
print("The number of rows and columns are ")
print(X.shape, y.shape)

# Count how many examples belong to each class
my_counter = Counter(y)
print(my_counter)

# Show the first 10 examples with their labels
for i in range(10):
    print(X[i], y[i])

# Scatter-plot each class in its own colour
for my_label, _ in my_counter.items():
    row_ix = where(y == my_label)[0]
    pyplot.scatter(X[row_ix, 0], X[row_ix, 1], label=str(my_label))
pyplot.legend()
pyplot.show()
```

**Output:**

```text
Data has been generated
The number of rows and columns are
(560, 2) (560,)
Counter({1: 280, 0: 280})
[-9.64384208 -4.14030356] 1
[-0.8821407  4.2877187] 0
…
```

