**Takeaways from this article**
Classification is the task of assigning each item in a data set to one of a set of predefined classes or groups. It is a predictive modelling problem in which every class is given a label that distinguishes it from the other classes. Examples include classifying an email as spam or not spam, and recognizing a handwritten character as one specific character rather than another.
Classification algorithms are trained on many example inputs paired with their correct outputs, and the model learns from these pairs. The training data must cover the kinds of examples that could be encountered in the test data set or in the real world.
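To make the idea of learning from labelled examples concrete, here is a minimal sketch of a 1-nearest-neighbour classifier in plain Python. The toy data, the `"spam"`/`"ham"` labels, the two made-up features, and the helper name `predict_nearest` are all assumptions chosen for illustration; the article does not prescribe this algorithm.

```python
import math

# Hypothetical training set: (features, label) pairs.
# The two features are made up: (number of links, number of exclamation marks).
training_data = [
    ((8.0, 6.0), "spam"),
    ((7.0, 9.0), "spam"),
    ((1.0, 0.0), "ham"),
    ((0.0, 2.0), "ham"),
]


def predict_nearest(features, examples):
    """Return the label of the training example closest to `features`."""
    _, closest_label = min(
        examples, key=lambda pair: math.dist(pair[0], features)
    )
    return closest_label


print(predict_nearest((6.0, 7.0), training_data))
print(predict_nearest((1.0, 1.0), training_data))
```

A query that lies far from every training example still receives some label, which is one reason the training data needs to cover the cases expected in practice.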
The four prominent types of classification are the following:
Binary classification, as the name suggests, covers classification tasks that have exactly two class labels. Examples include classifying an email as spam or not spam, and predicting whether the price of a stock will go up or down (ignoring the possibility that it stays the same). The predicted label is one of two values, such as 0 or 1, yes or no, or normal or abnormal.
A binary classifier typically models the class label with a Bernoulli distribution, a discrete distribution with exactly two outcomes, 0 and 1. The model predicts the probability that an example belongs to class 1, and that probability is then used to assign the label 0 or 1.
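The link between a Bernoulli distribution and a 0/1 prediction can be sketched in a few lines. This is an illustrative sketch only: the function names `bernoulli_pmf` and `predict_label`, the 0.5 threshold, and the example probability are assumptions, not something taken from the article.

```python
def bernoulli_pmf(y, p):
    """Probability of outcome y (0 or 1) when P(y = 1) = p."""
    if y not in (0, 1):
        raise ValueError("y must be 0 or 1")
    return p if y == 1 else 1.0 - p


def predict_label(p, threshold=0.5):
    """Turn a predicted probability of class 1 into a hard 0/1 label."""
    return 1 if p >= threshold else 0


# Suppose a model estimates P(spam) = 0.8 for an email (made-up number).
p_spam = 0.8
print(bernoulli_pmf(1, p_spam))  # probability that the email is spam
print(bernoulli_pmf(0, p_spam))  # probability that it is not spam
print(predict_label(p_spam))     # hard class label after thresholding
```

The threshold is a design choice: 0.5 treats both kinds of error as equally costly, and a different value may suit a task where one mistake is worse than the other.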
Algorithms that are used to perform binary classification include the following:
Code to demonstrate a binary classification task:
```python
from numpy import where
from collections import Counter
from sklearn.datasets import make_blobs
from matplotlib import pyplot

# Generate a synthetic dataset with two clusters, one per class
X, y = make_blobs(n_samples=560, centers=2, random_state=1)
print("Data has been generated ")
print("The number of rows and columns are ")
print(X.shape, y.shape)

# Count how many examples belong to each class
my_counter = Counter(y)
print(my_counter)

# Print the first 10 examples together with their labels
for i in range(10):
    print(X[i], y[i])

# Plot the examples of each class as a separate scatter series
for my_label, _ in my_counter.items():
    row_ix = where(y == my_label)[0]
    pyplot.scatter(X[row_ix, 0], X[row_ix, 1], label=str(my_label))
pyplot.legend()
pyplot.show()
```
**Output:**
```
Data has been generated 
The number of rows and columns are 
(560, 2) (560,)
Counter({1: 280, 0: 280})
[-9.64384208 -4.14030356] 1
[-0.8821407 4.2877187] 0
…
```
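The example above only generates and plots the data. As a rough sketch of how a binary classifier might then be trained on it, the snippet below fits scikit-learn's `LogisticRegression` to the same synthetic dataset. The choice of logistic regression, the 75/25 split, and the `random_state` used for the split are assumptions made for illustration, not steps taken from the article.

```python
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Same synthetic dataset as in the example above.
X, y = make_blobs(n_samples=560, centers=2, random_state=1)

# Hold out a quarter of the rows to check the model on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1
)

model = LogisticRegression()
model.fit(X_train, y_train)

# Mean accuracy on the held-out rows.
accuracy = model.score(X_test, y_test)
print("Held-out accuracy:", accuracy)

# predict_proba returns one row per sample: [P(class 0), P(class 1)].
print(model.predict_proba(X_test[:3]))
```

Each row of `predict_proba` sums to 1, and `model.predict` returns the class with the higher of the two probabilities.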