What is classification?

Classification is one of the two types of supervised machine learning tasks (i.e., tasks where we have a labeled dataset), the other being regression.

Key point to remember: supervised learning tasks use features to predict targets or, in non-tech speak, they use attributes or characteristics to predict something. For instance, we can take a basketball player’s height, weight, age, foot speed, and other attributes to predict how many points they’ll score or whether they’ll be an all-star.

So what’s the difference between the two?

  • Regression tasks predict a continuous value (e.g., how many points someone will score)
  • Classification tasks predict a discrete category (e.g., whether someone will be an all-star)
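To make the difference concrete, here is a minimal pure-Python sketch. The player numbers, the use of k-nearest neighbours, and the function names are hypothetical illustrations, not taken from any real dataset or library. The same neighbours feed both predictions: regression averages their points into a number, while classification takes a majority vote over their all-star labels. Features are left unscaled for brevity; a real model would normally standardize them first.

```python
from collections import Counter
from statistics import mean

# Hypothetical toy data: ((height_cm, weight_kg, age), points_per_game, is_all_star).
# The numbers are made up purely for illustration.
PLAYERS = [
    ((198, 95, 24), 22.5, True),
    ((201, 102, 29), 18.0, True),
    ((190, 88, 22), 9.5, False),
    ((185, 84, 31), 6.0, False),
    ((206, 110, 27), 15.0, False),
]


def nearest(features, k=3):
    """Return the k players closest to `features` by squared Euclidean distance."""
    def distance(row):
        return sum((a - b) ** 2 for a, b in zip(row[0], features))
    return sorted(PLAYERS, key=distance)[:k]


def predict_points(features, k=3):
    """Regression: average the neighbours' points, producing a continuous number."""
    return mean(points for _, points, _ in nearest(features, k))


def predict_all_star(features, k=3):
    """Classification: majority vote over the neighbours' labels, producing a category."""
    votes = Counter(label for _, _, label in nearest(features, k))
    return votes.most_common(1)[0][0]
```

For a hypothetical new player `(199, 97, 26)`, `predict_points` returns about 16.67 while `predict_all_star` returns `True`: one answer is a quantity, the other is a label.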

How do I know which technique to use?

Answer the following question:

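“Does my target variable have a meaningful numerical order, where the distance between values means something?”

If the answer is yes, you’re looking at regression; if not, it’s classification.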

For example, my project predicting the recommended reader age for a book was a regression task because I was predicting a precise age (e.g., 4 years old). If I had instead been trying to identify whether a book was suitable for teens, it would have been a classification task, since the answer would have been either yes or no.
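The same book records can be framed either way; only the target changes. This sketch assumes hypothetical fields (`word_count`, `recommended_age`) and an assumed teen range of 13 to 19, neither of which comes from the original project.

```python
# Hypothetical records: one feature plus each book's recommended reader age.
books = [
    {"title": "Book A", "word_count": 1_200, "recommended_age": 4},
    {"title": "Book B", "word_count": 45_000, "recommended_age": 11},
    {"title": "Book C", "word_count": 80_000, "recommended_age": 14},
    {"title": "Book D", "word_count": 95_000, "recommended_age": 16},
]

TEEN_MIN, TEEN_MAX = 13, 19  # assumed definition of "suitable for teens"

# Regression target: ordered numbers, where 4 really is closer to 11 than to 16.
regression_targets = [b["recommended_age"] for b in books]

# Classification target: the same information collapsed into yes/no categories.
classification_targets = [
    TEEN_MIN <= b["recommended_age"] <= TEEN_MAX for b in books
]

print(regression_targets)      # [4, 11, 14, 16]
print(classification_targets)  # [False, False, True, True]
```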

OK, so classification is only for yes/no, true/false, cat/dog problems, right?

Nope, those are just the easy examples 😄

Example 1: Sorting People into Groups

Imagine a scenario where you get a new batch of students every year and have to sort them into houses based on their personality traits.

In this situation, the houses have no sequence or ranking. Sure, Harry definitely didn’t want to be sorted into Slytherin, and the Sorting Hat clearly took that into consideration, but that doesn’t mean Slytherin is any “closer” to Gryffindor than Hufflepuff or Ravenclaw is, in the way that 25 is closer to 30 than it is to 19.
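One practical consequence of unordered categories: if you encoded the houses as the integers 0 to 3 and treated them as numbers (for instance, by fitting a regression model), Slytherin (3) would look three times farther from Gryffindor (0) than Hufflepuff (1) does. One-hot encoding, a standard technique for categorical variables, gives each house its own axis so that every pair sits the same distance apart. A minimal sketch, using alphabetical house order (an arbitrary choice):

```python
from itertools import combinations
from math import dist

HOUSES = ["Gryffindor", "Hufflepuff", "Ravenclaw", "Slytherin"]

# Naive integer encoding: quietly imposes an order the houses don't have.
as_int = {house: i for i, house in enumerate(HOUSES)}

# One-hot encoding: each house is a unit vector on its own axis.
one_hot = {
    house: tuple(1 if j == i else 0 for j in range(len(HOUSES)))
    for i, house in enumerate(HOUSES)
}


def pairwise_distances(encoding, metric):
    """Distance between every pair of houses under a given encoding."""
    return {
        (a, b): metric(encoding[a], encoding[b])
        for a, b in combinations(HOUSES, 2)
    }


int_dists = pairwise_distances(as_int, lambda x, y: abs(x - y))
one_hot_dists = pairwise_distances(one_hot, dist)

print(int_dists[("Gryffindor", "Hufflepuff")])  # 1
print(int_dists[("Gryffindor", "Slytherin")])   # 3
print(set(round(d, 4) for d in one_hot_dists.values()))  # {1.4142}
```

Under one-hot encoding every pair of houses is exactly √2 apart, which matches the intuition that no house is “between” two others.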

Example 2: Applying Labels

Similarly, if we had a dataset containing the ingredients of dishes and tried to predict each dish’s country of origin, we’d be solving a classification problem. Why? Because country names have no numerical order. We can say that Russia is the largest country on Earth or that China and India have the most people, but those are attributes of the countries (i.e., land area and population), not something intrinsic to the country names.
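Here is a sketch of what the ingredients-to-country problem might look like, using a 1-nearest-neighbour rule with Jaccard similarity (a deliberately simple technique chosen for illustration, not necessarily what a real project would use). The dishes and their labels are hypothetical. Notice that the prediction is always one of the existing labels: there is no sensible “average” of Russia and China, which is exactly why this is classification rather than regression.

```python
# Hypothetical training data: ingredient sets labelled with a country of origin.
TRAINING = [
    ({"beets", "cabbage", "dill", "sour cream"}, "Russia"),
    ({"potatoes", "dill", "sour cream", "onion"}, "Russia"),
    ({"soy sauce", "ginger", "scallion", "rice"}, "China"),
    ({"tofu", "chili", "sichuan pepper", "scallion"}, "China"),
    ({"tomato", "basil", "mozzarella", "olive oil"}, "Italy"),
]


def jaccard(a, b):
    """Share of ingredients two dishes have in common (0.0 to 1.0)."""
    return len(a & b) / len(a | b)


def predict_country(ingredients):
    """1-nearest-neighbour: return the label of the most similar training dish."""
    _, best_country = max(TRAINING, key=lambda row: jaccard(row[0], ingredients))
    return best_country
```

For example, `predict_country({"ginger", "scallion", "garlic", "rice"})` returns `"China"`, because that set overlaps most with the soy-sauce dish.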

C is for Classification