Today we are going to look at how active learning can be used for data labeling.

Machine learning algorithms need enough data to be trained on, and generally a lot of it. At this stage, humans can of course label the data by hand. But what happens if there isn't enough money to use a service like Amazon Mechanical Turk (AMT)?

If you’re suffering from this situation, yes, there is one more way to get your data labeled. And your hero’s name is Active Learning!

By the way, this post is my first tutorial on Medium, so I’m not going to talk too much :)

So I’m going to give you a naive active learning labeling strategy that you can implement yourself using Python and Scikit-learn on the Fashion-MNIST dataset.

Here are the steps:

1- Label only a small part of your data (let’s call it “df_labeled”)

2- Train a classifier on this data (a linear SVM will be used here)

3- Using the classifier trained in step 2, predict the class probabilities for your unlabeled data (let’s call it “df_unlabeled”)

4- For each sample, if its highest predicted class probability is above your predefined threshold (yes, it’s a hyperparameter :( ), move that sample, with its predicted class as its label, from “df_unlabeled” to “df_labeled”

5- Repeat steps 2–4 until some stopping criterion is met (for example, no unlabeled sample passes the threshold any more)
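A minimal sketch of steps 1–5 with scikit-learn might look like the following. It rests on a few assumptions that are not part of the original recipe: scikit-learn’s built-in digits dataset stands in for Fashion-MNIST so the example runs without a download, plain NumPy arrays stand in for the “df_labeled” and “df_unlabeled” DataFrames, and the helper name `self_label`, the 0.95 threshold and the 10-round cap are hypothetical, illustrative choices. Because a plain linear SVM does not output probabilities, `SVC(kernel="linear", probability=True)` is used, which fits Platt scaling on top of the SVM scores.

```python
import numpy as np
from sklearn.datasets import load_digits  # stand-in for Fashion-MNIST (assumption)
from sklearn.svm import SVC


def self_label(X_labeled, y_labeled, X_unlabeled, threshold=0.95, max_rounds=10):
    """Hypothetical helper for steps 2-5: train, predict, move confident samples, repeat."""
    X_lab, y_lab, X_unl = X_labeled.copy(), y_labeled.copy(), X_unlabeled.copy()
    for _ in range(max_rounds):  # stopping criterion 1: fixed number of rounds
        if len(X_unl) == 0:
            break  # stopping criterion 2: nothing left to label
        # Step 2: linear SVM; probability=True enables predict_proba via Platt scaling
        clf = SVC(kernel="linear", probability=True, random_state=0)
        clf.fit(X_lab, y_lab)
        # Step 3: class probabilities for every unlabeled sample
        proba = clf.predict_proba(X_unl)
        confident = proba.max(axis=1) >= threshold
        if not confident.any():
            break  # stopping criterion 3: no sample passes the threshold
        # Step 4: move confident samples, using the most probable class as their label
        predicted = clf.classes_[proba.argmax(axis=1)]
        X_lab = np.vstack([X_lab, X_unl[confident]])
        y_lab = np.concatenate([y_lab, predicted[confident]])
        X_unl = X_unl[~confident]
    return X_lab, y_lab, X_unl


X, y = load_digits(return_X_y=True)
rng = np.random.default_rng(0)
idx = rng.permutation(len(X))
labeled, unlabeled = idx[:100], idx[100:]  # step 1: only 100 samples are "hand-labeled"
X_lab, y_lab, X_unl = self_label(X[labeled], y[labeled], X[unlabeled])
print(len(X_lab), "labeled,", len(X_unl), "still unlabeled")
```

Strictly speaking, moving confident predictions into the labeled set like this is often called self-training or pseudo-labeling. Any wrong prediction above the threshold becomes a wrong label for the next round, which is why a fairly high threshold is a safer starting point.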

Of course, many other strategies are possible. For example, after step 4 you can define a second threshold as a lower boundary: if a sample’s highest predicted class probability is below that threshold, the sample can be labeled manually and then moved to “df_labeled”.
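The two-threshold variant can be sketched as a small routing function over the predicted probabilities. Sending the least confident samples to a human is essentially what the active learning literature calls least-confidence (uncertainty) sampling. The function name `route_by_confidence` and the 0.95 / 0.50 thresholds are hypothetical choices for illustration; `proba` is assumed to be the output of `predict_proba` on the unlabeled pool.

```python
import numpy as np


def route_by_confidence(proba, upper=0.95, lower=0.50):
    """Hypothetical helper: split unlabeled samples by their highest class probability.

    Returns three index arrays:
      auto   - confidence >= upper: accept the predicted label (step 4)
      manual - confidence <  lower: send to a human annotator
      keep   - everything else stays in the unlabeled pool for the next round
    """
    if not 0.0 <= lower <= upper <= 1.0:
        raise ValueError("expected 0 <= lower <= upper <= 1")
    confidence = proba.max(axis=1)
    auto = np.flatnonzero(confidence >= upper)
    manual = np.flatnonzero(confidence < lower)
    keep = np.flatnonzero((confidence >= lower) & (confidence < upper))
    return auto, manual, keep


proba = np.array([
    [0.97, 0.02, 0.01],  # confident   -> auto-labeled
    [0.40, 0.35, 0.25],  # uncertain   -> manual labeling
    [0.70, 0.20, 0.10],  # in between  -> stays unlabeled
])
auto, manual, keep = route_by_confidence(proba)
print(auto, manual, keep)  # [0] [1] [2]
```

The samples in `keep` are simply left for the next round, where a classifier trained on more labels may push them above the upper threshold.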

I hope we’ve got the main concept of active labeling. Now it’s time for the coding section.

#machine-learning #data-science #data-labeling #active-learning #python

Active Learning for Labeling in Python