
We can all agree on one point: ‘Numbers are everywhere’. Whether you’re at the office, in your kitchen, or at the local supermarket, you are surrounded by numbers at all times. Your laptop has a storage capacity, the vegetables you’re buying have a price, you have a height, and the temperature outside is measured in degrees Celsius (it’s 52 at my place 😅).


Similarly, newbies in the Machine Learning space are almost always handed the MNIST dataset first. For ML newbies, MNIST is like a toddler’s first milk. What is it?

The MNIST database contains 60,000 training images and 10,000 testing images. Half of the training set and half of the test set were taken from NIST’s training dataset, while the other half of the training set and the other half of the test set were taken from NIST’s testing dataset. The original creators of the database keep a list of some of the methods tested on it. In their original paper, they use a support-vector machine to get an error rate of 0.8%. An extended dataset similar to MNIST called EMNIST has been published in 2017, which contains 240,000 training images, and 40,000 testing images of handwritten digits and characters. [1]

The Street View House Numbers (SVHN) dataset was collected at Stanford and is publicly available to experiment with and learn from.

_SVHN is a real-world image dataset for developing machine learning and object recognition algorithms with minimal requirement on data preprocessing and formatting. It can be seen as similar in flavor to MNIST (e.g., the images are of small cropped digits), but incorporates an order of magnitude more labeled data (over 600,000 digit images) and comes from a significantly harder, unsolved, real world problem (recognizing digits and numbers in natural scene images). SVHN is obtained from house numbers in Google Street View images. [2]_

The images are not ready to use out of the box, though. Whoever wants to use them has to do a bit of work first!

The Challenge

Build an algorithm to classify the different house numbers from the Dataset.

The Problem

The dataset available on the website is in the **_.mat_** format. In case you don’t know, that is MATLAB’s data format, and most Python machine learning libraries can’t work with these files directly. So the data has to be converted to an acceptable format before getting into the cool stuff.
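To give a feel for what that conversion could look like, here is a minimal sketch. It assumes the cropped-digit version of SVHN, where each .mat file holds an `X` array of shape (32, 32, 3, N) and a `y` array of shape (N, 1) in which the digit 0 is stored as class 10. The helper names `to_channels_last` and `load_svhn_mat` are made up for this illustration, and this is not necessarily the exact approach used in the rest of the story.

```python
import numpy as np


def to_channels_last(X, y):
    """Turn SVHN-style arrays into (N, 32, 32, 3) images and 0-9 labels.

    Assumes X has shape (32, 32, 3, N) and y has shape (N, 1),
    as in the cropped-digit SVHN files.
    """
    # Move the sample axis to the front: (H, W, C, N) -> (N, H, W, C)
    images = np.transpose(X, (3, 0, 1, 2))
    # Flatten the labels to shape (N,); astype returns a copy, so y is untouched
    labels = y.reshape(-1).astype(np.int64)
    # SVHN stores the digit 0 as class 10, so map it back to 0
    labels[labels == 10] = 0
    return images, labels


def load_svhn_mat(path):
    """Read one SVHN .mat file and return NumPy images and labels."""
    # Imported here so the helper above also works without SciPy installed
    from scipy.io import loadmat

    data = loadmat(path)  # dict mapping MATLAB variable names to arrays
    return to_channels_last(data["X"], data["y"])
```

With a downloaded file in the working directory (the name `train_32x32.mat` is an assumption here), `images, labels = load_svhn_mat("train_32x32.mat")` would give arrays that most Python ML libraries can take as input.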

