Welcome to this tutorial on the MNIST dataset. In this tutorial, we will learn what the MNIST dataset is, how to import it in Python, and how to plot it using matplotlib.

What is the MNIST dataset?

The MNIST dataset is a large collection of **handwritten digits**. It is a very popular dataset in the field of image processing and is often used for benchmarking machine learning algorithms.

_MNIST_ is short for **Modified National Institute of Standards and Technology database**.

MNIST contains a collection of 70,000 grayscale images of handwritten digits from 0 to 9, each 28 x 28 pixels.
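To get a feel for the size of the dataset, here is a quick back-of-the-envelope calculation. It assumes each pixel is stored as one unsigned 8-bit value (0 to 255), which matches how Keras returns the MNIST images; the variable names are only illustrative.

```python
# Back-of-the-envelope size of MNIST, assuming 1 byte (uint8) per pixel.
NUM_IMAGES = 70_000
HEIGHT, WIDTH = 28, 28

pixels_per_image = HEIGHT * WIDTH              # 28 * 28 = 784 pixels per image
total_pixels = NUM_IMAGES * pixels_per_image   # pixels across the whole dataset
total_bytes = total_pixels * 1                 # 1 byte per uint8 pixel (labels not counted)

print(f"Pixels per image: {pixels_per_image}")
print(f"Total pixels:     {total_pixels:,}")
print(f"Raw image data:   {total_bytes / 1024**2:.1f} MiB")
```

That works out to 54,880,000 pixels, or a little over 52 MiB of raw image data, which is small enough to hold comfortably in memory.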

The dataset is already divided into training and testing sets. We will see this later in the tutorial.

For more information on MNIST, refer to its Wikipedia page. We are going to import the dataset from Keras.

Let’s start by loading the dataset into our Python notebook.

Loading MNIST from Keras

We will first have to import the MNIST dataset from the Keras module.

We can do that using the following line of code:

```python
from keras.datasets import mnist
```

Now we will load the training and testing sets into separate variables.

```python
(train_X, train_y), (test_X, test_y) = mnist.load_data()
```
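Before going further, it can be useful to sanity-check what `load_data()` returned. The helper below, `check_mnist_split`, is a hypothetical function written for this tutorial (it is not part of Keras). It assumes NumPy arrays with `uint8` pixels and integer labels from 0 to 9, which is how Keras documents the MNIST data. So that the snippet runs on its own, it is demonstrated on small random stand-in arrays rather than the real download.

```python
import numpy as np

def check_mnist_split(images, labels):
    """Hypothetical helper: basic consistency checks for one MNIST split.

    Returns the number of images in the split, or raises ValueError.
    """
    # There should be exactly one label per image.
    if len(images) != len(labels):
        raise ValueError("images and labels must have the same length")
    # Each image should be a 28 x 28 grid of pixels.
    if images.ndim != 3 or images.shape[1:] != (28, 28):
        raise ValueError(f"expected shape (n, 28, 28), got {images.shape}")
    # Pixels are expected as 8-bit unsigned integers (0 to 255).
    if images.dtype != np.uint8:
        raise ValueError(f"expected uint8 pixels, got {images.dtype}")
    # Labels should be the digits 0 to 9.
    if labels.min() < 0 or labels.max() > 9:
        raise ValueError("labels must be digits 0 to 9")
    return len(images)

# Stand-in data (NOT real MNIST) so the example runs without downloading anything.
rng = np.random.default_rng(0)
fake_X = rng.integers(0, 256, size=(100, 28, 28), dtype=np.uint8)
fake_y = rng.integers(0, 10, size=100, dtype=np.uint8)

print(check_mnist_split(fake_X, fake_y))  # number of images in the stand-in split
```

With the real data loaded, you would call `check_mnist_split(train_X, train_y)` and `check_mnist_split(test_X, test_y)` instead.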

Let’s find out how many images there are in the training and testing sets. In other words, let’s find the split ratio of this dataset.

To learn more about split ratios, refer to this tutorial on how to split data into training and testing sets.

To find the split ratio, we are going to print the shapes of all four arrays.

```python
print('X_train: ' + str(train_X.shape))
print('Y_train: ' + str(train_y.shape))
print('X_test:  ' + str(test_X.shape))
print('Y_test:  ' + str(test_y.shape))
```


```
X_train: (60000, 28, 28)
Y_train: (60000,)
X_test:  (10000, 28, 28)
Y_test:  (10000,)
```

We can see that there are 60,000 images in the training set and 10,000 images in the testing set: a 6:1 split, or roughly 86% training and 14% testing.
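The split ratio can be computed directly from the first dimension of each array. The sketch below uses plain integers taken from the printed shapes so it runs without Keras; with the real data you would pass `train_X.shape[0]` and `test_X.shape[0]`. The function name `split_ratio` is illustrative only.

```python
from math import gcd

def split_ratio(n_train, n_test):
    """Return the train fraction, test fraction, and the ratio in lowest terms."""
    total = n_train + n_test
    divisor = gcd(n_train, n_test)  # reduce e.g. 60000:10000 to its simplest form
    return n_train / total, n_test / total, (n_train // divisor, n_test // divisor)

# Counts from the printed shapes: (60000, 28, 28) and (10000, 28, 28).
train_frac, test_frac, ratio = split_ratio(60000, 10000)
print(f"train: {train_frac:.1%}, test: {test_frac:.1%}, ratio: {ratio[0]}:{ratio[1]}")
# train: 85.7%, test: 14.3%, ratio: 6:1
```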

The shape of our training array is (60000, 28, 28) because it holds **60,000 grayscale images**, each with dimensions 28 x 28.
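Because the first axis indexes the images, `train_X[i]` is a single 28 x 28 image and `train_y[i]` is its label. Many classical models expect each sample as a flat feature vector, so a common next step is to reshape the array to `(60000, 784)`. The sketch below uses an all-zeros NumPy array named `stand_in_X` with the same shape and dtype as `train_X` (it is not the real data, and is deliberately given a different name so it does not overwrite your loaded `train_X` in a notebook).

```python
import numpy as np

# Stand-in with the same shape and dtype as train_X (all zeros, not real MNIST).
stand_in_X = np.zeros((60000, 28, 28), dtype=np.uint8)

first_image = stand_in_X[0]                          # one image: a 28 x 28 grid of pixels
flat = stand_in_X.reshape(stand_in_X.shape[0], -1)   # one row of 28 * 28 = 784 pixels per image

print(first_image.shape)  # (28, 28)
print(flat.shape)         # (60000, 784)
```

Passing `-1` to `reshape` lets NumPy infer the second dimension (784) from the total number of elements, so the same line works for the test set's 10,000 images.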

