Neural Network Tutorials - Herong's Tutorial Examples - 1.20, by Dr. Herong Yang
What Is The MNIST Database
This section provides a quick introduction to the MNIST database, which contains a large number of images of handwritten digits preprocessed and ready to be used to test machine learning models.
What is the MNIST database? The MNIST database is often called the “hello world” database for machine learning. It contains 70,000 images of handwritten digits (60,000 for training and 10,000 for testing), preprocessed and ready to be used to test machine learning models.
The name MNIST stands for Modified National Institute of Standards and Technology, because the handwritten digit images are extracted from a larger database from the NIST (National Institute of Standards and Technology) of the US government.
A copy of the MNIST database in the original format is maintained by Yann LeCun and available at http://yann.lecun.com/exdb/mnist/. It is divided into 4 files:
1. train-images-idx3-ubyte.gz (9912422 bytes) - Images of the training set with 60,000 examples of handwritten digits (from 0 to 9) normalized and centered in 28x28 pixels.
2. train-labels-idx1-ubyte.gz (28881 bytes) - Labels of the training set. Each label gives the correct numeric value of the handwritten digit presented in the corresponding image in the training set. This label file is used to train supervised machine learning models.
3. t10k-images-idx3-ubyte.gz (1648877 bytes) - Images of the test set with 10,000 examples of handwritten digits (from 0 to 9) normalized and centered in 28x28 pixels in the same way as the training set.
4. t10k-labels-idx1-ubyte.gz (4542 bytes) - Labels of the test set. Each label gives the correct numeric value of the handwritten digit presented in the corresponding image in the test set. This label file is used to validate the accuracy of a trained machine learning model.
The MNIST database from Yann LeCun's Website is encoded in a binary format, which can be decoded according to the given format specification. But if you want to skip the decoding process, you can download the CSV (Comma Separated Values) version maintained by Joseph Redmon from https://pjreddie.com/projects/mnist-in-csv.
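The decoding step mentioned above can be sketched in a few lines of Python. This is a minimal sketch, not an official decoder. It assumes the IDX layout as it is commonly described in the format notes for these files: two zero bytes, a data type byte (0x08 for unsigned byte), a byte giving the number of dimensions, then one 4-byte big-endian size per dimension, followed by the raw data. The names parse_idx and load_idx_gz are hypothetical helpers introduced here for illustration only.

```python
import gzip
import struct


def parse_idx(raw):
    """Decode an IDX byte string into (dims, payload).

    Assumed layout: 2 zero bytes, 1 type byte, 1 dimension-count byte,
    then one 4-byte big-endian size per dimension, then the data bytes.
    """
    zero, type_code, ndim = struct.unpack(">HBB", raw[:4])
    if zero != 0 or type_code != 0x08:
        raise ValueError("not an unsigned-byte IDX file")

    # Read the size of each dimension, e.g. (count, rows, columns).
    header_end = 4 + 4 * ndim
    dims = struct.unpack(">" + "I" * ndim, raw[4:header_end])

    # Everything after the header is one unsigned byte per value.
    payload = raw[header_end:]
    expected = 1
    for size in dims:
        expected *= size
    if len(payload) != expected:
        raise ValueError("payload size does not match the header")
    return dims, payload


def load_idx_gz(path):
    """Read a gzip-compressed IDX file from disk and decode it."""
    with gzip.open(path, "rb") as f:
        return parse_idx(f.read())
```

If the layout assumption holds, load_idx_gz("train-images-idx3-ubyte.gz") should report dimensions of (60000, 28, 28), matching the file description above, and the matching label file should report (60000,).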
The CSV version of the MNIST database is organized into 2 files, one for the training set and one for the test set. You should download and save them in a sub-directory called "mnist_dataset", so that you can use them to train and test your neural network models.
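Here is a minimal sketch of how the CSV files could be read in Python. It rests on assumptions that you should verify against your downloaded copy: each row holds the label first, followed by 784 pixel values (28x28) in the range 0 to 255, and there is no header row. The function name read_mnist_csv and the file name in the usage comment are hypothetical and used for illustration only.

```python
import csv


def read_mnist_csv(lines, expected_pixels=28 * 28):
    """Yield (label, pixels) pairs from an iterable of CSV lines.

    Assumed row layout: label, pixel_1, pixel_2, ..., pixel_784.
    """
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        label = int(row[0])
        pixels = [int(value) for value in row[1:]]
        if len(pixels) != expected_pixels:
            raise ValueError("unexpected number of pixel values in row")
        yield label, pixels


# Usage (file name is an assumption, adjust it to your download):
# with open("mnist_dataset/mnist_train.csv", newline="") as f:
#     for label, pixels in read_mnist_csv(f):
#         ...  # feed the record to your model
```

Reading the file row by row with a generator keeps memory use low, which may matter because the training file contains 60,000 records.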
Just to give you an idea of what a handwritten image from the MNIST database looks like, here is an example of digit 0 taken from Tariq's GitHub website at https://github.com/makeyourownneuralnetwork. The resolution of the original image is only 28x28 pixels. It is enlarged by about 10 times in the picture below.
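If you want a quick look at a digit without an image viewer, the following sketch prints one as text. It assumes the image is given as a flat list of 784 values in row order, where 0 is the background and larger values are ink. The names to_rows and render_ascii, as well as the threshold of 128, are arbitrary choices made for this illustration.

```python
def to_rows(pixels, width=28):
    """Split a flat pixel list into rows of the given width."""
    if len(pixels) % width != 0:
        raise ValueError("pixel count is not a multiple of the width")
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]


def render_ascii(pixels, width=28, threshold=128):
    """Return a text picture: '#' for ink, '.' for background."""
    lines = []
    for row in to_rows(pixels, width):
        lines.append("".join("#" if p >= threshold else "." for p in row))
    return "\n".join(lines)
```

Calling print(render_ascii(pixels)) on one 784-value record should show the rough shape of the digit as a 28-line block of characters.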