How I Identify Handwritten Digits Using Only Python
Training a neural network to correctly identify digits from the MNIST dataset
What I'm Building
In this post I'll show you how I built a neural network which takes an array of numbers representing a handwritten digit and outputs a prediction of what digit it is.
The handwritten digits are from the famous MNIST dataset. The Modified National Institute of Standards and Technology (MNIST) dataset is a collection of small, square 28×28 pixel grayscale images of handwritten single digits between 0 and 9, split into 60,000 training images and 10,000 test images.
The task is to classify a given image into one of the 10 digits.
I’m doing it all in Python.
Let's get started.
The Code
import itertools
from keras.datasets import mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
images = x_train[0:1000]
labels = y_train[0:1000]
def flatten_image(image):
    return list(itertools.chain.from_iterable(image))

def weighted_sum(a, b):
    assert(len(a) == len(b))
    output = 0
    for i in range(len(a)):
        output += (a[i] * b[i])
    return output

def vector_matrix_multiplication(a, b):
    output = [0 for i in range(10)]
    for i in range(len(output)):
        assert(len(a) == len(b[i]))
        output[i] = weighted_sum(a, b[i])
    return output

def zeros_matrix(rows, cols):
    output = []
    for r in range(rows):
        output.append([0 for col in range(cols)])
    return output

def outer_product(a, b):
    output = zeros_matrix(len(a), len(b))
    for i in range(len(a)):
        for j in range(len(b)):
            output[i][j] = a[i] * b[j]
    return output

class NeuralNet:
    def __init__(self):
        self.weights = [
            [0.0000 for i in range(784)],
            [0.0001 for i in range(784)],
            [0.0002 for i in range(784)],
            [0.0003 for i in range(784)],
            [0.0004 for i in range(784)],
            [0.0005 for i in range(784)],
            [0.0006 for i in range(784)],
            [0.0007 for i in range(784)],
            [0.0008 for i in range(784)],
            [0.0009 for i in range(784)]
        ]
        self.alpha = 0.0000001

    def predict(self, input):
        return vector_matrix_multiplication(input, self.weights)

    def train(self, input, labels, epochs):
        for i in range(epochs):
            for j in range(len(input)):
                pred = self.predict(input[j])
                label = labels[j]
                goal = [0 for k in range(10)]
                goal[label] = 1
                error = [0 for k in range(10)]
                delta = [0 for k in range(10)]
                for a in range(len(goal)):
                    delta[a] = pred[a] - goal[a]
                    error[a] = delta[a] ** 2
                weight_deltas = outer_product(delta, input[j])
                for x in range(len(self.weights)):
                    for y in range(len(self.weights[0])):
                        self.weights[x][y] -= (self.alpha * weight_deltas[x][y])
# Train on first image
first_image = images[0]
first_label = labels[0]
input = [flatten_image(first_image)]
label = [first_label]
nn = NeuralNet()
nn.train(input, label, 5)
prediction = nn.predict(input[0])
print(prediction)
print("The label is: " + str(label[0]) + ". The prediction is: " + str(prediction.index(max(prediction))))
# Train on full dataset
prepared_images = [flatten_image(image) for image in images]
mm = NeuralNet()
mm.train(prepared_images, labels, 5)
# Test 1 prediction
prediction = mm.predict(prepared_images[3])
print("That image is the number " + str(prediction.index(max(prediction))))
# Calculate accuracy
test_set = x_test
test_labels = y_test
num_correct = 0
for i in range(len(test_set)):
    prediction = mm.predict(flatten_image(test_set[i]))
    correct = test_labels[i]
    if prediction.index(max(prediction)) == int(correct):
        num_correct += 1
print(str(num_correct/len(test_set) * 100) + "%")
Get Dataset
The keras library helpfully includes the dataset so I can import it from the library.
from keras.datasets import mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
images = x_train[0:1000]
labels = y_train[0:1000]
When I call load_data(), I get back two tuples: a training set and a test set. To successfully finish training on my personal laptop, I had to limit the data to the first 1000 elements. When I tried training on the full data set, it hadn't finished after a full 24 hours and I had to kill the process to use my laptop :D.
With only 1000 images, the best accuracy I achieved was about 75%. Maybe you can tweak the numbers and get something better!
Getting back to the data, if I take a look at one of the images in the training set, I see that it is an array of arrays - a matrix. The numbers range from 0 to 255 - each representing the greyscale value of the pixel at a particular position in the image.
images[0]
array([[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 18, 18, 18, 126, 136, 175, 26, 166, 255, 247, 127, 0, 0, 0, 0], [ 0, 0, 0, 0, 0, 0, 0, 0, 30, 36, 94, 154, 170, 253, 253, 253, 253, 253, 225, 172, 253, 242, 195, 64, 0, 0, 0, 0], [ 0, 0, 0, 0, 0, 0, 0, 49, 238, 253, 253, 253, 253, 253, 253, 253, 253, 251, 93, 82, 82, 56, 39, 0, 0, 0, 0, 0], [ 0, 0, 0, 0, 0, 0, 0, 18, 219, 253, 253, 253, 253, 253, 198, 182, 247, 241, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [ 0, 0, 0, 0, 0, 0, 0, 0, 80, 156, 107, 253, 253, 205, 11, 0, 43, 154, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 14, 1, 154, 253, 90, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 139, 253, 190, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 11, 190, 253, 70, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 35, 241, 225, 160, 108, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 81, 240, 253, 253, 119, 25, 0, 0, 0, 0, 0, 0, 0, 0, 0], [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 45, 186, 253, 253, 150, 27, 0, 0, 0, 0, 0, 0, 0, 0], [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 16, 93, 252, 253, 187, 0, 0, 0, 0, 0, 0, 0, 0], [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 249, 253, 249, 64, 0, 0, 0, 0, 0, 0, 0], [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 46, 130, 183, 253, 253, 207, 2, 0, 0, 0, 0, 0, 0, 0], [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 39, 148, 229, 253, 253, 253, 250, 182, 0, 0, 0, 0, 0, 0, 0, 0], [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 24, 114, 221, 253, 253, 
253, 253, 201, 78, 0, 0, 0, 0, 0, 0, 0, 0, 0], [ 0, 0, 0, 0, 0, 0, 0, 0, 23, 66, 213, 253, 253, 253, 253, 198, 81, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [ 0, 0, 0, 0, 0, 0, 18, 171, 219, 253, 253, 253, 253, 195, 80, 9, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [ 0, 0, 0, 0, 55, 172, 226, 253, 253, 253, 253, 244, 133, 11, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [ 0, 0, 0, 0, 136, 253, 253, 253, 212, 135, 132, 16, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]], dtype=uint8)
If I look at the first label, I see the number five. This means that the collection of numbers in images[0] represents the number 5.
labels[0]
5
Prepare Data
The matrix math that I implement does not know how to handle an array of arrays, so the first thing I do is prepare the data by flattening each image into a single array.
import itertools
def flatten_image(image):
    return list(itertools.chain.from_iterable(image))
What I'm doing in this function is using the itertools library to flatten the array. Specifically, I'm using the chain.from_iterable() method to give me one element at a time. Then I use the list() function to create a flat list to return.
When I print the first image, I see that all the numbers are in one flat array.
print(flatten_image(images[0]))
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 18, 18, 18, 126, 136, 175, 26, 166, 255, 247, 127, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 30, 36, 94, 154, 170, 253, 253, 253, 253, 253, 225, 172, 253, 242, 195, 64, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 49, 238, 253, 253, 253, 253, 253, 253, 253, 253, 251, 93, 82, 82, 56, 39, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 18, 219, 253, 253, 253, 253, 253, 198, 182, 247, 241, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 80, 156, 107, 253, 253, 205, 11, 0, 43, 154, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 14, 1, 154, 253, 90, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 139, 253, 190, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 11, 190, 253, 70, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 35, 241, 225, 160, 108, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 81, 240, 253, 253, 119, 25, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 45, 186, 253, 253, 150, 27, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 16, 93, 252, 253, 187, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 249, 253, 249, 64, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 46, 130, 183, 253, 253, 207, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 39, 148, 229, 253, 253, 253, 250, 182, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 24, 114, 221, 253, 253, 253, 253, 201, 78, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 23, 66, 213, 253, 253, 253, 253, 198, 81, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 18, 171, 219, 253, 253, 253, 253, 195, 80, 9, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 55, 172, 226, 253, 253, 253, 253, 244, 133, 11, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 136, 253, 253, 253, 212, 135, 132, 16, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
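The same flattening is easier to see on a tiny example. This is a minimal sketch using a made-up 2×3 "image" (the name tiny_image and its values are only for illustration, they are not from MNIST):

```python
import itertools

def flatten_image(image):
    # Chain the rows together and collect the values into one flat list.
    return list(itertools.chain.from_iterable(image))

# Hypothetical 2x3 "image", used only for illustration.
tiny_image = [
    [0, 128, 255],
    [64, 0, 32],
]

flat = flatten_image(tiny_image)
print(flat)       # [0, 128, 255, 64, 0, 32]
print(len(flat))  # 6, i.e. rows * columns
```

The rows are simply laid end to end, which is why a 28×28 image becomes a list of 784 numbers.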
Matrix Math Helper Functions
Now that I've prepared the data, I can move on to the next step: implementing the matrix math.
Since I'm working with arrays, I need math functions which understand arrays. You may remember from the previous post that a neural network makes predictions by multiplying the input by the weights. So one thing I need to do now is figure out how to do matrix multiplication.
In order to do matrix multiplication, I need a method to calculate weighted sums.
def weighted_sum(a, b):
    assert(len(a) == len(b))
    output = 0
    for i in range(len(a)):
        output += (a[i] * b[i])
    return output
The weighted sum function takes two arrays of the same length. It multiplies each number in the same index and adds the result to a running sum. So the weighted sum takes two arrays and gives you back a single number.
The best way to think about what this single number represents is as a score of similarity between two arrays. The higher the weighted sum, the more similar arrays a and b are to each other. Roughly speaking, the neural network will give higher scores to inputs that are more similar to its weights.
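To make the "similarity score" idea concrete, here is a small sketch with a made-up 4-pixel input and two made-up weight rows (all three lists are assumptions for illustration). The weighted sum is large when the weights are large in the same places the pixels are bright:

```python
def weighted_sum(a, b):
    assert len(a) == len(b)
    output = 0
    for i in range(len(a)):
        output += a[i] * b[i]
    return output

# Hypothetical 4-pixel input and two candidate weight rows.
pixels = [0, 255, 255, 0]
matching_weights = [0, 1, 1, 0]    # large where the pixels are bright
mismatched_weights = [1, 0, 0, 1]  # large where the pixels are dark

high = weighted_sum(pixels, matching_weights)   # 0*0 + 255*1 + 255*1 + 0*0
low = weighted_sum(pixels, mismatched_weights)  # 0*1 + 255*0 + 255*0 + 0*1
print(high, low)  # 510 0
```

Keep in mind the score also grows with the size of the numbers themselves, so "similarity" is only a rough way to read it.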
def vector_matrix_multiplication(a, b):
    output = [0 for i in range(10)]
    for i in range(len(output)):
        assert(len(a) == len(b[i]))
        output[i] = weighted_sum(a, b[i])
    return output
Next, I have the matrix multiplication method. This calculates the weighted sum of the input with each row of the weights. When it's done, I get an array of weighted sums.
In my case, the returned output of 10 elements contains a score for each digit. These scores aren't true probabilities (they don't have to sum to 1), but whichever index has the highest number is the prediction for what digit is in the image.
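Here is a scaled-down sketch of that idea with three weight rows instead of ten. Note that this variant sizes the output from the number of weight rows rather than hard-coding 10, which is my change for the sake of a small example, and the pixel and weight values are made up:

```python
def weighted_sum(a, b):
    assert len(a) == len(b)
    return sum(x * y for x, y in zip(a, b))

def vector_matrix_multiplication(a, b):
    # Variation on the post's version: one output per weight row
    # instead of a hard-coded 10.
    return [weighted_sum(a, row) for row in b]

# Hypothetical 3-pixel input and 3 weight rows (one row per class).
pixels = [1, 0, 1]
weights = [
    [0, 1, 0],  # row for class 0
    [1, 0, 1],  # row for class 1
    [1, 1, 0],  # row for class 2
]

scores = vector_matrix_multiplication(pixels, weights)
predicted = scores.index(max(scores))
print(scores)     # [0, 2, 1]
print(predicted)  # 1
```

The prediction is just the index of the largest score. If two scores tie, index() returns the first one.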
I need two other matrix math helpers. These functions will be used to adjust the weights in the right direction.
First, I have a zeros matrix method which creates a matrix filled with zeros.
def zeros_matrix(rows, cols):
    output = []
    for r in range(rows):
        output.append([0 for col in range(cols)])
    return output
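This is used to implement a function to calculate the outer product of two vectors.
The outer product multiplies every element of the first vector by every element of the second, giving a matrix with one row per element of the first vector and one column per element of the second. This will be used to tell the neural network how to change its weights.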
def outer_product(a, b):
    output = zeros_matrix(len(a), len(b))
    for i in range(len(a)):
        for j in range(len(b)):
            output[i][j] = a[i] * b[j]
    return output
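A quick worked example of the outer product, using a made-up 2-element delta and a made-up 3-pixel input (both are assumptions for illustration). The result has one row per delta and one column per pixel, which is the same shape as the weights it will later adjust:

```python
def zeros_matrix(rows, cols):
    return [[0 for _ in range(cols)] for _ in range(rows)]

def outer_product(a, b):
    output = zeros_matrix(len(a), len(b))
    for i in range(len(a)):
        for j in range(len(b)):
            output[i][j] = a[i] * b[j]
    return output

# Hypothetical values: 2 outputs, 3 input pixels.
delta = [1, -2]
pixels = [3, 0, 5]

result = outer_product(delta, pixels)
print(result)  # [[3, 0, 5], [-6, 0, -10]]
```

Notice that a pixel with value 0 produces a 0 in every row, so weights attached to blank pixels are left alone.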
Okay. That's a lot of math. Let's find out how these functions are being used in the neural network.
Neural Network
class NeuralNet:
    def __init__(self):
        self.weights = [
            [0.0000 for i in range(784)],
            [0.0001 for i in range(784)],
            [0.0002 for i in range(784)],
            [0.0003 for i in range(784)],
            [0.0004 for i in range(784)],
            [0.0005 for i in range(784)],
            [0.0006 for i in range(784)],
            [0.0007 for i in range(784)],
            [0.0008 for i in range(784)],
            [0.0009 for i in range(784)]
        ]
        self.alpha = 0.0000001

    def predict(self, input):
        return vector_matrix_multiplication(input, self.weights)

    def train(self, input, labels, epochs):
        for i in range(epochs):
            for j in range(len(input)):
                pred = self.predict(input[j])
                label = labels[j]
                goal = [0 for k in range(10)]
                goal[label] = 1
                error = [0 for k in range(10)]
                delta = [0 for k in range(10)]
                for a in range(len(goal)):
                    delta[a] = pred[a] - goal[a]
                    error[a] = delta[a] ** 2
                weight_deltas = outer_product(delta, input[j])
                for x in range(len(self.weights)):
                    for y in range(len(self.weights[0])):
                        self.weights[x][y] -= (self.alpha * weight_deltas[x][y])
This neural network is similar to the one from the previous post. The only real difference is that we're using an array of numbers instead of a single number.
In the initializer, I have the weights and the alpha (the learning rate). There are ten weight arrays, one per digit, and each has 784 elements set to the same small initial number. 784 is the number of pixels in a 28×28 image.
def __init__(self):
    self.weights = [
        [0.0000 for i in range(784)],
        [0.0001 for i in range(784)],
        [0.0002 for i in range(784)],
        [0.0003 for i in range(784)],
        [0.0004 for i in range(784)],
        [0.0005 for i in range(784)],
        [0.0006 for i in range(784)],
        [0.0007 for i in range(784)],
        [0.0008 for i in range(784)],
        [0.0009 for i in range(784)]
    ]
    self.alpha = 0.0000001
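The ten hand-written rows can also be produced with a nested list comprehension. This is only an equivalent sketch, not what the class above uses; dividing by 10000 instead of multiplying by 0.0001 is my choice so that the floats come out the same as the literals:

```python
# Build 10 rows of 784 weights; row i is filled with i / 10000.
# Each row is its own list, so updating one row never changes another.
weights = [[i / 10000 for _ in range(784)] for i in range(10)]

print(len(weights), len(weights[0]))  # 10 784
print(weights[3][0] == 0.0003)        # True
```

This keeps the shape in one place, which is handy if the image size or the number of classes ever changes.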
The prediction function is again multiplying the input by the weights.
def predict(self, input):
    return vector_matrix_multiplication(input, self.weights)
The training function iterates through the whole dataset once per epoch, for the number of epochs I pass in.
for i in range(epochs):
    for j in range(len(input)):
For each image, it makes a prediction.
pred = self.predict(input[j])
Next we transform the label into a format that the neural network expects.
label = labels[j]
goal = [0 for k in range(10)]
goal[label] = 1
I create an array of ten 0s and then set the index of the goal prediction to 1. So all the wrong answers are 0 and the right answer is 1.
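The label transformation can be pulled out into a small function to try it on its own. The name one_hot and the num_classes parameter are hypothetical, they don't appear in the network above:

```python
def one_hot(label, num_classes=10):
    # Hypothetical helper: all zeros except a 1 at the label's index.
    goal = [0 for _ in range(num_classes)]
    goal[label] = 1
    return goal

print(one_hot(5))  # [0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
print(one_hot(0))  # [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```

The goal array has the same length as the prediction array, so the two can be compared position by position.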
Next, I calculate the error and the delta.
error = [0 for k in range(10)]
delta = [0 for k in range(10)]
for a in range(len(goal)):
    delta[a] = pred[a] - goal[a]
    error[a] = delta[a] ** 2
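Here is the same calculation on a made-up three-class prediction (the pred and goal values are assumptions chosen so the arithmetic is easy to follow):

```python
# Hypothetical prediction and goal for a 3-class problem.
pred = [0.25, 0.5, 1.5]
goal = [0, 1, 0]

delta = [0 for _ in range(len(goal))]
error = [0 for _ in range(len(goal))]
for a in range(len(goal)):
    delta[a] = pred[a] - goal[a]  # signed miss: positive means too high
    error[a] = delta[a] ** 2      # squared miss: never negative

print(delta)  # [0.25, -0.5, 1.5]
print(error)  # [0.0625, 0.25, 2.25]
```

The sign of each delta says which way the prediction missed, and the squared error says by how much. In the training loop above only delta feeds into the weight update, while error is calculated but not used afterwards.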
I then calculate the weight deltas by using an outer product between delta and the input.
weight_deltas = outer_product(delta, input[j])
Finally I update all the weights using the weight deltas.
for x in range(len(self.weights)):
    for y in range(len(self.weights[0])):
        self.weights[x][y] -= (self.alpha * weight_deltas[x][y])
The main takeaway here is that this is exactly like the neural network with one digit. The only difference is that the math is done on arrays instead of on single numbers.
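To see a whole update in miniature, here is a sketch of one training step on a tiny network with 2 outputs and 3 inputs. The helpers train_step and total_error, the starting weights, and the alpha of 0.1 are all assumptions for illustration, they are not the values used for MNIST:

```python
def weighted_sum(a, b):
    return sum(x * y for x, y in zip(a, b))

def predict(weights, pixels):
    # One score per weight row.
    return [weighted_sum(pixels, row) for row in weights]

def total_error(weights, pixels, label):
    # Hypothetical helper: sum of squared differences from the goal.
    pred = predict(weights, pixels)
    goal = [0 for _ in range(len(weights))]
    goal[label] = 1
    return sum((pred[a] - goal[a]) ** 2 for a in range(len(goal)))

def train_step(weights, pixels, label, alpha):
    # Hypothetical helper: one pass of the update rule for one example.
    pred = predict(weights, pixels)
    goal = [0 for _ in range(len(weights))]
    goal[label] = 1
    delta = [pred[a] - goal[a] for a in range(len(goal))]
    for x in range(len(weights)):
        for y in range(len(weights[0])):
            # delta[x] * pixels[y] is the outer product entry at [x][y].
            weights[x][y] -= alpha * delta[x] * pixels[y]

weights = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
pixels = [1, 0, 1]
label = 1
alpha = 0.1

error_before = total_error(weights, pixels, label)
train_step(weights, pixels, label, alpha)
error_after = total_error(weights, pixels, label)
print(error_before, error_after)  # the error shrinks from 1.0 to about 0.64
```

Only the row for the correct class moves here, because the other row's delta is already 0, and within that row only the weights attached to non-zero pixels change.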
Training The Network On The First Data Point
Let's put this new network into action. To test it out, I take the first image and the first label. I create a neural network and train it on that first image and label for five epochs. When I predict the digit on that same image, I see the output is an array of 10 numbers.
first_image = images[0]
first_label = labels[0]
input = [flatten_image(first_image)]
label = [first_label]
nn = NeuralNet()
nn.train(input, label, 5)
prediction = nn.predict(input[0])
print(prediction)
print("The label is: " + str(label[0]) + ". The prediction is: " + str(prediction.index(max(prediction))))
[0.0, 0.03036370905054081, 0.06072741810108162, 0.09109112715162263, 0.12145483620216324, 1.1407872249800253, 0.18218225430324525, 0.21254596335378556, 0.24290967240432648, 0.2732733814548679]
The label is: 5. The prediction is: 5
The number at index five is the greatest, so the network correctly identified the handwritten digit as the number 5.
It works on one data point but what about the entire data set?
Let's do that next.
Training The Network On The Whole Dataset
I prepare the images by flattening every image in my data set. Again, this is the first 1000 images from the MNIST training set. I create the neural network and train it on the prepared images and labels.
I run it for 5 epochs. Through trial and error I found that 5 epochs gives me the highest accuracy of just under 75%.
When it's finished, I test the network by making a prediction on one of the training images (the one at index 3). It correctly identifies the image.
prepared_images = [flatten_image(image) for image in images]
mm = NeuralNet()
mm.train(prepared_images, labels, 5)
prediction = mm.predict(prepared_images[3])
print("That image is the number " + str(prediction.index(max(prediction))))
That image is the number 1
labels[3]
1
To test the true accuracy, I use the test data and labels.
I loop through the test set, make a prediction for each image, check it against the label, and count the number correct.
test_set = x_test
test_labels = y_test
num_correct = 0
for i in range(len(test_set)):
    prediction = mm.predict(flatten_image(test_set[i]))
    correct = test_labels[i]
    if prediction.index(max(prediction)) == int(correct):
        num_correct += 1
print(str(num_correct/len(test_set) * 100) + "%")
74.47%
In the end, I'm able to correctly predict 3 out of every 4 images in the test set.
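The accuracy loop can be tried without training anything by feeding it made-up score arrays. In this sketch the accuracy function, the predictions, and the labels are all hypothetical stand-ins for the real network output:

```python
def accuracy(predictions, labels):
    # Hypothetical helper: percentage of score arrays whose largest
    # entry sits at the index given by the matching label.
    num_correct = 0
    for scores, label in zip(predictions, labels):
        if scores.index(max(scores)) == int(label):
            num_correct += 1
    return num_correct / len(labels) * 100

# Made-up scores for four images in a 3-class problem.
predictions = [
    [0.1, 0.9, 0.0],  # predicts 1
    [0.8, 0.1, 0.1],  # predicts 0
    [0.2, 0.3, 0.5],  # predicts 2
    [0.6, 0.3, 0.1],  # predicts 0, but the label below is 1
]
labels = [1, 0, 2, 1]

result = accuracy(predictions, labels)
print(str(result) + "%")  # 75.0%
```

Three of the four made-up predictions match their labels, which gives the same "3 out of 4" picture as the real result.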
So What Did We Do?
This was a fun little exercise to see how neural networks use matrix math to make predictions.
What's Next?
In the next post, I’ll experiment with adding multiple layers to make the network "deep". I'll also swap my handwritten matrix math functions for NumPy functions and see how much easier it makes some of this for me.
See you next time!