Walk-Through on Tariq's Code

This section walks through the Python code that accompanies Tariq's book 'Make Your Own Neural Network'. It explains all the main statements of the code and illustrates some of the key matrix operations graphically.

In the last tutorial we learned how to install and run the Python code associated with Tariq's book "Make Your Own Neural Network". Now let's walk through Tariq's code and learn the neural network model used in the code.

1. If you open Tariq's code in a text editor, you see that the first section provides copyright information and imports 2 libraries: NumPy and SciPy. The Matplotlib library is commented out since it is not used.

```
# %%
# python notebook for Make Your Own Neural Network
# code for a 3-layer neural network, and code for learning the MNIST dataset
# (c) Tariq Rashid, 2016

# %%
import numpy
# scipy.special for the sigmoid function expit()
import scipy.special
# library for plotting arrays
#hy import matplotlib.pyplot
#hy # ensure the plots are inside this notebook, not an external window
#hy %matplotlib inline
```

2. The next section starts to define the "neuralNetwork" class with the standard __init__() method, which allows us to create a generic neural network with 3 layers. Four parameters, inputnodes, hiddennodes, outputnodes, and learningrate, control the network size and the learning rate.

```
# %%
# neural network class definition
class neuralNetwork:

    # initialize the neural network
    def __init__(self, inputnodes, hiddennodes, outputnodes, learningrate):
```

3. The next 4 lines of code copy the parameters inputnodes, hiddennodes, and outputnodes to the instance variables self.inodes, self.hnodes, and self.onodes, which hold the number of nodes in each layer.

```
        # set number of nodes in each input, hidden, output layer
        self.inodes = inputnodes
        self.hnodes = hiddennodes
        self.onodes = outputnodes
```

4. The next few lines of code initialize two weight matrices: self.wih for the weights of the links from input layer nodes to hidden layer nodes, and self.who for the weights of the links from hidden layer nodes to output layer nodes.

```
        # link weight matrices, wih and who
        # weights inside the arrays are w_i_j, where link is from node i to node j in the next layer
        # w11 w21
        # w12 w22 etc
        self.wih = numpy.random.normal(0.0, pow(self.inodes, -0.5), (self.hnodes, self.inodes))
        self.who = numpy.random.normal(0.0, pow(self.hnodes, -0.5), (self.onodes, self.hnodes))
```

Note that each weight matrix is initialized with random numbers drawn from a normal distribution with a mean of 0.0. The standard deviation of this distribution is set to the inverse of the square root of the number of inbound nodes N, that is N^(-0.5), which is coded as pow(self.inodes, -0.5) for the weight matrix self.wih and pow(self.hnodes, -0.5) for the weight matrix self.who. The random numbers themselves are generated by the NumPy function numpy.random.normal().
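As a quick check of this initialization rule, the following minimal sketch draws a weight matrix with the same call pattern and compares its empirical standard deviation with N^(-0.5). The layer sizes 784 and 200 are the values used later in this walk-through; the fixed seed and the variable names outside the class are only for illustration and are not part of Tariq's code.

```python
import numpy

# illustrative layer sizes (the values used later in this walk-through)
inodes, hnodes = 784, 200

# fix the seed so the check is repeatable (not part of Tariq's code)
numpy.random.seed(0)

# same call pattern as self.wih: mean 0.0, standard deviation inodes ** -0.5
wih = numpy.random.normal(0.0, pow(inodes, -0.5), (hnodes, inodes))

expected_std = pow(inodes, -0.5)   # 1 / sqrt(784) = 1 / 28
print(wih.shape)                   # one row per hidden node, one column per input node
print(expected_std, wih.std())     # the two values should be close
```

The shape argument (hnodes, inodes) is what later allows numpy.dot(self.wih, inputs) to map an input column to a hidden-layer column.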

5. The next few lines of code copy the parameter learningrate to the instance variable self.lr. Another instance variable, self.activation_function, is created to hold the logistic sigmoid function provided as scipy.special.expit() by the SciPy library.

```
        # learning rate
        self.lr = learningrate

        # activation function is the sigmoid function
        self.activation_function = lambda x: scipy.special.expit(x)

        pass
```

That's the end of the __init__() method of the "neuralNetwork" class.
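To make the activation function concrete, here is a small sketch comparing scipy.special.expit() with the logistic sigmoid written out by hand as 1 / (1 + e^(-x)). The sample values in x are arbitrary and chosen only for illustration.

```python
import numpy
import scipy.special

# a few arbitrary sample values
x = numpy.array([-2.0, 0.0, 2.0])

# logistic sigmoid written out by hand
manual = 1.0 / (1.0 + numpy.exp(-x))
# the same function as provided by SciPy
library = scipy.special.expit(x)

print(manual)
print(library)
print(numpy.allclose(manual, library))
```

Every output lies strictly between 0 and 1, and an input of 0.0 maps to 0.5.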

6. Tariq's code continues to define the train() method of the "neuralNetwork" class. The train() method takes two parameters, inputs_list and targets_list, representing input values of a single training sample and expected output values of the same sample.

```
    # train the neural network
    def train(self, inputs_list, targets_list):
        # convert inputs list to 2d array
        inputs = numpy.array(inputs_list, ndmin=2).T
        targets = numpy.array(targets_list, ndmin=2).T
```

Note that method parameters, inputs_list and targets_list, are converted from 1-dimensional arrays (like [N]) to 2-dimensional matrices (like [1,N]) by the numpy.array() function. Resulting 2-dimensional matrices are then transposed (like [N,1]) to be ready for matrix operations in the next step. Those transposed matrices are stored in local variables, inputs and targets. The transposition operation ().T on a matrix can be illustrated graphically as:

```
                  |-|
( [- - -] ) . T = |-|
                  |-|
```
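The conversion and transposition can be tried out on their own. The sketch below uses a made-up list of three values and prints the shapes before and after the transposition.

```python
import numpy

# a made-up list of N = 3 values
inputs_list = [0.1, 0.5, 0.9]

# ndmin=2 wraps the list into a matrix with a single row: shape (1, 3)
row = numpy.array(inputs_list, ndmin=2)
# .T turns the single row into a single column: shape (3, 1)
column = row.T

print(row.shape, column.shape)
```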

7. The next 4 lines of code move the signals of the training sample from the input layer to the hidden layer by performing a matrix dot operation on the weight matrix self.wih and the input matrix inputs using the numpy.dot() function. The resulting matrix, hidden_inputs, is then passed through the activation function to become the signal matrix of the hidden layer, which is stored as hidden_outputs.

```
        # calculate signals into hidden layer
        hidden_inputs = numpy.dot(self.wih, inputs)
        # calculate the signals emerging from hidden layer
        hidden_outputs = self.activation_function(hidden_inputs)
```

The above code can be illustrated graphically as:

```
hidden_outputs                      wih      inputs
     |-|                         |- - -|
     |-|                         |- - -|     |-|
     |-| = activation_function ( |- - -| dot |-| )
     |-|                         |- - -|     |-|
     |-|                         |- - -|
```

9. The next 4 lines of code move the signals stored in the hidden layer to the output layer by performing a matrix dot operation on the weight matrix self.who and the hidden layer signal matrix hidden_outputs using the numpy.dot() function. The resulting matrix, final_inputs, is then passed through the activation function to become the signal matrix of the output layer, which is stored as final_outputs.

```
        # calculate signals into final output layer
        final_inputs = numpy.dot(self.who, hidden_outputs)
        # calculate the signals emerging from final output layer
        final_outputs = self.activation_function(final_inputs)
```

The above code can also be illustrated graphically as:

```
final_outputs                      who      hidden_outputs
                                               |-|
   |-|                         |- - - - -|     |-|
   |-| = activation_function ( |- - - - -| dot |-| )
   |-|                         |- - - - -|     |-|
                                               |-|
```
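The two forward steps can be checked for shape consistency with a small stand-alone sketch. It assumes a tiny network with 3 input, 5 hidden, and 3 output nodes (the sizes drawn in the diagrams), made-up input values, and a hand-written sigmoid() helper that stands in for scipy.special.expit(); none of these names or values come from Tariq's code.

```python
import numpy

def sigmoid(x):
    # stand-in for scipy.special.expit(), written with NumPy only
    return 1.0 / (1.0 + numpy.exp(-x))

numpy.random.seed(1)               # repeatable random weights, for illustration only
inodes, hnodes, onodes = 3, 5, 3   # sizes matching the diagrams

wih = numpy.random.normal(0.0, pow(inodes, -0.5), (hnodes, inodes))
who = numpy.random.normal(0.0, pow(hnodes, -0.5), (onodes, hnodes))

# made-up input sample as a column: shape (3, 1)
inputs = numpy.array([0.2, 0.5, 0.8], ndmin=2).T

# (5, 3) dot (3, 1) gives (5, 1)
hidden_outputs = sigmoid(numpy.dot(wih, inputs))
# (3, 5) dot (5, 1) gives (3, 1)
final_outputs = sigmoid(numpy.dot(who, hidden_outputs))

print(hidden_outputs.shape, final_outputs.shape)
```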

10. The code continues to calculate error values by comparing final_outputs against the given targets. Those error values, output_errors, are then distributed back to the hidden layer according to the transposed weight matrix.

```
        # output layer error is the (target - actual)
        output_errors = targets - final_outputs
        # hidden layer error is the output_errors, split by weights, recombined at hidden nodes
        hidden_errors = numpy.dot(self.who.T, output_errors)
```

The above code can also be illustrated graphically as:

```
hidden_errors      who.T    output_errors
    |-|            |- - -|
    |-|            |- - -|     |-|
    |-|        = ( |- - -| dot |-| )
    |-|            |- - -|     |-|
    |-|            |- - -|
```
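A small numeric sketch may help to see how the errors are split and recombined. It assumes a made-up weight matrix who for 3 hidden nodes and 2 output nodes, and made-up output errors.

```python
import numpy

# made-up weights: 2 output nodes (rows) fed by 3 hidden nodes (columns)
who = numpy.array([[0.1, 0.2, 0.3],
                   [0.4, 0.5, 0.6]])
# made-up output errors as a column: shape (2, 1)
output_errors = numpy.array([[1.0],
                             [2.0]])

# (3, 2) dot (2, 1) gives (3, 1): one error value per hidden node
hidden_errors = numpy.dot(who.T, output_errors)
print(hidden_errors)
```

Each hidden node receives the output errors weighted by its own outgoing links; for the first hidden node that is 0.1 * 1.0 + 0.4 * 2.0 = 0.9.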

11. Then it is time to adjust the weight matrices using the formula "adjustment = rate * ( (error * output * (1 - output)) dot transpose(input) )", where error and output belong to the layer the links lead into, and input is the signal coming from the previous layer. The code starts with the weight matrix between the hidden and output layers.

```
        # update the weights for the links between the hidden and output layers
        self.who += self.lr * numpy.dot((output_errors * final_outputs * \
            (1.0 - final_outputs)), numpy.transpose(hidden_outputs))
```

Graphically, the weight matrix adjustment can be illustrated as:

```
    who               error output       output
|- - - - -|            |-|   |-|    |1|   |-|      hidden_outputs.T
|- - - - -| += lr * ( (|-| * |-| * (|1| - |-|)) dot |- - - - -| )
|- - - - -|            |-|   |-|    |1|   |-|
```
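The shapes involved in this update can be verified with a short sketch. All numbers below are made up, and the helper names delta and adjustment are used only for illustration; Tariq's code performs the same calculation in a single statement.

```python
import numpy

lr = 0.1
output_errors = numpy.array([[0.5], [-0.2]])          # shape (2, 1)
final_outputs = numpy.array([[0.6], [0.3]])           # shape (2, 1)
hidden_outputs = numpy.array([[0.1], [0.5], [0.9]])   # shape (3, 1)

# element-wise part: error * output * (1 - output), still shape (2, 1)
delta = output_errors * final_outputs * (1.0 - final_outputs)
# (2, 1) dot (1, 3) gives (2, 3), the same shape as the who matrix
adjustment = lr * numpy.dot(delta, numpy.transpose(hidden_outputs))

print(delta.shape, adjustment.shape)
print(adjustment)
```

Because the adjustment has the same shape as the weight matrix, it can be added to self.who directly with the += operator.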

12. The next few lines adjust the weight matrix between the input and hidden layers in the same way as in the previous step.

```
        # update the weights for the links between the input and hidden layers
        self.wih += self.lr * numpy.dot((hidden_errors * hidden_outputs * \
            (1.0 - hidden_outputs)), numpy.transpose(inputs))

        pass
```

That's the end of the train() method of the "neuralNetwork" class.

13. Tariq's code continues to define the query() method of the "neuralNetwork" class. The query() method takes only one parameter, inputs_list. It performs only the forward propagation of signals, in the same way as the train() method.

```
    # query the neural network
    def query(self, inputs_list):
        # convert inputs list to 2d array
        inputs = numpy.array(inputs_list, ndmin=2).T

        # calculate signals into hidden layer
        hidden_inputs = numpy.dot(self.wih, inputs)
        # calculate the signals emerging from hidden layer
        hidden_outputs = self.activation_function(hidden_inputs)

        # calculate signals into final output layer
        final_inputs = numpy.dot(self.who, hidden_outputs)
        # calculate the signals emerging from final output layer
        final_outputs = self.activation_function(final_inputs)

        return final_outputs
```

That's the end of the "neuralNetwork" class, which represents a 3-layer neural network model using logistic sigmoid function as the activation function.

14. Tariq's code continues by creating an instance of the above neural network model for the MNIST database, starting with the model's parameters. The input_nodes value is set to 784, because each handwritten digit sample is normalized to 784 (28x28) pixels. The darkness value (grey scale) of each pixel is fed into its own node in the input layer without any truncation or padding.

```
# %%
# number of input, hidden and output nodes
input_nodes = 784
```

15. The hidden_nodes value is set to 200 for no particular reason. It should be large enough that the neural network has enough memory (weight matrices) to remember handwritten digit patterns, but not so large that it consumes too many computing resources. We will do some experiments later on hidden_nodes to see its impact on the neural network model.

```
hidden_nodes = 200
```

16. The output_nodes value is set to 10, because Tariq decided to encode the 10 expected labels (to be recognized from input samples) directly into 10 nodes in the output layer. Each label is encoded with a single node turned on in the output layer. For example, label 3 is expected as [0, 0, 0, 1, 0, 0, 0, 0, 0, 0] in the output layer. We will do some experiments later on output_nodes with different encoding schemes to see their impact on the neural network model.

```
output_nodes = 10
```

17. The learning_rate value is set to 0.1 for no particular reason. It should be large enough that the neural network can reach a stable state quickly, but not so large that the network jumps back and forth around the stable point. We will do some experiments later on learning_rate to see its impact on the neural network model.

```
# learning rate
learning_rate = 0.1
```
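The trade-off in choosing a learning rate can be illustrated with a toy example that is much simpler than the neural network itself. The sketch below is only an analogy: it minimizes the one-variable function f(x) = x * x by gradient descent, and the function descend() and the three rates tried are hypothetical choices for illustration.

```python
def descend(learning_rate, steps=20, start=1.0):
    # minimise f(x) = x * x, whose gradient is 2 * x
    x = start
    for _ in range(steps):
        x -= learning_rate * 2.0 * x
    return x

small = abs(descend(0.01))     # small rate: moves slowly towards the minimum at 0
moderate = abs(descend(0.1))   # moderate rate: gets much closer in the same number of steps
large = abs(descend(1.1))      # too large: overshoots and moves further away at every step

print(small, moderate, large)
```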

18. The following code creates an instance of "neuralNetwork" with the above parameters and stores it in the variable n.

```
# create instance of neural network
n = neuralNetwork(input_nodes,hidden_nodes,output_nodes, learning_rate)
```

19. The next few lines of code read in the training dataset (with 60,000 samples) of the MNIST database and store it in the variable training_data_list, one CSV line per sample.

```
# %%
# load the mnist training data CSV file into a list
training_data_file = open("mnist_dataset/mnist_train.csv", 'r')
training_data_list = training_data_file.readlines()
training_data_file.close()
```

20. Tariq's code continues to train the neural network model by running the training dataset 5 times (or epochs). Repeating training multiple times can improve the accuracy of the model, if the training dataset is not big enough. We will do some experiments later on epochs to see its impact on the neural network model.

```
# %%
# train the neural network

# epochs is the number of times the training data set is used for training
epochs = 5

for e in range(epochs):
```

21. The next section of code loops through each sample in the training dataset. Each sample is a single line in CSV format. The code extracts the input values (grey scales of 784 pixels) from the second position to the end of the line. Remember that the first position stores the label of the expected digit of the sample. The input values are then scaled into the range 0.01 to 1.0.

```
    # go through all records in the training data set
    for record in training_data_list:
        # split the record by the ',' commas
        all_values = record.split(',')
        # scale and shift the inputs
        inputs = (numpy.asfarray(all_values[1:]) / 255.0 * 0.99) + 0.01
```
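The scaling step can be tried on a made-up record. The sketch below uses a record with only 4 pixel values instead of 784, and it uses numpy.asarray(..., dtype=float) in place of numpy.asfarray(), on the assumption that a recent NumPy version is installed (numpy.asfarray() was removed in NumPy 2.0).

```python
import numpy

# a made-up record: label 7 followed by only 4 pixel values instead of 784
record = "7,0,128,255,64"

# split the record by the ',' commas
all_values = record.split(',')
# convert the pixel strings to floats (substitute for numpy.asfarray)
pixels = numpy.asarray(all_values[1:], dtype=float)
# scale 0..255 into 0.01..1.0
inputs = (pixels / 255.0 * 0.99) + 0.01

print(inputs)
```

A pixel value of 0 maps to 0.01 and a pixel value of 255 maps to 1.0, so no input is ever exactly zero.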

22. The code continues to prepare the expected output values and store them in targets. Expected output values are set to 0.01 for all nodes, except the node that corresponds to the expected label of the sample, which is set to 0.99.

```
        # create the target output values (all 0.01, except the desired label which is 0.99)
        targets = numpy.zeros(output_nodes) + 0.01
        # all_values[0] is the target label for this record
        targets[int(all_values[0])] = 0.99
```
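As a stand-alone illustration of this encoding, the sketch below builds the target vector for a sample whose label is 3. The variable label is introduced here only for the example; in Tariq's code the label comes from all_values[0].

```python
import numpy

output_nodes = 10
label = 3   # example label, chosen only for illustration

# all nodes start at 0.01
targets = numpy.zeros(output_nodes) + 0.01
# the node matching the label is raised to 0.99
targets[label] = 0.99

print(targets)
```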

23. Finally, the train() method is called with the above input and target values of the sample to train the neural network once. After that, the execution continues to the next training sample and the next epoch.

```
        n.train(inputs, targets)
        pass
    pass
```

That's the end of the training phase of the neural network model.

24. Tariq's code continues to evaluate the accuracy of the neural network model using the test dataset of the MNIST database. It first reads the test samples into the variable test_data_list.

```
# %%
# load the mnist test data CSV file into a list
test_data_file = open("mnist_dataset/mnist_test.csv", 'r')
test_data_list = test_data_file.readlines()
test_data_file.close()
```

25. The next section of code loops through the test dataset. Input values and the expected label are extracted from a single line in CSV format in the same way as in the training phase.

```
# %%
# test the neural network

# scorecard for how well the network performs, initially empty
scorecard = []

# go through all the records in the test data set
for record in test_data_list:
    # split the record by the ',' commas
    all_values = record.split(',')
    # correct answer is first value
    correct_label = int(all_values[0])
    # scale and shift the inputs
    inputs = (numpy.asfarray(all_values[1:]) / 255.0 * 0.99) + 0.01
```

26. The next few lines of code query the neural network model with the input values. The output values of the query are scanned for the node with the highest value to determine the predicted label.

```
    # query the network
    outputs = n.query(inputs)
    # the index of the highest value corresponds to the label
    label = numpy.argmax(outputs)
```
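The label lookup can be tried with a made-up output column. The sketch below assumes the outputs have the same (10, 1) shape that query() returns for 10 output nodes; the values themselves are invented.

```python
import numpy

# made-up network outputs as a (10, 1) column, one value per digit 0..9
outputs = numpy.array([[0.02], [0.01], [0.05], [0.91], [0.03],
                       [0.02], [0.04], [0.01], [0.06], [0.02]])

# argmax works on the flattened array, so it returns the row index of the largest value
label = numpy.argmax(outputs)
print(label)
```

Here the largest value sits at index 3, so the predicted digit is 3.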

27. The next few lines of code compare the predicted label against the expected label. The result is appended to the list scorecard. After that, the execution continues to the next test sample.

```
    # append correct or incorrect to list
    if (label == correct_label):
        scorecard.append(1)
    else:
        scorecard.append(0)
        pass

    pass
```

28. Finally, the test results are summarized as a performance score (the success rate on the test dataset) of the neural network model.

```
# %%
# calculate the performance score, the fraction of correct answers
scorecard_array = numpy.asarray(scorecard)
print ("performance = ", scorecard_array.sum() / scorecard_array.size)

# %%
```
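To see how the score is computed, the following sketch applies the same calculation to a made-up scorecard of 10 test results. The variable performance is introduced only for the example.

```python
import numpy

# made-up scorecard: 1 for a correct prediction, 0 for a wrong one
scorecard = [1, 1, 0, 1, 0, 1, 1, 1, 1, 0]

scorecard_array = numpy.asarray(scorecard)
# fraction of correct answers: number of 1s divided by number of records
performance = scorecard_array.sum() / scorecard_array.size
print("performance = ", performance)
```

With 7 correct answers out of 10 records, the performance score is 0.7.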

Well, that's a long code walk-through session. Hope you enjoyed it!