What Is RNN (Recurrent Neural Network)

Neural Network Tutorials - Herong's Tutorial Examples

This section provides a quick introduction to RNN (Recurrent Neural Network). It starts from a generic neural network hidden layer and gradually converts it into an RNN layer by combining the weighted sum and the activation function into a recursive function that feeds information from one sample to the next.

What Is RNN (Recurrent Neural Network)? RNN is an extension of the traditional neural network in which each hidden layer takes an extra feed from the same layer, left over from its calculation on the previous sample.

Those extra feeds make the output depend not only on the current sample, but also on the samples before it. Because of this feature, RNN works better than the traditional neural network on datasets where each sample has some impact on subsequent samples, for example, the words in a novel.

If you have a good understanding of traditional neural network models, you can follow this tutorial to learn the basic architecture of RNN models.

1. Take a standard illustration of a traditional neural network model.

2. Zoom in on hidden layer #2 and hide the other layers to turn it into a generic NN (Neural Network) layer. Input values of the layer are represented as x₁, x₂, x₃, x₄. Output values of the layer are represented as y₁, y₂, y₃. The weight matrix W₂ is renamed to W to be more general.

Traditional Neural Network - Hidden Layer

3. The formula of the forward calculation that generates the output from the input for a generic NN (Neural Network) layer can be expressed as:

NN forward calculation on a generic layer:
   y₁ = f(∑(W_1,j*x_j))
   y₂ = f(∑(W_2,j*x_j))
   y₃ = f(∑(W_3,j*x_j))

∑() represents the weighted sum
   of input values over j, calculated as the dot product (or inner product)
   of one row from the weight matrix and the input vector.

f() represents the activation function (the logistic sigmoid function
   for example)
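
The three formulas above can be sketched in a few lines of plain Python. This is only a minimal illustration: the weight matrix W, the input x, and the helper names sigmoid() and layer_forward() are hypothetical and are not part of the original tutorial.

```python
import math

def sigmoid(z):
    """Logistic sigmoid, used here as the activation function f()."""
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(W, x, f=sigmoid):
    """Return [f(sum over j of W[i][j] * x[j])] for each row i of W."""
    return [f(sum(w_ij * x_j for w_ij, x_j in zip(row, x))) for row in W]

# Hypothetical 3x4 weight matrix and 4-element input, for illustration only.
W = [
    [0.5, -0.5, 0.0, 0.0],   # row used for y1
    [0.1, 0.2, 0.3, 0.4],    # row used for y2
    [0.0, 0.0, 0.0, 0.0],    # row used for y3
]
x = [1.0, 1.0, 2.0, 0.5]

y = layer_forward(W, x)
print(y)  # three output values, one per row of W
```

Each output value uses exactly one row of W, which is why a layer with 4 inputs and 3 outputs needs a 3x4 weight matrix.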

4. Now condense all nodes in the generic layer into a single element, using x to represent the input vector (input values as its elements), y to represent the output vector (output values as its elements), and W to represent the weight matrix. Note that this condensed diagram is valid for all hidden and output layers.

5. Also condense the expression of the forward calculation.

NN forward calculation on a generic layer condensed:
   y = f(W·x)

· represents the product of a matrix and a vector.

6. If the NN model is applied to a sequence of samples, (..., x_t-1, x_t, x_t+1, ...), the condensed diagram can be rotated and replicated to illustrate forward calculations of the same layer on multiple sequential samples. It shows that forward calculations on different samples are independent of each other in a traditional neural network model. Note that the weight matrix is updated after each sample during the training phase. This is why it is labeled differently for each sample.

Neural Network Layer on Sequential Samples
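
The independence between samples can be checked with a small sketch. It assumes the weights are held fixed (as they would be after training), so a single hypothetical matrix W stands in for W_t-1, W_t and W_t+1; the sample values are made up for illustration.

```python
import math

def layer_forward(W, x):
    """Condensed forward calculation y = f(W·x), with a sigmoid as f()."""
    return [
        1.0 / (1.0 + math.exp(-sum(w * v for w, v in zip(row, x))))
        for row in W
    ]

# Hypothetical 2x2 weight matrix, held fixed across all samples here.
W = [[0.2, -0.1], [0.4, 0.3]]

# A short sequence of samples: x_t-1, x_t, x_t+1.
samples = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Each output is computed from its own sample only.
outputs = [layer_forward(W, x) for x in samples]

# Feeding the samples in reverse order gives the same per-sample outputs,
# because nothing is carried over from one sample to the next.
reversed_outputs = [layer_forward(W, x) for x in reversed(samples)]
assert outputs == list(reversed(reversed_outputs))
```

In a traditional layer the order of the sequence carries no information, which is the limitation that the recurrent layer is meant to address.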

7. Now we are ready to extend the traditional neural network into a recurrent neural network by replacing the combination of the activation function f() and the weighted sum operation ∑() with a recursive function R() that takes two inputs and generates two outputs. One output, called y, goes to the next layer, and the other output, called s, stays in the same layer and is used as an input for the next sample.

RNN (Recurrent Neural Network) Layer Architecture

8. If we focus on the forward calculation of a single sample, x_t, in a sequence of samples, we can express the recursive function R() as:

(y_t, s_t) = R(x_t, W_t, s_t-1, U_t)

Inputs:

  x_t represents the input vector of the current sample.

  W_t represents the weight matrix on the input vector
  for the current sample.

  s_t-1 represents the state vector generated from
  the calculation of the previous sample. The state vector is
  introduced in RNN to feed information from one sample to the
  next sample.

  U_t represents the weight matrix on the state vector
  for the current sample.

Outputs:

  y_t represents the output vector of the current sample.

  s_t represents the state vector to feed information to
  the next sample.
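
The signature above can be sketched in plain Python. The tutorial has not yet defined how R() is constructed, so the body below is only one assumed possibility: s_t = tanh(W·x_t + U·s_t-1) with y_t = s_t. The helper matvec(), the weight values, the zero initial state, and the use of fixed W and U for every sample are all assumptions made for illustration.

```python
import math

def matvec(M, v):
    """Product of a matrix (list of rows) and a vector."""
    return [sum(m * e for m, e in zip(row, v)) for row in M]

def R(x_t, W_t, s_prev, U_t):
    """Assumed form of R(): s_t = tanh(W_t·x_t + U_t·s_prev), y_t = s_t."""
    a = matvec(W_t, x_t)       # contribution of the current sample
    b = matvec(U_t, s_prev)    # contribution of the previous state
    s_t = [math.tanh(p + q) for p, q in zip(a, b)]
    y_t = list(s_t)            # in this sketch the output equals the state
    return y_t, s_t

# Hypothetical weights, held fixed across samples in this sketch.
W = [[0.5, -0.3], [0.1, 0.8]]   # weights on the input vector
U = [[0.2, 0.0], [0.0, 0.2]]    # weights on the state vector

# The first and third samples are identical on purpose.
sequence = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]

s = [0.0, 0.0]                  # assumed initial state before the first sample
outputs = []
for x_t in sequence:
    y_t, s = R(x_t, W, s, U)    # s carries information to the next sample
    outputs.append(y_t)

print(outputs[0], outputs[2])   # different outputs for identical inputs
```

The first and third samples are the same vector, yet their outputs differ because the state vector s carries information forward from earlier samples.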

9. If you prefer a more compact format, the RNN layer architecture can be illustrated with a single component, also called an RNN cell, with a circular arrow to represent its recursive nature. The state vector is omitted.

Hopefully you now have a good understanding of how information flows through a neural network layer in an RNN model. What's left is how to construct the recursive function R(), which will be discussed in the next tutorial.
