RNN Recursive Function

Neural Network Tutorials - Herong's Tutorial Examples


This section provides a quick introduction to the RNN (Recurrent Neural Network). It starts from a generic neural network hidden layer and gradually converts it into an RNN layer by combining the weighted sum with the activation function into a recursive function that carries a feed from one sample to the next.

From the previous section, we learned that an RNN layer requires a recursive function R() that takes two inputs and generates two outputs. One output, called y, goes to the next layer; the other output, called s, stays in the same layer and is used as an input for the next sample.
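The data flow described above can be sketched in a few lines of Python. This is only an illustration of the interface: unroll() and toy_R() are hypothetical names, and toy_R() is a placeholder that keeps a running sum, not an actual RNN cell.

```python
from typing import Callable, List, Tuple

Vector = List[float]
# R maps (x_t, s_{t-1}) to (y_t, s_t).
StepFn = Callable[[Vector, Vector], Tuple[Vector, Vector]]

def unroll(R: StepFn, xs: List[Vector], s0: Vector) -> List[Vector]:
    """Apply R to each sample in order, feeding the state s forward."""
    s = s0
    ys = []
    for x_t in xs:
        y_t, s = R(x_t, s)   # y_t goes to the next layer, s stays in this layer
        ys.append(y_t)
    return ys

def toy_R(x_t: Vector, s_prev: Vector) -> Tuple[Vector, Vector]:
    """Placeholder recursive function: the state is a running sum of the inputs."""
    s_t = [a + b for a, b in zip(x_t, s_prev)]
    return s_t, s_t

ys = unroll(toy_R, [[1.0], [2.0], [3.0]], [0.0])
```

Each output depends on all earlier samples through s, which is the property an RNN layer relies on.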

In this section, we will learn how to construct the recursive function R().

A simple way to construct the recursive function R() is to set the state vector, s_t, equal to the output vector, y_t, and to use an activation function as the recursive function, with its input extended to take both x_t and s_t-1:

Generic form:
  (y_t, s_t) = R(x_t, W_t, s_t-1, U_t)

Simplified form:
  y_t = f(W_t·x_t + U_t·s_t-1)
  s_t = y_t

Or:
  y_t = f(W_t·x_t + U_t·y_t-1)

Or:
  y_t = f( [W_t, U_t] · [x_t; y_t-1] )

  where [W_t, U_t] places W_t and U_t side by side, and [x_t; y_t-1] stacks x_t on top of y_t-1.

f() represents the activation function, the same as in traditional neural networks.

· represents the product of a matrix and a vector.
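As a rough sketch of the simplified form, the step below is written in plain Python with lists instead of a matrix library. The helper names (matvec, rnn_step, rnn_step_concat), the layer sizes, and the weight values are all assumptions made for illustration; W and U stand for W_t and U_t in the formulas above, and tanh is picked arbitrarily as f().

```python
import math

def matvec(M, v):
    """Matrix-vector product: M is a list of rows, v is a list of numbers."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def rnn_step(x_t, y_prev, W, U, f=math.tanh):
    """One step of y_t = f(W·x_t + U·y_t-1), with s_t = y_t."""
    wx = matvec(W, x_t)
    uy = matvec(U, y_prev)
    y_t = [f(a + b) for a, b in zip(wx, uy)]
    return y_t, y_t          # (output, state)

def rnn_step_concat(x_t, y_prev, W, U, f=math.tanh):
    """The same step written as f([W, U]·[x_t; y_t-1])."""
    WU = [w_row + u_row for w_row, u_row in zip(W, U)]   # W and U side by side
    xy = x_t + y_prev                                    # x_t stacked on y_t-1
    return [f(z) for z in matvec(WU, xy)]

# Hypothetical sizes: 3 input features, 2 hidden units.
W = [[0.1, -0.2, 0.3], [0.0, 0.5, -0.1]]
U = [[0.2, 0.1], [-0.3, 0.4]]
x_t = [1.0, 2.0, 3.0]
y_prev = [0.0, 0.0]

y_t, s_t = rnn_step(x_t, y_prev, W, U)
y_concat = rnn_step_concat(x_t, y_prev, W, U)
```

Both functions compute the same values, which is why the two-term form and the concatenated form can be used interchangeably.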

The above recursive function takes different forms with different activation functions:

f() = sigmoid():
  y_t = sigmoid(W_t·x_t + U_t·y_t-1)

f() = tanh():
  y_t = tanh(W_t·x_t + U_t·y_t-1)

f() = ReLU():
  y_t = ReLU(W_t·x_t + U_t·y_t-1)

...
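To see how the choice of f() changes the outputs, here is a small sketch that runs the same weights over the same short sequence with each of the three activation functions. The weights, the input sequence, the zero initial state, and the function names are illustrative assumptions, not values from this tutorial.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    return max(0.0, z)

def rnn_step(x_t, y_prev, W, U, f):
    """y_t = f(W·x_t + U·y_t-1), applied element by element."""
    y_t = []
    for w_row, u_row in zip(W, U):
        z = (sum(w * x for w, x in zip(w_row, x_t))
             + sum(u * y for u, y in zip(u_row, y_prev)))
        y_t.append(f(z))
    return y_t

def run_sequence(xs, W, U, f):
    """Feed each sample in order, reusing the previous output as the state."""
    y = [0.0] * len(W)       # assumed initial state: all zeros
    outputs = []
    for x_t in xs:
        y = rnn_step(x_t, y, W, U, f)
        outputs.append(y)
    return outputs

# Hypothetical weights and a 3-sample input sequence.
W = [[0.5, -1.0], [1.0, 0.5]]
U = [[0.1, 0.0], [0.0, 0.1]]
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

activations = [("sigmoid", sigmoid), ("tanh", math.tanh), ("relu", relu)]
results = {name: run_sequence(xs, W, U, f) for name, f in activations}
```

With sigmoid() every output stays between 0 and 1, with tanh() between -1 and 1, and with ReLU() outputs are never negative but have no upper bound.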

If this simplified recursive function model is used, an RNN cell (actually an RNN layer) can be illustrated in different diagrams. Here are some examples I have collected from the Internet.