Recurrent Neural Networks

Recurrent neural networks (RNNs) are a variation of feed-forward artificial neural networks that have become exceedingly popular and have been used for many learning tasks related to genomics, NLP, and image classification. The distinguishing feature of RNNs is that they possess memory, and here I will attempt to explain what this means and why it is useful.

To understand RNNs it is first useful to revisit standard feed-forward ANNs. This type of neural network feeds feature information (via matrix multiplication) through a network of layers consisting of nodes. Each node receives input from the nodes in the previous layer and passes an output to all of the nodes in the next layer, never moving backwards or sideways within a layer (hence the description feed-forward). When each node in each layer is connected to each node in the next layer, we call this a fully connected network. This is illustrated in the diagram below.

On a side note, you can find all of the TensorFlow code used for this project on GitHub:
https://github.com/JTDean123/tolstoyLSTM

(check out a previous post for more details: http://jasontdean.com/python/ann.html)

In [4]:
from IPython.display import Image
In [11]:
Image("ANN.png")
Out[11]:

The model above learns only from the inputs that are fed to it at a particular moment in time. This seems obvious, but the implication is that when the model is tasked with making a prediction it does not 'remember' the immediately previous data that it has seen. Put another way, a decision made at time (t) does not influence a decision made at time (t+1), and this is suboptimal when classifying observations that occur in sequences (like time-dependent data). For example, a feed-forward ANN trying to classify an apple will not know whether or not the previous image was an orange. Additionally, ANNs operate over a single input and generate a single output.
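
To make this concrete, here is a minimal sketch of a fully connected feed-forward pass in plain NumPy. This is not code from the project, and the layer sizes and random weights are made up for illustration; the point is just that a single input goes in, a single output comes out, and nothing about previous inputs is retained.

In [ ]:
import numpy as np

# Toy fully connected network: 4 input features -> 3 hidden nodes -> 2 outputs.
# The sizes and random weights are illustrative; a real network learns them.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(3)   # input layer  -> hidden layer
W2, b2 = rng.normal(size=(3, 2)), np.zeros(2)   # hidden layer -> output layer

def feed_forward(x):
    # Information only moves forward: each layer is a matrix multiplication
    # followed by a nonlinearity, with no backwards or sideways connections.
    hidden = np.tanh(x @ W1 + b1)
    return hidden @ W2 + b2

x = rng.normal(size=4)     # a single input observation
print(feed_forward(x))     # a single output -- no memory of any previous input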

Here I will show that the architecture of RNNs allows us to overcome these limitations: RNNs allow both for operations over multiple inputs in a sequence-dependent fashion and for a network that retains memory of previous events. A figure will make this clearer (at least to me!). Take, for example, the potential RNN network structures shown below.

In [3]:
# figure from http://karpathy.github.io/2015/05/21/rnn-effectiveness/
Image('rnn.png')
Out[3]:
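
Before going further, here is a minimal, hypothetical sketch of the recurrence that gives an RNN its memory. Again, this is not the project code and the weight shapes and values are illustrative only: at each time step the hidden state is updated from the current input and the previous hidden state, so information from earlier in the sequence can influence what happens later.

In [ ]:
import numpy as np

# Minimal vanilla RNN cell (illustrative sizes, untrained random weights).
# The hidden state h acts as memory: it is updated at every time step and
# fed back in, so the state at time t depends on everything seen before t.
rng = np.random.default_rng(0)
n_in, n_hidden = 4, 5
W_xh = rng.normal(size=(n_in, n_hidden))       # input  -> hidden
W_hh = rng.normal(size=(n_hidden, n_hidden))   # hidden -> hidden (the recurrence)
b_h = np.zeros(n_hidden)

def rnn_step(x_t, h_prev):
    # h_t = tanh(x_t W_xh + h_{t-1} W_hh + b)
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

sequence = rng.normal(size=(10, n_in))   # ten time steps of input
h = np.zeros(n_hidden)                   # the memory starts out empty
for x_t in sequence:
    h = rnn_step(x_t, h)                 # what is seen at time t carries forward to t+1
print(h)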