Recurrent Neural Networks (RNN) — Overview
In my understanding, RNNs are slightly different from vanilla neural networks. This blog post is purely an overview of the RNN and how it functions.
As the name suggests, it's 'recurrent', which means 'something that occurs over and over again iteratively'. RNNs are mostly used for sequential data, where one has to make decisions based on previous outputs. Take weather forecasting: in order to predict today's weather we must take into account yesterday's weather stats; if yesterday was overcast and it rained, then chances are it will be a clear day today. Hence, there is a flow of information from one day to the next. Likewise, there is a flow of information from one cell state to another. In the image below, each square box is a cell state.
C1 is the cell state at time t=1, C2 at time t=2, and so on.
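To make that flow of information concrete, here is a minimal NumPy sketch (not from the original figure; the weight names W_xh, W_hh and all sizes are illustrative assumptions) of a vanilla RNN cell passing its state from one time step to the next:

```python
import numpy as np

# Minimal sketch of a vanilla RNN cell; names and sizes are assumptions.
input_size, hidden_size, seq_len = 3, 4, 5
rng = np.random.default_rng(0)

W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))   # input -> hidden
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))  # hidden -> hidden (the "recurrent" part)
b_h  = np.zeros(hidden_size)

xs = rng.normal(size=(seq_len, input_size))  # one input per time step
h = np.zeros(hidden_size)                    # C0: the initial state

for t, x_t in enumerate(xs, start=1):
    # C_t depends on both the current input and C_{t-1} --
    # this is the flow of information from one cell state to the next.
    h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
    print(f"C{t} (cell state at t={t}):", np.round(h, 3))
```

The key design point is the W_hh term: it is what carries yesterday's "weather stats" (the previous state) into today's prediction.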
Unlike deep feed-forward networks, where the loss is calculated only at the end once the output 'Y' is generated, in RNNs a loss is calculated at each time step, viz. L1, L2, etc., and combined they form the main loss L. For back-propagation, the gradients are computed from the main loss to each sub-loss and finally back to the inputs. Hence, there is a constant flow of gradients through the network.
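A rough, self-contained sketch of this idea (the shapes, targets, and mean-squared-error loss here are my assumptions, not the post's exact setup) shows the sub-losses L1, L2, ... adding up to one main loss L:

```python
import numpy as np

# Assumed setup: tiny RNN with an output layer and made-up targets.
rng = np.random.default_rng(1)
input_size, hidden_size, output_size, seq_len = 3, 4, 2, 5

W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
W_hy = rng.normal(scale=0.1, size=(output_size, hidden_size))

xs      = rng.normal(size=(seq_len, input_size))   # inputs x1..x5
targets = rng.normal(size=(seq_len, output_size))  # made-up targets

h, total_loss = np.zeros(hidden_size), 0.0
for t, (x_t, y_true) in enumerate(zip(xs, targets), start=1):
    h   = np.tanh(W_xh @ x_t + W_hh @ h)     # cell state at time t
    y_t = W_hy @ h                           # output at time t
    L_t = np.mean((y_t - y_true) ** 2)       # sub-loss L_t at this time step
    total_loss += L_t                        # main loss L = L1 + L2 + ...
    print(f"L{t} = {L_t:.4f}")
print(f"L = {total_loss:.4f}")
```

Back-propagation through time would then push the gradient of L back through every one of these time steps, which is the constant flow of gradients described above.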
Due to this constant flow of gradients, we often encounter problems like the exploding gradient and the vanishing gradient. Back-propagated gradients are essentially products of many terms, so when we repeatedly multiply numbers that lie between 0 and 1, the result becomes extremely small over many time steps. For instance, 0.01 * 0.02 = 0.0002, which is far smaller than 0.01. Due to this vanishing gradient problem, training may stall and we may converge much earlier than we should. Likewise, with exploding gradients, where the multiplied terms are greater than 1, the updates become unstable and the time to converge can grow significantly.
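A tiny numeric illustration (the factors 0.5 and 1.5 and the 20 steps are made-up numbers) of how repeated multiplication drives gradients toward zero or toward infinity:

```python
# Repeatedly multiplying by a factor < 1 vanishes; by a factor > 1 explodes.
small, large = 0.5, 1.5
vanishing, exploding = 1.0, 1.0
for step in range(20):
    vanishing *= small   # shrinks toward 0
    exploding *= large   # blows up
print(f"after 20 steps: 0.5^20 = {vanishing:.2e}, 1.5^20 = {exploding:.2e}")
# after 20 steps: 0.5^20 ≈ 9.54e-07 (vanishing), 1.5^20 ≈ 3.33e+03 (exploding)
```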
To tackle these problems, we have a few workarounds, viz.: cleverly choosing activation functions, smartly initialising weights, and choosing our network architecture carefully. Hence, researchers have come up with modified forms of RNNs called LSTMs and GRUs, as sketched below.
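As a closing sketch (assuming PyTorch; the sizes here are arbitrary), swapping a plain recurrent layer for an LSTM is a one-line change, and the LSTM's gating is one of the standard ways to keep gradients from vanishing over long sequences:

```python
import torch
import torch.nn as nn

# Arbitrary sizes for illustration.
seq_len, batch, input_size, hidden_size = 5, 1, 3, 4
x = torch.randn(seq_len, batch, input_size)

rnn  = nn.RNN(input_size, hidden_size)   # plain recurrent layer
lstm = nn.LSTM(input_size, hidden_size)  # gated variant (LSTM)

out_rnn,  h_n         = rnn(x)
out_lstm, (h_n, c_n)  = lstm(x)          # the LSTM also carries a cell state c_n
print(out_rnn.shape, out_lstm.shape)     # both: (5, 1, 4)
```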