Introduction to RNN and LSTM With Keras

🔮 We continue our journey into artificial intelligence with the Keras library, which plays an important role in deep learning. Artificial neural networks with one or more layers come in many different model families. In this post we'll examine the RNN model and, by solving a small problem with recurrent neural networks, the LSTM model created to address its shortcomings.


🧷 The RNN (Recurrent Neural Network) model has a different structure from other artificial neural networks. One of the biggest differences is that it has a temporary memory, and that memory is short: an RNN cannot remember the very distant past, only the recent past. What about a plain ANN? It has no memory at all. While ordinary artificial neural networks forget their inputs, recurrent neural networks can recall the recent past thanks to this memory. And what good is a neural network that can remember the past? It can make inferences and predictions about the future by using its knowledge of the past. Say we are classifying an image: the network performs the classification by matching a feature in the image in front of it against a feature it has already learned. This is why it works more effectively than other algorithms 🌟.


The next image shows the structure of an RNN network. In these models, the output of the previous step is fed as input to the current step. Remembering information is realized through the hidden layers, and this creates a hierarchy of repetition.

If a good example of this model is needed, think of the translation tools we see constantly in daily life, or text mining — in short, the field of natural language processing works with RNN logic. How? Imagine you are translating, just as a machine does: to predict the next word of a sentence according to each language's own semantic structure, you need the previous words. So you have to remember them, and here the RNN comes to your aid. But the RNN has its drawbacks, of course, and this is why the LSTM structure was introduced. First, let's examine the memory structure of the RNN properly, and then we'll look at the LSTM together.


There are nodes in the hidden layers that you see in the image. These nodes have temporal loops, which we call self-feeding temporal loops. In this way, they use their internal memory over short periods of time to activate a recall mechanism. So what is this self-feeding about? In the connections between these nodes, neurons communicate among themselves. Even as a neuron passes on the input it receives, it also feeds that input back into itself so that it is not forgotten. When it hands the information to the next neuron, what it learned earlier stays in its memory, and when the time comes it recalls that information and makes it available.

📝 Let's consider word prediction, a simple natural language processing task, with a character-level RNN over the word “artificial”. We give the network the first nine letters {a, r, t, i, f, i, c, i, a}. The vocabulary here consists of only seven letters {a, r, t, i, f, c, l}, since repeated letters are ignored. The network is expected to guess the last letter, “l”. As each letter is fed in, the network combines it with what it carried over from the previous step — the incoming r is processed together with the preceding a, and so on through every step. In this way the desired prediction is performed with a recurrent neural network.
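The letter bookkeeping above can be sketched in a few lines. The one-hot encoding shown is a common choice for feeding characters to an RNN, not something prescribed by the example itself:

```python
# Sketch: character-level input preparation for the word "artificial".
# The vocabulary keeps each distinct letter once; each character is
# one-hot encoded before being fed to the RNN step by step.
word = "artificial"
vocab = sorted(set(word))          # 7 distinct letters: a, c, f, i, l, r, t
char_to_idx = {ch: i for i, ch in enumerate(vocab)}

def one_hot(ch):
    vec = [0] * len(vocab)
    vec[char_to_idx[ch]] = 1
    return vec

inputs = [one_hot(ch) for ch in word[:-1]]   # the first nine letters
target = one_hot(word[-1])                   # the letter 'l' to be predicted
print(len(vocab), len(inputs))               # 7 9
```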


Let's work through another example to understand the structure of an RNN. Based on the image above, recall that the alphabet your teacher taught you in the first grade of primary school is later used to write sentences and even to read books. Even as you read this passage right now, you are recalling the letters of the alphabet held in your memory — in a way, you are running a recurrent neural network architecture yourself. Strictly speaking, since the RNN has only short-term memory, this example fits the LSTM architecture, which has long-term memory, even better.

Structure of RNN
  • One to One: A neural network model that maps one input to one output.
  • One to Many: A neural network with one input and more than one output. For example, suppose we have an image as input; the expected output is a description of the action in the image.

Output : The Baby who Plays the Piano 🎹

By detecting the piano, the baby, and the notes in the music book in the image, the network recalls the features it has already learned and predicts the outputs from them.

  • Many to One: A neural network with more than one input is expected to produce one output. As an example, the inputs can be several sentences, or the words within a single sentence; the output could be the emotion expressed in that sentence.
  • Many to Many: A neural network with more than one input is expected to produce more than one output. Translation programs are the best example of this.

Each word in the sentence “recurring models tackle sequence … healthcare” counts as an input to the neural network. In the English-to-Turkish translation shown here, the neural network structure produces an output consisting of more than one element.

💡 Let's do a little coding together! You can download the New York City Airbnb Open Data for free from Kaggle; it is the dataset on which we will use the RNN. Shall we start with the data?


🗺️ After importing the necessary libraries, we load the data from our dataset with pandas, ignoring warnings along the way. Then we print the first 5 rows to the screen with the head() command as a quick check.
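The loading step might look like the sketch below. The Kaggle file is commonly named AB_NYC_2019.csv — an assumption here, since the original screenshot is lost — and a tiny in-memory sample is used so the sketch runs without the download:

```python
import io
import warnings
import pandas as pd

warnings.filterwarnings("ignore")  # suppress warnings, as in the walkthrough

# With the real dataset you would write something like:
#   data = pd.read_csv("AB_NYC_2019.csv")   # file name assumed from Kaggle
# A small in-memory sample with a similar shape keeps this runnable:
csv = io.StringIO(
    "id,name,price\n"
    "1,Cozy room,149\n2,Loft,225\n3,Studio,89\n4,Suite,300\n5,Flat,120\n"
)
data = pd.read_csv(csv)
print(data.head())  # first 5 rows, for a quick sanity check
```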


After loading the data, we need to decide what we will predict from this dataset. I will focus on the listing prices, so I assigned the price information to the training data 🏷️.
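Selecting the price column might look like this; the miniature DataFrame below is a hypothetical stand-in for the real Airbnb data:

```python
import pandas as pd

# Hypothetical frame standing in for the Airbnb listings; only the
# "price" column matters for this step.
data = pd.DataFrame({"price": [149, 225, 89, 300, 120]})

# Keep price as a 2-D float array — the shape scalers and Keras expect.
train = data[["price"]].values.astype("float32")
print(train.shape)  # (5, 1)
```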


At this stage, as a preprocessing step, we perform min-max normalization by scaling the data to the 0-1 range.
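A minimal sketch of min-max scaling follows; with scikit-learn installed you could use sklearn.preprocessing.MinMaxScaler instead, which implements the same formula:

```python
import numpy as np

# Stand-in for the price column from the previous step.
train = np.array([[149.0], [225.0], [89.0], [300.0], [120.0]])

# Min-max normalization: (x - min) / (max - min) maps values into [0, 1].
train_scaled = (train - train.min()) / (train.max() - train.min())
print(train_scaled.min(), train_scaled.max())  # 0.0 1.0
```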


📎 We then create empty X and Y training sequences to hold the samples and their targets. We take a fixed number of consecutive samples from the data you see in the chart above: for example, we place 100 samples into X_train, and the 101st value — the one following those 100 samples — goes into y_train as the value to be predicted.


🧐 So, is this done for just one window of 100 samples? No. We slide the window one step at a time across all of the data. You can review this sliding process in the for loop.

After a reshape, you can print the dimensions of the X_train dataset with shape[0] and shape[1]. You can also print the generated y_train target set to the screen.
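The windowing and reshape described above can be sketched as follows. The walkthrough uses 100 timesteps; a window of 5 over a short toy series keeps the sketch small:

```python
import numpy as np

# Sliding-window construction: each X row holds `timesteps` consecutive
# scaled prices, and y holds the single price that follows them.
timesteps = 5                              # 100 in the walkthrough
series = np.arange(20, dtype="float32")    # stand-in for the scaled prices

X_train, y_train = [], []
for i in range(timesteps, len(series)):
    X_train.append(series[i - timesteps:i])  # samples i-5 .. i-1
    y_train.append(series[i])                # sample i is the target

X_train = np.array(X_train)
y_train = np.array(y_train)

# Reshape to (samples, timesteps, features), the input shape RNNs expect.
X_train = X_train.reshape(X_train.shape[0], X_train.shape[1], 1)
print(X_train.shape, y_train.shape)  # (15, 5, 1) (15,)
```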

Creation of the RNN Model

So far we have seen the data-preparation step. Now it's time to build the RNN model we are going to use. Sequential is the container that holds the whole architecture; the layers are the building blocks we stack inside it. Dropout is a regularization method that helps us overcome the problem of overfitting, and SimpleRNN is the layer with which we will build the RNN architecture 🌲.

Then we instantiate Sequential( ) and declare that we will use an RNN in the architecture.

🔎 Let’s examine the parameters that SimpleRNN takes in this project together.

  • units: Dimensionality of the output space
  • activation: Activation function to be used
  • use_bias: Whether the layer uses a bias vector
  • return_sequences: Whether to return the full output sequence or only the last output
  • input_shape: The shape of the input data


The input_shape parameter must be specified only on the first SimpleRNN layer, because each subsequent layer infers its input shape from the output of the previous one.


A total of 4 SimpleRNN layers were created in the RNN architecture above. The hyperbolic tangent (tanh) was used as the activation function; you can test it with other functions such as ReLU or sigmoid instead. As I said before, Dropout is a method that prevents the neural network from simply memorizing: during training it removes neurons and their synaptic connections from the network at the rate given to the Dropout layer. After the layers are created, the Adam optimization method is selected, and mean squared error is selected as the loss function. Training is then carried out with fit for 100 epochs. Below you can examine the first 5 epochs and their loss values 🏋️.
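A sketch of such an architecture is shown below. The unit count of 50 and the dropout rate of 0.2 are assumed values, since the original code screenshot is lost; only the layer structure (4 SimpleRNN layers with tanh, Dropout after each, Adam, MSE, 100 epochs) comes from the text:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN, Dropout, Dense

timesteps = 100  # window length from the data-preparation step

# Four SimpleRNN layers with tanh activations, each followed by Dropout.
# return_sequences=True passes the full sequence to the next RNN layer;
# only the last recurrent layer returns a single vector.
model = Sequential()
model.add(SimpleRNN(units=50, activation="tanh", return_sequences=True,
                    input_shape=(timesteps, 1)))
model.add(Dropout(0.2))  # dropout rate assumed here
model.add(SimpleRNN(units=50, activation="tanh", return_sequences=True))
model.add(Dropout(0.2))
model.add(SimpleRNN(units=50, activation="tanh", return_sequences=True))
model.add(Dropout(0.2))
model.add(SimpleRNN(units=50, activation="tanh"))
model.add(Dropout(0.2))
model.add(Dense(units=1))  # single predicted price

model.compile(optimizer="adam", loss="mean_squared_error")
# model.fit(X_train, y_train, epochs=100, batch_size=32)  # training call
print(len(model.layers))  # 9
```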


🧮 After training, you can create a test set yourself, or predict using the test split already prepared for the dataset. We said that RNN structures have short-term memory — as you may recall, that means they cannot remember the very distant past. When you run this training for 100 epochs, you will notice that the test results are not very good. To avoid this problem, the Long Short-Term Memory structure, called LSTM, which has both short-term and long-term memory, was created. With it, neural networks can remember both nearby information and information from the very distant past. The LSTM is a specialized form of the RNN architecture.


×: Pointwise scaling of information; these structures decide whether the incoming connections are combined or not.

+: Pointwise addition of information; these structures decide whether the information coming from × is accumulated or not.

σ: The sigmoid layer squashes the data coming into it to values between 0 and 1 (the sigmoid activation function pushes values toward 1 or 0 around a given threshold).

tanh: The tanh activation function is used against the slow learning (vanishing gradient) caused by very small gradient values.

h (t-1): Output value from the previous step

h (t): Output value of the current step

C (t): Cell-state information passed on to the next step

C (t-1): Cell-state information coming from the previous step

In this way, we also learned the structure of LSTM memory. After creating the LSTM architecture with Keras, you can perform the training and testing process as in the RNN codes above 🤸‍♀️.

Creation of the LSTM Model

Now let's quickly import the Keras modules in the same way and move on to creating the LSTM architecture.


Let's start building the general structure with Sequential( ). You can add the LSTM layers and a Dense layer, then compile, just as with the RNN.
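A minimal sketch of the LSTM version follows; as with the RNN above, the unit count and dropout rate are assumed values, and only the overall shape (LSTM layers plus Dense, compiled like the RNN) comes from the text:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Dense

timesteps = 100  # same window length as in the RNN model

# Same overall recipe as the RNN, with SimpleRNN swapped for LSTM.
model = Sequential()
model.add(LSTM(units=50, return_sequences=True, input_shape=(timesteps, 1)))
model.add(Dropout(0.2))  # dropout rate assumed here
model.add(LSTM(units=50))
model.add(Dropout(0.2))
model.add(Dense(units=1))  # single predicted price

model.compile(optimizer="adam", loss="mean_squared_error")
# model.fit(X_train, y_train, epochs=100, batch_size=32)  # training call
print(len(model.layers))  # 5
```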


When you run the training and testing process, you will see more successful results than with the RNN. With that, we have covered the RNN and LSTM architectures in depth. I wish you a good day with plenty of coding 😊.



