Hi, how are you doing? I hope you're doing great.

Today we will start off with LSTMs, a powerful type of neural network designed and optimized to handle sequences of time series data.

**Long Short-Term Memory (LSTM)** is the next generation of the **Recurrent Neural Network (RNN)**, used in deep learning for its architecture, which easily captures patterns in sequential data. The benefit of this type of network is that it can learn and remember over long sequences and doesn't rely on a pre-specified window of lagged observations as input.

In Keras this is referred to as *stateful*, and involves setting the `stateful` argument to `True` on the LSTM layer.
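As a quick illustration, here is a minimal sketch of a stateful LSTM in Keras (the layer sizes and batch shape are placeholder assumptions, not values from this article). A stateful LSTM carries its cell state across batches, so it needs a fixed `batch_input_shape`:

```python
# Minimal stateful LSTM sketch -- sizes are illustrative placeholders
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

model = Sequential([
    # stateful=True: the cell state carries over between batches,
    # which requires a fixed batch size (here 1) in batch_input_shape
    LSTM(4, batch_input_shape=(1, 1, 1), stateful=True),
    Dense(1),
])
model.compile(loss="mean_squared_error", optimizer="adam")
```

With a stateful layer, you also call `model.reset_states()` yourself between epochs, since Keras no longer resets the state for you after each batch.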

**What is LSTM in brief?**

It is a recurrent neural network that is trained using backpropagation through time and overcomes the vanishing gradient problem.

Now, instead of neurons, **LSTM networks have memory blocks that are connected through layers.** Each block contains three non-linear gates that make it smarter than a classical neuron, plus a memory for sequences. The three gates are:

**a.) Input Gate:** decides which values from the input to use to update the memory state.

**b.) Forget Gate:** decides what information to throw away from the block.

**c.) Output Gate:** decides what to output based on the input and the block's memory.

Each LSTM unit is like a mini state machine built around a "**memory**" cell that can maintain its state value over a long time; the gates of the unit have weights that are learned during training.
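To make the three gates concrete, here is a toy single time step of an LSTM cell in plain NumPy (the weights are random placeholders, not trained values, and the helper names are my own):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step: x is the input, h_prev/c_prev the previous
    hidden and cell (memory) states, W/b dicts of per-gate weights."""
    z = np.concatenate([x, h_prev])       # input and previous hidden state
    i = sigmoid(W["i"] @ z + b["i"])      # input gate: what to write
    f = sigmoid(W["f"] @ z + b["f"])      # forget gate: what to discard
    o = sigmoid(W["o"] @ z + b["o"])      # output gate: what to expose
    g = np.tanh(W["g"] @ z + b["g"])      # candidate memory content
    c = f * c_prev + i * g                # updated cell (memory) state
    h = o * np.tanh(c)                    # new hidden state / output
    return h, c

rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
W = {k: rng.standard_normal((n_hid, n_in + n_hid)) for k in "ifog"}
b = {k: np.zeros(n_hid) for k in "ifog"}
h, c = lstm_step(rng.standard_normal(n_in), np.zeros(n_hid), np.zeros(n_hid), W, b)
```

Notice how the forget gate scales the old memory and the input gate scales the new candidate; that additive update to `c` is what lets gradients survive over long sequences.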

There are tons of articles on the internet about the inner workings of LSTMs, and even the math behind them. So here I will concentrate on a quick, practical implementation of LSTM for our day-to-day problems.

Let’s get started!

First comes the data pre-processing step, where we have to structure the data into a supervised learning format, that is, X and Y.

In simple words, supervised learning identifies the strength and direction of the relationship (a positive or negative impact, whose derived value is called the quantification of impact) between one dependent variable (Y) and a series of independent variables (X).

For this example we have retail sales time series data recorded over a period of time.

Now, as you know, supervised learning requires X and Y (independent and dependent) variables for the algorithm to learn from, so we will first convert our data into that format.

What we will do is put the sales data for the current month (t) in the first column, and the next month's sales (t+1), which we want to predict, in the second column. Remember the X and Y (independent and dependent variable) format, where we use X to predict Y.

The code below converts a time series to a supervised learning dataset. And yes, `df.fillna(0, inplace=True)` replaces NaN values with 0.

```
# supervised learning function
from pandas import DataFrame, concat

def timeseries_to_supervised(data, lag=1):
    df = DataFrame(data)
    # build the lagged columns (X), then append the original series (Y)
    columns = [df.shift(i) for i in range(1, lag + 1)]
    columns.append(df)
    df = concat(columns, axis=1)
    df.fillna(0, inplace=True)
    return df
```
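As a quick sanity check, here is the function applied to a tiny toy series (the sales values are made up for illustration):

```python
from pandas import DataFrame, concat

def timeseries_to_supervised(data, lag=1):
    df = DataFrame(data)
    # build the lagged columns (X), then append the original series (Y)
    columns = [df.shift(i) for i in range(1, lag + 1)]
    columns.append(df)
    df = concat(columns, axis=1)
    df.fillna(0, inplace=True)
    return df

supervised = timeseries_to_supervised([10, 20, 30, 40], lag=1)
# first column = previous month's sales (X), second = current sales (Y)
print(supervised.values)
```

The first row's X is 0 because there is no month before the first one; that is the `fillna(0, ...)` at work.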

#deep-learning #lstm #sales #machine-learning #data-science
