We will train an LSTM network on text data so that it learns to generate new text of the same form as the training material. Given enough source text, the model learns to produce words similar to those it was trained on, and it typically picks up much of the grammar of the source as well. The same technique can also be used to complete sentences as a user types, as in chatbots.

Importing our dependencies (we are using TensorFlow 2.x)

import string
import numpy as np
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LSTM, Embedding
from tensorflow.keras.preprocessing.sequence import pad_sequences

Reading the data

# Open in read-only mode; 'with' ensures the file is closed afterwards
with open('t8.shakespeare.txt', 'r') as file:
    data = file.read()

Text Cleaning

After collecting your text data, the first step in cleaning it is to have a clear idea of what you are trying to achieve, and then to review the text in that context to see what will actually help.

The data contains many punctuation marks and numeric characters that we need to get rid of.

data = data.split('\n')
data = data[253:]   # skip the header lines at the top of the file
data = ' '.join(data)
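
As a quick sanity check (an extra step, not part of the original walkthrough), we can peek at the start of the remaining text to confirm that the header is gone:

# The first characters should now be actual Shakespeare text, not the header
print(data[:100])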

The cleaner function below removes the punctuation and numbers from the data and converts all characters to lowercase.

def cleaner(data):
    # Split on whitespace into word tokens
    token = data.split()
    # Build a translation table that deletes every punctuation character
    table = str.maketrans('', '', string.punctuation)
    token = [w.translate(table) for w in token]
    # Keep only purely alphabetic tokens (drops numbers and empty strings)
    token = [word for word in token if word.isalpha()]
    # Normalize everything to lowercase
    token = [word.lower() for word in token]
    return token

words = cleaner(data=data)
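
The cleaning ends here, but the imports above already point to the rest of the pipeline: tokenizing the words, building fixed-length training sequences, and defining the network. The following is a minimal sketch of those steps, assuming 50-word input sequences that each predict the 51st word; the sequence length, embedding size, and layer widths are illustrative choices rather than values taken from this article.

seq_length = 50
# Build overlapping lines of 51 words: 50 inputs plus 1 target word.
# (For quick experiments you may want to work on a slice such as words[:50000].)
lines = [' '.join(words[i - seq_length:i + 1])
         for i in range(seq_length, len(words))]

# Map each word to an integer id
tokenizer = Tokenizer()
tokenizer.fit_on_texts(lines)
sequences = np.array(tokenizer.texts_to_sequences(lines))
vocab_size = len(tokenizer.word_index) + 1

# Split each sequence into inputs (first 50 ids) and target (last id),
# one-hot encoding the target for categorical cross-entropy
X, y = sequences[:, :-1], sequences[:, -1]
y = to_categorical(y, num_classes=vocab_size)

model = Sequential([
    Embedding(vocab_size, 50, input_length=seq_length),
    LSTM(100, return_sequences=True),
    LSTM(100),
    Dense(100, activation='relu'),
    Dense(vocab_size, activation='softmax'),
])
model.compile(loss='categorical_crossentropy', optimizer='adam',
              metrics=['accuracy'])

Since every training line here has the same length, pad_sequences is not needed at this stage; it becomes useful later at generation time, when a seed text shorter than seq_length must be padded before being passed to the model for prediction.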
