How to use the Tokenizer and the pad_sequences tool for NLP with TensorFlow and Keras, with simple examples in Python.
The idea behind pad_sequences is that it lets you work with sentences of different lengths: shorter sequences are padded and longer ones truncated so that every sequence ends up the same length.
Sequences longer than the target length (set with the maxlen argument) are truncated to fit. Whether padding and truncation happen at the beginning or the end of a sequence is controlled by the padding and truncating arguments, respectively.
pad_sequences ensures that all sequences in a list have the same length. By default it prepends 0s to the beginning of each sequence until every sequence is as long as the longest one.
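The behavior described above can be sketched with a small example. This assumes TensorFlow 2.x, where both tools live under tensorflow.keras.preprocessing; the sample sentences are made up for illustration:

```python
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Hypothetical sample corpus with sentences of different lengths.
sentences = [
    "I love my dog",
    "I love my cat",
    "Do you think my dog is amazing",
]

# Fit the tokenizer on the corpus and turn sentences into integer sequences.
tokenizer = Tokenizer(oov_token="<OOV>")
tokenizer.fit_on_texts(sentences)
sequences = tokenizer.texts_to_sequences(sentences)

# Default behavior: pad with 0 at the beginning ('pre') until every
# sequence matches the longest one (7 tokens here).
padded = pad_sequences(sequences)
print(padded.shape)

# maxlen forces a fixed length; padding='post' and truncating='post'
# move the padding zeros and the truncation to the end instead.
padded_post = pad_sequences(sequences, maxlen=5,
                            padding="post", truncating="post")
print(padded_post)
```

Note that with the default padding="pre", the zeros go in front of short sequences, which is what many recurrent models expect; switching to "post" is common when working with masking layers or ragged-style batching.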