In this post, I’ll show you how Artificial Intelligence (AI) and Machine Learning (ML) can help you get started on that novel you’ve always wanted to write. I’ll begin with a brief background on how computers use AI to process text. Then I’ll describe how I set up an ML model called GPT-2 to generate new plot summaries, and explain how you can generate new story ideas of your own.

This is the second article in my series on using AI for creative endeavors. The first part, on using ML to create abstract art, is available here.

Background

Natural Language Processing

Natural Language Processing (NLP) is a field of linguistics and computer science that studies how machines, which are programmed in computer languages like Java and Python, can communicate with humans, who use natural languages like English and Swahili. One of the first proponents of teaching computers to understand human language was Alan Turing, who wrote about the idea in 1950 [1].

We may hope that machines will eventually compete with men in all purely intellectual fields. But which are the best ones to start with? Even this is a difficult decision. Many people think that a very abstract activity, like the playing of chess, would be best. It can also be maintained that it is best to provide the machine with the best sense organs that money can buy, and then teach it to understand and speak English. — Alan Turing, Mathematician and Computer Scientist

Text to Tokens

One of the first steps in processing text is converting words to numbers, a process called tokenization. One of the simplest ways to tokenize text is to assign each unique word and punctuation mark a numerical value in the order it first appears in the input sequence. For example, consider the first part of the opening line of “A Tale of Two Cities”.

Tokenizing Text

You can see that four of the words, “was”, “the”, “of”, and “times”, appear twice, and they get the same token value in both instances. In this scheme, a capitalized word like “It” gets a different token than the lowercase “it”. Punctuation marks, like the comma, also get their own tokens. You can read about various tokenization schemes in Srinivas Chakravarthy’s post here.
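The first-appearance scheme described above can be sketched in a few lines of Python. This is a minimal illustration, not the tokenizer GPT-2 actually uses (GPT-2 splits text into byte-level subword pieces). The function name `tokenize` is hypothetical, and numbering the IDs from 0 is an arbitrary choice that may not match the numbering in the figure.

```python
import re

def tokenize(text):
    """Assign each unique word or punctuation mark an integer ID
    in the order it first appears (hypothetical helper).

    Matching is case-sensitive, so "It" and "it" get different IDs.
    """
    # \w+ matches a run of word characters; [^\w\s] matches one punctuation mark
    pieces = re.findall(r"\w+|[^\w\s]", text)
    vocab = {}  # maps each distinct piece to its ID
    ids = []
    for piece in pieces:
        if piece not in vocab:
            vocab[piece] = len(vocab)  # next unused ID, starting at 0
        ids.append(vocab[piece])
    return ids, vocab

line = "It was the best of times, it was the worst of times"
ids, vocab = tokenize(line)
print(ids)  # → [0, 1, 2, 3, 4, 5, 6, 7, 1, 2, 8, 4, 5]
```

The output follows the pattern described above: “was”, “the”, “of”, and “times” reuse the ID from their first appearance, the comma gets its own ID (6), and “It” and “it” receive different IDs (0 and 7).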

Transformers

Once the words are tokenized, they can be processed by machine learning systems for various tasks. For example, an ML model can be trained to translate text from one language to another. In the example below, a transformer model has been trained to translate from English to Spanish.
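To give a rough feel for what a transformer does with tokens once they have been turned into vectors, here is a minimal pure-Python sketch of scaled dot-product attention, softmax(QKᵀ / √d_k)·V, the core operation of the transformer architecture. It is not the English-to-Spanish model from the example. Real transformers learn separate query, key, and value projections, stack many attention heads and layers, and work on learned embeddings. The function names and the toy vectors here are hypothetical and chosen only for illustration.

```python
import math

def softmax(xs):
    """Turn a list of scores into weights that are positive and sum to 1."""
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.

    queries, keys and values are lists of vectors (one vector per token).
    """
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # The output is a weighted average of the value vectors
        out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        outputs.append(out)
    return outputs

# Three toy token vectors (made-up values) used as Q, K and V (self-attention)
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
result = attention(x, x, x)
for row in result:
    print([round(v, 3) for v in row])
```

Each output row is a blend of every token’s vector, weighted by how strongly that token’s key matches the query. This mixing is what lets a transformer take the surrounding words into account when it processes each token.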


Got Writer’s Block? It’s PlotJam to the Rescue!