Interested in scalable text generation? Learn how to programmatically generate copy for ecommerce category pages using a transformer-based language model.

How to Generate Data-Driven Copy for Ecommerce Category Pages with GPT-2

My MozCon presentation was a short film created by the iPullRank team.

I’m not going to spoil it because I’d rather you watched it, but the movie is one part “Batman: The Animated Series” and one part “Mr. Robot” presented in a mixed-media format.

If you haven’t seen it, we’ve just released a Director’s Cut as well as all the related resources and code (all the tactics and code are real) from the film, so please have a look. We made it for you!

What I want to highlight today, though, is the scene toward the end of the film wherein the concept of scalable text generation is explored.

In this scene, our protagonist, Casey Robins, figures out how to programmatically generate copy for ecommerce category pages and how to work data into that copy from the JSON object used to populate the page.

Yeah, that was a mouthful, but it’s the coolest tactic that I’ve devised in the past five years, so bear with me!

DataToText Is Still Academic, But Here’s a Hack

As I mention in the dialog, there’s a field of study within Natural Language Generation called DataToText, in which researchers take structured data and use it to generate copy.

In academic research, engineers have highlighted use cases such as recapping sports games and generating copy for ecommerce product pages.

Here’s an example of copy generated for a sports game recap, from the paper “A Hierarchical Model for Data-to-Text Generation”:

[Figure: sample output from “A Hierarchical Model for Data-to-Text Generation”]

Here’s an example of copy generated for a product detail page, from the paper “Storytelling from Structured Data and Knowledge Graphs”:

[Figure: sample output from “Storytelling from Structured Data and Knowledge Graphs”]

Naturally, those use cases map directly onto the kind of scalable content creation that SEO needs.

So, I figured DataToText would be ready to roll and I could just hand some structured data off to an API and be all set.

I skim-read a few of these papers and tried to run some of the code.

I was, frankly, out of my technical depth and not willing to commit to reading thoroughly enough to truly figure out how to do it.

What do you want from me? I have two children and I’m responsible for two businesses during a global pandemic.

So, instead, I’ve identified a shortcut based on what I already know how to do.

Many ecommerce sites are built as single-page applications (SPAs).

This means that there is an API endpoint somewhere that the client-side code calls to populate its content whenever a page is constructed or updated.

By design, many of those API endpoints are openly accessible, often with little to no authentication.

We can use these same endpoints to gather features and derived data points to generate unique and relevant content.
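To make that concrete, here is a minimal sketch of pulling features and derived data points out of a category page's JSON. The payload shape, field names, and the `extract_features` helper are all assumptions for illustration; a real endpoint will have its own structure, and you would fetch the JSON from it rather than hard-coding a sample.

```python
import json
from collections import Counter

# Hypothetical payload, shaped like what a category-page endpoint might return.
SAMPLE_PAYLOAD = """
{
  "category": "Men's Running Shoes",
  "products": [
    {"name": "Road Runner 2", "brand": "Acme", "price": 89.99, "rating": 4.5},
    {"name": "Trail Blazer", "brand": "Acme", "price": 129.0, "rating": 4.7},
    {"name": "Sprint Lite", "brand": "Zephyr", "price": 59.5, "rating": 4.1}
  ]
}
"""


def extract_features(payload: dict) -> dict:
    """Turn the raw product list into a handful of derived data points."""
    products = payload["products"]
    prices = [p["price"] for p in products]
    brand_counts = Counter(p["brand"] for p in products)
    return {
        "category": payload["category"],
        "product_count": len(products),
        "min_price": min(prices),
        "max_price": max(prices),
        "avg_rating": round(sum(p["rating"] for p in products) / len(products), 1),
        # Brand with the most products on the page.
        "top_brand": brand_counts.most_common(1)[0][0],
        "brand_count": len(brand_counts),
    }


features = extract_features(json.loads(SAMPLE_PAYLOAD))
print(features)
```

Derived values like the price range or the most common brand are the useful part here: they are facts about the page that no competitor's copy will share.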

We can use this data to write a set of sentence templates with plenty of variation, each of which works the data points into a paragraph.
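A rough sketch of that idea, assuming a small hand-written list of templates and a dictionary of feature values. The template wording, the `features` values, and the `data_sentence` helper are hypothetical; the point is only that one set of data can come out as many differently phrased sentences.

```python
import random

# Hypothetical feature values; in practice these come from the page's JSON.
features = {
    "category": "men's running shoes",
    "product_count": 3,
    "min_price": 59.5,
    "max_price": 129.0,
    "top_brand": "Acme",
}

# Each template phrases the same data differently. More templates, more variance.
TEMPLATES = [
    "Shop {product_count} styles of {category}, priced from ${min_price:.2f} to ${max_price:.2f}.",
    "Our {category} start at ${min_price:.2f}, with {product_count} options to choose from.",
    "{top_brand} leads our selection of {product_count} {category}, with prices up to ${max_price:.2f}.",
]


def data_sentence(features: dict, rng: random.Random) -> str:
    """Pick one template at random and fill it with the feature values."""
    return rng.choice(TEMPLATES).format(**features)


rng = random.Random(42)  # seeded so a run can be reproduced
for _ in range(3):
    print(data_sentence(features, rng))
```

Passing in a seeded `random.Random` keeps the output reproducible, which helps when you need to regenerate the same copy for a page later.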

Then we can use a natural language generation model (hello GPT-2!) to complete those paragraphs.
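Here is one way the completion step could look, assuming the Hugging Face `transformers` library and its stock `gpt2` checkpoint. That choice is an assumption for this sketch, not necessarily the tooling used in the film. The `load_gpt2` and `complete_paragraph` helpers are hypothetical names, and the trimming rule (cut at the last sentence-ending punctuation mark) is a simple heuristic.

```python
def load_gpt2():
    """Build a GPT-2 text-generation pipeline (downloads weights on first use)."""
    # Imported lazily because transformers is a heavy, optional dependency.
    from transformers import pipeline

    return pipeline("text-generation", model="gpt2")


def complete_paragraph(prompt: str, generator, max_new_tokens: int = 60) -> str:
    """Ask the generator to continue the prompt, then trim any trailing fragment.

    `generator` is any callable that behaves like a transformers text-generation
    pipeline: it returns a list of dicts with a "generated_text" key, and that
    text includes the original prompt.
    """
    outputs = generator(
        prompt,
        max_new_tokens=max_new_tokens,
        do_sample=True,           # sample so repeated calls give different copy
        num_return_sequences=1,
    )
    text = outputs[0]["generated_text"]
    # Cut at the last sentence-ending punctuation so we don't end mid-sentence.
    end = max(text.rfind(mark) for mark in ".!?")
    return text[: end + 1] if end != -1 else text


# Usage (requires `pip install transformers torch`):
#   generator = load_gpt2()
#   print(complete_paragraph("Shop 3 styles of men's running shoes.", generator))
```

Because the data sentence is the prompt, it survives intact in the output and the model only writes what comes after it. Generated copy still needs a human read before it goes live.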

Varying the length of the paragraphs and where that varied sentence falls in a given paragraph will yield a wealth of completely unique and relevant content that features our key data points.
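As a sketch of the variation step, assume you already have a data sentence and a few generated sentences to surround it. The `arrange_sentences` helper, its length bounds, and the sample sentences below are all hypothetical; the sketch only shows how paragraph length and the position of the data sentence can both be randomized.

```python
import random

# Hypothetical inputs: one data-driven sentence plus generated filler sentences.
DATA = "Shop 3 styles of men's running shoes from $59 to $129."
FILLER = [
    "A good running shoe should feel light on the road.",
    "Cushioning matters most on longer distances.",
    "Breathable uppers keep your feet cool in summer.",
    "Try a half size up if you run in thick socks.",
]


def arrange_sentences(data_sentence, filler, rng, min_len=3, max_len=5):
    """Return a list of sentences with the data sentence at a random position.

    The total length (data sentence included) is drawn from min_len..max_len,
    capped by how many filler sentences are available.
    """
    upper = min(max_len, len(filler) + 1)
    lower = min(min_len, upper)
    total = rng.randint(lower, upper)
    # Keep the filler in its original order so the paragraph still reads naturally.
    chosen = filler[: total - 1]
    slot = rng.randint(0, len(chosen))  # 0 = first sentence, len(chosen) = last
    return chosen[:slot] + [data_sentence] + chosen[slot:]


rng = random.Random(7)
for _ in range(3):
    print(" ".join(arrange_sentences(DATA, FILLER, rng)))
    print()
```

With several templates, several lengths, and several positions, the number of distinct paragraphs multiplies quickly even before the language model adds its own variation.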

Ok, but before we get into how we do that, let’s talk about how we got here.
