I have read a few stories on Medium that give writing advice, and some of them suggested, along with other tips, that putting numbers in your story’s title will increase its number of views: people tend to be more attracted by such headlines, so more of them will click on your story.
It is interesting that people would be attracted by such headlines. But I don’t like to take things for granted. I want to convince myself that this claim is actually true.
So, here is what I have been thinking: let’s use statistics to check whether this is actually true. But statistics is useless without data, so I first need to obtain some data about Medium articles and use it to do hypothesis testing. To that end, I used Python and Beautiful Soup to scrape data about a random set of 6K+ Medium articles from 7 different publications. This dataset can be found on Kaggle. If you want to see how I scraped this data, I have an article about that here:
What we are going to do now is split this dataset into 2 groups (or samples): one with numbers in the headlines and one without. Then, we will do a hypothesis test on the expected value of the number of claps in these 2 groups. We use the number of claps as a measure of “how successful” a story is, although a more natural variable for our scenario would be the number of views, since that is the one most directly affected by our choice of title. People typically click on a story because of the preview they see (including headline and image), and only after reading the story do they decide whether to clap or not. But, because the number of views is not publicly shown on Medium, we use the number of claps instead, as it should be highly correlated with views (the more views a story gets, the more likely it is that someone will clap).
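To make the split concrete, here is a minimal sketch in Python. It assumes each article is a record with a headline and a clap count; the field names `title` and `claps`, the helper names, and the three sample rows are all made up for illustration and may not match the actual Kaggle dataset. It also assumes that “has a number” simply means “contains at least one digit”, so spelled-out numbers such as “Seven” are not counted.

```python
import re

# Hypothetical field names; the real dataset's columns may be named differently.
TITLE_KEY = "title"
CLAPS_KEY = "claps"


def has_number(title):
    """Return True if the headline contains at least one digit."""
    return re.search(r"\d", title) is not None


def split_by_number(articles):
    """Split clap counts into (with_numbers, without_numbers) by headline."""
    with_numbers, without_numbers = [], []
    for article in articles:
        if has_number(article[TITLE_KEY]):
            with_numbers.append(article[CLAPS_KEY])
        else:
            without_numbers.append(article[CLAPS_KEY])
    return with_numbers, without_numbers


# Made-up rows, only to show the shape of the input.
articles = [
    {"title": "7 Tips for Better Writing", "claps": 120},
    {"title": "Why I Write Every Day", "claps": 45},
    {"title": "Top 10 Python Tricks", "claps": 300},
]

x, y = split_by_number(articles)
print(len(x), "headlines with numbers,", len(y), "without")
```

The two returned lists play the role of the two samples described next.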
If you are not familiar with hypothesis testing, here is an article you can read:
That being said, we will consider the following model:
Sample 1: Articles with numbers in headlines
We will model the number of claps inside this group as n i.i.d. (independent and identically distributed) random variables: X₁, X₂, …, Xₙ with expected value µ₁ and variance σ₁², both of which are finite.
Sample 2: Articles without numbers in headlines
We will model the number of claps inside this group as m i.i.d. random variables: Y₁, Y₂, …, Yₘ with expected value µ₂ and variance σ₂², both of which are finite.
We formulate the null hypothesis as “articles with numbers in headlines bring no improvement over articles that have no numbers in headlines”, and the alternative hypothesis as “articles with numbers in headlines are more successful compared to articles without numbers in headlines”.
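In terms of the model above, one common way to write these hypotheses is H₀: µ₁ = µ₂ against H₁: µ₁ > µ₂. As a rough sketch of how such a test could be carried out (this is an assumption about the procedure, not necessarily the exact test used in the rest of the analysis), the snippet below computes a one-sided two-sample z statistic, which relies on the central limit theorem and therefore only makes sense for reasonably large samples. The function name and the clap counts are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def one_sided_z_test(x, y):
    """Test H0: mu1 = mu2 against H1: mu1 > mu2.

    x: clap counts for articles with numbers in the headline
    y: clap counts for articles without numbers in the headline
    Returns the z statistic and the one-sided p-value.
    """
    n, m = len(x), len(y)
    # Standard error of the difference of the sample means,
    # using the unbiased sample variances of each group.
    se = sqrt(variance(x) / n + variance(y) / m)
    z = (mean(x) - mean(y)) / se
    # Under H0, z is approximately standard normal for large n and m,
    # so the p-value is the upper tail probability beyond z.
    p_value = 1.0 - NormalDist().cdf(z)
    return z, p_value


# Made-up clap counts, far too few for the approximation to be trusted;
# they only show how the function is called.
x = [120, 300, 80, 410, 150]
y = [45, 200, 90, 60, 130]

z, p = one_sided_z_test(x, y)
print(f"z = {z:.3f}, one-sided p-value = {p:.3f}")
```

We would reject the null hypothesis when the p-value falls below the chosen significance level (0.05 is a common choice).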