Over the last few months, we’ve been collecting hundreds of COVID-19 blog posts from the R community. Today, we are excited to share this dataset publicly. We hope it helps bloggers who want to analyze COVID-19 data with R by letting them research what others in the community have already written.

So far, we have found and recorded 423 COVID posts in English. In an effort to encourage others to explore such posts, we’ve published a Shiny web app that allows users to find the names of the 231 bloggers who wrote those posts, their roles, and their country of focus. The app also lets users interactively search the collection of posts by primary topic, post title, date, and whether the post uses a particular mathematical technique or data source. To learn more about the evolution of this dataset, one of the authors (Rees) has published nine articles on Medium, which you can find here.

We encourage users to submit their own posts, or posts by others, for inclusion using this Google Form. Our dataset, as well as the code for the Shiny app, is available on GitHub. If anyone has corrections to the dataset, please write Rees (at) ReesMorrison (dot) com.

The remainder of this post highlights some of the findings from the dataset of COVID-19 posts. As will be made evident by the plots that follow, this is by no means a comprehensive review of every COVID-19 R blog post, but rather an overview of the data that we have found.

Posts Over Time

As the pandemic has progressed, fewer bloggers have engaged with COVID-related data: the number of blog posts peaked in March of 2020.

[Figure: COVID-19 R blog posts over time]

Some bloggers have been prolific; many more have been one and done. The plot below shows the names and posts of the 23 bloggers who have so far published at least four posts. For an example of how to read the plot, Tim Churches, at the bottom of the y-axis, has published a total of nine posts, but none after early April.

The color of the points corresponds to the work role of the blogger as explained in the legend at the bottom. It is immediately apparent that professors and academic researchers predominate in this group of bloggers. If you include the postgraduate students, universities writ large account for nearly all of the prolific bloggers.


COVID-19 Posts: A Public Dataset Containing 400+ COVID-19 Blog Posts