When looking at the TIOBE index for R, we can see that after a peak in ratings in January 2018, the language saw a significant decline after that point. However, it is evident that ratings have been shooting back up since March.
Source: www.tiobe.com
What is the reason for this?
Well, one of the big reasons is that COVID-19 has sparked a strong rebound in an interest for statistics.
Up till now, the field of data science had been eschewing this area in favour of more “sexy” topics such as natural language processing, computer vision, among others.
However, the ongoing pandemic has reignited the debate surrounding the role statistical analysis has to play in analysing and ultimately combatting this virus.
Given the nature of comparing statistics between different countries, there has been a particular interest in logarithmic charts, and their use when it comes to comparing data with vastly different scales, as the below tweet illustrates:
Source: twitter.com
It therefore stands to reason that if one wants to conduct effective statistical analysis quickly, one would seek an intuitive statistical environment to run the numbers. The evidence suggests that R is still dominating here.
However, in my experience of using both Python and R, here are my experiences of how the latter is still superior to the former.
If you have worked with time series analysis before, you are most likely familiar with what is called the ARIMA (Autoregressive Integrated Moving Average) model.
This is a model that can be used to make forecasts based on the structure of a time series. An ARIMA model consists of coordinates (p, d, q):
#statistics #r-programming #rstats #data-science #regression