Whether you are a data scientist building forecasting models for hospitalizations, a financial analyst trying to predict stock prices, or even a gamer trying to predict when your console will overheat (been there, done that!), properly wrangling and manipulating your data is critical.

In this tutorial, I’ll be using the COVID-19 dataset (available to download as dailycountry.csv) from https://covidtracking.com to illustrate some basic R methods that we can apply in order to quickly put together a cleaner, more usable dataset.

  1. Converting attributes to the _Date_ type

If you’re working with time series data, chances are that the date columns in the dataset aren’t stored in a proper date format. That is, R may interpret them as strings or integers instead of dates that can be plotted in a time series (note that the default date format in R is year-month-day, e.g. 2020-07-01).

Dates can arrive in several formats, so there are several ways to solve this problem. First, you’ll want to figure out the data type of the attribute in the data frame. For example, if your date attribute is originally formatted numerically, say 20200701, then calling the str() function on the data frame will show the data type of each attribute.
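As a quick illustration of that check, here is a minimal sketch. The data frame `covid` and its values are made up for this example and only stand in for the real dailycountry.csv data; the column names are assumptions as well.

```r
# Hypothetical stand-in for the COVID-19 data; the values are invented
# for illustration and are not taken from dailycountry.csv.
covid <- data.frame(
  date     = c(20200701L, 20200702L, 20200703L),
  positive = c(100L, 120L, 150L)
)

# str() prints the structure of the data frame, including each column's type.
str(covid)

# class() returns the type of a single column.
date_class <- class(covid$date)
print(date_class)
```

Here the date column comes back as an integer rather than a Date, which is the situation the next step deals with.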

In this case, the data type is int, so we can use the transform() function to apply the conversion to every value in the column, turning the date column in the COVID-19 dataset from int into Date.
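A minimal sketch of that conversion follows. It assumes the column is named `date` and holds integers like 20200701; the small `covid` data frame is a made-up stand-in for the real data, and pairing transform() with as.Date() and the "%Y%m%d" format string is one reasonable way to do it, not necessarily the exact call used originally.

```r
# Hypothetical stand-in for the COVID-19 data; the values are invented
# for illustration and are not taken from dailycountry.csv.
covid <- data.frame(
  date     = c(20200701L, 20200702L, 20200703L),
  positive = c(100L, 120L, 150L)
)

# as.Date() parses text against a format string, so the integer is
# converted to character first. "%Y%m%d" matches values like 20200701.
covid <- transform(
  covid,
  date = as.Date(as.character(date), format = "%Y%m%d")
)

# The date column should now be reported as Date instead of int.
str(covid)
```

Going through as.character() matters here: passing the integer straight to as.Date() would treat it as a count of days from an origin rather than as a year-month-day value.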


R Essentials: Time Series Basics, Part 1