Time series data is data observed at fixed time intervals, for example daily, monthly, or annually. Time series analysis has many applications, especially in finance and weather forecasting. In this article, I will show you how to analyze and forecast time series data using R. As an example, I will use Bank Indonesia's consumer price index (CPI) inflation data from December 2002 to April 2020.


Plan of Attack

Before we begin the analysis, here are the steps we will follow:

  • First, we gather and pre-process the data, and make sure we understand the domain the data comes from.
  • Then, we analyze the time series, visually and statistically.
  • Then, we **identify a suitable model** based on its autocorrelation.
  • Then, we **diagnose the model** to check whether its residuals meet the independence assumption.
  • Finally, we use the model to forecast.
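These steps follow the classic Box-Jenkins workflow. As a minimal sketch, assuming only base R's stats package, here is what steps 2 to 5 can look like on a simulated AR(1) series. The series (arima.sim with an AR coefficient of 0.7), the lag of 10 for the Ljung-Box test, and the 12-step horizon are illustrative choices, not values taken from the CPI data.

```r
# Simulated AR(1) series with coefficient 0.7 (NOT the Bank Indonesia data)
set.seed(42)
sim <- arima.sim(model = list(ar = 0.7), n = 200)

# Step 2: look at the series
plot(sim, main = "Simulated AR(1) series")

# Step 3: identify a candidate model from the ACF and PACF
# (an AR(1) shows a decaying ACF and a PACF that cuts off after lag 1)
acf(sim)
pacf(sim)
fit <- arima(sim, order = c(1, 0, 0))

# Step 4: diagnose the model; a large Ljung-Box p-value means we cannot
# reject independence of the residuals (fitdf = 1 for one AR parameter)
lb <- Box.test(residuals(fit), lag = 10, type = "Ljung-Box", fitdf = 1)
lb

# Step 5: forecast the next 12 observations with standard errors
fc <- predict(fit, n.ahead = 12)
fc$pred
```

On the real series, the order passed to arima would come from reading the ACF and PACF of cpi rather than being fixed in advance.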

Pre-Process Data

As mentioned before, we will analyze Indonesia's CPI inflation data from December 2002 to April 2020. The data is available from Bank Indonesia, but we first have to copy it from the website into a spreadsheet and then save it as a .csv file. After importing it, the data looks like this:

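library(tidyverse)  # loads dplyr (for %>% and arrange) and ggplot2, among others

# Read the exported CSV and give the columns short names
data <- read.csv("data.csv", stringsAsFactors = FALSE)
colnames(data) <- c("Time", "InflationRate")
head(data)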

Before the analysis, we need to clean the data. In particular, we remove the '%' symbol from the "InflationRate" column, convert the column to a numeric type, and reverse the rows so that they run from the oldest month to the newest:

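# Pre-process the data
# Strip the trailing "%" (and any whitespace around it), then convert to numeric
data$InflationRate <- gsub("\\s*%\\s*$", "", data$InflationRate)
data$InflationRate <- as.numeric(data$InflationRate)
# Reverse the row order: the website lists the most recent month first
data <- data[rev(seq_len(nrow(data))), ]
tail(data)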
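To see what these cleaning steps do without the CSV file at hand, here is a self-contained sketch on a few made-up rows. The values and the "Month N" labels are hypothetical; the inconsistent spacing around "%" is an assumption about how copied values might vary, not something the source data is known to contain.

```r
# Made-up rows shaped like the imported CSV: newest month first,
# rates stored as text with a "%" suffix and inconsistent spacing
raw <- data.frame(
  Time = c("Month 3", "Month 2", "Month 1"),
  InflationRate = c("3.10 %", "2.50%", " 1.75 % "),
  stringsAsFactors = FALSE
)

# Remove the "%" and any whitespace around it, trim, convert to numeric
raw$InflationRate <- as.numeric(trimws(gsub("\\s*%\\s*$", "", raw$InflationRate)))

# Reverse the rows so the oldest month comes first, then renumber them
raw <- raw[rev(seq_len(nrow(raw))), ]
rownames(raw) <- NULL
raw
```

rev(seq_len(nrow(x))) does the same job as order(nrow(x):1) but states the intent (reverse the rows) more directly, and it also behaves correctly on an empty data frame.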

After pre-processing, the data looks like this:

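With that, we can create a time series object from the "InflationRate" column using the ts function. Since the data is monthly, we set frequency = 12, and start = c(2002, 12) marks December 2002 as the first observation: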

cpi <- ts(data$InflationRate, frequency = 12, start = c(2002, 12))
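A quick way to catch a missing row or a wrong start date is to check that the series covers exactly the expected months. December 2002 to April 2020 is 1 + 17 × 12 + 4 = 209 monthly observations. In this minimal sketch, seq_len(209) is a placeholder standing in for data$InflationRate so that it runs without the CSV.

```r
# Expected number of months: Dec 2002 (1) + 2003-2019 (17 * 12) + Jan-Apr 2020 (4)
n_months <- 1 + 17 * 12 + 4

# Placeholder values in place of data$InflationRate
x <- ts(seq_len(n_months), frequency = 12, start = c(2002, 12))

start(x)      # should be c(2002, 12)
end(x)        # should be c(2020, 4)
frequency(x)  # should be 12
```

On the real data, comparing length(cpi) with 209 and end(cpi) with c(2020, 4) gives the same check.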

With the cpi object, we can now conduct the time series analysis.

Analysis

First, let's introduce the consumer price index (CPI). The CPI is an index that measures the change in prices of consumer goods at a given time relative to a base year. The formula looks like this:

The Formula
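The original formula figure is not reproduced here. As a hedged reconstruction, the standard fixed-basket (Laspeyres) form is shown below, where p_{i,t} is the price of item i in month t, q_{i,0} is its quantity in the base-year basket, and 0 denotes the base period. The year-on-year inflation rate, the kind of figure stored in the "InflationRate" column, is the 12-month percentage change of the index. The official methodology may differ in detail, and the numbers in the worked example are hypothetical.

```latex
\mathrm{CPI}_t = \frac{\sum_i p_{i,t}\, q_{i,0}}{\sum_i p_{i,0}\, q_{i,0}} \times 100

\text{Inflation}_t = \frac{\mathrm{CPI}_t - \mathrm{CPI}_{t-12}}{\mathrm{CPI}_{t-12}} \times 100\%

% Worked example with hypothetical index values CPI_t = 110.0, CPI_{t-12} = 105.0
\text{Inflation}_t = \frac{110.0 - 105.0}{105.0} \times 100\% \approx 4.76\%
```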

The CPI data is recorded every month. Here is the code, followed by the plot it produces:

library(ggplot2)

# Build a data frame with one row per month, pairing each date with its value
tidy_data <- data.frame(
  date = seq(as.Date("2002-12-01"), as.Date("2020-04-01"), by = "month"),
  cpi = as.numeric(cpi)
)
head(tidy_data)

# Plot the series (ggplot2 >= 3.4.0 uses linewidth instead of size for lines)
p <- ggplot(tidy_data, aes(x = date, y = cpi)) +
  geom_line(color = "red", linewidth = 1.1) +
  theme_minimal() +
  xlab("") +
  ylab("Inflation Rate (%)") +
  ggtitle("Indonesia's Consumer Price Index", subtitle = "From December 2002 Until April 2020")
p

# Summary statistics (minimum, quartiles, mean, maximum)
summary(tidy_data$cpi)

# Sort by value to find the months with the highest and lowest rates
tidy_data %>%
  arrange(desc(cpi)) %>%
  head()
tidy_data %>%
  arrange(cpi) %>%
  head()

Based on the graph, we cannot see a clear trend or seasonal pattern. Although the series may look seasonal, the yearly peaks do not fall in the same month, so it is not seasonal. The graph also shows no sustained increasing or decreasing trend. Therefore, the series appears stationary: its statistical properties, such as the mean and variance, do not seem to change over time.
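The seasonality argument rests on the yearly peaks falling in different months, which we can also check numerically. The sketch below uses a hypothetical helper, yearly_profile (not part of any package), that reports for each calendar year the month of the peak plus the yearly mean and standard deviation. Watching whether the mean and spread drift from year to year is only a rough proxy for the stationarity claim, not a formal test such as ADF or KPSS.

```r
# Per-calendar-year summary of a monthly ts: peak month, mean, and sd.
# yearly_profile is a hypothetical helper written for this example.
yearly_profile <- function(x) {
  stopifnot(is.ts(x), frequency(x) == 12)
  # Integer month index (year * 12 + month - 1); avoids the floating-point
  # trouble floor(time(x)) can have exactly at year boundaries
  idx <- round(as.numeric(time(x)) * 12)
  year <- idx %/% 12
  month <- idx %% 12 + 1
  v <- as.numeric(x)
  data.frame(
    year = sort(unique(year)),
    # Month (1 = Jan, ..., 12 = Dec) holding the largest value of each year
    peak_month = as.vector(tapply(seq_along(v), year, function(i) month[i][which.max(v[i])])),
    mean = as.vector(tapply(v, year, mean)),
    sd = as.vector(tapply(v, year, sd))  # NA for years with a single month
  )
}

# Usage on the article's series: yearly_profile(cpi)
```

If the peak_month column jumps around rather than repeating the same month, that supports the "no seasonal pattern" reading of the plot.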

Besides the graph, we can look at summary statistics. The highest inflation rate, 18.38%, occurred in November 2005, and the lowest, 2.41%, occurred in November 2009. These two extremes alone cannot rule out seasonality (they even fall in the same month), so the conclusion that the data has no seasonal pattern rests mainly on the yearly peaks landing in different months, as seen in the plot.
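Instead of sorting the whole data frame, the month of each extreme can be read directly off the ts object with which.max and which.min. This is a minimal base R sketch; extreme_months is a hypothetical helper name. Given the values reported above, running it on cpi should point to November 2005 and November 2009.

```r
# Month ("YYYY-MM") and value of the maximum and minimum of a monthly ts.
# extreme_months is a hypothetical helper written for this example.
extreme_months <- function(x) {
  stopifnot(is.ts(x), frequency(x) == 12)
  # Integer month index avoids floating-point issues in time(x)
  idx <- round(as.numeric(time(x)) * 12)
  label <- sprintf("%d-%02d", as.integer(idx %/% 12), as.integer(idx %% 12 + 1))
  v <- as.numeric(x)
  pos <- c(which.max(v), which.min(v))
  data.frame(
    stat = c("max", "min"),
    month = label[pos],
    value = v[pos],
    stringsAsFactors = FALSE
  )
}

# Usage on the article's series: extreme_months(cpi)
```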


Introduction to Time Series Analysis with R