Predicting number of Covid19 deaths using Time Series Analysis (ARIMA MODEL)

Executive Summary:

On March 11, 2020, the World Health Organization (WHO) declared the novel coronavirus (Covid19) outbreak a global pandemic. This paper models and studies a time series analysis that predicts the number of deaths in the United States over two periods: August 1st to August 21st and August 1st to November 1st. The time series model selected to make the prediction is the Autoregressive Integrated Moving Average (ARIMA) model.

The paper is divided into the following sections:

  1. Why & What is Time Series Analysis
  2. When we can’t use Time Series Analysis
  3. Components of Time Series Analysis
  4. Demonstration of Time Series Analysis
  5. Conclusion

The data is drawn from “Our World in Data” and contains the information needed to conduct the time series analysis. The variables relevant to our research question are the date (2019/12/31 to 2020/08/01), total deaths, new deaths and location (USA). The data has been cleaned and adjusted to satisfy the assumptions required to use ARIMA for the prediction.

The forecast of new deaths for the next 21 and 90 days reaches 18,589 (total deaths 171,903) and 82,653 (total deaths 235,967) respectively. These results are very close to CNN’s projections. On August 2nd, CNN projected that about 19,000 people could die in the United States between August 2nd and August 21st. On July 31st, in the show “CNN Coronavirus Town Hall”, CNN also forecasted a total of 231,000 deaths from Covid19 by November.
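The reported figures can be cross-checked with simple arithmetic. The short Python sketch below assumes that each total in parentheses is the cumulative death count at the start of the forecast plus the forecasted new deaths. The baseline it prints is not stated in the text; it is inferred here by subtraction.

```python
# Forecasted new deaths and resulting totals, as reported in the text.
forecast_new = {"21 days": 18_589, "90 days": 82_653}
reported_total = {"21 days": 171_903, "90 days": 235_967}

# Inferred (not stated in the article): cumulative deaths at the forecast start.
implied_baseline = {h: reported_total[h] - forecast_new[h] for h in forecast_new}

# Both horizons should imply the same starting total if the figures are consistent.
assert len(set(implied_baseline.values())) == 1

# Relative gap between the 90-day total and CNN's 231,000 figure, in percent.
cnn_november = 231_000
gap_pct = 100 * (reported_total["90 days"] - cnn_november) / cnn_november

print(implied_baseline)
print(round(gap_pct, 2))
```

Both horizons imply the same starting total, so the two forecasts are internally consistent, and the 90-day total differs from CNN's figure by roughly two percent.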

Result:

[Figure: forecast result. Image by author.]

1. Why & What is Time Series Analysis:

Time series analysis (TSA) is a statistical technique applied to data points listed in time order. The x axis is made up of equally spaced points in time, and the y axis contains the outcome values, which the model projects based on previously observed values. This technique is suitable for research questions such as forecasting future sales. Time series analysis applies when the outcome variable in the model depends on a single explanatory variable: time.

Suppose you run a shoe store and have data on how many shoes you sold in past years. Time series analysis would be applicable if you want to predict how many shoes your store will sell in the future. In this case, the outcome variable is the number of shoes sold and the only explanatory variable is time.
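A minimal Python sketch of the shoe store example follows. The sales figures are hypothetical, and the moving-average forecast is only a naive baseline, not ARIMA. It is used here to show that the prediction depends on nothing but the past values in time order.

```python
from datetime import date

# Hypothetical monthly shoe sales (illustrative numbers, not real data).
sales = [
    (date(2020, 1, 1), 120),
    (date(2020, 2, 1), 135),
    (date(2020, 3, 1), 150),
    (date(2020, 4, 1), 160),
    (date(2020, 5, 1), 170),
    (date(2020, 6, 1), 180),
]

def is_time_ordered(series):
    """Return True if observations are strictly increasing in time."""
    dates = [d for d, _ in series]
    return all(a < b for a, b in zip(dates, dates[1:]))

def moving_average_forecast(series, window=3):
    """Naive baseline: predict the next value as the mean of the last `window` values."""
    values = [v for _, v in series]
    return sum(values[-window:]) / window

assert is_time_ordered(sales)
next_month = moving_average_forecast(sales)
print(next_month)  # 170.0
```

No other explanatory variable (price, weather, advertising) enters the calculation; the only inputs are the past sales and their order in time.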

Other forecasting algorithms such as linear regression or logistic regression use one or more explanatory variables. The assumptions also differ between linear regression, logistic regression and the time series technique ARIMA.

In Linear Regression the following assumptions have to be met:

  • Independence of observations.
  • Homoscedasticity of errors (equal variance).
  • A linear relationship.
  • Errors are normally distributed.

In Logistic Regression the following assumptions have to be met:

  • Dependent variable has to be binary.
  • Independence of observations.
  • Linearity in the logit for continuous variables.
  • Lack of influential outliers.
  • Absence of multicollinearity.

In time series analysis with ARIMA the following assumptions have to be met:

  • Data has to be stationary, meaning its mean and variance do not change over time.
  • Data should be univariate. As mentioned above, ARIMA works on a single variable only.
  • Data should be in time series data format.
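Stationarity is commonly approached by differencing, which is the "Integrated" part of ARIMA. The Python sketch below uses made-up cumulative counts (not the article's data). It shows that first differences of a cumulative series recover the per-period increments, and that differencing a second time removes a linear trend in those increments. How many differences the actual Covid19 series needs is not stated here and would have to be checked, for example with a unit-root test.

```python
def difference(series, order=1):
    """Apply first differencing `order` times: y[t] - y[t-1]."""
    for _ in range(order):
        series = [b - a for a, b in zip(series, series[1:])]
    return series

# Hypothetical cumulative totals (illustrative only).
total = [0, 10, 30, 60, 100, 150]

new_per_day = difference(total)          # [10, 20, 30, 40, 50]
detrended = difference(total, order=2)   # [10, 10, 10, 10]

print(new_per_day)
print(detrended)
```

Each round of differencing shortens the series by one observation, which is why the second list has one element fewer than the first.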
