Which Models to Use for Epidemic Prediction?

COVID-19 has made apparent the need for accurate predictions of both the short-term and long-term evolution of epidemics. We compare traditional model-based methods, such as the Susceptible-Infected-Recovered (SIR) model, with emerging data-driven models, including recurrent neural networks (RNNs), for time-series prediction. We first compare these methods on influenza (flu) data, which is more robust than COVID-19 data, and then examine applications to COVID-19 data.

Our findings show that (i) commonly used model-based methods (i.e., SIR) and data-driven RNN methods (i.e., vanilla LSTM) do not provide accurate long-term predictions on flu data and require constant updating to stay accurate, and (ii) the Seq2Seq RNN model is the most promising data-driven approach for both short-term and long-term predictions. Since epidemics follow similar patterns, we propose that a Seq2Seq model trained on flu data could be used as a model for COVID-19. Such a model would require only 'few-shot' retraining (on several samples) to provide predictions.


Right image source: frankundfrei, via Pixabay (CC0)

Code files are available from: https://github.com/shlizee/Seq2SeqEpidemics
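For readers unfamiliar with the model-based side of the comparison, here is a minimal sketch of the standard SIR equations integrated with forward Euler steps. It is not taken from the repository above: the function name `simulate_sir`, the parameter values, and the step size are all illustrative assumptions.

```python
def simulate_sir(beta, gamma, s0, i0, r0, days, dt=0.1):
    """Integrate the classic SIR equations with forward Euler steps.

    dS/dt = -beta * S * I / N
    dI/dt =  beta * S * I / N - gamma * I
    dR/dt =  gamma * I

    Returns a list of (S, I, R) tuples, one per day, starting at day 0.
    """
    n = s0 + i0 + r0
    s, i, r = float(s0), float(i0), float(r0)
    steps_per_day = round(1 / dt)
    trajectory = [(s, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / n * dt
            new_recoveries = gamma * i * dt
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        trajectory.append((s, i, r))
    return trajectory


# Hypothetical parameters chosen only to produce a visible outbreak.
trajectory = simulate_sir(beta=0.3, gamma=0.1, s0=9990, i0=10, r0=0, days=160)
peak_day = max(range(len(trajectory)), key=lambda d: trajectory[d][1])
print("infections peak on day", peak_day)
```

With fixed `beta` and `gamma`, the whole curve is determined by the initial conditions. This is one way to see why such a model needs to be re-fitted as new data arrives.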

The Flu Data

The CDC has surveillance systems in place for tracking the seasonal spread of influenza. One such network is the US Outpatient Influenza-like Illness Surveillance Network (ILINet). Each week, outpatient healthcare providers in ILINet report the number of patients with influenza-like illness (ILI) by age group. ILINet provides data at the national, state, and regional levels, as well as the percentage of visits due to ILI, both weighted by population and unweighted. We will be looking at the national-level data for weekly case counts.
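To make the difference between the weighted and unweighted percentages concrete, here is a small sketch. It reflects one plain reading of "weighted by population" (a population-weighted average of per-region percentages) and does not reproduce the CDC's exact procedure. The function names and the toy numbers are assumptions for illustration only.

```python
def unweighted_ili_percent(ili_visits, total_visits):
    """Percent of all visits due to ILI, pooling every region together."""
    return 100.0 * sum(ili_visits) / sum(total_visits)


def weighted_ili_percent(ili_visits, total_visits, populations):
    """Population-weighted average of each region's ILI percentage."""
    total_pop = sum(populations)
    return sum(
        pop / total_pop * 100.0 * ili / visits
        for ili, visits, pop in zip(ili_visits, total_visits, populations)
    )


# Made-up numbers for two regions (not real ILINet data).
ili = [10, 30]                   # ILI visits reported per region
visits = [1000, 1000]            # total outpatient visits per region
pops = [3_000_000, 1_000_000]    # region populations

print(unweighted_ili_percent(ili, visits))
print(weighted_ili_percent(ili, visits, pops))
```

The two numbers differ whenever regions with different ILI rates also differ in population, which is why ILINet reports both.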

Plotting the data, we can see that influenza epidemics follow a similar pattern from year to year. Furthermore, we are dealing with a time series that exhibits a yearly seasonal pattern. To make the data easier to work with, we change the format to be seasonal (i.e., we look at each yearly period centered at week 14, the first week of April).
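The seasonal reformatting can be sketched as follows. This is a hypothetical helper, not the code in `fludata.py`. It assumes 52-week years (any week 53 is dropped) and a window of 26 weeks on either side of week 14, so each season runs from week 40 of one year through week 39 of the next.

```python
from collections import defaultdict

WEEKS_PER_YEAR = 52   # simplifying assumption: week 53 is ignored
CENTER_WEEK = 14      # first week of April, as described in the text
START_WEEK = CENTER_WEEK + WEEKS_PER_YEAR // 2   # week 40


def to_seasons(records):
    """Group (year, week, cases) records into seasons centered on week 14.

    Returns {start_year: [cases at season positions 0..51]}, with None
    wherever a week is missing from the input.
    """
    seasons = defaultdict(lambda: [None] * WEEKS_PER_YEAR)
    for year, week, cases in records:
        if not 1 <= week <= WEEKS_PER_YEAR:
            continue  # drop week 53 (assumption stated above)
        if week >= START_WEEK:
            start_year, pos = year, week - START_WEEK
        else:
            start_year, pos = year - 1, week + WEEKS_PER_YEAR - START_WEEK
        seasons[start_year][pos] = cases
    return dict(seasons)


# Made-up case counts, only to show where each week lands.
records = [(2018, 40, 10), (2019, 14, 99), (2019, 39, 5), (2019, 40, 7)]
seasons = to_seasons(records)
print(sorted(seasons))        # start years of the seasons found
print(seasons[2018][26])      # week 14 of 2019 sits at the center position
```

Aligning every season on the same 52 positions puts the yearly curves on a common axis, which makes them directly comparable.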


Flu data from ILINet. Right: number of cases in the 2018–19 flu season, used for evaluating the models.

> python3 fludata.py


towardsdatascience.com
