Proper validation of a Time-Series model



In this article, I’ll go through several set-ups for evaluating machine learning model performance on time-series data. There are several ways to do it correctly, but there are also many ways to do it wrong, and I’ll give examples of both. For all the examples here I’ll use the time-series and model described in my previous article, Approaching time-series with a tree-based model, but the approaches described here are general and work for other models as well.

[Header image: photo by patricia serna on Unsplash]

So we have a model for time-series predictions and now want to see how well it performs. The common approach is to use a validation set. This set is not used for training, only for assessing the quality of predictions, which gives a trustworthy estimate of how the model will perform on unseen future data.

For simple tabular data, a typical approach is to choose the validation holdout set randomly or to use cross-validation with several folds. For time-series, however, this might not be what we want. If we select random data points for validation, some of the training data may be more recent than the validation data. We would then be training on the future and predicting the past, which is not what we expect our model to do in practice.
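To illustrate the problem, here is a minimal sketch in plain Python, assuming the series is stored in time order so that a larger index means a later observation. The helpers `random_split` and `trains_on_future` are hypothetical names used only for this illustration. A random split almost always leaves some training points that are later than the earliest validation point.

```python
import random


def random_split(n_points, val_fraction=0.2, seed=0):
    """Randomly assign time-ordered indices 0..n_points-1 to train/validation."""
    rng = random.Random(seed)
    indices = list(range(n_points))
    rng.shuffle(indices)
    n_val = int(n_points * val_fraction)
    val = sorted(indices[:n_val])
    train = sorted(indices[n_val:])
    return train, val


def trains_on_future(train, val):
    """True if any training point is more recent than the earliest validation point."""
    return max(train) > min(val)


train, val = random_split(100)
# With a random split, the model would see data from after the validation period.
print(trains_on_future(train, val))
```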

Therefore, a more typical approach for time-series is to split by time, taking the most recent data as the validation set, as seen in the visualization below.
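A time-based holdout can be sketched as a single cutoff date. This is a minimal illustration assuming a daily series with hypothetical values (not the data from the previous article); `time_based_split` is a hypothetical helper, and the most recent month is held out for validation.

```python
from datetime import date, timedelta


def time_based_split(dates, values, val_start):
    """Split a series so that every validation point is on or after val_start."""
    train = [(d, v) for d, v in zip(dates, values) if d < val_start]
    val = [(d, v) for d, v in zip(dates, values) if d >= val_start]
    return train, val


# Hypothetical daily series covering January through March 2020.
start = date(2020, 1, 1)
dates = [start + timedelta(days=i) for i in range(91)]
values = [float(i) for i in range(91)]

# Hold out the most recent month (March 2020) for validation.
train, val = time_based_split(dates, values, val_start=date(2020, 3, 1))
```

Because the split uses one cutoff date, every training point is older than every validation point, so the model never trains on the future.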

[Figure: train/validation split by time, with the most recent data held out for validation]

So far everything seems simple and straightforward. However, let’s look more closely at the validation part.

The gap in validation data

In the given example, the validation set covers one month. For the first day of the validation period everything is clear: we have a full, continuous history of previous data points and can calculate all the features the same way as for the training data. But now let’s look at the second day of the validation period. If we forecast from the last day of the training data, predicting the second validation day means predicting two days ahead, so the true value for the first validation day is not yet known when we compute the features. See the visualization below for a better understanding of the problem.
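To make the gap concrete, here is a minimal sketch assuming simple lag features, where a lag of k steps is the value observed k steps earlier. The function `lag_features` and the chosen lags are hypothetical and not taken from the model in the previous article. A lag value is filled in only if it was observed before the forecast origin (the first validation step); otherwise it is left as `None`.

```python
def lag_features(values, lags, origin):
    """Build lag features using only values observed before `origin`.

    values: observations ordered by time (list index = time step).
    lags: lag lengths in steps, e.g. [1, 7].
    origin: index of the first validation step; values at index >= origin
        are treated as unknown at prediction time.
    Returns one dict per time step; a lag is None when its value is unavailable.
    """
    rows = []
    for t in range(len(values)):
        row = {}
        for lag in lags:
            src = t - lag
            # Usable only if the lagged step exists and lies before the origin.
            row[f"lag_{lag}"] = values[src] if 0 <= src < origin else None
        rows.append(row)
    return rows


# Hypothetical series of 10 steps; steps 7, 8 and 9 form the validation period.
values = [float(i) for i in range(10)]
rows = lag_features(values, lags=[1, 3], origin=7)
print(rows[7], rows[8], rows[9])
```

A lag of k steps stays available only for the first k validation steps. After that, the feature would require truth values from inside the validation period, which is exactly the gap described above.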
