Wearables and mobile devices are widely used to record and provide live information on many activities related to health and physical performance. In the following posts we describe an experiment we performed to assess the impact of music on running. Using statistical analysis of synchronized time series and machine learning, we develop a personalized song recommender system that could improve your running performance.

In Part 1 (you are here) we cover the experimental setup, the data collection and processing, and a first assessment of the impact of music on running.

In Part 2 we present the feature engineering, the validation scheme, the regression models, and the construction of the recommender system.


Have you ever wondered whether the music you listen to while exercising has any measurable effect on your performance? Searching for the answer gave us good motivation to keep exercising through weeks of confinement, and through the prospect of uncertain and mostly solitary training sessions to come. In these blog posts we describe a case study that targets the development of a music recommender system to enhance sports performance. We combine the music listening records from a popular streaming app with the physical data recorded during running exercises by a sports watch from a well-known brand. We focus on the data science building blocks behind the complete solution workflow rather than on coding. However, we point out some libraries and functions that are useful for reproducing the workflow in a Python environment. Well, only after you have exercised your running capabilities for several weeks ;).

The data science principles we cover here apply to solutions that leverage time series for many purposes: monitoring health status, enhancing driving safety, improving focus during sensitive activities, boosting work performance, or simply getting the most out of your preferred sports training. They also carry over to real-world applications in fields such as supply chain demand, sales forecasting, trading, and medical and drug testing.

Experimental Setup

For the experiment, we recorded running exercises over several weeks. The plots and numbers quoted here are based on a single runner (an N=1 statistical analysis). We followed an analogous setup for a second runner to further validate the process.

Running with smart devices takes care of one of the most challenging parts of any data science project: data recording. This study includes 52 exercise records. The data from the sports watch contains GPS, accelerometer, and heart rate data (from optical wrist sensors). In parallel, we collect the data of the music streamed during the exercise (if any) and synchronize it with the watch data. We also collect weather data from the station closest to the training location. The result is a combination of time series sampled once per second. The exception is the weather data, whose variation within a single running exercise we neglect.
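The synchronization step can be sketched with pandas. This is a minimal sketch under stated assumptions. The column names (`time`, `heart_rate`, `speed_mps`, `start`, `track`, `duration_s`), the tiny inline data, and the helper `synchronize` are all hypothetical. They are not the actual export formats of the watch or the streaming app. `pandas.merge_asof` attaches the most recent track start to each 1 s watch sample. A sample is marked as silent once that track has ended. A single weather reading is then copied to every row, which mirrors the simplification described above.

```python
import pandas as pd


def synchronize(watch, songs, weather):
    """Attach the track playing at each 1 s watch sample plus a constant weather reading.

    Hypothetical helper: column names are assumptions, not real export formats.
    """
    songs = songs.sort_values("start").copy()
    # Each track is assumed to play from its start time until start + duration.
    songs["end"] = songs["start"] + pd.to_timedelta(songs["duration_s"], unit="s")
    merged = pd.merge_asof(
        watch.sort_values("time"),
        songs[["start", "end", "track"]],
        left_on="time",
        right_on="start",
        direction="backward",  # latest track that started at or before each sample
    )
    # Samples taken after the latest track ended were recorded without music.
    silent = merged["end"].isna() | (merged["time"] >= merged["end"])
    merged.loc[silent, "track"] = None
    # Weather is treated as constant within one run, so broadcast a single reading.
    for name, value in weather.items():
        merged[name] = value
    return merged.drop(columns=["start", "end"])


# Hypothetical inputs: six 1 Hz watch samples and a log of two track starts.
t0 = pd.Timestamp("2020-05-01 07:00:00")
watch = pd.DataFrame({
    "time": t0 + pd.to_timedelta(list(range(6)), unit="s"),
    "heart_rate": [120, 122, 125, 127, 130, 131],
    "speed_mps": [2.8, 2.9, 3.0, 3.0, 3.1, 3.2],
})
songs = pd.DataFrame({
    "start": t0 + pd.to_timedelta([-2, 3], unit="s"),
    "track": ["Song A", "Song B"],
    "duration_s": [3, 180],
})

synced = synchronize(watch, songs, {"temperature_c": 14.0, "humidity_pct": 70})
print(synced)
```

In this toy run, "Song A" ends one second into the exercise. The next two samples are therefore silent, and "Song B" covers the rest. An as-of join like this avoids building a second-by-second playlist table by hand.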

After processing the data, we first run a simplified hypothesis test to assess the impact of music on running performance. Its results indicate that a multivariate analysis is needed. Consequently, we train a set of regression algorithms to predict running performance in the next time interval of an exercise, based on all the data recorded up to that point in time. We then check whether adding features related to the music played during the exercise improves the predictive power of the algorithms. To measure this, we use the adjusted coefficient of determination. This coefficient quantifies how much of the variance in the response variable (running performance) the predictor features explain. It also penalizes each additional predictor, so adding music features only pays off if they carry real information. By checking whether the music features improve the coefficient, we can assess the actual impact of music on running. Finally, we reuse the regressors to build a solution that, at each point in the exercise, selects the song expected to maximize running performance and recommends it for streaming.
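The model comparison and the recommendation step can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions. Ordinary least squares stands in for the regressors, whose actual form is presented in Part 2. The data is synthetic, and song tempo is given an effect by construction, so the code demonstrates the mechanics rather than any result of this study. The features (`prev_speed`, `prev_hr`, `tempo`), the song catalog, and the helpers `fit_ols`, `adjusted_r2`, and `recommend` are hypothetical. For n samples and p predictors, adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - p - 1).

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares fit with an intercept; a stand-in for any regressor."""
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict(coef, X):
    """Predict with an intercept; accepts a single feature vector or a matrix."""
    X = np.atleast_2d(X)
    return np.column_stack([np.ones(len(X)), X]) @ coef


def adjusted_r2(y, y_hat, n_features):
    """R^2 penalized for the number of predictors p: 1 - (1 - R^2)(n - 1)/(n - p - 1)."""
    n = len(y)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_features - 1)


# Synthetic per-interval data (hypothetical): tempo is made to matter by
# construction, so this only illustrates how the comparison works.
rng = np.random.default_rng(0)
n = 500
prev_speed = rng.normal(3.0, 0.3, n)    # m/s in the previous interval
prev_hr = rng.normal(150.0, 10.0, n)    # bpm in the previous interval
tempo = rng.uniform(100.0, 180.0, n)    # tempo of the song being played
next_speed = 0.8 * prev_speed + 0.004 * tempo + rng.normal(0.0, 0.05, n)

X_base = np.column_stack([prev_speed, prev_hr])
X_music = np.column_stack([prev_speed, prev_hr, tempo])

coef_base = fit_ols(X_base, next_speed)
coef_music = fit_ols(X_music, next_speed)
adj_base = adjusted_r2(next_speed, predict(coef_base, X_base), X_base.shape[1])
adj_music = adjusted_r2(next_speed, predict(coef_music, X_music), X_music.shape[1])
print(f"adjusted R^2 without music: {adj_base:.3f}, with music: {adj_music:.3f}")


def recommend(coef, state, catalog):
    """Return the track whose features maximize the predicted next-interval speed."""
    scores = {
        track: predict(coef, np.append(state, features))[0]
        for track, features in catalog.items()
    }
    return max(scores, key=scores.get)


catalog = {"Song A": [120.0], "Song B": [170.0], "Song C": [150.0]}  # hypothetical tempos
best = recommend(coef_music, np.array([3.0, 150.0]), catalog)
print("recommended:", best)
```

The recommender reuses the music-aware model. It holds the runner's current state fixed and scores every candidate song, then picks the highest prediction. In practice, the comparison is more trustworthy on held-out, time-ordered intervals than in-sample. The validation scheme is the subject of Part 2.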


There are many caveats to consider in the process, and we cover them throughout the post. The most limiting of them all is the small number of participants. Only two runners took part, so the results here speak only of the impact of music on their running performance. Furthermore, a generalizable experimental design would have repeated the same running exercise, keeping all conditions fixed except the music being streamed. That means exercising at the same time of day, under the same weather conditions, with the same prior fatigue, on the same track with fixed distance and elevation, and performing the same type of test. Keeping such a monotonous setup for a single person over literally thousands of running minutes would have defeated the very motivation we had for experimenting on ourselves. Instead, we decided to train following a varied running plan. This choice affects the assessment of the impact of music on performance, but generalizing the experiment was outside the scope of this initial study. We still hope the pipeline, workflow, and ideas discussed here will inspire a future generalizable approach.


Data Collection - Hypothesis Testing