Spotisis /spo-ti-sis/

noun

The analysis of one’s Spotify streaming history using Python.

I was reading through a lot of data science guides and project ideas when I came across an article in which the author compared his song choices with his friend’s. I wanted to do something similar, so I set out to analyse my own streaming history and compare it with what the world listens to.

Through this, I aim to find out more about my music preferences and how they differ from the world’s general picks.

I never really put much thought into my music preference before this project — it was always kind of dependent on my mood, and when someone asked me what type of music I like, I had no answer — because it varied from one hour to another.

I’ve split this project into 2 sections:

**Part A** is the analysis of my music streaming history.

  • Timeline of my streaming history
  • Day preference
  • Favorite artist
  • Favorite songs
  • Spirit of the songs
  • Diversity

**Part B** is a comparison of the top 50 songs on my list with the top 50 songs streamed worldwide in 2019.


The data

Spotify allows every user to request a download of their complete streaming history, so Part A is based entirely on that export. Spotify also has an amazing Developer Platform through which the public can access its data for their own projects. Along with my personal data, I used the audio features option, which breaks down a song and gives it a ‘score’ for a number of different attributes. The attributes are as follows:

  • Acousticness — A confidence measure from 0.0 to 1.0 of whether the track is acoustic. 1.0 represents high confidence that the track is acoustic.
  • Danceability — A description of how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable and 1.0 is most danceable.
  • Energy — Energy is a measure from 0.0 to 1.0 and represents a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy.
  • Instrumentalness — Predicts whether a track contains no vocals. “Ooh” and “aah” sounds are treated as instrumental in this context. The closer the instrumentalness value is to 1.0, the greater likelihood the track contains no vocal content.
  • Liveness — Detects the presence of an audience in the recording.
  • Loudness — The overall loudness of a track in decibels (dB). Loudness is the quality of a sound that is the primary psychological correlate of physical strength (amplitude). Values typically range between -60 and 0 dB.
  • Speechiness — Speechiness detects the presence of spoken words in a track.
  • Valence — A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track.
  • Tempo — The overall estimated tempo of a track in beats per minute (BPM). In musical terminology, tempo is the speed or pace of a given piece and derives directly from the average beat duration.
  • Mode — Mode indicates the modality (major or minor) of a track, the type of scale from which its melodic content is derived. Major is represented by 1 and minor is 0.
  • Key — The estimated overall key of the track.
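To give a feel for what these attributes look like in practice, here is a minimal sketch of turning audio-feature records into a table for analysis. The feature values below are fabricated for illustration; fetching real ones requires Spotify API credentials (for example via the `spotipy` library’s `audio_features` call, shown in the comments).

```python
# Sketch: shaping audio-feature records into a DataFrame. The fetch itself
# needs Spotify API credentials, e.g. with the spotipy library:
#   sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials())
#   features = sp.audio_features(track_ids)   # accepts a list of track IDs
# The two dicts below are fabricated stand-ins for the API response.
import pandas as pd

features = [
    {"id": "t1", "acousticness": 0.12, "danceability": 0.80, "energy": 0.65,
     "instrumentalness": 0.0, "liveness": 0.10, "loudness": -5.2,
     "speechiness": 0.05, "valence": 0.70, "tempo": 120.0, "mode": 1, "key": 7},
    {"id": "t2", "acousticness": 0.90, "danceability": 0.35, "energy": 0.20,
     "instrumentalness": 0.85, "liveness": 0.08, "loudness": -14.7,
     "speechiness": 0.03, "valence": 0.25, "tempo": 78.0, "mode": 0, "key": 2},
]
audio_df = pd.DataFrame(features).set_index("id")

# Averaging a few attributes across all tracks gives a rough 'spirit'
# profile of a listening history
profile = audio_df[["danceability", "energy", "valence", "acousticness"]].mean()
```

Averaging the 0-to-1 attributes like this is one simple way to summarise the overall mood of a set of songs.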

The dataset was a little messy, so I used Pandas to clean it up according to my need for each section. The entire code can be found on the GitHub link at the end of this article.
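As a rough illustration of that cleaning step (my full code is on GitHub; this is only a sketch with a few fabricated rows shaped like Spotify’s streaming-history export, which records fields such as `endTime`, `artistName`, `trackName`, and `msPlayed`):

```python
import pandas as pd

# Fabricated rows mimicking the Spotify export. With the real download you
# would instead load the JSON file(s), e.g.:
#   df = pd.read_json("StreamingHistory0.json")
records = [
    {"endTime": "2019-07-01 18:04", "artistName": "A", "trackName": "x", "msPlayed": 215000},
    {"endTime": "2019-07-01 18:07", "artistName": "B", "trackName": "y", "msPlayed": 4000},
    {"endTime": "2019-07-02 09:30", "artistName": "A", "trackName": "x", "msPlayed": 180000},
]
df = pd.DataFrame(records)

df["endTime"] = pd.to_datetime(df["endTime"])
df["minutesPlayed"] = df["msPlayed"] / 60000

# Drop very short plays, which are likely skips -- the 30-second
# cutoff here is my own arbitrary choice
df = df[df["msPlayed"] >= 30000]

# Listening time per day of the week, useful for the 'day preference' section
df["day"] = df["endTime"].dt.day_name()
daily = df.groupby("day")["minutesPlayed"].sum()
```

From a cleaned frame like this, the timeline, day-preference, and favourite-artist breakdowns are all simple group-bys.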

For Part B, I used this dataset from Kaggle.

Before we begin, I just want to say something… Don’t come at me for my music choice!

