Spotify User Analytics

Containerized end-to-end analytics of Spotify data using Python, dbt, Postgres, and Metabase.

This project analyzes our Spotify listening history, top tracks and artists, and genres. It uses the following tools:

  • Python - Fetches data from Spotify API endpoints and saves it to CSV files
  • Postgres - The database where the data is stored and queried
  • dbt (data build tool) - Data modeling tool that transforms staging data into fact tables, dimension tables, and views
  • Metabase - Dashboarding tool for analyzing the data
  • Docker - Containerizes the applications, i.e. Postgres, dbt, and Metabase
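
To make the staging-to-fact/dimension step more concrete, here is a minimal sketch of that kind of transformation. It is not the project's dbt code: dbt would express this as SQL models run against Postgres, while this sketch uses Python's built-in sqlite3 module so it can run on its own. All table and column names (stg_listening_history, dim_artist, fct_plays, plays_per_artist) are hypothetical.

```python
import sqlite3

# Hypothetical staging rows: one row per play, as they might arrive from CSV.
rows = [
    ("2021-01-01T10:00:00", "t1", "Song A", "a1", "Artist X"),
    ("2021-01-01T10:05:00", "t2", "Song B", "a2", "Artist Y"),
    ("2021-01-01T10:10:00", "t1", "Song A", "a1", "Artist X"),
]

conn = sqlite3.connect(":memory:")  # in-memory stand-in for Postgres
conn.execute(
    "CREATE TABLE stg_listening_history "
    "(played_at TEXT, track_id TEXT, track_name TEXT, "
    "artist_id TEXT, artist_name TEXT)"
)
conn.executemany("INSERT INTO stg_listening_history VALUES (?, ?, ?, ?, ?)", rows)

# Split the flat staging table into dimensions (descriptive attributes),
# a fact table (one row per play event), and a reporting view.
conn.executescript(
    """
    CREATE TABLE dim_artist AS
        SELECT DISTINCT artist_id, artist_name FROM stg_listening_history;

    CREATE TABLE dim_track AS
        SELECT DISTINCT track_id, track_name FROM stg_listening_history;

    CREATE TABLE fct_plays AS
        SELECT played_at, track_id, artist_id FROM stg_listening_history;

    CREATE VIEW plays_per_artist AS
        SELECT a.artist_name, COUNT(*) AS plays
        FROM fct_plays f
        JOIN dim_artist a ON a.artist_id = f.artist_id
        GROUP BY a.artist_name
        ORDER BY plays DESC, a.artist_name;
    """
)

plays_per_artist = conn.execute("SELECT * FROM plays_per_artist").fetchall()
print(plays_per_artist)  # [('Artist X', 2), ('Artist Y', 1)]
```

A dashboarding tool such as Metabase would then query tables and views like these rather than the raw staging data.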

Project Files

  • app
      • main.py - Main ETL script that fetches data from the Spotify API endpoints and saves it to CSV
      • util.py - Utility helper file that contains the custom class SpotifyUtil
      • config_template.py - File where we store our credentials
  • dbt
      • models - Contains the SQL scripts and schema.yml files used when we run our transformations
      • dbt_entrypoint.sh - Script that serves as the entrypoint when running the dbt container
      • Dockerfile - Contains the commands to create the custom Docker image
      • dbt_project.yml - YAML file to configure dbt
      • packages.yml - YAML file for test dependencies
      • profiles.yml - YAML file to configure the connection from dbt to Postgres
  • metabase
      • metabase.db - Metadata database of Metabase for the dashboard
  • docker-compose.yml - YAML file that orchestrates the Docker containers
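
As a rough illustration of the fetch-and-save flow described for main.py, the sketch below writes a list of records to a CSV file. It is not the project's actual script: fetch_top_tracks is a hypothetical stand-in that returns hard-coded sample records instead of calling the Spotify API, and save_to_csv, the field names, and the output file name are likewise assumptions made for this example.

```python
import csv
import os
import tempfile


def fetch_top_tracks():
    """Hypothetical stand-in for an API call; returns sample records."""
    return [
        {"track_id": "t1", "track_name": "Song A", "artist_name": "Artist X"},
        {"track_id": "t2", "track_name": "Song B", "artist_name": "Artist Y"},
    ]


def save_to_csv(records, path):
    """Write a list of dicts to a CSV file; return the number of rows written."""
    if not records:
        return 0
    with open(path, "w", newline="", encoding="utf-8") as f:
        # Use the keys of the first record as the CSV header.
        writer = csv.DictWriter(f, fieldnames=list(records[0].keys()))
        writer.writeheader()
        writer.writerows(records)
    return len(records)


# Write to the system temp directory so the sketch leaves the project untouched.
out_path = os.path.join(tempfile.gettempdir(), "top_tracks_sample.csv")
rows_written = save_to_csv(fetch_top_tracks(), out_path)
print(rows_written)  # 2
```

Keeping the fetch step and the save step in separate functions means the CSV writer can be reused for each endpoint's records.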
