When I started programming a few years back, I had no clear purpose for my newfound skills. I toyed around with building video games, basic websites, and iOS applications, but I was spreading myself thin by not specializing in one craft. Around this time, the Toronto Raptors were on a historic playoff run. I had just started playing around with bet365 and happened to make a good profit betting exclusively on them throughout the playoffs.

Given my background in both Commerce and Computer Science, I wanted to learn more about the roles of Quantitative Analyst and Data Scientist. I began exploring data science, starting with the basics of the scikit-learn package since I already knew Python. The rest of this article outlines how I went from knowing next to nothing about data science and machine learning to building my first NBA prediction model with ~72% accuracy (more on this later, but the results aren’t as great as they seem).

1. The Game Plan
2. Data Acquisition
3. Data Exploration & Processing
4. Choosing the Right Model
5. Testing and Results

## The Game Plan

The NBA, like many other sports, has seen the use of statistics grow rapidly over the last 10–20 years. I began my search for the most relevant NBA stats by reading _Which NBA Statistics Actually Translate to Wins_ by Chinmay Vayda. His research found that the best predictors of wins in the NBA were a team’s Offensive Rating, Defensive Rating, Rebound Differential, and 3-Point %, among other stats, which you can read more about in his article.


I planned to use more recent data by taking the NBA’s monthly team statistics and using them as the predictors for the matches played during that month. Since this was my first attempt at building a model, I wanted to keep the data simple and the mathematical complexity to a minimum.

The next step was figuring out how to acquire this data. I needed a data source for match results of the last ~10 years, as well as a source for a team’s statistics in any given month.
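To make the plan concrete, here is a minimal sketch of how monthly team statistics could be attached to each match as predictors. The team codes, stat names, numbers, and the `build_features` helper are all hypothetical, and taking the home-minus-away difference is just one assumed way to combine the two teams' stats, not necessarily what the final model uses.

```python
from datetime import date

# Hypothetical monthly team statistics, keyed by (team, year, month).
# The numbers are made up for illustration only.
monthly_stats = {
    ("TOR", 2019, 1): {"off_rating": 112.0, "def_rating": 106.5},
    ("BOS", 2019, 1): {"off_rating": 110.5, "def_rating": 107.0},
}

# Hypothetical match results: teams, date, and whether the home team won.
matches = [
    {"home": "TOR", "away": "BOS", "date": date(2019, 1, 16), "home_win": True},
]


def build_features(match, stats):
    """Return home-minus-away differences of the monthly stats for one match."""
    year_month = (match["date"].year, match["date"].month)
    home = stats[(match["home"], *year_month)]
    away = stats[(match["away"], *year_month)]
    return {name: home[name] - away[name] for name in home}


# Each row pairs a feature dictionary with the label the model should predict.
rows = [(build_features(m, monthly_stats), m["home_win"]) for m in matches]
```

Each entry in `rows` is then one training example: the stat differences for the month the game was played in, plus the match outcome.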

## Data Acquisition

For the data acquisition portion of this project, I used Selenium, a browser automation tool with Python bindings that is commonly used for web scraping, to get the data I needed off various websites. I also decided to limit my search to data from the 2008–2009 season to the present.
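As a rough illustration of this step, the sketch below builds season labels starting from 2008-09 and shows one way Selenium could pull the rows of a results table from a page. Both function names are hypothetical, the `"table tr"` selector is an assumption about the page layout, and the scraping function assumes Chrome and a matching driver are available. It is not the exact scraper used for this project.

```python
def season_labels(first_start_year, last_start_year):
    """Return labels such as '2008-09' for each season in the range, inclusive."""
    return [
        f"{year}-{str(year + 1)[-2:]}"
        for year in range(first_start_year, last_start_year + 1)
    ]


def scrape_table_rows(url):
    """Open `url` in a browser and return the text of every table row on the page."""
    # Imported here so season_labels works even without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()  # assumes Chrome and a matching driver are available
    try:
        driver.get(url)
        # Assumed page layout: results live in plain <table> rows.
        return [row.text for row in driver.find_elements(By.CSS_SELECTOR, "table tr")]
    finally:
        # Always close the browser, even if the page fails to load.
        driver.quit()
```

Calling `scrape_table_rows` once per label from `season_labels`, with whatever results page matches that season, would give the raw rows to clean up in the next step.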
