INTRODUCTION / BUSINESS PROBLEM

This essay describes work that uses a publicly available dataset of vehicle collisions recorded by the Seattle Department of Transportation. The work aims to benefit the public in three ways: by identifying the factors associated with different types of traffic incidents, by predicting accident severity, and by using those predictions to help reduce the risk of severe accidents.

The full code is available in my GitHub repository.

OBJECTIVE

Predict the severity of an accident as ‘1’ or ‘2’ using features such as the number of vehicles and people involved, traffic, and weather conditions. This is a binary classification problem. Once modeled, it gives Seattle City authorities insight into accident risk factors that was previously unavailable, and it also serves as this author's capstone project for a Coursera Professional Certificate.
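To make the framing concrete, here is a minimal stdlib sketch of turning one accident record into a numeric feature vector and a severity label. The field names (`VEHCOUNT`, `PERSONCOUNT`, `WEATHER`, `ROADCOND`, `SEVERITYCODE`) and the category lists are assumptions for illustration, not necessarily the exact schema or preprocessing used in this work; `encode` and `one_hot` are hypothetical helpers.

```python
from typing import Dict, List, Tuple

# Hypothetical category vocabularies; the real dataset's values may differ.
WEATHER_VALUES = ["Clear", "Raining", "Overcast", "Snowing"]
ROAD_VALUES = ["Dry", "Wet", "Ice"]


def one_hot(value: str, vocabulary: List[str]) -> List[float]:
    """Return a 0/1 indicator vector; values outside the vocabulary map to all zeros."""
    return [1.0 if value == v else 0.0 for v in vocabulary]


def encode(record: Dict[str, object]) -> Tuple[List[float], int]:
    """Turn one accident record (assumed field names) into (features, severity label)."""
    features: List[float] = [
        float(record["VEHCOUNT"]),     # number of vehicles involved
        float(record["PERSONCOUNT"]),  # number of people involved
    ]
    features += one_hot(str(record["WEATHER"]), WEATHER_VALUES)
    features += one_hot(str(record["ROADCOND"]), ROAD_VALUES)
    label = int(record["SEVERITYCODE"])
    if label not in (1, 2):
        raise ValueError(f"unexpected severity code: {label}")
    return features, label


example = {"VEHCOUNT": 2, "PERSONCOUNT": 3, "WEATHER": "Raining",
           "ROADCOND": "Wet", "SEVERITYCODE": 2}
x, y = encode(example)
print(x, y)  # [2.0, 3.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0] 2
```

Any classifier, including the ensemble models discussed later, then learns a mapping from vectors like `x` to the label `y`.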

DATA USED AND STEPS TAKEN

The raw data, provided by the Seattle DOT, was processed in the following steps:

Step 1: data loading and preliminary insights

Step 2: detailed feature analysis and visualization, to understand which input variables could be leveraged as predictors

Step 3: feature engineering and selection

Step 4: model fitting and training

Step 5: model evaluation, with a sensitivity check using automated machine learning (AutoML)

Step 6: concluding takeaways for the client
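Step 5 can be made concrete with a small evaluation helper. This is a minimal stdlib sketch, not the author's evaluation code: it assumes the fitted model's predictions for a held-out set are already available as lists of severity codes 1 and 2, and the hypothetical `binary_report` treats code 2 as the positive class. The toy labels are invented for illustration and are not results from the Seattle data.

```python
from typing import Dict, List


def binary_report(y_true: List[int], y_pred: List[int], positive: int = 2) -> Dict[str, float]:
    """Accuracy plus precision, recall and F1 for the `positive` severity code."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false alarms
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # missed positives
    correct = sum(1 for t, p in pairs if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": correct / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


def majority_baseline(y_true: List[int]) -> List[int]:
    """Predict the most frequent severity code for every record."""
    most_common = max(set(y_true), key=y_true.count)
    return [most_common] * len(y_true)


# Toy labels for illustration only.
y_true = [1, 1, 1, 2, 2, 1, 2, 1]
y_model = [1, 1, 2, 2, 1, 1, 2, 1]

model_report = binary_report(y_true, y_model)
baseline_report = binary_report(y_true, majority_baseline(y_true))
print(model_report)     # accuracy 0.75; precision, recall and F1 all 2/3
print(baseline_report)  # accuracy 0.625, but recall 0.0 for severity code 2
```

The baseline comparison shows why accuracy alone can mislead when one severity class dominates: always predicting code 1 scores 62.5% accuracy on these toy labels while never catching a code 2 accident. In practice, scikit-learn's `classification_report` and `confusion_matrix` compute the same quantities.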


Predicting vehicle accident severity using ensemble classifiers and AutoML