INTRODUCTION / BUSINESS PROBLEM
This essay describes work on a publicly available dataset of vehicle accidents recorded by the Seattle Department of Transportation (Seattle DOT). The work is intended to benefit the public by identifying the factors that contribute to different types of traffic incidents, predicting accident severity, and using those findings to reduce accident risk.
The full code is available in my GitHub repo.
OBJECTIVE
Predict the severity of an accident as ‘1’ or ‘2’ using features such as the number of vehicles and people involved, traffic, and weather conditions. This is a binary classification problem. Once modeled, it gives Seattle city authorities insight into accident risk factors that was previously unavailable, and it also completes the work this author needs to earn a Coursera Professional Certification.
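To make the classification framing concrete, here is a minimal sketch that maps the two severity codes to a binary target and computes the accuracy of always guessing the most common class. That figure is a useful floor for judging any later model. The label values below are made up for illustration and are not taken from the Seattle data; the function names are hypothetical as well.

```python
from collections import Counter

# Hypothetical severity codes for illustration only (not real Seattle DOT records).
severity_codes = [1, 1, 2, 1, 2, 1, 1, 2, 1, 1]


def encode_target(codes):
    """Map severity code 1 -> 0 and code 2 -> 1 to get a standard binary label."""
    mapping = {1: 0, 2: 1}
    return [mapping[code] for code in codes]


def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent class."""
    counts = Counter(labels)
    most_common_count = counts.most_common(1)[0][1]
    return most_common_count / len(labels)


y = encode_target(severity_codes)
baseline = majority_baseline_accuracy(y)
print(f"class counts: {dict(Counter(y))}, baseline accuracy: {baseline:.2f}")
```

If the two classes are imbalanced, plain accuracy can look good even for a model that never predicts the rarer class, so a baseline like this is worth checking before reading too much into a score.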
DATA USED AND STEPS TAKEN
The raw data is provided by the Seattle DOT. The analysis proceeds through the following steps:
Step 1: data loading and preliminary insights
Step 2: detailed analysis through feature visualization. The goal is to understand which input variables have potential predictive leverage
Step 3: feature engineering and selection
Step 4: model fitting and training
Step 5: model evaluation, with a sensitivity check using automated machine learning (AutoML)
Step 6: concluding takeaways for the client
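The steps above can be sketched end to end in a few lines. This is only an illustration under loud assumptions: the inline CSV, the column names (weather, vehicle_count, severity), and the derived multi_vehicle feature are all hypothetical, and a per-category majority rule stands in for the real model so that the example runs with the Python standard library alone. It is not the method used in the project.

```python
import csv
import io
import random
from collections import Counter, defaultdict

# Step 1: load data. A tiny made-up CSV stands in for the real file.
RAW = """weather,vehicle_count,severity
Clear,2,1
Clear,2,1
Clear,1,2
Clear,2,1
Rain,2,2
Rain,1,2
Rain,2,1
Rain,3,2
Overcast,2,1
Overcast,1,1
Overcast,2,2
Overcast,2,1
"""


def load_rows(text):
    """Parse the CSV text and convert numeric columns to int."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for row in rows:
        row["vehicle_count"] = int(row["vehicle_count"])
        row["severity"] = int(row["severity"])
    return rows


def severity_by(rows, feature):
    """Step 2 stand-in: tabulate severity counts per feature value."""
    table = defaultdict(Counter)
    for row in rows:
        table[row[feature]][row["severity"]] += 1
    return table


def add_features(rows):
    """Step 3: derive a hypothetical flag for multi-vehicle accidents."""
    for row in rows:
        row["multi_vehicle"] = row["vehicle_count"] >= 2
    return rows


def split(rows, test_fraction=0.25, seed=0):
    """Shuffle with a fixed seed and hold out a test fraction."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit(train, feature):
    """Step 4: learn the majority severity for each value of one feature."""
    rules = {v: c.most_common(1)[0][0] for v, c in severity_by(train, feature).items()}
    default = Counter(r["severity"] for r in train).most_common(1)[0][0]
    return rules, default


def predict(model, rows, feature):
    """Apply the learned rule; unseen values fall back to the overall majority."""
    rules, default = model
    return [rules.get(row[feature], default) for row in rows]


def evaluate(y_true, y_pred):
    """Step 5: accuracy plus a confusion count keyed by (true, predicted)."""
    confusion = Counter(zip(y_true, y_pred))
    correct = sum(n for (t, p), n in confusion.items() if t == p)
    return correct / len(y_true), confusion


rows = add_features(load_rows(RAW))
train, test = split(rows)
y_true = [r["severity"] for r in test]

results = {}
for feature in ("weather", "multi_vehicle"):
    model = fit(train, feature)
    results[feature] = evaluate(y_true, predict(model, test, feature))
    print(feature, "accuracy:", round(results[feature][0], 2))
```

With only a dozen invented rows the scores mean nothing; the point is the shape of the workflow, where each step hands a plain data structure to the next so that a real classifier or an AutoML run can replace fit and predict without touching the rest.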