Project information
- Category: Machine Learning
- Type: Random Forest Classifier
- Client/Purpose: Apply data preprocessing, explore feature selection, implement the Random Forest algorithm from scratch, and compare it against library implementations
- Project date: July, 2020
- Project URL: github
Traffic accident severity prediction
The goal was to build an analytic model that predicts the severity of a car accident by identifying patterns in the conditions under which accidents of different severities occur. A Random Forest classifier was implemented from scratch in Python, and its accuracy was compared against a single CART model and against the corresponding Scikit-learn implementations.
A Random Forest model was chosen to predict severity because it handles multi-class classification problems and produces results that are easy to interpret. The implementation uses entropy-based information gain to evaluate splits and the Out-Of-Bag (OOB) estimate to measure the accuracy of the model.
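The entropy criterion and the OOB estimate can be illustrated with a minimal sketch. It is not the project's code: the function names (`entropy`, `information_gain`, `bootstrap_indices`, `oob_accuracy`) are hypothetical, labels are assumed to be plain Python lists, and only binary splits are considered, as in CART.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(parent, left, right):
    """Entropy reduction obtained by splitting `parent` into `left` and `right`."""
    total = len(parent)
    weighted = (len(left) / total) * entropy(left) \
        + (len(right) / total) * entropy(right)
    return entropy(parent) - weighted


def bootstrap_indices(n_rows, rng):
    """Draw a bootstrap sample; return (in_bag, out_of_bag) row indices."""
    # Sampling with replacement leaves some rows unused: those are out of bag.
    in_bag = [rng.randrange(n_rows) for _ in range(n_rows)]
    chosen = set(in_bag)
    out_of_bag = [i for i in range(n_rows) if i not in chosen]
    return in_bag, out_of_bag


def oob_accuracy(labels, oob_votes):
    """Accuracy of majority votes cast only by trees that never saw the row.

    `oob_votes[i]` holds the predictions for row i from trees for which
    row i was out of bag. Rows without any vote are skipped.
    """
    scored = 0
    correct = 0
    for label, votes in zip(labels, oob_votes):
        if not votes:
            continue
        scored += 1
        majority = Counter(votes).most_common(1)[0][0]
        correct += int(majority == label)
    return correct / scored if scored else float("nan")
```

In a from-scratch forest, each tree would be trained on its `in_bag` rows, a candidate split would be scored with `information_gain`, and each tree's predictions for its `out_of_bag` rows would be appended to `oob_votes` before calling `oob_accuracy`.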
The data cleaning and wrangling consisted of balanced sampling, correlation analysis, outlier handling, feature selection, and evaluation of possible interaction features. Experiments varied the number of trees in the Random Forest implementation and the maximum depth of the trees. The best model achieved 99.8% accuracy.
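The balanced sampling step could look something like the sketch below. The source does not say whether classes were undersampled or oversampled, so undersampling to the rarest class is an assumption here, and `balanced_sample` and `label_of` are hypothetical names.

```python
import random
from collections import defaultdict


def balanced_sample(rows, label_of, seed=0):
    """Undersample every class down to the size of the rarest class.

    `rows` is any list of records and `label_of` extracts the class label
    (for example, the severity level) from one record.
    """
    by_class = defaultdict(list)
    for row in rows:
        by_class[label_of(row)].append(row)
    if not by_class:
        return []

    # Every class contributes as many rows as the smallest class has.
    target = min(len(group) for group in by_class.values())
    rng = random.Random(seed)

    sample = []
    for group in by_class.values():
        sample.extend(rng.sample(group, target))  # without replacement
    rng.shuffle(sample)  # avoid leaving the rows grouped by class
    return sample
```

Undersampling discards data from the frequent classes, which is acceptable when the rarest class is still large. Otherwise, oversampling or class weights may be a better fit.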