In the first part of this series, I discussed a basic, simplified implementation of machine learning algorithms to predict the defect per cent of future purchase orders based on input parameters.

In this part, I will touch on the accuracy metrics of the trained models. A model accepts a number of parameters (known as the hyper-parameters of an estimator) as arguments, and these influence its predictions. In practice, hyper-parameters are tweaked based on the accuracy metrics of the trained model before it is deployed for prediction in production. Instead of tweaking the hyper-parameters manually by trial and error to optimise the accuracy score, we can have algorithms search for and recommend optimised hyper-parameters. I will discuss efficient parameter search strategies in a later part of this series.
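To make the idea concrete, here is a minimal sketch of how hyper-parameters are passed to an estimator as constructor arguments. The particular values (max_depth=5, min_samples_leaf=4) are purely illustrative, not tuned for the supplier dataset:

```python
from sklearn.tree import DecisionTreeRegressor

# Hyper-parameters are passed as constructor arguments; they are set by us,
# not learned from the data. The values below are illustrative only.
model = DecisionTreeRegressor(
    max_depth=5,          # limit tree depth to reduce over-fitting
    min_samples_leaf=4,   # require at least 4 samples in each leaf
    random_state=42,      # fix the seed for reproducible results
)
print(model.get_params()["max_depth"])
```

Changing these arguments and re-measuring the accuracy metrics is exactly the manual tuning loop that automated parameter search replaces.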

As mentioned in the earlier part of this series, it is not pragmatic to train the model from scratch every time before prediction. I will also discuss saving a trained model and importing it into another program directly for prediction.
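The save-and-reload workflow can be sketched as below. The toy training data and the file name "trained_model.sav" are illustrative assumptions, not taken from the supplier dataset:

```python
import pickle
from sklearn.tree import DecisionTreeRegressor

# Train a toy model on illustrative data (not the supplier dataset).
X = [[1000], [2000], [3000], [4000]]
y = [0.5, 1.0, 1.5, 2.0]
model = DecisionTreeRegressor(random_state=0).fit(X, y)

# Save the fitted model to disk ...
with open("trained_model.sav", "wb") as f:
    pickle.dump(model, f)

# ... and later, in another program, load it and predict directly,
# without retraining from scratch.
with open("trained_model.sav", "rb") as f:
    loaded_model = pickle.load(f)

print(loaded_model.predict([[2000]]))  # same predictions as the original model
```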

Note: I will explain new areas and concepts in detail, and will avoid repeating the parts explained in my earlier article. I encourage you to refer to the earlier part for those details.

Step 1

First, we will import the packages required for our model. StratifiedShuffleSplit is required to build a training sample set that is well represented across different value ranges. The pickle module will help us save the trained model and then import it into other programs directly for prediction. Finally, sklearn.metrics has a set of methods to measure the accuracy of a model.

import pandas as pd
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit  # to draw equally weighted samples for the training dataset
from sklearn.tree import DecisionTreeRegressor  # Decision Tree algorithm
import pickle  # to save and load trained models
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR  # Support Vector Regressor
from sklearn.metrics import mean_squared_error  # to calculate root mean square error
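As a quick preview of how mean_squared_error is used to measure accuracy, the sketch below compares illustrative actual and predicted defect percentages (the numbers are made up for demonstration):

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Illustrative actual vs. predicted defect percentages (not real data).
y_true = [2.0, 3.5, 1.0, 4.0]
y_pred = [2.2, 3.0, 1.1, 3.8]

# Root mean square error: square root of the average squared difference.
rmse = np.sqrt(mean_squared_error(y_true, y_pred))
print(round(rmse, 3))
```

A lower RMSE indicates predictions closer to the actual values; this is the metric we will use to compare trained models.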

Step 2

Read the sample dataset exported from ERP and other applications into a pandas DataFrame. Please refer to the earlier article to understand the structure of the dataset and other details.

SourceData=pd.read_excel("Supplier Past Performance.xlsx") # Load the data into Pandas DataFrame

Step 3

After a cursory analysis of the data sample, it seems that “PO Amount” has a close and strong influence on “Defect Per cent”; hence we want to make sure that we train the model with “PO Amount” records from different ranges. If we trained our model on a dataset over-represented by “PO Amount” values between 30,000 and 60,000 GBP, then the model’s learning would not reflect real-life scenarios and it might not predict accurately.

In the code below, a new column “PO Category” is introduced to categorise the “PO Amount” values: 0 to 30,000 GBP is classified as PO Category 1, 30,000 to 60,000 GBP as PO Category 2, and so on.

SourceData["PO Category"] = pd.cut(SourceData["PO Amount"],
                                   bins=[0., 30000, 60000, 90000, np.inf],
                                   labels=[1, 2, 3, 4])
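With the “PO Category” column in place, StratifiedShuffleSplit can draw a train/test split in which every category is proportionally represented. The sketch below uses a small made-up DataFrame standing in for the supplier dataset (five rows per category, so a 20% test split takes exactly one row from each):

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import StratifiedShuffleSplit

# Illustrative PO amounts, five per category (not the real supplier data).
amounts = ([5000, 10000, 15000, 20000, 25000] +        # category 1
           [35000, 40000, 45000, 50000, 55000] +       # category 2
           [65000, 70000, 75000, 80000, 85000] +       # category 3
           [95000, 100000, 110000, 120000, 150000])    # category 4
df = pd.DataFrame({"PO Amount": amounts})
df["PO Category"] = pd.cut(df["PO Amount"],
                           bins=[0., 30000, 60000, 90000, np.inf],
                           labels=[1, 2, 3, 4])

# Split so each category keeps its proportion in both train and test sets.
split = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
for train_idx, test_idx in split.split(df, df["PO Category"]):
    strat_train = df.iloc[train_idx]  # positional indices, hence iloc
    strat_test = df.iloc[test_idx]

print(len(strat_train), len(strat_test))  # 16 train rows, 4 test rows
```

Because the split is stratified on “PO Category”, the test set here contains exactly one purchase order from each amount range, rather than a random (possibly skewed) sample.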


Machine Learning and Supply Chain Management: Hands-on Series