With the number of Machine Learning algorithms constantly growing, it is useful to have a reference point for brushing up on the fundamental models, whether for an interview or just a quick refresher. This article covers the pros and cons of some of the most common models, along with sample Python implementations of each algorithm.

Table of Contents

  1. Multiple Linear Regression
  2. Logistic Regression
  3. k-Nearest Neighbors (KNN)
  4. k-Means Clustering
  5. Decision Trees/Random Forest
  6. Support Vector Machine (SVM)
  7. Naive Bayes

1. Multiple Linear Regression

Pros

  • Easy to implement, theory is not complex, low computational power compared to other algorithms.
  • Easy to interpret coefficients for analysis.
  • Performs well when the relationship between the features and the target is genuinely linear.
  • Although susceptible to overfitting, this can be mitigated with dimensionality reduction techniques, cross-validation, and regularization methods.

Cons

  • Real-world datasets rarely have a perfectly linear structure, so the model often **suffers from under-fitting** in practice or is outperformed by other ML and Deep Learning algorithms.
  • Parametric: it has many assumptions that need to be met regarding the data and its distribution, including a linear relationship between the dependent and independent variables.
  • Examples of assumptions: the relationship between the dependent variable and the independent variables is linear; the independent variables are not too highly correlated with each other (no strong multicollinearity); the observations of the dependent variable are selected independently and at random; and the regression residuals are normally distributed.
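A minimal sketch of fitting a multiple linear regression with scikit-learn is shown below. The synthetic dataset, coefficient values, and random seed are illustrative assumptions, not from any real dataset; the point is that the easily interpretable coefficients mentioned above can be read directly off the fitted model.

```python
# Minimal sketch: multiple linear regression on synthetic data.
# The true coefficients (3.0, 2.0) and intercept (5.0) are made up
# for illustration so we can check the fit recovers them.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2))                       # two independent variables
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] + 5.0 \
    + rng.normal(scale=0.1, size=200)               # small Gaussian noise

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LinearRegression()
model.fit(X_train, y_train)

# Interpretable outputs: one coefficient per feature, plus an intercept.
print("coefficients:", model.coef_)                 # near [3.0, 2.0]
print("intercept:", model.intercept_)               # near 5.0
print("R^2 on test set:", model.score(X_test, y_test))
```

Because the data here is truly linear with little noise, the test R² is close to 1; on messier real-world data this is exactly where the under-fitting con above shows up.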


Basic ML Models Pros & Cons & Code Demos