You’ve cleaned up your data and done some exploratory data analysis. Now what? As data analysts, we have a lot of tools in our toolkit, but not every tool suits every job: a screwdriver _might_ be used to hammer in a nail, but it isn’t the best tool for the job. Our tools are models, or if you prefer the mathematical term, algorithms. They allow us to make sense of the data we have collected and to make predictions.

There are three basic types of models, depending on the type of data. For continuous numerical data, we have a variety of regression techniques. These are our screwdrivers and wrenches. Fairly simple to understand and use, they bring the data together by fitting it to some sort of line or multidimensional plane. For categorical or discrete data, we have clustering and classification models. These are our saws and knives. They separate the data into different pieces of like versus unlike. With so many choices, it may be difficult to know which tool to use under which circumstance. So, let’s look at each in turn.

Numerical regression models seek the line or curve that best fits continuous numerical data. In linear regression, the dependent variable (usually called y) is fit to one or more independent variables using some type of polynomial function; the model stays linear in its coefficients. In nonlinear regression, the dependent variable is instead fit to a logarithmic, exponential, or sigmoid function of the independent variables.

Linear regressions include:

**1) Simple Linear Regression:** one independent variable fit to a straight line:

  • y = mx + b, where m is the slope of the line and b is the value of y at x = 0 (the y-intercept)

**2) Multiple Linear Regression:** two or more independent variables fit to a first-order equation (a plane or hyperplane rather than a line):

  • y = mx + nz + c, where m and n are the slopes along the x and z directions, and c is the value of y at x = z = 0
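
To make the two equations above concrete, here is a minimal sketch in Python using NumPy. The data is made up and noise-free (an assumption chosen purely for illustration), so the fitted coefficients should recover the values used to generate it. The variable names are hypothetical, and this is one of several reasonable ways to run a least-squares fit, not the only one.

```python
import numpy as np

# Hypothetical, noise-free data so the fitted values are easy to check by eye.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
z = np.array([1.0, 0.0, 2.0, 1.0, 3.0])

# Simple linear regression: y = m*x + b, generated with m = 2 and b = 1.
y_simple = 2.0 * x + 1.0
# A degree-1 polynomial fit returns the coefficients as [slope, intercept].
m, b = np.polyfit(x, y_simple, deg=1)

# Multiple linear regression: y = m*x + n*z + c,
# generated with m = 2, n = -3 and c = 5.
y_multi = 2.0 * x - 3.0 * z + 5.0
# Design matrix: one column per independent variable,
# plus a column of ones so the fit can estimate the constant c.
A = np.column_stack([x, z, np.ones_like(x)])
coeffs, _, _, _ = np.linalg.lstsq(A, y_multi, rcond=None)
m2, n2, c2 = coeffs

print(f"simple:   m={m:.2f}, b={b:.2f}")
print(f"multiple: m={m2:.2f}, n={n2:.2f}, c={c2:.2f}")
```

With real, noisy data the recovered coefficients would only approximate the underlying values, and you would normally also look at the residuals to judge how well the line or plane fits.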


The Data Analysts’ Toolkit: Models