# Regression Fantasies

Finding a model that fits a set of data is one of the most common goals in data analysis, and least squares regression is the most commonly used tool for achieving it. It’s a relatively simple concept, it’s easy to do, and there’s a lot of readily available software to do the calculations. It’s even taught in many Statistics 101 courses. Everybody uses it … and therein lies the problem. Regression models can mislead even when nobody intends them to.

Here are eleven of the most common reasons to doubt a regression model and how to diagnose the problems.

### Not Enough Samples

Accuracy is a critical component for evaluating a model. The coefficient of determination, also known as R-squared, is the most often cited measure of accuracy. Now obviously, the more accurate a model is the better, so data analysts look for large values for R-squared.

R-squared is designed to estimate the maximum relationship between the dependent and independent variables based on a set of samples (cases, observations, records, or whatever). If there aren’t enough samples compared to the number of independent variables in the model, the estimate of R-squared will be especially unstable. The effect is greatest when the R-squared value is small, the number of samples is small, and the number of independent variables is large, as shown in this figure.
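One way to see this instability for yourself is a small simulation. The sketch below is an illustration, not part of the original analysis: it fits ordinary least squares to pure noise, where the true R-squared is zero, and records the R-squared that comes out. The function names (`r_squared`, `adjusted_r_squared`, `simulate_noise_r2`) and the sample sizes are made up for the example, and the adjusted R-squared formula is the standard textbook correction, included only as one possible diagnostic.

```python
import numpy as np


def r_squared(X, y):
    """R-squared of an ordinary least squares fit that includes an intercept."""
    A = np.column_stack([np.ones(len(y)), X])       # add the intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)    # least squares solution
    resid = y - A @ coef
    ss_res = float(resid @ resid)                   # unexplained variation
    ss_tot = float(((y - y.mean()) ** 2).sum())     # total variation
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n, k):
    """Standard adjustment for n samples and k independent variables."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


def simulate_noise_r2(n, k, trials=500, seed=0):
    """R-squared values from regressing noise on k unrelated noise variables."""
    rng = np.random.default_rng(seed)
    values = np.empty(trials)
    for t in range(trials):
        X = rng.standard_normal((n, k))   # independent variables: pure noise
        y = rng.standard_normal(n)        # dependent variable: unrelated noise
        values[t] = r_squared(X, y)
    return values


k = 5  # hypothetical number of independent variables
for n in (10, 30, 100):
    r2 = simulate_noise_r2(n, k)
    adj = adjusted_r_squared(r2, n, k)
    print(f"n={n:4d}  mean R2={r2.mean():.3f}  sd R2={r2.std():.3f}  "
          f"mean adjusted R2={adj.mean():.3f}")
```

Because there is no real relationship in the simulated data, any R-squared above zero is an artifact of the sample. With few samples per independent variable, the values should come out both larger and more spread out than with many samples, which is the pattern described above.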

