As an undergraduate I studied economics, which meant I studied a lot of regressions. It was basically 90% of the curriculum (when we weren’t discussing supply and demand curves, of course). The effect of corruption on sumo wrestling? _Regression._ The effect of minimum wage changes on a Wendy’s in NJ? _Regression._ Or maybe The Zombie Lawyer Apocalypse is more your speed (O.K., not a regression, but the title was cool).
Either way, my undergrad taught me three things: 1) supply-and-demand, 2) regressions are _life_, and 3) economists think they are gosh darn hilarious.
**But what if your regression fails you?** What if it isn’t predicting the thing it’s supposed to predict, because your X is all tied up with things you don’t have data for?
Well that, my friends, is when you might want to contemplate using an IV.
_An instrumental variable is a third variable, Z, used in regression analysis when you have endogenous variables: variables that are influenced by other variables in the model. In other words, you use it to account for unexpected behavior between variables. Using an instrumental variable to identify the hidden (unobserved) correlation allows you to see the true correlation between the explanatory variable and response variable, Y._ (Statistics How To)
Let’s break down some of this into pieces we can understand.
Let’s say you have two variables that you think are correlated, education and wages (X and Y). You would like to investigate whether education _leads_ to higher wages, i.e. X → Y. It makes sense enough. You write **_y = α + βx + ε_**, and, content with yourself, spend the rest of the night binging Game of Thrones.
Wait. Slow down. First let’s clarify some things.
We’ve now translated the hunch that X leads to Y into **_y = α + βx + ε_**. The problem is whether it is truly X that leads to Y. Education leading to higher wages makes sense; but what if people who strive for higher education would earn higher wages anyway, because they are a more energetic, ambitious, and driven subset of the population?
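To make that worry concrete, here is a small simulation sketch in plain Python. Everything in it is made up for illustration: the `ambition` variable stands in for the drive we can't observe, `z` is a hypothetical instrument that nudges education but has no direct effect on wages, and the "true" effect of education on wages is assumed to be 1.0. None of these numbers come from real data.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (len(a) - 1)

random.seed(0)
n = 20_000
true_beta = 1.0  # assumed true effect of education on wages (illustrative)

# Unobserved drive: affects BOTH education and wages, but we have no data on it.
ambition = [random.gauss(0, 1) for _ in range(n)]
# Hypothetical instrument: shifts education, has no direct path to wages.
z = [random.gauss(0, 1) for _ in range(n)]

education = [zi + ai + random.gauss(0, 1) for zi, ai in zip(z, ambition)]
wages = [2.0 + true_beta * xi + 2.0 * ai + random.gauss(0, 1)
         for xi, ai in zip(education, ambition)]

# Naive regression slope of wages on education: Cov(x, y) / Var(x).
beta_ols = cov(education, wages) / cov(education, education)
# Simple IV (Wald) estimate with one instrument: Cov(z, y) / Cov(z, x).
beta_iv = cov(z, wages) / cov(z, education)

print(f"true beta: {true_beta:.2f}")
print(f"OLS slope: {beta_ols:.2f}")
print(f"IV slope:  {beta_iv:.2f}")
```

In this setup the naive slope should come out well above 1.0 (about 5/3 by construction), because ambition is hiding in ε and moves together with education. The IV ratio should land close to the assumed 1.0, since `z` only reaches wages through education. With real data you would not get to peek at the true β, and finding a believable instrument is the hard part.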