9 C1. Instrumental Variables
9.1 About
Topics covered:
- The endogeneity problem and sources of endogeneity
- IV identification conditions: relevance and exogeneity (exclusion restriction)
- Two-Stage Least Squares (2SLS) estimator
- Weak instruments: F-test and consequences
- Sargan-Hansen J test for overidentification
9.2 Lecture Notes
9.3 Slides
Note: Slides and notes are pending translation to English.
9.4 Introduction
In the first part of the econometrics sequence, the key assumption is \(\text{Cov}(x,u)=0\). For example, OLS estimates of the following conditional expectation function rely on it:
\[\mathbb{E}[\log(\text{Wage})_i|X_i]=\beta_0 + \beta_1\,\text{Schooling}_i+\beta_2\,\text{Experience}_i\]
The endogeneity problem arises when \(\text{Cov}(x,u)\neq 0\).

Common sources: omitted variables, measurement error, simultaneity.
Omitted variable bias. True model: \(Y=X_1\beta_1+X_2\beta_2+u_T\), estimated model: \(Y=X_1\beta_1+u_E\). Then
\[\mathbb{E}(\hat{\beta}_{1}|X)=\beta_1+\beta_2\frac{\text{Cov}(X_1,X_2)}{\text{Var}(X_1)}\]
Here \(X_2\) is the omitted variable, so the bias vanishes only if \(\beta_2=0\) or \(\text{Cov}(X_1,X_2)=0\).
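The omitted-variable formula can be checked numerically. The following is a minimal sketch on simulated data; the data-generating process, the coefficient values, and all variable names are illustrative assumptions, not part of the lecture material.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible
n = 100_000

# Hypothetical data-generating process: x2 is omitted and correlated with x1.
x2 = rng.normal(size=n)
x1 = 0.5 * x2 + rng.normal(size=n)
beta1, beta2 = 1.0, 2.0
y = beta1 * x1 + beta2 * x2 + rng.normal(size=n)

# "Estimated model": regress y on x1 only (plus an intercept).
X_short = np.column_stack([np.ones(n), x1])
b_short = np.linalg.lstsq(X_short, y, rcond=None)[0][1]

# Slope predicted by the omitted-variable formula, using sample moments.
cov_12 = np.cov(x1, x2)[0, 1]
b_predicted = beta1 + beta2 * cov_12 / np.var(x1, ddof=1)

print(f"short-regression slope:  {b_short:.3f}")
print(f"formula prediction:      {b_predicted:.3f}")
print(f"true beta1:              {beta1:.3f}")
```

In this design the population values are \(\text{Cov}(X_1,X_2)=0.5\) and \(\text{Var}(X_1)=1.25\), so the short regression should settle near \(1+2\cdot 0.4=1.8\) rather than the true value of 1.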
Suppose a randomly assigned variable \(z\) has a causal effect on \(x\). This variable — the instrument — can serve as a natural experiment.
9.5 Conditions for an Instrument
- Relevance (\(z\rightarrow x\)): \(\text{Cor}(z,x)\neq 0\).
- Exclusion (\(z\rightarrow x \rightarrow y\), \(z\not\rightarrow y\)): \(z\) affects \(y\) only through \(x\), informally \(\text{Cor}(z,y|x)=0\).
- Exogeneity (\(u\not\rightarrow z\)): \(\text{Cor}(z,u)=0\).

Examples from the literature:
| \(y\) | \(x\) | Unobservable | Instrument \(z\) |
|---|---|---|---|
| wage | schooling | ability | father’s education |
| wage | schooling | ability | distance to college |
| wage | schooling | ability | random military assignment |
| health | smoking | behavior | tobacco tax |
| armed conflict | GDP growth | simultaneity | rainfall |
9.6 Estimation
Exact Identification (\(k = r = 1\))
Using \(\text{Cov}(z,y)=\beta_1\text{Cov}(z,x)\):
\[\hat{\beta}_{\text{IV}}=\frac{\sum(z_i-\bar{z})(y_i-\bar{y})}{\sum(z_i-\bar{z})(x_i-\bar{x})}\]
As a method-of-moments estimator: \(\hat{\beta}=(z'x)^{-1}(z'y)\).
Consistency requires: (i) relevance — \(\text{plim}(z'x/N)\neq 0\); (ii) validity — \(\text{plim}(z'u/N)=0\).
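To see the two consistency conditions at work, the sketch below simulates an endogenous regressor and compares OLS with the simple IV ratio \(\text{Cov}(z,y)/\text{Cov}(z,x)\). The data-generating process and every name in the code are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed for reproducibility
n = 200_000

u = rng.normal(size=n)                      # structural error
z = rng.normal(size=n)                      # instrument, independent of u by construction
x = 0.8 * z + 0.6 * u + rng.normal(size=n)  # endogenous: Cov(x, u) = 0.6
beta0, beta1 = 1.0, 2.0
y = beta0 + beta1 * x + u


def cov(a, b):
    """Sample covariance of two 1-D arrays."""
    return np.mean((a - a.mean()) * (b - b.mean()))


b_ols = cov(x, y) / cov(x, x)  # inconsistent because Cov(x, u) != 0
b_iv = cov(z, y) / cov(z, x)   # consistent under relevance and validity

print(f"OLS slope: {b_ols:.3f}")
print(f"IV slope:  {b_iv:.3f}")
print(f"sample Cov(z, x) (relevance, observable):    {cov(z, x):.3f}")
print(f"sample Cov(z, u) (validity, simulation only): {cov(z, u):.3f}")
```

With these assumed parameters the OLS slope converges to \(2 + 0.6/2 = 2.3\), while the IV ratio converges to 2. The last line is available only because \(u\) is simulated; with real data, validity cannot be checked this way.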
2SLS (Two Stage Least Squares)
First stage — recover the exogenous variation in \(x_1\):
\[x_1=\gamma_1 z+x_2\gamma_2+\epsilon \quad\Rightarrow\quad \hat{x}_1\]
Second stage — estimate the structural equation:
\[y=\hat{x}_1\beta_1 + x_2\beta_2+u \quad\Rightarrow\quad \hat{\beta}_1\]
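Notes: (1) Standard errors from a manually run second stage must be corrected, because its residuals are computed with \(\hat{x}_1\) rather than \(x_1\). (2) Check instrument strength via the first-stage \(F\)-statistic on the excluded instrument(s) (rule of thumb: \(F > 10\)), or via the Kleibergen-Paap Wald \(F\) when the errors are not i.i.d.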
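The two stages and both notes can be sketched by hand with NumPy. This is an illustrative example on simulated data, assuming homoskedastic errors and one instrument; the helper `ols` and all variable names are hypothetical. In applied work a dedicated IV routine would normally handle the standard-error correction.

```python
import numpy as np

rng = np.random.default_rng(2)  # fixed seed for reproducibility
n = 50_000

u = rng.normal(size=n)
z = rng.normal(size=n)                                   # instrument
x2 = rng.normal(size=n)                                  # exogenous control
x1 = 0.5 * z + 0.3 * x2 + 0.7 * u + rng.normal(size=n)   # endogenous regressor
y = 1.0 + 2.0 * x1 - 1.0 * x2 + u


def ols(X, y):
    """Least-squares coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


const = np.ones(n)

# First stage: regress x1 on the instrument and the exogenous control.
W = np.column_stack([const, z, x2])
x1_hat = W @ ols(W, x1)

# First-stage F for the single excluded instrument (restricted vs. unrestricted SSR).
ssr_unrestricted = np.sum((x1 - x1_hat) ** 2)
W_r = np.column_stack([const, x2])
ssr_restricted = np.sum((x1 - W_r @ ols(W_r, x1)) ** 2)
F = (ssr_restricted - ssr_unrestricted) / (ssr_unrestricted / (n - W.shape[1]))

# Second stage: replace x1 by its fitted values.
X_hat = np.column_stack([const, x1_hat, x2])
b_2sls = ols(X_hat, y)

# Corrected residual variance uses the actual x1, not the fitted values.
X = np.column_stack([const, x1, x2])
resid = y - X @ b_2sls
sigma2 = resid @ resid / (n - X.shape[1])
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X_hat.T @ X_hat)))

# Naive residual variance from the second-stage regression, shown for contrast.
resid_naive = y - X_hat @ b_2sls
sigma2_naive = resid_naive @ resid_naive / (n - X.shape[1])

print(f"first-stage F: {F:.1f}")
print(f"2SLS coefficient on x1: {b_2sls[1]:.3f} (corrected s.e. {se[1]:.4f})")
print(f"residual variance: corrected {sigma2:.3f}, naive {sigma2_naive:.3f}")
```

The corrected residual variance should be close to the assumed error variance of 1, whereas the naive one also absorbs the first-stage residual scaled by \(\beta_1\) and is therefore far off in this design.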
IV as GMM
The standard IV estimator with \(k\) regressors and \(r\geq k\) instruments:
\[\hat{\beta}_{\text{IV}}=\left(X'Z(Z'Z)^{-1}Z'X\right)^{-1}\left(X'Z(Z'Z)^{-1}Z'y\right)\]
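Asymptotic variance: \(V(\hat{\beta}_{\text{IV}})=(1/N)\left(X'Z\mathbb{S}^{-1}Z'X\right)^{-1}\), where \(\mathbb{S}\) denotes the covariance matrix of the moment conditions \(Z'u\). The leading factor depends on how \(\mathbb{S}\) and the cross-products are scaled by \(N\).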
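The matrix formula can be evaluated directly in an overidentified setting (\(k=2\) regressors including the constant, \(r=3\) instruments). The sketch below uses simulated data and illustrative names. It also takes one efficient-GMM step under the assumption that \(\mathbb{S}\) is estimated by the unscaled sum \(\sum_i \hat{u}_i^2 z_i z_i'\), in which case the variance estimate is \((X'Z\hat{\mathbb{S}}^{-1}Z'X)^{-1}\) with no extra factor of \(N\). That scaling is a choice made here, not something the notes specify.

```python
import numpy as np

rng = np.random.default_rng(3)  # fixed seed for reproducibility
n = 50_000

u = rng.normal(size=n)
z1 = rng.normal(size=n)
z2 = rng.normal(size=n)
x = 0.6 * z1 + 0.4 * z2 + 0.5 * u + rng.normal(size=n)  # endogenous regressor
y = 1.0 + 2.0 * x + u

X = np.column_stack([np.ones(n), x])        # k = 2 regressors
Z = np.column_stack([np.ones(n), z1, z2])   # r = 3 instruments

# IV/2SLS via the closed form (X'Z (Z'Z)^{-1} Z'X)^{-1} X'Z (Z'Z)^{-1} Z'y.
ZtZ_inv = np.linalg.inv(Z.T @ Z)
XtZ = X.T @ Z
beta_2sls = np.linalg.solve(XtZ @ ZtZ_inv @ XtZ.T, XtZ @ ZtZ_inv @ (Z.T @ y))

# The same estimate from regressing y on the projection of X onto Z.
X_hat = Z @ (ZtZ_inv @ (Z.T @ X))
beta_projection = np.linalg.lstsq(X_hat, y, rcond=None)[0]

# One efficient-GMM step: weight by the inverse of S = sum_i u_i^2 z_i z_i'
# (unscaled sum; this scaling is an assumption of the sketch).
resid = y - X @ beta_2sls
S = (Z * resid[:, None] ** 2).T @ Z
S_inv = np.linalg.inv(S)
M = XtZ @ S_inv @ XtZ.T
beta_gmm = np.linalg.solve(M, XtZ @ S_inv @ (Z.T @ y))
se_gmm = np.sqrt(np.diag(np.linalg.inv(M)))

print("2SLS (closed form):", np.round(beta_2sls, 3))
print("2SLS (projection): ", np.round(beta_projection, 3))
print("efficient GMM:     ", np.round(beta_gmm, 3), "s.e.", np.round(se_gmm, 4))
```

Because the simulated errors are homoskedastic, the 2SLS and efficient-GMM estimates should be nearly identical; they would be expected to differ more under heteroskedasticity.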