15 C2. Generalized Method of Moments
15.1 About
Topics covered:
- Population moment conditions and sample analogues
- The GMM estimator and optimal weighting matrix
- Asymptotic distribution of GMM
- Two-step GMM and efficient GMM
- J test for overidentification
- OLS and IV as special cases of GMM
15.2 Lecture Notes
15.3 Slides
Note: Slides and notes are pending translation to English.
15.4 Introduction
Maximum likelihood estimation (MLE) selects \(\hat{\theta}\) by maximizing the likelihood, which requires specifying the full probability distribution of the data (its pdf). GMM (Hansen 1982) is an alternative that requires only a set of moment conditions, not the full pdf.
15.5 Method of Moments (MM)
Example. Let \(y_i \sim t(\nu)\). For \(\nu>2\), \(\mathbb{E}(y^2)=\nu/(\nu-2)\). A consistent estimator uses \(\hat{\mu}_2=(1/N)\sum_i y_i^2 \rightarrow_p \mathbb{E}(y^2)\), giving
\[\hat{\nu}_{\text{MM}} = \frac{2\hat{\mu}_2}{\hat{\mu}_2 - 1} \qquad (\text{for }\hat{\mu}_2 > 1)\]
General idea. Given unknown \(\theta_{k\times 1}\), suppose we can compute \(k\) population moments as a function of \(\theta\):
\[\mathbb{E}(y_i^j)=\mu_j(\theta) \quad \text{for } j=j_1,\ldots,j_k\]
The MM estimator \(\hat{\theta}_N\) equates population moments to sample moments: \(\mu_j(\hat{\theta}_N)=(1/N)\sum_i y_i^j\) for \(j=j_1,\ldots,j_k\), a system of \(k\) equations in \(k\) unknowns.
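The Student \(t\) example above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original notes: the function name `nu_mm`, the seed, the sample size and the true value \(\nu=10\) are all assumptions chosen for the demonstration.

```python
import numpy as np

def nu_mm(y):
    """Method-of-moments estimate of nu for a Student t sample.

    Solves mu2 = nu / (nu - 2) for nu, using the sample second moment.
    """
    mu2_hat = np.mean(np.asarray(y, dtype=float) ** 2)  # (1/N) * sum y_i^2
    if mu2_hat <= 1.0:
        # nu / (nu - 2) exceeds 1 for every nu > 2, so no solution exists
        return np.nan
    return 2.0 * mu2_hat / (mu2_hat - 1.0)

# Simulated data (assumption): true nu = 10, arbitrary seed
rng = np.random.default_rng(0)
y = rng.standard_t(df=10, size=200_000)
nu_hat = nu_mm(y)
print(f"MM estimate of nu: {nu_hat:.3f}")
```

The guard on `mu2_hat` mirrors the condition \(\hat{\mu}_2>1\) in the formula: in small samples the sample moment can fall outside the range the population moment can take, and the estimator is then undefined.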
15.6 Generalized Method of Moments (GMM)
With more moment conditions than parameters (\(r > k\)), the sample moments generally cannot all be set exactly to zero at once. Instead, we minimize a weighted quadratic criterion:
\[Q(\theta;X_N)=\left[g(\theta,X_N)\right]' W_N \left[g(\theta,X_N)\right]\]
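where \(g(\theta,X_N)=\frac{1}{N}\sum_i h(\theta;w_i)\) is the vector of sample moments computed from the data \(X_N=(w_1,\ldots,w_N)\), \(W_N\) is an \(r\times r\) positive definite weighting matrix, and the true parameters satisfy \(\mathbb{E}\{h(\theta_0,w_i)\}=0\).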
Hansen’s Formulation
- \(w_i\): \(h\times 1\) vector of observed variables (the dimension \(h\) is unrelated to the function \(h(\cdot)\) below)
- \(\theta\): \(k\times 1\) vector of unknown parameters
- \(h(\theta,w_i)\): \(r\times 1\) vector of functions (moment conditions)
- \(\theta_0\) characterized by \(\mathbb{E}\{h(\theta_0,w_i)\}=0\) (orthogonality conditions)
The GMM estimator \(\hat{\theta}_N\) minimizes \(Q(\theta;X_N)=[g(\theta,X_N)]'W_N[g(\theta,X_N)]\).
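A minimal sketch of this criterion, continuing the Student \(t\) example with \(r=2\) moment conditions for \(k=1\) parameter. Several things here are assumptions not found in the notes: the second condition uses the fourth moment \(\mathbb{E}(y^4)=3\nu^2/[(\nu-2)(\nu-4)]\) (valid for \(\nu>4\)), the data are simulated with true \(\nu=20\), \(W_N=I\), and the search bounds are arbitrary.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def h(nu, y):
    """N x 2 array whose rows are h(nu, y_i)' for y_i ~ t(nu), nu > 4."""
    m2 = nu / (nu - 2.0)                             # E(y^2)
    m4 = 3.0 * nu**2 / ((nu - 2.0) * (nu - 4.0))     # E(y^4)
    return np.column_stack([y**2 - m2, y**4 - m4])

def g(nu, y):
    """Sample moment vector g(nu, X_N) = (1/N) * sum_i h(nu, y_i)."""
    return h(nu, y).mean(axis=0)

def Q(nu, y, W):
    """GMM criterion g' W g."""
    gbar = g(nu, y)
    return float(gbar @ W @ gbar)

# Simulated data (assumption): true nu = 20 so that y^4 has finite variance
rng = np.random.default_rng(1)
y = rng.standard_t(df=20, size=50_000)

W = np.eye(2)  # identity weighting, a common starting choice
res = minimize_scalar(Q, bounds=(4.1, 60.0), args=(y, W), method="bounded")
nu_hat = res.x
print(f"GMM estimate of nu with W = I: {nu_hat:.2f}")
```

With two conditions and one parameter, no value of \(\nu\) generally zeroes both sample moments, so the minimized \(Q\) is positive and depends on \(W_N\).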
Identification
- Exact identification (\(k=r\)): set \(g(\hat{\theta}_N;X_N)=0\) and solve the \(r=k\) equations for the \(k\) unknowns; the criterion reaches zero, so the choice of \(W_N\) does not affect the estimate.
- Over-identification (\(r>k\)): more conditions than parameters; use \(W_N\) to weight them.
Optimal Weighting Matrix
The asymptotically efficient choice is a weighting matrix with \(W_N \rightarrow_p \mathbb{S}^{-1}\), where
\[\mathbb{S}=\lim_{N\rightarrow\infty}N\,\mathbb{E}\{[g(\theta_0;X_N)][g(\theta_0;X_N)]'\}\]
Practical approach (iterative):
- Start with \(W^{(0)}=I\).
- Obtain \(\hat{\theta}^{(0)}\) by minimizing \(Q\) with \(W^{(0)}\).
- Estimate \(\hat{\mathbb{S}}^{(0)}\) using \(\hat{\theta}^{(0)}\).
- Update \(W^{(1)}=(\hat{\mathbb{S}}^{(0)})^{-1}\) and re-estimate.
- Repeat until convergence (stopping after the first update gives the two-step GMM estimator).
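The iteration above can be sketched for a linear model with instruments, where each minimization has a closed form. Everything specific here is an assumption for illustration: the moment function \(h(\theta,w_i)=z_i(y_i-x_i'\theta)\), the simulated design, the helper names, and the estimate of \(\mathbb{S}\), which treats \(h(\theta,w_i)\) as serially uncorrelated.

```python
import numpy as np

def linear_gmm(y, X, Z, W):
    """Closed-form minimizer of Q for h(theta, w_i) = z_i * (y_i - x_i' theta)."""
    ZX, Zy = Z.T @ X, Z.T @ y
    return np.linalg.solve(ZX.T @ W @ ZX, ZX.T @ W @ Zy)

def s_hat(y, X, Z, theta):
    """(1/N) * sum_i h_i h_i', assuming no serial correlation in h_i."""
    H = Z * (y - X @ theta)[:, None]   # N x r matrix whose rows are h_i'
    return H.T @ H / len(y)

def iterated_gmm(y, X, Z, tol=1e-10, max_iter=100):
    W = np.eye(Z.shape[1])                         # W^(0) = I
    theta = linear_gmm(y, X, Z, W)                 # theta^(0)
    for _ in range(max_iter):
        W = np.linalg.inv(s_hat(y, X, Z, theta))   # update the weighting matrix
        theta_new = linear_gmm(y, X, Z, W)         # re-estimate
        converged = np.max(np.abs(theta_new - theta)) < tol
        theta = theta_new
        if converged:
            break
    return theta, W

# Simulated data (assumption): one endogenous regressor, three instruments
rng = np.random.default_rng(2)
N = 5_000
Z = rng.normal(size=(N, 3))
v = rng.normal(size=N)
x = Z @ np.array([1.0, 0.5, -0.5]) + v
u = 0.8 * v + rng.normal(size=N)       # error correlated with x through v
y = 2.0 * x + u                        # true coefficient is 2
X = x[:, None]

theta_hat, W_hat = iterated_gmm(y, X, Z)
print("iterated GMM estimate:", theta_hat)
```

Setting `max_iter=1` performs a single update of the weighting matrix. For nonlinear moment functions the closed-form step would be replaced by a numerical minimizer.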
Asymptotic Distribution
Under regularity conditions (consistency, CLT for \(g(\theta_0,X_N)\), differentiability):
\[\sqrt{N}(\hat{\theta}_N - \theta_0) \xrightarrow{d} \mathcal{N}(0,V)\]
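with \(V=(D\mathbb{S}^{-1}D')^{-1}\) when the optimal weighting matrix is used, where \(D'=\operatorname{plim}\,\partial g(\theta;X_N)/\partial\theta'\,|_{\theta=\hat{\theta}_N}\) is the \(r\times k\) Jacobian of the moment conditions.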
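As a worked complement to the formula above, the following sketch states the variance for a general weighting matrix. It assumes \(W_N\rightarrow_p W\) with \(W\) positive definite and \(\sqrt{N}\,g(\theta_0;X_N)\xrightarrow{d}\mathcal{N}(0,\mathbb{S})\); the symbols \(W\) and \(V_W\) are introduced here for illustration and do not appear in the notes.

\[V_W=(DWD')^{-1}\,DW\,\mathbb{S}\,WD'\,(DWD')^{-1}\]

Setting \(W=\mathbb{S}^{-1}\) collapses the sandwich:

\[V_{\mathbb{S}^{-1}}=(D\mathbb{S}^{-1}D')^{-1}\,D\mathbb{S}^{-1}\mathbb{S}\,\mathbb{S}^{-1}D'\,(D\mathbb{S}^{-1}D')^{-1}=(D\mathbb{S}^{-1}D')^{-1}\]

For any other admissible \(W\), the difference \(V_W-(D\mathbb{S}^{-1}D')^{-1}\) is positive semidefinite, which is the sense in which \(\mathbb{S}^{-1}\) is the optimal weighting matrix.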
Sargan–Hansen \(J\)-test (Over-identification)
Hansen (1982) proposes a test for whether all \(r\) moment conditions hold simultaneously:
\[J_N = N\left[g(\hat{\theta}_N;X_N)\right]'\hat{\mathbb{S}}_N^{-1}\left[g(\hat{\theta}_N;X_N)\right] \xrightarrow{d} \chi^2_{(r-k)}\]
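Under \(H_0\), all \(r\) moment conditions hold at \(\theta_0\). Rejecting \(H_0\) is evidence that at least one moment condition is violated, in which case the GMM estimator is generally inconsistent for \(\theta_0\); the test does not identify which condition fails. The \(\chi^2_{(r-k)}\) limit requires \(r>k\) and the efficient weighting matrix \(\hat{\mathbb{S}}_N^{-1}\).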
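A hedged sketch of how the \(J\) statistic could be computed after two-step GMM in a linear model. The moment function \(h(\theta,w_i)=z_i(y_i-x_i'\theta)\), the function name `j_test`, the simulated data and the serially uncorrelated estimate of \(\mathbb{S}\) are assumptions for illustration, not the notes' own implementation.

```python
import numpy as np
from scipy.stats import chi2

def j_test(y, X, Z):
    """Two-step linear GMM followed by the J statistic.

    Returns (J, degrees of freedom, p-value).
    """
    N, r = Z.shape
    k = X.shape[1]
    ZX, Zy = Z.T @ X / N, Z.T @ y / N

    def solve(W):
        # closed-form minimizer of g' W g for linear moment conditions
        return np.linalg.solve(ZX.T @ W @ ZX, ZX.T @ W @ Zy)

    theta0 = solve(np.eye(r))                    # first step, W = I
    H = Z * (y - X @ theta0)[:, None]            # rows are h(theta0, w_i)'
    S_inv = np.linalg.inv(H.T @ H / N)           # inverse of S-hat
    theta = solve(S_inv)                         # second step
    gbar = Zy - ZX @ theta                       # g(theta_hat; X_N)
    J = float(N * gbar @ S_inv @ gbar)
    df = r - k
    return J, df, float(chi2.sf(J, df))

# Simulated data (assumption): three valid instruments, one regressor
rng = np.random.default_rng(3)
N = 5_000
Z = rng.normal(size=(N, 3))
v = rng.normal(size=N)
x = Z @ np.array([1.0, 0.5, -0.5]) + v
u = 0.8 * v + rng.normal(size=N)
y = 2.0 * x + u
X = x[:, None]

J, df, p_value = j_test(y, X, Z)
print(f"J = {J:.3f}, df = {df}, p-value = {p_value:.3f}")
```

Here the same \(\hat{\mathbb{S}}\) is used for the second-step weighting and for the statistic, so \(J\) equals \(N\) times the minimized criterion. With valid instruments the p-value is roughly uniform across samples, so a single large or small value should not be over-interpreted.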
Special Cases of GMM
Many standard estimators are special cases of GMM:
- OLS, IV, 2SLS
- Nonlinear simultaneous equations estimators
- MLE in many cases (the score vector supplies the moment conditions)
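To make the first item concrete, here is a sketch for the linear model \(y_i=x_i'\beta+u_i\), with \(x_i\) a \(k\times 1\) vector of regressors and \(z_i\) an \(r\times 1\) vector of instruments. This notation is assumed here for illustration; \(X\), \(Z\) and \(y\) stack the observations by rows.

OLS uses \(h(\beta,w_i)=x_i(y_i-x_i'\beta)\), so \(r=k\) and the sample moments can be set to zero:

\[\frac{1}{N}\sum_i x_i(y_i-x_i'\hat{\beta})=0 \quad\Longrightarrow\quad \hat{\beta}_{\text{OLS}}=(X'X)^{-1}X'y\]

IV with as many instruments as regressors uses \(h(\beta,w_i)=z_i(y_i-x_i'\beta)\) with \(r=k\):

\[\frac{1}{N}\sum_i z_i(y_i-x_i'\hat{\beta})=0 \quad\Longrightarrow\quad \hat{\beta}_{\text{IV}}=(Z'X)^{-1}Z'y\]

2SLS keeps the same \(h\) with \(r>k\) and takes \(W_N=\left(\frac{1}{N}Z'Z\right)^{-1}\); minimizing \(Q\) gives

\[\hat{\beta}_{\text{2SLS}}=\left[X'Z(Z'Z)^{-1}Z'X\right]^{-1}X'Z(Z'Z)^{-1}Z'y\]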