15 C2. Generalized Method of Moments

15.1 About

Topics covered:

Population moment conditions and sample analogues
The GMM estimator and optimal weighting matrix
Asymptotic distribution of GMM
Two-step GMM and efficient GMM
J test for overidentification
OLS and IV as special cases of GMM

15.2 Lecture Notes

Download lecture notes (PDF)

15.3 Slides


Links: View slides (full screen) · Download slides (PDF)

Note: Slides and notes are pending translation to English.

15.4 Introduction

MLE selects \(\hat{\theta}\) by maximizing the likelihood, which requires specifying the full probability distribution (pdf) of the data. GMM (Hansen 1982) is an alternative that requires only a set of moment conditions, not the full pdf.

15.5 Method of Moments (MM)

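Example. Let \(y_i \sim t(\nu)\). For \(\nu>2\), \(\mathbb{E}(y_i^2)=\nu/(\nu-2)\). The sample second moment \(\hat{\mu}_2=(1/N)\sum_i y_i^2\) converges in probability to \(\mathbb{E}(y_i^2)\), so setting \(\hat{\mu}_2=\nu/(\nu-2)\) and solving for \(\nu\) gives the consistent estimator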

\[\hat{\nu}_{\text{MM}} = \frac{2\hat{\mu}_2}{\hat{\mu}_2 - 1} \qquad (\text{for }\hat{\mu}_2 > 1)\]
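The estimator above can be checked on simulated data. The following is a minimal sketch, assuming NumPy is available; the function name `nu_mm`, the seed, the sample size, and the true value \(\nu=10\) are illustrative choices, not part of the notes.

```python
import numpy as np

def nu_mm(y):
    """Method-of-moments estimate of nu from the sample second moment."""
    mu2 = np.mean(np.asarray(y, dtype=float) ** 2)
    if mu2 <= 1.0:
        # The formula is only defined for mu2 > 1 (see the condition above).
        raise ValueError("sample second moment must exceed 1")
    return 2.0 * mu2 / (mu2 - 1.0)

# Simulated Student-t data with nu = 10 (illustrative value).
rng = np.random.default_rng(0)
y = rng.standard_t(df=10, size=500_000)
nu_hat = nu_mm(y)
print(f"nu_hat = {nu_hat:.3f}")
```

Because \(\hat{\nu}_{\text{MM}}\) divides by \(\hat{\mu}_2-1\), it becomes very noisy when the true \(\nu\) is large and \(\mathbb{E}(y_i^2)\) is close to 1.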

General idea. Given unknown \(\theta_{k\times 1}\), suppose we can compute \(k\) population moments as a function of \(\theta\):

\[\mathbb{E}(y_i^j)=\mu_j(\theta) \quad \text{for } j=j_1,\ldots,j_k\]

The MM estimator \(\hat{\theta}_N\) equates population moments to sample moments: \(\mu_j(\hat{\theta}_N)=(1/N)\sum_i y_i^j\) for \(j=j_1,\ldots,j_k\), a system of \(k\) equations in \(k\) unknowns.

15.6 Generalized Method of Moments (GMM)

With more moment conditions than parameters (\(r > k\)), the sample moments generally cannot all be set exactly to zero. Instead, we minimize a weighted quadratic criterion:

\[Q(\theta;X_N)=\left[g(\theta,X_N)\right]' W_N \left[g(\theta,X_N)\right]\]

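where \(g(\theta,X_N)=\frac{1}{N}\sum_i h(\theta,w_i)\) is the \(r\times 1\) vector of sample moments, with \(X_N\) denoting the sample \(w_1,\ldots,w_N\); \(W_N\) is a symmetric positive definite \(r\times r\) weighting matrix; and the true parameters satisfy \(\mathbb{E}\{h(\theta_0,w_i)\}=0\).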

Hansen’s Formulation

\(w_i\): \(h\times 1\) vector of observed variables (the dimension \(h\) is unrelated to the moment function \(h(\cdot)\))
\(\theta\): \(k\times 1\) vector of unknown parameters
\(h(\theta,w_i)\): \(r\times 1\) vector of functions (moment conditions)
\(\theta_0\) characterized by \(\mathbb{E}\{h(\theta_0,w_i)\}=0\) (orthogonality conditions)

The GMM estimator \(\hat{\theta}_N\) minimizes \(Q(\theta;X_N)=[g(\theta,X_N)]'W_N[g(\theta,X_N)]\).

Identification

Exact identification (\(k=r\)): set \(g(\hat{\theta}_N;X_N)=0\) and solve the \(r\) equations for the \(k=r\) unknowns; the estimate then does not depend on \(W_N\).
Over-identification (\(r>k\)): more conditions than parameters; use \(W_N\) to weight them.
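To make the over-identified case concrete, here is a minimal sketch that evaluates \(Q(\theta;X_N)\) for the Student-t example with \(r=2\) moment conditions and \(k=1\) parameter. The use of the fourth moment, \(\mathbb{E}(y_i^4)=3\nu^2/[(\nu-2)(\nu-4)]\) for \(\nu>4\), is an assumption added for illustration, as are the names `h`, `g`, `Q`, the identity weighting matrix, and the grid search used in place of a numerical optimizer.

```python
import numpy as np

def h(nu, y):
    """Moment functions h(nu, y_i), stacked as an (N, 2) array (r = 2)."""
    m2 = nu / (nu - 2.0)                          # E(y^2), valid for nu > 2
    m4 = 3.0 * nu**2 / ((nu - 2.0) * (nu - 4.0))  # E(y^4), valid for nu > 4
    return np.column_stack([y**2 - m2, y**4 - m4])

def g(nu, y):
    """Sample moments g(nu, X_N): the average of h over observations."""
    return h(nu, y).mean(axis=0)

def Q(nu, y, W):
    """Quadratic GMM criterion g' W g."""
    gv = g(nu, y)
    return gv @ W @ gv

# Simulated data with nu = 10 (illustrative value).
rng = np.random.default_rng(1)
y = rng.standard_t(df=10, size=200_000)

W = np.eye(2)                          # first-step weighting matrix
grid = np.linspace(5.0, 30.0, 2501)    # candidate values, all above 4
q_vals = np.array([Q(nu, y, W) for nu in grid])
nu_hat = grid[np.argmin(q_vals)]
print(f"nu_hat = {nu_hat:.2f}, Q = {q_vals.min():.3e}")
```

With \(W=I\) the two moments are weighted equally even though the sample fourth moment is far noisier than the sample second moment, which is the motivation for choosing \(W_N\) more carefully.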

Optimal Weighting Matrix

The optimal weighting matrix, which minimizes the asymptotic variance of \(\hat{\theta}_N\), is \(W_N = \mathbb{S}^{-1}\), where

\[\mathbb{S}=\lim_{N\rightarrow\infty}N\,\mathbb{E}\{[g(\theta_0;X_N)][g(\theta_0;X_N)]'\}\]

Practical approach (iterative):

Start with \(W^{(0)}=I\).
Obtain \(\hat{\theta}^{(0)}\) by minimizing \(Q\) with \(W^{(0)}\).
Estimate \(\hat{\mathbb{S}}^{(0)}\) using \(\hat{\theta}^{(0)}\).
Update \(W^{(1)}=(\hat{\mathbb{S}}^{(0)})^{-1}\) and re-estimate to obtain \(\hat{\theta}^{(1)}\); stopping here gives the two-step GMM estimator.
Repeat until convergence (iterated GMM).
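The steps above can be sketched for a linear model with instruments, where minimizing \(Q\) has a closed form. This is an illustration under stated assumptions, not the notes' own implementation: the moment function \(h(\theta,w_i)=z_i(y_i-x_i'\theta)\), the estimator \(\hat{\mathbb{S}}=(1/N)\sum_i \hat{u}_i^2 z_i z_i'\) (which assumes no serial correlation), the simulated design, and all function names are hypothetical choices.

```python
import numpy as np

def linear_gmm(y, X, Z, W):
    """Closed-form minimizer of g'Wg when h_i = z_i (y_i - x_i' theta)."""
    XZ = X.T @ Z                                   # k x r
    return np.linalg.solve(XZ @ W @ XZ.T, XZ @ W @ (Z.T @ y))

def s_hat(y, X, Z, theta):
    """Estimate S as the average of u_i^2 z_i z_i' (no serial correlation)."""
    u = y - X @ theta
    Zu = Z * u[:, None]
    return Zu.T @ Zu / len(y)

def iterated_gmm(y, X, Z, max_iter=50, tol=1e-10):
    W = np.eye(Z.shape[1])                         # step 1: W^(0) = I
    theta = linear_gmm(y, X, Z, W)                 # step 2: theta^(0)
    for _ in range(max_iter):
        W = np.linalg.inv(s_hat(y, X, Z, theta))   # steps 3-4: update W
        theta_new = linear_gmm(y, X, Z, W)
        converged = np.max(np.abs(theta_new - theta)) < tol
        theta = theta_new
        if converged:                              # step 5: stop at convergence
            break
    return theta, W

# Simulated data: one endogenous regressor, three instruments, true theta = 2.
rng = np.random.default_rng(2)
N = 20_000
Z = rng.standard_normal((N, 3))
v = rng.standard_normal(N)
u = 0.8 * v + rng.standard_normal(N)               # correlated with x through v
x = Z @ np.array([1.0, 0.5, -0.5]) + v
y = 2.0 * x + u
X = x[:, None]

theta_gmm, W_opt = iterated_gmm(y, X, Z)
theta_ols = np.linalg.lstsq(X, y, rcond=None)[0]   # inconsistent here
print("GMM:", theta_gmm, "OLS:", theta_ols)
```

Setting `max_iter=1` performs a single update of the weighting matrix, which corresponds to the two-step estimator.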

Asymptotic Distribution

Under regularity conditions (consistency, CLT for \(g(\theta_0,X_N)\), differentiability):

\[\sqrt{N}(\hat{\theta}_N - \theta_0) \xrightarrow{d} \mathcal{N}(0,V)\]

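with \(V=(D\mathbb{S}^{-1}D')^{-1}\) when the optimal weighting matrix is used, where \(D'=\operatorname{plim}\,\partial g(\theta;X_N)/\partial\theta'\big|_{\theta=\hat{\theta}_N}\) is the \(r\times k\) Jacobian of the moment conditions.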
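In practice \(V\) is estimated by plugging in estimates of \(D\) and \(\mathbb{S}\). The sketch below assumes the Jacobian is approximated by central finite differences; the helper names `jacobian`, `gmm_avar`, `std_errors` and the toy linear moment function `g_linear` are hypothetical, and the value used for \(\hat{\mathbb{S}}\) is made up for the example.

```python
import numpy as np

def jacobian(g, theta, eps=1e-6):
    """Central finite-difference approximation of D' = dg/dtheta' (r x k)."""
    theta = np.asarray(theta, dtype=float)
    r = np.asarray(g(theta)).size
    D_prime = np.empty((r, theta.size))
    for j in range(theta.size):
        step = np.zeros_like(theta)
        step[j] = eps
        D_prime[:, j] = (g(theta + step) - g(theta - step)) / (2.0 * eps)
    return D_prime

def gmm_avar(D_prime, S):
    """Asymptotic variance V = (D S^{-1} D')^{-1} under optimal weighting."""
    return np.linalg.inv(D_prime.T @ np.linalg.solve(S, D_prime))

def std_errors(D_prime, S, N):
    """Standard errors of theta_hat: sqrt(diag(V) / N)."""
    return np.sqrt(np.diag(gmm_avar(D_prime, S)) / N)

# Toy example with r = 2 moments and k = 1 parameter: g(theta) = m - A theta.
A = np.array([[1.0], [0.5]])
m = np.array([2.0, 1.0])

def g_linear(theta):
    return m - A @ theta

D_prime = jacobian(g_linear, np.array([2.0]))   # approximately -A
S = 1.64 * np.eye(2)                            # made-up estimate of S
V = gmm_avar(D_prime, S)                        # 1.64 / 1.25 = 1.312
se = std_errors(D_prime, S, N=10_000)
print(V, se)
```

The division by \(N\) in `std_errors` reflects that \(V\) is the variance of \(\sqrt{N}(\hat{\theta}_N-\theta_0)\), not of \(\hat{\theta}_N\) itself.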

Sargan–Hansen \(J\)-test (Over-identification)

Hansen (1982) proposes a test for whether all \(r\) moment conditions hold simultaneously:

\[J_N = N\left[g(\hat{\theta}_N;X_N)\right]'\hat{\mathbb{S}}_N^{-1}\left[g(\hat{\theta}_N;X_N)\right] \xrightarrow{d} \chi^2_{(r-k)}\]

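The null hypothesis is \(H_0:\ \mathbb{E}\{h(\theta_0,w_i)\}=0\), and the test is only available when \(r>k\). Rejecting \(H_0\) is evidence that at least one moment condition is violated, in which case the GMM estimator is generally inconsistent for \(\theta_0\). The test does not indicate which condition fails.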
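A small simulated example illustrates how \(J_N\) behaves. The sketch below assumes a deliberately simple setup that is not from the notes: two observed series are each assumed to have the same mean \(\theta\), so \(h(\theta,w_i)=(y_{1i}-\theta,\ y_{2i}-\theta)'\) with \(r=2\) and \(k=1\). Function names are hypothetical, and \(\hat{\mathbb{S}}\) is computed from first-step residuals under an i.i.d. assumption.

```python
import numpy as np

def efficient_gmm_common_mean(Y):
    """Two-step GMM for a common mean theta, with h_i = y_i - theta."""
    N, r = Y.shape
    ones = np.ones(r)
    ybar = Y.mean(axis=0)
    theta0 = ybar.mean()               # step 1: W = I gives the plain average
    H = Y - theta0                     # first-step moment functions
    S = H.T @ H / N                    # estimate of S (i.i.d. assumption)
    S_inv_ones = np.linalg.solve(S, ones)
    theta1 = (S_inv_ones @ ybar) / (S_inv_ones @ ones)  # step 2: W = S^{-1}
    return theta1, S

def j_statistic(Y, theta, S):
    """J_N = N g' S^{-1} g and its degrees of freedom r - k (k = 1 here)."""
    N, r = Y.shape
    g = Y.mean(axis=0) - theta
    J = N * g @ np.linalg.solve(S, g)
    return J, r - 1

rng = np.random.default_rng(3)
N = 5000
Y_valid = 1.0 + rng.standard_normal((N, 2))       # both means equal 1
Y_invalid = Y_valid + np.array([0.0, 1.0])        # second mean shifted to 2

theta_v, S_v = efficient_gmm_common_mean(Y_valid)
J_valid, df = j_statistic(Y_valid, theta_v, S_v)

theta_i, S_i = efficient_gmm_common_mean(Y_invalid)
J_invalid, _ = j_statistic(Y_invalid, theta_i, S_i)

CRIT_5PCT_DF1 = 3.841   # 5% critical value of chi-square with 1 d.o.f.
print(f"valid: J = {J_valid:.2f}; invalid: J = {J_invalid:.2f}; df = {df}")
```

When the second mean is shifted, the two conditions cannot both hold for any single \(\theta\), and \(J_N\) far exceeds the critical value. A p-value for general degrees of freedom can be obtained from a chi-square survival function, for example `scipy.stats.chi2.sf(J, df)` if SciPy is installed.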

Special Cases of GMM

Many standard estimators are special cases of GMM:

OLS, IV, 2SLS
Nonlinear simultaneous equations estimators
Many cases of MLE
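As an illustration, the moment functions behind some of these cases can be written out in their standard textbook forms. The notation here is an assumption added for the example: \(y_i\) is the dependent variable, \(x_i\) the \(k\times 1\) regressors, \(z_i\) the \(r\times 1\) instruments, \(X\), \(Z\), \(y\) the stacked data, and \(f(w_i;\theta)\) the density.

OLS (\(r=k\), regressors serve as their own instruments):

\[h(\theta,w_i)=x_i(y_i-x_i'\theta) \quad\Rightarrow\quad \hat{\theta}_{\text{OLS}}=\Big(\sum_i x_i x_i'\Big)^{-1}\sum_i x_i y_i\]

IV (\(r=k\)):

\[h(\theta,w_i)=z_i(y_i-x_i'\theta) \quad\Rightarrow\quad \hat{\theta}_{\text{IV}}=\Big(\sum_i z_i x_i'\Big)^{-1}\sum_i z_i y_i\]

2SLS (\(r>k\), same moment function as IV, with \(W_N=\big(\tfrac{1}{N}\sum_i z_i z_i'\big)^{-1}\)):

\[\hat{\theta}_{\text{2SLS}}=\left[X'Z(Z'Z)^{-1}Z'X\right]^{-1}X'Z(Z'Z)^{-1}Z'y\]

MLE (\(r=k\), the score is the moment function):

\[h(\theta,w_i)=\frac{\partial \log f(w_i;\theta)}{\partial\theta}\]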