18 D3. Panel Data Theory
Detailed notes on panel data theory (Fixed Effects, Random Effects, GMM-based dynamic panels) will be added here. See the Overview page for slides and a summary of topics covered.
18.1 About
Topics covered:
- One-way error component model: \(y_{it} = x_{it}'\beta + \alpha_i + u_{it}\)
- Fixed Effects (FE / within) estimator: demeaning to eliminate \(\alpha_i\)
- Random Effects (RE / GLS) estimator: feasible GLS
- Hausman test: FE vs. RE
- First-difference estimator
- Dynamic panels: Arellano-Bond GMM
18.2 Lecture Notes
18.3 The One-Way Error Component Model
The standard panel data model for \(i = 1,\ldots,N\) individuals and \(t = 1,\ldots,T\) periods:
\[y_{it} = x_{it}'\beta + \alpha_i + u_{it}\]
where:
- \(\alpha_i\) is an individual-specific effect: time-invariant unobserved heterogeneity
- \(u_{it}\) is an idiosyncratic error satisfying strict exogeneity, \(E[u_{it}\mid x_{i1},\ldots,x_{iT},\alpha_i] = 0\)
- \(x_{it}\) is a \(k \times 1\) vector of covariates
The key question is whether \(\alpha_i\) is correlated with \(x_{it}\) (Fixed Effects) or not (Random Effects).
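To see why the distinction matters, the sketch below simulates the one-way model with one regressor under a hypothetical data-generating process, \(x_{it} = c\,\alpha_i + e_{it}\) with unit-variance normal \(\alpha_i\), \(e_{it}\), \(u_{it}\), and fits pooled OLS, which lumps \(\alpha_i\) into the error. With \(c = 1\), \(\text{Cov}(x_{it},\alpha_i)/\text{Var}(x_{it}) = 1/2\), so the pooled slope should settle near \(\beta + 0.5\) rather than \(\beta\). The function names and the DGP are illustrative assumptions, not part of the notes.

```python
import numpy as np

def simulate_panel(n, t, beta, c, seed):
    """Hypothetical DGP: y_it = beta*x_it + alpha_i + u_it, with x_it = c*alpha_i + e_it."""
    rng = np.random.default_rng(seed)
    alpha = rng.normal(size=n)                        # individual effects
    x = c * alpha[:, None] + rng.normal(size=(n, t))  # c != 0 makes x correlated with alpha
    u = rng.normal(size=(n, t))                       # idiosyncratic errors
    y = beta * x + alpha[:, None] + u
    return y, x

def pooled_ols_slope(y, x):
    """OLS of y_it on (1, x_it), treating alpha_i + u_it as the error term."""
    X = np.column_stack([np.ones(x.size), x.ravel()])
    coef, *_ = np.linalg.lstsq(X, y.ravel(), rcond=None)
    return coef[1]

y0, x0 = simulate_panel(n=2000, t=5, beta=1.0, c=0.0, seed=0)  # alpha uncorrelated with x
y1, x1 = simulate_panel(n=2000, t=5, beta=1.0, c=1.0, seed=0)  # alpha correlated with x
slope_uncorr = pooled_ols_slope(y0, x0)  # should be near 1.0
slope_corr = pooled_ols_slope(y1, x1)    # should be near 1.5, not 1.0
```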
18.4 Fixed Effects (Within) Estimator
Assumption: no restriction is placed on \(E[\alpha_i \mid x_{i1},\ldots,x_{iT}]\), so \(\alpha_i\) may be arbitrarily correlated with \(x_{it}\). Consistency instead relies on strict exogeneity of \(u_{it}\).
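Strategy: demean to eliminate \(\alpha_i\). Let \(\bar{y}_i = T^{-1}\sum_{t} y_{it}\) (and similarly \(\bar{x}_i\), \(\bar{u}_i\)), and define \(\ddot{y}_{it} = y_{it} - \bar{y}_i\), \(\ddot{x}_{it} = x_{it} - \bar{x}_i\), \(\ddot{u}_{it} = u_{it} - \bar{u}_i\). Because \(\alpha_i\) is constant over time it equals its own time average and drops out: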
\[\ddot{y}_{it} = \ddot{x}_{it}'\beta + \ddot{u}_{it}\]
The FE (within) estimator is OLS on the demeaned data: \[\hat\beta_{FE} = \left(\sum_i \sum_t \ddot{x}_{it}\ddot{x}_{it}'\right)^{-1}\sum_i\sum_t \ddot{x}_{it}\ddot{y}_{it}\]
Note: time-invariant variables are collinear with \(\alpha_i\): demeaning sets them identically to zero, so their coefficients are not identified under FE.
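A minimal NumPy sketch of the within estimator for a balanced panel, assuming the data are stored as an array `y` of shape (N, T) and `X` of shape (N, T, K) (a layout chosen here for illustration). It demeans each individual's series and solves the normal equations from the formula above. The hypothetical simulation lets \(\alpha_i\) enter both regressors, a setting where pooled OLS is inconsistent but FE is not.

```python
import numpy as np

def within_estimator(y, X):
    """FE (within) estimator for a balanced panel.
    y: (N, T) outcomes; X: (N, T, K) time-varying regressors."""
    n, t, k = X.shape
    y_dd = y - y.mean(axis=1, keepdims=True)   # y_it - ybar_i
    X_dd = X - X.mean(axis=1, keepdims=True)   # x_it - xbar_i
    Xm = X_dd.reshape(n * t, k)
    ym = y_dd.reshape(n * t)
    # (sum_i sum_t xdd xdd')^{-1} (sum_i sum_t xdd ydd)
    return np.linalg.solve(Xm.T @ Xm, Xm.T @ ym)

# Hypothetical example: alpha_i loads on both regressors, so Cov(alpha_i, x_it) != 0.
rng = np.random.default_rng(1)
n, t = 2000, 5
beta = np.array([1.0, -0.5])
alpha = rng.normal(size=n)
X = alpha[:, None, None] + rng.normal(size=(n, t, 2))
y = X @ beta + alpha[:, None] + rng.normal(size=(n, t))
beta_fe = within_estimator(y, X)   # should be near [1.0, -0.5]
```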
18.5 Random Effects (GLS) Estimator
Assumption: \(E[\alpha_i \mid x_{i1},\ldots,x_{iT}] = 0\), so \(\alpha_i\) is mean-independent of (and hence uncorrelated with) the regressors in every period.
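Write \(v_{it} = \alpha_i + u_{it}\). Assuming further that \(\alpha_i\) and \(u_{it}\) are uncorrelated and homoskedastic with variances \(\sigma_\alpha^2\) and \(\sigma_u^2\), and that \(u_{it}\) is serially uncorrelated, the composite error has an equicorrelated covariance structure: \[\text{Cov}(v_{it},v_{is}) = \sigma_\alpha^2 + \sigma_u^2\,\mathbf{1}[t=s]\]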
The RE estimator is GLS applied to this structure. Feasible GLS replaces the unknown variance components \(\sigma_\alpha^2\) and \(\sigma_u^2\) with consistent first-step estimates.
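For a balanced panel, GLS under this structure reduces to OLS on quasi-demeaned data. Writing \(\iota_T\) for a \(T \times 1\) vector of ones and \(P = \iota_T\iota_T'/T\), the covariance matrix of \(v_i = (v_{i1},\ldots,v_{iT})'\) and its scaled inverse square root are:

\[\Omega = \sigma_u^2 I_T + \sigma_\alpha^2\,\iota_T\iota_T' = \sigma_u^2\,(I_T - P) + (\sigma_u^2 + T\sigma_\alpha^2)\,P\]

\[\sigma_u\,\Omega^{-1/2} = I_T - \theta P, \qquad \theta = 1 - \sqrt{\frac{\sigma_u^2}{\sigma_u^2 + T\sigma_\alpha^2}}\]

Premultiplying each individual's equation by \(I_T - \theta P\) gives OLS of \(y_{it} - \theta\bar{y}_i\) on \(x_{it} - \theta\bar{x}_i\), with the intercept column becoming \(1 - \theta\). The limits are informative: \(\sigma_\alpha^2 = 0\) gives \(\theta = 0\) (pooled OLS), and \(\theta \to 1\) as \(T\sigma_\alpha^2/\sigma_u^2 \to \infty\) (the within estimator).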
Advantages over FE: time-invariant regressors are identified, and RE is more efficient than FE when the RE assumption holds.
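A feasible-GLS sketch for a balanced panel, based on the quasi-demeaning transform \(y_{it} - \theta\bar{y}_i\) with \(\theta = 1 - \sqrt{\sigma_u^2/(\sigma_u^2 + T\sigma_\alpha^2)}\). The variance components come from a within regression (for \(\sigma_u^2\)) and a between regression on individual means (whose residual variance estimates \(\sigma_\alpha^2 + \sigma_u^2/T\)). This Swamy-Arora-style choice is one of several in use and, like the function name and array layout, is an assumption of the sketch. For simplicity it also assumes every regressor varies over time (the within step would otherwise be singular), even though RE itself can handle time-invariant regressors.

```python
import numpy as np

def random_effects(y, X):
    """Feasible GLS random-effects estimator for a balanced panel.
    y: (N, T); X: (N, T, K). Returns (coef, theta), coef = [intercept, beta_1..beta_K]."""
    n, t, k = X.shape
    ybar, xbar = y.mean(axis=1), X.mean(axis=1)          # individual means
    # Within regression: residual variance estimates sigma_u^2
    Xw = (X - xbar[:, None, :]).reshape(n * t, k)
    yw = (y - ybar[:, None]).reshape(n * t)
    b_w = np.linalg.solve(Xw.T @ Xw, Xw.T @ yw)
    sigma_u2 = np.sum((yw - Xw @ b_w) ** 2) / (n * (t - 1) - k)
    # Between regression on means: residual variance estimates sigma_a^2 + sigma_u^2 / T
    Xb = np.column_stack([np.ones(n), xbar])
    b_b, *_ = np.linalg.lstsq(Xb, ybar, rcond=None)
    s_between = np.sum((ybar - Xb @ b_b) ** 2) / (n - k - 1)
    sigma_a2 = max(s_between - sigma_u2 / t, 0.0)        # truncate at zero
    # Quasi-demean and run OLS; the intercept column becomes (1 - theta)
    theta = 1.0 - np.sqrt(sigma_u2 / (sigma_u2 + t * sigma_a2))
    yq = (y - theta * ybar[:, None]).reshape(n * t)
    Xq = (X - theta * xbar[:, None, :]).reshape(n * t, k)
    Z = np.column_stack([np.full(n * t, 1.0 - theta), Xq])
    coef, *_ = np.linalg.lstsq(Z, yq, rcond=None)
    return coef, theta

# Hypothetical example: sigma_alpha^2 = sigma_u^2 = 1, alpha independent of X.
rng = np.random.default_rng(2)
n, t = 2000, 5
beta = np.array([1.0, -0.5])
alpha = rng.normal(size=n)
X = rng.normal(size=(n, t, 2))
y = 2.0 + X @ beta + alpha[:, None] + rng.normal(size=(n, t))
coef_re, theta = random_effects(y, X)
# population theta here is 1 - sqrt(1 / (1 + 5)), about 0.592
```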
18.6 Hausman Test
The Hausman (1978) test checks \(H_0\): RE is consistent (i.e., \(\text{Cov}(\alpha_i, x_{it})=0\)).
\[H = (\hat\beta_{FE}-\hat\beta_{RE})'\left[\widehat{\text{Var}}(\hat\beta_{FE})-\widehat{\text{Var}}(\hat\beta_{RE})\right]^{-1}(\hat\beta_{FE}-\hat\beta_{RE}) \xrightarrow{d} \chi^2_k\]
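Here \(k\) is the number of coefficients being compared, i.e. the time-varying regressors that FE can estimate. Under \(H_0\), both FE and RE are consistent but RE is efficient, so their difference should be small. Under \(H_1\), only FE is consistent, so a large value of \(H\) is evidence against RE.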
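The sketch below computes \(H\) on simulated data, once with \(\alpha_i\) independent of the regressors (\(H_0\) true) and once with \(\alpha_i\) loading on them (\(H_1\) true). It assumes the classical case (homoskedastic, serially uncorrelated \(u_{it}\)) and uses the same \(\hat\sigma_u^2\) in both variance matrices, which keeps \(\widehat{\text{Var}}(\hat\beta_{FE}) - \widehat{\text{Var}}(\hat\beta_{RE})\) positive semidefinite; with other variance estimators that difference need not be. The RE step uses a Swamy-Arora-style variance split, and the function name, DGP, and array layout are illustrative assumptions. For general \(k\), a p-value could come from `scipy.stats.chi2.sf(H, k)`; the demo avoids SciPy by using the 5% critical value of \(\chi^2_2\), \(-2\ln 0.05 \approx 5.991\).

```python
import numpy as np

def fe_re_hausman(y, X):
    """Hausman statistic for FE vs RE slopes in a balanced panel.
    y: (N, T); X: (N, T, K) time-varying regressors. Returns (H, K)."""
    n, t, k = X.shape
    ybar, xbar = y.mean(axis=1), X.mean(axis=1)
    # FE: OLS on demeaned data, classical variance sigma_u^2 (Xw'Xw)^{-1}
    Xw = (X - xbar[:, None, :]).reshape(n * t, k)
    yw = (y - ybar[:, None]).reshape(n * t)
    b_fe = np.linalg.solve(Xw.T @ Xw, Xw.T @ yw)
    sigma_u2 = np.sum((yw - Xw @ b_fe) ** 2) / (n * (t - 1) - k)
    V_fe = sigma_u2 * np.linalg.inv(Xw.T @ Xw)
    # Between regression: residual variance estimates sigma_a^2 + sigma_u^2 / T
    Xb = np.column_stack([np.ones(n), xbar])
    b_b, *_ = np.linalg.lstsq(Xb, ybar, rcond=None)
    s_between = np.sum((ybar - Xb @ b_b) ** 2) / (n - k - 1)
    sigma_a2 = max(s_between - sigma_u2 / t, 0.0)
    theta = 1.0 - np.sqrt(sigma_u2 / (sigma_u2 + t * sigma_a2))
    # RE: OLS on quasi-demeaned data; the transformed error has variance sigma_u^2
    Z = np.column_stack([np.full(n * t, 1.0 - theta),
                         (X - theta * xbar[:, None, :]).reshape(n * t, k)])
    yq = (y - theta * ybar[:, None]).reshape(n * t)
    ZtZ_inv = np.linalg.inv(Z.T @ Z)
    b_re = (ZtZ_inv @ (Z.T @ yq))[1:]        # drop the intercept
    V_re = sigma_u2 * ZtZ_inv[1:, 1:]
    d = b_fe - b_re
    H = float(d @ np.linalg.solve(V_fe - V_re, d))
    return H, k

rng = np.random.default_rng(3)
n, t = 1000, 5
beta = np.array([1.0, -0.5])
h_stats = {}
for c in (0.0, 1.0):                         # c = 0: H0 true; c = 1: alpha correlated with X
    alpha = rng.normal(size=n)
    X = c * alpha[:, None, None] + rng.normal(size=(n, t, 2))
    y = X @ beta + alpha[:, None] + rng.normal(size=(n, t))
    h_stats[c], df = fe_re_hausman(y, X)
CRIT_5PCT_DF2 = 5.991                        # chi^2_2 95% quantile = -2 ln(0.05)
reject = {c: h > CRIT_5PCT_DF2 for c, h in h_stats.items()}
```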
18.7 First Difference Estimator
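An alternative to within-demeaning is to subtract the period-\(t-1\) equation from the period-\(t\) equation, where \(\Delta y_{it} = y_{it} - y_{i,t-1}\): \[\Delta y_{it} = \Delta x_{it}'\beta + \Delta u_{it}\]
Differencing also eliminates \(\alpha_i\), and the FD estimator is numerically identical to FE when \(T=2\). For \(T > 2\) the two differ: FE is more efficient when \(u_{it}\) is serially uncorrelated, while FD is preferred when \(u_{it}\) is strongly positively serially correlated (for example a random walk, in which case \(\Delta u_{it}\) is serially uncorrelated).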
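A short NumPy sketch of the FD estimator (OLS of \(\Delta y_{it}\) on \(\Delta x_{it}\), no intercept) alongside a compact within estimator, on hypothetical simulated data with \(\alpha_i\) correlated with the regressors. With \(T = 2\), the demeaned regressors are \(\pm\Delta x_{i2}/2\), so the two estimators solve the same normal equations up to a factor of \(1/2\) on both sides and coincide up to floating-point rounding; with \(T = 5\) both remain consistent but generally differ. The function names and array layout are assumptions of the sketch.

```python
import numpy as np

def first_difference(y, X):
    """FD estimator: OLS of dy_it = y_it - y_{i,t-1} on dx_it, no intercept.
    y: (N, T); X: (N, T, K)."""
    dy = np.diff(y, axis=1)                  # shape (N, T-1)
    dX = np.diff(X, axis=1)                  # shape (N, T-1, K)
    n, m, k = dX.shape
    dXm, dym = dX.reshape(n * m, k), dy.reshape(n * m)
    return np.linalg.solve(dXm.T @ dXm, dXm.T @ dym)

def within_estimator(y, X):
    """FE (within) estimator, for comparison."""
    n, t, k = X.shape
    Xm = (X - X.mean(axis=1, keepdims=True)).reshape(n * t, k)
    ym = (y - y.mean(axis=1, keepdims=True)).reshape(n * t)
    return np.linalg.solve(Xm.T @ Xm, Xm.T @ ym)

rng = np.random.default_rng(4)
n = 2000
beta = np.array([1.0, -0.5])
results = {}
for t in (2, 5):
    alpha = rng.normal(size=n)
    X = alpha[:, None, None] + rng.normal(size=(n, t, 2))  # alpha correlated with X
    y = X @ beta + alpha[:, None] + rng.normal(size=(n, t))
    results[t] = (first_difference(y, X), within_estimator(y, X))
# results[2]: FD and FE agree up to rounding; results[5]: both near beta, not identical
```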
18.8 Dynamic Panel Models
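When \(y_{i,t-1}\) is included as a regressor, FE is inconsistent as \(N \to \infty\) with \(T\) fixed: the demeaned error contains \(\bar{u}_i\), which includes \(u_{i,t-1}\), and \(u_{i,t-1}\) also determines \(y_{i,t-1}\), so the demeaned lag is correlated with the demeaned error (Nickell bias, of order \(O(1/T)\)). The Arellano-Bond (1991) GMM estimator first-differences the equation to remove \(\alpha_i\) and uses lagged levels \(y_{i,t-2}, y_{i,t-3}, \ldots\) as instruments for \(\Delta y_{i,t-1}\); these instruments are valid when \(u_{it}\) is serially uncorrelated.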
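The sketch below illustrates both points for a pure AR(1) panel without covariates, \(y_{it} = \rho\,y_{i,t-1} + \alpha_i + u_{it}\), on hypothetical simulated data. It compares the within estimator with a one-step Arellano-Bond GMM estimator: each differenced equation \(\Delta y_{it} = \rho\,\Delta y_{i,t-1} + \Delta u_{it}\) uses all available lagged levels \(y_{i0},\ldots,y_{i,t-2}\) as instruments, and the moments are weighted by \((\sum_i Z_i' G Z_i)^{-1}\), where \(G\) has 2 on the diagonal and \(-1\) on the first off-diagonals (the covariance pattern of \(\Delta u_{it}\) under i.i.d. errors, up to scale). Periods are indexed from 0; the initial condition, function names, and parameter values are assumptions, and no standard errors or two-step weighting are computed.

```python
import numpy as np

def simulate_ar1_panel(n, t, rho, seed):
    """Hypothetical DGP: y_it = rho*y_{i,t-1} + alpha_i + u_it, periods 0..t-1."""
    rng = np.random.default_rng(seed)
    alpha = rng.normal(size=n)
    y = np.empty((n, t))
    y[:, 0] = alpha / (1 - rho) + rng.normal(size=n)   # assumed start near long-run mean
    for s in range(1, t):
        y[:, s] = rho * y[:, s - 1] + alpha + rng.normal(size=n)
    return y

def fe_ar1(y):
    """Within estimator of rho (subject to Nickell bias for small T)."""
    ycur, ylag = y[:, 1:], y[:, :-1]
    ycur = ycur - ycur.mean(axis=1, keepdims=True)
    ylag = ylag - ylag.mean(axis=1, keepdims=True)
    return np.sum(ylag * ycur) / np.sum(ylag ** 2)

def arellano_bond_ar1(y):
    """One-step Arellano-Bond GMM estimate of rho (balanced panel, periods 0..T-1)."""
    n, T = y.shape
    dy = np.diff(y, axis=1)                      # dy[:, s] = y_{s+1} - y_s
    m = (T - 2) * (T - 1) // 2                   # total number of instruments
    G = 2 * np.eye(T - 2) - np.eye(T - 2, k=1) - np.eye(T - 2, k=-1)
    A, zx, zy = np.zeros((m, m)), np.zeros(m), np.zeros(m)
    for i in range(n):
        Z = np.zeros((T - 2, m))                 # block-diagonal instrument matrix
        col = 0
        for r, t in enumerate(range(2, T)):      # differenced equation for period t
            Z[r, col:col + t - 1] = y[i, :t - 1] # levels y_0 .. y_{t-2}
            col += t - 1
        A += Z.T @ G @ Z
        zx += Z.T @ dy[i, :T - 2]                # regressor: dy_{t-1}, t = 2..T-1
        zy += Z.T @ dy[i, 1:]                    # outcome:   dy_t,     t = 2..T-1
    W = np.linalg.inv(A)
    return (zx @ W @ zy) / (zx @ W @ zx)

y = simulate_ar1_panel(n=2000, t=6, rho=0.5, seed=5)
rho_fe = fe_ar1(y)              # noticeably below 0.5 (Nickell bias)
rho_ab = arellano_bond_ar1(y)   # should be close to 0.5
```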
18.9 References
Cameron and Trivedi (2005), chapters 21–22. Hansen (2022), chapter 17.