System GMM (Blundell-Bond)¶

Quick Reference

Class: panelbox.gmm.SystemGMM Import: from panelbox.gmm import SystemGMM Stata equivalent: xtabond2 y L.y x1 x2, gmm(y, lag(2 .)) iv(x1 x2) R equivalent: pgmm(y ~ lag(y, 1) + x1 + x2 | lag(y, 2:99), transformation = "ld")

Overview¶

System GMM, proposed by Blundell and Bond (1998), extends Difference GMM by combining two sets of equations in a stacked system: the first-differenced equations (with lagged levels as instruments) and the level equations (with lagged differences as instruments). This additional set of moment conditions addresses the weak instruments problem that affects Difference GMM when the dependent variable is highly persistent.

When the autoregressive coefficient \(\gamma\) approaches 1, lagged levels become poor predictors of first-differences, leading to large standard errors and imprecise estimates in Difference GMM. System GMM exploits the additional moment condition \(E[\Delta y_{i,t-1} \cdot (\alpha_i + \varepsilon_{it})] = 0\) to provide stronger instruments for the level equation, typically reducing standard errors by 20-50%.

The trade-off is an additional assumption: the stationarity of initial conditions, requiring that the initial deviations from steady state are uncorrelated with the fixed effects. This assumption is plausible when the panel data comes from an ongoing process observed well after its start.

Quick Example¶

from panelbox.gmm import SystemGMM
from panelbox.datasets import load_abdata

data = load_abdata()

model = SystemGMM(
    data=data,
    dep_var="n",
    lags=1,
    id_var="id",
    time_var="year",
    exog_vars=["w", "k"],
    collapse=True,
    two_step=True,
    robust=True,
    level_instruments={"max_lags": 1},
)
results = model.fit()
print(results.summary())

When to Use¶

Highly persistent series: AR coefficient \(\gamma > 0.8\) where Difference GMM instruments are weak
Small T, large N: Short panels where efficiency gains matter
Stationarity is plausible: The panel does not start at a special event (e.g., firm entry, policy change)
Difference GMM has large SEs: Standard errors from Difference GMM are much larger than expected

Key Assumptions

All Difference GMM assumptions, plus:

Stationarity of initial conditions: \(E[\Delta y_{i,1} \cdot \alpha_i] = 0\)
This requires the process generating \(y_{it}\) started long before the first observation
Violated when the panel begins at firm entry, policy implementation, or other event times

Detailed Guide¶

The Weak Instruments Problem¶

When \(y_{it}\) is highly persistent (\(\gamma\) close to 1):

\(\Delta y_{it} \approx 0\) (differences are small)
Lagged levels \(y_{i,t-2}\) are poor predictors of \(\Delta y_{i,t-1}\)
Instruments are weak, leading to large standard errors and biased estimates

The System GMM Solution¶

System GMM stacks two sets of equations:

1. Difference equations (same as Arellano-Bond):

\[\Delta y_{it} = \gamma \Delta y_{i,t-1} + \Delta X_{it}'\beta + \Delta \varepsilon_{it}\]

Instruments: lagged levels \(y_{i,t-2}, y_{i,t-3}, \ldots\)

2. Level equations (additional):

\[y_{it} = \gamma y_{i,t-1} + X_{it}'\beta + \alpha_i + \varepsilon_{it}\]

Instruments: lagged differences \(\Delta y_{i,t-1}\)

The additional moment condition for the level equation is:

\[E[\Delta y_{i,t-1} \cdot (\alpha_i + \varepsilon_{it})] = 0\]

The Bounds Rule¶

A useful validation check: the true coefficient should satisfy:

\[\hat{\gamma}_{FE} < \gamma_{true} < \hat{\gamma}_{OLS}\]

OLS overestimates \(\gamma\) (omitted variable bias from \(\alpha_i\))
FE underestimates \(\gamma\) (Nickell bias)
A valid GMM estimate should fall between these bounds

# Validate with bounds check
from panelbox.gmm import GMMOverfitDiagnostic

diag = GMMOverfitDiagnostic(model, results)
bounds = diag.coefficient_bounds_test()
print(f"OLS (upper): {bounds['ols_coef']:.4f}")
print(f"GMM:         {bounds['gmm_coef']:.4f}")
print(f"FE (lower):  {bounds['fe_coef']:.4f}")
print(f"Within bounds: {bounds['within_bounds']}")

Data Preparation¶

System GMM accepts the same data format as Difference GMM. No special preparation is needed beyond ensuring correct panel structure.

import pandas as pd
from panelbox.datasets import load_abdata

data = load_abdata()
print(f"N = {data['id'].nunique()}, T = {data['year'].nunique()}")

Estimation¶

from panelbox.gmm import SystemGMM

model = SystemGMM(
    data=data,
    dep_var="n",
    lags=1,
    id_var="id",
    time_var="year",
    exog_vars=["w", "k"],
    collapse=True,
    two_step=True,
    robust=True,
    level_instruments={"max_lags": 1},  # Use only first lag of differences
)
results = model.fit()

Comparing Difference vs System GMM¶

from panelbox.gmm import DifferenceGMM, SystemGMM

# Estimate both
diff_model = DifferenceGMM(
    data=data, dep_var="n", lags=1, id_var="id", time_var="year",
    exog_vars=["w", "k"], collapse=True, two_step=True, robust=True,
)
diff_results = diff_model.fit()

sys_model = SystemGMM(
    data=data, dep_var="n", lags=1, id_var="id", time_var="year",
    exog_vars=["w", "k"], collapse=True, two_step=True, robust=True,
    level_instruments={"max_lags": 1},
)
sys_results = sys_model.fit()

# Compare
coef = "L1.n"
diff_se = diff_results.std_errors[coef]
sys_se = sys_results.std_errors[coef]
efficiency_gain = (diff_se - sys_se) / diff_se * 100

print(f"Difference GMM: {diff_results.params[coef]:.4f} (SE: {diff_se:.4f})")
print(f"System GMM:     {sys_results.params[coef]:.4f} (SE: {sys_se:.4f})")
print(f"Efficiency gain: {efficiency_gain:.1f}% SE reduction")

Interpreting Results¶

System GMM results include the same diagnostics as Difference GMM, plus the Difference-in-Hansen test for the validity of level instruments.

# Standard diagnostics
print(f"AR(2) p-value: {results.ar2_test.pvalue:.4f}")
print(f"Hansen J p-value: {results.hansen_j.pvalue:.4f}")
print(f"Instruments: {results.n_instruments}, Ratio: {results.instrument_ratio:.3f}")

# System GMM-specific: Difference-in-Hansen test
if results.diff_hansen is not None:
    print(f"Diff-in-Hansen p-value: {results.diff_hansen.pvalue:.4f}")
    if results.diff_hansen.pvalue > 0.10:
        print("Level instruments appear valid")
    else:
        print("Level instruments rejected -- use Difference GMM instead")

Configuration Options¶

System GMM inherits all parameters from Difference GMM, plus:

Parameter	Type	Default	Description
`level_instruments`	`dict`	`{"max_lags": 1}`	Configuration for level equation instruments

The level_instruments dictionary controls the depth of lagged differences used as instruments for the level equation:

{"max_lags": 1} -- Use only \(\Delta y_{i,t-1}\) (most conservative, recommended)
{"max_lags": 2} -- Use \(\Delta y_{i,t-1}\) and \(\Delta y_{i,t-2}\)
Deeper lags rarely improve efficiency

Diagnostics¶

Decision Flowchart¶

Both ValidOnly Difference ValidOnly System ValidNeither Valid

If both Difference and System GMM pass diagnostics, prefer System GMM when it has substantially smaller standard errors (\(> 10\%\) reduction).

If System GMM fails the Difference-in-Hansen test (p < 0.10), the stationarity assumption is violated. Use Difference GMM.

If Difference GMM has very large SEs but System GMM diagnostics pass, the series may be too persistent for Difference GMM. Use System GMM.

Revisit the model specification. Consider adding lags, changing exogeneity assumptions, or reducing instruments.

System GMM Diagnostic Checklist¶

Test	Criterion	Action if Failed
AR(2)	p > 0.10	Add more lags, check specification
Hansen J	0.10 < p < 0.25	If too high: reduce instruments
Diff-in-Hansen	p > 0.10	If rejected: use Difference GMM
Instrument ratio	< 1.0	Use `collapse=True`
Coefficient bounds	FE < GMM < OLS	Check for overfitting

For comprehensive diagnostic guidance, see GMM Diagnostics.

Tutorials¶

Tutorial	Description	Link
Complete GMM Guide	Step-by-step applied tutorial	Complete Guide
Instrument Selection	Managing instruments in System GMM	Instruments

References¶

Blundell, R., & Bond, S. (1998). "Initial Conditions and Moment Restrictions in Dynamic Panel Data Models." Journal of Econometrics, 87(1), 115-143.
Arellano, M., & Bond, S. (1991). "Some Tests of Specification for Panel Data: Monte Carlo Evidence and an Application to Employment Equations." Review of Economic Studies, 58(2), 277-297.
Roodman, D. (2009). "How to do xtabond2: An Introduction to Difference and System GMM in Stata." The Stata Journal, 9(1), 86-136.
Bond, S. R., Hoeffler, A., & Temple, J. (2001). "GMM Estimation of Empirical Growth Models." Economics Papers 2001-W21, Nuffield College, University of Oxford.
Windmeijer, F. (2005). "A Finite Sample Correction for the Variance of Linear Efficient Two-Step GMM Estimators." Journal of Econometrics, 126(1), 25-51.