GMM API Reference¶

Module

Import: from panelbox.gmm import DifferenceGMM, SystemGMM, ContinuousUpdatedGMM, BiasCorrectedGMM Source: panelbox/gmm/

Overview¶

The GMM module implements dynamic panel data estimators using the Generalized Method of Moments:

Estimator	Description	Reference
`DifferenceGMM`	Arellano-Bond first-difference GMM	Arellano & Bond (1991)
`SystemGMM`	Blundell-Bond system GMM (difference + level)	Blundell & Bond (1998)
`ContinuousUpdatedGMM`	Continuously-Updated Estimator	Hansen, Heaton & Yaron (1996)
`BiasCorrectedGMM`	Analytical bias correction	Kiviet (1995)

All estimators support one-step and two-step estimation with Windmeijer (2005) finite-sample correction.

Classes¶

DifferenceGMM¶

Arellano-Bond (1991) Difference GMM estimator. Uses lagged levels as instruments for the first-differenced equation.

Constructor¶

DifferenceGMM(
    data: pd.DataFrame,
    dep_var: str,
    lags: int | list[int],
    id_var: str = "id",
    time_var: str = "year",
    exog_vars: list[str] | None = None,
    endogenous_vars: list[str] | None = None,
    predetermined_vars: list[str] | None = None,
    time_dummies: bool = True,
    collapse: bool = False,
    two_step: bool = True,
    robust: bool = True,
    gmm_type: str = "two_step",
    gmm_max_lag: int | None = None,
    iv_max_lag: int = 0,
)

Parameter	Type	Default	Description
`data`	`pd.DataFrame`	required	Panel data
`dep_var`	`str`	required	Dependent variable name
`lags`	`int \\| list[int]`	required	Lag order(s) for the dependent variable
`id_var`	`str`	`"id"`	Entity identifier column
`time_var`	`str`	`"year"`	Time period column
`exog_vars`	`list[str] \\| None`	`None`	Strictly exogenous regressors
`endogenous_vars`	`list[str] \\| None`	`None`	Endogenous regressors (instrumented like dep_var)
`predetermined_vars`	`list[str] \\| None`	`None`	Predetermined regressors (weakly exogenous)
`time_dummies`	`bool`	`True`	Include time dummy variables
`collapse`	`bool`	`False`	Collapse instrument matrix (reduce instrument count)
`two_step`	`bool`	`True`	Use two-step estimation with Windmeijer correction
`robust`	`bool`	`True`	Robust standard errors
`gmm_type`	`str`	`"two_step"`	GMM estimation type
`gmm_max_lag`	`int \\| None`	`None`	Maximum lag for GMM-style instruments
`iv_max_lag`	`int`	`0`	Maximum lag for IV-style instruments

Methods¶

`.fit()`¶

def fit(self) -> GMMResults

Estimate the model and return GMMResults.

Example¶

from panelbox import DifferenceGMM, load_abdata

data = load_abdata()
model = DifferenceGMM(
    data=data,
    dep_var="n",
    lags=[1, 2],
    exog_vars=["w", "k"],
    id_var="id",
    time_var="year",
    two_step=True,
    collapse=False,
    time_dummies=True,
)
results = model.fit()
print(results.summary())

SystemGMM¶

Blundell-Bond (1998) System GMM estimator. Extends Difference GMM by adding the level equation with lagged differences as instruments. Generally more efficient than Difference GMM, especially when the dependent variable is persistent.

Constructor¶

SystemGMM(
    data: pd.DataFrame,
    dep_var: str,
    lags: int | list[int],
    id_var: str = "id",
    time_var: str = "year",
    exog_vars: list[str] | None = None,
    endogenous_vars: list[str] | None = None,
    predetermined_vars: list[str] | None = None,
    time_dummies: bool = True,
    collapse: bool = False,
    two_step: bool = True,
    robust: bool = True,
    gmm_type: str = "two_step",
    level_instruments: dict | None = None,
    gmm_max_lag: int | None = None,
    iv_max_lag: int = 0,
)

All parameters are the same as DifferenceGMM, plus:

Parameter	Type	Default	Description
`level_instruments`	`dict \\| None`	`None`	Additional instruments for the level equation

Example¶

from panelbox import SystemGMM, load_abdata

data = load_abdata()
model = SystemGMM(
    data=data,
    dep_var="n",
    lags=[1],
    exog_vars=["w", "k"],
    id_var="id",
    time_var="year",
    two_step=True,
    collapse=True,
)
results = model.fit()
print(results.summary())

ContinuousUpdatedGMM¶

Continuously-Updated Estimator (CUE). The weight matrix is updated at each iteration, making the objective function:

Q(beta) = g(beta)' W(beta)^{-1} g(beta)

CUE is more robust to weak instruments and misspecification than two-step GMM.

Constructor¶

ContinuousUpdatedGMM(
    data: pd.DataFrame,
    dep_var: str,
    exog_vars: list[str],
    instruments: list[str],
    weighting: str = "hac",
    bandwidth: str | int = "auto",
    se_type: str = "analytical",
    n_bootstrap: int = 999,
    bootstrap_method: str = "residual",
    max_iter: int = 100,
    tol: float = 1e-6,
    regularize: bool = True,
)

Parameter	Type	Default	Description
`data`	`pd.DataFrame`	required	Panel data
`dep_var`	`str`	required	Dependent variable
`exog_vars`	`list[str]`	required	Exogenous regressors
`instruments`	`list[str]`	required	Instrumental variables
`weighting`	`str`	`"hac"`	Weight matrix type: `"hac"`, `"cluster"`, `"homoskedastic"`
`bandwidth`	`str \\| int`	`"auto"`	Bandwidth for HAC (auto uses Newey-West rule)
`se_type`	`str`	`"analytical"`	Standard error type: `"analytical"` or `"bootstrap"`
`n_bootstrap`	`int`	`999`	Number of bootstrap replications
`bootstrap_method`	`str`	`"residual"`	Bootstrap method
`max_iter`	`int`	`100`	Maximum CUE iterations
`tol`	`float`	`1e-6`	Convergence tolerance
`regularize`	`bool`	`True`	Add ridge regularization to singular weight matrices

Methods¶

`.fit()`¶

def fit(
    self,
    start_params: np.ndarray | None = None,
    method: str = "L-BFGS-B",
    verbose: bool = False,
) -> GMMResults

Parameter	Type	Default	Description
`start_params`	`np.ndarray \\| None`	`None`	Starting values (uses 2SLS if None)
`method`	`str`	`"L-BFGS-B"`	Optimization method
`verbose`	`bool`	`False`	Print convergence information

Example¶

from panelbox.gmm import ContinuousUpdatedGMM

cue = ContinuousUpdatedGMM(
    data=df,
    dep_var="y",
    exog_vars=["x1", "x2"],
    instruments=["z1", "z2", "z3"],
    weighting="hac",
    bandwidth="auto",
)
results = cue.fit()
print(results.summary())

BiasCorrectedGMM¶

Analytical bias correction for dynamic panel models. Corrects the small-T bias in GMM estimators using the approach of Kiviet (1995).

Constructor¶

BiasCorrectedGMM(
    data: pd.DataFrame,
    dep_var: str,
    lags: list[int],
    id_var: str = "id",
    time_var: str = "year",
    exog_vars: list[str] | None = None,
    bias_order: int = 1,
    min_n: int = 50,
    min_t: int = 10,
)

Parameter	Type	Default	Description
`data`	`pd.DataFrame`	required	Panel data
`dep_var`	`str`	required	Dependent variable
`lags`	`list[int]`	required	Lag orders
`id_var`	`str`	`"id"`	Entity column
`time_var`	`str`	`"year"`	Time column
`exog_vars`	`list[str] \\| None`	`None`	Exogenous regressors
`bias_order`	`int`	`1`	Order of bias correction (1 or 2)
`min_n`	`int`	`50`	Minimum number of entities
`min_t`	`int`	`10`	Minimum time periods

Methods¶

`.fit()`¶

def fit(
    self,
    time_dummies: bool = True,
    use_system_gmm: bool = False,
    verbose: bool = False,
) -> GMMResults

Parameter	Type	Default	Description
`time_dummies`	`bool`	`True`	Include time dummies
`use_system_gmm`	`bool`	`False`	Use System GMM as initial estimator
`verbose`	`bool`	`False`	Print progress information

Result Classes¶

GMMResults¶

Dataclass holding all GMM estimation output.

Key Attributes¶

Attribute	Type	Description
`params`	`pd.Series`	Estimated coefficients
`std_errors`	`pd.Series`	Standard errors
`tvalues`	`pd.Series`	t-statistics
`pvalues`	`pd.Series`	p-values
`nobs`	`int`	Number of observations
`n_groups`	`int`	Number of entities
`n_instruments`	`int`	Number of instruments
`n_params`	`int`	Number of estimated parameters
`vcov`	`np.ndarray`	Variance-covariance matrix
`two_step`	`bool`	Whether two-step was used
`windmeijer_corrected`	`bool`	Whether Windmeijer correction was applied
`model_type`	`str`	`"difference"` or `"system"`
`converged`	`bool`	Convergence status

Specification Tests¶

Attribute	Type	Description
`hansen_j`	`TestResult`	Hansen J overidentification test (robust)
`sargan`	`TestResult`	Sargan test (not robust to heteroskedasticity)
`ar1_test`	`TestResult`	AR(1) test in differenced residuals (should reject)
`ar2_test`	`TestResult`	AR(2) test in differenced residuals (should NOT reject)
`diff_hansen`	`TestResult \\| None`	Difference-in-Hansen for level instruments (System GMM)

Interpreting AR tests

AR(1) should be negative and significant (expected by construction). AR(2) should be insignificant (p > 0.10) -- rejection indicates misspecification.

Methods¶

`.summary()`¶

Print formatted estimation results with coefficient table and diagnostic tests.

TestResult¶

Named container for specification test results.

Attribute	Type	Description
`statistic`	`float`	Test statistic value
`pvalue`	`float`	p-value
`df`	`int`	Degrees of freedom

Diagnostic Classes¶

GMMDiagnostics¶

Comprehensive diagnostic analysis for GMM estimation results.

GMMDiagnostics(model, results)

Provides methods for analyzing instrument validity, overidentification, and model specification.

GMMOverfitDiagnostic¶

Detects instrument proliferation and overfitting in GMM models.

GMMOverfitDiagnostic(model, results: GMMResults)

Instrument proliferation

When n_instruments > n_groups, GMM estimates become unreliable. Use collapse=True or limit gmm_max_lag to reduce instrument count.

Practical Guidance¶

Choosing Between Estimators¶

Scenario	Recommended Estimator
Moderate persistence (rho < 0.8)	`DifferenceGMM`
High persistence (rho close to 1)	`SystemGMM`
Weak instrument concerns	`ContinuousUpdatedGMM`
Large T, small N	`BiasCorrectedGMM`

Instrument Count Rule of Thumb¶

Keep instruments <= number of entities:

# Reduce instruments with collapse
model = SystemGMM(
    data=data, dep_var="y", lags=[1],
    exog_vars=["x1"], collapse=True,  # Collapse instruments
    gmm_max_lag=3,                     # Limit lag depth
)

Complete Diagnostic Workflow¶

from panelbox import SystemGMM, load_abdata

data = load_abdata()
model = SystemGMM(
    data=data, dep_var="n", lags=[1],
    exog_vars=["w", "k"],
    id_var="id", time_var="year",
    two_step=True,
)
results = model.fit()

# Check specification tests
print(f"Hansen J: stat={results.hansen_j.statistic:.3f}, p={results.hansen_j.pvalue:.3f}")
print(f"AR(1):    stat={results.ar1_test.statistic:.3f}, p={results.ar1_test.pvalue:.3f}")
print(f"AR(2):    stat={results.ar2_test.statistic:.3f}, p={results.ar2_test.pvalue:.3f}")
print(f"Instruments: {results.n_instruments}, Groups: {results.n_groups}")

# Instruments should be <= groups
assert results.n_instruments <= results.n_groups, "Too many instruments!"

References¶

Arellano, M. & Bond, S. (1991). "Some Tests of Specification for Panel Data." Review of Economic Studies, 58(2), 277-297.
Blundell, R. & Bond, S. (1998). "Initial Conditions and Moment Restrictions in Dynamic Panel Data Models." Journal of Econometrics, 87(1), 115-143.
Hansen, L., Heaton, J. & Yaron, A. (1996). "Finite-Sample Properties of Some Alternative GMM Estimators." Journal of Business & Economic Statistics, 14(3), 262-280.
Kiviet, J. (1995). "On Bias, Inconsistency, and Efficiency of Various Estimators in Dynamic Panel Data Models." Journal of Econometrics, 68(1), 53-78.
Roodman, D. (2009). "How to do xtabond2." Stata Journal, 9(1), 86-136.
Windmeijer, F. (2005). "A Finite Sample Correction for the Variance of Linear Efficient Two-Step GMM Estimators." Journal of Econometrics, 126(1), 25-51.

GMM API Reference¶

Overview¶

Classes¶

DifferenceGMM¶

Constructor¶

Methods¶

.fit()¶

Example¶

SystemGMM¶

Constructor¶

Example¶

ContinuousUpdatedGMM¶

Constructor¶

Methods¶

.fit()¶

Example¶

BiasCorrectedGMM¶

Constructor¶

Methods¶

.fit()¶

Result Classes¶

GMMResults¶

Key Attributes¶

Specification Tests¶

Methods¶

.summary()¶

TestResult¶

Diagnostic Classes¶

GMMDiagnostics¶

GMMOverfitDiagnostic¶

Practical Guidance¶

Choosing Between Estimators¶

Instrument Count Rule of Thumb¶

Complete Diagnostic Workflow¶

References¶

See Also¶

`.fit()`¶

`.fit()`¶

`.fit()`¶

`.summary()`¶