Slope Heterogeneity in Panel Data

Summary

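If slope coefficients vary across entities, Fixed Effects and Random Effects can be inconsistent for the average effect. This happens notably in dynamic panels with a lagged dependent variable, and in static panels when the slope heterogeneity is correlated with the regressors. Even when they remain consistent, their conventional standard errors ignore the slope variation. This guide helps you decide when to use the Mean Group (MG) or Pooled Mean Group (PMG) estimator and how to interpret the results.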

When Do Slopes Vary?

Slope heterogeneity is common in:

Cross-country regressions: different institutional structures, development levels
Firm-level studies: industry-specific production technologies
Household panels: heterogeneous preferences and constraints
Long-T panels: with \(T > 20\), entity-specific regressions become feasible and slope differences become estimable

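Rule of thumb: if you have a large, heterogeneous sample and \(T > K\) for most entities (more time periods than regressors per entity), slope heterogeneity is worth investigating. In practice you want comfortably more than \(K + 1\) observations per entity, because each entity regression also estimates an intercept and needs residual degrees of freedom for its standard errors.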
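The condition can be checked before fitting anything by counting observations per entity. Below is a minimal pandas sketch, assuming a long-format DataFrame like the `panel_df` used in the Quick Start; `check_entity_lengths` is a hypothetical helper, not a PanelBox function, and the threshold of 10 simply mirrors the `min_obs_per_entity=10` example.

```python
import pandas as pd

def check_entity_lengths(df, entity_col, n_regressors, min_obs=10):
    """Report T_i per entity and whether it supports an entity-specific regression.

    Each entity regression estimates n_regressors slopes plus an intercept, so
    resid_df = T_i - n_regressors - 1 must be positive to estimate an error
    variance (needed for standard errors and the Swamy test).
    """
    report = df.groupby(entity_col).size().rename("T_i").to_frame()
    report["resid_df"] = report["T_i"] - n_regressors - 1
    report["meets_min_obs"] = report["T_i"] >= min_obs
    return report

# Toy unbalanced panel: country A has 12 years, country B only 3
panel_df = pd.DataFrame({
    "country": ["A"] * 12 + ["B"] * 3,
    "year": list(range(2000, 2012)) + [2000, 2001, 2002],
})
report = check_entity_lengths(panel_df, "country", n_regressors=2)
print(report)
print(f"Share of entities meeting min_obs: {report['meets_min_obs'].mean():.2f}")
```

Here country B has zero residual degrees of freedom with two regressors, so it could only be fit exactly and would fall below any reasonable `min_obs_per_entity` threshold.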

Decision Tree

Is T > K for most entities?
│
├── NO → Cannot estimate entity-specific slopes
│        → Use FE/RE with robust standard errors
│
└── YES → Run the Swamy test for slope homogeneity
          │
          ├── FAIL TO REJECT H₀ → Slopes are homogeneous
          │   → FE/RE are valid and more efficient
          │   → MG is consistent but less efficient
          │
          └── REJECT H₀ → Slopes are heterogeneous
              │
              └── Is there a long-run equilibrium relationship?
                  │
                  ├── YES → Estimate both MG and PMG
                  │         Run Hausman test (MG vs PMG)
                  │         │
                  │         ├── Fail to reject → Use PMG (more efficient)
                  │         └── Reject → Use MG (robust to heterogeneity)
                  │
                  └── NO → Use MG
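The same decision tree can be written as a small function, which makes the order of the checks explicit. This is a sketch: `choose_estimator` is a hypothetical helper, not part of PanelBox, and each boolean argument stands for the outcome of the corresponding check or test, which you would obtain separately.

```python
def choose_estimator(t_exceeds_k, swamy_rejects=False, long_run_relation=False,
                     hausman_rejects=False):
    """Walk the slope-heterogeneity decision tree and return a recommendation."""
    if not t_exceeds_k:
        # Entity-specific slopes cannot be estimated
        return "FE/RE with robust standard errors"
    if not swamy_rejects:
        # Homogeneous slopes: pooled estimators are valid and more efficient
        return "FE/RE (MG consistent but less efficient)"
    if not long_run_relation:
        return "MG"
    # Heterogeneous slopes with a long-run relationship: Hausman decides MG vs PMG
    return "MG" if hausman_rejects else "PMG"

print(choose_estimator(True, swamy_rejects=True, long_run_relation=True))
```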

Quick Start

Mean Group Estimator

from panelbox.models.static import MeanGroupEstimator

# panel_df: long-format DataFrame, one row per (country, year)
model = MeanGroupEstimator(
    formula="y ~ x1 + x2",
    data=panel_df,
    entity_col="country",
    time_col="year",
    min_obs_per_entity=10,
)
result = model.fit()

# Average coefficients across entities
result.summary()

# Swamy test for slope homogeneity
print(result.swamy_test_result)

# Entity-specific coefficients
result.coefficient_table()

# Distribution of a single coefficient
result.plot_coefficient_distribution("x1")

Pooled Mean Group Estimator

from panelbox.models.static import PooledMeanGroupEstimator

pmg = PooledMeanGroupEstimator(
    formula="y ~ x1 + x2",
    data=panel_df,
    entity_col="country",
    time_col="year",
    lags=1,
)
pmg_result = pmg.fit()

# Long-run coefficients (homogeneous)
pmg_result.summary()

Hausman Test: MG vs PMG

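from panelbox.models.static import (
    MeanGroupEstimator, PooledMeanGroupEstimator, hausman_mg_pmg,
)

# Same data and specification for both estimators (as in the Quick Start)
spec = dict(
    formula="y ~ x1 + x2", data=panel_df, entity_col="country", time_col="year",
)
mg_result = MeanGroupEstimator(**spec).fit()
pmg_result = PooledMeanGroupEstimator(**spec, lags=1).fit()

test = hausman_mg_pmg(mg_result, pmg_result)
print(f"Hausman stat: {test['statistic']:.3f}, p-value: {test['p_value']:.4f}")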
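The Hausman idea is to compare an estimator that is consistent whether or not the restriction holds (MG) with one that is efficient only if it holds (PMG): \(H = d'\,[V_{MG} - V_{PMG}]^{-1}\,d\) with \(d = \hat{\beta}_{MG} - \hat{\beta}_{PMG}\), asymptotically \(\chi^2\) with as many degrees of freedom as compared coefficients. The numpy sketch below computes it from arrays; `hausman_statistic` is a hypothetical helper, and whether `hausman_mg_pmg` compares exactly the same coefficients or handles a non-positive-definite variance difference the same way is an assumption.

```python
import numpy as np

def hausman_statistic(b_consistent, b_efficient, v_consistent, v_efficient):
    """Hausman statistic H = d' (V_c - V_e)^+ d with d = b_c - b_e.

    b_consistent / v_consistent: estimate and covariance consistent under H0 and H1 (MG).
    b_efficient / v_efficient: estimate and covariance efficient under H0 (PMG).
    A pseudo-inverse is used because V_c - V_e need not be positive definite
    in finite samples.
    """
    d = np.asarray(b_consistent, float) - np.asarray(b_efficient, float)
    v_diff = np.asarray(v_consistent, float) - np.asarray(v_efficient, float)
    stat = float(d @ np.linalg.pinv(v_diff) @ d)
    df = int(np.linalg.matrix_rank(v_diff))  # chi-square degrees of freedom
    return stat, df

# Illustrative numbers only: two long-run coefficients
stat, df = hausman_statistic(
    b_consistent=[0.80, -0.30], b_efficient=[0.70, -0.25],
    v_consistent=np.diag([0.010, 0.004]), v_efficient=np.diag([0.006, 0.003]),
)
print(f"H = {stat:.3f} on {df} df")  # H = 5.000 on 2 df
```

With these made-up inputs, \(H = 5.0\) is below 5.99, the 5% critical value of \(\chi^2_2\), so the test would fail to reject and PMG would be preferred.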

Interpreting Results

MG Output

The MG estimator reports:

Output	Meaning
Coefficients	Average of entity-specific OLS coefficients
Std. Errors	Based on cross-entity dispersion of \(\hat{\beta}_i\)
Swamy test	Tests \(H_0\): all \(\beta_i\) are equal
N entities used	Entities meeting the `min_obs_per_entity` threshold
Entities excluded	Dropped due to insufficient observations or collinearity
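The first two rows correspond to the usual MG formulas: \(\hat{\beta}_{MG} = N^{-1}\sum_i \hat{\beta}_i\) and \(\widehat{\operatorname{Var}}(\hat{\beta}_{MG}) = \frac{1}{N(N-1)}\sum_i (\hat{\beta}_i - \hat{\beta}_{MG})(\hat{\beta}_i - \hat{\beta}_{MG})'\). The numpy sketch below reproduces them from scratch to show the mechanics, assuming each entity's data arrives as an `(X_i, y_i)` pair without an intercept column. `mean_group` is a hypothetical helper; PanelBox's own exclusion and rank checks may differ in detail.

```python
import numpy as np

def mean_group(groups, min_obs=10):
    """Mean Group estimator from a list of (X_i, y_i) entity datasets.

    Each entity gets its own OLS with an intercept; the MG estimate is the simple
    average of the entity slope vectors, and its covariance comes from their
    cross-entity dispersion.
    """
    betas = []
    for X, y in groups:
        X, y = np.asarray(X, float), np.asarray(y, float)
        if len(y) < min_obs:
            continue  # mimic a min_obs_per_entity threshold
        Z = np.column_stack([np.ones(len(y)), X])  # entity-specific intercept
        if np.linalg.matrix_rank(Z) < Z.shape[1]:
            continue  # rank-deficient regressors: entity cannot be estimated
        coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
        betas.append(coef[1:])  # keep slopes, drop the intercept
    B = np.array(betas)  # shape (N_used, k)
    n = B.shape[0]
    b_mg = B.mean(axis=0)
    dev = B - b_mg
    v_mg = dev.T @ dev / (n * (n - 1))
    return b_mg, np.sqrt(np.diag(v_mg)), n

# Simulated panel: 40 entities, 25 periods, slopes scattered around (1.0, -0.5)
rng = np.random.default_rng(0)
groups = []
for _ in range(40):
    X = rng.normal(size=(25, 2))
    beta_i = np.array([1.0, -0.5]) + rng.normal(scale=0.3, size=2)
    y = 2.0 + X @ beta_i + rng.normal(size=25)
    groups.append((X, y))

b_mg, se_mg, n_used = mean_group(groups)
print("MG slopes:", b_mg.round(3), "SE:", se_mg.round(3), "entities used:", n_used)
```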

What to Look For

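Large standard errors relative to FE: expected. Under homogeneity MG is less efficient than FE, and under heterogeneity conventional FE standard errors ignore the cross-entity variation in slopes and so understate uncertainty. If the two sets of standard errors are similar, the FE standard errors (for example, cluster-robust ones) may already be absorbing much of the extra variance from slope heterogeneity.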
Swamy test rejects: Confirms slope heterogeneity. Check the coefficient table to see which variables have heterogeneous effects.
Excluded entities: Entities with \(T_i < K+1\) cannot be estimated. If many are excluded, consider reducing the number of regressors or using FE instead.
Coefficient distribution: Use plot_coefficient_distribution() to visualize how a coefficient varies across entities. A tight distribution suggests homogeneity for that variable; a wide or multimodal distribution suggests heterogeneity.
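Swamy's statistic weighs each entity's deviation from a pooled slope by the precision of that entity's estimate: \(\hat{S} = \sum_i (\hat{\beta}_i - \hat{\beta}_{WFE})' \frac{X_i' M_\tau X_i}{\hat{\sigma}_i^2} (\hat{\beta}_i - \hat{\beta}_{WFE})\), where \(M_\tau\) demeans within the entity and \(\hat{\beta}_{WFE}\) is the precision-weighted pooled within estimate; it is compared with a \(\chi^2_{K(N-1)}\) distribution. The numpy sketch below computes it for slopes only, assuming `(X_i, y_i)` arrays per entity and \(\hat{\sigma}_i^2\) from each entity's own OLS residuals. `swamy_statistic` is a hypothetical helper, and PanelBox's `swamy_test_result` may use a different variance estimator or a standardized version of the statistic.

```python
import numpy as np

def swamy_statistic(groups):
    """Swamy-style slope homogeneity statistic for (X_i, y_i) entity datasets.

    Within-entity demeaning absorbs each entity's intercept, so only slopes are tested.
    Returns the statistic and its chi-square degrees of freedom k * (N - 1).
    """
    k = np.asarray(groups[0][0]).shape[1]
    weight_sum = np.zeros((k, k))
    weighted_b = np.zeros(k)
    pieces = []
    for X, y in groups:
        Xd = X - X.mean(axis=0)
        yd = y - y.mean()
        XtX = Xd.T @ Xd
        b_i = np.linalg.solve(XtX, Xd.T @ yd)    # entity OLS slopes
        resid = yd - Xd @ b_i
        s2_i = resid @ resid / (len(y) - k - 1)  # entity error variance
        W_i = XtX / s2_i                         # inverse covariance of b_i
        weight_sum += W_i
        weighted_b += W_i @ b_i
        pieces.append((b_i, W_i))
    b_wfe = np.linalg.solve(weight_sum, weighted_b)  # precision-weighted pooled slopes
    stat = sum((b - b_wfe) @ W @ (b - b_wfe) for b, W in pieces)
    return float(stat), k * (len(groups) - 1)

def simulate(slope_sd, n=50, t=30, seed=1):
    """One regressor; entity slopes drawn around 1.0 with standard deviation slope_sd."""
    rng = np.random.default_rng(seed)
    out = []
    for _ in range(n):
        X = rng.normal(size=(t, 1))
        b = 1.0 + rng.normal(scale=slope_sd)
        out.append((X, 0.5 + X[:, 0] * b + rng.normal(size=t)))
    return out

stat_hom, df = swamy_statistic(simulate(slope_sd=0.0))
stat_het, _ = swamy_statistic(simulate(slope_sd=1.0))
print(f"homogeneous: {stat_hom:.1f}, heterogeneous: {stat_het:.1f}, df = {df}")
```

Under homogeneity the statistic stays near its degrees of freedom; with genuinely different slopes it is many times larger.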

PMG-Specific Output

Output	Meaning
Long-run coefficients (\(\theta\))	Homogeneous across entities
Error-correction speed (\(\phi_i\))	Entity-specific; should be negative (stable adjustment requires \(-2 < \phi_i < 0\))
Short-run coefficients (\(\delta_i\))	Entity-specific dynamics
Convergence	Whether the optimizer converged
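The symbols in this table come from the error-correction form of an ARDL model. One common way to write the PMG specification, following Pesaran, Shin and Smith's setup, is shown below; how PanelBox maps its `lags` argument onto the lag orders \(p\) and \(q\) is an assumption here.

\[
\Delta y_{it} = \phi_i \left( y_{i,t-1} - \theta' x_{it} \right) + \sum_{j=1}^{p-1} \lambda_{ij} \, \Delta y_{i,t-j} + \sum_{j=0}^{q-1} \delta_{ij}' \, \Delta x_{i,t-j} + \mu_i + \varepsilon_{it}
\]

Here \(\theta\) is common to all entities, which is the PMG restriction, while \(\phi_i\), \(\lambda_{ij}\), \(\delta_{ij}\) and \(\mu_i\) vary by entity; MG would instead estimate an entity-specific \(\theta_i\) and average it. Holding \(x\) fixed and ignoring the short-run terms, the equilibrium error \(e_{it} = y_{it} - \theta' x_{it}\) evolves as \(e_{it} \approx (1 + \phi_i)\, e_{i,t-1}\), so adjustment is stable only for \(-2 < \phi_i < 0\): monotonic when \(-1 \le \phi_i < 0\) and oscillating when \(-2 < \phi_i < -1\).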

Red flags for PMG:

\(\phi_i > 0\) for many entities: model may be misspecified
Non-convergence: try different starting values or increase max_iter
Hausman test rejects: long-run homogeneity assumption is invalid
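A quick screen for the first red flag, as a sketch: it assumes you have already collected the entity-specific \(\phi_i\) estimates into a plain dict (how to extract them from `pmg_result` is not shown and is left as an assumption), and `flag_adjustment_speeds` is a hypothetical helper. It flags both \(\phi_i \ge 0\) (no error correction) and \(\phi_i \le -2\) (explosive overshooting).

```python
def flag_adjustment_speeds(phi_by_entity):
    """Return entities whose error-correction coefficient lies outside (-2, 0).

    phi >= 0: no reversion to the long-run relationship.
    phi <= -2: explosive overshooting around the equilibrium.
    """
    flagged = {e: phi for e, phi in phi_by_entity.items() if not -2.0 < phi < 0.0}
    share = len(flagged) / len(phi_by_entity)
    return flagged, share

# Hypothetical phi_i values for four countries
phis = {"AUT": -0.35, "BEL": -0.12, "CAN": 0.08, "DNK": -2.40}
flagged, share = flag_adjustment_speeds(phis)
print(flagged, f"share flagged = {share:.0%}")
```

If the flagged share is large, the long-run specification or lag orders deserve a second look before the PMG results are interpreted.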

Comparison Table

Feature	FE	RE	MG	PMG
Slope homogeneity	Required	Required	Not required	Long-run only
Correlated effects	Allowed	Not allowed	Allowed	Allowed
Min \(T\) per entity	2	2	\(> K\)	\(> K + \text{lags}\)
Efficiency	High	Highest	Lower	Medium
Long-run estimation	No	No	No	Yes
Dynamic model	No	No	No	Yes (ECM)

Common Pitfalls

1. Small T Bias

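Entity-specific OLS with small \(T\) is noisy. In a static model with strictly exogenous regressors each entity estimate is unbiased, so the MG average is unbiased, but its variance is large. In dynamic specifications (for example, with a lagged dependent variable) entity-level OLS is biased when \(T\) is small, and averaging across entities does not remove that bias. Ensure min_obs_per_entity is set high enough (default: 10).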
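A small simulation makes the static-model case concrete. It assumes a static data-generating process with a strictly exogenous regressor, heterogeneous true slopes around 1.0, and only \(T = 8\) periods; all numbers are illustrative choices, not recommendations. The individual slopes scatter widely, yet their average lands close to the true mean slope.

```python
import numpy as np

rng = np.random.default_rng(42)
n_entities, t_periods, true_mean_slope = 200, 8, 1.0

slopes = []
for _ in range(n_entities):
    beta_i = true_mean_slope + rng.normal(scale=0.5)  # heterogeneous true slope
    x = rng.normal(size=t_periods)                    # strictly exogenous regressor
    y = 0.3 + beta_i * x + rng.normal(size=t_periods)
    # OLS slope with an intercept equals cov(x, y) / var(x)
    slopes.append(np.cov(x, y)[0, 1] / np.var(x, ddof=1))

slopes = np.array(slopes)
mg = slopes.mean()
mg_se = slopes.std(ddof=1) / np.sqrt(n_entities)
print(f"entity slopes, 5th-95th pct: {np.percentile(slopes, [5, 95]).round(2)}")
print(f"MG estimate = {mg:.3f} (SE {mg_se:.3f}), true mean slope = {true_mean_slope}")
```

Adding a lagged dependent variable to this setup would break the result: each entity estimate would then carry a small-\(T\) bias that the average inherits.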

2. Unbalanced Panels

Entities with different \(T_i\) contribute equally to the MG average. If some entities have very few observations, their imprecise estimates inflate the MG variance. Consider using entity-level weights proportional to \(T_i\).
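A minimal numpy sketch of the \(T_i\)-weighted alternative, assuming you already have the entity slope estimates (for example, from the coefficient table) as an array; `weighted_mean_group` is a hypothetical helper. Weighting changes the target: if slopes are related to \(T_i\), the weighted mean estimates a \(T_i\)-weighted average effect rather than the simple average across entities. Its standard error also needs its own formula, which the sketch does not compute.

```python
import numpy as np

def weighted_mean_group(entity_slopes, entity_lengths):
    """Average entity slope vectors with weights proportional to T_i.

    entity_slopes: array (N, k) of entity OLS slopes.
    entity_lengths: array (N,) of observations per entity, T_i.
    """
    B = np.asarray(entity_slopes, float)
    w = np.asarray(entity_lengths, float)
    w = w / w.sum()  # normalize weights to sum to one
    return w @ B

# Three entities, one regressor; the short entity (T = 5) has the outlying slope
slopes = [[0.9], [1.1], [3.0]]
lengths = [40, 40, 5]
print("unweighted MG:", np.mean(slopes, axis=0))
print("T-weighted MG:", weighted_mean_group(slopes, lengths))
```

The short, imprecise entity pulls the unweighted average to about 1.67, while the weighted version stays near 1.12.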

3. Over-Rejecting Homogeneity

The Swamy test has high power with large \(N\). Even tiny, economically irrelevant heterogeneity will be detected. Always examine the magnitude of coefficient variation using coefficient_table(), not just the p-value.
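One way to look at magnitude is to summarize the cross-entity spread of each coefficient in its own units. The sketch below works on a plain array of entity estimates for one variable; the column layout returned by `coefficient_table()` is not documented here, so extracting that array is left as an assumption, and `heterogeneity_summary` is a hypothetical helper.

```python
import numpy as np

def heterogeneity_summary(entity_coefs):
    """Describe the cross-entity spread of one coefficient in its own units."""
    b = np.asarray(entity_coefs, float)
    q25, q50, q75 = np.percentile(b, [25, 50, 75])
    mg = b.mean()
    return {
        "mg_mean": mg,
        "median": q50,
        "iqr": q75 - q25,
        "sd": b.std(ddof=1),
        "share_same_sign_as_mg": float(np.mean(np.sign(b) == np.sign(mg))),
    }

# Illustrative entity estimates that are tightly clustered around 0.5
print(heterogeneity_summary([0.48, 0.50, 0.51, 0.52, 0.49, 0.50]))
```

An interquartile range of 0.015 on a coefficient near 0.5 would be economically negligible even if the Swamy test rejected in a large sample.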

4. PMG Starting Values

PMG uses numerical optimization. Poor starting values can lead to local optima. If results are sensitive to initialization, the long-run relationship may be weak.

FAQ

Q: Can I use MG with time fixed effects?

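A: Not with a full set of time dummies. The standard MG runs separate regressions per entity, and each entity contributes only one observation per period, so a full set of period dummies would use up all of its degrees of freedom before any slope is estimated. A common workaround is to subtract the cross-sectional mean of every variable in each period before estimation, which removes an additive time effect shared by all entities; if common shocks affect entities differently, common correlated effects (CCE) approaches that add cross-sectional averages as regressors are the usual remedy. Either way, this differs from the within-demeaning used in two-way FE.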
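Cross-sectional demeaning takes one line in pandas. This is a sketch: `cross_section_demean` is a hypothetical helper and the column names mirror the Quick Start; the demeaned DataFrame would then be passed as `data` to the MG estimator.

```python
import pandas as pd

def cross_section_demean(df, time_col, cols):
    """Subtract the period-t cross-sectional mean from each variable in cols."""
    out = df.copy()
    out[cols] = df[cols] - df.groupby(time_col)[cols].transform("mean")
    return out

# Toy panel: a common shock raises y for every country in 2001
panel_df = pd.DataFrame({
    "country": ["A", "A", "B", "B"],
    "year": [2000, 2001, 2000, 2001],
    "y": [1.0, 12.0, 3.0, 14.0],
    "x1": [0.5, 0.7, 0.2, 0.4],
})
demeaned = cross_section_demean(panel_df, "year", ["y", "x1"])
print(demeaned)
```

After demeaning, the common jump in 2001 disappears from `y`, and each variable sums to zero within every year.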

Q: How many entities do I need?

A: MG requires \(N \to \infty\) for its asymptotic properties. In practice, \(N \geq 20\) is a rough minimum, with \(N \geq 30\) preferred. With very few entities, the cross-entity variance estimate is unreliable.

Q: What if some entities have collinear regressors?

A: PanelBox automatically excludes entities where the regressor matrix is rank-deficient. Check result.entities_excluded to see which entities were dropped and why.

Q: Should I demean the data before MG estimation?

A: No. Each entity-specific OLS includes an intercept, which absorbs the entity mean. Demeaning beforehand is unnecessary and would remove the entity-specific intercepts.
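For a single entity this can be checked directly: by the Frisch-Waugh-Lovell theorem, OLS with an intercept and OLS on within-demeaned data give identical slopes, and only the intercept is lost. A minimal numpy sketch on simulated data:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=(20, 2))
y = 4.0 + x @ np.array([0.7, -1.2]) + rng.normal(size=20)

# (a) entity OLS with an intercept, keeping only the slopes
Z = np.column_stack([np.ones(20), x])
slopes_with_intercept = np.linalg.lstsq(Z, y, rcond=None)[0][1:]

# (b) demean first, then OLS without an intercept
xd, yd = x - x.mean(axis=0), y - y.mean()
slopes_demeaned = np.linalg.lstsq(xd, yd, rcond=None)[0]

print(np.allclose(slopes_with_intercept, slopes_demeaned))  # True
```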

Q: When should I prefer PMG over MG?

A: When economic theory suggests a common long-run relationship (e.g., convergence models, PPP). PMG is more efficient than MG when the long-run homogeneity restriction holds. Use the Hausman test to verify.