Slope Heterogeneity in Panel Data
Summary
If slope coefficients vary across entities, Fixed Effects and Random Effects can be inconsistent for the average effect, most notably in dynamic models or when the slopes are correlated with the regressors. This guide helps you decide when to use the Mean Group (MG) or Pooled Mean Group (PMG) estimator, and how to interpret the results.
When Do Slopes Vary?
Slope heterogeneity is common in:
- Cross-country regressions: different institutional structures, development levels
- Firm-level studies: industry-specific production technologies
- Household panels: heterogeneous preferences and constraints
- Long-T panels: with \(T > 20\), entity-specific regressions become feasible and slope differences become estimable
Rule of thumb: If you have a large, heterogeneous sample and \(T > K\) (time periods exceed the number of regressors per entity), slope heterogeneity is worth investigating.
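The \(T > K\) condition can be checked before any estimation. A minimal sketch, assuming the entity label of every observation row is available as a plain list; the helper name `entities_with_enough_obs` and the toy data are illustrative and not part of PanelBox:

```python
from collections import Counter

def entities_with_enough_obs(entity_ids, n_regressors):
    """Split entities by whether T_i > K (hypothetical helper, not a PanelBox function)."""
    counts = Counter(entity_ids)  # T_i = number of observation rows per entity
    ok = sorted(e for e, t in counts.items() if t > n_regressors)
    short = sorted(e for e, t in counts.items() if t <= n_regressors)
    return ok, short

# Toy panel: one entity label per observation row
rows = ["A"] * 12 + ["B"] * 3 + ["C"] * 8
ok, short = entities_with_enough_obs(rows, n_regressors=3)
print("estimable:", ok)
print("too short:", short)
```

If most entities land in the second list, entity-specific slopes are not estimable and the first branch of the decision tree applies.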
Decision Tree
Is T > K for most entities?
│
├── NO  → Cannot estimate entity-specific slopes
│       → Use FE/RE with robust standard errors
│
└── YES → Run the Swamy test for slope homogeneity
    │
    ├── FAIL TO REJECT H₀ → Slopes are homogeneous
    │       → FE/RE are valid and more efficient
    │       → MG is consistent but less efficient
    │
    └── REJECT H₀ → Slopes are heterogeneous
        │
        ├── Is there a long-run equilibrium relationship?
        │   │
        │   ├── YES → Estimate both MG and PMG
        │   │         Run Hausman test (MG vs PMG)
        │   │         │
        │   │         ├── Fail to reject → Use PMG (more efficient)
        │   │         └── Reject → Use MG (robust to heterogeneity)
        │   │
        │   └── NO → Use MG
        │
        └── Use MG estimator
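The same branching can be written as a small function, which is convenient when the choice has to be repeated across several specifications. This is only a sketch: the function name, its arguments and the 5% significance level are assumptions made for illustration, and the p-values must come from tests you have already run.

```python
def choose_estimator(t_exceeds_k, swamy_p, long_run_relation, hausman_p=None, alpha=0.05):
    """Encode the decision tree above (illustrative helper, not a PanelBox function)."""
    if not t_exceeds_k:
        # Entity-specific slopes cannot be estimated
        return "FE/RE with robust standard errors"
    if swamy_p >= alpha:
        # Fail to reject homogeneity: pooled estimators are valid and more efficient
        return "FE/RE"
    if not long_run_relation:
        return "MG"
    if hausman_p is None:
        return "Estimate MG and PMG, then run the Hausman test"
    # Rejecting the Hausman null casts doubt on long-run homogeneity
    return "MG" if hausman_p < alpha else "PMG"

print(choose_estimator(True, swamy_p=0.01, long_run_relation=True, hausman_p=0.40))
```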
Quick Start
Mean Group Estimator
from panelbox.models.static import MeanGroupEstimator

model = MeanGroupEstimator(
    formula="y ~ x1 + x2",
    data=panel_df,
    entity_col="country",
    time_col="year",
    min_obs_per_entity=10,
)
result = model.fit()

# Average coefficients across entities
result.summary()

# Swamy test for slope homogeneity
print(result.swamy_test_result)

# Entity-specific coefficients
result.coefficient_table()

# Distribution of a single coefficient
result.plot_coefficient_distribution("x1")
Pooled Mean Group Estimator
from panelbox.models.static import PooledMeanGroupEstimator

pmg = PooledMeanGroupEstimator(
    formula="y ~ x1 + x2",
    data=panel_df,
    entity_col="country",
    time_col="year",
    lags=1,
)
pmg_result = pmg.fit()

# Long-run coefficients (homogeneous)
pmg_result.summary()
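Hausman Test: MG vs PMG
from panelbox.models.static import MeanGroupEstimator, PooledMeanGroupEstimator, hausman_mg_pmg

# Fit both estimators on the same data (arguments elided; see the examples above)
mg_result = MeanGroupEstimator(...).fit()
pmg_result = PooledMeanGroupEstimator(...).fit()

# H0: long-run slopes are homogeneous, so PMG is consistent and efficient
test = hausman_mg_pmg(mg_result, pmg_result)
print(f"Hausman stat: {test['statistic']:.3f}, p-value: {test['p_value']:.4f}")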
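To make the statistic less of a black box, here is a sketch of the textbook Hausman form for a single coefficient, \(H = (\hat\beta_{MG} - \hat\beta_{PMG})^2 / (\hat V_{MG} - \hat V_{PMG})\), compared against a chi-square distribution with one degree of freedom. It is not the PanelBox implementation, which may differ in detail; the function name and the input numbers are made up for illustration.

```python
import math

def hausman_scalar(b_mg, var_mg, b_pmg, var_pmg):
    """Hausman statistic for one coefficient (illustrative, not the PanelBox code)."""
    var_diff = var_mg - var_pmg  # positive when PMG is the more efficient estimator
    if var_diff <= 0:
        raise ValueError("variance difference is not positive; the statistic is undefined")
    stat = (b_mg - b_pmg) ** 2 / var_diff
    # Survival function of a chi-square variable with 1 degree of freedom
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Made-up estimates and variances
stat, p = hausman_scalar(b_mg=0.50, var_mg=0.04, b_pmg=0.40, var_pmg=0.01)
print(f"Hausman stat: {stat:.3f}, p-value: {p:.4f}")
```

With several coefficients the squared difference becomes a quadratic form in the inverse of the variance difference matrix, and the degrees of freedom equal the number of coefficients compared.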
Interpreting Results
MG Output
The MG estimator reports:
| Output | Meaning |
|---|---|
| Coefficients | Average of entity-specific OLS coefficients |
| Std. Errors | Based on cross-entity dispersion of \(\hat{\beta}_i\) |
| Swamy test | Tests \(H_0\): all \(\beta_i\) are equal |
| N entities used | Entities meeting the min_obs_per_entity threshold |
| Entities excluded | Dropped due to insufficient observations or collinearity |
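The first two rows of the table can be reproduced by hand. The sketch below assumes a single regressor plus an intercept and uses only the standard library; it computes one OLS slope per entity, averages them, and derives the standard error from the cross-entity dispersion, \(\sqrt{\sum_i (\hat\beta_i - \bar\beta)^2 / (N(N-1))}\). The function names and the toy panel are illustrative, not PanelBox internals.

```python
import math

def ols_slope(x, y):
    """Slope from regressing y on x with an intercept (single regressor)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx

def mean_group(panel):
    """panel maps entity -> (x_values, y_values); returns (MG estimate, SE, slopes)."""
    slopes = {e: ols_slope(x, y) for e, (x, y) in panel.items()}
    n = len(slopes)
    b_mg = sum(slopes.values()) / n
    # Nonparametric variance: dispersion of the entity slopes around their mean
    var_mg = sum((b - b_mg) ** 2 for b in slopes.values()) / (n * (n - 1))
    return b_mg, math.sqrt(var_mg), slopes

# Toy panel with exact linear relations y = a_i + b_i * x and slopes 1, 2, 3
x = [1.0, 2.0, 3.0, 4.0]
panel = {
    "A": (x, [1 + 1.0 * v for v in x]),
    "B": (x, [2 + 2.0 * v for v in x]),
    "C": (x, [0 + 3.0 * v for v in x]),
}
b_mg, se_mg, slopes = mean_group(panel)
print(f"MG estimate: {b_mg:.3f}, SE: {se_mg:.3f}")
```

Because the standard error depends only on how much the entity slopes differ from each other, it shrinks with \(N\) rather than with the total number of observations.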
What to Look For
- Large standard errors relative to FE: Expected, since MG is less efficient under homogeneity. If the SEs are similar, heterogeneity was probably already inflating the FE standard errors.
- Swamy test rejects: Confirms slope heterogeneity. Check the coefficient table to see which variables have heterogeneous effects.
- Excluded entities: Entities with \(T_i < K+1\) cannot be estimated. If many are excluded, consider reducing the number of regressors or using FE instead.
- Coefficient distribution: Use plot_coefficient_distribution() to visualize how a coefficient varies across entities. A tight distribution suggests homogeneity for that variable; a wide or multimodal distribution suggests heterogeneity.
PMG-Specific Output
| Output | Meaning |
|---|---|
| Long-run coefficients (\(\theta\)) | Homogeneous across entities |
| Error-correction speed (\(\phi_i\)) | Entity-specific; should be negative |
| Short-run coefficients (\(\delta_i\)) | Entity-specific dynamics |
| Convergence | Whether the optimizer converged |
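For orientation, the coefficient blocks in the table above map onto the error-correction form of the model. The equation below is the textbook ARDL(1,1) case with a single regressor \(x_{it}\) and an entity intercept \(\mu_i\); whether PanelBox uses exactly this parameterization (for example the timing of \(x\) inside the long-run term) is an assumption, not something this guide states.

\[
\Delta y_{it} = \phi_i \left( y_{i,t-1} - \theta\, x_{it} \right) + \delta_i\, \Delta x_{it} + \mu_i + \varepsilon_{it}
\]

Here \(\theta\) is common to all entities while \(\phi_i\) and \(\delta_i\) vary by entity. A negative \(\phi_i\) means that deviations from the long-run relation \(y = \theta x\) are corrected over time, which is why positive values are a warning sign.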
Red flags for PMG:
- \(\phi_i > 0\) for many entities: model may be misspecified
- Non-convergence: try different starting values or increase max_iter
- Hausman test rejects: long-run homogeneity assumption is invalid
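The first red flag is easy to screen for once the entity-specific adjustment speeds are in hand. The sketch below assumes you have collected them into a plain dictionary; how they are exposed on the fitted result is not covered in this guide, so no attribute name is assumed, and the helper and the values are purely illustrative.

```python
def flag_error_correction(phi_by_entity):
    """Return entities whose error-correction speed is not negative (illustrative helper)."""
    return sorted(e for e, phi in phi_by_entity.items() if phi >= 0)

# Made-up values for illustration only
phi = {"A": -0.35, "B": -0.10, "C": 0.05, "D": -0.60}
suspect = flag_error_correction(phi)
print(f"{len(suspect)} of {len(phi)} entities have phi_i >= 0: {suspect}")
```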
Comparison Table
| Feature | FE | RE | MG | PMG |
|---|---|---|---|---|
| Slope homogeneity | Required | Required | Not required | Long-run only |
| Correlated effects | Allowed | Not allowed | Allowed | Allowed |
| Min \(T\) per entity | 2 | 2 | \(> K\) | \(> K + \text{lags}\) |
| Efficiency | High | Highest | Lower | Medium |
| Long-run estimation | No | No | No | Yes |
| Dynamic model | No | No | No | Yes (ECM) |
Common Pitfalls
1. Small T Bias
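Entity-specific OLS with small \(T\) is noisy. With strictly exogenous regressors the MG average remains unbiased, but its variance is large. If the model includes a lagged dependent variable, each entity-level estimate also carries a small-sample bias that averaging across entities does not remove. Ensure min_obs_per_entity is set high enough (default: 10).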
2. Unbalanced Panels
Entities with different \(T_i\) contribute equally to the MG average. If some entities have very few observations, their imprecise estimates inflate the MG variance. Consider using entity-level weights proportional to \(T_i\).
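A sketch of the weighting idea, assuming the entity-specific slopes and the observation counts \(T_i\) are available as dictionaries. The helper name and the numbers are made up, and this guide does not say whether PanelBox offers such weights directly.

```python
def weighted_mean_group(slopes, n_obs):
    """Average entity slopes with weights proportional to T_i (illustrative helper)."""
    total = sum(n_obs[e] for e in slopes)
    return sum(slopes[e] * n_obs[e] / total for e in slopes)

# Made-up slopes and observation counts; entity C is short and has an outlying slope
slopes = {"A": 1.0, "B": 2.0, "C": 6.0}
n_obs = {"A": 40, "B": 40, "C": 20}

unweighted = sum(slopes.values()) / len(slopes)
weighted = weighted_mean_group(slopes, n_obs)
print(f"unweighted: {unweighted:.2f}, weighted by T_i: {weighted:.2f}")
```

Weighting trades variance for a change in the estimand: if slopes are systematically related to \(T_i\), the weighted average no longer estimates the simple cross-entity mean.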
3. Over-Rejecting Homogeneity
The Swamy test has high power with large \(N\). Even tiny, economically irrelevant heterogeneity will be detected. Always examine the magnitude of coefficient variation using coefficient_table(), not just the p-value.
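One way to judge magnitude is to summarize the spread of the entity-specific slopes directly. The sketch assumes the slopes for one variable have been extracted into a list (for instance from the coefficient table); the helper name and the numbers are illustrative.

```python
import statistics

def slope_dispersion(slopes):
    """Summarize how much entity-specific slopes vary (illustrative helper)."""
    mean = statistics.mean(slopes)
    sd = statistics.stdev(slopes)  # sample standard deviation across entities
    return {"mean": mean, "sd": sd, "min": min(slopes), "max": max(slopes)}

# Made-up slopes that all lie in a narrow band around 0.5
summary = slope_dispersion([0.48, 0.49, 0.50, 0.51, 0.52])
print(summary)
```

If the whole range of slopes leads to the same substantive conclusion, a rejection by the Swamy test may matter little in practice.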
4. PMG Starting Values
PMG uses numerical optimization. Poor starting values can lead to local optima. If results are sensitive to initialization, the long-run relationship may be weak.
FAQ
Q: Can I use MG with time fixed effects?
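A: The standard MG runs separate regressions per entity, so time effects cannot be removed by demeaning as in two-way FE; they would need to enter the formula as regressors. A full set of time dummies is not estimable within a single entity's time series, because it would use up all \(T_i\) observations, so in practice this means a time trend or a small number of period dummies.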
Q: How many entities do I need?
A: MG requires \(N \to \infty\) for its asymptotic properties. In practice, \(N \geq 20\) is a rough minimum, with \(N \geq 30\) preferred. With very few entities, the cross-entity variance estimate is unreliable.
Q: What if some entities have collinear regressors?
A: PanelBox automatically excludes entities where the regressor matrix is rank-deficient. Check result.entities_excluded to see which entities were dropped and why.
Q: Should I demean the data before MG estimation?
A: No. Each entity-specific OLS includes an intercept, which absorbs the entity mean. Demeaning beforehand is unnecessary and would remove the entity-specific intercepts.
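The claim is easy to verify numerically. The sketch below fits a single-regressor OLS slope for one entity on raw and on demeaned data; the helper and the data are illustrative.

```python
def ols_slope(x, y):
    """Slope from regressing y on x with an intercept (single regressor)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx

# Made-up observations for a single entity
x = [1.0, 2.0, 4.0, 7.0]
y = [3.1, 4.9, 9.2, 14.8]

# Subtract the entity means, as a within transformation would
x_dm = [v - sum(x) / len(x) for v in x]
y_dm = [v - sum(y) / len(y) for v in y]

slope_raw = ols_slope(x, y)
slope_demeaned = ols_slope(x_dm, y_dm)
print(slope_raw, slope_demeaned)
```

The two slopes agree up to floating-point error, so demeaning buys nothing for the slope estimates while discarding the level information carried by the intercept.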
Q: When should I prefer PMG over MG?
A: When economic theory suggests a common long-run relationship (e.g., convergence models, PPP). PMG is more efficient than MG when the long-run homogeneity restriction holds. Use the Hausman test to verify.
See Also
- Mean Group Theory: Derivations and asymptotic results
- Static Models API: MeanGroupEstimator and PooledMeanGroupEstimator reference
- FE vs RE (Hausman): Testing under slope homogeneity