Choosing a Spatial Model
Quick Reference
- Key decision tools: Moran's I, LM tests, LR tests, information criteria
- Default recommendation: Start with SDM (LeSage & Pace, 2009)
- Formal approach: Estimate GNS and test restrictions
Overview
Choosing the right spatial model is one of the most important decisions in spatial econometrics. Different models make different assumptions about how spatial dependence operates, and selecting the wrong model can lead to biased coefficients, incorrect standard errors, or misleading spillover estimates.
There are two complementary approaches to model selection:
- Theory-driven: match the model to the economic mechanism generating spatial dependence
- Data-driven: use statistical tests to let the data guide model choice
In practice, you should use both. Economic theory narrows the set of plausible models, and statistical tests help discriminate among them.
Decision Framework
Step 1: Is There Spatial Dependence?
Before fitting any spatial model, test whether spatial dependence exists:
# Assumes `data` is a panel DataFrame and `W` is an N x N spatial weight
# matrix; neither is defined in this snippet.
from panelbox import FixedEffects
from panelbox.diagnostics.spatial import MoranIPanelTest

# Fit a standard (non-spatial) panel model
fe = FixedEffects("y ~ x1 + x2", data, "region", "year")
fe_results = fe.fit()

# Test the residuals for spatial autocorrelation
moran = MoranIPanelTest(fe_results.resid, W)
result = moran.run()

if result.pvalue >= 0.05:
    print("No spatial autocorrelation detected.")
    print("Standard panel model is adequate.")
else:
    print(f"Moran's I = {result.statistic:.4f} (p = {result.pvalue:.4f})")
    print("Spatial dependence detected. Proceed to Step 2.")
If Moran's I is not significant, there is no statistical evidence for spatial dependence and a standard panel model (FE/RE) is appropriate.
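To make the statistic concrete, the sketch below computes a cross-sectional Moran's I by hand in plain Python, using \(I = \frac{n}{S_0}\frac{e'We}{e'e}\), where \(e\) holds deviations from the mean and \(S_0\) is the sum of all weights. The helper names `morans_i` and `ring_weights` and the toy ring layout are assumptions made for illustration. The sketch does not reproduce the panel version of the test or the inference that `MoranIPanelTest` performs.

```python
def morans_i(values, W):
    """Moran's I for one cross-section: (n / S0) * (e'We) / (e'e)."""
    n = len(values)
    mean = sum(values) / n
    e = [v - mean for v in values]                 # deviations from the mean
    s0 = sum(sum(row) for row in W)                # sum of all weights
    cross = sum(W[i][j] * e[i] * e[j] for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(d * d for d in e)


def ring_weights(n):
    """Row-standardized weights for n units on a ring (two neighbors each)."""
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        W[i][(i - 1) % n] = 0.5
        W[i][(i + 1) % n] = 0.5
    return W


W_ring = ring_weights(6)
clustered = [1, 1, 1, -1, -1, -1]      # similar values sit next to each other
alternating = [1, -1, 1, -1, 1, -1]    # every neighbor has the opposite sign

i_clustered = morans_i(clustered, W_ring)      # positive: spatial clustering
i_alternating = morans_i(alternating, W_ring)  # negative: checkerboard pattern
```

Positive values indicate that neighbors tend to have similar residuals, and negative values indicate that neighbors tend to differ.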
Step 2: What Type of Spatial Dependence?
Use Lagrange Multiplier tests to classify the spatial dependence:
from panelbox.diagnostics.spatial import (
    LMSpatialLagTest,
    LMSpatialErrorTest,
    RobustLMSpatialLagTest,
    RobustLMSpatialErrorTest,
)

# Run the LM tests on the non-spatial FE results from Step 1
lm_lag = LMSpatialLagTest(fe_results, W).run()
lm_error = LMSpatialErrorTest(fe_results, W).run()
rlm_lag = RobustLMSpatialLagTest(fe_results, W).run()
rlm_error = RobustLMSpatialErrorTest(fe_results, W).run()

# Decision logic
if lm_lag.pvalue < 0.05 and lm_error.pvalue >= 0.05:
    print("Only LM-lag significant -> SAR")
elif lm_error.pvalue < 0.05 and lm_lag.pvalue >= 0.05:
    print("Only LM-error significant -> SEM")
elif lm_lag.pvalue < 0.05 and lm_error.pvalue < 0.05:
    # Both significant: check robust versions
    if rlm_lag.pvalue < 0.05 and rlm_error.pvalue >= 0.05:
        print("Robust LM-lag dominates -> SAR")
    elif rlm_error.pvalue < 0.05 and rlm_lag.pvalue >= 0.05:
        print("Robust LM-error dominates -> SEM")
    else:
        # Both or neither robust test significant
        print("Robust tests do not discriminate -> SDM or GNS")
else:
    print("Neither LM test significant -> no spatial lag or error term indicated")
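The same decision rule can be written as a small pure function of the four p-values, which makes it easy to unit-test and reuse independently of panelbox. This is a minimal sketch: the function name `classify_lm`, its argument names, and the returned labels are hypothetical.

```python
def classify_lm(p_lag, p_error, p_rlag, p_rerror, alpha=0.05):
    """Map the four LM p-values to a suggested specification."""
    lag, err = p_lag < alpha, p_error < alpha
    if not lag and not err:
        return "no spatial term"
    if lag and not err:
        return "SAR"
    if err and not lag:
        return "SEM"
    # Both classic tests reject: fall back on the robust versions
    rlag, rerr = p_rlag < alpha, p_rerror < alpha
    if rlag and not rerr:
        return "SAR"
    if rerr and not rlag:
        return "SEM"
    # Robust tests agree with each other, so they cannot discriminate
    return "SDM or GNS"


suggestion = classify_lm(p_lag=0.001, p_error=0.003, p_rlag=0.010, p_rerror=0.400)
```

Here both classic tests reject, but only the robust lag test does, so the helper suggests SAR.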
Step 3: Do You Need Dynamics?
If your data may have temporal as well as spatial dependence, test for serial correlation:
from panelbox.validation import WooldridgeARTest

# Check for serial correlation over time
ar_test = WooldridgeARTest("y ~ x1 + x2", data, "region", "year")
ar_result = ar_test.run()

if ar_result.pvalue < 0.05:
    print("Temporal autocorrelation detected.")
    print("Consider Dynamic Spatial Panel model.")
Step 4: Formal Model Selection with GNS
For a rigorous model selection procedure, estimate the GNS and test restrictions:
from panelbox.models.spatial import GeneralNestingSpatial

# Fit the full GNS model
gns = GeneralNestingSpatial("y ~ x1 + x2", data, "region", "year",
                            W1=W, W2=W, W3=W)
gns_results = gns.fit(effects='fixed', method='ml')

# Test restrictions to find the most parsimonious adequate model
tests = {
    'SAR': {'theta': 0, 'lambda': 0},            # Only rho
    'SEM': {'rho': 0, 'theta': 0},               # Only lambda
    'SDM': {'lambda': 0},                        # rho + theta
    'SAC': {'theta': 0},                         # rho + lambda
    'SDEM': {'rho': 0},                          # theta + lambda
    'OLS': {'rho': 0, 'theta': 0, 'lambda': 0},  # No spatial
}

print(f"{'Model':<8} {'LR stat':>10} {'p-value':>10} {'Conclusion':>15}")
print("-" * 46)
for name, restrictions in tests.items():
    test = gns.test_restrictions(restrictions=restrictions)
    # A small p-value rejects the restricted model in favor of GNS
    conclusion = "Reject" if test['p_value'] < 0.05 else "Fail to reject"
    print(f"{name:<8} {test['lr_statistic']:>10.2f} {test['p_value']:>10.4f} "
          f"{conclusion:>15}")

# Automatic identification
model_type = gns.identify_model_type(gns_results)
print(f"\nIdentified model: {model_type}")
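Each restriction test is a likelihood ratio comparison. As a rough sketch of the arithmetic (not panelbox's implementation), the statistic is \(LR = 2(\ell_{GNS} - \ell_{restricted})\), compared against a \(\chi^2\) distribution whose degrees of freedom equal the number of restricted parameters. Note that restricting \(\theta = 0\) removes one parameter per covariate, so with two covariates it counts as two restrictions. The helpers `chi2_sf` and `lr_test` and the log-likelihood values below are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df.

    Uses the recurrence Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1)
    for the regularized upper incomplete gamma function, with z = x / 2.
    """
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)                 # Q(1, z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))      # Q(1/2, z)
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q


def lr_test(llf_unrestricted, llf_restricted, df):
    """Return the LR statistic and its asymptotic chi-square p-value."""
    lr = max(2.0 * (llf_unrestricted - llf_restricted), 0.0)
    return lr, chi2_sf(lr, df)


# Hypothetical log-likelihoods: GNS versus a model with two restrictions
lr_stat, p_value = lr_test(llf_unrestricted=-100.0, llf_restricted=-103.0, df=2)
```

A p-value below the chosen significance level means the restricted model fits significantly worse than GNS, so the restriction is rejected.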
Theory-Driven Selection
Match Model to Mechanism
| Mechanism | Model | Examples |
|---|---|---|
| Outcome spillovers | SAR | Trade flows, migration, contagion, policy diffusion |
| Correlated shocks | SEM | Weather, regional policy, measurement error |
| Both outcome and covariate spillovers | SDM | Housing (neighbor prices + amenities), education (peer effects + background) |
| Outcome and error dependence | SAC | Simultaneous competition and shared shocks |
| Covariate spillovers and correlated shocks | SDEM | Neighbor characteristics matter but no outcome feedback |
| Temporal + spatial | Dynamic | GDP growth, epidemics, technology adoption |
| Unknown mechanism | GNS or SDM | Exploratory analysis |
Scenario-Based Recommendations
Regional Economics
- GDP growth: Dynamic Spatial Panel (persistence + spillovers)
- Unemployment: SAR (labor mobility creates outcome spillovers)
- Public spending: SAR or SDM (fiscal competition / yardstick competition)
Epidemiology
- Disease prevalence: Dynamic Spatial Panel (temporal persistence + geographic contagion)
- Health outcomes: SDM (neighbor health infrastructure AND neighbor health outcomes matter)
Housing Markets
- House prices: SDM (neighbor prices AND neighbor amenities affect value)
- Housing supply: SEM (shared regulatory or geographic constraints)
Trade and Migration
- Trade volumes: SAR (trade begets trade; gravity model with spatial lag)
- Migration flows: SAR (network effects in destination choice)
Environmental Economics
- Pollution: SEM (shared atmospheric conditions)
- Resource management: SAR (commons problems with spatial spillovers)
Model Comparison Table
| Model | Specification | Spatial Parameters | Indirect Effects | Estimation | Complexity |
|---|---|---|---|---|---|
| OLS/FE | \(y = X\beta + \alpha + \varepsilon\) | None | No | OLS | Lowest |
| SAR | \(y = \rho Wy + X\beta + \varepsilon\) | \(\rho\) | Yes | QML/ML | Low |
| SEM | \(y = X\beta + u\), \(u = \lambda Wu + \varepsilon\) | \(\lambda\) | No | GMM/ML | Low |
| SLX | \(y = X\beta + WX\theta + \varepsilon\) | \(\theta\) | \(\theta\) | OLS | Low |
| SDM | \(y = \rho Wy + X\beta + WX\theta + \varepsilon\) | \(\rho\), \(\theta\) | Yes | QML/ML | Medium |
| SAC | \(y = \rho Wy + X\beta + u\), \(u = \lambda Wu + \varepsilon\) | \(\rho\), \(\lambda\) | Yes | ML | Medium |
| SDEM | \(y = X\beta + WX\theta + u\), \(u = \lambda Wu + \varepsilon\) | \(\lambda\), \(\theta\) | \(\theta\) | ML | Medium |
| GNS | \(y = \rho Wy + X\beta + WX\theta + u\), \(u = \lambda Wu + \varepsilon\) | \(\rho\), \(\lambda\), \(\theta\) | Yes | ML | Highest |
| Dynamic | \(y_{it} = \gamma y_{i,t-1} + \rho Wy_{it} + X\beta + \varepsilon\) | \(\gamma\), \(\rho\) | Yes | GMM | High |
Practical Recommendations
Default Strategy
Start with SDM
LeSage and Pace (2009) recommend starting with the Spatial Durbin Model (SDM) because:
- It nests SAR (\(\theta = 0\)) and SEM (\(\theta = -\rho\beta\)) as special cases
- If the true model is SAR but you estimate SDM, you lose some efficiency but remain consistent
- If the true model is SEM but you estimate SAR, you get biased and inconsistent estimates
- SDM protects against omitted spatially lagged variable bias
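The claim that SDM nests SEM follows from a short derivation, sketched here in the notation of the Model Comparison Table. Starting from the SEM and substituting \(u = y - X\beta\) into the error equation:

```latex
\begin{aligned}
y &= X\beta + u, \qquad u = \lambda W u + \varepsilon \\
(I - \lambda W)(y - X\beta) &= \varepsilon \\
y &= \lambda W y + X\beta + WX(-\lambda\beta) + \varepsilon
\end{aligned}
```

Matching terms with the SDM, \(y = \rho Wy + X\beta + WX\theta + \varepsilon\), gives \(\rho = \lambda\) and \(\theta = -\rho\beta\). This is the common factor restriction listed in the first bullet, so testing it on an estimated SDM is one way to check whether the model can be simplified to SEM.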
Sensitivity Analysis
Always test your results' sensitivity to the weight matrix specification:
from panelbox.models.spatial import SpatialLag, SpatialWeights

# Assumes `gdf` (GeoDataFrame of regions) and `coords` (region coordinates)
# are already defined alongside the panel `data`.

# Build alternative weight matrices
W_queen = SpatialWeights.from_contiguity(gdf, criterion='queen')
W_knn5 = SpatialWeights.from_knn(coords, k=5)
W_knn10 = SpatialWeights.from_knn(coords, k=10)

# Fit the same model with each weight matrix
results_queen = SpatialLag("y ~ x1 + x2", data, "region", "year",
                           W=W_queen.matrix).fit(effects='fixed')
results_knn5 = SpatialLag("y ~ x1 + x2", data, "region", "year",
                          W=W_knn5.matrix).fit(effects='fixed')
results_knn10 = SpatialLag("y ~ x1 + x2", data, "region", "year",
                           W=W_knn10.matrix).fit(effects='fixed')

# Compare key results
print(f"{'W specification':<20} {'rho':>8} {'beta_x1':>10} {'AIC':>10}")
print("-" * 51)
for name, r in [('Queen', results_queen), ('KNN-5', results_knn5),
                ('KNN-10', results_knn10)]:
    print(f"{name:<20} {r.rho:>8.4f} {r.params['x1']:>10.4f} {r.aic:>10.1f}")
If results change substantially across \(W\) specifications, be cautious about drawing strong conclusions.
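To illustrate what changes between weight specifications, the sketch below builds row-standardized k-nearest-neighbor weights from coordinates in plain Python. It is only meant to show the construction. The helper `knn_weights` and the toy coordinates are hypothetical, ties are broken by unit index, and this is not how `SpatialWeights.from_knn` is implemented.

```python
import math


def knn_weights(coords, k):
    """Row-standardized k-nearest-neighbor weights from (x, y) coordinates."""
    n = len(coords)
    W = [[0.0] * n for _ in range(n)]
    for i, (xi, yi) in enumerate(coords):
        # Distances from unit i to every other unit (never to itself)
        dists = [(math.hypot(xi - xj, yi - yj), j)
                 for j, (xj, yj) in enumerate(coords) if j != i]
        for _, j in sorted(dists)[:k]:
            W[i][j] = 1.0 / k      # equal weights, so each row sums to one
    return W


# Five units on a line, with a gap between the third and fourth
coords = [(0, 0), (1, 0), (2, 0), (5, 0), (6, 0)]
W_knn1 = knn_weights(coords, k=1)
W_knn2 = knn_weights(coords, k=2)
```

Unlike contiguity weights, k-nearest-neighbor weights are generally asymmetric: unit `j` can be among the nearest neighbors of `i` without the reverse being true. Raising `k` also links units across the gap, which is one reason estimates of \(\rho\) can shift between specifications.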
Complete Model Selection Workflow
from panelbox import FixedEffects
from panelbox.models.spatial import SpatialLag, SpatialError, SpatialDurbin
from panelbox.diagnostics.spatial import MoranIPanelTest

# Assumes `data` (panel DataFrame) and `W` (N x N weight matrix) are defined

# ---- Step 1: Baseline FE model ----
fe = FixedEffects("y ~ x1 + x2", data, "region", "year")
fe_results = fe.fit()

# ---- Step 2: Test for spatial dependence ----
moran = MoranIPanelTest(fe_results.resid, W)
moran_result = moran.run()
print(f"Moran's I: {moran_result.statistic:.4f} (p = {moran_result.pvalue:.4f})")

if moran_result.pvalue >= 0.05:
    print("No spatial dependence. Use standard FE model.")
else:
    # ---- Step 3: Fit competing spatial models ----
    sar = SpatialLag("y ~ x1 + x2", data, "region", "year", W=W)
    sar_res = sar.fit(effects='fixed', method='qml')

    # ML rather than GMM, so that the log-likelihood, AIC, and BIC
    # are comparable with the other models
    sem = SpatialError("y ~ x1 + x2", data, "region", "year", W=W)
    sem_res = sem.fit(effects='fixed', method='ml')

    sdm = SpatialDurbin("y ~ x1 + x2", data, "region", "year", W=W)
    sdm_res = sdm.fit(effects='fixed', method='qml')

    # ---- Step 4: Compare models ----
    print(f"\n{'Model':<8} {'Log-lik':>10} {'AIC':>10} {'BIC':>10}")
    print("-" * 41)
    for name, res in [('SAR', sar_res), ('SEM', sem_res), ('SDM', sdm_res)]:
        print(f"{name:<8} {res.llf:>10.1f} {res.aic:>10.1f} {res.bic:>10.1f}")

    # ---- Step 5: Post-estimation check ----
    best_model = sdm_res  # Start with SDM
    moran_post = MoranIPanelTest(best_model.resid, W)
    post = moran_post.run()
    print(f"\nPost-SDM Moran's I: {post.statistic:.4f} (p = {post.pvalue:.4f})")

    # ---- Step 6: Effect decomposition ----
    effects = best_model.spillover_effects
    print("\nEffect Decomposition (SDM):")
    print(f"{'Variable':<10} {'Direct':>10} {'Indirect':>10} {'Total':>10}")
    print("-" * 43)
    for var in effects['direct']:
        print(f"{var:<10} {effects['direct'][var]:>10.4f} "
              f"{effects['indirect'][var]:>10.4f} "
              f"{effects['total'][var]:>10.4f}")
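To show where direct, indirect, and total effects come from, the sketch below works through the simpler SAR case by hand. For SAR the matrix of partial derivatives of \(y\) with respect to a covariate is \((I - \rho W)^{-1}\beta\): the direct effect is the average diagonal element, the total effect is the average row sum, and the indirect effect is the difference. The helper names and the three-region example are hypothetical, and the inverse is approximated by the series \(I + \rho W + \rho^2 W^2 + \dots\) to keep the example dependency-free. In practice a linear algebra library would be used, and the SDM version has an additional \(W\theta\) term.

```python
def mat_mul(A, B):
    """Multiply two square matrices stored as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def sar_multiplier(W, rho, terms=200):
    """Approximate (I - rho*W)^{-1} by a truncated power series."""
    n = len(W)
    total = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    power = [row[:] for row in total]              # holds (rho*W)^k
    rho_w = [[rho * w for w in row] for row in W]
    for _ in range(terms):
        power = mat_mul(power, rho_w)
        total = [[total[i][j] + power[i][j] for j in range(n)]
                 for i in range(n)]
    return total


def sar_effects(W, rho, beta):
    """Average direct, indirect, and total effects of one covariate."""
    n = len(W)
    S = sar_multiplier(W, rho)
    direct = beta * sum(S[i][i] for i in range(n)) / n
    total = beta * sum(sum(row) for row in S) / n
    return direct, total - direct, total


# Three mutually neighboring regions with row-standardized weights
W_toy = [[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]]
direct, indirect, total = sar_effects(W_toy, rho=0.4, beta=2.0)
```

With a row-standardized \(W\), the total effect equals \(\beta / (1 - \rho)\), and the direct effect exceeds \(\beta\) because of feedback through neighbors.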
Common Pitfalls
1. Over-Parameterization
GNS and SDM may have too many parameters for small samples. As rough rules of thumb, check that:
- \(N > 30\) (cross-sectional units) for simple spatial models
- \(N > 50\) for SDM with many covariates
- \(N > 100\) for GNS with separate weight matrices
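One way to see the cost of extra parameters is through information criteria, which trade fit against parameter count. The sketch below uses the standard definitions \(AIC = 2k - 2\ell\) and \(BIC = k \ln n - 2\ell\). The log-likelihoods, parameter counts, and sample size are made-up numbers for illustration, not results from any dataset.

```python
import math


def aic(llf, n_params):
    """Akaike information criterion (lower is better)."""
    return 2 * n_params - 2 * llf


def bic(llf, n_params, n_obs):
    """Bayesian information criterion (lower is better)."""
    return n_params * math.log(n_obs) - 2 * llf


# Hypothetical fits: GNS has a slightly higher log-likelihood but twice
# as many parameters as SAR
n_obs = 200
aic_sar, bic_sar = aic(-250.0, 4), bic(-250.0, 4, n_obs)
aic_gns, bic_gns = aic(-248.5, 8), bic(-248.5, 8, n_obs)
```

In this made-up case both criteria prefer SAR: the small gain in log-likelihood from GNS does not offset the penalty for four additional parameters, and the BIC penalty grows with the sample size.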
2. Weight Matrix Sensitivity
Results can be sensitive to \(W\) specification. Always:
- Test at least 2-3 different weight structures
- Report sensitivity analysis in publications
- Justify your choice based on economic theory
3. Ignoring Temporal Dynamics
Panel data often has temporal dependence. If the lagged dependent variable is significant:
- Omitting it biases spatial parameter estimates
- Use the Dynamic Spatial Panel model
- Or at minimum, include time dummies
4. Mechanical Model Selection
Do not rely purely on statistical tests:
- Economic theory should guide model choice
- Consider interpretability of results
- SDM is safer as a default than pure statistical selection
5. Reporting Only the Preferred Model
For publication:
- Report at least SAR, SEM, and SDM side-by-side
- Show diagnostic test results
- Include sensitivity to weight matrix
- Report direct, indirect, and total effects for SAR/SDM
Tutorials
| Tutorial | Description |
|---|---|
| Spatial Econometrics | Full model selection workflow |
See Also
- Spatial Weight Matrices — Constructing the weight matrix
- Spatial Lag (SAR) — For outcome spillovers
- Spatial Error (SEM) — For correlated shocks
- Spatial Durbin (SDM) — Recommended starting point
- Dynamic Spatial Panel — When temporal dynamics matter
- General Nesting Spatial (GNS) — For formal restriction tests
- Direct, Indirect, and Total Effects — Interpreting spatial effects
- Spatial Diagnostics — Full diagnostic test suite
References
- LeSage, J. and Pace, R.K. (2009). Introduction to Spatial Econometrics. Chapman & Hall/CRC.
- Elhorst, J.P. (2014). Spatial Econometrics: From Cross-Sectional Data to Spatial Panels. Springer.
- Anselin, L. (1988). Spatial Econometrics: Methods and Models. Kluwer Academic.
- Lee, L.F. and Yu, J. (2010). Estimation of spatial autoregressive panel data models with fixed effects. Journal of Econometrics, 154(2), 165-185.
- Manski, C.F. (1993). Identification of endogenous social effects: The reflection problem. Review of Economic Studies, 60(3), 531-542.
- Gibbons, S. and Overman, H.G. (2012). Mostly pointless spatial econometrics? Journal of Regional Science, 52(2), 172-191.