Choosing a Spatial Model¶

Quick Reference

Key decision tools: Moran's I, LM tests, LR tests, information criteria
Default recommendation: Start with SDM (LeSage & Pace, 2009)
Formal approach: Estimate GNS and test restrictions

Overview¶

Choosing the right spatial model is one of the most important decisions in spatial econometrics. Different models make different assumptions about how spatial dependence operates, and selecting the wrong model can lead to biased coefficients, incorrect standard errors, or misleading spillover estimates.

There are two complementary approaches to model selection:

Theory-driven: match the model to the economic mechanism generating spatial dependence
Data-driven: use statistical tests to let the data guide model choice

In practice, you should use both. Economic theory narrows the set of plausible models, and statistical tests help discriminate among them.

Decision Framework¶

Step 1: Is There Spatial Dependence?¶

Before fitting any spatial model, test whether spatial dependence exists:

from panelbox import FixedEffects
from panelbox.diagnostics.spatial import MoranIPanelTest

# Fit standard panel model
fe = FixedEffects("y ~ x1 + x2", data, "region", "year")
fe_results = fe.fit()

# Test for spatial autocorrelation
moran = MoranIPanelTest(fe_results.resid, W)
result = moran.run()

if result.pvalue >= 0.05:
    print("No spatial autocorrelation detected.")
    print("Standard panel model is adequate.")
else:
    print(f"Moran's I = {result.statistic:.4f} (p = {result.pvalue:.4f})")
    print("Spatial dependence detected. Proceed to Step 2.")

If Moran's I is not significant, there is no statistical evidence for spatial dependence and a standard panel model (FE/RE) is appropriate.
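
To make the statistic concrete, here is a minimal NumPy sketch of Moran's I for a single cross-section of residuals. It is not panelbox's implementation (a panel test pools information across periods), and the helper names `row_standardize`, `morans_i`, and `W_line` are illustrative assumptions.

```python
import numpy as np

def row_standardize(W):
    """Scale each row of W to sum to one (rows without neighbours stay zero)."""
    W = np.asarray(W, dtype=float)
    row_sums = W.sum(axis=1, keepdims=True)
    return np.divide(W, row_sums, out=np.zeros_like(W), where=row_sums > 0)

def morans_i(e, W):
    """Moran's I for one cross-section of residuals e and weight matrix W."""
    e = np.asarray(e, dtype=float)
    z = e - e.mean()          # centre the residuals
    n = z.size
    s0 = W.sum()              # sum of all weights (equals n if row-standardized)
    return (n / s0) * (z @ W @ z) / (z @ z)

# Four regions on a line: 0 - 1 - 2 - 3
adjacency = np.array([[0, 1, 0, 0],
                      [1, 0, 1, 0],
                      [0, 1, 0, 1],
                      [0, 0, 1, 0]])
W_line = row_standardize(adjacency)

clustered = np.array([1.0, 1.0, -1.0, -1.0])     # similar values sit together
alternating = np.array([1.0, -1.0, 1.0, -1.0])   # neighbours always differ

print(morans_i(clustered, W_line))     # 0.5  (positive: clustering)
print(morans_i(alternating, W_line))   # -1.0 (negative: checkerboard)
```

Positive values indicate that neighbouring residuals resemble each other, which is the pattern a spatial model is meant to absorb.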

Step 2: What Type of Spatial Dependence?¶

Use Lagrange Multiplier tests to classify the spatial dependence:

from panelbox.diagnostics.spatial import (
    LMSpatialLagTest,
    LMSpatialErrorTest,
    RobustLMSpatialLagTest,
    RobustLMSpatialErrorTest,
)

# Run LM tests
lm_lag = LMSpatialLagTest(fe_results, W).run()
lm_error = LMSpatialErrorTest(fe_results, W).run()
rlm_lag = RobustLMSpatialLagTest(fe_results, W).run()
rlm_error = RobustLMSpatialErrorTest(fe_results, W).run()

# Decision logic
if lm_lag.pvalue < 0.05 and lm_error.pvalue >= 0.05:
    print("Only LM-lag significant -> SAR")
elif lm_error.pvalue < 0.05 and lm_lag.pvalue >= 0.05:
    print("Only LM-error significant -> SEM")
elif lm_lag.pvalue < 0.05 and lm_error.pvalue < 0.05:
    # Both significant: check robust versions
    if rlm_lag.pvalue < 0.05 and rlm_error.pvalue >= 0.05:
        print("Robust LM-lag dominates -> SAR")
    elif rlm_error.pvalue < 0.05 and rlm_lag.pvalue >= 0.05:
        print("Robust LM-error dominates -> SEM")
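    else:
        # Robust tests agree (both or neither significant): cannot discriminate
        print("Robust tests inconclusive -> SDM or GNS")
else:
    print("Neither LM test significant -> standard panel model")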
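
What the (non-robust) LM statistics measure can be sketched with the textbook cross-sectional formulas. The code below assumes OLS residuals and a row-standardized weight matrix; it is not the panel version that panelbox runs, and `lm_spatial_tests` and `W_ring` are hypothetical names used only for illustration.

```python
import math
import numpy as np

def lm_spatial_tests(y, X, W):
    """Classic LM-lag and LM-error statistics computed from OLS residuals."""
    n = y.size
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta                     # OLS residuals
    sigma2 = (e @ e) / n                 # ML estimate of the error variance
    T = np.trace(W.T @ W + W @ W)        # trace term shared by both tests

    # LM-error: spatial autocorrelation of the residuals
    lm_error = (e @ W @ e / sigma2) ** 2 / T

    # LM-lag: covariance between the residuals and the spatial lag of y
    WXb = W @ (X @ beta)
    M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)   # residual-maker matrix
    D = (WXb @ M @ WXb) / sigma2 + T
    lm_lag = (e @ W @ y / sigma2) ** 2 / D

    # Both statistics are asymptotically chi-square with 1 degree of freedom
    pvalue = lambda stat: math.erfc(math.sqrt(stat / 2.0))
    return {"lm_lag": (lm_lag, pvalue(lm_lag)),
            "lm_error": (lm_error, pvalue(lm_error))}

# Simulated example: 60 units on a ring, true model is SAR with rho = 0.6
rng = np.random.default_rng(0)
n = 60
W_ring = np.zeros((n, n))
for i in range(n):
    W_ring[i, (i - 1) % n] = 0.5
    W_ring[i, (i + 1) % n] = 0.5
X = np.column_stack([np.ones(n), rng.normal(size=n)])
eps = rng.normal(size=n)
y = np.linalg.solve(np.eye(n) - 0.6 * W_ring, X @ np.array([1.0, 2.0]) + eps)

results = lm_spatial_tests(y, X, W_ring)
for name, (stat, p) in results.items():
    print(f"{name}: stat = {stat:.3f}, p = {p:.4f}")
```

Because each statistic also has power against the other alternative, both are often significant at once, which is why the decision logic falls back on the robust versions.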

Step 3: Do You Need Dynamics?¶

If your data has both temporal and spatial dependence:

# Check for temporal autocorrelation in the outcome
from panelbox.validation import WooldridgeARTest

ar_test = WooldridgeARTest("y ~ x1 + x2", data, "region", "year")
ar_result = ar_test.run()

if ar_result.pvalue < 0.05:
    print("Temporal autocorrelation detected.")
    print("Consider Dynamic Spatial Panel model.")
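
The idea behind a Wooldridge-type check can be illustrated on simulated errors: if the idiosyncratic errors are serially uncorrelated, their first differences have a first-order autocorrelation of -0.5. The sketch below is a simplified illustration under that assumption, not the library's test; `diff_residual_ar_coef` and `simulate_ar1` are hypothetical helpers.

```python
import numpy as np

def diff_residual_ar_coef(resid):
    """Slope from regressing first-differenced residuals on their own lag.

    resid: array of shape (N, T), one row per cross-sectional unit.
    """
    d = np.diff(resid, axis=1)        # first differences within each unit
    lagged = d[:, :-1].ravel()
    current = d[:, 1:].ravel()
    return (lagged @ current) / (lagged @ lagged)   # no-intercept OLS slope

def simulate_ar1(n_units, n_periods, phi, rng):
    """Stationary AR(1) errors e_t = phi * e_{t-1} + eps_t for each unit."""
    e = np.empty((n_units, n_periods))
    e[:, 0] = rng.normal(size=n_units) / np.sqrt(1.0 - phi ** 2)
    for t in range(1, n_periods):
        e[:, t] = phi * e[:, t - 1] + rng.normal(size=n_units)
    return e

rng = np.random.default_rng(42)
coef_iid = diff_residual_ar_coef(simulate_ar1(500, 10, 0.0, rng))
coef_ar = diff_residual_ar_coef(simulate_ar1(500, 10, 0.8, rng))

print(f"No serial correlation: {coef_iid:.3f} (benchmark: -0.5)")
print(f"AR(1) with phi = 0.8:  {coef_ar:.3f}")
```

A slope close to -0.5 is consistent with no serial correlation; a slope well above it points to persistence, in which case a dynamic specification is worth considering.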

Step 4: Formal Model Selection with GNS¶

For a rigorous model selection procedure, estimate the GNS and test restrictions:

from panelbox.models.spatial import GeneralNestingSpatial

# Fit the full GNS model
gns = GeneralNestingSpatial("y ~ x1 + x2", data, "region", "year",
                             W1=W, W2=W, W3=W)
gns_results = gns.fit(effects='fixed', method='ml')

# Test restrictions to find the most parsimonious adequate model
tests = {
    'SAR':  {'theta': 0, 'lambda': 0},   # Only rho
    'SEM':  {'rho': 0, 'theta': 0},       # Only lambda
    'SDM':  {'lambda': 0},                 # rho + theta
    'SAC':  {'theta': 0},                  # rho + lambda
    'SDEM': {'rho': 0},                    # theta + lambda
    'OLS':  {'rho': 0, 'theta': 0, 'lambda': 0},  # No spatial
}

print(f"{'Model':<8} {'LR stat':>10} {'p-value':>10} {'Conclusion':>15}")
print("-" * 45)
for name, restrictions in tests.items():
    test = gns.test_restrictions(restrictions=restrictions)
    conclusion = "Reject" if test['p_value'] < 0.05 else "Fail to reject"
    print(f"{name:<8} {test['lr_statistic']:>10.2f} {test['p_value']:>10.4f} "
          f"{conclusion:>15}")

# Automatic identification
model_type = gns.identify_model_type(gns_results)
print(f"\nIdentified model: {model_type}")
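
Each restriction test above is a likelihood-ratio comparison between the GNS and a nested model. As a sketch of the arithmetic, using only the standard library and hypothetical log-likelihood values (`llf_gns`, `llf_sdm`), the statistic and its chi-square p-value can be computed as follows. The function names are illustrative and not part of panelbox.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(chi2_df > x) for a positive integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over k < df/2 of (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: normal-tail term plus a finite correction series
    term, total = math.sqrt(x), 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return (math.erfc(math.sqrt(half))
            + math.sqrt(2.0 / math.pi) * math.exp(-half) * total)

def lr_test(llf_unrestricted, llf_restricted, n_restrictions):
    """LR statistic 2 * (llf_u - llf_r) and its chi-square p-value."""
    stat = 2.0 * (llf_unrestricted - llf_restricted)
    return stat, chi2_sf(stat, n_restrictions)

# Hypothetical log-likelihoods, for illustration only
llf_gns = -100.0   # unrestricted model
llf_sdm = -103.0   # restricted model (lambda = 0, one restriction)

stat, p = lr_test(llf_gns, llf_sdm, n_restrictions=1)
print(f"LR = {stat:.2f}, p = {p:.4f}")
```

The degrees of freedom equal the number of restricted parameters. With k covariates, \(\theta = 0\) imposes k restrictions rather than one, so the SAR restriction on a two-covariate model has 3 degrees of freedom.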

Theory-Driven Selection¶

Match Model to Mechanism¶

Mechanism	Model	Examples
Outcome spillovers	SAR	Trade flows, migration, contagion, policy diffusion
Correlated shocks	SEM	Weather, regional policy, measurement error
Both outcome and covariate spillovers	SDM	Housing (neighbor prices + amenities), education (peer effects + background)
Outcome and error dependence	SAC	Simultaneous competition and shared shocks
Covariate spillovers and correlated shocks	SDEM	Neighbor characteristics matter but no outcome feedback
Temporal + spatial	Dynamic	GDP growth, epidemics, technology adoption
Unknown mechanism	GNS or SDM	Exploratory analysis

Scenario-Based Recommendations¶

Regional Economics¶

GDP growth: Dynamic Spatial Panel (persistence + spillovers)
Unemployment: SAR (labor mobility creates outcome spillovers)
Public spending: SAR or SDM (fiscal competition / yardstick competition)

Epidemiology¶

Disease prevalence: Dynamic Spatial Panel (temporal persistence + geographic contagion)
Health outcomes: SDM (neighbor health infrastructure AND neighbor health outcomes matter)

Housing Markets¶

House prices: SDM (neighbor prices AND neighbor amenities affect value)
Housing supply: SEM (shared regulatory or geographic constraints)

Trade and Migration¶

Trade volumes: SAR (trade begets trade; gravity model with spatial lag)
Migration flows: SAR (network effects in destination choice)

Environmental Economics¶

Pollution: SEM (shared atmospheric conditions)
Resource management: SAR (commons problems with spatial spillovers)

Model Comparison Table¶

Model	Specification	Spatial Parameters	Indirect Effects	Estimation	Complexity
OLS/FE	\(y = X\beta + \alpha + \varepsilon\)	None	No	OLS	Lowest
SAR	\(y = \rho Wy + X\beta + \varepsilon\)	\(\rho\)	Yes	QML/ML	Low
SEM	\(y = X\beta + u\), \(u = \lambda Wu + \varepsilon\)	\(\lambda\)	No	GMM/ML	Low
SLX	\(y = X\beta + WX\theta + \varepsilon\)	\(\theta\)	\(\theta\)	OLS	Low
SDM	\(y = \rho Wy + X\beta + WX\theta + \varepsilon\)	\(\rho\), \(\theta\)	Yes	QML/ML	Medium
SAC	\(y = \rho Wy + X\beta + u\), \(u = \lambda Wu + \varepsilon\)	\(\rho\), \(\lambda\)	Yes	ML	Medium
SDEM	\(y = X\beta + WX\theta + u\), \(u = \lambda Wu + \varepsilon\)	\(\lambda\), \(\theta\)	\(\theta\)	ML	Medium
GNS	\(y = \rho Wy + X\beta + WX\theta + u\), \(u = \lambda Wu + \varepsilon\)	\(\rho\), \(\lambda\), \(\theta\)	Yes	ML	Highest
Dynamic	\(y_{t} = \gamma y_{t-1} + \rho W y_{t} + X_{t}\beta + \varepsilon_{t}\)	\(\gamma\), \(\rho\)	Yes	GMM	High
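
The specifications in the table can be read as data-generating processes. The following NumPy sketch simulates the SAR and SEM rows for one cross-section through their reduced forms and checks that the SAR outcome satisfies the structural equation. The ring-shaped `W_ring`, the parameter values, and the helper `ring_weights` are assumptions made purely for illustration.

```python
import numpy as np

def ring_weights(n):
    """Row-standardized weights: each unit has two neighbours on a ring."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, (i - 1) % n] = 0.5
        W[i, (i + 1) % n] = 0.5
    return W

rng = np.random.default_rng(1)
n, rho, lam = 8, 0.4, 0.4
beta = np.array([1.5])
W_ring = ring_weights(n)
X = rng.normal(size=(n, 1))
eps = rng.normal(scale=0.1, size=n)
I = np.eye(n)

# SAR reduced form: y = (I - rho W)^{-1} (X beta + eps)
y_sar = np.linalg.solve(I - rho * W_ring, X @ beta + eps)

# SEM reduced form: y = X beta + (I - lambda W)^{-1} eps
y_sem = X @ beta + np.linalg.solve(I - lam * W_ring, eps)

# The SAR outcome satisfies the structural equation y = rho W y + X beta + eps
structural_rhs = rho * (W_ring @ y_sar) + X @ beta + eps
print(np.allclose(y_sar, structural_rhs))   # True
```

In the SAR case the inverse multiplies \(X\beta\), so a change in one unit's covariate reaches every other unit; in the SEM case it multiplies only the error, which is why that row reports no indirect effects.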

Practical Recommendations¶

Default Strategy¶

Start with SDM

LeSage and Pace (2009) recommend starting with the Spatial Durbin Model (SDM) because:

It nests SAR (\(\theta = 0\)) and SEM (\(\theta = -\rho\beta\)) as special cases
If the true model is SAR but you estimate SDM, you lose some efficiency but remain consistent
If the true model is SEM but you estimate SAR, you get biased and inconsistent estimates
SDM protects against omitted spatially lagged variable bias
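
The claim that SEM is a restricted SDM follows from a short derivation. Starting from the SEM, substitute \(u = (I - \lambda W)^{-1}\varepsilon\) and premultiply by \((I - \lambda W)\):

```latex
\begin{aligned}
y &= X\beta + u, \qquad u = \lambda W u + \varepsilon \\
(I - \lambda W)\,y &= (I - \lambda W)\,X\beta + \varepsilon \\
y &= \lambda W y + X\beta + WX(-\lambda\beta) + \varepsilon
\end{aligned}
```

Matching terms with the SDM, \(y = \rho Wy + X\beta + WX\theta + \varepsilon\), gives \(\rho = \lambda\) and \(\theta = -\rho\beta\). This is the common factor restriction, and testing it against the unrestricted SDM is one way to decide between SDM and SEM.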

Sensitivity Analysis¶

Always test your results' sensitivity to the weight matrix specification:

from panelbox.models.spatial import SpatialLag, SpatialWeights

# Fit with different weight matrices
W_queen = SpatialWeights.from_contiguity(gdf, criterion='queen')
W_knn5 = SpatialWeights.from_knn(coords, k=5)
W_knn10 = SpatialWeights.from_knn(coords, k=10)

results_queen = SpatialLag("y ~ x1 + x2", data, "region", "year",
                            W=W_queen.matrix).fit(effects='fixed')
results_knn5 = SpatialLag("y ~ x1 + x2", data, "region", "year",
                           W=W_knn5.matrix).fit(effects='fixed')
results_knn10 = SpatialLag("y ~ x1 + x2", data, "region", "year",
                            W=W_knn10.matrix).fit(effects='fixed')

# Compare key results
print(f"{'W specification':<20} {'rho':>8} {'beta_x1':>10} {'AIC':>10}")
print("-" * 50)
for name, r in [('Queen', results_queen), ('KNN-5', results_knn5),
                ('KNN-10', results_knn10)]:
    print(f"{name:<20} {r.rho:>8.4f} {r.params['x1']:>10.4f} {r.aic:>10.1f}")

If results change substantially across \(W\) specifications, be cautious about drawing strong conclusions.
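
To see what changes between the KNN specifications, it can help to build the matrices by hand. This is a minimal NumPy sketch of row-standardized k-nearest-neighbour weights on hypothetical random centroids; `knn_weights` is an illustrative helper and is not claimed to match what `SpatialWeights.from_knn` does internally.

```python
import numpy as np

def knn_weights(coords, k):
    """Row-standardized k-nearest-neighbour weights from (n, 2) coordinates."""
    coords = np.asarray(coords, dtype=float)
    n = coords.shape[0]
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))      # pairwise Euclidean distances
    np.fill_diagonal(dist, np.inf)               # a unit is never its own neighbour
    nearest = np.argsort(dist, axis=1)[:, :k]    # indices of the k closest units
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), nearest.ravel()] = 1.0 / k
    return W

rng = np.random.default_rng(7)
coords = rng.uniform(size=(30, 2))   # hypothetical region centroids
W5 = knn_weights(coords, k=5)
W10 = knn_weights(coords, k=10)

# Share of the KNN-10 links that are already present in KNN-5
overlap = ((W5 > 0) & (W10 > 0)).sum() / (W10 > 0).sum()
print(f"Row sums equal one: {np.allclose(W5.sum(axis=1), 1.0)}")
print(f"Share of KNN-10 links shared with KNN-5: {overlap:.2f}")
```

Doubling k keeps the same nearest links but halves each weight and adds more distant neighbours, so some movement in \(\rho\) across specifications is expected; large swings in the coefficients of interest are the warning sign.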

Complete Model Selection Workflow¶

import numpy as np
from panelbox import FixedEffects
from panelbox.models.spatial import (
    SpatialLag, SpatialError, SpatialDurbin,
    GeneralNestingSpatial, SpatialWeights
)
from panelbox.diagnostics.spatial import MoranIPanelTest

# ---- Step 1: Baseline FE model ----
fe = FixedEffects("y ~ x1 + x2", data, "region", "year")
fe_results = fe.fit()

# ---- Step 2: Test for spatial dependence ----
moran = MoranIPanelTest(fe_results.resid, W)
moran_result = moran.run()
print(f"Moran's I: {moran_result.statistic:.4f} (p = {moran_result.pvalue:.4f})")

if moran_result.pvalue >= 0.05:
    print("No spatial dependence. Use standard FE model.")
else:
    # ---- Step 3: Fit competing spatial models ----
    sar = SpatialLag("y ~ x1 + x2", data, "region", "year", W=W)
    sar_res = sar.fit(effects='fixed', method='qml')

    sem = SpatialError("y ~ x1 + x2", data, "region", "year", W=W)
    sem_res = sem.fit(effects='fixed', method='gmm')

    sdm = SpatialDurbin("y ~ x1 + x2", data, "region", "year", W=W)
    sdm_res = sdm.fit(effects='fixed', method='qml')

    # ---- Step 4: Compare models ----
    print(f"\n{'Model':<8} {'Log-lik':>10} {'AIC':>10} {'BIC':>10}")
    print("-" * 40)
    for name, res in [('SAR', sar_res), ('SEM', sem_res), ('SDM', sdm_res)]:
        print(f"{name:<8} {res.llf:>10.1f} {res.aic:>10.1f} {res.bic:>10.1f}")

    # ---- Step 5: Post-estimation check ----
    best_model = sdm_res  # Start with SDM
    moran_post = MoranIPanelTest(best_model.resid, W)
    post = moran_post.run()
    print(f"\nPost-SDM Moran's I: {post.statistic:.4f} (p = {post.pvalue:.4f})")

    # ---- Step 6: Effect decomposition ----
    effects = best_model.spillover_effects
    print("\nEffect Decomposition (SDM):")
    print(f"{'Variable':<10} {'Direct':>10} {'Indirect':>10} {'Total':>10}")
    print("-" * 42)
    for var in effects['direct']:
        print(f"{var:<10} {effects['direct'][var]:>10.4f} "
              f"{effects['indirect'][var]:>10.4f} "
              f"{effects['total'][var]:>10.4f}")

Common Pitfalls¶

1. Over-Parameterization¶

GNS and SDM may have too many parameters for small samples. As rough rules of thumb, check:

\(N > 30\) (cross-sectional units) for simple spatial models
\(N > 50\) for SDM with many covariates
\(N > 100\) for GNS with separate weight matrices

2. Weight Matrix Sensitivity¶

Results can be sensitive to \(W\) specification. Always:

Test at least 2-3 different weight structures
Report sensitivity analysis in publications
Justify your choice based on economic theory

3. Ignoring Temporal Dynamics¶

Panel data often has temporal dependence. If the lagged dependent variable is significant:

Omitting it biases spatial parameter estimates
Use the Dynamic Spatial Panel model
Or at minimum, include time dummies

4. Mechanical Model Selection¶

Do not rely purely on statistical tests:

Economic theory should guide model choice
Consider interpretability of results
SDM is safer as a default than pure statistical selection

5. Reporting Only the Preferred Model¶

For publication:

Report at least SAR, SEM, and SDM side-by-side
Show diagnostic test results
Include sensitivity to weight matrix
Report direct, indirect, and total effects for SAR/SDM
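
The direct, indirect, and total effects come from the matrix of partial derivatives \((I - \rho W)^{-1}(\beta_k I + \theta_k W)\), following LeSage and Pace (2009). A minimal NumPy sketch for one covariate, assuming a row-standardized ring-shaped weight matrix and hypothetical parameter values (`effects_decomposition` is an illustrative name, not a panelbox function):

```python
import numpy as np

def effects_decomposition(rho, beta_k, theta_k, W):
    """Average direct, indirect, and total effects of covariate k."""
    n = W.shape[0]
    # S[i, j] = effect on y_i of a unit change in x_k for unit j
    S = np.linalg.solve(np.eye(n) - rho * W,
                        beta_k * np.eye(n) + theta_k * W)
    direct = np.trace(S) / n     # average own-unit effect (diagonal)
    total = S.sum() / n          # average row sum
    return {"direct": direct, "indirect": total - direct, "total": total}

# Ten units on a ring, each with two equally weighted neighbours
n = 10
W_ring = np.zeros((n, n))
for i in range(n):
    W_ring[i, (i - 1) % n] = 0.5
    W_ring[i, (i + 1) % n] = 0.5

sar = effects_decomposition(rho=0.5, beta_k=2.0, theta_k=0.0, W=W_ring)
sdm = effects_decomposition(rho=0.5, beta_k=2.0, theta_k=1.0, W=W_ring)

for name, eff in [("SAR", sar), ("SDM", sdm)]:
    print(f"{name}: direct = {eff['direct']:.3f}, "
          f"indirect = {eff['indirect']:.3f}, total = {eff['total']:.3f}")
```

With a row-standardized \(W\), the total effect equals \((\beta_k + \theta_k)/(1 - \rho)\), and the direct effect exceeds \(\beta_k\) because of feedback through neighbours. This is why the raw coefficient should not be reported as the marginal effect in SAR or SDM.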

Tutorials¶

Tutorial	Description	Links
Spatial Econometrics	Full model selection workflow

References¶

LeSage, J. and Pace, R.K. (2009). Introduction to Spatial Econometrics. Chapman & Hall/CRC.
Elhorst, J.P. (2014). Spatial Econometrics: From Cross-Sectional Data to Spatial Panels. Springer.
Anselin, L. (1988). Spatial Econometrics: Methods and Models. Kluwer Academic.
Lee, L.F. and Yu, J. (2010). Estimation of spatial autoregressive panel data models with fixed effects. Journal of Econometrics, 154(2), 165-185.
Manski, C.F. (1993). Identification of endogenous social effects: The reflection problem. Review of Economic Studies, 60(3), 531-542.
Gibbons, S. and Overman, H.G. (2012). Mostly pointless spatial econometrics? Journal of Regional Science, 52(2), 172-191.