Spatial Diagnostics¶

Quick Reference

Module: panelbox.diagnostics.spatial
Key tests: Moran's I, LM-lag, LM-error, Robust LM, LR tests
Purpose: Detect spatial dependence, select spatial model, validate estimation

Overview¶

Spatial diagnostics serve three purposes in the spatial econometrics workflow:

Detection: Is there spatial dependence in my data? (Moran's I)
Classification: What type of spatial dependence? Lag or error? (LM tests)
Validation: Did my spatial model adequately capture the dependence? (post-estimation tests)

This page covers the complete diagnostic workflow, from initial tests on the residuals of a non-spatial model to post-estimation validation. The testing strategy follows the classical approach of Anselin (1988), augmented with the robust LM tests of Anselin et al. (1996) and model comparison metrics.

Diagnostic Workflow¶

The recommended workflow proceeds in stages:

1. Fit standard panel model (OLS/FE/RE)
          |
2. Test residuals: Moran's I
          |
    Significant? ──No──> Standard panel model is adequate
          |
         Yes
          |
3. Run LM tests: LM-lag and LM-error
          |
      ┌───┴───────────┬─────────────────┐
      |               |                 |
 Only LM-lag    Only LM-error   Both significant
 significant     significant        LM tests
      |               |                 |
     SAR             SEM        Run Robust LM tests
                                        |
                        ┌───────────────┼───────────────┐
                        |               |               |
                  Robust LM-lag  Robust LM-error   Both robust
                    dominates       dominates      significant
                        |               |               |
                       SAR             SEM         SDM or GNS

Moran's I Test¶

Theory¶

Moran's I tests for spatial autocorrelation in a variable or in model residuals. The test statistic is:

\[I = \frac{n}{S_0} \frac{e'We}{e'e}\]

where:

\(e\) is the vector of residuals (or, when testing a variable, its values in deviations from the mean)
\(W\) is the spatial weight matrix
\(S_0 = \sum_i \sum_j w_{ij}\) is the sum of all weights
\(n\) is the number of observations

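Under \(H_0\) (no spatial autocorrelation), for a variable in deviations from its mean:

\[E[I] = \frac{-1}{n-1}, \quad Z = \frac{I - E[I]}{\sqrt{\text{Var}(I)}} \overset{a}{\sim} N(0, 1)\]

When \(e\) holds regression residuals, the moments depend on the regressors; for example, \(E[I] = \frac{n}{S_0}\frac{\text{tr}(MW)}{n-k}\), where \(M = I_n - X(X'X)^{-1}X'\) and \(k\) is the number of regressors.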
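To make the formula concrete, here is a minimal NumPy sketch for a single cross-section; it is an illustration, not the panelbox implementation. It computes \(I\), \(E[I]\), and a z-score using the variance under the normality assumption, \(\text{Var}(I) = \frac{n^2 S_1 - n S_2 + 3S_0^2}{(n^2-1)S_0^2} - E[I]^2\), with \(S_1 = \tfrac12\sum_i\sum_j (w_{ij}+w_{ji})^2\) and \(S_2 = \sum_i (w_{i\cdot}+w_{\cdot i})^2\). The function name `morans_i` and the dense-matrix input are assumptions, and these moments apply to a raw variable rather than to regression residuals.

```python
import numpy as np
from scipy import stats


def morans_i(x, W):
    """Moran's I for a single variable x (length n) and an n x n weight matrix W.

    Returns (I, E[I], z, two-sided p-value) using the normality-based variance.
    """
    x = np.asarray(x, dtype=float)
    W = np.asarray(W, dtype=float)
    n = x.size
    e = x - x.mean()                       # center the variable
    S0 = W.sum()                           # sum of all weights
    I = (n / S0) * (e @ W @ e) / (e @ e)

    EI = -1.0 / (n - 1)
    S1 = 0.5 * ((W + W.T) ** 2).sum()
    S2 = ((W.sum(axis=1) + W.sum(axis=0)) ** 2).sum()
    var_I = (n**2 * S1 - n * S2 + 3 * S0**2) / ((n**2 - 1) * S0**2) - EI**2
    z = (I - EI) / np.sqrt(var_I)
    p = 2 * stats.norm.sf(abs(z))          # two-sided p-value
    return I, EI, z, p


# Five regions on a line, each neighboring the adjacent ones (row-standardized W)
A = np.eye(5, k=1) + np.eye(5, k=-1)
W = A / A.sum(axis=1, keepdims=True)

I, EI, z, p = morans_i([1, 2, 3, 4, 5], W)   # smooth trend: similar values are adjacent
print(f"I = {I:.3f}, E[I] = {EI:.3f}, z = {z:.2f}, p = {p:.3f}")
```

For the smooth trend, \(I = 0.6\) lies well above \(E[I] = -0.25\) (clustering); an alternating pattern such as `[1, 0, 1, 0, 1]` gives a negative \(I\) (dispersion).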

Interpretation¶

Moran's I Value	Interpretation
\(I > E[I]\) (positive)	Clustering: similar values near each other
\(I \approx E[I]\)	Random: no spatial pattern
\(I < E[I]\) (negative)	Dispersion: dissimilar values near each other

Code Example¶

from panelbox import FixedEffects
from panelbox.diagnostics.spatial import MoranIPanelTest

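# Assumes `data` is a long-format DataFrame with columns y, x1, x2, region, year,
# and `W` is the N x N spatial weight matrix for the regions
# Step 1: Fit standard panel model
fe_model = FixedEffects("y ~ x1 + x2", data, "region", "year")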
fe_results = fe_model.fit()

# Step 2: Test residuals for spatial autocorrelation
moran = MoranIPanelTest(fe_results.resid, W)
result = moran.run()

print(f"Moran's I statistic: {result.statistic:.4f}")
print(f"Expected value:      {result.expected:.4f}")
print(f"Z-score:             {result.z_score:.4f}")
print(f"p-value:             {result.pvalue:.4f}")

if result.pvalue < 0.05:
    print("Significant spatial autocorrelation detected.")
    print("Proceed with LM tests to determine model type.")
else:
    print("No significant spatial autocorrelation.")
    print("Standard panel model is adequate.")

Moran's I Scatterplot

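The Moran's I scatterplot plots the (standardized) variable on the horizontal axis against its spatial lag (\(Wy\)) on the vertical axis. With a row-standardized \(W\), the slope of the fitted line equals Moran's I, so a positive slope indicates positive spatial autocorrelation (clustering). The four quadrants correspond to High-High (upper right), Low-Low (lower left), Low-High (upper left), and High-Low (lower right) observations.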
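The data behind the scatterplot can be produced in a few lines. This is a sketch assuming a dense, row-standardized NumPy weight matrix; the helper name `moran_scatter` is hypothetical, and the returned `z` and `wz` can be passed to any plotting routine.

```python
import numpy as np


def moran_scatter(y, W):
    """Standardized y, its spatial lag, quadrant labels, and the fitted slope.

    Assumes W is row-standardized, so the slope equals Moran's I.
    """
    y = np.asarray(y, dtype=float)
    z = (y - y.mean()) / y.std()           # horizontal axis
    wz = W @ z                             # vertical axis: spatial lag of z
    quadrant = np.where(
        z >= 0,
        np.where(wz >= 0, "HH", "HL"),     # high value with high / low neighbors
        np.where(wz >= 0, "LH", "LL"),     # low value with high / low neighbors
    )
    # OLS slope of wz on z; z has mean 0, so an intercept would not change it
    slope = (z @ wz) / (z @ z)
    return z, wz, quadrant, slope


A = np.eye(5, k=1) + np.eye(5, k=-1)       # five regions on a line
W = A / A.sum(axis=1, keepdims=True)       # row-standardize
z, wz, quadrant, slope = moran_scatter([1, 2, 3, 4, 5], W)
print(quadrant, round(slope, 3))
```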

LM Tests for Spatial Dependence¶

Theory¶

The Lagrange Multiplier (LM) tests are computed from the residuals of the non-spatial model (OLS, or the FE/RE model fitted in step 1), so the spatial model does not have to be estimated. Each tests a specific form of spatial dependence:

LM-lag (\(H_0: \rho = 0\) in SAR model):

\[LM_{lag} = \frac{(e'Wy / \hat{\sigma}^2)^2}{J_\rho}\]

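where \(J_\rho = \frac{(WX\hat{\beta})' M (WX\hat{\beta}) + T_W \hat{\sigma}^2}{\hat{\sigma}^2}\), with \(M = I_n - X(X'X)^{-1}X'\), \(T_W = \text{tr}(W'W + W^2)\), and \(\hat{\sigma}^2 = e'e/n\).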

LM-error (\(H_0: \lambda = 0\) in SEM model):

\[LM_{error} = \frac{(e'We / \hat{\sigma}^2)^2}{\text{tr}(W'W + W^2)}\]

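Robust LM-lag (adjusts for potential spatial error):

\[RLM_{lag} = \frac{(e'Wy/\hat{\sigma}^2 - e'We/\hat{\sigma}^2)^2}{J_\rho - T_W}\]

Robust LM-error (adjusts for potential spatial lag):

\[RLM_{error} = \frac{(e'We/\hat{\sigma}^2 - T_W J_\rho^{-1}\, e'Wy/\hat{\sigma}^2)^2}{T_W\,(1 - T_W/J_\rho)}\]

where \(T_W = \text{tr}(W'W + W^2)\) is the LM-error denominator (Anselin et al., 1996).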
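The four statistics can be computed directly from OLS output. The NumPy sketch below covers the single cross-section case of Anselin et al. (1996) with a dense \(W\); it does not reproduce how panelbox handles panels or fixed effects, and the function name `lm_spatial_tests` is hypothetical. Each statistic is compared with a \(\chi^2(1)\) distribution.

```python
import numpy as np
from scipy import stats


def lm_spatial_tests(y, X, W):
    """LM-lag, LM-error and robust variants from OLS residuals (cross-section).

    y: (n,) outcome, X: (n, k) regressors including a constant, W: (n, n) weights.
    Returns {name: (statistic, p-value)}, each asymptotically chi2(1) under H0.
    """
    n = len(y)
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    e = y - X @ beta                        # OLS residuals
    sigma2 = e @ e / n                      # ML estimate of the error variance

    T_W = np.trace(W.T @ W + W @ W)
    a = e @ (W @ y) / sigma2                # e'Wy / sigma^2
    b = e @ (W @ e) / sigma2                # e'We / sigma^2

    # J_rho = [(W X b)' M (W X b) + T_W sigma^2] / sigma^2, M = I - X (X'X)^{-1} X'
    wxb = W @ (X @ beta)
    m_wxb = wxb - X @ (XtX_inv @ (X.T @ wxb))
    J = (wxb @ m_wxb + T_W * sigma2) / sigma2

    values = {
        "LM-lag": a**2 / J,
        "LM-error": b**2 / T_W,
        "Robust LM-lag": (a - b) ** 2 / (J - T_W),
        "Robust LM-error": (b - T_W * a / J) ** 2 / (T_W * (1.0 - T_W / J)),
    }
    return {name: (s, stats.chi2.sf(s, df=1)) for name, s in values.items()}


# Simulated SAR data on a line of 100 regions (rho = 0.5)
rng = np.random.default_rng(0)
n = 100
A = np.eye(n, k=1) + np.eye(n, k=-1)
W = A / A.sum(axis=1, keepdims=True)
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = np.linalg.solve(np.eye(n) - 0.5 * W, X @ np.array([1.0, 2.0]) + rng.normal(size=n))

for name, (stat, pval) in lm_spatial_tests(y, X, W).items():
    print(f"{name:<16} {stat:8.3f} {pval:8.4f}")
```

A useful sanity check on any implementation: LM-error + Robust LM-lag equals LM-lag + Robust LM-error, since both sums equal the joint test for \(\rho = \lambda = 0\) in Anselin et al. (1996). The regressors must include a non-constant column; otherwise \(J_\rho - T_W = 0\) and the robust statistics are undefined.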

Decision Rule¶

Test Result	Recommendation
LM-lag significant, LM-error not	Use SAR
LM-error significant, LM-lag not	Use SEM
Both significant, Robust LM-lag > Robust LM-error	Use SAR (or SDM)
Both significant, Robust LM-error > Robust LM-lag	Use SEM (or SDM)
Both robust tests significant	Use SDM or GNS
Neither significant	No spatial model needed
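The decision rule can be encoded as a small helper. This is a hypothetical function, not part of panelbox; it takes the statistics and p-values as plain dictionaries. Because the table's last rows overlap, the sketch checks "both robust tests significant" before comparing the robust statistics; that ordering is an assumption.

```python
def recommend_model(stat, pvalue, alpha=0.05):
    """Suggest a specification from LM-test results, following the decision table.

    stat, pvalue: dicts keyed by "LM-lag", "LM-error", "Robust LM-lag", "Robust LM-error".
    """
    lag = pvalue["LM-lag"] < alpha
    err = pvalue["LM-error"] < alpha
    if not (lag or err):
        return "No spatial model needed"
    if lag and not err:
        return "SAR"
    if err and not lag:
        return "SEM"
    # Both classic LM tests are significant: let the robust tests decide
    if pvalue["Robust LM-lag"] < alpha and pvalue["Robust LM-error"] < alpha:
        return "SDM or GNS"
    if stat["Robust LM-lag"] > stat["Robust LM-error"]:
        return "SAR (or SDM)"
    return "SEM (or SDM)"


# Hypothetical test output, for illustration only
stat = {"LM-lag": 12.1, "LM-error": 9.8, "Robust LM-lag": 4.6, "Robust LM-error": 2.3}
pvalue = {"LM-lag": 0.0005, "LM-error": 0.0017, "Robust LM-lag": 0.032, "Robust LM-error": 0.13}
print(recommend_model(stat, pvalue))
```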

Code Example¶

from panelbox.diagnostics.spatial import (
    LMSpatialLagTest,
    LMSpatialErrorTest,
    RobustLMSpatialLagTest,
    RobustLMSpatialErrorTest,
)

# Compute LM tests from the residuals of the FE model fitted above
lm_lag = LMSpatialLagTest(fe_results, W)
lm_error = LMSpatialErrorTest(fe_results, W)
rlm_lag = RobustLMSpatialLagTest(fe_results, W)
rlm_error = RobustLMSpatialErrorTest(fe_results, W)

# Run all tests
results = {
    'LM-lag': lm_lag.run(),
    'LM-error': lm_error.run(),
    'Robust LM-lag': rlm_lag.run(),
    'Robust LM-error': rlm_error.run(),
}

# Print summary
print(f"{'Test':<20} {'Statistic':>10} {'p-value':>10} {'Significant':>12}")
print("-" * 55)
for name, r in results.items():
    sig = "***" if r.pvalue < 0.001 else "**" if r.pvalue < 0.01 else "*" if r.pvalue < 0.05 else ""
    print(f"{name:<20} {r.statistic:>10.4f} {r.pvalue:>10.4f} {sig:>12}")

Model Comparison¶

Information Criteria¶

After fitting multiple spatial models, compare them using AIC and BIC:

from panelbox.models.spatial import SpatialLag, SpatialError, SpatialDurbin

# Fit competing models
sar = SpatialLag("y ~ x1 + x2", data, "region", "year", W=W)
sar_res = sar.fit(effects='fixed', method='qml')

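# AIC/BIC require a log-likelihood, so they are only comparable across (Q)ML fits;
# if the SEM is estimated by GMM, its llf/AIC/BIC are not comparable with SAR/SDM
sem = SpatialError("y ~ x1 + x2", data, "region", "year", W=W)
sem_res = sem.fit(effects='fixed', method='gmm')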

sdm = SpatialDurbin("y ~ x1 + x2", data, "region", "year", W=W)
sdm_res = sdm.fit(effects='fixed', method='qml')

# Compare
print(f"{'Model':<8} {'Log-lik':>10} {'AIC':>10} {'BIC':>10} {'Pseudo R2':>10}")
print("-" * 50)
for name, res in [('SAR', sar_res), ('SEM', sem_res), ('SDM', sdm_res)]:
    print(f"{name:<8} {res.llf:>10.1f} {res.aic:>10.1f} {res.bic:>10.1f} "
          f"{res.rsquared_pseudo:>10.4f}")

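Selection rules:

Lower AIC and lower BIC both indicate a better model
AIC penalizes each parameter by 2, so it is more lenient toward richer specifications and tends to favor predictive fit
BIC penalizes each parameter by \(\ln(n)\), which exceeds 2 once \(n \ge 8\), so it favors parsimony
When AIC and BIC disagree, consider the research goal (prediction vs. explanation)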
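Both criteria come straight from the maximized log-likelihood. A minimal sketch follows; the helper name is hypothetical, and how the parameter count `k` is defined (for example, whether \(\sigma^2\) or fixed effects are counted) is an assumption that varies across software, so criteria should only be compared within one package.

```python
import math


def information_criteria(llf, k, n):
    """AIC = 2k - 2*llf and BIC = k*ln(n) - 2*llf; lower is better for both.

    llf: maximized log-likelihood, k: number of estimated parameters,
    n: number of observations (N * T in a balanced panel).
    """
    aic = 2 * k - 2 * llf
    bic = k * math.log(n) - 2 * llf
    return aic, bic
```

For example, with \(n = 1{,}000\) observations each additional parameter costs 2 in AIC but \(\ln(1000) \approx 6.9\) in BIC, which is why BIC more often prefers SAR or SEM over SDM.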

Likelihood Ratio Tests¶

For nested models estimated by ML, use the LR test:

\[LR = 2(\ell_{\text{unrestricted}} - \ell_{\text{restricted}}) \sim \chi^2(q)\]

where \(q\) is the number of restrictions.

from scipy import stats

# LR test: SDM vs SAR (restriction: theta = 0)
lr_stat = 2 * (sdm_res.llf - sar_res.llf)
df = 2  # number of theta parameters (x1, x2)
p_value = stats.chi2.sf(lr_stat, df)  # survival function: more accurate than 1 - cdf for small p
print(f"SDM vs SAR: LR = {lr_stat:.2f}, df = {df}, p = {p_value:.4f}")

# Or use GNS test_restrictions for formal tests
from panelbox.models.spatial import GeneralNestingSpatial

gns = GeneralNestingSpatial("y ~ x1 + x2", data, "region", "year",
                             W1=W, W2=W, W3=W)
gns_res = gns.fit(effects='fixed', method='ml')

# Test nested models
test = gns.test_restrictions(restrictions={'theta': 0, 'lambda': 0})
print(f"GNS vs SAR: LR = {test['lr_statistic']:.2f}, p = {test['p_value']:.4f}")

Post-Estimation Diagnostics¶

Residual Spatial Autocorrelation¶

After fitting a spatial model, the residuals should be free of spatial autocorrelation. Re-run Moran's I on the spatial model residuals:

# Moran's I on spatial model residuals
moran_post = MoranIPanelTest(sar_res.resid, W)
post_result = moran_post.run()

print(f"Post-estimation Moran's I: {post_result.statistic:.4f}")
print(f"p-value: {post_result.pvalue:.4f}")

if post_result.pvalue > 0.05:
    print("No significant residual spatial autocorrelation at the 5% level.")
else:
    print("Spatial autocorrelation persists. Consider a different specification.")

Warning

If residual Moran's I is still significant after fitting a SAR model, the spatial dependence may be more complex. Consider switching to SDM or GNS.

Goodness of Fit¶

import numpy as np

# Pseudo R-squared of the fitted SAR model
print(f"Pseudo R-squared: {sar_res.rsquared_pseudo:.4f}")

# Predicted vs actual (y_actual must be in the same entity/time order as the fitted values)
y_actual = data["y"].values
y_pred = sar_res.fitted_values
correlation = np.corrcoef(y_actual, y_pred)[0, 1]
print(f"Correlation (predicted vs actual): {correlation:.4f}")

Hansen J-Test (GMM Models)¶

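For models estimated by GMM (SEM, Dynamic Spatial Panel), the Hansen J-test checks the validity of the over-identifying restrictions. It is defined only when there are more instruments than parameters:

\[J = n \cdot \hat{g}' \hat{S}^{-1} \hat{g} \overset{a}{\sim} \chi^2(L - K)\]

where \(\hat{g} = \frac{1}{n}\sum_i z_i \hat{u}_i\) is the vector of sample moment conditions, \(\hat{S}\) is the estimated covariance matrix of the moments, \(L\) is the number of instruments, and \(K\) is the number of parameters.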

# Hansen J-test is reported in the GMM summary
sem_res = sem.fit(effects='fixed', method='gmm', n_lags=2)
print(sem_res.summary())  # Includes J-test if applicable

\(H_0\): instruments are valid (orthogonal to the errors)
Do not reject (\(p > 0.05\)): no evidence against instrument validity
Reject (\(p < 0.05\)): some instruments may be invalid, or the model may be misspecified; reconsider the model or the instrument set
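To show what the J statistic measures, here is a cross-sectional linear IV sketch using two-step efficient GMM with a heteroskedasticity-robust \(\hat{S}\). It is an illustration only: spatial GMM estimators use their own moment conditions and instruments, which this does not reproduce, and the names `hansen_j` and `_gmm` are hypothetical.

```python
import numpy as np
from scipy import stats


def _gmm(y, X, Z, H):
    """Linear GMM estimator b = (X'Z H Z'X)^{-1} X'Z H Z'y for a weight matrix H."""
    XZ = X.T @ Z
    return np.linalg.solve(XZ @ H @ XZ.T, XZ @ H @ (Z.T @ y))


def hansen_j(y, X, Z):
    """Two-step efficient GMM for y = X b + u with instruments Z; returns (J, p, df)."""
    n, K = X.shape
    L = Z.shape[1]
    if L <= K:
        raise ValueError("The J-test needs more instruments than parameters (L > K)")
    b1 = _gmm(y, X, Z, np.linalg.inv(Z.T @ Z / n))     # step 1: equivalent to 2SLS
    u1 = y - X @ b1
    S = (Z * u1[:, None] ** 2).T @ Z / n                # robust covariance of the moments
    S_inv = np.linalg.inv(S)
    b2 = _gmm(y, X, Z, S_inv)                           # step 2: efficient GMM
    g = Z.T @ (y - X @ b2) / n                          # sample moment conditions
    J = n * g @ S_inv @ g
    return J, stats.chi2.sf(J, df=L - K), L - K


# Simulated data with valid instruments: x is endogenous through v
rng = np.random.default_rng(1)
n = 500
Z_raw = rng.normal(size=(n, 3))                         # three excluded instruments
v = rng.normal(size=n)
x = Z_raw @ np.array([1.0, 0.5, 0.5]) + v
y = 1.0 + 2.0 * x + 0.5 * v + rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), Z_raw])                # L = 4 instruments, K = 2 parameters
J, p, df = hansen_j(y, X, Z)
print(f"J = {J:.3f}, df = {df}, p = {p:.3f}")
```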

Diagnostic Checklist¶

Use this checklist to ensure a thorough spatial analysis:

Moran's I on OLS residuals: is spatial autocorrelation present?
LM tests: which type of spatial dependence (lag, error, both)?
Robust LM tests: which dominates when both LM tests are significant?
Model estimation: fit the recommended model (SAR, SEM, or SDM)
Post-estimation Moran's I: is spatial autocorrelation eliminated?
Model comparison: AIC/BIC across competing specifications
Effect decomposition: direct, indirect, total effects (for SAR/SDM)
Weight matrix sensitivity: do results hold with different \(W\)?
Hansen J-test: are instruments valid? (for GMM models)

Troubleshooting¶

Model Does Not Converge¶

Check the weight matrix: ensure it is properly row-standardized and has no islands
Simplify the model: start with SAR or SEM before trying SDM or GNS
Adjust optimizer settings: increase maxiter, try different optim_method
Check multicollinearity: VIF > 10 for any covariate may cause problems
Scale variables: very large or very small values can cause numerical issues
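The weight-matrix checks in the first item can be automated before fitting. A minimal sketch for a dense NumPy matrix; the helper name `check_and_row_standardize` is hypothetical, and panelbox may perform its own validation.

```python
import numpy as np


def check_and_row_standardize(W):
    """Validate a dense spatial weight matrix and return its row-standardized version."""
    W = np.asarray(W, dtype=float)
    if W.ndim != 2 or W.shape[0] != W.shape[1]:
        raise ValueError("W must be a square N x N matrix")
    if np.any(np.diag(W) != 0):
        raise ValueError("W must have a zero diagonal (a unit is not its own neighbor)")
    if np.any(W < 0):
        raise ValueError("W must have non-negative weights")
    row_sums = W.sum(axis=1)
    islands = np.flatnonzero(row_sums == 0)
    if islands.size > 0:
        # An all-zero row cannot be row-standardized (division by zero)
        raise ValueError(f"Islands (units without neighbors): {islands.tolist()}")
    return W / row_sums[:, None]


contiguity = np.array([[0, 1, 1],
                       [1, 0, 0],
                       [1, 0, 0]])
W = check_and_row_standardize(contiguity)
print(W.sum(axis=1))          # every row now sums to 1
```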

Spatial Parameter at Boundary (\(|\rho| \approx 1\))¶

If \(\rho\) or \(\lambda\) is estimated at or near the boundary:

Check weight matrix normalization
Consider a different \(W\) specification
May indicate model misspecification
Try a simpler model
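The admissible range for \(\rho\) (and \(\lambda\)) is \((1/\omega_{\min}, 1/\omega_{\max})\), where \(\omega\) are the eigenvalues of \(W\). For a row-standardized \(W\), \(\omega_{\max} = 1\), so an estimate near 1 sits at the edge of the parameter space. The sketch below computes these bounds for a dense \(W\) whose eigenvalues are real, as they are for a row-standardized symmetric contiguity matrix; the helper name `rho_bounds` is hypothetical.

```python
import numpy as np


def rho_bounds(W):
    """Interval (1/omega_min, 1/omega_max) on which I - rho*W is nonsingular.

    Assumes the eigenvalues of W are real and omega_min < 0 < omega_max
    (true for a row-standardized symmetric contiguity matrix).
    """
    omega = np.linalg.eigvals(np.asarray(W, dtype=float)).real
    return 1.0 / omega.min(), 1.0 / omega.max()


# Three mutually neighboring regions, row-standardized
A = np.ones((3, 3)) - np.eye(3)
W = A / A.sum(axis=1, keepdims=True)
lower, upper = rho_bounds(W)
print(f"rho must lie in ({lower:.2f}, {upper:.2f})")
```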

Residual Autocorrelation Persists¶

If Moran's I is still significant after fitting a spatial model:

SAR residuals significant: try SDM (add \(WX\) terms) or SEM
SEM residuals significant: try SAR or SDM (spatial lag may be needed)
SDM residuals significant: try GNS (add spatial error term) or dynamic spatial

Estimation Is Slow¶

Problem	Solution
Large \(N\) (> 500)	Use sparse weight matrices: `W.to_sparse()`
Large \(N\) (> 5,000)	Model automatically uses sparse LU for log-det
Large \(N\) (> 10,000)	Model automatically uses Chebyshev approximation
Many covariates	Consider dimensionality reduction
Complex model (GNS)	Start with simpler models, use GNS only for model selection
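The log-determinant \(\ln|I_N - \rho W|\) is what makes ML estimation expensive for large \(N\), because it is re-evaluated at every trial value of \(\rho\). The sketch below, which is not panelbox's implementation, compares two exact approaches: the eigenvalue identity \(\ln|I_N - \rho W| = \sum_i \ln(1 - \rho\,\omega_i)\), which needs one dense eigendecomposition and is then cheap for any \(\rho\), and a sparse LU factorization with SciPy's `splu`, whose cost depends on the sparsity of \(W\). Both return \(\ln|\det|\), which equals the log-determinant whenever \(\rho\) is in the admissible interval. The Chebyshev approximation mentioned in the table is not shown.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import splu


def logdet_eig(omega, rho):
    """ln|I - rho W| from precomputed eigenvalues omega of W."""
    return float(np.sum(np.log(np.abs(1.0 - rho * omega))))


def logdet_sparse_lu(W_sparse, rho):
    """ln|I - rho W| from a sparse LU factorization.

    SuperLU factors P_r A P_c = L U with unit-diagonal L, so |det A| = prod |U_ii|.
    """
    n = W_sparse.shape[0]
    A = (sparse.identity(n, format="csc") - rho * W_sparse).tocsc()
    return float(np.sum(np.log(np.abs(splu(A).U.diagonal()))))


n = 200
A = np.eye(n, k=1) + np.eye(n, k=-1)           # regions on a line
W = A / A.sum(axis=1, keepdims=True)           # row-standardized
omega = np.linalg.eigvals(W)                   # computed once, reused for every rho
W_sparse = sparse.csr_matrix(W)

for rho in (0.2, 0.5, 0.9):
    print(rho, logdet_eig(omega, rho), logdet_sparse_lu(W_sparse, rho))
```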

Tutorials¶

Tutorial	Description
Spatial Econometrics	Includes full diagnostic workflow

References¶

Anselin, L. (1988). Spatial Econometrics: Methods and Models. Kluwer Academic.
Anselin, L., Bera, A.K., Florax, R., and Yoon, M.J. (1996). Simple diagnostic tests for spatial dependence. Regional Science and Urban Economics, 26(1), 77-104.
Elhorst, J.P. (2014). Spatial Econometrics: From Cross-Sectional Data to Spatial Panels. Springer.
LeSage, J. and Pace, R.K. (2009). Introduction to Spatial Econometrics. Chapman & Hall/CRC.
Moran, P.A.P. (1950). Notes on continuous stochastic phenomena. Biometrika, 37(1-2), 17-23.
