Spatial Error Model (SEM)
Quick Reference
Class: panelbox.models.spatial.SpatialError
Import: from panelbox.models.spatial import SpatialError
Stata equivalent: xsmle y x1 x2, emat(W) model(sem) fe
R equivalent: splm::spml(..., lag=FALSE, spatial.error="b")
Overview
The Spatial Error Model (SEM) captures spatial correlation in unobservable factors. Unlike the SAR model, the SEM does not model direct outcome spillovers between units. Instead, it accounts for the fact that neighboring units may share common unobserved shocks — such as weather patterns, regional policy effects, or unmeasured local amenities — that create spatial dependence in the error term.
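The model is specified as (with \(W\) applied to each period's cross-section):
\[
y = X\beta + \alpha + u, \qquad u = \lambda W u + \varepsilon
\]
where: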
- \(y\) is the \(NT \times 1\) dependent variable vector
- \(X\) is the \(NT \times K\) matrix of explanatory variables
- \(\beta\) is the \(K \times 1\) coefficient vector
- \(\alpha\) captures individual effects
- \(u\) is the spatially autocorrelated error
- \(\lambda\) is the spatial error parameter
- \(W\) is the \(N \times N\) spatial weight matrix
- \(\varepsilon \sim \text{iid}(0, \sigma^2)\) is the idiosyncratic error
The key distinction from SAR: in the SEM, coefficients \(\beta\) are directly interpretable as marginal effects. There are no indirect (spillover) effects because the spatial dependence operates through the error term, not through the outcome.
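To make the specification concrete, the following is a minimal NumPy sketch that simulates data from an SEM. Everything in it is an illustrative assumption rather than part of panelbox: the toy sizes, the ring-shaped weight matrix, the parameter values, and the choice to store the panel as a (T, N) array. It solves \(u = \lambda W u + \varepsilon\) for \(u\) as \((I - \lambda W)^{-1}\varepsilon\) in each period.

```python
import numpy as np

rng = np.random.default_rng(0)

N, T, K = 5, 4, 2            # units, periods, regressors (toy sizes)
lam = 0.5                    # spatial error parameter, |lam| < 1
beta = np.array([1.0, -2.0]) # hypothetical coefficients

# Row-standardized "ring" weight matrix: each unit has two neighbors
W = np.zeros((N, N))
for i in range(N):
    W[i, (i - 1) % N] = 0.5
    W[i, (i + 1) % N] = 0.5

alpha = rng.normal(size=N)                  # individual effects
A_inv = np.linalg.inv(np.eye(N) - lam * W)  # (I - lam W)^{-1}

X = rng.normal(size=(T, N, K))
y = np.empty((T, N))
for t in range(T):
    eps = rng.normal(size=N)   # idiosyncratic error
    u = A_inv @ eps            # solves u = lam W u + eps
    y[t] = X[t] @ beta + alpha + u
```

Note that \(W\) enters only through \(u\): the regressors affect \(y\) through \(X\beta\) alone, which is why \(\beta\) needs no spatial multiplier.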
Quick Example
import numpy as np
from panelbox.models.spatial import SpatialError, SpatialWeights
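# Create weight matrix (assumes gdf holds the region geometries, e.g. a GeoDataFrame)
W = SpatialWeights.from_contiguity(gdf, criterion='queen')
# Fit SEM with fixed effects using GMM (assumes data is a balanced long-format panel with region and year columns)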
model = SpatialError("y ~ x1 + x2", data, "region", "year", W=W.matrix)
results = model.fit(effects='fixed', method='gmm', n_lags=2)
# View results
print(results.summary())
# Lambda (spatial error parameter)
print(f"lambda = {results.params['lambda']:.4f}")
# Coefficients are directly interpretable as marginal effects
print(f"Effect of x1: {results.params['x1']:.4f}")
When to Use
- Shared unobserved shocks: neighboring units are affected by common factors not captured in \(X\)
  - Weather shocks affecting agricultural yields across adjacent counties
  - Unobserved regional amenities influencing housing prices
  - Correlated measurement errors across units
- Moran's I on the residuals is significant but the LM-lag test is not: the spatial dependence is in the errors, not the outcomes
- Efficiency matters: when spatial error correlation is ignored, OLS point estimates remain consistent but are inefficient, and the usual standard errors are wrong
- No theoretical reason for outcome spillovers: when there is no mechanism for \(y_j\) to directly affect \(y_i\)
Key Assumptions
- Balanced panel: all entities must have observations for all time periods
- Stationarity: \(|\lambda| < 1\) (spatial error process is stable)
- Exogeneity of \(X\): all regressors are strictly exogenous
- No outcome spillovers: if outcomes do spillover, use SAR or SDM instead
- Correct \(W\) specification: the weight matrix captures the true spatial structure of the errors
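The stationarity assumption can be checked numerically. A row-standardized \(W\) has spectral radius 1, so any \(|\lambda| < 1\) keeps \(I - \lambda W\) invertible. The sketch below uses a hypothetical 4-unit line contiguity matrix and an arbitrary \(\lambda\); it is not tied to any panelbox routine.

```python
import numpy as np

# Binary contiguity for 4 units arranged on a line: 0-1-2-3 (toy example)
C = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

# Row-standardize so that every row sums to one
W = C / C.sum(axis=1, keepdims=True)

# Largest eigenvalue modulus of W
spectral_radius = np.max(np.abs(np.linalg.eigvals(W)))

# With |lam| < 1, (I - lam W) has full rank, so u = (I - lam W)^{-1} eps exists
lam = 0.6
A = np.eye(W.shape[0]) - lam * W
full_rank = np.linalg.matrix_rank(A) == W.shape[0]
```

If \(W\) is not row-standardized, the admissible range for \(\lambda\) depends on the eigenvalues of \(W\) and should be checked for the matrix actually used.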
Detailed Guide
Data Preparation
As with all spatial models, the panel must be balanced:
# Verify balance
entity_counts = data.groupby("region").size()
assert entity_counts.nunique() == 1, "Panel must be balanced for spatial models"
Estimation Methods
The default estimation approach uses two-step GMM with spatial instruments. Fixed effects are removed via within-transformation.
How it works:
- Within-transform \(y\) and \(X\) to remove entity effects
- Construct spatial instruments \(Z = [X, WX, W^2X, \ldots, W^{n\_lags}X]\)
- First stage: initial 2SLS estimates
- Second stage: optimal GMM with efficient weighting matrix
The n_lags parameter controls how many spatial lags of \(X\) are used as instruments. More lags add instruments, which can improve efficiency, but higher-order lags are often highly collinear and raise the risk of weak or redundant instruments.
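The first two steps above can be sketched in plain NumPy. This is an illustration under stated assumptions, not panelbox's internal code: the helper names `within_transform` and `spatial_instruments` are hypothetical, and rows are assumed to be stacked period by period (all \(N\) units for the first period, then the next).

```python
import numpy as np

def within_transform(A, N, T):
    """Demean an (N*T, K) array by entity; rows are ordered period by period."""
    A3 = A.reshape(T, N, -1)
    return (A3 - A3.mean(axis=0, keepdims=True)).reshape(N * T, -1)

def spatial_instruments(X, W, T, n_lags=2):
    """Stack [X, WX, ..., W^n_lags X], applying W within each period."""
    N = W.shape[0]
    blocks = [X]
    current = X
    for _ in range(n_lags):
        # Reshape to (T, N, K) and multiply each period's N x K slice by W
        current = (W @ current.reshape(T, N, -1)).reshape(N * T, -1)
        blocks.append(current)
    return np.hstack(blocks)

# Toy data: 6 units, 3 periods, 2 regressors
rng = np.random.default_rng(1)
N, T, K = 6, 3, 2
W = np.full((N, N), 1.0 / (N - 1))
np.fill_diagonal(W, 0.0)          # row-standardized, zero diagonal

X = rng.normal(size=(N * T, K))
Xw = within_transform(X, N, T)    # step 1: remove entity means
Z = spatial_instruments(Xw, W, T, n_lags=2)  # step 2: K * (n_lags + 1) columns
```

The instrument matrix has \(K \times (n\_lags + 1)\) columns, so each additional lag adds \(K\) instruments.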
For GMM estimation without entity effects, pass effects='pooled' to fit() in place of effects='fixed'.
Interpreting Results
The Spatial Error Parameter \(\lambda\)
- \(\lambda > 0\): positive spatial correlation in unobservables (most common)
- \(\lambda < 0\): negative spatial correlation in unobservables
- \(\lambda = 0\): no spatial error correlation (reduces to standard panel model)
Unlike \(\rho\) in the SAR model, \(\lambda\) does not have a direct economic interpretation in terms of spillover magnitude. It indicates the presence of spatially correlated omitted variables.
Coefficients as Marginal Effects
A major advantage of the SEM: \(\beta\) coefficients are directly interpretable as marginal effects:
\[
\frac{\partial E[y_{it}]}{\partial x_{k,it}} = \beta_k
\]
There are no indirect effects to compute, no spatial multiplier to worry about. This simplicity makes SEM attractive when the substantive interest is in the \(\beta\) coefficients rather than in spatial spillovers.
SEM vs OLS
If \(\lambda \neq 0\) but you use OLS:
- Point estimates \(\hat{\beta}_{OLS}\) are consistent but inefficient
- Standard errors are biased (typically too small), leading to incorrect inference
- The SEM corrects both problems
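One way to see why accounting for \(\lambda\) restores valid inference is spatial filtering: premultiplying the model by \(I - \lambda W\) leaves an equation whose error is the iid \(\varepsilon\). The sketch below assumes a single cross-section, no fixed effects, a ring-shaped \(W\), and a known \(\lambda\) (in practice \(\lambda\) is estimated); it illustrates the idea and is not the estimator panelbox implements.

```python
import numpy as np

rng = np.random.default_rng(42)
N = 200
lam = 0.7                        # treated as known for illustration
beta = np.array([2.0, -1.0])     # hypothetical coefficients

# Row-standardized ring weights: two neighbors per unit
W = np.zeros((N, N))
idx = np.arange(N)
W[idx, (idx - 1) % N] = 0.5
W[idx, (idx + 1) % N] = 0.5

X = rng.normal(size=(N, 2))
eps = rng.normal(size=N)
u = np.linalg.solve(np.eye(N) - lam * W, eps)  # spatially correlated error
y = X @ beta + u

# Plain OLS: consistent, but ignores the correlation in u
b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# Filtered regression: (I - lam W) y on (I - lam W) X has iid errors
F = np.eye(N) - lam * W
b_filt, *_ = np.linalg.lstsq(F @ X, F @ y, rcond=None)

# The filtered error is exactly the idiosyncratic shock
filtered_error = F @ (y - X @ beta)
```

Both `b_ols` and `b_filt` target the same \(\beta\); the difference lies in efficiency and in whether the conventional standard-error formula applies.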
GMM Instruments
The GMM estimator uses spatial lags of \(X\) as instruments:
| n_lags | Instruments | Trade-off |
|---|---|---|
| 1 | \([X, WX]\) | Fewer instruments, may be less efficient |
| 2 | \([X, WX, W^2X]\) | Default, good balance |
| 3 | \([X, WX, W^2X, W^3X]\) | More instruments, better efficiency if valid |
Note
Higher n_lags increases the number of instruments. With too many instruments relative to \(N\), the GMM estimator can be biased toward OLS. A rule of thumb: keep the instrument count well below \(N\).
Configuration Options
fit() Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| effects | str | 'fixed' | 'fixed', 'random', or 'pooled' |
| method | str | 'gmm' | 'gmm' (default) or 'ml' |
| n_lags | int | 2 | Number of spatial lag instruments (GMM only) |
| maxiter | int | 1000 | Maximum iterations |
| verbose | bool | False | Print optimization progress |
Results Attributes
| Attribute | Type | Description |
|---|---|---|
| params | pd.Series | All estimated parameters (including \(\lambda\)) |
| bse | pd.Series | Standard errors |
| tvalues | pd.Series | t-statistics |
| pvalues | pd.Series | Two-tailed p-values |
| llf | float | Log-likelihood (ML) or quasi-log-likelihood |
| aic | float | Akaike Information Criterion |
| bic | float | Bayesian Information Criterion |
| rsquared_pseudo | float | Pseudo R-squared |
| resid | np.ndarray | Residuals |
| fitted_values | np.ndarray | Fitted values |
| method | str | Estimation method ('GMM' or 'ML') |
| effects | str | Effects specification |
Predictions
For the SEM, prediction is straightforward because there is no spatial multiplier in the conditional mean:
\[
E[y \mid X, \alpha] = X\beta + \alpha
\]
Since \(E[u] = 0\), the predicted value is simply \(\hat{y} = X\hat{\beta}\) (plus fixed effects if applicable). The spatial structure only affects the error covariance, not the prediction.
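As a small worked instance of this formula, the sketch below computes predictions by hand for three units. The coefficient and fixed-effect values are made up for illustration and do not come from any fitted model; the point is that \(W\) and \(\lambda\) do not appear anywhere in the calculation.

```python
import numpy as np

# Hypothetical estimates (not from a real fit)
beta_hat = np.array([0.8, -0.3])        # coefficients on x1, x2
alpha_hat = np.array([1.0, 2.0, 3.0])   # fixed effects for N = 3 units

# New regressor values, one row per unit
X_new = np.array([
    [1.0, 0.0],
    [0.0, 1.0],
    [2.0, 2.0],
])

# Prediction: X beta plus the unit's fixed effect; no spatial multiplier
y_hat = X_new @ beta_hat + alpha_hat
```

Here `y_hat` works out to 1.8, 1.7, and 4.0 for the three units.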
Diagnostics
Motivation: Moran's I on OLS Residuals
The SEM is typically motivated by finding significant spatial autocorrelation in OLS residuals:
from panelbox import FixedEffects
from panelbox.diagnostics.spatial import MoranIPanelTest
# Step 1: Fit standard FE model
fe_model = FixedEffects("y ~ x1 + x2", data, "region", "year")
fe_results = fe_model.fit()
# Step 2: Test residuals for spatial autocorrelation
moran = MoranIPanelTest(fe_results.resid, W.matrix)
moran_result = moran.run()
print(f"Moran's I: {moran_result.statistic:.4f} (p = {moran_result.pvalue:.4f})")
# If significant: spatial model needed
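For intuition about what the statistic measures, Moran's I for a single cross-section of residuals \(e\) is \(I = \frac{N}{S_0}\frac{e'We}{e'e}\), where \(S_0\) is the sum of all weights. The function below is a hand-rolled sketch of that textbook formula; the name `morans_i` is hypothetical, and it is not a reimplementation of MoranIPanelTest, which may pool periods and compute inference differently.

```python
import numpy as np

def morans_i(e, W):
    """Moran's I for one cross-section of residuals e (length N)."""
    e = np.asarray(e, dtype=float)
    e = e - e.mean()               # center the residuals
    n = e.size
    s0 = W.sum()                   # sum of all weights
    return (n / s0) * (e @ W @ e) / (e @ e)

# Row-standardized ring weights for 6 units (toy example)
N = 6
W = np.zeros((N, N))
idx = np.arange(N)
W[idx, (idx - 1) % N] = 0.5
W[idx, (idx + 1) % N] = 0.5

# Neighbors always have opposite signs: perfect negative autocorrelation
i_alternating = morans_i([1, -1, 1, -1, 1, -1], W)

# Like values sit next to each other: positive autocorrelation
i_clustered = morans_i([1, 1, 1, -1, -1, -1], W)
```

The alternating pattern gives \(I = -1\) and the clustered pattern gives \(I = 1/3\), matching the sign convention used to read the test output.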
Post-Estimation Checks
After fitting the SEM, verify that spatial autocorrelation in residuals has been eliminated:
# Moran's I on SEM residuals
moran_post = MoranIPanelTest(results.resid, W.matrix)
post_result = moran_post.run()
print(f"Post-SEM Moran's I: {post_result.statistic:.4f} (p = {post_result.pvalue:.4f})")
# Should be non-significant
See Spatial Diagnostics for the full diagnostic workflow.
Tutorials
| Tutorial | Description | Links |
|---|---|---|
| Spatial Econometrics | Complete SAR, SEM, SDM workflow |
See Also
- Spatial Weight Matrices — How to construct \(W\)
- Spatial Lag (SAR) — When dependence is in outcomes, not errors
- Spatial Durbin (SDM) — Nests both SAR and SEM as special cases
- Direct, Indirect, and Total Effects — SEM has no indirect effects
- Choosing a Spatial Model — When to use SEM vs SAR vs SDM
- Spatial Diagnostics — LM tests to distinguish SAR from SEM
References
- Anselin, L. (1988). Spatial Econometrics: Methods and Models. Kluwer Academic.
- Kelejian, H.H. and Prucha, I.R. (1998). A generalized spatial two-stage least squares procedure for estimating a spatial autoregressive model with autoregressive disturbances. Journal of Real Estate Finance and Economics, 17(1), 99-121.
- Kapoor, M., Kelejian, H.H., and Prucha, I.R. (2007). Panel data models with spatially correlated error components. Journal of Econometrics, 140(1), 97-130.
- Elhorst, J.P. (2014). Spatial Econometrics: From Cross-Sectional Data to Spatial Panels. Springer.