Spatial Benchmarks¶

This page presents detailed performance benchmarks for PanelBox's spatial panel models: SAR (Spatial Autoregressive), SEM (Spatial Error), SDM (Spatial Durbin), and dynamic spatial models. Spatial models are the most computationally intensive in PanelBox due to log-determinant calculations over the N x N weight matrix.

Benchmark Environment

CPU: Intel i7-10700K (8 cores, 3.8 GHz) | RAM: 32 GB DDR4 | Python: 3.12 | NumPy: 2.0 (MKL)

Weight matrices use queen contiguity with ~5% sparsity unless noted. Each benchmark averaged over 10 runs.

ML Estimation by Model Type¶

SAR Model (Spatial Autoregressive / Spatial Lag)¶

Panel Size (N x T)	Time	Memory	Converged
100 x 10	0.82s	85 MB	Yes
500 x 10	8.23s	245 MB	Yes
1000 x 10	25.3s	687 MB	Yes
2000 x 10	98.4s	2,134 MB	Yes
5000 x 5	194s	3,892 MB	Yes

SEM Model (Spatial Error)¶

Panel Size (N x T)	Time	Memory	Converged
100 x 10	0.71s	82 MB	Yes
500 x 10	7.89s	238 MB	Yes
1000 x 10	24.1s	672 MB	Yes
2000 x 10	91.4s	2,089 MB	Yes
5000 x 5	182s	3,756 MB	Yes

SEM is slightly faster than SAR due to a simpler likelihood structure.

SDM Model (Spatial Durbin)¶

Panel Size (N x T)	Time	Memory	Converged
100 x 10	1.54s	98 MB	Yes
500 x 10	12.4s	312 MB	Yes
1000 x 10	38.7s	891 MB	Yes
2000 x 10	157s	2,567 MB	Yes
5000 x 5	391s	4,923 MB	Yes

SDM takes ~50% more time than SAR due to additional spatially lagged covariate parameters.

Cross-Model Summary (N=500, T=10)¶

Model	Time	vs SAR
SAR (Spatial Lag)	8.2s	1.0x
SEM (Spatial Error)	7.9s	0.96x
SDM (Spatial Durbin)	12.4s	1.51x
Dynamic Spatial	~40s	~4.9x

Scaling Analysis¶

Time Complexity¶

Log-log regression of estimation time vs N:

Model	Exponent (beta)	Approximate Complexity
SAR	2.83	O(N^2.8)
SEM	2.79	O(N^2.8)
SDM	2.91	O(N^2.9)

Near-cubic scaling is expected for direct ML methods due to the log-determinant computation involving eigenvalue decomposition of the N x N weight matrix.

Memory Scaling¶

Dense W matrix: O(N^2) — dominates for large N
Sparse W matrix (< 10% fill): O(N) — significant savings for large N

Memory breakdown for N=1000 (dense W):

Total: 687 MB
├── Weight matrix W: 382 MB (55.6%)
├── Data matrices:   124 MB (18.0%)
├── Eigenvalue cache: 89 MB (13.0%)
├── Optimization:     61 MB  (8.9%)
└── Other:            31 MB  (4.5%)

Sparse vs Dense Performance¶

Testing with N=500, T=10 at varying sparsity levels:

Sparsity	Dense Time	Sparse Time	Speedup	Memory Saved
5%	8.23s	3.42s	2.4x	68%
10%	8.23s	4.81s	1.7x	52%
20%	8.23s	6.92s	1.2x	31%
50%	8.23s	8.45s	0.97x	-5%

Sparse Threshold

Use sparse matrix operations when W has less than 10% non-zero entries. Above 10%, sparse overhead exceeds the benefit. Most contiguity-based weight matrices (queen, rook) have sparsity well below 10%.

Optimization Impact¶

Comparison of baseline vs fully optimized estimation (N=500, T=10):

Optimization	Improvement
Eigenvalue caching	15-20%
Sparse operations	30-60% (when applicable)
Numba JIT	20-30%
Parallel permutations	3-4x on 8 cores
Combined	3.0-3.5x faster

Panel Size	Baseline	Optimized	Speedup
500 x 10	24.3s	8.2s	3.0x
1000 x 10	89.4s	25.3s	3.5x

Effects Decomposition¶

Computing direct and indirect effects for SDM models requires matrix inversion, adding overhead proportional to N^2:

Panel Size	Direct Effects	Indirect Effects	Total
N=100	0.12s	0.23s	0.35s
N=500	1.42s	2.81s	4.23s
N=1000	4.92s	9.83s	14.8s

Diagnostic Test Performance¶

Moran's I Test¶

Data Size	Global Moran	Local Moran (LISA)	Permutation (999 reps)
N=100	0.08s	0.42s	1.8s
N=500	0.41s	1.89s	9.2s
N=1000	0.93s	4.72s	28.4s
N=5000	8.91s	29.8s	4.2 min

LM Tests (LM-Lag, LM-Error)¶

LM tests are very fast because they only require OLS residuals:

Data Size	LM-Lag	LM-Error	All LM Tests
N=100	0.02s	0.02s	0.08s
N=500	0.08s	0.09s	0.31s
N=1000	0.21s	0.23s	0.82s
N=5000	2.14s	2.38s	8.42s

Diagnostic Workflow

Run LM tests first (< 1s) to determine if spatial modeling is needed. Only proceed to full ML estimation if LM tests reject the null.

Parallel Processing¶

Permutation test with 999 permutations (N=500):

CPU Cores	Time	Speedup	Efficiency
1	36.4s	1.0x	100%
2	18.9s	1.93x	96%
4	9.82s	3.71x	93%
8	5.23s	6.97x	87%

Near-linear speedup up to 8 cores, with ~87% parallel efficiency.

Bottleneck Analysis¶

Primary computational bottlenecks for spatial ML estimation:

Bottleneck	% of Runtime	Mitigation
Log-determinant calculation	40-50%	Eigenvalue caching, Chebyshev approximation
Matrix operations	20-30%	Sparse matrices, BLAS optimization
Likelihood optimization	20-25%	Better starting values, adaptive algorithms
Memory allocation	5-10%	Pre-allocation, memory pooling

Recommendations by Panel Size¶

Small (N < 500)Medium (500 <= N < 2000)Large (2000 <= N < 5000)Very Large (N >= 5000)

Use default settings — all models run in < 10s
Full ML estimation recommended
Dense W matrix is acceptable
All diagnostics (Moran's I, LM tests) are fast

Enable sparse operations if W has < 10% fill
Eigenvalue caching provides 15-20% speedup
Consider parallel permutation tests
Estimation takes 10-100s depending on model

Always use sparse W matrices (memory constraint)
Enable all optimizations (sparse + caching + Numba)
Consider GMM over ML for faster estimation
Use Chebyshev approximation for log-determinant
Estimation takes 2-10 minutes

Sparse W is required (dense W needs > 200 MB for W alone)
Use approximation methods
Consider spatial subsampling for diagnostics
May need distributed computing for full estimation

Weight Matrix Construction¶

Method	N=500	N=1000	N=5000
Queen contiguity	0.1s	0.3s	2s
K-nearest neighbors (k=10)	0.2s	0.5s	3s
Distance-based (threshold)	0.5s	2s	15s
Inverse distance	0.5s	2s	15s

K-NN for Large N

Use K-nearest neighbors instead of distance-based weights for large N. K-NN produces consistently sparse matrices (exactly k*N non-zero entries) regardless of the spatial distribution.

Hardware Recommendations¶

Panel Size	Minimum RAM	Recommended RAM	CPU Cores
N < 500	4 GB	8 GB	2+
N < 2000	8 GB	16 GB	4+
N < 5000	16 GB	32 GB	6+
N >= 5000	32 GB	64 GB	8+

References¶

LeSage, J. and Pace, R. K. (2009). Introduction to Spatial Econometrics. Chapman & Hall/CRC.
Elhorst, J. P. (2014). Spatial Econometrics: From Cross-Sectional Data to Spatial Panels. Springer.
Bivand, R. and Piras, G. (2015). "Comparing implementations of estimation methods for spatial econometrics." Journal of Statistical Software, 63(18).