Skip to content

Spatial Benchmarks

This page presents detailed performance benchmarks for PanelBox's spatial panel models: SAR (Spatial Autoregressive), SEM (Spatial Error), SDM (Spatial Durbin), and dynamic spatial models. Spatial models are the most computationally intensive in PanelBox due to log-determinant calculations over the N x N weight matrix.

Benchmark Environment

CPU: Intel i7-10700K (8 cores, 3.8 GHz) | RAM: 32 GB DDR4 | Python: 3.12 | NumPy: 2.0 (MKL)

Weight matrices use queen contiguity with ~5% sparsity unless noted. Each benchmark averaged over 10 runs.

ML Estimation by Model Type

SAR Model (Spatial Autoregressive / Spatial Lag)

Panel Size (N x T) Time Memory Converged
100 x 10 0.82s 85 MB Yes
500 x 10 8.23s 245 MB Yes
1000 x 10 25.3s 687 MB Yes
2000 x 10 98.4s 2,134 MB Yes
5000 x 5 194s 3,892 MB Yes

SEM Model (Spatial Error)

Panel Size (N x T) Time Memory Converged
100 x 10 0.71s 82 MB Yes
500 x 10 7.89s 238 MB Yes
1000 x 10 24.1s 672 MB Yes
2000 x 10 91.4s 2,089 MB Yes
5000 x 5 182s 3,756 MB Yes

SEM is slightly faster than SAR due to a simpler likelihood structure.

SDM Model (Spatial Durbin)

Panel Size (N x T) Time Memory Converged
100 x 10 1.54s 98 MB Yes
500 x 10 12.4s 312 MB Yes
1000 x 10 38.7s 891 MB Yes
2000 x 10 157s 2,567 MB Yes
5000 x 5 391s 4,923 MB Yes

SDM takes ~50% more time than SAR due to additional spatially lagged covariate parameters.

Cross-Model Summary (N=500, T=10)

Model Time vs SAR
SAR (Spatial Lag) 8.2s 1.0x
SEM (Spatial Error) 7.9s 0.96x
SDM (Spatial Durbin) 12.4s 1.51x
Dynamic Spatial ~40s ~4.9x

Scaling Analysis

Time Complexity

Log-log regression of estimation time vs N:

Model Exponent (beta) Approximate Complexity
SAR 2.83 O(N^2.8)
SEM 2.79 O(N^2.8)
SDM 2.91 O(N^2.9)

Near-cubic scaling is expected for direct ML methods due to the log-determinant computation involving eigenvalue decomposition of the N x N weight matrix.

Memory Scaling

  • Dense W matrix: O(N^2) — dominates for large N
  • Sparse W matrix (< 10% fill): O(N) — significant savings for large N

Memory breakdown for N=1000 (dense W):

Total: 687 MB
├── Weight matrix W: 382 MB (55.6%)
├── Data matrices:   124 MB (18.0%)
├── Eigenvalue cache: 89 MB (13.0%)
├── Optimization:     61 MB  (8.9%)
└── Other:            31 MB  (4.5%)

Sparse vs Dense Performance

Testing with N=500, T=10 at varying sparsity levels:

Sparsity Dense Time Sparse Time Speedup Memory Saved
5% 8.23s 3.42s 2.4x 68%
10% 8.23s 4.81s 1.7x 52%
20% 8.23s 6.92s 1.2x 31%
50% 8.23s 8.45s 0.97x -5%

Sparse Threshold

Use sparse matrix operations when W has less than 10% non-zero entries. Above 10%, sparse overhead exceeds the benefit. Most contiguity-based weight matrices (queen, rook) have sparsity well below 10%.

Optimization Impact

Comparison of baseline vs fully optimized estimation (N=500, T=10):

Optimization Improvement
Eigenvalue caching 15-20%
Sparse operations 30-60% (when applicable)
Numba JIT 20-30%
Parallel permutations 3-4x on 8 cores
Combined 3.0-3.5x faster
Panel Size Baseline Optimized Speedup
500 x 10 24.3s 8.2s 3.0x
1000 x 10 89.4s 25.3s 3.5x

Effects Decomposition

Computing direct and indirect effects for SDM models requires matrix inversion, adding overhead proportional to N^2:

Panel Size Direct Effects Indirect Effects Total
N=100 0.12s 0.23s 0.35s
N=500 1.42s 2.81s 4.23s
N=1000 4.92s 9.83s 14.8s

Diagnostic Test Performance

Moran's I Test

Data Size Global Moran Local Moran (LISA) Permutation (999 reps)
N=100 0.08s 0.42s 1.8s
N=500 0.41s 1.89s 9.2s
N=1000 0.93s 4.72s 28.4s
N=5000 8.91s 29.8s 4.2 min

LM Tests (LM-Lag, LM-Error)

LM tests are very fast because they only require OLS residuals:

Data Size LM-Lag LM-Error All LM Tests
N=100 0.02s 0.02s 0.08s
N=500 0.08s 0.09s 0.31s
N=1000 0.21s 0.23s 0.82s
N=5000 2.14s 2.38s 8.42s

Diagnostic Workflow

Run LM tests first (< 1s) to determine if spatial modeling is needed. Only proceed to full ML estimation if LM tests reject the null.

Parallel Processing

Permutation test with 999 permutations (N=500):

CPU Cores Time Speedup Efficiency
1 36.4s 1.0x 100%
2 18.9s 1.93x 96%
4 9.82s 3.71x 93%
8 5.23s 6.97x 87%

Near-linear speedup up to 8 cores, with ~87% parallel efficiency.

Bottleneck Analysis

Primary computational bottlenecks for spatial ML estimation:

Bottleneck % of Runtime Mitigation
Log-determinant calculation 40-50% Eigenvalue caching, Chebyshev approximation
Matrix operations 20-30% Sparse matrices, BLAS optimization
Likelihood optimization 20-25% Better starting values, adaptive algorithms
Memory allocation 5-10% Pre-allocation, memory pooling

Recommendations by Panel Size

  • Use default settings — all models run in < 10s
  • Full ML estimation recommended
  • Dense W matrix is acceptable
  • All diagnostics (Moran's I, LM tests) are fast
  • Enable sparse operations if W has < 10% fill
  • Eigenvalue caching provides 15-20% speedup
  • Consider parallel permutation tests
  • Estimation takes 10-100s depending on model
  • Always use sparse W matrices (memory constraint)
  • Enable all optimizations (sparse + caching + Numba)
  • Consider GMM over ML for faster estimation
  • Use Chebyshev approximation for log-determinant
  • Estimation takes 2-10 minutes
  • Sparse W is required (dense W needs > 200 MB for W alone)
  • Use approximation methods
  • Consider spatial subsampling for diagnostics
  • May need distributed computing for full estimation

Weight Matrix Construction

Method N=500 N=1000 N=5000
Queen contiguity 0.1s 0.3s 2s
K-nearest neighbors (k=10) 0.2s 0.5s 3s
Distance-based (threshold) 0.5s 2s 15s
Inverse distance 0.5s 2s 15s

K-NN for Large N

Use K-nearest neighbors instead of distance-based weights for large N. K-NN produces consistently sparse matrices (exactly k*N non-zero entries) regardless of the spatial distribution.

Hardware Recommendations

Panel Size Minimum RAM Recommended RAM CPU Cores
N < 500 4 GB 8 GB 2+
N < 2000 8 GB 16 GB 4+
N < 5000 16 GB 32 GB 6+
N >= 5000 32 GB 64 GB 8+

References

  • LeSage, J. and Pace, R. K. (2009). Introduction to Spatial Econometrics. Chapman & Hall/CRC.
  • Elhorst, J. P. (2014). Spatial Econometrics: From Cross-Sectional Data to Spatial Panels. Springer.
  • Bivand, R. and Piras, G. (2015). "Comparing implementations of estimation methods for spatial econometrics." Journal of Statistical Software, 63(18).

See Also