Inference & Standard Errors¶
Correct statistical inference in panel data models depends on choosing appropriate standard errors. Classical OLS standard errors assume homoskedastic, independent errors --- assumptions that rarely hold in panel data. PanelBox provides 8 types of standard errors, all built on a unified sandwich estimator framework.
Why Standard Errors Matter¶
Standard errors determine hypothesis tests, confidence intervals, and p-values. Using the wrong standard errors leads to:
- Overly optimistic inference (too many "significant" results) when errors are correlated but treated as independent
- Overly conservative inference (too few "significant" results) when using an unnecessarily complex SE estimator with limited data
- Invalid confidence intervals that don't achieve their nominal coverage
The most common mistake
Using classical (non-robust) standard errors in panel data almost always understates uncertainty. At minimum, use clustered standard errors by entity.
The Sandwich Estimator Framework¶
All robust standard errors in PanelBox follow the sandwich estimator structure:
The bread \((X'X)^{-1}\) is the same across all types. What changes is the meat \(\hat{\Omega}\), which captures the assumed error structure:
| SE Type | Meat \(\hat{\Omega}\) | Handles |
|---|---|---|
| Classical | \(\hat{\sigma}^2 X'X\) | Nothing (homoskedastic, independent) |
| Robust (HC) | \(X' \text{diag}(\hat{e}_i^2) X\) | Heteroskedasticity |
| Clustered | \(\sum_g (X_g' \hat{u}_g)(X_g' \hat{u}_g)'\) | Within-cluster correlation |
| Driscoll-Kraay | HAC on time-averaged moments | Heteroskedasticity + autocorrelation + cross-sectional dependence |
| Newey-West | \(\hat{\Gamma}_0 + \sum_j w_j(\hat{\Gamma}_j + \hat{\Gamma}_j')\) | Heteroskedasticity + autocorrelation |
| PCSE | \(X'(\hat{\Sigma} \otimes I_T)X\) | Cross-sectional correlation (macro panels) |
| Spatial HAC | Distance-weighted cross-products | Spatial + temporal correlation |
| MLE Sandwich | \(\sum s_i s_i'\) (score outer product) | Misspecification in nonlinear models |
Decision Tree: Choosing the Right Standard Errors¶
Use the following guide to select the appropriate SE type for your data:
| Data Characteristic | Recommended SE | PanelBox cov_type |
Page |
|---|---|---|---|
| Heteroskedasticity only | Robust HC1 | "robust" or "hc1" |
Robust |
| Within-entity correlation | Clustered (entity) | "clustered" |
Clustered |
| Entity + time correlation | Two-way clustered | "twoway" |
Clustered |
| Cross-sectional dependence | Driscoll-Kraay | "driscoll_kraay" |
Driscoll-Kraay |
| Autocorrelation (time series) | Newey-West | "newey_west" |
Newey-West |
| Macro panels (\(T > N\)) | PCSE | "pcse" |
PCSE |
| Spatial correlation | Spatial HAC | via SpatialHAC class |
Spatial HAC |
| Nonlinear models (MLE) | MLE sandwich | "robust" |
MLE Variance |
Default recommendation
For most panel data applications, cluster by entity is a safe default. It allows for arbitrary within-entity correlation across time, which is the most common dependence structure.
How PanelBox Integrates Standard Errors¶
High-Level API: model.fit(cov_type=...)¶
The simplest way to use robust standard errors is through the cov_type parameter:
from panelbox.models import FixedEffects
model = FixedEffects("y ~ x1 + x2", data, entity="firm", time="year")
# Classical standard errors (default)
results = model.fit()
# Robust HC1 standard errors
results = model.fit(cov_type="robust")
# Clustered by entity
results = model.fit(cov_type="clustered")
# Driscoll-Kraay with custom lags
results = model.fit(cov_type="driscoll_kraay", max_lags=3)
Low-Level API: Direct Classes and Functions¶
For more control, use the classes and convenience functions directly:
from panelbox.standard_errors import (
RobustStandardErrors, robust_covariance,
ClusteredStandardErrors, cluster_by_entity, twoway_cluster,
DriscollKraayStandardErrors, driscoll_kraay,
NeweyWestStandardErrors, newey_west,
PanelCorrectedStandardErrors, pcse,
SpatialHAC,
StandardErrorComparison,
)
# All result objects share a common interface:
result = robust_covariance(X, resid, method="HC1")
result.cov_matrix # np.ndarray (k x k) - Covariance matrix
result.std_errors # np.ndarray (k,) - Standard errors
result.n_obs # int - Number of observations
result.n_params # int - Number of parameters
Comparing SE Methods¶
Use StandardErrorComparison to systematically compare different SE types:
from panelbox.standard_errors import StandardErrorComparison
comparison = StandardErrorComparison(results)
comp = comparison.compare_all()
# Examine how inference changes across SE types
print(comp.se_comparison) # SEs by type
print(comp.significance) # Significance stars
# Visualize differences
comparison.plot_comparison(comp, alpha=0.05)
See the Comparison page for details.
Quick Example¶
import panelbox as pb
# Load panel data
data = pb.datasets.load_grunfeld()
# Fit Fixed Effects model with different SE types
model = pb.FixedEffects("invest ~ value + capital", data, entity="firm", time="year")
# Compare standard errors
results_classical = model.fit()
results_robust = model.fit(cov_type="robust")
results_clustered = model.fit(cov_type="clustered")
# Print comparison
print("Classical SE: ", results_classical.std_errors.values)
print("Robust SE: ", results_robust.std_errors.values)
print("Clustered SE: ", results_clustered.std_errors.values)
Common Mistakes¶
Pitfall 1: Too few clusters
Clustered standard errors require a sufficient number of clusters (rule of thumb: \(G \geq 50\)). With fewer clusters, SEs are biased downward. Consider wild cluster bootstrap when \(G < 50\).
Pitfall 2: Wrong clustering dimension
Cluster at the level where treatment varies. If a policy intervention varies by state, cluster by state --- not by individual.
Pitfall 3: Ignoring cross-sectional dependence
Entity-clustered SEs do not account for cross-sectional dependence (common shocks). If entities are affected by common factors (e.g., macroeconomic shocks), use Driscoll-Kraay or two-way clustering.
Pitfall 4: PCSE with micro panels
PCSE requires \(T > N\). Using PCSE with typical micro panels (\(N \gg T\)) produces unreliable results. Use clustered SEs instead.
Software Equivalents¶
| PanelBox | Stata | R |
|---|---|---|
cov_type="robust" |
vce(robust) |
sandwich::vcovHC() |
cov_type="clustered" |
vce(cluster id) |
plm::vcovHC(cluster="group") |
cov_type="driscoll_kraay" |
xtscc (Hoechle 2007) |
plm::vcovSCC() |
cov_type="newey_west" |
newey |
sandwich::NeweyWest() |
cov_type="pcse" |
xtpcse |
pcse::pcse() |
SpatialHAC |
acreg (Colella et al.) |
conleyreg::conley() |
Learning Path¶
-
Start here
Robust (HC0-HC3) --- Heteroskedasticity-robust standard errors. The foundation for all other methods.
-
Most common
Clustered --- One-way and two-way clustering. The workhorse for panel data applications.
-
Time dependence
Driscoll-Kraay and Newey-West --- HAC estimators for autocorrelation and cross-sectional dependence.
-
Spatial data
Spatial HAC --- Conley (1999) for geographically correlated errors.
-
Macro panels
PCSE --- Beck & Katz (1995) for time-series cross-section data with \(T > N\).
-
Nonlinear models
MLE Variance --- Sandwich, delta method, and bootstrap for MLE estimators.
-
Compare all
Comparison --- Systematically compare SE methods and assess inference sensitivity.
References¶
- White, H. (1980). A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica, 48(4), 817-838.
- Arellano, M. (1987). Computing robust standard errors for within-groups estimators. Oxford Bulletin of Economics and Statistics, 49(4), 431-434.
- Newey, W. K., & West, K. D. (1987). A simple, positive semi-definite, heteroskedasticity and autocorrelation consistent covariance matrix. Econometrica, 55(3), 703-708.
- Beck, N., & Katz, J. N. (1995). What to do (and not to do) with time-series cross-section data. American Political Science Review, 89(3), 634-647.
- Driscoll, J. C., & Kraay, A. C. (1998). Consistent covariance matrix estimation with spatially dependent panel data. Review of Economics and Statistics, 80(4), 549-560.
- Conley, T. G. (1999). GMM estimation with cross sectional dependence. Journal of Econometrics, 92(1), 1-45.
- Cameron, A. C., Gelbach, J. B., & Miller, D. L. (2011). Robust inference with multiway clustering. Journal of Business & Economic Statistics, 29(2), 238-249.
- Petersen, M. A. (2009). Estimating standard errors in finance panel data sets: Comparing approaches. Review of Financial Studies, 22(1), 435-480.