Inference & Standard Errors

Correct statistical inference in panel data models depends on choosing appropriate standard errors. Classical OLS standard errors assume homoskedastic, independent errors, and both assumptions rarely hold in panel data. PanelBox provides eight types of standard errors, all built on a unified sandwich estimator framework.

Why Standard Errors Matter

Standard errors determine hypothesis tests, confidence intervals, and p-values. Using the wrong standard errors leads to:

  • Overly optimistic inference (too many "significant" results) when errors are correlated but treated as independent
  • Overly conservative inference (too few "significant" results) when using an unnecessarily complex SE estimator with limited data
  • Invalid confidence intervals that don't achieve their nominal coverage
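
To make the first point concrete, here is a minimal sketch using only the Python standard library. The coefficient and both standard errors are made-up illustrative numbers, not PanelBox output, and the normal approximation is an assumption (small samples would call for a t distribution).

```python
from statistics import NormalDist

def z_inference(coef, se, alpha=0.05):
    """Two-sided z-test and confidence interval under a normal approximation."""
    z = coef / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    return z, p_value, (coef - crit * se, coef + crit * se)

coef = 0.5            # hypothetical coefficient estimate
se_classical = 0.2    # hypothetical SE that assumes independent errors
se_clustered = 0.3    # hypothetical SE that allows within-entity correlation

for label, se in [("classical", se_classical), ("clustered", se_clustered)]:
    z, p, (lo, hi) = z_inference(coef, se)
    print(f"{label:>9}: z={z:.2f}  p={p:.3f}  95% CI=({lo:.2f}, {hi:.2f})")
```

The point estimate is identical in both rows. Only the standard error changes, yet the classical interval excludes zero while the clustered one does not.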

The most common mistake

Using classical (non-robust) standard errors in panel data almost always understates uncertainty. At minimum, use clustered standard errors by entity.

The Sandwich Estimator Framework

All robust standard errors in PanelBox follow the sandwich estimator structure:

\[ V = \underbrace{(X'X)^{-1}}_{\text{Bread}} \times \underbrace{\hat{\Omega}}_{\text{Meat}} \times \underbrace{(X'X)^{-1}}_{\text{Bread}} \]

For linear models, the bread \((X'X)^{-1}\) is the same across all types (for MLE, the inverse Hessian plays this role). What changes is the meat \(\hat{\Omega}\), which captures the assumed error structure:

| SE Type | Meat \(\hat{\Omega}\) | Handles |
|---|---|---|
| Classical | \(\hat{\sigma}^2 X'X\) | Nothing (homoskedastic, independent) |
| Robust (HC) | \(X' \text{diag}(\hat{e}_i^2) X\) | Heteroskedasticity |
| Clustered | \(\sum_g (X_g' \hat{u}_g)(X_g' \hat{u}_g)'\) | Within-cluster correlation |
| Driscoll-Kraay | HAC on the time series of cross-sectional moment sums \(\sum_i x_{it}\hat{u}_{it}\) | Heteroskedasticity + autocorrelation + cross-sectional dependence |
| Newey-West | \(\hat{\Gamma}_0 + \sum_j w_j(\hat{\Gamma}_j + \hat{\Gamma}_j')\) | Heteroskedasticity + autocorrelation |
| PCSE | \(X'(\hat{\Sigma} \otimes I_T)X\) | Cross-sectional correlation (macro panels) |
| Spatial HAC | Distance-weighted cross-products | Spatial + temporal correlation |
| MLE Sandwich | \(\sum s_i s_i'\) (score outer product) | Misspecification in nonlinear models |
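
The first three rows of the table can be sketched directly in NumPy. This is an illustration of the formulas on simulated data, not PanelBox's implementation: the function names are hypothetical, and no finite-sample corrections (such as the HC1 or cluster adjustments) are applied.

```python
import numpy as np

def sandwich(X, meat):
    """Return bread @ meat @ bread with bread = (X'X)^{-1}."""
    bread = np.linalg.inv(X.T @ X)
    return bread @ meat @ bread

def meat_classical(X, resid):
    """sigma^2 * X'X, with sigma^2 estimated from the residuals."""
    n, k = X.shape
    sigma2 = resid @ resid / (n - k)
    return sigma2 * (X.T @ X)

def meat_hc0(X, resid):
    """X' diag(e_i^2) X, computed without forming the n x n diagonal matrix."""
    return (X * resid[:, None] ** 2).T @ X

def meat_clustered(X, resid, groups):
    """Sum over clusters of the outer product of the cluster score X_g' u_g."""
    k = X.shape[1]
    meat = np.zeros((k, k))
    for g in np.unique(groups):
        mask = groups == g
        score = X[mask].T @ resid[mask]
        meat += np.outer(score, score)
    return meat

# Simulated panel: 20 entities x 5 periods with an entity-level shock
rng = np.random.default_rng(0)
n_entities, n_periods = 20, 5
groups = np.repeat(np.arange(n_entities), n_periods)
n = groups.size
X = np.column_stack([np.ones(n), rng.normal(size=n)])
entity_shock = rng.normal(size=n_entities)[groups]
y = X @ np.array([1.0, 2.0]) + entity_shock + rng.normal(size=n)

beta = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta

meats = {
    "classical": meat_classical(X, resid),
    "robust (HC0)": meat_hc0(X, resid),
    "clustered": meat_clustered(X, resid, groups),
}
for name, meat in meats.items():
    print(name, np.sqrt(np.diag(sandwich(X, meat))))
```

With one cluster per observation, the clustered meat collapses to the HC0 meat, which makes a handy sanity check.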

Decision Tree: Choosing the Right Standard Errors

Use the following guide to select the appropriate SE type for your data:

| Data Characteristic | Recommended SE | PanelBox cov_type | Page |
|---|---|---|---|
| Heteroskedasticity only | Robust HC1 | "robust" or "hc1" | Robust |
| Within-entity correlation | Clustered (entity) | "clustered" | Clustered |
| Entity + time correlation | Two-way clustered | "twoway" | Clustered |
| Cross-sectional dependence | Driscoll-Kraay | "driscoll_kraay" | Driscoll-Kraay |
| Autocorrelation (time series) | Newey-West | "newey_west" | Newey-West |
| Macro panels (\(T > N\)) | PCSE | "pcse" | PCSE |
| Spatial correlation | Spatial HAC | via SpatialHAC class | Spatial HAC |
| Nonlinear models (MLE) | MLE sandwich | "robust" | MLE Variance |
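
The guide can also be encoded as a small lookup. The helper below is hypothetical and not part of PanelBox. The feature names and the precedence order (most general dependence structure first) are assumptions made for this sketch. It leaves out PCSE and Spatial HAC, which depend on panel shape and on coordinates rather than on a simple flag.

```python
# Ordered from the most general dependence structure to the least.
# Feature names are illustrative; cov_type strings come from the table above.
RULES = [
    ("entity_and_time", "twoway"),
    ("cross_sectional", "driscoll_kraay"),
    ("within_entity", "clustered"),
    ("autocorrelation", "newey_west"),
    ("heteroskedastic", "robust"),
]
KNOWN_FEATURES = {feature for feature, _ in RULES}

def suggest_cov_type(features):
    """Return the cov_type string for the first matching rule, or None.

    None means no feature was flagged, i.e. keep the classical default.
    """
    unknown = set(features) - KNOWN_FEATURES
    if unknown:
        raise ValueError(f"unknown features: {sorted(unknown)}")
    for feature, cov_type in RULES:
        if feature in features:
            return cov_type
    return None

print(suggest_cov_type({"heteroskedastic"}))
print(suggest_cov_type({"heteroskedastic", "within_entity"}))
print(suggest_cov_type({"within_entity", "cross_sectional"}))
```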

Default recommendation

For most panel data applications, clustering by entity is a safe default. It allows for arbitrary within-entity correlation across time, which is the most common dependence structure.

How PanelBox Integrates Standard Errors

High-Level API: model.fit(cov_type=...)

The simplest way to use robust standard errors is through the cov_type parameter:

from panelbox.models import FixedEffects

model = FixedEffects("y ~ x1 + x2", data, entity="firm", time="year")

# Classical standard errors (default)
results = model.fit()

# Robust HC1 standard errors
results = model.fit(cov_type="robust")

# Clustered by entity
results = model.fit(cov_type="clustered")

# Driscoll-Kraay with custom lags
results = model.fit(cov_type="driscoll_kraay", max_lags=3)
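
The max_lags argument in the last call sets how many autocovariance lags enter the HAC meat. As a rough illustration, the sketch below computes Bartlett kernel weights \(w_j = 1 - j/(L+1)\) from Newey and West (1987), plus a common rule of thumb for the lag length, \(\lfloor 4(T/100)^{2/9} \rfloor\). Both helper names are hypothetical, and whether PanelBox uses this exact kernel or default is an assumption not confirmed here.

```python
import math

def bartlett_weights(max_lags):
    """Weights w_j = 1 - j / (L + 1) for lags j = 1..L."""
    return [1 - j / (max_lags + 1) for j in range(1, max_lags + 1)]

def rule_of_thumb_lags(n_periods):
    """Common plug-in choice: floor(4 * (T / 100) ** (2 / 9))."""
    return math.floor(4 * (n_periods / 100) ** (2 / 9))

print(bartlett_weights(3))      # weights decline linearly with the lag
print(rule_of_thumb_lags(20))   # suggested lag length for T = 20
```

Because the weights decline linearly to zero, distant lags contribute little, and with the Bartlett kernel the resulting covariance estimate stays positive semi-definite.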

Low-Level API: Direct Classes and Functions

For more control, use the classes and convenience functions directly:

from panelbox.standard_errors import (
    RobustStandardErrors, robust_covariance,
    ClusteredStandardErrors, cluster_by_entity, twoway_cluster,
    DriscollKraayStandardErrors, driscoll_kraay,
    NeweyWestStandardErrors, newey_west,
    PanelCorrectedStandardErrors, pcse,
    SpatialHAC,
    StandardErrorComparison,
)

# All result objects share a common interface:
result = robust_covariance(X, resid, method="HC1")
result.cov_matrix   # np.ndarray (k x k) - Covariance matrix
result.std_errors   # np.ndarray (k,)    - Standard errors
result.n_obs        # int                - Number of observations
result.n_params     # int                - Number of parameters

Comparing SE Methods

Use StandardErrorComparison to systematically compare different SE types:

from panelbox.standard_errors import StandardErrorComparison

comparison = StandardErrorComparison(results)
comp = comparison.compare_all()

# Examine how inference changes across SE types
print(comp.se_comparison)   # SEs by type
print(comp.significance)    # Significance stars

# Visualize differences
comparison.plot_comparison(comp, alpha=0.05)

See the Comparison page for details.

Quick Example

import panelbox as pb

# Load panel data
data = pb.datasets.load_grunfeld()

# Fit Fixed Effects model with different SE types
model = pb.FixedEffects("invest ~ value + capital", data, entity="firm", time="year")

# Compare standard errors
results_classical = model.fit()
results_robust = model.fit(cov_type="robust")
results_clustered = model.fit(cov_type="clustered")

# Print comparison
print("Classical SE: ", results_classical.std_errors.values)
print("Robust SE:    ", results_robust.std_errors.values)
print("Clustered SE: ", results_clustered.std_errors.values)

Common Mistakes

Pitfall 1: Too few clusters

Clustered standard errors require a sufficient number of clusters (rule of thumb: \(G \geq 50\)). With fewer clusters, SEs are biased downward. Consider wild cluster bootstrap when \(G < 50\).
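
A quick check of the cluster count is cheap to run before fitting. The sketch below is plain Python and independent of PanelBox, and its helper names are hypothetical. The correction factor shown, \(\frac{G}{G-1}\cdot\frac{N-1}{N-K}\), is the one Stata applies to cluster-robust variances in linear regression. Whether PanelBox applies the same factor is not stated here, and the factor reduces few-cluster bias without removing it.

```python
def count_clusters(cluster_ids):
    """Number of distinct clusters G."""
    return len(set(cluster_ids))

def has_few_clusters(cluster_ids, threshold=50):
    """True when G falls below the rule-of-thumb threshold."""
    return count_clusters(cluster_ids) < threshold

def cluster_correction(n_obs, n_params, n_clusters):
    """Finite-sample factor G/(G-1) * (N-1)/(N-K) for the covariance matrix."""
    if n_clusters < 2:
        raise ValueError("need at least two clusters")
    return (n_clusters / (n_clusters - 1)) * ((n_obs - 1) / (n_obs - n_params))

# Illustrative panel: 10 firms observed for 20 years, 3 estimated parameters
firm_ids = [firm for firm in range(10) for _ in range(20)]
G = count_clusters(firm_ids)
factor = cluster_correction(n_obs=len(firm_ids), n_params=3, n_clusters=G)

print("clusters:", G, "| few clusters:", has_few_clusters(firm_ids))
print("variance inflation factor:", round(factor, 4))
print("SE inflation factor:", round(factor ** 0.5, 4))
```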

Pitfall 2: Wrong clustering dimension

Cluster at the level where treatment varies. If a policy intervention varies by state, cluster by state --- not by individual.

Pitfall 3: Ignoring cross-sectional dependence

Entity-clustered SEs do not account for cross-sectional dependence (common shocks). If entities are affected by common factors (e.g., macroeconomic shocks), use Driscoll-Kraay or two-way clustering.
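
Two-way clustering follows the inclusion-exclusion formula of Cameron, Gelbach, and Miller (2011): \(V_{\text{twoway}} = V_{\text{entity}} + V_{\text{time}} - V_{\text{entity} \cap \text{time}}\). The NumPy sketch below illustrates that formula on simulated data with a common time shock. It is not PanelBox's implementation, the function names are hypothetical, and no finite-sample corrections are applied.

```python
import numpy as np

def cluster_cov(X, resid, groups):
    """One-way cluster-robust covariance (no finite-sample correction)."""
    bread = np.linalg.inv(X.T @ X)
    meat = np.zeros_like(bread)
    for g in np.unique(groups):
        mask = groups == g
        score = X[mask].T @ resid[mask]
        meat += np.outer(score, score)
    return bread @ meat @ bread

def twoway_cov(X, resid, entity, time):
    """V_entity + V_time - V_intersection."""
    # Label each (entity, time) cell; assumes non-negative integer codes.
    cell = entity * (time.max() + 1) + time
    return (
        cluster_cov(X, resid, entity)
        + cluster_cov(X, resid, time)
        - cluster_cov(X, resid, cell)
    )

rng = np.random.default_rng(42)
n_entities, n_periods = 30, 8
entity = np.repeat(np.arange(n_entities), n_periods)
time = np.tile(np.arange(n_periods), n_entities)
n = entity.size
X = np.column_stack([np.ones(n), rng.normal(size=n)])
common_shock = rng.normal(size=n_periods)[time]  # hits every entity in a period
y = X @ np.array([1.0, 0.5]) + common_shock + rng.normal(size=n)
beta = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta

print("entity-clustered variances:", np.diag(cluster_cov(X, resid, entity)))
print("two-way variances:         ", np.diag(twoway_cov(X, resid, entity, time)))
```

The two-way estimate is not guaranteed to be positive semi-definite in finite samples, so the sketch prints variances rather than taking square roots.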

Pitfall 4: PCSE with micro panels

PCSE requires \(T > N\). Using PCSE with typical micro panels (\(N \gg T\)) produces unreliable results. Use clustered SEs instead.
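
One way to see why: the \(N \times N\) contemporaneous covariance \(\hat{\Sigma}\) is estimated from only \(T\) periods, so its rank cannot exceed \(T\). The NumPy sketch below uses simulated residuals and a hypothetical helper name, and it assumes the Beck and Katz (1995) estimator \(\hat{\Sigma} = E'E/T\) with \(E\) the \(T \times N\) residual matrix.

```python
import numpy as np

def contemporaneous_cov(resid_matrix):
    """Sigma_hat (N x N) from a T x N residual matrix (rows: periods)."""
    n_periods = resid_matrix.shape[0]
    return resid_matrix.T @ resid_matrix / n_periods

rng = np.random.default_rng(1)
for n_periods, n_entities in [(40, 8), (5, 200)]:
    E = rng.normal(size=(n_periods, n_entities))  # simulated residuals
    sigma_hat = contemporaneous_cov(E)
    rank = np.linalg.matrix_rank(sigma_hat)
    print(f"T={n_periods:>3}, N={n_entities:>3}: Sigma_hat is "
          f"{sigma_hat.shape[0]}x{sigma_hat.shape[1]} with rank {rank}")
```

In the micro-panel case (\(N \gg T\)), \(\hat{\Sigma}\) is heavily rank-deficient: it has \(N(N+1)/2\) distinct elements but only \(T\) periods of information behind them.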

Software Equivalents

| PanelBox | Stata | R |
|---|---|---|
| cov_type="robust" | vce(robust) | sandwich::vcovHC() |
| cov_type="clustered" | vce(cluster id) | plm::vcovHC(cluster="group") |
| cov_type="driscoll_kraay" | xtscc (Hoechle 2007) | plm::vcovSCC() |
| cov_type="newey_west" | newey | sandwich::NeweyWest() |
| cov_type="pcse" | xtpcse | pcse::pcse() |
| SpatialHAC | acreg (Colella et al.) | conleyreg::conleyreg() |

Learning Path

  • Start here: Robust (HC0-HC3). Heteroskedasticity-robust standard errors, the foundation for all other methods.
  • Most common: Clustered. One-way and two-way clustering, the workhorse for panel data applications.
  • Time dependence: Driscoll-Kraay and Newey-West. HAC estimators for autocorrelation and cross-sectional dependence.
  • Spatial data: Spatial HAC. Conley (1999) for geographically correlated errors.
  • Macro panels: PCSE. Beck & Katz (1995) for time-series cross-section data with \(T > N\).
  • Nonlinear models: MLE Variance. Sandwich, delta method, and bootstrap for MLE estimators.
  • Compare all: Comparison. Systematically compare SE methods and assess inference sensitivity.

References

  • White, H. (1980). A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica, 48(4), 817-838.
  • Arellano, M. (1987). Computing robust standard errors for within-groups estimators. Oxford Bulletin of Economics and Statistics, 49(4), 431-434.
  • Newey, W. K., & West, K. D. (1987). A simple, positive semi-definite, heteroskedasticity and autocorrelation consistent covariance matrix. Econometrica, 55(3), 703-708.
  • Beck, N., & Katz, J. N. (1995). What to do (and not to do) with time-series cross-section data. American Political Science Review, 89(3), 634-647.
  • Driscoll, J. C., & Kraay, A. C. (1998). Consistent covariance matrix estimation with spatially dependent panel data. Review of Economics and Statistics, 80(4), 549-560.
  • Conley, T. G. (1999). GMM estimation with cross sectional dependence. Journal of Econometrics, 92(1), 1-45.
  • Cameron, A. C., Gelbach, J. B., & Miller, D. L. (2011). Robust inference with multiway clustering. Journal of Business & Economic Statistics, 29(2), 238-249.
  • Petersen, M. A. (2009). Estimating standard errors in finance panel data sets: Comparing approaches. Review of Financial Studies, 22(1), 435-480.