Comparing Standard Error Methods¶

Quick Reference

Class: panelbox.standard_errors.StandardErrorComparison Result: panelbox.standard_errors.ComparisonResult Stata equivalent: estimates table, suest R equivalent: modelsummary::modelsummary(), lmtest::coeftest()

Overview¶

Different standard error methods can lead to different conclusions about the same model. A coefficient that appears statistically significant with classical SEs may become insignificant with clustered SEs, or vice versa. PanelBox's StandardErrorComparison class provides a systematic way to compare SE types and identify sensitivity in inference.

Comparing SEs is not just a robustness exercise --- it reveals information about the error structure in your data. If clustered SEs are much larger than robust SEs, it signals substantial within-cluster correlation. If Driscoll-Kraay SEs differ from clustered SEs, cross-sectional dependence may be present.

When to Use¶

As a robustness check in any empirical study
When reviewers request alternative SE specifications
To diagnose the error structure (heteroskedasticity, clustering, cross-sectional dependence)
When inference changes across SE types and you need to decide which to report

Quick Example¶

from panelbox.standard_errors import StandardErrorComparison
from panelbox.models import FixedEffects

# Fit model
model = FixedEffects("y ~ x1 + x2", data, entity="firm", time="year")
results = model.fit()

# Compare all SE types
comparison = StandardErrorComparison(results)
comp = comparison.compare_all()

# Print summary
comparison.summary(comp)

# Visualize
comparison.plot_comparison(comp, alpha=0.05)

StandardErrorComparison Class¶

Initialization¶

comparison = StandardErrorComparison(model_results=results)

The class requires a fitted PanelResults object. It extracts the model, coefficients, and residual degrees of freedom to refit the model with different SE types.

compare_all()¶

Compare multiple SE types simultaneously:

comp = comparison.compare_all(
    se_types=["nonrobust", "robust", "hc3", "clustered", "driscoll_kraay"],
)

If se_types is not specified, the default list includes "nonrobust", "robust", "hc3", and "clustered". Advanced types ("driscoll_kraay", "newey_west") are added automatically when the sample is large enough.

Parameter	Type	Default	Description
`se_types`	`list` or `None`	`None`	SE types to compare
`**kwargs`	---	---	Additional params (e.g., `max_lags` for DK/NW)

compare_pair()¶

Compare two specific SE types:

comp = comparison.compare_pair("robust", "clustered")

plot_comparison()¶

Visualize standard errors and confidence intervals across SE types:

fig = comparison.plot_comparison(comp, alpha=0.05, figsize=(12, 8))

Produces two panels:

Bar chart: Standard errors by coefficient and SE type
Coefficient plot: Point estimates with confidence intervals for each SE type

summary()¶

Print a detailed text summary:

comparison.summary(comp)

The summary includes:

Standard errors by type
SE ratios relative to baseline
Significance levels across SE types
Inference sensitivity analysis (which coefficients change significance)

ComparisonResult¶

The compare_all() and compare_pair() methods return a ComparisonResult dataclass:

Attribute	Type	Description
`se_comparison`	`pd.DataFrame`	Standard errors by type (rows = coefficients, columns = SE types)
`se_ratios`	`pd.DataFrame`	Ratios relative to baseline (typically `nonrobust`)
`t_stats`	`pd.DataFrame`	t-statistics under each SE type
`p_values`	`pd.DataFrame`	Two-tailed p-values
`ci_lower`	`pd.DataFrame`	Lower bounds of 95% confidence intervals
`ci_upper`	`pd.DataFrame`	Upper bounds of 95% confidence intervals
`significance`	`pd.DataFrame`	Significance stars (``, ``, `**`)
`summary_stats`	`pd.DataFrame`	Summary statistics (mean, std, min, max, range, CV)

Interpretation Guidelines¶

Reading SE Ratios¶

The se_ratios DataFrame shows how each SE type compares to the baseline:

Pattern	Likely Diagnosis	Action
Robust/Classical \(\approx\) 1	Homoskedastic errors	Classical SE are fine
Robust/Classical \(> 1.2\)	Heteroskedasticity	Use robust or clustered SE
Clustered/Robust \(> 1.3\)	Within-cluster correlation	Use clustered SE
DK/Clustered \(> 1.2\)	Cross-sectional dependence	Use Driscoll-Kraay
All SE types similar	Well-behaved errors	Any method works; report clustered
Large variation across types	Complex error structure	Investigate further; consider multiple methods

When Inference Changes¶

If a coefficient is significant under some SE types but not others, this signals fragile inference. The summary() method flags such coefficients:

comparison.summary(comp)
# Output includes:
# "Coefficients with inconsistent inference across SE types:
#   x2:
#     Significant with: nonrobust, robust
#     Not significant with: clustered, driscoll_kraay"

In such cases:

Report the most conservative SE (usually clustered or DK)
Acknowledge sensitivity in your paper
Investigate the error structure to understand why SEs differ

Full Comparison Example¶

import panelbox as pb
from panelbox.standard_errors import StandardErrorComparison

# Load data and fit model
data = pb.datasets.load_grunfeld()
model = pb.FixedEffects("invest ~ value + capital", data, entity="firm", time="year")
results = model.fit()

# Comprehensive comparison
comparison = StandardErrorComparison(results)
comp = comparison.compare_all(
    se_types=["nonrobust", "robust", "hc3", "clustered"],
)

# View SE comparison table
print("Standard Errors by Type:")
print(comp.se_comparison.to_string(float_format=lambda x: f"{x:.4f}"))
print()

# View significance across methods
print("Significance:")
print(comp.significance)
print()

# View SE ratios (relative to nonrobust)
print("SE Ratios (vs nonrobust):")
print(comp.se_ratios.to_string(float_format=lambda x: f"{x:.3f}"))

# Plot
fig = comparison.plot_comparison(comp)

Best Practices for Reporting¶

Default: Report clustered SE by entity as your primary specification
Robustness table: Show results under 2-3 alternative SE types
Diagnose first: Run diagnostic tests (Breusch-Pagan, Pesaran CD, Wooldridge AR) to justify your SE choice
Document: State which SE type you use and why in your methodology section
Sensitivity: If inference is sensitive to SE choice, acknowledge this and present results under multiple specifications

Common Pitfalls¶

Pitfall 1: Cherry-picking SEs

Do not choose the SE type that gives you the results you want. Choose based on the data structure and diagnostic tests, then report robustness to alternatives.

Pitfall 2: Comparing SEs without context

Large differences between SE types are informative, not alarming. They reveal information about the error structure. Use diagnostic tests to understand why SEs differ.

Pitfall 3: Ignoring the comparison

Reporting only one SE type without robustness checks is increasingly viewed as insufficient by journals. Always include at least one alternative specification.

References¶

Petersen, M. A. (2009). Estimating standard errors in finance panel data sets: Comparing approaches. Review of Financial Studies, 22(1), 435-480.
Thompson, S. B. (2011). Simple formulas for standard errors that cluster by both firm and time. Journal of Financial Economics, 99(1), 1-10.
Cameron, A. C., & Miller, D. L. (2015). A practitioner's guide to cluster-robust inference. Journal of Human Resources, 50(2), 317-372.
Angrist, J. D., & Pischke, J.-S. (2009). Mostly Harmless Econometrics. Princeton University Press.