Sensitivity Analysis¶

Quick Reference

Class: panelbox.validation.robustness.SensitivityAnalysis Import: from panelbox.validation.robustness import SensitivityAnalysis Key methods: leave_one_out_entities(), leave_one_out_periods(), subset_sensitivity() Stata equivalent: Custom iteration R equivalent: sensemakr::sensemakr()

What It Tests¶

Sensitivity analysis evaluates whether parameter estimates are stable when the sample composition changes. If dropping one entity, one time period, or a random 20% of observations shifts a coefficient from significant to insignificant, the finding is fragile and should be reported as such.

PanelBox provides three complementary sensitivity modes:

Mode	What It Does	Question Answered
Leave-one-out entities	Drop each entity, re-estimate	"Does one country/firm drive the results?"
Leave-one-out periods	Drop each time period, re-estimate	"Does one year/quarter drive the results?"
Subset sensitivity	Random 80% subsamples, re-estimate	"Are results stable across random splits?"

Quick Example¶

from panelbox import FixedEffects
from panelbox.validation.robustness import SensitivityAnalysis
from panelbox.datasets import load_grunfeld

data = load_grunfeld()
model = FixedEffects("invest ~ value + capital", data, "firm", "year")
results = model.fit()

# Sensitivity analysis
sa = SensitivityAnalysis(results, show_progress=False)

# Leave-one-out entities
loo_entities = sa.leave_one_out_entities(influence_threshold=2.0)
print(sa.summary(loo_entities))

# Which entities are influential?
print(f"Influential entities: {loo_entities.influential_units}")

# Visualize coefficient stability
fig = sa.plot_sensitivity(loo_entities, params=["value", "capital"])

Three Analysis Modes¶

Leave-One-Out Entities¶

Drop each entity \(i\) from the dataset and re-estimate the model \(N\) times. An entity is flagged as influential if removing it shifts any coefficient by more than influence_threshold standard errors.

sa = SensitivityAnalysis(results)
loo_entities = sa.leave_one_out_entities(influence_threshold=2.0)

# Results
print(loo_entities.method)            # "leave_one_out_entities"
print(loo_entities.estimates)          # pd.DataFrame (N × K)
print(loo_entities.statistics)         # dict with per-parameter stats
print(loo_entities.influential_units)  # list of flagged entity labels
print(loo_entities.subsample_info)     # pd.DataFrame (excluded, n_obs, converged)

Leave-One-Out Periods¶

Drop each time period and re-estimate. Reveals whether a single year or event drives the results.

loo_periods = sa.leave_one_out_periods(influence_threshold=2.0)

print(f"Influential periods: {loo_periods.influential_units}")
print(sa.summary(loo_periods))

Subset Sensitivity¶

Draw n_subsamples random subsamples (each containing a fraction subsample_size of entities) and re-estimate. This assesses overall stability without focusing on individual units.

subsets = sa.subset_sensitivity(
    n_subsamples=20,
    subsample_size=0.8,
    stratify=True,
    random_state=42,
)

print(sa.summary(subsets))

Parameter	Default	Description
`n_subsamples`	20	Number of random subsamples
`subsample_size`	0.8	Fraction of entities per subsample (0 to 1)
`stratify`	`True`	Maintain temporal balance by sampling entities (not observations)
`random_state`	`None`	Seed for reproducibility

SensitivityResults¶

All three modes return a SensitivityResults object:

Attribute	Type	Description
`method`	`str`	`"leave_one_out_entities"`, `"leave_one_out_periods"`, or `"subset_sensitivity"`
`estimates`	`pd.DataFrame`	Coefficient estimates per subsample (rows) and parameter (columns)
`std_errors`	`pd.DataFrame`	Standard errors per subsample
`statistics`	`dict`	Per-parameter summary: mean, std, range, max deviation, % beyond threshold
`influential_units`	`list`	Labels of entities/periods flagged as influential
`subsample_info`	`pd.DataFrame`	Metadata per subsample (excluded unit, n_obs, converged)

Statistics Dictionary¶

For each parameter, statistics[param] contains:

Key	Description
`mean`	Mean estimate across subsamples
`std`	Standard deviation of estimates
`min` / `max`	Range of estimates
`range`	max - min
`max_abs_deviation`	Largest absolute deviation from original
`mean_abs_deviation`	Average absolute deviation from original
`max_std_deviation`	Largest deviation in SE units
`n_beyond_threshold`	Count of subsamples exceeding threshold
`pct_beyond_threshold`	Percentage exceeding threshold

Visualization¶

# Plot coefficient stability
fig = sa.plot_sensitivity(loo_entities, params=["value", "capital"])

# Customize
fig = sa.plot_sensitivity(
    loo_entities,
    params=["value"],
    figsize=(12, 6),
    reference_line=True,      # Red dashed line at original estimate
    confidence_band=True,     # Blue band at mean +/- 1.96*std
)

The plot shows each subsample's estimate as a point, with the original estimate as a red reference line and a 95% confidence band in blue. Narrow bands indicate robust estimates; wide bands suggest sensitivity to sample composition.

Summary Table¶

summary_df = sa.summary(loo_entities)
print(summary_df)

Returns a pd.DataFrame with columns: Parameter, Original, Mean, Std, Min, Max, Range, Max Deviation, Max Dev (SE), N Valid.

Interpretation¶

Reading Sensitivity Results

Narrow range: Estimates vary little across subsamples -- the finding is robust.
Wide range: Results are sensitive to sample composition -- proceed with caution.
Specific influential units: If one entity/period is flagged, investigate why. Is it an outlier? A structural break? Or does it carry unique information?
pct_beyond_threshold > 5%: More than 5% of subsamples produce estimates beyond 2 SE -- the finding may not be robust.
Sign changes: If the coefficient changes sign across subsamples, the result is unreliable.

Sensitivity vs Jackknife¶

Feature	SensitivityAnalysis	PanelJackknife
LOO entities	Yes	Yes
LOO periods	Yes	No
Random subsets	Yes	No
Bias estimation	No	Yes (\(\text{Bias}_{JK}\))
Influence formula	Standardized deviation	\((N-1)(\hat\theta - \hat\theta_{(-i)})\)
Visualization	`plot_sensitivity()`	--
Summary table	`summary()` returns DataFrame	`summary()` returns string

Use PanelJackknife when you need formal bias estimates and influence decomposition. Use SensitivityAnalysis for a broader assessment including time periods and random subsets.

Common Pitfalls¶

Watch Out

Threshold sensitivity: The influence_threshold=2.0 default may be too strict or too lenient depending on the application. Report results for multiple thresholds.
Small N problems: With \(N < 10\) entities, leave-one-out removes 10-20% of the data each time, amplifying natural variation. Interpret with care.
Convergence failures: Check subsample_info["converged"]. If many subsamples fail to converge, the model may be poorly identified.
Multiple testing: With many entities and parameters, some will appear "influential" by chance. Focus on substantively meaningful deviations.

References¶

Leamer, E. E. (1983). Let's take the con out of econometrics. American Economic Review, 73(1), 31-43.
Lu, X., & White, H. (2014). Robustness checks and robustness tests in applied economics. Journal of Econometrics, 178(P1), 194-206.
Cinelli, C., & Hazlett, C. (2020). Making sense of sensitivity: Extending omitted variable bias. Journal of the Royal Statistical Society: Series B, 82(1), 39-67.