Jackknife Analysis
Quick Reference
Class: panelbox.validation.robustness.PanelJackknife
Import: from panelbox.validation.robustness import PanelJackknife
Key method: jk.run() returns JackknifeResults
Stata equivalent: jackknife prefix
R equivalent: bootstrap::jackknife() (boot::jack.after.boot() provides the related jackknife-after-bootstrap diagnostics)
What It Does
The panel jackknife systematically drops one entity at a time from the dataset and re-estimates the model on the remaining \(N-1\) entities. For a panel with \(N\) entities, this produces \(N\) re-estimations, each revealing how much a single entity contributes to the overall results.
The jackknife answers three questions:
- Bias: Is the estimator biased, and by how much?
- Variance: What are the standard errors under leave-one-out resampling?
- Influence: Which entities have disproportionate impact on the coefficients?
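The leave-one-entity-out loop can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not panelbox's implementation: `within_ols` is a hypothetical stand-in for `FixedEffects` (entity-demeaned OLS, point estimates only), `leave_one_entity_out` is a hypothetical helper, and the panel is synthetic.

```python
import numpy as np

def within_ols(y, X, entity):
    """Hypothetical stand-in for a fixed-effects fit: demean y and X within
    each entity, then run OLS on the demeaned data."""
    y_dm = np.asarray(y, dtype=float).copy()
    X_dm = np.asarray(X, dtype=float).copy()
    for g in np.unique(entity):
        m = entity == g
        y_dm[m] -= y_dm[m].mean()
        X_dm[m] -= X_dm[m].mean(axis=0)
    beta, *_ = np.linalg.lstsq(X_dm, y_dm, rcond=None)
    return beta

def leave_one_entity_out(y, X, entity):
    """Drop each entity in turn and re-estimate on the remaining N-1 entities.
    Returns the entity ids and an (N, K) array of leave-one-out estimates."""
    ids = np.unique(entity)
    estimates = []
    for g in ids:
        keep = entity != g  # boolean mask: every row except entity g
        estimates.append(within_ols(y[keep], X[keep], entity[keep]))
    return ids, np.vstack(estimates)

# Synthetic balanced panel: 10 entities x 20 periods, true slopes (0.5, -1.0)
rng = np.random.default_rng(0)
n_entities, n_periods = 10, 20
entity = np.repeat(np.arange(n_entities), n_periods)
X = rng.normal(size=(n_entities * n_periods, 2))
effects = rng.normal(size=n_entities)[entity]  # entity fixed effects
y = effects + X @ np.array([0.5, -1.0]) + rng.normal(scale=0.1, size=entity.size)

theta_hat = within_ols(y, X, entity)           # full-sample estimate
ids, loo = leave_one_entity_out(y, X, entity)  # N re-estimations
print(loo.shape)  # (10, 2)
```

Row `i` of `loo` is the estimate with entity `ids[i]` removed; everything the jackknife reports is computed from this matrix and `theta_hat`.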
Quick Example

```python
from panelbox import FixedEffects
from panelbox.validation.robustness import PanelJackknife
from panelbox.datasets import load_grunfeld

data = load_grunfeld()
model = FixedEffects("invest ~ value + capital", data, "firm", "year")
results = model.fit()

# Jackknife analysis
jk = PanelJackknife(results, verbose=True)
jk_results = jk.run()

# View summary
print(jk.summary())

# Bias-corrected estimates
bias_corrected = jk.bias_corrected_estimates()
print(bias_corrected)

# Find influential entities
influential = jk.influential_entities(threshold=2.0)
print(influential)
```

Mathematical Details
Given \(N\) entities and original estimate \(\hat\theta\), the jackknife computes:
Jackknife mean:
\[
\bar\theta_{JK} = \frac{1}{N} \sum_{i=1}^{N} \hat\theta_{(-i)}
\]
where \(\hat\theta_{(-i)}\) is the estimate with entity \(i\) removed.
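Jackknife bias:
\[
\widehat{\text{Bias}}_{JK} = (N-1)\left(\bar\theta_{JK} - \hat\theta\right)
\]
Jackknife standard error:
\[
\widehat{\text{SE}}_{JK} = \sqrt{\frac{N-1}{N} \sum_{i=1}^{N} \left(\hat\theta_{(-i)} - \bar\theta_{JK}\right)^2}
\]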
Influence of entity \(i\):
\[
\text{Infl}_i = (N-1)\left(\bar\theta_{JK} - \hat\theta_{(-i)}\right)
\]
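Bias-corrected estimate:
\[
\hat\theta_{BC} = \hat\theta - \widehat{\text{Bias}}_{JK} = N\hat\theta - (N-1)\bar\theta_{JK}
\]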
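The quantities listed above reduce to a few array operations on an \(N \times K\) matrix of leave-one-out estimates. The sketch below is a hypothetical helper (`jackknife_summary`), not panelbox code, and it assumes the centred influence convention \((N-1)(\bar\theta_{JK} - \hat\theta_{(-i)})\); the library may centre or scale influence differently. As a check it uses a classic textbook case: jackknife bias correction turns the plug-in variance (divide by \(N\)) into the unbiased sample variance exactly.

```python
import numpy as np

def jackknife_summary(theta_hat, loo):
    """Jackknife mean, bias, SE, influence and bias-corrected estimate.

    theta_hat: full-sample estimate, shape (K,)
    loo: leave-one-out estimates, shape (N, K); row i omits entity i
    """
    theta_hat = np.asarray(theta_hat, dtype=float)
    loo = np.asarray(loo, dtype=float)
    loo = loo.reshape(loo.shape[0], -1)  # treat a 1-D input as K = 1
    n = loo.shape[0]
    mean = loo.mean(axis=0)
    bias = (n - 1) * (mean - theta_hat)
    se = np.sqrt((n - 1) / n * ((loo - mean) ** 2).sum(axis=0))
    influence = (n - 1) * (mean - loo)  # assumed convention: centred, scaled by N-1
    return {
        "mean": mean,
        "bias": bias,
        "se": se,
        "influence": influence,
        "bias_corrected": theta_hat - bias,
    }

# Plug-in variance is biased downward by a factor (N-1)/N; the jackknife
# bias correction recovers the unbiased (ddof=1) sample variance.
rng = np.random.default_rng(1)
x = rng.normal(size=25)
theta_hat = np.array([np.var(x)])
loo = np.array([[np.var(np.delete(x, i))] for i in range(x.size)])
out = jackknife_summary(theta_hat, loo)
print(np.isclose(out["bias_corrected"][0], np.var(x, ddof=1)))  # True
```

Under this convention the influence values average to zero, and the jackknife SE equals \(\sqrt{\sum_i \text{Infl}_i^2 / (N(N-1))}\), so the two quantities carry the same information about spread.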
API Reference
Constructor

```python
PanelJackknife(
    results=results,  # PanelResults from model.fit()
    verbose=True,     # Print progress information
)
```

Methods

| Method | Returns | Description |
|---|---|---|
| `run()` | `JackknifeResults` | Execute the leave-one-out procedure |
| `bias_corrected_estimates()` | `pd.Series` | Original estimates minus jackknife bias |
| `confidence_intervals(alpha, method)` | `pd.DataFrame` | Jackknife CIs: `"normal"` (uses the jackknife SE) or `"percentile"` (uses the jackknife distribution) |
| `influential_entities(threshold, metric)` | `pd.DataFrame` | Entities with aggregate influence above threshold |
| `summary()` | `str` | Formatted summary string |

JackknifeResults Attributes

| Attribute | Type | Description |
|---|---|---|
| `jackknife_estimates` | `pd.DataFrame` | Parameter estimates with each entity excluded in turn (\(N \times K\)) |
| `original_estimates` | `pd.Series` | Original full-sample estimates |
| `jackknife_mean` | `pd.Series` | Mean of the jackknife estimates |
| `jackknife_bias` | `pd.Series` | \((N-1) \times (\bar\theta_{JK} - \hat\theta)\) |
| `jackknife_se` | `pd.Series` | Jackknife standard errors |
| `influence` | `pd.DataFrame` | Per-entity influence on each parameter |
| `n_jackknife` | `int` | Number of successful jackknife samples |

Identifying Influential Entities

```python
# Default: flag entities with max absolute influence > 2x the mean
influential = jk.influential_entities(threshold=2.0, metric="max")
print(influential)

# Alternative aggregation metrics
influential_mean = jk.influential_entities(threshold=2.0, metric="mean")
influential_sum = jk.influential_entities(threshold=2.0, metric="sum")
```

The `metric` parameter controls how influence is aggregated across parameters:

| Metric | Aggregation | Use When |
|---|---|---|
| `max` | Maximum absolute influence across parameters | Default; catches an entity that strongly affects any single parameter |
| `mean` | Mean absolute influence across parameters | Detects entities with broad but moderate influence |
| `sum` | Sum of absolute influences across parameters | Emphasizes entities affecting many parameters |

An entity is flagged as influential if its aggregate influence exceeds `threshold` times the mean aggregate influence across all entities. Because this cutoff is relative, `mean` and `sum` flag the same entities (each entity's sum is \(K\) times its mean); they differ only in the scale of the reported values.
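The flagging rule can be sketched with pandas. `flag_influential` is a hypothetical helper, not the library's `influential_entities()`: it takes a table shaped like the `influence` attribute (entities as rows, parameters as columns) and returns a Series of aggregate scores rather than a DataFrame. The influence values are made up for illustration.

```python
import pandas as pd

def flag_influential(influence, threshold=2.0, metric="max"):
    """Return entities whose aggregate |influence| exceeds
    threshold x the mean aggregate across all entities."""
    abs_inf = influence.abs()
    if metric == "max":
        agg = abs_inf.max(axis=1)
    elif metric == "mean":
        agg = abs_inf.mean(axis=1)
    elif metric == "sum":
        agg = abs_inf.sum(axis=1)
    else:
        raise ValueError(f"unknown metric: {metric!r}")
    cutoff = threshold * agg.mean()  # relative, not absolute, cutoff
    return agg[agg > cutoff].sort_values(ascending=False)

# Illustrative influence table: entity D moves the "value" coefficient a lot
influence = pd.DataFrame(
    {"value": [0.10, -0.20, 0.15, 2.00, -0.10],
     "capital": [0.05, 0.10, -0.10, 0.30, 0.20]},
    index=["A", "B", "C", "D", "E"],
)
print(flag_influential(influence))  # only entity D is flagged
```

Raising `threshold` makes the screen stricter: at `threshold=5.0` no entity clears the cutoff in this toy table.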
Confidence Intervals

```python
# Normal approximation (using jackknife SE)
ci = jk.confidence_intervals(alpha=0.05, method="normal")

# Percentile method (using jackknife distribution)
ci = jk.confidence_intervals(alpha=0.05, method="percentile")
```

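A minimal sketch of the normal-approximation interval, \(\hat\theta \pm z_{1-\alpha/2}\) times the jackknife SE, using a hypothetical helper `jackknife_normal_ci`. It does not reproduce `method="percentile"`, whose construction is not documented here. One caution applies to any percentile-style interval built from leave-one-out estimates: they are tightly clustered, with a population standard deviation equal to the jackknife SE divided by \(\sqrt{N-1}\), so their raw quantiles would understate uncertainty unless rescaled.

```python
import numpy as np
from statistics import NormalDist

def jackknife_normal_ci(theta_hat, loo, alpha=0.05):
    """Normal-approximation CI: theta_hat +/- z_{1-alpha/2} * jackknife SE."""
    theta_hat = np.asarray(theta_hat, dtype=float)
    loo = np.asarray(loo, dtype=float)
    loo = loo.reshape(loo.shape[0], -1)  # (N, K)
    n = loo.shape[0]
    se = np.sqrt((n - 1) / n * ((loo - loo.mean(axis=0)) ** 2).sum(axis=0))
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return theta_hat - z * se, theta_hat + z * se

# For the sample mean, the jackknife SE equals s / sqrt(N) exactly,
# so the interval matches the textbook normal CI.
rng = np.random.default_rng(2)
x = rng.normal(loc=3.0, size=40)
loo = np.array([np.delete(x, i).mean() for i in range(x.size)])
lower, upper = jackknife_normal_ci([x.mean()], loo)

# Spread of the leave-one-out estimates times sqrt(N - 1) recovers the
# jackknife SE: the two printed values agree.
print(np.std(loo) * np.sqrt(x.size - 1), (upper - lower)[0] / (2 * NormalDist().inv_cdf(0.975)))
```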
Interpretation
Reading Jackknife Results
- Large bias: If `jackknife_bias` is large relative to the standard error, the original estimator may be biased. Use `bias_corrected_estimates()`.
- Large SE ratio: If the jackknife SE is much larger than the asymptotic SE, inference based on the asymptotic SE may be too liberal (confidence intervals too narrow, p-values too small).
- Influential entities: If removing one entity changes coefficients substantially, results are fragile. Report sensitivity to that entity.
- Clustered influence: If influential entities share characteristics (e.g., all large firms), consider model specification issues.
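To make "large" concrete, a hypothetical `screen_jackknife` helper can tabulate \(|\text{bias}| / \text{SE}_{JK}\) and the ratio of jackknife to asymptotic SE per parameter. The cutoffs are assumptions: 0.25 follows the commonly cited rule of thumb that a bias below a quarter of a standard error is usually negligible, and 1.5 for the SE ratio is arbitrary. The inputs are toy numbers, not Grunfeld results.

```python
import pandas as pd

def screen_jackknife(original_se, jackknife_se, jackknife_bias,
                     bias_ratio=0.25, se_ratio=1.5):
    """Tabulate bias/SE and SE ratios per parameter and flag large values.
    Cutoffs are illustrative rules of thumb, not library defaults."""
    table = pd.DataFrame({
        "bias_over_se": jackknife_bias.abs() / jackknife_se,
        "se_ratio": jackknife_se / original_se,
    })
    table["bias_flag"] = table["bias_over_se"] > bias_ratio
    table["se_flag"] = table["se_ratio"] > se_ratio
    return table

# Toy inputs indexed by parameter name (aligned on the index)
original_se = pd.Series({"value": 0.010, "capital": 0.020})
jackknife_se = pd.Series({"value": 0.012, "capital": 0.040})
jackknife_bias = pd.Series({"value": 0.001, "capital": 0.015})

table = screen_jackknife(original_se, jackknife_se, jackknife_bias)
print(table)
```

Here `capital` trips both flags (bias is 0.375 SE and the jackknife SE is twice the asymptotic SE), while `value` trips neither.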
Jackknife vs Bootstrap
| Feature | Jackknife | Bootstrap |
|---|---|---|
| Deterministic | Yes | No (random resampling) |
| Computational cost | \(N\) re-estimations | \(B\) re-estimations (typically \(B \gg N\)) |
| Entity influence | Directly reveals which entity matters | Not directly available |
| CI methods | Normal approximation or percentile | Multiple CI methods available |
| Flexibility | Leave-one-out only | Multiple resampling schemes |
| Best for | Identifying influential entities | Distribution-free inference |
The jackknife is a natural complement to bootstrap: use the jackknife to identify influential entities, then use bootstrap to validate inference.
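The "Deterministic" row is easy to see in a toy comparison. This sketch assumes one summary value per entity, so each resampling unit is a whole entity; `jackknife_se` and `bootstrap_se` are hypothetical helpers, and 2000 bootstrap draws is an arbitrary choice. Repeated jackknife runs return identical results, while the bootstrap SE shifts with the seed; for the sample mean both land near \(s/\sqrt{N}\).

```python
import numpy as np

def jackknife_se(values, stat):
    """Leave-one-out SE: exactly N evaluations, no randomness."""
    n = len(values)
    loo = np.array([stat(np.delete(values, i)) for i in range(n)])
    return np.sqrt((n - 1) / n * ((loo - loo.mean()) ** 2).sum())

def bootstrap_se(values, stat, n_boot=2000, seed=None):
    """Nonparametric bootstrap SE: n_boot resamples drawn with replacement."""
    rng = np.random.default_rng(seed)
    n = len(values)
    reps = np.array([stat(values[rng.integers(0, n, size=n)]) for _ in range(n_boot)])
    return reps.std(ddof=1)

# One value per entity (e.g., entity means) for 50 entities
rng = np.random.default_rng(3)
entity_values = rng.normal(size=50)

print(jackknife_se(entity_values, np.mean) == jackknife_se(entity_values, np.mean))  # True
print(bootstrap_se(entity_values, np.mean, seed=1), bootstrap_se(entity_values, np.mean, seed=2))
```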
Common Pitfalls
Watch Out
- Small N: With very few entities (e.g., \(N < 10\)), dropping one entity removes a substantial fraction of the data, making jackknife estimates noisy.
- Failed estimations: If the model fails to converge when one entity is removed, that entity may be essential for identification. Check the `n_jackknife` attribute: a value below \(N\) means some leave-one-out fits failed.
- Bias correction with small N: Jackknife bias correction can increase variance. For small \(N\), the uncorrected estimate may be preferable.
- Comparison across models: Jackknife results are model-specific. Different model specifications may identify different influential entities.
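The failed-estimation case can be reproduced with a toy panel in which one regressor varies only within a single entity. This is an illustration under stated assumptions, not panelbox's internals: `within_fit` and `jackknife_with_failures` are hypothetical helpers, and the loop records failed fits instead of stopping, so the number of stored estimates plays the role of `n_jackknife`.

```python
import numpy as np

def within_fit(y, X, entity):
    """Entity-demeaned OLS via the normal equations; np.linalg.solve raises
    LinAlgError when X'X is singular (a regressor is not identified)."""
    yd = np.asarray(y, dtype=float).copy()
    Xd = np.asarray(X, dtype=float).copy()
    for g in np.unique(entity):
        m = entity == g
        yd[m] -= yd[m].mean()
        Xd[m] -= Xd[m].mean(axis=0)
    return np.linalg.solve(Xd.T @ Xd, Xd.T @ yd)

def jackknife_with_failures(y, X, entity):
    """Leave-one-entity-out loop that records failed fits and keeps going."""
    estimates, failed = {}, []
    for g in np.unique(entity):
        keep = entity != g
        try:
            estimates[g] = within_fit(y[keep], X[keep], entity[keep])
        except np.linalg.LinAlgError:
            failed.append(g)
    return estimates, failed

# Toy panel: 5 entities x 6 periods. The second regressor is zero outside
# entity 0, so dropping entity 0 leaves its coefficient unidentified.
rng = np.random.default_rng(4)
entity = np.repeat(np.arange(5), 6)
X = np.column_stack([rng.normal(size=30), np.where(entity == 0, rng.normal(size=30), 0.0)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=5)[entity] + rng.normal(scale=0.1, size=30)

estimates, failed = jackknife_with_failures(y, X, entity)
print(len(estimates), [int(g) for g in failed])  # 4 [0]
```

Four successful fits out of five entities is the situation the pitfall describes: the missing entity is the one the model needs for identification.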
See Also
- Bootstrap Inference -- Stochastic resampling alternative
- Sensitivity Analysis -- Generalized leave-one-out (entities, periods, subsets)
- Influence Diagnostics -- Observation-level influence (Cook's D, DFFITS)
- Robustness Overview -- Full robustness toolkit
References
- Efron, B., & Tibshirani, R. J. (1994). An Introduction to the Bootstrap. Chapman and Hall/CRC, Chapter 11.
- Shao, J., & Tu, D. (1995). The Jackknife and Bootstrap. Springer Science & Business Media.
- Quenouille, M. H. (1956). Notes on bias in estimation. Biometrika, 43(3-4), 353-360.
- Miller, R. G. (1974). The jackknife -- a review. Biometrika, 61(1), 1-15.