Jackknife Analysis

Quick Reference

Class: `panelbox.validation.robustness.PanelJackknife`
Import: `from panelbox.validation.robustness import PanelJackknife`
Key method: `jk.run()` returns `JackknifeResults`
Stata equivalent: `jackknife` prefix
R equivalent: `boot::jack.after.boot()`

What It Does

The panel jackknife systematically drops one entity at a time from the dataset and re-estimates the model on the remaining \(N-1\) entities. For a panel with \(N\) entities, this produces \(N\) re-estimations, each revealing how much a single entity contributes to the overall results.

The jackknife answers three questions:

Bias: Is the estimator biased, and by how much?
Variance: What are the standard errors under leave-one-out resampling?
Influence: Which entities have disproportionate impact on the coefficients?
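The leave-one-entity-out loop described above can be sketched in a few lines. This is a minimal illustration, not the `PanelJackknife` implementation: the pooled OLS slope stands in for any panel estimator, and the names `ols_slope`, `leave_one_entity_out`, and the synthetic `toy` panel are assumptions made for this example.

```python
import numpy as np
import pandas as pd


def ols_slope(df: pd.DataFrame) -> float:
    """Pooled OLS slope of y on x (with intercept); a stand-in estimator."""
    x = df["x"].to_numpy()
    y = df["y"].to_numpy()
    return float(np.polyfit(x, y, deg=1)[0])


def leave_one_entity_out(df: pd.DataFrame, entity_col: str) -> pd.Series:
    """Re-estimate the slope N times, dropping all rows of one entity each time."""
    estimates = {}
    for entity in df[entity_col].unique():
        subset = df[df[entity_col] != entity]  # the remaining N-1 entities
        estimates[entity] = ols_slope(subset)
    return pd.Series(estimates, name="slope_without_entity")


# Synthetic panel: 5 entities observed over 4 periods each
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "entity": np.repeat(list("ABCDE"), 4),
    "x": rng.normal(size=20),
})
toy["y"] = 2.0 * toy["x"] + rng.normal(scale=0.1, size=20)

loo = leave_one_entity_out(toy, "entity")
print(loo)  # one slope per excluded entity
```

Each entry of `loo` plays the role of \(\hat\theta_{(-i)}\): the estimate obtained when entity \(i\) is absent.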

Quick Example

from panelbox import FixedEffects
from panelbox.validation.robustness import PanelJackknife
from panelbox.datasets import load_grunfeld

data = load_grunfeld()
model = FixedEffects("invest ~ value + capital", data, "firm", "year")
results = model.fit()

# Jackknife analysis
jk = PanelJackknife(results, verbose=True)
jk_results = jk.run()

# View summary
print(jk.summary())

# Bias-corrected estimates
bias_corrected = jk.bias_corrected_estimates()
print(bias_corrected)

# Find influential entities
influential = jk.influential_entities(threshold=2.0)
print(influential)

Mathematical Details

Given \(N\) entities and original estimate \(\hat\theta\), the jackknife computes:

Jackknife mean:

\[\bar\theta_{JK} = \frac{1}{N} \sum_{i=1}^{N} \hat\theta_{(-i)}\]

where \(\hat\theta_{(-i)}\) is the estimate with entity \(i\) removed.

Jackknife bias:

\[\text{Bias}_{JK} = (N-1)(\bar\theta_{JK} - \hat\theta)\]

Jackknife standard error:

\[SE_{JK} = \sqrt{\frac{N-1}{N} \sum_{i=1}^{N} (\hat\theta_{(-i)} - \bar\theta_{JK})^2}\]

Influence of entity \(i\):

\[\text{Influence}_i = (N-1)(\hat\theta - \hat\theta_{(-i)})\]

Bias-corrected estimate:

\[\hat\theta_{corrected} = \hat\theta - \text{Bias}_{JK} = N\hat\theta - (N-1)\bar\theta_{JK}\]
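As a sanity check, the formulas above can be written directly in NumPy and applied to an estimator whose behaviour is known. The sketch below uses the sample mean, for which the jackknife bias is zero, the jackknife SE equals \(s/\sqrt{N}\), and the influence of observation \(i\) is \(x_i - \bar{x}\). The function name `jackknife_stats` and the data are illustrative assumptions, not part of the library.

```python
import numpy as np


def jackknife_stats(theta_hat: float, theta_loo: np.ndarray) -> dict:
    """Apply the jackknife formulas to a vector of leave-one-out estimates."""
    n = len(theta_loo)
    jk_mean = theta_loo.mean()
    bias = (n - 1) * (jk_mean - theta_hat)
    se = np.sqrt((n - 1) / n * np.sum((theta_loo - jk_mean) ** 2))
    influence = (n - 1) * (theta_hat - theta_loo)
    corrected = theta_hat - bias  # equals n * theta_hat - (n - 1) * jk_mean
    return {"mean": jk_mean, "bias": bias, "se": se,
            "influence": influence, "corrected": corrected}


# Sample mean as the estimator; each "entity" is a single observation here
x = np.array([2.0, 4.0, 4.0, 5.0, 10.0])
theta_hat = x.mean()
theta_loo = np.array([np.delete(x, i).mean() for i in range(len(x))])

out = jackknife_stats(theta_hat, theta_loo)
print(out["bias"], out["se"], x.std(ddof=1) / np.sqrt(len(x)))
```

For nonlinear estimators the bias term is generally nonzero, which is where the bias-corrected estimate differs from the original.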

API Reference

Constructor

PanelJackknife(
    results=results,   # PanelResults from model.fit()
    verbose=True,      # Print progress information
)

Methods

Method	Returns	Description
`run()`	`JackknifeResults`	Execute leave-one-out procedure
`bias_corrected_estimates()`	`pd.Series`	Original estimates minus jackknife bias
`confidence_intervals(alpha, method)`	`pd.DataFrame`	Confidence intervals; `method` is `"normal"` (jackknife SE) or `"percentile"` (jackknife distribution)
`influential_entities(threshold, metric)`	`pd.DataFrame`	Entities with aggregate influence above threshold
`summary()`	`str`	Formatted summary string

JackknifeResults Attributes

Attribute	Type	Description
`jackknife_estimates`	`pd.DataFrame`	Leave-one-out parameter estimates, one row per excluded entity (\(N \times K\), where \(K\) is the number of parameters)
`original_estimates`	`pd.Series`	Original full-sample estimates
`jackknife_mean`	`pd.Series`	Mean of jackknife estimates
`jackknife_bias`	`pd.Series`	\((N-1) \times (\bar\theta_{JK} - \hat\theta)\)
`jackknife_se`	`pd.Series`	Jackknife standard errors
`influence`	`pd.DataFrame`	Per-entity influence on each parameter
`n_jackknife`	`int`	Number of successful jackknife samples

Identifying Influential Entities

# Default: flag entities with max absolute influence > 2x the mean
influential = jk.influential_entities(threshold=2.0, metric="max")
print(influential)

# Alternative aggregation metrics
influential_mean = jk.influential_entities(threshold=2.0, metric="mean")
influential_sum = jk.influential_entities(threshold=2.0, metric="sum")

The `metric` argument controls how each entity's influence is aggregated across the model's parameters:

Metric	Aggregation	Use When
`max`	Maximum absolute influence across parameters	Default; catches entity affecting any single parameter
`mean`	Mean absolute influence across parameters	Detects entities with broad but moderate influence
`sum`	Sum of absolute influences across parameters	Emphasizes entities affecting many parameters

An entity is flagged as influential if its aggregate influence exceeds `threshold` times the mean aggregate influence across all entities.
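The flagging rule can be reproduced by hand from an influence table. The sketch below follows the description above (aggregate absolute influence per entity, then compare with `threshold` times the mean aggregate). It is an illustration only: `flag_influential` and the numbers are hypothetical, and the library method returns a `pd.DataFrame` whose exact layout is not shown here.

```python
import pandas as pd


def flag_influential(influence: pd.DataFrame, threshold: float = 2.0,
                     metric: str = "max") -> pd.Series:
    """Return the aggregate influence of entities above the cutoff."""
    abs_influence = influence.abs()
    if metric == "max":
        aggregate = abs_influence.max(axis=1)
    elif metric == "mean":
        aggregate = abs_influence.mean(axis=1)
    elif metric == "sum":
        aggregate = abs_influence.sum(axis=1)
    else:
        raise ValueError(f"unknown metric: {metric!r}")
    cutoff = threshold * aggregate.mean()  # threshold x mean aggregate influence
    return aggregate[aggregate > cutoff]


# Made-up influence values: rows are entities, columns are parameters
influence = pd.DataFrame(
    {"value": [0.1, -0.2, 0.1, 3.0], "capital": [0.2, 0.1, -0.1, -0.5]},
    index=["firm_1", "firm_2", "firm_3", "firm_4"],
)

flagged = flag_influential(influence, threshold=2.0, metric="max")
print(flagged)
```

Because the cutoff is relative to the mean, a single dominant entity raises the bar for every other entity; lowering `threshold` is one way to look past it.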

Confidence Intervals

# Normal approximation (using jackknife SE)
ci = jk.confidence_intervals(alpha=0.05, method="normal")

# Percentile method (using jackknife distribution)
ci = jk.confidence_intervals(alpha=0.05, method="percentile")
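To make the two methods concrete, here is a minimal sketch of how such intervals can be computed from leave-one-out estimates. It assumes the normal interval is centred on the original estimate and that the percentile interval takes empirical quantiles of the leave-one-out estimates; whether `PanelJackknife` makes the same choices is not confirmed here, and `normal_ci` and `percentile_ci` are hypothetical helper names.

```python
from statistics import NormalDist

import numpy as np


def normal_ci(estimate: float, se: float, alpha: float = 0.05) -> tuple:
    """estimate +/- z * SE, with z the standard normal quantile."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return estimate - z * se, estimate + z * se


def percentile_ci(theta_loo: np.ndarray, alpha: float = 0.05) -> tuple:
    """Empirical quantiles of the leave-one-out estimates."""
    lower, upper = np.quantile(theta_loo, [alpha / 2, 1 - alpha / 2])
    return float(lower), float(upper)


# Made-up leave-one-out estimates for a single parameter
theta_hat = 1.0
theta_loo = np.array([0.90, 0.95, 1.00, 1.05, 1.10])
n = len(theta_loo)
jk_se = np.sqrt((n - 1) / n * np.sum((theta_loo - theta_loo.mean()) ** 2))

ci_normal = normal_ci(theta_hat, jk_se)
ci_percentile = percentile_ci(theta_loo)
print(ci_normal, ci_percentile)
```

In this sketch the percentile interval is much narrower than the normal one. Leave-one-out estimates cluster tightly around the full-sample estimate, which is why the jackknife SE formula scales their spread by \((N-1)/N\) rather than \(1/(N-1)\). Under these assumptions, the normal interval is usually the safer default.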

Interpretation

Reading Jackknife Results

Large bias: If `jackknife_bias` is large relative to the standard error, the original estimator may be biased. Use `bias_corrected_estimates()`.
Large SE ratio: If the jackknife SE is much larger than the asymptotic SE, inference based on the asymptotic SE may be too liberal (confidence intervals too narrow, p-values too small).
Influential entities: If removing one entity changes coefficients substantially, results are fragile. Report sensitivity to that entity.
Clustered influence: If influential entities share characteristics (e.g., all large firms), consider model specification issues.
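The first two checks reduce to simple ratios. The sketch below computes them from three series. `jk_bias` and `jk_se` correspond to the `jackknife_bias` and `jackknife_se` attributes, while `asymptotic_se` is a hypothetical stand-in for the full-sample model's standard errors. The numbers are made up for illustration and are not Grunfeld results.

```python
import pandas as pd


def jackknife_diagnostics(jk_bias: pd.Series, jk_se: pd.Series,
                          asymptotic_se: pd.Series) -> pd.DataFrame:
    """Two ratios that support the reading guide above."""
    return pd.DataFrame({
        # |bias| in units of the jackknife SE; large values suggest bias
        "abs_bias_over_se": jk_bias.abs() / jk_se,
        # values well above 1 suggest asymptotic SEs understate uncertainty
        "jk_se_over_asymptotic_se": jk_se / asymptotic_se,
    })


# Illustrative values for two coefficients
params = ["value", "capital"]
jk_bias = pd.Series([0.002, -0.030], index=params)
jk_se = pd.Series([0.010, 0.020], index=params)
asymptotic_se = pd.Series([0.010, 0.010], index=params)

diagnostics = jackknife_diagnostics(jk_bias, jk_se, asymptotic_se)
print(diagnostics)
```

No fixed cutoff is implied. The ratios are meant to be read alongside the influence table, not used as a pass/fail test.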

Jackknife vs Bootstrap

Feature	Jackknife	Bootstrap
Deterministic	Yes	No (random resampling)
Computational cost	\(N\) re-estimations	\(B\) re-estimations (typically \(B \gg N\))
Entity influence	Directly reveals which entity matters	Not directly available
CI methods	Normal approximation or percentile	Multiple CI methods available
Flexibility	Leave-one-out only	Multiple resampling schemes
Best for	Identifying influential entities	Distribution-free inference

The jackknife is a natural complement to bootstrap: use the jackknife to identify influential entities, then use bootstrap to validate inference.
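The "deterministic versus random" row can be seen in a small NumPy experiment. The sketch below compares a jackknife SE with an entity-level bootstrap SE for the mean of eight made-up entity averages. The function names and data are assumptions for this example and do not use panelbox's bootstrap classes.

```python
import numpy as np


def jackknife_se(values: np.ndarray) -> float:
    """Jackknife SE of the mean: N deterministic re-estimations."""
    n = len(values)
    loo = np.array([np.delete(values, i).mean() for i in range(n)])
    return float(np.sqrt((n - 1) / n * np.sum((loo - loo.mean()) ** 2)))


def bootstrap_se(values: np.ndarray, n_boot: int = 2000, seed: int = 0) -> float:
    """Bootstrap SE of the mean: B random resamples of entities."""
    rng = np.random.default_rng(seed)
    n = len(values)
    idx = rng.integers(0, n, size=(n_boot, n))  # resample entities with replacement
    return float(values[idx].mean(axis=1).std(ddof=1))


# One made-up average per entity
entity_means = np.array([3.1, 2.4, 5.0, 4.2, 3.8, 2.9, 6.5, 4.0])

se_jk = jackknife_se(entity_means)
se_boot_a = bootstrap_se(entity_means, seed=1)
se_boot_b = bootstrap_se(entity_means, seed=2)
print(se_jk, se_boot_a, se_boot_b)
```

Repeated calls to `jackknife_se` give the same number, while the bootstrap value shifts slightly with the seed. The two estimates are of similar magnitude here.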

Common Pitfalls

Watch Out

Small N: With very few entities (e.g., \(N < 10\)), dropping one entity removes a substantial fraction of the data, making jackknife estimates noisy.
Failed estimations: If a model fails to converge when one entity is removed, that entity may be essential for identification. Compare the `n_jackknife` attribute with \(N\): a smaller value means some leave-one-out estimations failed.
Bias correction with small N: Jackknife bias correction can increase variance. For small \(N\), the uncorrected estimate may be preferable.
Comparison across models: Jackknife results are model-specific. Different model specifications may identify different influential entities.
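The failed-estimation pitfall can be illustrated with a toy estimator that is only identified because of one entity. In the sketch below, the slope of `y` on `x` cannot be computed once entity `"C"` is dropped, since no variation in `x` remains. The helper names and data are hypothetical; how `PanelJackknife` handles failures internally is not shown in this page beyond the `n_jackknife` count.

```python
import numpy as np
import pandas as pd


def slope(df: pd.DataFrame) -> float:
    """OLS slope of y on x; raises if x has no variation."""
    x = df["x"].to_numpy()
    y = df["y"].to_numpy()
    if np.unique(x).size < 2:
        raise ValueError("x has no variation; slope is not identified")
    return float(np.polyfit(x, y, deg=1)[0])


def safe_jackknife(df: pd.DataFrame, entity_col: str):
    """Leave-one-entity-out loop that records failures instead of stopping."""
    estimates, failed = {}, []
    for entity in df[entity_col].unique():
        try:
            estimates[entity] = slope(df[df[entity_col] != entity])
        except ValueError:
            failed.append(entity)
    return pd.Series(estimates, dtype=float), failed


# Only entity "C" contributes variation in x
panel = pd.DataFrame({
    "entity": ["A", "A", "B", "B", "C", "C"],
    "x": [1.0, 1.0, 1.0, 1.0, 1.0, 3.0],
    "y": [2.0, 2.1, 1.9, 2.0, 2.0, 6.0],
})

estimates, failed = safe_jackknife(panel, "entity")
n_success = len(estimates)  # analogous to n_jackknife
print(n_success, failed)
```

Here the number of successful samples is smaller than the number of entities, and the entity whose removal breaks the fit is exactly the one that drives identification.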

References

Efron, B., & Tibshirani, R. J. (1994). An Introduction to the Bootstrap. Chapman and Hall/CRC, Chapter 11.
Shao, J., & Tu, D. (1995). The Jackknife and Bootstrap. Springer Science & Business Media.
Quenouille, M. H. (1956). Notes on bias in estimation. Biometrika, 43(3-4), 353-360.
Miller, R. G. (1974). The jackknife -- a review. Biometrika, 61(1), 1-15.