MLE Variance Estimation¶

Quick Reference

Module: panelbox.standard_errors.mle Functions: sandwich_estimator(), cluster_robust_mle(), delta_method(), bootstrap_mle() Model integration: model.fit(cov_type="robust") on MLE models Stata equivalent: vce(robust) on logit, probit, poisson R equivalent: sandwich::sandwich(), car::deltaMethod()

Overview¶

Nonlinear models estimated by Maximum Likelihood Estimation (MLE) --- such as Logit, Probit, Poisson, and Tobit --- require specialized variance estimators. PanelBox provides three complementary approaches:

Sandwich estimator (Huber-White): Robust to distributional misspecification
Delta method: For standard errors of transformed parameters (marginal effects, odds ratios)
Bootstrap: Distribution-free inference when analytical SEs are difficult

When to Use¶

Method	Use Case
Classical (`method="nonrobust"`)	Correctly specified model with known distribution
Sandwich (`method="robust"`)	Robust to distributional misspecification
Cluster-robust	MLE with panel data (clustered observations)
Delta method	Standard errors for nonlinear functions of parameters
Bootstrap	Complex models where analytical SEs are unavailable

The Sandwich Estimator¶

Classical MLE Variance¶

Under correct specification, the MLE variance is the inverse of the information matrix:

\[ V_{\text{classical}} = -H^{-1} = \left[ -\sum_{i=1}^{n} \frac{\partial^2 \ell_i}{\partial \theta \partial \theta'} \right]^{-1} \]

where \(H\) is the Hessian of the log-likelihood evaluated at the MLE.

Robust (Sandwich) MLE Variance¶

The sandwich estimator is robust to misspecification of the likelihood:

\[ V_{\text{robust}} = H^{-1} S H^{-1} \]

where \(S\) is the outer product of scores (OPG):

\[ S = \sum_{i=1}^{n} s_i s_i', \quad s_i = \frac{\partial \ell_i}{\partial \theta} \]

This is also known as the Huber-White or QMLE (Quasi-MLE) estimator.

Quick Example¶

from panelbox.standard_errors.mle import sandwich_estimator

# Classical (non-robust) MLE standard errors
result = sandwich_estimator(hessian=H, scores=scores, method="nonrobust")
print(f"Classical SE: {result.std_errors}")
print(f"Method: {result.method}")

# Robust sandwich standard errors
result = sandwich_estimator(hessian=H, scores=scores, method="robust")
print(f"Robust SE: {result.std_errors}")
print(f"n_obs: {result.n_obs}, n_params: {result.n_params}")

Configuration¶

Parameter	Type	Default	Description
`hessian`	`np.ndarray`	---	Hessian at MLE \((k \times k)\), negative definite
`scores`	`np.ndarray`	---	Score vectors per observation \((n \times k)\)
`method`	`str`	`"robust"`	`"nonrobust"` (classical) or `"robust"` (sandwich)

Returns: MLECovarianceResult with .cov_matrix, .std_errors, .method, .n_obs, .n_params

Cluster-Robust MLE¶

For panel data with MLE models, cluster-robust standard errors aggregate scores within clusters before computing the sandwich:

\[ V_{\text{cluster}} = H^{-1} \left[ \sum_{g=1}^{G} \left( \sum_{i \in g} s_i \right) \left( \sum_{i \in g} s_i \right)' \right] H^{-1} \]

Quick Example¶

from panelbox.standard_errors.mle import cluster_robust_mle

result = cluster_robust_mle(
    hessian=H,
    scores=scores,
    cluster_ids=entity_ids,
    df_correction=True,
)
print(f"Cluster-robust SE: {result.std_errors}")

Configuration¶

Parameter	Type	Default	Description
`hessian`	`np.ndarray`	---	Hessian at MLE \((k \times k)\)
`scores`	`np.ndarray`	---	Score vectors \((n \times k)\)
`cluster_ids`	`np.ndarray`	---	Cluster identifiers \((n,)\)
`df_correction`	`bool`	`True`	Apply \(\frac{G}{G-1} \cdot \frac{N-1}{N-K}\) correction

The Delta Method¶

The delta method provides approximate standard errors for nonlinear transformations of estimated parameters. If \(g(\hat{\theta})\) is a smooth function of the MLE \(\hat{\theta}\), then:

\[ \text{Var}[g(\hat{\theta})] \approx J \cdot V(\hat{\theta}) \cdot J' \]

where \(J = \frac{\partial g}{\partial \theta'}\) is the Jacobian matrix, computed numerically via central differences.

Use Cases¶

Odds ratios: \(g(\beta) = \exp(\beta)\) in logistic regression
Marginal effects: \(g(\beta) = \beta \cdot f(X\beta)\) evaluated at means
Elasticities: \(g(\beta) = \beta \cdot \bar{X} / \bar{Y}\)
Predictions at specific values: \(g(\beta) = F(x_0' \beta)\)

Quick Example¶

from panelbox.standard_errors.mle import delta_method
import numpy as np

# Parameters and covariance from logit model
params = np.array([0.5, 1.0, -0.3])
vcov = np.diag([0.01, 0.02, 0.005])

# Transform to odds ratios: g(beta) = exp(beta)
def odds_ratio(beta):
    return np.exp(beta)

vcov_or = delta_method(vcov, odds_ratio, params)
se_or = np.sqrt(np.diag(vcov_or))

print(f"Odds ratios: {odds_ratio(params)}")
print(f"SE (delta method): {se_or}")

Configuration¶

Parameter	Type	Default	Description
`vcov`	`np.ndarray`	---	Covariance matrix of parameters \((k \times k)\)
`transform_func`	`callable`	---	Function \(g: \mathbb{R}^k \to \mathbb{R}^m\)
`params`	`np.ndarray`	---	Parameter estimates \((k,)\)
`epsilon`	`float`	`1e-7`	Step size for numerical Jacobian

Returns: np.ndarray --- Covariance matrix of transformed parameters \((m \times m)\)

Bootstrap MLE¶

When analytical standard errors are difficult to derive or the delta method approximation is poor, bootstrap provides a distribution-free alternative:

Resample observations (or clusters) with replacement
Re-estimate the model on each bootstrap sample
Compute the sample covariance of bootstrap estimates

Quick Example¶

from panelbox.standard_errors.mle import bootstrap_mle

def estimate_logit(y, X):
    from scipy.optimize import minimize
    def neg_ll(beta):
        eta = X @ beta
        return -np.sum(y * eta - np.log1p(np.exp(eta)))
    result = minimize(neg_ll, np.zeros(X.shape[1]))
    return result.x

# Standard bootstrap
result = bootstrap_mle(
    estimate_func=estimate_logit,
    y=y, X=X,
    n_bootstrap=999,
    seed=42,
)
print(f"Bootstrap SE: {result.std_errors}")

# Cluster bootstrap
result = bootstrap_mle(
    estimate_func=estimate_logit,
    y=y, X=X,
    n_bootstrap=999,
    cluster_ids=entity_ids,
    seed=42,
)
print(f"Cluster bootstrap SE: {result.std_errors}")

Configuration¶

Parameter	Type	Default	Description
`estimate_func`	`callable`	---	Function `(y, X) -> params`
`y`	`np.ndarray`	---	Dependent variable \((n,)\)
`X`	`np.ndarray`	---	Design matrix \((n \times k)\)
`n_bootstrap`	`int`	`999`	Number of bootstrap replications
`cluster_ids`	`np.ndarray` or `None`	`None`	Cluster IDs for cluster bootstrap
`seed`	`int`	`42`	Random seed

Returns: MLECovarianceResult with .cov_matrix, .std_errors, .method="bootstrap"

Bootstrap vs Analytical SE¶

Feature	Sandwich	Bootstrap
Speed	Fast (closed-form)	Slow (\(B\) re-estimations)
Distributional assumptions	Asymptotic normality	None
Small sample	May be imprecise	Better coverage
Implementation	Requires Hessian + scores	Requires re-estimation function
Cluster-robust	Yes	Yes (cluster bootstrap)

MLECovarianceResult¶

All MLE variance functions return an MLECovarianceResult object:

Attribute	Type	Description
`cov_matrix`	`np.ndarray`	Covariance matrix \((k \times k)\)
`std_errors`	`np.ndarray`	Standard errors \((k,)\)
`method`	`str`	`"nonrobust"`, `"robust"`, `"cluster"`, or `"bootstrap"`
`n_obs`	`int`	Number of observations
`n_params`	`int`	Number of parameters

Common Pitfalls¶

Pitfall 1: Hessian sign convention

The hessian parameter should be the actual Hessian of the log-likelihood (negative definite at a maximum). PanelBox internally negates it: \(-H^{-1}\). Do not pre-negate it.

Pitfall 2: Summed vs per-observation scores

The scores parameter must contain per-observation score vectors \((n \times k)\), not the summed gradient. Each row \(s_i\) is \(\partial \ell_i / \partial \theta\).

Pitfall 3: Delta method with highly nonlinear transformations

The delta method is a first-order approximation. For highly nonlinear transformations or parameters near boundary values, bootstrap may be more reliable.

Pitfall 4: Too few bootstrap replications

Use at least \(B = 999\) for reliable standard errors and \(B = 9999\) for confidence intervals. PanelBox warns if more than 50% of replications fail.

References¶

White, H. (1982). Maximum likelihood estimation of misspecified models. Econometrica, 50(1), 1-25.
Cameron, A. C., & Trivedi, P. K. (2005). Microeconometrics: Methods and Applications. Cambridge University Press.
Efron, B., & Tibshirani, R. J. (1993). An Introduction to the Bootstrap. Chapman & Hall.
Wooldridge, J. M. (2010). Econometric Analysis of Cross Section and Panel Data (2^nd ed.). MIT Press.

MLE Variance Estimation¶

Overview¶

When to Use¶

The Sandwich Estimator¶

Classical MLE Variance¶

Robust (Sandwich) MLE Variance¶

Quick Example¶

Configuration¶

Cluster-Robust MLE¶

Quick Example¶

Configuration¶

The Delta Method¶

Use Cases¶

Quick Example¶

Configuration¶

Bootstrap MLE¶

Quick Example¶

Configuration¶

Bootstrap vs Analytical SE¶

MLECovarianceResult¶

Common Pitfalls¶

See Also¶

References¶