Theory Guide: Panel Selection Models
Overview
Sample selection models address a fundamental econometric problem: outcomes are observed only for a non-random subsample. This creates selection bias when the selection process is correlated with the unobserved determinants of the outcome of interest.
Classic Example: Wage determination
- We observe wages only for employed individuals
- Employment decisions may correlate with potential wages
- Simply analyzing employed workers yields biased wage estimates
The Panel Heckman model (Wooldridge 1995) extends Heckman's (1979) two-step correction to panel data, accounting for individual heterogeneity and correlation over time.
The Selection Problem
Basic Setup
Consider a panel dataset with \(N\) individuals observed over \(T\) periods.
Outcome Equation (of interest):
\[y_{it} = X_{it}'\beta + \alpha_i + \varepsilon_{it}\]
where:
- \(y_{it}\): outcome of interest (e.g., wage)
- \(X_{it}\): outcome regressors
- \(\alpha_i\): individual fixed/random effect
- \(\varepsilon_{it}\): idiosyncratic error
Selection Equation:
\[d_{it} = \mathbf{1}\{W_{it}'\gamma + \eta_i + v_{it} > 0\}\]
where:
- \(d_{it} = 1\) if outcome observed, 0 otherwise
- \(W_{it}\): selection regressors
- \(\eta_i\): individual random effect in selection
- \(v_{it}\): selection shock
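Key Assumption: The outcome is observed only when \(d_{it} = 1\):
\[y_{it} = \begin{cases} X_{it}'\beta + \alpha_i + \varepsilon_{it} & \text{if } d_{it} = 1 \\ \text{unobserved} & \text{if } d_{it} = 0 \end{cases}\]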
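When Does Selection Bias Occur?
Selection bias arises when the outcome and selection errors are correlated:
\[\rho = \operatorname{Corr}(\varepsilon_{it}, v_{it}) \neq 0\]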
Intuition:
- \(\rho > 0\) (Positive selection): High-outcome individuals more likely to be selected
  - Example: High-wage workers more likely to be employed
- \(\rho < 0\) (Negative selection): Low-outcome individuals more likely to be selected
  - Example: Low-skilled workers more likely to participate in training programs
- \(\rho = 0\) (No selection bias): Selection independent of outcome
  - OLS on selected sample is unbiased
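A quick Monte Carlo sketch makes the sign intuition concrete. It assumes a single cross-section with the selection index fixed at 0 (so \(d = 1\) exactly when \(v > 0\)) and draws \((\varepsilon, v)\) with correlation \(\rho\); the function name and default values are illustrative, not taken from any library.

```python
import numpy as np

def mean_eps_among_selected(rho, sigma_eps=1.0, index=0.0, n=200_000, seed=0):
    """Monte Carlo estimate of E[eps | d = 1] when d = 1{index + v > 0}."""
    rng = np.random.default_rng(seed)
    v = rng.normal(size=n)
    # eps has standard deviation sigma_eps and correlation rho with v
    eps = sigma_eps * (rho * v + np.sqrt(1.0 - rho**2) * rng.normal(size=n))
    selected = index + v > 0
    return eps[selected].mean()

for rho in (0.8, 0.0, -0.8):
    print(f"rho={rho:+.1f}: E[eps | selected] = {mean_eps_among_selected(rho):+.3f}")
```

Positive \(\rho\) pushes the selected sample's errors above zero, negative \(\rho\) pushes them below, and \(\rho = 0\) leaves them centered, which is the only case where OLS on the selected sample is safe.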
Panel Heckman Model (Wooldridge 1995)
Model Specification
Joint Distribution:
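Assume:
\[\begin{pmatrix} \varepsilon_{it} \\ v_{it} \end{pmatrix} \sim N\!\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \Sigma\right)\]
with:
\[\Sigma = \begin{pmatrix} \sigma_\varepsilon^2 & \rho\sigma_\varepsilon \\ \rho\sigma_\varepsilon & 1 \end{pmatrix}\]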
where:
- Selection error variance normalized to 1
- \(\rho\) is the key parameter (selection bias)
- \(\sigma_\varepsilon^2\) is outcome error variance
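The full data-generating process can be simulated in a few lines of NumPy, which is handy for checking an estimator against known parameters. This is a sketch under assumptions: one outcome regressor `x`, one excluded selection regressor `z`, independent normal random effects \(\alpha_i\) and \(\eta_i\), \((\varepsilon_{it}, v_{it})\) bivariate normal with \(\operatorname{Var}(v_{it}) = 1\) and correlation \(\rho\), and arbitrary default values; `simulate_panel_selection` is a hypothetical helper name.

```python
import numpy as np

def simulate_panel_selection(n=500, t=5, beta=(1.0, 0.5), gamma=(0.2, 1.0, 1.0),
                             rho=0.5, sigma_eps=1.0, sigma_alpha=0.5,
                             sigma_eta=0.5, seed=0):
    """Simulate y_it = X_it'beta + alpha_i + eps_it and
    d_it = 1{W_it'gamma + eta_i + v_it > 0}; y_it is NaN when d_it = 0."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(n, t))                        # outcome regressor
    z = rng.normal(size=(n, t))                        # excluded selection regressor
    X = np.stack([np.ones((n, t)), x], axis=-1)        # (n, t, 2): intercept, x
    W = np.stack([np.ones((n, t)), x, z], axis=-1)     # (n, t, 3): intercept, x, z
    alpha = sigma_alpha * rng.normal(size=(n, 1))      # outcome individual effect
    eta = sigma_eta * rng.normal(size=(n, 1))          # selection individual effect
    # (eps, v): Var(eps) = sigma_eps^2, Var(v) = 1, Corr(eps, v) = rho
    v = rng.normal(size=(n, t))
    eps = sigma_eps * (rho * v + np.sqrt(1.0 - rho**2) * rng.normal(size=(n, t)))
    d = (W @ np.asarray(gamma) + eta + v > 0).astype(int)
    y_latent = X @ np.asarray(beta) + alpha + eps
    y = np.where(d == 1, y_latent, np.nan)             # observed only if d_it = 1
    return y, d, X, W

y, d, X, W = simulate_panel_selection()
print(f"share selected: {d.mean():.3f}")
```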
Conditional Distribution:
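Given selection (\(d_{it} = 1\)), the outcome expectation is:
\[E[y_{it} \mid X_{it}, W_{it}, d_{it} = 1] = X_{it}'\beta + \rho\sigma_\varepsilon \lambda_{it}\]
where:
\[\lambda_{it} = \lambda(W_{it}'\gamma) = \frac{\phi(W_{it}'\gamma)}{\Phi(W_{it}'\gamma)}\]
is the Inverse Mills Ratio (IMR).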
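The conditional-mean formula can be checked numerically. Since \(d = 1\) exactly when \(v > -W'\gamma\), and \(E[\varepsilon \mid v] = \rho\sigma_\varepsilon v\), the mean of \(\varepsilon\) among selected observations should equal \(\rho\sigma_\varepsilon\,\phi(c)/\Phi(c)\) with \(c = W'\gamma\). The sketch below fixes \(c = -0.5\) and uses arbitrary values for \(\rho\) and \(\sigma_\varepsilon\).

```python
import math
import numpy as np

def imr(c):
    """Inverse Mills ratio lambda(c) = phi(c) / Phi(c)."""
    pdf = math.exp(-0.5 * c * c) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * math.erfc(-c / math.sqrt(2.0))
    return pdf / cdf

rho, sigma_eps, c = 0.6, 2.0, -0.5                 # c plays the role of W_it'gamma
rng = np.random.default_rng(42)
v = rng.normal(size=1_000_000)
eps = sigma_eps * (rho * v + math.sqrt(1.0 - rho**2) * rng.normal(size=v.size))
simulated = eps[c + v > 0].mean()                  # E[eps | d = 1] by simulation
analytic = rho * sigma_eps * imr(c)                # rho * sigma_eps * lambda(c)
print(f"simulated = {simulated:.4f}, analytic = {analytic:.4f}")
```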
Key Insight: The IMR \(\lambda_{it}\) captures the selection correction. Including it as a regressor yields consistent estimates of \(\beta\).
Estimation Methods
1. Heckman Two-Step Estimator
Step 1: Estimate Selection Equation
Estimate \(\gamma\) via probit (or random effects probit for panel):
\[P(d_{it} = 1 \mid W_{it}) = \Phi(W_{it}'\gamma)\]
Obtain: \(\hat{\gamma}\)
Step 2: Augmented Outcome Regression
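Compute IMR:
\[\hat{\lambda}_{it} = \frac{\phi(W_{it}'\hat{\gamma})}{\Phi(W_{it}'\hat{\gamma})}\]
Estimate the outcome equation with the IMR on the selected sample (\(d_{it} = 1\)):
\[y_{it} = X_{it}'\beta + \theta\hat{\lambda}_{it} + e_{it}\]
via OLS, where \(\theta = \rho\sigma_\varepsilon\).
Recover \(\rho\):
\[\hat{\rho} = \frac{\hat{\theta}}{\hat{\sigma}_\varepsilon}\]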
where \(\hat{\sigma}_\varepsilon\) is estimated from residuals.
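The two steps can be sketched in NumPy as follows. This is a pooled cross-sectional version (ordinary probit in Step 1, no random effects), so it illustrates the mechanics rather than the full panel estimator; the function names, the Fisher-scoring probit, and the simulated design are assumptions. \(\hat{\sigma}_\varepsilon\) is computed with Heckman's residual-based formula, which adds back the variance shrinkage caused by truncation: \(\hat{\sigma}_\varepsilon^2 = \overline{e^2} + \hat{\theta}^2\,\overline{\hat{\lambda}(\hat{\lambda} + W'\hat{\gamma})}\) over the selected sample.

```python
import math
import numpy as np

# Standard normal CDF/PDF built from the standard library (no SciPy needed)
_Phi = np.vectorize(lambda z: 0.5 * math.erfc(-z / math.sqrt(2.0)))

def _phi(z):
    return np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)

def probit_fit(d, W, max_iter=50, tol=1e-10):
    """Step 1: pooled probit for P(d=1 | W) = Phi(W'gamma), by Fisher scoring."""
    gamma = np.zeros(W.shape[1])
    for _ in range(max_iter):
        idx = W @ gamma
        cdf = np.clip(_Phi(idx), 1e-12, 1.0 - 1e-12)
        pdf = _phi(idx)
        score = W.T @ (pdf * (d - cdf) / (cdf * (1.0 - cdf)))
        info = (W * (pdf**2 / (cdf * (1.0 - cdf)))[:, None]).T @ W  # expected information
        step = np.linalg.solve(info, score)
        gamma = gamma + step
        if np.max(np.abs(step)) < tol:
            break
    return gamma

def heckman_two_step(y, d, X, W):
    """Step 2: OLS of y on [X, lambda_hat] over observations with d == 1."""
    gamma_hat = probit_fit(d, W)
    idx = W @ gamma_hat
    lam = _phi(idx) / _Phi(idx)                      # IMR for every observation
    sel = d == 1
    Z = np.column_stack([X[sel], lam[sel]])
    coef = np.linalg.lstsq(Z, y[sel], rcond=None)[0]
    beta_hat, theta_hat = coef[:-1], coef[-1]        # theta = rho * sigma_eps
    resid = y[sel] - Z @ coef
    # Var(eps | d=1) = sigma^2 (1 - rho^2 * lam * (lam + idx)): add the shrinkage back
    shrink = lam[sel] * (lam[sel] + idx[sel])
    sigma_hat = math.sqrt(np.mean(resid**2) + theta_hat**2 * np.mean(shrink))
    return beta_hat, theta_hat, sigma_hat, theta_hat / sigma_hat

# Usage on simulated pooled data (true beta = (1, 0.5), rho = 0.5, sigma_eps = 1)
rng = np.random.default_rng(0)
n = 20_000
x, z = rng.normal(size=n), rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
W = np.column_stack([np.ones(n), x, z])          # z is the excluded variable
v = rng.normal(size=n)
eps = 0.5 * v + math.sqrt(1.0 - 0.25) * rng.normal(size=n)
d = (W @ np.array([0.2, 0.5, 1.0]) + v > 0).astype(float)
y = X @ np.array([1.0, 0.5]) + eps               # only used where d == 1
beta_hat, theta_hat, sigma_hat, rho_hat = heckman_two_step(y, d, X, W)
print(beta_hat, rho_hat)
```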
Standard Errors:
Naive OLS standard errors are incorrect because they ignore estimation error in \(\hat{\gamma}\) (Step 1).
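Murphy-Topel Correction adjusts for this:
\[\hat{V}_2^{*} = \hat{V}_2 + \hat{V}_2\left(\hat{C}\hat{V}_1\hat{C}' - \hat{R}\hat{V}_1\hat{C}' - \hat{C}\hat{V}_1\hat{R}'\right)\hat{V}_2\]
where \(\hat{V}_1\) and \(\hat{V}_2\) are the uncorrected Step 1 and Step 2 covariance matrices, and \(\hat{C}\) and \(\hat{R}\) are cross-products of the Step 2 scores with the derivatives, with respect to \(\gamma\), of the Step 2 and Step 1 log-likelihoods respectively. The correction accounts for uncertainty in \(\hat{\gamma}\).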
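The textbook form of the Murphy-Topel correction is \(V_2^{*} = V_2 + V_2(C V_1 C' - R V_1 C' - C V_1 R')V_2\), where \(V_1\) and \(V_2\) are the uncorrected Step 1 and Step 2 covariance matrices and \(C\), \(R\) are score cross-product matrices. A minimal NumPy sketch of the matrix algebra follows; building \(C\) and \(R\) from per-observation scores is left to the caller, and the function name is illustrative.

```python
import numpy as np

def murphy_topel(V1, V2, C, R):
    """Murphy-Topel corrected covariance for the step-2 parameters.

    V1: (k1, k1) step-1 covariance; V2: (k2, k2) uncorrected step-2 covariance;
    C, R: (k2, k1) score cross-product matrices."""
    V1, V2, C, R = (np.asarray(m, dtype=float) for m in (V1, V2, C, R))
    middle = C @ V1 @ C.T - R @ V1 @ C.T - C @ V1 @ R.T
    return V2 + V2 @ middle @ V2

print(murphy_topel([[2.0]], [[3.0]], [[1.0]], [[0.0]]))
```

When \(C = R = 0\) (the second step does not depend on \(\hat{\gamma}\)), the function returns \(V_2\) unchanged.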
Pros:
- Simple and fast
- Transparent two-stage process
- Provides starting values for MLE
Cons:
- Standard errors require Murphy-Topel correction
- Less efficient than MLE
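2. Full Information Maximum Likelihood (FIML)
Joint Likelihood:
For each individual \(i\):
\[L_i = \iint \prod_{t=1}^{T} \left[1 - \Phi(W_{it}'\gamma + \eta_i)\right]^{1-d_{it}} \left[\frac{1}{\sigma_\varepsilon}\phi\!\left(\frac{e_{it}}{\sigma_\varepsilon}\right) \Phi\!\left(\frac{W_{it}'\gamma + \eta_i + \rho e_{it}/\sigma_\varepsilon}{\sqrt{1-\rho^2}}\right)\right]^{d_{it}} f(\alpha_i, \eta_i; \Sigma_u)\, d\alpha_i\, d\eta_i\]
where:
\[e_{it} = y_{it} - X_{it}'\beta - \alpha_i\]
and \(f(\cdot\,; \Sigma_u)\) is the bivariate normal density of \((\alpha_i, \eta_i)\).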
Parameters to estimate:
- \(\beta\): outcome coefficients
- \(\gamma\): selection coefficients
- \(\sigma_\varepsilon\): outcome error SD
- \(\rho\): correlation
- \(\Sigma_u\): random effects variance
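The integrand of the FIML likelihood is the likelihood of individual \(i\) conditional on the random effects \((\alpha_i, \eta_i)\). Each period contributes \(1 - \Phi(W_{it}'\gamma + \eta_i)\) if \(d_{it} = 0\), and the outcome density times the conditional selection probability if \(d_{it} = 1\), using \(v \mid \varepsilon \sim N(\rho\varepsilon/\sigma_\varepsilon, 1 - \rho^2)\). The plain-Python sketch below takes precomputed indices \(X_{it}'\beta\) and \(W_{it}'\gamma\); the function name and argument layout are assumptions.

```python
import math

def _pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def _cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def conditional_likelihood(y, d, xb, wg, alpha, eta, sigma_eps, rho):
    """Product over t of one individual's contributions given (alpha_i, eta_i).

    y, d, xb, wg: per-period outcomes, selection indicators, X_it'beta and
    W_it'gamma. y[t] is ignored when d[t] == 0."""
    lik = 1.0
    for y_t, d_t, xb_t, wg_t in zip(y, d, xb, wg):
        s = wg_t + eta                                  # selection index
        if d_t == 0:
            lik *= 1.0 - _cdf(s)                        # P(d_it = 0 | eta_i)
        else:
            e = y_t - xb_t - alpha                      # outcome residual e_it
            dens = _pdf(e / sigma_eps) / sigma_eps      # density of y_it given alpha_i
            prob = _cdf((s + rho * e / sigma_eps) / math.sqrt(1.0 - rho**2))
            lik *= dens * prob                          # times P(d_it = 1 | y_it)
    return lik

print(conditional_likelihood([1.2, None], [1, 0], [1.0, 0.5], [0.3, -0.2],
                             alpha=0.1, eta=-0.1, sigma_eps=1.0, rho=0.4))
```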
Numerical Integration:
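The integral over \((\alpha_i, \eta_i)\) is computed via Gauss-Hermite quadrature:
\[L_i \approx \frac{1}{\pi}\sum_{j=1}^{m}\sum_{k=1}^{m} w_j w_k\, g_i\!\left(\sqrt{2}\,C_u (x_j, x_k)'\right)\]
where \(g_i(\alpha_i, \eta_i)\) is the integrand (the likelihood of individual \(i\) conditional on the effects, excluding their density), \((x_j, w_j)\) are the Gauss-Hermite nodes and weights, and \(C_u\) is the Cholesky factor of \(\Sigma_u = C_u C_u'\).
Typically an \(m \times m\) grid is used (e.g., \(10 \times 10 = 100\) evaluation points).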
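Here is a minimal sketch of the two-dimensional rule, assuming the random effects are \(N(0, \Sigma_u)\): the Gauss-Hermite nodes are rotated with the Cholesky factor \(L\) of \(\Sigma_u\) and scaled by \(\sqrt{2}\), and the weighted sum is divided by \(\pi\). `numpy.polynomial.hermite.hermgauss` supplies nodes and weights for the weight function \(e^{-x^2}\); the helper name is hypothetical. In a FIML routine, `g` would be the conditional likelihood of one individual as a function of \((\alpha_i, \eta_i)\).

```python
import numpy as np

def gh_expectation_2d(g, cov, m=10):
    """Approximate E[g(u1, u2)] for (u1, u2) ~ N(0, cov) on an m x m grid."""
    x, w = np.polynomial.hermite.hermgauss(m)      # nodes/weights for exp(-x^2)
    L = np.linalg.cholesky(np.asarray(cov, dtype=float))   # cov = L L'
    total = 0.0
    for j in range(m):
        for k in range(m):
            u = np.sqrt(2.0) * L @ np.array([x[j], x[k]])  # change of variables
            total += w[j] * w[k] * g(u[0], u[1])
    return float(total / np.pi)

cov = [[1.0, 0.3], [0.3, 2.0]]
print(gh_expectation_2d(lambda a, b: a * b, cov))  # E[u1 * u2] = Cov(u1, u2)
```

With \(m = 10\) the rule is exact for polynomials up to degree 19 in each dimension, so low-order moments are recovered to rounding error.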
Optimization:
Maximize joint log-likelihood:
\[\ell(\beta, \gamma, \sigma_\varepsilon, \rho, \Sigma_u) = \sum_{i=1}^{N} \log L_i\]
Use Newton-Raphson or BFGS with:
- Starting values from two-step
- Parameter transformations to ensure constraints:
  - \(\sigma_\varepsilon > 0\): use \(\log(\sigma_\varepsilon)\)
  - \(\rho \in (-1, 1)\): use Fisher z-transform, \(\rho = \tanh(z)\)
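A sketch of the reparameterization using only the standard library: the optimizer works on \((\log\sigma_\varepsilon, z)\) with \(z = \operatorname{atanh}(\rho)\), and any real values map back to a valid \(\sigma_\varepsilon > 0\) and \(\rho \in (-1, 1)\). The function names are illustrative.

```python
import math

def to_unconstrained(sigma_eps, rho):
    """(sigma_eps > 0, -1 < rho < 1) -> unconstrained (log sigma_eps, atanh rho)."""
    return math.log(sigma_eps), math.atanh(rho)   # atanh is the Fisher z-transform

def from_unconstrained(log_sigma, z):
    """Any real (log_sigma, z) -> valid (sigma_eps, rho)."""
    return math.exp(log_sigma), math.tanh(z)

params = to_unconstrained(1.7, -0.45)      # what the optimizer would see
print(params, from_unconstrained(*params))
```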
Pros:
- Asymptotically efficient
- Correct standard errors automatically
- Joint estimation of all parameters
Cons:
- Computationally expensive
- May have convergence issues
- Requires good starting values
Identification
Exclusion Restriction
Critical for identification: At least one variable in \(W_{it}\) (selection equation) should be excluded from \(X_{it}\) (outcome equation).
Why?
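Without an exclusion restriction, identification relies solely on:
- Nonlinearity of IMR
- Distributional assumptions (normality)
This is weak in practice: over much of its range the IMR is close to linear in \(W'\gamma\), so when \(W = X\) it is nearly collinear with \(X\).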
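The near-linearity of the IMR is easy to see numerically. The sketch below evaluates \(\lambda(z) = \phi(z)/\Phi(z)\) on a grid of selection indices from \(-2\) to \(2\) (an arbitrary but typical range) and computes its correlation with \(z\); a value close to \(-1\) means that, without an excluded variable, \(\lambda(W'\gamma)\) behaves almost like a linear combination of the regressors already in \(X\).

```python
import math

def imr(z):
    """Inverse Mills ratio phi(z) / Phi(z)."""
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return pdf / (0.5 * math.erfc(-z / math.sqrt(2.0)))

zs = [-2.0 + 0.05 * k for k in range(81)]          # grid from -2 to 2
lams = [imr(z) for z in zs]
mz, ml = sum(zs) / len(zs), sum(lams) / len(lams)
sxy = sum((a - mz) * (b - ml) for a, b in zip(zs, lams))
sxx = sum((a - mz) ** 2 for a in zs)
syy = sum((b - ml) ** 2 for b in lams)
corr = sxy / math.sqrt(sxx * syy)
print(f"corr(z, lambda(z)) on [-2, 2]: {corr:.3f}")
```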
Good Exclusion Restrictions:
Variables that affect selection but not outcome directly:
| Application | Selection event | Outcome | Candidate exclusion restrictions |
|---|---|---|---|
| Wage determination | Employment | Wage | Number of children, non-labor income |
| Training effects | Training participation | Earnings | Program availability, distance to center |
| Insurance choice | Purchase insurance | Medical costs | State regulations, premium subsidies |
Testing:
While there's no formal test for exclusion restriction validity, you can:
- Test the joint significance of the excluded variables when added to the outcome equation (they should be insignificant)
- Check economic plausibility
- Perform sensitivity analysis
Inverse Mills Ratio (IMR)
Definition
The IMR is the ratio of the standard normal PDF to the standard normal CDF:
\[\lambda(z) = \frac{\phi(z)}{\Phi(z)}\]
Properties:
- Always positive: \(\lambda(z) > 0\) for all \(z\)
- Decreasing in \(z\):
  \[\frac{d\lambda}{dz} = -\lambda(\lambda + z) < 0\]
- Asymptotic behavior:
  \[\lambda(z) \to 0 \quad \text{as } z \to +\infty \quad \text{(high selection probability)}\]
  \[\lambda(z) \to +\infty \quad \text{as } z \to -\infty \quad \text{(low selection probability)}\]
- At \(z = 0\): \(\lambda(0) \approx 0.7979\)
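These properties can be verified numerically with the standard library. The sketch computes \(\lambda\) through `math.erfc` (which stays accurate in the left tail) and compares a central finite difference with \(-\lambda(\lambda + z)\) at a few arbitrary points.

```python
import math

def imr(z):
    """lambda(z) = phi(z) / Phi(z); erfc keeps Phi accurate for negative z."""
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return pdf / (0.5 * math.erfc(-z / math.sqrt(2.0)))

h = 1e-5
for z in (-3.0, -1.0, 0.0, 1.5, 4.0):
    numeric = (imr(z + h) - imr(z - h)) / (2.0 * h)   # central difference
    analytic = -imr(z) * (imr(z) + z)                  # d(lambda)/dz
    assert abs(numeric - analytic) < 1e-6
    assert analytic < 0 < imr(z)                       # decreasing and positive

print(f"lambda(0) = {imr(0.0):.4f}")
print(f"lambda(8) = {imr(8.0):.2e}, lambda(-8) = {imr(-8.0):.2f}")
```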
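Interpretation
In the outcome equation:
\[E[y_{it} \mid X_{it}, W_{it}, d_{it} = 1] = X_{it}'\beta + \rho\sigma_\varepsilon \lambda(W_{it}'\gamma)\]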
The term \(\rho\sigma_\varepsilon \lambda(W'\gamma)\) is the selection correction.
Magnitude:
- High \(\lambda\) → Strong selection effect
- Low \(\lambda\) → Weak selection effect
Sign:
- \(\rho > 0\) and \(\lambda > 0\) → Positive correction (selected sample has higher \(E[\varepsilon]\))
- \(\rho < 0\) and \(\lambda > 0\) → Negative correction (selected sample has lower \(E[\varepsilon]\))
Diagnostics
High IMR values (\(\lambda > 2\), i.e. a predicted selection probability below roughly 6%) indicate:
- Very low selection probabilities
- Strong selection effects
- Potentially problematic observations
Example:
```python
# `result` is assumed to be a fitted Panel Heckman result object
diag = result.imr_diagnostics()
print(f"Mean IMR: {diag['imr_mean']:.3f}")
print(f"High IMR count: {diag['high_imr_count']}")
```
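If fitted selection indices \(W_{it}'\hat{\gamma}\) are available as a plain list, similar summaries can be computed directly. `imr_summary` below is a hypothetical stand-in written for illustration, not the implementation behind `imr_diagnostics()`; its default threshold of 2 matches the cutoff mentioned above.

```python
import math

def imr(z):
    """Inverse Mills ratio phi(z) / Phi(z)."""
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return pdf / (0.5 * math.erfc(-z / math.sqrt(2.0)))

def imr_summary(indices, threshold=2.0):
    """Hypothetical helper: summarize IMRs from fitted selection indices."""
    lams = [imr(z) for z in indices]
    return {
        "mean": sum(lams) / len(lams),
        "max": max(lams),
        "n_above_threshold": sum(lam > threshold for lam in lams),
    }

print(imr_summary([0.0, 1.0, -2.0]))
```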
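Testing for Selection Bias
\(H_0\): \(\rho = 0\) (No Selection Bias)
Two approaches test the same hypothesis (since \(\theta = \rho\sigma_\varepsilon\) and \(\sigma_\varepsilon > 0\), \(\theta = 0 \iff \rho = 0\)):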
1. t-test on IMR coefficient
In Step 2:
\[y_{it} = X_{it}'\beta + \theta\hat{\lambda}_{it} + e_{it}\]
Test: \(H_0\!: \theta = 0\)
Use standard t-test with Murphy-Topel corrected SE.
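2. Wald test on \(\rho\)
From MLE, directly test:
\[W = \left(\frac{\hat{\rho}}{\widehat{\operatorname{se}}(\hat{\rho})}\right)^2 \sim \chi^2_1 \quad \text{under } H_0\!: \rho = 0\]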
Interpretation:
- Reject \(H_0\): Selection bias present → Use Heckman correction
- Fail to reject: No significant selection bias → OLS may be adequate
Example:
```python
# `result` is assumed to be a fitted Panel Heckman result object
test = result.selection_effect()
print(test['interpretation'])
# Output: "Selection bias detected (ρ ≠ 0, p=0.0012).
#          OLS would be biased. Heckman correction is necessary."
```
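Both tests reduce to a standard normal comparison, since a \(\chi^2_1\) Wald statistic is the square of a z-statistic. The sketch below computes the p-values from an estimate and its standard error (Murphy-Topel corrected for \(\hat{\theta}\), MLE-based for \(\hat{\rho}\)); the function names are illustrative and the inputs in the usage line are made-up numbers.

```python
import math

def two_sided_p(stat):
    """Two-sided p-value for an asymptotically N(0, 1) statistic."""
    return math.erfc(abs(stat) / math.sqrt(2.0))

def selection_tests(theta_hat, se_theta, rho_hat, se_rho):
    t_stat = theta_hat / se_theta            # t-test on the IMR coefficient
    wald = (rho_hat / se_rho) ** 2           # Wald statistic for rho = 0, chi2(1)
    return {
        "t": t_stat, "p_t": two_sided_p(t_stat),
        # P(chi2(1) > W) = P(|Z| > sqrt(W))
        "wald": wald, "p_wald": two_sided_p(math.sqrt(wald)),
    }

print(selection_tests(theta_hat=0.3, se_theta=0.1, rho_hat=0.45, se_rho=0.15))
```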
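Comparison with OLS
OLS on Selected Sample (Biased)
If we ignore selection and run OLS only on observations with \(d_{it} = 1\):
\[\hat{\beta}_{OLS} = \left(\sum_{i,t:\, d_{it}=1} X_{it}X_{it}'\right)^{-1} \sum_{i,t:\, d_{it}=1} X_{it}y_{it}\]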
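Bias:
\[\operatorname{plim}\hat{\beta}_{OLS} - \beta = \rho\sigma_\varepsilon\,\delta\]
where:
\[\delta = \left(E[X_{it}X_{it}' \mid d_{it} = 1]\right)^{-1} E[X_{it}\lambda_{it} \mid d_{it} = 1]\]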
Direction of bias depends on:
- Sign of \(\rho\)
- Correlation between \(X\) and \(\lambda\)
Heckman Corrected (Consistent)
Including the IMR removes the bias:
\[\operatorname{plim}\hat{\beta}_{Heckman} = \beta\]
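A Monte Carlo sketch can confirm the bias formula \(\operatorname{plim}\hat{\beta}_{OLS} - \beta = \rho\sigma_\varepsilon\,\delta\), where \(\delta\) is the coefficient from projecting the IMR on \(X\) within the selected sample. The simulation uses the true \(\gamma\) (no first step) to isolate the omitted-variable mechanism, and all parameter values are arbitrary assumptions. The OLS error should match \(\rho\sigma_\varepsilon\delta\), and adding \(\lambda\) as a regressor should bring the estimates back to the true \(\beta = (1, 0.5)\).

```python
import math
import numpy as np

rng = np.random.default_rng(7)
n, rho, sigma_eps = 200_000, 0.7, 1.5
beta = np.array([1.0, 0.5])
x, z = rng.normal(size=n), rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
index = 0.3 + 0.8 * x + 1.0 * z                    # true W'gamma, z excluded from X
v = rng.normal(size=n)
eps = sigma_eps * (rho * v + math.sqrt(1.0 - rho**2) * rng.normal(size=n))
y = X @ beta + eps
sel = index + v > 0                                # d = 1

Phi = 0.5 * np.vectorize(math.erfc)(-index / math.sqrt(2.0))
lam = np.exp(-0.5 * index**2) / math.sqrt(2.0 * math.pi) / Phi   # true IMR

Xs, ys, ls = X[sel], y[sel], lam[sel]
beta_ols = np.linalg.lstsq(Xs, ys, rcond=None)[0]
delta = np.linalg.lstsq(Xs, ls, rcond=None)[0]     # projection of lambda on X
coef_heckman = np.linalg.lstsq(np.column_stack([Xs, ls]), ys, rcond=None)[0]

print("OLS error:      ", beta_ols - beta)
print("rho*sigma*delta:", rho * sigma_eps * delta)
print("with lambda:    ", coef_heckman[:2])
```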
Empirical Comparison
```python
# `result` is assumed to be a fitted Panel Heckman result object
comparison = result.compare_ols_heckman()
print(f"Max difference: {comparison['max_abs_difference']:.3f}")
print(comparison['interpretation'])
```
Example Output:
```
Coefficient Estimates:
Variable        OLS   Heckman  Difference  % Diff
--------------------------------------------------------
Intercept    1.6234    1.5123      0.1111    6.8%
Experience   0.0245    0.0298     -0.0053  -21.6%
Education    0.0923    0.0789      0.0134   14.5%
Substantial selection bias detected (max diff: 0.134).
OLS estimates are biased. Heckman correction is necessary.
```
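The Difference and % Diff columns are consistent with Difference = OLS minus Heckman and % Diff = Difference / OLS × 100; that convention is inferred from the numbers shown rather than documented. This sketch reproduces the three rows:

```python
ols = {"Intercept": 1.6234, "Experience": 0.0245, "Education": 0.0923}
heckman = {"Intercept": 1.5123, "Experience": 0.0298, "Education": 0.0789}

rows = {}
for name, b_ols in ols.items():
    diff = b_ols - heckman[name]                  # Difference = OLS - Heckman
    pct = 100.0 * diff / b_ols                    # % Diff relative to OLS
    rows[name] = (round(diff, 4), round(pct, 1))
    print(f"{name:<12}{b_ols:>9.4f}{heckman[name]:>9.4f}{diff:>11.4f}{pct:>7.1f}%")
```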
Practical Considerations
When to Use Heckman Model?
Use when:
- Outcome observed only for subset of sample
- Selection likely correlated with outcome
- Valid exclusion restriction available
- Selection probabilities vary sufficiently
- Normality assumption reasonable
Don't use when:
- Selection purely random (\(\rho = 0\))
- No exclusion restriction
- Perfect or near-perfect selection (everyone/no one selected)
- Selection mechanism complex (use alternative methods)
Common Issues
1. Collinearity
If \(X\) and \(W\) are very similar (weak exclusion restriction):
- IMR highly collinear with \(X\)
- Estimates unstable
- Large standard errors
Solution: Find better exclusion restriction
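2. Extreme Selection Probabilities
If some \(\Phi(W'\gamma)\) are very close to 0 or 1:
- Near 0: the IMR explodes (\(\lambda \to \infty\)), causing numerical instability
- Near 1: \(\lambda \approx 0\), so those observations carry almost no information about the correction term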
Solution:
- Check for outliers
- Trim extreme observations
- Use robust estimation
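A minimal trimming sketch, assuming the fitted indices \(W_{it}'\hat{\gamma}\) are in a plain list: it keeps positions whose predicted selection probability lies within arbitrary cutoffs (1% and 99% here). Because trimming changes the sample, estimates with and without it are best compared as a sensitivity check. The function names are illustrative.

```python
import math

def selection_probability(index):
    """Phi(index) via the complementary error function."""
    return 0.5 * math.erfc(-index / math.sqrt(2.0))

def trim_extreme(indices, lower=0.01, upper=0.99):
    """Return positions whose predicted selection probability is in [lower, upper]."""
    return [i for i, z in enumerate(indices)
            if lower <= selection_probability(z) <= upper]

print(trim_extreme([-4.0, 0.0, 1.0, 4.0]))
```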
3. Non-Normality
The Heckman model assumes:
\[(\varepsilon_{it}, v_{it}) \text{ are jointly normally distributed}\]
- Estimates inconsistent
- Test results unreliable
Solutions:
- Semi-parametric estimators (e.g., Kyriazidou 1997)
- Transformation to normality
- Robustness checks
4. MLE Convergence
MLE may fail to converge if:
- Starting values poor
- Likelihood flat
- Parameters at boundary (\(|\rho|\) near 1)
Solutions:
- Use two-step as starting values
- Try multiple starting points
- Check parameter bounds
- Simplify model if needed
Robustness Checks
- Sensitivity to exclusion restriction:
  - Try different exclusion variables
  - Check coefficient stability
- Distributional assumptions:
  - Test normality of residuals
  - Compare with semi-parametric estimators
- Specification:
  - Test functional form (quadratic terms, interactions)
  - Check for heteroskedasticity
- Sample selection:
  - Bootstrap standard errors
  - Jackknife leave-one-out
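In a panel, bootstrap resampling is usually done over individuals rather than individual observations, so that each draw keeps the within-individual correlation over time intact. The sketch below assumes the data are grouped by individual and that `statistic` is a user-supplied callable (for example, re-running the two-step estimator and returning one coefficient); the function names are illustrative.

```python
import random
import statistics

def cluster_bootstrap_se(groups, statistic, n_boot=200, seed=0):
    """Bootstrap standard error that resamples whole individuals with replacement.

    groups: list with one entry per individual (e.g. that individual's rows)
    statistic: callable mapping a list of groups to a float"""
    rng = random.Random(seed)
    n = len(groups)
    draws = []
    for _ in range(n_boot):
        resample = [groups[rng.randrange(n)] for _ in range(n)]  # draw individuals
        draws.append(statistic(resample))
    return statistics.stdev(draws)

def pooled_mean(sample):
    """Example statistic: mean of all observations across resampled individuals."""
    values = [v for group in sample for v in group]
    return sum(values) / len(values)

panel = [[float(i), float(i) + 1.0] for i in range(20)]   # 20 individuals, 2 periods
print(cluster_bootstrap_se(panel, pooled_mean))
```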
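Extensions
1. Kyriazidou (1997) Semi-Parametric Estimator
Avoids distributional assumptions by using pairwise differences over time within individuals:
\[\hat{\beta} = \left[\sum_{i}\sum_{t<s} \hat{\omega}_{its}\, d_{it}d_{is}\, \Delta X_{its}\Delta X_{its}'\right]^{-1} \sum_{i}\sum_{t<s} \hat{\omega}_{its}\, d_{it}d_{is}\, \Delta X_{its}\Delta y_{its}, \quad \hat{\omega}_{its} = K_h\!\left((W_{it} - W_{is})'\hat{\gamma}\right)\]
where:
- \(\Delta X_{its} = X_{it} - X_{is}\) and \(\Delta y_{its} = y_{it} - y_{is}\)
- \(K_h\): kernel function with bandwidth \(h\)
- Uses observations with similar selection indices \(W_{it}'\hat{\gamma}\) and \(W_{is}'\hat{\gamma}\)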
Pros: No normality assumption
Cons:
- Computationally intensive
- Bandwidth selection critical
- Less efficient than MLE
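A scalar-regressor sketch of the kernel-weighted pairwise-difference idea: for each individual and each pair of periods in which the outcome is observed in both, the differenced outcome is regressed on the differenced regressor, with weight \(K_h((W_{it} - W_{is})'\hat{\gamma})\) so that pairs with similar selection indices dominate (for those pairs the selection effect approximately differences out along with \(\alpha_i\)). The Epanechnikov kernel, the tuple layout `(y, d, x, w)`, and the function names are assumptions; \(\hat{\gamma}\) is taken as given from a first-step estimator that this sketch does not implement.

```python
def epanechnikov(u):
    """Epanechnikov kernel K(u) = 0.75 (1 - u^2) on |u| <= 1."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0

def kyriazidou_scalar(panel, gamma_hat, h):
    """Kernel-weighted pairwise-difference estimate of a scalar beta.

    panel: list of individuals, each a list of (y, d, x, w) tuples per period,
    where w is a tuple of selection regressors."""
    num = den = 0.0
    for periods in panel:
        for a in range(len(periods)):
            for b in range(a + 1, len(periods)):
                y_t, d_t, x_t, w_t = periods[a]
                y_s, d_s, x_s, w_s = periods[b]
                if not (d_t and d_s):
                    continue                       # need y observed in both periods
                dw = sum(g * (p - q) for g, p, q in zip(gamma_hat, w_t, w_s))
                k = epanechnikov(dw / h) / h       # K_h(u) = K(u / h) / h
                num += k * (x_t - x_s) * (y_t - y_s)
                den += k * (x_t - x_s) ** 2
    return num / den

# alpha_i differs across individuals but cancels in the within-individual differences
panel = [[(a + 2.0 * x, 1, x, (0.1 * x,)) for x in (0.0, 1.0, 2.0)]
         for a in (0.0, 5.0, -3.0)]
print(kyriazidou_scalar(panel, gamma_hat=(1.0,), h=0.5))
```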
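2. Dynamic Selection Models
Extend the selection equation to:
\[d_{it} = \mathbf{1}\{W_{it}'\gamma + \psi d_{i,t-1} + \eta_i + v_{it} > 0\}\]
The coefficient \(\psi\) accounts for state dependence in selection.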
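The sketch below simulates the dynamic selection rule \(d_{it} = \mathbf{1}\{\gamma_0 + \psi d_{i,t-1} + \eta_i + v_{it} > 0\}\), where \(\psi\) is an illustrative name for the state-dependence coefficient, with a constant index \(\gamma_0\), normal \(\eta_i\), and an assumed initial state \(d_{i0} = 0\); it reports the empirical transition probabilities. All values are arbitrary assumptions.

```python
import random

def simulate_dynamic_selection(psi, gamma0=-0.3, sigma_eta=0.5, n=20_000, t=6, seed=0):
    """Simulate d_it = 1{gamma0 + psi * d_{i,t-1} + eta_i + v_it > 0} and return
    {previous state: share selected}, i.e. empirical P(d_t = 1 | d_{t-1})."""
    rng = random.Random(seed)
    counts = {0: [0, 0], 1: [0, 0]}         # previous state -> [selected, total]
    for _ in range(n):
        eta = rng.gauss(0.0, sigma_eta)     # individual effect in selection
        prev = 0                            # assumed initial condition: not selected
        for _ in range(t):
            d = int(gamma0 + psi * prev + eta + rng.gauss(0.0, 1.0) > 0)
            counts[prev][0] += d
            counts[prev][1] += 1
            prev = d
    return {state: sel / total for state, (sel, total) in counts.items()}

for psi in (0.0, 1.0):
    p = simulate_dynamic_selection(psi)
    print(f"psi={psi}: P(d=1 | prev=1) = {p[1]:.3f}, P(d=1 | prev=0) = {p[0]:.3f}")
```

Even with \(\psi = 0\) the gap between the two transition probabilities is positive, because \(\eta_i\) makes selection persistent; raw transition rates therefore overstate true state dependence unless the individual effect is modeled.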
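3. Multiple Selection Rules
Multiple types of selection, e.g. two selection equations:
\[d_{it}^{(k)} = \mathbf{1}\{W_{it}^{(k)\prime}\gamma_k + v_{it}^{(k)} > 0\}, \quad k = 1, 2\]
with \(y_{it}\) observed only when \(d_{it}^{(1)} = d_{it}^{(2)} = 1\).
Requires nested or sequential selection models.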
References
Key Papers
- Heckman, J.J. (1979). "Sample Selection Bias as a Specification Error." Econometrica, 47(1), 153-161.
  - Original two-step correction for cross-section
- Wooldridge, J.M. (1995). "Selection Corrections for Panel Data Models Under Conditional Mean Independence Assumptions." Journal of Econometrics, 68(1), 115-132.
  - Panel extension with random effects
- Kyriazidou, E. (1997). "Estimation of a Panel Data Sample Selection Model." Econometrica, 65(6), 1335-1364.
  - Semi-parametric approach
- Murphy, K.M., & Topel, R.H. (1985). "Estimation and Inference in Two-Step Econometric Models." Journal of Business & Economic Statistics, 3(4), 370-379.
  - Variance correction for two-step estimators
Textbooks
- Wooldridge, J.M. (2010). Econometric Analysis of Cross Section and Panel Data (2nd ed.). MIT Press.
  - Chapter 19: Sample selection models
- Cameron, A.C., & Trivedi, P.K. (2005). Microeconometrics: Methods and Applications. Cambridge University Press.
  - Chapter 16: Sample selection