Censored & Selection Models¶
Censored and selection models address two related problems in panel data:
-
Censoring: The outcome variable is observed only within a limited range. For example, hours worked are censored at zero (we do not observe negative hours), or test scores are capped at 100. Standard linear models ignore the pile-up at the boundary, producing biased estimates.
-
Sample selection: The outcome is observed only for a non-random subset of the population. For example, wages are observed only for employed individuals. Analyzing only the observed subsample produces selection bias.
PanelBox provides three estimators covering the main approaches for panel data with censoring or selection.
Available Models¶
| Model | Class | Reference | When to Use |
|---|---|---|---|
| Pooled Tobit | PooledTobit |
Tobin (1958) | Censored outcome, no entity effects |
| Random Effects Tobit | RandomEffectsTobit |
-- | Censored outcome with RE |
| Honore Trimmed | HonoreTrimmedEstimator |
Honore (1992) | Censored outcome with FE |
| Panel Heckman | PanelHeckman |
Wooldridge (1995) | Sample selection bias |
Quick Example¶
from panelbox.models.censored import PooledTobit
# Hours worked, censored at 0
model = PooledTobit(
"hours ~ age + education + children",
data, "id", "year",
lower=0 # Left-censored at 0
)
results = model.fit()
print(results.summary())
Key Concepts¶
Censoring vs. Truncation vs. Selection¶
| Problem | Definition | Model |
|---|---|---|
| Censoring | \(y^* = X\beta + \epsilon\); observe \(y = \max(0, y^*)\) | Tobit |
| Truncation | Only observe cases where \(y > 0\) (others missing entirely) | Truncated regression |
| Selection | \(y\) observed only if selection equation \(z > 0\) | Heckman |
Tobit: Censored Outcomes¶
The Tobit model handles outcomes that are censored at a known boundary (typically zero):
from panelbox.models.censored import RandomEffectsTobit
model = RandomEffectsTobit(
"hours ~ age + education + children",
data, "id", "year",
lower=0
)
results = model.fit()
The Tobit model estimates three types of marginal effects:
| Effect | Interpretation |
|---|---|
| Unconditional | Effect on \(E(y)\) including the censored region |
| Conditional | Effect on \(E(y \mid y > 0)\) for the uncensored subpopulation |
| Probability | Effect on \(P(y > 0)\) |
Honore Trimmed Estimator: FE Tobit¶
Standard Tobit with fixed effects suffers from the incidental parameters problem. The Honore (1992) trimmed estimator provides consistent FE estimates by exploiting the panel structure:
from panelbox.models.censored import HonoreTrimmedEstimator
model = HonoreTrimmedEstimator(
"hours ~ age + education + children",
data, "id", "year",
lower=0
)
results = model.fit()
Requires T >= 2
The Honore estimator uses pairwise differences across time periods within each entity. It requires at least 2 time periods per entity and works best with balanced panels.
Panel Heckman: Sample Selection¶
When the outcome is observed only for a non-random subsample, the Heckman model corrects for selection bias using a two-equation system:
from panelbox.models.selection import PanelHeckman
model = PanelHeckman(
outcome_formula="log_wage ~ education + experience",
selection_formula="employed ~ education + experience + children",
data=data,
entity_col="id",
time_col="year"
)
results = model.fit()
print(results.summary())
# Selection correction term (inverse Mills ratio)
print(f"Lambda: {results.lambda_coef:.4f}")
print(f"Lambda p-value: {results.lambda_pvalue:.4f}")
Exclusion restriction
For identification, the selection equation should include at least one variable that affects selection but not the outcome (an exclusion restriction). In the wage example, children affects employment but may not directly affect wages.
Detailed Guides¶
- Tobit Models -- Pooled and RE Tobit (detailed guide coming soon)
- Honore Estimator -- FE Tobit with trimmed estimator (detailed guide coming soon)
- Panel Heckman -- Sample selection correction (detailed guide coming soon)
Tutorials¶
See Censored & Selection Tutorial for interactive notebooks with Google Colab.
API Reference¶
See Censored Models API for complete technical reference.
References¶
- Tobin, J. (1958). Estimation of relationships for limited dependent variables. Econometrica, 26(1), 24-36.
- Honore, B. E. (1992). Trimmed LAD and least squares estimation of truncated and censored regression models with fixed effects. Econometrica, 60(3), 533-565.
- Heckman, J. J. (1979). Sample selection bias as a specification error. Econometrica, 47(1), 153-161.
- Wooldridge, J. M. (1995). Selection corrections for panel data models under conditional mean independence assumptions. Journal of Econometrics, 68(1), 115-132.