Tobit Models¶
Quick Reference
Classes: panelbox.models.censored.PooledTobit, panelbox.models.censored.RandomEffectsTobit
Import: from panelbox.models.censored import PooledTobit, RandomEffectsTobit
Stata equivalent: tobit (pooled), xttobit (RE)
R equivalent: censReg::censReg(), pglm::pglm()
Overview¶
The Tobit model (Tobin, 1958) is the standard approach for regression with censored dependent variables -- outcomes that are recorded at a boundary value whenever their true latent value falls beyond it. Common examples include hours worked (censored at zero), expenditure on durable goods, and insurance claims.
The key insight is that the observed outcome is a censored version of a latent variable:
\[y_{it}^* = X_{it}'\beta + \varepsilon_{it}, \qquad y_{it} = \max(c, y_{it}^*)\]
where \(c\) is the censoring point. Standard OLS ignores the pile-up of observations at the boundary, producing biased and inconsistent estimates. The Tobit model accounts for censoring by modeling the likelihood of being at the boundary versus being uncensored.
PanelBox provides two Tobit specifications: PooledTobit (ignores panel structure) and RandomEffectsTobit (accounts for entity-level heterogeneity via random effects with Gauss-Hermite quadrature integration).
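The OLS bias described above can be seen in a small simulation. The sketch below uses only NumPy (not PanelBox); the data-generating values are made up for illustration. It fits OLS once to the latent outcome and once to the censored outcome, and the slope from the censored data is pulled toward zero.
```python
import numpy as np

# Illustrative data-generating process (values chosen for this sketch only)
rng = np.random.default_rng(0)
n = 5000
x = rng.standard_normal(n)
y_star = 1.0 + 0.5 * x + rng.standard_normal(n)  # latent outcome
y = np.maximum(0.0, y_star)                      # observed, left-censored at 0

X = np.column_stack([np.ones(n), x])

# OLS slope on the (normally unobservable) latent outcome
slope_latent = np.linalg.lstsq(X, y_star, rcond=None)[0][1]
# OLS slope on the censored outcome: attenuated toward zero
slope_censored = np.linalg.lstsq(X, y, rcond=None)[0][1]

print(f"True slope:              0.500")
print(f"OLS on latent y*:        {slope_latent:.3f}")
print(f"OLS on censored y:       {slope_censored:.3f}")
print(f"Share censored:          {np.mean(y == 0.0):.1%}")
```
The gap between the two slopes grows with the share of observations at the boundary.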
Quick Example¶
```python
import numpy as np
from panelbox.models.censored import PooledTobit

# Simulate censored data
np.random.seed(42)
n = 500
X = np.column_stack([np.ones(n), np.random.randn(n, 2)])
y_star = X @ np.array([1.0, 0.5, -0.3]) + np.random.randn(n)
y = np.maximum(0, y_star)  # Left-censored at 0
groups = np.repeat(np.arange(50), 10)  # 50 entities, 10 observations each

# Fit Pooled Tobit
model = PooledTobit(endog=y, exog=X, groups=groups, censoring_point=0.0)
result = model.fit(method="BFGS")
print(result.summary())
```
When to Use¶
- Your dependent variable is censored at a known boundary (e.g., 0, 100)
- You observe the boundary value for censored observations (not missing)
- You want to estimate effects on the latent (uncensored) outcome
- The censoring mechanism is exogenous (not related to unobservables)
Key Assumptions
- Normality: errors \(\varepsilon_{it} \sim N(0, \sigma^2)\) (pooled) or \(\varepsilon_{it} \sim N(0, \sigma_\varepsilon^2)\) (RE)
- Exogenous censoring: the censoring point is fixed and known
- Linearity: the latent variable is linear in \(X\)
- RE Tobit additionally assumes: \(\alpha_i \sim N(0, \sigma_\alpha^2)\), independent of \(X\)
Censoring vs. Truncation¶
It is important to distinguish censoring from truncation and sample selection:
| Problem | What happens | Observed data | Model |
|---|---|---|---|
| Censoring | Values of \(y^*\) below \(c\) are recorded as \(c\); observe \(y = \max(c, y^*)\) | All observations, some at boundary | Tobit |
| Truncation | Observations with \(y^* \leq c\) are dropped entirely | Only uncensored observations | Truncated regression |
| Selection | \(y\) observed only if a separate condition holds | Outcome missing for some | Heckman |
With censoring, you still observe the censored value (e.g., zero hours worked). With truncation, those observations are completely absent from the data.
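The difference is easy to see on simulated data. This is a minimal NumPy sketch with made-up parameters: the censored sample keeps every observation and piles some up at \(c\), while the truncated sample simply loses them.
```python
import numpy as np

rng = np.random.default_rng(1)
c = 0.0
y_star = rng.normal(loc=0.5, scale=1.0, size=1000)  # latent outcome

# Censoring: every unit is kept; values below c are recorded as c
y_censored = np.maximum(c, y_star)

# Truncation: units with y* <= c never enter the sample
y_truncated = y_star[y_star > c]

print(f"Censored sample size:  {len(y_censored)}")
print(f"  of which at boundary: {(y_censored == c).sum()}")
print(f"Truncated sample size: {len(y_truncated)}")
```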
Detailed Guide¶
Censoring Types¶
PanelBox supports three types of censoring via the censoring_type parameter:
| Type | Formula | Example |
|---|---|---|
| "left" (default) | \(y = \max(c, y^*)\) | Hours worked \(\geq 0\) |
| "right" | \(y = \min(c, y^*)\) | Test scores \(\leq 100\) |
| "both" | \(y = \max(l, \min(u, y^*))\) | Satisfaction score in \([1, 5]\) |
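The three formulas in the table map directly onto NumPy operations. The sketch below applies each rule to a small hand-written latent vector; it only illustrates how censored data arise and does not call PanelBox.
```python
import numpy as np

y_star = np.array([-1.5, 0.3, 2.0, 4.2, 6.8])  # latent values (illustrative)

# "left": y = max(c, y*) with c = 0
y_left = np.maximum(0.0, y_star)    # [0.0, 0.3, 2.0, 4.2, 6.8]

# "right": y = min(c, y*) with c = 5
y_right = np.minimum(5.0, y_star)   # [-1.5, 0.3, 2.0, 4.2, 5.0]

# "both": y = max(l, min(u, y*)) with l = 1, u = 5
y_both = np.clip(y_star, 1.0, 5.0)  # [1.0, 1.0, 2.0, 4.2, 5.0]

print(y_left, y_right, y_both, sep="\n")
```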
PooledTobit¶
The Pooled Tobit ignores the panel structure, treating all observations as independent. It is suitable when entity-level heterogeneity is not a concern or as a baseline model.
```python
from panelbox.models.censored import PooledTobit

model = PooledTobit(
    endog=y,                # Dependent variable (censored)
    exog=X,                 # Regressors (n x k)
    groups=entity,          # Entity IDs (for clustered SEs)
    censoring_point=0.0,    # Censoring threshold
    censoring_type="left",  # 'left', 'right', or 'both'
)
result = model.fit(method="BFGS", maxiter=1000)
```
Key attributes after fitting:
| Attribute | Description |
|---|---|
| result.beta | Coefficient vector \(\hat{\beta}\) |
| result.sigma | Error standard deviation \(\hat{\sigma}\) |
| result.llf | Log-likelihood value |
| result.bse | Standard errors |
| result.converged | Whether optimization converged |
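For intuition about what the optimizer is maximizing, here is a minimal sketch of the left-censored Tobit log-likelihood written with NumPy and SciPy. It is not PanelBox's internal implementation: the function name `tobit_negloglik`, the log-sigma parameterization, and the simulated data are all assumptions made for this example. Censored observations contribute \(\log \Phi((c - X'\beta)/\sigma)\) and uncensored observations contribute the normal log-density.
```python
import numpy as np
from scipy import optimize, stats

def tobit_negloglik(params, y, X, c=0.0):
    """Negative log-likelihood of a Tobit model left-censored at c.

    params = (beta_1, ..., beta_k, log_sigma); sigma is kept positive
    by optimizing over its logarithm (a choice made for this sketch).
    """
    beta, log_sigma = params[:-1], params[-1]
    sigma = np.exp(log_sigma)
    xb = X @ beta
    censored = y <= c
    ll = np.where(
        censored,
        stats.norm.logcdf((c - xb) / sigma),              # P(y* <= c)
        stats.norm.logpdf((y - xb) / sigma) - log_sigma,  # density of y
    )
    return -ll.sum()

# Simulated data (illustrative values)
rng = np.random.default_rng(0)
n = 2000
X = np.column_stack([np.ones(n), rng.standard_normal((n, 2))])
true_beta = np.array([1.0, 0.5, -0.3])
y = np.maximum(0.0, X @ true_beta + rng.standard_normal(n))

# OLS coefficients as starting values, log(sigma) = 0
start = np.append(np.linalg.lstsq(X, y, rcond=None)[0], 0.0)
res = optimize.minimize(tobit_negloglik, start, args=(y, X), method="BFGS")

beta_hat, sigma_hat = res.x[:-1], np.exp(res.x[-1])
print("beta_hat: ", np.round(beta_hat, 3))
print("sigma_hat:", round(float(sigma_hat), 3))
```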
RandomEffectsTobit¶
The RE Tobit adds entity-specific random effects to account for unobserved heterogeneity:
\[y_{it}^* = X_{it}'\beta + \alpha_i + \varepsilon_{it}\]
where \(\alpha_i \sim N(0, \sigma_\alpha^2)\) and \(\varepsilon_{it} \sim N(0, \sigma_\varepsilon^2)\). The random effect is integrated out using Gauss-Hermite quadrature.
```python
from panelbox.models.censored import RandomEffectsTobit

model = RandomEffectsTobit(
    endog=y,                # Dependent variable
    exog=X,                 # Regressors
    groups=entity,          # Entity IDs
    time=time,              # Time IDs
    censoring_point=0.0,    # Censoring threshold
    censoring_type="left",  # Censoring type
    quadrature_points=12,   # Integration accuracy
)
result = model.fit(method="BFGS", maxiter=1000)
```
Additional attributes for RE Tobit:
| Attribute | Description |
|---|---|
| result.sigma_eps | Idiosyncratic error SD \(\hat{\sigma}_\varepsilon\) |
| result.sigma_alpha | Random effect SD \(\hat{\sigma}_\alpha\) |
Quadrature points
The quadrature_points parameter controls the accuracy of the numerical integration over the random effect distribution. Higher values (e.g., 20) increase accuracy but slow computation. The default of 12 is adequate for most applications.
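To make the role of the quadrature points concrete, the sketch below integrates the random effect out of one entity's likelihood with Gauss-Hermite quadrature, using `numpy.polynomial.hermite.hermgauss` and the change of variable \(\alpha = \sqrt{2}\,\sigma_\alpha x\). It is an illustration under stated assumptions, not PanelBox's implementation: the function `entity_loglik` and the example data are hypothetical.
```python
import numpy as np
from scipy import stats

def entity_loglik(y, xb, sigma_eps, sigma_alpha, c=0.0, n_nodes=12):
    """Log-likelihood contribution of one entity, left-censored at c.

    Approximates  integral prod_t f(y_t | alpha) * N(alpha; 0, sigma_alpha^2) d(alpha)
    by  (1 / sqrt(pi)) * sum_k w_k * prod_t f(y_t | sqrt(2) * sigma_alpha * x_k).
    """
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    alpha = np.sqrt(2.0) * sigma_alpha * nodes  # shape (K,)
    mu = xb[:, None] + alpha[None, :]           # shape (T, K)
    censored = (y <= c)[:, None]
    log_f = np.where(
        censored,
        stats.norm.logcdf((c - mu) / sigma_eps),
        stats.norm.logpdf((y[:, None] - mu) / sigma_eps) - np.log(sigma_eps),
    )
    # Product over time periods at each node, then weighted sum over nodes
    lik = (weights * np.exp(log_f.sum(axis=0))).sum() / np.sqrt(np.pi)
    return np.log(lik)

# One entity with T = 4 periods, two of them censored (illustrative values)
y_i = np.array([0.0, 0.8, 0.0, 1.5])
xb_i = np.array([0.1, 0.6, -0.2, 1.2])

ll_by_nodes = {
    k: entity_loglik(y_i, xb_i, sigma_eps=1.0, sigma_alpha=0.5, n_nodes=k)
    for k in (4, 12, 30)
}
for k, ll in ll_by_nodes.items():
    print(f"{k:>2} nodes: log-likelihood = {ll:.6f}")
```
Entities with many time periods or a large \(\sigma_\alpha\) make the integrand more peaked, which is when increasing the number of points matters most.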
Double Censoring¶
For outcomes censored at both ends, use censoring_type="both" with explicit limits:
```python
model = PooledTobit(
    endog=y,
    exog=X,
    groups=entity,
    censoring_type="both",
    lower_limit=1.0,  # Lower bound
    upper_limit=5.0,  # Upper bound
)
result = model.fit()
```
Predictions¶
Tobit models offer three types of predictions, each with a different interpretation:
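- Latent: the expected value of the latent variable, \(E[y^* \mid X] = X'\beta\), as if there were no censoring. This can produce values below the censoring point.
- Censored: the expected value of the observed (censored) outcome. For left censoring at \(c\):
\[E[y \mid X] = c\,\Phi\!\left(\frac{c - X'\beta}{\sigma}\right) + X'\beta\left[1 - \Phi\!\left(\frac{c - X'\beta}{\sigma}\right)\right] + \sigma\,\phi\!\left(\frac{c - X'\beta}{\sigma}\right)\]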
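Both quantities can be computed by hand from a coefficient vector and an error standard deviation. The sketch below does so with NumPy and SciPy for left censoring at \(c\); `tobit_predictions` is a hypothetical helper written for this example, not a PanelBox function, and the inputs are made-up values.
```python
import numpy as np
from scipy import stats

def tobit_predictions(X, beta, sigma, c=0.0):
    """Return (latent, censored) predictions for left censoring at c.

    latent   = E[y* | X] = X @ beta
    censored = E[y | X]  = c * P(y* <= c) + X @ beta * P(y* > c) + sigma * phi(z)
    with z = (X @ beta - c) / sigma.
    """
    xb = X @ beta
    z = (xb - c) / sigma
    latent = xb
    censored = (
        c * stats.norm.cdf(-z) + xb * stats.norm.cdf(z) + sigma * stats.norm.pdf(z)
    )
    return latent, censored

# Illustrative inputs: intercept plus one regressor
X_new = np.array([[1.0, -2.0], [1.0, 0.0], [1.0, 2.0]])
beta = np.array([0.0, 1.0])
latent, censored = tobit_predictions(X_new, beta, sigma=1.0, c=0.0)

print("latent:  ", np.round(latent, 4))
print("censored:", np.round(censored, 4))
```
The censored prediction never falls below the censoring point, and it approaches the latent prediction as \(X'\beta\) moves far above \(c\).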
Marginal Effects¶
In nonlinear models, coefficients \(\beta\) do not directly represent marginal effects. PanelBox computes three types of marginal effects:
```python
# Average Marginal Effects (AME) on conditional mean
ame = result.marginal_effects(at="overall", which="conditional")

# Marginal Effects at Means (MEM) on probability
mem = result.marginal_effects(at="mean", which="probability")
```
See Marginal Effects for Censored Models for details.
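As a rough guide to what these quantities measure, the sketch below evaluates the standard textbook formulas for a Tobit left-censored at \(c\) (see Wooldridge, 2010, Chapter 17). With \(z = (X'\beta - c)/\sigma\) and \(\lambda(z) = \phi(z)/\Phi(z)\), the effect on \(E[y \mid X]\) is \(\beta_j \Phi(z)\), the effect on \(E[y \mid X, y > c]\) is \(\beta_j [1 - \lambda(z)(z + \lambda(z))]\), and the effect on \(P(y > c \mid X)\) is \(\beta_j \phi(z)/\sigma\). The helper `tobit_marginal_effects` and its dictionary keys are hypothetical names for this example; PanelBox's own computation may differ in detail.
```python
import numpy as np
from scipy import stats

def tobit_marginal_effects(xb, beta_j, sigma, c=0.0):
    """Marginal effects of regressor j for a Tobit left-censored at c."""
    z = (np.asarray(xb, dtype=float) - c) / sigma
    lam = stats.norm.pdf(z) / stats.norm.cdf(z)  # inverse Mills ratio
    return {
        "unconditional": beta_j * stats.norm.cdf(z),          # on E[y | X]
        "conditional": beta_j * (1.0 - lam * (z + lam)),      # on E[y | X, y > c]
        "probability": beta_j * stats.norm.pdf(z) / sigma,    # on P(y > c | X)
    }

# Illustrative linear predictions for four observations
xb = np.array([-0.5, 0.2, 1.0, 1.8])

# Average marginal effects: evaluate per observation, then average
me = tobit_marginal_effects(xb, beta_j=0.5, sigma=1.0)
ame = {name: values.mean() for name, values in me.items()}

# Marginal effects at the mean: evaluate once at the average linear prediction
mem = tobit_marginal_effects(xb.mean(), beta_j=0.5, sigma=1.0)

for name in me:
    print(f"{name:>13}: AME = {ame[name]:.4f}, MEM = {float(mem[name]):.4f}")
```
All three effects are smaller in magnitude than \(\beta_j\) itself, which is the effect on the latent mean.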
Configuration Options¶
PooledTobit Parameters¶
| Parameter | Type | Default | Description |
|---|---|---|---|
| endog | array-like | required | Dependent variable |
| exog | array-like | required | Regressors |
| groups | array-like | None | Entity IDs |
| censoring_point | float | 0.0 | Censoring threshold |
| censoring_type | str | "left" | "left", "right", or "both" |
| lower_limit | float | None | Lower bound (for "both") |
| upper_limit | float | None | Upper bound (for "both") |
RandomEffectsTobit Parameters¶
| Parameter | Type | Default | Description |
|---|---|---|---|
| endog | array-like | required | Dependent variable |
| exog | array-like | required | Regressors |
| groups | array-like | required | Entity IDs |
| time | array-like | None | Time IDs |
| censoring_point | float | 0.0 | Censoring threshold |
| censoring_type | str | "left" | "left", "right", or "both" |
| lower_limit | float | None | Lower bound (for "both") |
| upper_limit | float | None | Upper bound (for "both") |
| quadrature_points | int | 12 | Gauss-Hermite quadrature points |
fit() Parameters¶
| Parameter | Type | Default | Description |
|---|---|---|---|
| start_params | array-like | None | Starting values (auto-computed from OLS if None) |
| method | str | "BFGS" | Optimization method |
| maxiter | int | 1000 | Maximum iterations |
Diagnostics¶
Percentage Censored¶
```python
# Count observations sitting at the censoring point (0.0 here)
n_censored = np.sum(np.abs(y - 0.0) < 1e-10)
pct_censored = n_censored / len(y) * 100
print(f"Censored observations: {n_censored} ({pct_censored:.1f}%)")
```
Censoring rate
Check the censoring rate before relying on Tobit estimates. If it is very high (>80%), the data carry little information about the latent variable and the estimates can be imprecise. If it is very low (<5%), censoring affects few observations and OLS may be an adequate approximation.
Comparing Pooled vs. RE Tobit¶
```python
# Compare log-likelihoods
print(f"Pooled Tobit LL: {pooled_result.llf:.2f}")
print(f"RE Tobit LL: {re_result.llf:.2f}")

# LR test for random effects (H0: sigma_alpha = 0)
lr_stat = 2 * (re_result.llf - pooled_result.llf)
print(f"LR statistic: {lr_stat:.2f}")
# Compare with chi-squared(1) critical value; this is conservative because
# sigma_alpha = 0 lies on the boundary of the parameter space
```
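Because the null hypothesis \(\sigma_\alpha = 0\) lies on the boundary of the parameter space, the LR statistic asymptotically follows a 50:50 mixture of a point mass at zero and a \(\chi^2(1)\) distribution (Stata's xttobit reports this as chibar2(01)). The sketch below computes the corresponding p-value with SciPy. The helper `lr_pvalue_boundary` is a hypothetical name, and the log-likelihood values are placeholders chosen for illustration, not results from any fitted model.
```python
from scipy import stats

def lr_pvalue_boundary(ll_restricted, ll_unrestricted):
    """LR test of sigma_alpha = 0 using the 0.5*chi2(0) + 0.5*chi2(1) mixture.

    Returns (lr_stat, p_value). The statistic is floored at zero because a
    tiny negative value can occur from numerical error in the optimizer.
    """
    lr_stat = max(0.0, 2.0 * (ll_unrestricted - ll_restricted))
    if lr_stat == 0.0:
        return lr_stat, 1.0
    return lr_stat, 0.5 * stats.chi2.sf(lr_stat, df=1)

# Placeholder log-likelihoods (in practice: pooled_result.llf, re_result.llf)
lr_stat, p_value = lr_pvalue_boundary(ll_restricted=-812.4, ll_unrestricted=-798.1)
print(f"LR statistic: {lr_stat:.2f}, boundary-adjusted p-value: {p_value:.4g}")
```
A small p-value favors the RE Tobit over the pooled specification.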
Tutorials¶
| Tutorial | Description | Link |
|---|---|---|
| Censored Models | Full walkthrough of Tobit estimation |
See Also¶
- Honore Trimmed Estimator -- Fixed effects estimation for censored data
- Panel Heckman -- Sample selection models
- Marginal Effects for Censored Models -- Interpreting nonlinear effects
- Murphy-Topel Correction -- Correcting SEs in two-step estimators
References¶
- Tobin, J. (1958). Estimation of relationships for limited dependent variables. Econometrica, 26(1), 24-36.
- Amemiya, T. (1984). Tobit models: A survey. Journal of Econometrics, 24(1-2), 3-61.
- Wooldridge, J. M. (2010). Econometric Analysis of Cross Section and Panel Data (2nd ed.). MIT Press. Chapter 17.