Skip to content

Tobit Models

Quick Reference

Classes: panelbox.models.censored.PooledTobit, panelbox.models.censored.RandomEffectsTobit Import: from panelbox.models.censored import PooledTobit, RandomEffectsTobit Stata equivalent: tobit (pooled), xttobit (RE) R equivalent: censReg::censReg(), pglm::pglm()

Overview

The Tobit model (Tobin, 1958) is the standard approach for regression with censored dependent variables -- outcomes that are observed at a boundary value rather than their true latent value. Common examples include hours worked (censored at zero), expenditure on durable goods, and insurance claims.

The key insight is that the observed outcome is a censored version of a latent variable:

\[y_{it}^* = X_{it}'\beta + \varepsilon_{it}\]
\[y_{it} = \max(c, y_{it}^*)\]

where \(c\) is the censoring point. Standard OLS ignores the pile-up of observations at the boundary, producing biased and inconsistent estimates. The Tobit model accounts for censoring by modeling the likelihood of being at the boundary versus being uncensored.

PanelBox provides two Tobit specifications: PooledTobit (ignores panel structure) and RandomEffectsTobit (accounts for entity-level heterogeneity via random effects with Gauss-Hermite quadrature integration).

Quick Example

import numpy as np
from panelbox.models.censored import PooledTobit

# Simulate censored data
np.random.seed(42)
n = 500
X = np.column_stack([np.ones(n), np.random.randn(n, 2)])
y_star = X @ np.array([1.0, 0.5, -0.3]) + np.random.randn(n)
y = np.maximum(0, y_star)  # Left-censored at 0
groups = np.repeat(np.arange(50), 10)

# Fit Pooled Tobit
model = PooledTobit(endog=y, exog=X, groups=groups, censoring_point=0.0)
result = model.fit(method="BFGS")
print(result.summary())

When to Use

  • Your dependent variable is censored at a known boundary (e.g., 0, 100)
  • You observe the boundary value for censored observations (not missing)
  • You want to estimate effects on the latent (uncensored) outcome
  • The censoring mechanism is exogenous (not related to unobservables)

Key Assumptions

  • Normality: errors \(\varepsilon_{it} \sim N(0, \sigma^2)\) (pooled) or \(\varepsilon_{it} \sim N(0, \sigma_\varepsilon^2)\) (RE)
  • Exogenous censoring: the censoring point is fixed and known
  • Linearity: the latent variable is linear in \(X\)
  • RE Tobit additionally assumes: \(\alpha_i \sim N(0, \sigma_\alpha^2)\), independent of \(X\)

Censoring vs. Truncation

It is important to distinguish censoring from truncation and sample selection:

Problem What happens Observed data Model
Censoring \(y^*\) is limited to \([c, \infty)\); observe \(y = \max(c, y^*)\) All observations, some at boundary Tobit
Truncation Observations with \(y^* \leq c\) are dropped entirely Only uncensored observations Truncated regression
Selection \(y\) observed only if a separate condition holds Outcome missing for some Heckman

With censoring, you still observe the censored value (e.g., zero hours worked). With truncation, those observations are completely absent from the data.

Detailed Guide

Censoring Types

PanelBox supports three types of censoring via the censoring_type parameter:

Type Formula Example
"left" (default) \(y = \max(c, y^*)\) Hours worked \(\geq 0\)
"right" \(y = \min(c, y^*)\) Test scores \(\leq 100\)
"both" \(y = \max(l, \min(u, y^*))\) Satisfaction score in \([1, 5]\)

PooledTobit

The Pooled Tobit ignores the panel structure, treating all observations as independent. It is suitable when entity-level heterogeneity is not a concern or as a baseline model.

from panelbox.models.censored import PooledTobit

model = PooledTobit(
    endog=y,                    # Dependent variable (censored)
    exog=X,                     # Regressors (n x k)
    groups=entity,              # Entity IDs (for clustered SEs)
    censoring_point=0.0,        # Censoring threshold
    censoring_type="left",      # 'left', 'right', or 'both'
)
result = model.fit(method="BFGS", maxiter=1000)

Key attributes after fitting:

Attribute Description
result.beta Coefficient vector \(\hat{\beta}\)
result.sigma Error standard deviation \(\hat{\sigma}\)
result.llf Log-likelihood value
result.bse Standard errors
result.converged Whether optimization converged

RandomEffectsTobit

The RE Tobit adds entity-specific random effects to account for unobserved heterogeneity:

\[y_{it}^* = X_{it}'\beta + \alpha_i + \varepsilon_{it}\]

where \(\alpha_i \sim N(0, \sigma_\alpha^2)\) and \(\varepsilon_{it} \sim N(0, \sigma_\varepsilon^2)\). The random effect is integrated out using Gauss-Hermite quadrature.

from panelbox.models.censored import RandomEffectsTobit

model = RandomEffectsTobit(
    endog=y,                    # Dependent variable
    exog=X,                     # Regressors
    groups=entity,              # Entity IDs
    time=time,                  # Time IDs
    censoring_point=0.0,        # Censoring threshold
    censoring_type="left",      # Censoring type
    quadrature_points=12,       # Integration accuracy
)
result = model.fit(method="BFGS", maxiter=1000)

Additional attributes for RE Tobit:

Attribute Description
result.sigma_eps Idiosyncratic error SD \(\hat{\sigma}_\varepsilon\)
result.sigma_alpha Random effect SD \(\hat{\sigma}_\alpha\)

Quadrature points

The quadrature_points parameter controls the accuracy of the numerical integration over the random effect distribution. Higher values (e.g., 20) increase accuracy but slow computation. The default of 12 is adequate for most applications.

Double Censoring

For outcomes censored at both ends, use censoring_type="both" with explicit limits:

model = PooledTobit(
    endog=y,
    exog=X,
    groups=entity,
    censoring_type="both",
    lower_limit=1.0,        # Lower bound
    upper_limit=5.0,        # Upper bound
)
result = model.fit()

Predictions

Tobit models offer three types of predictions, each with a different interpretation:

# E[y*|X] = X'beta (ignores censoring)
y_latent = result.predict(pred_type="latent")

The expected value of the latent variable, as if there were no censoring. This can produce values below the censoring point.

# E[y|X] accounting for censoring
y_censored = result.predict(pred_type="censored")

The expected value of the observed (censored) outcome. For left censoring at \(c\):

\[E[y|X] = X'\beta \cdot \Phi\!\left(\frac{X'\beta - c}{\sigma}\right) + c \cdot \left[1 - \Phi\!\left(\frac{X'\beta - c}{\sigma}\right)\right] + \sigma \cdot \phi\!\left(\frac{X'\beta - c}{\sigma}\right)\]
# P(y > c | X) — probability of being uncensored (PooledTobit only)
p_uncensored = result.predict(pred_type="probability")

The probability that the observation is uncensored. Only available for PooledTobit.

Marginal Effects

In nonlinear models, coefficients \(\beta\) do not directly represent marginal effects. PanelBox computes three types of marginal effects:

# Average Marginal Effects (AME) on conditional mean
ame = result.marginal_effects(at="overall", which="conditional")

# Marginal Effects at Means (MEM) on probability
mem = result.marginal_effects(at="mean", which="probability")

See Marginal Effects for Censored Models for details.

Configuration Options

PooledTobit Parameters

Parameter Type Default Description
endog array-like required Dependent variable
exog array-like required Regressors
groups array-like None Entity IDs
censoring_point float 0.0 Censoring threshold
censoring_type str "left" "left", "right", or "both"
lower_limit float None Lower bound (for "both")
upper_limit float None Upper bound (for "both")

RandomEffectsTobit Parameters

Parameter Type Default Description
endog array-like required Dependent variable
exog array-like required Regressors
groups array-like required Entity IDs
time array-like None Time IDs
censoring_point float 0.0 Censoring threshold
censoring_type str "left" "left", "right", or "both"
lower_limit float None Lower bound (for "both")
upper_limit float None Upper bound (for "both")
quadrature_points int 12 Gauss-Hermite quadrature points

fit() Parameters

Parameter Type Default Description
start_params array-like None Starting values (auto-computed from OLS if None)
method str "BFGS" Optimization method
maxiter int 1000 Maximum iterations

Diagnostics

Percentage Censored

n_censored = np.sum(np.abs(y - 0.0) < 1e-10)
pct_censored = n_censored / len(y) * 100
print(f"Censored observations: {n_censored} ({pct_censored:.1f}%)")

Censoring rate

If the censoring rate is very high (>80%) or very low (<5%), the Tobit model may be poorly identified. Very high censoring means little information about the latent variable, and very low censoring means OLS may be adequate.

Comparing Pooled vs. RE Tobit

# Compare log-likelihoods
print(f"Pooled Tobit LL: {pooled_result.llf:.2f}")
print(f"RE Tobit LL:     {re_result.llf:.2f}")

# LR test for random effects
lr_stat = 2 * (re_result.llf - pooled_result.llf)
print(f"LR statistic: {lr_stat:.2f}")
# Compare with chi-squared(1) critical value

Tutorials

Tutorial Description Link
Censored Models Full walkthrough of Tobit estimation Open In Colab

See Also

References

  • Tobin, J. (1958). Estimation of relationships for limited dependent variables. Econometrica, 26(1), 24-36.
  • Amemiya, T. (1984). Tobit models: A survey. Journal of Econometrics, 24(1-2), 3-61.
  • Wooldridge, J. M. (2010). Econometric Analysis of Cross Section and Panel Data (2nd ed.). MIT Press. Chapter 17.