Censored & Selection API Reference¶

Module

Import:

from panelbox.models.censored import PooledTobit, RandomEffectsTobit, HonoreTrimmedEstimator
from panelbox.models.selection import PanelHeckman, compute_imr

Source: panelbox/models/censored/, panelbox/models/selection/

Overview¶

These modules handle two related problems in panel data:

Censored data: The dependent variable is observed only within a range; values beyond a limit are recorded at that limit (e.g., wages censored at zero).
Sample selection: The outcome is observed only for a non-random subset of observations (e.g., wages observed only for workers).
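The difference between the two problems can be illustrated with a toy sketch. The helper names `censor_left` and `select` and the numbers are hypothetical and are not part of panelbox.

```python
def censor_left(latent, limit=0.0):
    """Left-censoring: values below `limit` are recorded as `limit`."""
    return max(latent, limit)


def select(latent, selected):
    """Sample selection: the outcome is missing (None) unless selected."""
    return latent if selected else None


# Hypothetical latent outcomes and selection indicators (1 = observed).
latent = [-1.2, 0.4, 2.0, -0.3]
employed = [0, 1, 1, 0]

# Censoring keeps every row but piles values up at the limit.
censored = [censor_left(v) for v in latent]

# Selection drops the outcome entirely for unselected rows.
observed = [select(v, s) for v, s in zip(latent, employed)]

print(censored)
print(observed)
```

With censoring, every observation still carries a recorded value for the dependent variable; with selection, the outcome is absent for part of the sample.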

Model	Problem	Reference
`PooledTobit`	Left/right/two-sided censoring	Tobin (1958)
`RandomEffectsTobit`	Censoring with random effects	—
`HonoreTrimmedEstimator`	Semiparametric FE Tobit	Honore (1992)
`PanelHeckman`	Sample selection bias correction	Heckman (1979), Wooldridge (1995)

Censored Models¶

PooledTobit¶

Tobit model for censored dependent variables. Handles left censoring (at zero), right censoring, or both.
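As a rough illustration of what a Tobit fit maximizes, the sketch below evaluates the textbook log-likelihood for left censoring using only the standard library. The function `tobit_loglik_left` is hypothetical, and it is an assumption that `PooledTobit` uses this same form internally.

```python
import math


def norm_logpdf(z):
    """Log density of the standard normal distribution."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def tobit_loglik_left(y, xb, sigma, limit=0.0):
    """Log-likelihood of a left-censored Tobit.

    y     : observed (censored) outcomes
    xb    : linear predictions x'beta, one per observation
    sigma : standard deviation of the latent error
    """
    total = 0.0
    for yi, mu in zip(y, xb):
        if yi <= limit:
            # Censored row: we only know the latent value fell at or below the limit.
            total += math.log(norm_cdf((limit - mu) / sigma))
        else:
            # Uncensored row: ordinary normal density of the residual.
            total += norm_logpdf((yi - mu) / sigma) - math.log(sigma)
    return total


ll = tobit_loglik_left(y=[0.0, 1.0], xb=[0.0, 1.0], sigma=1.0)
```

Censored rows contribute a probability and uncensored rows contribute a density, which is why running OLS on the censored variable does not target the same parameters.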

Constructor¶

PooledTobit(
    endog: np.ndarray,
    exog: np.ndarray,
    groups: np.ndarray,
    time: np.ndarray | None = None,
    censoring_point: float = 0.0,
    censoring_type: Literal["left", "right", "both"] = "left",
    lower_limit: float | None = None,
    upper_limit: float | None = None,
)

Parameter	Type	Default	Description
`endog`	`np.ndarray`	required	Censored dependent variable
`exog`	`np.ndarray`	required	Independent variables
`groups`	`np.ndarray`	required	Entity identifiers
`time`	`np.ndarray | None`	`None`	Time identifiers
`censoring_point`	`float`	`0.0`	Censoring threshold
`censoring_type`	`str`	`"left"`	Type: `"left"`, `"right"`, or `"both"`
`lower_limit`	`float | None`	`None`	Lower censoring limit (for `"both"`)
`upper_limit`	`float | None`	`None`	Upper censoring limit (for `"both"`)

Example¶

from panelbox.models.censored import PooledTobit

tobit = PooledTobit(
    endog=df["hours_worked"].values,
    exog=df[["wage", "education", "age"]].values,
    groups=df["person_id"].values,
    censoring_point=0.0,
    censoring_type="left",
)
result = tobit.fit()
result.summary()

RandomEffectsTobit¶

Random effects Tobit model using Gauss-Hermite quadrature.

Constructor¶

RandomEffectsTobit(
    endog: np.ndarray,
    exog: np.ndarray,
    groups: np.ndarray,
    time: np.ndarray | None = None,
    censoring_point: float = 0.0,
    censoring_type: Literal["left", "right", "both"] = "left",
    lower_limit: float | None = None,
    upper_limit: float | None = None,
    quadrature_points: int = 12,
)

Parameter	Type	Default	Description
`quadrature_points`	`int`	`12`	Number of quadrature points for integration

All other parameters are the same as PooledTobit.
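To show what `quadrature_points` controls, here is a minimal sketch of Gauss-Hermite quadrature for an expectation over a normally distributed random effect. The helper `normal_expectation` is hypothetical. In a random effects Tobit the integrand would be the product of an entity's per-period likelihood contributions given the random effect; how panelbox arranges that computation is not shown here.

```python
import numpy as np


def normal_expectation(f, sigma, n_points=12):
    """Approximate E[f(a)] for a ~ N(0, sigma^2) with Gauss-Hermite quadrature."""
    nodes, weights = np.polynomial.hermite.hermgauss(n_points)
    # The rule integrates g(x) * exp(-x^2); the substitution
    # a = sqrt(2) * sigma * x maps it onto the N(0, sigma^2) density.
    values = f(np.sqrt(2.0) * sigma * nodes)
    return float(np.sum(weights * values) / np.sqrt(np.pi))


# Sanity check on a known moment: E[a^2] equals sigma^2.
second_moment = normal_expectation(lambda a: a**2, sigma=1.5)
print(second_moment)
```

More points give a more accurate integral at a higher cost per likelihood evaluation, since the integrand is evaluated once per node for every entity.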

HonoreTrimmedEstimator¶

Semiparametric fixed effects estimator for censored panel data (Honore 1992). Does not require distributional assumptions on the error term or the fixed effects.

Constructor¶

HonoreTrimmedEstimator(
    endog: np.ndarray,
    exog: np.ndarray,
    groups: np.ndarray,
    time: np.ndarray,
    censoring_point: float = 0.0,
)

Parameter	Type	Default	Description
`endog`	`np.ndarray`	required	Censored dependent variable
`exog`	`np.ndarray`	required	Independent variables
`groups`	`np.ndarray`	required	Entity identifiers
`time`	`np.ndarray`	required	Time identifiers
`censoring_point`	`float`	`0.0`	Censoring threshold

When to use Honore

Use when you need fixed effects with censoring but want to avoid the incidental parameters problem. Requires at least two periods per entity (T ≥ 2); the estimator uses pairwise comparisons across time periods.
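The pairwise structure can be sketched as follows. This only builds the within-entity pairs of observations and is not the estimator itself. The helper `within_entity_pairs` is hypothetical, and whether panelbox uses all pairs or only adjacent periods is not stated here.

```python
from itertools import combinations


def within_entity_pairs(groups, time):
    """Return row-index pairs (i, j) for the same entity in different periods."""
    rows_by_entity = {}
    for idx, (g, t) in enumerate(zip(groups, time)):
        rows_by_entity.setdefault(g, []).append((t, idx))

    pairs = []
    for rows in rows_by_entity.values():
        rows.sort()  # order each entity's rows by time
        pairs.extend((i, j) for (_, i), (_, j) in combinations(rows, 2))
    return pairs


# Entity 2 has a single period, so it contributes no pairs.
pairs = within_entity_pairs(
    groups=[1, 1, 1, 2, 3, 3],
    time=[1, 2, 3, 1, 1, 2],
)
print(pairs)
```

Entities observed in only one period produce no pairs, which is why at least two periods per entity are needed.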

Selection Models¶

PanelHeckman¶

Panel Heckman selection model correcting for non-random sample selection bias. Supports two-step (Wooldridge 1995) and maximum likelihood estimation.

Constructor¶

PanelHeckman(
    endog: np.ndarray,
    exog: np.ndarray,
    selection: np.ndarray,
    exog_selection: np.ndarray,
    entity: np.ndarray | None = None,
    time: np.ndarray | None = None,
    method: Literal["two_step", "mle"] = "two_step",
)

Parameter	Type	Default	Description
`endog`	`np.ndarray`	required	Outcome variable (observed only when selected)
`exog`	`np.ndarray`	required	Outcome equation regressors
`selection`	`np.ndarray`	required	Selection indicator (1=selected, 0=not)
`exog_selection`	`np.ndarray`	required	Selection equation regressors (should include exclusion restriction)
`entity`	`np.ndarray | None`	`None`	Entity identifiers
`time`	`np.ndarray | None`	`None`	Time identifiers
`method`	`str`	`"two_step"`	`"two_step"` (two-step estimator; Heckman 1979, Wooldridge 1995) or `"mle"` (full-information maximum likelihood)

Exclusion restriction

The selection equation (exog_selection) should include at least one variable not in the outcome equation (exog). Without an exclusion restriction, identification relies solely on functional form.
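A quick pre-fit check on column names can catch a missing exclusion restriction. The helper `excluded_instruments` below is a hypothetical convenience and is not part of panelbox; the column names follow the wage example used on this page.

```python
def excluded_instruments(outcome_cols, selection_cols):
    """Columns in the selection equation that are absent from the outcome equation."""
    outcome = set(outcome_cols)
    return [c for c in selection_cols if c not in outcome]


outcome_cols = ["education", "experience"]
selection_cols = ["education", "experience", "children", "spouse_income"]

excluded = excluded_instruments(outcome_cols, selection_cols)
if excluded:
    print("Exclusion restriction candidates:", excluded)
else:
    print("Warning: identification relies on functional form only.")
```

The check is purely mechanical. Whether an excluded variable is a valid instrument (it affects selection but not the outcome) remains a substantive modeling judgment.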

Example¶

from panelbox.models.selection import PanelHeckman

heckman = PanelHeckman(
    endog=df["wage"].values,
    exog=df[["education", "experience"]].values,
    selection=df["employed"].values,
    exog_selection=df[["education", "experience", "children", "spouse_income"]].values,
    entity=df["person_id"].values,
    time=df["year"].values,
    method="two_step",
)
result = heckman.fit()
result.summary()

PanelHeckmanResult¶

Result container for Heckman selection models.

Key Attributes¶

Attribute	Type	Description
`outcome_params`	`pd.Series`	Outcome equation coefficients
`probit_params`	`pd.Series`	Selection equation (probit) coefficients
`sigma`	`float`	Standard deviation of outcome error
`rho`	`float`	Correlation between errors (selection and outcome)
`lambda_imr`	`float`	Coefficient on Inverse Mills Ratio

Methods¶

.summary() — Full results with both equations
.predict(type="unconditional") — Predictions ("unconditional" or "conditional")
.selection_effect() — Test for selection bias (H0: rho = 0)
.imr_diagnostics() — Inverse Mills Ratio diagnostics
.compare_ols_heckman() — Compare with naive OLS (quantify selection bias)
.plot_imr() — Diagnostic plots for IMR
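The two prediction types can be sketched under the standard Heckman formulas. The functions below are hypothetical. It is an assumption that `"unconditional"` means the latent mean x'β and `"conditional"` means the expected outcome given selection, x'β plus the IMR coefficient times the IMR.

```python
import math


def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def dot(row, coefs):
    return sum(r * c for r, c in zip(row, coefs))


def predict_unconditional(x_row, beta):
    """Latent mean x'beta, ignoring selection."""
    return dot(x_row, beta)


def predict_conditional(x_row, beta, z_row, gamma, lambda_imr):
    """Expected outcome given selection: x'beta + lambda_imr * IMR(z'gamma)."""
    index = dot(z_row, gamma)
    imr = norm_pdf(index) / norm_cdf(index)
    return dot(x_row, beta) + lambda_imr * imr


# Made-up coefficients for illustration only.
uncond = predict_unconditional([1.0, 12.0], [0.5, 0.1])
cond = predict_conditional([1.0, 12.0], [0.5, 0.1], [1.0, 0.0], [0.0, 1.0], lambda_imr=0.3)
```

When the IMR coefficient is zero the two predictions coincide, which is the case of no selection bias.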

Utility Functions¶

compute_imr¶

Compute the Inverse Mills Ratio (IMR).

from panelbox.models.selection import compute_imr

imr = compute_imr(x, params)
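For reference, the IMR is the ratio of the standard normal density to its CDF, evaluated at the selection index. The standalone sketch below uses only the standard library. The names `inverse_mills_ratio` and `imr_from_index` are hypothetical, and it is an assumption that `compute_imr(x, params)` evaluates the ratio at the linear index formed from `x` and `params`.

```python
import math


def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def inverse_mills_ratio(z):
    """lambda(z) = phi(z) / Phi(z).

    Note: this naive form loses accuracy for large negative z,
    where Phi(z) underflows toward zero.
    """
    return norm_pdf(z) / norm_cdf(z)


def imr_from_index(x_row, params):
    """IMR at the linear index x'params for a single observation."""
    z = sum(xi * bi for xi, bi in zip(x_row, params))
    return inverse_mills_ratio(z)


print(inverse_mills_ratio(0.0))
print(imr_from_index([1.0, 2.0], [0.5, 0.25]))
```

The ratio decreases in the index: observations with a high probability of selection receive a small correction term.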

imr_derivative¶

Compute the derivative of the IMR.

from panelbox.models.selection import imr_derivative

d_imr = imr_derivative(x, params)
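The derivative has a closed form in terms of the IMR itself: λ'(z) = −λ(z)(z + λ(z)). The sketch below checks that identity against a finite difference. The function names are hypothetical, and it is an assumption that `imr_derivative` in panelbox returns this quantity.

```python
import math


def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def imr(z):
    return norm_pdf(z) / norm_cdf(z)


def imr_prime(z):
    """Closed-form derivative: lambda'(z) = -lambda(z) * (z + lambda(z))."""
    lam = imr(z)
    return -lam * (z + lam)


# Compare with a central finite difference at an arbitrary point.
z0, h = 0.3, 1e-6
numeric = (imr(z0 + h) - imr(z0 - h)) / (2.0 * h)
analytic = imr_prime(z0)
print(analytic, numeric)
```

The derivative is negative everywhere, consistent with the IMR decreasing in the selection index.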

imr_diagnostics¶

Run diagnostic tests on the IMR.

from panelbox.models.selection import imr_diagnostics

diag = imr_diagnostics(data, selection_results)

test_selection_effect¶

Test whether selection bias is statistically significant (H0: rho = 0).

from panelbox.models.selection import test_selection_effect

test = test_selection_effect(results)
print(f"rho = {test.rho:.4f}, p-value = {test.pvalue:.4f}")
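In a two-step setting, the coefficient on the IMR equals rho times sigma in the standard Heckman model, so testing that coefficient against zero is a common way to test H0: rho = 0. The sketch below shows a two-sided Wald-type test on a single coefficient. The function `wald_test_zero` and the numbers are hypothetical, and the statistic panelbox actually computes may differ.

```python
import math


def wald_test_zero(estimate, std_error):
    """Two-sided z-test of H0: coefficient = 0 using the normal approximation."""
    z = estimate / std_error
    # 2 * (1 - Phi(|z|)) written with the complementary error function.
    pvalue = math.erfc(abs(z) / math.sqrt(2.0))
    return z, pvalue


# Made-up IMR coefficient and standard error for illustration only.
z_stat, pvalue = wald_test_zero(estimate=0.42, std_error=0.15)
print(f"z = {z_stat:.2f}, p-value = {pvalue:.4f}")
```

Standard errors from a naive second-step regression ignore the fact that the IMR is itself estimated, so the standard error passed to such a test should come from a variance estimator that accounts for the first step.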

Complete Heckman Workflow¶

from panelbox.models.selection import PanelHeckman, test_selection_effect

# Step 1: Estimate Heckman model
heckman = PanelHeckman(
    endog=df["wage"].values,
    exog=df[["education", "experience"]].values,
    selection=df["employed"].values,
    exog_selection=df[["education", "experience", "children"]].values,
    entity=df["person_id"].values,
    method="two_step",
)
result = heckman.fit()

# Step 2: Check for selection bias
test = test_selection_effect(result)
if test.pvalue < 0.05:
    print("Significant selection bias detected!")
    print(f"rho = {result.rho:.4f}")

# Step 3: Compare with naive OLS
bias = result.compare_ols_heckman()

# Step 4: View full results
result.summary()
