Count Data Models

Count data models are designed for non-negative integer outcomes: number of patents filed, hospital admissions, trade flows, crime incidents, etc. Standard linear regression is inappropriate for count data because it can predict negative values and ignores the discrete, non-negative nature of the outcome. Panel count models address these issues while accounting for unobserved entity heterogeneity.
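The first objection to linear regression can be seen on a toy dataset. The sketch below fits an ordinary least-squares line to a small, made-up set of counts whose mean grows exponentially in `x`. It uses only NumPy, and the data and variable names are illustrative assumptions rather than a PanelBox example.

```python
import numpy as np

# Hypothetical data: counts whose mean grows exponentially in x.
x = np.linspace(-3.0, 3.0, 13)
y = np.round(np.exp(x))          # non-negative, integer-valued outcomes

# Ordinary least squares with an intercept.
X = np.column_stack([np.ones_like(x), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
ols_pred = X @ coef

# The exponential mean used by Poisson-type models is positive by construction.
exp_pred = np.exp(x)

print("smallest OLS prediction:", ols_pred.min())
print("smallest exp-mean prediction:", exp_pred.min())
```

The fitted line dips below zero at the low end of `x`, while the exponential mean stays strictly positive for every value of `x`.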

PanelBox provides 11 count data estimators covering Poisson, Negative Binomial, PPML (for trade/gravity models), and Zero-Inflated specifications with pooled, fixed effects, random effects, and quasi-maximum likelihood options.

Available Models

Poisson Models

Model	Class	Estimation	Key Feature
Pooled Poisson	`PooledPoisson`	MLE + Cluster-robust	Baseline count model
Poisson FE	`PoissonFixedEffects`	Conditional MLE	Consistent with entity effects
Poisson RE	`RandomEffectsPoisson`	MLE	Gamma-distributed RE
Poisson QML	`PoissonQML`	Quasi-MLE	Robust to overdispersion

Negative Binomial Models

Model	Class	Estimation	Key Feature
Negative Binomial	`NegativeBinomial`	MLE (NB2)	Handles overdispersion
NB Fixed Effects	`FixedEffectsNegativeBinomial`	Conditional MLE	FE + overdispersion
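
The NB2 label in the table refers to the quadratic variance function described in Cameron and Trivedi (2013). As a reference sketch, with \(\mu = E(y \mid x)\) and a dispersion parameter \(\alpha \ge 0\) (the symbol \(\alpha\) is notation chosen here, not a PanelBox parameter name):

\[
\text{Var}(y \mid x) = \mu + \alpha \mu^{2}
\]

Setting \(\alpha = 0\) recovers the Poisson equality of mean and variance; any \(\alpha > 0\) makes the variance exceed the mean.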

PPML (Gravity Models)

Model	Class	Estimation	Key Feature
PPML	`PPML`	Pseudo-MLE	Trade/gravity; handles zeros

Zero-Inflated Models

Model	Class	Estimation	Key Feature
Zero-Inflated Poisson	`ZeroInflatedPoisson`	EM/MLE	Excess zeros (structural)
Zero-Inflated NB	`ZeroInflatedNegativeBinomial`	EM/MLE	Excess zeros + overdispersion

Quick Example

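from panelbox.models.count import PoissonFixedEffects

# data: long-format panel data with one row per firm-year;
# "firm" and "year" are the entity and time identifier columns.
model = PoissonFixedEffects(
    "patents ~ rd_spending + employees",
    data, "firm", "year"
)
results = model.fit()
print(results.summary())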

Key Concepts

Overdispersion

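The Poisson model assumes \(\text{Var}(y) = E(y)\) (equidispersion). In practice, count data often exhibit overdispersion: \(\text{Var}(y) > E(y)\). Provided the conditional mean is correctly specified, overdispersion does not bias the Poisson coefficient estimates, but it makes the default MLE standard errors too small, so valid inference requires robust standard errors or a model that allows for the extra variance.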

Solutions:

Approach	Implementation	When to Use
Cluster-robust SEs	`PooledPoisson` with cluster-robust SEs	Mild overdispersion
Quasi-MLE	`PoissonQML`	Moderate overdispersion
Negative Binomial	`NegativeBinomial`	Strong overdispersion
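
Before choosing among these approaches, it can help to gauge how far the data depart from equidispersion. A minimal sketch, assuming nothing beyond NumPy: the helper `dispersion_ratio` and the sample counts are hypothetical, and the ratio is an unconditional first look rather than a formal test, since covariates may explain part of the variance.

```python
import numpy as np

def dispersion_ratio(counts):
    """Sample variance divided by sample mean; about 1 under equidispersion."""
    counts = np.asarray(counts, dtype=float)
    return counts.var(ddof=1) / counts.mean()

# Hypothetical patent counts for ten firm-years (illustrative only).
patents = [0, 0, 1, 0, 5, 12, 0, 2, 0, 30]
print(dispersion_ratio(patents))  # values well above 1 suggest overdispersion
```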

Excess Zeros

When the data contains more zeros than the Poisson or NB distributions predict, a zero-inflated model separates the zero-generating process from the count process:

from panelbox.models.count import ZeroInflatedPoisson

model = ZeroInflatedPoisson(
    "patents ~ rd_spending + employees",
    data, "firm", "year",
    inflate_formula="~ small_firm + new_entrant"  # Predicts excess zeros
)
results = model.fit()
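
To see what separating the two processes means, the sketch below writes out the zero-inflated Poisson probability mass function by hand. The names `zip_pmf`, `mu`, and `pi` are assumptions made for illustration; in a regression setting `pi` typically depends on covariates through a logit link, which is the role `inflate_formula` appears to play in the example above.

```python
import math

def zip_pmf(k, mu, pi):
    """P(y = k) under a zero-inflated Poisson.

    mu: mean of the Poisson count process
    pi: probability of a structural (always-zero) observation
    """
    poisson = math.exp(-mu) * mu**k / math.factorial(k)
    if k == 0:
        # Zeros arise from either the structural process or the count process.
        return pi + (1.0 - pi) * poisson
    return (1.0 - pi) * poisson

mu, pi = 2.0, 0.3
print("P(y=0), plain Poisson:", math.exp(-mu))
print("P(y=0), zero-inflated:", zip_pmf(0, mu, pi))
```

With any `pi > 0`, the zero-inflated model assigns more probability to zero than a Poisson with the same `mu`.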

PPML for Gravity Models

The Poisson Pseudo-Maximum Likelihood (PPML) estimator is the standard approach for gravity models in international trade. It keeps zero trade flows in the sample and, as long as the conditional mean is correctly specified, yields consistent estimates under heteroskedasticity:

from panelbox.models.count import PPML

model = PPML(
    "trade ~ log_gdp_origin + log_gdp_dest + log_distance",
    data, "pair", "year"
)
results = model.fit()

PPML advantage

Unlike log-linear OLS (`log(trade) ~ ...`), PPML handles zero trade flows naturally and is consistent under heteroskedasticity (Santos Silva & Tenreyro, 2006).
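
The mechanics behind this advantage can be sketched without PanelBox. The function `fit_poisson_pml` below is a hand-rolled Newton solver written for illustration only; it is not PanelBox's implementation, and the trade data are made up. It solves the Poisson first-order conditions on the outcome in levels, so zero flows stay in the sample.

```python
import numpy as np

def fit_poisson_pml(X, y, n_iter=50, tol=1e-10):
    """Poisson pseudo-ML via Newton's method (illustrative, not PanelBox's solver)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())             # assumes column 0 is the intercept
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        score = X.T @ (y - mu)             # first-order conditions
        if np.max(np.abs(score)) < tol:
            break
        info = X.T @ (X * mu[:, None])     # negative Hessian of the log-likelihood
        beta = beta + np.linalg.solve(info, score)
    return beta

# Hypothetical trade flows in levels, including two zeros.
log_distance = np.linspace(0.0, 2.0, 9)
trade = np.array([9.0, 7.0, 5.0, 2.0, 3.0, 2.0, 0.0, 1.0, 0.0])
X = np.column_stack([np.ones_like(log_distance), log_distance])

beta = fit_poisson_pml(X, trade)
n_dropped_by_log_ols = int(np.sum(trade == 0))  # log(0) is undefined

print("PPML coefficients:", beta)
print("observations lost under log-linear OLS:", n_dropped_by_log_ols)
```

The outcome does not need to be an integer for these first-order conditions to make sense, which is why the estimator is described as pseudo-maximum likelihood.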

Poisson Fixed Effects

The Poisson FE estimator uses conditional MLE (Hausman, Hall, and Griliches, 1984), conditioning out the fixed effects to avoid the incidental parameters problem:

from panelbox.models.count import PoissonFixedEffects

model = PoissonFixedEffects("count ~ x1 + x2", data, "id", "year")
results = model.fit()
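
The conditioning step can be illustrated directly. Conditional on an entity's total count \(\sum_t y_{it}\), the period counts follow a multinomial distribution with probabilities \(p_{it} = \exp(x_{it}'\beta) / \sum_s \exp(x_{is}'\beta)\), in which the entity effect cancels. The sketch below checks this numerically; the function names and data are hypothetical and this is not PanelBox's internal code.

```python
import numpy as np

def conditional_probs(X_i, beta, entity_effect=0.0):
    """Share of entity i's total count expected in each period."""
    eta = entity_effect + X_i @ beta
    weights = np.exp(eta - eta.max())      # subtract max for numerical stability
    return weights / weights.sum()

def conditional_loglik_i(y_i, X_i, beta):
    """Entity i's conditional log-likelihood, up to a constant free of beta."""
    return float(y_i @ np.log(conditional_probs(X_i, beta)))

# Hypothetical entity observed for four periods with two regressors.
X_i = np.array([[0.1, 1.0], [0.4, 1.2], [0.9, 0.8], [1.3, 1.5]])
y_i = np.array([1.0, 0.0, 3.0, 4.0])
beta = np.array([0.5, -0.2])

p_no_effect = conditional_probs(X_i, beta, entity_effect=0.0)
p_with_effect = conditional_probs(X_i, beta, entity_effect=3.7)
print(np.allclose(p_no_effect, p_with_effect))  # True: the entity effect cancels
print(conditional_loglik_i(y_i, X_i, beta))
```

Because the probabilities do not depend on the entity effect, the effects never need to be estimated, which is how the estimator sidesteps the incidental parameters problem.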

Detailed Guides

Poisson Models -- Pooled, FE, RE, QML
Negative Binomial -- Overdispersion modeling
PPML -- Gravity models and trade
Zero-Inflated Models -- Excess zeros
Marginal Effects -- Interpreting nonlinear count models

Tutorials

See Count Data Tutorial for interactive notebooks with Google Colab.

API Reference

See Count Data API for complete technical reference.

References

Cameron, A. C., & Trivedi, P. K. (2013). Regression Analysis of Count Data (2nd ed.). Cambridge University Press.
Hausman, J. A., Hall, B. H., & Griliches, Z. (1984). Econometric models for count data with an application to the patents-R&D relationship. Econometrica, 52(4), 909-938.
Santos Silva, J. M. C., & Tenreyro, S. (2006). The log of gravity. Review of Economics and Statistics, 88(4), 641-658.
Wooldridge, J. M. (1999). Distribution-free estimation of some nonlinear panel data models. Journal of Econometrics, 90(1), 77-97.