Negative Binomial Regression

Quick Reference

Classes: NegativeBinomial, FixedEffectsNegativeBinomial
Import: from panelbox.models.count import NegativeBinomial, FixedEffectsNegativeBinomial
Stata equivalent: nbreg; xtnbreg, fe
R equivalent: pglm::pglm(family="negbin"), MASS::glm.nb()

Overview

The Negative Binomial (NB) model extends Poisson regression to handle overdispersion, the common situation where the variance of count data exceeds its mean (\(\text{Var}(y) > E[y]\)). Poisson regression assumes equidispersion (\(\text{Var}(y) = E[y]\)), an assumption that real-world count data frequently violate.

PanelBox implements the NB2 parameterization (Cameron and Trivedi, 2013), where variance is a quadratic function of the mean:

\[\text{Var}(y_{it} \mid X_{it}) = \mu_{it} + \alpha \mu_{it}^2\]

where \(\mu_{it} = \exp(X_{it}'\beta)\) and \(\alpha \geq 0\) is the overdispersion parameter. In the limit \(\alpha \to 0\), the model reduces to standard Poisson.
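
To make the variance function concrete, here is a minimal sketch that evaluates \(\mu + \alpha \mu^2\) for a few values of \(\alpha\). The helper name `nb2_variance` is illustrative and is not part of PanelBox.

```python
def nb2_variance(mu: float, alpha: float) -> float:
    """NB2 conditional variance: mu + alpha * mu**2 (illustrative helper)."""
    if mu < 0 or alpha < 0:
        raise ValueError("mu and alpha must be non-negative")
    return mu + alpha * mu ** 2


mu = 4.0
for alpha in (0.0, 0.25, 1.0):
    var = nb2_variance(mu, alpha)
    # alpha = 0 gives Var = mu (Poisson); larger alpha inflates the variance
    print(f"alpha={alpha:.2f}: Var={var:.1f}, Var/mean={var / mu:.2f}")
```

Because the extra term is quadratic in \(\mu\), the variance-to-mean ratio \(1 + \alpha\mu\) grows with the mean, so observations with larger expected counts are the most overdispersed.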

Quick Example

from panelbox.models.count import NegativeBinomial

model = NegativeBinomial(
    endog=data["claims"],
    exog=data[["age", "income", "risk_score"]],
    entity_id=data["policy_id"],
    time_id=data["year"]
)
results = model.fit()

# Overdispersion parameter
print(f"alpha = {results.alpha:.4f}")

# LR test: Poisson vs NB
lr_test = results.lr_test_poisson()
print(f"LR statistic: {lr_test['statistic']:.2f}, p-value: {lr_test['pvalue']:.4f}")
print(lr_test["conclusion"])

When to Use

  • Count data with \(\text{Var}(y) > E[y]\) (overdispersion)
  • Insurance claims, hospital visits, accident counts
  • Patent data, publication counts
  • Any count outcome where Poisson standard errors are too small

Key Assumptions

  • NB2 variance: \(\text{Var}(y) = \mu + \alpha \mu^2\) with \(\alpha \geq 0\)
  • Correct mean specification: \(E[y \mid X] = \exp(X'\beta)\)
  • Independence across entities: observations from different entities are independent
  • No underdispersion: NB cannot handle \(\text{Var}(y) < E[y]\)

Detailed Guide

When Poisson Fails

The Poisson model assumes \(\text{Var}(y) = E[y]\), but in practice the variance typically exceeds the mean. Provided the conditional mean is correctly specified, overdispersion does not bias Poisson coefficient estimates, but it causes:

  • Standard errors that are too small, which inflates t-statistics
  • Confidence intervals that are too narrow, which leads to over-rejection of true null hypotheses
  • Unreliable model selection, because AIC/BIC values computed from a misspecified Poisson likelihood cannot be trusted

from panelbox.models.count import PooledPoisson

# First, fit Poisson to check overdispersion
poisson = PooledPoisson(
    endog=data["claims"],
    exog=data[["age", "income"]],
    entity_id=data["policy_id"],
    time_id=data["year"]
)
pois_results = poisson.fit(se_type="cluster")

# Check variance-to-mean ratio
od_test = pois_results.check_overdispersion()
print(od_test)

NB2 Parameterization

The NB2 model introduces one additional parameter \(\alpha\) to capture overdispersion:

| Quantity | Formula | Poisson (\(\alpha = 0\)) |
| --- | --- | --- |
| Mean | \(\mu = \exp(X'\beta)\) | Same |
| Variance | \(\mu + \alpha \mu^2\) | \(\mu\) |
| Prob(\(y = k\)) | \(\frac{\Gamma(k + 1/\alpha)}{\Gamma(k+1)\Gamma(1/\alpha)} \left(\frac{1/\alpha}{1/\alpha + \mu}\right)^{1/\alpha} \left(\frac{\mu}{1/\alpha + \mu}\right)^k\) | \(e^{-\mu} \mu^k / k!\) |
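
The probability formula can be checked numerically. The sketch below is a standalone illustration using only the standard library, not PanelBox's internal implementation. It evaluates the NB2 log-probability with `math.lgamma`, confirms that the probabilities sum to one with the stated mean and variance, and shows that a very small \(\alpha\) reproduces the Poisson probability. The function names are hypothetical.

```python
import math


def nb2_logpmf(k: int, mu: float, alpha: float) -> float:
    """Log of the NB2 probability Prob(y = k) for alpha > 0."""
    r = 1.0 / alpha  # inverse overdispersion
    return (
        math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
        + r * math.log(r / (r + mu))
        + k * math.log(mu / (r + mu))
    )


def poisson_logpmf(k: int, mu: float) -> float:
    """Log of the Poisson probability e^{-mu} mu^k / k!."""
    return -mu + k * math.log(mu) - math.lgamma(k + 1)


mu, alpha = 3.0, 0.5
# Truncate the infinite support; the tail beyond 400 is negligible here
probs = [math.exp(nb2_logpmf(k, mu, alpha)) for k in range(400)]
total = sum(probs)
mean = sum(k * p for k, p in enumerate(probs))
var = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
print(f"sum={total:.6f}, mean={mean:.4f}, var={var:.4f}")

# As alpha shrinks toward zero, NB2 approaches Poisson
gap = abs(nb2_logpmf(2, mu, 1e-6) - poisson_logpmf(2, mu))
print(f"log-pmf gap at alpha=1e-6: {gap:.2e}")
```

Working on the log scale with `lgamma` avoids overflow in the Gamma functions when \(k\) or \(1/\alpha\) is large.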

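The NB2 model can be derived as a Poisson-Gamma mixture: \(y \mid \lambda \sim \text{Poisson}(\lambda)\), where \(\lambda\) follows a Gamma distribution with shape \(1/\alpha\) and scale \(\alpha\mu\), so that \(E[\lambda] = \mu\) and \(\text{Var}(\lambda) = \alpha\mu^2\).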
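
The mixture representation can be illustrated by simulation. The following sketch uses only the standard library and assumes a Gamma mixing distribution with shape \(1/\alpha\) and scale \(\alpha\mu\) (mean \(\mu\), variance \(\alpha\mu^2\)). The sampler names are hypothetical, and the Poisson draw uses Knuth's simple multiplication method, which is adequate for small to moderate rates only.

```python
import math
import random
from statistics import mean, variance


def sample_poisson(lam: float, rng: random.Random) -> int:
    """Draw from Poisson(lam) with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def sample_nb2(mu: float, alpha: float, rng: random.Random) -> int:
    """Draw lambda ~ Gamma(shape=1/alpha, scale=alpha*mu), then y ~ Poisson(lambda)."""
    lam = rng.gammavariate(1.0 / alpha, alpha * mu)
    return sample_poisson(lam, rng)


rng = random.Random(12345)  # fixed seed for reproducibility
mu, alpha = 3.0, 0.5
draws = [sample_nb2(mu, alpha, rng) for _ in range(20000)]

sample_mean = mean(draws)
sample_var = variance(draws)
print(f"sample mean: {sample_mean:.3f} (theory {mu:.3f})")
print(f"sample var:  {sample_var:.3f} (theory {mu + alpha * mu**2:.3f})")
```

The sample variance should land near \(\mu + \alpha\mu^2\) rather than \(\mu\), which is the overdispersion that the Gamma heterogeneity introduces.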

Estimation

Pooled Negative Binomial

from panelbox.models.count import NegativeBinomial

model = NegativeBinomial(
    endog=data["claims"],
    exog=data[["age", "income", "risk_score"]],
    entity_id=data["policy_id"],
    time_id=data["year"]
)
results = model.fit(method="BFGS", maxiter=1000)

print(results.summary())

Fixed Effects Negative Binomial

The FE NB model (Allison and Waterman, 2002) includes entity dummies in the NB model:

from panelbox.models.count import FixedEffectsNegativeBinomial

model = FixedEffectsNegativeBinomial(
    endog=data["claims"],
    exog=data[["age", "income", "risk_score"]],
    entity_id=data["policy_id"],
    time_id=data["year"]
)
results = model.fit()

FE NB Caveat

The Allison-Waterman FE NB estimator uses a dummy variable approach (LSDV) rather than true conditional ML. With many entities, this can be computationally intensive, and PanelBox will warn if there are more than 100 entities. For large panels, consider Poisson FE with cluster-robust SE as an alternative.

Interpreting Results

Coefficients

As in Poisson, NB coefficients are semi-elasticities:

\[\frac{\partial \ln E[y \mid X]}{\partial x_k} = \beta_k\]

A one-unit increase in \(x_k\) changes \(E[y]\) by approximately \(100 \times \beta_k\) percent when \(\beta_k\) is small; the exact change is \(100 \times (\exp(\beta_k) - 1)\) percent. Exponentiated coefficients give incidence rate ratios (IRR):

import numpy as np

# Coefficients and IRR
for name, coef, se in zip(results.exog_names, results.params_exog, results.se):
    irr = np.exp(coef)
    print(f"{name}: beta = {coef:.4f} (SE = {se:.4f}), IRR = {irr:.4f}")
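
The gap between the approximate and exact percentage interpretations widens as coefficients grow. This short sketch compares the two for a few hypothetical coefficient values that are not taken from any fitted model.

```python
import math


def pct_change_exact(beta: float) -> float:
    """Exact percent change in E[y] for a one-unit increase: 100 * (IRR - 1)."""
    return 100.0 * (math.exp(beta) - 1.0)


def pct_change_approx(beta: float) -> float:
    """Small-coefficient approximation: 100 * beta."""
    return 100.0 * beta


for beta in (0.02, 0.10, 0.50):  # hypothetical coefficients
    print(
        f"beta={beta:.2f}: approx={pct_change_approx(beta):.1f}%, "
        f"exact={pct_change_exact(beta):.1f}%"
    )
```

For coefficients beyond roughly 0.1 in absolute value, reporting the IRR or the exact percent change is the safer choice.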

Overdispersion Parameter

The estimated \(\alpha\) quantifies the degree of overdispersion:

print(f"Overdispersion (alpha): {results.alpha:.4f}")

# Interpretation
if results.alpha < 0.01:
    print("Minimal overdispersion - Poisson may suffice")
elif results.alpha < 1.0:
    print("Moderate overdispersion - NB preferred")
else:
    print("Strong overdispersion - NB strongly preferred")

Testing Poisson vs Negative Binomial

Likelihood Ratio Test

The LR test compares Poisson (\(\alpha = 0\)) against NB (\(\alpha > 0\)):

\[LR = 2(\ell_{NB} - \ell_{\text{Poisson}}) \sim \bar{\chi}^2(1)\]

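Under the null hypothesis, the statistic follows a 50:50 mixture of \(\chi^2(0)\) (a point mass at zero) and \(\chi^2(1)\), because \(\alpha = 0\) lies on the boundary of the parameter space. For a positive statistic, the p-value is therefore half the usual \(\chi^2(1)\) p-value.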

# Built-in LR test
lr_test = results.lr_test_poisson()
print(f"LR statistic: {lr_test['statistic']:.2f}")
print(f"p-value: {lr_test['pvalue']:.4f}")
print(f"Conclusion: {lr_test['conclusion']}")
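
For readers who want to reproduce the boundary-adjusted p-value by hand, here is a minimal standard-library sketch. It assumes the equal-weight mixture of \(\chi^2(0)\) and \(\chi^2(1)\); whether `lr_test_poisson()` applies the same adjustment internally is not confirmed here. The function names and log-likelihood values are hypothetical.

```python
import math


def chi2_1_sf(x: float) -> float:
    """Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2.0))


def lr_pvalue_boundary(ll_poisson: float, ll_nb: float) -> tuple:
    """LR statistic and p-value under the 50:50 chi2(0)/chi2(1) mixture."""
    lr = max(0.0, 2.0 * (ll_nb - ll_poisson))
    if lr == 0.0:
        return lr, 1.0  # all mass at or above zero
    return lr, 0.5 * chi2_1_sf(lr)


# Hypothetical log-likelihoods, for illustration only
lr_stat, p_value = lr_pvalue_boundary(ll_poisson=-1520.4, ll_nb=-1498.7)
print(f"LR = {lr_stat:.2f}, boundary-adjusted p-value = {p_value:.3g}")
```

Using the unadjusted \(\chi^2(1)\) p-value would be conservative, since it is twice the mixture-based value.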

Informal Check: Variance-to-Mean Ratio

var_mean_ratio = data["claims"].var() / data["claims"].mean()
print(f"Var/Mean ratio: {var_mean_ratio:.2f}")
# Poisson expects ~1.0; values >> 1 suggest overdispersion
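
The same two sample moments also give a rough estimate of \(\alpha\). Ignoring covariates, the NB2 variance \(\mu + \alpha\mu^2\) can be solved for \(\alpha = (\text{Var} - \mu)/\mu^2\). The sketch below applies this to a small made-up sample; the helper `alpha_moment` is hypothetical, and the result is only a starting point, not a substitute for the ML estimate.

```python
from statistics import mean, variance


def alpha_moment(counts) -> float:
    """Method-of-moments alpha for an intercept-only NB2: (var - mean) / mean**2."""
    m = mean(counts)
    if m <= 0:
        raise ValueError("sample mean must be positive")
    v = variance(counts)  # sample variance (n - 1 denominator)
    # Truncate at zero: a negative value would indicate underdispersion
    return max(0.0, (v - m) / m ** 2)


counts = [0, 1, 1, 2, 3, 5, 9]  # made-up data for illustration
alpha_hat = alpha_moment(counts)
print(f"Var/Mean ratio: {variance(counts) / mean(counts):.2f}")
print(f"Moment estimate of alpha: {alpha_hat:.4f}")
```

Because covariates explain part of the raw variance, this unconditional estimate will usually overstate the \(\alpha\) obtained from a fitted regression.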

When NOT to Use

  • Underdispersion (\(\text{Var}(y) < E[y]\)): NB cannot handle this; consider generalized Poisson
  • Excess zeros: if overdispersion is driven by too many zeros, consider Zero-Inflated models
  • Gravity models: use PPML instead, which provides elasticity tools and handles heteroskedasticity

Configuration Options

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| endog | array-like | required | Dependent variable (non-negative counts) |
| exog | array-like | required | Independent variables |
| entity_id | array-like | None | Entity identifiers |
| time_id | array-like | None | Time identifiers |
| weights | array-like | None | Observation weights |

fit() Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| start_params | array | None | Starting values (Poisson estimates + \(\alpha = 0.1\) if None) |
| method | str | "BFGS" | Optimization method |
| maxiter | int | 1000 | Maximum iterations |

Diagnostics

Model Comparison

from panelbox.models.count import PooledPoisson, NegativeBinomial

# Fit both models
poisson = PooledPoisson(endog=y, exog=X, entity_id=entity, time_id=time)
pois_res = poisson.fit(se_type="cluster")

nb = NegativeBinomial(endog=y, exog=X, entity_id=entity, time_id=time)
nb_res = nb.fit()

# Compare
print(f"Poisson LLF: {pois_res.llf:.2f}, AIC: {pois_res.aic:.2f}")
print(f"NB LLF:      {nb_res.llf:.2f}, AIC: {nb_res.aic:.2f}")
print(f"Alpha:       {nb_res.alpha:.4f}")

# LR test
lr_test = nb_res.lr_test_poisson()
print(f"LR test p-value: {lr_test['pvalue']:.4f}")

Predictions

# Predicted counts
y_hat = nb_res.predict(which="mean")

# Linear predictor
xb = nb_res.predict(which="linear")
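
Under the mean specification \(E[y \mid X] = \exp(X'\beta)\), the two prediction types should be related by exponentiation. The sketch below illustrates that relationship for a single observation using hypothetical coefficients and covariate values; it does not call PanelBox.

```python
import math

# Hypothetical coefficients and one observation, for illustration only
beta = {"const": 0.5, "age": 0.02, "income": -0.1}
row = {"const": 1.0, "age": 40.0, "income": 3.0}

# Linear predictor X'beta
xb = sum(beta[name] * row[name] for name in beta)

# Predicted count is the exponentiated linear predictor
mu_hat = math.exp(xb)

print(f"linear predictor: {xb:.4f}")
print(f"predicted count:  {mu_hat:.4f}")
```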

Tutorials

| Tutorial | Description | Link |
| --- | --- | --- |
| Count Data Models | Poisson vs NB comparison with overdispersion testing | Open in Colab |

References

  • Cameron, A. C., & Trivedi, P. K. (2013). Regression Analysis of Count Data (2nd ed.). Cambridge University Press.
  • Allison, P. D., & Waterman, R. P. (2002). Fixed-Effects Negative Binomial Regression Models. Sociological Methodology, 32(1), 247–265.
  • Hilbe, J. M. (2011). Negative Binomial Regression (2nd ed.). Cambridge University Press.
  • Cameron, A. C., & Trivedi, P. K. (1986). Econometric Models Based on Count Data: Comparisons and Applications of Some Estimators and Tests. Journal of Applied Econometrics, 1(1), 29–53.