Ordered Choice Models¶

Quick Reference

Classes: OrderedLogit, OrderedProbit, RandomEffectsOrderedLogit
Import: from panelbox.models.discrete.ordered import OrderedLogit, OrderedProbit, RandomEffectsOrderedLogit
Stata equivalent: ologit, oprobit, xtologit
R equivalent: MASS::polr(), ordinal::clmm()

Overview¶

Ordered choice models are designed for ordinal dependent variables where \(y_{it} \in \{0, 1, \ldots, J-1\}\) with a natural ordering but no meaningful numeric scale. Examples include survey responses (strongly disagree to strongly agree), credit ratings (AAA to D), health status (poor, fair, good, excellent), or education levels.

The model posits a latent continuous variable:

\[y^*_{it} = X_{it}'\beta + \varepsilon_{it}\]

The observed ordinal outcome is determined by cutpoints (thresholds) \(\kappa_0 < \kappa_1 < \cdots < \kappa_{J-2}\):

\[y_{it} = j \quad \text{if} \quad \kappa_{j-1} < y^*_{it} \leq \kappa_j\]

with \(\kappa_{-1} = -\infty\) and \(\kappa_{J-1} = +\infty\). The probability of observing category \(j\) is:

\[P(y_{it} = j \mid X_{it}) = F(\kappa_j - X_{it}'\beta) - F(\kappa_{j-1} - X_{it}'\beta)\]

where \(F(\cdot)\) is the logistic CDF for Ordered Logit or the standard normal CDF for Ordered Probit.
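The probability formula above can be evaluated directly with NumPy. The following is a minimal sketch, independent of PanelBox: the helper name `category_probs` and the numeric values of `beta` and `cutpoints` are illustrative assumptions, not part of the library or of any fitted model.

```python
import numpy as np

def logistic_cdf(z):
    """Logistic CDF; returns 0 at -inf and 1 at +inf."""
    return 1.0 / (1.0 + np.exp(-z))

def category_probs(X, beta, cutpoints, cdf=logistic_cdf):
    """Return an (N, J) array with P(y = j | X) for each row of X."""
    xb = X @ beta  # linear index X'beta, shape (N,)
    # Pad with kappa_{-1} = -inf and kappa_{J-1} = +inf
    kappa = np.concatenate(([-np.inf], cutpoints, [np.inf]))
    # F(kappa_j - X'beta) at every bound, shape (N, J + 1)
    cum = cdf(kappa[None, :] - xb[:, None])
    # Differences of adjacent bounds give the category probabilities
    return np.diff(cum, axis=1)

X_demo = np.array([[0.5, -1.0], [1.5, 0.2]])  # two observations, two regressors
beta = np.array([0.8, -0.5])                  # illustrative slopes
cutpoints = np.array([-1.0, 1.0])             # J = 3 categories

probs = category_probs(X_demo, beta, cutpoints)
print(probs.round(3))
print(probs.sum(axis=1))  # each row sums to 1
```

Passing a standard normal CDF as `cdf` (for example `scipy.special.ndtr`) would give the ordered probit probabilities instead.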

Quick Example¶

import numpy as np
from panelbox.models.discrete.ordered import OrderedLogit

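# y (ordinal outcome), X (regressors, no constant), entity and time
# are NumPy arrays with one entry (or row) per observation
# Ordinal outcome: 0=low, 1=medium, 2=high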
model = OrderedLogit(endog=y, exog=X, groups=entity, time=time)
model.fit(method="BFGS")

# Predicted probabilities for each category
probs = model.predict_proba()  # Shape: (N, J)

# Predicted most-likely category
categories = model.predict(type="category")

print(model.summary())

When to Use¶

Ordinal dependent variable with a natural ordering (survey scales, ratings, grades)
Proportional odds assumption is plausible -- the effect of \(X\) is the same across all cutpoints
OrderedLogit -- Default choice; logistic errors yield proportional odds interpretation
OrderedProbit -- When normality of errors is preferred; results are typically similar to logit
RandomEffectsOrderedLogit -- When panel data has individual heterogeneity uncorrelated with regressors

Key Assumptions

Proportional odds (parallel regression): The slope coefficients \(\beta\) are the same for all cutpoints. If this fails, consider generalized ordered logit models.
Correct category ordering: Categories must have a meaningful natural order.
No constant term: The cutpoints absorb the intercept; do not include a constant in \(X\).
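The reason for excluding a constant can be checked numerically: adding an intercept \(c\) to the index and shifting every cutpoint by the same \(c\) leaves all category probabilities unchanged, so the two cannot be estimated separately. A small sketch under that reading (the helper `ordered_logit_probs` and the numbers are illustrative assumptions):

```python
import numpy as np

def ordered_logit_probs(xb, cutpoints):
    """Category probabilities of an ordered logit for a vector of indices xb."""
    kappa = np.concatenate(([-np.inf], cutpoints, [np.inf]))
    cum = 1.0 / (1.0 + np.exp(-(kappa[None, :] - xb[:, None])))
    return np.diff(cum, axis=1)

xb = np.array([-0.4, 0.3, 1.2])    # X'beta for three observations
cutpoints = np.array([-1.0, 0.5])  # illustrative thresholds
c = 2.0                            # hypothetical intercept

p_without = ordered_logit_probs(xb, cutpoints)
p_with = ordered_logit_probs(xb + c, cutpoints + c)

# The intercept is absorbed by the cutpoints: both models imply the same probabilities
print(np.allclose(p_without, p_with))
```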

Detailed Guide¶

Data Preparation¶

The dependent variable should be integer-coded starting from 0. PanelBox automatically remaps categories to \(\{0, 1, \ldots, J-1\}\) if they are not already in this format.

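import numpy as np
import pandas as pd

np.random.seed(42)  # reproducible draws

# Example: satisfaction ratings (1-5 scale)
n_entities = 200
n_periods = 4
N = n_entities * n_periods

entity = np.repeat(range(n_entities), n_periods)
time = np.tile(range(n_periods), n_entities)
x1 = np.random.normal(0, 1, N)
x2 = np.random.normal(0, 1, N)

# Exogenous variables (no constant -- cutpoints serve as intercepts)
X = np.column_stack([x1, x2])

# Simulated ratings: latent index with logistic noise, cut into 5 ordered
# categories coded 1-5 (coefficients and thresholds are illustrative only)
y_star = 0.8 * x1 - 0.5 * x2 + np.random.logistic(0, 1, N)
y = np.digitize(y_star, [-1.5, -0.5, 0.5, 1.5]) + 1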
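The remapping to \(\{0, 1, \ldots, J-1\}\) mentioned above can be pictured with `np.unique`. This is only a sketch of what such a remap might look like, not PanelBox's internal code; the `ratings` array is a hypothetical example in which level 4 never occurs.

```python
import numpy as np

# Hypothetical 1-5 survey codes; level 4 is never observed
ratings = np.array([1, 3, 5, 3, 2, 5, 1])

# levels: sorted distinct values; y_codes: position of each rating in levels
levels, y_codes = np.unique(ratings, return_inverse=True)

print(levels)   # original labels, in increasing order
print(y_codes)  # consecutive integer codes starting at 0
J = len(levels) # number of categories actually present
```

Because the codes follow the sorted order of the original labels, the ordinal ranking is preserved; unobserved levels simply do not receive a code.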

OrderedLogit¶

Uses the logistic CDF: \(F(z) = \Lambda(z) = \frac{e^z}{1 + e^z}\)

from panelbox.models.discrete.ordered import OrderedLogit

model = OrderedLogit(endog=y, exog=X, groups=entity, time=time)
model.fit(method="BFGS", maxiter=1000)

# Estimated parameters
print("Coefficients:", model.beta)
print("Cutpoints:", model.cutpoints)

# Predicted category probabilities
probs = model.predict_proba()  # (N, J) array
print(f"Probability of category 0: {probs[:, 0].mean():.3f}")
print(f"Probability of category 1: {probs[:, 1].mean():.3f}")

# Most likely category
predicted = model.predict(type="category")

OrderedProbit¶

Uses the standard normal CDF: \(F(z) = \Phi(z)\)

from panelbox.models.discrete.ordered import OrderedProbit

model = OrderedProbit(endog=y, exog=X, groups=entity, time=time)
model.fit(method="BFGS")

print("Coefficients:", model.beta)
print("Cutpoints:", model.cutpoints)
print(model.summary())

RandomEffectsOrderedLogit¶

Extends the ordered logit with individual random effects \(\alpha_i \sim N(0, \sigma^2_\alpha)\):

\[y^*_{it} = X_{it}'\beta + \alpha_i + \varepsilon_{it}\]

The marginal likelihood integrates out \(\alpha_i\) using Gauss-Hermite quadrature, where \(\phi(\cdot)\) is the standard normal density:

\[L_i = \int \prod_{t=1}^{T_i} P(y_{it} \mid X_{it}, \alpha_i) \, \frac{1}{\sigma_\alpha} \phi\!\left(\frac{\alpha_i}{\sigma_\alpha}\right) d\alpha_i\]

from panelbox.models.discrete.ordered import RandomEffectsOrderedLogit

model = RandomEffectsOrderedLogit(
    endog=y, exog=X, groups=entity, time=time,
    quadrature_points=12
)
model.fit(method="BFGS", maxiter=1000)

print("Coefficients:", model.beta)
print("Cutpoints:", model.cutpoints)
print(f"sigma_alpha: {model.sigma_alpha:.4f}")
print(model.summary())
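To make the quadrature step concrete, the sketch below approximates the likelihood contribution of a single entity. It is not PanelBox's implementation: the function `entity_likelihood` and its arguments are assumptions for illustration. It relies on the change of variable \(\alpha_i = \sqrt{2}\,\sigma_\alpha x\), which turns the normal-weighted integral into the Gauss-Hermite form \(\frac{1}{\sqrt{\pi}} \sum_k w_k \, g(\sqrt{2}\,\sigma_\alpha x_k)\).

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

def entity_likelihood(y_i, xb_i, cutpoints, sigma_alpha, n_points=12):
    """Approximate L_i for one entity by Gauss-Hermite quadrature.

    y_i  : (T,) integer categories in {0, ..., J-1}
    xb_i : (T,) linear index X'beta for the same periods
    """
    nodes, weights = hermgauss(n_points)         # nodes/weights for exp(-x^2)
    alpha = np.sqrt(2.0) * sigma_alpha * nodes   # random-effect values, shape (K,)
    kappa = np.concatenate(([-np.inf], cutpoints, [np.inf]))
    index = xb_i[:, None] + alpha[None, :]       # X'beta + alpha, shape (T, K)
    # Category j lies between kappa[j] (lower) and kappa[j + 1] (upper)
    upper = 1.0 / (1.0 + np.exp(-(kappa[y_i + 1][:, None] - index)))
    lower = 1.0 / (1.0 + np.exp(-(kappa[y_i][:, None] - index)))
    joint = (upper - lower).prod(axis=0)         # product over t at each node
    return float(np.sum(weights * joint) / np.sqrt(np.pi))

cutpoints = np.array([-1.0, 1.0])                # illustrative thresholds, J = 3
y_i = np.array([0, 2, 1])
xb_i = np.array([-0.5, 0.8, 0.1])

L_re = entity_likelihood(y_i, xb_i, cutpoints, sigma_alpha=1.0)
L_pooled = entity_likelihood(y_i, xb_i, cutpoints, sigma_alpha=0.0)
print(L_re, L_pooled)
```

With `sigma_alpha=0.0` every node collapses to \(\alpha_i = 0\), so the result reduces to the pooled ordered logit likelihood for that entity.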

Interpreting Results¶

Coefficients in ordered choice models indicate the direction of the effect on the latent variable \(y^*\), but not directly the magnitude of the effect on category probabilities:

Positive \(\beta_k\): increases \(X_{it}'\beta\), shifting probability mass toward higher categories
Negative \(\beta_k\): shifts probability mass toward lower categories
Cutpoints define the boundaries between categories on the latent scale

Marginal Effects Are Essential

A positive coefficient shifts mass to higher categories but can decrease the probability of intermediate categories. Always compute marginal effects for proper interpretation. See Marginal Effects for details.
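The sign ambiguity for intermediate categories follows from differentiating the category probability: \(\partial P(y = j) / \partial x_k = \beta_k \, [f(\kappa_{j-1} - X'\beta) - f(\kappa_j - X'\beta)]\), where \(f\) is the density of \(F\). The sketch below evaluates this expression for an ordered logit at two index values; the helper names and numbers are illustrative assumptions and this is not the PanelBox marginal-effects API.

```python
import numpy as np

def logistic_pdf(z):
    """Logistic density, written symmetrically so it returns 0 at +/-inf."""
    ez = np.exp(-np.abs(z))
    return ez / (1.0 + ez) ** 2

def marginal_effects(xb, beta_k, cutpoints):
    """dP(y = j)/dx_k for every category j at a single index value xb."""
    kappa = np.concatenate(([-np.inf], cutpoints, [np.inf]))
    dens = logistic_pdf(kappa - xb)          # density at each bound
    return beta_k * (dens[:-1] - dens[1:])   # f(lower bound) - f(upper bound)

cutpoints = np.array([-1.0, 1.0])            # illustrative thresholds, J = 3
beta_k = 0.8                                 # positive coefficient

me_low = marginal_effects(-1.0, beta_k, cutpoints)   # index near the lower cutpoint
me_high = marginal_effects(1.0, beta_k, cutpoints)   # index near the upper cutpoint
print(me_low.round(4))
print(me_high.round(4))
```

In both cases the effect on the lowest category is negative and on the highest is positive, and the effects sum to zero. The middle category gains probability at the low index value and loses it at the high one, which is why the coefficient sign alone is not enough.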

Cutpoint Parameterization¶

PanelBox uses an exponential parameterization to enforce \(\kappa_0 < \kappa_1 < \cdots < \kappa_{J-2}\):

\[\kappa_0 = \gamma_0, \quad \kappa_j = \kappa_{j-1} + \exp(\gamma_j) \quad \text{for } j > 0\]

This ensures strictly ordered cutpoints without constrained optimization. The parameters \(\gamma_j\) are unconstrained and estimated via MLE.
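The mapping between the unconstrained \(\gamma\) and the ordered cutpoints \(\kappa\) takes only a few lines. The function names below are assumptions chosen for illustration; only the formula itself comes from the text above.

```python
import numpy as np

def gamma_to_cutpoints(gamma):
    """kappa_0 = gamma_0, kappa_j = kappa_{j-1} + exp(gamma_j) for j > 0."""
    gamma = np.asarray(gamma, dtype=float)
    increments = np.exp(gamma[1:])  # strictly positive gaps between cutpoints
    return np.concatenate(([gamma[0]], gamma[0] + np.cumsum(increments)))

def cutpoints_to_gamma(kappa):
    """Inverse map, useful for turning initial cutpoints into starting values."""
    kappa = np.asarray(kappa, dtype=float)
    return np.concatenate(([kappa[0]], np.log(np.diff(kappa))))

gamma = np.array([-1.0, 0.0, np.log(2.0)])
kappa = gamma_to_cutpoints(gamma)
print(kappa)                      # gaps of exp(0) = 1 and exp(log 2) = 2
print(cutpoints_to_gamma(kappa))  # recovers gamma
```

Any real-valued \(\gamma\) yields increasing cutpoints, which is what allows an unconstrained optimizer such as BFGS to be used.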

Configuration Options¶

OrderedLogit / OrderedProbit¶

Parameter	Type	Default	Description
`endog`	`ndarray`	required	Ordinal dependent variable
`exog`	`ndarray`	required	Exogenous variables (no constant)
`groups`	`ndarray`	required	Entity identifiers
`time`	`ndarray`	`None`	Time period identifiers
`n_categories`	`int`	`None`	Number of categories (inferred if `None`)

fit() parameters:

Parameter	Type	Default	Description
`start_params`	`ndarray`	`None`	Starting values (auto-computed if `None`)
`method`	`str`	`"BFGS"`	Optimization method
`maxiter`	`int`	`1000`	Maximum iterations

RandomEffectsOrderedLogit¶

All parameters from OrderedLogit plus:

Parameter	Type	Default	Description
`quadrature_points`	`int`	`12`	Gauss-Hermite quadrature nodes

Result Attributes¶

Attribute	Type	Description
`params`	`ndarray`	Full parameter vector \([\beta; \gamma]\)
`beta`	`ndarray`	Slope coefficients
`cutpoints`	`ndarray`	Ordered threshold values \(\kappa_0 < \kappa_1 < \cdots\)
`llf`	`float`	Log-likelihood at maximum
`converged`	`bool`	Convergence flag
`n_iter`	`int`	Number of iterations
`bse`	`ndarray`	Standard errors
`cov_params`	`ndarray`	Variance-covariance matrix

Additional for RandomEffectsOrderedLogit:

Attribute	Type	Description
`sigma_alpha`	`float`	Random effects standard deviation

Diagnostics¶

Goodness of Fit¶

# Log-likelihood comparison
print(f"Log-likelihood: {model.llf:.3f}")

# Predicted vs actual categories
predicted = model.predict(type="category")
accuracy = np.mean(predicted == y)
print(f"Classification accuracy: {accuracy:.3f}")
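A log-likelihood is easier to judge against a baseline. One common choice is McFadden's pseudo-\(R^2\), \(1 - \ell / \ell_0\), where \(\ell_0\) is the log-likelihood of a model with cutpoints only. Such a model reproduces the sample category shares when every category is observed, so \(\ell_0 = \sum_j n_j \log(n_j / N)\). The helpers below are a sketch under that assumption and are not PanelBox functions.

```python
import numpy as np

def null_loglik(y):
    """Log-likelihood of a cutpoints-only model: sum_j n_j * log(n_j / N)."""
    _, counts = np.unique(y, return_counts=True)
    shares = counts / counts.sum()
    return float(np.sum(counts * np.log(shares)))

def mcfadden_r2(llf, y):
    """McFadden pseudo R-squared: 1 - llf / llf_null."""
    return 1.0 - llf / null_loglik(y)

y_demo = np.array([0, 0, 1, 1, 1, 2])  # small hypothetical outcome vector
llf_null = null_loglik(y_demo)
print(f"Null log-likelihood: {llf_null:.3f}")
```

With a fitted model, `mcfadden_r2(model.llf, y)` would give the statistic, using the `llf` attribute listed under Result Attributes.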

Comparing Logit and Probit¶

Results from ordered logit and ordered probit are typically similar after rescaling. The logistic distribution has variance \(\pi^2/3 \approx 3.29\), so its standard deviation is \(\pi/\sqrt{3} \approx 1.81\), while the standard normal has variance 1. Probit coefficients should therefore be approximately \(\beta_{logit} / 1.81\), and the cutpoints rescale by roughly the same factor.

from panelbox.models.discrete.ordered import OrderedLogit, OrderedProbit

ologit = OrderedLogit(endog=y, exog=X, groups=entity, time=time)
ologit.fit()

oprobit = OrderedProbit(endog=y, exog=X, groups=entity, time=time)
oprobit.fit()

# Approximate rescaling
print("Logit coefficients:", ologit.beta)
print("Probit coefficients:", oprobit.beta)
print("Logit / 1.81:", ologit.beta / 1.81)  # Should be close to probit
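The 1.81 factor comes from matching standard deviations, and how well it works can be checked without fitting any model by comparing the two CDFs directly. The sketch below uses only the standard library and NumPy; the grid and helper names are illustrative assumptions.

```python
import math
import numpy as np

def logistic_cdf(z):
    return 1.0 / (1.0 + np.exp(-z))

def normal_cdf(z):
    # Phi(z) = 0.5 * (1 + erf(z / sqrt(2))), applied elementwise
    return 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))

z = np.linspace(-6.0, 6.0, 1201)
scale = math.pi / math.sqrt(3.0)  # logistic standard deviation, about 1.81

# Largest gap between the two CDFs with and without rescaling the argument
gap_rescaled = np.max(np.abs(logistic_cdf(z) - normal_cdf(z / scale)))
gap_raw = np.max(np.abs(logistic_cdf(z) - normal_cdf(z)))
print(f"max |Lambda(z) - Phi(z / 1.81)| = {gap_rescaled:.4f}")
print(f"max |Lambda(z) - Phi(z)|        = {gap_raw:.4f}")
```

The rescaled curves stay within a few percentage points of each other across the grid, which is why the two models tend to produce similar predicted probabilities even though their coefficients are on different scales.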

Tutorials¶

Tutorial	Description	Link
Discrete Choice Models	Full guide including ordered models

References¶

McKelvey, R. D. and Zavoina, W. (1975). "A Statistical Model for the Analysis of Ordinal Level Dependent Variables." Journal of Mathematical Sociology, 4(1), 103-120.
Greene, W. H. and Hensher, D. A. (2010). Modeling Ordered Choices: A Primer. Cambridge University Press.
Wooldridge, J. M. (2010). Econometric Analysis of Cross Section and Panel Data. 2nd ed. MIT Press. Chapter 15.
Brant, R. (1990). "Assessing Proportionality in the Proportional Odds Model for Ordinal Logistic Regression." Biometrics, 46(4), 1171-1178.
