Production and Cost Frontiers¶

Quick Reference

Class: panelbox.frontier.StochasticFrontier Import: from panelbox.frontier import StochasticFrontier Stata equivalent: sfcross / sfpanel R equivalent: frontier::sfa()

Overview¶

Stochastic Frontier Analysis (SFA), introduced independently by Aigner, Lovell & Schmidt (1977) and Meeusen & van den Broeck (1977), is a parametric method for estimating the maximum achievable output (production frontier) or minimum achievable cost (cost frontier) and measuring the distance of each observation from that frontier.

Unlike ordinary regression, SFA decomposes the error term into two components: symmetric random noise \(v\) and one-sided inefficiency \(u \geq 0\). This allows researchers to separate genuine stochastic shocks from systematic underperformance, producing observation-specific efficiency scores.

PanelBox implements SFA with four distributional assumptions for the inefficiency term, full maximum likelihood estimation, and three efficiency estimators with confidence intervals. Both production and cost frontier specifications are supported through the unified StochasticFrontier class.

Quick Example¶

from panelbox.frontier import StochasticFrontier

# Production frontier with half-normal inefficiency
model = StochasticFrontier(
    data=df,
    depvar="log_output",
    exog=["log_labor", "log_capital"],
    frontier="production",
    dist="half_normal",
)
results = model.fit()

# Print results
print(results.summary())

# Get efficiency scores (Battese-Coelli estimator)
eff = results.efficiency(estimator="bc")
print(f"Mean efficiency: {results.mean_efficiency:.4f}")
print(eff.describe())

When to Use¶

You want to measure how close firms/units operate relative to a best-practice frontier
You need observation-specific efficiency scores rather than just average performance
Your data exhibits one-sided deviations from the frontier (not just symmetric noise)
You want to decompose total variation into noise (random shocks) and inefficiency (systematic underperformance)
You need to test whether inefficiency is statistically significant vs. OLS

Key Assumptions

The frontier function is correctly specified (e.g., Cobb-Douglas, Translog)
The inefficiency term \(u\) follows a known distribution (half-normal, exponential, truncated-normal, or gamma)
The noise term \(v\) is normally distributed: \(v \sim N(0, \sigma_v^2)\)
Inefficiency \(u\) is independent of regressors \(X\) and noise \(v\)
The dependent variable is in logarithms (for multiplicative frontier)

Detailed Guide¶

The Stochastic Frontier Model¶

Production Frontier¶

The production frontier measures maximum feasible output given inputs:

\[y_{it} = X_{it}'\beta + v_{it} - u_{it}\]

where:

\(y_{it}\) is log output for unit \(i\) at time \(t\)
\(X_{it}'\beta\) is the deterministic frontier (maximum achievable output)
\(v_{it} \sim N(0, \sigma_v^2)\) is symmetric random noise (weather, measurement error, luck)
\(u_{it} \geq 0\) is technical inefficiency (one-sided, non-negative)

The composed error is \(\varepsilon_{it} = v_{it} - u_{it}\). Since \(u \geq 0\), the composed error is negatively skewed for production frontiers.

Technical Efficiency:

\[TE_i = \frac{\text{observed output}}{\text{frontier output}} = \exp(-u_i) \in (0, 1]\]

A score of \(TE = 0.85\) means the firm produces 85% of its maximum feasible output.

Cost Frontier¶

The cost frontier measures minimum feasible cost given outputs and input prices:

\[y_{it} = X_{it}'\beta + v_{it} + u_{it}\]

where:

\(y_{it}\) is log cost
\(X_{it}'\beta\) is the minimum cost frontier
\(u_{it} \geq 0\) increases cost above the minimum (cost inefficiency)

The composed error is \(\varepsilon_{it} = v_{it} + u_{it}\). Since \(u \geq 0\), the composed error is positively skewed for cost frontiers.

Cost Efficiency:

\[CE_i = \exp(-u_i) \in (0, 1]\]

A score of \(CE = 0.80\) means the firm could reduce costs by 20% without changing output.

Distributional Assumptions¶

PanelBox supports four distributions for the inefficiency term \(u\):

Distribution	API Value	Parameters	Characteristics
Half-Normal	`"half_normal"`	\(\sigma_u\)	Simplest; mode at zero; one parameter
Exponential	`"exponential"`	\(\sigma_u\)	Monotonically decreasing density; one parameter
Truncated Normal	`"truncated_normal"`	\(\mu, \sigma_u\)	Flexible mode location; two parameters
Gamma	`"gamma"`	\(P, \theta\)	Most flexible shape; two parameters

Half-NormalExponentialTruncated NormalGamma

\(u \sim N^+(0, \sigma_u^2)\)

The simplest specification. The mode of inefficiency is at zero, implying most firms are relatively efficient with a declining tail of highly inefficient firms. Recommended as a starting point.

model = StochasticFrontier(
    data=df, depvar="log_output",
    exog=["log_labor", "log_capital"],
    frontier="production", dist="half_normal",
)

\(u \sim \text{Exp}(\sigma_u)\)

Also has mode at zero with a monotonically decreasing density. Produces slightly different efficiency rankings than half-normal. Useful as a robustness check.

model = StochasticFrontier(
    data=df, depvar="log_output",
    exog=["log_labor", "log_capital"],
    frontier="production", dist="exponential",
)

\(u \sim N^+(\mu, \sigma_u^2)\)

Allows the mode of inefficiency to be non-zero. When \(\mu > 0\), most firms have moderate inefficiency. Nests half-normal as a special case (\(\mu = 0\)). Requires truncated-normal for BC95 models with inefficiency determinants.

model = StochasticFrontier(
    data=df, depvar="log_output",
    exog=["log_labor", "log_capital"],
    frontier="production", dist="truncated_normal",
)

\(u \sim \text{Gamma}(P, \theta)\)

Most flexible shape with two parameters. Can produce a wide range of density shapes. Computationally more intensive, especially for panel models. Nests exponential as a special case (\(P = 1\)).

model = StochasticFrontier(
    data=df, depvar="log_output",
    exog=["log_labor", "log_capital"],
    frontier="production", dist="gamma",
)

Efficiency Estimation¶

After fitting the model, PanelBox provides three methods to estimate observation-specific inefficiency/efficiency:

Estimator	API Value	Formula	Returns
JLMS	`"jlms"`	\(E[u \mid \varepsilon]\)	Point estimate of inefficiency
Battese-Coelli	`"bc"`	\(E[\exp(-u) \mid \varepsilon]\)	Efficiency score directly
Mode	`"mode"`	\(M[u \mid \varepsilon]\)	Modal inefficiency estimate

The BC estimator (Battese & Coelli, 1988) is the default and recommended choice because it directly estimates efficiency rather than inefficiency, avoiding Jensen's inequality bias.

# Battese-Coelli estimator (recommended)
eff_bc = results.efficiency(estimator="bc")

# JLMS estimator (Jondrow et al. 1982)
eff_jlms = results.efficiency(estimator="jlms")

# Modal estimator
eff_mode = results.efficiency(estimator="mode")

# Each returns a DataFrame with columns:
# inefficiency, efficiency, ci_lower, ci_upper
print(eff_bc.head())

Confidence intervals follow the Horrace & Schmidt (1996) method, which accounts for the truncated nature of the conditional distribution.

Complete Example¶

from panelbox.frontier import StochasticFrontier

# --- Production frontier ---
sf_prod = StochasticFrontier(
    data=df,
    depvar="log_output",
    exog=["log_labor", "log_capital"],
    frontier="production",
    dist="half_normal",
)
result_prod = sf_prod.fit()

# Summary with diagnostics
print(result_prod.summary())

# Variance components
print(f"sigma_v (noise):        {result_prod.sigma_v:.4f}")
print(f"sigma_u (inefficiency): {result_prod.sigma_u:.4f}")
print(f"lambda (sigma_u/sigma_v): {result_prod.lambda_param:.4f}")
print(f"gamma (share of inefficiency): {result_prod.gamma:.4f}")

# Efficiency scores
eff = result_prod.efficiency(estimator="bc")
print(f"\nMean TE: {result_prod.mean_efficiency:.4f}")
print(eff.describe())

# --- Cost frontier ---
sf_cost = StochasticFrontier(
    data=df,
    depvar="log_cost",
    exog=["log_output", "log_price_labor", "log_price_capital"],
    frontier="cost",
    dist="exponential",
)
result_cost = sf_cost.fit()
print(result_cost.summary())

# Cost efficiency
eff_cost = result_cost.efficiency(estimator="bc")
print(f"Mean CE: {result_cost.mean_efficiency:.4f}")

Configuration Options¶

StochasticFrontier Parameters¶

Parameter	Type	Default	Description
`data`	`DataFrame`	required	DataFrame with all variables
`depvar`	`str`	required	Dependent variable name (in logs)
`exog`	`list[str]`	required	List of exogenous variable names
`entity`	`str`	`None`	Entity identifier (for panel data)
`time`	`str`	`None`	Time identifier (for panel data)
`frontier`	`str`	`"production"`	`"production"` or `"cost"`
`dist`	`str`	`"half_normal"`	`"half_normal"`, `"exponential"`, `"truncated_normal"`, `"gamma"`
`inefficiency_vars`	`list[str]`	`None`	Variables affecting mean inefficiency (BC95)
`het_vars`	`list[str]`	`None`	Variables affecting variance of inefficiency (Wang 2002)
`model_type`	`str`	auto	Panel model type (auto-detected from entity/time)
`css_time_trend`	`str`	`None`	CSS time trend: `"none"`, `"linear"`, `"quadratic"`

fit() Parameters¶

Parameter	Type	Default	Description
`method`	`str`	`"mle"`	Estimation method (only `"mle"` supported)
`start_params`	`ndarray`	`None`	Initial parameter values (auto-computed)
`optimizer`	`str`	`"L-BFGS-B"`	Optimizer: `"L-BFGS-B"`, `"Newton-CG"`, `"BFGS"`
`maxiter`	`int`	`1000`	Maximum iterations
`tol`	`float`	`1e-8`	Convergence tolerance
`grid_search`	`bool`	`False`	Use grid search for starting values
`verbose`	`bool`	`False`	Print optimization progress

Result Attributes¶

Attribute	Type	Description
`params`	`Series`	Parameter estimates
`se`	`Series`	Standard errors
`tvalues`	`Series`	t-statistics
`pvalues`	`Series`	p-values
`loglik`	`float`	Log-likelihood
`aic`	`float`	Akaike Information Criterion
`bic`	`float`	Bayesian Information Criterion
`converged`	`bool`	Convergence flag
`sigma_v`	`float`	Noise standard deviation
`sigma_u`	`float`	Inefficiency standard deviation
`lambda_param`	`float`	\(\lambda = \sigma_u / \sigma_v\)
`gamma`	`float`	\(\gamma = \sigma_u^2 / (\sigma_v^2 + \sigma_u^2)\)
`mean_efficiency`	`float`	Mean efficiency (BC estimator)
`residuals`	`ndarray`	Frontier residuals \(\varepsilon = y - X'\beta\)

Diagnostics¶

Variance Decomposition¶

The parameter \(\gamma = \sigma_u^2 / (\sigma_v^2 + \sigma_u^2)\) measures the share of total variance due to inefficiency:

var_decomp = results.variance_decomposition(ci_level=0.95, method="delta")

print(f"gamma: {var_decomp['gamma']:.4f}")
print(f"95% CI: [{var_decomp['gamma_ci'][0]:.4f}, {var_decomp['gamma_ci'][1]:.4f}]")
print(f"lambda: {var_decomp['lambda_param']:.4f}")
print(var_decomp['interpretation'])

\(\gamma\) Range	Interpretation
\(\gamma < 0.1\)	Inefficiency negligible; OLS may be adequate
\(0.1 \leq \gamma \leq 0.3\)	Inefficiency minor but present
\(0.3 < \gamma < 0.7\)	Both noise and inefficiency important
\(0.7 \leq \gamma \leq 0.9\)	Inefficiency dominant
\(\gamma > 0.9\)	Nearly deterministic frontier; check specification

Returns to Scale¶

rts = results.returns_to_scale_test(
    input_vars=["log_labor", "log_capital"],
    alpha=0.05,
)
print(f"RTS = {rts['rts']:.4f} ({rts['conclusion']})")
print(f"p-value: {rts['pvalue']:.4f}")

Distribution Comparison¶

comparison = results.compare_distributions(
    distributions=["half_normal", "exponential", "truncated_normal"]
)
print(comparison[["Distribution", "AIC", "BIC", "Mean Efficiency"]])

Bootstrap Confidence Intervals¶

# Bootstrap CIs for parameters
boot_params = results.bootstrap(n_boot=999, seed=42)
print(boot_params[["parameter", "estimate", "ci_lower", "ci_upper"]])

# Bootstrap CIs for efficiency scores
boot_eff = results.bootstrap_efficiency(n_boot=999, seed=42)
print(boot_eff[["te", "ci_lower", "ci_upper"]].describe())

Tutorials¶

Tutorial	Description	Link
SFA Fundamentals	Basic SFA estimation and interpretation
Production vs Cost	Side-by-side comparison of frontier types

References¶

Aigner, D., Lovell, C. K., & Schmidt, P. (1977). Formulation and estimation of stochastic frontier production function models. Journal of Econometrics, 6(1), 21-37.
Meeusen, W., & van den Broeck, J. (1977). Efficiency estimation from Cobb-Douglas production functions with composed error. International Economic Review, 435-444.
Jondrow, J., Lovell, C. K., Materov, I. S., & Schmidt, P. (1982). On the estimation of technical inefficiency in the stochastic frontier production function model. Journal of Econometrics, 19(2-3), 233-238.
Battese, G. E., & Coelli, T. J. (1988). Prediction of firm-level technical efficiencies with a generalized frontier production function and panel data. Journal of Econometrics, 38(3), 387-399.
Horrace, W. C., & Schmidt, P. (1996). Confidence statements for efficiency estimates from stochastic frontier models. Journal of Productivity Analysis, 7(2-3), 257-282.
Greene, W. H. (1990). A gamma-distributed stochastic frontier model. Journal of Econometrics, 46(1-2), 141-163.
Kumbhakar, S. C., & Lovell, C. A. K. (2000). Stochastic Frontier Analysis. Cambridge University Press.