Four-Component SFA¶

Quick Reference

Class: panelbox.frontier.advanced.FourComponentSFA
Import: from panelbox.frontier.advanced import FourComponentSFA
Stata equivalent: sfpanel (user-written extensions)
R equivalent: sfaR::sfacross() (partial support)

Unique in Python

PanelBox is the only Python library that implements the four-component SFA model. Outside Python, the model is available in Stata through user-written extensions and, partially, in R through the sfaR package.

Overview¶

Standard SFA models estimate a single inefficiency term \(u_{it}\) that combines all sources of underperformance. However, in practice, inefficiency has distinct components with different policy implications:

Persistent inefficiency (\(\eta_i\)): structural, long-run inefficiency due to organizational design, corporate culture, or institutional constraints. This component is time-invariant and requires fundamental reforms to address.
Transient inefficiency (\(u_{it}\)): short-run, time-varying inefficiency due to temporary managerial decisions, demand shocks, or operational disruptions. This component can be addressed through day-to-day management improvements.

The four-component model (Kumbhakar, Lien & Hardaker, 2014; Colombi et al., 2014) decomposes the error into four parts, providing a richer picture of firm performance and more targeted policy recommendations.

Quick Example¶

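from panelbox.frontier.advanced import FourComponentSFA

# panel_df is assumed to be a long-format pandas DataFrame with one row per
# (firm_id, year) pair and the columns named below.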
model = FourComponentSFA(
    data=panel_df,
    depvar="log_output",
    exog=["log_labor", "log_capital"],
    entity="firm_id",
    time="year",
    frontier_type="production",
)
result = model.fit(verbose=True)

# Efficiency decomposition
te_persistent = result.persistent_efficiency()
te_transient = result.transient_efficiency()
te_overall = result.overall_efficiency()

print(f"Mean persistent efficiency: {te_persistent['persistent_efficiency'].mean():.4f}")
print(f"Mean transient efficiency:  {te_transient['transient_efficiency'].mean():.4f}")
print(f"Mean overall efficiency:    {te_overall['overall_efficiency'].mean():.4f}")

When to Use¶

You want to distinguish structural (persistent) from managerial (transient) inefficiency
Policy recommendations need to differentiate between long-run reforms and short-run management improvements
You need to understand why overall efficiency is low -- is it structural or managerial?
Your panel has sufficient time periods (\(T \geq 5\)) for reliable within-entity estimation
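A quick way to check the last point before fitting is to count the distinct periods observed per entity. The snippet below is a minimal sketch using plain pandas; the helper name `periods_per_entity` and the toy data are illustrative assumptions, not part of PanelBox.

```python
import pandas as pd

def periods_per_entity(df: pd.DataFrame, entity: str, time: str) -> pd.Series:
    """Number of distinct time periods observed for each entity."""
    return df.groupby(entity)[time].nunique()

# Toy panel with made-up values: firm 1 has 5 years, firm 2 only 3.
toy = pd.DataFrame({
    "firm_id": [1] * 5 + [2] * 3,
    "year": [2015, 2016, 2017, 2018, 2019, 2015, 2016, 2017],
})

counts = periods_per_entity(toy, "firm_id", "year")
short_panels = counts[counts < 5]  # entities below the T >= 5 guideline
print(short_panels)
```

Entities listed in `short_panels` contribute little within-entity information, so it may be worth checking how sensitive the results are to dropping them.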

Key Assumptions

Requires panel data with both entity and time identifiers
Persistent inefficiency \(\eta_i\) is assumed time-invariant (entity-specific) and half-normally distributed
Transient inefficiency \(u_{it}\) is assumed half-normally distributed
Random heterogeneity \(\mu_i\) is assumed normally distributed
Sufficient within-entity variation is needed for identification

The Four-Component Model¶

Model Specification¶

\[y_{it} = X_{it}'\beta + \mu_i - \eta_i + v_{it} - u_{it}\]

Component	Symbol	Distribution	Interpretation
Random heterogeneity	\(\mu_i\)	\(N(0, \sigma_\mu^2)\)	Technology differences, unobserved firm characteristics
Persistent inefficiency	\(\eta_i\)	\(N^+(0, \sigma_\eta^2)\)	Structural, long-run underperformance
Random noise	\(v_{it}\)	\(N(0, \sigma_v^2)\)	Weather, measurement error, luck
Transient inefficiency	\(u_{it}\)	\(N^+(0, \sigma_u^2)\)	Short-run managerial inefficiency
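To make the specification concrete, the following sketch simulates a panel from the four-component data-generating process in the table above. It uses only NumPy and pandas; the function name `simulate_four_component`, the default parameter values, and the `*_i` / `*_it` column names for the true components are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

def simulate_four_component(n_firms=200, n_periods=8, beta=(0.6, 0.3),
                            sigma_mu=0.3, sigma_eta=0.4,
                            sigma_v=0.2, sigma_u=0.3, seed=0):
    """Draw y_it = X_it' beta + mu_i - eta_i + v_it - u_it (production frontier)."""
    rng = np.random.default_rng(seed)
    n = n_firms * n_periods
    firm = np.repeat(np.arange(n_firms), n_periods)
    year = np.tile(np.arange(n_periods), n_firms)
    x = rng.normal(size=(n, 2))                      # log inputs

    mu = rng.normal(0.0, sigma_mu, n_firms)          # N(0, sigma_mu^2)
    eta = np.abs(rng.normal(0.0, sigma_eta, n_firms))  # N+(0, sigma_eta^2)
    v = rng.normal(0.0, sigma_v, n)                  # N(0, sigma_v^2)
    u = np.abs(rng.normal(0.0, sigma_u, n))          # N+(0, sigma_u^2)

    y = x @ np.asarray(beta) + mu[firm] - eta[firm] + v - u
    return pd.DataFrame({
        "firm_id": firm, "year": year,
        "log_labor": x[:, 0], "log_capital": x[:, 1], "log_output": y,
        "mu_i": mu[firm], "eta_i": eta[firm], "v_it": v, "u_it": u,
    })

sim_df = simulate_four_component(n_firms=50, n_periods=6)
print(sim_df.head())
```

Because the true components are kept as columns, a simulated panel like this can be used to check how well any estimator recovers them.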

Three-Step Estimation¶

The four-component model is estimated sequentially:

Step 1: Within (FE) Estimator

Demean data within entities to estimate \(\beta\) and recover entity-level effects \(\alpha_i\) and time-varying residuals \(\varepsilon_{it}\):

\[y_{it} - \bar{y}_i = (X_{it} - \bar{X}_i)'\beta + (\varepsilon_{it} - \bar{\varepsilon}_i)\]

where \(\alpha_i = \mu_i - \eta_i\) and \(\varepsilon_{it} = v_{it} - u_{it}\).
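The within transformation in Step 1 can be sketched with NumPy and pandas as follows. This is an illustrative implementation under the assumption of a plain least-squares fit on demeaned data, not PanelBox's internal code; `within_estimator` and the toy data are hypothetical. Entity effects are recovered as \(\hat{\alpha}_i = \bar{y}_i - \bar{X}_i'\hat{\beta}\).

```python
import numpy as np
import pandas as pd

def within_estimator(df, depvar, exog, entity):
    """Fixed-effects (within) fit: returns beta_hat, alpha_hat, residuals."""
    # Demean y and X within each entity
    y_dm = df[depvar] - df.groupby(entity)[depvar].transform("mean")
    X_dm = df[exog] - df.groupby(entity)[exog].transform("mean")
    beta, *_ = np.linalg.lstsq(X_dm.to_numpy(), y_dm.to_numpy(), rcond=None)

    # alpha_i = ybar_i - xbar_i' beta
    means = df.groupby(entity)[[depvar] + exog].mean()
    alpha = means[depvar] - means[exog].to_numpy() @ beta

    # eps_it = y_it - x_it' beta - alpha_i (averages to zero within each entity)
    resid = df[depvar] - df[exog].to_numpy() @ beta - df[entity].map(alpha)
    return beta, alpha, resid

# Toy data: 3 firms, 6 periods, known effects and slopes, tiny noise
rng = np.random.default_rng(1)
firm = np.repeat([0, 1, 2], 6)
alpha_true = np.array([1.0, -0.5, 0.2])
x = rng.normal(size=(18, 2))
y = alpha_true[firm] + x @ np.array([0.6, 0.3]) + rng.normal(0, 0.01, 18)
toy = pd.DataFrame({"firm_id": firm, "log_labor": x[:, 0],
                    "log_capital": x[:, 1], "log_output": y})

beta_hat, alpha_hat, resid = within_estimator(
    toy, "log_output", ["log_labor", "log_capital"], "firm_id")
print(beta_hat, alpha_hat.to_numpy())
```

Note that the within residuals average to zero for every entity by construction, so the nonzero means of \(u_{it}\) and \(\eta_i\) are absorbed into \(\hat{\alpha}_i\) and into the intercepts of the later steps.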

Step 2: Separate Transient Inefficiency

Apply cross-sectional SFA to the residuals \(\hat{\varepsilon}_{it}\) to separate \(v_{it}\) and \(u_{it}\):

\[\hat{\varepsilon}_{it} = v_{it} - u_{it}\]

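This step fits a normal / half-normal model by maximum likelihood and then applies the JLMS (Jondrow et al., 1982) conditional-mean estimator to obtain \(\hat{u}_{it}\).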
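A minimal sketch of this step with SciPy is shown below. It assumes the residuals follow \(c + v - u\), where the intercept \(c\) absorbs the nonzero mean of \(u\) that demeaning removed; the function names, the log-parameterisation, and the choice of optimiser are illustrative assumptions and may differ from what PanelBox does internally.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def neg_loglik(params, resid):
    """Negative log-likelihood of resid = c + v - u (normal / half-normal)."""
    c, log_sv, log_su = params
    sigma_v, sigma_u = np.exp(log_sv), np.exp(log_su)  # keeps both positive
    sigma = np.hypot(sigma_v, sigma_u)
    lam = sigma_u / sigma_v
    eps = resid - c
    ll = (np.log(2.0) - np.log(sigma)
          + norm.logpdf(eps / sigma)
          + norm.logcdf(-eps * lam / sigma))
    return -ll.sum()

def fit_half_normal(resid):
    """Return (c, sigma_v, sigma_u) maximising the likelihood."""
    s = resid.std()
    start = np.array([0.0, np.log(s), np.log(s)])
    res = minimize(neg_loglik, start, args=(resid,), method="L-BFGS-B")
    c, log_sv, log_su = res.x
    return c, np.exp(log_sv), np.exp(log_su)

def jlms(eps, sigma_v, sigma_u):
    """JLMS estimator E[u | eps] for eps = v - u."""
    s2 = sigma_v**2 + sigma_u**2
    m = -eps * sigma_u**2 / s2
    s = sigma_v * sigma_u / np.sqrt(s2)
    z = m / s
    # pdf/cdf ratio computed on the log scale for numerical stability
    return m + s * np.exp(norm.logpdf(z) - norm.logcdf(z))

# Simulated check with made-up parameters sigma_v = 0.2, sigma_u = 0.4
rng = np.random.default_rng(0)
v = rng.normal(0.0, 0.2, 4000)
u = np.abs(rng.normal(0.0, 0.4, 4000))
resid = (v - u) - (v - u).mean()          # mean-zero, like within residuals

c_hat, sv_hat, su_hat = fit_half_normal(resid)
u_hat = jlms(resid - c_hat, sv_hat, su_hat)
print(sv_hat, su_hat, u_hat.mean())
```

The JLMS values are strictly positive by construction, so the implied transient efficiencies \(\exp(-\hat{u}_{it})\) always lie in \((0, 1)\).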

Step 3: Separate Persistent Inefficiency

Apply cross-sectional SFA to the estimated fixed effects \(\hat{\alpha}_i\) to separate \(\mu_i\) and \(\eta_i\):

\[\hat{\alpha}_i = \mu_i - \eta_i\]

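This step fits a normal / half-normal model by maximum likelihood and then applies the JLMS (Jondrow et al., 1982) conditional-mean estimator to obtain \(\hat{\eta}_i\).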
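For reference, under the normal / half-normal assumptions of the model specification, the JLMS conditional mean for Step 3 has the standard closed form below. Here \(a_i\) denotes the composed entity-level term \(\mu_i - \eta_i\) (after removing any intercept), and \(m_i\) and \(s\) are auxiliary symbols introduced only for this sketch; whether PanelBox evaluates exactly this expression internally is an assumption.

\[\hat{\eta}_i = E[\eta_i \mid a_i] = m_i + s\,\frac{\phi(m_i/s)}{\Phi(m_i/s)}, \qquad m_i = -\frac{a_i\,\sigma_\eta^2}{\sigma_\mu^2 + \sigma_\eta^2}, \qquad s^2 = \frac{\sigma_\mu^2\,\sigma_\eta^2}{\sigma_\mu^2 + \sigma_\eta^2}\]

where \(\phi\) and \(\Phi\) are the standard normal density and distribution functions. Replacing \((a_i, \sigma_\mu, \sigma_\eta)\) with \((\varepsilon_{it}, \sigma_v, \sigma_u)\) gives the corresponding expression for \(\hat{u}_{it}\) in Step 2.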

Efficiency Types¶

Persistent efficiency (structural capability):

\[TE_{p,i} = \exp(-\eta_i) \in (0, 1]\]

Transient efficiency (short-run management):

\[TE_{t,it} = \exp(-u_{it}) \in (0, 1]\]

Overall efficiency (combined):

\[TE_{o,it} = TE_{p,i} \times TE_{t,it}\]
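As a small worked illustration of these formulas, the sketch below computes the three efficiency measures for one firm from hypothetical inefficiency values (the numbers are made up for the example).

```python
import numpy as np

eta_i = 0.20                         # hypothetical persistent inefficiency
u_it = np.array([0.05, 0.10, 0.30])  # hypothetical transient inefficiency, 3 periods

te_persistent = np.exp(-eta_i)            # one value per firm
te_transient = np.exp(-u_it)              # one value per firm-period
te_overall = te_persistent * te_transient

print(te_persistent, te_transient, te_overall)
```

Because the two measures multiply, overall efficiency equals \(\exp(-(\eta_i + u_{it}))\) and can never exceed the smaller of its two components.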

Detailed Guide¶

Full Estimation Example¶

from panelbox.frontier.advanced import FourComponentSFA

# Fit the model
model = FourComponentSFA(
    data=panel_df,
    depvar="log_output",
    exog=["log_labor", "log_capital"],
    entity="firm_id",
    time="year",
    frontier_type="production",
)
result = model.fit(verbose=False)

# Print full summary
result.print_summary()

The print_summary() method displays:

Sample information (observations, entities, periods)
Variance components (\(\sigma_v^2, \sigma_u^2, \sigma_\mu^2, \sigma_\eta^2\)) and their shares
Efficiency summary statistics (persistent, transient, overall)

Variance Components¶

# Individual variance components
print(f"sigma_v  (noise):              {result.sigma_v:.4f}")
print(f"sigma_u  (transient ineff.):   {result.sigma_u:.4f}")
print(f"sigma_mu (heterogeneity):      {result.sigma_mu:.4f}")
print(f"sigma_eta (persistent ineff.): {result.sigma_eta:.4f}")

# Shares of the squared scale parameters
total_var = (result.sigma_v**2 + result.sigma_u**2
             + result.sigma_mu**2 + result.sigma_eta**2)
print("\nVariance shares:")
print(f"  Noise:              {100 * result.sigma_v**2 / total_var:.1f}%")
print(f"  Transient ineff.:   {100 * result.sigma_u**2 / total_var:.1f}%")
print(f"  Heterogeneity:      {100 * result.sigma_mu**2 / total_var:.1f}%")
print(f"  Persistent ineff.:  {100 * result.sigma_eta**2 / total_var:.1f}%")
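One caveat when reading such shares: for a half-normal term \(N^+(0, \sigma^2)\), the actual variance is \((1 - 2/\pi)\,\sigma^2\), not \(\sigma^2\). The sketch below computes shares of actual variances under that adjustment. The helper `variance_shares` and the input values are hypothetical, and whether `print_summary()` applies this adjustment is not stated here.

```python
import numpy as np

def variance_shares(sigma_v, sigma_u, sigma_mu, sigma_eta):
    """Shares of total error variance, using Var(N+(0, s^2)) = (1 - 2/pi) * s^2."""
    k = 1.0 - 2.0 / np.pi
    components = {
        "noise": sigma_v**2,
        "transient": k * sigma_u**2,
        "heterogeneity": sigma_mu**2,
        "persistent": k * sigma_eta**2,
    }
    total = sum(components.values())
    return {name: value / total for name, value in components.items()}

# Made-up scale parameters, all equal, to show the effect of the adjustment
shares = variance_shares(sigma_v=0.3, sigma_u=0.3, sigma_mu=0.3, sigma_eta=0.3)
for name, share in shares.items():
    print(f"{name:>14}: {100 * share:.1f}%")
```

With equal scale parameters, the two inefficiency terms receive smaller shares than the two symmetric terms, which the unadjusted calculation would not show.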

Efficiency Decomposition¶

# Get all efficiency types
te_p = result.persistent_efficiency()
te_t = result.transient_efficiency()
te_o = result.overall_efficiency()

# Full decomposition table
decomp = result.decomposition()
print(decomp.head())
# Columns: entity, time, mu_i, eta_i, u_it, v_it

# Overall efficiency with both components
print(te_o.head())
# Columns: entity, time, overall_efficiency,
#           persistent_efficiency, transient_efficiency

Bootstrap Confidence Intervals¶

boot = result.bootstrap(
    n_bootstrap=100,
    confidence_level=0.95,
    random_state=42,
    verbose=True,
)

# Persistent efficiency with CIs
pers_ci = boot.persistent_efficiency_ci()
print(pers_ci.head())
# Columns: entity, persistent_efficiency, ci_lower, ci_upper

# Variance component CIs
boot.print_summary()
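To illustrate the general idea behind such intervals, here is a generic percentile bootstrap that resamples entities with replacement. It is a sketch of the technique under simplifying assumptions (a single statistic, no re-estimation of the model in each replicate) and is not a description of how `result.bootstrap()` is implemented; all names below are hypothetical.

```python
import numpy as np

def entity_bootstrap(values_by_entity, stat, n_bootstrap=200, seed=42):
    """Resample entities with replacement and recompute `stat` each time."""
    rng = np.random.default_rng(seed)
    n = len(values_by_entity)
    draws = np.empty(n_bootstrap)
    for b in range(n_bootstrap):
        idx = rng.integers(0, n, size=n)   # entity indices, with replacement
        draws[b] = stat(values_by_entity[idx])
    return draws

def percentile_ci(draws, confidence_level=0.95):
    """Percentile interval from bootstrap draws."""
    alpha = 1.0 - confidence_level
    lower, upper = np.quantile(draws, [alpha / 2, 1 - alpha / 2])
    return lower, upper

# Made-up entity-level persistent efficiencies for 50 firms
rng = np.random.default_rng(0)
te_persistent = rng.uniform(0.6, 0.95, size=50)

draws = entity_bootstrap(te_persistent, np.mean)
ci_lower, ci_upper = percentile_ci(draws)
print(f"mean = {te_persistent.mean():.3f}, 95% CI = [{ci_lower:.3f}, {ci_upper:.3f}]")
```

Resampling whole entities rather than individual observations keeps each firm's time series intact, which matters when errors are correlated within a firm.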

Policy Implications¶

The decomposition directly informs policy:

Persistent TE / Transient TE	Diagnosis	Policy response
High / High	Structurally efficient, well-managed	Maintain current practices
High / Low	Good structure, poor daily management	Management training, operational improvements
Low / High	Structural problems, good management	Institutional reforms, technology upgrades
Low / Low	Both structural and managerial issues	Comprehensive restructuring
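The table can be applied to estimated efficiencies by labelling each entity with its scenario. The sketch below assumes an entity-level summary with one persistent efficiency and one average transient efficiency per firm; the 0.8 cutoff between "High" and "Low", the helper name, and the data are illustrative assumptions rather than PanelBox defaults.

```python
import numpy as np
import pandas as pd

def classify_entities(summary: pd.DataFrame, threshold: float = 0.8) -> pd.DataFrame:
    """Add a 'scenario' column such as 'High / Low' (persistent / transient)."""
    persistent = np.where(summary["persistent_efficiency"] >= threshold, "High", "Low")
    transient = np.where(summary["transient_efficiency"] >= threshold, "High", "Low")
    out = summary.copy()
    out["scenario"] = [f"{p} / {t}" for p, t in zip(persistent, transient)]
    return out

# Made-up entity-level efficiencies (transient averaged over time per firm)
summary = pd.DataFrame({
    "entity": ["A", "B", "C", "D"],
    "persistent_efficiency": [0.92, 0.90, 0.65, 0.60],
    "transient_efficiency": [0.88, 0.70, 0.91, 0.55],
})

labelled = classify_entities(summary)
print(labelled[["entity", "scenario"]])
```

The cutoff is a judgment call; using sample medians or sector benchmarks instead of a fixed value may be more appropriate for a given application.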

Configuration Options¶