Location-Scale Model¶

Quick Reference

Class: panelbox.models.quantile.location_scale.LocationScale
Import: from panelbox.models.quantile import LocationScale
Stata equivalent: No direct equivalent
R equivalent: Partially via Qtools::qlss()

Overview¶

The Location-Scale (LS) quantile regression model, introduced by Machado and Santos Silva (2019), takes a different route to conditional quantiles than standard quantile regression. Instead of estimating each quantile independently, it represents the entire conditional distribution through two components, a location (the conditional mean) and a scale (the conditional standard deviation), combined with the quantile function of a reference distribution.

The conditional quantile is modeled as:

\[Q_y(\tau | X) = X'\alpha + \sqrt{\exp(X'\gamma)} \cdot q(\tau)\]

where:

$X'\alpha$ is the location (conditional mean)
$\sqrt{\exp(X'\gamma)}$ is the scale (conditional standard deviation)
$q(\tau)$ is the quantile function of a chosen reference distribution

This decomposition provides three major advantages:

Non-crossing guarantee: quantile curves cannot cross by construction, since $q(\tau)$ is monotonically increasing, is shared across all observations, and is multiplied by a scale $\sqrt{\exp(X'\gamma)}$ that is always positive
Computational efficiency: only two regressions (location and scale) are needed, regardless of how many quantiles are requested
Full density estimation: the location-scale structure allows prediction of the complete conditional density, not just specific quantiles
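
The non-crossing argument can be checked numerically. The sketch below evaluates the quantile formula above with a standard normal $q(\tau)$ for a few covariate values, using only the Python standard library. The coefficient values and the helper name `ls_quantile` are hypothetical, chosen for illustration; they are not part of PanelBox.

```python
import math
from statistics import NormalDist


def ls_quantile(x, alpha, gamma, tau):
    """Q(tau | x) = x'alpha + sqrt(exp(x'gamma)) * q(tau), with a standard normal q."""
    location = sum(xi * ai for xi, ai in zip(x, alpha))
    scale = math.sqrt(math.exp(sum(xi * gi for xi, gi in zip(x, gamma))))
    return location + scale * NormalDist().inv_cdf(tau)


# Hypothetical coefficients (intercept, slope), for illustration only
alpha = [1.0, 0.5]   # location
gamma = [0.0, 0.4]   # log-variance
taus = [0.1, 0.25, 0.5, 0.75, 0.9]

# One quantile curve per covariate value x1; each list is ordered by tau
curves = {
    x1: [ls_quantile([1.0, x1], alpha, gamma, t) for t in taus]
    for x1 in (-2.0, 0.0, 3.0)
}

for x1, curve in curves.items():
    increasing = all(a < b for a, b in zip(curve, curve[1:]))
    print(f"x1={x1:+.1f}: non-crossing={increasing}")
```

Because the scale is an exponential and therefore positive, every curve is strictly increasing in $\tau$ regardless of the coefficient values chosen.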

Quick Example¶

from panelbox.core.panel_data import PanelData
from panelbox.models.quantile import LocationScale

panel_data = PanelData(data=df, entity_col="firm_id", time_col="year")

model = LocationScale(
    data=panel_data,
    formula="investment ~ value + capital",
    tau=[0.1, 0.25, 0.5, 0.75, 0.9],
    distribution="normal",
    fixed_effects=True,
)
results = model.fit(robust_scale=True)

# Predict conditional quantiles (guaranteed non-crossing)
quantile_preds = results.predict_quantiles(tau=[0.1, 0.5, 0.9])

# Predict conditional density
y_grid, density = results.predict_density(n_points=100)

When to Use¶

Non-crossing is critical: applications where quantile crossing would be problematic (e.g., risk management, forecasting)
Full distribution needed: interest in the complete conditional distribution, not just specific quantiles
Computational speed: estimating many quantiles (e.g., 99 percentiles for density estimation)
Panel fixed effects: natural handling of entity heterogeneity
Extrapolation: need quantiles beyond the range observed in the data (e.g., extreme tails)

Key Assumptions

Location-scale structure: the conditional distribution is fully characterized by its mean and variance
Correct reference distribution: the shape of $q(\tau)$ matches the true conditional distribution
Homogeneous distributional shape: all observations share the same distributional form (up to location and scale shifts)
These assumptions are partly testable: use test_normality() to check the reference distribution choice

Detailed Guide¶

Theoretical Foundation¶

The LS model splits quantile estimation into two moment regressions (location and scale), followed by a step that combines them into quantile coefficients:

Step 1: Location estimation (conditional mean)

\[\hat{\alpha} = \arg\min_\alpha \sum_{i,t} (y_{it} - X_{it}'\alpha)^2\]

This is standard OLS (or FE-OLS when fixed_effects=True).

Step 2: Scale estimation (conditional variance)

Using residuals $\hat{\varepsilon}_{it} = y_{it} - X_{it}'\hat{\alpha}$, the scale parameters are estimated from:

\[\log|\hat{\varepsilon}_{it}| = \frac{X_{it}'\gamma}{2} + \text{adjustment} + v_{it}\]

The robust approach (default) uses log-absolute residuals, which is more stable than regressing on squared residuals.
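
To make the first two steps concrete, here is a minimal simulation sketch using only the standard library. It assumes a single regressor, standard normal errors, and reads the adjustment term as the constant $E[\log|Z|] = -(\gamma_{EM} + \log 2)/2 \approx -0.635$ for standard normal $Z$, where $\gamma_{EM}$ is the Euler-Mascheroni constant. That reading, the helper `ols_line`, and all numeric values are assumptions for illustration; the page does not state how PanelBox computes the adjustment.

```python
import math
import random


def ols_line(x, y):
    """Return (intercept, slope) of a simple least-squares fit of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


random.seed(0)
n = 20000
a0, a1 = 1.0, 0.5     # true location coefficients (illustrative)
g0, g1 = -1.0, 0.8    # true log-variance coefficients (illustrative)

x = [random.uniform(-1.0, 1.0) for _ in range(n)]
y = [a0 + a1 * xi + math.sqrt(math.exp(g0 + g1 * xi)) * random.gauss(0.0, 1.0)
     for xi in x]

# Step 1: location by OLS
a0_hat, a1_hat = ols_line(x, y)
resid = [yi - a0_hat - a1_hat * xi for xi, yi in zip(x, y)]

# Step 2: regress log|residual| on x; the slope estimates g1 / 2
c0_hat, c1_hat = ols_line(x, [math.log(abs(e)) for e in resid])

# E[log|Z|] for standard normal Z (assumed form of the adjustment)
EULER_GAMMA = 0.5772156649015329
adjustment = -(EULER_GAMMA + math.log(2.0)) / 2.0

g0_hat = 2.0 * (c0_hat - adjustment)
g1_hat = 2.0 * c1_hat
print(f"location: {a0_hat:.3f}, {a1_hat:.3f}  log-variance: {g0_hat:.3f}, {g1_hat:.3f}")
```

The slope of the log-residual regression is unaffected by the adjustment; only the intercept needs the correction, and the correction depends on the assumed error distribution.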

Step 3: Quantile coefficients

For any quantile $\tau$:

\[\hat{\beta}(\tau) = \hat{\alpha} + \sqrt{\exp(\hat{\gamma})} \odot q(\tau)\]

where $q(\tau)$ is the quantile function of the reference distribution.

Reference Distributions¶

PanelBox supports four built-in reference distributions:

Distribution	$q(\tau)$	Tails	Best For
`"normal"`	$\Phi^{-1}(\tau)$	Light	General purpose (default)
`"logistic"`	$\log[\tau/(1-\tau)]$	Medium	Heavier-tailed data
`"t"`	$t_\nu^{-1}(\tau)$	Heavy (adjustable)	Financial data, outliers
`"laplace"`	$-\text{sign}(\tau-0.5)\log(1-2|\tau-0.5|)$

You can also provide a custom callable:

# Custom quantile function
import scipy.stats as stats

model = LocationScale(
    data=panel_data,
    formula="y ~ x1 + x2",
    tau=[0.1, 0.5, 0.9],
    distribution=lambda tau: stats.gennorm.ppf(tau, beta=1.5),
)
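
For reference, the closed-form quantile functions listed for the built-in distributions can be written as plain Python callables of the same shape as the custom callable above. This is a standard-library sketch; the function names are hypothetical, and none of these functions is rescaled to unit variance, so the raw values are not directly comparable across distributions. Whether PanelBox standardizes them internally is not stated here.

```python
import math
from statistics import NormalDist


def q_normal(tau):
    """Standard normal quantile function."""
    return NormalDist().inv_cdf(tau)


def q_logistic(tau):
    """Standard logistic quantile function: log(tau / (1 - tau))."""
    return math.log(tau / (1.0 - tau))


def q_laplace(tau):
    """Standard Laplace quantile function: -sign(tau - 0.5) * log(1 - 2|tau - 0.5|)."""
    return -math.copysign(1.0, tau - 0.5) * math.log(1.0 - 2.0 * abs(tau - 0.5))


# Upper-tail quantile at tau = 0.99 for each reference distribution
tail_99 = {q.__name__: q(0.99) for q in (q_normal, q_logistic, q_laplace)}
for name, value in tail_99.items():
    print(f"{name}: {value:.3f}")
```

Any callable passed as `distribution` should share the two properties these functions have: it is strictly increasing in $\tau$, which the non-crossing guarantee relies on, and it is defined on the open interval $(0, 1)$.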

Data Preparation¶

from panelbox.core.panel_data import PanelData
from panelbox.models.quantile import LocationScale

panel_data = PanelData(data=df, entity_col="id", time_col="year")

model = LocationScale(
    data=panel_data,
    formula="y ~ x1 + x2",
    tau=[0.1, 0.25, 0.5, 0.75, 0.9],
    distribution="normal",      # reference distribution
    fixed_effects=True,          # include entity FE
    df_t=5,                      # degrees of freedom; only used when distribution="t"
)

Estimation¶

results = model.fit(
    robust_scale=True,   # use log-transformation for scale (recommended)
    verbose=True,        # print step-by-step progress
)

The fit returns a LocationScaleResult containing:

Location and scale regression results
Quantile coefficients for all requested $\tau$
Methods for prediction and diagnostics

Interpreting Results¶

# Location parameters (conditional mean effects)
print("Location params:", results.model.location_params_)

# Scale parameters (conditional variance effects)
print("Scale params:", results.model.scale_params_)

# Quantile-specific coefficients
for tau in [0.1, 0.25, 0.5, 0.75, 0.9]:
    r = results.results[tau]
    print(f"tau={tau:.2f}: beta = {r.params}")

# Location and scale from individual results
r_50 = results.results[0.5]
print(f"Location: {r_50.location_params}")
print(f"Scale: {r_50.scale_params}")

Key insight: if the scale parameter $\gamma_j$ for covariate $x_j$ is significantly different from zero, then $x_j$ affects not just the level but also the spread of $y$. This is a direct test for heteroskedasticity and distributional effects.

Predicting Conditional Quantiles¶

# Predict quantiles (guaranteed non-crossing)
quantile_df = results.predict_quantiles(
    X=None,                  # use estimation sample (or provide new data)
    tau=[0.1, 0.25, 0.5, 0.75, 0.9],
    ci=True,                  # include confidence intervals
    alpha=0.05,               # significance level
)
print(quantile_df.head())

Predicting Conditional Density¶

A unique feature of the LS model: estimate the full conditional density $f(y|X)$.

y_grid, density = results.predict_density(
    X=None,          # mean of covariates (default)
    y_grid=None,     # automatic grid
    n_points=100,    # grid resolution
)

# Plot the density
import matplotlib.pyplot as plt
plt.plot(y_grid, density)
plt.xlabel("y")
plt.ylabel("f(y|X)")
plt.title("Conditional Density Estimate")
plt.show()
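
The density follows directly from the location-scale structure: if $y = \mu(X) + \sigma(X)\,u$ and $u$ has density $f_u$, then $f(y|X) = f_u\big((y - \mu(X))/\sigma(X)\big)/\sigma(X)$. The sketch below evaluates this for a normal reference distribution with the standard library. The fitted values and the helper `ls_density` are hypothetical, and this is not a description of how `predict_density` is implemented.

```python
from statistics import NormalDist


def ls_density(grid, location, scale):
    """Density of y = location + scale * u with u ~ N(0, 1), evaluated on grid."""
    dist = NormalDist(mu=location, sigma=scale)
    return [dist.pdf(y) for y in grid]


location, scale = 2.0, 1.5   # hypothetical fitted values at one covariate point
n_points = 201
lo, hi = location - 6 * scale, location + 6 * scale
step = (hi - lo) / (n_points - 1)
grid = [lo + i * step for i in range(n_points)]
dens = ls_density(grid, location, scale)

# Trapezoidal rule: a valid density should integrate to about 1 over the grid
area = sum((dens[i] + dens[i + 1]) / 2 * step for i in range(n_points - 1))
print(f"area under density: {area:.6f}")
```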

Testing the Reference Distribution¶

# Test if the normal distribution is appropriate
normality = results.test_normality()
print(f"Normality test: stat={normality.statistic:.3f}, p={normality.pvalue:.3f}")

# If rejected, try alternative distributions
for dist in ["normal", "logistic", "t", "laplace"]:
    model_d = LocationScale(data=panel_data, formula="y ~ x1 + x2",
                             tau=0.5, distribution=dist)
    res_d = model_d.fit()
    norm_test = res_d.test_normality()
    print(f"{dist:10s}: test stat = {norm_test.statistic:.3f}, p = {norm_test.pvalue:.3f}")

Fixed Effects¶

The LS model naturally accommodates entity fixed effects:

# With fixed effects
model_fe = LocationScale(
    data=panel_data,
    formula="y ~ x1 + x2",
    tau=[0.1, 0.5, 0.9],
    distribution="normal",
    fixed_effects=True,    # within-transformation for location and scale
)
results_fe = model_fe.fit()

With fixed_effects=True:

Location step: uses within-transformation (FE-OLS)
Scale step: applies the same within-transformation to log-residuals
Non-crossing property is preserved because the transformation is linear
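
As an illustration of what the within-transformation does, the sketch below subtracts each entity's own mean from its observations. It is the textbook version of the transformation written with the standard library; the helper name and the toy data are hypothetical, and PanelBox's internal implementation may differ.

```python
from collections import defaultdict


def within_transform(values, entities):
    """Subtract each entity's own mean from its observations."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, e in zip(values, entities):
        sums[e] += v
        counts[e] += 1
    return [v - sums[e] / counts[e] for v, e in zip(values, entities)]


# Two entities with very different levels but similar within-entity movement
entities = ["A", "A", "A", "B", "B", "B"]
y = [10.0, 12.0, 14.0, 100.0, 103.0, 106.0]

y_within = within_transform(y, entities)
print(y_within)  # [-2.0, 0.0, 2.0, -3.0, 0.0, 3.0]
```

After the transformation, the level difference between entities A and B is gone and only within-entity variation remains, which is what the location and scale regressions then use.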

Configuration Options¶

Parameter	Type	Default	Description
`data`	PanelData	required	Panel data object
`formula`	str	`None`	Model formula `"y ~ x1 + x2"`
`tau`	float/array	`0.5`	Quantile level(s) in $(0, 1)$
`distribution`	str/callable	`"normal"`	Reference distribution: `"normal"`, `"logistic"`, `"t"`, `"laplace"`, or a custom callable quantile function
`fixed_effects`	bool	`False`	Include entity fixed effects
`df_t`	float	`5`	Degrees of freedom for Student's $t$ distribution

Fit Parameters¶

Parameter	Type	Default	Description
`robust_scale`	bool	`True`	Use robust (log-transformation) scale estimation
`verbose`	bool	`False`	Print estimation progress

Result Attributes¶

Attribute	Description
`results`	Dict mapping $\tau \to$ `LocationScaleQuantileResult`
`location_result`	Step 1 (location) regression results
`scale_result`	Step 2 (scale) regression results
`model.location_params_`	Location parameter estimates $\hat{\alpha}$
`model.scale_params_`	Scale parameter estimates $\hat{\gamma}$

Result Methods¶

Method	Description
`predict_quantiles(X, tau, ci, alpha)`	Predict conditional quantiles
`predict_density(X, y_grid, n_points)`	Estimate conditional density
`test_normality(tau_grid)`	Test reference distribution fit

Comparison with Standard Quantile Regression¶

Feature	Standard QR	Location-Scale
Non-crossing	Not guaranteed	Guaranteed by construction
Computation	One optimization per $\tau$	Two regressions total
Flexibility	Fully nonparametric in $\tau$	Constrained by reference distribution
Density estimation	Not directly available	Available via `predict_density()`
Fixed effects	Incidental parameters problem	Natural via within-transformation
Extreme quantiles	Unreliable near boundaries	Extrapolation via $q(\tau)$

Tutorials¶

Tutorial	Description	Link
Location-Scale Model	Complete LS workflow with density estimation
Distribution Selection	Comparing reference distributions

References¶

Machado, J. A. F., & Santos Silva, J. M. C. (2019). Quantiles via moments. Journal of Econometrics, 213(1), 145-173.
Koenker, R. (2005). Quantile Regression. Cambridge University Press.
Chernozhukov, V., Fernández-Val, I., & Melly, B. (2013). Inference on counterfactual distributions. Econometrica, 81(6), 2205-2268.
He, X. (1997). Quantile curves without crossing. The American Statistician, 51(2), 186-192.
