Canay Two-Step Estimator

Quick Reference

Class: panelbox.models.quantile.canay.CanayTwoStep
Import: from panelbox.models.quantile import CanayTwoStep
Stata equivalent: xtqreg y x1 x2, fe method(canay)
R equivalent: qrpanel::qregpd(y ~ x1 + x2, method = "cre")

Overview

The Canay (2011) Two-Step estimator provides a simple and computationally efficient approach to fixed effects quantile regression for panel data. It sidesteps the incidental parameters problem of estimating \(N\) fixed effects at every quantile by estimating them once in a first step and subtracting them from the outcome before running a pooled quantile regression.

The two-step procedure is:

Step 1 (Within-transformation OLS): Estimate fixed effects \(\hat{\alpha}_i\) via standard FE-OLS regression
Step 2 (Pooled QR on transformed data): Run pooled quantile regression on \(\tilde{y}_{it} = y_{it} - \hat{\alpha}_i\)

The estimator relies on a key assumption: fixed effects are pure location shifters — they shift the entire conditional distribution by the same amount across all quantiles. Formally:

\[Q_\tau(y_{it} | X_{it}, \alpha_i) = X_{it}'\beta_\tau + \alpha_i \quad \forall \tau\]

where \(\alpha_i\) does not depend on \(\tau\). This means individual heterogeneity affects only the level, not the shape, of the conditional distribution.
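
A minimal sketch of what the location-shift assumption implies, using only the Python standard library. The two entities, their effects, and the shock values are made up for illustration, and covariates are held fixed. Because \(\alpha_i\) enters additively, the gap between the two entities' quantiles is the same at every \(\tau\):

```python
from statistics import quantiles

# Hypothetical shocks shared by both entities (same distributional shape).
shocks = [-2.0, -1.0, -0.5, 0.0, 0.3, 0.9, 1.5, 2.5, 4.0]

# Hypothetical entity effects: pure location shifters.
alpha = {"A": 1.0, "B": 3.5}

# Outcomes with covariates held fixed: y_it = alpha_i + shock_t.
y = {i: [a + e for e in shocks] for i, a in alpha.items()}

# Deciles of each entity's outcome distribution.
deciles = {i: quantiles(v, n=10) for i, v in y.items()}

# Under a pure location shift the gap equals alpha_B - alpha_A at every decile.
gaps = [round(b - a, 10) for a, b in zip(deciles["A"], deciles["B"])]
print(gaps)
```

Every entry of `gaps` equals the difference in effects (2.5 here). If the entity effect also changed the spread of the shocks, the gaps would differ across deciles and the assumption would fail.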

PanelBox provides a formal test of this assumption via the test_location_shift() method.

Quick Example

from panelbox.core.panel_data import PanelData
from panelbox.models.quantile import CanayTwoStep

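# df is assumed to be a long-format DataFrame with columns
# firm_id, year, investment, value, and capital
panel_data = PanelData(data=df, entity_col="firm_id", time_col="year")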

model = CanayTwoStep(
    data=panel_data,
    formula="investment ~ value + capital",
    tau=[0.25, 0.5, 0.75],
)
results = model.fit(se_adjustment="two-step")

# Test the location-shift assumption
test = model.test_location_shift(method="wald")
print(f"Location shift test: stat={test.statistic:.3f}, p={test.pvalue:.3f}")

When to Use

Location-shift is plausible: entity heterogeneity affects levels but not distributional shape
Computational speed: much faster than the Koenker penalty method
Large panels: works well with large \(N\) and moderate \(T\)
Quick analysis: useful for initial exploration before using more complex methods

Key Assumptions

Location shift: fixed effects are pure location shifters (same \(\alpha_i\) for all \(\tau\))
Large \(T\): the first-step FE-OLS estimate \(\hat{\alpha}_i\) must be consistent for \(\alpha_i\), which requires \(T \to \infty\)
Strict exogeneity: \(E[\varepsilon_{it}|X_i, \alpha_i] = 0\)
Testable: use test_location_shift() to check the key assumption

When NOT to Use

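If unobserved entity heterogeneity affects different parts of the distribution differently (e.g., it changes the spread of outcomes as well as their level), the location-shift assumption is violated. Covariate effects may still differ across quantiles through \(\beta_\tau\); that alone does not violate the model in the Overview. When the fixed effects are not pure location shifters, use the Koenker penalty method or Location-Scale model instead.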

Detailed Guide

The Two-Step Procedure

Step 1: FE-OLS to estimate \(\hat{\alpha}_i\)

Standard within-transformation OLS regression:

\[y_{it} = X_{it}'\beta + \alpha_i + \varepsilon_{it}\]

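The within estimator demeans both sides by entity, \(\ddot{y}_{it} = y_{it} - \bar{y}_i\) and \(\ddot{X}_{it} = X_{it} - \bar{X}_i\), estimates \(\hat{\beta}\) by OLS on the demeaned data, then recovers \(\hat{\alpha}_i = \bar{y}_i - \bar{X}_i'\hat{\beta}\).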
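
Step 1 can be sketched in plain Python for the single-regressor case. This is an illustration under stated assumptions, not PanelBox's implementation: the function name `fe_ols_single_regressor` and the noise-free toy panel are hypothetical, chosen so that the true slope (2) and effects (5 and -1) are recovered exactly.

```python
from collections import defaultdict

def fe_ols_single_regressor(ids, x, y):
    """Within (FE-OLS) slope and entity effects for one regressor."""
    # Accumulate per-entity sums of x, y and the observation count.
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for i, xi, yi in zip(ids, x, y):
        s = sums[i]
        s[0] += xi
        s[1] += yi
        s[2] += 1
    x_bar = {i: s[0] / s[2] for i, s in sums.items()}
    y_bar = {i: s[1] / s[2] for i, s in sums.items()}

    # Demean by entity and run OLS on the demeaned data.
    sxy = sum((xi - x_bar[i]) * (yi - y_bar[i]) for i, xi, yi in zip(ids, x, y))
    sxx = sum((xi - x_bar[i]) ** 2 for i, xi in zip(ids, x))
    beta = sxy / sxx

    # Recover alpha_i = y_bar_i - x_bar_i * beta.
    alpha = {i: y_bar[i] - x_bar[i] * beta for i in x_bar}
    return beta, alpha

# Hypothetical noise-free panel: y = 2 * x + alpha_i with alpha = {1: 5, 2: -1}.
ids = [1, 1, 1, 2, 2, 2]
x = [0.0, 1.0, 2.0, 1.0, 3.0, 5.0]
y = [5.0, 7.0, 9.0, 1.0, 5.0, 9.0]

beta_hat, alpha_hat = fe_ols_single_regressor(ids, x, y)
print(beta_hat, alpha_hat)
```

Subtracting `alpha_hat[i]` from each `y` value then gives the transformed outcome used in Step 2.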

Step 2: Pooled QR on \(\tilde{y}_{it} = y_{it} - \hat{\alpha}_i\)

\[\min_{\beta_\tau} \sum_{i=1}^{N}\sum_{t=1}^{T} \rho_\tau(\tilde{y}_{it} - X_{it}'\beta_\tau)\]

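Here \(\rho_\tau(u) = u\,(\tau - \mathbf{1}\{u < 0\})\) is the quantile check function. This is simply a pooled quantile regression on the data with the estimated fixed effects removed.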
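
Step 2 can be sketched as follows for an intercept and one regressor. The code uses the check function \(\rho_\tau(u) = u(\tau - \mathbf{1}\{u < 0\})\) and a brute-force search over lines through pairs of observations, which is exact because a two-parameter quantile regression always has an optimum interpolating two data points. The helper names and the toy values of `y_tilde` are hypothetical, and the search is only practical for tiny samples; production solvers use linear programming.

```python
from itertools import combinations

def check_loss(u, tau):
    """Check function rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))

def pooled_qr_line(x, y, tau):
    """Exact QR fit of y on (1, x) by searching lines through pairs of points."""
    best = None
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        if x1 == x2:
            continue  # a vertical line is not a valid regression fit
        slope = (y2 - y1) / (x2 - x1)
        intercept = y1 - slope * x1
        loss = sum(check_loss(yi - intercept - slope * xi, tau)
                   for xi, yi in zip(x, y))
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    return best[1], best[2]

# Hypothetical transformed outcomes y_tilde = y - alpha_hat; spread grows with x.
x = [1, 1, 2, 2, 3, 3, 4, 4]
y_tilde = [1.5, 2.5, 3.0, 5.0, 4.5, 7.5, 6.0, 10.0]

fits = {tau: pooled_qr_line(x, y_tilde, tau) for tau in (0.25, 0.75)}
for tau, (b0, b1) in fits.items():
    print(f"tau={tau}: intercept={b0:.2f}, slope={b1:.2f}")
```

On this toy data the fitted slope is 1.5 at \(\tau = 0.25\) and 2.5 at \(\tau = 0.75\), so the slope \(\beta_\tau\) differs across quantiles even though the fixed effects have already been removed.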

Data Preparation

from panelbox.core.panel_data import PanelData
from panelbox.models.quantile import CanayTwoStep

panel_data = PanelData(data=df, entity_col="id", time_col="year")

model = CanayTwoStep(
    data=panel_data,
    formula="y ~ x1 + x2",
    tau=[0.1, 0.25, 0.5, 0.75, 0.9],
)

Estimation

results = model.fit(
    se_adjustment="two-step",  # account for first-step uncertainty
    verbose=False,
)

Standard Error Adjustment

The two-step nature of the estimator affects inference. Three options are available:

SE Adjustment	Description	Recommended
`"two-step"`	Accounts for estimation error in \(\hat{\alpha}_i\) from step 1	Yes (default)
`"naive"`	Ignores first-step uncertainty; treats \(\hat{\alpha}_i\) as known	No (understates SEs)
`"bootstrap"`	Block bootstrap over both steps	For robustness checks

# Recommended: account for two-step uncertainty
results = model.fit(se_adjustment="two-step")

# Bootstrap for robustness
results_boot = model.fit(se_adjustment="bootstrap")
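
To illustrate the idea behind bootstrapping over both steps, here is a generic sketch that resamples whole entities with replacement and reruns an estimator on each draw. It is an assumption about how such a bootstrap can be organised, not a description of PanelBox internals; `entity_bootstrap_se`, `pooled_mean`, and the toy panel are hypothetical names and data.

```python
import random
from statistics import mean, stdev

def entity_bootstrap_se(panel, estimator, n_boot=200, seed=0):
    """Bootstrap SE that resamples whole entities with replacement.

    panel: dict mapping entity id -> list of that entity's observations
    estimator: callable taking a list of entity blocks and returning a float;
               for a two-step estimator it should rerun BOTH steps.
    """
    rng = random.Random(seed)
    ids = list(panel)
    draws = []
    for _ in range(n_boot):
        # Draw as many entities as the panel has, keeping each time series intact.
        sample_ids = [rng.choice(ids) for _ in ids]
        blocks = [panel[i] for i in sample_ids]
        draws.append(estimator(blocks))
    return stdev(draws)

# Hypothetical panel: entity id -> list of outcomes.
panel = {
    "a": [1.0, 1.2, 0.9],
    "b": [2.1, 2.4, 2.0],
    "c": [0.2, 0.1, 0.4],
    "d": [3.0, 2.8, 3.3],
}

def pooled_mean(blocks):
    """Stand-in estimator; a real use would rerun FE-OLS and pooled QR here."""
    return mean(v for block in blocks for v in block)

se = entity_bootstrap_se(panel, pooled_mean, n_boot=500, seed=42)
print(f"bootstrap SE of the pooled mean: {se:.3f}")
```

Resampling entities rather than individual observations keeps each entity's observations together, and rerunning the first step inside `estimator` lets the spread of the draws reflect the uncertainty in \(\hat{\alpha}_i\).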

Interpreting Results

# Coefficients at each quantile
for tau in model.tau:
    r = results.results[tau]
    print(f"tau={tau:.2f}: beta = {r.params}, se = {r.std_errors}")

# First-step results
print(f"FE-OLS R²: {model.fe_ols_result_.rsquared:.4f}")
print(f"Fixed effects (first 5): {model.fixed_effects_[:5]}")

# Transformed dependent variable
print(f"y_tilde stats: mean={model.y_transformed_.mean():.4f}")

Testing the Location-Shift Assumption

This is the most important diagnostic for the Canay estimator:

# Wald test
test_wald = model.test_location_shift(method="wald")
print(f"Wald test: stat={test_wald.statistic:.3f}, p={test_wald.pvalue:.3f}")

# Kolmogorov-Smirnov test
test_ks = model.test_location_shift(method="ks")
print(f"KS test: stat={test_ks.statistic:.3f}, p={test_ks.pvalue:.3f}")

\(H_0\): \(\beta(\tau)\) is constant across \(\tau\) (location shift holds)
\(H_1\): \(\beta(\tau)\) varies with \(\tau\) (location shift violated)
Decision: if \(p < 0.05\), the location-shift assumption is rejected at the 5% level

Interpreting the Test

Rejection means the Canay estimator may be inconsistent. Consider using the Koenker penalty method or Location-Scale model instead. Non-rejection does not prove the assumption holds; the test may simply lack power.
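
The hypotheses above compare coefficients across quantiles. As a rough illustration of the mechanics, a generic Wald statistic for equality of one coefficient at two quantiles can be computed as below. This is a textbook sketch, not the computation performed by test_location_shift(); the function name and all numeric inputs are hypothetical, and the covariance between the two estimates is assumed to be available.

```python
import math

def wald_equal_slopes(b1, b2, var1, var2, cov12=0.0):
    """Wald test of H0: b1 == b2 for one coefficient at two quantiles.

    Returns (statistic, p-value) using a chi-square(1) reference distribution.
    """
    diff = b1 - b2
    var_diff = var1 + var2 - 2.0 * cov12
    stat = diff ** 2 / var_diff
    # Chi-square(1) survival function: P(Z^2 > stat) = erfc(sqrt(stat / 2)).
    pvalue = math.erfc(math.sqrt(stat / 2.0))
    return stat, pvalue

# Hypothetical slope estimates at tau = 0.25 and tau = 0.75.
stat, pvalue = wald_equal_slopes(
    b1=0.80, b2=1.10, var1=0.010, var2=0.012, cov12=0.004
)
print(f"stat={stat:.3f}, p={pvalue:.3f}")
```

Ignoring the covariance term (setting `cov12=0.0`) changes the variance of the difference, so the statistic is only valid when the joint covariance of the quantile estimates is used.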

Comparison with Penalty Method

# Direct comparison
comparison = model.compare_with_penalty_method(tau=0.5, lambda_fe="auto")

# The comparison dict contains:
# - Coefficient estimates from both methods
# - Computation times
# - Maximum absolute difference in coefficients

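If the results differ substantially, the location-shift assumption may be violated, although part of the gap can also come from the penalty method's shrinkage of the fixed effects (controlled by lambda_fe).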
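
The maximum absolute difference mentioned above is straightforward to reproduce by hand. The sketch below uses hypothetical coefficient values and a hypothetical helper `max_abs_diff`; it does not assume any particular structure for the object returned by compare_with_penalty_method().

```python
def max_abs_diff(coefs_a, coefs_b):
    """Largest absolute gap between two coefficient dicts with the same keys."""
    return max(abs(coefs_a[k] - coefs_b[k]) for k in coefs_a)

# Hypothetical median-regression estimates from the two methods.
canay = {"value": 0.110, "capital": 0.310}
penalty = {"value": 0.104, "capital": 0.325}

gap = max_abs_diff(canay, penalty)
print(f"max |difference| = {gap:.3f}")
```

Whether a gap counts as substantial is best judged relative to the standard errors of the coefficients rather than in absolute terms.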

Configuration Options

Parameter	Type	Default	Description
`data`	PanelData	required	Panel data object
`formula`	str	`None`	Model formula `"y ~ x1 + x2"`
`tau`	float/list	`0.5`	Quantile level(s) in \((0, 1)\)

Fit Parameters

Parameter	Type	Default	Description
`se_adjustment`	str	`"two-step"`	SE method: `"two-step"`, `"naive"`, `"bootstrap"`
`verbose`	bool	`False`	Print estimation progress

Result Attributes

Attribute	Description
`results`	Dict mapping \(\tau \to\) result objects
`fixed_effects_`	Estimated entity fixed effects from step 1
`fe_ols_result_`	Full step-1 FE-OLS results
`y_transformed_`	Transformed dependent variable \(\tilde{y}_{it}\)

Test Methods

Method	Signature	Description
`test_location_shift()`	`(tau_grid=None, method="wald")`	Test \(H_0\): location shift holds
`compare_with_penalty_method()`	`(tau=0.5, lambda_fe="auto")`	Compare Canay vs Koenker

Tutorials

Tutorial	Description
Canay Two-Step	Step-by-step estimation and assumption testing
FE QR Comparison	Canay vs Koenker penalty method

References

Canay, I. A. (2011). A simple approach to quantile regression for panel data. The Econometrics Journal, 14(3), 368-386.
Koenker, R. (2004). Quantile regression for longitudinal data. Journal of Multivariate Analysis, 91(1), 74-89.
Abrevaya, J., & Dahl, C. M. (2008). The effects of birth inputs on birthweight: Evidence from quantile estimation on panel data. Journal of Business & Economic Statistics, 26(4), 379-397.