Canay Two-Step Estimator
Quick Reference
Class: panelbox.models.quantile.canay.CanayTwoStep
Import: from panelbox.models.quantile import CanayTwoStep
Stata equivalent: xtqreg y x1 x2, fe method(canay)
R equivalent: qrpanel::qregpd(y ~ x1 + x2, method = "cre")
Overview
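The Canay (2011) Two-Step estimator provides a simple and computationally efficient approach to fixed effects quantile regression for panel data. Instead of estimating one fixed effect per entity inside each quantile regression, it estimates the fixed effects once, with a mean (FE-OLS) regression in a first step, and removes them from the outcome before running pooled quantile regression. This keeps the quantile regression step free of incidental parameters, although the first step still requires \(T\) to grow for \(\hat{\alpha}_i\) to be consistent.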
The two-step procedure is:
- Step 1 (Within-transformation OLS): Estimate fixed effects \(\hat{\alpha}_i\) via standard FE-OLS regression
- Step 2 (Pooled QR on transformed data): Run pooled quantile regression on \(\tilde{y}_{it} = y_{it} - \hat{\alpha}_i\)
The estimator relies on a key assumption: fixed effects are pure location shifters, meaning they shift the entire conditional distribution by the same amount at every quantile. Formally:

\[
Q_{y_{it}}(\tau \mid X_{it}, \alpha_i) = \alpha_i + X_{it}'\beta(\tau),
\]

where \(\alpha_i\) does not depend on \(\tau\). This means individual heterogeneity affects only the level, not the shape, of the conditional distribution.
PanelBox provides a formal test of this assumption via the test_location_shift() method.
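The short simulation below illustrates what the assumption rules out. It uses two hypothetical entities with no covariates and compares their outcome quantiles. Both data-generating processes are assumptions chosen for illustration: standard normal errors, plus, in the second case, an error scale that grows with \(\alpha_i\). Under a pure location shift, the gap between the two entities is the same at every \(\tau\). When \(\alpha_i\) also changes the scale, the gap varies with \(\tau\).

```python
import numpy as np

TAUS = np.array([0.1, 0.5, 0.9])

def quantile_gap(scale_with_alpha, n=200_000, seed=0):
    """Gap between the tau-quantiles of entity B (alpha=2) and entity A (alpha=0).

    Illustrative simulation only; not part of PanelBox.
    """
    rng = np.random.default_rng(seed)
    quantiles = {}
    for name, alpha_i in {"A": 0.0, "B": 2.0}.items():
        u = rng.standard_normal(n)
        # Location shift: y = alpha_i + u.
        # Location-scale (assumed alternative): y = alpha_i + (1 + 0.5 * alpha_i) * u,
        # so alpha_i also stretches the distribution.
        sigma = 1.0 + 0.5 * alpha_i if scale_with_alpha else 1.0
        quantiles[name] = np.quantile(alpha_i + sigma * u, TAUS)
    return quantiles["B"] - quantiles["A"]

print("location shift:", np.round(quantile_gap(False), 2))
print("location-scale:", np.round(quantile_gap(True), 2))
```

Under the location-shift process the gap is about 2 at each quantile. Under the location-scale process it is roughly \(2 + z_\tau\), where \(z_\tau\) is the standard normal quantile, giving about 0.7, 2 and 3.3. The second case is the one a single, quantile-invariant \(\hat{\alpha}_i\) cannot represent.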
Quick Example
```python
from panelbox.core.panel_data import PanelData
from panelbox.models.quantile import CanayTwoStep

# df: a pandas DataFrame with columns firm_id, year, investment, value, capital
panel_data = PanelData(data=df, entity_col="firm_id", time_col="year")

model = CanayTwoStep(
    data=panel_data,
    formula="investment ~ value + capital",
    tau=[0.25, 0.5, 0.75],
)
results = model.fit(se_adjustment="two-step")

# Test the location-shift assumption
test = model.test_location_shift(method="wald")
print(f"Location shift test: stat={test.statistic:.3f}, p={test.pvalue:.3f}")
```
When to Use
- Location-shift is plausible: entity heterogeneity affects levels but not distributional shape
- Computational speed: much faster than the Koenker penalty method
- Large panels: works well with large \(N\) and moderate \(T\)
- Quick analysis: useful for initial exploration before using more complex methods
Key Assumptions
- Location shift: fixed effects are pure location shifters (same \(\alpha_i\) for all \(\tau\))
- Large \(T\): the first-step FE-OLS estimator of \(\hat{\alpha}_i\) must be consistent, requiring \(T\) to grow
- Strict exogeneity: \(E[\varepsilon_{it}|X_i, \alpha_i] = 0\)
- Testable: use test_location_shift() to check the key assumption
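The simulation below shows why the first step needs large \(T\). It is a sketch under an assumed data-generating process: one regressor correlated with \(\alpha_i\), and standard normal errors. The function name mean_abs_alpha_error is hypothetical and not part of PanelBox. The code computes \(\hat{\alpha}_i = \bar{y}_i - \bar{X}_i'\hat{\beta}\) from the within estimator and reports its mean absolute error for several values of \(T\).

```python
import numpy as np

def mean_abs_alpha_error(n_entities, n_periods, beta=1.5, seed=0):
    """Simulate y_it = alpha_i + beta * x_it + u_it (assumed DGP) and return the
    mean absolute error of the within-estimator fixed effects alpha_hat_i."""
    rng = np.random.default_rng(seed)
    alpha = rng.normal(size=n_entities)                             # true fixed effects
    x = rng.normal(size=(n_entities, n_periods)) + alpha[:, None]   # x correlated with alpha
    u = rng.normal(size=(n_entities, n_periods))
    y = alpha[:, None] + beta * x + u

    # Within (entity-demeaned) OLS slope
    x_dm = x - x.mean(axis=1, keepdims=True)
    y_dm = y - y.mean(axis=1, keepdims=True)
    beta_hat = (x_dm * y_dm).sum() / (x_dm ** 2).sum()

    # Recover alpha_hat_i = ybar_i - xbar_i * beta_hat
    alpha_hat = y.mean(axis=1) - x.mean(axis=1) * beta_hat
    return np.abs(alpha_hat - alpha).mean()

for T in (5, 50, 500):
    print(T, round(mean_abs_alpha_error(n_entities=200, n_periods=T), 3))
```

Here \(\hat{\alpha}_i - \alpha_i\) is dominated by the entity mean of the errors, \(\bar{u}_i\). That mean has standard deviation \(\sigma/\sqrt{T}\) regardless of \(N\), so adding entities does not make \(\hat{\alpha}_i\) more precise; only more time periods do.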
When NOT to Use
If the treatment or covariates affect different parts of the distribution differently (e.g., a policy helps low-income workers more than high-income workers), the location-shift assumption is violated. Use the Koenker penalty method or Location-Scale model instead.
Detailed Guide
The Two-Step Procedure
Step 1: FE-OLS to estimate \(\hat{\alpha}_i\)
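Standard within-transformation OLS regression of the linear panel model:

\[
y_{it} = \alpha_i + X_{it}'\beta + u_{it}.
\]

The within estimator demeans both \(y_{it}\) and \(X_{it}\) by entity, \(\ddot{y}_{it} = y_{it} - \bar{y}_i\) and \(\ddot{X}_{it} = X_{it} - \bar{X}_i\). It then estimates \(\hat{\beta}\) by OLS of \(\ddot{y}_{it}\) on \(\ddot{X}_{it}\) and recovers \(\hat{\alpha}_i = \bar{y}_i - \bar{X}_i'\hat{\beta}\).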
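Below is a minimal NumPy sketch of step 1, written for illustration only. The helper fe_ols_first_step is hypothetical, not the PanelBox implementation. It assumes integer entity codes \(0, \dots, N-1\) and a regressor matrix without a constant, since the constant is absorbed into \(\hat{\alpha}_i\).

```python
import numpy as np

def fe_ols_first_step(y, X, entity):
    """Within (FE-OLS) estimator: returns beta_hat and alpha_hat_i.

    y: (n,) outcomes; X: (n, k) regressors WITHOUT a constant;
    entity: (n,) integer entity codes 0..N-1.
    """
    n_entities = entity.max() + 1
    counts = np.bincount(entity, minlength=n_entities)

    # Entity means of y and of each column of X
    y_bar = np.bincount(entity, weights=y, minlength=n_entities) / counts
    X_bar = np.column_stack([
        np.bincount(entity, weights=X[:, j], minlength=n_entities) / counts
        for j in range(X.shape[1])
    ])

    # Demean by entity and run OLS on the demeaned data
    y_dd = y - y_bar[entity]
    X_dd = X - X_bar[entity]
    beta_hat, *_ = np.linalg.lstsq(X_dd, y_dd, rcond=None)

    # Recover the fixed effects: alpha_hat_i = y_bar_i - X_bar_i' beta_hat
    alpha_hat = y_bar - X_bar @ beta_hat
    return beta_hat, alpha_hat
```

The step-2 outcome is then y - alpha_hat[entity], that is, \(\tilde{y}_{it} = y_{it} - \hat{\alpha}_i\).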
Step 2: Pooled QR on \(\tilde{y}_{it} = y_{it} - \hat{\alpha}_i\)
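This is simply a pooled quantile regression on the "de-fixed-effected" data:

\[
\hat{\beta}(\tau) = \arg\min_{b} \sum_{i,t} \rho_\tau\big(\tilde{y}_{it} - X_{it}'b\big), \qquad \rho_\tau(u) = u\,\big(\tau - \mathbf{1}\{u < 0\}\big).
\]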
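For step 2, any linear quantile regression solver can be applied to \(\tilde{y}_{it}\). The sketch below uses a hypothetical helper, not PanelBox's internal solver. It writes the check-function minimisation as a linear program and solves it with SciPy's linprog using the HiGHS backend. Because it builds dense constraint matrices, it is only suitable for small illustrative samples.

```python
import numpy as np
from scipy.optimize import linprog

def pooled_quantile_regression(y, X, tau):
    """Pooled linear quantile regression solved as a linear program.

    Minimises sum_i rho_tau(y_i - x_i' b), written as
        min  tau * 1'u_plus + (1 - tau) * 1'u_minus
        s.t. X b + u_plus - u_minus = y,  u_plus, u_minus >= 0, b free.
    X should already include a constant column if an intercept is wanted.
    """
    n, k = X.shape
    c = np.concatenate([np.zeros(k), tau * np.ones(n), (1 - tau) * np.ones(n)])
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * k + [(0, None)] * (2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:k]
```

In the Canay setting, y is \(\tilde{y}_{it}\) and X holds the original regressors (plus a constant). Production code would normally use a dedicated quantile regression routine instead of a dense LP.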
Data Preparation
```python
from panelbox.core.panel_data import PanelData
from panelbox.models.quantile import CanayTwoStep

panel_data = PanelData(data=df, entity_col="id", time_col="year")

model = CanayTwoStep(
    data=panel_data,
    formula="y ~ x1 + x2",
    tau=[0.1, 0.25, 0.5, 0.75, 0.9],
)
```
Estimation¶
```python
results = model.fit(
    se_adjustment="two-step",  # account for first-step uncertainty
    verbose=False,
)
```
Standard Error Adjustment
Because step 2 uses the estimated \(\hat{\alpha}_i\) rather than the true \(\alpha_i\), first-step estimation error affects inference. Three options are available:

| SE Adjustment | Description | Recommended |
|---|---|---|
| "two-step" | Accounts for estimation error in \(\hat{\alpha}_i\) from step 1 | Yes (default) |
| "naive" | Ignores first-step uncertainty; treats \(\hat{\alpha}_i\) as known | No (understates SEs) |
| "bootstrap" | Block bootstrap over both steps | For robustness checks |
```python
# Recommended: account for two-step uncertainty
results = model.fit(se_adjustment="two-step")

# Bootstrap for robustness
results_boot = model.fit(se_adjustment="bootstrap")
```
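To make "block bootstrap over both steps" concrete, here is a sketch of an entity-level (cluster) bootstrap. It reflects an assumption about what such a procedure looks like and does not describe PanelBox's bootstrap. Each replication resamples whole entities with replacement and re-runs step 1 and step 2, so the spread of the replicated coefficients reflects first-step uncertainty. All function names are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

def quantreg(y, X, tau):
    # Pooled quantile regression as a linear program; X includes a constant column.
    n, k = X.shape
    c = np.r_[np.zeros(k), tau * np.ones(n), (1 - tau) * np.ones(n)]
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * k + [(0, None)] * (2 * n)
    return linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs").x[:k]

def canay_two_step(y, X, entity, tau):
    """Two-step estimate for regressors X (no constant) and entity codes 0..N-1."""
    # Step 1: within (FE-OLS) estimator and recovered fixed effects
    n_ent = entity.max() + 1
    counts = np.bincount(entity, minlength=n_ent)
    y_bar = np.bincount(entity, weights=y, minlength=n_ent) / counts
    X_bar = np.column_stack(
        [np.bincount(entity, weights=col, minlength=n_ent) / counts for col in X.T]
    )
    beta_ols, *_ = np.linalg.lstsq(X - X_bar[entity], y - y_bar[entity], rcond=None)
    alpha_hat = y_bar - X_bar @ beta_ols
    # Step 2: pooled QR of y_tilde on a constant and X
    y_tilde = y - alpha_hat[entity]
    return quantreg(y_tilde, np.column_stack([np.ones(len(y)), X]), tau)

def block_bootstrap_se(y, X, entity, tau, n_boot=200, seed=0):
    """Resample whole entities with replacement and redo BOTH steps each time."""
    rng = np.random.default_rng(seed)
    ids = np.unique(entity)
    draws = []
    for _ in range(n_boot):
        sampled = rng.choice(ids, size=len(ids), replace=True)
        rows = [np.flatnonzero(entity == e) for e in sampled]
        # Relabel each draw as a new entity so duplicates get their own alpha_i
        new_entity = np.concatenate([np.full(len(r), j) for j, r in enumerate(rows)])
        idx = np.concatenate(rows)
        draws.append(canay_two_step(y[idx], X[idx], new_entity, tau))
    return np.std(draws, axis=0, ddof=1)
```

Resampled copies of the same entity are relabelled as distinct entities, so each copy gets its own \(\hat{\alpha}_i\) in step 1. The returned vector holds bootstrap standard errors for the intercept and the slopes.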
Interpreting Results
```python
# Coefficients at each quantile
for tau in model.tau:
    r = results.results[tau]
    print(f"tau={tau:.2f}: beta = {r.params}, se = {r.std_errors}")

# First-step results
print(f"FE-OLS R²: {model.fe_ols_result_.rsquared:.4f}")
print(f"Fixed effects (first 5): {model.fixed_effects_[:5]}")

# Transformed dependent variable
print(f"y_tilde stats: mean={model.y_transformed_.mean():.4f}")
```
Testing the Location-Shift Assumption
This is the most important diagnostic for the Canay estimator:
```python
# Wald test
test_wald = model.test_location_shift(method="wald")
print(f"Wald test: stat={test_wald.statistic:.3f}, p={test_wald.pvalue:.3f}")

# Kolmogorov-Smirnov test
test_ks = model.test_location_shift(method="ks")
print(f"KS test: stat={test_ks.statistic:.3f}, p={test_ks.pvalue:.3f}")
```
- \(H_0\): \(\beta(\tau)\) is constant across \(\tau\) (location shift holds)
- \(H_1\): \(\beta(\tau)\) varies with \(\tau\) (location shift violated)
- Decision: if \(p < 0.05\), the location-shift assumption is rejected
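A Wald statistic for the null above can be sketched as follows. It assumes you already have slope estimates at several quantiles and a consistent estimate of their joint covariance, for example from a bootstrap over entities. This is a generic textbook construction, not necessarily the statistic that test_location_shift(method="wald") computes. The intercept is excluded: even with constant slopes and i.i.d. errors it varies with \(\tau\), because it absorbs the \(\tau\)-quantile of the error.

```python
import numpy as np
from scipy.stats import chi2

def wald_equal_slopes(betas, cov):
    """Wald test of H0: beta(tau_1) = ... = beta(tau_m) (hypothetical helper).

    betas: (m, k) slope estimates, one row per quantile (exclude the intercept).
    cov:   (m*k, m*k) joint covariance of the stacked vector
           [beta(tau_1), ..., beta(tau_m)].
    Returns (statistic, p-value, degrees of freedom).
    """
    m, k = betas.shape
    b = betas.reshape(-1)  # stack row by row: beta(tau_1) first
    # Restrictions beta(tau_j) - beta(tau_1) = 0 for j = 2..m
    R = np.zeros(((m - 1) * k, m * k))
    for j in range(1, m):
        R[(j - 1) * k:j * k, :k] = -np.eye(k)
        R[(j - 1) * k:j * k, j * k:(j + 1) * k] = np.eye(k)
    d = R @ b
    stat = float(d @ np.linalg.solve(R @ cov @ R.T, d))
    df = (m - 1) * k
    return stat, chi2.sf(stat, df), df
```

Under \(H_0\), the statistic is asymptotically \(\chi^2\) with \((m-1)k\) degrees of freedom. Identical slope rows give a statistic of 0 and a p-value of 1.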
Interpreting the Test
Rejection means the Canay estimator may be inconsistent. Consider using the Koenker penalty method or Location-Scale model instead. Non-rejection does not prove the assumption holds — it may simply lack power.
Comparison with Penalty Method
```python
# Direct comparison
comparison = model.compare_with_penalty_method(tau=0.5, lambda_fe="auto")

# The comparison dict contains:
# - Coefficient estimates from both methods
# - Computation times
# - Maximum absolute difference in coefficients
```
If results differ substantially, the location-shift assumption is likely violated.
Configuration Options
| Parameter | Type | Default | Description |
|---|---|---|---|
| data | PanelData | required | Panel data object |
| formula | str | None | Model formula "y ~ x1 + x2" |
| tau | float/list | 0.5 | Quantile level(s) in \((0, 1)\) |
Fit Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| se_adjustment | str | "two-step" | SE method: "two-step", "naive", "bootstrap" |
| verbose | bool | False | Print estimation progress |
Result Attributes
| Attribute | Description |
|---|---|
| results | Dict mapping \(\tau \to\) result objects (attribute of the object returned by fit()) |
| fixed_effects_ | Estimated entity fixed effects \(\hat{\alpha}_i\) from step 1 (model attribute) |
| fe_ols_result_ | Full step-1 FE-OLS results (model attribute) |
| y_transformed_ | Transformed dependent variable \(\tilde{y}_{it}\) (model attribute) |
Test Methods
| Method | Signature | Description |
|---|---|---|
| test_location_shift() | (tau_grid=None, method="wald") | Test \(H_0\): location shift holds |
| compare_with_penalty_method() | (tau=0.5, lambda_fe="auto") | Compare Canay vs Koenker |
Tutorials
| Tutorial | Description | Link |
|---|---|---|
| Canay Two-Step | Step-by-step estimation and assumption testing | |
| FE QR Comparison | Canay vs Koenker penalty method | |
See Also
- Pooled Quantile Regression — baseline without fixed effects (step 2 of Canay)
- Fixed Effects Quantile Regression — Koenker penalty method (no location-shift assumption)
- Location-Scale Model — alternative FE approach with non-crossing guarantee
- Diagnostics — additional diagnostic tests
References
- Canay, I. A. (2011). A simple approach to quantile regression for panel data. The Econometrics Journal, 14(3), 368-386.
- Koenker, R. (2004). Quantile regression for longitudinal data. Journal of Multivariate Analysis, 91(1), 74-89.
- Abrevaya, J., & Dahl, C. M. (2008). The effects of birth inputs on birthweight: Evidence from quantile estimation on panel data. Journal of Business & Economic Statistics, 26(4), 379-397.