Nickell Bias & LSDVC Correction¶
Key Takeaway
Fixed Effects estimation of dynamic panel models is inconsistent for fixed T due to the Nickell (1981) bias. The LSDVC estimator corrects this analytically using Kiviet (1995) bias formulas, with bootstrap inference following Bruno (2005). LSDVC is particularly useful when N is small to moderate, where GMM asymptotics may not hold.
The Dynamic Panel Model¶
Consider the standard dynamic panel data model with individual fixed effects:
\[ y_{it} = \rho \, y_{i,t-1} + X_{it}'\beta + \alpha_i + \varepsilon_{it} \]
where:
- \(y_{it}\) is the dependent variable for entity \(i\) at time \(t\)
- \(\rho\) is the autoregressive parameter (\(|\rho| < 1\) for stationarity)
- \(X_{it}\) is a vector of exogenous regressors
- \(\alpha_i\) is the unobserved entity-specific fixed effect
- \(\varepsilon_{it} \sim \text{iid}(0, \sigma^2_\varepsilon)\), independent of \(\alpha_i\)
The panel covers \(i = 1, \ldots, N\) entities and \(t = 1, \ldots, T\) time periods.
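To make the model concrete, here is a minimal NumPy sketch that simulates data from it. It is an illustration only and not part of PanelBox: the function name `simulate_dynamic_panel`, the single scalar regressor, the normal distributions, and the burn-in length are all assumptions made for the example.

```python
import numpy as np

def simulate_dynamic_panel(n_entities, n_periods, rho, beta,
                           sigma_eps=1.0, burn_in=50, seed=0):
    """Simulate y_it = rho*y_{i,t-1} + beta*x_it + alpha_i + eps_it (one regressor)."""
    rng = np.random.default_rng(seed)
    total = burn_in + n_periods + 1            # +1 keeps an initial value y_{i,0}
    alpha = rng.normal(size=n_entities)        # entity fixed effects
    x = rng.normal(size=(n_entities, total))   # strictly exogenous regressor
    eps = rng.normal(scale=sigma_eps, size=(n_entities, total))
    y = np.zeros((n_entities, total))
    for t in range(1, total):
        y[:, t] = rho * y[:, t - 1] + beta * x[:, t] + alpha + eps[:, t]
    # Drop the burn-in so the retained series is close to its stationary distribution.
    return y[:, burn_in:], x[:, burn_in:], alpha

y, x, alpha = simulate_dynamic_panel(n_entities=50, n_periods=10, rho=0.5, beta=1.0)
print(y.shape)  # column 0 holds y_{i,0}; the remaining columns are t = 1..T
```

The burn-in is a design choice: starting every entity at zero and discarding early periods avoids having to draw the initial value from the stationary distribution explicitly.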
The Nickell (1981) Bias¶
Within Transformation¶
The standard Fixed Effects (within) estimator removes \(\alpha_i\) by demeaning:
\[ \ddot{y}_{it} = \rho \, \ddot{y}_{i,t-1} + \ddot{X}_{it}'\beta + \ddot{\varepsilon}_{it} \]
where \(\ddot{z}_{it} = z_{it} - \bar{z}_i\) and \(\bar{z}_i = \frac{1}{T}\sum_{t=1}^T z_{it}\).
Source of the Bias¶
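The demeaned lagged dependent variable \(\ddot{y}_{i,t-1}\) is correlated with the demeaned error \(\ddot{\varepsilon}_{it}\), because:
\[ \ddot{y}_{i,t-1} = y_{i,t-1} - \frac{1}{T}\sum_{s=1}^{T} y_{i,s-1} \]
contains \(y_{i,t-1}\), which depends on \(\varepsilon_{i,t-1}\), while:
\[ \ddot{\varepsilon}_{it} = \varepsilon_{it} - \bar{\varepsilon}_i, \qquad \bar{\varepsilon}_i = \frac{1}{T}\sum_{s=1}^{T} \varepsilon_{is} \]
contains \(-\frac{1}{T}\varepsilon_{i,t-1}\) through the entity mean \(\bar{\varepsilon}_i\). This correlation does not vanish as \(N \to \infty\) with fixed \(T\).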
Bias Formula¶
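Nickell (1981) showed that for the AR(1) model without exogenous regressors:
\[ \underset{N \to \infty}{\operatorname{plim}} \, (\hat{\rho}_{FE} - \rho) = -\frac{1+\rho}{T-1} \left[ 1 - \frac{1}{T}\,\frac{1-\rho^T}{1-\rho} \right] \left\{ 1 - \frac{2\rho}{(1-\rho)(T-1)} \left[ 1 - \frac{1-\rho^T}{T(1-\rho)} \right] \right\}^{-1} \]
For large \(T\), this simplifies to the well-known approximation:
\[ \underset{N \to \infty}{\operatorname{plim}} \, (\hat{\rho}_{FE} - \rho) \approx -\frac{1+\rho}{T-1} \]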
Key Properties of the Nickell Bias
- Always downward for positive \(\rho\): the FE estimator underestimates persistence
- Does not vanish as \(N \to \infty\) with fixed \(T\)
- Magnitude depends on T: under the large-\(T\) approximation \(-\frac{1+\rho}{T-1}\), \(\rho = 0.7\) gives a bias of about \(-0.19\) at \(T = 10\) and about \(-0.43\) at \(T = 5\)
- Contaminates \(\hat{\beta}\): bias in \(\hat{\rho}\) propagates to all coefficient estimates
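A small Monte Carlo sketch can make these properties concrete. The code below is plain NumPy, not PanelBox; the helper names `simulate_ar1_panel` and `fe_ar1_estimate`, the sample sizes, and the seed are assumptions chosen for the example. It simulates a pure AR(1) panel with fixed effects (no exogenous regressors) and compares the average FE estimation error with the large-\(T\) approximation.

```python
import numpy as np

def simulate_ar1_panel(n, t, rho, rng, burn_in=100):
    """Return an (n, t+1) array of y values; column 0 plays the role of y_{i,0}."""
    alpha = rng.normal(size=n)
    y = np.zeros((n, burn_in + t + 1))
    eps = rng.normal(size=y.shape)
    for s in range(1, y.shape[1]):
        y[:, s] = rho * y[:, s - 1] + alpha + eps[:, s]
    return y[:, burn_in:]

def fe_ar1_estimate(y):
    """Within (FE) estimate of rho: regress demeaned y_it on demeaned y_{i,t-1}."""
    y_lag, y_cur = y[:, :-1], y[:, 1:]
    y_lag_dm = y_lag - y_lag.mean(axis=1, keepdims=True)
    y_cur_dm = y_cur - y_cur.mean(axis=1, keepdims=True)
    return (y_lag_dm * y_cur_dm).sum() / (y_lag_dm ** 2).sum()

rng = np.random.default_rng(42)
rho, T = 0.5, 5
# Large N on purpose: the bias should remain even with many entities.
estimates = [fe_ar1_estimate(simulate_ar1_panel(2000, T, rho, rng)) for _ in range(20)]
mean_bias = np.mean(estimates) - rho
print(f"simulated bias: {mean_bias:.3f}")
print(f"large-T approximation: {-(1 + rho) / (T - 1):.3f}")
```

With \(T\) this small, the simulated bias should be clearly negative but need not match the large-\(T\) approximation closely, since that approximation is only accurate as \(T\) grows.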
Numerical Examples¶
| \(\rho\) | \(T = 5\) | \(T = 10\) | \(T = 20\) | \(T = 50\) |
|---|---|---|---|---|
| 0.3 | -0.325 | -0.144 | -0.068 | -0.027 |
| 0.5 | -0.375 | -0.167 | -0.079 | -0.031 |
| 0.7 | -0.425 | -0.189 | -0.089 | -0.035 |
| 0.9 | -0.475 | -0.211 | -0.100 | -0.039 |
The table shows the approximate bias \(-\frac{1+\rho}{T-1}\) for different combinations of \(\rho\) and \(T\).
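The table entries can be reproduced with a few lines of Python. The sketch below also evaluates the exact large-\(N\) expression from Nickell (1981) for the pure AR(1) case, as transcribed here, so the two can be compared; the function names are hypothetical and chosen only for this example.

```python
def nickell_bias_approx(rho, T):
    """Large-T approximation to the FE inconsistency: -(1 + rho) / (T - 1)."""
    return -(1 + rho) / (T - 1)

def nickell_bias_exact(rho, T):
    """Exact large-N inconsistency for the pure AR(1) model (no regressors)."""
    h = 1 - (1 - rho ** T) / (T * (1 - rho))
    numerator = -(1 + rho) / (T - 1) * h
    denominator = 1 - 2 * rho / ((1 - rho) * (T - 1)) * h
    return numerator / denominator

periods = (5, 10, 20, 50)
for rho in (0.3, 0.5, 0.7, 0.9):
    approx = "  ".join(f"{nickell_bias_approx(rho, T):7.3f}" for T in periods)
    exact = "  ".join(f"{nickell_bias_exact(rho, T):7.3f}" for T in periods)
    print(f"rho={rho}: approx {approx} | exact {exact}")
```

Comparing the two columns shows where the approximation is reliable: the gap narrows as \(T\) increases and is largest for short panels.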
The LSDV Estimator¶
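The LSDV (Least Squares Dummy Variable) estimator is algebraically equivalent to FE but explicitly includes entity dummies \(D_{j,it}\):
\[ y_{it} = \rho \, y_{i,t-1} + X_{it}'\beta + \sum_{j=1}^{N} \alpha_j D_{j,it} + \varepsilon_{it} \]
where \(D_{j,it} = 1\) if \(i = j\) and 0 otherwise.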
In matrix notation, define \(W = [y_{-1}, X]\) and \(D\) as the matrix of entity dummies. The LSDV estimator is:
\[ \hat{\delta}_{LSDV} = (W'AW)^{-1} W'Ay \]
where \(A = I_{NT} - D(D'D)^{-1}D'\) is the within-transformation matrix that projects out entity means, and \(\delta = (\rho, \beta')'\) is the full parameter vector.
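The matrix form can be checked numerically. The following NumPy sketch builds \(D\) and \(A\) for a tiny balanced panel and confirms that the projection formula gives the same coefficients as OLS with explicit entity dummies. The data are random placeholders (the first column of `W` merely stands in for \(y_{-1}\)), so this illustrates the algebra only, not a dynamic model or PanelBox internals.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 4, 6
entity = np.repeat(np.arange(N), T)          # entity index, entity-major stacking

# Placeholder regressor matrix W = [y_{-1}, X] and response y.
W = rng.normal(size=(N * T, 2))
alpha = rng.normal(size=N)
y = W @ np.array([0.5, 1.0]) + alpha[entity] + rng.normal(size=N * T)

# D: NT x N matrix of entity dummies; A: within-transformation projector.
D = (entity[:, None] == np.arange(N)[None, :]).astype(float)
A = np.eye(N * T) - D @ np.linalg.solve(D.T @ D, D.T)

# LSDV estimator in projection form: (W'AW)^{-1} W'Ay.
delta_lsdv = np.linalg.solve(W.T @ A @ W, W.T @ A @ y)

# Same coefficients from OLS of y on W plus the full set of dummies.
coef_dummy, *_ = np.linalg.lstsq(np.hstack([W, D]), y, rcond=None)
print(delta_lsdv, coef_dummy[:2])
```

Forming the dense \(NT \times NT\) matrix `A` is only practical for small examples; for real panels, demeaning by entity gives the same result at far lower cost.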
Kiviet (1995) Bias Correction¶
The Idea¶
Rather than using instruments (as in GMM), Kiviet (1995) derives an analytical expression for the bias of \(\hat{\delta}_{LSDV}\) and subtracts it:
\[ \hat{\delta}_{LSDVC} = \hat{\delta}_{LSDV} - \widehat{\text{Bias}}(\hat{\delta}_{LSDV}) \]
The bias depends on unknown parameters \((\rho, \beta, \sigma^2)\), which are estimated iteratively.
Bias Decomposition¶
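The bias of the LSDV estimator can be decomposed as:
\[ E(\hat{\delta}_{LSDV} - \delta) \approx c_1(T^{-1}) + c_2(N^{-1}T^{-1}) + c_3(N^{-1}T^{-2}) \]
where \(c_1\), \(c_2\), \(c_3\) are bias terms of successively higher order, each written with its order of magnitude as the argument.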
Order 1: \(O(T^{-1})\)¶
The leading bias term is:
where:
- \(F = \rho L + X \cdot g(\beta)\) captures the dynamic structure (with \(L\) being the lag operator matrix)
- \(q_0, q_1\) are functions of the data matrices
- \(\text{tr}(\cdot)\) denotes the matrix trace
More specifically, for the AR(1) model with exogenous regressors:
where \(C\) is the companion matrix encoding the AR dynamics.
Order 2: \(O(N^{-1}T^{-1})\)¶
The second-order term corrects for the interaction between the cross-sectional and time-series dimensions:
where \(B_1, B_2\) are functions of \(A\), the companion matrix, and the data.
Order 3: \(O(N^{-1}T^{-2})\)¶
Bun & Kiviet (2003) derive the third-order term, which provides a further refinement:
Practical Guidance on Bias Order
- Order 1 (`bias_order=1`): Simplest; captures the dominant bias from the within transformation. Sufficient when T is moderate.
- Order 2 (`bias_order=2`): Recommended default. Adds the \(O(N^{-1}T^{-1})\) correction, which matters when both N and T are small.
- Order 3 (`bias_order=3`): Bun & Kiviet (2003) refinement. Gains over order 2 are typically small; use when maximum accuracy is needed.
Iterated Bias Correction¶
Since the bias formulas depend on the unknown parameters \(\delta\) and \(\sigma^2\), LSDVC uses an iterative procedure:
1. Initialize: Obtain a consistent starting value \(\hat{\delta}^{(0)}\) (in particular \(\hat{\rho}_0\)) from an initial estimator (Anderson-Hsiao, Arellano-Bond, or Blundell-Bond)
2. LSDV: Compute \(\hat{\delta}_{LSDV}\) and \(\hat{\sigma}^2\) from Fixed Effects
3. Correct: \(\hat{\delta}^{(k+1)} = \hat{\delta}_{LSDV} - \widehat{\text{Bias}}(\hat{\delta}^{(k)}, \hat{\sigma}^2)\)
4. Iterate: Repeat step 3 until \(\|\hat{\delta}^{(k+1)} - \hat{\delta}^{(k)}\| < \text{tol}\)
The initial consistent estimator is needed only to start the iteration. The final LSDVC estimates are typically robust to the choice of initial estimator, though convergence may differ.
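The iteration can be sketched for the simplest case, a pure AR(1) panel without regressors. Note the substitution: instead of the Kiviet (1995) bias formulas, this example plugs in Nickell's large-\(N\) inconsistency expression as the bias function, and it uses a basic Anderson-Hsiao IV estimate as the starting value. All function names, the clipping bounds, and the simulated data are assumptions for illustration; this is not the PanelBox implementation.

```python
import numpy as np

def nickell_bias(rho, T):
    """Stand-in bias function: Nickell (1981) inconsistency for a pure AR(1) panel."""
    h = 1 - (1 - rho ** T) / (T * (1 - rho))
    return (-(1 + rho) / (T - 1) * h) / (1 - 2 * rho / ((1 - rho) * (T - 1)) * h)

def lsdv_ar1(y):
    """Within estimate of rho from an (N, T+1) array (column 0 is y_{i,0})."""
    lag = y[:, :-1] - y[:, :-1].mean(axis=1, keepdims=True)
    cur = y[:, 1:] - y[:, 1:].mean(axis=1, keepdims=True)
    return (lag * cur).sum() / (lag ** 2).sum()

def anderson_hsiao_ar1(y):
    """IV estimate in first differences, using y_{i,t-2} as the instrument."""
    dy = np.diff(y, axis=1)
    return (y[:, :-2] * dy[:, 1:]).sum() / (y[:, :-2] * dy[:, :-1]).sum()

def iterated_correction(y, tol=1e-10, max_iter=500):
    T = y.shape[1] - 1
    rho_lsdv = lsdv_ar1(y)                                        # step 2
    rho_k = float(np.clip(anderson_hsiao_ar1(y), -0.99, 0.99))    # step 1
    for _ in range(max_iter):
        # Step 3: subtract the bias evaluated at the current iterate.
        rho_next = float(np.clip(rho_lsdv - nickell_bias(rho_k, T), -0.99, 0.99))
        if abs(rho_next - rho_k) < tol:                           # step 4
            return rho_next
        rho_k = rho_next
    return rho_k

# Simulated AR(1) panel with fixed effects and a burn-in of 100 periods.
rng = np.random.default_rng(1)
N, T, rho_true = 2000, 6, 0.5
alpha = rng.normal(size=N)
y_full = np.zeros((N, 100 + T + 1))
for t in range(1, y_full.shape[1]):
    y_full[:, t] = rho_true * y_full[:, t - 1] + alpha + rng.normal(size=N)
y = y_full[:, -(T + 1):]

rho_lsdv = lsdv_ar1(y)
rho_corrected = iterated_correction(y)
print(f"LSDV: {rho_lsdv:.3f}, corrected: {rho_corrected:.3f}, true: {rho_true}")
```

Clipping each iterate to \((-1, 1)\) is a defensive choice for the sketch, since the stand-in bias function is undefined at \(\rho = 1\).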
Bootstrap Inference¶
Why Bootstrap?¶
The analytical bias correction produces corrected point estimates, but the asymptotic distribution of \(\hat{\delta}_{LSDVC}\) is not straightforward to derive. Standard errors from the LSDV variance-covariance matrix are not valid for the corrected estimator. Bruno (2005) proposes a parametric bootstrap for inference.
Parametric Bootstrap Procedure¶
For \(b = 1, \ldots, B\) bootstrap replications:
1. Generate errors: \(\varepsilon^*_{it} \sim N(0, \hat{\sigma}^2)\)
2. Simulate panel data from the estimated DGP:
   \[ y^*_{it} = \hat{\rho} \, y^*_{i,t-1} + X_{it}'\hat{\beta} + \hat{\alpha}_i + \varepsilon^*_{it} \]
   using the original \(X_{it}\) and estimated fixed effects \(\hat{\alpha}_i\)
3. Re-estimate LSDVC on the simulated panel → \(\hat{\delta}^*_b\)
The bootstrap distribution \(\{\hat{\delta}^*_1, \ldots, \hat{\delta}^*_B\}\) provides:
- Standard errors: \(\text{SE}(\hat{\delta}_k) = \text{sd}(\hat{\delta}^*_{1,k}, \ldots, \hat{\delta}^*_{B,k})\)
- Confidence intervals: Percentile method using the \(\alpha/2\) and \(1-\alpha/2\) quantiles
- z-statistics and p-values: Using bootstrap standard errors with normal approximation
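The bootstrap procedure and the summaries listed above can be sketched as follows for a pure AR(1) panel. Several simplifications are assumed and should not be read as the Bruno (2005) or PanelBox implementation: there are no regressors, the re-estimated "LSDVC" is a stand-in that subtracts Nickell's inconsistency expression and iterates from the LSDV estimate, and each simulated panel is started from the observed initial values. All names are hypothetical.

```python
import numpy as np

def nickell_bias(rho, T):
    """Stand-in bias function: Nickell (1981) inconsistency for a pure AR(1) panel."""
    h = 1 - (1 - rho ** T) / (T * (1 - rho))
    return (-(1 + rho) / (T - 1) * h) / (1 - 2 * rho / ((1 - rho) * (T - 1)) * h)

def corrected_ar1(y, n_iter=50):
    """Stand-in for LSDVC: LSDV estimate minus the bias, iterated a fixed number of times."""
    T = y.shape[1] - 1
    lag = y[:, :-1] - y[:, :-1].mean(axis=1, keepdims=True)
    cur = y[:, 1:] - y[:, 1:].mean(axis=1, keepdims=True)
    rho_lsdv = (lag * cur).sum() / (lag ** 2).sum()
    rho = rho_lsdv
    for _ in range(n_iter):
        rho = float(np.clip(rho_lsdv - nickell_bias(rho, T), -0.99, 0.99))
    return rho

def parametric_bootstrap(y, n_boot=200, seed=0):
    rng = np.random.default_rng(seed)
    N, n_cols = y.shape
    T = n_cols - 1
    rho_hat = corrected_ar1(y)
    # Fixed effects and error variance implied by the corrected estimate.
    resid = y[:, 1:] - rho_hat * y[:, :-1]
    alpha_hat = resid.mean(axis=1)
    sigma2_hat = ((resid - alpha_hat[:, None]) ** 2).sum() / (N * T - N - 1)
    draws = np.empty(n_boot)
    for b in range(n_boot):
        eps_star = rng.normal(scale=np.sqrt(sigma2_hat), size=(N, T))  # generate errors
        y_star = np.empty_like(y)
        y_star[:, 0] = y[:, 0]                  # assumption: keep observed y_{i,0}
        for t in range(1, n_cols):              # simulate from the estimated DGP
            y_star[:, t] = rho_hat * y_star[:, t - 1] + alpha_hat + eps_star[:, t - 1]
        draws[b] = corrected_ar1(y_star)        # re-estimate on the simulated panel
    se = draws.std(ddof=1)                      # bootstrap standard error
    ci = np.quantile(draws, [0.025, 0.975])     # percentile confidence interval
    return rho_hat, se, ci

# Example panel: N = 500, T = 6, rho = 0.5, with a burn-in of 100 periods.
rng = np.random.default_rng(3)
alpha = rng.normal(size=500)
y_full = np.zeros((500, 107))
for t in range(1, 107):
    y_full[:, t] = 0.5 * y_full[:, t - 1] + alpha + rng.normal(size=500)
rho_hat, se, ci = parametric_bootstrap(y_full[:, -7:], n_boot=200, seed=0)
print(f"rho_hat={rho_hat:.3f}, SE={se:.3f}, 95% CI=({ci[0]:.3f}, {ci[1]:.3f})")
```

Passing an explicit `seed` to the generator makes the bootstrap draws, and therefore the reported standard error and interval, reproducible across runs.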
Number of Bootstrap Replications
- 200-500 replications are sufficient for standard errors
- 1000+ replications recommended for accurate confidence interval endpoints
- Use the `seed` parameter for reproducibility
Comparison with GMM¶
Both LSDVC and GMM address the Nickell bias, but they take fundamentally different approaches:
| Feature | LSDVC | Difference GMM | System GMM |
|---|---|---|---|
| Approach | Analytical bias correction | Moment conditions + instruments | Moment conditions + instruments |
| Asymptotics | Finite-sample bias approximation (expansion in 1/T and 1/N) | N → ∞, fixed T | N → ∞, fixed T |
| Small N | Works well (N < 100) | Unreliable | Unreliable |
| Instruments | Not needed (uses initial estimator) | Lagged levels | Lagged levels + differences |
| Exogenous regressors | Assumed strictly exogenous | Can handle predetermined/endogenous | Can handle predetermined/endogenous |
| Inference | Bootstrap | Asymptotic (Windmeijer-corrected) | Asymptotic (Windmeijer-corrected) |
| Overidentification test | Not applicable | Hansen J / Sargan | Hansen J / Sargan |
| Persistent series | Works with any \(\rho\) | Weak instruments for high \(\rho\) | Better for high \(\rho\) |
When to Use Each¶
- LSDVC: Small to moderate N (N < 100), strictly exogenous regressors, preference for simplicity
- Difference GMM: Large N, predetermined or endogenous regressors, need for overidentification tests
- System GMM: Large N, highly persistent processes, near-unit-root data
Key limitation of LSDVC
LSDVC assumes that all regressors in \(X_{it}\) are strictly exogenous — they cannot be correlated with past, present, or future errors. If you have predetermined or endogenous regressors, use GMM instead.
Extension to Unbalanced Panels¶
Bruno (2005) extends the Kiviet bias correction to unbalanced panels. The key modifications are:
- Entity-specific time dimensions \(T_i\) replace the common \(T\)
- The within-transformation matrix \(A\) accounts for varying \(T_i\)
- Bias formulas sum over entity-specific contributions weighted by \(T_i\)
The PanelBox implementation handles unbalanced panels automatically.
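One ingredient of the unbalanced case, the within transformation with entity-specific \(T_i\), can be sketched in NumPy. This illustrates only the demeaning step on made-up toy data; it does not implement the Bruno (2005) bias formulas and is not taken from PanelBox.

```python
import numpy as np

# Toy unbalanced panel in long format: entity 0 has 3 periods, entity 1 has 2, entity 2 has 4.
entity = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2])
z = np.array([1.0, 2.0, 3.0, 10.0, 14.0, 5.0, 5.0, 6.0, 8.0])

T_i = np.bincount(entity)                          # observations per entity
entity_means = np.bincount(entity, weights=z) / T_i
z_demeaned = z - entity_means[entity]              # z_it minus the mean over entity i's own periods

print(T_i)            # [3 2 4]
print(entity_means)   # [ 2. 12.  6.]
print(z_demeaned)
```

Each entity is demeaned over its own \(T_i\) observations, which corresponds to a block-diagonal \(A\) whose blocks have different sizes.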
References¶
- Nickell, S. (1981). "Biases in Dynamic Models with Fixed Effects." Econometrica, 49(6), 1417-1426.
- Anderson, T. W., & Hsiao, C. (1982). "Formulation and Estimation of Dynamic Models Using Panel Data." Journal of Econometrics, 18(1), 47-82.
- Kiviet, J. F. (1995). "On Bias, Inconsistency, and Efficiency of Various Estimators in Dynamic Panel Data Models." Journal of Econometrics, 68(1), 53-78.
- Bun, M. J. G., & Kiviet, J. F. (2003). "On the Diminishing Returns of Higher-Order Terms in Asymptotic Expansions of Bias." Economics Letters, 79(2), 145-152.
- Bruno, G. S. F. (2005). "Approximating the Bias of the LSDV Estimator for Dynamic Unbalanced Panel Data Models." Economics Letters, 87(3), 361-366.
- Arellano, M., & Bond, S. (1991). "Some Tests of Specification for Panel Data: Monte Carlo Evidence and an Application to Employment Equations." Review of Economic Studies, 58(2), 277-297.
- Blundell, R., & Bond, S. (1998). "Initial Conditions and Moment Restrictions in Dynamic Panel Data Models." Journal of Econometrics, 87(1), 115-143.