Newey-West HAC Standard Errors

Quick Reference

Class: panelbox.standard_errors.NeweyWestStandardErrors
Convenience: panelbox.standard_errors.newey_west()
Model integration: model.fit(cov_type="newey_west")
Stata equivalent: newey, lag(m)
R equivalent: sandwich::NeweyWest()

Overview

When errors are serially correlated, that is, \(\text{Cov}(\varepsilon_t, \varepsilon_{t-j}) \neq 0\) for some \(j > 0\), both classical and heteroskedasticity-robust standard errors are inconsistent. The Newey-West (1987) estimator provides standard errors that remain consistent under both heteroskedasticity and autocorrelation of unknown form.

The estimator uses kernel-weighted autocovariances to capture serial dependence up to a specified maximum lag, while ensuring the resulting covariance matrix is positive semi-definite.

When to Use

Time series or longitudinal data with serial correlation
Single entity observed over many time periods
Panel data treated as pooled time series (before considering cross-sectional dependence)

When NOT to use

For panel data with cross-sectional dependence, use Driscoll-Kraay, which extends Newey-West to panels. Newey-West itself does not handle cross-sectional dependence.
For panel data with within-entity correlation, clustered SE is simpler and often sufficient.

Quick Example

from panelbox.standard_errors import NeweyWestStandardErrors, newey_west

# Convenience function
result = newey_west(X, resid, max_lags=4)
print(f"SE: {result.std_errors}")
print(f"Max lags: {result.max_lags}")
print(f"Kernel: {result.kernel}")
print(f"Prewhitening: {result.prewhitening}")

# Class-based (more control)
nw = NeweyWestStandardErrors(
    X=X, resid=resid,
    max_lags=6,
    kernel="bartlett",
    prewhitening=False,
)
result = nw.compute()

# Via model.fit()
from panelbox.models import PooledOLS
model = PooledOLS("y ~ x1 + x2", data, entity="firm", time="year")
results = model.fit(cov_type="newey_west")
print(results.summary())
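
The snippets above take `X` and `resid` as given. As a minimal sketch of where they might come from, the following builds a design matrix and OLS residuals with plain NumPy from simulated data. The data-generating process, the variable names, and the AR(1) coefficient of 0.6 are illustrative assumptions, not part of panelbox.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Simulated regressor and AR(1) errors (illustrative values only)
x1 = rng.normal(size=n)
shocks = rng.normal(size=n)
e = np.zeros(n)
for t in range(1, n):
    e[t] = 0.6 * e[t - 1] + shocks[t]

y = 1.0 + 2.0 * x1 + e

# Design matrix with an intercept column, shape (n, k)
X = np.column_stack([np.ones(n), x1])

# OLS fit and residuals, shape (n,)
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat
```

Arrays with these shapes match the `X` \((n \times k)\) and `resid` \((n,)\) inputs described under Configuration Options.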

Mathematical Details

The Newey-West Estimator

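The HAC covariance matrix is:

\[ V_{NW} = n \, (X'X)^{-1} \hat{\Omega}_{NW} (X'X)^{-1} \]

where the factor \(n\) offsets the \(1/n\) normalization of the autocovariance matrices defined below, and the meat is: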

\[ \hat{\Omega}_{NW} = \hat{\Gamma}_0 + \sum_{j=1}^{m} w(j) \left( \hat{\Gamma}_j + \hat{\Gamma}_j' \right) \]

Autocovariance Matrices

The lag-\(j\) autocovariance matrix captures serial dependence at lag \(j\):

\[ \hat{\Gamma}_j = \frac{1}{n} \sum_{t=j+1}^{n} \left( x_t \hat{e}_t \right) \left( x_{t-j} \hat{e}_{t-j} \right)' \]

At lag 0, this reduces to the heteroskedasticity-robust meat:

\[ \hat{\Gamma}_0 = \frac{1}{n} \sum_{t=1}^{n} \hat{e}_t^2 x_t x_t' \]
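
The formulas above can be sketched directly in NumPy. This is an illustrative implementation with Bartlett weights, not panelbox's internal code; the function name `newey_west_cov` is hypothetical. Because each \(\hat{\Gamma}_j\) carries a \(1/n\) factor, the sandwich is multiplied by \(n\) so that the result estimates the covariance of \(\hat{\beta}\).

```python
import numpy as np

def newey_west_cov(X, resid, max_lags):
    """Sketch of a Bartlett-kernel HAC covariance (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    resid = np.asarray(resid, dtype=float).ravel()
    n, _ = X.shape

    # Row t holds the score (x_t * e_t)'
    scores = X * resid[:, None]

    # Gamma_0: the heteroskedasticity-robust meat
    omega = scores.T @ scores / n

    for j in range(1, max_lags + 1):
        # Gamma_j = (1/n) * sum_{t=j+1}^{n} (x_t e_t)(x_{t-j} e_{t-j})'
        gamma_j = scores[j:].T @ scores[:-j] / n
        weight = 1.0 - j / (max_lags + 1.0)  # Bartlett weight w(j)
        omega += weight * (gamma_j + gamma_j.T)

    bread = np.linalg.inv(X.T @ X)
    # Factor n offsets the 1/n normalization inside omega
    return n * bread @ omega @ bread
```

With `max_lags=0` the loop is skipped and the result reduces to the usual heteroskedasticity-robust (HC0) covariance, which is a convenient sanity check.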

Kernel Weighting

The kernel weights \(w(j)\) serve two purposes: (1) downweight distant autocovariances (which are noisily estimated), and (2) ensure positive semi-definiteness.

Bartlett (default)

\[w(j) = 1 - \frac{j}{m+1}\]

Linear decay to zero. Simple and widely used.

Parzen

\[ w(j) = \begin{cases} 1 - 6z^2 + 6z^3 & \text{if } z \leq 0.5 \\ 2(1-z)^3 & \text{if } 0.5 < z \leq 1 \end{cases} \]

where \(z = j/(m+1)\). Smoother decay with better bias properties.

Quadratic Spectral

\[w(j) = \frac{3}{x^2}\left(\frac{\sin x}{x} - \cos x\right)\]

where \(x = 6\pi j / (5(m+1))\). Optimal in MSE sense (Andrews, 1991).
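
To make the three weight formulas concrete, here is a small sketch that evaluates them for a given lag \(j\) and bandwidth \(m\). The function `kernel_weight` is a hypothetical helper written for this illustration; only the kernel names mirror the `kernel` option strings documented on this page.

```python
import math

def kernel_weight(j, m, kernel="bartlett"):
    """Weight w(j) for lag j and maximum lag m (illustrative helper)."""
    z = j / (m + 1)
    if kernel == "bartlett":
        return 1.0 - z
    if kernel == "parzen":
        if z <= 0.5:
            return 1.0 - 6.0 * z**2 + 6.0 * z**3
        return 2.0 * (1.0 - z) ** 3
    if kernel == "quadratic_spectral":
        if j == 0:
            return 1.0  # limit of the formula as x -> 0
        x = 6.0 * math.pi * z / 5.0
        return 3.0 / x**2 * (math.sin(x) / x - math.cos(x))
    raise ValueError(f"unknown kernel: {kernel!r}")

# Weights for lags 1..4 with m = 4
for name in ("bartlett", "parzen", "quadratic_spectral"):
    print(name, [round(kernel_weight(j, 4, name), 3) for j in range(1, 5)])
```

All three kernels give weights close to 1 at short lags and shrink them as \(j\) approaches \(m\), which is the downweighting described above.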

Bandwidth Selection

The default bandwidth follows the Newey-West (1994) rule:

\[ m = \left\lfloor 4 \left( \frac{n}{100} \right)^{2/9} \right\rfloor \]

where \(n\) is the number of observations. The exponent \(2/9\) makes \(m\) grow slowly with \(n\), so the rule keeps the lag count small even in long samples.
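
As a quick worked check of the rule, the sketch below evaluates it for a few sample sizes. The helper name `default_max_lags` is hypothetical and is not a panelbox function.

```python
import math

def default_max_lags(n):
    """floor(4 * (n / 100) ** (2 / 9)), the default rule stated above."""
    return math.floor(4 * (n / 100) ** (2 / 9))

for n in (50, 100, 500, 1000):
    print(n, default_max_lags(n))
```

For example, \(n = 100\) gives \(m = 4\), \(n = 500\) gives \(m = 5\), and \(n = 1000\) gives \(m = 6\).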

Prewhitening

The optional AR(1) prewhitening step (Andrews & Monahan, 1992) proceeds in three stages:

1. Fit \(\hat{e}_t = \rho \hat{e}_{t-1} + v_t\)
2. Apply HAC to the prewhitened residuals \(v_t\)
3. Transform back to get the covariance of the original estimator

Prewhitening can reduce finite-sample bias when autocorrelation is strong.
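
The idea can be illustrated on a scalar series. The sketch below estimates a long-run variance with and without AR(1) prewhitening. It assumes the transform-back step divides by \((1 - \hat{\rho})^2\), which is the standard recoloring for a scalar AR(1) filter. How panelbox maps the result back to the full covariance matrix is not specified here, so treat both helper functions as hypothetical.

```python
import numpy as np

def bartlett_lrv(u, max_lags):
    """Bartlett-kernel long-run variance of a mean-zero scalar series."""
    u = np.asarray(u, dtype=float)
    n = u.size
    lrv = u @ u / n
    for j in range(1, max_lags + 1):
        gamma_j = u[j:] @ u[:-j] / n
        lrv += 2.0 * (1.0 - j / (max_lags + 1.0)) * gamma_j
    return lrv

def prewhitened_lrv(u, max_lags):
    """AR(1)-prewhitened long-run variance; returns (estimate, rho_hat)."""
    u = np.asarray(u, dtype=float)
    # Step 1: fit u_t = rho * u_{t-1} + v_t by OLS without intercept
    rho = (u[1:] @ u[:-1]) / (u[:-1] @ u[:-1])
    v = u[1:] - rho * u[:-1]
    # Step 2: HAC on the prewhitened series v_t
    lrv_v = bartlett_lrv(v, max_lags)
    # Step 3: recolor (assumed form for a scalar AR(1) filter)
    return lrv_v / (1.0 - rho) ** 2, rho
```

For a strongly autocorrelated AR(1) series, a short Bartlett window truncates much of the dependence and tends to understate the long-run variance, while the prewhitened version recovers more of it through the \((1 - \hat{\rho})^2\) correction.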

Configuration Options

NeweyWestStandardErrors Class

Parameter	Type	Default	Description
`X`	`np.ndarray`	---	Design matrix \((n \times k)\)
`resid`	`np.ndarray`	---	Residuals \((n,)\)
`max_lags`	`int` or `None`	`None`	Maximum lags; if `None`, uses \(\lfloor 4(n/100)^{2/9} \rfloor\)
`kernel`	`str`	`"bartlett"`	Kernel: `"bartlett"`, `"parzen"`, `"quadratic_spectral"`
`prewhitening`	`bool`	`False`	Apply AR(1) prewhitening

NeweyWestResult

Attribute	Type	Description
`cov_matrix`	`np.ndarray`	NW covariance matrix \((k \times k)\)
`std_errors`	`np.ndarray`	NW standard errors \((k,)\)
`max_lags`	`int`	Lags used
`kernel`	`str`	Kernel function used
`n_obs`	`int`	Number of observations
`n_params`	`int`	Number of parameters
`prewhitening`	`bool`	Whether prewhitening was applied

Prewhitening Comparison

from panelbox.standard_errors import NeweyWestStandardErrors

# Without prewhitening
nw = NeweyWestStandardErrors(X, resid, max_lags=4, prewhitening=False)
result_no_pw = nw.compute()

# With prewhitening
nw_pw = NeweyWestStandardErrors(X, resid, max_lags=4, prewhitening=True)
result_pw = nw_pw.compute()

print("Without prewhitening:", result_no_pw.std_errors)
print("With prewhitening:   ", result_pw.std_errors)

Newey-West vs Driscoll-Kraay

Feature	Newey-West	Driscoll-Kraay
Heteroskedasticity	Yes	Yes
Autocorrelation	Yes	Yes
Cross-sectional dependence	No	Yes
Designed for	Single time series	Panel data
Asymptotics	\(n \to \infty\)	\(T \to \infty\)
Bandwidth default	\(\lfloor 4(n/100)^{2/9} \rfloor\)	\(\lfloor 4(T/100)^{2/9} \rfloor\)

In panel data, Driscoll-Kraay is generally preferred because it additionally handles cross-sectional dependence. Newey-West is appropriate when working with a single long time series or when cross-sectional independence holds.

Diagnostics

Diagnostic Summary

from panelbox.standard_errors import NeweyWestStandardErrors

nw = NeweyWestStandardErrors(X, resid, max_lags=4)
print(nw.diagnostic_summary())

Testing for Serial Correlation

from panelbox.validation.serial_correlation.wooldridge_ar import WooldridgeARTest

# `results` is a fitted model result, e.g. from model.fit() in the Quick Example
ar_result = WooldridgeARTest(results).run(alpha=0.05)
print(f"Wooldridge AR(1): statistic={ar_result.statistic:.3f}, p={ar_result.pvalue:.4f}")
print(ar_result.conclusion)

Common Pitfalls

Pitfall 1: Too many lags

Setting max_lags too high relative to the sample size adds noisily estimated autocovariances and increases the variance of the covariance estimate. Use the default or keep \(m < n/3\).

Pitfall 2: Using NW for panel data

Newey-West does not account for cross-sectional dependence. In panel data, errors at the same time period across different entities may be correlated. Use Driscoll-Kraay instead.

Pitfall 3: Small samples

NW standard errors require a moderately large sample (\(n > 50\)) for the HAC estimator to perform well. The diagnostic summary warns when \(n < 50\).

References

Newey, W. K., & West, K. D. (1987). A simple, positive semi-definite, heteroskedasticity and autocorrelation consistent covariance matrix. Econometrica, 55(3), 703-708.
Newey, W. K., & West, K. D. (1994). Automatic lag selection in covariance matrix estimation. Review of Economic Studies, 61(4), 631-653.
Andrews, D. W. K. (1991). Heteroskedasticity and autocorrelation consistent covariance matrix estimation. Econometrica, 59(3), 817-858.
Andrews, D. W. K., & Monahan, J. C. (1992). An improved heteroskedasticity and autocorrelation consistent covariance matrix estimator. Econometrica, 60(4), 953-966.