Panel-Corrected Standard Errors (PCSE)¶

Quick Reference

Class: panelbox.standard_errors.PanelCorrectedStandardErrors Convenience: panelbox.standard_errors.pcse() Model integration: model.fit(cov_type="pcse") Stata equivalent: xtpcse R equivalent: pcse::pcse()

Overview¶

Panel-Corrected Standard Errors (PCSE), proposed by Beck & Katz (1995), are designed for time-series cross-section (TSCS) data --- datasets with a small number of entities (\(N\)) observed over a long time period (\(T\)). Typical examples include panels of countries, states, or industries over decades.

PCSE estimates the full \(N \times N\) contemporaneous cross-sectional covariance matrix \(\hat{\Sigma}\) from the residuals, then uses it to compute corrected standard errors. This accounts for cross-entity error correlation (e.g., all countries being affected by a global recession in the same year).

Key Requirement: \(T > N\)

PCSE requires more time periods than entities. If \(T \leq N\), the estimated \(\hat{\Sigma}\) matrix is singular or poorly conditioned. For micro panels where \(N \gg T\), use clustered SE instead.

When to Use¶

Macro panels: Countries, states, or regions observed over decades (\(N = 20\), \(T = 40\))
Political science TSCS data: International relations, comparative politics
Industry panels: Small number of industries over many quarters/years
When contemporaneous cross-sectional correlation is expected

When NOT to use

Micro panels (\(N \gg T\)): Use clustered SE
Spatial correlation with distance decay: Use Spatial HAC
Large \(N\), moderate \(T\): Use Driscoll-Kraay (does not require \(T > N\))

Quick Example¶

from panelbox.standard_errors import PanelCorrectedStandardErrors, pcse

# Convenience function
result = pcse(X, resid, entity_ids, time_ids)
print(f"SE: {result.std_errors}")
print(f"Entities (N): {result.n_entities}")
print(f"Periods (T): {result.n_periods}")
print(f"Sigma matrix shape: {result.sigma_matrix.shape}")

# Class-based (with diagnostics)
pcse_calc = PanelCorrectedStandardErrors(X, resid, entity_ids, time_ids)
result = pcse_calc.compute()
print(pcse_calc.diagnostic_summary())

# Via model.fit()
from panelbox.models import PooledOLS
model = PooledOLS("y ~ x1 + x2", data, entity="country", time="year")
results = model.fit(cov_type="pcse")
print(results.summary())

Mathematical Details¶

The PCSE Estimator¶

PCSE uses FGLS with the estimated contemporaneous covariance matrix:

\[ V_{PCSE} = (X' \hat{\Omega}^{-1} X)^{-1} \]

where:

\[ \hat{\Omega} = \hat{\Sigma} \otimes I_T \]

and \(\otimes\) denotes the Kronecker product.

Estimating \(\hat{\Sigma}\)¶

The contemporaneous covariance matrix \(\hat{\Sigma}\) is estimated from the OLS residuals:

\[ \hat{\Sigma}_{ij} = \frac{1}{T} \sum_{t=1}^{T} \hat{e}_{it} \hat{e}_{jt} \]

In matrix form, if \(E\) is the \(N \times T\) matrix of residuals (entities as rows, time as columns):

\[ \hat{\Sigma} = \frac{1}{T} E E' \]

Why \(T > N\)?¶

The matrix \(\hat{\Sigma}\) is \(N \times N\), but its rank is at most \(\min(N, T)\). When \(T \leq N\):

\(\hat{\Sigma}\) is singular (rank \(T < N\))
\(\hat{\Sigma}^{-1}\) does not exist (PanelBox falls back to pseudo-inverse with a warning)
Standard errors are unreliable

A safe rule of thumb is \(T > 2N\) for well-conditioned estimation.

PCSE vs FGLS¶

Beck & Katz (1995) showed that Parks-Kmenta FGLS standard errors are severely anti-conservative in typical TSCS settings, rejecting at 50-60% when the true rejection rate should be 5%. PCSE provides much better coverage properties:

Method	True rejection rate (\(\alpha = 0.05\))	Coverage
OLS with classical SE	Varies	May under/over-cover
FGLS (Parks-Kmenta)	50-60%	Severe under-coverage
OLS with PCSE	5-8%	Near-nominal coverage

Configuration Options¶

PanelCorrectedStandardErrors Class¶

Parameter	Type	Description
`X`	`np.ndarray`	Design matrix \((n \times k)\)
`resid`	`np.ndarray`	Residuals \((n,)\)
`entity_ids`	`np.ndarray`	Entity identifiers \((n,)\)
`time_ids`	`np.ndarray`	Time period identifiers \((n,)\)

PCSEResult¶

Attribute	Type	Description
`cov_matrix`	`np.ndarray`	PCSE covariance matrix \((k \times k)\)
`std_errors`	`np.ndarray`	PCSE standard errors \((k,)\)
`sigma_matrix`	`np.ndarray`	Estimated cross-sectional covariance \(\hat{\Sigma}\) \((N \times N)\)
`n_obs`	`int`	Number of observations
`n_params`	`int`	Number of parameters
`n_entities`	`int`	Number of entities (\(N\))
`n_periods`	`int`	Number of time periods (\(T\))

Diagnostics¶

Diagnostic Summary¶

pcse_calc = PanelCorrectedStandardErrors(X, resid, entity_ids, time_ids)
print(pcse_calc.diagnostic_summary())

The summary reports:

Number of observations, entities (\(N\)), and time periods (\(T\))
\(T/N\) ratio and whether it is sufficient
Warnings if \(T \leq N\) or \(T < 2N\)

Examining the Cross-Sectional Covariance¶

result = pcse_calc.compute()

# Inspect Sigma matrix
import numpy as np
print(f"Sigma shape: {result.sigma_matrix.shape}")
print(f"Sigma diagonal (variances): {np.diag(result.sigma_matrix)}")
print(f"Sigma condition number: {np.linalg.cond(result.sigma_matrix):.2f}")

# Correlation matrix
D_inv = np.diag(1.0 / np.sqrt(np.diag(result.sigma_matrix)))
corr = D_inv @ result.sigma_matrix @ D_inv
print(f"Cross-sectional correlation range: [{corr.min():.3f}, {corr.max():.3f}]")

Common Pitfalls¶

Pitfall 1: \(T \leq N\)

The most critical issue. With 30 countries and 20 years, \(\hat{\Sigma}\) is rank-deficient. PanelBox warns and uses pseudo-inverse, but results are unreliable. Use clustered SE or Driscoll-Kraay instead.

Pitfall 2: Unbalanced panels

PCSE works best with balanced panels. Missing observations reduce the effective \(T\) for pairwise covariance estimation, which can degrade \(\hat{\Sigma}\).

Pitfall 3: Ignoring autocorrelation

Standard PCSE accounts for contemporaneous cross-sectional correlation but not autocorrelation. If residuals are serially correlated, consider combining PCSE with a Prais-Winsten transformation or using Driscoll-Kraay.

Pitfall 4: Using PCSE coefficients instead of OLS

Beck & Katz (1995) recommend using OLS coefficients with PCSE standard errors, not FGLS coefficients. The OLS estimator is consistent and PCSE corrects only the standard errors.

References¶

Beck, N., & Katz, J. N. (1995). What to do (and not to do) with time-series cross-section data. American Political Science Review, 89(3), 634-647.
Beck, N., & Katz, J. N. (1996). Nuisance vs. substance: Specifying and estimating time-series-cross-section models. Political Analysis, 6, 1-36.
Bailey, D., & Katz, J. N. (2011). Implementing panel corrected standard errors in R: The pcse package. Journal of Statistical Software, 42(CS1), 1-11.
Parks, R. W. (1967). Efficient estimation of a system of regression equations when disturbances are both serially and contemporaneously correlated. Journal of the American Statistical Association, 62(318), 500-509.