Panel VAR Estimation¶
Quick Reference
Class: panelbox.var.model.PanelVAR
Data: panelbox.var.data.PanelVARData
Import: from panelbox.var import PanelVARData, PanelVAR
Stata equivalent: pvar (community-contributed)
R equivalent: panelvar::pvargmm()
Overview¶
Panel Vector Autoregression (Panel VAR) extends the classical VAR framework to panel data settings where multiple entities (countries, firms, individuals) are observed over time. Each endogenous variable is modeled as a function of its own lagged values and the lagged values of all other endogenous variables in the system, while controlling for entity-specific fixed effects.
The Panel VAR(p) model with \(K\) endogenous variables, \(N\) entities, and \(p\) lags is:
where \(Y_{it}\) is the \(K \times 1\) vector of endogenous variables for entity \(i\) at time \(t\), \(A_1, \ldots, A_p\) are \(K \times K\) coefficient matrices, \(\alpha_i\) are entity-specific fixed effects, and \(\varepsilon_{it} \sim (0, \Sigma)\) is the error term.
PanelBox estimates Panel VAR using OLS equation-by-equation with within transformation (entity demeaning) to remove fixed effects. This approach follows the methodology of Holtz-Eakin, Newey, and Rosen (1988) and the implementation strategy of Abrigo and Love (2016).
Quick Example¶
import pandas as pd
from panelbox.var import PanelVARData, PanelVAR
# Step 1: Prepare data container
var_data = PanelVARData(
data=df,
endog_vars=["gdp", "inflation", "unemployment"],
entity_col="country",
time_col="year",
lags=2,
trend="constant",
)
# Step 2: Create and estimate model
model = PanelVAR(data=var_data)
results = model.fit(method="ols", cov_type="clustered")
# Step 3: View results
print(results.summary())
print(f"Stable: {results.is_stable()}")
print(f"AIC: {results.aic:.4f}, BIC: {results.bic:.4f}")
When to Use¶
- Macroeconomic dynamics: Studying interactions between GDP, inflation, interest rates, and unemployment across countries
- Financial contagion: Analyzing how shocks propagate across markets or firms
- Policy analysis: Evaluating the dynamic effects of monetary or fiscal policy
- Firm-level dynamics: Modeling investment, employment, and output interactions across firms
- No strong prior on causality: When the causal ordering among variables is unknown
Key Assumptions
- Stationarity: All endogenous variables must be stationary (or cointegrated -- see VECM)
- No cross-entity contamination: Lags are constructed within each entity separately
- Homogeneous slope coefficients: The \(A_l\) matrices are assumed identical across entities
- Continuous time series: No internal gaps allowed within any entity's time series
- Sufficient time periods: \(T > K \times p + 1\) for each entity after lag construction
Detailed Guide¶
Data Preparation¶
The PanelVARData class handles all data preparation, including lag construction, gap detection, and missing value handling.
from panelbox.var import PanelVARData
var_data = PanelVARData(
data=df, # pandas DataFrame in long format
endog_vars=["gdp", "inflation", "rate"], # Endogenous variables (K)
entity_col="country", # Entity identifier column
time_col="year", # Time identifier column
exog_vars=None, # Optional exogenous variables
lags=2, # Number of lags (p)
trend="constant", # Deterministic terms
dropna="any", # Missing value strategy
)
# Inspect data properties
print(f"Variables (K): {var_data.K}")
print(f"Lags (p): {var_data.p}")
print(f"Entities (N): {var_data.N}")
print(f"Observations: {var_data.n_obs}")
print(f"Balanced: {var_data.is_balanced}")
print(f"T range: [{var_data.T_min}, {var_data.T_max}]")
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
data |
pd.DataFrame |
required | Panel data in long format |
endog_vars |
list[str] |
required | Names of endogenous variables |
entity_col |
str |
required | Entity identifier column name |
time_col |
str |
required | Time identifier column name |
exog_vars |
list[str] |
None |
Optional exogenous variable names |
lags |
int |
1 |
Number of lags (\(p\)) |
trend |
str |
"constant" |
"none", "constant", "trend", or "both" |
dropna |
str |
"any" |
"any" (drop if any variable missing) or "equation" |
Critical Safety Feature
Lags are constructed using .groupby(entity).shift() to ensure that lag \(t\) of entity A never contains an observation from entity B. A cross-contamination verification check runs automatically after lag construction.
Estimation¶
The PanelVAR class performs OLS equation-by-equation estimation with within transformation (entity demeaning) to remove fixed effects.
from panelbox.var import PanelVAR
model = PanelVAR(data=var_data)
results = model.fit(
method="ols", # Estimation method
cov_type="clustered", # Covariance estimator
)
Within Transformation: Before OLS estimation, each variable is demeaned within each entity:
This removes the entity-specific fixed effects \(\alpha_i\) and avoids the incidental parameters problem.
Estimation proceeds equation by equation: For each equation \(k = 1, \ldots, K\):
- Apply within transformation to \(y_k\) and \(X\)
- Estimate \(\hat{\beta}_k = (X'X)^{-1} X' y_k\)
- Compute residuals \(\hat{\varepsilon}_k = y_k - X \hat{\beta}_k\)
- Compute covariance matrix using the specified method
Covariance Types¶
cov_type |
Description | When to Use |
|---|---|---|
"clustered" |
Cluster-robust by entity | Default (recommended). Accounts for within-entity correlation |
"driscoll_kraay" |
Driscoll-Kraay HAC | Cross-sectional dependence suspected |
"hc1" |
Heteroskedasticity-robust (HC1) | Heteroskedastic errors, no clustering |
"nonrobust" |
Classical OLS | Homoskedastic, no clustering (rarely appropriate) |
"sur" |
Seemingly Unrelated Regressions | Exploit cross-equation correlation |
# Cluster-robust standard errors (recommended)
results = model.fit(cov_type="clustered")
# Driscoll-Kraay for cross-sectional dependence
results = model.fit(cov_type="driscoll_kraay", max_lags=3)
# SUR covariance (exploits cross-equation correlation)
results = model.fit(cov_type="sur")
Lag Selection¶
Choosing the optimal lag order is crucial. Too few lags can lead to omitted variable bias; too many waste degrees of freedom and reduce efficiency.
# Automatic lag selection
lag_result = model.select_lag_order(max_lags=8, cov_type="clustered")
print(lag_result.summary())
# Access optimal lag by criterion
optimal_bic = lag_result.selected["BIC"]
optimal_aic = lag_result.selected["AIC"]
# Visualize information criteria
fig = lag_result.plot(backend="plotly")
The select_lag_order method tests \(p = 1, 2, \ldots, \text{max\_lags}\) and computes four information criteria:
| Criterion | Formula | Properties |
|---|---|---|
| AIC | $\log | \hat{\Sigma} |
| BIC | $\log | \hat{\Sigma} |
| HQIC | $\log | \hat{\Sigma} |
| MBIC | $\log | \hat{\Sigma} |
The LagOrderResult object contains:
criteria_df: DataFrame with all criteria values for each lagselected: Dictionary mapping criterion name to optimal lagsummary(): Formatted summary tableplot(): Visual comparison of criteria
Practical Advice
BIC is generally recommended for Panel VAR because it penalizes complexity more heavily and is consistent (selects the true lag order as \(N, T \to \infty\)). Start with max_lags=8 and reduce if you get warnings about insufficient observations.
Interpreting Results¶
The PanelVARResult object provides comprehensive access to estimation results.
# Coefficient matrices A_1, A_2, ..., A_p
for lag in range(1, results.p + 1):
print(f"\nA_{lag}:")
print(results.coef_matrix(lag)) # Returns labeled DataFrame
# Residual covariance matrix
print(f"\nSigma (residual covariance):\n{results.Sigma}")
# Information criteria
print(f"AIC: {results.aic:.6f}")
print(f"BIC: {results.bic:.6f}")
print(f"HQIC: {results.hqic:.6f}")
print(f"Log-likelihood: {results.loglik:.2f}")
# Per-equation summary
for k in range(results.K):
print(results.equation_summary(k))
# System-level summary (compact)
print(results.summary_system())
# Full summary with coefficient tables
print(results.summary())
Key Result Attributes:
| Attribute | Type | Description |
|---|---|---|
params_by_eq |
list[np.ndarray] |
Coefficient vectors per equation |
std_errors_by_eq |
list[np.ndarray] |
Standard errors per equation |
A_matrices |
list[np.ndarray] |
\(K \times K\) coefficient matrices \([A_1, \ldots, A_p]\) |
Sigma |
np.ndarray |
Residual covariance matrix (\(K \times K\)) |
aic, bic, hqic |
float |
Information criteria |
loglik |
float |
Log-likelihood |
K, p, N, n_obs |
int |
Dimensions |
Stability Analysis¶
A Panel VAR is stable (stationary) if all eigenvalues of the companion matrix have modulus strictly less than 1. Stability is essential for meaningful impulse response functions and forecasts.
# Check stability
print(f"Stable: {results.is_stable()}")
print(f"Max eigenvalue modulus: {results.max_eigenvalue_modulus:.6f}")
print(f"Stability margin: {results.stability_margin:.6f}")
# Eigenvalues of companion matrix
eigenvalues = results.eigenvalues
print(f"Eigenvalues: {eigenvalues}")
# Companion matrix (Kp x Kp)
F = results.companion_matrix()
# Visual stability check
results.plot_stability(backend="matplotlib")
The companion matrix \(\mathbf{F}\) reformulates VAR(p) as VAR(1):
The plot_stability() method visualizes eigenvalues on the complex plane with the unit circle, making it easy to identify whether any eigenvalues lie outside the circle (unstable system).
Unstable VAR
If is_stable() returns False, the system is explosive. IRFs will diverge rather than converge to zero, and forecasts will blow up. Consider:
- Differencing non-stationary variables
- Using VECM for cointegrated variables (see VECM)
- Reducing the lag order
- Checking for data issues
Export Results¶
# LaTeX table
latex = results.to_latex()
with open("var_results.tex", "w") as f:
f.write(latex)
# HTML table
html = results.to_html()
# Single equation export
latex_eq1 = results.to_latex(equation=0)
Configuration Options¶
PanelVAR.fit() Parameters¶
| Parameter | Type | Default | Description |
|---|---|---|---|
method |
str |
"ols" |
Estimation method (currently "ols" only) |
cov_type |
str |
"clustered" |
Covariance estimator type |
**cov_kwds |
Additional keyword arguments for covariance (e.g., max_lags for Driscoll-Kraay) |
PanelVAR.select_lag_order() Parameters¶
| Parameter | Type | Default | Description |
|---|---|---|---|
max_lags |
int |
8 |
Maximum number of lags to test |
criteria |
list[str] |
["AIC", "BIC", "HQIC", "MBIC"] |
Information criteria to compute |
cov_type |
str |
"clustered" |
Covariance type for each estimation |
Complete Workflow Example¶
import pandas as pd
from panelbox.var import PanelVARData, PanelVAR
# Load macroeconomic panel data
# df has columns: country, year, gdp_growth, inflation, unemployment
df = pd.read_csv("macro_panel.csv")
# Step 1: Create data container
var_data = PanelVARData(
data=df,
endog_vars=["gdp_growth", "inflation", "unemployment"],
entity_col="country",
time_col="year",
lags=2, # Start with initial guess
trend="constant",
)
print(f"Panel: N={var_data.N}, T_avg={var_data.T_avg:.1f}, "
f"K={var_data.K}, balanced={var_data.is_balanced}")
# Step 2: Select optimal lag order
model = PanelVAR(data=var_data)
lag_result = model.select_lag_order(max_lags=6)
print(lag_result.summary())
optimal_p = lag_result.selected["BIC"]
print(f"\nOptimal lag (BIC): {optimal_p}")
# Step 3: Re-estimate with optimal lag
var_data_opt = PanelVARData(
data=df,
endog_vars=["gdp_growth", "inflation", "unemployment"],
entity_col="country",
time_col="year",
lags=optimal_p,
)
model_opt = PanelVAR(data=var_data_opt)
results = model_opt.fit(method="ols", cov_type="clustered")
# Step 4: Check stability
print(f"\nStable: {results.is_stable()}")
print(f"Max eigenvalue modulus: {results.max_eigenvalue_modulus:.4f}")
# Step 5: Full summary
print(results.summary())
# Step 6: Proceed to analysis
# IRF, FEVD, Granger causality, forecasting
irf = results.irf(periods=10, method="cholesky")
gc = results.granger_causality(cause="inflation", effect="gdp_growth")
print(gc.summary())
Tutorials¶
| Tutorial | Description | Link |
|---|---|---|
| Panel VAR Notebook | Full estimation workflow |
See Also¶
- Impulse Response Functions -- Dynamic effects of shocks
- Forecast Error Variance Decomposition -- Relative importance of shocks
- Granger Causality -- Testing predictive relationships
- VECM -- For cointegrated (non-stationary) variables
- Forecasting -- Multi-step ahead predictions
References¶
- Holtz-Eakin, D., Newey, W., & Rosen, H. S. (1988). Estimating vector autoregressions with panel data. Econometrica, 56(6), 1371-1395.
- Abrigo, M. R., & Love, I. (2016). Estimation of panel vector autoregression in Stata. The Stata Journal, 16(3), 778-804.
- Andrews, D. W. K., & Lu, B. (2001). Consistent model and moment selection procedures for GMM estimation with application to dynamic panel data models. Journal of Econometrics, 101(1), 123-164.
- Luetkepohl, H. (2005). New Introduction to Multiple Time Series Analysis. Springer-Verlag.