Granger Causality¶
Quick Reference
Classes: panelbox.var.causality.GrangerCausalityResult, panelbox.var.causality.DumitrescuHurlinResult
Methods: PanelVARResult.granger_causality(), PanelVARResult.dumitrescu_hurlin()
Import: from panelbox.var import PanelVAR
Stata equivalent: pvargranger
R equivalent: panelvar::pvargmm() + Granger tests
Overview¶
Granger causality tests whether past values of one variable improve predictions of another variable beyond what that variable's own past values already provide. In a Panel VAR context, this provides a formal statistical framework for testing predictive relationships between variables.
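Given the Panel VAR equation for variable \(Y\) (lags of any other endogenous variables enter in the same way):
\[
Y_{it} = \alpha_i + \sum_{l=1}^{p} \gamma_l Y_{i,t-l} + \sum_{l=1}^{p} \beta_l X_{i,t-l} + \varepsilon_{it}
\]
Granger causality from \(X\) to \(Y\) tests:
\[
H_0: \beta_1 = \beta_2 = \cdots = \beta_p = 0 \quad \text{vs.} \quad H_1: \beta_l \neq 0 \text{ for at least one } l
\]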
PanelBox provides two complementary approaches:
- Standard Panel Wald test: Assumes homogeneous coefficients across entities (pooled test)
- Dumitrescu-Hurlin (2012) test: Allows for heterogeneous causality across entities -- causality may exist for some entities but not others
Quick Example¶
from panelbox.var import PanelVARData, PanelVAR
# Estimate model
var_data = PanelVARData(df, endog_vars=["gdp", "inflation", "rate"],
                        entity_col="country", time_col="year", lags=2)
model = PanelVAR(data=var_data)
results = model.fit(cov_type="clustered")
# Standard Granger causality test
gc = results.granger_causality(cause="inflation", effect="gdp")
print(gc.summary())
# Dumitrescu-Hurlin heterogeneous test
dh = results.dumitrescu_hurlin(cause="inflation", effect="gdp")
print(dh.summary())
# Full causality matrix (all pairs)
gc_matrix = results.granger_causality_matrix()
print(gc_matrix)
When to Use¶
- Predictive relationships: Does knowing past inflation help predict future GDP growth?
- Policy analysis: Does monetary policy (interest rate) Granger-cause output?
- Identifying variable ordering: Granger causality can inform the ordering for Cholesky IRFs
- Model specification: Variables with no Granger-causal links to the rest of the system may be candidates for exclusion
- Heterogeneity: Use Dumitrescu-Hurlin when causality may differ across entities
Key Assumptions
- Granger causality is NOT structural causality: It measures predictive content, not true causal mechanisms
- Sensitive to lag length: Results may change with different lag orders
- Requires stationarity: Variables must be stationary (or use VECM for cointegrated variables)
- Omitted variables: Missing relevant variables can create spurious Granger causality
Detailed Guide¶
Standard Panel Granger Causality (Wald Test)¶
The standard test assumes homogeneous coefficients across all entities and tests joint significance of all lags of the causing variable in the equation of the effect variable.
Test statistics:
- Wald statistic: \(W = (R\hat{\beta})' [R \cdot \text{Var}(\hat{\beta}) \cdot R']^{-1} (R\hat{\beta}) \sim \chi^2(p)\)
- F-statistic: \(F = W / p\)
The restriction matrix \(R\) selects all \(p\) lag coefficients of the causing variable in the effect variable's equation.
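The Wald formula above can be sketched with NumPy and SciPy. This is an illustration only, not PanelBox's implementation: the function `wald_test`, its arguments, and the toy numbers are hypothetical, and using `df_resid` as the denominator degrees of freedom for the F p-value is an assumption.

```python
import numpy as np
from scipy import stats


def wald_test(beta_hat, cov, cause_idx, df_resid):
    """Jointly test that the coefficients at positions `cause_idx` are zero."""
    beta_hat = np.asarray(beta_hat, dtype=float)
    cov = np.asarray(cov, dtype=float)
    p = len(cause_idx)

    # R selects the p lag coefficients of the causing variable.
    R = np.zeros((p, beta_hat.size))
    R[np.arange(p), cause_idx] = 1.0

    r_beta = R @ beta_hat
    wald = float(r_beta @ np.linalg.solve(R @ cov @ R.T, r_beta))
    f_stat = wald / p
    return {
        "wald_stat": wald,
        "f_stat": f_stat,
        "df": p,
        "p_value": float(stats.chi2.sf(wald, p)),
        # Assumption: residual degrees of freedom as the F denominator.
        "p_value_f": float(stats.f.sf(f_stat, p, df_resid)),
    }


# Toy equation with 5 coefficients; the cause's two lags sit at positions 2 and 3.
beta_hat = [0.5, 0.3, 0.2, -0.1, 0.05]
cov = np.diag([0.01] * 5)
res = wald_test(beta_hat, cov, cause_idx=[2, 3], df_resid=200)
print(res)
```

With a diagonal covariance matrix the statistic reduces to the sum of squared t-ratios of the tested coefficients, which makes the toy case easy to check by hand.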
GrangerCausalityResult attributes:
| Attribute | Type | Description |
|---|---|---|
| cause | str | Causing variable name |
| effect | str | Effect variable name |
| wald_stat | float | Wald test statistic |
| f_stat | float | F-statistic (\(W/p\)) |
| df | int | Degrees of freedom (number of lags) |
| p_value | float | P-value from \(\chi^2\) distribution |
| p_value_f | float | P-value from F distribution |
| conclusion | str | Statistical conclusion |
| lags_tested | int | Number of lags tested |
# Access individual results
print(f"Wald stat: {gc.wald_stat:.4f}")
print(f"F-stat: {gc.f_stat:.4f}")
print(f"P-value: {gc.p_value:.4f}")
print(f"Conclusion: {gc.conclusion}")
All-Pairs Causality Matrix¶
Test Granger causality for all variable pairs simultaneously:
# P-value matrix (K x K)
gc_matrix = results.granger_causality_matrix(significance_level=0.05)
print(gc_matrix)
The result is a \(K \times K\) DataFrame of p-values: element \((i, j)\) is the p-value for the test that variable \(i\) Granger-causes variable \(j\), so rows are causes and columns are effects. Diagonal elements are NaN.
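To turn such a matrix into a list of significant relationships, something like the following sketch can help. The p-values below are made up for illustration, and `significant_pairs` is a hypothetical helper, not part of PanelBox.

```python
import numpy as np
import pandas as pd

variables = ["gdp", "inflation", "rate"]
# Hypothetical p-values: row = cause, column = effect, NaN on the diagonal.
pvals = pd.DataFrame(
    [[np.nan, 0.30, 0.04],
     [0.01, np.nan, 0.20],
     [0.08, 0.002, np.nan]],
    index=variables,
    columns=variables,
)


def significant_pairs(pvalue_matrix, alpha=0.05):
    """Return (cause, effect, p_value) tuples with p < alpha, smallest p first."""
    pairs = [
        (cause, effect, float(p))
        for cause, row in pvalue_matrix.iterrows()
        for effect, p in row.items()
        if cause != effect and pd.notna(p) and p < alpha
    ]
    return sorted(pairs, key=lambda item: item[2])


pairs = significant_pairs(pvals, alpha=0.05)
for cause, effect, p in pairs:
    print(f"{cause} -> {effect}: p = {p:.3f}")
```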
Dumitrescu-Hurlin (2012) Heterogeneous Test¶
The standard Wald test assumes that all entities share the same causal relationship. The Dumitrescu-Hurlin (DH) test relaxes this assumption by allowing for heterogeneous coefficients across entities.
The DH procedure:
- For each entity \(i\), estimate an individual bivariate regression and compute entity-specific Wald statistic \(W_i\)
- Compute the average: \(\bar{W} = \frac{1}{N} \sum_{i=1}^{N} W_i\)
- Standardize using two statistics:
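  - \(\tilde{Z} = \sqrt{N} \, \frac{\bar{W} - E[W_i]}{\sqrt{Var[W_i]}}\) -- for fixed \(T\), \(N \to \infty\), where \(E[W_i]\) and \(Var[W_i]\) are the finite-\(T\) moments of the individual Wald statistics
  - \(\bar{Z} = \sqrt{\frac{N}{2p}} (\bar{W} - p)\) -- for \(T \to \infty\) first, then \(N \to \infty\), using the \(\chi^2(p)\) moments \(E[W_i] = p\) and \(Var[W_i] = 2p\)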
Both test statistics are asymptotically standard normal under \(H_0\).
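The standardization step can be sketched as follows, assuming a balanced panel and that each entity regression has an intercept plus \(p\) lags of each of the two variables. Following Dumitrescu and Hurlin (2012), the asymptotic statistic uses the \(\chi^2(p)\) moments, while the fixed-\(T\) statistic uses the moments of \(p \cdot F(p, T - 2p - 1)\). The function name, its arguments, and the one-sided p-values are assumptions made for this illustration; PanelBox's internals may differ.

```python
import numpy as np
from scipy import stats


def dumitrescu_hurlin_stats(individual_w, n_obs, lags):
    """Standardize the average of per-entity Wald statistics.

    n_obs is the number of time observations used in each entity regression
    (assumed equal across entities).
    """
    w = np.asarray(individual_w, dtype=float)
    n, p = w.size, lags
    w_bar = w.mean()

    # Asymptotic version: W_i is approximately chi2(p), so E = p and Var = 2p.
    z_bar = np.sqrt(n / (2 * p)) * (w_bar - p)

    # Fixed-T version: treat W_i / p as F(p, d2) with d2 = n_obs - 2p - 1.
    d2 = n_obs - 2 * p - 1
    if d2 <= 4:
        raise ValueError("need n_obs > 2 * lags + 5 for a finite variance")
    mean_w = p * d2 / (d2 - 2)
    var_w = 2 * p * d2**2 * (p + d2 - 2) / ((d2 - 2) ** 2 * (d2 - 4))
    z_tilde = np.sqrt(n) * (w_bar - mean_w) / np.sqrt(var_w)

    return {
        "W_bar": float(w_bar),
        "Z_bar": float(z_bar),
        "Z_tilde": float(z_tilde),
        # Assumption: upper-tail p-values; some implementations report two-sided ones.
        "Z_bar_pvalue": float(stats.norm.sf(z_bar)),
        "Z_tilde_pvalue": float(stats.norm.sf(z_tilde)),
    }


out = dumitrescu_hurlin_stats([3.0, 5.0, 1.0, 7.0], n_obs=20, lags=2)
print(out)
```

In short panels the fixed-\(T\) moments are larger than \(p\) and \(2p\), so the fixed-\(T\) statistic is typically smaller than the asymptotic one for the same \(\bar{W}\).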
DumitrescuHurlinResult attributes:
| Attribute | Type | Description |
|---|---|---|
| cause | str | Causing variable |
| effect | str | Effect variable |
| W_bar | float | Average Wald statistic across entities |
| Z_tilde_stat | float | \(\tilde{Z}\) statistic (for fixed \(T\), \(N \to \infty\)) |
| Z_tilde_pvalue | float | P-value for \(\tilde{Z}\) |
| Z_bar_stat | float | \(\bar{Z}\) statistic (for \(T \to \infty\), \(N \to \infty\)) |
| Z_bar_pvalue | float | P-value for \(\bar{Z}\) |
| individual_W | np.ndarray | Per-entity Wald statistics |
| recommended_stat | str | "Z_tilde" or "Z_bar" (automatic selection) |
| N | int | Number of entities |
| T_avg | float | Average time periods |
| lags | int | Number of lags tested |
Which Statistic to Use?
PanelBox automatically recommends the appropriate statistic based on the sample:
- \(\tilde{Z}\) (Z_tilde): Use when \(T\) is small (< 10) relative to \(N\)
- \(\bar{Z}\) (Z_bar): Use when both \(T\) and \(N\) are large
The recommended_stat attribute tells you which to use.
Visualizing Individual Heterogeneity¶
The DH test provides per-entity Wald statistics, revealing the distribution of causality strength across entities:
# Plot histogram of individual Wald statistics
dh.plot_individual_statistics(backend="matplotlib")
# Access individual statistics
for i, w in enumerate(dh.individual_W):
    print(f"Entity {i}: W = {w:.4f}")
The plot shows the distribution of \(W_i\) values, with the 5% critical value and the average \(\bar{W}\) marked. An entity whose \(W_i\) exceeds the critical value rejects the null of no Granger causality in its own regression.
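The same comparison can be done numerically. The sketch below assumes the critical value comes from the asymptotic \(\chi^2(p)\) distribution of each \(W_i\); the Wald values and the helper `share_rejecting` are hypothetical.

```python
import numpy as np
from scipy import stats


def share_rejecting(individual_w, lags, alpha=0.05):
    """Flag entities whose Wald statistic exceeds the chi2(lags) critical value."""
    w = np.asarray(individual_w, dtype=float)
    critical = float(stats.chi2.ppf(1 - alpha, df=lags))
    rejects = w > critical
    return critical, rejects, float(rejects.mean())


# Hypothetical per-entity statistics, e.g. taken from dh.individual_W.
individual_w = np.array([0.8, 7.4, 2.1, 12.9, 5.5])
critical, rejects, share = share_rejecting(individual_w, lags=2)
print(f"critical value: {critical:.3f}, share rejecting: {share:.0%}")
```

With short time series the \(\chi^2\) approximation for individual statistics can be rough, so the per-entity flags are best read as descriptive rather than as formal tests.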
Instantaneous Causality¶
Test for contemporaneous (same-period) correlation between variables:
# Single pair
ic = results.instantaneous_causality(var1="gdp", var2="inflation")
print(ic.summary())
# Full matrix
corr_matrix, pvalue_matrix = results.instantaneous_causality_matrix()
print("Correlation matrix:")
print(corr_matrix)
print("\nP-value matrix:")
print(pvalue_matrix)
The test uses the likelihood ratio statistic:
\[
LR = -n \ln(1 - r^2) \sim \chi^2(1)
\]
where \(r\) is the correlation between residuals of the two equations and \(n\) is the number of residual observations.
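A common form of this likelihood ratio test compares \(-n \ln(1 - r^2)\) with a \(\chi^2(1)\) distribution, where \(n\) is the number of residual observations. The sketch below applies that form to simulated residuals; whether PanelBox uses exactly this scaling is an assumption, and `instantaneous_lr` is a hypothetical helper.

```python
import numpy as np
from scipy import stats


def instantaneous_lr(resid_1, resid_2):
    """LR test of zero contemporaneous correlation between two residual series."""
    e1 = np.asarray(resid_1, dtype=float)
    e2 = np.asarray(resid_2, dtype=float)
    n = e1.size
    r = float(np.corrcoef(e1, e2)[0, 1])
    lr = float(-n * np.log(1.0 - r**2))
    return r, lr, float(stats.chi2.sf(lr, df=1))


# Simulated residuals that share a common same-period shock.
rng = np.random.default_rng(0)
common = rng.standard_normal(500)
e_gdp = common + rng.standard_normal(500)
e_inf = common + rng.standard_normal(500)

r, lr, p_value = instantaneous_lr(e_gdp, e_inf)
print(f"r = {r:.3f}, LR = {lr:.2f}, p = {p_value:.4g}")
```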
Causality Network Visualization¶
Visualize all significant Granger causality relationships as a directed network graph:
# Interactive network plot
results.plot_causality_network(
    threshold=0.05,       # Significance threshold
    layout="circular",    # "circular", "spring", "kamada_kawai", "shell"
    backend="plotly",     # "plotly" or "matplotlib"
)
The network shows:
- Nodes: Variables
- Directed edges: Significant Granger causality relationships (\(p < \text{threshold}\))
- Edge thickness: Inversely proportional to p-value (stronger significance = thicker edge)
- Edge color: Dark green (\(p < 0.01\)), green (\(p < 0.05\)), orange (\(p < 0.10\))
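The styling rules above can be expressed as a small mapping from p-value to edge attributes. This is a hypothetical sketch: the color names follow the list, but the width formula is an invented monotone scaling rather than PanelBox's actual rule.

```python
def edge_style(p_value, threshold=0.05):
    """Return (color, width) for a causality edge, or None if it is not drawn."""
    if p_value >= threshold:
        return None  # not significant at the chosen threshold
    if p_value < 0.01:
        color = "darkgreen"
    elif p_value < 0.05:
        color = "green"
    else:
        color = "orange"  # only reachable when threshold is above 0.05
    # Hypothetical scaling: smaller p-values give thicker edges (1.0 to 5.0).
    width = 1.0 + 4.0 * (1.0 - p_value / threshold)
    return color, width


for p in (0.004, 0.03, 0.08):
    print(p, edge_style(p, threshold=0.10))
```

Note that orange edges can only appear when `threshold` is set above 0.05, for example 0.10.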
Requirement
Network visualization requires networkx: pip install networkx
Comparison: Standard vs Dumitrescu-Hurlin¶
| Feature | Standard Wald | Dumitrescu-Hurlin |
|---|---|---|
| Coefficients | Homogeneous (pooled) | Heterogeneous (per-entity) |
| Null hypothesis | \(\beta_l = 0\) for all lags | \(\beta_{il} = 0\) for all entities and lags |
| Alternative | Causality for all entities | Causality for at least some entities |
| Requires | Fitted Panel VAR | Raw panel data (re-estimates per entity) |
| Small-sample | More powerful (if homogeneous) | More reliable under heterogeneity |
| Entity-level results | No | Yes (individual_W) |
Common Pitfalls¶
Granger Causality is NOT True Causality
Granger causality is a statistical concept about predictive content, not about structural or mechanistic causation. Variable \(X\) Granger-causing \(Y\) means that past values of \(X\) contain information useful for predicting \(Y\) beyond \(Y\)'s own history. This could be due to:
- True causal effect
- Common unobserved factors
- Third variable driving both
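A short simulation illustrates the common-factor case. In the sketch below, \(X\) has no effect on \(Y\) at all: both series are noisy copies of an unobserved driver \(Z\), with \(X\) reacting one period earlier. A single-series OLS regression with one lag (a simplification of the panel setting, written with NumPy only) still finds strong Granger causality from \(X\) to \(Y\).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
T = 2000
z = rng.standard_normal(T + 2)               # unobserved common driver
x = z[1:-1] + 0.5 * rng.standard_normal(T)   # x_t = z_{t-1} + noise
y = z[:-2] + 0.5 * rng.standard_normal(T)    # y_t = z_{t-2} + noise

# Regress y_t on a constant, y_{t-1}, and x_{t-1}.
X = np.column_stack([np.ones(T - 1), y[:-1], x[:-1]])
target = y[1:]
beta, *_ = np.linalg.lstsq(X, target, rcond=None)
resid = target - X @ beta
sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
cov = sigma2 * np.linalg.inv(X.T @ X)

# Wald test on the single lag of x (one restriction, chi2 with 1 df).
wald = float(beta[2] ** 2 / cov[2, 2])
p_value = float(stats.chi2.sf(wald, df=1))
print(f"coef on x lag: {beta[2]:.3f}, Wald = {wald:.1f}, p = {p_value:.3g}")
```

The test rejects non-causality decisively even though the only link between the two series is the shared driver, which is why omitted variables deserve attention before interpreting a significant result.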
Sensitivity to Lag Length
Results can change substantially with different lag orders. Always:
- Use information criteria (BIC) to select the lag order first
- Test robustness with \(p \pm 1\) lags
- Report the lag length alongside results
Complete Workflow Example¶
import pandas as pd
from panelbox.var import PanelVARData, PanelVAR
# Load data
df = pd.read_csv("macro_panel.csv")
# Estimate Panel VAR
var_data = PanelVARData(
    data=df,
    endog_vars=["gdp_growth", "inflation", "interest_rate"],
    entity_col="country",
    time_col="year",
    lags=2,
)
model = PanelVAR(data=var_data)
results = model.fit(cov_type="clustered")
# 1. Standard Granger causality -- all pairs
print("=== Standard Granger Causality (Wald) ===\n")
gc_matrix = results.granger_causality_matrix()
print(gc_matrix)
# 2. Dumitrescu-Hurlin -- allows heterogeneous causality
print("\n=== Dumitrescu-Hurlin Heterogeneous Test ===\n")
pairs = [
    ("inflation", "gdp_growth"),
    ("interest_rate", "gdp_growth"),
    ("gdp_growth", "inflation"),
    ("interest_rate", "inflation"),
]
for cause, effect in pairs:
    dh = results.dumitrescu_hurlin(cause=cause, effect=effect)
    rec_stat = dh.recommended_stat
    rec_pval = dh.Z_tilde_pvalue if rec_stat == "Z_tilde" else dh.Z_bar_pvalue
    sig = "***" if rec_pval < 0.01 else "**" if rec_pval < 0.05 else "*" if rec_pval < 0.10 else ""
    print(f"{cause} -> {effect}: W_bar={dh.W_bar:.3f}, "
          f"{rec_stat} p={rec_pval:.4f} {sig}")
# 3. Instantaneous causality
print("\n=== Instantaneous Causality ===\n")
corr, pvals = results.instantaneous_causality_matrix()
print("Correlations:")
print(corr)
print("\nP-values:")
print(pvals)
# 4. Network visualization
results.plot_causality_network(threshold=0.05, layout="circular")
Tutorials¶
| Tutorial | Description | Link |
|---|---|---|
| Granger Causality | Standard and DH tests with visualization | |
See Also¶
- Panel VAR Estimation -- Model setup and estimation
- Impulse Response Functions -- Dynamic effects of shocks (Granger causality can inform ordering)
- FEVD -- Relative importance of shocks
- VECM -- Causality in cointegrated systems
- Forecasting -- Multi-step predictions
References¶
- Granger, C. W. J. (1969). Investigating causal relations by econometric models and cross-spectral methods. Econometrica, 37(3), 424-438.
- Dumitrescu, E. I., & Hurlin, C. (2012). Testing for Granger non-causality in heterogeneous panels. Economic Modelling, 29(4), 1450-1460.
- Holtz-Eakin, D., Newey, W., & Rosen, H. S. (1988). Estimating vector autoregressions with panel data. Econometrica, 56(6), 1371-1395.
- Lopez, L., & Weber, S. (2017). Testing for Granger causality in panel data. The Stata Journal, 17(4), 972-984.