Instrumental Variables¶
Instrumental Variables (IV) estimation addresses the fundamental problem of endogeneity -- when a regressor is correlated with the error term due to omitted variables, measurement error, or simultaneity. Even fixed effects cannot solve endogeneity caused by time-varying confounders. IV/2SLS uses external instruments that are correlated with the endogenous regressor but uncorrelated with the error term to produce consistent estimates.
PanelBox provides a Panel IV estimator with first-stage diagnostics, instrument validity tests, and flexible standard error options.
Available Models¶
| Model | Class | Method | Key Feature |
|---|---|---|---|
| Panel IV | PanelIV | 2SLS | Two-Stage Least Squares with panel structure |
Quick Example¶
from panelbox.models.iv import PanelIV
from panelbox.datasets import load_grunfeld

data = load_grunfeld()

# Treat "value" as endogenous and instrument it with its first and second lags
model = PanelIV(
    "invest ~ value + capital",
    data, "firm", "year",
    endogenous=["value"],
    instruments=["L.value", "L2.value"],
)

# Fit with clustered standard errors and print the results table
results = model.fit(cov_type="clustered")
print(results.summary())
Key Concepts¶
The Endogeneity Problem¶
A regressor \(X\) is endogenous when \(\text{Cov}(X_{it}, \epsilon_{it}) \neq 0\). Sources include:
| Source | Example | Fix |
|---|---|---|
| Omitted variables | Ability affects both education and wages | IV with external instrument |
| Simultaneity | Price and quantity determined jointly | Supply/demand shifters |
| Measurement error | True \(X^*\) measured with noise | IV with alternative measure |
| Reverse causality | Does \(X \to Y\) or \(Y \to X\)? | IV with predetermined variable |
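As a minimal sketch of the measurement-error row, the simulation below shows OLS attenuating toward zero when \(X^*\) is observed with noise, and a second, independent noisy measure serving as the instrument. It is plain Python on synthetic data, not PanelBox code; every variable name and parameter value is an illustrative assumption.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (n - 1)

rng = random.Random(42)
n = 5000
true_beta = 2.0  # assumed true effect of X* on Y

x_star = [rng.gauss(0, 1) for _ in range(n)]       # true regressor (unobserved)
x1 = [s + rng.gauss(0, 1) for s in x_star]         # noisy measure used as regressor
x2 = [s + rng.gauss(0, 1) for s in x_star]         # second, independent noisy measure
y = [true_beta * s + rng.gauss(0, 1) for s in x_star]

# OLS of y on the noisy measure: attenuated toward zero
beta_ols = cov(x1, y) / cov(x1, x1)

# IV: instrument the first noisy measure with the second one
beta_iv = cov(x2, y) / cov(x2, x1)

print(f"OLS: {beta_ols:.3f}  IV: {beta_iv:.3f}  (true beta = {true_beta})")
```

With these assumed variances, the OLS slope converges to \(\beta \cdot \text{Var}(X^*) / (\text{Var}(X^*) + \text{Var}(\text{noise})) = 1.0\), while the IV estimate converges to the true value of 2.0.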
Instrument Requirements¶
A valid instrument \(Z\) must satisfy two conditions:
- Relevance: \(\text{Cov}(Z_{it}, X_{it}) \neq 0\) -- the instrument must be correlated with the endogenous regressor
- Exogeneity: \(\text{Cov}(Z_{it}, \epsilon_{it}) = 0\) -- the instrument must be uncorrelated with the error
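To see why both conditions are needed, consider the simplest case: one endogenous regressor, one instrument, and no other controls (an illustrative simplification, not the full panel estimator). With \(Y_{it} = \beta X_{it} + \epsilon_{it}\), the IV estimand is

\[
\beta_{IV} = \frac{\text{Cov}(Z_{it}, Y_{it})}{\text{Cov}(Z_{it}, X_{it})} = \beta + \frac{\text{Cov}(Z_{it}, \epsilon_{it})}{\text{Cov}(Z_{it}, X_{it})}
\]

Exogeneity sets the numerator of the second term to zero, and relevance keeps its denominator away from zero. When the denominator is small, even a slight violation of exogeneity is amplified.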
Weak instruments
If the instrument is only weakly correlated with the endogenous regressor (relevance is marginal), IV estimates are biased toward OLS in finite samples, and their standard errors become large and unreliable. Check the first-stage F-statistic: a value above 10 is the conventional rule of thumb, and Stock & Yogo (2005) tabulate formal critical values.
Two-Stage Least Squares (2SLS)¶
The 2SLS procedure:
- First stage: Regress endogenous \(X\) on instruments \(Z\) and exogenous controls. Obtain predicted values \(\hat{X}\).
- Second stage: Regress outcome \(Y\) on \(\hat{X}\) and exogenous controls. Standard errors are adjusted for the two-stage procedure.
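The two stages can be sketched by hand on synthetic data. This is a minimal illustration in plain Python, assuming one endogenous regressor, one instrument, and no other controls. It is not how PanelBox implements `PanelIV`, and all names and parameter values are hypothetical.

```python
import math
import random

def simple_ols(x, y):
    """Return (intercept, slope) from regressing y on x with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope

rng = random.Random(0)
n = 5000
true_beta = 2.0  # assumed true effect of x on y

z = [rng.gauss(0, 1) for _ in range(n)]   # instrument
u = [rng.gauss(0, 1) for _ in range(n)]   # unobserved confounder
x = [0.8 * zi + ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
y = [true_beta * xi + 1.5 * ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]

# Naive OLS: biased because x is correlated with the confounder u
_, beta_ols = simple_ols(x, y)

# First stage: regress x on z and keep the fitted values
a0, a1 = simple_ols(z, x)
x_hat = [a0 + a1 * zi for zi in z]

# Second stage: regress y on the fitted values
b0, beta_2sls = simple_ols(x_hat, y)

# Standard error (homoskedastic case): residuals must use the actual x,
# not x_hat; otherwise the error variance is estimated incorrectly
resid = [yi - b0 - beta_2sls * xi for yi, xi in zip(y, x)]
sigma2 = sum(r * r for r in resid) / (n - 2)
mean_hat = sum(x_hat) / n
se_2sls = math.sqrt(sigma2 / sum((h - mean_hat) ** 2 for h in x_hat))

print(f"OLS: {beta_ols:.3f}  2SLS: {beta_2sls:.3f} (SE {se_2sls:.3f})")
```

Running the second stage with an off-the-shelf OLS routine gives the right coefficient but the wrong standard errors, because that routine computes residuals from \(\hat{X}\) rather than \(X\). This is the adjustment the second step refers to.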
First-Stage Diagnostics¶
results = model.fit(cov_type="clustered")
# First-stage F-statistic (instrument strength)
print(f"First-stage F: {results.first_stage_f:.2f}")
# Partial R-squared of excluded instruments
print(f"Partial R²: {results.partial_r_squared:.4f}")
| Diagnostic | Threshold | Interpretation |
|---|---|---|
| First-stage F | > 10 | Instruments are sufficiently strong |
| Partial R-squared | > 0.10 | Instruments explain meaningful variation |
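For intuition about what these two numbers measure, the sketch below computes them by hand for the simplest case of a single instrument and no exogenous controls, where the partial R-squared equals the ordinary first-stage R-squared. It is an illustrative plain-Python calculation on synthetic data, not the PanelBox implementation; `first_stage_f` is a hypothetical helper.

```python
import random

def first_stage_f(z, x):
    """F-statistic and R-squared from regressing x on a constant and one instrument z."""
    n = len(x)
    mz, mx = sum(z) / n, sum(x) / n
    szz = sum((a - mz) ** 2 for a in z)
    sxx = sum((b - mx) ** 2 for b in x)
    szx = sum((a - mz) * (b - mx) for a, b in zip(z, x))
    r2 = szx ** 2 / (szz * sxx)
    q = 1  # number of excluded instruments
    f_stat = (r2 / q) / ((1 - r2) / (n - q - 1))
    return f_stat, r2

rng = random.Random(1)
n = 2000
z = [rng.gauss(0, 1) for _ in range(n)]
noise = [rng.gauss(0, 1) for _ in range(n)]

# Same instrument, two assumed first-stage strengths
x_strong = [0.5 * zi + e for zi, e in zip(z, noise)]
x_weak = [0.02 * zi + e for zi, e in zip(z, noise)]

f_strong, r2_strong = first_stage_f(z, x_strong)
f_weak, r2_weak = first_stage_f(z, x_weak)

print(f"Strong: F = {f_strong:.1f}, R2 = {r2_strong:.4f}")
print(f"Weak:   F = {f_weak:.1f}, R2 = {r2_weak:.4f}")
```

With a first-stage coefficient of 0.5 the F-statistic should land far above 10, while a coefficient of 0.02 would typically leave it in single digits. With exogenous controls in the model, both statistics are computed after partialling those controls out.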
Overidentification Test¶
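When you have more instruments than endogenous variables, the model is overidentified and you can test the overidentifying restrictions. The null hypothesis is that all instruments are exogenous; the test cannot be computed for an exactly identified model.
# Sargan/Hansen J test (available only if overidentified)
if results.overid_test is not None:
    print(f"Sargan test: p = {results.overid_test.pvalue:.4f}")
    # p > 0.05: fail to reject the null that the instruments are jointly valid
    # (consistent with validity, but not proof of it)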
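The logic of the test can be sketched by hand: regress the 2SLS residuals on all instruments and compare \(n R^2\) with a chi-squared distribution whose degrees of freedom equal the number of overidentifying restrictions. The code below is a minimal plain-Python illustration, assuming homoskedastic errors, two instruments, and one endogenous regressor (so one degree of freedom). The `ols` helper and all data are hypothetical, and PanelBox may compute its statistic differently.

```python
import math
import random

def ols(cols, y):
    """OLS of y on a constant plus the given columns; returns (coefs, fitted)."""
    n = len(y)
    X = [[1.0] + [c[i] for c in cols] for i in range(n)]
    k = len(X[0])
    # Augmented normal equations: (X'X | X'y)
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for i in range(k):
        p = max(range(i, k), key=lambda r: abs(A[r][i]))
        A[i], A[p] = A[p], A[i]
        for r in range(k):
            if r != i:
                f = A[r][i] / A[i][i]
                A[r] = [a - f * b for a, b in zip(A[r], A[i])]
    coefs = [A[i][k] / A[i][i] for i in range(k)]
    fitted = [sum(b * v for b, v in zip(coefs, row)) for row in X]
    return coefs, fitted

rng = random.Random(2)
n = 3000
z1 = [rng.gauss(0, 1) for _ in range(n)]  # instrument 1 (valid by construction)
z2 = [rng.gauss(0, 1) for _ in range(n)]  # instrument 2 (valid by construction)
u = [rng.gauss(0, 1) for _ in range(n)]   # unobserved confounder
x = [0.7 * a + 0.5 * b + c + rng.gauss(0, 1) for a, b, c in zip(z1, z2, u)]
y = [2.0 * xi + ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]

# 2SLS: first stage on both instruments, second stage on the fitted values
_, x_hat = ols([z1, z2], x)
(b0, b1), _ = ols([x_hat], y)
resid = [yi - b0 - b1 * xi for yi, xi in zip(y, x)]  # residuals use actual x

# Sargan statistic: n * R-squared from regressing residuals on all instruments
_, r_fit = ols([z1, z2], resid)
mean_r = sum(resid) / n
ss_tot = sum((r - mean_r) ** 2 for r in resid)
ss_res = sum((r - f) ** 2 for r, f in zip(resid, r_fit))
j_stat = max(n * (1 - ss_res / ss_tot), 0.0)

# Chi-squared survival function for 1 degree of freedom
p_value = math.erfc(math.sqrt(j_stat / 2))

print(f"2SLS beta: {b1:.3f}  Sargan J: {j_stat:.3f}  p = {p_value:.4f}")
```

Because both instruments are valid by construction here, the p-value is roughly uniform on [0, 1] across random seeds, so it will still fall below 0.05 about 5% of the time. A rejection in real data indicates that at least one instrument is suspect, but does not say which one.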
Standard Error Options¶
Panel IV supports the same cov_type options as static models:
results = model.fit(cov_type="clustered") # Recommended
results = model.fit(cov_type="robust") # HC robust
results = model.fit(cov_type="driscoll_kraay") # Cross-sectional dependence
When to Use IV vs. GMM¶
| Feature | Panel IV (2SLS) | GMM (Arellano-Bond) |
|---|---|---|
| Endogenous variables | External instruments available | Uses internal (lagged) instruments |
| Dynamic model | No lagged dependent variable | Designed for lagged \(y\) |
| Instrument source | Researcher-specified | Automatically generated from lags |
| When to use | Known external instruments | Dynamic panels, no external instruments |
Rule of thumb
Use Panel IV when you have strong external instruments. Use GMM when the endogeneity comes from including a lagged dependent variable and you lack external instruments.
Detailed Guides¶
- Panel 2SLS -- Two-Stage Least Squares estimation (detailed guide coming soon)
- Instrument Selection -- Choosing and testing instruments (detailed guide coming soon)
Tutorials¶
See Static Models Tutorial for IV examples alongside OLS, FE, and RE models.
API Reference¶
See IV API for complete technical reference.
References¶
- Angrist, J. D., & Pischke, J.-S. (2009). Mostly Harmless Econometrics. Princeton University Press.
- Stock, J. H., & Yogo, M. (2005). Testing for weak instruments in linear IV regression. In D. W. K. Andrews & J. H. Stock (Eds.), Identification and Inference for Econometric Models (pp. 80-108). Cambridge University Press.
- Wooldridge, J. M. (2010). Econometric Analysis of Cross Section and Panel Data (2nd ed.). MIT Press.