Random Effects (GLS Estimator)
Quick Reference
Class: panelbox.models.static.random_effects.RandomEffects
Import: from panelbox import RandomEffects
Stata equivalent: xtreg y x1 x2, re
R equivalent: plm(y ~ x1 + x2, data, model = "random")
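Overview
The Random Effects (RE) estimator uses Generalized Least Squares (GLS) to estimate panel models efficiently by exploiting the variance component structure of the error term. The model is:
\[
y_{it} = \alpha + X_{it}\beta + u_i + \varepsilon_{it}
\]
where \(u_i \sim \text{i.i.d.}(0, \sigma^2_u)\) is the entity-specific random effect and \(\varepsilon_{it} \sim \text{i.i.d.}(0, \sigma^2_\varepsilon)\) is the idiosyncratic error. The GLS transformation applies a partial demeaning (quasi-demeaning):
\[
y_{it} - \theta \bar{y}_i = (1 - \theta)\alpha + (X_{it} - \theta \bar{X}_i)\beta + (1 - \theta) u_i + (\varepsilon_{it} - \theta \bar{\varepsilon}_i)
\]
where \(\bar{y}_i\), \(\bar{X}_i\), and \(\bar{\varepsilon}_i\) are entity-level time averages, and \(\theta = 1 - \sqrt{\sigma^2_\varepsilon / (\sigma^2_\varepsilon + T \sigma^2_u)}\) depends on the estimated variance components and the number of periods \(T\). When \(\theta = 0\) (which happens when \(\sigma^2_u = 0\)), RE reduces to Pooled OLS; when \(\theta = 1\), it becomes equivalent to Fixed Effects, a limit that \(\theta\) approaches as \(T \sigma^2_u\) grows relative to \(\sigma^2_\varepsilon\).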
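The formula for \(\theta\) and the quasi-demeaning step can be checked by hand. The sketch below uses only the standard library; the helper names `re_theta` and `quasi_demean` are illustrative and are not part of PanelBox.

```python
import math


def re_theta(sigma2_e: float, sigma2_u: float, n_periods: int) -> float:
    """theta = 1 - sqrt(sigma2_e / (sigma2_e + T * sigma2_u))."""
    return 1.0 - math.sqrt(sigma2_e / (sigma2_e + n_periods * sigma2_u))


def quasi_demean(values, theta):
    """Subtract theta times the entity mean from each observation of one entity."""
    mean = sum(values) / len(values)
    return [v - theta * mean for v in values]


# No entity-level variance: theta is exactly 0 (Pooled OLS)
theta_none = re_theta(sigma2_e=1.0, sigma2_u=0.0, n_periods=10)

# Equal variances and T = 3: theta = 1 - sqrt(1 / 4) = 0.5
theta_mid = re_theta(sigma2_e=1.0, sigma2_u=1.0, n_periods=3)

# Very long panel: theta approaches 1 (Fixed Effects)
theta_long = re_theta(sigma2_e=1.0, sigma2_u=1.0, n_periods=10_000)

# One entity observed over three periods (mean is 4.0)
y_i = [2.0, 4.0, 6.0]
print(quasi_demean(y_i, theta_mid))  # [0.0, 2.0, 4.0]
```

With \(\theta = 0.5\), only half of the entity mean is removed, so part of the between-entity variation remains in the transformed data.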
The critical assumption is that entity effects \(u_i\) are uncorrelated with the regressors: \(E[u_i | X_{it}] = 0\). If this assumption holds, RE is more efficient than FE (smaller standard errors). If it fails, RE is biased and inconsistent -- use Fixed Effects instead. The Hausman test helps decide between the two.
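The consequence of violating this assumption can be illustrated with a small simulation. This is a sketch on synthetic data, assuming NumPy is available; \(\theta\) is fixed by hand rather than estimated, so it shows the mechanics of quasi-demeaning and is not a feasible GLS fit or PanelBox's implementation.

```python
import numpy as np

rng = np.random.default_rng(42)
N, T = 500, 5                                  # entities, periods (balanced)
entity = np.repeat(np.arange(N), T)

u = rng.normal(size=N)                         # entity effects u_i
x = u[entity] + rng.normal(size=N * T)         # regressor correlated with u_i
y = 2.0 * x + u[entity] + rng.normal(size=N * T)   # true slope is 2


def quasi_demeaned_slope(theta):
    """OLS slope after subtracting theta times the entity mean from y and x."""
    x_bar = np.bincount(entity, weights=x) / T
    y_bar = np.bincount(entity, weights=y) / T
    x_star = x - theta * x_bar[entity]
    y_star = y - theta * y_bar[entity]
    # Centering both variables is equivalent to including an intercept
    x_c = x_star - x_star.mean()
    y_c = y_star - y_star.mean()
    return float(x_c @ y_c / (x_c @ x_c))


# theta = 0 is Pooled OLS, theta = 1 is the within (FE) estimator
slopes = {theta: quasi_demeaned_slope(theta) for theta in (0.0, 0.5, 1.0)}
for theta, slope in slopes.items():
    print(f"theta = {theta:.1f}: slope = {slope:.3f}")
```

Because \(x_{it}\) contains \(u_i\) in this design, any \(\theta < 1\) leaves part of the entity effect in the transformed error and the slope is biased upward; only full demeaning recovers a slope close to 2.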
Quick Example

```python
from panelbox import RandomEffects
from panelbox.datasets import load_grunfeld

data = load_grunfeld()

# Arguments: formula, data, entity column, time column
model = RandomEffects("invest ~ value + capital", data, "firm", "year")
results = model.fit()
print(results.summary())

# Examine variance components
print(f"Entity variance (sigma2_u): {model.sigma2_u:.4f}")
print(f"Idiosyncratic variance (sigma2_e): {model.sigma2_e:.4f}")
print(f"Theta: {model.theta:.4f}")
```
When to Use
- Entity-specific effects are uncorrelated with regressors: \(E[u_i | X_{it}] = 0\)
- You need to estimate coefficients on time-invariant variables (e.g., gender, country, industry)
- The sample is a random draw from a large population
- You want more efficient estimates (smaller standard errors) than Fixed Effects
- The Hausman test does not reject the RE specification
Key Assumptions
- Orthogonality: \(E[u_i | X_{it}] = 0\) -- random effects uncorrelated with regressors
- Strict exogeneity: \(E[\varepsilon_{it} | X_{i1}, \ldots, X_{iT}, u_i] = 0\)
- Homoskedastic random effects: \(\text{Var}(u_i) = \sigma^2_u\) (constant across entities)
- Independence: \(u_i\) and \(\varepsilon_{it}\) are independent
If \(E[u_i | X_{it}] \neq 0\), RE produces biased and inconsistent estimates. Always run the Hausman test to check this assumption.
Detailed Guide
Data Preparation
As with the other static models, the data must be in long format: one row per entity-period observation, with one column identifying the entity and one identifying the time period.
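A long-format panel can be sketched as follows. The column names mirror the Grunfeld example used on this page; the values are made up for illustration, and the two sanity checks are a suggestion, assuming pandas, rather than something PanelBox is known to require.

```python
import pandas as pd

# One row per (firm, year) pair; the values are hypothetical
panel = pd.DataFrame(
    {
        "firm": ["A", "A", "A", "B", "B", "B"],
        "year": [2001, 2002, 2003, 2001, 2002, 2003],
        "invest": [10.0, 12.0, 11.5, 20.0, 21.0, 23.0],
        "value": [100.0, 105.0, 103.0, 210.0, 215.0, 230.0],
        "capital": [50.0, 52.0, 55.0, 80.0, 82.0, 85.0],
    }
)

# Each entity-period pair should appear at most once
has_duplicates = panel.duplicated(subset=["firm", "year"]).any()

# A panel is balanced when every entity has the same number of periods
periods_per_firm = panel.groupby("firm")["year"].nunique()
is_balanced = periods_per_firm.nunique() == 1

print(f"Duplicate entity-period rows: {has_duplicates}")
print(f"Balanced panel: {is_balanced}")
```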
Estimation

```python
from panelbox import RandomEffects
from panelbox.datasets import load_grunfeld

data = load_grunfeld()

# Default: Swamy-Arora variance estimator
model = RandomEffects("invest ~ value + capital", data, "firm", "year")
results = model.fit()

# With a different variance estimator
model_amemiya = RandomEffects(
    "invest ~ value + capital", data, "firm", "year",
    variance_estimator="amemiya",
)
results_amemiya = model_amemiya.fit()

# With robust standard errors
results_robust = model.fit(cov_type="robust")

# With clustered standard errors
results_clustered = model.fit(cov_type="clustered")
```
Variance Estimators
PanelBox supports four methods for estimating the variance components \(\sigma^2_u\) and \(\sigma^2_\varepsilon\):

| Estimator | Description |
|---|---|
| "swamy-arora" | Most commonly used (default). Based on within and between residuals. |
| "walhus" | Wallace-Hussain estimator |
| "amemiya" | Amemiya's alternative estimator |
| "nerlove" | Nerlove's estimator |

All produce consistent estimates; differences are typically small in practice.
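To make "based on within and between residuals" concrete, here is a sketch of the textbook Swamy-Arora formulas for a balanced panel with a single regressor, run on simulated data. It assumes NumPy and is not PanelBox's internal code; details such as degrees-of-freedom conventions or the handling of unbalanced panels may differ in the library.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, K = 200, 5, 1                       # entities, periods, slope regressors
entity = np.repeat(np.arange(N), T)

x = rng.normal(size=N * T)
u = rng.normal(scale=1.0, size=N)         # true sigma2_u = 1
e = rng.normal(scale=1.0, size=N * T)     # true sigma2_e = 1
y = 1.0 + 2.0 * x + u[entity] + e


def group_means(v):
    """Entity-level time averages for a balanced panel."""
    return np.bincount(entity, weights=v) / T


x_bar, y_bar = group_means(x), group_means(y)

# Within regression: fully demeaned data, residuals identify sigma2_e
x_w, y_w = x - x_bar[entity], y - y_bar[entity]
beta_within = (x_w @ y_w) / (x_w @ x_w)
ssr_within = np.sum((y_w - beta_within * x_w) ** 2)
sigma2_e = ssr_within / (N * (T - 1) - K)

# Between regression: entity means on entity means (with intercept)
X_b = np.column_stack([np.ones(N), x_bar])
coef_between, *_ = np.linalg.lstsq(X_b, y_bar, rcond=None)
ssr_between = np.sum((y_bar - X_b @ coef_between) ** 2)

# Between residual variance estimates sigma2_u + sigma2_e / T
sigma2_u = max(ssr_between / (N - K - 1) - sigma2_e / T, 0.0)

theta = 1.0 - np.sqrt(sigma2_e / (sigma2_e + T * sigma2_u))
print(f"sigma2_e = {sigma2_e:.3f}, sigma2_u = {sigma2_u:.3f}, theta = {theta:.3f}")
```

The `max(..., 0.0)` guard reflects the fact that the difference can turn negative in small samples; truncating at zero is one common convention and is an assumption here.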
Interpreting Results
Key output attributes specific to Random Effects:

| Attribute | Description |
|---|---|
| model.sigma2_u | Estimated variance of entity effects (\(\sigma^2_u\)) |
| model.sigma2_e | Estimated variance of idiosyncratic errors (\(\sigma^2_\varepsilon\)) |
| model.theta | GLS transformation parameter (\(\theta\)) |
| results.rsquared_within | Within R-squared |
| results.rsquared_between | Between R-squared |
| results.rsquared_overall | Overall R-squared (primary for RE) |
| results.params | Estimated coefficients (includes intercept) |

Variance components interpretation:

```python
# `model` is a fitted RandomEffects instance

# Proportion of variance due to entity effects
rho = model.sigma2_u / (model.sigma2_u + model.sigma2_e)
print(f"Proportion due to entity effects (rho): {rho:.2%}")

# Theta close to 1 -> RE behaves like FE
# Theta close to 0 -> RE behaves like Pooled OLS
print(f"Theta: {model.theta:.4f}")
```

Key differences from FE output:
- RE estimates include an intercept (FE absorbs it into entity effects)
- RE can estimate coefficients on time-invariant variables
- RE reports overall R-squared as its primary goodness-of-fit measure
Configuration Options
Constructor:

| Parameter | Type | Default | Description |
|---|---|---|---|
| formula | str | required | R-style formula (e.g., "y ~ x1 + x2") |
| data | DataFrame | required | Panel data in long format |
| entity_col | str | required | Entity identifier column name |
| time_col | str | required | Time identifier column name |
| variance_estimator | str | "swamy-arora" | Method for variance components |
| weights | np.ndarray | None | Observation weights for WLS |

fit() method:

| Parameter | Type | Default | Description |
|---|---|---|---|
| cov_type | str | "nonrobust" | Standard error type (see the Standard Errors table) |
| max_lags | int | auto | Maximum lags for Driscoll-Kraay / Newey-West |
| kernel | str | "bartlett" | Kernel for HAC estimators |

Standard Errors

| cov_type | Method | When to Use |
|---|---|---|
| "nonrobust" | Classical GLS | Correctly specified model, no heteroskedasticity |
| "robust" / "hc1" | White HC1 | Heteroskedasticity |
| "hc0", "hc2", "hc3" | HC variants | Heteroskedasticity with varying corrections |
| "clustered" | Cluster-robust | Within-entity correlation |
| "twoway" | Two-way clustered | Correlation within entities and time periods |
| "driscoll_kraay" | Driscoll-Kraay | Cross-sectional dependence + serial correlation |
| "newey_west" | Newey-West HAC | Serial correlation |

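As an illustration of what a cluster-robust covariance computes, the sketch below builds the standard sandwich form \((X'X)^{-1} \left[\sum_g X_g' e_g e_g' X_g\right] (X'X)^{-1}\) for a plain OLS fit on simulated data. It assumes NumPy, applies no finite-sample correction, and the function name `cluster_robust_cov` is hypothetical; whether PanelBox applies a correction, and exactly how it combines this with the quasi-demeaned data, is not stated on this page.

```python
import numpy as np


def cluster_robust_cov(X, resid, groups):
    """Sandwich covariance with scores summed within each cluster."""
    bread = np.linalg.inv(X.T @ X)
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(groups):
        mask = groups == g
        score = X[mask].T @ resid[mask]       # K-vector X_g' e_g
        meat += np.outer(score, score)
    return bread @ meat @ bread


rng = np.random.default_rng(1)
n_groups, n_periods = 50, 4
groups = np.repeat(np.arange(n_groups), n_periods)

x = rng.normal(size=groups.size)
# Errors share an entity-level component, so they are correlated within entity
errors = rng.normal(size=n_groups)[groups] + rng.normal(size=groups.size)
y = 1.0 + 0.5 * x + errors

X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

cov_cluster = cluster_robust_cov(X, resid, groups)
se_cluster = np.sqrt(np.diag(cov_cluster))
print("Cluster-robust standard errors:", se_cluster)
```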
Diagnostics
After estimating Random Effects, verify the specification:

```python
from panelbox import FixedEffects, RandomEffects
from panelbox.datasets import load_grunfeld
from panelbox.validation import HausmanTest, MundlakTest

data = load_grunfeld()

# Estimate both models
fe_results = FixedEffects("invest ~ value + capital", data, "firm", "year").fit()
re_results = RandomEffects("invest ~ value + capital", data, "firm", "year").fit()

# 1. Hausman test: FE vs RE
hausman = HausmanTest(fe_results, re_results)
print(f"Hausman statistic: {hausman.statistic:.4f}")
print(f"P-value: {hausman.pvalue:.4f}")
print(f"Recommendation: {hausman.recommendation}")

# 2. Mundlak test: add entity means to RE
mundlak = MundlakTest(re_results)
result = mundlak.run()
print(result.summary())
```

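For intuition about the Hausman statistic, the single-coefficient case can be computed by hand: \(H = (\hat\beta_{FE} - \hat\beta_{RE})^2 / (\widehat{\text{Var}}(\hat\beta_{FE}) - \widehat{\text{Var}}(\hat\beta_{RE}))\), compared against a \(\chi^2_1\) distribution. The sketch below uses only the standard library; the function name and the input numbers are hypothetical and do not come from the Grunfeld data or from PanelBox.

```python
import math


def hausman_scalar(b_fe, b_re, se_fe, se_re):
    """Hausman statistic and p-value for a single coefficient (chi-square, 1 df)."""
    var_diff = se_fe**2 - se_re**2
    if var_diff <= 0:
        raise ValueError("Var(FE) - Var(RE) must be positive for the test to be defined")
    statistic = (b_fe - b_re) ** 2 / var_diff
    # For 1 degree of freedom: P(chi2 > H) = P(|Z| > sqrt(H)) = erfc(sqrt(H / 2))
    pvalue = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, pvalue


# Hypothetical estimates: FE is less precise than RE, as expected under the null
statistic, pvalue = hausman_scalar(b_fe=0.110, b_re=0.100, se_fe=0.012, se_re=0.010)
print(f"Hausman statistic: {statistic:.4f}, p-value: {pvalue:.4f}")
```

A large p-value means the FE and RE coefficients are statistically close, so the RE orthogonality assumption is not rejected; a small p-value favors Fixed Effects.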
See the FE vs RE Decision Guide for a comprehensive workflow.
Tutorials

| Tutorial | Level |
|---|---|
| Random Effects and Hausman Test | Beginner |
| Comparison of All Estimators | Advanced |

See Also
- Fixed Effects -- Consistent alternative when effects are correlated with regressors
- FE vs RE Decision Guide -- Hausman test and decision workflow
- Pooled OLS -- Special case when \(\theta = 0\) (no entity effects)
- Between Estimator -- Uses only between-entity variation
References
- Baltagi, B. H. (2021). Econometric Analysis of Panel Data (6th ed.). Springer. Chapter 2.
- Wooldridge, J. M. (2010). Econometric Analysis of Cross Section and Panel Data (2nd ed.). MIT Press. Chapter 10.
- Swamy, P. A. V. B., & Arora, S. S. (1972). "The Exact Finite Sample Properties of the Estimators of Coefficients in the Error Components Regression Models." Econometrica, 40(2), 261--275.
- Hausman, J. A. (1978). "Specification Tests in Econometrics." Econometrica, 46(6), 1251--1271.