Mundlak Test¶

Quick Reference

Class: panelbox.validation.specification.mundlak.MundlakTest Result: ValidationTestResult H₀: RE is consistent (entity effects uncorrelated with regressors, \(\gamma = 0\)) H₁: RE is inconsistent (use Fixed Effects, \(\gamma \neq 0\)) Stata equivalent: Manually add group means to RE regression R equivalent: plm::phtest(model, method = "aux")

What It Tests¶

The Mundlak test is an alternative to the Hausman test for choosing between Fixed Effects and Random Effects. Instead of comparing two sets of estimates, it augments the RE model with entity-level means of the time-varying regressors and tests whether those means are jointly significant.

The intuition is straightforward: if \(\alpha_i\) is correlated with \(X_{it}\), then the entity means \(\bar{X}_i\) will capture this correlation. If \(\bar{X}_i\) is significant in the augmented model, the RE assumption is violated.

The Mundlak Device¶

The standard RE model is:

\[y_{it} = X_{it}'\beta + \alpha_i + \varepsilon_{it}\]

The Mundlak augmented model adds entity means:

\[y_{it} = X_{it}'\beta + \bar{X}_i'\gamma + \alpha_i + \varepsilon_{it}\]

where \(\bar{X}_i = \frac{1}{T_i} \sum_{t=1}^{T_i} X_{it}\) is the time average of regressors for entity \(i\).

If \(\gamma = 0\): the group means add no information, confirming RE is appropriate.

If \(\gamma \neq 0\): the regressors are correlated with the individual effects, and FE should be used.

Quick Example¶

from panelbox.models.static.random_effects import RandomEffects
from panelbox.validation.specification.mundlak import MundlakTest
from panelbox.datasets import load_grunfeld

data = load_grunfeld()

# Estimate Random Effects model
re = RandomEffects("invest ~ value + capital", data, "firm", "year")
re_results = re.fit()

# Run Mundlak test
mundlak = MundlakTest(re_results)
result = mundlak.run(alpha=0.05)
print(result.summary())

# Programmatic access
print(f"Wald statistic: {result.statistic:.4f}")
print(f"P-value: {result.pvalue:.4f}")
print(f"Degrees of freedom: {result.df}")
print(f"Reject H0: {result.reject_null}")

Interpretation¶

P-value	Decision	Interpretation	Action
p < 0.01	Strong rejection	Strong evidence of correlated effects	Use Fixed Effects
0.01 \(\leq\) p < 0.05	Rejection	Moderate evidence of correlation	Use Fixed Effects
0.05 \(\leq\) p < 0.10	Borderline	Weak evidence	Report both; lean toward FE
p \(\geq\) 0.10	Fail to reject	No evidence of correlation	Use Random Effects

Examining Individual Group Means¶

The test metadata provides the coefficients on individual group-mean variables, revealing which regressors drive the correlation:

result = mundlak.run(alpha=0.05)

# Coefficients on group means
for var, coef in result.metadata["delta_coefficients"].items():
    se = result.metadata["standard_errors"][var]
    t_stat = coef / se if se > 0 else 0
    print(f"  {var}: coef={coef:.4f}, se={se:.4f}, t={t_stat:.2f}")

Mathematical Details¶

Wald Test Statistic¶

The test statistic is a Wald test for the joint significance of \(\gamma\):

\[W = \hat{\gamma}' \left[\widehat{\text{Var}}(\hat{\gamma})\right]^{-1} \hat{\gamma} \sim \chi^2(K)\]

where \(K\) is the number of time-varying regressors.

Equivalence to Hausman¶

Mundlak (1978) showed that the Correlated Random Effects (CRE) model:

\[y_{it} = X_{it}'\beta + \bar{X}_i'\gamma + \alpha_i^* + \varepsilon_{it}\]

yields \(\hat{\beta}\) identical to the FE estimator when \(\gamma \neq 0\). The test on \(\gamma\) is asymptotically equivalent to the Hausman test but is computed differently, offering practical advantages.

Implementation Details¶

PanelBox implements the Mundlak test using Pooled OLS with entity-clustered standard errors on the augmented model. This approach:

Adds entity-mean variables for all time-varying regressors
Estimates the augmented model with cluster-robust standard errors
Performs a Wald test on the joint significance of the mean variables

Implementation Note

PanelBox uses Pooled OLS with clustered SEs (rather than RE estimation on the augmented model) to avoid numerical issues with variables that are constant within entities. This produces results consistent with the auxiliary regression approach used in R's plm package.

Configuration Options¶

from panelbox.validation.specification.mundlak import MundlakTest

# Basic usage
mundlak = MundlakTest(re_results)
result = mundlak.run(alpha=0.05)

# Stricter significance level
result = mundlak.run(alpha=0.01)

Parameters¶

Parameter	Type	Default	Description
`results`	`PanelResults`	required	Results from Random Effects estimation

The .run() method accepts:

Parameter	Type	Default	Description
`alpha`	`float`	`0.05`	Significance level

Result Metadata¶

Key	Type	Description
`n_time_varying_vars`	`int`	Number of time-varying regressors
`delta_coefficients`	`dict`	Coefficients on group-mean variables
`standard_errors`	`dict`	Standard errors for group-mean coefficients
`F_statistic`	`float`	F-statistic (Wald / df)
`augmented_formula`	`str`	Formula used for augmented regression

Advantages Over Hausman¶

The Mundlak test offers several practical advantages:

Feature	Hausman	Mundlak
Models required	FE and RE	RE only
Positive test statistic	Not guaranteed	Always (Wald test)
Robust standard errors	Problematic	Fully compatible
Identifies which variable	No	Yes (individual \(\hat{\gamma}_k\))
Time-invariant regressors	Excluded from test	Handled naturally
Unbalanced panels	Potential issues	Straightforward

When to Prefer Mundlak

Use the Mundlak test when:

You want to identify which regressors are correlated with entity effects
The Hausman test produces a negative statistic
You need robust/clustered standard errors for the test
You only have RE results available

Comparing Hausman and Mundlak¶

from panelbox.models.static.fixed_effects import FixedEffects
from panelbox.models.static.random_effects import RandomEffects
from panelbox.validation.specification.hausman import HausmanTest
from panelbox.validation.specification.mundlak import MundlakTest
from panelbox.datasets import load_grunfeld

data = load_grunfeld()

# Estimate both models
fe = FixedEffects("invest ~ value + capital", data, "firm", "year")
fe_results = fe.fit()

re = RandomEffects("invest ~ value + capital", data, "firm", "year")
re_results = re.fit()

# Hausman test
hausman = HausmanTest(fe_results, re_results)
print(f"Hausman: chi2={hausman.statistic:.4f}, p={hausman.pvalue:.4f}")
print(f"  Recommendation: {hausman.recommendation}")

# Mundlak test
mundlak = MundlakTest(re_results)
mundlak_result = mundlak.run(alpha=0.05)
print(f"Mundlak: Wald={mundlak_result.statistic:.4f}, p={mundlak_result.pvalue:.4f}")
print(f"  Conclusion: {mundlak_result.conclusion}")

# Both tests should agree in most cases

Common Pitfalls¶

RE Model Required¶

The Mundlak test is only applicable to Random Effects models. Passing FE or Pooled OLS results will raise a ValueError:

# This will raise ValueError
mundlak = MundlakTest(fe_results)  # Error: only for RE models

Time-Invariant Regressors¶

If all regressors are time-invariant (no within-entity variation), the test cannot be computed because there are no group means to add. Ensure at least one regressor varies over time.

Small Samples¶

With few entities or short time series, the Wald test may have limited power. Consider:

Using a higher significance level (e.g., \(\alpha = 0.10\))
Reporting both FE and RE estimates regardless of test outcome
Supplementing with economic reasoning about likely endogeneity

References¶

Mundlak, Y. (1978). On the pooling of time series and cross section data. Econometrica, 46(1), 69-85.
Chamberlain, G. (1982). Multivariate regression models for panel data. Journal of Econometrics, 18(1), 5-46.
Wooldridge, J. M. (2010). Econometric Analysis of Cross Section and Panel Data (2^nd ed.). MIT Press. Chapter 10.
Baltagi, B. H. (2021). Econometric Analysis of Panel Data (6^th ed.). Springer. Chapter 4.