Fixed Effects (Within Estimator)¶

Quick Reference

Class: panelbox.models.static.fixed_effects.FixedEffects Import: from panelbox import FixedEffects Stata equivalent: xtreg y x1 x2, fe R equivalent: plm(y ~ x1 + x2, data, model = "within")

Overview¶

The Fixed Effects (FE) estimator, also known as the Within estimator, is the workhorse model of panel data econometrics. It controls for all time-invariant unobserved heterogeneity by demeaning each variable within each entity. The model is:

\[y_{it} = \alpha_i + \gamma_t + X_{it} \beta + \varepsilon_{it}\]

where \(\alpha_i\) are entity fixed effects and \(\gamma_t\) are (optional) time fixed effects. The within transformation removes entity effects by subtracting entity means:

\[(y_{it} - \bar{y}_i) = (X_{it} - \bar{X}_i) \beta + (\varepsilon_{it} - \bar{\varepsilon}_i)\]

This is equivalent to including a dummy variable for each entity, but computationally much more efficient. The key advantage is that \(\alpha_i\) can be freely correlated with the regressors \(X_{it}\), making FE consistent under weaker assumptions than Random Effects.

The trade-off is that Fixed Effects cannot estimate coefficients on time-invariant variables (e.g., gender, country, industry) because they are perfectly absorbed by the entity effects. If you need time-invariant coefficients, consider Random Effects instead.

Quick Example¶

from panelbox import FixedEffects
from panelbox.datasets import load_grunfeld

data = load_grunfeld()
model = FixedEffects("invest ~ value + capital", data, "firm", "year")
results = model.fit(cov_type="clustered")
print(results.summary())

When to Use¶

Unobserved entity-specific heterogeneity exists and is correlated with regressors
You want to control for all time-invariant confounders without specifying them
Your entities are not a random sample from a larger population (e.g., specific firms, specific countries)
You do not need to estimate coefficients on time-invariant variables
You have at least T = 2 observations per entity

Key Assumptions

Strict exogeneity: \(E[\varepsilon_{it} | X_{i1}, \ldots, X_{iT}, \alpha_i] = 0\) for all \(t\)
No perfect multicollinearity among time-varying regressors
Sufficient within-entity variation: At least T = 2 per entity
Time-invariant variables are not of interest (they will be dropped)

Strict exogeneity fails when the model includes lagged dependent variables (\(y_{i,t-1}\)), leading to Nickell bias. Use GMM estimators in that case.

Detailed Guide¶

Data Preparation¶

PanelBox requires data in long format with entity and time identifiers:

from panelbox.datasets import load_grunfeld

data = load_grunfeld()

# Verify panel structure
print(f"Entities: {data['firm'].nunique()}")
print(f"Time periods: {data['year'].nunique()}")
print(f"Total observations: {len(data)}")

Time-Invariant Variables

If your formula includes variables that do not change over time within an entity, they will be automatically absorbed by the fixed effects and dropped from the estimation. This is by design, not an error.

Estimation¶

One-Way Fixed Effects (Entity Only)¶

from panelbox import FixedEffects

# Entity fixed effects (default)
model = FixedEffects(
    "invest ~ value + capital",
    data, "firm", "year",
    entity_effects=True,   # default
    time_effects=False     # default
)
results = model.fit(cov_type="clustered")

Two-Way Fixed Effects (Entity + Time)¶

# Two-way fixed effects
model_twoway = FixedEffects(
    "invest ~ value + capital",
    data, "firm", "year",
    entity_effects=True,
    time_effects=True
)
results_twoway = model_twoway.fit(cov_type="twoway")

Use two-way FE when there are common time shocks (recessions, policy changes) affecting all entities simultaneously.

Time Fixed Effects Only¶

# Time fixed effects only (removes common time trends)
model_time = FixedEffects(
    "invest ~ value + capital",
    data, "firm", "year",
    entity_effects=False,
    time_effects=True
)
results_time = model_time.fit()

Interpreting Results¶

print(results.summary())

Key output attributes specific to Fixed Effects:

Attribute	Description
`results.rsquared_within`	R-squared from the within (demeaned) model
`results.rsquared_between`	R-squared for entity means
`results.rsquared_overall`	Overall R-squared including fixed effects
`results.f_statistic`	F-test statistic: FE vs Pooled OLS
`results.f_pvalue`	p-value for the F-test
`model.entity_fe`	Estimated entity fixed effects (pd.Series)
`model.time_fe`	Estimated time fixed effects (pd.Series, if applicable)

R-squared interpretation:

Within R-squared is the primary goodness-of-fit measure for FE. It measures how well the model explains variation within each entity over time.
Between R-squared measures how well entity means of fitted values match entity means of the dependent variable.
Overall R-squared combines both sources of variation.

print(f"Within R-squared:  {results.rsquared_within:.4f}")
print(f"Between R-squared: {results.rsquared_between:.4f}")
print(f"Overall R-squared: {results.rsquared_overall:.4f}")

Accessing estimated fixed effects:

# Entity fixed effects
print(model.entity_fe)
# firm
# 1    -70.297
# 2    101.906
# ...

# Time fixed effects (if time_effects=True)
if model.time_fe is not None:
    print(model.time_fe)

F-test for entity effects (FE vs Pooled OLS):

The F-test evaluates whether entity fixed effects are jointly significant. A significant F-test (p < 0.05) means Pooled OLS is inadequate and FE is needed.

\[F = \frac{(SSR_{Pooled} - SSR_{FE}) / (N - 1)}{SSR_{FE} / (NT - N - K)}\]

print(f"F-statistic: {results.f_statistic:.4f}")
print(f"F-test p-value: {results.f_pvalue:.4f}")
# p < 0.05 -> reject Pooled OLS in favor of FE

Configuration Options¶

Constructor:

Parameter	Type	Default	Description
`formula`	str	required	R-style formula (e.g., `"y ~ x1 + x2"`)
`data`	DataFrame	required	Panel data in long format
`entity_col`	str	required	Entity identifier column name
`time_col`	str	required	Time identifier column name
`entity_effects`	bool	`True`	Include entity fixed effects
`time_effects`	bool	`False`	Include time fixed effects
`weights`	np.ndarray	`None`	Observation weights for WLS

Note

At least one of entity_effects or time_effects must be True. If you want no fixed effects, use Pooled OLS instead.

fit() method:

Parameter	Type	Default	Description
`cov_type`	str	`"nonrobust"`	Standard error type (see table below)
`max_lags`	int	auto	Maximum lags for Driscoll-Kraay / Newey-West
`kernel`	str	`"bartlett"`	Kernel for HAC estimators

Standard Errors¶

`cov_type`	Method	When to Use
`"nonrobust"`	Classical	Homoskedastic errors, no serial correlation
`"robust"` / `"hc1"`	White HC1	Heteroskedasticity
`"hc0"`, `"hc2"`, `"hc3"`	HC variants	Heteroskedasticity with varying corrections
`"clustered"`	Cluster-robust	Within-entity correlation (recommended)
`"twoway"`	Two-way clustered	Correlation within entities and time periods
`"driscoll_kraay"`	Driscoll-Kraay	Cross-sectional dependence + serial correlation
`"newey_west"`	Newey-West HAC	Serial correlation
`"pcse"`	Panel-corrected	Cross-sectional dependence, requires T > N

Recommendation

Always use cov_type="clustered" for entity FE models. After demeaning, residuals within the same entity are still correlated, and classical standard errors will be too small. For two-way FE, use cov_type="twoway".

Diagnostics¶

After estimating Fixed Effects, consider the following tests:

from panelbox import FixedEffects, RandomEffects

# 1. F-test: FE vs Pooled OLS (automatic in FE results)
fe = FixedEffects("invest ~ value + capital", data, "firm", "year")
fe_results = fe.fit(cov_type="clustered")
print(f"F-test p-value: {fe_results.f_pvalue:.4f}")

# 2. Hausman test: FE vs RE
re = RandomEffects("invest ~ value + capital", data, "firm", "year")
re_results = re.fit()

from panelbox.validation import HausmanTest
hausman = HausmanTest(fe_results, re_results)
print(f"Hausman p-value: {hausman.pvalue:.4f}")
print(f"Recommendation: {hausman.recommendation}")

# 3. Serial correlation test (Wooldridge)
from panelbox.validation import WooldridgeTest
wooldridge = WooldridgeTest(fe_results)
result = wooldridge.run()
print(result.summary())

See the FE vs RE Decision Guide for a comprehensive workflow on choosing between Fixed and Random Effects.

Tutorials¶

Tutorial	Level	Colab
Fixed Effects	Beginner
Random Effects and Hausman Test	Beginner
Comparison of All Estimators	Advanced

References¶

Wooldridge, J. M. (2010). Econometric Analysis of Cross Section and Panel Data (2^nd ed.). MIT Press. Chapter 10.
Baltagi, B. H. (2021). Econometric Analysis of Panel Data (6^th ed.). Springer. Chapter 2.
Cameron, A. C., & Trivedi, P. K. (2005). Microeconometrics: Methods and Applications. Cambridge University Press. Chapter 21.
Nickell, S. (1981). "Biases in Dynamic Models with Fixed Effects." Econometrica, 49(6), 1417--1426.