Time-Series Cross-Validation

Quick Reference

Class: `panelbox.validation.robustness.TimeSeriesCV`
Import: `from panelbox.validation.robustness import TimeSeriesCV`
Key method: `cv.cross_validate()` returns `CVResults`
Stata equivalent: Rolling estimation (custom)
R equivalent: `caret::trainControl(method="timeslice")`

Why Cross-Validation for Panels?

In-sample goodness-of-fit measures (\(R^2\), AIC, BIC) describe how well the model fits the data it was estimated on. The stronger test of a model is whether it can predict data it has not seen. Time-series cross-validation evaluates out-of-sample predictive performance while respecting the temporal ordering of panel data: no future data ever leaks into the training set.

Two CV Methods

PanelBox implements two temporal CV strategies:

Expanding Window

Train on periods \([1, t]\), predict period \(t+1\), then expand to \([1, t+1]\), predict \(t+2\), and so on. The training set grows with each fold.

```text
Fold 1: Train [1,2,3]         → Predict [4]
Fold 2: Train [1,2,3,4]       → Predict [5]
Fold 3: Train [1,2,3,4,5]     → Predict [6]
...
```
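The expanding-window fold layout can be generated with a few lines of plain Python. This is a minimal sketch that assumes integer period labels and a single test period per fold. `expanding_folds` is a hypothetical helper, not part of PanelBox.

```python
def expanding_folds(periods, min_train_periods=3):
    """Yield (train_periods, test_period) pairs for an expanding window.

    Hypothetical helper for illustration; not PanelBox's implementation.
    """
    periods = sorted(periods)
    for i in range(min_train_periods, len(periods)):
        # Train on every period strictly before the test period
        yield periods[:i], periods[i]


for fold, (train, test) in enumerate(expanding_folds(range(1, 7)), start=1):
    print(f"Fold {fold}: Train {train} -> Predict [{test}]")
```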

Rolling Window

Train on a fixed-size window \([t-w+1, t]\) of \(w\) periods, predict \(t+1\), then slide the window forward by one period. The training set size stays constant at \(w\).

```text
Fold 1: Train [1,2,3]   → Predict [4]
Fold 2: Train [2,3,4]   → Predict [5]
Fold 3: Train [3,4,5]   → Predict [6]
...
```
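The rolling layout differs from the expanding one only in where each training slice starts. This is another minimal sketch built on a hypothetical helper, `rolling_folds`. It ignores `min_train_periods`, because this page does not describe how PanelBox combines that argument with `window_size`.

```python
def rolling_folds(periods, window_size):
    """Yield (train_periods, test_period) pairs for a fixed-size rolling window.

    Hypothetical helper for illustration; not PanelBox's implementation.
    """
    periods = sorted(periods)
    for i in range(window_size, len(periods)):
        # Train on the `window_size` periods immediately before the test period
        yield periods[i - window_size:i], periods[i]


for fold, (train, test) in enumerate(rolling_folds(range(1, 7), window_size=3), start=1):
    print(f"Fold {fold}: Train {train} -> Predict [{test}]")
```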

When to Use Which

Expanding window: When you believe more data always helps (stable relationships). This is the default.
Rolling window: When you suspect structural change or time-varying parameters, so recent data is more relevant than distant past.

Quick Example

```python
from panelbox import FixedEffects
from panelbox.validation.robustness import TimeSeriesCV
from panelbox.datasets import load_grunfeld

data = load_grunfeld()
model = FixedEffects("invest ~ value + capital", data, "firm", "year")
results = model.fit()

# Expanding window cross-validation
cv = TimeSeriesCV(results, method="expanding", min_train_periods=3, verbose=True)
cv_results = cv.cross_validate()

# Overall metrics
print(f"Out-of-sample R²: {cv_results.metrics['r2_oos']:.4f}")
print(f"RMSE:             {cv_results.metrics['rmse']:.4f}")
print(f"MAE:              {cv_results.metrics['mae']:.4f}")

# Per-fold breakdown
print(cv_results.fold_metrics)

# Full summary
print(cv.summary())
```

API Reference

Constructor

```python
TimeSeriesCV(
    results=results,           # PanelResults from model.fit()
    method="expanding",        # 'expanding' or 'rolling'
    window_size=None,          # Required for 'rolling' method
    min_train_periods=3,       # Minimum training periods (>= 2)
    verbose=True,              # Print progress
)
```

window_size is Required for Rolling

When using method="rolling", you must specify window_size. A good starting point is window_size = T // 2 where T is the number of time periods.
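The rule of thumb also determines how many folds you get. With one test period per fold, as in the rolling-window diagram, a window of \(w\) periods over \(T\) periods gives \(T - w\) folds. The helper `plan_rolling_cv` below is hypothetical. Its lower bound of 2 on the window is an assumption borrowed from the `min_train_periods >= 2` comment, not a documented PanelBox rule.

```python
def plan_rolling_cv(n_periods, window_size=None):
    """Pick a rolling window and count its folds (hypothetical helper).

    Assumes one test period per fold, matching the rolling-window diagram.
    """
    if window_size is None:
        # Rule-of-thumb starting point: half the number of time periods
        window_size = n_periods // 2
    # Assumed lower bound of 2, mirroring min_train_periods >= 2
    if not 2 <= window_size < n_periods:
        raise ValueError("window_size must satisfy 2 <= window_size < n_periods")
    # One fold for each period that follows the first full window
    n_folds = n_periods - window_size
    return window_size, n_folds


print(plan_rolling_cv(20))     # (10, 10)
print(plan_rolling_cv(20, 5))  # (5, 15)
```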

Methods

| Method | Returns | Description |
|---|---|---|
| `cross_validate()` | `CVResults` | Run CV and return results |
| `plot_predictions(entity)` | -- | Actual vs. predicted plot (all entities, or a specific one) |
| `summary()` | `str` | Formatted summary string |

CVResults Attributes

| Attribute | Type | Description |
|---|---|---|
| `predictions` | `pd.DataFrame` | Columns: `actual`, `predicted`, `fold`, `test_period`, `entity`, `time` |
| `metrics` | `dict` | Overall metrics: `mse`, `rmse`, `mae`, `r2_oos` |
| `fold_metrics` | `pd.DataFrame` | Per-fold metrics with fold number and test period |
| `method` | `str` | CV method used (`'expanding'` or `'rolling'`) |
| `n_folds` | `int` | Number of CV folds |
| `window_size` | `int` or `None` | Window size (for rolling CV) |
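Each row of `predictions` carries its fold number, so per-fold statistics like those in `fold_metrics` can be recomputed by grouping on `fold`. The sketch below does this for RMSE in plain Python. The rows are toy values that mimic the documented columns, and the code only illustrates the calculation. It is not how PanelBox builds `fold_metrics`.

```python
from collections import defaultdict
from math import sqrt

# Toy rows standing in for cv_results.predictions; the numbers are made up
rows = [
    {"fold": 1, "entity": "A", "actual": 10.0, "predicted": 12.0},
    {"fold": 1, "entity": "B", "actual": 20.0, "predicted": 18.0},
    {"fold": 2, "entity": "A", "actual": 11.0, "predicted": 11.0},
    {"fold": 2, "entity": "B", "actual": 21.0, "predicted": 24.0},
]

# Collect squared prediction errors per fold
sq_errors = defaultdict(list)
for row in rows:
    sq_errors[row["fold"]].append((row["actual"] - row["predicted"]) ** 2)

# RMSE per fold: square root of the mean squared error within that fold
fold_rmse = {fold: sqrt(sum(errs) / len(errs)) for fold, errs in sq_errors.items()}
print(fold_rmse)
```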

Evaluation Metrics

| Metric | Formula | Interpretation |
|---|---|---|
| MSE | \(\frac{1}{n}\sum(y_i - \hat{y}_i)^2\) | Mean squared prediction error |
| RMSE | \(\sqrt{\text{MSE}}\) | In the units of the dependent variable |
| MAE | \(\frac{1}{n}\sum \lvert y_i - \hat{y}_i \rvert\) | Less sensitive to outliers than MSE |
| \(R^2_{OOS}\) | \(1 - \frac{SS_{res}}{SS_{tot}}\) | Out-of-sample explained variance |
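The four metrics take only a few lines to compute by hand, which helps when sanity-checking reported values. This sketch measures \(SS_{tot}\) around the mean of the evaluated actual values. Some authors use the training-sample mean as the benchmark instead, and this page does not say which convention PanelBox follows. `oos_metrics` is a hypothetical helper.

```python
from math import sqrt


def oos_metrics(actual, predicted):
    """MSE, RMSE, MAE and out-of-sample R^2 (hypothetical helper).

    Assumption: SS_tot is taken around the mean of the evaluated actual values.
    """
    n = len(actual)
    errors = [y - y_hat for y, y_hat in zip(actual, predicted)]
    mse = sum(e ** 2 for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    y_bar = sum(actual) / n
    ss_res = sum(e ** 2 for e in errors)
    ss_tot = sum((y - y_bar) ** 2 for y in actual)
    return {"mse": mse, "rmse": sqrt(mse), "mae": mae, "r2_oos": 1 - ss_res / ss_tot}


print(oos_metrics([3.0, 5.0, 7.0], [2.0, 5.0, 9.0]))
```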

Negative \(R^2_{OOS}\)

Unlike in-sample \(R^2\), the out-of-sample \(R^2\) can be negative. A negative value means the model's forecasts have a larger squared error than simply using the sample mean as the forecast. This usually signals overfitting or a misspecified model.
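A small worked example makes the sign concrete. Take actual values \(y = (1, 2, 3)\) and forecasts \(\hat{y} = (3, 2, 1)\), and measure \(SS_{tot}\) around their mean \(\bar{y} = 2\). Choosing that benchmark mean is an assumption made for illustration.

\[
SS_{res} = (1-3)^2 + (2-2)^2 + (3-1)^2 = 8, \qquad
SS_{tot} = (1-2)^2 + (2-2)^2 + (3-2)^2 = 2,
\]
\[
R^2_{OOS} = 1 - \frac{SS_{res}}{SS_{tot}} = 1 - \frac{8}{2} = -3.
\]

The forecast errors are four times as large, in squared terms, as the deviations from the mean, so the mean alone would have forecast better.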

Rolling Window Example

```python
# Rolling window with 5-period training window
cv_roll = TimeSeriesCV(
    results,
    method="rolling",
    window_size=5,
    min_train_periods=3,
    verbose=True,
)
cv_results_roll = cv_roll.cross_validate()

print(f"Rolling R² (OOS): {cv_results_roll.metrics['r2_oos']:.4f}")
print(f"Number of folds:  {cv_results_roll.n_folds}")
```

Visualization

```python
# Plot actual vs predicted for a specific entity
cv.plot_predictions(entity="General Motors")

# Plot for all entities (scatter + time series)
cv.plot_predictions()
```

The plot_predictions() method produces two panels:

Scatter plot: Actual vs predicted values with a 45-degree reference line
Time series: Mean actual and predicted values over time periods
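The time-series panel averages across entities within each period. Below is a minimal sketch of that aggregation in plain Python. It assumes each row carries the `time`, `actual`, and `predicted` fields listed under CVResults Attributes. The values are toy numbers, the plotting step is omitted, and this is not PanelBox's plotting code.

```python
from collections import defaultdict

# Toy prediction rows (made-up values) with the documented column names
rows = [
    {"time": 1, "entity": "A", "actual": 10.0, "predicted": 9.0},
    {"time": 1, "entity": "B", "actual": 30.0, "predicted": 33.0},
    {"time": 2, "entity": "A", "actual": 12.0, "predicted": 12.5},
    {"time": 2, "entity": "B", "actual": 28.0, "predicted": 27.5},
]

# time -> [sum of actual, sum of predicted, row count]
sums = defaultdict(lambda: [0.0, 0.0, 0])
for row in rows:
    acc = sums[row["time"]]
    acc[0] += row["actual"]
    acc[1] += row["predicted"]
    acc[2] += 1

# Cross-sectional means per period, in time order: time -> (mean actual, mean predicted)
series = {t: (a / n, p / n) for t, (a, p, n) in sorted(sums.items())}
print(series)
```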

Panel-Specific Considerations

Temporal Integrity

PanelBox cross-validation always respects temporal ordering:

Training data uses only past periods (no future leakage)
All entities are included in each fold (cross-sectional dimension is preserved)
Models are fully re-estimated for each fold (no parameter recycling)

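This is more conservative than random k-fold CV. Random k-fold CV assigns observations to folds without regard to time, so a model could be trained on periods that come after the ones it is asked to predict, which violates the time-series structure.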
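To make these three guarantees concrete, here is a from-scratch sketch of an expanding-window loop for a one-regressor fixed-effects model. Each fold keeps every entity, trains only on periods strictly before the test period, and re-estimates the within estimator from scratch. It is written in plain Python for illustration, and the helper names are hypothetical. Forecasting with \(\hat{\alpha}_i + \hat{\beta} x_{it}\) is an assumption about how out-of-sample fixed-effects predictions are formed, not a description of PanelBox internals.

```python
from collections import defaultdict


def fit_within(rows):
    """Within (fixed-effects) estimator for y_it = a_i + b * x_it, one regressor."""
    by_entity = defaultdict(list)
    for r in rows:
        by_entity[r["entity"]].append(r)
    # Entity means of x and y
    means = {
        e: (sum(r["x"] for r in rs) / len(rs), sum(r["y"] for r in rs) / len(rs))
        for e, rs in by_entity.items()
    }
    # Slope from entity-demeaned data
    num = sum((r["x"] - means[r["entity"]][0]) * (r["y"] - means[r["entity"]][1]) for r in rows)
    den = sum((r["x"] - means[r["entity"]][0]) ** 2 for r in rows)
    b = num / den
    # Recover each entity's intercept: a_i = mean(y_i) - b * mean(x_i)
    alphas = {e: y_bar - b * x_bar for e, (x_bar, y_bar) in means.items()}
    return alphas, b


def expanding_cv(rows, min_train_periods=3):
    """Return (fold, entity, actual, predicted) tuples for an expanding-window CV."""
    periods = sorted({r["time"] for r in rows})
    out = []
    for fold, i in enumerate(range(min_train_periods, len(periods)), start=1):
        test_t = periods[i]
        # All entities, but only periods strictly before the test period
        train = [r for r in rows if r["time"] < test_t]
        alphas, b = fit_within(train)  # re-estimated from scratch in every fold
        for r in rows:
            if r["time"] == test_t:
                # Raises KeyError if the entity has no training rows
                pred = alphas[r["entity"]] + b * r["x"]
                out.append((fold, r["entity"], r["y"], pred))
    return out


# Toy panel built so that y = a_i + 2 * x exactly (entity A: a = 1, entity B: a = 5)
rows = [
    {"entity": e, "time": t, "x": x, "y": a + 2 * x}
    for e, a, xs in [("A", 1.0, [1, 3, 2, 5, 4]), ("B", 5.0, [2, 2, 4, 3, 6])]
    for t, x in enumerate(xs, start=1)
]
for fold, entity, actual, pred in expanding_cv(rows):
    print(fold, entity, actual, round(pred, 6))
```

Because the toy data contain no noise, every fold recovers the true slope and intercepts, and the out-of-sample forecasts match the actual values up to floating-point error.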

Comparing Expanding vs Rolling

```python
# Run both methods
cv_exp = TimeSeriesCV(results, method="expanding", min_train_periods=3)
cv_roll = TimeSeriesCV(results, method="rolling", window_size=5, min_train_periods=3)

exp_results = cv_exp.cross_validate()
roll_results = cv_roll.cross_validate()

print(f"Expanding R² (OOS): {exp_results.metrics['r2_oos']:.4f}")
print(f"Rolling R² (OOS):   {roll_results.metrics['r2_oos']:.4f}")

# If rolling >> expanding: structural change may be present
# If expanding >> rolling: stable relationships; more data helps
```

Common Pitfalls

Watch Out

Too few training periods: Setting min_train_periods=2 may produce unreliable models. Use at least 3 periods.
Ignoring fold variation: Stable overall metrics can mask poor performance in specific periods. Always check fold_metrics.
Fixed effects with new entities: If entities enter or leave the panel over time, some folds may fail, because an entity's fixed effect cannot be estimated when the training set contains no observations for it.
Computational cost: Each fold requires a full model re-estimation. For large panels, this can be slow.
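Before running CV on an unbalanced panel, you can check which folds will contain test-set entities that have no training data. The sketch below is a hypothetical pre-check for expanding folds in plain Python. It assumes each row is a dict with `entity` and `time` keys and is not a PanelBox function.

```python
def uncovered_entities(rows, min_train_periods=3):
    """Return {test_period: sorted entities with no training rows} for expanding folds.

    Hypothetical pre-check; assumes each row is a dict with 'entity' and 'time'.
    """
    periods = sorted({r["time"] for r in rows})
    problems = {}
    for i in range(min_train_periods, len(periods)):
        test_t = periods[i]
        train_entities = {r["entity"] for r in rows if r["time"] < test_t}
        test_entities = {r["entity"] for r in rows if r["time"] == test_t}
        missing = test_entities - train_entities
        if missing:
            problems[test_t] = sorted(missing)
    return problems


# Entity "C" enters the panel in period 4, so the fold predicting period 4
# has no training data from which to estimate C's fixed effect
rows = [{"entity": e, "time": t} for e in ("A", "B") for t in range(1, 6)]
rows += [{"entity": "C", "time": t} for t in (4, 5)]
print(uncovered_entities(rows))  # {4: ['C']}
```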
