AutoExperiment Quickstart

This tutorial takes you from zero to a fully validated panel data model using AutoExperiment, PanelBox's automated model selection pipeline.

What You'll Learn

Run AutoExperiment with a single run() call
Interpret the results summary and ranking
Generate an HTML report
Customize transformations and sign constraints

Step 1: Load Data and Run

AutoExperiment needs panel data, a dependent variable, and entity/time identifiers. Everything else has sensible defaults.

from panelbox.datasets import load_grunfeld
from panelbox.autoexperiment import AutoExperiment

data = load_grunfeld()

auto = AutoExperiment(
    data=data,
    depvar="invest",
    entity_col="firm",
    time_col="year",
)
results = auto.run()

That's it. With these defaults, AutoExperiment will:

Generate variable transformations (lag-1, first difference by default)
Run forward stepwise selection by BIC for each model type
Estimate Pooled OLS, Fixed Effects, Random Effects, and First Difference
Validate each model with diagnostic tests
Run the Hausman test (FE vs RE)
Rank all models and select the best one
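
The selection step in the list above can be pictured with a small, dependency-free sketch. It is not PanelBox's implementation: it assumes plain OLS with an intercept, ignores the panel structure, and scores models with the Gaussian BIC, n*ln(RSS/n) + k*ln(n). The helper names ols_rss, bic, and forward_stepwise are hypothetical.

```python
import math

def ols_rss(y, columns):
    """Residual sum of squares from OLS of y on an intercept plus the given columns."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    # Perfectly collinear columns are not handled in this sketch.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    return sum((y[r] - sum(b * x for b, x in zip(beta, X[r]))) ** 2 for r in range(n))

def bic(rss, n, k):
    """Gaussian BIC up to an additive constant."""
    return n * math.log(rss / n) + k * math.log(n)

def forward_stepwise(y, candidates):
    """Greedily add the candidate that lowers BIC most; stop when none lowers it."""
    n = len(y)
    selected = []
    best = bic(ols_rss(y, []), n, 1)  # intercept-only baseline
    while True:
        trials = []
        for name in candidates:
            if name in selected:
                continue
            cols = [candidates[v] for v in selected + [name]]
            trials.append((bic(ols_rss(y, cols), n, len(cols) + 1), name))
        if not trials:
            break
        score, name = min(trials)
        if score >= best:
            break
        best = score
        selected.append(name)
    return selected, best
```

Here candidates maps each variable name to its list of observations. The function returns the chosen names in the order they were added, together with the final BIC.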

Step 2: View Results

print(results.summary())

==================================================
AutoExperiment Results
==================================================
Best model: Fixed Effects
Formula:    invest ~ value + capital
BIC:        1502.3
R-squared:  0.767
Cov type:   robust
Status:     VALID (all tests passed)
--------------------------------------------------
Models tested:     4
Combinations:      24
Variables tested:  6
Transformations:   2 types

Step 3: Compare Top Models

print(results.compare_top(3))

  model_type                  formula      bic      aic    rsq classification  cov_type   score
0         fe   invest ~ value + capital  1502.3  1495.1  0.767          VALID    robust  1488.3
1         re   invest ~ value + capital  1510.8  1503.6  0.761        WARNING  clustered  1548.8
2  pooled_ols  invest ~ value + capital  1580.2  1573.0  0.812          VALID  nonrobust  1566.2

Step 4: Access the Best Model

# Print the best model's full regression output
print(results.best_model.summary())

# Key attributes
print(f"Estimator: {results.best_estimator}")
print(f"Formula: {results.best_formula}")
print(f"Cov type: {results.best_cov_type}")

Step 5: Generate an HTML Report

results.report("autoexperiment_report.html")

This creates a comprehensive report with the ranking table, variable selection details, and diagnostic test results.

Going Further: Custom Configuration

Add Transformations

Generate lags, first differences, logs, and growth rates:

auto = AutoExperiment(
    data=data,
    depvar="invest",
    entity_col="firm",
    time_col="year",
    transformations={
        "lag": [1, 2],     # L1_var, L2_var
        "diff": True,      # D_var
        "log": True,       # log_var
        "growth": True,    # growth_var
    },
)
results = auto.run()
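
To make the generated column names concrete, here is a rough sketch of what such transformations could compute. It is an illustration, not PanelBox's code. It assumes each transformation is applied within an entity, so a lag never crosses from one firm to the next, and that growth means (x_t - x_{t-1}) / x_{t-1}; the library may define growth differently, for example as a log difference. All function names are hypothetical, and missing values are represented as None.

```python
import math

def lag(series, k=1):
    """Shift a time-ordered series back by k periods; the first k values are missing."""
    return ([None] * k + list(series))[:len(series)]

def diff(series):
    """First difference x_t - x_{t-1}."""
    return [None if p is None else x - p for x, p in zip(series, lag(series))]

def log_transform(series):
    """Natural log; non-positive values become missing."""
    return [math.log(x) if x > 0 else None for x in series]

def growth(series):
    """Simple growth rate (x_t - x_{t-1}) / x_{t-1}."""
    return [None if p in (None, 0) else (x - p) / p
            for x, p in zip(series, lag(series))]

def transform_panel(panel, var):
    """panel maps entity -> {var: values in time order}; returns new columns per entity."""
    out = {}
    for entity, cols in panel.items():
        s = cols[var]
        out[entity] = {
            f"L1_{var}": lag(s, 1),
            f"L2_{var}": lag(s, 2),
            f"D_{var}": diff(s),
            f"log_{var}": log_transform(s),
            f"growth_{var}": growth(s),
        }
    return out
```

Each lag costs one observation per entity, so requesting more lags shrinks the usable sample.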

Add Sign Constraints

Encode economic theory as expected coefficient signs to guard against spurious results:

auto = AutoExperiment(
    data=data,
    depvar="invest",
    entity_col="firm",
    time_col="year",
    sign_constraints={
        "value": "+",      # Market value should increase investment
        "capital": "+",    # Capital stock should increase investment
    },
)
results = auto.run()
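
Conceptually, a sign constraint is a check of each estimated coefficient against its expected sign. The sketch below shows one way such a check could look. The function name is hypothetical, treating an exact zero as a violation is an assumption, and how AutoExperiment actually handles violating models (rejecting or penalizing them) is not covered here.

```python
def violates_sign_constraints(coefs, constraints):
    """Return the variables whose estimated coefficient has the wrong sign.

    coefs:       {variable name: estimated coefficient}
    constraints: {variable name: "+" or "-"}
    """
    violations = []
    for var, sign in constraints.items():
        if var not in coefs:
            continue  # variable is not in this model, nothing to check
        beta = coefs[var]
        if (sign == "+" and beta <= 0) or (sign == "-" and beta >= 0):
            violations.append(var)
    return violations

constraints = {"value": "+", "capital": "+"}
print(violates_sign_constraints({"value": -0.02, "capital": 0.31}, constraints))
```

An empty list means the model is consistent with every constraint that applies to it.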

Use AIC Instead of BIC

auto = AutoExperiment(
    data=data,
    depvar="invest",
    entity_col="firm",
    time_col="year",
    criterion="aic",
)
results = auto.run()
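
The choice matters because the two criteria penalize extra parameters differently: AIC adds 2 per parameter, BIC adds ln(n), which is larger once n reaches 8. The sketch below uses the Gaussian forms n*ln(RSS/n) + 2k and n*ln(RSS/n) + k*ln(n); whether PanelBox computes the criteria this way is an assumption, and the numbers are made up for illustration.

```python
import math

def criterion_change(rss_old, rss_new, n, extra_params=1, criterion="bic"):
    """Change in AIC or BIC from adding parameters (negative means improvement)."""
    penalty = 2.0 if criterion == "aic" else math.log(n)
    return n * math.log(rss_new / rss_old) + penalty * extra_params

# Illustrative numbers: one extra variable lowers RSS by 2% with n = 200.
delta_aic = criterion_change(1000.0, 980.0, n=200, criterion="aic")
delta_bic = criterion_change(1000.0, 980.0, n=200, criterion="bic")
print(delta_aic < 0, delta_bic < 0)  # True False
```

In this example AIC would keep the extra variable and BIC would drop it, so expect AIC to favor somewhat larger models.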

Visualize Results

# BIC comparison across models
fig = results.plot_bic_comparison()

# Variable selection frequency
fig = results.plot_variable_importance()

Data Mining

AutoExperiment tests many model and variable combinations, which raises the risk of finding relationships that hold only by chance. Always validate on holdout data and use sign constraints to anchor results in economic theory. Check results.datamining_warning to see whether the threshold was exceeded.
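
One simple way to set up holdout validation for panel data is to reserve the most recent time periods, run the selection on the earlier periods only, and then check the chosen model on the reserved ones. The helper below is a hypothetical sketch that works on plain dictionaries to stay dependency-free; it is not part of PanelBox.

```python
def time_holdout_split(rows, time_col, holdout_periods):
    """Split rows so the last `holdout_periods` time periods form the holdout set."""
    periods = sorted({row[time_col] for row in rows})
    if not 0 < holdout_periods < len(periods):
        raise ValueError("holdout_periods must leave at least one training period")
    cutoff = periods[-holdout_periods]
    train = [r for r in rows if r[time_col] < cutoff]
    holdout = [r for r in rows if r[time_col] >= cutoff]
    return train, holdout

# Two firms observed from 1950 to 1954; hold out the last two years.
rows = [{"firm": f, "year": y} for f in (1, 2) for y in range(1950, 1955)]
train, holdout = time_holdout_split(rows, "year", 2)
```

Splitting by time rather than at random keeps every entity in both sets and avoids fitting on observations that come after the ones being predicted.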

Next Steps

API Reference: Full parameter documentation for all classes
User Guide: Limitations, interpreting results, and best practices
Experiment Pattern: Manual model comparison for fine-grained control