
Quick Start

This guide takes you from zero to a working panel data model with diagnostics and an HTML report, all in under 5 minutes.

What You'll Learn

  • Estimate a Fixed Effects model with 3 lines of code
  • Run the Hausman test to validate your model choice
  • Generate an interactive HTML report using the Experiment pattern

Step 1: Load Data

PanelBox ships with classic econometric datasets. We'll use the Grunfeld dataset: investment data for 10 US firms over 20 years (1935–1954).

from panelbox.datasets import load_grunfeld

data = load_grunfeld()
print(f"Shape: {data.shape}")
print(f"Firms: {data['firm'].nunique()}, Years: {data['year'].nunique()}")
print(data.head())
Shape: (200, 5)
Firms: 10, Years: 20
   firm  year   invest     value   capital
0     1  1935   317.60   3078.50     2.80
1     1  1936   391.80   4661.70    52.60
2     1  1937   410.60   5387.10   156.90
3     1  1938   257.70   2792.20   209.20
4     1  1939   330.80   4313.20   203.40
Variable   Description
firm       Firm identifier (1–10)
year       Year (1935–1954)
invest     Gross investment
value      Market value of the firm
capital    Stock of plant and equipment
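The shape printed above (200 rows = 10 firms × 20 years) means the panel is balanced: every firm is observed in every year. A minimal pure-Python sketch of that check, assuming the panel index is available as (entity, period) pairs; `is_balanced` is a hypothetical helper, not part of PanelBox:

```python
def is_balanced(records):
    """Return True if every entity is observed in exactly the same set of periods.

    `records` is an iterable of (entity, period) pairs, such as the firm and
    year columns of a long-format panel. Duplicate pairs are not detected.
    """
    periods_by_entity = {}
    for entity, period in records:
        periods_by_entity.setdefault(entity, set()).add(period)
    period_sets = list(periods_by_entity.values())
    # Balanced means every entity's set of periods equals the first entity's.
    return all(s == period_sets[0] for s in period_sets[1:])


# A toy index with Grunfeld's layout: firms 1-10 observed in 1935-1954.
records = [(firm, year) for firm in range(1, 11) for year in range(1935, 1955)]
n_entities = len({firm for firm, _ in records})
n_periods = len({year for _, year in records})
print(len(records), n_entities, n_periods, is_balanced(records))  # 200 10 20 True
```

Dropping any single row (for example `records[:-1]`) makes the check return False, since one firm would then be missing a year.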

Step 2: Estimate a Model

Fit a Fixed Effects model with clustered standard errors in three lines:

from panelbox import FixedEffects

model = FixedEffects("invest ~ value + capital", data, "firm", "year")
results = model.fit(cov_type="clustered")
print(results.summary())
================================================================================
                      Fixed Effects Estimation Results
================================================================================
Dependent Variable:              invest        No. Observations:             200
Model:                    Fixed Effects        No. Entities:                  10
Method:                 Within (LSDV)          No. Time Periods:              20
Cov. Type:                  clustered          R-squared (within):         0.767
================================================================================
                    coef    std err          t      P>|t|      [0.025      0.975]
--------------------------------------------------------------------------------
value             0.1101      0.012      9.286      0.000       0.087       0.134
capital           0.3101      0.053      5.837      0.000       0.205       0.415
================================================================================
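"Within" in the Method line refers to the transformation behind these estimates: each variable is demeaned by firm, which removes the firm fixed effects, and OLS is run on the demeaned data (this gives the same slopes as LSDV, the regression with one dummy per firm). A minimal pure-Python sketch of that idea for two regressors, on a small made-up panel whose true slopes are 2 and 3; the helpers are hypothetical, not PanelBox internals, and no standard errors are computed:

```python
from collections import defaultdict


def demean_by_entity(values, entities):
    """Subtract each entity's own mean from its observations (the within transformation)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, e in zip(values, entities):
        sums[e] += v
        counts[e] += 1
    return [v - sums[e] / counts[e] for v, e in zip(values, entities)]


def within_estimator(y, x1, x2, entities):
    """Fixed Effects slopes for y on (x1, x2): OLS on entity-demeaned data, no intercept."""
    yd, ad, bd = (demean_by_entity(v, entities) for v in (y, x1, x2))
    # Entries of the 2x2 normal equations (X'X) beta = X'y.
    s11 = sum(a * a for a in ad)
    s12 = sum(a * b for a, b in zip(ad, bd))
    s22 = sum(b * b for b in bd)
    s1y = sum(a * c for a, c in zip(ad, yd))
    s2y = sum(b * c for b, c in zip(bd, yd))
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("regressors are collinear after demeaning")
    # Cramer's rule for the two slopes.
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


# Made-up panel: two entities with very different intercepts (the fixed effects),
# generated as y = alpha_entity + 2 * x1 + 3 * x2 with no noise.
entities = [1, 1, 1, 2, 2, 2]
x1 = [1.0, 2.0, 4.0, 5.0, 7.0, 6.0]
x2 = [0.0, 1.0, 0.0, 2.0, 2.0, 5.0]
alpha = {1: 10.0, 2: -4.0}
y = [alpha[e] + 2 * a + 3 * b for e, a, b in zip(entities, x1, x2)]
print(within_estimator(y, x1, x2, entities))  # approximately (2.0, 3.0)
```

The entity intercepts never need to be estimated: demeaning cancels them exactly, which is why the slopes are recovered despite the large gap between the two intercepts.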

Formula syntax

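PanelBox uses R-style formulas such as "y ~ x1 + x2": the dependent variable goes to the left of ~, and the regressors, joined by +, go to the right. The four positional arguments are, in order: formula, data, entity column, and time column.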
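To make the syntax concrete, here is a deliberately minimal parser that splits a formula into its dependent variable and regressors. `parse_formula` is a hypothetical helper for illustration, not PanelBox's own formula handling, and it supports only plain variable names joined by +:

```python
def parse_formula(formula):
    """Split an R-style formula "y ~ x1 + x2" into (dependent, [regressors]).

    Only plain variable names joined by '+' are handled; interactions,
    transformations, and intercept removal are out of scope for this sketch.
    """
    if formula.count("~") != 1:
        raise ValueError(f"expected exactly one '~' in {formula!r}")
    lhs, rhs = (side.strip() for side in formula.split("~"))
    regressors = [term.strip() for term in rhs.split("+") if term.strip()]
    if not lhs or not regressors:
        raise ValueError(f"formula needs a variable on both sides of '~': {formula!r}")
    return lhs, regressors


print(parse_formula("invest ~ value + capital"))  # ('invest', ['value', 'capital'])
```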

Step 3: Interpret Results

The key outputs to check:

Output               Value                Meaning
R-squared (within)   0.767                The model explains 76.7% of the within-firm variation in investment
value                0.1101 (p < 0.001)   Holding capital fixed, a one-unit increase in firm value is associated with 0.11 more units of investment
capital              0.3101 (p < 0.001)   Holding value fixed, a one-unit increase in capital stock is associated with 0.31 more units of investment
Cov. Type            clustered            Standard errors allow for correlation among each firm's observations across years
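The clustered covariance type sums each firm's score contributions before squaring them, which lets a firm's errors be arbitrarily correlated across years. A minimal sketch for a single regressor that has already been demeaned by entity, assuming the basic sandwich form with no finite-sample correction (statistical packages usually apply one, so its numbers would not match the table exactly); `cluster_robust_se` and the toy data are hypothetical:

```python
import math
from collections import defaultdict


def cluster_robust_se(x, y, clusters):
    """OLS slope of y on x (no intercept) and its cluster-robust standard error.

    Assumes x and y are already demeaned by entity, as in the within estimator.
    No finite-sample correction is applied.
    """
    sxx = sum(xi * xi for xi in x)
    beta = sum(xi * yi for xi, yi in zip(x, y)) / sxx
    # Add up the scores x_i * u_i inside each cluster, then square each cluster total.
    cluster_scores = defaultdict(float)
    for xi, yi, g in zip(x, y, clusters):
        cluster_scores[g] += xi * (yi - beta * xi)
    meat = sum(s * s for s in cluster_scores.values())
    # Sandwich (X'X)^-1 * meat * (X'X)^-1 is a scalar for one regressor.
    return beta, math.sqrt(meat) / sxx


# Toy demeaned data for two firms observed over three years.
clusters = [1, 1, 1, 2, 2, 2]
x = [-1.0, 0.0, 1.0, -1.0, 0.0, 1.0]
y = [-1.1, 0.2, 0.9, -0.9, -0.1, 1.0]
print(cluster_robust_se(x, y, clusters))
```

Cluster-robust standard errors are known to be less reliable when there are few clusters, and Grunfeld has only 10 firms.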

Access results programmatically:

# Coefficients as a pandas Series
print(results.params)

# Standard errors
print(results.std_errors)

# R-squared
print(f"R-squared (within): {results.rsquared_within:.4f}")
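The t statistic and confidence interval in the summary are simple functions of a coefficient and its standard error: t = coef / se and CI = coef ± critical value × se. A minimal sketch using the rounded numbers printed in the summary and a normal critical value of 1.96 (an assumption; statistical software typically uses a t critical value and the unrounded standard error, so the recomputed figures can differ in the last digit). `t_stat_and_ci` is a hypothetical helper:

```python
def t_stat_and_ci(coef, se, crit=1.96):
    """Return the t statistic and a (lower, upper) confidence interval.

    crit=1.96 is the large-sample normal critical value for a 95% interval.
    """
    return coef / se, (coef - crit * se, coef + crit * se)


# Rounded coefficients and standard errors copied from the summary table.
for name, coef, se in [("value", 0.1101, 0.012), ("capital", 0.3101, 0.053)]:
    t, (lo, hi) = t_stat_and_ci(coef, se)
    print(f"{name:8s} t = {t:6.2f}  95% CI = [{lo:.3f}, {hi:.3f}]")
```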

Step 4: Run Diagnostics

Use the Hausman test to check whether Fixed Effects should be preferred over Random Effects:

from panelbox import RandomEffects
from panelbox.validation import HausmanTest

re_model = RandomEffects("invest ~ value + capital", data, "firm", "year")
re_results = re_model.fit()

hausman = HausmanTest(results, re_results)
print(hausman)
Hausman Test
H0: Random Effects is consistent and efficient
statistic: 14.82, p-value: 0.0006
Decision: Reject H0 → Use Fixed Effects

Interpreting the Hausman test

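p < 0.05: Reject the null. The entity effects appear to be correlated with the regressors, which makes Random Effects inconsistent, so use Fixed Effects.
p >= 0.05: Fail to reject. There is no evidence against Random Effects, which is more efficient, so it is the usual choice.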
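The statistic compares the two coefficient vectors, weighting their difference by the difference of their covariance matrices: H = (b_FE - b_RE)' [V_FE - V_RE]^-1 (b_FE - b_RE). Under H0 it is chi-square distributed with degrees of freedom equal to the number of compared coefficients (two here: value and capital). A minimal pure-Python sketch for that two-coefficient case with made-up inputs; the helper names are hypothetical and this is not PanelBox's implementation. For 2 degrees of freedom the chi-square upper-tail probability has the closed form exp(-H/2), which reproduces the p-value of 0.0006 shown above for H = 14.82:

```python
import math


def hausman_2x2(b_fe, b_re, v_fe, v_re):
    """Hausman statistic H = d' (V_fe - V_re)^-1 d, with d = b_fe - b_re, for two coefficients."""
    d = [b_fe[0] - b_re[0], b_fe[1] - b_re[1]]
    # Entries of the difference of the two 2x2 covariance matrices.
    a = v_fe[0][0] - v_re[0][0]
    b = v_fe[0][1] - v_re[0][1]
    c = v_fe[1][0] - v_re[1][0]
    e = v_fe[1][1] - v_re[1][1]
    det = a * e - b * c
    if det == 0:
        raise ValueError("covariance difference is singular")
    inv = [[e / det, -b / det], [-c / det, a / det]]
    return sum(d[i] * inv[i][j] * d[j] for i in range(2) for j in range(2))


def chi2_sf_df2(h):
    """Upper-tail probability of a chi-square with 2 degrees of freedom: exp(-h / 2)."""
    return math.exp(-h / 2)


# Made-up inputs for illustration only (not Grunfeld estimates).
b_fe, b_re = [0.11, 0.31], [0.10, 0.30]
v_fe = [[0.0002, 0.0], [0.0, 0.0004]]
v_re = [[0.0001, 0.0], [0.0, 0.0003]]
h = hausman_2x2(b_fe, b_re, v_fe, v_re)
print(f"H = {h:.2f}, p-value = {chi2_sf_df2(h):.4f}")

# Consistency check against the printed output: H = 14.82 with 2 df.
print(round(chi2_sf_df2(14.82), 4))  # 0.0006
```

The classical form of this test assumes conventional (non-robust) covariance matrices, because Random Effects must be efficient under H0.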

Step 5: Generate a Report

The PanelExperiment pattern automates model comparison, validation, and reporting:

from panelbox.experiment import PanelExperiment

# Create experiment
exp = PanelExperiment(data, "invest ~ value + capital", "firm", "year")

# Fit multiple models at once
exp.fit_all_models(["pooled", "fe", "re"])

# Run validation on the preferred model
validation = exp.validate_model("fe")

# Compare all models side by side
comparison = exp.compare_models(["pooled", "fe", "re"])

# Generate an interactive HTML report
exp.save_master_report("grunfeld_analysis.html")

This produces a self-contained HTML file with:

  • Summary tables for each model
  • Side-by-side coefficient comparison
  • Diagnostic test results
  • Interactive Plotly charts
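As a rough illustration of what "self-contained" means here, a minimal sketch using only the standard library that renders a side-by-side coefficient table into a single HTML page with no external assets. `comparison_table_html` is hypothetical, it is not how PanelBox builds its report, and unlike the real report it has no charts:

```python
import html


def comparison_table_html(results):
    """Render {model_name: {variable: coefficient}} as one self-contained HTML page."""
    models = list(results)
    variables = sorted({var for coefs in results.values() for var in coefs})
    header = "".join(f"<th>{html.escape(m)}</th>" for m in models)
    rows = []
    for var in variables:
        # Leave a cell blank when a model does not include that variable.
        cells = "".join(
            f"<td>{results[m][var]:.4f}</td>" if var in results[m] else "<td></td>"
            for m in models
        )
        rows.append(f"<tr><th>{html.escape(var)}</th>{cells}</tr>")
    body = "".join(rows)
    return (
        "<!DOCTYPE html><html><head><meta charset=\"utf-8\">"
        "<title>Model comparison</title></head><body>"
        f"<table><tr><th>Variable</th>{header}</tr>{body}</table>"
        "</body></html>"
    )


# FE coefficients from the summary in Step 2; "model_b" holds placeholder numbers, not real estimates.
page = comparison_table_html({
    "fe": {"value": 0.1101, "capital": 0.3101},
    "model_b": {"value": 0.1, "capital": 0.3},
})
print(page)
```

Writing `page` to a `.html` file gives a report that opens in any browser without extra files, since all content is inline.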

Next Steps

You now have a working panel data analysis pipeline. Here's where to go next:

  • Core Concepts

    Learn about panel data structure, formulas, and the PanelBox workflow

  • Choosing a Model

    Decision guide covering all 13 model families

  • Static Models

    Deep dive into Pooled OLS, Fixed Effects, and Random Effects

  • Dynamic GMM

    Handle dynamics and endogeneity with Arellano-Bond and Blundell-Bond