Troubleshooting Guide
Step-by-step solutions for common errors and problems when using PanelBox.
Looking for conceptual answers?
- General questions: General FAQ
- Advanced methods: Advanced FAQ
- Spatial models: Spatial FAQ
Installation Issues
ModuleNotFoundError: No module named 'panelbox'
PanelBox is not installed or not in the active Python environment.
Solution: install the package into the active environment with pip install panelbox.
If using conda:
Verify installation:
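One way to verify the installation from Python itself is sketched below. It uses only the standard library; the helper name `is_importable` is illustrative and not part of PanelBox.

```python
import sys
from importlib import util


def is_importable(module_name):
    """Return True if `module_name` can be found by the active interpreter."""
    return util.find_spec(module_name) is not None


# A common cause of this error is installing into one environment and
# running another, so print which interpreter is actually in use.
print(f"Interpreter: {sys.executable}")
print(f"panelbox importable: {is_importable('panelbox')}")
```

If the second line prints False, the package is missing from the interpreter shown on the first line, even if it is installed elsewhere.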
Optional dependency errors (plotly, kaleido, scipy)
Some PanelBox features require optional dependencies. If you see errors like ModuleNotFoundError: No module named 'plotly':
Solution: install all optional dependencies with pip install "panelbox[all]".
Or install specific extras:
Version conflicts with numpy/scipy/pandas
If you see errors like ImportError: numpy.core.multiarray failed to import:
Solution:
# Upgrade all dependencies together
pip install --upgrade panelbox numpy scipy pandas
# Or pin compatible versions
pip install "numpy>=1.24,<2.0" "scipy>=1.10" "pandas>=2.0"
Check your current versions:
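A minimal sketch for listing installed versions, using only the standard library. The function name `report_versions` is hypothetical; packages that are not installed are reported as None rather than raising.

```python
from importlib import metadata


def report_versions(packages=("panelbox", "numpy", "scipy", "pandas")):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in report_versions().items():
    print(f"{name}: {version or 'not installed'}")
```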
ImportError: cannot import name 'PanelVAR'
Most PanelBox classes must be imported from their submodules, not from the top-level package.
# Correct
from panelbox.var import PanelVAR
from panelbox.gmm import DifferenceGMM, SystemGMM
from panelbox.models.spatial import SpatialLag
# Incorrect
from panelbox import PanelVAR # Will fail
See the API Reference for correct import paths.
Data Issues
ValueError: data must have MultiIndex or similar index errors
PanelBox models expect a DataFrame with entity and time columns specified as parameters.
Solution — provide entity and time columns:
from panelbox import FixedEffects
# Pass entity_col and time_col explicitly
model = FixedEffects(
formula="y ~ x1 + x2",
data=data,
entity_col="firm_id",
time_col="year"
)
If your data uses a MultiIndex, reset it:
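As an illustration, the sketch below builds a toy panel indexed by (firm_id, year) and moves the index levels back into ordinary columns with pandas' `reset_index()`. The data and column names are made up for the example.

```python
import pandas as pd

# Hypothetical toy panel indexed by (firm_id, year)
indexed = pd.DataFrame(
    {"y": [1.0, 2.0, 3.0, 4.0], "x1": [0.5, 0.7, 0.2, 0.9]},
    index=pd.MultiIndex.from_product(
        [["A", "B"], [2020, 2021]], names=["firm_id", "year"]
    ),
)

# reset_index() turns the index levels into regular columns, which can
# then be named in entity_col and time_col
data = indexed.reset_index()
print(data.columns.tolist())
```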
Unbalanced panel warnings
Most PanelBox estimators handle unbalanced panels automatically. The warning is informational — it tells you that entities have different numbers of time periods.
To check balance:
obs_per_entity = data.groupby("entity_id").size()
print(f"Min periods: {obs_per_entity.min()}")
print(f"Max periods: {obs_per_entity.max()}")
print(f"Balanced: {(obs_per_entity == obs_per_entity.iloc[0]).all()}")
To force a balanced panel (if needed):
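One possible approach, sketched with plain pandas on made-up data, is to keep only the entities observed in every time period. Note that dropping entities this way can change the sample composition, so it is worth checking how many are lost.

```python
import pandas as pd

# Hypothetical unbalanced panel: entity B is missing 2022
data = pd.DataFrame({
    "entity_id": ["A", "A", "A", "B", "B", "C", "C", "C"],
    "year": [2020, 2021, 2022, 2020, 2021, 2020, 2021, 2022],
    "y": [1.0, 1.2, 1.4, 2.0, 2.1, 3.0, 3.3, 3.6],
})

# Count distinct periods per entity and keep those observed in all periods
n_periods = data["year"].nunique()
periods_per_entity = data.groupby("entity_id")["year"].nunique()
complete = periods_per_entity[periods_per_entity == n_periods].index
balanced = data[data["entity_id"].isin(complete)]

print(f"Entities kept: {len(complete)} of {len(periods_per_entity)}")
```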
Missing values — how are they handled?
Behavior varies by model:
- Static models (FE, RE, Pooled OLS): drop observations with missing values in formula variables
- GMM: drops observations with missing values in the dependent or independent variables; lagged instruments handle their own missingness
- MLE models (logit, probit, Heckman): require complete cases
Best practice: Check missingness before estimation:
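A short pandas sketch of such a check, on made-up data. The list `model_vars` stands for whichever variables appear in your formula.

```python
import numpy as np
import pandas as pd

# Hypothetical data with a few missing values
data = pd.DataFrame({
    "entity_id": ["A", "A", "B", "B"],
    "year": [2020, 2021, 2020, 2021],
    "y": [1.0, np.nan, 3.0, 4.0],
    "x1": [0.1, 0.2, np.nan, 0.4],
    "x2": [5.0, 6.0, 7.0, 8.0],
})

model_vars = ["y", "x1", "x2"]

# Missing values per variable, and rows that would survive listwise deletion
missing_counts = data[model_vars].isna().sum()
complete_rows = int(data[model_vars].notna().all(axis=1).sum())

print(missing_counts)
print(f"Complete rows: {complete_rows} of {len(data)}")
```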
My panel has gaps (non-consecutive time periods)
Gaps in time periods can affect models that use lags (GMM, VAR, dynamic models).
Check for gaps:
def check_gaps(group):
    times = sorted(group["year"].unique())
    expected = list(range(min(times), max(times) + 1))
    return set(expected) - set(times)

gaps = data.groupby("entity_id").apply(check_gaps)
entities_with_gaps = gaps[gaps.apply(len) > 0]
print(f"Entities with gaps: {len(entities_with_gaps)}")
Solutions:
- Remove entities with gaps:
  data = data[~data["entity_id"].isin(entities_with_gaps.index)]
- For VAR/VECM: ensure continuous time periods or use allow_unbalanced=True
- For GMM: gaps in instruments are handled automatically, but check results carefully
ValueError: could not convert string to float
Non-numeric data in variable columns.
Solution:
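A minimal sketch using `pd.to_numeric` with `errors="coerce"`, on a made-up column. Values that cannot be parsed become NaN, so it helps to list them before overwriting the column.

```python
import pandas as pd

# Hypothetical column read in as strings
data = pd.DataFrame({"x1": ["1.5", "2,300", "n/a", "4"]})

# Unparseable entries become NaN instead of raising
converted = pd.to_numeric(data["x1"], errors="coerce")

# Inspect which original values failed to convert before replacing them
bad_values = data.loc[converted.isna() & data["x1"].notna(), "x1"]
print(f"Could not convert: {bad_values.tolist()}")

data["x1"] = converted
```

Entries such as thousands separators usually need cleaning (for example with `str.replace`) rather than being silently turned into missing values.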
Estimation Errors
LinAlgError: Singular matrix
Causes:
- Perfect collinearity — two or more variables are linearly dependent
- Too many dummy variables — time dummies exceed available degrees of freedom
- Too many instruments in GMM — instrument matrix is rank-deficient
Solutions:
# 1. Check for collinearity
corr = data[["x1", "x2", "x3"]].corr()
print(corr)
# Remove variables with |correlation| > 0.95
# 2. For GMM: reduce instruments
from panelbox.gmm import SystemGMM
model = SystemGMM(
...,
collapse=True, # Reduce instrument count
max_lags=2 # Limit lag depth
)
# 3. Check for constant variables
for col in ["x1", "x2", "x3"]:
    if data[col].std() == 0:
        print(f"WARNING: {col} has zero variance")
ConvergenceWarning — model did not converge
For MLE-based models (logit, probit, Heckman, SFA, count models):
Solutions:
- Increase maximum iterations (maxiter).
- Try a different optimizer (for example, method="bfgs").
- Provide better starting values.
- Simplify the model: remove interaction terms, reduce the number of covariates.
- Scale your variables: variables on very different scales can cause numerical issues.
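For the scaling step, one common option is to standardize each regressor to mean 0 and standard deviation 1. The sketch below does this with plain pandas on made-up data; the column names are illustrative.

```python
import pandas as pd

# Hypothetical regressors on very different scales
data = pd.DataFrame({
    "revenue": [1.2e6, 3.4e6, 2.2e6, 5.0e6],
    "ratio": [0.01, 0.03, 0.02, 0.05],
})

cols = ["revenue", "ratio"]

# z-score each column: subtract the mean, divide by the standard deviation
scaled = data.copy()
scaled[cols] = (data[cols] - data[cols].mean()) / data[cols].std()

print(scaled.describe().loc[["mean", "std"]])
```

Coefficients on standardized variables are then interpreted per standard deviation of the regressor rather than per original unit.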
GMM results are unstable or unreasonable
Common causes:
- Too many instruments → overfitting
- Weak instruments → large standard errors, unstable coefficients
- N < L (entities < instruments) → rank-deficient weight matrix
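To see how quickly instruments can proliferate, the sketch below counts instruments for a single variable under the textbook difference-GMM construction, where each differenced period t uses all levels from lag 2 back. This is an assumption about the setup for illustration, not a statement about PanelBox internals; the function name is hypothetical.

```python
def gmm_style_instrument_count(n_periods, collapse=False):
    """Instruments for one variable using lags 2 and deeper.

    Uncollapsed: one column per (period, lag) pair -> (T-1)(T-2)/2.
    Collapsed:   one column per lag distance       -> T-2.
    """
    if n_periods < 3:
        return 0
    if collapse:
        return n_periods - 2
    return (n_periods - 1) * (n_periods - 2) // 2


for T in (5, 10, 20):
    full = gmm_style_instrument_count(T)
    collapsed = gmm_style_instrument_count(T, collapse=True)
    print(f"T={T}: uncollapsed={full}, collapsed={collapsed}")
```

The uncollapsed count grows quadratically in T, which is why a panel with modest N can end up with more instruments than entities.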
Solutions:
from panelbox.gmm import SystemGMM, GMMOverfitDiagnostic
# 1. Use collapse to reduce instruments
model = SystemGMM(..., collapse=True, max_lags=2)
result = model.fit()
# 2. Check instrument count vs entities
print(f"Instruments: {result.n_instruments}")
print(f"Entities: {result.n_entities}")
# Instruments should be < N
# 3. Run overfit diagnostic
diag = GMMOverfitDiagnostic(result)
print(diag.summary())
# 4. Compare one-step vs two-step
result_1step = model.fit(two_step=False)
result_2step = model.fit(two_step=True)
# If very different → instrument issues
MLE did not converge (Heckman, SFA, logit)
For Heckman specifically:
from panelbox.models.selection import PanelHeckman
# `heckman` below is an already-constructed PanelHeckman instance
# 1. Use two-step as starting values for MLE
result_2step = heckman.fit(method="two_step")
result_mle = heckman.fit(
method="mle",
starting_values=result_2step.params
)
# 2. Reduce quadrature points (Heckman MLE)
result = heckman.fit(method="mle", quadrature_points=10)
# 3. Just use two-step (reliable fallback)
result = heckman.fit(method="two_step")
For SFA/Frontier:
Negative variance estimates
Negative variance components typically indicate model misspecification:
- Random Effects model assumes \(\text{Var}(\alpha_i) > 0\), but the data does not support this
- May occur with very small between-entity variation
Solutions:
- Switch to Fixed Effects (does not estimate variance components)
- Check that the entity and time columns are correctly specified
- Verify that there is meaningful cross-entity variation
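To check the last point, a rough sketch is to compare the between-entity and within-entity standard deviations of the dependent variable. The data below are made up so that entities barely differ from each other.

```python
import pandas as pd

# Hypothetical panel where entities have almost identical means
data = pd.DataFrame({
    "entity_id": ["A", "A", "A", "B", "B", "B"],
    "y": [1.0, 2.0, 3.0, 1.1, 2.1, 3.1],
})

# Between variation: spread of the entity means
between_sd = data.groupby("entity_id")["y"].mean().std()

# Within variation: spread of deviations from each entity's own mean
entity_means = data.groupby("entity_id")["y"].transform("mean")
within_sd = (data["y"] - entity_means).std()

print(f"Between SD: {between_sd:.3f}")
print(f"Within SD:  {within_sd:.3f}")
```

A between SD that is tiny relative to the within SD suggests there is little entity-level variance for a Random Effects model to estimate.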
Diagnostic Test Errors
Hansen J test returns NaN
Cause: The model is not over-identified: the number of instruments is less than or equal to the number of parameters, so \(df = n_{\text{instruments}} - n_{\text{params}} \leq 0\) and there are no over-identifying restrictions to test.
This commonly occurs with collapse=True combined with time_dummies=True (default), where the number of time dummies exceeds the collapsed instrument count.
Solutions:
# 1. Check degrees of freedom
print(f"Instruments: {result.n_instruments}")
print(f"Parameters: {result.n_params}")
print(f"Hansen df: {result.n_instruments - result.n_params}")
# Must be > 0 for a valid Hansen test
# 2. Use time_dummies=False with collapse=True
model = SystemGMM(..., collapse=True, time_dummies=False)
# 3. Add more instruments (increase max_lags)
model = SystemGMM(..., collapse=True, max_lags=3)
Hausman test statistic is negative
A negative Hausman statistic occurs when the variance difference matrix \((V_{FE} - V_{RE})\) is not positive semi-definite. The test is unreliable in this case.
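The statistic has the usual quadratic form \(H = (\hat\beta_{FE} - \hat\beta_{RE})' (V_{FE} - V_{RE})^{-1} (\hat\beta_{FE} - \hat\beta_{RE})\). The NumPy sketch below uses made-up numbers, chosen only to show the mechanism: once the difference matrix has a negative eigenvalue, the quadratic form can fall below zero.

```python
import numpy as np

# Hypothetical values for illustration only
coef_diff = np.array([0.1, 1.0])        # b_FE - b_RE
var_diff = np.array([[1.0, 0.0],
                     [0.0, -0.5]])      # V_FE - V_RE, not positive semi-definite

# A negative eigenvalue means the matrix is not PSD
eigenvalues = np.linalg.eigvalsh(var_diff)

# Quadratic form d' V^{-1} d, computed via a linear solve
hausman_stat = float(coef_diff @ np.linalg.solve(var_diff, coef_diff))

print(f"Smallest eigenvalue: {eigenvalues.min():.2f}")
print(f"Hausman statistic:   {hausman_stat:.2f}")
```

Checking the eigenvalues of your own estimated difference matrix is a quick way to confirm that this is what happened.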
Solution — use the Mundlak test:
from panelbox.validation import MundlakTest
mundlak = MundlakTest(data, "y ~ x1 + x2", "entity_id", "year")
result = mundlak.run()
print(result.conclusion)
# If p < 0.05: use Fixed Effects
# If p >= 0.05: Random Effects is consistent
The Mundlak test is always well-defined and provides a robust alternative.
Unit root tests give contradictory results
Different tests have different null hypotheses and power:
| Outcome | Hadri | IPS/LLC | Interpretation |
|---|---|---|---|
| Hadri rejects, IPS fails to reject | Rejects stationarity | Fails to reject unit root | Unit root present |
| Hadri fails to reject, IPS rejects | Cannot reject stationarity | Rejects unit root | Stationary |
| Both reject their nulls | — | — | Borderline (near unit root) |
| Both fail to reject | — | — | Insufficient power |
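The table can be encoded as a small helper, sketched below. The function name and the 0.05 threshold are illustrative choices, not part of PanelBox.

```python
def interpret_unit_root(hadri_p, ips_p, alpha=0.05):
    """Combine a stationarity-null test (Hadri) with a unit-root-null test (IPS/LLC)."""
    hadri_rejects = hadri_p < alpha   # rejects H0: stationary
    ips_rejects = ips_p < alpha       # rejects H0: unit root

    if hadri_rejects and not ips_rejects:
        return "Unit root present"
    if ips_rejects and not hadri_rejects:
        return "Stationary"
    if hadri_rejects and ips_rejects:
        return "Borderline (near unit root)"
    return "Insufficient power"


print(interpret_unit_root(hadri_p=0.01, ips_p=0.40))
print(interpret_unit_root(hadri_p=0.30, ips_p=0.02))
```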
Best practice: Run a battery of tests and look for consensus:
from panelbox.diagnostics.unit_root import hadri_test, breitung_test
from panelbox.validation import IPSTest, LLCTest
# Stationarity null
hadri = hadri_test(data, variable="y")
# Unit root null
ips = IPSTest(data, variable="y", entity_col="entity", time_col="time").run()
llc = LLCTest(data, variable="y", entity_col="entity", time_col="time").run()
print(f"Hadri (H0: stationary): p = {hadri.pvalue:.4f}")
print(f"IPS (H0: unit root): p = {ips.pvalue:.4f}")
print(f"LLC (H0: unit root): p = {llc.pvalue:.4f}")
If results are ambiguous, try the Breitung test as a tiebreaker and check for structural breaks in the data.
Moran's I test gives unexpected results
- Significant Moran's I on OLS residuals → spatial autocorrelation exists, consider spatial models
- Significant Moran's I on spatial model residuals → model has not fully captured spatial dependence; try SDM or GNS
- Non-significant Moran's I on data but theory suggests spatial effects → check weight matrix specification; try different W constructions
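When the result is surprising, it can help to compute the raw statistic by hand on a small example. The sketch below implements the standard formula \(I = \frac{n}{S_0} \frac{z' W z}{z' z}\), where \(z\) is the demeaned variable and \(S_0\) is the sum of all weights. It is a NumPy illustration only, with no inference, and is independent of PanelBox's own implementation.

```python
import numpy as np


def morans_i(values, weights):
    """Moran's I for a 1-D array of values and an n x n weight matrix."""
    z = np.asarray(values, dtype=float)
    z = z - z.mean()
    w = np.asarray(weights, dtype=float)
    return (len(z) / w.sum()) * (z @ w @ z) / (z @ z)


# Four units on a line; each unit neighbors the adjacent ones
W = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

smooth = morans_i([1, 2, 3, 4], W)        # neighbors are similar
alternating = morans_i([1, 4, 1, 4], W)   # neighbors are dissimilar

print(f"Smooth gradient: {smooth:.3f}")
print(f"Alternating:     {alternating:.3f}")
```

The smooth gradient yields a positive value and the alternating pattern a negative one, so a sign that contradicts expectations often points to the weight matrix rather than the data.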
Report & Visualization Errors
HTML report is blank or not rendering
Cause: Plotly is not installed or not configured for your environment.
Solution: install Plotly (pip install plotly).
For Jupyter notebooks:
Export to PNG fails
Cause: The kaleido package (static image export engine) is not installed.
Solution: install kaleido (pip install kaleido).
If kaleido installation fails on your platform:
Chart not rendering in Jupyter notebook
Solutions:
- Use the HTML method:
- Install Jupyter extension:
- Set renderer explicitly:
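For the renderer option, a hedged sketch using Plotly's own `plotly.io.renderers` setting is shown below. The wrapper function is hypothetical; it returns False instead of raising when Plotly is not installed. Which renderer name works best ("notebook", "notebook_connected", "jupyterlab", "browser", and so on) depends on your front end.

```python
from importlib import util


def set_plotly_renderer(name="notebook"):
    """Set Plotly's default renderer; return False if Plotly is not installed."""
    if util.find_spec("plotly") is None:
        return False
    import plotly.io as pio

    # Plotly validates the name and raises ValueError for unknown renderers
    pio.renderers.default = name
    return True


print(f"Renderer set: {set_plotly_renderer('notebook')}")
```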
Performance Issues
Estimation takes too long
Identify the bottleneck and apply targeted solutions:
| Problem | Solution |
|---|---|
| Too many GMM instruments | collapse=True, max_lags=2 |
| CUE-GMM optimization | Use two-step for exploration, CUE for final results |
| Bootstrap replications | Reduce n_boot to 499 (usually sufficient) |
| Spatial ML (large N) | Use sparse weight matrices, Chebyshev approximation |
| Heckman MLE | Reduce quadrature_points to 10, or use two-step |
| FE multinomial logit | Use RE if J > 4 or T > 10 |
| Cointegration bootstrap | Use asymptotic first; only bootstrap if borderline |
General advice: Use fast methods for exploration, robust methods for final results.
Out of memory (MemoryError)
Common causes and solutions:
- Bootstrap with large N*T: reduce n_boot to 499
- Spatial weight matrix: use sparse format for large N
- Panel VAR IRF bootstrap: reduce n_boot and periods, or process one impulse at a time
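A back-of-the-envelope calculation shows why the sparse format matters for the weight matrix. The sketch below assumes 8-byte floats, 4-byte indices, and a CSR-style layout (values, column indices, row pointers); the function names and the choice of six neighbors per unit are illustrative.

```python
def dense_weight_bytes(n_units, itemsize=8):
    """Memory for a dense n x n float matrix."""
    return n_units * n_units * itemsize


def csr_weight_bytes(n_units, neighbors_per_unit, itemsize=8, index_size=4):
    """Approximate memory for a CSR matrix with a fixed number of neighbors per row."""
    nnz = n_units * neighbors_per_unit
    # values + column indices for each nonzero, plus one row pointer per row (+1)
    return nnz * (itemsize + index_size) + (n_units + 1) * index_size


n = 50_000
print(f"Dense:  {dense_weight_bytes(n) / 1e9:.1f} GB")
print(f"Sparse: {csr_weight_bytes(n, 6) / 1e6:.1f} MB")
```

Under these assumptions, a dense matrix for 50,000 units needs about 20 GB, while the sparse version needs a few megabytes.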
Error Message Index
Quick reference for common error messages in alphabetical order:
| Error Message | Likely Cause | Solution |
|---|---|---|
| ConvergenceWarning | MLE optimization failed | Increase maxiter, try method="bfgs", simplify model |
| ImportError: cannot import name ... | Wrong import path | Check API Reference for correct imports |
| KeyError: 'entity_col' | Column name mismatch | Check data.columns for exact name (case-sensitive) |
| LinAlgError: Singular matrix | Perfect collinearity or rank deficiency | Remove correlated variables, reduce instruments |
| MemoryError | Dataset too large for available RAM | Reduce bootstrap, use sparse matrices, subset data |
| ModuleNotFoundError | Package not installed | pip install panelbox or pip install panelbox[all] |
| RuntimeWarning: invalid value (NaN) | Numerical instability | Scale variables, check for outliers, simplify model |
| ValueError: could not convert | Non-numeric data | Use pd.to_numeric(col, errors="coerce") |
| ValueError: data must have MultiIndex | Missing entity/time specification | Pass entity_col and time_col parameters |
| Warning: Panel is unbalanced | Informational, not an error | Most models handle automatically |
| Warning: rho > 1 in Heckman | Model misspecification | Check exclusion restriction, use two-step |
| Warning: VAR is unstable | Eigenvalues > 1 | Difference variables, use VECM, reduce lags |
Debugging Checklist
When you encounter an issue, work through this systematic checklist:
1. Data
- Panel structure correct (entity and time columns identified)?
- No missing values in key variables (or handled explicitly)?
- Variables are numeric (no strings in regression columns)?
- No duplicate entity-time pairs?
- Sufficient observations (N and T)?
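The duplicate check in this list is easy to run with pandas. The sketch below uses made-up data in which entity A appears twice for 2021.

```python
import pandas as pd

# Hypothetical panel with one duplicated entity-time pair
data = pd.DataFrame({
    "entity_id": ["A", "A", "A", "B", "B"],
    "year": [2020, 2021, 2021, 2020, 2021],
    "y": [1.0, 2.0, 2.5, 3.0, 4.0],
})

# keep=False flags every row involved in a duplicate, not just the repeats
dup_mask = data.duplicated(subset=["entity_id", "year"], keep=False)

print(f"Rows in duplicated entity-time pairs: {int(dup_mask.sum())}")
print(data[dup_mask])
```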
2. Model Specification
- Formula is correct (dependent ~ independent)?
- Entity and time column names match exactly?
- Appropriate model for the data (static vs dynamic, FE vs RE)?
- No perfect collinearity among regressors?
3. Estimation
- Model converged (check warnings)?
- Coefficients are reasonable in magnitude and sign?
- Standard errors are finite and non-zero?
- For GMM: instruments < N, Hansen J p-value > 0.10?
4. Diagnostics
- Residuals look random (no patterns)?
- Diagnostic tests pass (Hausman, serial correlation, etc.)?
- Results robust to alternative specifications?
Getting Help
If this guide does not solve your problem:
- Prepare a minimal reproducible example:
- Include version information:
- Open an issue on GitHub with:
  - Description of the problem
  - Reproducible example
  - Full error traceback
  - What you already tried
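A reproducible example is easiest to share when it starts from a small synthetic panel with a fixed seed, so nobody needs your real data. The sketch below builds one with NumPy and pandas; the column names, sizes, and coefficients are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

# Fixed seed so the example produces the same data for everyone
rng = np.random.default_rng(42)
n_entities, n_periods = 5, 4

data = pd.DataFrame({
    "entity_id": np.repeat(np.arange(n_entities), n_periods),
    "year": np.tile(np.arange(2020, 2020 + n_periods), n_entities),
    "x1": rng.normal(size=n_entities * n_periods),
})

# Simple linear relationship plus noise
data["y"] = 1.0 + 0.5 * data["x1"] + rng.normal(scale=0.1, size=len(data))

print(data.head())
```

Append the model call that fails and the full traceback to this snippet when opening the issue.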
See Also
- General FAQ — getting started, model selection, results interpretation
- Advanced FAQ — GMM, VAR, Heckman, cointegration
- Spatial FAQ — spatial econometrics questions
- API Reference — correct import paths and signatures