Models & Their Conditions
Each model has explicit mathematical assumptions that must hold (or approximately hold) for results to be valid. Understanding when these conditions break down is just as important as the formulas themselves.
- wᵢ ≥ 0 for long-only portfolios.
- Returns exhibit fat tails (kurtosis > 3) during market crashes, violating the normality assumption.
- Covariance matrices estimated from short windows are often ill-conditioned. Use Ledoit-Wolf shrinkage as a remedy: Σ̂ = (1 − δ)Σ_sample + δ·F, where F is a structured target and δ ∈ [0, 1] is the shrinkage intensity.
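The shrinkage estimator is available off the shelf; a minimal sketch using scikit-learn's `LedoitWolf` on synthetic data (the window length and asset count are illustrative):

```python
import numpy as np
from sklearn.covariance import LedoitWolf

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 10))      # 60 daily observations, 10 assets (short window)

lw = LedoitWolf().fit(X)
Sigma_shrunk = lw.covariance_      # (1 - delta) * Sigma_sample + delta * F
delta = lw.shrinkage_              # estimated shrinkage intensity in [0, 1]

# Shrinkage improves conditioning versus the raw sample covariance
cond_sample = np.linalg.cond(np.cov(X, rowvar=False))
cond_shrunk = np.linalg.cond(Sigma_shrunk)
```

Here the structured target F is scikit-learn's default (a scaled identity matrix); other targets, such as a constant-correlation matrix, are common in practice.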
Run a cross-sectional regression: E(Rᵢ) = γ₀ + γ₁·βᵢ + εᵢ. CAPM predicts γ₀ = Rf and γ₁ = E(Rm) - Rf. Fama-MacBeth (1973) tests this formally. Empirically, the SML is often too flat — low-beta stocks outperform CAPM predictions.
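The two-pass procedure can be sketched on synthetic data; everything below (sample sizes, the simulated market premium, noise levels) is an assumption for illustration, not an estimate from real returns:

```python
import numpy as np

rng = np.random.default_rng(42)
T, N = 600, 25                       # months, assets (illustrative)
Rf, mkt_prem = 0.002, 0.006          # assumed monthly risk-free rate and premium

beta_true = rng.uniform(0.5, 1.5, N)
Rm = Rf + mkt_prem + rng.normal(0, 0.04, T)                # market returns
R = Rf + beta_true * (Rm[:, None] - Rf) + rng.normal(0, 0.05, (T, N))

# Pass 1: time-series regression of each asset's excess return on the market
X = np.column_stack([np.ones(T), Rm - Rf])
betas = np.linalg.lstsq(X, R - Rf, rcond=None)[0][1]       # slope per asset

# Pass 2: cross-sectional regression each month: R_it = g0_t + g1_t * beta_i
Z = np.column_stack([np.ones(N), betas])
gammas = np.linalg.lstsq(Z, R.T, rcond=None)[0]            # shape (2, T)
g0, g1 = gammas.mean(axis=1)                               # Fama-MacBeth estimates

# CAPM predicts g0 ≈ Rf and g1 ≈ E(Rm) - Rf; the time-series std of
# gammas[1] gives the Fama-MacBeth standard error for g1
se_g1 = gammas[1].std(ddof=1) / np.sqrt(T)
```

Averaging the monthly cross-sectional coefficients and using their time-series standard error is exactly the Fama-MacBeth correction for cross-correlated residuals.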
Σ_reg = Σ + λI (ridge regularization of the covariance matrix). Impose 0 ≤ wᵢ ≤ w_max and sector limits to prevent degenerate solutions.

Backtesting Framework
Backtesting simulates how a strategy would have performed on historical data. A rigorous backtest follows a strict methodology to avoid false confidence in results.
Net Return = Gross Return − |Δw| × cost_rate

t-stat = SR × √T. At 95% confidence, require t-stat > 1.96. Also check for multiple-testing bias (adjust p-values with Bonferroni or Benjamini-Hochberg).

| Model | Key Backtest Check | Statistical Test |
|---|---|---|
| MPT / Efficient Frontier | Out-of-sample Sharpe vs. in-sample Sharpe degradation | Diebold-Mariano test for return forecast accuracy |
| CAPM / Beta | Alpha stability across subperiods; beta rolling stability | Fama-MacBeth cross-sectional regression |
| Fama-French 5F | Factor loadings stable in OOS? Alpha > 0 after fees? | GRS test (Gibbons-Ross-Shanken) for joint α = 0 |
| VaR | Kupiec test: # VaR breaches must match expected frequency | Kupiec LR test (1995), Christoffersen test for independence |
| Monte Carlo | Coverage test: % of actual paths within predicted CI | Kolmogorov-Smirnov test for distribution fit |
| Risk Parity | Ex-post risk contribution equality; volatility of vol | Ex-post RC deviation from 1/n target |
| Sharpe Ratio | Significance test: t = SR × √T > 1.96 | Ledoit-Wolf (2008) corrected Sharpe SE |
Performance Metrics Reference
A comprehensive set of metrics that should be computed on every backtest. Never evaluate a strategy on Sharpe alone.
| Metric | Formula | Good Threshold | Use Case |
|---|---|---|---|
| CAGR | (Final/Initial)^(1/T) − 1 | > 10% p.a. | Absolute return |
| Sharpe Ratio | (Rp − Rf) / σp × √252 | > 1.0 | Risk-adjusted return |
| Sortino Ratio | (Rp − MAR) / σ_down | > 1.5 | Downside-focused |
| Max Drawdown | max(Peak − Trough) / Peak | < 20% | Worst loss from peak |
| Calmar Ratio | CAGR / Max Drawdown | > 0.5 | Return per unit MDD |
| Beta | Cov(Rp, Rm) / Var(Rm) | < 1.0 for conservative | Market sensitivity |
| Alpha (Jensen's) | Rp − [Rf + β(Rm − Rf)] | > 0 (significant) | Manager skill |
| Information Ratio | (Rp − Rb) / TE | > 0.5 | Active management |
| Tracking Error | std(Rp − Rb) × √252 | < 5% for index funds | Deviation from benchmark |
| Win Rate | # winning periods / total | > 50% | Consistency |
| Profit Factor | Gross Profit / Gross Loss | > 1.5 | Overall profitability |
| Skewness | E[(R−μ)³] / σ³ | > 0 preferred | Return asymmetry |
| Kurtosis | E[(R−μ)⁴] / σ⁴ − 3 | Near 0 preferred | Tail heaviness |
| VaR (95%) | −Percentile(R, 5%) | Context-dependent | Daily loss limit |
| CVaR (95%) | −E[R \| R ≤ −VaR] | Context-dependent | Expected tail loss |
| Turnover | Σ\|Δwᵢ\| / 2 per period | < 50% monthly | Transaction cost driver |
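Most of the table's metrics are a few lines of NumPy each; a minimal sketch on a synthetic daily return series (the series, MAR = 0, and Rf = 0 are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
r = rng.normal(0.0005, 0.01, 1260)         # 5 years of synthetic daily returns

equity = np.cumprod(1 + r)                 # growth of $1
years = len(r) / 252

cagr = equity[-1] ** (1 / years) - 1
sharpe = r.mean() / r.std(ddof=1) * np.sqrt(252)      # Rf = 0 assumed

downside = r[r < 0]                        # MAR = 0 here
sortino = r.mean() / downside.std(ddof=1) * np.sqrt(252)

peak = np.maximum.accumulate(equity)       # running high-water mark
max_dd = ((peak - equity) / peak).max()
calmar = cagr / max_dd

var95 = -np.percentile(r, 5)               # historical 95% VaR
cvar95 = -r[r <= -var95].mean()            # expected shortfall beyond VaR
```

Note that CVaR is always at least as large as VaR at the same confidence level, since it averages only the losses beyond the VaR threshold.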
Python Implementation Pseudocode
Illustrative code showing how to implement each model and backtest in Python using NumPy, SciPy, and Pandas.
```python
import numpy as np
from scipy.optimize import minimize

Rf = 0.02  # annual risk-free rate (assumed)

# --- 1. Compute inputs from price data ---
# prices: DataFrame of adjusted close prices, one column per asset
returns = prices.pct_change().dropna()
mu = returns.mean() * 252        # annualized expected returns
Sigma = returns.cov() * 252      # annualized covariance matrix
n = len(mu)

# --- 2. Define portfolio metrics ---
def portfolio_stats(w, mu, Sigma):
    ret = w @ mu
    vol = np.sqrt(w @ Sigma @ w)
    return ret, vol, (ret - Rf) / vol

# --- 3. Maximize Sharpe (minimize negative Sharpe) ---
def optimize(mu, Sigma):
    n = len(mu)
    neg_sharpe = lambda w: -portfolio_stats(w, mu, Sigma)[2]
    constraints = [{'type': 'eq', 'fun': lambda w: np.sum(w) - 1}]
    bounds = [(0, 1)] * n        # long-only constraint
    w0 = np.ones(n) / n          # equal-weight initial guess
    result = minimize(neg_sharpe, w0, method='SLSQP',
                      bounds=bounds, constraints=constraints)
    return result.x

w_tangency = optimize(mu.values, Sigma.values)

# --- 4. Walk-forward backtest ---
L_train, L_test = 252, 21
oos_returns = []
for t in range(L_train, len(returns) - L_test, L_test):
    train = returns.iloc[t - L_train:t]
    test = returns.iloc[t:t + L_test]
    w_t = optimize(train.mean().values * 252, train.cov().values * 252)
    oos_returns.extend((test @ w_t).values)

# --- 5. Compute Sharpe on OOS results ---
oos = np.array(oos_returns)
sharpe = (oos.mean() * 252 - Rf) / (oos.std() * np.sqrt(252))
t_stat = sharpe * np.sqrt(len(oos) / 252)   # annualized t-stat
```
```python
import numpy as np
import scipy.stats as stats

# --- Historical VaR ---
confidence = 0.95
alpha = 1 - confidence
VaR_hist = -np.percentile(oos_returns, alpha * 100)

# --- CVaR (Expected Shortfall) ---
tail_losses = [r for r in oos_returns if r <= -VaR_hist]
CVaR = -np.mean(tail_losses)

# --- Kupiec test for VaR validity ---
T = len(oos_returns)
breaches = sum(1 for r in oos_returns if r < -VaR_hist)
p_hat = breaches / T          # observed breach rate (assumes 0 < breaches < T)
p_expected = alpha            # should match alpha (e.g., 0.05)

# Likelihood-ratio statistic, LR ~ chi2(1) under the null
LR = -2 * (breaches * np.log(p_expected / p_hat)
           + (T - breaches) * np.log((1 - p_expected) / (1 - p_hat)))
p_value = 1 - stats.chi2.cdf(LR, df=1)
# p_value > 0.05 → do NOT reject: VaR model is consistent with the data
# p_value < 0.05 → VaR model is mis-specified (too few or too many breaches)
```
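The backtest-check table also lists Christoffersen's independence test, which asks whether breaches cluster in time rather than arriving independently; a sketch on a deliberately clustered 0/1 breach series (the series itself is made up for illustration):

```python
import numpy as np
from scipy.special import xlogy    # xlogy(0, 0) = 0, avoids log(0) edge cases
from scipy.stats import chi2

def christoffersen_independence(hits):
    """LR test that a 0/1 breach series has no first-order clustering."""
    hits = np.asarray(hits, dtype=int)
    prev, curr = hits[:-1], hits[1:]
    n00 = np.sum((prev == 0) & (curr == 0))
    n01 = np.sum((prev == 0) & (curr == 1))
    n10 = np.sum((prev == 1) & (curr == 0))
    n11 = np.sum((prev == 1) & (curr == 1))
    pi01 = n01 / max(n00 + n01, 1)    # P(breach | no breach yesterday)
    pi11 = n11 / max(n10 + n11, 1)    # P(breach | breach yesterday)
    pi = (n01 + n11) / max(len(hits) - 1, 1)
    logL0 = xlogy(n01 + n11, pi) + xlogy(n00 + n10, 1 - pi)
    logL1 = (xlogy(n01, pi01) + xlogy(n00, 1 - pi01)
             + xlogy(n11, pi11) + xlogy(n10, 1 - pi11))
    LR_ind = -2 * (logL0 - logL1)     # ~ chi2(1) under independence
    return LR_ind, 1 - chi2.cdf(LR_ind, df=1)

# Ten consecutive breaches in 500 days: independence should be rejected
clustered = np.zeros(500, dtype=int)
clustered[100:110] = 1
LR_c, p_c = christoffersen_independence(clustered)
```

A VaR model can pass Kupiec (right breach count) while failing this test (breaches bunched in one crisis), which is why the two are usually reported together.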
```python
import statsmodels.api as sm

# Download factors from the Ken French Data Library
# ff_factors columns: ['Mkt-RF', 'SMB', 'HML', 'RMW', 'CMA', 'RF']
excess_ret = portfolio_returns - ff_factors['RF']
X = ff_factors[['Mkt-RF', 'SMB', 'HML', 'RMW', 'CMA']]
X = sm.add_constant(X)    # adds the alpha intercept

# OLS regression with Newey-West HAC standard errors
model = sm.OLS(excess_ret, X)
result = model.fit(cov_type='HAC', cov_kwds={'maxlags': 6})
print(result.summary())

# Key outputs:
#   const (alpha) — intercept; is it statistically > 0?
#   Mkt-RF beta   — market exposure
#   R-squared     — share of returns explained by the factors
#   t-stats       — factor-loading significance (|t| > 2 ≈ significant)
# GRS test (all alphas jointly = 0):
#   e.g., linearmodels.asset_pricing.LinearFactorModel
```
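The GRS statistic can also be computed directly from the time-series regression residuals; a NumPy sketch on synthetic data generated under the null (all alphas zero). The sample sizes and noise levels are assumptions, and finite-sample degrees-of-freedom conventions vary slightly across texts:

```python
import numpy as np
from scipy.stats import f as f_dist

rng = np.random.default_rng(7)
T, N, K = 360, 10, 3                        # months, test assets, factors

F = rng.normal(0.004, 0.03, (T, K))         # factor excess returns
B = rng.normal(1.0, 0.3, (K, N))            # true loadings
R = F @ B + rng.normal(0, 0.02, (T, N))     # asset excess returns, alpha = 0

# Time-series regressions: R_t = alpha + B' f_t + e_t
X = np.column_stack([np.ones(T), F])
coef, *_ = np.linalg.lstsq(X, R, rcond=None)
alpha = coef[0]                             # estimated alphas, shape (N,)
resid = R - X @ coef
Sigma = resid.T @ resid / (T - K - 1)       # residual covariance
f_bar = F.mean(axis=0)
Omega = np.cov(F, rowvar=False, ddof=1)     # factor covariance

# GRS ~ F(N, T - N - K) under H0: all alphas jointly zero
quad_a = alpha @ np.linalg.solve(Sigma, alpha)
quad_f = f_bar @ np.linalg.solve(Omega, f_bar)
GRS = (T / N) * ((T - N - K) / (T - K - 1)) * quad_a / (1 + quad_f)
p_value = 1 - f_dist.cdf(GRS, N, T - N - K)
```

With real portfolio returns the same code applies, substituting the Ken French factors for `F` and the test-asset excess returns for `R`.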
Critical Pitfalls & How to Avoid Them
Most backtests fail in live trading not because the model is wrong, but because of systematic biases introduced during the testing process.
- Look-Ahead Bias: Using future information (e.g., tomorrow's price, end-of-day close to compute today's signal). Fix: All signals must be computed using data available at time t only.
- Survivorship Bias: Testing only on companies that still exist today — ignoring bankruptcies inflates returns by 1–2% annually. Fix: Use point-in-time databases (CRSP, Compustat).
- Overfitting / Data Snooping: Testing hundreds of parameter combinations and reporting the best. Fix: Reserve a hold-out test set NEVER touched during strategy design. Use walk-forward OOS.
- Transaction Cost Omission: Ignoring brokerage, spreads, and slippage. High-turnover strategies can lose 2–5% p.a. to costs. Fix: Model costs explicitly per trade.
- Ignoring Liquidity: Assuming you can trade any size at market price. Fix: Apply market impact model:
  impact = σ · (Q/ADV)^0.5 for large trades.
- Short History / Regime Bias: Testing only on a bull market (e.g., 2012–2021) and calling it robust. Fix: Include 2000–2002, 2008–2009, and 2022 in the test.
- Multiple Testing Without Correction: Running 50 strategy variants and reporting the one with Sharpe 1.8. Fix: Adjust significance threshold — if testing k strategies, require p < 0.05/k (Bonferroni) or use the Benjamini-Hochberg procedure.
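The multiple-testing correction in the last pitfall is a single call in statsmodels; a sketch with made-up p-values from ten hypothetical strategy variants:

```python
import numpy as np
from statsmodels.stats.multitest import multipletests

# Illustrative p-values from 10 strategy variants (not real results)
pvals = np.array([0.003, 0.008, 0.02, 0.04, 0.04,
                  0.12, 0.30, 0.45, 0.60, 0.88])

# Bonferroni: a variant survives only if p < 0.05 / k
rej_bonf, p_bonf, _, _ = multipletests(pvals, alpha=0.05, method='bonferroni')

# Benjamini-Hochberg: controls the false discovery rate, less conservative
rej_bh, p_bh, _, _ = multipletests(pvals, alpha=0.05, method='fdr_bh')

survivors_bonf = int(rej_bonf.sum())   # variants still significant
survivors_bh = int(rej_bh.sum())
```

On these numbers Bonferroni keeps only the single smallest p-value, while Benjamini-Hochberg also keeps the second: the usual trade-off between family-wise error control and power.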