Models & Their Conditions
Each model has explicit mathematical assumptions that must hold (or approximately hold) for results to be valid. Understanding when these conditions break down is just as important as the formulas themselves.
- wᵢ ≥ 0 for long-only portfolios.
- Returns exhibit fat tails (kurtosis > 3) during market crashes, violating the normality assumption.
- Covariance matrices estimated from short windows are often ill-conditioned. Use Ledoit-Wolf shrinkage as a remedy: Σ̂ = (1 − δ)Σ_sample + δ·F, where F is a structured target and δ ∈ [0, 1] is the shrinkage intensity.
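The shrinkage estimator is available off the shelf; a minimal sketch using scikit-learn's `LedoitWolf` on synthetic data (the window length and asset count are illustrative):

```python
import numpy as np
from sklearn.covariance import LedoitWolf

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 10))      # 60 daily observations, 10 assets (short window)

lw = LedoitWolf().fit(X)
Sigma_shrunk = lw.covariance_      # (1 - delta) * Sigma_sample + delta * F
delta = lw.shrinkage_              # estimated shrinkage intensity in [0, 1]

# Shrinkage improves conditioning versus the raw sample covariance
cond_sample = np.linalg.cond(np.cov(X, rowvar=False))
cond_shrunk = np.linalg.cond(Sigma_shrunk)
```

Here the structured target F is scikit-learn's default (a scaled identity matrix); other targets, such as a constant-correlation matrix, are common in practice.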
Run a cross-sectional regression: E(Rᵢ) = γ₀ + γ₁·βᵢ + εᵢ. CAPM predicts γ₀ = Rf and γ₁ = E(Rm) - Rf. Fama-MacBeth (1973) tests this formally. Empirically, the SML is often too flat — low-beta stocks outperform CAPM predictions.
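The two-pass procedure can be sketched on synthetic data; everything below (sample sizes, the simulated market premium, noise levels) is an assumption for illustration, not an estimate from real returns:

```python
import numpy as np

rng = np.random.default_rng(42)
T, N = 600, 25                       # months, assets (illustrative)
Rf, mkt_prem = 0.002, 0.006          # assumed monthly risk-free rate and premium

beta_true = rng.uniform(0.5, 1.5, N)
Rm = Rf + mkt_prem + rng.normal(0, 0.04, T)                # market returns
R = Rf + beta_true * (Rm[:, None] - Rf) + rng.normal(0, 0.05, (T, N))

# Pass 1: time-series regression of each asset's excess return on the market
X = np.column_stack([np.ones(T), Rm - Rf])
betas = np.linalg.lstsq(X, R - Rf, rcond=None)[0][1]       # slope per asset

# Pass 2: cross-sectional regression each month: R_it = g0_t + g1_t * beta_i
Z = np.column_stack([np.ones(N), betas])
gammas = np.linalg.lstsq(Z, R.T, rcond=None)[0]            # shape (2, T)
g0, g1 = gammas.mean(axis=1)                               # Fama-MacBeth estimates

# CAPM predicts g0 ≈ Rf and g1 ≈ E(Rm) - Rf; the time-series std of
# gammas[1] gives the Fama-MacBeth standard error for g1
se_g1 = gammas[1].std(ddof=1) / np.sqrt(T)
```

Averaging the monthly cross-sectional coefficients and using their time-series standard error is exactly the Fama-MacBeth correction for cross-correlated residuals.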
Σ_reg = Σ + λI (ridge regularization of the covariance matrix). Impose 0 ≤ wᵢ ≤ w_max and sector limits to prevent degenerate solutions.

Backtesting Framework
Backtesting simulates how a strategy would have performed on historical data. A rigorous backtest follows a strict methodology to avoid false confidence in results.
Net Return = Gross Return − |Δw| × cost_rate

t-stat = SR × √T. At 95% confidence, require t-stat > 1.96. Also check for multiple-testing bias (adjust p-values with Bonferroni or Benjamini-Hochberg).

| Model | Key Backtest Check | Statistical Test |
|---|---|---|
| MPT / Efficient Frontier | Out-of-sample Sharpe vs. in-sample Sharpe degradation | Diebold-Mariano test for return forecast accuracy |
| CAPM / Beta | Alpha stability across subperiods; beta rolling stability | Fama-MacBeth cross-sectional regression |
| Fama-French 5F | Factor loadings stable in OOS? Alpha > 0 after fees? | GRS test (Gibbons-Ross-Shanken) for joint α = 0 |
| VaR | Kupiec test: # VaR breaches must match expected frequency | Kupiec LR test (1995), Christoffersen test for independence |
| Monte Carlo | Coverage test: % of actual paths within predicted CI | Kolmogorov-Smirnov test for distribution fit |
| Risk Parity | Ex-post risk contribution equality; volatility of vol | Ex-post RC deviation from 1/n target |
| Sharpe Ratio | Significance test: t = SR × √T > 1.96 | Ledoit-Wolf (2008) corrected Sharpe SE |
Performance Metrics Reference
A comprehensive set of metrics that should be computed on every backtest. Never evaluate a strategy on Sharpe alone.
| Metric | Formula | Good Threshold | Use Case |
|---|---|---|---|
| CAGR | (Final/Initial)^(1/T) − 1 | > 10% p.a. | Absolute return |
| Sharpe Ratio | (Rp − Rf) / σp × √252 | > 1.0 | Risk-adjusted return |
| Sortino Ratio | (Rp − MAR) / σ_down | > 1.5 | Downside-focused |
| Max Drawdown | max(Peak − Trough) / Peak | < 20% | Worst loss from peak |
| Calmar Ratio | CAGR / Max Drawdown | > 0.5 | Return per unit MDD |
| Beta | Cov(Rp, Rm) / Var(Rm) | < 1.0 for conservative | Market sensitivity |
| Alpha (Jensen's) | Rp − [Rf + β(Rm − Rf)] | > 0 (significant) | Manager skill |
| Information Ratio | (Rp − Rb) / TE | > 0.5 | Active management |
| Tracking Error | std(Rp − Rb) × √252 | < 5% for index funds | Deviation from benchmark |
| Win Rate | # winning periods / total | > 50% | Consistency |
| Profit Factor | Gross Profit / Gross Loss | > 1.5 | Overall profitability |
| Skewness | E[(R−μ)³] / σ³ | > 0 preferred | Return asymmetry |
| Kurtosis | E[(R−μ)⁴] / σ⁴ − 3 | Near 0 preferred | Tail heaviness |
| VaR (95%) | −Percentile(R, 5%) | Context-dependent | Daily loss limit |
| CVaR (95%) | −E[R \| R ≤ −VaR] | Context-dependent | Expected tail loss |
| Turnover | Σ\|Δwᵢ\| / 2 per period | < 50% monthly | Transaction cost driver |
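Most of the table's metrics are a few lines of NumPy each; a minimal sketch on a synthetic daily return series (the series, MAR = 0, and Rf = 0 are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
r = rng.normal(0.0005, 0.01, 1260)         # 5 years of synthetic daily returns

equity = np.cumprod(1 + r)                 # growth of $1
years = len(r) / 252

cagr = equity[-1] ** (1 / years) - 1
sharpe = r.mean() / r.std(ddof=1) * np.sqrt(252)      # Rf = 0 assumed

downside = r[r < 0]                        # MAR = 0 here
sortino = r.mean() / downside.std(ddof=1) * np.sqrt(252)

peak = np.maximum.accumulate(equity)       # running high-water mark
max_dd = ((peak - equity) / peak).max()
calmar = cagr / max_dd

var95 = -np.percentile(r, 5)               # historical 95% VaR
cvar95 = -r[r <= -var95].mean()            # expected shortfall beyond VaR
```

Note that CVaR is always at least as large as VaR at the same confidence level, since it averages only the losses beyond the VaR threshold.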
Python Implementation Pseudocode
Illustrative code showing how to implement each model and backtest in Python using NumPy, SciPy, and Pandas.
```python
import numpy as np
from scipy.optimize import minimize

Rf = 0.02  # annual risk-free rate (assumed)

# --- 1. Compute inputs from price data ---
# prices: DataFrame of adjusted close prices, one column per asset
returns = prices.pct_change().dropna()
mu = returns.mean() * 252        # annualized expected returns
Sigma = returns.cov() * 252      # annualized covariance matrix
n = len(mu)

# --- 2. Define portfolio metrics ---
def portfolio_stats(w, mu, Sigma):
    ret = w @ mu
    vol = np.sqrt(w @ Sigma @ w)
    return ret, vol, (ret - Rf) / vol

# --- 3. Maximize Sharpe (minimize negative Sharpe) ---
def optimize(mu, Sigma):
    n = len(mu)
    neg_sharpe = lambda w: -portfolio_stats(w, mu, Sigma)[2]
    constraints = [{'type': 'eq', 'fun': lambda w: np.sum(w) - 1}]
    bounds = [(0, 1)] * n        # long-only constraint
    w0 = np.ones(n) / n          # equal-weight initial guess
    result = minimize(neg_sharpe, w0, method='SLSQP',
                      bounds=bounds, constraints=constraints)
    return result.x

w_tangency = optimize(mu.values, Sigma.values)

# --- 4. Walk-forward backtest ---
L_train, L_test = 252, 21
oos_returns = []
for t in range(L_train, len(returns) - L_test, L_test):
    train = returns.iloc[t - L_train:t]
    test = returns.iloc[t:t + L_test]
    w_t = optimize(train.mean().values * 252, train.cov().values * 252)
    oos_returns.extend((test @ w_t).values)

# --- 5. Compute Sharpe on OOS results ---
oos = np.array(oos_returns)
sharpe = (oos.mean() * 252 - Rf) / (oos.std() * np.sqrt(252))
t_stat = sharpe * np.sqrt(len(oos) / 252)   # annualized t-stat
```
```python
import numpy as np
import scipy.stats as stats

# --- Historical VaR ---
confidence = 0.95
alpha = 1 - confidence
VaR_hist = -np.percentile(oos_returns, alpha * 100)

# --- CVaR (Expected Shortfall) ---
tail_losses = [r for r in oos_returns if r <= -VaR_hist]
CVaR = -np.mean(tail_losses)

# --- Kupiec test for VaR validity ---
T = len(oos_returns)
breaches = sum(1 for r in oos_returns if r < -VaR_hist)
p_hat = breaches / T          # observed breach rate (assumes 0 < breaches < T)
p_expected = alpha            # should match alpha (e.g., 0.05)

# Likelihood-ratio statistic, LR ~ chi2(1) under the null
LR = -2 * (breaches * np.log(p_expected / p_hat)
           + (T - breaches) * np.log((1 - p_expected) / (1 - p_hat)))
p_value = 1 - stats.chi2.cdf(LR, df=1)
# p_value > 0.05 → do NOT reject: VaR model is consistent with the data
# p_value < 0.05 → VaR model is mis-specified (too few or too many breaches)
```
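The backtest-check table also lists Christoffersen's independence test, which asks whether breaches cluster in time rather than arriving independently; a sketch on a deliberately clustered 0/1 breach series (the series itself is made up for illustration):

```python
import numpy as np
from scipy.special import xlogy    # xlogy(0, 0) = 0, avoids log(0) edge cases
from scipy.stats import chi2

def christoffersen_independence(hits):
    """LR test that a 0/1 breach series has no first-order clustering."""
    hits = np.asarray(hits, dtype=int)
    prev, curr = hits[:-1], hits[1:]
    n00 = np.sum((prev == 0) & (curr == 0))
    n01 = np.sum((prev == 0) & (curr == 1))
    n10 = np.sum((prev == 1) & (curr == 0))
    n11 = np.sum((prev == 1) & (curr == 1))
    pi01 = n01 / max(n00 + n01, 1)    # P(breach | no breach yesterday)
    pi11 = n11 / max(n10 + n11, 1)    # P(breach | breach yesterday)
    pi = (n01 + n11) / max(len(hits) - 1, 1)
    logL0 = xlogy(n01 + n11, pi) + xlogy(n00 + n10, 1 - pi)
    logL1 = (xlogy(n01, pi01) + xlogy(n00, 1 - pi01)
             + xlogy(n11, pi11) + xlogy(n10, 1 - pi11))
    LR_ind = -2 * (logL0 - logL1)     # ~ chi2(1) under independence
    return LR_ind, 1 - chi2.cdf(LR_ind, df=1)

# Ten consecutive breaches in 500 days: independence should be rejected
clustered = np.zeros(500, dtype=int)
clustered[100:110] = 1
LR_c, p_c = christoffersen_independence(clustered)
```

A VaR model can pass Kupiec (right breach count) while failing this test (breaches bunched in one crisis), which is why the two are usually reported together.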
```python
import statsmodels.api as sm

# Download factors from the Ken French Data Library
# ff_factors columns: ['Mkt-RF', 'SMB', 'HML', 'RMW', 'CMA', 'RF']
excess_ret = portfolio_returns - ff_factors['RF']
X = ff_factors[['Mkt-RF', 'SMB', 'HML', 'RMW', 'CMA']]
X = sm.add_constant(X)    # adds the alpha intercept

# OLS regression with Newey-West HAC standard errors
model = sm.OLS(excess_ret, X)
result = model.fit(cov_type='HAC', cov_kwds={'maxlags': 6})
print(result.summary())

# Key outputs:
#   const (alpha) — intercept; is it statistically > 0?
#   Mkt-RF beta   — market exposure
#   R-squared     — share of returns explained by the factors
#   t-stats       — factor-loading significance (|t| > 2 ≈ significant)
# GRS test (all alphas jointly = 0):
#   e.g., linearmodels.asset_pricing.LinearFactorModel
```
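The GRS statistic can also be computed directly from the time-series regression residuals; a NumPy sketch on synthetic data generated under the null (all alphas zero). The sample sizes and noise levels are assumptions, and finite-sample degrees-of-freedom conventions vary slightly across texts:

```python
import numpy as np
from scipy.stats import f as f_dist

rng = np.random.default_rng(7)
T, N, K = 360, 10, 3                        # months, test assets, factors

F = rng.normal(0.004, 0.03, (T, K))         # factor excess returns
B = rng.normal(1.0, 0.3, (K, N))            # true loadings
R = F @ B + rng.normal(0, 0.02, (T, N))     # asset excess returns, alpha = 0

# Time-series regressions: R_t = alpha + B' f_t + e_t
X = np.column_stack([np.ones(T), F])
coef, *_ = np.linalg.lstsq(X, R, rcond=None)
alpha = coef[0]                             # estimated alphas, shape (N,)
resid = R - X @ coef
Sigma = resid.T @ resid / (T - K - 1)       # residual covariance
f_bar = F.mean(axis=0)
Omega = np.cov(F, rowvar=False, ddof=1)     # factor covariance

# GRS ~ F(N, T - N - K) under H0: all alphas jointly zero
quad_a = alpha @ np.linalg.solve(Sigma, alpha)
quad_f = f_bar @ np.linalg.solve(Omega, f_bar)
GRS = (T / N) * ((T - N - K) / (T - K - 1)) * quad_a / (1 + quad_f)
p_value = 1 - f_dist.cdf(GRS, N, T - N - K)
```

With real portfolio returns the same code applies, substituting the Ken French factors for `F` and the test-asset excess returns for `R`.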
Critical Pitfalls & How to Avoid Them
Most backtests fail in live trading not because the model is wrong, but because of systematic biases introduced during the testing process.
- Look-Ahead Bias: Using future information (e.g., tomorrow's price, end-of-day close to compute today's signal). Fix: All signals must be computed using data available at time t only.
- Survivorship Bias: Testing only on companies that still exist today — ignoring bankruptcies inflates returns by 1–2% annually. Fix: Use point-in-time databases (CRSP, Compustat).
- Overfitting / Data Snooping: Testing hundreds of parameter combinations and reporting the best. Fix: Reserve a hold-out test set NEVER touched during strategy design. Use walk-forward OOS.
- Transaction Cost Omission: Ignoring brokerage, spreads, and slippage. High-turnover strategies can lose 2–5% p.a. to costs. Fix: Model costs explicitly per trade.
- Ignoring Liquidity: Assuming you can trade any size at market price. Fix: Apply market impact model:
  impact = σ · (Q/ADV)^0.5 for large trades.
- Short History / Regime Bias: Testing only on a bull market (e.g., 2012–2021) and calling it robust. Fix: Include 2000–2002, 2008–2009, and 2022 in the test.
- Multiple Testing Without Correction: Running 50 strategy variants and reporting the one with Sharpe 1.8. Fix: Adjust significance threshold — if testing k strategies, require p < 0.05/k (Bonferroni) or use the Benjamini-Hochberg procedure.
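The multiple-testing correction in the last pitfall is a single call in statsmodels; a sketch with made-up p-values from ten hypothetical strategy variants:

```python
import numpy as np
from statsmodels.stats.multitest import multipletests

# Illustrative p-values from 10 strategy variants (not real results)
pvals = np.array([0.003, 0.008, 0.02, 0.04, 0.04,
                  0.12, 0.30, 0.45, 0.60, 0.88])

# Bonferroni: a variant survives only if p < 0.05 / k
rej_bonf, p_bonf, _, _ = multipletests(pvals, alpha=0.05, method='bonferroni')

# Benjamini-Hochberg: controls the false discovery rate, less conservative
rej_bh, p_bh, _, _ = multipletests(pvals, alpha=0.05, method='fdr_bh')

survivors_bonf = int(rej_bonf.sum())   # variants still significant
survivors_bh = int(rej_bh.sum())
```

On these numbers Bonferroni keeps only the single smallest p-value, while Benjamini-Hochberg also keeps the second: the usual trade-off between family-wise error control and power.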