Four layers. 23 KPIs, 38 sub-scores. All thresholds, weights and formulas — public.
This page documents the complete methodology behind the Boiling Frog risk score. We publish every weight, every threshold and every data source. Anyone with a spreadsheet can recompute our results.
Before diving into the layers — a short note on methodology. Boiling Frog was not designed ad hoc. All thresholds, weights and calibrations are derived from established methods of statistical learning.
G. James, D. Witten, T. Hastie, R. Tibshirani — An Introduction to Statistical Learning, with Applications in R (ISL), 2nd Edition, Springer 2021. Four chapters are applied directly:
VIX itself is an input to Layer C. A ROC-AUC analysis that uses VIX > 25 as the crisis label is therefore partially circular: a higher Layer-C weight trivially improves apparent performance. We additionally use the forward drawdown (maximum S&P 500 decline over the next 20 trading days) as a non-circular benchmark, and we publish both results even when they disagree.
Full backtest results for each of these chapters are documented in the backtest report.
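To make the non-circular benchmark concrete, the forward drawdown can be sketched as below. This is a minimal illustration, not the published backtest code: the function name `forward_drawdown`, the use of closing prices, and measuring the decline from the current day's close are assumptions.

```python
def forward_drawdown(closes, horizon=20):
    """For each day, the maximum decline (in %) from that day's close to the
    lowest close within the next `horizon` trading days.

    Days without a full forward window get None (assumption: incomplete
    windows are not scored)."""
    result = []
    for i, start in enumerate(closes):
        window = closes[i + 1 : i + 1 + horizon]
        if len(window) < horizon:
            result.append(None)
            continue
        worst = min(window)
        # A market that only rises has a forward drawdown of 0, not a negative value.
        result.append(max(0.0, (start - worst) / start * 100.0))
    return result
```

Because this label uses only future index prices, it shares no inputs with Layer C, which is what makes it usable alongside the VIX-based label.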
We split market risk into four independent layers. Each layer is computed separately, scored from 0 to 100, and then weighted into a single risk verdict. The weights are fixed and public.
Structural macro stress: central bank balance sheet, interest rates, real rates, USD system role, sovereign debt actor risk.
Geopolitical and policy events from RSS feeds, classified by severity and compound patterns. Decays over 48 hours.
Live market mechanics: equity, gold, FX, technical regime detection. Includes crisis overlay for sharp moves.
Energy market structure: oil, gas, electricity transmission, intraday dynamics, OVX volatility.
Layer A captures structural macro pressure. Seven KPIs with seventeen sub-scores in total, all sourced from public data (FRED, Treasury, TIC, H.4.1, ECB).
Fed balance sheet (WALCL) and Treasury General Account (TGA). Measures liquidity drain.
10-year yield level, 2s10s inversion, yield momentum, real-rate estimate. Refinancing pressure on US Treasuries.
Real interest rate estimate (10Y minus expected inflation).
DXY strength and USD momentum (Trade-Weighted Broad). RRP moved to Funding Stress in Phase 9 to avoid double-counting.
Treasury auction bid-to-cover, indirect bidder share, TIC foreign holdings, custody (H.4.1). Foreign demand for US debt.
CNH 10-day return (capital-flight indicator) and TED-spread level (interbank stress). Phase 1B.3.
Liquidity pressure in money markets — four sub-scores: DCPF3M−SOFR (USD corporate term funding), SOFR−EFFR (overnight bank stress), RRP volume momentum, Euribor 3M−€STR (EUR equivalent). Phase 9.
Debt Stress: WALCL · TGA
Maturity/Rates: 10Y · 2s10s · yield momentum · real rate
USD System Role: DXY · USD momentum
Actor Risk: auction BTC · indirect % · TIC · custody H.4.1
Funding Stress: CP−SOFR · SOFR−EFFR · RRP · Euribor−€STR
Layer B watches political and geopolitical events from 22 RSS feeds, classifies them by severity, and decays the impact over 48 hours.
Only events with a base score of 15 or higher (HIGH/CRITICAL) enter the layer score.
Israel/IDF + Iran/Tehran + airstrike/bomb/strike/attack
Iran/Hormuz + carrier/fleet/strike/deploy/naval/troops
Russia/Kremlin + NATO/Poland/Baltics/Finland + invade/attack/strike
Nuclear/atom + use/deploy/launch/detonate/fired/explode
Taiwan + invasion/blockade/military/strike/attack/war
Greenland + annex/acquire/military/invade
Day 0 = +10 per event · Day 1 = +5 per event · Day 2+ = 0. Cap at +25 total.
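The decay schedule above can be written as a few lines of code. This is a sketch under stated assumptions: the function name `geo_surcharge` and the representation of events as a list of ages in whole days are hypothetical.

```python
def geo_surcharge(event_ages_days):
    """Sum the per-event surcharge by event age and cap the total at +25.

    event_ages_days: one integer per qualifying event
    (0 = fired today, 1 = fired yesterday, 2 or more = expired)."""
    per_event = {0: 10, 1: 5}  # Day 0 = +10, Day 1 = +5, Day 2+ = 0
    total = sum(per_event.get(age, 0) for age in event_ages_days)
    return min(25, total)
```

With this schedule, three events on the same day already reach the cap (3 × 10 = 30, capped to 25).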
Layer C reads live market mechanics from price data: equity, gold, FX, intraday volatility, and — since Phase 9 — the risk premium on corporate bonds (HY+IG OAS). The crisis overlay activates on sharp moves.
Drawdowns in S&P 500, Nasdaq, DAX, Euro STOXX 50 (20-day rolling-max distance).
60-day correlation between regions — high correlation = risk-off synchronization.
VIX level + 5-day spike. Phase 9: weight reduced from 0.225 to 0.15 (anti-redundancy with Credit Stress).
Gold selling during equity stress = liquidity crisis. 5-day net + 5-day peak drawdown.
Premium on corporate bonds over Treasuries — 0.7 × HY OAS + 0.3 × IG OAS, all capped at 90. Phase 9.
Regional Equity Stress (0.15) — Nikkei/EM/CSI/FTSE — and Commodity Stress (0.10) — Oil/Gas/Copper/Wheat — feed into the layer score without being shown as separate KPI cards.
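The equity drawdown input is described above as a 20-day rolling-max distance. A minimal sketch of that measure follows; the function name `rolling_max_drawdown`, the use of closing prices, and including today's close in the window are assumptions, and the mapping from this percentage to a 0 to 100 sub-score is not shown here.

```python
def rolling_max_drawdown(closes, window=20):
    """Distance of the latest close below the highest close of the last
    `window` closes, in percent (0.0 when the latest close is the peak)."""
    if not closes:
        raise ValueError("closes must not be empty")
    recent = closes[-window:]  # assumption: the window includes today's close
    peak = max(recent)
    return (peak - recent[-1]) / peak * 100.0
```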
Triggers when any of the four main indices (S&P 500, Nasdaq 100, DAX, Euro STOXX 50) drops more than 7 % over 10 days. The worst crash determines the overlay magnitude (worst-of logic).
Triggers when gold drops more than 5 % over 5 days.
Both triggers active simultaneously: gold ≤ −5 % over 5 days AND at least one equity index down ≥ 7 % over 10 days.
1-day VIX jump > +12 points → overlay +15. Jump > +20 points → overlay +25. Very rare events (~5× in 20 years) signaling acute volatility eruptions — Volmageddon 2018-02-05 (+20pt → dark red), COVID 2020-03-12 (+18pt).
ICE BofA HY-OAS series widens by +50 bps in 5 trading days → overlay +12. Widens by +100 bps → overlay +25. Gated on equity_stress > 30 (prevents false positives from isolated junk refinancing worries). HY-OAS led the VIX in 2008/2020/2023 — Lehman, COVID, SVB bank run are correctly captured in the backtest with this trigger.
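The two step-function triggers above (VIX jump and HY-OAS widening) can be sketched as follows. Function and parameter names are hypothetical. The VIX thresholds are strict, as written in the text; treating the OAS thresholds as inclusive (≥ 50 bps, ≥ 100 bps) is an assumption.

```python
def vix_spike_overlay(vix_change_1d):
    """Overlay points for a 1-day VIX jump, measured in VIX points."""
    if vix_change_1d > 20:
        return 25
    if vix_change_1d > 12:
        return 15
    return 0


def hy_oas_overlay(oas_widening_5d_bps, equity_stress):
    """Overlay points for HY-OAS widening over 5 trading days.

    Gated on equity_stress > 30, so isolated credit moves score 0."""
    if equity_stress <= 30:
        return 0
    if oas_widening_5d_bps >= 100:  # assumption: threshold is inclusive
        return 25
    if oas_widening_5d_bps >= 50:   # assumption: threshold is inclusive
        return 12
    return 0
```

For example, an 18-point VIX jump yields +15, while a 120 bps OAS widening with equity stress at 20 yields 0 because the gate is closed.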
Layer D models energy market stress. Six sub-layers feed into the layer score.
Oil/gas market structure (contango/backwardation).
OVX (oil VIX), implied volatility regime.
Brent/WTI returns, momentum, regime breaks.
Energy-related shocks (production cuts, pipeline events).
Intraday Brent moves outside normal ranges.
ENTSO-E electricity grid stress (continental Europe).
Layer D is active from 2026-04-18. Historical backfills before this date use the legacy three-layer formula (Macro 40 % · Politics 30 % · Markets 30 %).
Three overlays modulate the aggregated score. They are additive on top of the weighted base score, and they are transparent — every active overlay is shown in the dashboard with its value and reason.
Up to +25 points when CRITICAL compound patterns from Layer B fire within the last 48 hours. Diplomatic statements do not trigger — only military escalation.
Boost when any single risk channel exceeds 70: channel_boost = min(15, max(0, max_channel − 70) × 0.5). Up to +15 points.
Layer B ≥ 50: factor 0.90 — Layer C ≥ 50: factor 0.90 — Layer A ≥ 80 (only structural extreme): factor 0.80. Prevents undershooting when one layer is in clear distress.
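The channel boost formula given above translates directly into code. The sketch below assumes channel scores are passed as a dictionary. The channel names used in the example are hypothetical.

```python
def channel_boost(channel_scores):
    """channel_boost = min(15, max(0, max_channel - 70) * 0.5)."""
    max_channel = max(channel_scores.values())
    return min(15.0, max(0.0, max_channel - 70.0) * 0.5)


# Hypothetical channel names, for illustration only.
example = {"growth": 55.0, "liquidity": 80.0, "trust": 40.0}
boost = channel_boost(example)  # (80 - 70) * 0.5 = 5.0
```

Since channel scores top out at 100, the boost reaches its +15 ceiling exactly at a channel score of 100.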
The complete daily risk score is computed in this order:
Before 2026-04-18, the legacy formula 0.40 × A + 0.30 × B + 0.30 × C is used (no Layer D, no channel overlay).
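The weighted base score and its date-dependent switch can be sketched as below. The legacy weights are stated above. The four-layer weights 0.35 / 0.25 / 0.25 / 0.15 are taken from the changelog entry on the Layer D activation, and assigning them to A / B / C / D in that order is an assumption. Overlays are not included, and the function name `base_score` is hypothetical.

```python
from datetime import date

LAYER_D_START = date(2026, 4, 18)  # Layer D is active from this date


def base_score(day, a, b, c, d=None):
    """Weighted base score from layer scores (each 0 to 100), before overlays."""
    if day < LAYER_D_START:
        # Legacy three-layer formula: Macro 40 %, Politics 30 %, Markets 30 %.
        return 0.40 * a + 0.30 * b + 0.30 * c
    if d is None:
        raise ValueError("Layer D score is required from 2026-04-18 onward")
    # Assumption: 35/25/25/15 maps to layers A/B/C/D in that order.
    return 0.35 * a + 0.25 * b + 0.25 * c + 0.15 * d
```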
The dashboard translates the risk score into an expected drawdown range over the next 4–6 weeks. The bands are calibrated against 15+ years of market history and validated by a multinomial logistic regression on 3,872 trading days.
Score 33 ≈ 12 % drawdown (2018 correction) · Score 66 ≈ 30 % drawdown (COVID crash) · Score 85+ ≈ 40–50 % drawdown (2008 crisis). Linear interpolation between anchors.
The drawdown range is a market-wide expectation derived from the aggregated score, not a portfolio-specific forecast. It does not constitute investment advice.
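The linear interpolation between anchors can be illustrated with a short sketch. Several choices here are assumptions: the function returns a single point estimate rather than the range shown in the dashboard, the 85+ anchor uses 45 % as the midpoint of the stated 40–50 % band, and scores below the first anchor return None because no lower anchor is given.

```python
# (score, expected drawdown in %). The last value is the assumed midpoint of 40-50 %.
ANCHORS = [(33.0, 12.0), (66.0, 30.0), (85.0, 45.0)]


def expected_drawdown(score):
    """Piecewise-linear interpolation between the published anchors."""
    if score < ANCHORS[0][0]:
        return None  # no anchor below score 33 is published
    if score >= ANCHORS[-1][0]:
        return ANCHORS[-1][1]
    for (x0, y0), (x1, y1) in zip(ANCHORS, ANCHORS[1:]):
        if x0 <= score <= x1:
            return y0 + (y1 - y0) * (score - x0) / (x1 - x0)
```

A score of 49.5, halfway between the first two anchors, maps to 21 % under these assumptions.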
The score is decomposed into five risk channels. Each channel aggregates specific sub-KPIs across layers. The dominant channel is the highest-scoring one and answers: "Where does today's risk come from?"
Equity stress (0.40) · Cross-region correlation (0.30) · Real-rate regime (0.30). Active when the world economy is decelerating or recession risk is rising.
Maturity wall (0.50) · Real-rate regime (0.50). Active when refinancing pressure or yield-curve regime breaks dominate.
Layer B score directly. Composed of 5 categories: geopolitics (0.30) · corporate events (0.25) · tariffs (0.20) · alliance shifts (0.15) · sanctions (0.10).
Debt stress (0.35) · Gold deleveraging (0.35) · Actor risk (0.15) · Volatility regime (0.15). Active when cash and balance-sheet pressure dominate.
USD system role (0.30) · Actor risk (0.20) · Cross-layer systemic factor (0.70 if average of A/B/C all exceed thresholds). Active when trust in the system itself is at risk.
confidence = min(1.0, 0.5 + margin)
Where margin = (highest_score − second_highest_score) / highest_score. Range: [0.5, 1.0]. Lower bound 0.5 means: even with two equal channels we report at least 50 % confidence — full confidence only when one channel is clearly leading.
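The dominant-channel selection and its confidence can be sketched as follows. The function name `dominant_channel` and the channel names in the example are hypothetical, and returning 0.5 when all channels score zero is an assumption to avoid division by zero.

```python
def dominant_channel(channel_scores):
    """Return (name, confidence) for the highest-scoring channel.

    confidence = min(1.0, 0.5 + margin), where
    margin = (highest_score - second_highest_score) / highest_score."""
    ranked = sorted(channel_scores.items(), key=lambda kv: kv[1], reverse=True)
    (name, highest), (_, second) = ranked[0], ranked[1]
    if highest <= 0:
        return name, 0.5  # assumption: no signal means minimum confidence
    margin = (highest - second) / highest
    return name, min(1.0, 0.5 + margin)


# Hypothetical channel names, for illustration only.
name, conf = dominant_channel({"growth": 80.0, "liquidity": 60.0, "trust": 10.0})
# margin = (80 - 60) / 80 = 0.25, so confidence = 0.75
```

Note that the formula saturates at 1.0 as soon as the leader exceeds the runner-up by 50 % of its own score.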
For each tracked asset (Gold, S&P 500, Nasdaq 100, DAX, Euro STOXX 50) we run a per-asset technical analysis. The dashboard shows the most-stressed asset with: recent return, technical signals (EMA / MACD / RSI), confidence and whether the crisis surcharge is active.
bearish_signals ≥ 3 OR (bearish_signals ≥ 2 AND RSI does not show an oversold bounce). Strong technical regime break.
(RSI shows an oversold bounce AND bearish_signals < 3) OR bearish_signals ≥ 1. Pull-back within an intact trend.
bearish_signals = 0. No technical pressure.
Fewer than 50 data points available — confidence too low to classify.
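The four trend-status rules above can be read as a decision cascade. The sketch below evaluates them with the data-sufficiency check first and the remaining rules in the order listed; that ordering is an assumption. Apart from "trend_reversal", the status labels are hypothetical.

```python
def trend_status(bearish_signals, rsi_oversold_bounce, n_points):
    """Classify the technical trend status of one asset."""
    if n_points < 50:
        return "insufficient_data"  # fewer than 50 data points
    if bearish_signals >= 3 or (bearish_signals >= 2 and not rsi_oversold_bounce):
        return "trend_reversal"  # strong technical regime break
    if (rsi_oversold_bounce and bearish_signals < 3) or bearish_signals >= 1:
        return "correction"  # pull-back within an intact trend
    return "stable"  # bearish_signals == 0 and no oversold bounce
```

Under this ordering, two bearish signals with an RSI oversold bounce are classified as a correction rather than a reversal.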
5-day return > +1.5 %. Clear uptrend.
5-day return < −1.5 % AND 1-day negative. Clear downtrend.
5-day return < −1.5 % BUT 1-day positive. Rebound within an intact downtrend.
5-day return between ±1.5 %. Sideways movement.
Donchian breakout (Phase 9.1): today's high > max(prev 5d high) AND close in the upper 5 % of the day's range. Only fires past the asset-specific anchor hour (US assets from 15:30 Berlin / EU assets from 10:00 Berlin) — opening volatility filtered out.
Symmetric to the upward breakout: today's low < min(prev 5d low) AND close in the lower 5 % of the day's range.
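The regime rules above can be combined into one classifier, sketched below. BREAKOUT_LONG, BREAKDOWN_SHORT and NEUTRAL are the class names mentioned in the changelog; UPTREND, DOWNTREND and REBOUND are hypothetical labels. Checking the Donchian conditions before the return-based rules, treating a flat 1-day return as a rebound, and omitting the anchor-hour filter are all assumptions of this sketch.

```python
def asset_regime(ret_5d, ret_1d, high, low, close, prev_highs, prev_lows):
    """Classify an asset regime from returns (in %) and today's price bar.

    prev_highs / prev_lows: highs and lows of at least the 5 previous days."""
    day_range = high - low
    if day_range > 0:
        position = (close - low) / day_range  # 0.0 = at the low, 1.0 = at the high
        if high > max(prev_highs[-5:]) and position >= 0.95:
            return "BREAKOUT_LONG"
        if low < min(prev_lows[-5:]) and position <= 0.05:
            return "BREAKDOWN_SHORT"
    if ret_5d > 1.5:
        return "UPTREND"
    if ret_5d < -1.5:
        # Assumption: a 1-day return of exactly 0 is grouped with the rebound case.
        return "DOWNTREND" if ret_1d < 0 else "REBOUND"
    return "NEUTRAL"
```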
Bullish: Price > EMA50 > EMA200. Bearish: Price < EMA50 < EMA200.
Bullish: histogram > 0. Bearish: histogram < 0 for ≥ 4 of the last 5 candles.
Overbought: > 70. Oversold: < 30. Recovery: bounce above 40 after an oversold reading. Otherwise neutral.
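The three indicator rules above can be sketched as small classifiers. The inputs are assumed to be precomputed indicator values, and all function names are hypothetical. Two details are assumptions not stated in the text: the MACD bullish rule is applied to the latest histogram value and checked before the bearish rule, and the RSI recovery rule looks back over a hypothetical `lookback` of 5 earlier readings.

```python
def ema_signal(price, ema50, ema200):
    """Bullish: price > EMA50 > EMA200. Bearish: price < EMA50 < EMA200."""
    if price > ema50 > ema200:
        return "bullish"
    if price < ema50 < ema200:
        return "bearish"
    return "neutral"


def macd_signal(histogram):
    """histogram: MACD histogram values, oldest first."""
    if histogram[-1] > 0:
        return "bullish"
    negatives = sum(1 for h in histogram[-5:] if h < 0)
    return "bearish" if negatives >= 4 else "neutral"


def rsi_signal(rsi_values, lookback=5):
    """rsi_values: RSI readings, oldest first. `lookback` is an assumed parameter."""
    current = rsi_values[-1]
    if current > 70:
        return "overbought"
    if current < 30:
        return "oversold"
    earlier = rsi_values[-(lookback + 1):-1]
    if current > 40 and any(v < 30 for v in earlier):
        return "recovery"  # bounce above 40 after an oversold reading
    return "neutral"
```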
confidence = |bearish_signals − bullish_signals| / (total_signals + 2)
EMA bearish/bullish counts as +2 points · EMA weakening as +1 · MACD as +1 · RSI oversold-bounce as +1. Saturated at 4 signals.
The technical analysis only triggers a surcharge on the GHI when the trend status is Trend reversal AND a price-based crash threshold is met (Equity 10-day drawdown ≥ 7 %, Gold 5-day return ≤ −5 %, or both for deleveraging). Surcharge values range from +8 to +22 points depending on severity. See section 06 (Overlays) for the full mechanic.
Every input is verifiable against a public source.
US Federal Reserve economic data — WALCL, TGA, yields, DXY, RRP, SOFR, EFFR, DCPF3M (Phase 9), BAML HY/IG OAS, VIX.
Auction results — bid-to-cover, indirect bidder share.
Foreign holdings of US debt and Fed custody data.
Equity, gold, EUR/USD, oil/Brent prices (intraday + EOD), OVX (oil VIX).
Eurozone money-market data — €STR (daily, EST/B.EU000A2X2A25.WT) and Euribor 3M (monthly, FM/M.U2.EUR.RT.MM.EURIBOR3MD_.HSTA). Phase 9.
US energy data (crude, Cushing, distillate).
Day-ahead electricity prices DE/FR/IT/Nord (Layer D power transmission).
22 news sources for political event ingestion.
Every methodology change — new weights, new thresholds, new compound patterns — is logged with date and reasoning. Recent entries:
Phase 9.1: Donchian breakout detection as new asset-regime classes (BREAKOUT_LONG / BREAKDOWN_SHORT). Asset-specific anchor hours (US 15:30 Berlin, EU 10:00 Berlin) filter opening volatility. V-shaped days with a new 5-day high are now correctly classified instead of NEUTRAL.
Phase 9: Credit Stress (HY+IG OAS) added as the 5th exposed Layer-C KPI (weight 0.075; Vola weight reduced 0.225 → 0.15 to avoid redundancy). Funding Stress (money-market spreads US+EUR) added as the 7th Layer-A KPI (weight 0.12, Layer-A weights redistributed). New ECB ingestor for €STR + Euribor. Backtest over 122 trading days: mean drift +0.63 points, ρ(credit, vola)=0.04, ρ(funding, rrp)=−0.23, so the anti-redundancy check holds.
Layer D (Energy) activated. Aggregation moved from 40/30/30 to 35/25/25/15. Channel Overlay introduced.
Geo Overlay calibrated to fire only on CRITICAL compound patterns of military escalation. Diplomatic statements removed from trigger logic.
Layer B severity scale recalibrated. Filter raised to score ≥ 15. Dominant Floor logic refined.
Crisis Overlay thresholds calibrated: Equity 7 % over 10 days, Gold −5 % over 5 days.
We tested all of the above against 3,872 trading days of historical data — including out-of-sample episodes like Lehman 2008, COVID-19, the Ukraine war and the 2025 Liberation Day tariffs. Read the public report including the limitations.
Open methodology is not a feature. It's the foundation.