Examples
The same calibration loop (claim, probability, outcome, scoring) is applied across four domains. Every example uses real numeric values.
Prediction Markets
Type: crypto
Calibration Result
Your Brier contribution: 0.4225. Claude was the most accurate source (error: 0.61). Your crypto Brier moved from 0.21 to 0.24.
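The arithmetic behind a Brier contribution can be sketched as follows. This is a minimal illustration, not GENYS's actual scoring code. It assumes the standard definition (squared difference between the forecast probability and a 0/1 outcome). The 65% forecast and the "no" resolution are inferred from the 0.4225 figure, not stated in the example. The count of six earlier decisions and the simple running mean are hypothetical, chosen only because they reproduce the rounded move from 0.21 to 0.24.

```python
def brier_contribution(forecast: float, outcome: int) -> float:
    """Squared error between a probability forecast and a 0/1 outcome."""
    if not 0.0 <= forecast <= 1.0:
        raise ValueError("forecast must be a probability in [0, 1]")
    if outcome not in (0, 1):
        raise ValueError("outcome must be 0 or 1")
    return (forecast - outcome) ** 2


def updated_mean(previous_mean: float, previous_count: int, new_value: float) -> float:
    """Fold one new contribution into a running mean Brier score."""
    return (previous_mean * previous_count + new_value) / (previous_count + 1)


# Assumed inputs: a 65% forecast on a claim that resolved "no" (outcome 0).
contribution = brier_contribution(0.65, 0)
print(round(contribution, 4))  # 0.4225

# Hypothetical history: 6 earlier crypto decisions with a mean Brier of 0.21.
print(round(updated_mean(0.21, 6, contribution), 2))  # 0.24
```

Under the same assumption, an absolute error of 0.61 corresponds to a squared contribution of about 0.37, which is why a smaller error counts as more accurate.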
What GENYS Learns
You overestimate in the 60–70% range by 13pp on crypto decisions. The market was overconfident too, but less so.
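A statement like "overestimate in the 60–70% range by 13pp" is a per-bucket bias: the mean of forecast minus outcome over all decisions whose forecast fell in that range. The sketch below shows one way to compute it. The function name, the 10-point bucket width, and the sample history are all hypothetical and are not taken from GENYS.

```python
from collections import defaultdict


def bias_by_bucket(records):
    """Mean signed error (forecast - outcome) per 10-point bucket, in percentage points.

    Positive values mean the forecaster overestimated in that bucket.
    """
    totals = defaultdict(lambda: [0.0, 0])  # bucket lower bound -> [error sum, count]
    for forecast, outcome in records:
        pct = round(forecast * 100)
        low = min(pct // 10, 9) * 10  # 65 -> 60; a forecast of 100% joins the 90-100 bucket
        totals[low][0] += forecast - outcome
        totals[low][1] += 1
    return {
        f"{low}-{low + 10}%": round(100 * error_sum / count, 1)
        for low, (error_sum, count) in sorted(totals.items())
    }


# Hypothetical history of (forecast, outcome) pairs, for illustration only.
history = [(0.65, 1), (0.65, 0), (0.65, 1), (0.65, 0), (0.30, 0), (0.35, 1)]
print(bias_by_bucket(history))
```

In this made-up history the four 65% forecasts came true half the time, so the 60-70% bucket shows a 15pp overestimate, while the 30-40% bucket shows an underestimate.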
Advertising / GTM
Type: conversion
Calibration Result
Growth Lead error: 0.30 (overestimated probability of success, but was directionally right). Claude underestimated by 48pp. Baseline underestimated by 42pp.
What GENYS Learns
Claude systematically underestimates the probability of success on conversion decisions by 9pp on average. Your Growth Lead is well-calibrated on campaign outcomes.
Product / Strategy
Type: pricing
Calibration Result
Sales Lead error: 0.72 (large). PM error: 0.55. GPT-4o error: 0.48 (most accurate). Team pricing Brier: 0.38.
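The "most accurate" label is a ranking of sources by absolute error on the resolved decision. A minimal sketch using the three error values above; the function name is hypothetical, and squaring each error to get a per-source Brier contribution assumes the standard Brier definition.

```python
def rank_sources(errors):
    """Return (source, error, squared_error) tuples, most accurate first."""
    ranked = sorted(errors.items(), key=lambda item: item[1])
    return [(source, err, round(err ** 2, 4)) for source, err in ranked]


# Absolute errors reported for this pricing decision.
pricing_errors = {"Sales Lead": 0.72, "PM": 0.55, "GPT-4o": 0.48}

for source, err, squared in rank_sources(pricing_errors):
    print(f"{source}: error={err:.2f}, squared={squared:.4f}")
# GPT-4o: error=0.48, squared=0.2304
# PM: error=0.55, squared=0.3025
# Sales Lead: error=0.72, squared=0.5184
```

The team pricing Brier of 0.38 is not recomputed here, since it presumably aggregates more decisions than this single one.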
What GENYS Learns
Sales overestimates the probability of success on pricing decisions by 22pp on average. GPT-4o is the most reliable source for pricing predictions on your team. PM is close to baseline.
AI Model Evaluation
Type: technology
Calibration Result
ML Engineer error: 0.80 (extreme overconfidence). Baseline A/B error: 0.38 (closest to reality). Technology Brier across team: 0.41.
What GENYS Learns
ML Engineers overestimate technology outcomes by 31pp. The A/B baseline was the most accurate signal. Claude was second-best.