Examples

The same calibration loop — claim, probability, outcome, scoring — applied across four domains. Every example uses real numeric values.
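The four steps of the loop can be pictured as a small record type. The sketch below is only an illustration, not GENYS's actual data model: the `Forecast` and `Question` classes and their field names are hypothetical, and the values come from the first example that follows.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Forecast:
    source: str         # who made the forecast (hypothetical field name)
    probability: float  # stated probability that the claim is true, 0..1


@dataclass
class Question:
    claim: str
    decision_type: str
    forecasts: List[Forecast] = field(default_factory=list)
    outcome: Optional[bool] = None  # None until the question resolves

    def brier_contributions(self) -> Dict[str, float]:
        """Squared error per source; only defined once the question resolves."""
        if self.outcome is None:
            raise ValueError("question has not resolved yet")
        actual = 1.0 if self.outcome else 0.0
        return {f.source: (f.probability - actual) ** 2 for f in self.forecasts}


q = Question(
    claim="Will ETH break $4,000 by March 31?",
    decision_type="crypto",
    forecasts=[Forecast("You", 0.65), Forecast("Polymarket", 0.68)],
)
q.outcome = False  # ETH closed at $3,412
scores = q.brier_contributions()
```

Keeping `outcome` empty until resolution means a forecast cannot be scored before the result is known.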

Prediction Markets

Will ETH break $4,000 by March 31?

Type: crypto

You: 65%

GPT-4o: 72%

Claude: 61%

Polymarket: 68%

FALSE: ETH closed at $3,412. Did not break $4,000.

Calibration Result

Your Brier contribution: 0.4225 (0.65², the squared error against an outcome of 0). Claude was the most accurate source (absolute error: 0.61). Your crypto Brier moved from 0.21 to 0.24.
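These figures can be reproduced in a few lines. The sketch rests on two assumptions the text does not state: that each source's "error" is the absolute difference between forecast and outcome, and that the crypto Brier is a plain running mean. The count of six prior forecasts is hypothetical, chosen only because it is consistent with the rounded move from 0.21 to 0.24.

```python
# Forecast probabilities for "Will ETH break $4,000 by March 31?"
forecasts = {"You": 0.65, "GPT-4o": 0.72, "Claude": 0.61, "Polymarket": 0.68}
outcome = 0.0  # FALSE: ETH closed at $3,412

# The Brier contribution of a single forecast is its squared error.
brier = {name: (p - outcome) ** 2 for name, p in forecasts.items()}

# Absolute error, which matches the 0.61 figure quoted for Claude.
abs_error = {name: abs(p - outcome) for name, p in forecasts.items()}
most_accurate = min(abs_error, key=abs_error.get)

# Running-mean update. n_prior = 6 is an assumed, hypothetical count.
n_prior, prior_brier = 6, 0.21
new_brier = (prior_brier * n_prior + brier["You"]) / (n_prior + 1)

print(round(brier["You"], 4), most_accurate, round(new_brier, 2))
# prints: 0.4225 Claude 0.24
```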

What GENYS Learns

You overestimate in the 60–70% range by 13pp on crypto decisions. The market was overconfident on this question too, at 68% against your 65%.
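A bucket-level bias such as "13pp too high in the 60–70% range" is commonly measured as the gap between the average stated probability and the observed hit rate within that bucket. The sketch below shows that calculation on made-up history. The `bucket_gap_pp` helper and the sample records are hypothetical, not GENYS data, and GENYS may compute its figure differently.

```python
def bucket_gap_pp(history, low, high):
    """Mean forecast minus observed hit rate, in percentage points,
    for forecasts with low <= probability < high.
    A positive result means the forecaster overestimates in that bucket."""
    in_bucket = [(p, o) for p, o in history if low <= p < high]
    if not in_bucket:
        return None  # no forecasts fell in this bucket
    mean_forecast = sum(p for p, _ in in_bucket) / len(in_bucket)
    hit_rate = sum(o for _, o in in_bucket) / len(in_bucket)
    return 100 * (mean_forecast - hit_rate)


# Hypothetical (probability, outcome) pairs; outcome is 1 if the claim came true.
history = [(0.65, 0), (0.62, 1), (0.68, 0), (0.66, 1), (0.64, 0), (0.90, 1)]
gap = bucket_gap_pp(history, 0.60, 0.70)
```

For this made-up sample the five forecasts in the bucket average 65% while only two of five came true, so the gap is 25pp.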

Advertising / GTM

Will the Q1 Meta campaign achieve ≥15% conversion rate?

Type: conversion

Growth Lead: 70%

Claude: 52%

Historical baseline: 58%

TRUE: Campaign converted at 17.3%. Threshold met.

Calibration Result

Growth Lead error: 0.30 (directionally right and the closest of the three sources, though still 30pp short of the outcome). Claude underestimated by 48pp. Baseline underestimated by 42pp.
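Signed error (forecast minus outcome) makes the direction of each miss explicit. A minimal sketch using the three forecasts from this example, assuming a TRUE outcome is coded as 1. The variable names are illustrative.

```python
forecasts = {"Growth Lead": 0.70, "Claude": 0.52, "Historical baseline": 0.58}
outcome = 1.0  # TRUE: campaign converted at 17.3%

# Negative values mean the forecast sat below the realized outcome.
signed_pp = {name: round(100 * (p - outcome)) for name, p in forecasts.items()}

for name, err in signed_pp.items():
    print(f"{name}: {err:+d}pp")
```

When the outcome is TRUE, every forecast below 100% has a negative signed error, so the ranking depends only on how close each source came to 1.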

What GENYS Learns

Claude systematically underestimates the probability of success on conversion decisions by 9pp on average. Your Growth Lead is well-calibrated on campaign outcomes.

Product / Strategy

Will the $149/mo pricing tier reach 25 paying customers in 60 days?

Type: pricing

PM: 55%

GPT-4o: 48%

Sales Lead: 72%

FALSE: 18 customers after 60 days. Threshold not met.

Calibration Result

Sales Lead error: 0.72 (large). PM error: 0.55. GPT-4o error: 0.48 (most accurate). Team pricing Brier: 0.38.
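Ranking the sources and averaging their squared errors for this single question takes only a few lines. This is a sketch, assuming a FALSE outcome is coded as 0. The single-question mean computed here need not equal the 0.38 team pricing Brier quoted above, which presumably aggregates other pricing questions as well; that reading is an assumption.

```python
forecasts = {"PM": 0.55, "GPT-4o": 0.48, "Sales Lead": 0.72}
outcome = 0.0  # FALSE: 18 customers after 60 days

# Absolute error per source, then sort so the most accurate comes first.
abs_error = {name: abs(p - outcome) for name, p in forecasts.items()}
ranking = sorted(abs_error, key=abs_error.get)

# Mean Brier over just these three forecasts.
question_brier = sum((p - outcome) ** 2 for p in forecasts.values()) / len(forecasts)
```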

What GENYS Learns

Sales overestimates the probability of success on pricing decisions by 22pp on average. GPT-4o is the most reliable source for pricing predictions on your team. PM is close to baseline.

AI Model Evaluation

Will the fine-tuned support model improve resolution rate by ≥20%?

Type: technology

ML Engineer: 80%

Claude: 45%

GPT-4o: 62%

Baseline A/B: 38%

FALSE: Resolution rate improved 11%. Below the 20% threshold.

Calibration Result

ML Engineer error: 0.80 (extreme overconfidence). Baseline A/B error: 0.38 (closest to reality). Technology Brier across team: 0.41.

What GENYS Learns

ML Engineers overestimate technology outcomes by 31pp. The A/B baseline was the most accurate signal. Claude was second-best.