How it Works
GENYS records probabilistic claims from humans, AI models, markets, and operational systems, resolves them against real outcomes, and measures calibration over time.
The Core Loop
Claim
A probabilistic prediction is recorded with a specific probability, time horizon, and success criteria. The user must commit a number.
Probability
The claim is anchored by the user's estimate. External signals from AI models, prediction markets, or operational baselines are ingested alongside it.
Governance
Rules evaluate the probability against thresholds. Decisions below risk floors are flagged. Decisions above scaling thresholds are cleared for execution.
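A minimal sketch of that rule evaluation. The threshold names, values, and the middle "review" state are illustrative placeholders, not GENYS's actual configuration:

```python
# Illustrative thresholds -- not GENYS's real configuration.
RISK_FLOOR = 0.30        # decisions below this are flagged
SCALE_THRESHOLD = 0.75   # decisions at or above this are cleared

def governance_status(probability: float) -> str:
    """Map a committed probability to a governance decision."""
    if probability < RISK_FLOOR:
        return "flagged"     # below the risk floor
    if probability >= SCALE_THRESHOLD:
        return "cleared"     # above the scaling threshold
    return "review"          # between thresholds: needs judgment

print(governance_status(0.20))  # flagged
print(governance_status(0.80))  # cleared
```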
Outcome
When the time horizon is reached, the decision resolves against reality. The outcome is locked and immutable. Overdue resolutions degrade the user's calibration score.
Calibration
Prediction error is computed, and the Brier score, directional bias, and confidence-bucket accuracy are updated. The system learns where you are right and where you are wrong.
What Counts as a Claim
Human estimates
"I believe this campaign will convert at 15% or higher" → 65%
AI model outputs
GPT-4o predicts 72% likelihood. Claude predicts 61%. Both are recorded.
Market prices
Polymarket prices ETH breaking $4k at 68%. Ingested as a reference signal.
Operational forecasts
"Shopify App Store will drive 40%+ of signups" → treated as a probabilistic claim, not a plan.
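Whatever the source, each signal can be reduced to the same record shape at ingestion. The field names are illustrative, and the AI claims' proposition text is a placeholder since only their probabilities are given above:

```python
# Illustrative normalization: every source becomes one record shape.
claims = [
    {"source": "human",      "claim": "Campaign converts at 15%+", "p": 0.65},
    {"source": "ai:gpt-4o",  "claim": "(placeholder proposition)", "p": 0.72},
    {"source": "ai:claude",  "claim": "(placeholder proposition)", "p": 0.61},
    {"source": "polymarket", "claim": "ETH breaks $4k",            "p": 0.68},
]

# Every record commits a number, regardless of who produced it.
assert all(0.0 <= c["p"] <= 1.0 for c in claims)
```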
Resolution Discipline
When a decision's time horizon is reached, it must be resolved against reality. Did it happen or not?
Outcomes are locked on resolution. A database trigger prevents mutation of the outcome, probability, or resolution source after the fact.
Unresolved decisions past their time horizon are flagged as overdue and incur a calibration penalty equivalent to a bad prediction. You cannot game calibration by selectively avoiding resolution.
Resolution sources are tracked: user, external, polymarket_auto, or system_overdue.
Calibration
Brier Score
Mean squared error of predicted probabilities versus 0/1 outcomes. 0 is perfect. 1 is worst. Lower is better.
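In code the score is one line. A sketch, assuming outcomes are encoded 1 for happened and 0 for not:

```python
def brier_score(probabilities: list[float], outcomes: list[int]) -> float:
    """Mean squared error of predicted probabilities vs 0/1 outcomes."""
    n = len(probabilities)
    return sum((p - o) ** 2 for p, o in zip(probabilities, outcomes)) / n

# One confident hit and one correctly doubted miss: a low (good) score.
brier_score([0.9, 0.2], [1, 0])  # about 0.025
```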
Directional Bias
Do you overestimate or underestimate? Positive = you overestimate. Negative = you underestimate.
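Bias is the signed counterpart of the same error. A sketch under the same 0/1 outcome encoding:

```python
def directional_bias(probabilities: list[float], outcomes: list[int]) -> float:
    """Mean signed error: positive means stated probabilities run high."""
    n = len(probabilities)
    return sum(p - o for p, o in zip(probabilities, outcomes)) / n

# Said 80% and 70%, but only one of the two happened: bias ~ +0.25.
directional_bias([0.8, 0.7], [1, 0])
```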
Confidence Buckets
"When you say 60–70%, outcomes occur 52% of the time." 10 buckets from 0–100%.
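A minimal bucketing sketch. The key format and the edge handling (a probability of exactly 1.0 folds into the top bucket) are implementation assumptions:

```python
def bucket_accuracy(probabilities: list[float], outcomes: list[int],
                    n_buckets: int = 10) -> dict[str, float]:
    """Observed hit rate per stated-confidence bucket."""
    buckets: list[list[int]] = [[] for _ in range(n_buckets)]
    for p, o in zip(probabilities, outcomes):
        idx = min(int(p * n_buckets), n_buckets - 1)  # 1.0 -> top bucket
        buckets[idx].append(o)
    return {f"{int(i * 100 / n_buckets)}-{int((i + 1) * 100 / n_buckets)}%":
            sum(b) / len(b)
            for i, b in enumerate(buckets) if b}

# Three claims stated in the 60-70% range; one of them happened.
bucket_accuracy([0.65, 0.62, 0.68], [1, 0, 0])  # hit rate ~0.33 in "60-70%"
```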
Category Breakdown
Separate calibration per domain: crypto, pricing, political, technology. See where you are accurate and where you are not.
Why This Matters
AI models produce confidence scores. Markets produce prices. Teams produce forecasts. Almost none of these are tracked against outcomes over time. Without structured resolution and scoring, there is no learning. GENYS creates a persistent forecasting record that compounds accuracy through feedback.