A worked example
How Submind works
Submind is an investigator-grade verification layer for the agent economy: forward-looking predictions with auditable citation chains. Instead of explaining that in the abstract, this page walks one realistic question end to end so you can read the chain yourself. By the bottom you will know how to read any prediction Submind produces.
Along the way: how we decompose a claim, how we filter sources, what credibility tiers mean, how the probability is synthesized, what a counterfactual buys you, and how Submind inferred beyond what the sources directly said.
Will SpaceX launch a crewed mission to Mars by the end of 2030?
Well-formed, with a date the world will eventually answer
A question only enters the system if it can be checked against a real outcome. “Will SpaceX launch a crewed Mars mission by the end of 2030?” has a clear deadline (December 31, 2030) and a yes/no answer, so a future observer can grade us against what actually happened. “Is Mars worth visiting?” would be rejected here: there’s no way to score it.
Five smaller pieces, each one easy to research on its own
The question is split into five pieces we can research separately: Starship hardware progress, regulatory clearance, life-support readiness, funding and crew, and stepping-stone missions. Each piece carries its own weight (how much the overall answer depends on it) and its own evidence. The bar chart below shows how each piece landed.
Claim breakdown
Every sub-claim with its own probability and the split of sources for vs against. A lopsided split with high confidence is worth a drill-in; balanced splits are where the engine’s aggregation did the real work.
12 sources across 9 different sites, filtered for relevance and diversity
For each piece, Submind searches the web and then runs three filters: drop results that aren’t actually about the question, catch sources where the headline doesn’t match the body, and prevent any single site (Wikipedia, wire copies) from dominating. The donut shows the mix of sources that survived the filters, grouped by how primary the source is.
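The three filters can be sketched as a single pass over the search results. This is a minimal illustration, not Submind's implementation: the `Source` fields, the relevance threshold of 0.5, and the cap of two results per site are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Source:
    url: str
    site: str                    # e.g. "reuters.com"
    relevance: float             # 0..1, assumed to be computed upstream
    headline_matches_body: bool  # assumed output of a separate headline/body check


def filter_sources(sources, min_relevance=0.5, max_per_site=2):
    """Apply the three filters: relevance, headline/body match, per-site cap."""
    kept = []
    per_site = defaultdict(int)
    # Visit the most relevant results first so the cap keeps each site's best ones.
    for s in sorted(sources, key=lambda s: s.relevance, reverse=True):
        if s.relevance < min_relevance:
            continue  # filter 1: not actually about the question
        if not s.headline_matches_body:
            continue  # filter 2: headline does not match the body
        if per_site[s.site] >= max_per_site:
            continue  # filter 3: this site already has its share
        per_site[s.site] += 1
        kept.append(s)
    return kept
```

Sorting before applying the cap is a design choice of this sketch: it means a dominant site contributes its most relevant results rather than whichever ones the search happened to return first.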
Five wire-service copies of the same article count as one, not five
Each source gets a tier (primary reporting, secondary, tertiary) and an independence score that accounts for how unique it is. The example averages 0.68, which is healthy. A lower number would mean we’re reading a filter bubble rather than a debate. The histogram shows how independent each source actually is.
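One way to make "five copies count as one" concrete is to group sources by a content fingerprint and give each source a score of one over the size of its group. The sketch below assumes exact matching after normalization; a real system would more plausibly use fuzzy similarity, and the function names here are hypothetical.

```python
from collections import Counter
import hashlib
import re


def fingerprint(text: str) -> str:
    """Hash of the text with case, punctuation, and extra whitespace removed."""
    normalized = re.sub(r"\W+", " ", text.lower()).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def independence_scores(texts):
    """Score each text as 1 / (number of texts sharing its fingerprint)."""
    prints = [fingerprint(t) for t in texts]
    counts = Counter(prints)
    return [1.0 / counts[p] for p in prints]
```

Under this scheme the scores of a group of duplicates always sum to one, so summing all scores gives the number of distinct articles, and the mean score falls as duplication rises.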
22% likely, with the answer realistically anywhere from 14% to 31%
The pieces are weighted and combined with a measure of how much they agree. Hardware progress is real (~35% just for the uncrewed precursor), but every other piece pushes back hard: regulation, life-support, and stepping-stones all land in the ~20–30% range. The weighted answer lands around 22%, with a range of 14% to 31%; the range is that wide mainly because the regulatory and life-support signals disagreed the most.
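As a rough sketch of "weighted and combined with a measure of how much they agree", the code below takes a weighted mean of the sub-claim probabilities and uses the weighted standard deviation as the disagreement measure. Both choices are assumptions for illustration, as are the input numbers; the sketch uses a symmetric band for simplicity and is not expected to reproduce the figures on this page.

```python
import math


def synthesize(subclaims):
    """subclaims: list of (probability, weight) pairs.

    Returns (estimate, low, high), where the band widens as the
    sub-claims disagree more.
    """
    total = sum(w for _, w in subclaims)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    mean = sum(p * w for p, w in subclaims) / total
    # Weighted variance around the mean: larger when the pieces disagree.
    var = sum(w * (p - mean) ** 2 for p, w in subclaims) / total
    spread = math.sqrt(var)
    return mean, max(0.0, mean - spread), min(1.0, mean + spread)


# Illustrative inputs only: (probability, weight) per sub-claim.
example = [
    (0.35, 0.30),  # hardware progress
    (0.20, 0.20),  # regulatory clearance
    (0.20, 0.20),  # life-support readiness
    (0.15, 0.15),  # funding and crew
    (0.25, 0.15),  # stepping-stone missions
]
estimate, low, high = synthesize(example)
```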
“Expect no” in plain English, with a short narrative and the key findings
The bottom line reads as a sentence, not a number. It names the verdict and the confidence band in one line. Below that, a short narrative explains why, and a row of finding chips shows which pieces are carrying the answer — so you can drill straight into the strongest evidence.
SpaceX has demonstrated rapid progress on Starship but has not yet flown a crewed orbital mission. Independent spaceflight analysts expect at minimum a second uncrewed Mars transit before any human landing, and the 2026–2028 launch windows look more plausible than 2030 for that uncrewed precursor. Regulatory, life-support, and crew-training timelines add years on top of the hardware track. The pipeline lands on roughly one-in-five because the hardware trajectory is real but every non-hardware sub-claim pushes back.
Re-run the math after removing the strongest sources for each side
For every piece with meaningful evidence, Submind re-runs the math with the top supporting (or opposing) sources removed, and with a few hypothetical new sources added, to report the smallest change that would flip the answer. It’s not LLM speculation; it’s the same weighted-evidence math run on a different source mix.
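The removal half of that re-run can be sketched as follows. The evidence model here is a deliberately simple stand-in (share of total source weight that supports the claim, with 0.5 as the flip threshold), and the function names are hypothetical; adding hypothetical new sources would follow the same pattern with an append instead of a pop.

```python
def support_share(sources):
    """sources: list of (stance, weight); stance is +1 (supports) or -1 (opposes)."""
    total = sum(w for _, w in sources)
    if total == 0:
        return 0.5
    return sum(w for s, w in sources if s > 0) / total


def removals_to_flip(sources, threshold=0.5):
    """Count how many of the strongest leading-side sources must be dropped
    before the verdict crosses the threshold. Returns None if it never does."""
    baseline_yes = support_share(sources) >= threshold
    leading = 1 if baseline_yes else -1
    remaining = sorted(sources, key=lambda sw: sw[1], reverse=True)
    removed = 0
    while True:
        # Strongest remaining source on the side that currently wins.
        idx = next((i for i, (s, _) in enumerate(remaining) if s == leading), None)
        if idx is None:
            return None
        remaining.pop(idx)
        removed += 1
        if not remaining:
            return None  # no evidence left to support any verdict
        if (support_share(remaining) >= threshold) != baseline_yes:
            return removed
```

A small return value signals a fragile verdict that rests on one or two sources; `None` means no amount of removal on the leading side changes the answer.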
What would flip this verdict
Baseline: 28%
Not speculation. Each row re-runs the weighted-evidence model with one input changed, so you can see exactly how fragile (or sturdy) the probability is.
When the deadline arrives, we read the outcome and score this prediction
On January 1, 2031, we’ll read the world, record what actually happened, and grade this prediction. The track record on /scores is built from those same grades: the same pipeline that produces every answer also produces the accountability number, so the accuracy score can’t drift from the forecasts.
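This page does not say which scoring rule the grades use. A common choice for yes/no forecasts is the Brier score, the mean squared gap between the stated probability and the outcome, so the sketch below assumes it purely for illustration.

```python
def brier_score(forecasts):
    """forecasts: list of (probability, outcome), outcome 1 if it happened, else 0.

    Lower is better: 0.0 is perfect, and always answering 50% scores 0.25.
    """
    if not forecasts:
        raise ValueError("need at least one resolved forecast")
    return sum((p - o) ** 2 for p, o in forecasts) / len(forecasts)
```

Under this rule, a 22% forecast for an event that does not happen contributes 0.22 squared, about 0.048, while the same forecast for an event that does happen contributes 0.78 squared, about 0.608.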
Ready to try your own?
Drop a forward-looking, resolvable question and watch the pipeline run.
Ask a question →