A worked example

How Submind works

Submind is an investigator-grade verification layer for the agent economy: forward-looking predictions with auditable citation chains. Instead of explaining that in the abstract, this page walks one real-shaped question end-to-end so you can read the chain yourself. By the bottom you will know how to read any prediction Submind produces.

Along the way: how we decompose a claim, how we filter sources, what credibility tiers mean, how the probability is synthesized, what a counterfactual buys you, and how Submind inferred beyond what the sources directly said.

Curated teaching example. This is not a live prediction — it's a hand-verified walk-through of the pipeline on a single question. Every widget below is the same component the real result page renders.

The question

Will SpaceX launch a crewed mission to Mars by the end of 2030?

22%

Direct answer

Expect no. A small-but-real hardware trajectory is fighting every non-hardware constraint, and the non-hardware constraints win on a five-year clock.

Step 1 — The question

Well-formed, with a date the world will eventually answer

A question only enters the system if it can be checked against a real outcome. “Will SpaceX launch a crewed Mars mission by the end of 2030?” has a clear deadline (December 31, 2030) and a yes/no answer, so a future observer can grade us against what actually happened. “Is Mars worth visiting?” would be rejected here because there’s no way to score it.
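A minimal sketch of that gate, assuming a question is stored as text plus an optional resolution date and an answer type. The `Question` class, the `is_resolvable` function, and the `"yes_no"` label are illustrative names, not Submind's actual API.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Question:
    text: str
    deadline: Optional[date]  # date by which the world will have answered
    answer_type: str          # "yes_no" marks a binary question (assumed label)


def is_resolvable(q: Question, today: date) -> bool:
    """Accept only questions with a future deadline and a yes/no answer."""
    return (
        q.deadline is not None
        and q.deadline > today
        and q.answer_type == "yes_no"
    )


mars = Question(
    "Will SpaceX launch a crewed mission to Mars by the end of 2030?",
    date(2030, 12, 31),
    "yes_no",
)
worth = Question("Is Mars worth visiting?", None, "open")
```

With any `today` before the deadline, `is_resolvable(mars, today)` passes and `is_resolvable(worth, today)` fails, because the second question has neither a deadline nor a binary answer.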

Step 2 — Break it into pieces

Five smaller pieces, each one easy to research on its own

The question is split into pieces we can research separately: Starship hardware progress, regulatory clearance, life-support readiness, funding + crew, and stepping-stone missions. Each piece carries its own weight (how much the overall answer depends on it) and its own evidence. The bar chart below shows how each piece landed.

Claim breakdown

Every sub-claim with its own probability and the split of sources for vs against. A lopsided split with high confidence is worth a drill-in; balanced splits are where the engine’s aggregation did the real work.

Claim #1: 35% · for 1 · against 2

Claim #2: 18% · for 0 · against 2

Claim #3: 17% · for 0 · against 3

Claim #4: 44% · for 1 · against 1

Claim #5: 30% · for 0 · against 2
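One way to hold the breakdown in code, as a sketch only. The probabilities and source counts are the five rows above in order; the `SubClaim` class and the definition of "lopsided" (all sources on one side) are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class SubClaim:
    number: int         # position in the breakdown, 1-based
    probability: float  # sub-claim probability shown in the bar chart
    n_for: int          # sources supporting the sub-claim
    n_against: int      # sources opposing the sub-claim

    @property
    def lopsided(self) -> bool:
        """True when every source sits on one side (assumed definition)."""
        return (self.n_for == 0) != (self.n_against == 0)


claims = [
    SubClaim(1, 0.35, 1, 2),
    SubClaim(2, 0.18, 0, 2),
    SubClaim(3, 0.17, 0, 3),
    SubClaim(4, 0.44, 1, 1),
    SubClaim(5, 0.30, 0, 2),
]

lopsided = [c.number for c in claims if c.lopsided]
balanced = [c.number for c in claims if c.n_for == c.n_against]
total_sources = sum(c.n_for + c.n_against for c in claims)
```

Under that definition, claims 2, 3, and 5 are the lopsided ones and claim 4 is the only balanced split. The per-claim counts sum to the 12 sources reported in Step 3.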

Step 3 — Gather the evidence

12 sources across 9 different sites, filtered for relevance and diversity

For each piece, Submind searches the web and then runs three filters: drop results that aren’t actually about the question, catch sources where the headline doesn’t match the body, and prevent any single site (Wikipedia, wire copies) from dominating. The donut shows the mix of sources that survived the filters, grouped by how primary the source is.

Source credibility mix: 12 unique sources

Primary: 6 (50%)

Secondary: 6 (50%)

Tertiary: 0 (0%)

Unknown: 0 (0%)
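The third filter, keeping any single site from dominating, can be sketched as a per-domain cap. This is an illustration under assumptions: the function name `cap_per_domain`, the cap of 2, and the example.com URLs are all hypothetical, and the relevance and headline-mismatch filters are not shown.

```python
from collections import Counter
from urllib.parse import urlparse


def cap_per_domain(urls, max_per_domain=2):
    """Keep results in their original order, dropping any beyond the per-site cap."""
    seen = Counter()
    kept = []
    for url in urls:
        # Treat "www.example.com" and "example.com" as the same site.
        domain = urlparse(url).netloc.lower().removeprefix("www.")
        if seen[domain] < max_per_domain:
            seen[domain] += 1
            kept.append(url)
    return kept


results = [
    "https://www.example.com/a",
    "https://example.com/b",
    "https://example.com/c",   # third hit from the same site: dropped
    "https://example.org/report",
]
kept = cap_per_domain(results)
```

Keeping the original order means the highest-ranked results from each site survive the cap. `str.removeprefix` needs Python 3.9 or later.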

Step 4 — Weigh each source

Five wire-service copies of the same article count as one, not five

Each source gets a tier (primary reporting, secondary, tertiary) and an independence score that accounts for how unique it is. The example averages 0.68, which is healthy. A lower number would mean we’re reading a filter bubble rather than a debate. The histogram shows how independent each source actually is.

Independence distribution · avg 76%
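The "five copies count as one" rule can be illustrated with a simple stand-in: fingerprint each article body and give every source a weight of 1/k, where k is the number of sources sharing its fingerprint. This is an assumption-laden sketch, not Submind's scoring. It catches only exact copies after normalization, and the sample texts are made up.

```python
import hashlib
import re
from collections import Counter


def fingerprint(body: str) -> str:
    """Hash of the text with case, punctuation, and spacing normalized away."""
    words = re.findall(r"[a-z0-9]+", body.lower())
    return hashlib.sha256(" ".join(words).encode()).hexdigest()


def independence_weights(bodies):
    """Give each source 1/k, where k is how many sources share its text."""
    prints = [fingerprint(b) for b in bodies]
    counts = Counter(prints)
    return [1 / counts[p] for p in prints]


# Five syndicated copies of one story plus one unrelated article (made-up text).
bodies = ["Starship test slips to Q3."] * 5 + ["Panel flags radiation risk."]
weights = independence_weights(bodies)
total_weight = sum(weights)
```

The five copies receive 0.2 each and together contribute the same weight as the single independent article. A production system would likely need fuzzy matching to catch lightly edited wire copy.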

Step 5 — Add it up

22% likely, with the answer realistically anywhere from 14% to 31%

The pieces are weighted and combined with a measure of how much they agree. Hardware progress is real (~35% just for the uncrewed precursor) but every other piece pushes back hard — regulation, life-support, stepping-stones all land in the ~20–30% range. The weighted answer lands around 22%, with the range wider on the low side because the regulatory and life-support signals disagreed the most.

Probability

CI low 14% · CI high 31%
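To make "weighted and combined" concrete, here is a plain weighted mean over the five sub-claim probabilities from the claim breakdown. This page does not publish the actual weights or the agreement term, so the `combine` function and both weight vectors below are assumptions for illustration.

```python
def combine(probabilities, weights):
    """Weighted mean of sub-claim probabilities."""
    total = sum(weights)
    return sum(p * w for p, w in zip(probabilities, weights)) / total


sub_claims = [0.35, 0.18, 0.17, 0.44, 0.30]  # claims #1 to #5 from the breakdown

# Equal weights: every piece counts the same.
equal = combine(sub_claims, [1, 1, 1, 1, 1])

# Hypothetical weights that lean on claims #2 and #3.
leaning = combine(sub_claims, [1, 2, 3, 1, 1])
```

Equal weights give about 29%, and the hypothetical leaning weights give about 24.5%. Under a plain weighted mean, reaching the published 22% would require putting more weight on the lower-probability pieces.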

Step 6 — Write the answer

“Expect no” in plain English, with a short narrative and the key findings

The bottom line reads as a sentence, not a number. It names the verdict and the confidence band in one line. Below that, a short narrative explains why, and a row of finding chips shows which pieces are carrying the answer — so you can drill straight into the strongest evidence.

Bottom line

Lean no (22%, Grade F)

SpaceX has demonstrated rapid progress on Starship but has not yet flown a crewed orbital mission. Independent spaceflight analysts expect at minimum a second uncrewed Mars transit before any human landing, and the 2026–2028 launch windows look more plausible than 2030 for that uncrewed precursor. Regulatory, life-support, and crew-training timelines add years on top of the hardware track. The pipeline lands on roughly one-in-five because the hardware trajectory is real but every non-hardware sub-claim pushes back.

Key findings

Supports the answer · 3 sources

Supports the answer · 2 sources

Supports the answer · 3 sources

Supports the answer · 2 sources

Step 7 — What would flip this

Re-run the math after removing the strongest sources for each side

For every piece with meaningful evidence, Submind re-runs the math with the top supporting (or opposing) sources removed, and with a few hypothetical new sources added, to report the smallest change that would flip the answer. It’s not LLM speculation; it’s the same weighted-evidence math run on a different source mix.

What would flip this verdict

baseline 28%

Not speculation. Each row re-runs the weighted-evidence model with one input changed — so you can see exactly how fragile (or sturdy) the probability is.

Remove top source (nationalacademies.org) from claim #3

weight 0.63 · against

17%→2%

Δ -15%

Flip claim #3

28%→42%

Δ +14%

Add 4 for-sources at credibility 0.7 to claim #3

17%→56%

Δ +39%
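The mechanics of such a re-run can be sketched with a stand-in evidence model: the probability is the weighted share of supporting sources, smoothed toward a prior. This is not Submind's model and will not reproduce the numbers in the rows above. Only the 0.63 weight and the "4 for-sources at 0.7" scenario come from this page; the other weights, the prior, and the function names are hypothetical.

```python
def claim_probability(sources, prior=0.5, prior_weight=1.0):
    """Weighted share of supporting evidence, smoothed toward a prior."""
    support = sum(w for w, side in sources if side == "for")
    total = sum(w for w, _ in sources)
    return (support + prior * prior_weight) / (total + prior_weight)


def without_top(sources, side):
    """Return a copy of the source list minus the heaviest source on one side."""
    candidates = [s for s in sources if s[1] == side]
    if not candidates:
        return list(sources)
    top = max(candidates, key=lambda s: s[0])
    rest = list(sources)
    rest.remove(top)
    return rest


# Hypothetical claim with three opposing sources as (weight, side) pairs.
sources = [(0.63, "against"), (0.50, "against"), (0.40, "against")]

baseline = claim_probability(sources)
removed = claim_probability(without_top(sources, "against"))
added = claim_probability(sources + [(0.7, "for")] * 4)
```

Each scenario calls the same `claim_probability` function on a modified source list, which is the property the widget describes: one input changes, the model does not.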

Step 8 — Score ourselves when the world answers

When the deadline arrives, we read the outcome and score this prediction

On January 1, 2031 we’ll read the world, record what actually happened, and grade this prediction. The track record on /scores is built from those same grades: the same pipeline that produces every answer also produces the accountability number, so the accuracy score can’t drift from the forecasts.
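This page does not say which scoring rule produces the grade. As an illustration only, the Brier score is a common choice for yes/no forecasts: the squared gap between the stated probability and the 0/1 outcome. The sketch below assumes that rule and applies it to this example's 22%.

```python
def brier_score(forecast: float, outcome: bool) -> float:
    """Squared error between the forecast and the 0/1 outcome (lower is better)."""
    return (forecast - (1.0 if outcome else 0.0)) ** 2


if_no = brier_score(0.22, False)   # no crewed launch by the deadline
if_yes = brier_score(0.22, True)   # crewed launch happens
```

Under this rule a 22% forecast scores about 0.048 if the answer is no and about 0.608 if it is yes, so being confidently wrong costs far more than being confidently right earns.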

Ready to try your own?

Drop a forward-looking, resolvable question and watch the pipeline run.
