The Four-Round Deliberation Process Explained
Legion Mode isn't just "ask five AIs and average the answers." It's a structured deliberation process inspired by how humans make better decisions through debate, peer review, and synthesis. Here's exactly what happens when you submit a query.
The four rounds at a glance: Independent Answers → Cross-Examination → Refinement → Synthesis.
Round 1: Independent Answers
Your question goes to all five models simultaneously: Claude, GPT-4o, Gemini, Grok, and DeepSeek. Each model answers independently, without seeing what the others said.
Why this matters: Independence prevents groupthink. If models could see each other's answers first, they'd anchor on the first response and produce similar outputs. By forcing independence, we get genuine diversity of perspective.
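To make the fan-out concrete, here's a minimal sketch of what Round 1 could look like in code. It's illustrative only: `call_model` is a stand-in for each provider's real API, and the model list and prompt handling are assumptions, not Legion's actual implementation.

```python
import asyncio

MODELS = ["claude", "gpt-4o", "gemini", "grok", "deepseek"]

async def call_model(model: str, prompt: str) -> str:
    # Stand-in for the real provider call (SDK or HTTP request).
    # In Round 1 each model sees only the user's question, nothing from its peers.
    await asyncio.sleep(0)  # placeholder for network latency
    return f"[{model}'s independent answer to: {prompt!r}]"

async def round_one(question: str) -> dict[str, str]:
    # Fire all five requests concurrently; answers come back independently.
    answers = await asyncio.gather(*(call_model(m, question) for m in MODELS))
    return dict(zip(MODELS, answers))

# Example: asyncio.run(round_one("Should we migrate to microservices?"))
```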
What each model brings:
Claude (Anthropic)
Nuanced, careful reasoning. Strong on ethics and edge cases. Sometimes overly cautious.
GPT-4o (OpenAI)
Confident, direct answers. Broad knowledge. Occasionally hallucinates with confidence.
Gemini (Google)
Strong on factual recall and current events. Good at structured information. Can oversimplify.
Grok (xAI)
Contrarian perspective. Willing to challenge assumptions. Sometimes prioritizes wit over substance.
DeepSeek
Strong logical reasoning and analysis. Excellent on technical topics. Less personality in responses.
Round 2: Cross-Examination
This is where it gets interesting. Each model receives all five Round 1 answers and is asked to critique them. The prompt is explicit: identify errors, gaps, unsupported claims, and weak reasoning.
Why this matters: AI models are surprisingly good at catching each other's mistakes. Claude might notice GPT-4o stated a "fact" that's actually outdated. GPT-4o might point out that Claude hedged so much the answer became useless. Gemini might catch that DeepSeek's logic has a flaw.
What gets caught in cross-examination:
- Hallucinations: Confidently stated "facts" that aren't true
- Logical errors: Arguments that don't actually follow
- Missing context: Answers that ignore important considerations
- Overconfidence: Certainty where uncertainty is warranted
- Excessive hedging: Caution that makes the answer unhelpful
- Bias: Perspectives skewed by training data
The cross-examination round often produces the most valuable insights. The critiques themselves can be more useful than the original answers.
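Here's a rough sketch of how a cross-examination packet might be assembled. The instruction wording and the `build_critique_prompt` helper are assumptions for illustration, not the exact prompts Legion uses.

```python
CRITIQUE_INSTRUCTION = (
    "You are reviewing answers from five AI models, including your own. "
    "Identify errors, gaps, unsupported claims, and weak reasoning. "
    "Quote the claim you are challenging and explain why it is wrong."
)

def build_critique_prompt(question: str, round1_answers: dict[str, str]) -> str:
    # Every model receives the same packet: the question plus all five Round 1 answers.
    answers_block = "\n\n".join(
        f"--- Answer from {model} ---\n{answer}"
        for model, answer in round1_answers.items()
    )
    return f"{CRITIQUE_INSTRUCTION}\n\nQuestion: {question}\n\n{answers_block}"
```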
Round 3: Refinement
Each model receives the critiques from Round 2 and is asked to revise its original answer. The instruction is clear: address valid criticisms, defend correct points that were wrongly challenged, and incorporate insights from other models.
Why this matters: This is where quality improves dramatically. Models don't just accept all criticism; they evaluate it. If GPT-4o criticized Claude for being too cautious, Claude might defend its caution with specific reasons, or acknowledge it and provide a more direct answer.
What happens in refinement:
- Error correction: Hallucinations get fixed when caught
- Nuance addition: Oversimplified answers get more complete
- Confidence calibration: Overconfident claims get qualified appropriately
- Gap filling: Missing considerations get addressed
- Synthesis: Good ideas from other models get incorporated
After Round 3, you have five revised answers—each one better than the original because it's been stress-tested by four other perspectives.
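A refinement prompt might be built along these lines (again, the helper and wording are illustrative assumptions rather than the production prompts):

```python
def build_refinement_prompt(original_answer: str, critiques: dict[str, str]) -> str:
    # The model revises its own Round 1 answer in light of every Round 2 critique.
    critique_block = "\n\n".join(
        f"--- Critique from {critic} ---\n{text}"
        for critic, text in critiques.items()
    )
    return (
        "Revise your answer below. Address valid criticisms, defend points that "
        "were wrongly challenged, and incorporate useful insights from the other "
        "models.\n\n"
        f"Your original answer:\n{original_answer}\n\n{critique_block}"
    )
```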
Round 4: Synthesis
The final round produces a single, synthesized answer. One model (rotated for fairness) reviews all the refined answers and critiques, then produces a comprehensive response that includes:
- Consensus: What all or most models agreed on after debate
- Key insights: Unique perspectives that survived scrutiny
- Remaining disagreements: Where models still differ and why
- Final answer: A synthesized response incorporating the best elements
Why this matters: The synthesis isn't just a summary—it's a meta-analysis. It identifies what's robust (survived criticism), what's contested (models still disagree), and what's complementary (different valid perspectives on the same question).
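In code, the rotation and the synthesis prompt might look roughly like this. The `pick_synthesizer` helper and its `query_count` counter are hypothetical names used for illustration; the real rotation scheme may differ.

```python
MODELS = ["claude", "gpt-4o", "gemini", "grok", "deepseek"]

def pick_synthesizer(query_count: int) -> str:
    # Round-robin rotation so no single model always gets the last word.
    # `query_count` is a running counter (hypothetical in this sketch).
    return MODELS[query_count % len(MODELS)]

def build_synthesis_prompt(question: str, refined_answers: dict[str, str]) -> str:
    answers_block = "\n\n".join(
        f"--- Refined answer from {model} ---\n{answer}"
        for model, answer in refined_answers.items()
    )
    return (
        "Produce one comprehensive answer. State the consensus, the key insights "
        "that survived scrutiny, and any remaining disagreements with the reasons "
        "they persist, then give the final synthesized answer.\n\n"
        f"Question: {question}\n\n{answers_block}"
    )
```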
The Full Picture
When you run a Legion query, you can see all four rounds. The interface shows:
- Five initial answers (Round 1)
- Cross-examination critiques (Round 2)
- Refined answers (Round 3)
- Final synthesis (Round 4)
You can collapse to just the synthesis for a quick answer, or expand to see the full deliberation. Often, the journey is as valuable as the destination—watching how models challenge and refine each other's thinking teaches you something about the question itself.
When to Use Legion Mode
Legion Mode costs 500 tokens per query—25x more than Single Mode. It's worth it for:
- High-stakes decisions: Career moves, business strategy, important purchases
- Complex questions: Topics with multiple valid perspectives
- Fact-checking: When you need to verify accuracy
- Research: Understanding a topic from multiple angles
- Contentious topics: Where bias is a concern
For quick questions with clear answers—"What's the capital of France?"—Single Mode is faster and cheaper. For everything else, Legion Mode produces noticeably better results.
The Technical Details
For those curious about the implementation, here are the key details (a rough end-to-end sketch follows the list):
- All five models run in parallel for each round, minimizing latency
- Each round includes carefully engineered prompts for its specific purpose
- The synthesis model rotates to prevent any single model from dominating
- Context management ensures each model sees only what it should for that round
- The full deliberation typically completes in 30-60 seconds
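Putting it all together, an end-to-end sketch of the pipeline might look like the following. Everything here is illustrative: `ask` is a stub for the real provider calls, and the inline prompts compress the fuller examples shown in the round-by-round sections above.

```python
import asyncio

MODELS = ["claude", "gpt-4o", "gemini", "grok", "deepseek"]

async def ask(model: str, prompt: str) -> str:
    # Stand-in for the real provider call.
    await asyncio.sleep(0)
    return f"[{model}: {prompt[:40]}...]"

async def fan_out(prompts: dict[str, str]) -> dict[str, str]:
    # Run one round: every model answers its own prompt concurrently.
    results = await asyncio.gather(*(ask(m, p) for m, p in prompts.items()))
    return dict(zip(prompts, results))

async def legion(question: str, query_count: int = 0) -> str:
    # Round 1: independent answers (each model sees only the question).
    answers = await fan_out({m: question for m in MODELS})
    # Round 2: each model critiques the full set of Round 1 answers.
    packet = "\n".join(f"{m}: {a}" for m, a in answers.items())
    critiques = await fan_out({m: f"Critique these answers:\n{packet}" for m in MODELS})
    # Round 3: each model revises its own answer in light of the critiques.
    feedback = "\n".join(critiques.values())
    refined = await fan_out(
        {m: f"Revise your answer:\n{answers[m]}\nCritiques:\n{feedback}" for m in MODELS}
    )
    # Round 4: a rotating synthesizer merges the refined answers.
    synthesizer = MODELS[query_count % len(MODELS)]
    merged = "\n".join(f"{m}: {a}" for m, a in refined.items())
    return await ask(synthesizer, f"Synthesize a final answer:\n{merged}")

# Example: asyncio.run(legion("Should we migrate to microservices?"))
```

Running the example traces all four rounds end to end, with each round fanned out concurrently just as the list above describes.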
Try It
The best way to understand Legion Mode is to use it. Ask a question you've been thinking about—something complex, something where you'd normally consult multiple sources. Watch the models debate. See what they catch. Read the synthesis.
You'll never look at single-model AI the same way again.
Experience the deliberation process
Try Legion Mode →