The Four-Round Deliberation Process Explained
Legion Mode isn't just "ask five AIs and average the answers." It's a structured deliberation process inspired by how humans make better decisions through debate, peer review, and synthesis. Here's exactly what happens when you submit a query.
The four rounds at a glance: Independent Answers → Cross-Examination → Refinement → Synthesis.
Round 1: Independent Answers
Your question goes to all five models simultaneously: Claude, GPT-4o, Gemini, Grok, and DeepSeek. Each model answers independently, without seeing what the others said.
Why this matters: Independence prevents groupthink. If models could see each other's answers first, they'd anchor on the first response and produce similar outputs. By forcing independence, we get genuine diversity of perspective.
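To make the fan-out concrete, here's a minimal sketch of what Round 1 could look like in code. It's illustrative only: `call_model` is a stand-in for each provider's real API, and the model list and prompt handling are assumptions, not Legion's actual implementation.

```python
import asyncio

MODELS = ["claude", "gpt-4o", "gemini", "grok", "deepseek"]

async def call_model(model: str, prompt: str) -> str:
    # Stand-in for the real provider call (SDK or HTTP request).
    # In Round 1 each model sees only the user's question, nothing from its peers.
    await asyncio.sleep(0)  # placeholder for network latency
    return f"[{model}'s independent answer to: {prompt!r}]"

async def round_one(question: str) -> dict[str, str]:
    # Fire all five requests concurrently; answers come back independently.
    answers = await asyncio.gather(*(call_model(m, question) for m in MODELS))
    return dict(zip(MODELS, answers))

# Example: asyncio.run(round_one("Should we migrate to microservices?"))
```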
What each model brings:
Claude (Anthropic)
Nuanced, careful reasoning. Strong on ethics and edge cases. Sometimes overly cautious.
GPT-4o (OpenAI)
Confident, direct answers. Broad knowledge. Occasionally hallucinates with confidence.
Gemini (Google)
Strong on factual recall and current events. Good at structured information. Can oversimplify.
Grok (xAI)
Contrarian perspective. Willing to challenge assumptions. Sometimes prioritizes wit over substance.
DeepSeek
Strong logical reasoning and analysis. Excellent on technical topics. Less personality in responses.
Round 2: Cross-Examination
This is where it gets interesting. Each model receives all five Round 1 answers and is asked to critique them. The prompt is explicit: identify errors, gaps, unsupported claims, and weak reasoning.
Why this matters: AI models are surprisingly good at catching each other's mistakes. Claude might notice GPT-4o stated a "fact" that's actually outdated. GPT-4o might point out that Claude hedged so much the answer became useless. Gemini might catch that DeepSeek's logic has a flaw.
What gets caught in cross-examination:
- Hallucinations: Confidently stated "facts" that aren't true
- Logical errors: Arguments that don't actually follow
- Missing context: Answers that ignore important considerations
- Overconfidence: Certainty where uncertainty is warranted
- Excessive hedging: Caution that makes the answer unhelpful
- Bias: Perspectives skewed by training data
The cross-examination round often produces the most valuable insights. The critiques themselves can be more useful than the original answers.
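Here's a rough sketch of how a cross-examination packet might be assembled. The instruction wording and the `build_critique_prompt` helper are assumptions for illustration, not the exact prompts Legion uses.

```python
CRITIQUE_INSTRUCTION = (
    "You are reviewing answers from five AI models, including your own. "
    "Identify errors, gaps, unsupported claims, and weak reasoning. "
    "Quote the claim you are challenging and explain why it is wrong."
)

def build_critique_prompt(question: str, round1_answers: dict[str, str]) -> str:
    # Every model receives the same packet: the question plus all five Round 1 answers.
    answers_block = "\n\n".join(
        f"--- Answer from {model} ---\n{answer}"
        for model, answer in round1_answers.items()
    )
    return f"{CRITIQUE_INSTRUCTION}\n\nQuestion: {question}\n\n{answers_block}"
```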
Round 3: Refinement
Each model receives the critiques from Round 2 and is asked to revise its original answer. The instruction is clear: address valid criticisms, defend correct points that were wrongly challenged, and incorporate insights from other models.
Why this matters: This is where quality improves dramatically. Models don't just accept all criticism; they evaluate it. If GPT-4o criticized Claude for being too cautious, Claude might defend its caution with specific reasons, or acknowledge it and provide a more direct answer.
What happens in refinement:
- Error correction: Hallucinations get fixed when caught
- Nuance addition: Oversimplified answers get more complete
- Confidence calibration: Overconfident claims get qualified appropriately
- Gap filling: Missing considerations get addressed
- Synthesis: Good ideas from other models get incorporated
After Round 3, you have five revised answers—each one better than the original because it's been stress-tested by four other perspectives.
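A refinement prompt might be built along these lines (again, the helper and wording are illustrative assumptions rather than the production prompts):

```python
def build_refinement_prompt(original_answer: str, critiques: dict[str, str]) -> str:
    # The model revises its own Round 1 answer in light of every Round 2 critique.
    critique_block = "\n\n".join(
        f"--- Critique from {critic} ---\n{text}"
        for critic, text in critiques.items()
    )
    return (
        "Revise your answer below. Address valid criticisms, defend points that "
        "were wrongly challenged, and incorporate useful insights from the other "
        "models.\n\n"
        f"Your original answer:\n{original_answer}\n\n{critique_block}"
    )
```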
Round 4: Synthesis
The final round produces a single, synthesized answer. One model (rotated for fairness) reviews all the refined answers and critiques, then produces a comprehensive response that includes:
- Consensus: What all or most models agreed on after debate
- Key insights: Unique perspectives that survived scrutiny
- Remaining disagreements: Where models still differ and why
- Final answer: A synthesized response incorporating the best elements
Why this matters: The synthesis isn't just a summary—it's a meta-analysis. It identifies what's robust (survived criticism), what's contested (models still disagree), and what's complementary (different valid perspectives on the same question).
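In code, the rotation and the synthesis prompt might look roughly like this. The `pick_synthesizer` helper and its `query_count` counter are hypothetical names used for illustration; the real rotation scheme may differ.

```python
MODELS = ["claude", "gpt-4o", "gemini", "grok", "deepseek"]

def pick_synthesizer(query_count: int) -> str:
    # Round-robin rotation so no single model always gets the last word.
    # `query_count` is a running counter (hypothetical in this sketch).
    return MODELS[query_count % len(MODELS)]

def build_synthesis_prompt(question: str, refined_answers: dict[str, str]) -> str:
    answers_block = "\n\n".join(
        f"--- Refined answer from {model} ---\n{answer}"
        for model, answer in refined_answers.items()
    )
    return (
        "Produce one comprehensive answer. State the consensus, the key insights "
        "that survived scrutiny, and any remaining disagreements with the reasons "
        "they persist, then give the final synthesized answer.\n\n"
        f"Question: {question}\n\n{answers_block}"
    )
```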
The Full Picture
When you run a Legion query, you can see all four rounds. The interface shows:
- Five initial answers (Round 1)
- Cross-examination critiques (Round 2)
- Refined answers (Round 3)
- Final synthesis (Round 4)
You can collapse to just the synthesis for a quick answer, or expand to see the full deliberation. Often, the journey is as valuable as the destination—watching how models challenge and refine each other's thinking teaches you something about the question itself.
When to Use Legion Mode
Legion Mode costs 500 tokens per query—25x more than Single Mode. It's worth it for:
- High-stakes decisions: Career moves, business strategy, important purchases
- Complex questions: Topics with multiple valid perspectives
- Fact-checking: When you need to verify accuracy
- Research: Understanding a topic from multiple angles
- Contentious topics: Where bias is a concern
For quick questions with clear answers—"What's the capital of France?"—Single Mode is faster and cheaper. For everything else, Legion Mode produces noticeably better results.
The Technical Details
For those curious about the implementation, here are the key details (a rough end-to-end sketch follows the list):
- All five models run in parallel for each round, minimizing latency
- Each round includes carefully engineered prompts for its specific purpose
- The synthesis model rotates to prevent any single model from dominating
- Context management ensures each model sees only what it should for that round
- The full deliberation typically completes in 30-60 seconds
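Putting it all together, an end-to-end sketch of the pipeline might look like the following. Everything here is illustrative: `ask` is a stub for the real provider calls, and the inline prompts compress the fuller examples shown in the round-by-round sections above.

```python
import asyncio

MODELS = ["claude", "gpt-4o", "gemini", "grok", "deepseek"]

async def ask(model: str, prompt: str) -> str:
    # Stand-in for the real provider call.
    await asyncio.sleep(0)
    return f"[{model}: {prompt[:40]}...]"

async def fan_out(prompts: dict[str, str]) -> dict[str, str]:
    # Run one round: every model answers its own prompt concurrently.
    results = await asyncio.gather(*(ask(m, p) for m, p in prompts.items()))
    return dict(zip(prompts, results))

async def legion(question: str, query_count: int = 0) -> str:
    # Round 1: independent answers (each model sees only the question).
    answers = await fan_out({m: question for m in MODELS})
    # Round 2: each model critiques the full set of Round 1 answers.
    packet = "\n".join(f"{m}: {a}" for m, a in answers.items())
    critiques = await fan_out({m: f"Critique these answers:\n{packet}" for m in MODELS})
    # Round 3: each model revises its own answer in light of the critiques.
    feedback = "\n".join(critiques.values())
    refined = await fan_out(
        {m: f"Revise your answer:\n{answers[m]}\nCritiques:\n{feedback}" for m in MODELS}
    )
    # Round 4: a rotating synthesizer merges the refined answers.
    synthesizer = MODELS[query_count % len(MODELS)]
    merged = "\n".join(f"{m}: {a}" for m, a in refined.items())
    return await ask(synthesizer, f"Synthesize a final answer:\n{merged}")

# Example: asyncio.run(legion("Should we migrate to microservices?"))
```

Running the example traces all four rounds end to end, with each round fanned out concurrently just as the list above describes.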
Try It
The best way to understand Legion Mode is to use it. Ask a question you've been thinking about—something complex, something where you'd normally consult multiple sources. Watch the models debate. See what they catch. Read the synthesis.
You'll never look at single-model AI the same way again.
Experience the deliberation process
Try Legion Mode →