There is a pattern in how business professionals use AI tools that almost nobody talks about, probably because once you see it, it’s hard to unsee and it implicates almost every AI-assisted deliverable your team has produced in the last two years.
The pattern is this: you use Claude, or ChatGPT, or Gemini to help you draft your Q2 campaign brief. You write it collaboratively, iterating on sections, adjusting the tone, tightening the KPI language. The document feels thorough because you had a sophisticated conversation partner during its creation. Then, because you want a final check, you paste the document back into the same chat window and ask: “Does this look solid? Anything I’m missing?”
The AI tells you it looks solid. Maybe it offers one or two small tweaks. You ship it.
This is not a review. It is correlated confidence, and it is functionally indistinguishable from not reviewing the document at all.
Why the same model can’t review its own output
The reasoning is straightforward once you’ve heard it. Large language models have characteristic gaps in their reasoning — patterns, assumptions, and failure modes that are consistent across their output because they’re baked into the model’s training. If a model systematically underestimates the importance of defining out-of-scope work in a consulting SOW, that gap will appear both when it writes the SOW and when it reviews the SOW. It’s not lying to you. It genuinely doesn’t see the problem, for the same reason it didn’t put it in the document.
Human organisations solved this problem decades ago. We call it peer review, code review, four-eyes principle, independent audit. The entire mechanism rests on one idea: a different person with a different frame of reference catches things the first person missed. A finance controller reviews the campaign budget not because the marketing VP is incompetent, but because the controller brings a different set of concerns to the same document.
AI tools, as typically deployed in business workflows, skip this step entirely. Or rather: they perform a theatrical version of it, where the model reviews itself, fails to see its own gaps, and returns an affirmative verdict that feels like validation but carries none of the signal.
What correlated failure looks like in practice
A few concrete examples of the kinds of gaps that survive single-model self-review:
In marketing campaign briefs:
KPI definitions that conflate marketing qualified leads with sales qualified leads, because the model that wrote “MQL target: 400” didn’t ask who counts as qualified and neither did the model reviewing it. That ambiguity will cost you a quarterly business review conversation you don’t want to have.
In financial forecasts:
Assumptions baked in at the line-item level that are inconsistent with assumptions at the summary level — the kind of thing a second analyst catches in twenty minutes and the originating model approves because it sees each section as coherent in isolation.
In consulting SOWs:
Out-of-scope language that is technically present but defined so narrowly that the client will reasonably interpret half the project as in-scope. The model that wrote “in scope: phased digital transformation of customer-facing systems” did not ask what “customer-facing” means in this client’s architecture, and neither did the model reviewing it.
In research protocols:
A sampling methodology that looks clean in the abstract but fails to account for the specific exclusion criteria needed to make the study PRISMA-compliant. Two words from reviewers at the journal submission stage: “major revisions.”
In compliance audits:
A GDPR checklist that covers data collection and processing but has no owner assigned to the “right to erasure” control. The compliance team finds this in the audit. The external regulator finds this at exactly the wrong moment.
The mechanism that fixes it
You need independent reviewers. Not different prompts to the same model. Not a different temperature setting. Different models, with genuinely different reasoning architectures, reviewing the same document with the same criteria and no knowledge of each other’s conclusions.
This is what Lope does.
Lope is a structured validation runner. You give it a goal or a document. It structures the work into phases with explicit deliverables and success criteria. It sends those criteria to multiple independent AI systems — Claude, Gemini, OpenCode, and others — each running with the role prompt of a senior reviewer for your domain. They each independently evaluate whether the deliverable meets the criteria. Their verdicts get aggregated. Conflicts surface. Specific fix instructions come back when something fails.
For business deliverables, you run it with a flag:
lope negotiate "Q2 campaign brief for enterprise segment" --domain business
The --domain business flag switches the validator persona to senior operations lead. The reviewers check for timeline completeness, budget breakdown, KPI clarity, channel logic, legal exposure, and missing pivot triggers. Not because you asked them to check those specific things, but because that’s what a senior ops lead looks for.
If two of three validators say your KPI definitions are ambiguous, the document goes back for revision. Automatically. With specific fix instructions. You revise. You resubmit. Majority vote decides when it passes.
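The aggregation logic can be sketched in a few lines. This is a minimal illustration in Python, not Lope’s actual implementation — the `Verdict` structure, field names, and validator labels are assumptions for the sake of the example:

```python
from dataclasses import dataclass

@dataclass
class Verdict:
    validator: str               # e.g. "claude", "gemini", "opencode"
    passed: bool
    fix_instructions: list[str]  # empty when the validator passes

def aggregate(verdicts: list[Verdict]) -> tuple[bool, list[str]]:
    """Majority vote over independent verdicts.

    The deliverable passes only when more than half the validators
    approve it. Otherwise, every failing validator's fix instructions
    are collected and returned for the revision round.
    """
    approvals = sum(1 for v in verdicts if v.passed)
    approved = approvals > len(verdicts) / 2
    fixes = [f for v in verdicts if not v.passed for f in v.fix_instructions]
    return approved, fixes

# Example: two of three validators flag ambiguous KPI definitions,
# so the brief goes back for revision with both sets of instructions.
verdicts = [
    Verdict("claude", False, ["Define MQL vs SQL explicitly in the KPI table"]),
    Verdict("gemini", False, ["KPI target of 400 has no qualification criteria"]),
    Verdict("opencode", True, []),
]
approved, fixes = aggregate(verdicts)
```

The point of the sketch is the shape of the loop: independent verdicts in, a binary pass/fail and concrete fix instructions out, repeated until the majority approves.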
That’s independent review. That’s the mechanism.
This is not about AI being wrong
I want to be precise here, because the framing matters. This isn’t a story about AI being unreliable or about human judgement being superior. The humans who produced the examples above are excellent at their jobs. The AI systems that reviewed and missed those gaps are also genuinely capable.
The issue is structural. No single reviewer — human or AI — catches everything in their own work or in work they collaborated on. The solution is not better AI. The solution is structural independence: separating the reasoning that produces the work from the reasoning that reviews it, and making that separation explicit and consistent.
Lope operationalises that separation for AI-assisted work. Same instinct as the four-eyes principle. Same instinct as the independent audit. Applied to the context where we’ve been pretending it wasn’t needed.
Who this is for
If you produce any of the following as a significant part of your work, the correlated failure problem applies to you:
- Campaign briefs and marketing plans
- Quarterly budgets and financial forecasts
- Consulting proposals and statements of work
- Research protocols and literature review frameworks
- Compliance audits and policy documents
- Board memos and investment presentations
Lope has a --domain business flag and a --domain research flag. Both are available from your existing AI CLI — Claude Code, Gemini CLI, OpenCode, or anything else you already use. Zero new subscriptions. Zero new API keys. Zero new interfaces to learn.
Install takes about 30 seconds:
Read https://raw.githubusercontent.com/traylinx/lope/main/INSTALL.md and follow the instructions to install lope on this machine natively.
Then:
lope negotiate "Your most important current deliverable" --domain business
And pay attention to what the validators disagree about. That’s the signal.
— Sebastian Schkudlara
github.com/traylinx/lope