The Yes-Man Problem

Written by Fred Lamming, CTO & Co-Founder

Apr 30, 2026

We built the world's fastest yes-man — and most of us haven't noticed yet.
OpenAI's GPT-4o update praised a business plan for selling literal shit on a stick. It endorsed a user quitting their medication. OpenAI had to roll the update back. The root cause? Thumbs-up/thumbs-down feedback had been added as a training signal, and the model optimised for "does this make the user happy" instead of "is this actually helpful."
This isn't a one-off failure. BullshitBench tested 70+ model variants with deliberately nonsensical questions — plausible-sounding but completely fake frameworks, nested nonsense, specificity traps. Only two model families scored above 60% on detecting it. The rest confidently engaged with garbage.
It gets worse. When you challenge a model's correct answer with a simple "are you sure?", it flips to a wrong answer nearly 60% of the time. It can't tell if you caught a real mistake or you're just pushing back. So it caves. Stanford's ELEPHANT benchmark used 4,000 Reddit "Am I the Asshole?" posts to measure social sycophancy. Models accepted the user's framing 90% of the time. Humans? 60%. Models endorsed behaviour that humans judged inappropriate in 42% of cases.
Train a system on human approval signals and it learns that agreement is rewarded. Challenge is not.
Machiavelli nailed this in 1532: "There is no other way of guarding oneself from flatterers except letting men understand that to tell you the truth does not offend you." Five hundred years later we're training systems to do the opposite.

The disappearing question

The sycophancy problem runs deeper than agreeable answers. UNESCO published something quietly devastating: AI removes the cognitive friction where real thinking happens. Users get answers before they've understood their own questions.
Give AI a vague prompt and it doesn't stop to ask what you mean. It doesn't say "wait — what are we actually trying to solve?" It sprints toward the most plausible-sounding answer it can generate. Andrej Karpathy put it well: "Models make wrong assumptions and run with them without checking or seeking clarification."
The model misunderstands something early and builds an entire feature on faulty premises. You don't notice until you're three PRs deep and the architecture is cemented around the wrong assumption. This is the Monty Python Argument Clinic playing out in production. Rapid, confident exchange. Zero actual reasoning underneath.
Fred Brooks said it in 1986: "The hardest single part of building a software system is deciding precisely what to build." Russell Ackoff, decades before that: "We fail more often because we solve the wrong problem than because we get the wrong solution to the right problem."
The hardest part of any problem was never generating the answer. It was figuring out the question. And AI is letting us skip that part entirely.
As Mark Twain put it: "Whenever you find yourself on the side of the majority, it is time to pause and reflect." When your AI agrees with everything you say, reflection is the first casualty. Pete Hodgson nailed the consequence: "Technology doesn't fix misalignment. It amplifies it. Automating a flawed process only helps you do the wrong thing faster."
In the era of cheap code and cheap answers, the scarce resource isn't solutions. It's judgement.

Confidence is not competence

The organisational consequences of all this are already measurable. METR ran a randomised controlled trial: 16 experienced open-source developers, 246 real tasks. Developers using AI tools were 19% slower. They perceived themselves as 20% faster. A 39-point gap between feeling productive and being productive.
Mark Twain would have loved this: "All you need in this life is ignorance and confidence, and then success is sure."
The numbers across the industry tell the same story. Teams with high AI adoption merged 98% more PRs — but review time increased 91% and average PR size grew 154%. More code, more review burden, same number of humans checking it. And only 48% of developers consistently review AI code before committing it. We're generating more, reviewing less, and feeling great about it.
Deloitte delivered a $440K report to the Australian government filled with fabricated academic references, invented quotes attributed to a federal court judge, and citations to studies that don't exist. GPT-4o generated parts of it. Professionals signed off on it. Deloitte ended up refunding part of the fee. Nobody there thought "this might be wrong." The output was polished, professional, and confident. Just like the model that generated it.
Addy Osmani frames the shift well: "When code generation becomes cheap, the ability to recognise what shouldn't be built becomes more valuable than writing code itself." Reading code you can no longer write from scratch creates dependency. Reviewing output you can't independently verify creates blind spots. And doing both while feeling 20% more productive creates an organisation that can't tell the difference between velocity and progress.
The bottleneck was never generating answers. It's having someone in the room who can tell a good answer from a confident one.

The yes-man fix

H. L. Mencken: "There is always an easy solution to every human problem — neat, plausible, and wrong."
So what actually works? Turns out the fix is stupidly simple: tell it to disagree with you.
George Costanza figured this out in the '90s. Do the opposite of every instinct and everything starts working. Your coding assistant's default instinct is to agree with you, so tell it to do the opposite.
The more I use AI coding tools, the more value I get from everything except the coding. Design pattern trade-offs, implementation research, alternative approaches to a requirement — all incredibly useful. "Here are three ways to solve this, now tell me the pros and cons of each." It's like hiring a yes-man and then paying extra for the honesty DLC. There's huge power in "critique this like you want to kill it."
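Here's a rough sketch of what that looks like wired into a prompt, in Python. The wording of the system message is mine, not a canonical template, and you'd send the resulting messages through whatever chat client you already use.

```python
# Sketch: force the assistant into devil's-advocate mode instead of agree-mode.
# The exact wording is illustrative; tune it to your own workflow.

DEVILS_ADVOCATE = (
    "You are a skeptical reviewer. Do not agree by default. "
    "For every proposal: give the strongest argument against it, "
    "the assumptions it silently makes, and at least one alternative. "
    "If the proposal is genuinely sound, say so only after the critique."
)

def critique_messages(proposal: str) -> list[dict]:
    """Build a chat payload that asks for attack, not applause."""
    return [
        {"role": "system", "content": DEVILS_ADVOCATE},
        {"role": "user", "content": (
            "Critique this like you want to kill it, then rank the "
            f"surviving options by risk:\n\n{proposal}"
        )},
    ]
```

The role/content shape above is the common chat format; the point is simply that the disagreement has to be requested explicitly, because the default is applause.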
The most valuable thing a coding assistant does is talk you out of coding the wrong thing. The second most valuable is helping you find missing requirements. Beyond that, there are concrete patterns that work:
Discussion over delegation. "Help me evaluate monitoring options by features and cost" beats "add monitoring to my stack." The first frames a conversation. The second frames a yes-man's assignment. When you treat AI as a thinking partner instead of an order-taker, you get challenged more and surprised less.
Small tasks, not big ones. Stanford's research shows LLM accuracy drops 30%+ when key information gets buried in long contexts — degrading up to 85% as input grows. Focused prompts keep information where models actually perform. The tighter the question, the harder it is for the model to drift.
Verify the uncomfortable stuff. AI output that confirms what you already believe is the most dangerous kind. It feels right, so nobody checks. Build the habit of stress-testing the answers you like, not just the ones that seem off.
Guard your inputs, not just your outputs. Every document AI reads is a potential attack vector. Indirect prompt injection is OWASP's #1 LLM risk for a reason — white text on white background, zero-point font, executable payloads hidden in PDFs. If you're not filtering what AI sees, you're not securing what it produces.
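To make the input-guarding point concrete, here's a minimal sketch of a document filter, assuming you're working with plain extracted text. It only strips zero-width characters and flags instruction-like phrasing; a real defence also needs rendering-aware checks for white-on-white text, tiny fonts, and embedded payloads, so treat this as the shape of the idea rather than a complete filter.

```python
import re
import unicodedata

# Characters commonly used to hide text from human reviewers.
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}

# Phrasing that looks like instructions aimed at the model, not content.
SUSPICIOUS = re.compile(
    r"ignore (all|any|previous) instructions|you are now|system prompt",
    re.IGNORECASE,
)

def sanitise(document_text: str) -> tuple[str, list[str]]:
    """Return cleaned text plus a list of reasons to route it to a human."""
    flags: list[str] = []

    cleaned = "".join(ch for ch in document_text if ch not in ZERO_WIDTH)
    if len(cleaned) != len(document_text):
        flags.append("zero-width characters removed")

    # Drop other non-printable control characters that survive extraction.
    cleaned = "".join(
        ch for ch in cleaned
        if ch in "\n\t" or not unicodedata.category(ch).startswith("C")
    )

    if SUSPICIOUS.search(cleaned):
        flags.append("instruction-like phrasing found in document body")

    return cleaned, flags
```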
This is the pattern we built Kiwi Data around. AI extracts data from contracts, purchase orders, and leases. Then validation layers catch hallucinations. Human review confirms accuracy. Security filters scrub every document before AI touches it. Not because AI is bad — because "looks right" is not the same as "is right," and in compliance, legal, and procurement, the difference is a liability.
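As an illustration of that ordering rather than actual product code, the control flow is simple: filter first, let the model extract, validate against the source, and route anything questionable to a person. Every name and check below is a placeholder.

```python
from dataclasses import dataclass, field

@dataclass
class Extraction:
    fields: dict[str, str]
    issues: list[str] = field(default_factory=list)
    needs_human: bool = False

def clean(text: str) -> str:
    """Stand-in for the security filtering step (see the sketch above)."""
    return text

def extract(text: str) -> dict[str, str]:
    """Stand-in for the model call that pulls structured fields out."""
    return {"total": "", "counterparty": "", "term": ""}

def validate(fields: dict[str, str], source: str) -> list[str]:
    """Stand-in check: every extracted value must be traceable to the source."""
    return [k for k, v in fields.items() if v and v not in source]

def process(raw: str) -> Extraction:
    text = clean(raw)                # guard the input before the model sees it
    fields = extract(text)           # the only step the model owns
    issues = validate(fields, text)  # "looks right" is not "is right"
    return Extraction(fields, issues, needs_human=bool(issues))
```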

AI is a force multiplier. What it multiplies includes your unquestioned assumptions. The companies that win won't be the ones generating the most answers. They'll be the ones asking the best questions.
