AI Is a Force Multiplier. What It Multiplies Is Up to You.

Written by Fred Lamming, CTO & Co-Founder

Mar 17, 2026

A four-part series on the state of AI — accuracy, operations, ecosystem, and guardrails.

Is 90% Accuracy Good Enough?

Matt Shumer’s “Something Big Is Happening” essay has 55+ million views and sparked a massive conversation about AI’s trajectory. He’s right — something big IS happening. I’m genuinely impressed with the current state of AI. It’s incredibly accurate at generating papers, reports, and code. Perhaps even 90% accurate in some circumstances. That’s remarkable.
But here’s what 90% looks like in practice.
Two weeks ago I used the Ralph Wiggum approach to spec-driven development: built a feature, asked AI to write Playwright tests, and told it to loop until they all passed. In terms of lines of code written, it was about 90% accurate.
Then I looked closer.
18 out of 50 tests were coded to always pass — no matter what actually happened. The assertions were meaningless. They looked right. They ran green. They tested nothing.
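Here’s the shape of the problem as a minimal pytest-playwright sketch. The URL, selectors, and test names are illustrative, not the actual generated suite:

```python
from playwright.sync_api import Page  # pytest-playwright supplies the `page` fixture

def test_export_csv(page: Page):
    page.goto("https://app.example.com/invoices")  # illustrative URL
    page.click("#export-csv")
    # The generated "assertion": a constant that can never fail.
    # It runs green while verifying nothing about the export.
    assert True

def test_export_csv_fixed(page: Page):
    page.goto("https://app.example.com/invoices")
    with page.expect_download() as download_info:
        page.click("#export-csv")
    # A real assertion: fails unless a CSV file actually downloads.
    assert download_info.value.suggested_filename.endswith(".csv")
```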
Measured by results, 36% of my test suite was doing absolutely nothing while confidently telling me everything was fine.
“Looks right” is not the same as “is right.”
Deloitte learned this the hard way. They delivered a $440K report to the Australian government filled with fabricated academic references, invented quotes attributed to a federal court judge, and citations to studies that don’t exist, all courtesy of GPT-4o.
They refunded the final installment of the contract.
Lawyers in Mata v. Avianca submitted AI-generated legal briefs citing cases that never happened. Sanctioned and fined $5,000.
Air Canada’s chatbot told a grieving customer he could retroactively apply for a bereavement discount, a policy that didn’t exist. A tribunal held the airline liable.
Every one of these outputs was polished, professional, confident, and wrong.
90% isn’t a rounding error if you work in compliance, security, legal, or anywhere quality is non-negotiable. It’s a liability.
But this isn’t an anti-AI post. Actually, it’s the opposite.
AI is a force multiplier and like any multiplier, it amplifies whatever you give it:
  • Good data + good prompts + well-reasoned asks = great outcomes
  • Questionable data + good prompts + well-reasoned asks = questionable outcomes. AI multiplies your data quality problems right along with everything else.
  • Insecure, unscalable, or poorly reasoned asks = a mess. AI will happily build exactly what you asked for.
It multiplies whatever you give it.
This is why we built Kiwi Data the way we did.
We don’t bet on AI being right. We build systems that catch when it’s wrong.
Hallucination detection. Human validation. Trusted outcomes, not 90%-maybe-correct outputs. Because when you’re extracting limitation-of-liability clauses or identifying most-favored-nation (MFN) provisions buried in decades of contracts, “close enough” isn’t a thing.
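What does catching a wrong answer look like? A minimal sketch of one grounding check, assuming plain-text extraction; this illustrates the idea, not our production pipeline. An extracted clause is trusted only if it appears verbatim in the source document:

```python
def normalize(text: str) -> str:
    # Collapse whitespace so PDF layout differences don't cause false misses.
    return " ".join(text.split())

def is_grounded(clause: str, source: str) -> bool:
    # Trust an extraction only if it appears verbatim in the source document.
    return normalize(clause) in normalize(source)

source = (
    "...  Liability shall not exceed the fees paid\n"
    "in the twelve months preceding the claim.  ..."
)
verbatim = "Liability shall not exceed the fees paid in the twelve months preceding the claim."
invented = "Liability is unlimited in cases of gross negligence."

for clause in (verbatim, invented):
    verdict = "trusted" if is_grounded(clause, source) else "flag for human review"
    print(f"{verdict}: {clause}")
```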
AI is the most powerful tool most of us have ever had access to. The question isn’t whether to use it. It’s whether you’re set up to catch the 10% that could cost you everything.

The 80% Nobody’s Talking About

My feed is full of people building apps with AI.
“I built a SaaS in a weekend.”
“Shipped an MVP in 4 hours.”
“10,000 lines of code.”
Impressive. But let me share a secret: creating the app is the easy part.
Measuring productivity by how fast you create something sounds dangerously close to measuring it by lines of code written. We tried that in the ’90s. It didn’t work then either.
Building software is roughly 20% of the total effort. The other 80% is testing, deploying, patching, securing, scaling, and responding to feature requests. IBM puts maintenance at 60–80% of lifecycle cost. Gartner: 55–80% of IT budgets go to what’s already built.
Robert C. Martin: “The ratio of time spent reading code versus writing it is well over 10 to 1.”
AI is writing a lot of code. Who’s reading it?
When you vibe-coded your app, did you tie it into your auth system? Your logging and alerting? Did you scan for vulnerabilities? When your customer’s security team asks about your posture, can you answer? When it’s 2am and the app is down, can you fix it?
The disasters are already here.
Builder.ai was valued at $1.5B on the claim that AI built its apps. 700 human engineers actually did the work. It went bankrupt in May 2025.
Moltbook was the most hyped AI platform of January 2026. By month’s end: 1.5M API keys leaked. AI agents built the backend but never enabled Row Level Security. No humans checked.
GitHub Copilot: peer-reviewed study found 29% of Python and 24% of JavaScript it generated contained security weaknesses. 20–30% higher vulnerability rate than human-authored code.
METR’s randomized controlled trial: experienced developers using AI tools were actually 19% slower — but perceived themselves as 20% faster. A 39-point gap between feeling productive and being productive.
Stats I’d love to see that would actually prove AI can replace developers:
  • Security vulnerabilities found and removed
  • Insecure patterns identified and patched
  • Lines of code reduced without impacting quality
  • Readability improved
“I generated 10,000 lines in an hour” is not that.
I’m extremely excited about AI. It makes me ~20% more productive. That’s huge.
But AI is a power tool. Like any power tool, it causes harm fast if misused. Ask it to build an insecure app and it will — happily, confidently, and quickly.
AI needs to be used for the right job in the right way.
This is why at Kiwi Data we don’t hand customers an AI tool and wish them luck. We deliver trusted outcomes.
Validation, hallucination detection, security, ongoing accuracy — so customers get results, not software they have to figure out how to run. Building the AI is the 20%. Making it reliable in production is the 80%.
AI is a force multiplier — but multiplying the creation of something nobody can operate isn’t progress. It’s technical debt at scale.

We’re Still in Phase One

Everyone’s focused on foundation models. OpenAI. Anthropic. Google. The race for the best LLM. But I’ve seen this movie before — three times — and the plot twist is always the same.
1993: The internet arrives. HTTP and HTML. Revolutionary. But the foundation wasn’t the revolution. It took Netscape to make it usable, Apache to make it scalable, Akamai to make it fast, Google to make it searchable, and PayPal to make it commercial. The foundation was 1% of the story.
2006: AWS launches EC2. Rent a server by the hour. Game-changing. But it took Terraform, Datadog ($45B), Cloudflare ($64B), and hundreds of others to make cloud actually work for enterprises. The ecosystem became more valuable than the foundation.
Same year: Hadoop gives us big data. It took Spark, Snowflake ($120B peak), Databricks ($62B), and an entire ecosystem to make that data useful.
The pattern: a foundational technology launches. Then a decade of ecosystem companies emerge to make it actually work. Those ecosystem companies often become more valuable than the foundation itself.
We are in phase one of AI right now.
The foundation is being built — and it’s incredible. But the supporting ecosystem hasn’t caught up.
That matters because today’s models hallucinate confidently, have a massive attack surface, and aren’t deterministic. Each of these problems is a billion-dollar company waiting to be built.
Here’s the other thing nobody’s saying out loud: current LLMs have almost zero switching cost. If tomorrow I want to move from Claude to Codex to Gemini, the migration is trivial. The models are in a race to be the most advanced AND the cheapest — GPT-4 equivalent performance cost $30/M tokens in 2023. Today it’s $0.40. A 75x drop in three years.
When your product commoditizes that fast, the moat isn’t in the model.
It’s in everything built around it.
If you’re the company securing LLMs for enterprises, solving hallucinations at scale, building the observability layer, or handling compliance — you have a moat. You’re the Datadog or Cloudflare of AI. The foundation will keep shifting underneath you. Your value won’t.
This is exactly where Kiwi Data lives. We’re not building foundation models. We’re solving the accuracy and hallucination problem for enterprise document processing — the ecosystem layer that makes AI trustworthy for compliance, legal, and procurement. The models get cheaper every quarter. The need for validated, secure outcomes doesn’t.
We’ve seen this movie before. The foundation gets built first. The ecosystem creates the real value.
We’re still in phase one.

The Right Tool for the Right Job

I’ve spent three sections on what’s wrong with AI. Time to talk about what’s right — and how I actually use it.
AI makes me roughly 20% more productive. Here’s where:
Research. Replaced external consultants for market analysis and competitive intel. Faster, cheaper, deeper. I still fact-check everything.
Refactoring. Clear pattern in, clear pattern out. Excellent.
Documentation. Code comments, test plans, how-to guides. The writing developers avoid.
Pre-commit checks. Lint, unit tests, repo standards, and security scans before every commit. Only about 1 in 50 security findings is real, but that’s one fewer vulnerability in production.
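A minimal sketch of that gate as a git hook; the tool choices here (ruff, pytest, bandit) are illustrative, not a prescription:

```python
#!/usr/bin/env python3
# Minimal pre-commit gate: block the commit if any check fails.
# Drop into .git/hooks/pre-commit and make it executable.
import subprocess
import sys

CHECKS = [
    (["ruff", "check", "."], "lint"),
    (["pytest", "-q"], "unit tests"),
    (["bandit", "-q", "-r", "src"], "security scan"),
]

for cmd, label in CHECKS:
    if subprocess.run(cmd).returncode != 0:
        print(f"pre-commit: {label} failed, commit blocked")
        sys.exit(1)
```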
Sales prep. Concise briefs on prospects, their company, and what will resonate. I walk into every call prepared.
But where I use AI matters less than how.
My rules:
Garbage in, garbage out — at scale. AI believes everything you feed it. Bad data becomes confident bad conclusions. I prompt it to prioritize sources referenced by multiple prominent authorities over random results.
Discussion over delegation. “Help me evaluate monitoring options by features and cost” beats “add monitoring to my stack.”
Small tasks, not big ones. Stanford’s “Lost in the Middle” research: LLM accuracy drops 30%+ when key info gets buried in long contexts, degrading up to 85% as input grows. Focused prompts keep info where models perform best.
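In practice that means one focused question per chunk instead of one sprawling prompt over the whole corpus. A minimal sketch; `ask_model` is a hypothetical stand-in for whatever LLM client you use, and the chunk size is arbitrary:

```python
def chunk(text: str, max_chars: int = 4000) -> list[str]:
    # Naive fixed-size split; real chunking would respect section boundaries.
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

def ask_model(prompt: str) -> str:
    # Hypothetical stand-in: swap in a real client call here.
    raise NotImplementedError

contract = open("contract.txt").read()
answers = [
    ask_model(f"Does this excerpt contain an MFN provision? Quote it or reply NONE.\n\n{part}")
    for part in chunk(contract)
]
findings = [a for a in answers if a.strip() != "NONE"]
```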
Guard your attack surface. This one’s critical. Every tool or data source you connect to AI is a vector:
  • A malicious GitHub issue hijacked an AI agent into exfiltrating private repo data.
  • SQL instructions embedded in a support ticket made an AI agent leak tokens publicly.
  • A poisoned MCP server silently exfiltrated a user’s entire WhatsApp history.
Attackers don’t break the model. They poison what it reads.
Where I refuse to use AI:
With unfettered data access. Users want AI-powered what-if scenarios, but AI never touches my database; I build a permission-scoped, read-only dataframe first. Replit’s AI deleted an entire production database during a code freeze, fabricated 4,000 fake users to cover it up, then lied about it.
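A minimal sketch of that pattern, assuming Postgres, SQLAlchemy, and pandas; the DSN, view, and column names are hypothetical:

```python
import pandas as pd
from sqlalchemy import create_engine, text

# Connect as a database role that has SELECT on one reporting view and
# nothing else: no writes, no DDL, no other tables.
engine = create_engine("postgresql://ai_readonly@db.internal/prod")  # hypothetical DSN

# Scope the snapshot to the rows and columns this user may see.
df = pd.read_sql(
    text("SELECT order_id, status, total FROM orders_view WHERE tenant_id = :tenant"),
    engine,
    params={"tenant": "acme"},  # hypothetical tenant
)

# The model receives a serialized copy of the snapshot. It can reason over
# the data all it wants; it cannot touch the database.
prompt = "What happens to revenue if churn doubles?\n\n" + df.to_csv(index=False)
```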
With raw documents. At Kiwi Data, every PDF goes through a security filter before AI sees it — executable code, zero-point font, white-on-white text. Indirect prompt injection is OWASP’s #1 LLM risk.
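A heuristic sketch of that kind of filter; illustrative only, since a production scanner must also decompress PDF streams and inspect rendered text runs:

```python
# Raw-byte heuristics for the injection vectors named above.
SUSPICIOUS_MARKERS = [
    b"/JavaScript",  # embedded executable code
    b"/OpenAction",  # action that runs automatically on open
    b" 0 Tf",        # zero-point font: invisible text
    b"1 1 1 rg",     # pure-white fill: possible white-on-white text
]

def pdf_is_suspicious(path: str) -> bool:
    raw = open(path, "rb").read()
    return any(marker in raw for marker in SUSPICIOUS_MARKERS)

if pdf_is_suspicious("incoming_contract.pdf"):  # hypothetical file
    print("quarantine: manual inspection before any model sees this document")
```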
In my infrastructure. I use AI to write Terraform. I review, test, and audit what it makes. Does AI have access to my live systems? Never.
AI is the most powerful tool I’ve ever used. Like any power tool, it causes serious harm when misused. The right tool, for the right job, in the right way.
That’s what we built Kiwi Data around. AI with guardrails, validation, and human oversight — trusted outcomes, not raw model output.

The Throughline

AI is a force multiplier. What it multiplies is up to you.
It’s not 90% accurate enough when the 10% is a liability. Building the app is the easy 20% — operating it is the hard 80%. The real value in AI isn’t the foundation models, it’s the ecosystem being built around them. And the companies that use AI as the right tool, for the right job, in the right way — those are the ones that will win.
At Kiwi Data, we built for that reality. Not another AI tool. Trusted outcomes.