The Model Is Only 10%: The Real Lesson of the New SDLC

TL;DR

A recent Google whitepaper argues that in AI-driven software development, the model’s size and capabilities matter less than the harness and context engineering around it. That shift changes where organizations should invest in AI systems.

A new whitepaper from Google, authored by Addy Osmani, Shubham Saboo, and Sokratis Kartakis, argues that the AI model accounts for only about 10% of the system’s behavior. Most of the performance and reliability comes from the harness, meaning the prompts, tools, rules, and context management surrounding the model. That reframes where AI development effort should go.

The paper argues that the traditional focus on acquiring the latest or largest AI models is misplaced. Instead, it holds that the harness, which includes prompts, middleware, and configuration, accounts for roughly 90% of the system’s effectiveness. Its benchmark evidence: on Terminal Bench 2.0, changing only the harness while keeping the same model moved an agent from outside the top 30 into the top 5.

The authors also describe agentic engineering, in which AI systems are built around formal specifications, automated testing, and human oversight: a move away from casual vibe coding toward disciplined, verifiable workflows. They argue this approach improves accuracy and lowers long-term costs by reducing token waste, maintenance burden, and security vulnerabilities.
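The long-term cost claim is easiest to see as a break-even calculation. The sketch below assumes each approach can be summarized as a fixed upfront cost (CapEx) plus a constant cost per shipped feature (OpEx). The `CostProfile` type, the `break_even_features` helper, and every number are hypothetical placeholders, not figures from the whitepaper.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CostProfile:
    """Hypothetical cost model: one-off upfront spend plus a flat cost per feature."""
    name: str
    capex: float             # upfront: specs, evals, context setup (illustrative units)
    opex_per_feature: float  # ongoing: tokens, rework, maintenance per shipped feature

    def total_cost(self, features: int) -> float:
        return self.capex + self.opex_per_feature * features


def break_even_features(low_capex: CostProfile, low_opex: CostProfile) -> Optional[int]:
    """Smallest feature count at which low_opex costs no more in total than low_capex.

    Returns None if low_opex never catches up (its per-feature cost is not lower).
    """
    saving_per_feature = low_capex.opex_per_feature - low_opex.opex_per_feature
    if saving_per_feature <= 0:
        return None
    extra_upfront = low_opex.capex - low_capex.capex
    if extra_upfront <= 0:
        return 0  # cheaper upfront and per feature: it wins from the start
    return math.ceil(extra_upfront / saving_per_feature)


# Hypothetical numbers, chosen only to show the shape of the curves.
vibe = CostProfile("vibe coding", capex=0.0, opex_per_feature=50.0)
agentic = CostProfile("agentic engineering", capex=400.0, opex_per_feature=10.0)

print("break-even after", break_even_features(vibe, agentic), "features")
for n in (5, 10, 30):
    print(n, vibe.total_cost(n), agentic.total_cost(n))
```

With these made-up inputs the two totals meet at 10 features; past that point each additional feature widens the gap in favor of the upfront investment. Replacing the placeholders with measured costs is what would turn the sketch into an actual decision aid.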

At a glance
When: published May 2026
The development: The Google whitepaper argues that the core of an effective AI system is not the model itself but the harness and context management around it, which changes traditional development priorities.
The Model Is Only 10% — The New SDLC With Vibe Coding
Google · Osmani, Saboo & Kartakis · May 2026

A Google whitepaper argues software’s biggest shift is from writing code to expressing intent. Its sharpest claim: the model you obsess over is the smallest part of the system — the scaffolding around it does the real work.

A spectrum, not a binary: the differentiator is how outputs get verified.
Vibe Coding: casual prompts · “does it seem to work?” · disposable code · high risk
Structured AI-Assisted: detailed prompts + constraints · manual testing · features in real codebases
Agentic Engineering: formal specs · automated tests + evals + CI gates · production scale · low risk
Tests verify the deterministic; evals verify the rest. Without both, it’s vibe coding, however clever the prompt.
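The line “tests verify the deterministic; evals verify the rest” can be made concrete with a small sketch. The test pins an exact output of a deterministic function; the eval samples a non-deterministic generator several times, scores each output against a rubric, and gates on a pass rate. `generate_summary` is a hypothetical, seeded stand-in for a model call, and the rubric and 0.9 threshold are illustrative assumptions.

```python
import random
from typing import Callable


def slugify(title: str) -> str:
    """Deterministic: the same input always yields the same output."""
    return "-".join(title.lower().split())


def test_slugify() -> None:
    # A test: one input, one exact expected output.
    assert slugify("The New SDLC") == "the-new-sdlc"


def generate_summary(text: str, rng: random.Random) -> str:
    """Hypothetical stand-in for a model call: the wording varies from sample to sample."""
    opener = rng.choice(["In short,", "Summary:", "Overall,"])
    return f"{opener} {text.split('.')[0].strip()}."


def rubric(output: str, must_mention: list[str], max_words: int) -> bool:
    """An eval criterion checks properties of the output, not an exact string."""
    lowered = output.lower()
    return len(output.split()) <= max_words and all(t in lowered for t in must_mention)


def run_eval(model: Callable[[str, random.Random], str], text: str,
             check: Callable[[str], bool], samples: int, threshold: float,
             seed: int = 0) -> float:
    """An eval: sample repeatedly, score each output, and fail below a pass-rate gate."""
    rng = random.Random(seed)
    passed = sum(check(model(text, rng)) for _ in range(samples))
    rate = passed / samples
    if rate < threshold:
        raise AssertionError(f"eval pass rate {rate:.2f} below threshold {threshold}")
    return rate


test_slugify()
doc = "The harness does most of the work. The model is a smaller part."
rate = run_eval(generate_summary, doc,
                check=lambda out: rubric(out, ["harness"], max_words=25),
                samples=20, threshold=0.9)
print(f"eval pass rate: {rate:.2f}")
```

In a CI pipeline both could run as gates: the test fails on any deviation, while the eval tolerates variation in wording but blocks a merge when the pass rate drops.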
The idea worth building your strategy around
Agent = Model + Harness
MODEL: ~10%
HARNESS: ~90% (prompts · tools · context · hooks · sandboxes · observability). That ~90% is your surface area, not the provider’s.
Changing only the harness, with the same model, moved an agent from outside the top 30 to the top 5 on Terminal Bench 2.0.
“Most agent failures, examined honestly, are configuration failures”: a missing tool, a vague rule, a noisy context.
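The Model + Harness split can be illustrated with a minimal sketch. The `Harness` class, its fields, and the toy `fake_model` are hypothetical and do not correspond to any real SDK (including Google’s ADK). The point is that the same model function behaves differently depending on the rules and tools the harness supplies, and that a missing tool surfaces as a configuration failure rather than a model failure.

```python
from dataclasses import dataclass, field
from typing import Callable

# Toy model type: a function from prompt text to reply text.
Model = Callable[[str], str]


def fake_model(prompt: str) -> str:
    """Hypothetical LLM stand-in: asks for a tool only when the prompt mentions one."""
    if "TOOL:run_tests" in prompt:
        return "CALL run_tests"
    return "I think the code works."


@dataclass
class Harness:
    """Everything around the model: rules, registered tools, prompt assembly, dispatch."""
    rules: list[str]
    tools: dict[str, Callable[[], str]] = field(default_factory=dict)

    def build_prompt(self, task: str) -> str:
        advertised = [f"TOOL:{name}" for name in self.tools]
        return "\n".join(self.rules + advertised + [f"TASK: {task}"])

    def run(self, model: Model, task: str) -> str:
        reply = model(self.build_prompt(task))
        if reply.startswith("CALL "):
            name = reply.removeprefix("CALL ")
            if name not in self.tools:
                # The model did what the rules asked; the harness never wired the tool up.
                return f"config failure: tool {name!r} not registered"
            return self.tools[name]()
        return reply


# Same model, three harness configurations.
bare = Harness(rules=["Verify before claiming success."])
equipped = Harness(
    rules=["Verify before claiming success."],
    tools={"run_tests": lambda: "tests: 12 passed, 0 failed"},  # hypothetical tool output
)
misconfigured = Harness(rules=["Always call TOOL:run_tests before answering."])

for harness in (bare, equipped, misconfigured):
    print(harness.run(fake_model, "fix the bug"))
```

Only the harness changes between the three runs: the bare one accepts an unverified answer, the equipped one returns test results, and the misconfigured one fails because a rule names a tool that was never registered.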
The economics: it’s a token-cost problem (CapEx vs OpEx)
Vibe Coding (low CapEx · high OpEx): looks free but hides debt in token burn (fix-it loops), maintenance tax (AI spaghetti), and security remediation. Over time it crosses over and ends up costing 3–10× more per feature.
Agentic Engineering (high CapEx · low OpEx): pay upfront (specs, evals, context), then ship cheaply. Levers: context engineering for first-pass success, plus intelligent model routing that sends the easy work to cheap models.
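Intelligent model routing can be sketched as a small dispatcher. This assumes a crude difficulty heuristic (keywords plus task length) and two hypothetical model tiers with invented per-token prices; a production router might instead use a trained classifier or a confidence signal, so treat the rule below as a placeholder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str                 # hypothetical tier label, not a real model ID
    usd_per_1k_tokens: float  # invented price, for illustration only


CHEAP = ModelTier("small-fast", 0.0005)
STRONG = ModelTier("large-reasoning", 0.01)

# Placeholder heuristic: words that suggest a task needs the stronger model.
HARD_MARKERS = ("refactor", "architecture", "concurrency", "security", "migration")


def route(task: str) -> ModelTier:
    """Send long or hard-sounding tasks to the strong tier; everything else goes cheap."""
    lowered = task.lower()
    if len(task.split()) > 60 or any(marker in lowered for marker in HARD_MARKERS):
        return STRONG
    return CHEAP


def cost(tier: ModelTier, tokens: int) -> float:
    return tier.usd_per_1k_tokens * tokens / 1000


workload = [  # (task description, expected token usage), both made up
    ("rename variable in utils.py", 2_000),
    ("add docstring to parse_config", 1_500),
    ("fix typo in README", 500),
    ("refactor auth module for concurrency safety", 20_000),
]
routed = sum(cost(route(task), tokens) for task, tokens in workload)
all_strong = sum(cost(STRONG, tokens) for _, tokens in workload)
print(f"routed: ${routed:.4f}   all strong: ${all_strong:.4f}")
```

The hard task still gets the strong tier; only the trivial edits move to the cheap one. How much this saves in practice depends on how much of a real workload is genuinely easy and how often the routing rule misjudges a task.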
85% of devs use AI coding agents (51% daily)
41% of all new code is AI-generated
~90% of agent behavior is the harness, not the model
+19% longer on some tasks (METR): verification is the cost
The read

The clearest map yet of how serious AI development works, and mostly tool-agnostic. But it’s a Google funnel: the concepts are neutral, while the on-ramps point to Gemini, Jules & the ADK. If the harness is 90% and it’s yours, your moat and your costs both live there. So own your scaffolding, route across models, and remember: AI amplifies whatever engineering culture it lands in.

Source: Osmani, Saboo & Kartakis, “The New SDLC With Vibe Coding,” Google (May 2026). Figures are the paper’s own, incl. METR & LangChain. Analysis is the author’s.

Implications for AI Development Strategies

This insight shifts the focus for organizations from constantly chasing the newest models to investing in robust harnesses and precise context engineering. By doing so, companies can achieve higher performance at lower costs and with greater reliability. It also suggests that competitive advantage lies in the configuration and management of AI systems, not just in model size or access to cutting-edge models.

Evolution of AI Development Practices

Until now, AI development often prioritized acquiring larger models, on the assumption that bigger models meant better performance. The whitepaper builds on recent adoption figures: 85% of developers use AI coding agents (51% daily), and 41% of all new code is AI-generated. It challenges the notion that model improvements alone will lead to better outcomes, emphasizing instead system architecture and context management.

“The model is only 10% of the system; the rest is how you harness and manage it.”

— Addy Osmani

Unresolved Questions About Implementation

It is not yet clear how organizations will effectively transition from model-centric to harness-centric development at scale. Specific best practices, tooling, and training for large teams remain to be established. Additionally, the long-term impact on AI innovation and model development priorities is still uncertain.

Next Steps for AI Development and Adoption

Organizations are likely to invest more in developing sophisticated harnesses, including testing frameworks, context management, and configuration tools. Future research and industry standards may emerge to formalize best practices for harness design, aiming to reduce costs and improve AI system reliability.

Key Questions

Why is the model only 10% of the system’s behavior?

The whitepaper argues that most of an AI system’s performance depends on how the model is integrated, configured, and managed through prompts, tools, rules, and context, rather than on the size of the model itself.

How can organizations improve their AI systems based on this insight?

Focus on building robust harnesses, refining context management, and implementing formal testing and evaluation processes to maximize the effectiveness of existing models.
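One concrete form of context management is fitting the most relevant material into a fixed token budget, so the model does not work from a noisy context. The sketch below assumes a word count as a rough token proxy and precomputed relevance scores; `ContextItem` and `select_context` are hypothetical names, and a real harness would use the model’s tokenizer and some retrieval or ranking step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextItem:
    label: str
    text: str
    relevance: float      # assumed precomputed score in [0, 1]
    pinned: bool = False  # e.g. the spec or project rules: always included


def approx_tokens(text: str) -> int:
    """Rough proxy: one token per whitespace-separated word (not a real tokenizer)."""
    return len(text.split())


def select_context(items: list[ContextItem], budget: int,
                   min_relevance: float = 0.3) -> list[ContextItem]:
    """Pinned items first, then the most relevant remaining items that still fit.

    Items below min_relevance are dropped as noise. Pinned items are kept even if
    they alone exceed the budget.
    """
    chosen = [i for i in items if i.pinned]
    used = sum(approx_tokens(i.text) for i in chosen)
    candidates = sorted(
        (i for i in items if not i.pinned and i.relevance >= min_relevance),
        key=lambda i: i.relevance,
        reverse=True,
    )
    for item in candidates:
        item_tokens = approx_tokens(item.text)
        if used + item_tokens <= budget:
            chosen.append(item)
            used += item_tokens
    return chosen


items = [
    ContextItem("spec", "Endpoint must return 404 for unknown user IDs", 1.0, pinned=True),
    ContextItem("handler", "def get_user(user_id): ... lookup and return user ...", 0.9),
    ContextItem("old chat", "earlier discussion about logo colors and fonts", 0.1),
    ContextItem("test file", "def test_unknown_user_returns_404(): ...", 0.8),
    ContextItem("changelog", "v0.1 initial release v0.2 minor fixes v0.3 docs", 0.4),
]
selected = select_context(items, budget=25)
print([i.label for i in selected])
```

With a 25-word budget the spec, handler, and test file fit; the changelog is cut for space and the old chat is dropped as noise. The loop is greedy: after skipping an item that does not fit, it keeps trying smaller, less relevant ones.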

Does this mean model size and access are less important?

While models remain fundamental, their size and raw capabilities are less critical than how they are employed within a well-engineered system. The whitepaper emphasizes system architecture over model complexity.

What are the risks of neglecting harness and context engineering?

Ignoring system configuration can lead to increased errors, security vulnerabilities, and higher operational costs, as failures often stem from misconfiguration rather than model deficiencies.
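Because so many failures are configuration failures, a harness can be checked before it runs. Here is a minimal lint pass, assuming a hypothetical config shape (a dict with `rules`, `tools`, and `context_files`) and crude heuristics for the three failure types named in the whitepaper’s quote: a missing tool, a vague rule, and a noisy context. The field names, word list, backtick convention, and file limit are all illustrative assumptions.

```python
import re
from typing import Any

VAGUE_WORDS = {"good", "nice", "properly", "appropriate", "clean", "best"}
TOOL_REF = re.compile(r"`(\w+)`")  # assumption: rules cite tools in backticks, e.g. `run_tests`


def lint_harness(config: dict[str, Any], max_context_files: int = 20) -> list[str]:
    """Return human-readable problems; an empty list means no issues were found."""
    problems: list[str] = []
    tools = set(config.get("tools", []))
    for rule in config.get("rules", []):
        # Missing tool: a rule tells the agent to use a tool the harness never registered.
        for name in TOOL_REF.findall(rule):
            if name not in tools:
                problems.append(f"missing tool: rule references `{name}` but it is not registered")
        # Vague rule: subjective wording with no number or tool to check against.
        words = {w.strip(".,").lower() for w in rule.split()}
        if words & VAGUE_WORDS and not any(ch.isdigit() for ch in rule) and not TOOL_REF.search(rule):
            problems.append(f"vague rule: {rule!r}")
    # Noisy context: too many files loaded by default.
    context_files = config.get("context_files", [])
    if len(context_files) > max_context_files:
        problems.append(f"noisy context: {len(context_files)} files (limit {max_context_files})")
    return problems


config = {
    "rules": [
        "Run `run_tests` before marking a task done.",
        "Write good code.",
        "Functions must stay under 50 lines.",
    ],
    "tools": ["read_file", "edit_file"],
    "context_files": [f"src/module_{i}.py" for i in range(40)],
}
for problem in lint_harness(config):
    print(problem)
```

Heuristics like these will miss things and flag some harmless rules, but they catch the cheap, common mistakes before any tokens are spent.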

Will this shift affect AI research priorities?

Potentially, yes. The focus may move from developing larger models to creating better tools, frameworks, and standards for harness and context management.

Source: ThorstenMeyerAI.com
