
Why most enterprise AI coding pilots underperform (Hint: It's not the model)

Gen AI in software engineering has moved far beyond autocomplete. The emerging frontier is agentic coding: AI systems that plan changes, execute them in multiple steps, and iterate on feedback. Yet despite the excitement around "AI agents that code," most enterprise implementations are underperforming. The limiting factor is no longer the model. It's context: the structure, history, and intent surrounding the code being changed. In other words, companies now face a system design problem: they have not yet engineered the environment in which these agents operate.

The shift from assistance to agency

The past year has seen a rapid evolution from supportive coding tools to agentic workflows. Research is beginning to formalize what agentic behavior means in practice: the ability to reason about design, testing, execution, and validation rather than generating isolated fragments. Work like dynamic action resampling shows that allowing agents to branch, reconsider, and revise their own decisions significantly improves results in large, interdependent codebases. At the platform level, providers like GitHub are now building dedicated agent orchestration environments, such as the Copilot coding agent and Agent HQ, to support multi-agent collaboration across real enterprise pipelines.

But early field results tell a cautionary tale. When organizations introduce agentic tools without paying attention to the workflow and environment, productivity can decline. A randomized controlled trial this year found that developers using AI support in unmodified workflows completed tasks more slowly, largely due to verification, rework, and confusion around intent. The lesson is clear: autonomy without orchestration rarely delivers efficiency.

Why context engineering is the real enabler

In every implementation failure I've observed, the root cause was context. When agents lack a structured understanding of a codebase, especially its relevant modules, dependency graph, test harness, architectural conventions, and change history, they generate output that looks correct but is disconnected from reality. Too much information overwhelms the agent; too little forces it to guess. The goal is not to give the model more tokens. The goal is to determine what should be visible to the agent, when, and in what form.
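To make the "what should be visible, when, and in what form" idea concrete, here is a minimal sketch of context selection under a token budget. All names (`ContextItem`, the scoring, the budget) are illustrative assumptions, not any vendor's API: items that fit are inlined, the rest are linked by reference rather than dropped.

```python
# Hypothetical sketch: deciding what an agent "sees" under a token budget.
# Relevance scores would come from dependency/test-graph analysis in practice.
from dataclasses import dataclass

@dataclass
class ContextItem:
    name: str        # e.g. a module path or doc fragment
    relevance: float # 0..1 relevance score (assumed precomputed)
    tokens: int      # estimated token cost if inlined verbatim

def assemble_context(items: list[ContextItem], budget: int) -> list[str]:
    """Greedily inline the most relevant items that fit the budget;
    anything that doesn't fit is linked by name instead of inlined."""
    chosen, used = [], 0
    for item in sorted(items, key=lambda i: i.relevance, reverse=True):
        if used + item.tokens <= budget:
            chosen.append(f"INLINE:{item.name}")
            used += item.tokens
        else:
            chosen.append(f"LINK:{item.name}")
    return chosen

items = [
    ContextItem("billing/invoice.py", 0.9, 400),
    ContextItem("billing/tests/test_invoice.py", 0.8, 500),
    ContextItem("ARCHITECTURE.md", 0.6, 900),
]
plan = assemble_context(items, budget=1000)
print(plan)
```

The design choice worth noting is that overflow items are linked, not discarded: the agent still knows they exist and can request them in a later turn.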

See also  The case for embedding audit trails in AI systems before scaling

The teams that see meaningful gains treat context as an engineering surface. They build tooling to snapshot, compress, and version the agent's working memory: what is retained across turns, what is discarded, what is summarized, and what is linked rather than inlined. They design deliberate review checkpoints instead of free-running sessions. They make the specification a first-class artifact, something that can be reviewed, tested, and owned, not a throwaway chat history. This shift aligns with a broader trend some researchers describe as "specifications becoming the new source of truth."
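As a rough sketch of what "snapshot, compress, and version working memory" could look like, the function below keeps recent turns verbatim, summarizes older ones, and hashes the result so each snapshot has a reviewable version identifier. The structure is an assumption for illustration, not a description of any particular product.

```python
# Illustrative sketch: agent working memory as a versioned artifact.
# Recent turns are retained verbatim; older ones are compressed to a
# summary; the whole snapshot is content-addressed for review/replay.
import hashlib
import json

def snapshot_memory(turns: list[str], keep_verbatim: int = 2) -> dict:
    old, recent = turns[:-keep_verbatim], turns[-keep_verbatim:]
    summary = f"{len(old)} earlier turns summarized" if old else ""
    snap = {"summary": summary, "recent": recent}
    # Content hash acts as a version id, so the exact context an agent
    # acted on can be pinned in a review or replayed later.
    snap["version"] = hashlib.sha256(
        json.dumps(snap, sort_keys=True).encode()
    ).hexdigest()[:12]
    return snap

snap = snapshot_memory(["plan", "edit A", "run tests", "fix failure"])
print(snap["summary"], snap["recent"])
```

In a real system the summary would be model-generated rather than a count, but the key property is the same: the snapshot is a named, diffable artifact, not ephemeral chat state.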

The workflow must change along with the tooling

But context alone is not enough. Companies need to redesign the workflows around these agents. As McKinsey's 2025 report "One Year of Agentic AI" notes, productivity gains come not from layering AI on top of existing processes, but from rethinking the process itself. When teams simply drop an agent into an unmodified workflow, friction results: engineers spend more time verifying AI-written code than they would have spent writing it themselves. Agents can only amplify what is already well structured: well-tested, modular codebases with clear ownership and documentation. Without these foundations, autonomy becomes chaos.

Safety and governance also require a shift in mindset. AI-generated code introduces new forms of risk: unvetted dependencies, subtle license violations, and undocumented modules that escape peer review. Mature teams are beginning to integrate agent activity directly into their CI/CD pipelines, treating agents as autonomous contributors whose work must pass the same static analysis, audit logging, and approval gates as any human developer. GitHub's own documentation highlights this direction, positioning Copilot agents not as replacements for engineers, but as orchestrated participants in secure, auditable workflows. The goal is not to have an AI "write everything," but to ensure that when it acts, it does so within explicit guardrails.
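One way to picture "same gates as a human developer, plus agent-specific checks" is a simple policy function over a change record. The check names and fields below are hypothetical, a sketch of the pattern rather than any pipeline's real configuration.

```python
# Hypothetical policy gate for agent-authored changes: standard CI checks
# plus agent-specific ones (dependency vetting, mandatory human approval).
def gate_agent_change(change: dict) -> tuple[bool, list[str]]:
    """Return (passed, reasons_for_failure) for an agent-authored change."""
    failures = []
    if not change.get("static_analysis_passed"):
        failures.append("static analysis")
    if not change.get("tests_passed"):
        failures.append("tests")
    # Agent-specific rule: new dependencies must be explicitly vetted.
    if change.get("new_dependencies") and not change.get("deps_vetted"):
        failures.append("unvetted dependencies")
    # Agent-specific rule: no merge without a named human approver.
    if not change.get("human_approval"):
        failures.append("missing human approval")
    return (not failures, failures)

ok, reasons = gate_agent_change({
    "static_analysis_passed": True,
    "tests_passed": True,
    "new_dependencies": ["some-new-lib"],
    "deps_vetted": False,
    "human_approval": True,
})
print(ok, reasons)
```

The point of returning the failure reasons, not just a boolean, is auditability: every blocked agent change leaves a record of exactly which guardrail it hit.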


What business decision makers should focus on now

For technology leaders, the path forward starts with readiness rather than hype. Monoliths with sparse tests rarely produce net gains; agents thrive where test suites are authoritative and can drive iterative refinement. This is exactly the loop Anthropic advocates for coding agents. Pilot in tightly defined domains (test generation, legacy modernization, isolated refactoring), and treat every deployment as an experiment with explicit metrics (defect escape rate, PR cycle time, change failure rate, security findings burndown). As usage grows, treat agents like data infrastructure: every plan, context snapshot, action log, and test run is data that accumulates into a searchable memory of engineering intent, and a durable competitive advantage.
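The metrics named above can be computed from ordinary change records. Below is a minimal sketch; the field names and sample data are assumptions for illustration, not a standard schema.

```python
# Illustrative roll-up of pilot metrics from per-change records.
# Field names ("deployed", "caused_incident", etc.) are assumed, not standard.
def pilot_metrics(changes: list[dict]) -> dict:
    deployed = [c for c in changes if c["deployed"]]
    failed = [c for c in deployed if c["caused_incident"]]
    escaped = [c for c in changes if c["defect_found_after_review"]]
    cycle_times = sorted(c["pr_cycle_hours"] for c in changes)
    return {
        # Share of deployed changes that caused an incident.
        "change_failure_rate": len(failed) / len(deployed),
        # Share of all changes whose defect slipped past review.
        "defect_escape_rate": len(escaped) / len(changes),
        "median_pr_cycle_hours": cycle_times[len(cycle_times) // 2],
    }

changes = [
    {"deployed": True,  "caused_incident": False, "defect_found_after_review": False, "pr_cycle_hours": 4},
    {"deployed": True,  "caused_incident": True,  "defect_found_after_review": True,  "pr_cycle_hours": 10},
    {"deployed": True,  "caused_incident": False, "defect_found_after_review": False, "pr_cycle_hours": 6},
    {"deployed": False, "caused_incident": False, "defect_found_after_review": False, "pr_cycle_hours": 8},
]
metrics = pilot_metrics(changes)
print(metrics)
```

What matters is less the exact formulas than that each pilot emits the same metrics, so agent-assisted and baseline workflows can be compared on equal terms.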

Under the hood, agentic coding is not so much a tooling problem as a data problem. Every context snapshot, test iteration, and code revision becomes a form of structured data that must be stored, indexed, and reused. As these agents expand, companies will find themselves managing an entirely new layer of data: one that captures not just what was built, but how it was reasoned about. This shift turns technical logs into a knowledge graph of intent, decision-making, and validation. Over time, the organizations that can search and replay this contextual memory will surpass those that still view code as static text.
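A toy version of "search and replay contextual memory" might look like the following: agent activity stored as structured records linking task intent, the context version it ran against, and the validation outcome. The schema and the `replay` helper are hypothetical.

```python
# Sketch: agent activity as queryable records tying intent to the context
# version it acted on and whether the result validated. Schema is assumed.
records = [
    {"task": "refactor billing", "context_version": "a1b2", "tests_passed": True},
    {"task": "upgrade auth lib", "context_version": "c3d4", "tests_passed": False},
    {"task": "refactor billing", "context_version": "e5f6", "tests_passed": True},
]

def replay(task_query: str, records: list[dict]) -> list[str]:
    """Return context versions of past validated work matching the query,
    i.e. the snapshots a new agent run could start from."""
    return [r["context_version"] for r in records
            if task_query in r["task"] and r["tests_passed"]]

print(replay("billing", records))
```

At scale this would live in a search index or graph store rather than a list, but the principle is the same: reasoning history becomes a first-class, queryable asset.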

The coming year will likely determine whether agentic coding becomes a cornerstone of enterprise software development or another inflated promise. The difference will depend on context engineering: how intelligently teams design the information substrate their agents rely on. The winners will be those who see autonomy not as magic, but as an extension of disciplined system design: clear workflows, measurable feedback, and rigorous governance.


In short

Platforms are converging on orchestration and guardrails, and research continues to improve context control at inference time. The winners of the next 12 to 24 months won't be the teams with the most capable model; they will be the ones who develop context as an asset and treat the workflow as the product. Do that, and autonomy compounds. Skip it, and the review queue does too.

Context + agent = leverage. Skip the first half and the rest collapses.

Dhyey Mavani accelerates generative AI on LinkedIn.

Read more of our guest writers. Or consider posting yourself! See our guidelines here.

