
Why most enterprise AI coding pilots underperform (Hint: It's not the model)

Gen AI in software engineering has moved far beyond autocomplete. The emerging frontier is agentic coding: AI systems that plan changes, execute them in multiple steps, and iterate on feedback. Yet despite the excitement around "AI agents that code," most enterprise implementations are underperforming. The limiting factor is no longer the model. It's context: the structure, history, and intent surrounding the code being changed. In other words, companies now face a system design problem: they have not yet engineered the environment in which these agents operate.

The shift from assistance to agency

The past year has seen a rapid evolution from supportive coding tools to agentic workflows. Research is beginning to formalize what agentic behavior means in practice: the ability to reason about design, testing, execution, and validation rather than generating isolated fragments. Work like dynamic action resampling shows that allowing agents to branch, reconsider, and revise their own decisions significantly improves results in large, interdependent codebases. At the platform level, providers like GitHub are now building dedicated agent orchestration environments, such as the Copilot coding agent and Agent HQ, to support multi-agent collaboration across real enterprise pipelines.

But early field results tell a cautionary tale. When organizations introduce agentic tools without paying attention to the workflow and environment, productivity can decline. A randomized controlled trial this year found that developers using AI support in unmodified workflows completed tasks more slowly, largely due to verification, rework, and confusion around intent. The lesson is clear: autonomy without orchestration rarely delivers efficiency.

Why context engineering is the real enabler

In every implementation failure I've observed, the root cause was context. When agents lack a structured understanding of a codebase, especially its relevant modules, dependency graph, test harness, architectural conventions, and change history, they generate output that looks correct but is disconnected from reality. Too much information overwhelms the agent; too little forces it to guess. The goal is not to give the model more tokens. The goal is to determine what should be visible to the agent, when, and in what form.
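To make the "what should be visible, when, and in what form" idea concrete, here is a minimal sketch of context selection under a token budget. All names (`ContextItem`, the scoring, the budget) are illustrative assumptions, not any vendor's API: items that fit are inlined, the rest are linked by reference rather than dropped.

```python
# Hypothetical sketch: deciding what an agent "sees" under a token budget.
# Relevance scores would come from dependency/test-graph analysis in practice.
from dataclasses import dataclass

@dataclass
class ContextItem:
    name: str        # e.g. a module path or doc fragment
    relevance: float # 0..1 relevance score (assumed precomputed)
    tokens: int      # estimated token cost if inlined verbatim

def assemble_context(items: list[ContextItem], budget: int) -> list[str]:
    """Greedily inline the most relevant items that fit the budget;
    anything that doesn't fit is linked by name instead of inlined."""
    chosen, used = [], 0
    for item in sorted(items, key=lambda i: i.relevance, reverse=True):
        if used + item.tokens <= budget:
            chosen.append(f"INLINE:{item.name}")
            used += item.tokens
        else:
            chosen.append(f"LINK:{item.name}")
    return chosen

items = [
    ContextItem("billing/invoice.py", 0.9, 400),
    ContextItem("billing/tests/test_invoice.py", 0.8, 500),
    ContextItem("ARCHITECTURE.md", 0.6, 900),
]
plan = assemble_context(items, budget=1000)
print(plan)
```

The design choice worth noting is that overflow items are linked, not discarded: the agent still knows they exist and can request them in a later turn.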

See also  The case for embedding audit trails in AI systems before scaling

The teams that see meaningful gains treat context as an engineering surface. They build tooling to snapshot, compress, and version the agent's working memory: what is retained across turns, what is discarded, what is summarized, and what is linked rather than inlined. They design deliberate review checkpoints instead of free-running sessions. They make the specification a first-class artifact, something that can be reviewed, tested, and owned, not a throwaway chat history. This shift aligns with a broader trend some researchers describe as "specifications becoming the new source of truth."
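As a rough sketch of what "snapshot, compress, and version working memory" could look like, the function below keeps recent turns verbatim, summarizes older ones, and hashes the result so each snapshot has a reviewable version identifier. The structure is an assumption for illustration, not a description of any particular product.

```python
# Illustrative sketch: agent working memory as a versioned artifact.
# Recent turns are retained verbatim; older ones are compressed to a
# summary; the whole snapshot is content-addressed for review/replay.
import hashlib
import json

def snapshot_memory(turns: list[str], keep_verbatim: int = 2) -> dict:
    old, recent = turns[:-keep_verbatim], turns[-keep_verbatim:]
    summary = f"{len(old)} earlier turns summarized" if old else ""
    snap = {"summary": summary, "recent": recent}
    # Content hash acts as a version id, so the exact context an agent
    # acted on can be pinned in a review or replayed later.
    snap["version"] = hashlib.sha256(
        json.dumps(snap, sort_keys=True).encode()
    ).hexdigest()[:12]
    return snap

snap = snapshot_memory(["plan", "edit A", "run tests", "fix failure"])
print(snap["summary"], snap["recent"])
```

In a real system the summary would be model-generated rather than a count, but the key property is the same: the snapshot is a named, diffable artifact, not ephemeral chat state.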

The workflow must change along with the tooling

But context alone is not enough. Companies need to redesign the workflows around these agents. As McKinsey's 2025 report "One Year of Agentic AI" notes, productivity gains come not from layering AI on top of existing processes, but from rethinking the process itself. When teams simply drop an agent into an unmodified workflow, friction results: engineers spend more time verifying AI-written code than they would have spent writing it themselves. Agents can only amplify what is already well structured: well-tested, modular codebases with clear ownership and documentation. Without these foundations, autonomy becomes chaos.

Safety and governance also require a shift in mindset. AI-generated code introduces new forms of risk: unvetted dependencies, subtle license violations, and undocumented modules that escape peer review. Mature teams are beginning to integrate agent activity directly into their CI/CD pipelines, treating agents as autonomous contributors whose work must pass the same static analysis, audit logging, and approval gates as any human developer. GitHub's own documentation highlights this direction, positioning Copilot agents not as replacements for engineers, but as orchestrated participants in secure, auditable workflows. The goal is not to have an AI "write everything," but to ensure that when it acts, it does so within explicit guardrails.
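One way to picture "same gates as a human developer, plus agent-specific checks" is a simple policy function over a change record. The check names and fields below are hypothetical, a sketch of the pattern rather than any pipeline's real configuration.

```python
# Hypothetical policy gate for agent-authored changes: standard CI checks
# plus agent-specific ones (dependency vetting, mandatory human approval).
def gate_agent_change(change: dict) -> tuple[bool, list[str]]:
    """Return (passed, reasons_for_failure) for an agent-authored change."""
    failures = []
    if not change.get("static_analysis_passed"):
        failures.append("static analysis")
    if not change.get("tests_passed"):
        failures.append("tests")
    # Agent-specific rule: new dependencies must be explicitly vetted.
    if change.get("new_dependencies") and not change.get("deps_vetted"):
        failures.append("unvetted dependencies")
    # Agent-specific rule: no merge without a named human approver.
    if not change.get("human_approval"):
        failures.append("missing human approval")
    return (not failures, failures)

ok, reasons = gate_agent_change({
    "static_analysis_passed": True,
    "tests_passed": True,
    "new_dependencies": ["some-new-lib"],
    "deps_vetted": False,
    "human_approval": True,
})
print(ok, reasons)
```

The point of returning the failure reasons, not just a boolean, is auditability: every blocked agent change leaves a record of exactly which guardrail it hit.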


What business decision makers should focus on now

For technology leaders, the path forward starts with readiness rather than hype. Monoliths with sparse tests rarely produce net gains; agents thrive where test suites are authoritative and can drive iterative refinement. This is exactly the loop Anthropic advocates for coding agents. Pilot in tightly defined domains (test generation, legacy modernization, isolated refactoring), and treat every deployment as an experiment with explicit metrics (defect escape rate, PR cycle time, change failure rate, security findings burndown). As usage grows, treat agents like data infrastructure: every plan, context snapshot, action log, and test run is data that accumulates into a searchable memory of engineering intent, and a durable competitive advantage.
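The metrics named above can be computed from ordinary change records. Below is a minimal sketch; the field names and sample data are assumptions for illustration, not a standard schema.

```python
# Illustrative roll-up of pilot metrics from per-change records.
# Field names ("deployed", "caused_incident", etc.) are assumed, not standard.
def pilot_metrics(changes: list[dict]) -> dict:
    deployed = [c for c in changes if c["deployed"]]
    failed = [c for c in deployed if c["caused_incident"]]
    escaped = [c for c in changes if c["defect_found_after_review"]]
    cycle_times = sorted(c["pr_cycle_hours"] for c in changes)
    return {
        # Share of deployed changes that caused an incident.
        "change_failure_rate": len(failed) / len(deployed),
        # Share of all changes whose defect slipped past review.
        "defect_escape_rate": len(escaped) / len(changes),
        "median_pr_cycle_hours": cycle_times[len(cycle_times) // 2],
    }

changes = [
    {"deployed": True,  "caused_incident": False, "defect_found_after_review": False, "pr_cycle_hours": 4},
    {"deployed": True,  "caused_incident": True,  "defect_found_after_review": True,  "pr_cycle_hours": 10},
    {"deployed": True,  "caused_incident": False, "defect_found_after_review": False, "pr_cycle_hours": 6},
    {"deployed": False, "caused_incident": False, "defect_found_after_review": False, "pr_cycle_hours": 8},
]
metrics = pilot_metrics(changes)
print(metrics)
```

What matters is less the exact formulas than that each pilot emits the same metrics, so agent-assisted and baseline workflows can be compared on equal terms.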

Under the hood, agentic coding is not so much a tooling problem as a data problem. Every context snapshot, test iteration, and code revision becomes a form of structured data that must be stored, indexed, and reused. As these agents expand, companies will find themselves managing an entirely new layer of data: one that captures not just what was built, but how it was reasoned about. This shift turns technical logs into a knowledge graph of intent, decision-making, and validation. Over time, the organizations that can search and replay this contextual memory will surpass those that still view code as static text.
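A toy version of "search and replay contextual memory" might look like the following: agent activity stored as structured records linking task intent, the context version it ran against, and the validation outcome. The schema and the `replay` helper are hypothetical.

```python
# Sketch: agent activity as queryable records tying intent to the context
# version it acted on and whether the result validated. Schema is assumed.
records = [
    {"task": "refactor billing", "context_version": "a1b2", "tests_passed": True},
    {"task": "upgrade auth lib", "context_version": "c3d4", "tests_passed": False},
    {"task": "refactor billing", "context_version": "e5f6", "tests_passed": True},
]

def replay(task_query: str, records: list[dict]) -> list[str]:
    """Return context versions of past validated work matching the query,
    i.e. the snapshots a new agent run could start from."""
    return [r["context_version"] for r in records
            if task_query in r["task"] and r["tests_passed"]]

print(replay("billing", records))
```

At scale this would live in a search index or graph store rather than a list, but the principle is the same: reasoning history becomes a first-class, queryable asset.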

The coming year will likely determine whether agentic coding becomes a cornerstone of enterprise software development or another inflated promise. The difference will depend on context engineering: how intelligently teams design the information substrate their agents rely on. The winners will be those who see autonomy not as magic, but as an extension of disciplined system design: clear workflows, measurable feedback, and rigorous governance.


In short

Platforms are converging on orchestration and guardrails, and research continues to improve context control at inference time. The winners of the next 12 to 24 months won't be the teams with the most capable model; they will be the ones who develop context as an asset and treat the workflow as the product. Do that, and autonomy compounds. Skip it, and the review queue does too.

Context + agent = leverage. Skip the first half and the rest collapses.

Dhyey Mavani accelerates generative AI on LinkedIn.

Read more of our guest writers. Or consider posting yourself! See our guidelines here.

