McKinsey Quarterly: A Time for Courage (161/164)

159 QUARTER_02_2026

willing to rethink their operating model.

quality. AI agents take the night shift, doing the heavy execution work (coding, testing, reviewing, documenting) inside a controlled, well-designed workflow. But reaching this point requires careful factory setup.

To begin with, the organization must prepare the environment in which agents will operate. Agents need structured requirements, clear user stories, and unambiguous acceptance criteria; they cannot infer business intent. They also need rich context about the system: domain knowledge, architecture diagrams, API contracts, data models, service boundaries, and nonfunctional expectations (such as performance and reliability). All of this is fed into the agent environment so the AI understands what it is building and why.

Once the factory is set up, the human team works the day shift. Their role is to decide what matters and convert that intent into agent-ready tasks. They refine user stories, translate features into specifications, break the work into well-scoped tasks, and define what “good” looks like. They provide architectural direction, explaining which modules can be touched, which should not be altered, and why. They set priorities, tune guardrails, and update tests for areas where agents have made mistakes. In short, humans move from typing code to directing, decomposing, and quality-controlling the work.

As evening comes, the night shift of AI agents takes over. A coordinated fleet of them performs multistep workflows: Coding agents implement changes or refactor modules; test agents generate and run new test suites; QA agents identify regressions; security agents scan for vulnerabilities or leaked secrets; performance agents benchmark critical paths; and documentation agents rewrite and update API references and “what changed” summaries.
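As an illustration of what an agent-ready task from the day shift might look like, the following is a minimal Python sketch, not a description of any particular firm's setup. It assumes a task is recorded as a user story, a list of acceptance criteria, and two sets of file patterns: modules the agent may touch and modules that must not be altered. The names `AgentTask` and `guardrail_violations`, the field names, and the example paths are all hypothetical and chosen only for this sketch.

```python
from __future__ import annotations

from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class AgentTask:
    """One agent-ready unit of work (hypothetical schema for illustration)."""

    user_story: str
    acceptance_criteria: tuple[str, ...]
    allowed_paths: tuple[str, ...]          # glob patterns the agent may modify
    protected_paths: tuple[str, ...] = ()   # glob patterns that must stay untouched

    def is_ready(self) -> bool:
        # Agents cannot infer business intent, so a task counts as ready only
        # when the story, the criteria, and the scope are all stated explicitly.
        return (
            bool(self.user_story.strip())
            and len(self.acceptance_criteria) > 0
            and len(self.allowed_paths) > 0
        )


def guardrail_violations(task: AgentTask, changed_files: list[str]) -> list[str]:
    """Return the changed files that fall outside the task's guardrails."""
    violations = []
    for path in changed_files:
        protected = any(fnmatchcase(path, p) for p in task.protected_paths)
        allowed = any(fnmatchcase(path, p) for p in task.allowed_paths)
        # A file is a violation if it is explicitly protected or simply out of scope.
        if protected or not allowed:
            violations.append(path)
    return violations


# Hypothetical example task and paths.
task = AgentTask(
    user_story="As a merchant, I can refund a settled payment.",
    acceptance_criteria=("Refund total never exceeds the captured amount.",),
    allowed_paths=("payments/refunds/*", "tests/refunds/*"),
    protected_paths=("payments/ledger/*",),
)
```

In this sketch, a change set that touches `payments/ledger/core.py` would be reported by `guardrail_violations`, which is the kind of signal a workflow could use to halt before a pull request is proposed. Marking more of the code base as “safe to automate” would then amount to widening `allowed_paths` over time.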
An orchestrator agent manages handoffs: If tests fail, it routes work back to a fix agent; if performance declines, it invokes a performance-checking agent; if a policy is violated, it halts the workflow. By morning, the factory has produced a set of ready-for-review pull requests, each containing code, tests, logs, analysis results, and a natural language rationale.

The next day, the human team resumes the day shift by reviewing the output of the night. They examine the summaries, approve or refine code update requests, assess architectural fit, and give the AI new direction. They adjust priorities based on what the agents achieved overnight, tighten guardrails where needed, and mark more parts of the code base as “safe to automate” as confidence grows.

In this model, software development becomes a continuous, high-speed loop rather than a two-week sprint cycle. The humans guide the system; the agents do the work; the engineering platform ensures safety and quality. The result is a factory that produces more, at higher quality, with humans focusing on the parts of the work that genuinely require expertise and judgment.

If you ask us, this is absolutely incredible. Success cases are still few and far between as of this writing, but breakthroughs are emerging. One large financial-services firm, for example, has stood up this exact AI agent factory to develop a greenfield payment system and is improving productivity by 40 to 70 percent. LATAM Airlines has also experimented with a version of this and is delivering 50 percent increases in productivity (with smaller teams).

What does it take to run an AI agent factory like the one described above?

Don’t skimp on the foundations. Every successful implementation of AI agents has relied on strong foundations. LATAM highlights two in particular: a robust engineering platform that gives agents the tools and environments they need, and a product-oriented operating model
