The Expansion Regime

Abstract

A companion essay, AI Is an Attractor, Not a Multiplier, argued that generative AI pulls output toward an internal target and that the productivity literature has only measured what that pull does on bounded, single-session, pre-selected tasks: the median of possible AI use. It closed by naming the region nobody has measured: skilled operators working above the target, across domains, with agentic infrastructure, on self-directed work. This essay is a report from inside that region. It is not a controlled study and makes no claim to be one. It is an existence proof: a single operator, working across law, computer vision, voice interfaces, and security research in the same week, using a stack of memory, multi-model delegation, and parallel sub-agents, documenting what the leverage actually is. The finding is that the leverage was never speed on the task in front of you. It is scope: work that would not otherwise have been attempted at all. And the discipline that makes it produce instead of collapse (problem-first deployment, a single governed memory, a human verifying at the boundary where models fail) is precisely the discipline the enterprise-failure literature says organizations cannot sustain. One disciplined person has no organization to fail. That, and not raw model capability, is the whole mechanism.

The gap the last essay named

The productivity research on generative AI is real, careful, and narrow. Six headline studies, sixteen thousand subjects, all pointing at compression: novices gain, experts plateau, the gap shrinks. Every one of them measures a bounded task, in a single session, with a 2023-era model, on work the researcher chose in advance. METR's 2025 field experiment, the most careful study of AI on real software engineering, found the opposite in its regime (experienced developers nineteen percent slower on their own mature codebases) and almost nobody in the productivity discourse cites it.

The unifying frame is that generative AI is a target-level attractor: it pulls work toward a fixed internal target set by training and reinforcement. Where that target sits below the user's unassisted level, the attractor drags; where it sits above, it lifts; across users, it homogenizes. Which one you get depends entirely on where the target sits relative to your unassisted level on that specific kind of work.
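One way to make the frame concrete is a toy model. The sketch below is an illustration, not the companion essay's formalism: it assumes skill can be put on a single numeric scale and that the pull is linear, with a hypothetical `pull` fraction between 0 and 1. Under those assumptions, the same function produces lift, drag, and homogenization.

```python
from statistics import pstdev


def assisted_level(unassisted: float, target: float, pull: float = 0.5) -> float:
    """Move a fraction `pull` of the way from the unassisted level toward the target.

    The linear form and the `pull` parameter are illustrative assumptions.
    """
    if not 0.0 <= pull <= 1.0:
        raise ValueError("pull must be between 0 and 1")
    return unassisted + pull * (target - unassisted)


def regime(unassisted: float, target: float) -> str:
    """Name the direction of the pull for one user on one kind of work."""
    if target > unassisted:
        return "lift"
    if target < unassisted:
        return "drag"
    return "neutral"


def spread_before_after(levels: list[float], target: float, pull: float = 0.5) -> tuple[float, float]:
    """Return the population standard deviation of levels without and with assistance."""
    assisted = [assisted_level(u, target, pull) for u in levels]
    return pstdev(levels), pstdev(assisted)
```

With a target of 60, a user at 40 is lifted and a user at 80 is dragged, and the spread across a population of users shrinks by the factor (1 - pull). That shrinkage is the compression the headline studies report, restated in the toy model's terms.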

And that essay ended on an admission: the literature has not sampled anyone at the top of the usage distribution, the people who have built custom memory systems, voice layers, and multi-agent dispatch, and who run a dozen or more distinct AI systems in concert. These people exist, and their productivity is not measurable by any instrument currently in use. This essay does not solve the measurement problem. It does the thing you do before you can build the instrument: it describes the case.

What "above the attractor" actually means

The attractor frame predicts five regimes. Four of them are documented. The fifth, the one that matters here, is not.

When a skilled operator does work the model can at best match, every AI suggestion is at or below their own level, and the interaction is drag. This is METR's regime. The correct response is not to push harder against the tool. It is to stop using AI for that work and do it yourself. The expert's unassisted output is the ceiling; the model can only pull it down.

The expansion regime is the inversion. The operator keeps the above-target work (the judgment, the architecture, the decision about what is worth doing) and routes the below-target work to the fleet: the boilerplate, the first-pass research sweep, the mechanical transform, the thing they could do but should not spend the hour on. The model is never asked to be better than the operator. It is asked to be the operator's floor, executed in parallel, fifteen times at once.

The gain is not that the work got faster. It is that work got attempted that would never have been started by one person with two hands and one week.

This is why the stopwatch cannot see it. A stopwatch measures time-per-task on tasks that exist in both conditions. The expansion regime produces tasks that exist in only one condition. There is no counterfactual task to time, because without the fleet the task was never on the list.

The stack, described honestly

The infrastructure that makes this work is unremarkable in its parts and decisive in its composition. None of it is proprietary insight; all of it is discipline.

One governed memory. A single semantic index over every prior session (code and conversation) so that context is never re-explained and a decision made in March is retrievable in June. The enterprise-failure literature's most-cited root cause is models layered over fractured, ungoverned data. At n=1 the data is not fractured because there is one author and one store. The "AI control plane" that consultancies sell as a transformation program is, for a single operator, a file and a habit.
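A minimal sketch of the "file and a habit" idea follows. It is not the stack described here: the names `MemoryStore`, `remember`, and `recall` are hypothetical, and plain token overlap (cosine similarity over word counts) stands in for the embedding model a real semantic index would use. What it shows is the shape: one store, one author, every entry retrievable by meaning-adjacent query.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


def _tokens(text: str) -> Counter:
    """Lowercased word counts; a crude stand-in for a semantic embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


@dataclass
class MemoryStore:
    """A single store with a single author (hypothetical illustration)."""
    entries: list[tuple[str, str, Counter]] = field(default_factory=list)

    def remember(self, when: str, text: str) -> None:
        """Record a session note or decision together with its date label."""
        self.entries.append((when, text, _tokens(text)))

    def recall(self, query: str, k: int = 3) -> list[tuple[str, str]]:
        """Return the k entries most similar to the query, best first."""
        q = _tokens(query)
        ranked = sorted(self.entries, key=lambda e: _cosine(q, e[2]), reverse=True)
        return [(when, text) for when, text, _ in ranked[:k]]
```

Used this way, a note stored as `store.remember("March", "Decided to use SQLite for the session index instead of Postgres")` comes back in June from `store.recall("which database did we choose for the session index", k=1)`, without the context being re-explained.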

Delegation by disposition, not by ranking. A companion finding from our poker corpus (six and a half million model decisions) is that there is no best model; there is a field of temperaments, none dominant. The operating consequence is that you do not pick the top of a leaderboard. You match the task to the disposition: a token-cheap model to ingest a corpus too large for the careful one, a capable autonomous worker for scoped building, the strongest reasoner for architecture and for catching the others' mistakes. The leaderboard sells an average; the work hands you a specific task, and the average does not predict it.
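Routing by disposition can be sketched as a small dispatch function. Everything named below is an assumption made for illustration: the `Task` fields, the four task kinds, and the `LARGE_CORPUS_TOKENS` threshold are hypothetical, and the three dispositions are simply the ones listed in the paragraph above. No specific model or vendor is implied.

```python
from dataclasses import dataclass
from enum import Enum


class Disposition(Enum):
    TOKEN_CHEAP = "token-cheap reader"
    AUTONOMOUS_WORKER = "capable autonomous worker"
    STRONG_REASONER = "strongest reasoner"


@dataclass(frozen=True)
class Task:
    description: str
    kind: str              # "ingest", "build", "architecture", or "review"
    input_tokens: int = 0  # rough size of the material the task must read


LARGE_CORPUS_TOKENS = 200_000  # hypothetical cutoff for "too large for the careful one"


def route(task: Task) -> Disposition:
    """Match the task to a temperament instead of picking a leaderboard winner."""
    if task.kind in {"architecture", "review"}:
        # Architecture, and catching the other models' mistakes.
        return Disposition.STRONG_REASONER
    if task.kind == "ingest" or task.input_tokens > LARGE_CORPUS_TOKENS:
        # Bulk reading goes to the cheap model regardless of task kind.
        return Disposition.TOKEN_CHEAP
    if task.kind == "build":
        return Disposition.AUTONOMOUS_WORKER
    raise ValueError(f"no disposition defined for task kind {task.kind!r}")
```

The design choice worth noticing is that nothing in `route` consults a ranking. The decision is made from properties of the task, which is the practical meaning of "the average does not predict it."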

Parallel sub-agents. The unit of work is not a prompt; it is a dispatch of several agents at once, each on an independent piece, with their results reviewed and integrated by the operator. This is the single feature most absent from the literature, which measures autocomplete and chat and nothing that plans, branches, or runs in parallel.
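The dispatch-then-integrate pattern can be sketched with Python's standard `asyncio` library. The agents here are stubs: `run_agent` sleeps briefly in place of a real model call, and the function names, the result fields, and the `approve` callback are all hypothetical. The sketch shows only the control flow: independent pieces go out together, and nothing is integrated until the operator has reviewed it.

```python
import asyncio
from typing import Callable


async def run_agent(name: str, piece: str) -> dict:
    """Stub sub-agent: a real one would call a model here."""
    await asyncio.sleep(0.01)  # stand-in for network and model latency
    return {"agent": name, "piece": piece, "output": f"draft for {piece}"}


async def dispatch(pieces: list[str]) -> list[dict]:
    """Start one agent per independent piece and wait for all of them."""
    jobs = [run_agent(f"agent-{i}", piece) for i, piece in enumerate(pieces)]
    # gather runs the jobs concurrently and returns results in input order.
    return await asyncio.gather(*jobs)


def integrate(results: list[dict], approve: Callable[[dict], bool]) -> list[str]:
    """Operator review step: keep only the outputs the human approves."""
    return [r["output"] for r in results if approve(r)]
```

A run looks like `results = asyncio.run(dispatch(["contract clause", "camera calibration", "voice prompt"]))`, followed by `integrate(results, approve=...)`, where `approve` is the operator's own check and is deliberately left for the caller to supply.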

Verification at the boundary. The jagged-frontier result is that AI users are nineteen percentage points less likely to be correct on tasks just outside the model's competence, precisely because fluent output suppresses scrutiny. The operator's non-negotiable job is to be the discriminator at that boundary: to treat every delegated output as unverified until checked, and to spend their scarce attention exactly where the model is most confident and most wrong. The discipline is not "trust the AI." It is "trust nothing the AI says until it is proven, and assume the polished answer is the dangerous one."
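The rule "unverified until checked" can be written down as a gate. The sketch below is one reading of the discipline, not a description of the actual stack: the `Delegated` record, its `model_confidence` field, and the choice to review the most confident outputs first are assumptions made for illustration, and `check` stands for whatever independent test the operator applies.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Delegated:
    claim: str
    model_confidence: float  # self-reported by the model, 0.0 to 1.0 (hypothetical field)
    verified: bool = False   # every delegated output starts unverified


def review_queue(outputs: list[Delegated]) -> list[Delegated]:
    """Order the review so the most confident, most polished outputs are checked first."""
    return sorted(outputs, key=lambda o: o.model_confidence, reverse=True)


def verify(output: Delegated, check: Callable[[str], bool]) -> Delegated:
    """Mark an output verified only if the operator's independent check passes."""
    output.verified = bool(check(output.claim))
    return output


def release(outputs: list[Delegated]) -> list[str]:
    """Only verified claims leave the boundary; everything else is held back."""
    return [o.claim for o in outputs if o.verified]
```

Confidence never promotes an output in this sketch. It only moves the output earlier in the operator's queue, which is the code-level version of assuming the polished answer is the dangerous one.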

Why one person beats the funded pilot

The most striking number in the enterprise literature is the failure rate. Roughly ninety-five percent of enterprise generative-AI pilots deliver no measurable profit-and-loss impact despite tens of billions in spend; over eighty percent of AI projects fail to reach meaningful production, twice the failure rate of conventional software projects. The reflex is to read this as a statement about AI. It is not. The documented root causes are organizational: a technology-first rather than problem-first mentality, ungoverned data, no clear owner, and "adoption by drift," where tools proliferate without anyone deciding what they are for.

Read that list again as a list of things an organization has and an individual does not. A solo operator cannot have a technology-first mentality imposed by a leadership that bought the tool before finding the problem. The operator is the problem-definer. There is no drift, because there is no one to drift away from. There is no governance gap, because governance at n=1 is a single person's standards. Every structural fix the literature prescribes (problem-first deployment, a unified data layer, a control plane, a human accountable for output) is either trivially satisfied or vacuous when the organization is one disciplined person.

This is the uncomfortable inversion the funding conversation should sit with. The enterprise is not failing because its models are worse; it is running the same frontier models. It is failing because the thing that makes AI produce instead of dissipate is organizational discipline, and organizations are structurally bad at the specific discipline required. The individual is the cleanest possible instance of that discipline. The expansion regime is not evidence that one person is smarter than a company. It is evidence that the binding constraint on beneficial AI is coordination, and that the constraint goes to zero as the organization goes to one.

What "beneficial" means here

The public argument about AI and work is conducted in the language of replacement: how many jobs, how fast, which roles. The expansion regime is not a replacement story. Nothing here describes a human removed from the loop. It describes a human moved up the loop, from doing the work to directing and verifying a fleet that does the below-ceiling parts, while the human holds the parts no model can be trusted with: what is worth building, whether the output is actually correct, and where the whole thing is going.

That is the honest content of "beneficial." Not that AI did the work. That a person with one lifetime of hours used AI to attempt a body of work (across domains that normally each demand a specialist) that a single unaugmented person could not have attempted, and remained personally accountable for every claim that came out the other end. The benefit is additional, accountable scope. The cost, paid in full and on purpose, is the verification labor that keeps the scope from becoming a pile of confident, fluent, unchecked mistakes.

The honest claim

This is a sample of one. It is self-documented, not externally measured, and an operator describing their own leverage is exactly the configuration in which the perception-versus-stopwatch gap is largest. The METR developers also felt faster while the clock said slower. None of what is described here should be read as a measured productivity multiplier. It is not a number. It is an existence claim, and existence claims are bounded to what they can prove: that the expansion regime is real, reachable, and shaped the way the attractor frame predicts. It cannot prove how common the regime is, how much of its output would survive an adversarial audit, or how far it generalizes beyond one operator's habits.

What it does establish is that the region the productivity literature called unmeasured is not empty. There is work happening above the attractor, and it has a structure: keep the above-target judgment, route the below-target execution, verify at the boundary, and let one person's governance stand in for the control plane that enterprises cannot build. The next task is not another field report. It is the instrument: a study that compares a skilled operator's self-directed output over a year, with and without the fleet, on work neither party scoped in advance. Until that instrument exists, the case stands where field reports always stand: not as proof of the distribution, but as proof that the distribution has a tail, and that the tail is where the interesting question lives.

Sources and method. This essay builds on the Salvo Research companion essays AI Is an Attractor, Not a Multiplier and There Is No Best AI, and cites Becker et al. (METR, 2025) on developer slowdown and the perception gap; Dell'Acqua et al. (2023/2026) on the jagged technological frontier; Brynjolfsson, Li, and Raymond (2023/2025) on skill compression; Project NANDA / MIT Sloan (2025) on the ~95% enterprise pilot failure rate; and Ryseff and Narayanan (RAND, 2024) on the organizational root causes of AI project failure. The first-person stack is described at the level of architecture, not operational detail. This is a field report and an existence proof, not a controlled study; its claims are bounded accordingly.