Author’s Note
When I published The Token Tax three weeks ago, I expected the cost conversation around enterprise AI to keep growing.
I did not expect it to accelerate this quickly.
In the time since, the same pattern has surfaced across the industry in plain view: companies exhausting an entire year’s AI budget in a single quarter, a major vendor moving its flagship coding assistant to metered billing, executives asking aloud whether rising usage maps to anything a customer would actually notice.
The topic has grown, but the cause has not. We are not watching a new problem appear; instead, we are watching a wider audience finally notice the one that was already in the room. What read as an architectural argument a few weeks ago has become an operating-budget story, and the budget is a more persuasive teacher than the architecture ever was.
~Dom
The argument, briefly
The Token Tax made a single claim, and the rest of this piece depends on it, so I will restate it rather than ask you to go back.
AI does not make organizational ambiguity disappear; it converts it into operating expense.
When you point a language model at a clean process – defined inputs, unambiguous rules, named owners – it is cheap to run, because there is very little for it to work out. Many processes meeting those requirements may not need AI at runtime at all, which makes them more efficient still.
When you point AI at an ambiguous process, however, something else happens. The LLM has to reconstruct, on every invocation, the interpretive work an experienced employee used to do once and then simply remember. Context windows swell to hold the institutional memory no one ever wrote down. Retries and fallbacks multiply to cover the exceptions no one documented. Orchestration accretes around the cases the base model gets wrong. Each of those is billable, in tokens, engineering hours, or both.
So token cost, I argued, is not fundamentally a technical expense. Instead, it is a measurement of organizational ambiguity. The meter does not run only because intelligence is expensive. It runs because the organization asked intelligence to keep re-deriving the structure it declined to build.
That was the architecture. The last few weeks have turned it into an invoice, and the invoice is doing something the argument never could.
It is making people look.
The bill is the diagnosis
If token cost is a measurement of organizational ambiguity, then the itemized AI bill is the most honest operational diagnostic your company has ever produced.
Read it by use case and it is a ranked map of exactly where you are undefined. The processes where the meter spikes are the processes where the most undocumented, unreconciled, unowned ambiguity is being reconstructed on every call. It may register as a budget overrun.
It is also an X-ray you did not intend to buy, but now possess.
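As a minimal sketch of what reading the bill by use case could look like, assume each model call is logged with a use-case label and token counts. The `UsageRecord` fields and both prices below are hypothetical placeholders, not any vendor's actual schema or rates.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRecord:
    use_case: str       # hypothetical label attached to every model call
    input_tokens: int
    output_tokens: int


# Hypothetical prices in dollars per million tokens; substitute your own rates.
PRICE_PER_M_INPUT = 3.00
PRICE_PER_M_OUTPUT = 15.00


def cost_of(record: UsageRecord) -> float:
    """Dollar cost of one call under the assumed prices."""
    return (record.input_tokens * PRICE_PER_M_INPUT
            + record.output_tokens * PRICE_PER_M_OUTPUT) / 1_000_000


def rank_by_spend(records):
    """Total the spend per use case and return (use_case, dollars), hottest first."""
    totals = defaultdict(float)
    for record in records:
        totals[record.use_case] += cost_of(record)
    return sorted(totals.items(), key=lambda pair: pair[1], reverse=True)
```

The top of that list is where the question gets asked first. The ranking says nothing about why a use case is hot, only where to look.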
Now look at how the market is responding to its own X-ray: Cut the licenses. Cap the budget. Kill the dashboard. Go “back to the drawing board” on what we’re allowed to spend.
Every one of those is a maneuver to stop seeing the number. The organization has been handed a diagnostic instrument and is responding by smashing the gauge because the reading is high. It mistakes the measurement for the disease.
The disciplined response is the opposite. Find where the meter is hot, and ask of each hot spot the question The Token Tax ended on: is this ambiguity unavoidable, or merely undefined?
The bill itself has already sorted your remediation backlog for you, for free, in priority order. The best way to waste that gift is to look away from it. And there is a further turn, one I suspect will be unpopular:
Usage-based billing is more honest than the seat license it replaces, not less.
A flat per-seat fee was an anesthetic. It took the cost of ambiguity and flattened it into a fixed line that no one had to feel, which meant no one had to act on it. Metered billing removes the anesthetic. The pain returns to the exact site of the injury. The move to metered billing is, in a sense, the installation of a diagnostic instrument the organization may never have been willing to build for itself.
The vendor is not the villain of this story for charging you by the token. The vendor is, inadvertently, the radiologist. That does not make the vendor neutral. It only means the invoice can reveal something true even when the pricing model exists for the vendor’s benefit.
From architecture to budget
The clearest signal comes from GitHub. As of June 1, every Copilot plan moves to usage-based billing: a monthly allotment of credits metered against tokens consumed – input, output, and cached – priced at each model’s underlying API rate. Code completions stay free, but chat, the coding agent, the CLI, and code review all draw the meter down. The base subscription price did not change. What changed is that the cost of how you use it is now visible on a line item instead of absorbed into a flat fee.
Then there is Uber, which is the more instructive case because nothing actually broke. Uber’s engineering organization adopted AI coding tools enthusiastically. So much so that, by its own disclosure, the company spent its full 2026 budget for those tools roughly four months into the year. Its president and COO, reflecting on it afterward, made the point worth sitting with: he could not draw a clean line from rising token consumption to a measurable increase in useful features shipped to customers. The usage was real. The bill was real. The link between them was the thing in question.
It also matters how Uber got there. The spend was not an accident; it was incentivized, in part, by an internal leaderboard that ranked teams by how much AI they used. Which is the bridge to the third signal.
The Financial Times reported, and several outlets confirmed, that Amazon shut down an internal usage leaderboard at the end of May after employees gamed it – running low-value tasks through agents largely to climb the rankings. An Amazon executive’s instruction to staff was almost too on the nose: stop using AI for the sake of using AI; use it to solve actual problems.
The trade press gave the behavior a name, tokenmaxxing, and reached for Goodhart’s Law: when a measure becomes a target, it stops being a good measure.
Even Microsoft, according to reporting from The Verge, is planning to remove most of its direct Claude Code licenses for engineers in its Experiences + Devices organization and redirect many of them toward Copilot CLI. This is not a retreat from AI; it is a dependency-routing decision.
Four signals, producing one shape: The cost of enterprise AI stopped being an abstraction and started being a number someone in finance can point at. The reflex, everywhere, has been to make the number smaller.
That reflex is the mistake.
Two diagnoses, one corrupted gauge
If the bill is a set of readings, then every hot reading resolves into one of two diagnoses, and the entire strategic question is telling them apart.
Some ambiguity is genuinely unavoidable. It lives in the input, not in your organization. A document arrives in one of three hundred vendor formats you have never standardized because you cannot; they are not yours to standardize. A customer writes in with a genuinely novel problem. A team is exploring a space that has not yet been narrowed, where divergence is the point.
In these cases interpretation is the task, no deterministic system could do it without changing the nature of the work, and the cost of running a model against it is proportionate to the cost of every alternative. This is where AI earns the meter, and where it makes sense to leave it in the runtime path.
Other ambiguity is merely undefined. It lives in your teams and processes. The field that was never specified. The exception path that was never documented. The owner who was never named. The definition that drifts between three systems because no one did the reconciliation.
From inside a process these two look identical; both present as “messy work the model handles.” Economically, though, they are opposites. The first is a property of reality. The second is a deferred decision wearing reality’s clothes, and every token spent on it is rent paid to avoid a meeting.
Most expensive enterprise deployments are the second case being run as if it were the first. The bill reflects the mismatch with perfect fidelity, if you let it.
Then there is a third thing, and it is neither a reading nor a diagnosis. It is the corruption of the instrument itself. Tokenmaxxing is what happens when you point the meter at the wrong target: when adoption and usage become the metric of merit, people optimize for the number on the board rather than the problem on the desk.
The result is a bill that is high for reasons that have nothing to do with real ambiguity, which means the diagnostic is polluted upward and can no longer be read at all. Amazon’s leaderboard did not measure productivity; it manufactured consumption and then mistook it for productivity. That is Goodhart’s Law operating exactly on schedule.
So there are two ways to lose the signal and one way to use it. You can pollute it: reward usage, and the meter fills with noise. You can destroy it: cap and cut, and the meter goes dark. Or you can treat it as instrumentation: read where it is hot, ask which of the two diagnoses applies, and route accordingly.
The companies that I suspect will win the next phase of AI adoption will not be the ones that used the most AI or the least. They will be the ones that learned to read their own gauge.
Build the aqueduct
Reading the gauge only matters if it changes what you build. Routing accordingly comes down to a few moves, in roughly this order.
First, stop making the most expensive model the first model. A frontier model can do simple work; that is not a reason to price simple work like frontier reasoning. Classification, field extraction, tone normalization, routine summarization, OCR cleanup: none of these needs the strongest model available. Each needs the cheapest model that clears the reliability bar for a defined job.
The frontier belongs at the end of an escalation path, not at the front of every request. Tiered routing is the single highest-leverage change most organizations can make, and it requires no new hardware and no new heroics.
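One way to picture an escalation path is a list of tiers tried cheapest-first, where each tier either returns an answer or declines. This is a sketch under stated assumptions: the three tier functions are stand-ins written for illustration, not calls to any real model API, and in practice "declines" would be a reliability check you define for the job.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Tier:
    name: str
    run: Callable[[str], Optional[str]]  # returns None when the tier declines


def rules_tier(task: str) -> Optional[str]:
    # Deterministic extraction: no model involved at all.
    match = re.search(r"INV-\d+", task)
    return match.group(0) if match else None


def small_model_tier(task: str) -> Optional[str]:
    # Stand-in for a cheap model; here it "clears the bar" only on short inputs.
    return f"small:{task}" if len(task) < 40 else None


def frontier_tier(task: str) -> Optional[str]:
    # Stand-in for the most expensive model; it always answers.
    return f"frontier:{task}"


TIERS = [
    Tier("rules", rules_tier),
    Tier("small", small_model_tier),
    Tier("frontier", frontier_tier),
]


def route(task: str, tiers: Sequence[Tier]) -> tuple[str, str]:
    """Try tiers in order and return (tier name, answer) from the first that answers."""
    for tier in tiers:
        answer = tier.run(task)
        if answer is not None:
            return tier.name, answer
    raise RuntimeError("no tier could handle the task")
```

The design choice that matters is the ordering: the frontier tier is reached only after every cheaper tier has declined, so its share of traffic becomes something you can measure.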
Second, prefer a specialized model over a general agent wherever the task is itself specialized. Transcription, text-to-speech, embeddings, entity extraction, document classification, and translation each have purpose-built models, many of which run quickly and predictably on off-the-shelf hardware. A purpose-built model is usually cheaper, faster, and far easier to evaluate than a general-purpose agent improvising the same task at frontier prices.
Third, for the undefined ambiguity, resolve it and leave LLMs out of the runtime path entirely. Let the model help you design the system: surface the exceptions, propose the fields, draft the routing logic, write the documentation, generate the workflow, expose the contradictions you had been papering over. Then run the result deterministically, with no model in the loop.
I described one such system last time, a feedback intake that an LLM helped me design and that now runs without consuming a single token at runtime. That is the general shape, not the special case.
Let AI help you build the aqueduct. Do not pay it, forever, to carry the buckets.
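To make "run the result deterministically" concrete, here is a hypothetical sketch, not the feedback intake described above: a routing table that could be drafted with a model's help at design time, reviewed by a person, and then executed with no model in the loop. The categories, severities, and queue names are all invented for illustration.

```python
from dataclasses import dataclass

# Hypothetical routing table: drafted once at design time, then reviewed and owned by a person.
ROUTES = {
    ("bug", "high"): "oncall-engineering",
    ("bug", "low"): "engineering-backlog",
    ("billing", "high"): "finance-escalations",
    ("billing", "low"): "finance-queue",
}

# Anything the table does not cover goes to a person, not to a model.
DEFAULT_QUEUE = "triage-human"


@dataclass(frozen=True)
class Feedback:
    category: str
    severity: str


def route_feedback(item: Feedback) -> str:
    """Plain dictionary lookup: zero tokens consumed at runtime."""
    return ROUTES.get((item.category, item.severity), DEFAULT_QUEUE)
```

Every item that lands in the default queue is an exception nobody has defined yet, which makes that queue a small gauge of its own.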
Own the core
Once the meter shows where AI genuinely belongs in the runtime path, the question becomes whether that capability is peripheral enough to rent or strategic enough to own.
The first case is compliance and confidentiality. In regulated industries and anywhere real trade secrets are in motion – healthcare, finance, defense-adjacent work, anything where transmitting your data to a third party’s servers is itself the problem – the external API is not the more expensive option. In many cases, it may simply be off the table. There, self-hosted inference is a constraint, and the cost comparison never enters the room.
The second case is continuity and control: the decision to place a metered, externally governed dependency in the middle of every transaction.
For the work you resolve to deterministic systems, that dependency vanishes. But for the irreducible core that must run on a frontier-class model in production, an external API is exactly the thing the argument warns about: metered, repriced at the vendor’s discretion, rate-limited, deprecatable, and governed by terms you did not write.
Betting process continuity entirely on that is the same structural error as betting it on the one undocumented employee who could walk out the door. It has only been relocated onto someone else’s balance sheet.
And there is a sharper edge worth naming. A great deal of what gets called AI-first is really API-first. If AI is the capability you claim defines you, renting all of it from a third party (possibly one you also compete with) is a strange way to own your core; companies do not usually outsource the thing they are built on.
The bill, read as a diagnostic, shows you where your dependency is concentrated; the company that genuinely means “AI-first” should be the most alarmed of anyone to find its supposed core competency itemized as someone else’s recurring invoice.
The marginal-cost case for self-hosting comes third, because it is the most audience-dependent. For workloads that are stable, high-volume, specializable, and where you already employ people who can run the stack, owning the hardware converts an open-ended meter into a one-time capital cost amortized over years. For everyone else it is simply a different, lumpier bill.
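The arithmetic behind that trade can be sketched as a break-even calculation. All inputs are assumptions you would supply yourself; the function ignores hardware refresh, utilization, and the cost of the staff time it takes to run the stack unless you fold those into the monthly operating figure.

```python
from typing import Optional


def breakeven_months(capex: float,
                     monthly_api_cost: float,
                     monthly_self_host_opex: float) -> Optional[float]:
    """Months until owned hardware pays for itself, or None if it never does.

    capex: one-time hardware spend (hypothetical input)
    monthly_api_cost: what the same workload costs on a metered API
    monthly_self_host_opex: power, hosting, and people to run it in-house
    """
    monthly_saving = monthly_api_cost - monthly_self_host_opex
    if monthly_saving <= 0:
        return None  # self-hosting is just the lumpier bill
    return capex / monthly_saving
```

With hypothetical figures of 240,000 in hardware, 30,000 a month in API spend, and 10,000 a month to operate, the hardware pays for itself in 12 months. If the API spend were 8,000 a month instead, the function returns `None`: the meter was never the expensive part.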
One honest caveat keeps this from overreaching: owning inference is not owning the frontier; almost no one should be training frontier models, and “we have a datacenter and a dev team”, while necessary, is not sufficient to replace tools like GPT, Claude, or Gemini entirely. The credible version of self-reliance is not build your own ChatGPT. It looks more like owning and maintaining the infrastructure to support specialized or fine-tuned models for your stable, high-volume work, and renting the frontier only for the hard residual.
In this case, sovereignty and cost management can point at the same shape.
Closing
The answer is not to turn away from AI. That would be an overcorrection, and a foolish one in many cases.
AI has real value: it can accelerate discovery, clarify ambiguity, surface patterns, generate options, and help people design better systems faster than they could have built them alone. The companies cutting access to make the number smaller are, in some cases, making the same category error as the companies that drove the number up: both are treating the meter as the point.
But marketing is not architecture. Marketing exists to sell a product that generates revenue for the seller, and a product can be genuinely valuable and still be sold in a way that quietly cultivates dependency. A model can be powerful and still be the wrong thing to wire into the middle of every transaction.
The vendor’s incentive is for the meter to run; yours is for it to run only where it earns its keep.
The work AI cannot do for you is the work of deciding what problem is actually being solved. It can help you define the field, map the exception, draft the workflow, and test the structure. But it cannot absolve you of the responsibility to build one, and it cannot tell you which of your costs are the price of reality and which are the price of a decision you keep declining to make.
The bill will tell you that. It is telling many organizations their own version right now. The question is whether you read the meter, or smash it.



