Why I'm trying to max out AI token usage in software delivery
The real metric for AI-native software delivery is not lower token spend. It is higher tokens per completed loop: shipped product, verified deploy, public artifact, and a stronger operating system.
Most people are trying to reduce AI token usage.
I think that is backward.
The metric that matters is not tokens spent. It is tokens per completed loop. If a bigger token bill buys a shipped feature, a verified deploy, a tested workflow, a published artifact, and a tighter operating system, then the right move is to spend more tokens, not fewer.
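The metric can be made concrete with a toy calculation. Everything below is illustrative: the function name, prices, and token counts are invented for the sketch and do not come from the repo.

```python
# Toy sketch of the metric argued for above: cost per completed loop,
# not cost per token. All numbers are made up for illustration.

def cost_per_loop(tokens: int, price_per_million: float, loops_closed: int) -> float:
    """Dollar cost of each finished loop (shipped, verified, published)."""
    if loops_closed == 0:
        return float("inf")  # spend with no finished loops is pure overhead
    return tokens / 1_000_000 * price_per_million / loops_closed

# "Frugal" lane: small token bill, but almost nothing ships.
frugal = cost_per_loop(tokens=5_000_000, price_per_million=10.0, loops_closed=1)

# "Aggressive" lane: 40x the tokens, but 50 finished loops.
aggressive = cost_per_loop(tokens=200_000_000, price_per_million=10.0, loops_closed=50)

print(f"frugal:     ${frugal:.2f} per loop")      # $50.00 per loop
print(f"aggressive: ${aggressive:.2f} per loop")  # $40.00 per loop
```

Under these made-up numbers the aggressive lane pays a 40x larger total bill and still gets each finished loop cheaper, which is the whole argument in one division.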
That sounds irresponsible only if you are still treating AI like autocomplete.
Once AI becomes the execution layer, token spend stops behaving like chat overhead and starts behaving like operating budget.
The repo already shows the difference
The local numbers on this machine are strong enough to make the point without fiction.
Codex logged 2.84 billion tokens over the last 30 days across 125 threads. The biggest spike was March 19, 2026, with 1.19 billion tokens in a single day. Inside the infoproducts workspace alone, Codex used 888.9 million tokens since March 17. Almost all of that came from gpt-5.4 running at xhigh reasoning. On March 21, two explorer subagent threads alone consumed 192.4 million tokens inside this workspace.
Claude's local session metadata tells the same story from the other side. In the same 30-day window it logged only 40,782 tokens total, with 21,532 inside infoproducts. Codex became the heavy execution lane. Claude stayed a much lighter planning and editing lane.
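The split between the two lanes is stark enough to compute directly. The token figures below are the ones reported above; the variable names are mine.

```python
# Rough lane split over the 30-day window described above.
codex_tokens = 2_840_000_000   # heavy execution lane
claude_tokens = 40_782         # light planning/editing lane

total = codex_tokens + claude_tokens
ratio = codex_tokens / claude_tokens  # roughly 70,000 execution tokens
                                      # per planning token

print(f"Codex share of total: {codex_tokens / total:.4%}")
print(f"Execution-to-planning token ratio: ~{ratio:,.0f}:1")
```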
That token spend did not produce more chat.
It produced more finished loops.
Since March 19, apps/lmachine has shipped 47 commits, added 53,093 lines, removed 1,228, and moved without pull-request theater. The repo now contains 26 tracked posts, including 17 build logs and 9 blog posts. The work did not stay trapped in git history either. It was converted into public narrative, build logs, tests, and operator rules.
That is the part most people still miss.
The output is not generic code generation
This repo did not use tokens to generate random components faster.
It used them to ship a real operating surface.
RankWar now has a creator cockpit with ranked next moves, shared campaign timeline, ambassador tracking, persistent weekly review memory, lifecycle sequences, machine-readable sitemap and robots surfaces, generated execution packs, and an AI GTM copilot grounded in live campaign truth instead of prompt sludge.
That is a more interesting result than "AI wrote some code."
It means the system can now decide, package, publish, and remember instead of forcing a human to reconstruct the context every week.
That is what a completed loop looks like:
- product behavior
- verified tests
- operator memory
- public proof
- stronger process for the next run
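The five bullets above behave like a gate, not a list: a loop only counts when every stage is closed. A minimal sketch of that idea, with field names invented for illustration (nothing here comes from the repo):

```python
from dataclasses import dataclass, fields

# Hypothetical encoding of the five-part loop above as a shipping gate.
@dataclass
class Loop:
    product_behavior: bool   # the feature actually works
    verified_tests: bool     # tests prove it
    operator_memory: bool    # rules and SOPs updated for the next run
    public_proof: bool       # build log or post published
    stronger_process: bool   # the operating system improved

    def completed(self) -> bool:
        # Every stage must be closed for the loop to count.
        return all(getattr(self, f.name) for f in fields(self))

half_done = Loop(True, True, False, False, False)
print(half_done.completed())  # False: code alone is not a finished loop
```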
If more tokens buy more loops like that, then cutting tokens is not discipline. It is sabotage.
The real leverage came from the operating system
The bigger story is underneath the product.
The workspace now runs with explicit skills, custom agents, Docs MCP, multi-worktree rules, publishing rules, social distribution rules, GTM rules, and an improvement log that recorded 24 process mutations on March 21 alone.
That matters because raw model capability is not enough.
Without SOPs, skills, subagents, and disciplined handoff surfaces, higher token spend just creates larger piles of text.
With them, the same tokens create compounding leverage.
The more we optimized the interaction model, the more tokens the system wanted to consume. That is exactly what should happen. Better decomposition, better delegation, better verification, and better publishing all cost tokens. They also collapse more human labor than cheap prompting ever will.
This is why I care less about cost per token than cost per finished loop.
What I think dies next
I increasingly think the execution-heavy version of software engineering is entering terminal decline.
The same is true for a lot of what companies currently call devops and developer productivity.
I do not mean that all humans vanish.
I mean the old headcount model gets weaker fast.
The surviving human layer is not "the person who still writes tickets and code by hand." It is the operator who can set direction, define constraints, judge taste, control downside, and manage AI systems that manage more AI systems underneath them.
In other words:
- fewer human executors
- more AI executors
- one or two humans giving higher-order commands and making the non-obvious calls
That is not futuristic. It is already visible in the repo.
The loser move
Most teams will optimize for lower token bills, safer prompts, and narrower use cases.
They will use AI to patch tickets faster and then wonder why nothing compounded.
They are optimizing the unit cost of a broken system.
The better move is harsher and simpler:
spend tokens aggressively where they close real loops, kill human handoffs, and leave behind stronger operating infrastructure.
That is the standard I care about now.
I am not trying to minimize token usage.
I am trying to maximize useful token usage.