Why I'm trying to max out AI token usage in software delivery
On this machine, Codex consumed 1.46 billion tokens in the last 24 hours and 3.15 billion over the rolling 30 days. The lesson was not to cut spend. It was to maximize useful tokens per completed loop.
Most people would treat that as proof the system is wasteful.
I think that is backward.
The metric that matters is not tokens spent. It is tokens per completed loop. If a bigger token bill buys a shipped feature, a verified deploy, a tested workflow, a published artifact, and a tighter operating system, then the right move is to spend more tokens, not fewer.
That sounds irresponsible only if you are still treating AI like autocomplete.
Once AI becomes the execution layer, token spend stops behaving like chat overhead and starts behaving like operating budget.
The repo already shows the difference
The local numbers are strong enough to make the point on their own. The last 24 hours alone accounted for 46% of the rolling 30-day total and 86.6% of the previous 29 days combined.
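As a quick sanity check, both percentages fall out of the rounded totals with simple rolling-window arithmetic. (The small gap between the 86.4% computed here and the 86.6% quoted suggests the original ratio came from unrounded token counts.)

```python
# Rolling-window ratios from the rounded figures quoted in the post.
last_24h = 1.46e9           # tokens in the last 24 hours
last_30d = 3.15e9           # tokens in the rolling 30-day window
prior_29d = last_30d - last_24h

share_of_30d = last_24h / last_30d    # share of the 30-day total
vs_prior_29d = last_24h / prior_29d   # vs. the previous 29 days combined

print(f"{share_of_30d:.1%}")  # 46.3%
print(f"{vs_prior_29d:.1%}")  # 86.4%
```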
Those are rolling windows, not vanity screenshots based on whichever old threads happened to get touched today.
The spike also did not come from random prompt spam. It came from a small set of long-running operator threads running gpt-5.4 at xhigh reasoning across infoproducts and lbp/repos, while moving product, identity, publishing, worktree discipline, and subagent infrastructure forward.
That token spend did not produce more chat.
It produced more finished loops.
Since March 19, apps/lmachine has shipped 67 commits, added 56,904 lines, and removed 4,555, all without pull-request theater. The repo now contains 32 tracked posts. The work did not stay trapped in git history either. It was converted into public narrative, tests, and operator rules.
That is the part most people still miss.
The output is not generic code generation
This repo did not use tokens to generate random components faster.
It used them to ship a real operating surface.
RankWar now has a creator cockpit with:

- ranked next moves
- a shared campaign timeline
- ambassador tracking
- persistent weekly review memory
- lifecycle sequences
- machine-readable sitemap and robots surfaces
- generated execution packs
- an AI GTM copilot grounded in live campaign truth instead of prompt sludge
That is a more interesting result than "AI wrote some code."
It means the system can now decide, package, publish, and remember instead of forcing a human to reconstruct the context every week.
That is what a completed loop looks like:
- product behavior
- verified tests
- operator memory
- public proof
- stronger process for the next run
If more tokens buy more loops like that, then cutting tokens is not discipline. It is sabotage.
The real leverage came from the operating system
The bigger story is underneath the product.
The workspace now runs with explicit skills, custom agents, Docs MCP, multi-worktree rules, publishing rules, social distribution rules, GTM rules, and an improvement log that recorded 24 process mutations on March 21 alone.
That matters because raw model capability is not enough.
Without SOPs, skills, subagents, and disciplined handoff surfaces, higher token spend just creates larger piles of text.
With them, the same tokens create compounding leverage.
The more we optimized the interaction model, the more tokens the system wanted to consume. That is exactly what should happen. Better decomposition, better delegation, better verification, and better publishing all cost tokens. They also collapse more human labor than cheap prompting ever will.
This is why I care less about cost per token than cost per finished loop.
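A rough sketch of that metric. Everything here is hypothetical for illustration: the `Loop` shape, the token counts, and the blended price are assumptions, not numbers from the repo.

```python
from dataclasses import dataclass

@dataclass
class Loop:
    """One unit of work: token spend plus whether it actually closed."""
    name: str
    tokens: int     # tokens consumed end to end
    shipped: bool   # did it end in a verified, published result?

# Hypothetical month of work; token counts are illustrative only.
loops = [
    Loop("creator cockpit", 420_000_000, True),
    Loop("lifecycle sequences", 180_000_000, True),
    Loop("abandoned spike", 60_000_000, False),
]

PRICE_PER_MILLION = 10.0  # assumed blended $/1M tokens, not a real quote

total_cost = sum(l.tokens for l in loops) / 1e6 * PRICE_PER_MILLION
finished = sum(1 for l in loops if l.shipped)

# The metric argued for above: dollars per finished loop,
# not dollars per token.
cost_per_loop = total_cost / finished
print(f"${cost_per_loop:,.0f} per finished loop")  # $3,300 per finished loop
```

Note the unshipped spike still counts toward cost but not toward the denominator, which is exactly why cutting tokens indiscriminately can make this number worse, not better.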
What I think dies next
I increasingly think the execution-heavy version of software engineering is entering terminal decline.
The same is true for a lot of what companies currently call devops and developer productivity.
I do not mean that all humans vanish.
I mean the old headcount model gets weaker fast.
The surviving human layer is not "the person who still writes tickets and code by hand." It is the operator who can set direction, define constraints, judge taste, control downside, and manage AI systems that manage more AI systems underneath them.
In other words:
- fewer human executors
- more AI executors
- one or two humans giving higher-order commands and making the non-obvious calls
That is not futuristic. It is already visible in the repo.
The loser move
Most teams will optimize for lower token bills, safer prompts, and narrower use cases.
They will use AI to patch tickets faster and then wonder why nothing compounding happened.
They are optimizing the unit cost of a broken system.
The better move is harsher and simpler:
spend tokens aggressively where they close real loops, kill human handoffs, and leave behind stronger operating infrastructure.
That is the standard I care about now.
I am not trying to minimize token usage.
I am trying to maximize useful token usage.