The Agent Era? Let's Be Practical: What Does Your "New-Energy Brain" Cost?

Agents. A lot of people talk about them as if they can do everything: order food for you, fix code, even write research reports on their own. But before you get carried away by that fantasy of “full automation,” I want to step away from all the mystical marketing language and talk about something much more concrete:

money.

When you hire an agent to work for you, how exactly is the bill calculated?

1. What a token actually is

Many people do not really distinguish between an agent and a large model. A simple way to think about it is this:

The agent is the “body.” It has the hands that operate a computer, and the eyes and ears that perceive the world. If you ask it to search for a piece of news and turn it into a summary, it is the agent’s eyes reading the webpage and its hands typing on the keyboard.
The large language model is the “brain.” The body itself has no soul. All logical reasoning and decision-making come from that “new-energy brain” behind it.

That brain is not attached to the body the way a human brain is. The brain lives in a cloud data center, while the body, the agent, is on your computer. The brain has to direct the body to do work, and the body has to report back what it “saw.” That requires constant high-frequency information exchange.

In AI, that neural signal has a name: token.

Input Tokens

These are the raw materials you hand to the model.

They include the documents you upload, the background material an agent finds automatically, and, most importantly, your conversation with it (in other words, the prompt).
One thing to notice is that input usually also includes the running history of the conversation. If you do not control it, the input gets longer and longer as the conversation continues. Fortunately, most agents now have automatic truncation and compression so the context does not grow without limit.
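As a rough illustration of what truncation means, here is a minimal sketch that keeps only the newest messages that fit a token budget. The function names and the "about 4 characters per token" estimate are assumptions made for this example. Real agents use a real tokenizer and usually smarter strategies, such as summarizing old turns.

```python
def estimate_tokens(text: str) -> int:
    """Very rough stand-in for a real tokenizer: assume about 4 characters per token."""
    return max(1, len(text) // 4)


def truncate_history(messages: list[str], budget: int) -> list[str]:
    """Keep the newest messages whose estimated token total fits within the budget."""
    kept: list[str] = []
    used = 0
    for message in reversed(messages):  # walk from newest to oldest
        cost = estimate_tokens(message)
        if used + cost > budget:
            break  # everything older than this point is dropped
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


history = ["a" * 400, "b" * 400, "c" * 400]  # about 100 estimated tokens each
print(len(truncate_history(history, budget=250)))  # 2 (only the two newest fit)
```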

Output Tokens

These are the finished goods the model produces.

They include the model’s reply, the code it generates, or the final report it produces.

Thinking / Reasoning Tokens

This is a special cost attached to higher-end reasoning models such as OpenAI's o1 or DeepSeek-R1.

This is not true reasoning in the human sense. It is more like the model drafting internally and running through a line of logic before it formally speaks. That invisible inner dialogue also produces tokens. You do not see it in the final answer, but it still consumes compute and still shows up on the bill.

2. How the bill is actually calculated

Before the formula, we need to introduce one familiar source of discounted pricing: cache.

If you ask an agent to repeatedly work on the same block of tokens, say the tokenized form of a 100,000-word manual, a smart provider will often keep that content around temporarily and give you a discounted input cost.

That brings us to the tokenizer.

Large models do not read text directly. A tokenizer first chops the text into small fragments, tokens. Caching is basically prefix matching. As long as the beginning of your conversation, such as the system prompt or reference document, turns into exactly the same token sequence as before, the brain can pull the work it already did on that part from memory instead of processing it all over again.
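Prefix matching can be sketched in a few lines. The function below counts how many leading tokens two requests share. The token IDs are invented for illustration, and `cached_prefix_length` is a hypothetical helper, not any provider's API. Real providers typically add details this ignores, such as a minimum prefix length and cache expiry.

```python
def cached_prefix_length(previous: list[int], current: list[int]) -> int:
    """Count how many leading tokens of `current` match `previous` exactly."""
    matched = 0
    for old, new in zip(previous, current):
        if old != new:
            break  # the first mismatch ends the reusable prefix
        matched += 1
    return matched


# Made-up token IDs: both requests start with the same system prompt.
system_prompt = [101, 7, 7, 42]
first_request = system_prompt + [5, 6]
second_request = system_prompt + [9, 9, 9]

hit = cached_prefix_length(first_request, second_request)
miss = len(second_request) - hit
print(hit, miss)  # 4 3
```

One consequence of the prefix rule: changing even one token near the start of the prompt turns everything after it into a miss.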

The market is now very consistent on this point: usage-based billing. The pricing formula for each request is simple enough:

$$
\text{Total Cost} = (\text{Input}_{\text{miss}} \times \text{Full Price}) + (\text{Input}_{\text{hit}} \times \text{Cache Price}) + (\text{Output} \times \text{Output Price}) + (\text{Thinking} \times \text{Thinking Price})
$$

In other words, your bill usually has four separate line items:

Input: new information the brain has to read
Cache Hit: old information the brain “remembers” and reuses. The unit price is much lower, but because the token count is often huge, the actual spend may still add up quickly.
Output: the final product the brain gives back to you
Thinking: the silent internal run-through a reasoning model generates before it gives you an answer
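The four line items translate directly into a small function. This is a minimal sketch: the `Pricing` and `Usage` classes and the prices in the example are placeholders invented for illustration, with prices quoted per million tokens.

```python
from dataclasses import dataclass

TOKENS_PER_PRICE_UNIT = 1_000_000  # prices are quoted per million tokens


@dataclass(frozen=True)
class Pricing:
    """Price per million tokens for each line item."""
    input_miss: float
    input_hit: float
    output: float
    thinking: float


@dataclass(frozen=True)
class Usage:
    """Token counts for one request."""
    input_miss: int
    input_hit: int
    output: int
    thinking: int = 0


def request_cost(usage: Usage, pricing: Pricing) -> float:
    """Apply the four-term formula: miss + hit + output + thinking."""
    total = (
        usage.input_miss * pricing.input_miss
        + usage.input_hit * pricing.input_hit
        + usage.output * pricing.output
        + usage.thinking * pricing.thinking
    )
    return total / TOKENS_PER_PRICE_UNIT


# Placeholder prices, not any real model's price list.
example_pricing = Pricing(input_miss=2.0, input_hit=0.2, output=10.0, thinking=10.0)
example_usage = Usage(input_miss=10_000, input_hit=90_000, output=1_000, thinking=4_000)
print(f"{request_cost(example_usage, example_pricing):.4f}")  # 0.0880
```

In this made-up request the thinking tokens are the largest single line item, even though none of them appear in the answer.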

Pricing is usually quoted per million tokens. Here is a reference table for three representative models in 2026:

| Model family | Standard input (Miss) | Cached input (Hit) | Output / Thinking | Notes |
| --- | --- | --- | --- | --- |
| GPT-5.4 | $2.50 | $0.25 (10%) | $15.00 | Flagship performance, very strong reasoning |
| GPT-5.4 mini | $0.75 | $0.075 (10%) | $4.50 | Extremely cost-effective, very fast |
| DeepSeek-V3.2 | ￥2 | ￥0.2 (10%) | ￥3 | A price butcher, ideal for cost reduction |
Why agents get expensive so easily

In an ordinary conversation, you ask one question and the model gives one answer. An agent is different because it calls the model in a loop.

To finish a complicated task, such as “build a webpage and deploy it,” an agent may generate several internal requests: first think through the steps, then search for information, write code, inspect its own errors, and only then deliver the result.

Every cycle of think, act, and observe produces tokens. And as context accumulates, the input side grows like a snowball. If you let an agent loop recklessly, it is very easy to burn through the equivalent of a tank of gas in a single day.
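The snowball effect is easy to simulate. The sketch below assumes the agent resends its whole context on every step and that each step adds a fixed number of tokens. Both are simplifications, and the numbers are invented.

```python
def simulate_agent_loop(steps: int, base_context: int, tokens_added_per_step: int) -> list[int]:
    """Return the input tokens sent on each step when the full context is resent."""
    context = base_context
    inputs_per_step = []
    for _ in range(steps):
        inputs_per_step.append(context)   # this step's request carries everything so far
        context += tokens_added_per_step  # tool results and replies join the context
    return inputs_per_step


inputs = simulate_agent_loop(steps=5, base_context=2_000, tokens_added_per_step=1_500)
print(inputs)       # [2000, 3500, 5000, 6500, 8000]
print(sum(inputs))  # 25000
```

The context only grew from 2,000 to 8,000 tokens, yet the agent paid for 25,000 input tokens in total. The cumulative input grows roughly with the square of the number of steps, which is why cache hits matter so much for long loops.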

3. A shift in mindset: from worker to cyber-capitalist

Once you understand the bill, the biggest feeling that remains is this: we need to abandon the worker mindset completely.

What is the worker mindset?

Traditionally, we focus on physical effort and immediate reward. If you spend three hours writing a report, you expect to be paid for those three hours. Inside that mindset, we are used to polishing details without counting the cost, because our own time feels fuzzy and “free.”

What is the capitalist mindset?

Once you start using agents, you are no longer just a person writing code. You become a cyber-capitalist. Every task an agent runs for you has an explicit cash cost, and your attention shifts sharply:

Calculate profit: If this automated agent flow costs 2 dollars to run, can the time it saves create more than 2 dollars of value somewhere else? If it cannot, are you willing to spend 2 dollars just to buy yourself leisure or emotional value? If the answer to both questions is no, that agent spend should not exist.
Reduce cost and increase efficiency:
- If you were using the most expensive flagship model, can you split the task up and let cheaper models handle the simpler parts? Or trim the prompt so you reduce unnecessary input cost?
- How do you optimize the workflow so the agent produces better output and less nonsense while consuming the same number of tokens?
Be result-oriented: A cyber-capitalist should not care how hard the agent seemed to labor. They should not care whether it reduced human stress, and they should not even care whether it can solve some specific class of problem in the abstract. The only real standard is whether the final output of those tokens creates value. That value can be money, or it can simply be emotional value.
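The profit question above is simple arithmetic. Here is a minimal sketch, where `value_per_hour` is a hypothetical figure for what an hour of your time is worth and both function names are made up for this example.

```python
def break_even_minutes(agent_cost: float, value_per_hour: float) -> float:
    """Minutes the agent must save before the run pays for itself."""
    if value_per_hour <= 0:
        raise ValueError("value_per_hour must be positive")
    return agent_cost * 60 / value_per_hour


def net_value(agent_cost: float, minutes_saved: float, value_per_hour: float) -> float:
    """Value of the time saved minus what the agent run cost."""
    return minutes_saved / 60 * value_per_hour - agent_cost


# A 2-dollar run, with an hour of your time assumed to be worth 30 dollars.
print(break_even_minutes(agent_cost=2.0, value_per_hour=30.0))            # 4.0
print(net_value(agent_cost=2.0, minutes_saved=15, value_per_hour=30.0))   # 5.5
```

Under these assumed numbers, the run has to save more than four minutes to be worth it. Emotional value does not fit in the formula, so that part of the decision stays with you.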

The real dividing line in the age of agents is whether you have the ability to manage digital labor. Stop thinking of yourself as the screw in the machine. Look at your bill. Optimize your workflow. The moment you start calculating the break-even point of every line of output, you are finally holding a ticket into the agent era.

Postscript

It took me about an hour to go from brainstorming to the final piece. That included sorting out the logic, writing the first draft, revising it repeatedly, formatting the article, reviewing the translation, testing locally, and pushing the post to the cloud.

This is what one month of using agents intensively as a serious work tool has produced for me.