The Agent Era? Let's Be Practical: What Does Your "New-Energy Brain" Cost?

Mon, 13 Apr 2026 00:00:00 GMT

Agent. A lot of people talk about it as if it can do everything. It can order food for you, fix code, even write research reports on its own. But before you get carried away by that fantasy of “full automation,” I want to step away from all the mystical marketing language and talk about something much more concrete:

money.

When you hire an agent to work for you, how exactly is the bill calculated?

1. What a token actually is

Many people do not really distinguish between an agent and a large model. A simple way to think about it is this:

The agent is the “body.” It has the hands that operate a computer, and the eyes and ears that perceive the world. If you ask it to search for a piece of news and turn it into a summary, it is the agent’s eyes reading the webpage and its hands typing on the keyboard.
The large language model is the “brain.” The body itself has no soul. All logical reasoning and decision-making come from that “new-energy brain” behind it.

That brain is not attached to the body the way a human brain is. The brain lives in a cloud data center, while the body, the agent, is on your computer. The brain has to direct the body to do work, and the body has to report back what it “saw.” That requires constant high-frequency information exchange.

In AI, that neural signal has a name: token.

Input Tokens

These are the raw materials you hand to the model.

They include the documents you upload, the background material an agent finds on its own, and, most importantly, your conversation with it: the prompt.
One thing to notice is that input usually also includes the running history of the conversation. If you do not control it, the input gets longer and longer as the conversation continues. Fortunately, most agents now have automatic truncation and compression so the context does not grow without limit.
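To make the truncation idea concrete, here is a minimal sketch of one simple strategy: keep the system prompt, then keep the newest messages until a token budget runs out. It is not how any particular agent does it. Two assumptions are labeled in the code: token counts are approximated by whitespace word counts (real agents use the model's own tokenizer), and older messages are simply dropped, whereas many agents summarize them instead. The function names are hypothetical.

```python
# Hypothetical sketch of history truncation under a token budget.
# Assumption: one "token" per whitespace-separated word, as a stand-in for a real tokenizer.
# Assumption: old messages are dropped outright; real agents often compress them instead.

def approx_tokens(text: str) -> int:
    """Crude tokenizer stand-in: count whitespace-separated words."""
    return len(text.split())


def truncate_history(system: str, history: list[str], budget: int) -> list[str]:
    """Keep the system prompt plus the most recent contiguous messages that fit in `budget`."""
    kept: list[str] = []
    used = approx_tokens(system)
    # Walk backwards so the newest messages are kept first.
    for message in reversed(history):
        cost = approx_tokens(message)
        if used + cost > budget:
            break  # stop at the first message that does not fit, keeping the window contiguous
        kept.append(message)
        used += cost
    # Restore chronological order and put the system prompt back in front.
    return [system] + list(reversed(kept))


if __name__ == "__main__":
    system = "You are a helpful agent."
    history = ["first question here", "a long answer " * 10, "latest question"]
    print(truncate_history(system, history, budget=12))
```

With a budget of 12, the 30-word answer no longer fits, so only the system prompt and the latest question survive. Keeping the system prompt fixed at the front also matters for billing, because a stable beginning is what makes cache hits possible.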

Output Tokens

These are the finished goods the model produces.

They include the model’s reply, the code it generates, or the final report it produces.

Thinking / Reasoning Tokens

This is a special cost attached to higher-end reasoning models such as OpenAI o1 or DeepSeek-R1.

This is not true reasoning in the human sense. It is more like the model drafting internally and running through a line of logic before it formally speaks. That invisible inner dialogue also produces tokens. You do not see it in the final answer, but it still consumes compute.

2. How the bill is actually calculated

Before the formula, we need to introduce an old friend that gets you a discount: the cache.

If you ask an agent to repeatedly work on the same block of tokens, say the tokenized form of a 100,000-word manual, a smart provider will often keep that content around temporarily and give you a discounted input cost.

That brings us to the tokenizer.

Large models do not read text directly. A tokenizer first chops the text into small fragments called tokens. Caching is basically prefix matching: as long as the beginning of your request, such as the system prompt or a reference document, turns into exactly the same token sequence as before, the brain can pull the already-processed part from memory instead of computing it all over again.
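A toy model helps show why "exactly the same token sequence" matters so much. This sketch assumes the provider compares the new request with the previous one and bills the longest shared prefix at the cache price. Real providers typically add further rules, such as a minimum cacheable length or caching in fixed-size blocks, which are ignored here. The token IDs and function names are hypothetical.

```python
# Hypothetical prefix-cache model: the longest shared prefix with the previous
# request counts as a cache hit, everything after it as a cache miss.
# Ignores real-world details such as minimum prefix lengths and block sizes.

def cached_prefix_len(previous: list[int], current: list[int]) -> int:
    """Number of leading tokens that are identical in both sequences."""
    n = 0
    for a, b in zip(previous, current):
        if a != b:
            break
        n += 1
    return n


def split_hit_miss(previous: list[int], current: list[int]) -> tuple[int, int]:
    """Return (cache-hit tokens, cache-miss tokens) for the current request."""
    hit = cached_prefix_len(previous, current)
    return hit, len(current) - hit


if __name__ == "__main__":
    manual = list(range(1000))          # made-up token IDs for a long reference document
    first = manual + [7, 8, 9]          # manual + first question
    second = manual + [7, 8, 42, 43]    # same manual, different follow-up
    print(split_hit_miss(first, second))      # (1002, 2)
    print(split_hit_miss(first, [-1] + first))  # (0, 1004)
```

The second call is the cautionary case: prepend a single changing token, such as a timestamp at the top of a system prompt, and the entire 1,000-token manual falls out of the cache.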

The market is now very consistent on this point: billing is usage-based. The pricing formula for each request is simple enough:

$$\text{Total Cost} = (\text{Input}_{\text{miss}} \times \text{Full Price}) + (\text{Input}_{\text{hit}} \times \text{Cache Price}) + (\text{Output} \times \text{Output Price}) + (\text{Thinking} \times \text{Thinking Price})$$
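The formula translates directly into a few lines of code. This is a minimal sketch: prices are per million tokens, and the numbers in the example are placeholders rather than any provider's actual rates. The `Prices` class and `request_cost` function are hypothetical names for illustration.

```python
# Hypothetical sketch of the per-request cost formula.
# All prices are per 1M tokens; the example rates are placeholders.
from dataclasses import dataclass


@dataclass
class Prices:
    full: float      # input, cache miss
    cache: float     # input, cache hit
    output: float
    thinking: float  # some providers bill reasoning tokens at the output rate


def request_cost(p: Prices, input_miss: int, input_hit: int,
                 output: int, thinking: int) -> float:
    """Total Cost = miss*Full + hit*Cache + output*Output + thinking*Thinking."""
    return (input_miss * p.full + input_hit * p.cache
            + output * p.output + thinking * p.thinking) / 1_000_000


if __name__ == "__main__":
    prices = Prices(full=2.0, cache=0.2, output=8.0, thinking=8.0)
    # 10k new input tokens, 90k cached, 2k visible output, 5k hidden thinking.
    print(round(request_cost(prices, 10_000, 90_000, 2_000, 5_000), 4))  # 0.094
```

In this made-up request, the 5,000 thinking tokens cost 0.04 of the 0.094 total, the largest single line item, even though none of them appear in the answer.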

In other words, your bill usually has four separate line items:

Input: new information the brain has to read
Cache Hit: old information the brain “remembers” and reuses, often extremely cheap per token even though the token count can be huge
Output: the final product the brain gives back to you
Thinking: the invisible draft a reasoning model generates before it opens its mouth

Pricing is usually quoted per million tokens. Here is a reference table for three representative models in 2026:

Model family	Standard input (Miss)	Cached input (Hit)	Output / Thinking	Notes
GPT-5.4	$2.50	$0.25 (10%)	$15.00	Flagship performance, very strong reasoning
GPT-5.4 mini	$0.75	$0.075 (10%)	$4.50	Extremely cost-effective, very fast
DeepSeek-V3.2	￥2	￥0.2 (10%)	￥3	An aggressive price cutter, ideal for cost reduction

Why agents get expensive so easily

In an ordinary conversation, you ask one question and the model gives one answer. An agent is different because it calls the model in a loop.

To finish a complicated task, such as “build a webpage and deploy it,” an agent may generate several internal requests: first think through the steps, then search for information, write code, inspect its own errors, and only then deliver the result.

Every cycle of think, act, and observe produces tokens. And as context accumulates, the input side grows like a snowball. If you let an agent loop recklessly, it is very easy to burn through the equivalent of a tank of gas in a single day.
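The snowball can be simulated in a few lines. This is a rough sketch under stated assumptions: every loop iteration re-sends the whole accumulated context, everything sent in the previous call is a cache hit, and each step's reply plus the tool's observation is appended to the next prompt as new (cache-miss) input. Prices are the GPT-5.4 figures from the table above; the per-step token counts are made up, and thinking tokens are left out for simplicity. `simulate_agent` is a hypothetical name.

```python
# Hypothetical simulation of input growth in an agent loop.
# Prices: GPT-5.4 figures from the table, USD per 1M tokens.
PRICE_MISS, PRICE_HIT, PRICE_OUT = 2.50, 0.25, 15.00


def simulate_agent(steps: int, initial_context: int, output_per_step: int,
                   observation_per_step: int, use_cache: bool = True) -> tuple[int, float]:
    """Return (total input tokens sent, total cost in USD) after `steps` loops."""
    context = initial_context  # prompt size for the next model call
    cached = 0                 # prefix already sent in the previous call
    total_input, cost = 0, 0.0
    for _ in range(steps):
        hit = cached if use_cache else 0
        miss = context - hit
        total_input += context
        cost += (miss * PRICE_MISS + hit * PRICE_HIT
                 + output_per_step * PRICE_OUT) / 1_000_000
        cached = context
        # The reply and the tool result are appended to the next prompt.
        context += output_per_step + observation_per_step
    return total_input, cost


if __name__ == "__main__":
    for cache in (True, False):
        tokens, usd = simulate_agent(20, 5_000, 500, 2_000, use_cache=cache)
        print(f"cache={cache}: {tokens:,} input tokens, ${usd:.2f}")
```

In this made-up 20-step run, the final prompt is 52,500 tokens, yet the agent sends 575,000 input tokens in total because every call re-sends everything before it. Caching cuts the bill from about $1.59 to about $0.41, but it does not stop the input from growing with every loop.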

3. A shift in mindset: from worker to cyber-capitalist

Once you understand the bill, one feeling stands out above all: we need to abandon the worker mindset completely.

What is the worker mindset?

Traditionally, we focus on physical effort and immediate reward. If you spend three hours writing a report, you expect to be paid for those three hours. Inside that mindset, we are used to polishing details without counting the cost, because our own time feels fuzzy and “free.”

What is the capitalist mindset?

Once you start using agents, you are no longer just a person writing code. You become a cyber-capitalist. Every task an agent runs for you has an explicit cash cost, and your attention shifts sharply:

Calculate profit: If this automated agent flow costs 2 yuan to run, can the time it saves create more than 2 yuan of value somewhere else? If not, are you willing to spend 2 yuan just to buy yourself leisure or emotional value? If the answer to both is no, that agent spend should not exist.
Reduce cost and increase efficiency:
- If you are using the most expensive flagship model, can you split the task up and let cheaper models handle the simpler parts? Or can you trim the prompt to cut unnecessary input cost?
- How do you optimize the workflow so the agent produces better output and less nonsense while consuming the same number of tokens?
Be result-oriented: A cyber-capitalist should not care how hard the agent seemed to labor. They should not care whether it reduced human stress, and they should not even care whether it can solve some specific class of problem in the abstract. The only real standard is whether the final output of those tokens creates value. That value can be money, or it can simply be emotional value.
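The "calculate profit" and "reduce cost" questions are simple arithmetic, sketched here under loudly hypothetical assumptions: you can put a number on an hour of your time and on the emotional value you are willing to pay for, and each subtask can be priced at a single per-token rate (real bills split input and output). The function names are illustrative only; the example rates reuse the output prices from the table above.

```python
# Hypothetical break-even and model-routing arithmetic.
# Assumption: time and emotional value can be expressed in the same currency as the cost.
# Assumption: each subtask is priced at one per-token rate (per 1M tokens).

def worth_running(agent_cost: float, hours_saved: float,
                  value_per_hour: float, emotional_value: float = 0.0) -> bool:
    """True if the value a run creates is at least what it costs."""
    return hours_saved * value_per_hour + emotional_value >= agent_cost


def blended_cost(subtasks: list[tuple[int, bool]],
                 flagship_price: float, cheap_price: float) -> float:
    """Route hard subtasks to the flagship and easy ones to a cheaper model.
    Each subtask is (tokens, is_hard)."""
    return sum(tokens * (flagship_price if hard else cheap_price)
               for tokens, hard in subtasks) / 1_000_000


if __name__ == "__main__":
    # A 2-yuan run that saves 15 minutes of time valued at 20 yuan/hour: worth it.
    print(worth_running(2.0, 0.25, 20.0))
    # 200k "hard" tokens on the flagship, 800k "easy" tokens on the mini model.
    tasks = [(200_000, True), (800_000, False)]
    print(blended_cost(tasks, flagship_price=15.0, cheap_price=4.5))
```

Routing the easy 80% of tokens to the cheaper model brings this hypothetical job from 15.0 down to about 6.6 (in USD at the table's rates), assuming the cheaper model's output on those parts is still good enough.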

The real dividing line in the age of agents is whether you can manage digital labor. Stop thinking of yourself as a cog in the machine. Look at your bill. Optimize your workflow. The moment you start calculating the break-even point of every line of output, you are finally holding a ticket to the agent era.

Postscript

It took me about an hour to go from brainstorming to the final piece. That included sorting out the logic, writing the first draft, revising it repeatedly, formatting the article, reviewing the translation, testing locally, and pushing the post to the cloud.

That speed is what one month of intensive, serious use of agents as my main workstation has given me.

Don't Judge Personal Needs by Universal Standards

Sun, 12 Apr 2026 00:00:00 GMT

Sometimes I say that if I ever become financially free, I want to replace my computers with top-end machines every year. The first reaction is almost always the same: Why? Is that really worth it? Isn’t that wasteful? The value for money sounds terrible.

Questions like that used to stop me cold. All I could say, a little helplessly, was that it was my dream. Eventually I realized they were not really asking what I wanted. They were asking whether the purchase could be justified inside a market logic. What they cared about was exchange value: price, depreciation, resale value, specs, return. In other words, a set of standards that presents itself as objective, rational, and universal.

But that has never been the point for me.

Every little freeze I deal with, every moment when memory and CPU usage hit the ceiling, tells me the same thing: when you have the money, buy yourself a good one. So what matters more to me is use value: whether something actually has a living relationship with me. Does it fit the person I am right now? Will I really use it? Will it make me more focused, more free, more at ease in my own work? It is not a price tag, not a benchmark score, not a depreciation chart. Wanting something is an extension of my will.

I believe more and more in one very simple idea: needs come first, tools come after. If there is nothing you genuinely want to do, then even the most expensive computer is just a pile of premium parts. The real question is not the tool itself, but what you are trying to make possible. Ideas, desire, direction: that is where need actually comes from.

And those supposedly objective, rational, standardized value analyses? They may look like discussions about things, but what they often do is erase the person. They flatten why someone wants something, why they love it, why they believe it is worth it. They assume everyone should live by the same scale, judge desire by the same standard, and explain passion with the same answer.

So if someone asks me again whether a top-spec computer is worth it, I probably will not be left speechless anymore. It may not be worth it to everyone, but it is worth it to me. Not because it satisfies some universal standard, but because I know why I want it, how it would enter my life, and how it would help me turn something vague in my head into something real.

The meaning of many things is never really in the price. It lies in whether a real relationship can exist between an object and a particular person.

About This Site

Thu, 09 Apr 2026 00:00:00 GMT

Not Nothing, Not Yet / 星火之息

This is a personal blog and also a long-term experiment in writing and knowledge deconstruction.

Here I record:

Fragments: essays and side notes on technology and life.
Practice: workflows and in-depth hands-on use.
Reconstruction: the collaborative logic of tools, automation, and prompting.
Accumulation: a long-term archive that keeps growing and can be revisited.

This site is more than a loose collection of posts. I hope it can grow like a seed, and over time become a reusable personal knowledge system.

The primary language of the site is Simplified Chinese. English content is translated with the help of large language models and then lightly refined by hand.