Steven Mays · AI & Development · 11 min read

Slow Agents and Spare Tokens: Building an Investing Agent That Optimizes for Being Right

My stock-research agent evaluates one company at a time, fans out a dozen subagents to verify every claim, and runs only when I have spare tokens to burn. Slow by every metric that matters for an agent — which is exactly why I trust it. How it is built, and the first stock it picked.


My investing agent takes most of an evening to evaluate one stock. It fans out subagents, reads filings, checks claims, writes down the bear case, and stops before moving on. It runs maybe a few times per day.

By normal agent standards, this is bad design. Slow. Expensive. Low throughput. That is intentional.

Most agents assume two things: the user is waiting, and tokens cost money. Neither is true here. I invest on a 5–10 year horizon, so ten minutes versus ten hours does not matter. I also run the agent on subscription capacity I already paid for. If that capacity resets unused, the marginal cost of a deep run is effectively zero.

So the design question changes. Not: How do I make this faster? But: How do I make this harder to fool?

That swap drives everything below: why slow is the feature, the spare-token economics that make a deep run effectively free, how the agent is built, and the first stock it actually advanced.

Why Slow Is a Feature

Every agent trades latency against error. A support bot has high latency cost because someone is waiting. The error cost is usually low because the next message can fix it.

A stock-research agent is the opposite. Latency barely matters. Error matters a lot. A missed winner is one idea out of thousands. A wrong “advance” can cost real money.

So I do not need the agent to find every winner. I need it to avoid confidently advancing bad ideas. When I can trade speed for verification, I do.

Slow is not the goal. Slow is the budget: time budget, token budget, verification budget. For this use case, the scarce thing is not speed. It is trust.

The Spare-Token Economics

If I ran this through a metered API, the design would be hard to justify. A single run can burn a lot of tokens. It pulls filings, verifies claims, runs red-flag sweeps, writes memos, and sends subagents after specific questions. On a normal meter, that price pushes you toward shortcuts.

I am not running it that way. I run it inside flat-rate subscription capacity that resets and does not roll over. If I do not use it, it disappears. Like a gym membership: I already paid. Not showing up does not save money.

So when I have spare capacity, I let the agent spend it. That changes the economics. The question is not “how many tokens did this use?” The question is “did I turn unused capacity into a better decision?”

A metered agent rewards compression. This setup rewards thoroughness.

What the Agent Does

The project is called agent-moonshot-research. It looks for asymmetric investment ideas: equity moonshots with a plausible 20–50x path over 5–10 years, lower-risk ~10x ideas with real downside protection, and macro or structural trades visible in primary data before the market fully reprices them.

It is not supposed to confirm what I already believe. That was an early bug. At first, it kept forcing ideas into my favorite themes: AI infrastructure, nuclear, GLP-1s. After thirty-plus candidates produced zero advances and clustered around those themes, I removed that logic.

Now the screener is shape-driven. A boring industrial 10-bagger counts the same as an AI infrastructure 10-bagger.
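A shape-driven screener can be sketched as a function that reads only the payoff shape of a candidate and never its theme. This is a minimal illustration, not the agent's real screener: the `Candidate` fields, the placeholder tickers, and the cutoffs are assumptions loosely based on the 20–50x and ~10x targets above, and the macro bucket is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    ticker: str
    theme: str               # kept for reporting only; never read by shape_of()
    upside_multiple: float   # plausible bull-case multiple, e.g. 10.0 for a 10-bagger
    horizon_years: float     # years needed for that path to play out
    downside_protected: bool # e.g. cash or cash flow that limits the loss


def shape_of(c: Candidate) -> Optional[str]:
    """Classify a candidate by payoff shape alone. Theme is deliberately ignored."""
    if c.horizon_years > 10:
        return None                      # longer than the 10-year horizon
    if c.upside_multiple >= 20:
        return "moonshot"
    if c.upside_multiple >= 10 and c.downside_protected:
        return "low-risk 10x"
    return None                          # no qualifying shape


industrial = Candidate("HYPO-A", "industrial", 10.0, 7, True)
ai_infra = Candidate("HYPO-B", "AI infrastructure", 10.0, 7, True)
print(shape_of(industrial), "|", shape_of(ai_infra))  # low-risk 10x | low-risk 10x
```

Because `theme` never enters the decision, two candidates with the same shape always get the same answer.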

The agent also does not trade. It returns a verdict — ADVANCE, WATCH, PASS, or KILL — then gives a sizing plan and kill-switches. I place the orders. The agent is a research analyst, not a trader.
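The verdict-plus-plan output can be pictured as a small record. A minimal sketch, with hypothetical names (`ResearchResult`, `actionable`) and fields chosen for illustration; the real agent writes its outputs as files on disk, and nothing here can place an order.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional, Tuple


class Verdict(Enum):
    ADVANCE = "ADVANCE"
    WATCH = "WATCH"
    PASS = "PASS"
    KILL = "KILL"


@dataclass
class ResearchResult:
    ticker: str
    verdict: Verdict
    score: int                                         # e.g. 7 for "7/10"
    entry_band: Optional[Tuple[float, float]] = None   # sizing plan: staged entry range
    kill_switches: List[str] = field(default_factory=list)
    # Deliberately no order-placing method: the operator places orders.

    def actionable(self) -> bool:
        """Ready for a human decision only with ADVANCE, a sizing plan, and kill-switches."""
        return (
            self.verdict is Verdict.ADVANCE
            and self.entry_band is not None
            and bool(self.kill_switches)
        )
```

Keeping execution out of the data model is the point: the record can only inform a decision, not make one.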

The Architecture

Three choices make the system slow.

The first is one play at a time. The agent analyzes a single stock, finishes the run, and clears context before the next idea. No long-running chat. No old assumptions leaking forward. No context rot. The cost is obvious: nothing survives the wipe unless it is written down, which leads to the second choice.

The second is disk-first state: everything gets written down. Each ticker gets a directory: triage notes, pre-mortem, filing extracts, claim checks, red-flag sweep, adversarial review, scorecard, memo, sizing spec, kill-switches, and an evidence ledger. The rule is simple: a step does not count until its artifact is on disk.

That buys two things. First, auditability: the evidence ledger gives every claim a source, and for SEC filings that means an accession number, so the reasoning can be checked later. Second, resumability: if a run dies halfway, the next one picks up where it stopped. I do not want a chat transcript full of half-remembered context. I want files I can inspect, diff, commit, and rerun from.
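A minimal sketch of the disk-first rule, assuming one directory per ticker with hypothetical file names: the next step to run is simply the first artifact that is missing, which is what makes a crashed run resumable. The evidence ledger is shown as an append-only JSON Lines file that refuses entries without a well-formed SEC accession number. The layout and function names are illustrative, not the agent's actual format, and non-SEC sources are not handled.

```python
import json
import re
import tempfile
from pathlib import Path
from typing import Optional

# Hypothetical file names, one per step named above, in pipeline order.
ARTIFACTS = [
    "triage.md", "pre_mortem.md", "filing_extracts.md", "claim_checks.md",
    "red_flags.md", "adversarial_review.md", "scorecard.md", "memo.md",
    "sizing.md", "kill_switches.md",
]

# SEC accession numbers look like 0000000000-YY-NNNNNN (10, 2, and 6 digits).
ACCESSION_RE = re.compile(r"\d{10}-\d{2}-\d{6}")


def next_step(ticker_dir: Path) -> Optional[str]:
    """First artifact not yet on disk. A step only counts once its file exists."""
    for name in ARTIFACTS:
        if not (ticker_dir / name).exists():
            return name
    return None  # every step is done


def record_evidence(ticker_dir: Path, claim: str, source: str, accession: str) -> None:
    """Append one sourced claim to an append-only JSON Lines evidence ledger."""
    if not ACCESSION_RE.fullmatch(accession):
        raise ValueError(f"not an SEC accession number: {accession!r}")
    entry = {"claim": claim, "source": source, "accession": accession}
    with open(ticker_dir / "evidence_ledger.jsonl", "a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")


with tempfile.TemporaryDirectory() as tmp:
    play_dir = Path(tmp) / "EXAMPLE"
    play_dir.mkdir()
    (play_dir / "triage.md").write_text("proceed\n")
    print(next_step(play_dir))  # pre_mortem.md: a crashed run resumes here
```

Deriving progress from files rather than from memory means there is no separate "state" to corrupt: deleting an artifact is enough to force that step to rerun.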

The third is subagent fan-out. This is where the tokens go. The main agent orchestrates while focused subagents do the work. Filing-fetchers pull 10-Ks, 10-Qs, 8-Ks, proxies, and other primary sources. Claim-verifiers check one material claim at a time. Red-flag-sweepers look for accounting issues, insider selling, litigation, dilution, auditor changes, and related-party weirdness. An adversarial reviewer argues the bear case before the final scorecard. The orchestrator should not read a full filing if a subagent can extract the right section and cite it. That keeps the main context clean.
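The fan-out pattern can be sketched with a thread pool standing in for subagents. `verify_claim` is a hypothetical placeholder for a claim-verifier subagent call; what matters is the shape: each worker gets one claim, and the orchestrator receives only a short status and citation, never the filing text itself.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import List


@dataclass
class ClaimCheck:
    claim: str
    status: str    # "supported", "contradicted", or "unverified"
    citation: str  # short pointer to the source section, not the section itself


def verify_claim(claim: str) -> ClaimCheck:
    """Placeholder for a claim-verifier subagent. A real one would read the filing in
    its own context; this stub returns a canned result so the sketch runs."""
    return ClaimCheck(claim, "unverified", "n/a")


def fan_out(claims: List[str], max_workers: int = 12) -> List[ClaimCheck]:
    """One verifier per material claim, run concurrently; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(verify_claim, claims))


for check in fan_out(["revenue grew year over year", "no auditor change"]):
    print(check.status, "|", check.claim)
```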

The loop looks roughly like this — the real version lives in the agent’s CLAUDE.md:

while operator says "continue":
    validate_state()
    if inbox.has_priority(): handle_it()

    for pos in held_positions:
        if stale(pos, days=7): re_evaluate(pos)

    if watchlist.stale(days=7):
        sweep_watchlist()

    play = queue.next() or run_screener()
    result = run_pipeline(play)
    write_to_disk(result)
    commit_and_push()
    stop_and_ask_operator()

The order matters. Owned positions come before new ideas because that is where the money is. Watchlist triggers come before new ideas because that is where something changed. New research comes last.

That is backwards if you want output. It is right if you want to avoid dumb losses.
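The ordering rule in the loop can be isolated as a small planning function. A sketch under assumptions: `held` maps each owned ticker to the date it was last reviewed, the seven-day staleness window comes from the pseudocode above, and the action names are hypothetical labels that mirror it.

```python
from datetime import date, timedelta
from typing import Dict, List, Optional, Tuple

STALE_AFTER = timedelta(days=7)


def plan_work(
    held: Dict[str, date],
    watchlist_checked: date,
    queue: List[str],
    today: date,
) -> List[Tuple[str, Optional[str]]]:
    """Order one pass: stale owned positions, then a stale watchlist, then one new idea."""
    work: List[Tuple[str, Optional[str]]] = [
        ("re_evaluate", ticker)
        for ticker, last_review in held.items()
        if today - last_review >= STALE_AFTER
    ]
    if today - watchlist_checked >= STALE_AFTER:
        work.append(("sweep_watchlist", None))
    # New research always comes last, and only one play per pass.
    work.append(("research", queue[0] if queue else "run_screener"))
    return work
```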

The Pipeline

Inside run_pipeline, each stock goes through the same gates. Phase 0 is triage: a quick look to kill obvious losers. If the stock proceeds, it gets assigned to one bucket — moonshot, low-risk 10x, asymmetric value, or catalyst-driven. One bucket. One bar. No moving goalposts halfway through.
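One bucket, one bar can be enforced by making the assignment immutable once triage sets it. A minimal sketch; the `BARS` scores are hypothetical placeholders, since the actual bars are not given here.

```python
from dataclasses import FrozenInstanceError, dataclass
from enum import Enum


class Bucket(Enum):
    MOONSHOT = "moonshot"
    LOW_RISK_10X = "low-risk 10x"
    ASYMMETRIC_VALUE = "asymmetric value"
    CATALYST = "catalyst-driven"


# Hypothetical minimum scores per bucket, for illustration only.
BARS = {
    Bucket.MOONSHOT: 8,
    Bucket.LOW_RISK_10X: 7,
    Bucket.ASYMMETRIC_VALUE: 6,
    Bucket.CATALYST: 6,
}


@dataclass(frozen=True)
class Assignment:
    ticker: str
    bucket: Bucket  # fixed at triage; frozen=True blocks reassignment

    @property
    def bar(self) -> int:
        return BARS[self.bucket]

    def clears(self, score: int) -> bool:
        return score >= self.bar


a = Assignment("EXAMPLE", Bucket.ASYMMETRIC_VALUE)
try:
    a.bucket = Bucket.CATALYST  # attempt to move the goalposts
except FrozenInstanceError:
    print("bucket is fixed for this play")
```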

Then comes the pre-mortem. Before deep research starts, the agent writes the bear case. If this stock fails, why? That question has to be answered before the agent gets attached to the upside.

Phase 1 is primary sources. The agent pulls the actual documents: 10-Ks, 10-Qs, 8-Ks, proxies. No price targets as evidence. No company deck as proof. A deck can point to something worth checking. It does not get to be the check.

Phase 2 is verification. The agent checks each important claim against strong sources. Bull claims and bear claims both get tested. If confidence is too low, the play dies. If central claims are contradicted, the play dies.
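The two kill rules in Phase 2, plus the Phase 1 rule that decks and price targets are pointers rather than evidence, can be sketched as one gate. Assumptions: each claim arrives already scored by a verifier, "too low" means any central claim below a hypothetical 0.7 confidence, and the source-type labels are illustrative.

```python
from dataclasses import dataclass
from typing import List

# Source types that may point at something worth checking but never count as the check.
NOT_EVIDENCE = {"price_target", "company_deck"}
MIN_CONFIDENCE = 0.7  # hypothetical threshold


@dataclass
class Claim:
    text: str
    central: bool          # is the thesis built on this claim?
    source_type: str       # e.g. "10-K", "10-Q", "8-K", "proxy", "company_deck"
    confidence: float      # verifier's confidence the claim holds, 0..1
    contradicted: bool = False


def verification_gate(claims: List[Claim]) -> str:
    """Return "KILL" or "PROCEED" using only claims backed by acceptable sources."""
    evidence = [c for c in claims if c.source_type not in NOT_EVIDENCE]
    if any(c.central and c.contradicted for c in evidence):
        return "KILL"  # a central claim is contradicted
    central = [c for c in evidence if c.central]
    if not central or min(c.confidence for c in central) < MIN_CONFIDENCE:
        return "KILL"  # no properly sourced central claims, or confidence too low
    return "PROCEED"
```

Filtering sources first means a deck-only central claim cannot rescue a play; it simply leaves the thesis unsupported.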

Phase 3 is red flags. The agent sweeps for accounting issues, dilution, insider selling, auditor changes, litigation, customer concentration, and related-party weirdness. The goal is not to prove the company is perfect. The goal is to avoid missing the thing that should have killed the idea in ten minutes.

Phase 3.5 is adversarial review. A separate reviewer argues against the thesis before the final scorecard. Before, not after. Once a confident memo exists, the bear case becomes too easy to treat as a formality.

Phase 4 is outputs. If the play passes, the agent writes the scorecard, memo, sizing plan, kill-switches, and evidence ledger. Most ideas die before the memo. That is the point.
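The gate sequence reads naturally as an ordered list of checks where the first failure ends the run. A sketch only: the phase predicates below are hypothetical stand-ins for the real phases, and this `run_pipeline` is a simplified illustration with a different signature from the loop's. The adversarial review sits before outputs, so no scorecard or memo is written for a play that fails it.

```python
from typing import Callable, Dict, List, Tuple

Gate = Callable[[Dict], bool]  # True if the play survives this phase

# Hypothetical predicates standing in for the real phases, in the order described.
PHASES: List[Tuple[str, Gate]] = [
    ("phase 0: triage", lambda p: not p.get("obvious_loser", False)),
    ("pre-mortem", lambda p: bool(p.get("bear_case"))),  # bear case must exist first
    ("phase 1: primary sources", lambda p: bool(p.get("filings"))),
    ("phase 2: verification", lambda p: p.get("verified", False)),
    ("phase 3: red flags", lambda p: not p.get("red_flags")),
    ("phase 3.5: adversarial review", lambda p: p.get("survives_bear_review", False)),
]


def run_pipeline(play: Dict, phases: List[Tuple[str, Gate]] = PHASES) -> Tuple[str, str]:
    """Run phases in order; the first failed gate kills the play and names the phase."""
    for name, gate in phases:
        if not gate(play):
            return ("KILL", name)
    return ("WRITE_OUTPUTS", "phase 4")  # scorecard, memo, sizing, kill-switches, ledger


print(run_pipeline({"bear_case": "retention is collapsing"}))
# ('KILL', 'phase 1: primary sources')
```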

A Worked Example: WEAV

That is the theory. Let’s run one stock through every gate.

The stock is WEAV (Weave Communications). Weave sells communication and payments software to dental and medical practices. Small-cap. Sub-$10 at the time. Not flashy. Exactly the kind of company a theme-agnostic screener should find.

The pre-mortem wrote the bear case first: net revenue retention was in linear decline, customers were leaking, and growth was quietly ending. If true, the thesis was dead.

So the agent pulled retention data from 10-Ks, 10-Qs, the proxy, and 8-Ks. Every figure went into the evidence ledger. The bear framing did not hold. Retention was softening, but it was not linearly collapsing. The multi-year data contradicted the simple bear case.

Then Q1 2026 came out during the analysis. Revenue was up about 17% year over year. Non-GAAP operating income turned positive. Full-year guidance was raised. That did not make it a guaranteed winner. It did mean “growth is ending” was too simple.

The red-flag sweep helped too. The insider-selling detector found plenty of selling. A fast skim might stop there. But the sales were scheduled 10b5-1 vesting auto-sales, not discretionary open-market dumping. That distinction matters. “Insiders selling” sounds bad. “Employees automatically selling vested shares on a pre-scheduled plan” is mostly noise.
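The insider-selling distinction can be encoded from Form 4 data, assuming each reported transaction has already been parsed into its transaction code and whether the filer marked it as made under a Rule 10b5-1 plan (a checkbox Form 4 has carried since 2023). Code `S` is an open-market or private sale and `F` is shares delivered or withheld to pay an exercise price or tax. The categories and function names are hypothetical, and this is not necessarily how the agent's detector works.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Form4Sale:
    insider: str
    code: str           # Form 4 transaction code, e.g. "S" (sale) or "F" (tax/exercise)
    plan_10b5_1: bool   # filer indicated the trade was made under a Rule 10b5-1 plan
    shares: int


def classify(sale: Form4Sale) -> str:
    if sale.code == "F":
        return "tax_withholding"       # shares withheld at vesting: mostly noise
    if sale.code == "S" and sale.plan_10b5_1:
        return "scheduled_plan_sale"   # pre-scheduled: mostly noise
    if sale.code == "S":
        return "discretionary_sale"    # the kind worth a closer look
    return "other"


def summarize(sales: List[Form4Sale]) -> Dict[str, int]:
    """Total shares per category so discretionary selling is not buried in plan sales."""
    totals: Dict[str, int] = {}
    for s in sales:
        key = classify(s)
        totals[key] = totals.get(key, 0) + s.shares
    return totals


sales = [
    Form4Sale("exec A", "S", True, 12_000),
    Form4Sale("exec B", "F", False, 3_000),
    Form4Sale("director C", "S", False, 500),
]
print(summarize(sales))
```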

Final verdict: ADVANCE, 7/10. Not a screaming buy. More like a borderline moonshot that looked cleaner as asymmetric value, with a base case around 3x over five years. The sizing spec gave me a staged entry band of $5.20–$5.50 and twenty-one kill-switches: twelve hard, nine soft. The key one was retention drifting toward the hard-kill floor. Same metric the bear case targeted.
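For scale, a 3x base case over five years works out to roughly 24.6% a year compounded, because 3^(1/5) ≈ 1.246. The arithmetic and an entry-band check, as a tiny sketch with hypothetical helper names:

```python
def implied_cagr(multiple: float, years: float) -> float:
    """Compound annual growth rate that turns 1 into `multiple` over `years`."""
    return multiple ** (1 / years) - 1


def in_band(price: float, low: float, high: float) -> bool:
    """True if a fill price sits inside the staged entry band, inclusive."""
    return low <= price <= high


print(f"{implied_cagr(3, 5):.1%}")   # 24.6%
print(in_band(5.35, 5.20, 5.50))     # True
```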

I bought WEAV on 2026-04-20 at $5.47, inside the band, and later scaled in. Since then, the weekly held-position gate checks it against the kill-switches. New filings, Form 4s, retention, free cash flow, guidance, and thesis changes all get reviewed. So far, none of the twenty-one kill-switches has tripped. Three are on watch.
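A weekly kill-switch check can be sketched as comparing the latest metrics with each switch's floor, plus a margin above the floor that flags a switch for watch before it trips. Everything here is hypothetical: the switch definitions, floors, margins, and metric values are made up for illustration and are not Weave's numbers or the agent's real thresholds. The sketch only handles floor-style switches.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class KillSwitch:
    name: str
    metric: str
    floor: float          # trips when the metric falls below this value
    hard: bool            # hard: exit the thesis; soft: force a thesis review
    watch_margin: float   # "on watch" when within this distance above the floor


def check(switches: List[KillSwitch], metrics: Dict[str, float]) -> Dict[str, List[str]]:
    report: Dict[str, List[str]] = {"hard_tripped": [], "soft_tripped": [], "watch": [], "ok": []}
    for ks in switches:
        value = metrics[ks.metric]
        if value < ks.floor:
            report["hard_tripped" if ks.hard else "soft_tripped"].append(ks.name)
        elif value < ks.floor + ks.watch_margin:
            report["watch"].append(ks.name)
        else:
            report["ok"].append(ks.name)
    return report


switches = [
    KillSwitch("retention floor", "retention", floor=0.90, hard=True, watch_margin=0.03),
    KillSwitch("cash flow positive", "fcf_margin", floor=0.0, hard=False, watch_margin=0.02),
]
print(check(switches, {"retention": 0.92, "fcf_margin": 0.08}))  # retention on watch
```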

That is the point of the system. Not a hot tip. A paper trail. I can see what I believed, why I believed it, which sources supported it, what would falsify it, and what has happened since. The full WEAV trail is public: triage, memo, scorecard, evidence ledger, and kill-switches.

What this is. I am describing my own position to make the example concrete. I am not recommending the stock. This is one small, speculative bet. “The agent advanced it” is not a substitute for doing your own work. See the disclaimer.

The Loop That Makes It Better

A slow agent that never learns is just slow. The useful part is the retro. Every quarter, it reviews past calls: RIGHT, WRONG, TOO EARLY, TOO LATE. Then it updates base rates, scoring rules, and verification recipes.
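The retro's base-rate update can be sketched as a tally of past calls by bucket and outcome. The outcome labels come from the text; the input format and function name are assumptions.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

OUTCOMES = ("RIGHT", "WRONG", "TOO EARLY", "TOO LATE")


def base_rates(calls: List[Tuple[str, str]]) -> Dict[str, Dict[str, float]]:
    """calls: (bucket, outcome) pairs from past verdicts. Returns outcome share per bucket."""
    by_bucket: Dict[str, Counter] = defaultdict(Counter)
    for bucket, outcome in calls:
        if outcome not in OUTCOMES:
            raise ValueError(f"unknown outcome: {outcome}")
        by_bucket[bucket][outcome] += 1
    rates: Dict[str, Dict[str, float]] = {}
    for bucket, counts in by_bucket.items():
        total = sum(counts.values())
        rates[bucket] = {o: counts[o] / total for o in OUTCOMES}
    return rates


history = [("moonshot", "WRONG"), ("moonshot", "TOO EARLY"), ("asymmetric value", "RIGHT")]
print(base_rates(history)["moonshot"])
# {'RIGHT': 0.0, 'WRONG': 0.5, 'TOO EARLY': 0.5, 'TOO LATE': 0.0}
```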

That is how it caught the theme problem. The meta-review surfaced the pattern: thirty-plus candidates, clustered around my preferred narratives, zero advances. So I removed the theme slots.

The line I keep coming back to is: decision quality over time, not artifacts per play. Beautiful memos on dead theses are not success. Speed gives you more output. The retro gives you a shot at being less wrong next quarter. Only one of those compounds.

Trade-offs

Step back, and the whole system is one trade: I spend tokens I cannot save to avoid a mistake I cannot take back. One play at a time keeps context clean. Disk-first state makes the reasoning auditable and the run resumable. Fan-out spends the spare capacity on verification. The retro keeps it honest over time. Trust is the scarce resource, and every choice protects it.

This is wrong for most agents. For many use cases, it would be absurd. The slow design buys me a paper trail, cleaner source discipline, less hallucinated confidence, explicit kill-switches, and a decision I can audit later.

It costs throughput. It burns tokens. It is overkill for anything reversible or low-stakes. I would never build a normal chatbot this way. For this use case, I would not build it any other way.

The agent can still be wrong. Slowness does not make a thesis true. It does not protect against bad judgment, bad assumptions, or a market that disagrees for good reasons. It only reduces one kind of mistake: being carelessly wrong. That is the kind I can control.

Not Financial Advice

I own the stock named here. Nothing in this post is investment advice, a recommendation, or a solicitation to buy or sell anything. I am a software person describing a tool I built. I am not a licensed financial advisor.

An agent’s verdict is not diligence. It is an input to my diligence. The agent can be confident and wrong. I can be confident and wrong. Small-cap stocks can move violently. Markets can ignore a thesis longer than the thesis survives.

Do your own research. Size for being flat wrong. Do not buy a stock because someone on the internet — human or model — sounded sure.
