Indie Agent Builders — Week of June 27
July 4, 2026 · 10:25 AM

This week's issue focuses on the shift from agent demos to harness engineering: subagent delegation, prompt evaluation, review gates, resumability, and token/context control. It covers Simon Willison's Fable-driven build logs, the loops and software-factory debate at AI Engineer World's Fair, and the GitHub projects with the clearest implementation signals for agent builders.

This issue covers June 27 at 10:00 through July 4 at 10:00 in the channel timezone, UTC-8. It was the week agent builders stopped treating "agent" as the interesting noun. The useful action moved into harnesses: how to delegate work to cheaper subagents, how to keep humans in the outer loop, how to make CLIs legible to coding agents, how to review long-running work, and how to compress the context that agents burn through.
Two events set the tone. Anthropic said the U.S. Department of Commerce lifted export controls on Claude Fable 5 and Mythos 5 on June 30, with access resuming July 1. 1 The same week, AI Engineer World's Fair ran in San Francisco from June 29 to July 2 with 29 tracks, more than 300 speakers, more than 100 expo partners, and more than 6,000 attendees. 2 The shared lesson from both stories: frontier models matter, but the durable advantage is the operating system around them.

Simon Willison: build logs you can copy

Simon Willison, creator of Datasette and co-creator of Django, had the most directly replicable week. His posts are useful because they expose the actual prompts, tools, and failure modes behind the work.

llm-coding-agent: a small Claude Code-style agent built with Fable

On July 2, Simon released llm-coding-agent 0.1a0, a Claude Code-style coding agent built on top of his llm Python library. He created it with Claude Fable 5 through two rounds of prompting: first asking for spec.md, then using a red/green test-driven-development loop to implement it. 3
The initial release has six tools: edit_file, execute_command, list_files, read_file, search_files, and write_file. It also supports a --yolo mode that skips approvals, an --allow command whitelist, and a Python API with CodingAgent(model="gpt-5.5", root="/path", approve=True).run(...). 3 Simon's verdict was restrained but useful: "It's pretty good for a first attempt!" 3
The part worth copying is not the tool list. It is the development loop: ask the model to write the spec, force the implementation through failing tests, then inspect what the model added without being asked. Simon noticed that Fable implemented a CodingAgent(...) class he had not requested, which gave him a reusable API surface rather than only a CLI. 3
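To make the approval behavior described above concrete, here is a minimal sketch of how a --yolo switch and an allowlist could gate execute_command. It is an illustration under stated assumptions, not llm-coding-agent's actual code: the function names is_allowed and decide, the matching rule (first token of the command), and the refusal of shell metacharacters are all hypothetical choices.

```python
import shlex

# Characters that let one shell command smuggle in another. Refusing them is
# deliberately conservative: a quoted "a|b" argument is rejected too.
SHELL_METACHARACTERS = ";|&`$<>"


def is_allowed(command: str, allowed: set[str]) -> bool:
    """Return True when the command's executable is on the allowlist."""
    if any(ch in command for ch in SHELL_METACHARACTERS):
        return False
    try:
        parts = shlex.split(command)
    except ValueError:
        return False  # unbalanced quotes: refuse rather than guess
    return bool(parts) and parts[0] in allowed


def decide(command: str, allowed: set[str], yolo: bool = False, ask=input) -> bool:
    """Decide whether to run a command: yolo skips approval, the allowlist
    auto-approves, and everything else falls back to asking a human."""
    if yolo or is_allowed(command, allowed):
        return True
    answer = ask(f"Run {command!r}? [y/N] ")
    return answer.strip().lower() == "y"
```

The design choice worth noting is the default: anything the allowlist does not recognize goes to a person rather than being silently run or silently dropped.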

Delegating implementation to cheaper subagents

On July 3, Simon published a Fable usage pattern from his AI Engineer World's Fair fireside chat with Claude Code team members Cat Wu and Thariq Shihipar, plus a tip from Jesse Vincent. The prompt was simple: "For all coding tasks use your judgement to decide an appropriate lower power model and run that in a subagent." 4
Claude then wrote a memory file that encoded the policy: implementation work can usually run on Sonnet, mechanical edits can run on Haiku, while design, audit, and synthesis stay with the main model. 4 Simon said the approach was working: "I'm getting a ton of work done and my Fable allowance is shrinking less quickly than before." 4
For engineers running expensive agent loops, this is a small change that applies almost anywhere a top-tier model is doing routine work. Do not hard-code every routing rule up front. Give the top model authority to route routine work down, and keep the review loop where the expensive judgment is needed.
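If you want a harness-side floor under that delegated authority, one option is to let the main model propose a tier and have the harness clamp only the cases that must stay on the main model. The sketch below assumes hypothetical tier labels ("small", "mid", "main") and task kinds; it is not what Claude wrote into Simon's memory file.

```python
# Hypothetical tier labels, cheapest first. Map them to real model
# identifiers in your own harness.
TIERS = ("small", "mid", "main")

# Task kinds that should never be routed down, per the pattern above.
KEEP_ON_MAIN = {"design", "audit", "synthesis"}


def resolve_tier(task_kind: str, proposed: str) -> str:
    """Accept the main model's proposed tier unless the task must stay on
    the main model or the proposal names a tier the harness does not know."""
    if task_kind in KEEP_ON_MAIN or proposed not in TIERS:
        return "main"
    return proposed
```

The model still makes the routing decision for routine work; the harness only refuses to downgrade judgment-heavy tasks and falls back to the main model when it sees something unexpected.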

DSPy exposed a prompt bug in Datasette Agent

Simon also used Claude Fable 5 and DSPy to evaluate Datasette Agent's SQL system prompt on July 2. The harness ran a DSPy agent against a live in-process Datasette instance, using Datasette Agent's real tools and prompt. 5
The failure mode was specific. The schema listing showed table names but not column names. The prompt also told the agent not to call describe_table if it already had the information. That combination led the agent to guess column names such as page_count, o.order_id, and first_name, then fall into error-retry loops. 5
This is a good example of harness testing finding a bug that a one-off demo might miss. The fix is likely boring and valuable: put column names into the schema listing, or soften the instruction that discouraged describe_table. 5
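The first of those fixes can be sketched with nothing but the standard library. The function below builds a schema listing that includes column names and declared types for a SQLite database. The name schema_listing and the output format are assumptions for illustration; this is not Datasette Agent's code.

```python
import sqlite3


def schema_listing(conn: sqlite3.Connection) -> str:
    """List each user table with its column names and declared types."""
    lines = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # Quote the identifier so unusual table names do not break the PRAGMA.
        quoted = '"' + table.replace('"', '""') + '"'
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = conn.execute(f"PRAGMA table_info({quoted})").fetchall()
        described = ", ".join(f"{col[1]} {col[2]}".strip() for col in columns)
        lines.append(f"{table}({described})")
    return "\n".join(lines)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
print(schema_listing(conn))
```

With column names in the listing, an agent told not to call describe_table unnecessarily no longer has to guess at names like order_id.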

shot-scraper video: make your tool's help text agent-readable

On June 30, Simon released shot-scraper 1.10 with a new shot-scraper video command. The command accepts a storyboard.yml file that defines browser actions, then uses Playwright to record a demo video. 6
Simon used GPT-5.5 xhigh in Codex Desktop to generate the storyboard YAML and docs for a Datasette bulk-insert demo. The feature depends on Playwright 1.61.0's screencast mechanism, which fixed earlier white-frame and fixed-width recording problems. 6 His takeaway is the part to steal: command --help output can work like a bundled SKILL.md file when it gives a coding agent enough structure to use the tool correctly. 6
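As a rough illustration of help text that doubles as agent documentation, the argparse sketch below puts worked examples and exit codes into --help output. The tool name demo-record, its arguments, and the exit codes are hypothetical; this is not shot-scraper's interface.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser whose --help output carries examples an agent can copy."""
    parser = argparse.ArgumentParser(
        prog="demo-record",  # hypothetical tool name
        description="Record a browser demo from a storyboard file.",
        # Preserve the line breaks in the epilog instead of re-wrapping it.
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog=(
            "Examples:\n"
            "  demo-record storyboard.yml --output demo.webm\n"
            "\n"
            "Exit codes:\n"
            "  0  recording written\n"
            "  1  storyboard failed validation\n"
        ),
    )
    parser.add_argument(
        "storyboard", help="path to a YAML file listing browser actions"
    )
    parser.add_argument(
        "--output",
        default="demo.webm",
        help="where to write the video (default: %(default)s)",
    )
    return parser


if __name__ == "__main__":
    print(build_parser().format_help())
```

A coding agent that runs the tool with --help gets the argument shapes, a copyable invocation, and the meaning of each exit code in one call.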

Sonnet 5 changed the operating cost calculation

Claude Sonnet 5 arrived on June 30 with a 1M-token context window, 128K maximum output, adaptive thinking enabled by default, and no temperature, top_p, or top_k sampling parameters. 7 Anthropic kept nominal pricing at $3 per million input tokens and $15 per million output tokens, with promotional pricing of $2/$10 through August 31. 7
The hidden cost change is the tokenizer. Simon measured the same English Universal Declaration of Human Rights text rising from 2,356 to 3,341 tokens, or 1.42x. Python code rose from 44,014 to 56,113 tokens, or 1.27x. Simplified Chinese barely moved, from 3,334 to 3,360 tokens. 7 If your agent loop is English-heavy and tool-output-heavy, the model may be materially more expensive even when the price sheet looks unchanged.
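The arithmetic is worth running against your own workload. The sketch below uses only the token counts and the $3 input price quoted above, and assumes price per token is the only other variable; the function names are illustrative.

```python
def token_ratio(old_tokens: int, new_tokens: int) -> float:
    """How many new-tokenizer tokens the same text costs per old token."""
    return new_tokens / old_tokens


def effective_price(price_per_million: float, ratio: float) -> float:
    """Price per million old-tokenizer-equivalent tokens after the ratio."""
    return price_per_million * ratio


# (old tokens, new tokens) for the same text, from the measurements above.
MEASURED = {
    "english": (2356, 3341),
    "python": (44014, 56113),
    "simplified_chinese": (3334, 3360),
}

for name, (old, new) in MEASURED.items():
    ratio = token_ratio(old, new)
    price = effective_price(3.00, ratio)
    print(f"{name}: {ratio:.2f}x, ${price:.2f} per million old-equivalent input tokens")
```

For the English sample, an unchanged $3 list price works out to roughly $4.25 per million tokens measured in the old tokenizer's units.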

AI Engineer World's Fair: the harness becomes the product

The conference material from Swyx and Latent Space is less directly copy-pasteable than Simon's posts, but it explains why the same vocabulary kept appearing across independent builders: loops, factories, review debt, outer loops, and human agency.
Swyx opened the Day 2 Software Factories track with "Loopcraft: The Art of Stacking Loops." His framing was that AI engineering has moved from chat to tools to goals, and now to automations, cron jobs, and loops. 8 In the same dispatch, Sonar CEO Tariq Shaukat called 2026 the "year of the harness," meaning the system around the model has become the main engineering object. 8
The most concrete software-factory formulation came from Tereza Tizkova of Factory: "A software factory is the whole loop, the whole lifecycle of developing software with autonomy." 8 Warp CEO Zach Lloyd pushed the same direction in a Latent Space interview. Warp has open-sourced its core CLI, reports more than 60,000 GitHub stars and more than 800,000 developers, and has launched Oz as a platform that connects models, coding harnesses, local and cloud sandboxes, Jira, Linear, Slack, Teams, and GitHub. 9 Lloyd said he has not hand-written code for six months and predicted that within a year every important software project will have some kind of automated factory. 9
The useful counterweight came from the agency track and the closing loops debate. Addy Osmani framed the split as an Agency Ladder: "That inner loop is capability. The outer loop is agency." 10 Geoffrey Litt, now at Notion, objected to the software-factory metaphor and argued that developers need enough understanding to participate rather than merely approve. 10 Simon separately captured Litt's "Understand to Participate" talk on July 2, where Litt warned that lack of conceptual fluency limits a developer's ability to take part in the project. 11
Paul Bakaus of Impeccable made the same constraint sharper: "There is no auto, and there will be no auto." His design philosophy is that agents handle the first 80% and humans complete the last 20%. 10 Dex Horthy was more skeptical in the closing loops debate: "I haven't seen proof that we are at a point where we can just step up an abstraction level. I actually think we need to step down an abstraction level, if anything." 12 Geoffrey Huntley, arguing pro-loop, offered the operating image: "[We're] kind of like locomotive engineers now. That's our job: to keep the locomotive on the rails." 12
For builders, the practical middle ground is clear. Treat loops as production infrastructure, but do not let the loop hide review debt. Amplify's 2026 AI Engineer Survey, released during the closing dispatch, found that 95% of respondents use agents, 89% of agents can write data, 40% of respondents say AI costs limit usage, and 59% worry that AI-generated code creates long-term debt. 12
Two workshop/interview notes make that operational. Zack Proser of WorkOS and Nick Nisi ran a one-hour "Lifestyles of the AI-Native" workshop with four modules: voice coding, loops and goals, verification gates, and scheduled tasks. 13 Proser's rule is the one to keep near your CI config: done is a checklist, not a vibe; operators trust gates, not the model's confidence. 13 Andrew Qu, Vercel's Chief of Software, described agents as "a new type of software" that needs primitives such as context, tools, resumability, and long-running work. 14 Qu's Vercel example is concrete: Vercel is putting agents into its website, Slack, and dashboard, while also serving Markdown directly to agent requests and visual HTML to humans. 14
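One way to read "done is a checklist" in code: define each gate as a command and declare the task finished only when every gate exits with status 0. This is a minimal sketch, not the workshop's material; the Gate class and the placeholder commands are assumptions you would replace with your own test, lint, and type-check invocations.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class Gate:
    name: str
    command: list[str]


def run_gates(gates: list[Gate]) -> dict[str, bool]:
    """Run every gate and record whether it exited with status 0."""
    results = {}
    for gate in gates:
        try:
            proc = subprocess.run(gate.command, capture_output=True, text=True)
            results[gate.name] = proc.returncode == 0
        except FileNotFoundError:
            results[gate.name] = False  # a missing tool counts as a failed gate
    return results


def is_done(results: dict[str, bool]) -> bool:
    """Done means at least one gate ran and every gate passed."""
    return bool(results) and all(results.values())


# Placeholder gates that only exercise the plumbing.
gates = [
    Gate("always-passes", [sys.executable, "-c", "raise SystemExit(0)"]),
    Gate("always-fails", [sys.executable, "-c", "raise SystemExit(1)"]),
]
results = run_gates(gates)
print(results, "done" if is_done(results) else "not done")
```

Nothing in is_done consults the model's own claim of completion, which is the point of the rule.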

GitHub: context, web access, governance

The GitHub set was uneven, so the useful view is not "what is biggest." It is which repos expose a pattern worth inspecting.
| Project | Why builders should inspect it | This week's signal |
| --- | --- | --- |
| NousResearch/hermes-agent | A self-improving coding agent that creates skills from experience and uses a ~/.hermes/hermes-agent layout. | 209,000 stars, the highest-starred agent repo in this tracking set. 15 |
| anomalyco/opencode | A TypeScript open-source coding agent with separate build and plan agents, where plan is read-only. | 182,000 stars, about 14,772 commits, more than 5,000 open issues, and more than 1,100 open pull requests. 16 |
| Panniantong/Agent-Reach | A web-access layer that gives agents read and search access across 15 platforms including X, Reddit, YouTube, GitHub, and Facebook. | Grew from 43,300 to 50,500 stars, crossing 50,000 and becoming the fastest percentage grower among tracked repos. 17 |
| headroomlabs-ai/headroom | A compression layer for tool outputs, logs, files, and RAG chunks before they reach the model. | Grew from 52,500 to 56,500 stars; the README claims 60-95% fewer tokens with the same answers. 18 |
| google/agents-cli | A CLI and skills package that turns existing coding agents into Google Cloud agent builders rather than acting as a coding agent itself. | v0.6.1 shipped June 28; the repo has 4,700 stars and has shipped 13 releases in 71 days since April 21. 19 |
| microsoft/agent-governance-toolkit | Production governance for agents, including an MCP Security Gateway for tool poisoning, drift, and hidden-instruction checks. | 4,600 stars; the toolkit includes 127 checks and maps to OWASP Agentic Top 10 concerns. 20 |
| NVIDIA/skills | A skills ecosystem with plugins for Claude, Cursor, and other agent surfaces. | Grew from 1,900 to 2,200 stars, with 384 commits and 251 forks. 21 |
Two repo movements line up with the conference narrative. Agent-Reach's growth suggests builders still want more agent-visible internet, but through installable tools and health checks rather than brittle one-off scraping. 17 Headroom's growth points at the other constraint: once agents call many tools, raw context becomes a cost center. 18
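To see why tool output is such an easy target, consider the simplest possible reduction: collapsing runs of identical log lines before they reach the model. The sketch below is a toy illustration of the idea and makes no claim about how Headroom works; the function name and annotation format are assumptions.

```python
from itertools import groupby


def collapse_repeats(text: str) -> str:
    """Replace each run of identical consecutive lines with one annotated line."""
    collapsed = []
    for line, run in groupby(text.splitlines()):
        count = sum(1 for _ in run)
        collapsed.append(line if count == 1 else f"{line}  [repeated {count}x]")
    return "\n".join(collapsed)


log = "retrying connection\n" * 3 + "connected"
print(collapse_repeats(log))
```

Retry loops and polling output often contain long runs of exactly this kind, so even a lossless step like this can trim context before any smarter compression is applied.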
That also explains why Swyx's July 3 post about CLI tools landed with engineers. He argued that "tools for thought" spent years making polished canvas demos, while low-contrast CLIs won because they do "commodity thinking" for the user. 22 The post had 207 likes, 141 bookmarks, 36 replies, 4 retweets, 3 quotes, and 27,714 views in the captured data. 22 For agent builders, the interface lesson is blunt: if the CLI gives the model stable affordances, the visual layer can be ugly and still win.

What to try next week

If you are building an agent system, the highest-leverage experiments from this issue are small enough to run in an existing repo.
First, add a routing instruction that lets your strongest model delegate implementation to cheaper subagents while keeping design and review in the main loop. Simon's Fable memory-file pattern is the cleanest reference. 4
Second, run one harness evaluation that targets a boring prompt failure. Datasette Agent's column-name guessing problem is the model example: the issue was not model intelligence, it was missing schema context plus a prompt rule that discouraged the right tool call. 5
Third, treat every long-running agent feature as a review and resumability problem. Vercel's Qu named context, tools, resumability, and long-running work as primitives for this new software type. 14 The AIEWF debate added the constraint: the loop can carry more execution, but the outer loop still needs human authority, verification gates, and enough understanding to intervene. 10
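A minimal resumability sketch, assuming a long-running job can be split into named steps that are safe to skip once recorded as complete. The checkpoint file format and function names are illustrative, not drawn from Vercel's or anyone else's implementation.

```python
import json
import os
import tempfile
from pathlib import Path


def load_checkpoint(path: Path) -> dict:
    """Return saved progress, or an empty record on the first run."""
    if path.exists():
        return json.loads(path.read_text())
    return {"completed": []}


def save_checkpoint(path: Path, state: dict) -> None:
    """Write to a temp file, then rename, so a crash never leaves half a file."""
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(state, handle)
    os.replace(tmp_name, path)


def run_steps(steps, path: Path) -> dict:
    """Run (name, callable) steps in order, skipping any already completed."""
    state = load_checkpoint(path)
    for name, step in steps:
        if name in state["completed"]:
            continue
        step()
        state["completed"].append(name)
        save_checkpoint(path, state)  # persist after every step, not at the end
    return state
```

Calling run_steps again with the same checkpoint path picks up after the last recorded step, which also gives a human reviewer a natural place to pause the loop and inspect progress.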
Cover image: Swyx presenting Loopcraft at AI Engineer World's Fair 2026, from Latent Space's AIEWF dispatch.
