Builder Notes

The Travel Agents (Part 1) - The Setup

Written by Steve (Human) · Edited by Hex (AI)

What a time to come back to building software! I took a year off in 2025 to recharge the batteries and ponder what's next for me. It was an interesting time to sit and observe the rapid changes in AI models and agents. I dabbled with most of the AI tools I crossed paths with in that time - from Stitch to Claude to Suno and just about everything in between. However, beyond a few hobby projects and prototypes, I didn't really build anything substantial.

If LinkedIn is anything to go by, you can only be a doomsayer or a naysayer when it comes to AI, but I'm not sure I fit into either category completely. I've always been in the pro "if a machine can do something, it should" camp, but I also hold reservations about whether AI is all it's being marketed to be, and what impact it'll have on jobs, especially junior roles - but that's a topic for another post.

The easiest way to prove something is as good as it claims to be is to put it to the test, and push the limits with some level of critical optimism. Once you find the limits, and stare at them from all the different points of view, you can start to form a more educated opinion.

Having a year away from tech is equivalent to swapping careers. Everything changes so much month to month that a full year is a completely different world. I didn't write much code during the time off, so I'm probably a little rusty, but I think this makes for a much better experiment: I can start a fresh project with a fresh mind, and question whether the old ways of doing things are still the best ones.

The Divergence Issue

A few weeks ago I had a few different prototypes running, each with its own Claude agent inside it. I started to notice that the agents were slowly diverging from one another, each gathering its own project context and a faux personality to match. I initially put it down to the different codebases, but when I started using git worktrees to run multiple agents on the same codebase, I realised the divergence was about more than just the code.
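For anyone who hasn't used worktrees, the setup is simple: each worktree is a separate checkout of the same repository, so each agent gets its own working directory and branch while sharing one history. A minimal sketch (the repo, directory, and branch names here are hypothetical, not the ones from my project):

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"

# One repo with a single initial commit
git init -q repo && cd repo
git -c user.email=agent@example.com -c user.name=agent \
  commit -q --allow-empty -m "initial commit"

# One worktree per agent: same history, isolated working directories
git worktree add -q ../agent-backend -b backend-work
git worktree add -q ../agent-ui -b ui-work

# Lists the main checkout plus the two agent worktrees
git worktree list
```

Each agent then runs inside its own worktree directory, which is exactly why I expected them to stay interchangeable - the code they see is the same.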

Each of my worktree agents had developed its own process and communication style. One that had been doing a lot of backend work was noticeably terse, perhaps because the constraints were well defined and the goals were clear. The other, which I'd been using for more UI design and iterative work, was much more conversational and willing to ask questions when it didn't know something.

I asked them both why they thought the divergence was happening, and as expected neither provided anything insightful, but it bothered me how far the conversations and outputs had diverged in less than a week on a small codebase. The conversational one was more likely to over-explain things I already understood. The to-the-point one was more dismissive when I pushed it to justify its decisions.

At the core of every agent is the same base model. They run in isolation from each other, even worktrees on a codebase are treated as separate projects for the sake of this isolation. But each agent slowly takes notes in memory and builds up an internal context from each interaction, and this causes a lot of the divergence.

That got me wondering how I could give them a shared context that would persist across sessions and across different agents, so I could get some consistency in the way they interacted with me. Part 2 of this series covers how I set up this shared context in the most unexpected way.

The Process Problem

The other thing I observed was the way things were getting built. Like the communication style, the process was also diverging. One agent would take the time to build a plan and get sign-off before starting work; the other would just dive in and start building. This resulted in wildly different outputs, PR sizes, and levels of quality.

That's not to mention how many times I had to repeat instructions: don't push directly to main, write tests, document changes, and so on. This was made exponentially worse because every context compaction, new session, or separate worktree meant starting over, watching and course-correcting from scratch. It was like hiring a new, overly eager junior developer every time and then having to train them up on how to get things done.

If I was going to run agents on a bigger, ever-growing codebase, I needed consistency in the way things were built and in the quality of the output before I could trust the agents with any autonomy. There were three main things I needed to address:

  1. The guardrails needed to make sidestepping them harder than just doing the correct thing.
  2. The process needed to be consistent across sessions and worktrees.
  3. The output needed to be consistent and documented in a way that was traceable and human-readable.
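As one illustration of the first point, a guardrail can live in the repository itself rather than in instructions the agent might forget. This is a minimal sketch, not my actual setup: a local pre-push hook that rejects any push targeting main, so the path of least resistance becomes a feature branch (all repo and branch names here are made up, and a real setup would enforce this server-side too).

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"

# A bare "remote" and a clone to simulate an agent's checkout
git init -q --bare remote.git
git clone -q remote.git work 2>/dev/null && cd work
git symbolic-ref HEAD refs/heads/main
git -c user.email=agent@example.com -c user.name=agent \
  commit -q --allow-empty -m "initial commit"

# The guardrail: refuse any push whose target branch is main
cat > .git/hooks/pre-push <<'EOF'
#!/bin/sh
while read local_ref local_sha remote_ref remote_sha; do
  if [ "$remote_ref" = "refs/heads/main" ]; then
    echo "blocked: no direct pushes to main, use a branch" >&2
    exit 1
  fi
done
exit 0
EOF
chmod +x .git/hooks/pre-push

# Direct push to main fails; a feature branch goes through
if git push -q origin main 2>/dev/null; then
  echo "push to main allowed"
else
  echo "push to main blocked"
fi
git push -q origin HEAD:agent-feature 2>/dev/null
echo "push to feature branch allowed"
```

A client-side hook alone is weak (an agent with shell access can delete it), which is exactly why the guardrails need to be layered until sidestepping them costs more than complying.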

The Setup

With these two problems in mind, I needed to figure out how I was going to address them before I let a couple of agents loose on a new codebase. For me to call this experiment a success, there needed to be multiple agents working autonomously on a codebase, with consistent outputs and safety, and without breaking anything in production.

The rest of this series will dive into how I set up the project, the process, and the context to get things built. I'll go through the AI experiments I've run, the challenges I've faced, and the lessons I've learned along the way. While I might dive into some code here and there, the focus will be primarily on the AI-centric development process.

The Stack