
Context compounds — until it doesn't!

I spent most of the last two days on the tools with Claude and Replit, and came out of it with a clearer view of where the real work now sits when you build with AI. It isn't in the prompting, and it isn't in the model choice. It's in managing what's in the context window — and, just as importantly, what's been left behind.

Monday: one design system, three decks

On Monday I rebuilt my slide deck design system in Claude Design. I started by uploading my website code repo so it had something to work with, then built three decks:

  • Deck one: ~90 minutes. A lot of riffing. I tweaked the design system as I went, including adding some background slide images to both the project and the design system (both Claude projects/sessions).

  • Deck two: ~30 minutes. Less riffing as I added output JSX slide files from deck one as a model for the second deck. Still all sorts of logo issues though, so I had to really tie down the files in the design system, including adding a tone-of-voice file.

  • Deck three: ~5 minutes. One shot from a simple prompt, plus a few minor edits.

As one would expect, the upfront investment in tight reference files pays back quickly — when the reference files are clean. The investment isn't the prompt; it’s the context around the prompt.

Tuesday: the mess I'd left behind

The website was a different story. I built the patching.co site in a vibe-coding session using Replit, and in my haste to ship I left orphaned files, dead imports, and lines of code from three or four different ideas I'd tried and abandoned. It worked. It also weighed 8MB across 267 files.

I only spotted how bad it was when I pulled the whole repo into a Claude Opus 4.7 session and, suspecting a dumpster fire, asked the model to find inconsistencies. It found plenty! After a focused clean-up the site was considerably more svelte at 75 files and 3MB: roughly a 70% reduction in file count (and over 60% in size), just from bothering to do the housekeeping.

Worth keeping in mind: the model that generated the mess won't volunteer to clean it up. There's no built-in incentive to revisit past decisions, flag dead code, or challenge its own earlier output. That's on you.

Two takeaways

1. Context is a compounding asset — until it isn't.

In a running AI session, good context compounds: each exchange builds on the last, the model gets sharper, answers get more useful. Up to a point. After that point, the dead ends, changes of direction, and earlier errors start to compound too — and the model produces less useful answers, not more.

My most effective workflow over the two days was a two-model loop: Claude (when given code files as context) wrote detailed technical Replit prompts that explicitly asked Replit to give a back-brief before making changes, and I fed those back-briefs back to Claude. Both models got things wrong every now and then, and when they did they would usually start spiralling until I stepped in and bluntly course-corrected. What I was missing was the ability to surgically trim the wrong bits out of the context window — to keep the good reasoning and drop the bad turns. For now you either keep the whole thread or start a new one, and both are blunt instruments.
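For what it's worth, that surgical trim is easy to express once you hold the conversation yourself as an explicit list of turns, rather than inside a tool's chat history. A minimal Python sketch, with entirely made-up turns — not any tool's actual API:

```python
# Hypothetical sketch of the "surgical trim": keep the session as an explicit
# list of turns so bad exchanges can be dropped before the next request,
# instead of keeping the whole thread or starting over.

def prune_context(messages, bad_indices):
    """Return a copy of the conversation with the flagged turns removed."""
    bad = set(bad_indices)
    return [m for i, m in enumerate(messages) if i not in bad]

history = [
    {"role": "user", "content": "Write a Replit prompt to refactor the deploy script."},
    {"role": "assistant", "content": "Done. I also rewrote the CI config."},  # wrong turn
    {"role": "user", "content": "No. Leave CI alone, deploy script only."},   # my correction
    {"role": "assistant", "content": "Understood: deploy script only."},
]

# Drop the bad turn and the correction it forced, keeping the clean thread.
trimmed = prune_context(history, [1, 2])
```

The good reasoning survives; the spiral doesn't get re-fed to the model on the next request.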

2. Models don't clean up after themselves.

After any substantial block of work — a feature, a migration, a long session — you must stop and take stock. Ideally you hand the output to a different model, or a fresh session of the same one, and ask it to pick holes. This isn't a nice-to-have. It's the equivalent of code review, and skipping it is how you end up with 267 files and 8MB of drift.

Where I think this may be going: thin harnesses and pruned context

I've read a lot lately about the value of a thin harness — lightweight, enduring context files, version-controlled, model-agnostic — rather than relying on a given tool's memory or chat history. It maps directly to my own experiences. The discipline is identical to good document management: a small number of canonical files, kept current, that any model can be pointed at.

I've started building mine in a permissive Git repo. First up: a rob-tone-of-voice.md (drafted with Claude and Copilot together, based on email data and previous blog posts) that any model can load before writing anything in my voice. It did a reasonable job of turning some unstructured ramblings into this blog post, which needed only relatively minor edits. More files will follow: business context, product spec, architecture, infra decisions and so on. Short, current, referenced deliberately rather than hoovered up ambiently.
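Loading such a harness can be as simple as concatenating the chosen canonical files into one block before a session starts. A sketch in Python — the directory layout and file names beyond rob-tone-of-voice.md are illustrative assumptions, not my actual repo structure:

```python
from pathlib import Path

# Illustrative sketch of a thin-harness loader: a small set of canonical,
# version-controlled context files, selected deliberately per task.

def load_harness(harness_dir, names):
    """Read the canonical context files that exist, keyed by file name."""
    base = Path(harness_dir)
    return {
        name: (base / name).read_text()
        for name in names
        if (base / name).exists()
    }

def assemble_prompt(sections):
    """Join the loaded files into one block, each under its own heading."""
    return "\n\n".join(f"## {name}\n\n{text}" for name, text in sections.items())

# Deliberate, per-task selection rather than ambient hoovering.
canonical = ["rob-tone-of-voice.md", "business-context.md", "architecture.md"]
system_prompt = assemble_prompt(load_harness("context", canonical))
```

Because the files are plain text in version control, the same loader works regardless of which model sits on the other end.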

The wider question, which I am curious to watch play out: how does the application layer at the top of the AI stack evolve from here? The commercial incentives to run open-source models on local inference hardware are strong and getting stronger. Lightweight, model-agnostic harnesses paired with local inference will compete with cloud-based frontier models in ways that are hard to predict today — on cost, on privacy, on latency, and on the sheer control you have over what's in the context.

I don’t think anybody quite knows the answer yet. Frontier models are ahead on raw capability and will stay ahead for a while. But I think the locus of value is quietly shifting from "which model" to "what context, managed how", with the models themselves somewhat fungible. That shift rewards people who treat their reference files with the same discipline they'd apply to their codebase.

Bottom line

Pick your model, but spend most of your energy on the harness around it. Keep a small set of enduring context files under version control. Load them deliberately. Take stock after every substantial block of work, and get a fresh pair of eyes — human or model — to pick holes. Clean up your own mess, because the model won't!

If you're navigating similar challenges, we can help—from cyber risk assessments to architecture advice, AI advice, training, operating model design or trusted partner introductions. Reach out below.👇
