Building a Personal AI Factory

derek | 214 points

My hunch is that this article is going to be almost completely impenetrable to people who haven't yet had the "aha" moment with Claude Code.

That's the moment when you let "claude --dangerously-skip-permissions" go to work on a difficult problem and watch it crunch away by itself for a couple of minutes running a bewildering array of tools until the problem is fixed.

I had it compile, run and debug a Mandelbrot fractal generator in 486 assembly today, executing in Docker on my Mac, just to see how well it could do. It did great! https://gist.github.com/simonw/ba1e9fa26fc8af08934d7bc0805b9...
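
For anyone who wants to try reproducing that kind of session, it really is just one command plus a prompt - something like this (prompt paraphrased, not my exact wording):

    # one-shot unattended run; -p/--print makes Claude Code work through the task non-interactively
    claude --dangerously-skip-permissions -p \
      "Write a Mandelbrot fractal generator in 486 assembly, then build, run and debug it inside Docker until it produces correct output"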

simonw | 11 hours ago

It’s hard to evaluate setups like this without knowing how the resulting code is being used.

Standalone vibe coded apps for personal use? Pretty easy to believe.

Writing high quality code in a complex production system? Much harder to believe.

photon_garden | 13 hours ago

The basic idea is that you can continuously document what your system should do (high-level and detailed features), how it should prove it has done that, and, optionally, how you want it to do it (architecture, code style, etc.).
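
Concretely, that can be as simple as a small spec tree the agents are always pointed at - an illustrative layout, not a standard:

    requirements/
      what.md    # what the system should do: high-level and detailed features
      proof.md   # how it proves it: acceptance tests, invariants, benchmarks
      how.md     # optional: architecture, code style, preferred libraries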

The multi-model AI part is just the (current) tool to help avoid bias and make fine-tuned selections for certain parts of the task.

Eventually large complex systems will be built and re-built from a set of requirements and software will finally match the stated requirements. The only "legacy code" will be legacy requirements specifications. Fix your requirements, not the generated code.

webprofusion | 8 hours ago

I am experimenting with a similar workflow and thought I'd share my experience.

I might be a little too hung up on the details compared to a lot of these agent cluster testimonials I've read, but unlike the author I'll be open and say that the codebase I work on is several hundred thousand lines of Go and currently does serve a high 5 to low 6 figure number of real, B2C users. Performance requirements are forgiving but correctness and reliability are very important. Finance.

Currently I use a very basic setup of scripts that clone a repo, configure an agent, and then run it against a prompt in a tmux session. I rely mainly on codex-cli since I am only given an OpenAI key to work with. The codex instances ping me in my system notifications when it's my turn, and I can easily quake-mode my terminal into view and then attach to the session (with a bit of help from fzf). I haven't gotten into MCP yet but it's on my radar.
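
The scripts are nothing fancy - roughly this shape, with details simplified (placeholder repo URL; in reality the notification fires when codex wants input rather than on exit):

    #!/usr/bin/env bash
    set -euo pipefail

    REPO_URL="git@example.com:org/repo.git"   # placeholder
    TASK="$1"

    WORKDIR="$(mktemp -d /tmp/agent.XXXXXX)"  # fresh clone per task
    SESSION="agent-$(basename "$WORKDIR")"

    git clone --depth 1 "$REPO_URL" "$WORKDIR"

    # Run codex-cli in a detached tmux session and notify when it finishes,
    # so I can pick the session with fzf and attach to review.
    tmux new-session -d -s "$SESSION" -c "$WORKDIR" \
      "codex \"$TASK\"; notify-send \"$SESSION needs attention\""

    echo "attach with: tmux attach -t $SESSION"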

I can sort of see the vision. For those small but distracting tasks, they are very helpful and I (mostly) passively produce a lot more small PRs to clean up papercuts around our codebase now. The "cattle not pets" mentality remains relevant - I just fire off a quick prompt when I feel the urge to get sidetracked on something minor.

I haven't gotten as much out of them for more involved tasks. Maybe I haven't really got enough of a context flywheel going yet, but I typically do have to intervene. Even on a working change, I always read the generated code first and make any edits for taste before submitting it for code review, since I still view the output as my complete responsibility.

I still mostly micromanage the change control process too (branching, making commits, and pushing). I've dabbled in tools that can automate this but haven't gotten around to it.

I 100% resonate with the "fix the inputs, not the outputs" mindset as well. It's incredibly powerful even without AI, and our industry has been slowly but surely adopting it in more places (static typing, devops, IaC, etc). With nondeterministic processes like LLMs, though, it feels a lot harder to achieve; it's more practice than science.

dgunay | 4 hours ago

Thanks for the writeup!

I talked about a similar, but slightly simpler workflow in my post on "Vibe Specs".

https://lukebechtel.com/blog/vibe-speccing

I use these rules in all my codebases now. They essentially cause the AI to do two things differently:

(1) ask me questions first, and (2) create a `spec.md` doc before writing any code.
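
Roughly, in rules-file form (paraphrased here, not the verbatim rules from the post):

    Before writing any code:
    1. Ask me clarifying questions about requirements, constraints, and edge
       cases, and wait for my answers before implementing anything.
    2. Write a spec.md capturing the goal, the agreed requirements, acceptance
       criteria, and anything explicitly out of scope.
    3. Only after I approve spec.md, implement against it.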

Seems not too dissimilar from yours, but I limit it to a single LLM

marviel | 11 hours ago

ADHD coding, brute forcing product generation until you get it right? Just freaking write the code that you can expand and modify in the future instead of increasing your carbon footprint.

geekymartian | 10 hours ago

"Fix inputs" => The assumption is there exists some perfect input that will give you exactly what you want.

It probably works well for small inputs and for tasks that are well represented in the training data (like writing code in popular domains).

But how does this work for old code, large codebases, and emergencies?

- Do you still "learn" the system like you used to before?

- How do you think of refactoring if you don't get a feel for the experience of working through the code base?

Overall: I like it. I think this adds speed for code that doesn't need to be reinvented. But new domains, new tools, new ways to model things, the parts that are fun to a developer, are still our monsters to slay.

nilirl | 3 hours ago

I'd love to see more specifics here, that is, how Claude and o3 talk to each other, an example session, etc.

steveklabnik | 13 hours ago

No real mention of results that aren’t self-referential.

I guess vibe coding is on its way to becoming the next 3D printing: an expensive hobby best suited for endless tinkering. What’s today’s vibe coding equivalent of a “benchy”? Todo apps?

namuol | 7 hours ago

And here I am struggling to get Claude to create a nice-looking search bar a la booking.com, with some adjustments for my personal use case. It does OK, but it never gets to the end result, and once I refreshed my Tailwind knowledge it felt much slower than hand coding. I feel like I'm living in a different world.

mmarian | 3 hours ago

I went down this path (and even built a bit of internal web tooling). For me it’s like playing multiple games of online poker, rather than the Factorio analogy here.

It’s really promising, but I found that focusing on a single task and doing it well is still more efficient for now. Excited for where this goes.

dkdcio | 11 hours ago

Show us the code, mate.

caporaltito | 2 hours ago

> It’s essentially free to fire off a dozen attempts at a task - so I do.

What sort of subscription plan is that?

skybrian | 13 hours ago

I actually don't understand how you can permanently offload the instruction pointer of the program to another program. How are you accountable for anything then? You can't debug, you can't program; you're just a tourist in your own home. Own your code, even if AI wrote it.

am17an | 7 hours ago

> Is 'Azure OpenAI subscription' cheaper than ChatGPT via OpenAI?

barrenko | 3 hours ago

Okay, what is he actually building with this?

I have a problem where, half the time I see people talking about their AI workflow, I can't tell if they're talking about some kind of dream workflow they have or something they're actually using productively.

IncreasePosts | 13 hours ago

This sounds great, and is similar to the workflow I get, from a high-level standpoint, with https://ampcode.com/ - albeit without the model wrangling.

To the author & anyone reading - publicly release your agent harnesses, even if it's shit or vibe coded! I am constantly iterating on my meta and seeking to improve.

hamish-b | 2 hours ago

The issue I'm facing with multiple agents working on separate worktrees is that each independent agent tends to have completely different ideas about absolutely every detail, leading to an inconsistent user experience.

For example, an agent working on the dashboard for the Documents portion of my project has a completely different idea from the agent working on the dashboard for the Design portion of my project. The design consistency is not there, not just visually, but architecturally. Database schema and API ideas are inconsistent, for example. Even on the same input things are wildly different. It seems that if it can be different, it will be different.

You start to update instruction files to get things consistent, but then these end up being thousands of lines on a large project just to get the foundations right, eating into the context window.
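
(The mechanics of sharing a single instructions file across the worktrees are trivial - something like the below, with illustrative paths. The problem is how much that file has to encode.)

    # link one shared conventions doc into every worktree
    for wt in $(git worktree list --porcelain | awk '/^worktree /{print $2}'); do
      ln -sf "$PWD/docs/CONVENTIONS.md" "$wt/CONVENTIONS.md"
    done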

I think ultimately we might need smaller language models trained on certain rules & schemas only, instead of on the universe of ideas that a prompt could result in. Small language models are likely the correct path.

vFunct | 12 hours ago

This "AI factory for everyone" model may be able to break resource inequality and allow people from more places to participate in truly valuable entrepreneurship.

guicen | 8 hours ago

> If you know Factorio you know it’s all about building a factory that can produce itself

This is a very interesting concept

Could this be extended to the point of an LLM producing/improving itself?

If not, what are the current limitations to get to that point?

nico | 6 hours ago

> I keep several claude code windows open, each on its own git-worktree.

Can someone convince me they're doing their due-diligence on this code if they're using this approach? I am smart and I am experienced, and I have trouble keeping on top of the changes and subtle bugs being created by one Claude Code.
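
(For anyone unfamiliar with the quoted setup: the worktree part is just plain git, one extra checkout per parallel session - branch and path names below are made up:)

    git worktree add -b fix-login   ../repo-fix-login
    git worktree add -b refactor-db ../repo-refactor-db
    # later, once a branch is merged or abandoned:
    git worktree remove ../repo-fix-login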

petesergeant | 9 hours ago

This sounds nice and great and all, but I wonder what the output is like and whether there is a measurable difference between doing the factory and trying to two-shot the whole thing with Claude 4 Sonnet.

nurettin | 2 hours ago

> When something goes wrong, I don’t hand-patch the generated code. I don’t argue with claude. Instead, I adjust the plan, the prompts, or the agent mix so the next run is correct by construction.

I don't think "correct by construction" means what OP thinks it means.

solomonb | 12 hours ago

People are slowly getting disillusioned with vibe coding.

Yes, AI-assisted workflows might be here to stay, but they won't be the magical put-programmers-out-of-a-job thing.

And this is the best product-market fit for LLMs. I imagine it will be even worse in other domains.

apwell23 | 10 hours ago

Maybe a bit off-topic, but the minimalist style of the blog looks really cool.

c4pt0r | 10 hours ago

"Here’s the secret sauce: iterate the inputs":

No it isn't. There are no shortcuts to ... anything. You expend a lot of input for a lot of output, and I'm not too sure you understand why.

"Example: an agent once wrote code ..." - not exactly world beating.

If you believe this will take over the world, then go full on startup. YC is your oyster.

I've run my own firm for 25 years. Nothing exciting and certainly not YC excitable.

You won't with this.

gerdesj | 10 hours ago