Make sure your project has a CLAUDE.md or CLAUDE.local.md; you can use Claude to help you come up with it. Use Claude to maintain a list of common, simple workflows and reference them in CLAUDE.md. It’s not great yet for large-scale work or refactors, but it’s getting better month by month. It may never be able to do big work in one shot.
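For illustration, a minimal CLAUDE.md might look something like this; every project detail and command here is invented, not taken from the comment above:

    # CLAUDE.md
    ## Project overview
    Python API in api/, React frontend in web/.
    ## Common workflows
    - Run tests: pytest -q
    - Lint: ruff check .
    - Add an endpoint: route in api/routes/, test in api/tests/
    ## Conventions
    - Keep diffs small and focused; don't reformat unrelated files.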
I run tasks in parallel and definitely hit the rate limits.
If you posted some samples, that would help us quite a bit.
Claude successfully makes code edits for me 90% of the time. My two biggest pieces of advice, offered blind, are:
1. Break your task down into smaller chunks: 30 minutes’ worth of human coding at most.
2. On larger codebases, give it hints about which files to edit.
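As a sketch of point 2, one way to gather file hints before prompting is plain git; the search term and paths here are just placeholders:

    git grep -l "refresh_token" -- src/
    git log --oneline -n 10 -- src/auth/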
The single most important thing: ask it to write the tests first, then the code, and instruct it not to over-mock or to change the code just to make the tests pass.
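Roughly how I phrase that instruction (the function name here is made up):

    Write failing unit tests for parse_duration() first, covering empty input
    and invalid units. Then implement it. Do not mock the function under test,
    and do not edit the tests afterwards to make them pass; fix the code instead.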
Besides this: I have great results in combination with https://github.com/BeehiveInnovations/zen-mcp-server, but YMMV of course. It also requires o3 and Gemini API keys, but the token usage is really low and the workflow works really well when used properly.
Is it possible to develop an intuition for where it would do a decent job, and use it only for those types of tasks, or is the result always random?
This sounds like a decent workflow. What makes you think it’s ineffective?
You’re definitely not alone; “AI code loot box” is a great description! I’ve been experimenting with Claude Code (and the other major models) since late last year, and my success rate seems to track yours unless I’m deliberate about “prompt engineering” and workflow. Here are a few things that have helped me get better, more reliable results:
1. Be uncomfortably explicit in prompts: Claude Code in particular is very sensitive to ambiguity. When I write a prompt, I’ll often do the following (a sample prompt is sketched after this list):
Specify coding style, performance constraints, and even “avoid X library” if needed.
Give sample input/output (even hand-written).
Explicitly state: “Prefer simplicity and readability over cleverness.”
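To make point 1 concrete, a prompt written this way might look roughly like the following; every file, function, and constraint here is invented for illustration:

    Add a retry wrapper around fetch_orders() in services/orders.py.
    Constraints: stdlib only (no tenacity), max 3 attempts, exponential backoff
    starting at 200 ms. Prefer simplicity and readability over cleverness.
    Example: fetch_orders(42) should return the same payload as today; after
    3 failed attempts it should raise OrdersUnavailable.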
2. Break down problems more than feels necessary: If I give Claude a 5-step plan and ask for code for the whole thing, it often stumbles. But if I ask for one function at a time, or have it generate stub functions first, then fill in each one, the output is much more solid.
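A sketch of the stubs-first step, with a made-up CSV-report task (not from the thread): I ask for signatures and docstrings only, then have it implement one function per message.

    # Step 1: ask Claude for stubs only.
    def load_rows(path: str) -> list[dict]:
        """Read a CSV file and return one dict per row."""
        raise NotImplementedError

    def summarize(rows: list[dict]) -> dict:
        """Aggregate totals per category."""
        raise NotImplementedError

    def render_report(summary: dict) -> str:
        """Format the summary as a plain-text report."""
        raise NotImplementedError

    # Step 2: ask it to implement load_rows() alone, review it, then move on.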
3. Always get it to generate unit tests (and run them immediately): I now habitually ask: "Write code that does X. Then, write at least 3 edge-case unit tests." Even if the code needs cleanup, the tests usually expose the gaps.
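For a concrete (made-up) example: against a small slugify() helper, the kind of edge-case tests I’d expect back look roughly like this, with a minimal reference implementation included so the snippet runs on its own:

    import re
    import unicodedata
    import pytest

    def slugify(text: str) -> str:
        """Hypothetical helper under test (minimal reference version)."""
        if not isinstance(text, str):
            raise TypeError("slugify expects a string")
        ascii_text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode()
        return re.sub(r"[^A-Za-z0-9]+", "-", ascii_text).strip("-").lower()

    def test_slugify_basic():
        assert slugify("Hello World") == "hello-world"

    def test_slugify_empty_string():
        assert slugify("") == ""

    def test_slugify_accents_and_punctuation():
        assert slugify("Café déjà vu!") == "cafe-deja-vu"

    def test_slugify_rejects_non_string():
        with pytest.raises(TypeError):
            slugify(None)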
4. Plan mode can work, but human-tighten the plan first: I’ve found Claude’s “plan” sometimes overestimates its own reasoning ability. After it makes a plan, I’ll review and adjust before asking for code generation. Shorter, concrete steps help.
5. Use “summarize” and “explain” after code generation: If I get a weird/hard-to-read output, I’ll paste it back and ask “Explain this block, step by step.” That helps catch misunderstandings early.
Re: Parallelization and rate limits: I suspect most rate-limit hitters are power-users running multiple agents/tools at once, or scripting API calls. I’m in the same boat as you — the limiting factor is usually review/rework time, not the API.
Last tip: I keep a running doc of prompts that work well and bad habits to avoid. When I start to see spurious/overly complex output, it’s nearly always because I gave unclear requirements or tried to do too much in one message.