Here is how I use it: As a writing assistant that lives in my IDE, and as a very cool, sophisticated rubber duck that can answer back.
Something I do quite a lot is toss a particular piece of code back and forth in discussion, usually with little to no context (because that's my part to worry about), hammering at it until we get the functionality right, then presenting it with the broader context to fit it in (or I simply do that part by hand).
Here is how I don't use it: As an agent that gets broad goals it is supposed to fulfill on its own.
Why? Because the time and effort I have to invest to ensure that the output of an agentic system is in line with what I'm actually trying to accomplish is simply too much, for all the reasons outlined in this excellent article.
Ironically, this is even more true because using AI as an incredibly capable writing assistant already speeds up my workflow considerably. So in a way, less agentic AI empowers me in a way that makes me more critical of the additional time I'd have to invest to work around the quirks of agentic AI.
Developer skill is obviously still essential — you can't steer if you couldn't drive. But what about developer energy? Before AI I could only code about 2 hours per day (actual time spent writing code), but with Claude Code I can easily code for 5 hours straight without breaking a sweat. It feels like riding an e-bike instead of a bicycle. AI genuinely feels like Steve Jobs' analogy of a bicycle for the mind — it doesn't replace me, but now I can go much farther and faster.
I don't get it, at all.
Why are experienced developers so enthusiastic about chaining themselves to such an obviously crappy and unfulfilling experience?
I like writing code and figuring stuff out, that's why I chose a career in software development in the first place.
> Example: When encountering a memory error during a Docker build, it increased the memory settings rather than questioning why so much memory was used in the first place.
AI really is just like us!
I'm surprised not to see an obvious one (for me): Use AI around the periphery.
There's very often a heap of dev tooling (introspection, logging, conversion, etc.) that needs to be built and maintained. I've had a lot of luck using agents to make and fix these. For example, a tool that collates data and logs in a bespoke planning system.
Building these tools is a lot of generated boilerplate off the critical path, and most days I just don't want to do it.
I find that I have to steer the AI a lot, but I do have optimism that better prompting will lead to better agents.
To take an example from the article: code reuse. When I'm writing code, I subconsciously keep a mental inventory of what code is already there, and I'm subconsciously asking myself, "hey, is this new task super similar to something we already have working (and tested!) code for?" I haven't looked into the details of the initial prompt that a coding agent gets, but my intuition is that an addition to the prompt instructing the agent to keep an inventory of what's in the codebase, and to check the requirements of any new batch of code against that inventory during planning, would help.
Yes, this adds a bunch of compute cycles to the planning process, but we should be honest and say "that's just the price of an agent writing code". Better planning > ability to fix things.
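Something like this rough sketch is what I have in mind: a pass that builds an inventory to prepend to the agent's planning prompt. This is purely illustrative Python; the function name and prompt wiring are made up, not how any actual agent works.

```python
import ast
from pathlib import Path

def build_inventory(repo_root: str) -> str:
    """Collect a one-line summary of every top-level function/class in the repo,
    so the list can be prepended to the agent's planning prompt."""
    lines = []
    for path in Path(repo_root).rglob("*.py"):
        try:
            tree = ast.parse(path.read_text())
        except (SyntaxError, UnicodeDecodeError):
            continue  # skip files that don't parse cleanly
        for node in tree.body:
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                doc = ast.get_docstring(node) or ""
                summary = doc.splitlines()[0] if doc else ""
                lines.append(f"{path}: {node.name} - {summary}")
    return "\n".join(lines)

# The planning prompt could then be prefixed with something like:
# "Here is an inventory of existing code. Before writing anything new, check
#  whether an existing function or class already covers the new task."
# followed by build_inventory(".")
```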
I have mentored a few people to become programmers. Some for months, some for years. It's like teaching someone to ride a bicycle: hand-holding first, then hand-guiding, then short flights, then longer ones... Different people pick things up at a different pace and in different shapes, but they do learn... if they want to.
What I completely miss in these LLM parrots/agents/generators is the learning. You can't teach them anything. They don't remember. Tabula rasa, a clean slate, every time. They may cite Shakespeare (or whatever code was scraped from GitHub) and concoct it to unrecognizability, but that's it. Hard rules or guardrails for every little thing are unsustainable to keep (and/or create); expert systems and rule-based no-code/low-code have been unsuccessful for decades.
Maybe, next AI wave.
And there's no understanding. But that also applies to quite a few people :/
Is Martin Fowler now just renting out space on his website?
> Lack of reuse
> AI-generated code sometimes lacks modularity, making it difficult to apply the same approach elsewhere in the application.
> Example: Not realising that a UI component is already implemented elsewhere, and therefore creating duplicate code.
> Example: Use of inline CSS styles instead of CSS classes and variables.
This is the big one I hit for sure. I think it's a problem with agentic RAG, where it only knows the files it's looked in and not the overall structure or where to look for things, so it just recreates them.

I don't really like AI in the IDE. I don't want it to think for me. Code completion and IntelliSense are good enough.
That said, I think there are 3 items that are important:
- Quickly grasp a new framework or a new language. People might expect you to do so because of AI's help: 2 weeks might be the maximum, instead of the minimum. The same goes for juniors.
- Focus on the really important things. Instead of trying to memorize a shell script you're going to use a couple of times per year, maybe use the time to learn something more fundamental. You can also use AI to help you bootstrap the learning. If you need something for interviews, spend a week memorizing it.
- Be willing to exclude AI from your thought process. If you rely on AI for everything, including algorithms and designs, this might impact your understanding.
I use LLMs for various purposes in day-to-day development. I don't use any of the tools mentioned in the article because I'm using IntelliJ and don't want to replace a tool that has lots of stuff I use all the time. But aside from that, it's good advice and matches my experience.
I've dabbled with plugins for IntelliJ but wasn't really happy with those. But ever since ChatGPT for desktop started interfacing directly with JetBrains products (and VS Code as well), that's been my go-to tool. I realized that I like being able to pull it up with a simple keybinding, and it auto-connects to the IDE when I do. I don't need to replace my tools and I get to have AI support ready to go. Most of the existing plugins seem to insist on some crappy autocomplete, which in a tool that already offers a lot of autocomplete features is a bit of an anti-feature. I don't need Clippy-style autocomplete.
What matters here is the tool integration, not the model quality. Better tool integration means better prompts with less work and getting better answers that way.
Example: I run a test and it fails with some output. I had this yesterday. So I asked, "why is this failing?" and had a short discussion about what could be wrong. No need for me to specify any detail; it was all extracted from the IDE. We ticked off a few possible causes, and I excluded them. Then it noticed a subtle change in the log messages that I had not noticed (a coroutine context switch), which turned out to be the root cause.
That kind of open-ended debugging is a bit of a mixed bag. Sometimes it finds stuff; mostly it just starts proposing solutions based on a poor analysis of the problem.
What works pretty reliably is:
- address the TODOs / FIXMEs, especially if you give it some examples of what you expect
- write documentation (very good for this)
- evaluate if I covered all the edge cases (often finds stuff I want to fix)
- simple code transformations (rewrite this using framework X instead of Y)
I don't trust it blindly. But it's generally giving me good code and feedback. And I get to outsource a lot of the boring crap.
I've been using Claude to help write a complex prototype for game dev. Overall it's been a big productivity boost. However as the project has grown Claude has gotten much worse. I'm nearing 15k lines and it's borderline more trouble than it's worth. Even when it was helpful, it needed a _lot_ of guidance from me. Almost more helpful as a "rubber ducky" and for the fact that it kept me from deadlocking on analysis. That said, discussing problems and solutions with Claude often does keep things moving and sometimes reveals unexpected solutions.
If Claude could write the code directly unsupervised, it would go wild and produce a ton of garbage. At least if the code it writes in the browser is any indication. It's not that it's all bad, but it's like a very eager junior dev -- potentially dangerous!
Imagining a codebase that is one or two orders of magnitude larger, I think Claude would be useless. Imagining a non-expert driving the process, I think Claude would generate a very rickety proof of concept then fall over. All that said, I wish I had this tool when developing my previous game. Especially for a green field project, it feels like having access to the internet versus pulling reference manuals -- a big force multiplier.
This diagram is painfully relatable — my team checks every single box here, and we don’t even use AI yet! Imagine when we finally do...
"Misunderstood requirements" and "overly complex implementations" are practically our mascots at this point. We're slowly untangling this chaos through better upfront convos and iterative reviews, but man, habits die hard. Anyone else navigating these pitfalls totally unaided by AI?
> I want to preface this by saying that AI tools are categorically and always bad at the things that I’m listing.
I think there's a "not" missing there. Why would you preface that they are categorically and always bad? It makes more sense the other way round.
Also, a grammar error: "effected" instead of "affected" in the footer.
I actually think that most problems that used to take 5-20 minutes now take a few minutes, and it's more about how many of those intense minutes and loops you're going through.
Also, right now engineers are hyper-optimized on the code aspects but not thinking about the context going into Cursor and the context coming out of Cursor.
Like the amount of copy-paste from Notion / JIRA / Sentry, and the amount of output like summarizing git commits and PRs, Slack messages, and other "over-communication" you have to do these days. This is the area I think we can more easily automate away.
I've been playing around with vibe coding and I think a lot of the issues brought up could be fixed by an architecture abstraction layer that does not exist today. My idea would be something like an architecture graph (Archigraph, working title) that would recursively describe how an application works or should work. Then, when an agentic coder is doing a task, it could easily see the bigger picture of how the application works and hopefully write better code. Anyone interested in working on this with me?
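To make that a bit more concrete, here is a rough Python sketch of what a recursive architecture description might look like. All of it is hypothetical: the `ArchNode` class, its fields, and the example app are invented for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class ArchNode:
    """One element of the architecture graph: a service, module, or function,
    with a plain-language description and its sub-components."""
    name: str
    description: str
    depends_on: list[str] = field(default_factory=list)
    children: list["ArchNode"] = field(default_factory=list)

    def to_prompt(self, depth: int = 0) -> str:
        """Render the subtree as indented text an agent can read before a task."""
        indent = "  " * depth
        deps = f" (depends on: {', '.join(self.depends_on)})" if self.depends_on else ""
        out = [f"{indent}- {self.name}: {self.description}{deps}"]
        for child in self.children:
            out.append(child.to_prompt(depth + 1))
        return "\n".join(out)

# Hypothetical example application described as a graph.
app = ArchNode("webshop", "E-commerce backend", children=[
    ArchNode("catalog", "Product listing and search", depends_on=["db"]),
    ArchNode("checkout", "Cart and payment flow", depends_on=["catalog", "payments-api"]),
])
print(app.to_prompt())
```

The idea being that the agent gets the rendered graph as context before a task, instead of rediscovering the structure one file at a time.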
I see junior developers and managers who can't make the assessments the author's insight and experience provide, just checking in gobs of stuff they don't understand.
It's frustrating.
Imo, AI implements something useful 20% of the time while breaking existing code 80% of the time.
> 15k LOC codebase
I wish articles about AI assistance would caveat this at the start. 15k LOC is a weekend hackathon project, which is all well and good, but not reflective of the work that 99% of developers are doing in their day jobs.
Is this just me, or does none of this happen when using LLMs/AIs/agents while coding? My experience is overwhelmingly useful and positive; it's hard to imagine building software without LLM support anymore. I am literally 5-10x more productive.
> supervised agent
This is the trick. Human in the loop, not human hiding in an ivory tower after uttering a single command. This is ~effectively what I see a lot of shops doing right now:
"Clean up the codebase please. Apply best practices :D. OH. By the way, heres a laundry list of 100 things to NOT do: <list begins>".
I get a lot more uplift out of use cases like:
"Please generate a custom Stream implementation that is read-only and sources bytes from an underlying chunked representation. Mock the chunk loading part. Primarily demonstrate the ReadAsync method and transition logic between chunks."
Does anyone post articles about using these LLMs on non-web dev?
The internet is full of JavaScript/HTML/CSS info. Some of it wrong, some obsolete, some right and current, but there is data.
How about the more peasant languages?
I'm already seeing developers spending more time communicating with their AI than with their team. I don't think that's a good evolution. Many of us aren't the best communicators, but it's a skill we typically polish as we become more senior. I worry about what will happen to junior devs who spend more time talking to/pairing with AI than with their human coworkers.
The opposite of vibe coding, when the agent craps out and you just do it manually = Artisanal coding. Yeah I can get on board with that.
My central question after reading the article:
Why did they choose a circle diagram over a pyramid?
> Complicated build setups that confused both me and the AI itself.
I noticed that I am capable of producing software beyond my own understanding. It wouldn't surprise me if the same is true of AI!
Along these lines, one thing Claude Code does consistently is to see a failing test and then add a conditional in the actual code to satisfy the test.
I'm typically pretty gentle in real code reviews but that one is a serious "what the fuck are you even doing" if it were a human.
Adding a top-level context-rule in claude.md doesn't fix it reliably.
My solution to the author's babysitting problems is to close the iteration loop instead of embedding humans into it.
Agent generated broken code? Another agent can discover that, provide feedback on the pull request, and close it, forcing the coding agent to respond to the feedback.
As long as you have 10 agents doing software engineering analysis for every 1 agent you have writing code, my suspicion is that a lot of this babysitting can be avoided.
At least theoretically... I haven't got all of this infrastructure wired up myself to try it.
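The loop I'm imagining is something like this. Pure sketch: the `coder` and `reviewers` objects and their `propose_patch` / `review` / `revise` methods are entirely hypothetical stand-ins for whatever agent framework you'd actually use.

```python
from typing import Optional

def review_loop(task: str, coder, reviewers, max_rounds: int = 5) -> Optional[str]:
    """One coding agent proposes a patch; reviewer agents critique it until
    none of them block, or we give up and escalate to a human."""
    patch = coder.propose_patch(task)
    for _ in range(max_rounds):
        feedback = [r.review(task, patch) for r in reviewers]
        blocking = [f for f in feedback if f.verdict == "request-changes"]
        if not blocking:
            return patch  # all reviewers approve; only now does a human look at it
        patch = coder.revise(task, patch, blocking)  # coder must address the feedback
    return None  # still blocked after max_rounds: escalate to a human
```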
> During refactoring, it failed to recognize the existing dependency injection chain
Not sure if this is an argument against atheism or foxholes.
“Agentic” coding?
I suppose it is Thoughtworks after all working to expand mindshare by defining buzzwords.
I think of the modern developer as being more like a shepherd than a builder now. You have to vibe with the machines, but you need to make sure they stay on easy terrain and give them structure instead of letting them simply free-graze.
Big words from one who does not code.
This article does not line up with my experiences at all. Sometimes I wonder if it's something to do with prompting or model selection.
I recently built out a project where I was able to design 30+ modules and only had 4 generation errors. These were decent-sized modules of 700-5000 lines each. I would classify the generation errors as related to missing specification -- i.e., no, you may not take an approach where you import another language runtime into memory to hack a solution.
Sure, in the past, AI would lead me on goose chases, produce bad code, or otherwise fail. AI in 2025 though? No. AI has solved many quirky or complex headscratchers, async and distributed runtime bugs, etc.
My error rate with Claude-3.7-sonnet and OpenAI's O3-mini has dropped to nearly zero.
I think part of this is how you transfer your expert knowledge into the AI's "mindspace".
I tend to prompt a paragraph which represents my requirements and constraints. Use this programming language. Cache in this way. Encrypt in this way. Prefer standard library. Use this or that algorithm. Search for the latest way to use this API and use it. Have this API surface. Etc. I'm not particularly verbose either.
The thinking models tend to unravel that into a checklist, which they then run through and write a module for. "Ok, the user wants me to create a module that has these 10 features with these constraints and using these libraries."
Maybe that's a matter of 25 years of coding and being able to understand and describe the problem and all of its limits and constraints quickly, but I find that I get one-shot success nearly every time.
I'm not only laying out the specification; I also keep the overall spec in my mind and limit the AI to building modules to my specifications (APIs, etc.) rather than trying to shove all of this into context. Maybe that is the issue some people have: trying to shove everything (prior versions of the same code, etc.) into one session.
I always start brand new sessions for every core task or refactoring. "Let's add caching to this class that expires at X interval and is configurable from Y file and dependency injected to the constructor." So perhaps I'm unintentionally optimizing for the AI, but this is fairly easy to do and has probably led to a 5-10x increase in the code I'm pushing.
Huge caveat here, though: I mostly operate on service/backend/core lib/API code, which is far less convoluted than web front-ends.
It's kind of sad that front-end dev will require 100x context tokens due to intermingling of responsibilities, complex frameworks, etc. I don't envy people doing front-end dev work with AI.
Great, so now instead of spending 8 hours writing code, I spend 8 hours "steering" an AI to write the same code. What a fucking win (for the AI companies and no one else.)
One thing that struck me after reading this article as well as all the comments here - we are providing excellent free training for the next generation of agents coming up. We are literally training our replacements!
Like others I agree humans are not getting replaced anytime soon though. For all the current hype current AI technology is pretty dumb. Give it a decade or so though and everything we are currently doing will seem like Stone Age technology.
Aside: sometimes I really wonder if humanity is trying to automate itself out of existence.
I use Cursor for most of my development these days. This article aligns pretty closely with my experiences. A few additional observations:
1. Anecdotally, AI agents feel stuck somewhere circa ~2021. If I install newer packages, Claude will revert to outdated packages/implementations that were popular four years ago. This is incredibly frustrating to watch and correct for. Providing explicit instructions for which packages to use can mitigate the problem, but it doesn't solve it.
2. The unpredictability of these missteps makes them particularly challenging. A few months ago, I used Claude to "one-shot" a genuinely useful web app. It was fully featured and surprisingly polished; on my own, I think it would've taken a couple of weeks or weekends to build. But when I asked it to update the favicon using a provided file, it spun uselessly for an hour (I eventually did it myself in a couple of minutes). A couple of days ago, I tried to spin up another similarly scoped web app. After ~4 hours of agent wrangling I'm ready to ditch the code entirely.
3. This approach gives me the brazenness to pursue projects that I wouldn't have the time, expertise, or motivation to attempt otherwise. Lower friction is exciting, but building something meaningful is still hard. Producing a polished MVP still demands significant effort.
4. I keep thinking about The Tortoise and The Hare. Trusting the AI agent is tempting because progress initially feels so much faster. At the end of the day, though, I'm usually left with the feeling I'd have made more solid progress with slower, closer attention. When building by hand, I rarely find myself backtracking or scrapping entire approaches. With an AI-driven approach, I might move 10x faster but throw away ~70% of the work along the way.
> These experiences mean that by no stretch of my personal imagination will we have AI that writes 90% of our code autonomously in a year. Will it assist in writing 90% of the code? Maybe.
Spot on. Current environment feels like the self-driving car hype cycle. There have been a lot of bold promises (and genuine advances), but I don't see a world in the next 5 years where AI writes useful software by itself.