I'm not sure it's easy to say one is better than the other. I've used ChatGPT Pro, and it's good. I've also used Gemini, and it's also good. Claude is surprisingly good as well. And I've recently been using Q-cli, which was extremely easy to integrate into my Neovim/Tmux workflow.
Purely from a code quality perspective, they're all about the same, and they all generate code that rarely works the first time. At least that's my experience, and it depends heavily on the language. For instance, Q-cli with Rust seems to generate better output for me than Gemini with Rust, and ChatGPT with JS gives me way better code than Claude with JS.
I honestly think that in the current market it's not really a question of which is better, but which is the right tool for your workflow and language.
It’s tricky. o3 is better (usually) but much, much lazier IME. You probably have to pay for Pro.
O3 is far ahead of the competition.
I've been using Gemini 2.5 Pro, Claude 3.7 Sonnet, and GPT-4.1 recently and here are my thoughts.
Regarding context windows, Gemini currently offers 1M tokens (reportedly increasing to 2M soon), GPT-4.1 also handles a large 1M-token window, and Claude provides 200k. In my experience testing them with large code files (around 3-4k lines), I found Gemini 2.5 Pro and Claude 3.7 Sonnet performed quite similarly, both handling the large context well and providing good solutions.
However, my impression was that GPT-4.1 didn't perform quite as well. While GPT-4.1 is certainly capable, I feel Gemini has a slight edge in this area right now. Based on this, I'd lean towards Gemini 2.5 Pro for extremely large contexts that need high-quality results, GPT-4.1 for backend logic, and Claude 3.7 for UI tasks, where I've found it particularly effective.