I want to argue that graphics cards are really 3 markets: integrated, gaming (dedicated), and compute. Not only do these have different hardware (fixed function, ray tracing cores, etc.) but also different programming and (importantly) distribution models. NVIDIA went from 2 to 3. Intel went from 1 to 2, and bought 3 (trying to merge). AMD started with 2 and went to 1 (around Llano) and attempted the same thing as NVIDIA via GCN (please correct me if I'm wrong).
My understanding is that the real market for 3 (GPUs for compute) didn't show up until very late, so AMD's GCN bet didn't pay off. Even in 2021, NVIDIA's gaming revenue was above its data center revenue (a segment where they had essentially no competition and where virtually all of the revenue flowed through CUDA). AMD meanwhile won the battle for the PlayStation and Xbox consoles, and was executing a CPU turnaround in the data center with EPYC (built on Zen). So my guess as to why they might have underinvested is basically: for much of the 2010s they were just trying to survive, so they focused on battles they could win that would bring in revenue.
This high-level prioritization would explain a lot of the "misexecution": underhiring for ROCm, prioritizing the APU SDK experience over the data center, a testing philosophy of "does this game work OK? great".
They likely haven't put even close to enough money behind it. This isn't a unique situation - you'll see in corporate America a lot of CEOs who say "we are investing in X" and really believe they are. But the required size is billions (like, hundreds of really insanely talented engineers being paid $500k-1m, led by a few being paid $3-10m), and they are instead investing low tens of millions.
They can't bring themselves to put so much money into it that it would be an obvious fail if it didn't work.
CUDA isn't the moat people think it is. NVIDIA absolutely has the best dev ergonomics for machine learning, there's no question about that. Their driver is also far more stable than AMD's. But AMD is improving too; they've made some significant strides over the last 12-18 months.
But I think more importantly, what is often missed in this analysis is that most programmers doing ML work aren't writing their own custom kernels. They're just using PyTorch (or maybe something even more abstracted and multi-backend, like Keras 3.x) and letting the library deal with the implementation details of their GPU.
That doesn't mean there aren't footguns in that particular land of abstraction, but the delta between the two vendors is not nearly as stark as it's often portrayed. At least not for the average programmer working with ML tooling.
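For what it's worth, a typical training step looks something like the sketch below (the layer sizes and data are made up, and I'm assuming that on an AMD box you'd be running a ROCm build of PyTorch, which exposes the GPU through the same torch.cuda API). Nothing in it is vendor-specific:

```python
import torch
import torch.nn as nn

# Works unchanged on an NVIDIA card (CUDA build of PyTorch) or an AMD card
# (ROCm build): both are reached through the torch.cuda API surface.
device = "cuda" if torch.cuda.is_available() else "cpu"

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10)).to(device)
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Stand-in batch; in real code this would come from a DataLoader.
x = torch.randn(64, 784, device=device)
y = torch.randint(0, 10, (64,), device=device)

opt.zero_grad()
loss = loss_fn(model(x), y)   # the kernels are dispatched by the library, not by us
loss.backward()
opt.step()
print(f"loss: {loss.item():.4f}")
```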
(EDIT: also worth noting that the work being done in the MLIR project has a role to play in closing the gap as well for similar reasons)
Back in 2015, they were a quarter or two from bankruptcy, saved by the Xbox and PlayStation contracts. Those years saw several significant layoffs, and talent leaving for greener pastures. Lisa Su has done a great job of rebuilding the company. But they're not in a position to hire 2000 engineers at a few million in comp each (~$4 billion annually), even if the people were readily available.
"it'd still be a good investment." - that's definitely not a sure thing. Su isn't a risk taker, seems to prefer incremental growth, mainly focused on the CPU side.
CUDA is an entire ecosystem - not a single programming language extension (C++) or a single library, but a collection of libraries and tools for specific use cases and optimizations (cuDNN, CUTLASS, cuBLAS, NCCL, etc.), plus the tooling Nvidia provides around it, such as profilers. Many of the libraries build on other libraries. Even if AMD had decent, reliable language extensions for general-purpose GPU programming, they still wouldn't have the libraries and the supporting ecosystem to match what CUDA provides today, which took Nvidia a decade-plus of development effort to build.
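To give a sense of how that ecosystem shows through even at the framework level: these are the kinds of knobs an ML engineer ends up touching in PyTorch, each of which is really a handle on one of those NVIDIA libraries (a rough sketch, assuming a standard CUDA build of PyTorch):

```python
import torch
import torch.distributed as dist

# cuDNN sits underneath most convolution ops; PyTorch exposes its settings directly.
print("cuDNN version:", torch.backends.cudnn.version())
torch.backends.cudnn.benchmark = True        # let cuDNN autotune conv algorithms

# Matmuls go through cuBLAS; TF32 use on Ampere+ is effectively a cuBLAS-level setting.
torch.backends.cuda.matmul.allow_tf32 = True

# Multi-GPU training defaults to NCCL for collective communication.
# (Normally launched via torchrun with rank/world-size env vars set, so commented out here.)
# dist.init_process_group(backend="nccl")
```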
I can't contribute much to this discussion due to bias and NDAs, but I just wanted to mention that, technically, HIP is our CUDA competitor. ROCm is the foundation that HIP is built on.
AMD have actually made several attempts at it.
The first time, they killed off their own effort in order to consolidate on OpenCL. OpenCL went terribly (in no small part because NVIDIA held out on OpenCL 2 support), and that set AMD back a long way.
Beyond that, AMD does not have a strong software division, or one with the teeth to really influence hardware to its needs. They have great engineers, but leadership doesn't know how to get them to where they need to be.
NVIDIA does GPUs and software. Intel does CPUs and software. AMD does GPUs and CPUs.
The idea that CUDA is the main reason behind Nvidia's dominance seems strange to me. If most of the money is coming from Facebook and Microsoft, they have their own teams writing code at a lower level than CUDA anyway. Even DeepSeek was writing stuff lower than that.
> The delta between NVIDIA's value and AMD's is bigger than the annual GDP of Spain.
Nvidia is massively overvalued right now. AI has rocketed them into absolute absurdity, and it's not sustainable. Put aside the actual technology for a second and realize that the public image of AI is at rock bottom. Every single time a company puts out AI-generated materials, they receive immense public backlash. That's not going away any time soon, and it's only likely to get worse.
Speaking as someone who's not even remotely anti-AI, I wouldn't touch the shit with a 10-foot pole because of how bad the public image is. The moment that capital realizes this, that bubble is going to pop, and it's going to pop hard.
I don't think it's that bad. The focus will turn to inference going forward, and that eventually means a place for AMD and maybe even Intel. In the end it will be all about the efficiency of inference in watts.
That switch will reduce NVIDIA's margins by a lot. NVIDIA probably has two years left of being the only one with golden shovels.
AMD was investing in a drop-in CUDA compatibility layer & cross-compiler!
Perhaps in keeping with the broader thread here, they only ever funded a single contract developer to work on it, and then discontinued the project (for who-knows-what legal or political reasons). But the developer had stipulated that he could open-source the pre-AMD state if the contract was dissolved, and he did exactly that! The project now has an actively contributing community and is rapidly catching up to where it was.
https://www.phoronix.com/review/radeon-cuda-zluda
https://vosen.github.io/ZLUDA/blog/zludas-third-life/
https://vosen.github.io/ZLUDA/blog/zluda-update-q4-2024/
IMO it's vital that even if NVIDIA's future falters in some way, the (likely) collective millennia of research built on top of CUDA will continue to have a path forward on other constantly-improving hardware.
It's frustrating that AMD will benefit from this without contributing - but given the entire context of this thread, maybe it's best that they aren't actively managing the thing that gives their product a future!
There's a massive amount of effort in https://github.com/orgs/ROCm/repositories?type=all
Throwing a vast amount of effort at something isn't sufficient.
The answer is in the question: if they'd had the foresight to do such a thing, the tech would already be here. Instead they thought one-dimensionally about their product, were part of the group that fumbled OpenCL, and now they're a decade behind, playing catch-up.
If AMD developers use AI deployed on Nvidia hardware to create tools that compete against Nvidia as a company, but overall improve outcomes because of competition, would this be an example of co-evolution observable on human time scales? I feel like AI is evolving, taking a stable form in this complex, multi-dimensional, multi-paradigm sweet spot of an environment we have created, on top of this technical, social and governmental infrastructure, and we're watching it live on Discovery tech filtered into a 2D video narrated by some idiot who has no right to be as confident as he sounds. I'm sorry, I'm in withdrawal from quitting mass media and I'm very bored.
This is the best article on why OpenCL failed:
https://www.modular.com/blog/democratizing-ai-compute-part-5...
Tinycorp (owned by George Hotz, also behind Comma.ai) is working on it after AMD finally understood that it was a no-brainer: https://geohot.github.io/blog/jekyll/update/2025/03/08/AMD-Y... Exciting times ahead!
I've been telling people for years that NVIDIA is actually a software company, but nobody ever listens. My argument is that their silicon is nothing special and could easily be replicated by others, and therefore their real value is in their driver+CUDA layer.
(Maybe "nothing special" is a little bit strong, but as a chip designer I've never seen the actual NVIDIA chips as all that much of a moat. What makes it hard to find alternatives to NVIDIA is their driver and CUDA stack.)
Curious to hear others' opinions on this.
Another possible reason might be outreach. NVIDIA spends big money on getting people to use their products. I have worked at two HPC centers, and at both we had NVIDIA employees stationed there whose job it was to help us get the most out of the hardware. Besides that, they organize hackathons, and they have dedicated software developer programs for each common application, be it LLMs, weather prediction, or protein folding, not to mention dedicated libraries for pretty much every domain.
CUDA is over a decade of investment. I left the CUDA toolkit team in 2014, and it was probably around 10 years old even then. You can't build something comparable fast.
Nobody has mentioned this, but https://docs.scale-lang.com/ is doing some amazing work on this front. Take CUDA code, compile it and output a binary that runs on AMD. Michael and his team working on this are brilliant engineers.
The problem is the hardware, not the software, and specifically not CUDA. Triton, for example, writes PTX directly (a level below CUDA). Trying to copy Nvidia hardware exactly means you will always be a generation behind, so they are forced to guess what different direction to take that will be useful.
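To illustrate, here's a minimal Triton kernel along the lines of the official vector-add tutorial. The point is that this Python gets lowered to PTX (or, via Triton's AMD backend, to AMD's GPU ISA) without ever passing through CUDA C++:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements              # guard the ragged last block
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = out.numel()
    grid = lambda meta: (triton.cdiv(n, meta["BLOCK_SIZE"]),)
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out

x = torch.rand(4096, device="cuda")
y = torch.rand(4096, device="cuda")
assert torch.allclose(add(x, y), x + y)
```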
So far those guesses haven't worked out (not surprising as they have no specific ML expertise and are not partnered with any frontier lab), and no amount of papering over with software will help.
That said, I'm hopeful the rise of reasoning models can help: no one wants to bet the farm on their untested clusters, but buying some chips for inference is much safer.
If it's a question about entrenched corporate dysfunction, I can't answer it. Most people's answers are wild guesses at best.
If it's a question of first principles, there is a small glimmer of hope in a company called tinygrad making the attempt - https://geohot.github.io//blog/jekyll/update/2025/03/08/AMD-...
If the current 1:16 AMD:NVIDIA stock value difference is entirely due to the CUDA moat, you might make some money if the tide turns. But who can say…
There's OpenCL (https://en.wikipedia.org/wiki/OpenCL), which, by the way, also runs on NVIDIA GPUs.
OpenCL is a completely open standard, so why wouldn't we, all of us, throw our weight behind it?
(no, I have no connection with them and have nothing to do with them, other than having learned a bit).
In turn I will raise you the following: why are GPU ISAs trade secrets at all? Why not open them up like CPU ISAs, get rid of specialized cores, and let compiler writers port their favorite languages to compile into native GPU programs? Everyone would be happy. Game devs would be happy with more control over the hardware, compiler devs would be happy to run Haskell or Prolog natively on GPUs, ML devs would be happier, and NVIDIA/AMD would be happier taking center stage.
Y'all should have come to Beyond CUDA.
https://www.linkedin.com/posts/jtatarchuk_beyondcuda-democra...
I made a post here a while back suggesting an investment strategy of spending one billion on AMD shares and one billion on third-party software developers to write a quality support stack for their hardware. I'm still not sure if it's a crazy idea.
Actually, it might be better to spend $1B on shares and 10 x $100M on development: take ten attempts in parallel and use the best of them.
HIP is definitely a viable option. In fact, with some effort you can port large CUDA projects to be compilable with the HIP/amd-clang toolchain. This way you don't have to rewrite the world from scratch in a new language and can still run GPU workloads on AMD hardware.
Is this why the stock has been in the toilet for years now? It seems it missed the AI bubble, at least from an investor standpoint, as if there's tremendous skepticism.
Maybe this is an overly cynical response but the answer is simply that they cannot (at least not immediately). They have not invested enough into engineering talent with this specific goal in mind.
Leadership. At the end of the day, the buck stops with leadership.
If they wanted to prioritize this, they would. They're simply not taking it seriously.
What about Rust for GPU programming? I wonder why AMD doesn't back that kind of effort as an alternative.
HIP is now somewhat viable (and ROCm is now all HIP).
But it's too late. The first versions of ROCm were terrible. Too much boilerplate. 1200 lines of template-heavy C++ for a simple FFT. You couldn't just start hacking around.
Since then, the CUDA way has been cemented in developers' minds. Intel now has oneAPI, which is not too bad and is hackable, but there is no hardware and no one will learn it. And HIP is "CUDA-like", so why not just use CUDA, unless you _have to_ use AMD hardware?
Tl;dr: the first versions of ROCm were bad. They are better now, but it is too late.
AMD's CEO is a cousin of Nvidia's CEO.
Neither will encroach too much on the other's turf. The two companies don't want to directly compete on the things that really drive the share price.
Also, why hasn't MS released a decent set of ML tooling for TypeScript?
Because they don't want Nvidia to be in control of their own development efforts.
There is more than one way to answer this.
They have made an alternative to the CUDA language with HIP, which can do most of the things the CUDA language can.
You could say that they haven't released supporting libraries like cuDNN, but they are making progress on this with AiTer for example.
You could say that they have fragmented their efforts across too many different paradigms, but I don't think that's it, because Nvidia also supports a lot of different programming models.
I think the reason is that they have not prioritised support for ROCm across all of their products. There are too many different architectures with varying levels of support. This isn't just historical: there is no ROCm support for their latest AI Max 395 APU. There is no nice cross-architecture ISA like PTX. The drivers are buggy. It's just all a pain to use. And for that reason "the community" doesn't really want to use it, so it remains a second-class citizen.
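As a concrete illustration of that patchiness: the first thing you end up doing on any AMD box is probing what you actually got and then cross-referencing it against the ROCm support matrix. A rough sketch from the PyTorch side (the gfx names in the comment are just examples, not a support claim):

```python
import torch

# On a ROCm build of PyTorch, torch.version.hip is set and the AMD GPU is
# reached through the torch.cuda API; on a CUDA build, torch.version.hip is None.
print("HIP runtime:", torch.version.hip)
print("GPU visible:", torch.cuda.is_available())

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    # The GFX target (e.g. "gfx90a" for MI200, "gfx1100" for RDNA3) determines
    # whether a given ROCm release officially supports the card at all.
    print(props.name, getattr(props, "gcnArchName", "n/a"))
```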
This is a management and leadership problem. They need to make using their hardware easy. They need to support all of their hardware. They need to fix their driver bugs.