The PS3 Licked the Many Cookie

petermcneeley | 139 points

I remember trying to learn Cell programming in 2006 using IBM’s own SDK (possibly different from, and less polished than, whatever Sony shipped to licensed PS3 developers).

I had already spent a few years writing fragment shaders, OpenGL, and CPU vector extension code for 2D graphics acceleration, so I thought I’d have a pretty good handle on how to approach this new model of parallel programming. But trying to do anything with the SDK was just a pain. There were separate, incompatible gcc toolchains for the different cores, separate vector extensions, and a myriad of programming models with no clear guidance on anything… And the non-gcc tools were some hideous pile of Tcl/Tk GUI scripts with a hundred buttons on the screen.

It really made me appreciate how good I’d had it with Xcode and Visual Studio. I gave up on Cell after a day.

pavlov | 4 days ago

> It is important to understand why the PS3 failed

That's a weird assertion for a console that sold 87M units, ranks #8 on the all-time list of best-selling consoles, and marginally outsold the Xbox 360, which it is compared against in TFA.

See: https://en.wikipedia.org/wiki/List_of_best-selling_game_cons...

m000 | 4 days ago

With an SPU's 256 KB local store and DMA, the ideal way to use the SPU was to split the local memory into six sections: code, local variables, DMA in, input, output, DMA out. That way you could have async DMA running in parallel in both directions while you transformed your inputs into your outputs. That meant your working space was even smaller...

Async DMA is important because the latency of a DMA operation is 500 cycles! But, then you remember that the latency of the CPU missing cache is also 500 cycles... And, gameplay code misses cache like it was a childhood pet. So, in theory you just need to relax and get it working any way possible and it will still be a huge win. Some people even implemented pointer wrappers with software-managed caches.
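
For anyone who never touched an SPU, here's roughly what that double-buffered streaming pattern looked like. This is a minimal sketch, assuming the Cell SDK's spu_mfcio.h MFC intrinsics on the SPU side; the chunk size, buffer names, and the process() transform are placeholders, not anything from the article:

    // Minimal double-buffered SPU streaming loop (sketch only; built with the
    // SPU toolchain). CHUNK, the buffer names, and process() are placeholders.
    #include <spu_mfcio.h>
    #include <stdint.h>

    #define CHUNK 4096  // bytes per DMA; must be 16-byte aligned and <= 16 KB

    static char in_buf[2][CHUNK]  __attribute__((aligned(128)));
    static char out_buf[2][CHUNK] __attribute__((aligned(128)));

    // Placeholder for whatever transform the SPU actually performs on a chunk.
    static void process(const char *in, char *out, unsigned n) {
        for (unsigned i = 0; i < n; ++i) out[i] = in[i];
    }

    // Stream `chunks` chunks from src_ea to dst_ea (main-memory effective
    // addresses), ping-ponging between buffer/tag 0 and buffer/tag 1 so that
    // DMA in both directions overlaps the compute.
    void stream(uint64_t src_ea, uint64_t dst_ea, unsigned chunks) {
        if (chunks == 0) return;
        unsigned cur = 0;
        mfc_get(in_buf[cur], src_ea, CHUNK, cur, 0, 0);   // prefetch first chunk

        for (unsigned i = 0; i < chunks; ++i) {
            unsigned next = cur ^ 1;
            if (i + 1 < chunks)                           // start fetching the next chunk
                mfc_get(in_buf[next], src_ea + (uint64_t)(i + 1) * CHUNK, CHUNK, next, 0, 0);

            mfc_write_tag_mask(1 << cur);                 // wait for this chunk's DMA in
            mfc_read_tag_status_all();                    // (and any earlier DMA out on this tag)

            process(in_buf[cur], out_buf[cur], CHUNK);

            mfc_put(out_buf[cur], dst_ea + (uint64_t)i * CHUNK, CHUNK, cur, 0, 0);
            cur = next;
        }
        mfc_write_tag_mask(3);                            // drain the final write-backs
        mfc_read_tag_status_all();
    }

The fetch of the next chunk and the write-back of the previous one both overlap the compute on the current one, which is what hides that 500-cycle latency.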

500 cycles sounds like a lot. But, remember that the PS2 ran at 300 MHz (and had a 50-cycle memory latency) while the PS3 and 360 both ran at 3.2 GHz (and both had a memory latency of 500 cycles). Both systems pushed the clock rate much higher than PCs at the time. But, to do so, "niceties" like out-of-order execution were sacrificed. A fixed ping-pong hyperthreading should be good enough to cover up half of the stall latency, right?

Unfortunately, for most games the SPUs ended up needing to be devoted full time to making up for the weakness of the GPU (pretty much a GeForce 7600 GT). Full screen post processing was an obvious target. But, also the vertex shaders of the GPU needed a lot of CPU work to set them up. Moving that work to the SPUs freed up a lot of time for the gameplay code.

corysama | 4 days ago

The Xbox worked as a proof-of-concept to show that you could build a console with commodity hardware. The Xbox 360 doubled down on this while the PS3 tried to do clever things with an innovative architecture. Between the two, it was clear commodity hardware was the path forward.

dehrmann | 4 days ago

Not a game developer, but I wrote a bunch of code specifically for the CELL processor for grad school at the time (and tested it on my PS3 at home - marking the first and last time I was able to convince my wife I needed a video game system "for real work"). It was fun to play with, but I can empathize with the time-cost aspect: scheduling and optimizing DMA and SPE compute tasks just took a good bit of platform-specific work.

I suspect a major factor in killing off special architectures like the PS3 was the desire of game companies to port their games to other platforms such as the PC. Porting to/from the PS3 would be rather painful if you were trying to fully leverage the power and programming model of the CELL CPU.

thadt | 4 days ago

> I used to think that PS3 set back Many-Core for decades, now I wonder if it simply killed it forever.

Did general-purpose CPUs not kind of subsume this role? Modern CPUs have 16 cores, and server-oriented ones can have many, many more than that.

rokkamokka | 4 days ago

> 256 MB was dedicated to graphics and only had REDACTED Mb/s access from the CPU

I wonder what the REDACTED piece means here; aren't the PS3 hardware specifications pretty open? Per Copetti, the RSX memory had a theoretical bandwidth of 20.8 GB/s, though that doesn't indicate how fast the CPU can access it.

accrual | 4 days ago

Sony was funny in this way.

PS1: Easy to develop for and max out. PS2: Hard to develop for and hard to max out. PS3: Even harder than PS2. PS4: Back to easier. PS5: Just more PS4. PS5 PRO: Just more PS5.

christkv | 4 days ago

> Most code and algorithms cannot be trivially ported to the SPE.

Having never worked on SPE coding, but having heard lots about interesting aspects, like manual cache management, I was very interested to read more.

> C++ virtual functions and methods will not work out of the box. C++ encourages dynamic allocation of objects but these can point to anywhere in main memory. You would need to map pointer addresses from PPE to SPE to even attempt running a normal c++ program on the SPE.

Ah. These are schoolboy errors in games programming (comparing even with the previous 2 generations of the same system).

I think the entire industry shifted away from teaching/learning/knowing/implementing those practices de rigueur, so I'm absolutely not criticising the OP -- I was taught the same way around this time.

But my reading of the article is now that it highlights a then-building and now-ubiquitous software industry failing, almost as much as a hardware issue (the PS3 did have issues, even if you were allocating in a structured way and not trying to run virtual functions on SPEs).
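
To make the quoted pointer problem concrete, here's a minimal, hypothetical SPU-side sketch (same caveat: it assumes the SDK's spu_mfcio.h intrinsics, and the Particle struct and function are made up for illustration). The only way to "dereference" a main-memory address on an SPE is an explicit DMA into local store, which is why flat, pointer-free data worked and vtable-laden object graphs didn't:

    // Hypothetical SPU-side sketch: a "pointer" from the PPE is just a 64-bit
    // effective address here; the SPU's own loads and stores only ever see its
    // 256 KB local store, so the data has to be DMA'd across explicitly.
    #include <spu_mfcio.h>
    #include <stdint.h>

    // Plain-old-data layout shared with the PPE side: no vtable, no embedded pointers.
    struct Particle {
        float pos[4];
        float vel[4];
    };

    static Particle ls_particle __attribute__((aligned(16)));

    void update_particle(uint64_t particle_ea, float dt) {
        // You cannot write: Particle *p = (Particle *)particle_ea; p->pos[0] += ...
        // That address means nothing in local store. Pull the bytes in first:
        mfc_get(&ls_particle, particle_ea, sizeof(Particle), 0, 0, 0);
        mfc_write_tag_mask(1 << 0);
        mfc_read_tag_status_all();

        for (int i = 0; i < 3; ++i)
            ls_particle.pos[i] += ls_particle.vel[i] * dt;

        // ...and push the result back out to main memory when done.
        mfc_put(&ls_particle, particle_ea, sizeof(Particle), 0, 0, 0);
        mfc_write_tag_mask(1 << 0);
        mfc_read_tag_status_all();
    }

The practical upshot was data-oriented design: stream flat arrays of plain structs through local store instead of chasing pointers through an object graph.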

dundarious | 4 days ago

I hope that as RISC-V gains in support, there is a chance to experiment with a many-core version of it. Something like a hundred QERV cores on a chip. The lack of patents is a key enabler, and support for the ISA on more vanilla chips is the other enabler. This could happen.

https://github.com/olofk/qerv

The only practical many-core I know of was the SPARC T1000 series https://en.wikipedia.org/wiki/SPARC_T_series

Pet_Ant | 4 days ago

Thanks so much, Peter, for writing this up. I think it adds a lot to the record about what exactly happened with the Cell. And, as with Larrabee, I have to wonder, what would an alternative universe look like if Sony had executed well? Or is the idea so ill-fated that no Cell-like many-core design could ever succeed?

raphlinus | 3 days ago

I feel like calling the PS3 a licked cookie is unfair.

>The original design was approximately 4 Cell processors with high frequencies. Perhaps massaging this design would have led to very homogenous high performance Many-Core architecture. At more than 1 TFlop of general purpose compute it would have been a beast and not a gnarly beast but a sleek smooth uniform tiger.

That's great and all, but the PS3 cost (famously) FIVE HUNDRED AND NINETY NINE US DOLLARS (roughly $900 in today's money).

However, one thing I noticed is that multi-core programming in 2006 was absolutely anemic. Granted, I was too young to actually understand what was happening at the time, but a couple of years ago I went on a deep dive into the Cell, and one thing I came away with was that proper parallelism was in its infancy for mainstream development. Forget the Cell; it took a long time for game developers to take advantage of quad-core PCs.

Developers were afraid of threads, didn't understand memory barriers, and were cautious of mutexes. Gaben has a famous clip trashing the PS3 because most of Valve's developers at the time did not have experience programming multicore systems. It was common to just have dedicated per-subsystem threads (e.g., a render thread, an AI thread, a physics thread) and pretend coordination didn't exist for large parts of the code. This mostly worked up until you had more cores than threads. That stands in stark contrast to most parallel code today, which does userspace scheduling with tasks or fibers.
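
For contrast, here's a rough sketch in plain C++ of the task-queue style that replaced one-thread-per-subsystem (illustrative only, not from any particular engine, and the usage names at the bottom are made up too): work gets chopped into small tasks and whichever core is free picks up the next one.

    // Minimal worker-pool / task-queue sketch in standard C++ (illustrative only):
    // instead of one dedicated thread per subsystem, work is chopped into small
    // tasks and any free core takes the next one.
    #include <condition_variable>
    #include <functional>
    #include <mutex>
    #include <queue>
    #include <thread>
    #include <vector>

    class TaskPool {
    public:
        explicit TaskPool(unsigned n) {
            for (unsigned i = 0; i < n; ++i)
                workers_.emplace_back([this] { run(); });
        }
        ~TaskPool() {
            { std::lock_guard<std::mutex> lock(m_); done_ = true; }
            cv_.notify_all();
            for (auto &w : workers_) w.join();
        }
        void submit(std::function<void()> task) {
            { std::lock_guard<std::mutex> lock(m_); tasks_.push(std::move(task)); }
            cv_.notify_one();
        }
    private:
        void run() {
            for (;;) {
                std::function<void()> task;
                {
                    std::unique_lock<std::mutex> lock(m_);
                    cv_.wait(lock, [this] { return done_ || !tasks_.empty(); });
                    if (done_ && tasks_.empty()) return;
                    task = std::move(tasks_.front());
                    tasks_.pop();
                }
                task();  // runs on whichever worker thread (core) got here first
            }
        }
        std::vector<std::thread> workers_;
        std::queue<std::function<void()>> tasks_;
        std::mutex m_;
        std::condition_variable cv_;
        bool done_ = false;
    };

    // Usage (hypothetical): TaskPool pool(std::thread::hardware_concurrency());
    //   for (auto &chunk : physics_chunks) pool.submit([&] { integrate(chunk); });

The point isn't the code, it's the shape: once work is sized as tasks instead of subsystems, it scales with core count, which is exactly what a machine full of SPUs needed.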

Even Naughty Dog only figured out late in the cycle how to best take advantage of the SPEs: fibers, in a system that looks like modern async reactors (like Node or Tokio) if you squint really, really hard.

The Cell was just really, really early. Looking back, I don't think the Cell was half-baked. It was the best that could be done at the time. Even if the hardware was fully baked, there was still 5-10 years of software engineering research to go before most people had the tooling to take advantage of parallel hardware.

nemothekid | 4 days ago

I still vividly remember when they announced the $599 price tag; inflation-adjusted, that would be almost $1,000 today! It was crazy.

haunter | 4 days ago

"I want a good parallel computer" is a good side article.

NitroPython | 4 days ago

The PS3 failed?

chadhutchins10 | 4 days ago

Yeah, the Xbox 360 had better graphics, and I couldn't play on a PS3 because of the bad quality. PS3 graphics hurt my eyes too.

b8 | 4 days ago