One of the most awe-inspiring developments in chip making in recent years has been the rise of chiplets and the stacking of those chiplets. The possibilities are, as they say, endless. AMD showed how to boost gaming frame rates by stacking more cache onto a processor with the Ryzen 7000X3D CPUs at CES 2023, but it also had something just as impressive for the data center folks.
AMD uses its 3D chip stacking technology to combine a CPU and GPU into one absolutely gigantic chip: the AMD Instinct MI300.
It’s not just that this chip has both a CPU and a GPU. That’s not particularly remarkable these days – basically everything we think of as a CPU has a GPU integrated into it. AMD is no stranger to this either, having been making APUs – essentially chips with both CPU and GPU under one roof – for years. And that’s exactly how AMD defines the MI300: an APU.
But what’s cool about AMD’s latest APU is the scale of the thing. It’s huge.
This is a data center accelerator that contains 146 billion transistors. That’s almost double the count of Nvidia’s AD102 at 76.3 billion – the GPU in the RTX 4090. It’s physically massive, too. AMD CEO Dr. Lisa Su held the MI300 up on stage at CES, and it looks to be the size of a proper stroopwafel. The cooling for this thing must be immense.
The chip features a GPU built on AMD’s CDNA 3 architecture. That’s the version of AMD’s graphics architecture built purely for compute performance: RDNA for gaming, CDNA for compute. That’s packed alongside a Zen 4 CPU and 128GB of HBM3 memory. The MI300 comprises nine 5nm chiplets stacked on top of four 6nm chiplets. That suggests nine CPU or GPU chiplets (it looks like six GPU chiplets and three CPU chiplets) sitting on what is believed to be a four-part base die, with the memory around the edges. AMD hasn’t given details on the exact composition, but we do know it’s all connected by its 4th-generation Infinity interconnect architecture.
The idea is that if you load everything onto one package, with fewer hoops for data to jump through as it moves around, you get a very efficient product compared to one that has to make a lot of calls out to off-chip memory, which slows processing down. Compute at this level is all about bandwidth and efficiency, so this kind of approach makes a lot of sense. It’s the same principle as AMD’s Infinity Cache on its RDNA 2 and 3 GPUs: keeping more data close at hand means shorter trips to fetch it, which helps keep frame rates high.
But there are a few reasons why we don’t have an MI300-style accelerator for gaming. First, whatever AMD plans to charge for the MI300 will be far beyond most gamers’ budgets. It’s going to be a lot. Then there’s the fact that no one has figured out how to get a game to see multiple compute chiplets on a GPU as a single entity without specific coding for it. We’ve been there with SLI and CrossFire, and it didn’t end well.
But yes, the MI300 is extremely large and extremely powerful. AMD touts an eight-fold improvement in AI performance over its own Instinct MI250X accelerator. And if you’re wondering, the MI250X is also a multi-chiplet monster, with 58 billion transistors – though that once-impressive number now seems a bit small. In short, that wasn’t an easy chip to beat, but the MI300 has beaten it, and it’s also five times more efficient, by AMD’s own claims.
The MI300 is coming in the second half of the year, so it’s still a long way off. That said, it’s more something to marvel at than to actually splash out on – unless you work in a data center or in AI and have access to mega dollars.