Historically, GPUs have been designed as monolithic dies with all of their functionality under one ‘roof.’ This hasn’t always been the case — the earliest GPUs sometimes used separate chips for specific functionality. Both AMD and Nvidia have, at various times, used different cores to provide support for additional monitors or to bridge connections between PCI Express and AGP.
As for the core components of the GPU itself, however, those have been single-die affairs for a long time. That’s why it’s a tad surprising to see that Nvidia is now evaluating the possibility of a multi-chip GPU, in which several GPU dies would communicate with one another inside something like an MCM (multi-chip module).
But monolithic GPU designs suffer from a number of problems. First, they can be reticle-busters, pushing the limits of what TSMC, GlobalFoundries, or Samsung can build on a single die. Second, harvesting partially defective dies to sell as lower-end parts can cause problems of its own, as happened with Nvidia and the GTX 970. (Long story short: the method NV used to recover parts for the GTX 970 also had an impact on the GPU’s memory bandwidth when accessing its last 512MB of RAM.) If GPUs were built in modules, then connected together on a common package, the resulting chip could theoretically be larger and more powerful than any single die.
In their study (available here), the authors say they can surpass the performance of the largest buildable GPU by 44.5 percent, and come within 10 percent of a hypothetical monolithic GPU die larger than anything currently buildable at any foundry.
Now, in an ideal situation, this approach could yield huge improvements to performance and even power consumption and TDP, since you wouldn’t have the entire GPU’s horsepower concentrated in such a small space. I would caution, however, against leaping to conclusions. The authors of the report acknowledge that this would require software that is NUMA (Non-Uniform Memory Access) aware. There would be an inevitable performance hit when accessing data held in a different GPU module or sharing information across multiple modules.
The flip side to this is that these kinds of performance impacts already happen when customers attempt to deploy arrays of GPUs. The performance penalties now are significantly harsher than what they’d be in an MCM.
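To make that trade-off concrete, here is a rough back-of-the-envelope sketch in Python. It is not taken from the paper: the function name and every bandwidth figure below are hypothetical placeholders, chosen only to show how the share of remote accesses and the speed of the link between GPUs combine into an effective bandwidth.

```python
def effective_bandwidth(local_bw, remote_bw, remote_fraction):
    """Blend local and remote bandwidth (GB/s) for a given share of remote traffic.

    The time to move one byte is the weighted sum of the local and remote
    times, so the result is a weighted harmonic mean of the two bandwidths.
    """
    if not 0.0 <= remote_fraction <= 1.0:
        raise ValueError("remote_fraction must be between 0 and 1")
    time_per_byte = (1.0 - remote_fraction) / local_bw + remote_fraction / remote_bw
    return 1.0 / time_per_byte


# All three figures are made-up illustrations, not measurements.
LOCAL_BW = 1000.0        # GB/s to a module's own memory (hypothetical)
ON_PACKAGE_BW = 500.0    # GB/s between modules on one package (hypothetical)
OFF_PACKAGE_BW = 50.0    # GB/s between separate GPU cards (hypothetical)

for fraction in (0.0, 0.25, 0.5):
    mcm = effective_bandwidth(LOCAL_BW, ON_PACKAGE_BW, fraction)
    multi_card = effective_bandwidth(LOCAL_BW, OFF_PACKAGE_BW, fraction)
    print(f"remote share {fraction:.0%}: on-package {mcm:.0f} GB/s, "
          f"multi-card {multi_card:.0f} GB/s")
```

Under these assumed numbers, both setups lose bandwidth as more traffic goes remote, but the slower the link, the steeper the drop. That is the intuition behind the claim that an MCM's penalty should be milder than what multi-GPU arrays face today; real workloads, caches, and latency effects would change the exact figures.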
How credible is this approach?
I don’t want to claim that Nvidia is preparing to roll out MCM-style GPUs right around the corner, but I’ll say this: It’s not crazy. The paper has more details on this, but the big-picture takeaway is that by giving each GPU block enough bandwidth, keeping latency low, and properly allocating cache resources, you can hit some significant performance targets. The trick, of course, is keeping all those resources properly balanced.
For those who would argue that this is just the same two-GPUs-on-one-card approach that we’ve seen from both AMD and Nvidia for years, no, it truly isn’t. Nvidia isn’t contemplating sharing data across a PCIe bridge chip; they’re talking about designing a GPU that’s built, from the ground up, to share workloads across multiple GPU modules. For better or worse, it’d be profoundly different from anything we’ve seen before.