MCM-GPU: Multi-Chip-Module GPUs for Continued Performance Scalability

Discussion in 'Frontpage news' started by angelgraves13, Jul 4, 2017.

  1. angelgraves13

    angelgraves13 Ancient Guru

    Messages:
    2,218
    Likes Received:
    657
    GPU:
    RTX 2080 Ti FE
  2. Spets

    Spets Ancient Guru

    Messages:
    3,059
    Likes Received:
    155
    GPU:
    RTX 3090
    Thanks for the link, it's a great read. Looking forward to seeing what they achieve with their first commercial Multi-Chip-Module GPU.
     
  3. volkov956

    volkov956 Ancient Guru

    Messages:
    6,118
    Likes Received:
    10
    GPU:
    GTX 1080 TI
    Both companies need to mask the number of GPUs at the OS/driver level so the system only sees one, and let the GPU's onboard BIOS work out how to dish out the utilization; otherwise we'll be stuck waiting and hoping the developers figure it out.

    The same goes for CPUs. I really want to find the documents on this; it was discussed way back in the mid-2000s how it's possible, but no one wants to do it.

    And from what I can find, it has been done back in the early days, i.e. Voodoo and some other company (I forget which one), where the OS and drivers only saw it as one GPU.
     
    Last edited: Jul 5, 2017
  4. LesserHellspawn

    LesserHellspawn Master Guru

    Messages:
    657
    Likes Received:
    12
    GPU:
    2x GTX980ti
    Hello Voodoo 5 ?
     

  5. drac

    drac Ancient Guru

    Messages:
    1,758
    Likes Received:
    33
    GPU:
    Strix RTX2080 Ti OC
    Good stuff, no more masses of video cards crammed into cases, overheating all over the place.

    This could be so good, SLI on one card done properly, but I guess it hasn't happened yet due to current tech limitations.
     
  6. Evildead666

    Evildead666 Maha Guru

    Messages:
    1,302
    Likes Received:
    273
    GPU:
    Vega64/EKWB/Noctua
    yay for a Voodoo 5 :)

    This is understandable, especially with new fab processes arriving late and spaced farther and farther apart.

    If you can't rely on shrinking the chips/transistors, then you have to go MCM, or multi-chip-on-board a la Voodoo, for the high-end cards.

    Even if it's only for the datacenter GPU cards at first (they can pay for the R&D with the prices those cards sell for), the consumer will get a trickle-down effect later on.

    I suspect it hasn't been done lately because they didn't plan to, officially.
    It also means dedicating on-die space to something that won't be used on single-chip boards, which would be wasted.

    I could see a medium-sized GPU with a 128-bit GDDR/X bus, or a single HBM stack (either 512-bit or 1024-bit wide), with up to four of them together on a single card.
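
    To put rough numbers on a configuration like that, here's a back-of-the-envelope bandwidth sketch; the pin speeds are assumed illustrative values, not figures from the paper:

    def bandwidth_gbs(bus_width_bits, pin_speed_gbps):
        """Peak memory bandwidth in GB/s for one die."""
        return bus_width_bits * pin_speed_gbps / 8

    # Assumed pin speeds: 10 Gb/s GDDR5X, 2 Gb/s per pin HBM2.
    gddr5x_per_die = bandwidth_gbs(128, 10.0)    # -> 160 GB/s
    hbm2_per_die = bandwidth_gbs(1024, 2.0)      # -> 256 GB/s

    for name, per_die in [("GDDR5X", gddr5x_per_die), ("HBM2", hbm2_per_die)]:
        print(f"{name}: {per_die:.0f} GB/s per die, {4 * per_die:.0f} GB/s across four dies")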

    The HBCC might be a good fit for this, since there would be one per chip, and one of them could become 'master' to the others, and give them orders.
    NVLink is probably set up for this as well.
     
  7. Exascale

    Exascale Banned

    Messages:
    397
    Likes Received:
    8
    GPU:
    Gigabyte G1 1070
    This isn't a replacement for SLI, and it's not something for consumer GPUs for a long time (not until performance requirements dictate that a monolithic die is too expensive). That should be a ways off still, and considering that Nvidia's monolithic V100 sells for $13,000, don't expect these to be cheap. It may reduce the cost of the individual dies and make binning easier, but the addition of all the interconnects and the SRAM for the L1.5 cache will still make these expensive.

    It's a small NUMA setup for a GPU that uses an L1.5 cache to get around some of the issues involved with making NUMA architectures.

    This GPM (graphics processing module) approach is destined to be used in Nvidia's exascale architecture, and the Volta V100's successor chip will likely be such an MCM.

    Intel discussed a similar idea a year or two ago regarding the Knights Hill architecture, which follows the 72-core, HPC-focused Knights Landing x86 CPU.

    https://www.nextplatform.com/2015/08/03/future-systems-intel-ponders-breaking-up-the-cpu/

    This is the next step in 2.5D architectures. Nvidia's approach discusses how to solve data locality issues and reduce the pJ/bit cost of moving data with their L1.5 cache. I need to read up on Infinity Fabric and HBCC to see if they have any similar provisions. If they don't now, they certainly will need them for large-scale systems with hundreds of thousands or millions of cores.
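
    As a feel for why the pJ/bit figure matters, here's a toy calculation; the energy-per-bit values and the hit rate are rough assumptions in the general ballpark of on-package vs. board-level signalling, not the paper's numbers:

    PJ_PER_BIT_ON_PACKAGE = 0.5    # assumed: die-to-die link inside the package
    PJ_PER_BIT_OFF_PACKAGE = 10.0  # assumed: board-level link between separate cards

    def transfer_energy_mj(gigabytes, pj_per_bit):
        """Energy (millijoules) to move this much data over a link of the given cost."""
        bits = gigabytes * 8e9
        return bits * pj_per_bit * 1e-9   # pJ -> mJ

    traffic_gb = 1.0   # say, 1 GB of reads that land on another module's memory
    hit_rate = 0.8     # assumed fraction caught by an L1.5-style remote-data cache

    print("on-package, no cache  :", transfer_energy_mj(traffic_gb, PJ_PER_BIT_ON_PACKAGE), "mJ")
    print("on-package, 80% hits  :", transfer_energy_mj(traffic_gb * (1 - hit_rate), PJ_PER_BIT_ON_PACKAGE), "mJ")
    print("board-level, no cache :", transfer_energy_mj(traffic_gb, PJ_PER_BIT_OFF_PACKAGE), "mJ")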
     
    Last edited: Jul 5, 2017
  8. nevcairiel

    nevcairiel Master Guru

    Messages:
    737
    Likes Received:
    283
    GPU:
    MSI 1080 Gaming X
    If it's just another form of SLI, then it'll actually be quite bad. SLI support in games has been terrible and will not get much better.

    If they want to pull this off, they need to find a way to utilize all the GPUs for all workloads; if anything just runs on one of the modules because it's not flexible enough, it's going to be an extremely disappointing experience.
     
  9. Exascale

    Exascale Banned

    Messages:
    397
    Likes Received:
    8
    GPU:
    Gigabyte G1 1070
    It's not SLI at all. It's NUMA.
     
  10. Craigpd

    Craigpd Member

    Messages:
    25
    Likes Received:
    7
    GPU:
    Radeon HD7950 3GB
    OK, so nVidia published a paper outlining the theory and application behind the use of MCM in a GPU.
    However, having an interconnect that can supply enough bandwidth without large latency hits is a different matter. AMD got very lucky with IF, but will Intel and nVidia be able to replicate the results without hitting AMD's patents related to IF? If they can't, their only option could be to license the technology from AMD, assuming AMD is game for giving up the ace up its sleeve.
     

  11. yasamoka

    yasamoka Ancient Guru

    Messages:
    4,838
    Likes Received:
    237
    GPU:
    EVGA GTX 1080Ti SC
    This isn't feasible now. Workloads are more complex, and that's why it's up to the developer to manage resource allocation, not the driver. Mantle, DX12, and Vulkan didn't all go in this direction for nothing. Rendering engines are becoming more and more involved, and often you have to work preemptively, such as loading textures before they're used when you're near the border of a new area. This is not something the driver can just guess, unless the game exposes the notion of an area and the driver is configured to load textures for nearby areas automatically - think of the infinite combination of situations the driver would then have to cater for. If each GPU die comes with a memory controller and its own VRAM, then choosing to preload the textures into one or both pools at a specific time, when the bus is idle or has spare capacity, is vital.
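
    A crude sketch of the kind of decision only the engine can make; the area bookkeeping and upload_to_pool() are hypothetical stand-ins, not a real driver or graphics API:

    class Area:
        def __init__(self, name, textures, neighbours):
            self.name = name
            self.textures = textures      # texture IDs this area needs
            self.neighbours = neighbours  # names of adjacent areas

    def preload_near_border(areas, current_area, near_border, bus_is_idle, upload_to_pool):
        """Only the engine knows the player is approaching a border; a driver can't guess this."""
        if not (near_border and bus_is_idle):
            return
        for neighbour in areas[current_area].neighbours:
            for tex in areas[neighbour].textures:
                # Duplicate into both dies' pools so either die can sample locally.
                upload_to_pool(tex, pool=0)
                upload_to_pool(tex, pool=1)

    # Example with a dummy upload callback standing in for a real graphics API call.
    areas = {
        "village": Area("village", ["grass", "roof"], ["forest"]),
        "forest": Area("forest", ["bark", "leaves"], ["village"]),
    }
    preload_near_border(areas, "village", near_border=True, bus_is_idle=True,
                        upload_to_pool=lambda tex, pool: print(f"upload {tex} -> pool {pool}"))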

    Nvidia works with developers, and AMD is starting to. We have huge game engines with tons of backend support, such as Unreal, Frostbite, CryEngine, etc., so if this pattern continues, we're good to go.
     
  12. nevcairiel

    nevcairiel Master Guru

    Messages:
    737
    Likes Received:
    283
    GPU:
    MSI 1080 Gaming X
    NUMA just defines memory access patterns; it has nothing to do with how the actual work is spread over the processors.
     
  13. Venix

    Venix Maha Guru

    Messages:
    1,407
    Likes Received:
    525
    GPU:
    Palit 1060 6gb
    No one expects them to be cheap, but it's much, much easier to produce two 250 mm² chips than one massive 500 mm² chip, and if the 2x250 gives similar or even just close performance, it will be much cheaper than the massive chip (see the rough yield sketch below). You can kind of see that with Intel's CPUs now: the 20- and 18-core parts are extremely expensive. Sure, Intel charges extra for the bragging rights, but those parts are also hard to produce; when you get a fully working 20-core Xeon, you've pretty much got the best silicon they can make, and the rest become the 10, 12, 14...18-core chips. The same goes for GPUs, really.
    All this, of course, only if they make the system see them as a single entity.
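
    A rough illustration of why the smaller dies win, using a simple Poisson yield model; the defect density is an assumed value, not a real foundry figure:

    import math

    # Simple Poisson yield model: yield = exp(-defect_density * die_area).
    D0 = 0.002  # defects per mm^2 (~0.2 per cm^2, assumed for illustration)

    def die_yield(area_mm2):
        return math.exp(-D0 * area_mm2)

    def silicon_per_good_product(die_area_mm2, dies_per_product):
        """Average mm^2 of wafer fabricated per good product. For an MCM, good
        chiplets can be mixed and matched, so the cost is per good chiplet."""
        return dies_per_product * die_area_mm2 / die_yield(die_area_mm2)

    mono = silicon_per_good_product(500, 1)   # one 500 mm^2 monolithic die
    mcm = silicon_per_good_product(250, 2)    # two 250 mm^2 chiplets

    print(f"monolithic: {mono:.0f} mm^2 of silicon per good GPU")
    print(f"2-chiplet : {mcm:.0f} mm^2 of silicon per good GPU")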
     
  14. Evildead666

    Evildead666 Maha Guru

    Messages:
    1,302
    Likes Received:
    273
    GPU:
    Vega64/EKWB/Noctua
    I'm pretty sure Infinity Fabric uses PCIe lanes for communication; maybe it can use other transports as well.

    Between the CPUs on an Epyc package, there are 64 PCIe lanes running between each pair, if I read the slides correctly.
    They can cut the latencies thanks to the short hops between the on-package dies, and the bandwidth should be plenty.

    A GPU that has 2x16 PCIe lanes could use the second set for intra-GPU signalling. Ideally you'd want four sets, like the North/South/East/West links on those DEC Alpha chips. That way, each GPU die would be only one hop from any other, up to a certain number of dies.
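
    A trivial sketch of that one-hop point: with four die-to-die links per die you can fully connect up to five dies (the link count per die is taken from the post above; the rest is just graph bookkeeping):

    from itertools import combinations

    LINKS_PER_DIE = 4             # e.g. North/South/East/West-style links
    num_dies = LINKS_PER_DIE + 1  # a complete graph on 5 dies uses 4 links per die

    links = set(combinations(range(num_dies), 2))   # every pair gets a direct link
    links_per_die = {d: sum(d in link for link in links) for d in range(num_dies)}

    print(f"{num_dies} dies, {len(links)} links in total")   # 5 dies, 10 links
    print("links used by each die:", links_per_die)          # each die uses all 4 of its links
    print("worst-case hop count: 1 (every pair shares a direct link)")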
     
    Last edited: Jul 5, 2017
  15. drac

    drac Ancient Guru

    Messages:
    1,758
    Likes Received:
    33
    GPU:
    Strix RTX2080 Ti OC
    It just looked like it could potentially be used to make SLI better; at least that's what I was hoping, lol. I didn't read in depth about the architecture, it was just a generalised observation really.
     

  16. Denial

    Denial Ancient Guru

    Messages:
    13,155
    Likes Received:
    2,648
    GPU:
    EVGA RTX 3080
    They explain all this in the PDF in the article.

     
  17. Exascale

    Exascale Banned

    Messages:
    397
    Likes Received:
    8
    GPU:
    Gigabyte G1 1070
    I know, but the technology is about distributing a workload across an MCM GPU, which effectively turns the GPU into a small NUMA node. The new intra-package interconnect scales much better than SLI because it uses the L1.5 cache to avoid unnecessary communication between the L1 cache and "far" memory attached to a different GPM.

    It's designed to make the GPU chiplets and their RAM communicate effectively within the package. NVLink SLI would be nice.
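
    A minimal sketch of that idea, assuming a simple address-interleaved layout; the sizes and the mapping are made up for illustration, and only the "cache remote data locally" behaviour is the point:

    NUM_GPMS = 4

    def home_gpm(address):
        """Assumed interleaving: which GPM's DRAM owns this address."""
        return address % NUM_GPMS

    class GPM:
        def __init__(self, gpm_id, l15_capacity=1024):
            self.gpm_id = gpm_id
            self.l15 = {}                    # caches only remote (other-GPM) data
            self.l15_capacity = l15_capacity
            self.remote_transfers = 0

        def read(self, address, dram):
            if home_gpm(address) == self.gpm_id:
                return dram[address]         # local DRAM: never crosses a die-to-die link
            if address in self.l15:
                return self.l15[address]     # remote data already cached on this GPM
            self.remote_transfers += 1       # must cross the on-package interconnect
            value = dram[address]
            if len(self.l15) < self.l15_capacity:
                self.l15[address] = value
            return value

    # The same remote address read twice only crosses the link once.
    dram = {a: a * 2 for a in range(16)}
    gpm0 = GPM(0)
    gpm0.read(5, dram)
    gpm0.read(5, dram)
    print("inter-GPM transfers:", gpm0.remote_transfers)   # -> 1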
     
    Last edited: Jul 5, 2017
  18. DeskStar

    DeskStar Maha Guru

    Messages:
    1,068
    Likes Received:
    151
    GPU:
    EVGA 2080Ti FTW3 HC
    Pretty interesting.... Couldn't resist commenting on the 3dfx Voodoo 5 5500 AGP picture, as I had that card years ago!!! I still remember my ATI X1800 PE dying on me and having to slap in that good ol' 3dfx Voodoo 5 5500 AGP just to have a display adapter.

    Damn thing ran Half-Life 2 at 800p with most settings at max...

    As long as the tech doesn't take a long time to come around, I'm on board....
     
  19. MorganX

    MorganX Member Guru

    Messages:
    131
    Likes Received:
    12
    GPU:
    Red Devil Vega 64
    Given AMD's size and limited resources, it's amazing they're in the lead on several fronts. I can only imagine if they had Intel's and Nvidia's resources. Good stuff all around. It's amazing what stiff competition can do to light a fire under the behemoths' behinds.
     
  20. JamesSneed

    JamesSneed Maha Guru

    Messages:
    1,083
    Likes Received:
    449
    GPU:
    GTX 1070
    I assume Nvidia thinks AMD's Navi will be a big hit, since they are also moving in the same direction. To me Vega is pretty boring, but Navi using IF along with a die shrink looks pretty interesting.
     
