AMD Radeon (big NAVI) again rumored to get 80 Compute units

Discussion in 'Frontpage news' started by Hilbert Hagedoorn, Aug 3, 2020.

  1. cucaulay malkin

    cucaulay malkin Ancient Guru

    Messages:
    9,236
    Likes Received:
    5,208
    GPU:
    AD102/Navi21
    now that is the dissonance I'm asking about
    Fediuld says some stuff that sounds amazing and is supposedly way ahead of its time, but gets angry with me for asking how it works and for specifics.
    Is it gonna happen? Cause I'm buying a GPU this fall, sold my 2070S, should I wait for this IF multi-GPU from AMD?

    Is it for servers? Is it for consoles? This gen or next gen? I can't for the love of god decipher what he is implying.
     
  2. Denial

    Denial Ancient Guru

    Messages:
    14,206
    Likes Received:
    4,118
    GPU:
    EVGA RTX 3080
    The research says no:

    https://research.nvidia.com/sites/default/files/publications/ISCA_2017_MCMGPU.pdf

    Essentially in order for MCM GPUs to be a thing, you either A. Need an absolutely massive amount of bandwidth between chiplets, like frack tons more than any interconnect currently even comes close to offering.

    Or B. you need to radically change the way you do scheduling on the GPU and still have a metric ton of bandwidth between chiplets, but less than option A.

    With either option Nvidia's research shows this works great for certain workloads - like in the compute space where minimal data needs to be shared across chiplets, but not so great in gaming workloads, where data does need to be shared fairly frequently (especially with the way most shaders work).

    The main takeaway though is that it's a scheduling nightmare, like hundreds of times more difficult than on a CPU. Even if it's done perfectly it wouldn't get perfect scaling, and it would probably require a massive rewrite of drivers, architecture (especially the front end), etc. So more than likely, when this technology does inevitably show up, it will be in GPUs like AMD's CDNA or Nvidia's Tesla chips, which are massive chips where MCM would significantly cut production costs and which are mostly used for compute. It will probably start off with specific workloads, for example breaking the tensor cores off to a separate chiplet and working on that first.
     
    Last edited: Aug 5, 2020
  3. Fediuld

    Fediuld Master Guru

    Messages:
    773
    Likes Received:
    452
    GPU:
    AMD 5700XT AE
    AMD published this just 5 months ago. The middle image is this year.

    [IMG]

    And another article about this

    https://www.tomshardware.com/news/amd-infinity-fabric-cpu-to-gpu

    This is not some far-into-the-future tech. This tech is out this year and next year. :)
    AMD is pushing the technology fast and hard, has proven it over the last 3 years, and caught its competitors with their pants down.

    And this is not only applicable to servers but to normal consumers too, as we see from the signatures dropped into the Linux kernel.
     
  4. cucaulay malkin

    cucaulay malkin Ancient Guru

    Messages:
    9,236
    Likes Received:
    5,208
    GPU:
    AD102/Navi21
    Isn't this for servers?
    And doesn't Nvidia have something similar since Volta in 2017? Server solutions are for servers, I think.
    What makes you so convinced this is happening on RDNA2? I'm puzzled since no one is reporting what you're saying here.
     

  5. Fediuld

    Fediuld Master Guru

    Messages:
    773
    Likes Received:
    452
    GPU:
    AMD 5700XT AE
    No, Nvidia doesn't have something similar. And it is for desktop also.
    Pages ago I posted the Linux kernel update which includes references to Infinity Fabric Link on Navi 21 and Navi 12 (Radeon Pro 5600M). Navi 21 is the whole lineup coming with RDNA2 and is not to be confused with the CDNA-based MI100 GPU, which is for servers.

    What I see atm is us having the same discussions about what AMD can do as when Threadripper and Zen 2 were rumours. AMD has proven that it is pushing the technology boundaries faster than its competitors.

    In 2018, when we discussed chiplets and I/O dies, everyone thought we were crazy and that it was impossible for AMD to do so.

    AMD has pushed the envelope further in just 3 years than we have seen before. Yet even today we have people unable to comprehend that AMD has an APU with an 8-core Zen 2, integrated I/O, and the GPU power of a 2080Ti, while being half the size and way more power efficient than the latter, and going on sale for $500-600 tops. With NVMe etc.
     
    Last edited: Aug 6, 2020
  6. cucaulay malkin

    cucaulay malkin Ancient Guru

    Messages:
    9,236
    Likes Received:
    5,208
    GPU:
    AD102/Navi21
    don't talk to me about consoles or servers or zen processors
    is rdna2 mcm like you're saying or not
    jeezus

    This is why I usually stay out of hype train threads. I always end up discussing things I'm not even sure are gonna exist.
     
  7. Denial

    Denial Ancient Guru

    Messages:
    14,206
    Likes Received:
    4,118
    GPU:
    EVGA RTX 3080
    Nothing that you linked shows what you said - that Windows (and thus applications) will see multiple GPUs as one large GPU, which is the part I'm disputing. The applications still need to be aware of the multiple GPUs, and the OS will still need to be able to schedule to them independently. It's not like you can just run a game across 3-4 RDNA2 GPUs and it will automagically scale like an MCM GPU setup would. It wouldn't scale at all unless the application was specifically coded to support it.

    As far as Nvidia not having something similar - this is not true at all. NVLink is basically identical to this - especially on IBM processors that have NVLink built directly into the processor (and thus can forgo PCI-E entirely). But even over PCI-E, NVLink has cache coherency support. AMD will obviously run PCI-E on its motherboards for non-IF configurations, so it will just utilize the IF protocol over PCI-E, which may offer some benefits but not the one you described (single GPU). I also think the reason why Nvidia wants a stake in ARM is to integrate NVLink in the same fashion, as more and more server CPUs are being built on ARM.

    Also the Intel equivalent (replacement for QPI) is apparently UPI: https://en.wikipedia.org/wiki/Intel_Ultra_Path_Interconnect
     
    Last edited: Aug 6, 2020
  8. PrMinisterGR

    PrMinisterGR Ancient Guru

    Messages:
    8,125
    Likes Received:
    969
    GPU:
    Inno3D RTX 3090
    The conclusion from the paper is interesting though.

    I mean, what they basically say is that it's worth it.
     
  9. Denial

    Denial Ancient Guru

    Messages:
    14,206
    Likes Received:
    4,118
    GPU:
    EVGA RTX 3080
    Yeah, it's absolutely worth it - it's just that it's a massive undertaking, a radical departure from the current configuration, and not likely to just randomly show up in RDNA2. It's clear that both AMD and Nvidia are laying the foundation to head that way, which is why AMD is starting to put IF in its GPUs. I just don't think you're going to see a full-blown MCM configuration with RDNA2. When it does come it will probably be in CDNA. Further, what @Fediuld is talking about - multiple monolithic GPUs being seen as a single GPU (with similar behavior to an MCM setup in terms of scheduling) - will probably never come. These new configurations with better protocols and interconnects may allow new techniques to help scaling in MGPU situations, but I don't believe they'll ever allow a non-MGPU-aware application to run across multiple GPUs and scale nearly flawlessly.
     
  10. PrMinisterGR

    PrMinisterGR Ancient Guru

    Messages:
    8,125
    Likes Received:
    969
    GPU:
    Inno3D RTX 3090
    Isn't AMD putting IF in its GPUs at least since Vega?
    I don't believe either that they will go MCM right now, but for gen 3-4, the newer nodes are practically begging for it. Don't forget that at this point they have many years of real experience for a system like that because of Zen, which is not a GPU, but even in the paper the MCM is compared to a NUMA approach, and it really depends on the driver, or a hardware controller. Making the OS/application see it as one is the least of their problems I think.
     

  11. Denial

    Denial Ancient Guru

    Messages:
    14,206
    Likes Received:
    4,118
    GPU:
    EVGA RTX 3080
    I know they did for the Vega APUs. I don't know about the desktop graphics cards and, if so, whether it was used internally or as a protocol over PCI-E.

    Making the OS/application see one GPU, in terms of "Hey, there is one GPU here", obviously isn't the issue. It's the scheduling that occurs afterward to get real scaling, right? For example, Nvidia/AMD can probably make two GPUs look like one and then just run AFR behind the scenes, but that's not going to be helpful in most situations, it's just a facade. I feel like when someone describes multiple GPUs as one, they are talking about automatically taking workloads and scaling the individual work, well, across multiple GPUs. According to Nvidia, doing this even at MCM latencies requires insane amounts of bandwidth. With multiple discrete monolithic GPUs the latency is 100s of times greater.

    Again I think IF in RDNA2 (over PCI-E), similarly to NVLink, will offer some new opportunities for MGPU scaling, I just don't think it's the panacea that's going to allow for scaling within 10% of a single massive monolithic GPU. I'm not sure that will ever come with mGPU (multiple, individual GPU) solutions.
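
A rough sketch of why AFR (alternate frame rendering) reads as a facade: frames are simply dealt out to the GPUs in turn, so throughput can rise while the time to render any single frame does not change. This is a toy model only, not how any driver implements AFR; the function names, the 20 ms frame time, and the assumption of perfect scaling with no inter-frame dependencies are all illustrative.

```python
def afr_schedule(num_frames, num_gpus):
    """Assign frame i to GPU (i % num_gpus), i.e. round-robin by frame."""
    return [frame % num_gpus for frame in range(num_frames)]


def afr_timing(frame_time_ms, num_gpus):
    """Idealised AFR timing (hypothetical, best case).

    Each GPU still needs frame_time_ms to render one whole frame, so
    per-frame latency is unchanged; only the rate at which finished
    frames appear scales with the GPU count.
    """
    throughput_fps = num_gpus * 1000.0 / frame_time_ms
    latency_ms = frame_time_ms
    return throughput_fps, latency_ms


if __name__ == "__main__":
    print(afr_schedule(6, 2))                          # which GPU renders each frame
    print(afr_timing(frame_time_ms=20.0, num_gpus=2))  # (frames per second, ms per frame)
```

In this model no single frame is ever split across GPUs, which is the difference from the "scale the individual work" case described above.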
     
  12. PrMinisterGR

    PrMinisterGR Ancient Guru

    Messages:
    8,125
    Likes Received:
    969
    GPU:
    Inno3D RTX 3090
    Wouldn't a latency like the one in Zen 2 be more than OK for a GPU? Aren't CPUs actually more latency sensitive? If anything the paper says that as long as you have enough data bandwidth the scaling is actually good (more than 40%). They even managed to get within 10% of the performance of a monolithic die, which is insane.
     
  13. Astyanax

    Astyanax Ancient Guru

    Messages:
    17,011
    Likes Received:
    7,353
    GPU:
    GTX 1080ti
    no
     
  14. PrMinisterGR

    PrMinisterGR Ancient Guru

    Messages:
    8,125
    Likes Received:
    969
    GPU:
    Inno3D RTX 3090
    I never understood this. The current meme is that GPUs care more about memory bandwidth, and CPUs about memory latency. If the latency is OK for a CPU, why wouldn't it be for a GPU? Do you have any sources on this? It sounds interesting.

    Also, the CCD-to-CCD latency in Zen is measured at around ~70ns in the worst-case scenario.
     
  15. Denial

    Denial Ancient Guru

    Messages:
    14,206
    Likes Received:
    4,118
    GPU:
    EVGA RTX 3080
    Define enough data bandwidth... To get within 10% of a monolithic die without the scheduling changes, it requires bandwidth several times greater than the newest interconnects have now. It seems to me that the workloads are just different, which makes it hard to compare to the requirements of a CPU MCM setup.

    But regardless, the guy I'm quoting is saying it's going to happen with two individual GPUs over PCI Express, which is obviously going to drive the latency up hundreds of times over the MCM setup as described in the Nvidia whitepaper.

    Like, if you're asking if MCM will be a thing - yeah, definitely, probably within the next few generations. But is it happening with RDNA2? Almost certainly not. Is it happening as described with two completely different GPUs? 100% no.
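
To make the latency point concrete, here is a minimal sketch using the usual "fixed latency plus size over bandwidth" model for one transfer. Every number in it is a placeholder: the 70 ns on-package figure borrows the Zen CCD-to-CCD latency quoted earlier in the thread (a CPU number, not a GPU one), the 100x multiplier stands in for "hundreds of times", and the 1000 GB/s bandwidth is made up and kept identical in both cases so that only latency differs.

```python
def transfer_time_ns(size_bytes, latency_ns, bandwidth_gb_per_s):
    """Time for one transfer: fixed latency plus size / bandwidth.

    1 GB/s equals 1 byte per nanosecond, so dividing bytes by GB/s
    gives nanoseconds directly.
    """
    return latency_ns + size_bytes / bandwidth_gb_per_s


ON_PACKAGE_LATENCY_NS = 70.0                            # placeholder, see lead-in
OFF_PACKAGE_LATENCY_NS = 100 * ON_PACKAGE_LATENCY_NS    # hypothetical multiplier
BANDWIDTH_GB_S = 1000.0                                 # hypothetical, same for both


def slowdown(size_bytes):
    """How much longer the off-package transfer takes than the on-package one."""
    near = transfer_time_ns(size_bytes, ON_PACKAGE_LATENCY_NS, BANDWIDTH_GB_S)
    far = transfer_time_ns(size_bytes, OFF_PACKAGE_LATENCY_NS, BANDWIDTH_GB_S)
    return far / near


if __name__ == "__main__":
    print(slowdown(64))           # small, cache-line-sized transfer
    print(slowdown(64_000_000))   # large bulk transfer
```

Under these assumptions the small transfer pays almost the whole latency multiplier while the bulk transfer barely notices it. The model only covers dependent, serialized transfers; GPUs hide a lot of latency by keeping many requests in flight, which this sketch ignores.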
     

  16. Astyanax

    Astyanax Ancient Guru

    Messages:
    17,011
    Likes Received:
    7,353
    GPU:
    GTX 1080ti
    Latency scales exponentially once you've gone beyond the CPU. You might have added only a couple of ns unit to unit, but there are thousands of units using an interface to fetch data that used to be routed around a corner or via a crossbar.

    The performance of this interface is already a problem for Ryzen at low clocks.
     
  17. PrMinisterGR

    PrMinisterGR Ancient Guru

    Messages:
    8,125
    Likes Received:
    969
    GPU:
    Inno3D RTX 3090
    This sounds a bit weird as a comparison, but hasn't HBM basically solved this problem? It has proven that if you have a good interposer and enough channels you can move terabytes of data around.

    I would guess it's more of a matter of die space for the fabric controller. There is also the idea of the Compute Dies + I/O die, which will homogenize access even more.
     
  18. PrMinisterGR

    PrMinisterGR Ancient Guru

    Messages:
    8,125
    Likes Received:
    969
    GPU:
    Inno3D RTX 3090
    But this latency would be within the GPU unit itself.
     
  19. Astyanax

    Astyanax Ancient Guru

    Messages:
    17,011
    Likes Received:
    7,353
    GPU:
    GTX 1080ti
    Except that's not how API rendering works. Communication back and forth between CPU and GPU happens millions of times a frame.

    I believe NVLink actually has more throughput than Infinity Fabric, and that's still not capable of memory-pooled rendering in real time.
     
  20. PrMinisterGR

    PrMinisterGR Ancient Guru

    Messages:
    8,125
    Likes Received:
    969
    GPU:
    Inno3D RTX 3090
    So basically anything below PCIe latency/bandwidth would be ok?

    In the NVIDIA paper that @Denial quoted they had no issues if the MCM bandwidth was around 1.5TB/sec. AMD already has a GPU that does 1TB/sec with its memory, which means they have a memory controller that can handle it.
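
For a sense of scale, the two figures quoted in this post can be put side by side with some simple arithmetic. This only compares the numbers as stated here; memory bandwidth and die-to-die link bandwidth are different things, so the ratio is not evidence that one can stand in for the other. The 60 fps target and the decimal units (1 TB = 1000 GB) are assumptions for illustration.

```python
MCM_LINK_TB_S = 1.5   # inter-chiplet figure as quoted in the post above
MEMORY_TB_S = 1.0     # GPU memory bandwidth figure as quoted in the post above


def fraction_of_target(available_tb_s, target_tb_s):
    """Share of the target bandwidth that the available bandwidth covers."""
    return available_tb_s / target_tb_s


def gb_per_frame(bandwidth_tb_s, fps):
    """Data that could move in one frame at the given rate (decimal units)."""
    return bandwidth_tb_s * 1000.0 / fps


if __name__ == "__main__":
    # About two thirds of the quoted link figure.
    print(fraction_of_target(MEMORY_TB_S, MCM_LINK_TB_S))
    # Per-frame budget at an assumed 60 fps.
    print(gb_per_frame(MCM_LINK_TB_S, 60))
```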
     
