Radeon Instinct MI25 specifications

Discussion in 'Frontpage news' started by Hilbert Hagedoorn, Jun 21, 2017.

  1. malitze

    malitze Active Member

    Messages:
    89
    Likes Received:
    0
    GPU:
    Sapphire Fury X
    I think so too; additionally, one could question whether the architecture benefits from higher frequencies enough to be worth the drawbacks that come with them. I don't think this is yield related, since the difference is just too big (~950 MHz vs. ~1.4 GHz). But we won't know until the Boss gets his hands on one of these babies ;)
     
    Last edited: Jun 21, 2017
  2. SHS

    SHS Master Guru

    Messages:
    502
    Likes Received:
    47
    GPU:
    Sapphire Vega 56
    That's not what I heard. Just going to 12nm is supposed to bring at least 25% lower power and a 7%+ speed boost versus 14nm technology. And besides, I said it could make a difference, not that it should.
    We know the next big thing is going to be 7nm, unless they end up shooting for 10nm instead, so we'll just have to wait and see how that goes. I'm pretty sure they're reaching the limit on how far they can shrink without losing long-term reliability, and I think we're coming close to the end of Moore's law.
    I don't want to hear "we created a few sample cores that are only used in labs"; it needs to be real-world tested or there's no point to it, and you also have to factor in the cost along with getting a good yield count.
     
  3. Denial

    Denial Ancient Guru

    Messages:
    14,206
    Likes Received:
    4,118
    GPU:
    EVGA RTX 3080
    They don't have the same memory interface. The Instinct MI25 is 2048-bit, the GV100 is 4096-bit.

    Those numbers are from TSMC's 12nm FFC node, which is a mobile node; it's also based on a 6T library with a 20% density improvement, while Nvidia is still using 7.5T - hence why I said there is no density difference. I'm sure there are some power savings with FFN, but without the density increase I can't imagine them being significant. Most of the power savings comes from the architecture changes - Nvidia claims a 50% perf/W increase on FP32.

    Also both companies are said to be skipping 10nm completely.
     
    Last edited: Jun 21, 2017
  4. Kaarme

    Kaarme Ancient Guru

    Messages:
    3,513
    Likes Received:
    2,355
    GPU:
    Nvidia 4070 FE
    I'll be damned. That certainly explains it. AMD is saving some money, but it makes sense considering the competition. I'm certain I've actually seen that in some tables before, but somehow my memory dismissed it since Fiji already had 4096.

    Thanks for pointing this out!
     

  5. SHS

    SHS Master Guru

    Messages:
    502
    Likes Received:
    47
    GPU:
    Sapphire Vega 56
    Just wait, I'm sure AMD did that on purpose, as in holding back and planning a refresh version with 4096 just to throw nVidia off its track, LoL. That would be a good one; after all, it always seems to be a cat and mouse game with those two.
     
  6. Truder

    Truder Ancient Guru

    Messages:
    2,392
    Likes Received:
    1,426
    GPU:
    RX 6700XT Nitro+
    What about the cache controller and the ability to access memory outside its own memory pool? I'm guessing that's the key feature difference? I don't know if nVidia's solution has something similar. If not, I guess that's why they doubled the bandwidth; if so... then I can't see Vega being as good against it, except in cost.
     
  7. SHS

    SHS Master Guru

    Messages:
    502
    Likes Received:
    47
    GPU:
    Sapphire Vega 56
    That I don't know, as there's no info on the cache and the rest of the die. On the rest, I agree.
     
  8. Denial

    Denial Ancient Guru

    Messages:
    14,206
    Likes Received:
    4,118
    GPU:
    EVGA RTX 3080
    CUDA has been able to address a unified memory pool since CUDA 6, which covered both Kepler and Maxwell. They improve its ability/performance each gen, and I don't know the performance characteristics compared to AMD's HBCC, but Nvidia definitely offers the functionality.

    That being said, the need for higher memory bandwidth on the GV100 is probably due to the tensor cores. Tensor operations are heavily dependent on memory bandwidth:

    https://www.extremetech.com/computi...u-makes-hash-intel-nvidia-inference-workloads
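    For anyone curious, a minimal sketch of what that looks like on the CUDA side, using cudaMallocManaged (the unified memory allocator available since CUDA 6); the kernel and sizes here are just made-up examples:

        #include <cstdio>
        #include <cuda_runtime.h>

        // Trivial kernel that increments every element; purely illustrative.
        __global__ void add_one(float *data, size_t n)
        {
            size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
            if (i < n) data[i] += 1.0f;
        }

        int main()
        {
            const size_t n = 1 << 20;
            float *data = nullptr;

            // A single allocation visible to both CPU and GPU; the runtime
            // migrates pages on demand instead of explicit cudaMemcpy calls.
            cudaMallocManaged(&data, n * sizeof(float));

            for (size_t i = 0; i < n; ++i) data[i] = 0.0f;           // touched on the host

            add_one<<<(unsigned)((n + 255) / 256), 256>>>(data, n);  // touched on the device
            cudaDeviceSynchronize();

            printf("data[0] = %f\n", data[0]);                       // read back on the host
            cudaFree(data);
            return 0;
        }

    How well the driver handles the page migration (and oversubscribing local VRAM) is where the real performance differences would show up, which is presumably where AMD is pitching HBCC.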
     
  9. Aura89

    Aura89 Ancient Guru

    Messages:
    8,413
    Likes Received:
    1,483
    GPU:
    -
    Could it be that Tesla uses 4 4GB HBM2 stacks while this chip uses 2 8GB stacks? I'm not entirely certain that's how it would work, but that's how I remember it working?

    "Retaining 1024-bit wide access, HBM2 is able to reach 256 GB/s memory bandwidth per package. The HBM2 spec allows up to 8 GB per package."

    Using 4 4GB HBM2 stacks (or packages) probably costs more than 2 8GB stacks (or packages).
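    Going by that quoted spec, the back-of-the-envelope peak numbers would work out roughly like this (spec maximums only; shipping cards clock their HBM2 lower, so real figures come in below these):

        #include <cstdio>

        int main()
        {
            // Per the quoted HBM2 spec: a 1024-bit interface per stack (package)
            // at up to 2 Gbit/s per pin -> 1024 * 2 / 8 = 256 GB/s per stack.
            const double per_stack_gbs = 1024 * 2.0 / 8.0;

            printf("2 stacks: %.0f GB/s peak\n", 2 * per_stack_gbs);  // 512 GB/s
            printf("4 stacks: %.0f GB/s peak\n", 4 * per_stack_gbs);  // 1024 GB/s
            return 0;
        }

    So halving the stack count also halves the peak bandwidth, which lines up with the 2048-bit vs. 4096-bit interface difference mentioned above.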
     
  10. Truder

    Truder Ancient Guru

    Messages:
    2,392
    Likes Received:
    1,426
    GPU:
    RX 6700XT Nitro+
    But that's just for CUDA operations, right? I'm not sure from what AMD has presented, but their controller seems to indicate it can access that memory for all or most operations handled on the GPU? I've no idea really; I'd really like to see more info on it but don't know where to look.

    There was that demo they showed of limiting the VRAM to 2GB and letting the controller intelligently load textures into memory.
     

  11. solo16

    solo16 Active Member

    Messages:
    72
    Likes Received:
    1
    GPU:
    7900XTX OC
    I'm curious about the double precision figure... isn't that supposed to be half of the single precision?
     
  12. ChicagoDave

    ChicagoDave Guest

    Messages:
    46
    Likes Received:
    2
    GPU:
    EVGA 1060 / EVGA 970
    Just wanted to point out that long-term reliability isn't really even a consideration when it comes to shrinking process nodes.

    The first issue is obviously just developing the physical method to make chips with features packed so closely together.

    Once you sort that out, the main roadblock we're running into is electron leakage. When you get to single-digit nanometer nodes, the wires are so close together that electrons can simply jump across to a neighboring one. That's a big problem and there's really no way around it at this point. It's strictly physics... at 7nm (not sure what the actual number is) and below, wires are close enough together that electrons just jump ship. They hop over to the next wire, which screws up both the transistor they were supposed to reach and the one they jumped to... not good. This is absolutely a show stopper below 5nm.

    Finally, as your chip gets smaller, the components "working" at any given time are all closer together than they previously were, producing a localized hotspot. Since the size of the actual chips doesn't change much, Intel/AMD just cram more transistors into the same area. At 25nm a 2"x2" chip held 10m transistors; five years later that same 2x2 area holds 30m transistors (making up these numbers). On the macro side, you have the chip as a whole holding an increasing number of transistors, which all produce heat during operation. On the micro side, the same 50 transistors that used to have 40nm of space between them now have 10nm of space between them. This produces localized heat that can be difficult to remove. Not sure how big of a problem this is, but again it will become more acute as we continue shrinking nodes.

    There are other issues as well; I just figured I'd list some of the major ones. AFAIK long-term reliability isn't a major concern at this time. We need to sort out 1 and 2 before worrying about the rest.
     
    Last edited: Jun 22, 2017
