Ampere's "doubled" shader units

Discussion in 'Videocards - NVIDIA GeForce' started by Tyrchlis, Oct 16, 2020.

  1. Tyrchlis

    Tyrchlis Member Guru

    Messages:
    159
    Likes Received:
    68
    GPU:
    PNY RTX 3090 XLR8
    I've had hands-on time with my 3090 for just over two weeks now, and while it is exactly what I expected and hoped for, and has already delivered some amazing ray-traced experiences, I'm fairly sure the "doubled" shader unit performance Nvidia has claimed is really beginning to take on the flavor of Intel's Hyper-Threading hype when it launched back in the Pentium 4 days. In short, I think it might almost be fair to say that what Nvidia has done to its shader units is loosely some kind of SMT setup that is just totally unoptimized right now.

    Intel made massive claims about HT that did not bear out in real-world testing when it launched. Most multithreaded software saw little benefit from the extra "thread" at the time, due to a lack of resources on the processor side and a lack of optimization for multithreading on a single core that could execute two threads in parallel (theoretically, back then...).

    It took years for HT to really catch on at all, and it's still iffy whether it helps or hinders from time to time, but optimizations and newer generations of the implementation eventually made HT legitimately useful overall, to some extent.

    With this in mind, wouldn't it be more accurate to ignore Nvidia's marketing-speak BS about a doubled shader count in Ampere and describe shaders the way CPU cores and threads are counted? Something like "Shader Cores" and "Shader Threads" (massively unoriginal, I know, open to better ideas).

    I'm seeing performance from the RTX 3090 closer to what I would expect from a 5248 shader unit GPU with a small bump in IPC per shader (exactly what was expected), compared to most of the posted 4352 unit RTX 2080 Ti benches out there. There is just nothing I have seen to indicate that 10496 shaders' worth of horsepower is present, unless per-shader output itself has dropped dramatically. That would be an unfortunate reason, but one I am open-minded to hearing if somebody has a good explanation.
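    Some rough napkin math to show what I mean (my own numbers, using approximate boost clocks, so treat the exact figures loosely):

Code:
# Back-of-envelope sketch: compare the "paper" FP32 throughput of the marketed
# 10496-shader RTX 3090 against treating it as a 5248-shader part, relative to
# a reference RTX 2080 Ti. Clock values are approximate boost clocks.

def fp32_tflops(shaders, boost_ghz):
    # Peak FP32 TFLOPS = shader count * 2 ops per clock (an FMA counts as 2) * clock
    return shaders * 2 * boost_ghz / 1000.0

tflops_2080ti = fp32_tflops(4352, 1.545)          # ~13.4 TFLOPS
tflops_3090_marketed = fp32_tflops(10496, 1.695)  # ~35.6 TFLOPS
tflops_3090_as_5248 = fp32_tflops(5248, 1.695)    # ~17.8 TFLOPS

print(f"3090 at 10496 shaders: {tflops_3090_marketed / tflops_2080ti:.2f}x the 2080 Ti on paper")
print(f"3090 treated as 5248:  {tflops_3090_as_5248 / tflops_2080ti:.2f}x the 2080 Ti on paper")

    The doubled count promises roughly 2.6x the 2080 Ti on paper, while treating it as 5248 units with a clock bump works out to about 1.3x, which is much closer to the kind of uplift I'm actually seeing in games.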

    DO NOT TAKE THIS AS ME WHINING! I love this card! I am not downing it or saying I feel in any way short-changed by it. I just feel there's a bit of BS hype from Nvidia that we have had to take as the official line, and it doesn't add up. We know corporations like Intel, Nvidia and AMD love to hype themselves and over-rate their tech, but this doubling thing feels like a bridge too far. I see doubled (and more) performance over my 1080 Ti, but that is irrelevant since that is a two-generation jump. The 2080 Ti is the relevant reference point, and the RTX 3090 just seems to be performing as expected for the single shader count (with the IPC enhancement), not the doubled one.
     
  2. Tyrchlis

    Tyrchlis Member Guru

    Messages:
    159
    Likes Received:
    68
    GPU:
    PNY RTX 3090 XLR8
    I should also add that if my hypothesis is right and Nvidia is implementing some form of SMT at the shader unit level, this could have been an opportunity to explain exactly that, instead of this big show of "doubled physical shader units!" that just patently doesn't bear out in actual real-world performance at all.

    Nvidia could get game devs on board with it as a feature and push it hard to get optimizations in place quicker, so it doesn't lose an entire generation to misleading information and the developer misunderstandings that result. If you implement SMT and want SMT-boosted performance, you have to tell devs SMT is there AND give them a toolkit to optimize for it. Intel mastered this, eventually. AMD learned from Intel's mistakes on the CPU side.

    Now Nvidia may have to figure out how that works. SMT doesn't mean an automatic doubled core count, but it can sure be marketed that way and end up producing something that looks a lot like what we're seeing now. Nvidia needs a fancy marketing name for it, like Turbo Shaders or something flashy to get devs excited :p
     
  3. teleguy

    teleguy Maha Guru

    Messages:
    1,316
    Likes Received:
    121
    GPU:
    GTX 1070/Vega 56
    IIRC, Turing's shaders contain a certain number of floating-point units and the same number of integer units. Ampere's shaders are pretty similar, except the integer units can now also double as floating-point units. So in theory Ampere has twice as many floating-point units as Turing, and that's why Nvidia counted each shader twice.

    Of course this is only true if the GPU doesn't have to use any of those units for integer operations. The actual performance of Ampere's shaders therefore depends heavily on the workload: in the best case they're twice as fast as Turing's shaders, in the worst case they perform the same, and in practice they land somewhere in between.

    However, maybe in the future games will be better optimized for the architecture by relying more heavily on floating-point operations.
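    A little toy model of how that plays out (my own simplification, not how the real scheduler works): treat a Turing SM as 64 FP32 units plus 64 INT32 units per clock, and an Ampere SM as 64 FP32 units plus 64 units that can do either FP32 or INT32.

Code:
# Toy throughput model: how much faster an Ampere-style SM is than a
# Turing-style SM as the integer share of the instruction mix changes.
# Purely illustrative, not Nvidia's actual scheduling.

def turing_cycles(fp_ops, int_ops):
    # Separate FP32 and INT32 paths: whichever has more work sets the time.
    return max(fp_ops / 64, int_ops / 64)

def ampere_cycles(fp_ops, int_ops):
    # The flexible path must absorb all INT work; spare slots run FP32,
    # so totals are capped at 128 ops/clock overall and 64 INT ops/clock.
    return max(int_ops / 64, (fp_ops + int_ops) / 128)

# IIRC Nvidia quoted roughly 36 INT instructions per 100 FP in games back in
# the Turing days, hence the 36 test point.
for int_per_100_fp in (0, 25, 36, 50, 100):
    fp, integer = 100.0, float(int_per_100_fp)
    speedup = turing_cycles(fp, integer) / ampere_cycles(fp, integer)
    print(f"{int_per_100_fp:3d} INT per 100 FP ops -> ~{speedup:.2f}x a same-size Turing SM")

    Pure FP32 comes out at 2x, an even FP/INT mix at 1x, and a typical game-like mix lands somewhere in the 1.3-1.6x range per shader, before any other bottlenecks come into play.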
     
    Last edited: Oct 16, 2020
    Tyrchlis likes this.
  4. Tyrchlis

    Tyrchlis Member Guru

    Messages:
    159
    Likes Received:
    68
    GPU:
    PNY RTX 3090 XLR8
    That makes sense, but then it also leaves a big hole in the entire claim of doubled shader units, when it really comes down to reworked internals on the existing shader units and a tad more IPC.

    It feels weird, then, to reference the RTX 3090 as a 10496 shader card when it's not. It really is a 5248 shader unit card with some interesting tweaks to the shader units themselves. GPU-Z and pretty much every other monitoring application counts shaders the new Nvidia way.

    I do love the RTX 3090, but I think not for the reasons Nvidia has tried to hype.
     

  5. Astyanax

    Astyanax Ancient Guru

    Messages:
    7,985
    Likes Received:
    2,667
    GPU:
    GTX 1080ti
    So there's a big misunderstanding here and throughout the PC community at large.

    Doubling the number of FP32-capable units does not mean double the performance, and that goes for any generation of hardware; if you've noticed, you didn't get twice as many frames going from 2560 shaders to 5120 either, because there's more involved. What it means is that the chip can do twice as many FP32 calculations as a card with the exact same architecture and half that number of units. You're still going to hit other bottlenecks in the traditional raster engine before you've gotten to cranking things up on the shaders.
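    To put rough numbers on that (made-up fractions, purely to illustrate the point): only the portion of the frame that is actually FP32-shader-limited gets faster when FP32 throughput doubles; the rest of the pipeline doesn't care.

Code:
# Amdahl-style sketch: overall frame-rate gain from doubling FP32 throughput,
# given how much of the frame time is shader-limited. Fractions are made up.

def frame_speedup(shader_fraction, shader_speedup=2.0):
    # shader_fraction: share of frame time limited by FP32 shading (0..1)
    new_time = shader_fraction / shader_speedup + (1.0 - shader_fraction)
    return 1.0 / new_time

for shader_fraction in (0.3, 0.5, 0.7, 0.9):
    print(f"{int(shader_fraction * 100)}% shader-bound -> {frame_speedup(shader_fraction):.2f}x overall")

    Even a heavily shader-bound frame only gets ~1.8x from the doubled FP32 rate, and a more typical mix lands well under 1.5x before you even account for the shared INT path.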

    Keep in mind that SMs can be dispatched independently; just because SM 1 is doing INT on path 2 doesn't mean SMs 2-72 have to as well, and the context penalties for switching these dual-mode units are negligible these days.
     
    jura11 and Tyrchlis like this.
