Asynchronous Compute

Discussion in 'Videocards - NVIDIA GeForce Drivers Section' started by Carfax, Feb 25, 2016.

  1. dr_rus

    dr_rus Ancient Guru

    Messages:
    2,984
    Likes Received:
    331
    GPU:
    RTX 2080 OC
    I've heard this point of view many times but the fact is - it's just plain wrong.

    Here are the results of the latest VR benchmark for example:

    [image: VR benchmark results]

    Fact is - NV's cards are very competitive with AMD's in price/perf, even if you disregard the more advanced software ecosystem, the better overall API compatibility (+FL12_1, +PhysX, +CUDA), and the significantly higher power requirements of AMD's cards.

    AMD has to sell much more advanced cards with twice the VRAM just to compete with what NV is offering - and how is that good for AMD, exactly? Even in this situation, where the choice between a 970 and a 390 seems to be an easy one, AMD is still losing the market. They need something that is plainly better, always and everywhere, like the 9700 Pro was, to turn the tide - and I don't see anything from them that actually can.
     
    Last edited: Feb 26, 2016
  2. Alessio1989

    Alessio1989 Maha Guru

    Messages:
    1,454
    Likes Received:
    249
    GPU:
    .
    I do, but that is just my guess (I do not have any documentation about AMD's implementation): graphics work involves different parts of the hardware, like the geometry and rasterizer stages, as well as the texture filtering units, while compute jobs do not (AFAIK). Setting up the pipeline for graphics work may take some time during which some parts of the GPU essentially do nothing. That time is perfect for running compute jobs (i.e. compute shaders) concurrently.
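    That idea can be sketched with a toy scheduling model (all numbers invented; this is not real GPU code, just arithmetic over hypothetical "shader-busy" and "shader-idle" intervals within a frame):

    ```python
    # Toy model: compute work can fill the gaps where fixed-function
    # graphics stages (geometry, rasterizer) leave the shader cores idle.
    # All timings below are made up purely for illustration.

    def serial_frame_time(graphics_jobs, compute_jobs):
        """Everything runs back-to-back on a single queue."""
        return sum(busy + idle for busy, idle in graphics_jobs) + sum(compute_jobs)

    def async_frame_time(graphics_jobs, compute_jobs):
        """Compute jobs run inside the idle gaps of graphics jobs first;
        whatever does not fit spills past the end of the graphics work."""
        idle_total = sum(idle for _, idle in graphics_jobs)
        graphics_total = sum(busy + idle for busy, idle in graphics_jobs)
        spill = max(0.0, sum(compute_jobs) - idle_total)
        return graphics_total + spill

    # (shader-busy ms, shader-idle ms) per graphics pass; idle time could be
    # e.g. while the geometry front-end or rasterizer is the bottleneck.
    graphics = [(3.0, 1.0), (4.0, 2.0), (2.0, 1.0)]
    compute = [1.5, 1.5]  # e.g. light culling, post-processing

    print(serial_frame_time(graphics, compute))  # 16.0 ms
    print(async_frame_time(graphics, compute))   # 13.0 ms
    ```

    The point of the toy model is only that the async total is bounded by the graphics total as long as the compute work fits inside the idle gaps - which is exactly the "free" concurrency being described.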
     
    Last edited: Feb 26, 2016
  3. PrMinisterGR

    PrMinisterGR Ancient Guru

    Messages:
    7,005
    Likes Received:
    139
    GPU:
    Sapphire 7970 Quadrobake
    Nope, it clearly states that compute commands cannot be issued within draw calls without big penalties. Meaning that the card has to finish the draw command before it takes compute tasks. That's it.
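    A toy illustration of that reading (all timings invented, not real GPU behavior): if compute can only be accepted at draw-call boundaries and every graphics-to-compute transition costs a flush, then interleaving compute finely between draws gets expensive fast.

    ```python
    # Toy model (invented numbers): compute can only start at draw-call
    # boundaries, and each graphics<->compute switch costs a pipeline flush.

    SWITCH_COST = 0.5  # ms per transition, made up for illustration

    def frame_time_with_switches(draws, compute_after_each_draw):
        """draws[i] is draw time in ms; compute_after_each_draw[i] is
        compute work (ms) that must wait for draw i to finish first."""
        total = 0.0
        for draw, compute in zip(draws, compute_after_each_draw):
            total += draw
            if compute > 0:
                total += 2 * SWITCH_COST + compute  # switch in and back out
        return total

    draws = [2.0, 2.0, 2.0, 2.0]
    # A little compute interleaved after every draw:
    fine = frame_time_with_switches(draws, [0.5, 0.5, 0.5, 0.5])
    # The same total compute batched after the last draw:
    coarse = frame_time_with_switches(draws, [0.0, 0.0, 0.0, 2.0])
    print(fine, coarse)  # 14.0 11.0
    ```

    Same total work in both cases; the difference is purely the number of graphics/compute transitions the hardware is forced to pay for.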

    What you posted shows performance per dollar, and in the top five, three of the cards are AMD. Let's not even mention that this is a synthetic benchmark and doesn't take into account things like VRAM size, which DOES matter in purchasing decisions. I'm sure that if any rational person had the choice between the 390 and the 970, they would get the 390, no questions asked.

    FL12_1 doesn't matter, especially for Maxwell cards, which aren't getting anything extra from DX12 performance-wise. The rest are proprietary things that require investment in the NVIDIA ecosystem. CUDA is nice, but OpenCL is catching up, and it's used everywhere by everyone. Same for PhysX: engines like Havok, or even PhysX itself, have to run properly on a variety of hardware, otherwise nobody will use them.

    I'm not sure that AMD will be losing the market any time soon. They have actually shown signs of recovery since the introduction of the 300 series, and I expect their Q4 2015 results to be even better. This response also misses the point.

    The point being that NVIDIA chose to go with less complex scheduling hardware to get more performance per watt, and now they see no performance benefit from a lower-level API, because their driver basically fulfills that role for most games. Why is that so hard to grasp? Look at the things I paraphrased from Anandtech. NVIDIA said so themselves in the Maxwell architecture presentation to websites. They explicitly said that the Maxwell scheduling hardware is inferior, and that it was a conscious design choice. Why does everybody pretend not to have read it?


    I'll post it again here:
    How hard is it to get?
     
  4. PrMinisterGR

    PrMinisterGR Ancient Guru

    Messages:
    7,005
    Likes Received:
    139
    GPU:
    Sapphire 7970 Quadrobake
    Thanks for the explanation Alessio.
     

  5. -Tj-

    -Tj- Ancient Guru

    Messages:
    16,506
    Likes Received:
    1,538
    GPU:
    Zotac GTX980Ti OC
    Most of the stuff you quoted as mine wasn't my text; fix it if you want to quote multiple sentences and debate with the person who posted them. Just FYI.

    And I'll post this again since you seem to ignore it



    source: http://ext3h.makegames.de/DX12_Compute.html
     
  6. PrMinisterGR

    PrMinisterGR Ancient Guru

    Messages:
    7,005
    Likes Received:
    139
    GPU:
    Sapphire 7970 Quadrobake
    I'll keep just this:

    You realize that they basically admit their hardware can't really handle DX12, right? I also love (as you do) that everybody pretends to be blind to the architecture posts on Maxwell that I'm quoting, in which they said themselves that the scheduling hardware is inferior.
     
  7. CalinTM

    CalinTM Ancient Guru

    Messages:
    1,550
    Likes Received:
    4
    GPU:
    MSi GTX980 GAMING 1531mhz
    Can't handle DX12? Too bad, buy Pascal. The R9 200/300/Fury series will be low end by the time some proper DX12 games come out.

    What's all the fuss about? NVIDIA has their 80% market share because of this: making people buy new stuff. They didn't include those DX12 things in hardware on purpose. A company's objective is making money... it's normal.
     
  8. PrMinisterGR

    PrMinisterGR Ancient Guru

    Messages:
    7,005
    Likes Received:
    139
    GPU:
    Sapphire 7970 Quadrobake
    No, they made an architectural compromise. It gave them a year and a half of almost complete market domination. It was a good choice. Not expecting to get more from DX12 doesn't mean that the cards are bad.
     
  9. -Tj-

    -Tj- Ancient Guru

    Messages:
    16,506
    Likes Received:
    1,538
    GPU:
    Zotac GTX980Ti OC
    I will judge whether it supports DX12 or not when I'm playing some proper DX12 games, and I'll let you know then if it runs OK.


    CUDA is CUDA; if they say to use async at the CUDA level instead of the DX12 API level, that doesn't mean it's not DX12 capable. Running in CUDA is faster anyway, far more direct to the metal than the DirectX API will ever be.

    Now interpret this how you wish ;)
     
  10. Singleton99

    Singleton99 Maha Guru

    Messages:
    1,027
    Likes Received:
    68
    GPU:
    Aorus-Extreme-1080TI
    So with all this said, what's the future looking like for my 980 Tis? Once DX12 is used more and more, will this generation of Maxwell become obsolete very quickly? Should we sell our cards now while they're worth something and hold onto the money? I do hope this isn't the case, as when I got these cards I wanted at least 3 years out of them.

    Are we as NVIDIA customers doomed by the introduction of DX12 and asynchronous compute?

    As you can tell, all this goes over my head a lot, but I'm starting to learn, or trying to :pc1:
     
    Last edited: Feb 26, 2016

  11. Alessio1989

    Alessio1989 Maha Guru

    Messages:
    1,454
    Likes Received:
    249
    GPU:
    .
    I still do not see why they would not be able to bring a better dispatcher to the D3D12 driver too, like the one in the CUDA driver... Or maybe I am simply unaware of some implementation details...
     
  12. Carfax

    Carfax Ancient Guru

    Messages:
    2,671
    Likes Received:
    310
    GPU:
    NVidia Titan Xp
    That's pre-Maxwell v2 only. With Maxwell v2, the GPU can do 1 graphics plus 31 compute tasks in mixed mode, or 32 compute tasks in compute mode.

    This information comes straight from Anandtech as well, since you like to quote them a lot :)

    Source

    I'm actually subscribed to that thread, so I'm familiar with it. The guy who wrote that benchmark/test app was a novice programmer. Also, asynchronous compute wasn't enabled in NVidia's GPU drivers at the time.

    We'll have to wait until AC is fully enabled before we can come to any final conclusions.

    That's pre-Maxwell v2 only. That limitation is lifted for Maxwell v2.

    Why do you think Maxwell v2 is so much faster than Kepler with PhysX workloads? Because it can execute both graphics and compute in parallel.

    This has already been addressed by TJ and dr_rus. The Tesla being a Kepler based variant has nothing to do with the supposed lack of a hardware scheduler..

    Here's the thing though. One other DX12 benchmark (Fable Legends) shows no discrepancy with NVidia.. In fact, the GTX 980 Ti is faster than the Fury X in that benchmark, and this is using reference cards.

    With aftermarket cards, the gap would be even larger. Also Fable Legends uses asynchronous compute for the dynamic global illumination..

    [images: Fable Legends benchmark results]
     
  13. fellix

    fellix Member Guru

    Messages:
    172
    Likes Received:
    12
    GPU:
    KFA² GTX 1080 Ti
    The warp scheduling in Maxwell is indeed simplified, because it doesn't need to be complex anymore: the purpose of the new SMM layout is to boost the perf/W ratio with a more balanced architecture. Naturally, a big part of the downsizing was the significant reduction in the number of FP64 units compared to Kepler. As compensation, many improvements to the memory pipeline were made in Maxwell, like the doubling of the LDS size and a general streamlining of the data caching routines.

    The only advantage of Kepler (GK110 and GK210) is the high FP64 throughput and the official HPC validation for the Tesla line of SKUs.
     
  14. Carfax

    Carfax Ancient Guru

    Messages:
    2,671
    Likes Received:
    310
    GPU:
    NVidia Titan Xp
    It's probably being developed as we speak. These things take time.

    I remember it took NVidia about two years to come up with a working DX11 multithreading driver, something which AMD still lacks.

    Knowing NVidia, they are waiting until it's as great as they can make it before they release it. Since no final release DX12 game is available yet, they are not really under any pressure to release it before it's ready.
     
  15. -Tj-

    -Tj- Ancient Guru

    Messages:
    16,506
    Likes Received:
    1,538
    GPU:
    Zotac GTX980Ti OC
    I checked with AIDA64 and it has 2 async engines; dunno though whether that's for the chip as a whole or per block..

    CUDA
    [image: AIDA64 CUDA properties]


    OpenCL
    [image: AIDA64 OpenCL properties]

    I see this driver reports OpenCL 2.0 at 62%; maybe the full profile also adds some async parts, and they're also "waiting" (optimizing) for that before making it complete?
     

  16. PrMinisterGR

    PrMinisterGR Ancient Guru

    Messages:
    7,005
    Likes Received:
    139
    GPU:
    Sapphire 7970 Quadrobake
    If you want my bet, your cards will perform roughly the same under DX12 as under DX11. The only "problem" is the comparison with AMD cards, because NVIDIA users expect a similar performance uplift from them, which is completely unreasonable in my opinion.
     
  17. Barry J

    Barry J Ancient Guru

    Messages:
    2,770
    Likes Received:
    122
    GPU:
    RTX2080 TRIO Super
    I agree. NVidia cards are very well optimised in DX11, so if the hardware is already being used to almost its maximum, DX12 will only give a slight improvement or none at all. AMD has huge room for improvement due to poor DX11 utilisation.
     
  18. fellix

    fellix Member Guru

    Messages:
    172
    Likes Received:
    12
    GPU:
    KFA² GTX 1080 Ti
    That feature could be referring to the GPU's ability to use the PCIe bus for full-duplex (bi-directional) data transfers. For that, the GPU has to have two independent interface controllers (engines) to service async requests.
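    As a back-of-the-envelope sketch of why two copy engines would matter (the bandwidth figure below is a made-up round number, not a measured one):

    ```python
    # Toy arithmetic: with two independent copy engines, upload and readback
    # can overlap because the PCIe link is full-duplex; with one engine the
    # two transfers serialize. Bandwidth number is invented for illustration.

    PCIE_GBPS = 8.0  # one-direction bandwidth in GB/s (made-up round figure)

    def transfer_time(upload_gb, readback_gb, engines):
        if engines == 1:
            # Single engine: transfers queue up one after the other
            return (upload_gb + readback_gb) / PCIE_GBPS
        # Two engines: directions overlap; time is set by the larger transfer
        return max(upload_gb, readback_gb) / PCIE_GBPS

    print(transfer_time(1.0, 1.0, engines=1))  # 0.25 s
    print(transfer_time(1.0, 1.0, engines=2))  # 0.125 s
    ```

    In the balanced case the second engine halves the total transfer time, which is the whole argument for exposing them as separate async engines.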
     
  19. Yxskaft

    Yxskaft Maha Guru

    Messages:
    1,371
    Likes Received:
    82
    GPU:
    GTX Titan Sli
    Shouldn't Nvidia be able to do that through NVAPI?
     
  20. PrMinisterGR

    PrMinisterGR Ancient Guru

    Messages:
    7,005
    Likes Received:
    139
    GPU:
    Sapphire 7970 Quadrobake
    So they're ignoring DX12, as they actually recommend for heavy workloads in their documentation? Doesn't that completely negate the point of a common API?
     
