Asynchronous Compute

Discussion in 'Videocards - NVIDIA GeForce Drivers Section' started by Carfax, Feb 25, 2016.

  1. aufkrawall2

    aufkrawall2 Ancient Guru

    Messages:
    4,498
    Likes Received:
    1,874
    GPU:
    7800 XT Hellhound
    There are currently no fps gains to be expected when GPU-limited, apart from Async Compute (if correctly supported by the GPU).
    This is what developers say, and there is no DX12 application that runs any faster than it does with DX11 when GPU-limited on Nvidia. So I'm not speculating, I'm just describing empirical evidence.

    :stewpid:

    I don't know and I don't care. When BF3 was released, AMD was still stuck on its VLIW architecture, so that doesn't prove anything, and Battlefront still runs much faster on AMD relative to Nvidia than BF4 does.

    It's not a lie. You are mixing up CUDA and graphics. Nvidia has been able to run parallel queues for CUDA for a long time, but it can't run parallel graphics + compute queues on the GPU under DX12. Thus the driver needs to schedule Async Compute on the CPU, which explains the performance loss and stuttering seen in Ashes on Nvidia with DX12 async compute on vs. off.
    Be a bit more careful before calling someone a liar when you mix things up so easily.
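
    Rough sketch of what's being argued about (plain D3D12, assuming a device already exists): the API lets an application create a dedicated compute queue next to the graphics queue, but nothing in the API guarantees the two actually execute concurrently on the GPU - that part is up to the hardware and the driver.

    Code:
    // Minimal sketch: ask D3D12 for a graphics queue and a separate compute queue.
    // Whether work on the two queues overlaps on the GPU is hardware/driver dependent.
    #include <d3d12.h>
    #include <wrl/client.h>
    using Microsoft::WRL::ComPtr;

    void CreateQueues(ID3D12Device* device,
                      ComPtr<ID3D12CommandQueue>& graphicsQueue,
                      ComPtr<ID3D12CommandQueue>& computeQueue)
    {
        D3D12_COMMAND_QUEUE_DESC desc = {};
        desc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;   // graphics + compute + copy
        device->CreateCommandQueue(&desc, IID_PPV_ARGS(&graphicsQueue));

        desc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;  // compute + copy only
        device->CreateCommandQueue(&desc, IID_PPV_ARGS(&computeQueue));
        // The spec only says work from the two queues *may* overlap; a driver
        // that serializes the compute queue behind graphics is still conformant.
    }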


    Are console ports known for their technical sanity?


    I really think you are a Nvidia employee, as you sound exactly like their bull**** on Twitter, taunting their customers...

    Async Compute is always an advantage, like HTT on CPUs when enough threads are being executed. Just listen to some GDC presentations and educate yourself.
     
  2. Alessio1989

    Alessio1989 Ancient Guru

    Messages:
    2,952
    Likes Received:
    1,244
    GPU:
    .
    Let's take a small technical dive into the D3D12 specifics: there are actually two features that can be emulated by the driver if they are not supported by the hardware.

    The first is implicit and concerns the SRV range limit in root descriptor tables: if the hardware cannot directly address as many SRV descriptor tables as required, the driver can apply an extra, implicit constant offset into the descriptor heap. This applies to both tier 1 and tier 2 of resource binding. I am not sure which GPUs need this; probably none of the three major IHVs has a hardware limitation like that. The overhead added by this emulation is negligible, which is why it was allowed. This sort of trick cannot be applied to the other restrictions of the resource binding tiers. Last time I checked, MSDN still showed the old limit (5 SRV descriptor tables), though I have reported it several times.

    The second emulated feature is the ability to bypass the Geometry Shader stage and output viewport and render target array semantics from any shader stage (the infamous VPAndRTArrayIndexFromAnyShaderFeedingRasterizer flag). If the feature is not supported, the driver will create a dummy geometry shader on the fly. AFAIK older NV architectures do not support this feature.
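
    For reference, both of the points above are visible to applications through the standard caps query (a bare-bones sketch, names as documented on MSDN):

    Code:
    // Query the resource binding tier and whether SV_ViewportArrayIndex /
    // SV_RenderTargetArrayIndex can be written from any stage without the
    // driver injecting a dummy geometry shader.
    #include <d3d12.h>

    void QueryCaps(ID3D12Device* device)
    {
        D3D12_FEATURE_DATA_D3D12_OPTIONS opts = {};
        if (SUCCEEDED(device->CheckFeatureSupport(D3D12_FEATURE_D3D12_OPTIONS,
                                                  &opts, sizeof(opts))))
        {
            D3D12_RESOURCE_BINDING_TIER tier = opts.ResourceBindingTier;
            BOOL noGsEmulation =
                opts.VPAndRTArrayIndexFromAnyShaderFeedingRasterizerSupportedWithoutGSEmulation;
            (void)tier; (void)noGsEmulation; // a real engine would branch on these
        }
    }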

    It is not always an advantage. Running compute and graphics tasks in parallel, even on different hardware resources, still requires hardware and driver synchronization, and both have a performance cost. It is usually small, but it is not completely free. If the synchronization cost is greater than the benefit, you get a performance loss. The same applies to hyper-threading (though for different reasons it is even rarer to notice in most games and consumer software). Note that none of this justifies some of the issues in the current NV drivers.
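
    That synchronization cost shows up as explicit cross-queue fence waits, roughly like this (a sketch only; fence and command list creation omitted):

    Code:
    // Graphics work that consumes the async compute output cannot start until
    // the compute queue signals the fence - this sync point is where the
    // overhead can eat the benefit.
    void SubmitAsyncCompute(ID3D12CommandQueue* computeQueue,
                            ID3D12CommandQueue* graphicsQueue,
                            ID3D12CommandList*  computeWork,
                            ID3D12Fence* fence, UINT64& fenceValue)
    {
        computeQueue->ExecuteCommandLists(1, &computeWork);
        computeQueue->Signal(fence, ++fenceValue);   // compute done -> fence
        graphicsQueue->Wait(fence, fenceValue);      // GPU-side wait, no CPU stall
        // Anything submitted to graphicsQueue after this point is ordered
        // behind the compute work.
    }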
     
    Last edited: Apr 6, 2016
  3. dr_rus

    dr_rus Ancient Guru

    Messages:
    3,929
    Likes Received:
    1,044
    GPU:
    RTX 4090
    This is a lie as well. There are DX12 h/w features which may provide a performance gain compared to DX11 FL11_0/1/2. GCN doesn't support them, however, as they are grouped in the FL12_1 feature set. The same features can be accessed in DX11.3, so they're not DX12 exclusive - which is a plus.

    Yes, there is. It's those applications which use FL11_0 in DX11 and FL12_1 features in DX12.
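
    For context, the FL12_1 feature pack essentially amounts to conservative rasterization and rasterizer-ordered views; a minimal sketch of checking for them individually:

    Code:
    // Returns true if the FL12_1-defining features (ROVs + conservative
    // rasterization tier 1) are available on this device.
    #include <d3d12.h>

    bool HasFL12_1Features(ID3D12Device* device)
    {
        D3D12_FEATURE_DATA_D3D12_OPTIONS opts = {};
        if (FAILED(device->CheckFeatureSupport(D3D12_FEATURE_D3D12_OPTIONS,
                                               &opts, sizeof(opts))))
            return false;

        return opts.ROVsSupported &&
               opts.ConservativeRasterizationTier >= D3D12_CONSERVATIVE_RASTERIZATION_TIER_1;
    }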

    Great reply! Always great when a person goes for personal insults when there's nothing factual to reply with instead.

    It's pretty obvious that you don't know and don't care in general.

    I'm not mixing anything up and that is a lie. Additional compute queues can be submitted to the h/w in a serial fashion; this isn't an emulation of anything and it is perfectly within the DX12 spec.
    Be a bit more knowledgeable next time you decide to go around telling people what they should call whom.

    What do console ports have to do with it? A bad console port will perform badly on any h/w, GCN included, async compute or not. Cue GearsUE.

    I really think that you're an AMD employee as you sound exactly like their bull**** on Twitter and in async compute PR posts, ****ting into the brains of their customers with lies, FUD and disinformation...

    HTT isn't always an advantage, as any person who knows what 2+2 is should be aware. Async compute is an even smaller advantage because there are actually far fewer idle resources in a typical modern GPU than in a typical modern CPU, due to the natively parallel nature of the former. There's also a lot of in-queue concurrency happening at any given time, which makes the secondary queues what they really are - a hack helping AMD's poor graphics pipeline achieve higher GPU utilization. No other GPU on the market needs it - Maxwell, Intel, PowerVR are all fine without it. Chances are that this is how it will always be - until AMD fixes their graphics pipeline, making async compute pointless on their GPUs as well. Dixi.
     
  4. Alessio1989

    Alessio1989 Ancient Guru

    Messages:
    2,952
    Likes Received:
    1,244
    GPU:
    .
    dr_rus, it is not only about increasing performance or how good/bad the graphics pipeline of a hardware architecture is. It is also (and mostly) about application rendering pipeline flexibility and avoiding potential hardware/driver stalls, and the current NV hardware/driver combo could actually be a lot better at that.
     

  5. Yxskaft

    Yxskaft Maha Guru

    Messages:
    1,495
    Likes Received:
    124
    GPU:
    GTX Titan Sli
    I can imagine that it's no small engineering task, but getting 5-10% extra performance in DX12 games is no small thing.
    Nvidia having to compensate by making its cards 5-10% faster than AMD's equivalents would be no good for them.



    Alessio, do you think Vulkan is any better off than DX12 for Nvidia's current architectures? I'm wondering whether Nvidia could create its own async compute extension to get the boost it's currently not getting in DX12.
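
    For what it's worth, Vulkan exposes this without any vendor extension: async compute is just a second queue family, and an application can look for a compute-only family like this (a rough sketch; whether it maps to genuinely concurrent execution is still up to the driver):

    Code:
    // Find a compute-capable queue family that is not the graphics family.
    // Returns UINT32_MAX if the device only exposes a combined family.
    #include <vulkan/vulkan.h>
    #include <vector>
    #include <cstdint>

    uint32_t FindDedicatedComputeFamily(VkPhysicalDevice gpu)
    {
        uint32_t count = 0;
        vkGetPhysicalDeviceQueueFamilyProperties(gpu, &count, nullptr);
        std::vector<VkQueueFamilyProperties> families(count);
        vkGetPhysicalDeviceQueueFamilyProperties(gpu, &count, families.data());

        for (uint32_t i = 0; i < count; ++i) {
            VkQueueFlags flags = families[i].queueFlags;
            if ((flags & VK_QUEUE_COMPUTE_BIT) && !(flags & VK_QUEUE_GRAPHICS_BIT))
                return i;   // the usual "async compute" queue family
        }
        return UINT32_MAX;
    }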
     
  6. -Tj-

    -Tj- Ancient Guru

    Messages:
    18,103
    Likes Received:
    2,606
    GPU:
    3080TI iChill Black
    I think it will be once DirectX gets the SM 6.0 API update.
     
  7. Alessio1989

    Alessio1989 Ancient Guru

    Messages:
    2,952
    Likes Received:
    1,244
    GPU:
    .
    Too early to evaluate it, but Vulkan has the same potential as OpenGL to become a proprietary-extension clusterfùck.
    The D3D12 specification on multi-engine is not very strict about implementation, so it's all down to hardware or driver issues.
    SM 6.0 has nothing to do with multi-engine. But yes, more compute shader semantics will be exposed.
     
    Last edited: Apr 7, 2016
  8. dr_rus

    dr_rus Ancient Guru

    Messages:
    3,929
    Likes Received:
    1,044
    GPU:
    RTX 4090
    What is? I don't really get what you mean by "it" there.

    It would be great for all of us, as them compensating for this with a general +10% performance increase would mean that NV's cards are 10% faster everywhere, not just in DX12 + async renderers. And it is actually a small thing: 10% is 3 fps when the game is running at 30. The difference between 30 and 33 is hardly ground breaking.
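
    In frame-time terms the same arithmetic looks like this (a trivial sketch of the 30 vs 33 fps numbers above):

    Code:
    // 10% more fps at 30 fps is about 3 ms of frame time saved per frame.
    #include <cstdio>

    int main()
    {
        const double fps    = 30.0;
        const double gain   = 0.10;                           // the ~10% async figure
        const double before = 1000.0 / fps;                   // ~33.3 ms per frame
        const double after  = 1000.0 / (fps * (1.0 + gain));  // ~30.3 ms per frame
        std::printf("%.1f ms -> %.1f ms per frame\n", before, after);
        return 0;
    }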
     
  9. Yxskaft

    Yxskaft Maha Guru

    Messages:
    1,495
    Likes Received:
    124
    GPU:
    GTX Titan Sli

    You seem to believe it's easy as pie for Nvidia to simply improve general performance by 10% to compensate for DX12 + async on AMD's equivalents.
    If Nvidia hasn't made changes to Pascal to support it effectively, this general 10% improvement won't magically appear out of nowhere to help Pascal out. I'm not claiming async is easy to implement either, however.

    If the minimum FPS is generally around 30, 3 FPS extra does a lot to remove those 20-ish FPS dips.

    6 FPS extra at 60 FPS will also help keep the framerate above 60

    Same deal for the enthusiasts with 144 Hz monitors.
     
    Last edited: Apr 7, 2016
  10. Ieldra

    Ieldra Banned

    Messages:
    3,490
    Likes Received:
    0
    GPU:
    GTX 980Ti G1 1500/8000

    I agree with what you're saying, the thing is


    I'm gonna copy-paste a post I made on another forum.


    First things first: there is only one game in which we can isolate the gain from asynchronous shaders from the gain from DX12 itself - Ashes.

    In Ashes of the Singularity, a compute-bound game, there is absolutely no advantage for GCN with async vs Maxwell at FP32 parity. This is not debatable; I have proven this.

    Conversely, at FP32 parity, Nvidia hw has a performance advantage over AMD hw when async is off.

    Doom is a Vulkan title, and the developers have stated that they got significant gains out of async shaders on AMD hw.

    Despite these big gains, AMD cards perform about equal to their Nvidia counterparts, which do not make use of asynchronous shaders.

    Ergo async shaders provide no tangible benefits over NVIDIA hardware running without them.

    The 10% boost is relative to AMD hw running without async, not relative to Nvidia hw.
     
    Last edited: Apr 8, 2016

  11. Yxskaft

    Yxskaft Maha Guru

    Messages:
    1,495
    Likes Received:
    124
    GPU:
    GTX Titan Sli
    The discussion was about async possibly providing a 5-10% performance boost for AMD hardware, and ideally we'd want that to happen for Nvidia hardware as well.

    Yes, I'm aware that a percentage is a ratio. dr_rus said that a 10% performance increase for a game running at 30 FPS is just 3 FPS and claims that's too small to matter. IMO he's wrong to scoff at that improvement.

    My argument is that even 3 FPS can do a lot to eliminate or minimize the drops below 30 FPS, and the same holds if the game is running at 60 FPS, or for high-end gamers targeting 144 FPS. A 5-10% improvement is nothing to scoff at in any case.
     
    Last edited: Apr 7, 2016
  12. dr_rus

    dr_rus Ancient Guru

    Messages:
    3,929
    Likes Received:
    1,044
    GPU:
    RTX 4090
    Well, yeah. Last time I checked, Maxwell's energy efficiency was way more than 10% better than GCN's, so it is actually easy - to be precise, it has already happened, as there are almost zero stock-clocked Maxwell cards on the market; most of them are factory OCed.
    Granted, this isn't really relevant for the Pascal vs Polaris battle coming this year, but I think the changes in comparative power consumption won't be that big and the same argument will apply there as well.

    It can appear even on Maxwell, not "magically" but simply from NV's general power advantage. Async has nothing to do with this, really. NV will have to balance Pascal cards for modern workloads, and that means they'll have to make sure that Pascal cards aren't slower than Polaris/GCN cards at the same price points. As simple as that. The reason you're seeing NV cards losing today lies in the fact that there has been no GeForce lineup update since the launch of the 980 Ti a year ago.

    In all cases 10% isn't much. The same can be gained by bumping down some quality setting which you may not even notice as a difference in a game.
     
  13. Alessio1989

    Alessio1989 Ancient Guru

    Messages:
    2,952
    Likes Received:
    1,244
    GPU:
    .
    A waste of time and const char* indeed...
     
  14. dr_rus

    dr_rus Ancient Guru

    Messages:
    3,929
    Likes Received:
    1,044
    GPU:
    RTX 4090
    If that "it" was DX12 async compute queues then you're incorrect. It is mostly there to fill in the idle times of GCN's graphics pipeline. The rest of cases can be accomplished from the graphics queue or doesn't need to run in parallel to graphics at all - hence why NV's drivers+h/w combo is pretty good at this without any support for concurrent compute queue execution.

    Feel free not to waste any more time, and whatever the **** "const char*" means here.
     
  15. Stormyandcold

    Stormyandcold Ancient Guru

    Messages:
    5,872
    Likes Received:
    446
    GPU:
    RTX3080ti Founders
    I don't think that's entirely true. I've always thought we pay a slight premium for Nvidia products in exchange for better support. Nvidia still has a better track record for the "day one" experience of a game's release than AMD.

    Games like Gears of War, Hitman, The Division etc. were better/smoother on Nvidia at release. I don't see this changing, and it's entirely reasonable for Nvidia to continue with this pricing strategy if their support also continues. In this regard, AMD has no answer, even though these games were made for GCN-based hardware.
     

  16. CrazyGenio

    CrazyGenio Master Guru

    Messages:
    455
    Likes Received:
    39
    GPU:
    rtx 3090
  17. fantaskarsef

    fantaskarsef Ancient Guru

    Messages:
    15,750
    Likes Received:
    9,641
    GPU:
    4090@H2O
  18. CrazyGenio

    CrazyGenio Master Guru

    Messages:
    455
    Likes Received:
    39
    GPU:
    rtx 3090
    Early? 364.72 are the Game Ready drivers for Quantum Break, so I think there is nothing more to do.

    The game is performing well on AMD hardware, and it's from the Windows Store, so it's going to stay like that forever.
     
  19. fantaskarsef

    fantaskarsef Ancient Guru

    Messages:
    15,750
    Likes Received:
    9,641
    GPU:
    4090@H2O
    Early in the day for me :wanker: I didn't mean to comment on the drivers.

    Is that thing in the top left the fps counter? I'm not familiar with it tbh.
     
  20. XenthorX

    XenthorX Ancient Guru

    Messages:
    5,057
    Likes Received:
    3,435
    GPU:
    MSI 4090 Suprim X
