Asynchronous Compute

Discussion in 'Videocards - NVIDIA GeForce Drivers Section' started by Carfax, Feb 25, 2016.

  1. Darren Hodgson

    Darren Hodgson Ancient Guru

    Messages:
    17,222
    Likes Received:
    1,541
    GPU:
    NVIDIA RTX 4080 FE
    I was shocked to see the Async Compute option in Gears of War 4, but even if it only adds a handful of frames per second to the game, it's still nice to have. Also, the game runs like a dream, looks great and is jam-packed with every option you could ever need. The developers should be commended for this IMO, as so many PC multiplatform games seem like afterthoughts with barebones options and lacklustre performance/optimization.
     
  2. dr_rus

    dr_rus Ancient Guru

    Messages:
    3,930
    Likes Received:
    1,044
    GPU:
    RTX 4090
    And the reason for this is, partially, at least: https://blogs.nvidia.com/blog/2016/10/07/dx12-gears-of-war-4/

    It's actually kind of an interesting result, with NV's title working well on all h/w again just a month after AMD's title (DXMD) ran like **** on NV's h/w even in DX11, let alone DX12, don't you think?
     
  3. Stormyandcold

    Stormyandcold Ancient Guru

    Messages:
    5,872
    Likes Received:
    446
    GPU:
    RTX3080ti Founders
  4. PrMinisterGR

    PrMinisterGR Ancient Guru

    Messages:
    8,129
    Likes Received:
    971
    GPU:
    Inno3D RTX 3090
    Mafia III is an NVIDIA title and runs like dogsh*t. Some would say that it's the developer that actually matters, not the sponsoring company.
     

  5. dr_rus

    dr_rus Ancient Guru

    Messages:
    3,930
    Likes Received:
    1,044
    GPU:
    RTX 4090
    It runs like dog **** on everything, and it's actually running a bit better on AMD cards. It's also DX11, so I don't see how it's relevant to this thread. I think it's time you noticed that there are Nvidia titles and Nvidia titles, and some of them are actually just games which were tested by NV for compatibility and nothing more.
     
  6. Denial

    Denial Ancient Guru

    Messages:
    14,207
    Likes Received:
    4,121
    GPU:
    EVGA RTX 3080
    http://www.anandtech.com/show/10325/the-nvidia-geforce-gtx-1080-and-1070-founders-edition-review/10

    Read pages 10 and 11.
     
  7. dr_rus

    dr_rus Ancient Guru

    Messages:
    3,930
    Likes Received:
    1,044
    GPU:
    RTX 4090
    All of this is just completely wrong, starting with "dated and legacy preemption" and ending with "fully h/w based async". Please read at least something from this thread before posting such bull****.
     
  8. Stormyandcold

    Stormyandcold Ancient Guru

    Messages:
    5,872
    Likes Received:
    446
    GPU:
    RTX3080ti Founders
    It looks like the deal's off. M3 wasn't a "way it's meant to be played" game as far as I'm aware. It must be so s**t that it has been removed from GeForce.com now. Nvidia must've known that it performed like a turd, because M3 received little to no advertising on Nvidia sites.

    Even Nvidia's Facebook page only had three posts: one for system requirements and two about the game that were actually just shared links.

    EDIT: I've asked both the Nvidia and M3 Facebook pages to confirm whether M3 is an Nvidia game or not; no answer yet. Both parties are silent on the issue.
     
    Last edited: Oct 12, 2016
  9. Redemption80

    Redemption80 Guest

    Messages:
    18,491
    Likes Received:
    267
    GPU:
    GALAX 970/ASUS 970
    Yeah, looks like it was tested by Nvidia purely to get settings for something like GFE and that was it.

    The fact that neither Nvidia nor AMD was involved with Mafia 3 is probably why it's so bad.

    I have to laugh at the post above that is still claiming async compute is magic and can bring huge performance increases out of thin air.
     
  10. Redemption80

    Redemption80 Guest

    Messages:
    18,491
    Likes Received:
    267
    GPU:
    GALAX 970/ASUS 970
    The technical side has already been explained in this thread.

    Personally, I like my dumbed-down, layman's way of looking at async compute:
    it's pretty much just a way of making better/more efficient use of underutilised GPU hardware.

    Since AMD hardware is underutilised, async brings nice performance gains, but since Nvidia hardware is already much better utilised/more efficient, it gains much less.

    I've been informed that, depending on the engine, this may not always be the case, but I'd bet it has been the case in every game so far. The idea that "hardware async" makes GPUs run at 110-120% is a strange one; the logic seems to be getting lost.
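
    To make that layman's picture a bit more concrete, here's a minimal sketch of the "independent queues that the GPU may overlap" idea. It uses CUDA streams rather than the D3D12 graphics + compute queues the thread is actually about, and the kernel names and sizes are made up for illustration, so treat it as a rough analogy rather than a claim about how any game does it:

    Code:
    #include <cstdio>
    #include <cuda_runtime.h>

    // Stand-in for "graphics-like" work: memory-bound streaming.
    __global__ void memoryBoundWork(float* out, const float* in, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) out[i] = in[i] * 2.0f;
    }

    // Stand-in for "compute-like" work: ALU-heavy loop.
    __global__ void computeBoundWork(float* data, int n, int iters) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) {
            float v = data[i];
            for (int k = 0; k < iters; ++k) v = v * 1.000001f + 0.5f;
            data[i] = v;
        }
    }

    int main() {
        const int n = 1 << 20;
        float *a, *b, *c;
        cudaMalloc(&a, n * sizeof(float));
        cudaMalloc(&b, n * sizeof(float));
        cudaMalloc(&c, n * sizeof(float));

        // Two independent streams: work submitted to different streams has no
        // ordering guarantee between them, so the scheduler is free to overlap
        // it whenever execution units are idle, which is the rough analogue of
        // an "async compute" queue sitting next to the graphics queue.
        cudaStream_t s0, s1;
        cudaStreamCreate(&s0);
        cudaStreamCreate(&s1);

        memoryBoundWork<<<(n + 255) / 256, 256, 0, s0>>>(b, a, n);
        computeBoundWork<<<(n + 255) / 256, 256, 0, s1>>>(c, n, 4096);

        cudaDeviceSynchronize();
        printf("both kernels done (any overlap is up to the scheduler)\n");

        cudaStreamDestroy(s0);
        cudaStreamDestroy(s1);
        cudaFree(a); cudaFree(b); cudaFree(c);
        return 0;
    }

    Whether the overlap actually happens, and whether it helps, depends entirely on how much of the GPU the first workload leaves idle, which is exactly the point being argued here.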
     
    Last edited: Oct 12, 2016

  11. Redemption80

    Redemption80 Guest

    Messages:
    18,491
    Likes Received:
    267
    GPU:
    GALAX 970/ASUS 970
    It helps on AMD hardware, which has 5-15% of its hardware sitting underutilised; not all hardware has that breathing space/weakness.
     
  12. PrMinisterGR

    PrMinisterGR Ancient Guru

    Messages:
    8,129
    Likes Received:
    971
    GPU:
    Inno3D RTX 3090
    This only reinforces the opinion that neither AMD nor NVIDIA endorsements really matter; what matters is what the developers do.
     
  13. Denial

    Denial Ancient Guru

    Messages:
    14,207
    Likes Received:
    4,121
    GPU:
    EVGA RTX 3080
    Yeah - in one of the interviews a Coalition dev said Epic had a big hand in bringing the game up on UE4, which is most likely the reason why it runs so well. They probably optimized the hell out of it, as it's the first really big AAA title on the engine.
     
  14. Stormyandcold

    Stormyandcold Ancient Guru

    Messages:
    5,872
    Likes Received:
    446
    GPU:
    RTX3080ti Founders
    There's AAA, then there's wannabes.

    There's also this article for GOW4: https://blogs.nvidia.com/blog/2016/10/07/dx12-gears-of-war-4/

    I'm of the opinion that there are games that have hands-on engineers from vendors who help work on them and make them run better. Then there are games that just want the "branding", which is what I believe M3 is.
     
  15. dr_rus

    dr_rus Ancient Guru

    Messages:
    3,930
    Likes Received:
    1,044
    GPU:
    RTX 4090
    Again? I've said it several times already in this thread. Preemption is a technique which makes multitasking possible, so unless you want everything to run in order, you need it, and because of that there's nothing "legacy" about it. Pascal's way of handling pre-emption at the pixel/instruction level is actually the best there is in GPUs at the moment; GCN is behind on this now.

    As for pre-emption having nothing to do with async compute: async compute runs "concurrently" only when it runs on a dedicated SM partition, which is precisely the way Pascal handles it (and GCN3+Polaris got the ability as well, although I think it's not really enabled in general). There's no pre-emption of any kind in play here, as the different contexts run on different execution units; this is very much like a multicore CPU handling several contexts at the same time.

    GCN's way of running compute wavefronts on the same CUs which are running graphics at the same time is nice, as it improves the overall utilization of the CU's execution units - but one thing to understand here is that the actual execution is still happening serially, as a SIMD can't process two wavefronts per clock. So when a graphics wave is running, the compute wave is waiting for its turn, and vice versa. This execution is "concurrent" only in the scheduling part, not in the actual processing of wavefronts.

    As for NV h/w getting "massive gains in performance" once it can run compute warps on the same SMs as graphics warps - this won't happen; you're looking at +5% on top of Pascal at best (much less is more likely), because there aren't many idle units in NV's SMs when running graphics, partially because there's already a lot of scheduling-level concurrency going on in NV's SM without any async compute.

    And async compute gains will go down as h/w progresses further, not up, both for AMD and NV. Both NV and AMD are handling async compute "in hardware"; there is no other way of doing this.
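
    For the "preemption is about multitasking, not free performance" point, here's a minimal sketch of the distinction using CUDA stream priorities (not D3D12, and the kernel names are made up): priority decides what the scheduler prefers to issue next, and finer-grained preemption (the Pascal improvement mentioned above) decides how quickly running work can be interrupted, but neither of those adds throughput the way filling idle units would:

    Code:
    #include <cstdio>
    #include <cuda_runtime.h>

    // Long-running "background" work, e.g. a heavy compute pass.
    __global__ void longBackgroundWork(float* data, int n, int iters) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) {
            float v = data[i];
            for (int k = 0; k < iters; ++k) v = v * 1.0000001f + 1.0f;
            data[i] = v;
        }
    }

    // Short, latency-sensitive work, e.g. something needed for the next frame.
    __global__ void latencySensitiveWork(float* data, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] += 1.0f;
    }

    int main() {
        const int n = 1 << 22;
        float* buf;
        cudaMalloc(&buf, n * sizeof(float));

        // CUDA exposes a priority range; numerically lower means higher priority.
        int leastPrio, greatestPrio;
        cudaDeviceGetStreamPriorityRange(&leastPrio, &greatestPrio);

        cudaStream_t lowPrio, highPrio;
        cudaStreamCreateWithPriority(&lowPrio, cudaStreamNonBlocking, leastPrio);
        cudaStreamCreateWithPriority(&highPrio, cudaStreamNonBlocking, greatestPrio);

        // Background work goes to the low-priority stream...
        longBackgroundWork<<<(n + 255) / 256, 256, 0, lowPrio>>>(buf, n, 20000);
        // ...and the short kernel to the high-priority one. The scheduler
        // favours it when choosing what to issue next; how aggressively the
        // already-running work can be interrupted is the preemption
        // granularity discussed above.
        latencySensitiveWork<<<(n + 255) / 256, 256, 0, highPrio>>>(buf, n);

        cudaDeviceSynchronize();
        printf("priority range: least=%d, greatest=%d\n", leastPrio, greatestPrio);

        cudaStreamDestroy(lowPrio);
        cudaStreamDestroy(highPrio);
        cudaFree(buf);
        return 0;
    }

    The short kernel getting serviced sooner doesn't make the GPU produce more work per second overall; it just changes the ordering, which is why preemption and async compute gains are separate topics.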
     

  16. dr_rus

    dr_rus Ancient Guru

    Messages:
    3,930
    Likes Received:
    1,044
    GPU:
    RTX 4090
    AMD's gains in the new APIs are mostly because AMD's drivers for the old APIs suck donkey balls. So instead of improving their driver they said "**** that, let the devs handle all this ****, we can't be bothered". The results speak for themselves really, with most devs actually unable to provide as efficient a solution as NV has in its drivers, and thus falling behind in the new APIs on NV h/w. Async compute's general contribution to this is small, around +5% on average even on AMD's h/w. The bulk of the gain comes from better resource and CPU management than what AMD has in its driver.

    NV can't improve pre-emption much further because there's not much room to improve it beyond Pascal. Volta will most likely add the ability to run compute warps on the same SMs which are already running graphics, but in NV's case this is unlikely to lead to significant performance gains, because the issue in AMD's GCN architecture which does lead to these gains is simply absent from NV's h/w. Basically, there's not much left to utilize with compute in NV's SM when it's already doing graphics. Some corner cases will certainly benefit - most likely the same ones which already benefit from Pascal's async implementation, though. So the overall gain over Pascal will probably be very small.
     
  17. PrMinisterGR

    PrMinisterGR Ancient Guru

    Messages:
    8,129
    Likes Received:
    971
    GPU:
    Inno3D RTX 3090
    So you mean that there is concurrency at the CU level, and not at the SIMD level, right? A GCN CU has four SIMDs in it, so it could basically run four different things at once, per clock. A GCN SIMD has to finish its current task before grabbing another, so there is no concurrency at the SIMD level.

    On Pascal the concurrency exists at the GPU level, as each SM can do one thing until you switch it (which you can actually do outside of draw call boundaries on Pascal), right? So GCN offers more fine-grained thread/job control, but it pays the price for that by having more ALUs that stay idle.

    There is also a nice perspective from Anandtech's Ryan Smith:

     
  18. pharma

    pharma Ancient Guru

    Messages:
    2,496
    Likes Received:
    1,197
    GPU:
    Asus Strix GTX 1080
    https://forum.beyond3d.com/posts/1939976/
     
  19. dr_rus

    dr_rus Ancient Guru

    Messages:
    3,930
    Likes Received:
    1,044
    GPU:
    RTX 4090
    Yes, you could say that GCN's "granularity" of assigning execution units to different contexts is finer than Pascal's. But this wouldn't really be an advantage for NV's h/w, as NV's h/w doesn't have as many (state change) idle bubbles in graphics execution on the SMs, and because of this there's little gain in assigning (stateless) compute warps to the same SMs as graphics. This "granularity" choice is a conscious one in both architectures. What results in a possible performance gain for GCN would not result in the same gain on Pascal.

    The biggest plus of GCN's approach for NV's h/w would be the generalization of execution scheduling, which should make it simpler for a driver+h/w combo to run mixed-context workloads, resulting in, potentially, a simpler driver and more universally robust h/w execution. But it's unlikely that GCN's approach will actually bring performance improvements for NV's architecture outside of some corner cases. I don't expect that from Volta, and I'm pretty sure that if Volta does bring some significant performance gains over Pascal, it certainly won't be because of async compute or the ability to run different contexts on the same SM. This is mostly a convenience feature, not a performance one.
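
    If you want to see how much of that headroom a particular GPU actually has, a rough way is to time a memory-bound kernel and an ALU-bound kernel back to back in one queue, then again in two independent queues, and compare. The sketch below does that with CUDA streams and events (made-up kernels, not D3D12); the gap between the two timings, if there is one, is roughly the idle capacity being filled:

    Code:
    #include <cstdio>
    #include <cuda_runtime.h>

    // Memory-bound work: mostly waiting on DRAM.
    __global__ void memBound(float* out, const float* in, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) out[i] = in[i] + 1.0f;
    }

    // ALU-bound work: mostly arithmetic, little memory traffic.
    __global__ void aluBound(float* data, int n, int iters) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) {
            float v = data[i];
            for (int k = 0; k < iters; ++k) v = fmaf(v, 1.000001f, 0.25f);
            data[i] = v;
        }
    }

    // Launch both kernels into the given streams and return elapsed milliseconds.
    static float timePair(cudaStream_t sA, cudaStream_t sB,
                          float* a, float* b, float* c, int n) {
        cudaEvent_t start, stop;
        cudaEventCreate(&start);
        cudaEventCreate(&stop);
        cudaEventRecord(start);  // recorded on the legacy default stream
        memBound<<<(n + 255) / 256, 256, 0, sA>>>(b, a, n);
        aluBound<<<(n + 255) / 256, 256, 0, sB>>>(c, n, 8192);
        cudaEventRecord(stop);   // default stream waits for both blocking streams
        cudaEventSynchronize(stop);
        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        cudaEventDestroy(start);
        cudaEventDestroy(stop);
        return ms;
    }

    int main() {
        const int n = 1 << 22;
        float *a, *b, *c;
        cudaMalloc(&a, n * sizeof(float));
        cudaMalloc(&b, n * sizeof(float));
        cudaMalloc(&c, n * sizeof(float));

        cudaStream_t s0, s1;
        cudaStreamCreate(&s0);
        cudaStreamCreate(&s1);

        // Same stream: the two kernels run strictly one after the other.
        float serialMs = timePair(s0, s0, a, b, c, n);
        // Different streams: the scheduler may overlap them if units are free.
        float concurrentMs = timePair(s0, s1, a, b, c, n);

        printf("serial: %.2f ms, concurrent: %.2f ms\n", serialMs, concurrentMs);

        cudaStreamDestroy(s0);
        cudaStreamDestroy(s1);
        cudaFree(a); cudaFree(b); cudaFree(c);
        return 0;
    }

    On hardware that is already close to its bandwidth or ALU limits in the first workload, the two numbers come out nearly the same, which is the "nothing left to fill" situation described above.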
     
  20. stevevnicks

    stevevnicks Guest

    Messages:
    1,440
    Likes Received:
    11
    GPU:
    Don't need one
    Maybe the site should be renamed Guru Geeks. The average gamer cares more about playing the game than worrying about the rest. Still, I guess it's just a way to make the geeks feel they know best all the time, lol. If the geeks spent as much time playing their games as they do benchmarking and gloating, there wouldn't be an issue about which does AS best.
     
