Nvidia Has a Driver Overhead Problem

Discussion in 'Videocards - NVIDIA GeForce Drivers Section' started by RealNC, Mar 15, 2021.

  1. mbk1969

    mbk1969 Ancient Guru

    Messages:
    11,503
    Likes Received:
    9,245
    GPU:
    GF RTX 2070 Super
    My 2 cents:
    If DX9/10/11 code in drivers is mostly single-threaded (with occasional helper threads) and DX12 code in drivers is mostly multithreaded (as I take it DX12 offers a multithreaded API to game devs), then it is not strange that AMD and NVIDIA have different results on the DX12 path - simply because multithreaded code is no joke, and good multithreaded code is almost unachievable.
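
    To make that contrast concrete, here is a tiny sketch of the structural difference (an analogy in plain std::thread, not actual D3D11/D3D12 calls - the function names and counts are made up): the older APIs funnel all draw-call recording through one thread, while DX12/Vulkan let each worker thread record its own command list and only serialize the final submission.

```cpp
// Toy illustration only (plain std::thread, no real D3D11/D3D12 calls):
// "commands" are just ints standing in for draw calls.
#include <thread>
#include <vector>

using CommandList = std::vector<int>;

// DX9/10/11-style: every draw call is recorded on one thread, so the
// driver's per-call work is serialized behind that single thread.
CommandList record_single_threaded(int draw_calls) {
    CommandList cmds;
    for (int i = 0; i < draw_calls; ++i)
        cmds.push_back(i);
    return cmds;
}

// DX12/Vulkan-style: each worker records its own command list in
// parallel; only the final submission step is serialized.
std::vector<CommandList> record_multi_threaded(int draw_calls, int workers) {
    std::vector<CommandList> lists(workers);
    std::vector<std::thread> pool;
    for (int w = 0; w < workers; ++w)
        pool.emplace_back([&lists, w, draw_calls, workers] {
            for (int i = w; i < draw_calls; i += workers)
                lists[w].push_back(i);      // no shared lock needed here
        });
    for (auto& t : pool) t.join();
    return lists;                           // "submit" all lists at once
}

int main() {
    auto a = record_single_threaded(1000);
    auto b = record_multi_threaded(1000, 4);
    return (a.size() == 1000 && b.size() == 4) ? 0 : 1;
}
```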
     
    BlindBison, enkoo1 and Cave Waverider like this.
  2. Chrysalis

    Chrysalis Master Guru

    Messages:
    278
    Likes Received:
    43
    GPU:
    RTX 3080 FE
    Yes they did, but most people don't fully watch videos; that's why titles are important.

    Can you please show me where this claim that it's down to the scheduler is confirmed? Thanks.
     
  3. BlindBison

    BlindBison Master Guru

    Messages:
    929
    Likes Received:
    221
    GPU:
    RTX 2080 Super
    He's probably referring to the Nerdtech video I linked earlier in this thread, and that Hardware Unboxed linked in their comments section, which went over the schedulers for AMD and Nvidia -- that video claimed that AMD uses a hardware scheduler while Nvidia has a software scheduler that takes up CPU resources (provided I'm understanding it correctly). Astyanax has pushed back against that video's claims, though, from what I gather, so now I'm not sure what to believe, really. In any case, it is true that we appear to be seeing meaningfully lower CPU-bound performance in modern DX12/Vulkan games, so my hope is that Nvidia is able to close this gap somehow.
     
    Chrysalis likes this.
  4. Chrysalis

    Chrysalis Master Guru

    Messages:
    278
    Likes Received:
    43
    GPU:
    RTX 3080 FE
    Thanks. What I do remember is that two or three years back a lot of reviewers were talking about AMD's DX12 advantage; I think HU was one of those reviewers, and they seem to have forgotten about it.

    From what I remember of those videos (as well as articles from non-YouTubers):

    1 - Nvidia was winning in DX9/DX11 because those APIs rely a lot on driver optimisation for performance, and that's something Nvidia has always put a lot of effort into.
    2 - AMD was winning in DX12 because drivers have less effect on this API, while having hardware built for it is going to get you wins; remember DX12 has a lot in common with Mantle, which AMD designed.

    Whether those videos and articles were on the money I don't know, but that's what I remember consuming as content. I welcome being corrected on these points, of course. But this is why I think it's not as simple as just stating that Nvidia has a driver problem. HU have also already realised they made a mistake in claiming it's a general problem when all they did was test a few DX12 games.

    I do expect that by the time DX12 is fully relevant Nvidia will have closed the gap. They always want to be king.
     
    Darren Hodgson likes this.

  5. ern88

    ern88 Member

    Messages:
    39
    Likes Received:
    12
    GPU:
    MSI RTX 3080
    So, I take it a VBios update couldn't fix this? Or a hardware rework?
     
    BlindBison likes this.
  6. BlindBison

    BlindBison Master Guru

    Messages:
    929
    Likes Received:
    221
    GPU:
    RTX 2080 Super
    @ern88 Unfortunately I'm not knowledgeable enough to know what improving/resolving this would entail. If the solution requires hardware on the GPU for whatever reason, or if it takes a major rewrite of some driver system(s), then we might not see it fixed for quite some time (if at all -- we still haven't received an official response from Nvidia, from what I gather).

    Here's hoping we learn more about the nature of the problem/behavior and that it gets resolved, though, as a 20-30% hit to CPU-bound performance is quite significant, and DX12/Vulkan will only become more relevant in time. If that one Nerdtech video is correct, by the sounds of it a fix would require hardware on the GPU, but going off Astyanax's comments I'm not sure.
     
  7. BlindBison

    BlindBison Master Guru

    Messages:
    929
    Likes Received:
    221
    GPU:
    RTX 2080 Super
    From what I've read, DOOM Eternal and id's work on id Tech 7 are extremely impressive examples of multithreading game code on the CPU side with a modern API (Vulkan) -- the CPU utilization in that game, and the real-world performance you get for the visual return, seem outstanding to me.

    We've heard for many, many years that multithreading games effectively is extremely hard, but a handful of games have done it very well at this point, and that number will hopefully increase as others implement similar systems.

    I'm not a game or engine programmer, just a typical software dev, but I've read a small amount about the modern job-queue systems being used in some of these newer titles. Going back to DOOM Eternal, it's one of the few games out there where the 3900X actually outperformed the 10900K in the tests I've seen, and though I didn't personally enjoy the title quite as much as its predecessor (DOOM 2016), I'm honestly floored by how well programmed that game is. One of the devs claimed that they'd more or less eliminated the concept of a main thread, which might be hyperbole, but it's remarkable what they achieved in any case.

    So, all that to say: properly multithreaded and efficiently written game code on the CPU can be done -- we know it has been done -- but I also acknowledge it's seemingly a very difficult task, or at least has been very challenging to "solve" historically. Destiny 2 also had a helpful GDC talk, IIRC, on how they went about multithreading their engine -- from what I recall it sounded extremely difficult/complex, and they really lost me in some of their diagrams.
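
    For anyone curious what a "job queue" looks like at its simplest, here's a minimal worker-pool sketch in plain standard C++ (my own toy code, not id Tech 7 or Bungie's engine): a fixed set of worker threads pulls small tasks off a shared queue instead of one main thread doing everything itself.

```cpp
// Minimal worker-pool "job queue" in standard C++ (a generic sketch,
// not engine code from any shipped title).
#include <atomic>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

class JobQueue {
public:
    explicit JobQueue(unsigned workers) {
        for (unsigned i = 0; i < workers; ++i)
            threads_.emplace_back([this] { worker_loop(); });
    }
    ~JobQueue() {                       // drains remaining jobs, then joins
        { std::lock_guard<std::mutex> lk(m_); done_ = true; }
        cv_.notify_all();
        for (auto& t : threads_) t.join();
    }
    void push(std::function<void()> job) {
        { std::lock_guard<std::mutex> lk(m_); jobs_.push(std::move(job)); }
        cv_.notify_one();
    }
private:
    void worker_loop() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lk(m_);
                cv_.wait(lk, [this] { return done_ || !jobs_.empty(); });
                if (jobs_.empty()) return;      // done_ set and nothing left
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();   // animation, physics, culling, ... all run as small jobs
        }
    }
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> jobs_;
    std::vector<std::thread> threads_;
    bool done_ = false;
};

int main() {
    std::atomic<int> work_done{0};
    JobQueue q(4);                               // e.g. one worker per core
    for (int i = 0; i < 8; ++i)
        q.push([&work_done] { ++work_done; });   // tiny stand-in "jobs"
}                                                // ~JobQueue waits for workers
```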
     
    Last edited: Mar 19, 2021
    mbk1969 likes this.
  8. mbk1969

    mbk1969 Ancient Guru

    Messages:
    11,503
    Likes Received:
    9,245
    GPU:
    GF RTX 2070 Super
    Last edited: Mar 19, 2021
    BlindBison likes this.
  9. BlindBison

    BlindBison Master Guru

    Messages:
    929
    Likes Received:
    221
    GPU:
    RTX 2080 Super
    Thanks for the link, will enjoy giving that a read after work :)

    EDIT: One thing I forgot in my last comment is that for DOOM Eternal's "job queue" system, they're using a sort of dependency graph, as described in this thread by mehel: https://www.reddit.com/r/gamedev/co...&utm_medium=ios_app&utm_name=iossmf&context=3

    > “... the basic idea is to split the whole game update into jobs and form a dependency graph of those jobs. You can then execute the jobs that don’t depend on one another simultaneously using the queue they mention. As the jobs are executed more and more are unlocked by their dependencies being fulfilled and can also be run until all jobs have been processed and things can begin again.”

    From what I gather this was probably quite difficult, but my hope is more games will begin leveraging DX12/Vulkan and using similar approaches down the line.
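
    Here's a toy, self-contained version of that idea (again my own sketch, not actual id Software code): each job tracks how many unfinished dependencies it has, and finishing a job releases any dependent whose counter drops to zero. For brevity the ready jobs run on a single thread here; in a real engine they would be handed to a worker pool like the one I sketched a couple of posts up.

```cpp
// Toy dependency-graph scheduling, as described in the quoted comment above.
// Each job counts its unfinished dependencies; finishing a job decrements its
// dependents and any that reach zero become "ready" to run.
#include <atomic>
#include <cstdio>
#include <deque>
#include <functional>
#include <vector>

struct Job {
    const char* name;
    std::function<void()> work;
    std::atomic<int> pending{0};     // dependencies not yet finished
    std::vector<Job*> dependents;    // jobs waiting on this one
};

void depends_on(Job& later, Job& earlier) {
    earlier.dependents.push_back(&later);
    later.pending.fetch_add(1);
}

int main() {
    Job input   {"input",     [] { std::puts("poll input");        }};
    Job anim    {"animation", [] { std::puts("animate");           }};
    Job physics {"physics",   [] { std::puts("simulate");          }};
    Job render  {"render",    [] { std::puts("record draw calls"); }};

    depends_on(anim, input);
    depends_on(physics, input);
    depends_on(render, anim);
    depends_on(render, physics);

    std::deque<Job*> ready{&input};           // jobs with zero pending deps
    while (!ready.empty()) {                  // a real engine would feed these
        Job* j = ready.front();               // to worker threads instead
        ready.pop_front();
        j->work();
        for (Job* d : j->dependents)
            if (d->pending.fetch_sub(1) == 1) // last dependency satisfied
                ready.push_back(d);
    }
}
```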

    Somewhat related perhaps, I wonder what kind of unique challenges the split-CCX design of Ryzen entails, given the latency penalty of passing data between the CCXs on Zen 2, for example. I'm not knowledgeable enough to know, unfortunately, but it's fascinating to see how the monolithic-die CPUs compare in gaming sometimes. I wonder how much the large "game cache" helps with that (if at all), but I'm getting off topic so I'll stop now. My knowledge base is limited, so I apologize if I've misrepresented anything, of course.
     
    Last edited: Mar 19, 2021
    mbk1969 likes this.
  10. PrMinisterGR

    PrMinisterGR Ancient Guru

    Messages:
    7,900
    Likes Received:
    776
    GPU:
    Inno3D RTX 3090
    I haven't read the whole thread, but isn't this behavior clearly controlled by the "Threaded Optimization" setting? It's not as if it doesn't have an effect.

    Also, NVIDIA hardware has had hardware scheduling since Pascal. Turing and Ampere have even more schedulers.
     

  11. Undying

    Undying Ancient Guru

    Messages:
    16,838
    Likes Received:
    5,722
    GPU:
    Aorus RX580 XTR 8GB
    Apparently it's been software-based since Fermi, so it does not have it. That explains why it sucks so much, btw. Who knows how many generations will pass before Nvidia adds hardware scheduling back to their arch.
     
  12. Astyanax

    Astyanax Ancient Guru

    Messages:
    12,046
    Likes Received:
    4,565
    GPU:
    GTX 1080ti
    It's not.

    Kepler removed the instruction scheduler (the data-hazard block) that was present in Fermi because it was complicated, power-guzzling and provided no benefit to performance.

    Every other aspect of a hardware scheduler was retained and remains in Kepler, Maxwell, and later.

    Pascal gained the capability to pre-empt instructions (in effect reducing context-switch penalties); that is to say, an incoming instruction can trigger an interruption of a task at the thread (warp, in hardware) level and execute a thread that has higher priority. This is part of the two or three changes that made async compute workable on Pascal vs Maxwell.

    Turing and later gained the capability to avoid interrupting anything at all, by being able to co-issue tasks to the INT and FP units, removing a source of expensive context switching (but they can still do so if necessary).

    Every incorrect bit of information you see about scheduling traces back to a misinterpretation of the Kepler white paper by Ryan Smith in the GTX 680 review on AnandTech.
     
    Last edited: Oct 30, 2021
    Mapson, mirh, Archvile82 and 2 others like this.
  13. cucaulay malkin

    cucaulay malkin Ancient Guru

    Messages:
    3,042
    Likes Received:
    1,543
    GPU:
    RTX 3060 Ti
    My question is: what did they really find?
    Is it driver overhead on Ampere, or is it normal behavior that AMD simply handles better?
     
    BlindBison likes this.
  14. Erick

    Erick Member Guru

    Messages:
    104
    Likes Received:
    18
    GPU:
    RTX 2060 Super 8GB
    Most of the overhead issues will go away when DirectStorage is ready. That will require a motherboard with NVMe slots, since it has the lanes available. I think some MSI motherboards already have it working in hardware. At the same time, every PC game will need to be patched, because otherwise they will process too slowly.
     
  15. GREGIX

    GREGIX Master Guru

    Messages:
    768
    Likes Received:
    166
    GPU:
    6800XT Merc
    I think you're barking up the wrong tree here.
    SSD or NVMe is irrelevant to this.
     
    Smough and BlindBison like this.

  16. dr_rus

    dr_rus Ancient Guru

    Messages:
    3,114
    Likes Received:
    466
    GPU:
    RTX 3080
    Later GPUs have in fact significantly extended their global scheduling capabilities. Kepler itself expanded its scheduling h/w in comparison to Fermi. Pascal was the only one where this aspect didn't change much, AFAIR. Turing added h/w for MPS, and Ampere added SysPipes and h/w virtualization per GPC.

    The number of people saying things without doing any research on the subject is astounding.

    Still, the issue is certainly there, and its precise root cause is unknown for now. It certainly has nothing to do with any scheduling, and from the data we have it seems to be mostly limited to D3D12, hinting at this being an API issue for the most part.

    Hopefully someone will do a proper investigation into this without making loud, click-baity videos with false claims.
     
  17. BlindBison

    BlindBison Master Guru

    Messages:
    929
    Likes Received:
    221
    GPU:
    RTX 2080 Super
    I'm still just waiting for Hardware Unboxed to release more test data for other games and APIs. They said in their other video that they were working on it. I also haven't really seen other big tech YouTubers talk about it, except Linus briefly in one of their vids. Here's hoping we get more testing from HU; an official statement from Nvidia at some point to clarify this would be great too.

    Would be awesome to see Digital Foundry look into it — Rich does such great CPU and GPU reviews imo.

    EDIT: Digital Foundry did actually touch on it briefly in one of their recent videos starting around the 12 minute mark:
     
    Last edited: Mar 22, 2021
  18. BlindBison

    BlindBison Master Guru

    Messages:
    929
    Likes Received:
    221
    GPU:
    RTX 2080 Super
    Right, it was my understanding that HU tested with the same setup, only switching the GPU.
     
  19. PrMinisterGR

    PrMinisterGR Ancient Guru

    Messages:
    7,900
    Likes Received:
    776
    GPU:
    Inno3D RTX 3090
    This is completely wrong. It's even public that Ampere has four instruction schedulers. Just because NVIDIA doesn't use the same terms as AMD doesn't mean that the hardware doesn't have them. Ampere is the most "GCN-like" hardware NVIDIA has ever made; it practically has specialized hardware for everything.

    https://forums.developer.nvidia.com/t/instruction-scheduling-in-ampere/169072/7

    Basically, scheduling-wise, all architectures have "hardware scheduling".

    I find it incredible that nobody has tested the Threaded Optimization setting, yet people keep talking about hardware that is or isn't there. It's very clear that the "auto-threading" NVIDIA does has given them a big performance boost in the past, but it's diminishing as more new titles know how to do threading properly. That says nothing about the actual hardware in the GPU.
     
  20. GREGIX

    GREGIX Master Guru

    Messages:
    768
    Likes Received:
    166
    GPU:
    6800XT Merc
    The problem is when you don't have current hardware: as HU pointed out (I think in their second video on the topic), people less fortunate than you or me, i.e. with (now) crappy 2500K-era CPUs, who "upgrade" their GPU just to get more frames, end up with fewer frames instead.
     
