High DX11 CPU overhead, very low performance.

Discussion in 'Videocards - AMD Radeon Drivers Section' started by PrMinisterGR, May 4, 2015.

  1. PrMinisterGR

    PrMinisterGR Ancient Guru

    Messages:
    8,125
    Likes Received:
    969
    GPU:
    Inno3D RTX 3090
We all know what happened the last time a thread like this (dis)appeared, so this one is not directed at anyone in particular. It is merely a place for me to link my findings regarding what I believe is the greatest problem with AMD drivers; a problem that cannot be attributed to anyone but AMD (you can blame developers for bad multi-GPU optimization or GameWorks libraries, but not for this).

It all started with the now-infamous DirectX 11 Command Lists. Believe it or not, when DX11 was presented, it was billed as the solution that would finally bring multi-threading to the graphics pipeline and deliver tremendous improvements on parallel workloads. To quote the explanation of the new API from Anandtech:
    If that reminds you of some recent new API announcements, it is because the wording is virtually the same.

The way DX11 was supposed to achieve this was by using Deferred Contexts in combination with Command Lists. NVIDIA supports both; AMD does not support Command Lists, resulting in comical situations like this one in the AMD developer support forum, where a question about them has remained unanswered for five years now:
[image: unanswered Command Lists question on the AMD developer forum]
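As an aside, whether the driver natively supports this is something any application can query. A minimal sketch using the standard D3D11 feature-support query (assuming an already-created device; error handling omitted):

[CODE]
#include <d3d11.h>
#include <cstdio>

// Ask the driver whether it natively supports multithreaded resource
// creation and command lists.
void ReportThreadingSupport(ID3D11Device* device)
{
    D3D11_FEATURE_DATA_THREADING caps = {};
    device->CheckFeatureSupport(D3D11_FEATURE_THREADING, &caps, sizeof(caps));

    // FALSE for DriverCommandLists does not mean command lists fail:
    // the D3D11 runtime emulates them in software, it just serializes
    // the work behind the scenes instead of letting the driver do it.
    std::printf("Driver concurrent creates: %s\n",
                caps.DriverConcurrentCreates ? "yes" : "no");
    std::printf("Driver command lists:      %s\n",
                caps.DriverCommandLists ? "yes" : "no");
}
[/CODE]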
Surprisingly, Firaxis was one of the first developers to leverage that power from DX11. Civilization V apparently supported all of DX11's multithreading features, resulting in CPU utilization that looked like this across twelve threads:
[image: Civilization V load spread across twelve CPU threads]
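For anyone wondering what that looks like in code, here is a rough sketch of the pattern (not Firaxis' actual code, just the textbook DX11 usage): each worker thread records its slice of the scene into its own deferred context, and the main thread replays the finished command lists on the immediate context.

[CODE]
#include <d3d11.h>
#include <thread>
#include <vector>

// Each worker records its slice of the frame into a deferred context,
// then bakes it into a command list.
void RecordSlice(ID3D11Device* device, ID3D11CommandList** outList)
{
    ID3D11DeviceContext* deferred = nullptr;
    device->CreateDeferredContext(0, &deferred);

    // ...bind state and issue the draws for this slice, exactly as you
    // would on the immediate context...
    deferred->Draw(3, 0);

    deferred->FinishCommandList(FALSE, outList);
    deferred->Release();
}

void RenderFrame(ID3D11Device* device, ID3D11DeviceContext* immediate)
{
    const int kWorkers = 12;
    std::vector<ID3D11CommandList*> lists(kWorkers, nullptr);
    std::vector<std::thread> workers;

    for (int i = 0; i < kWorkers; ++i)
        workers.emplace_back(RecordSlice, device, &lists[i]);
    for (auto& w : workers)
        w.join();

    // Replay in order. With native driver support (DriverCommandLists TRUE)
    // most of the CPU cost was already paid on the worker threads; without
    // it, the runtime effectively redoes the work here, single-threaded.
    for (auto* list : lists) {
        immediate->ExecuteCommandList(list, FALSE);
        list->Release();
    }
}
[/CODE]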
Unfortunately, support for Deferred Contexts and Command Lists is optional, and AMD chose not to support Command Lists. Ryan Smith was in contact with NVIDIA around the release of Civilization V, and he explains what they told him in this post. Apparently AMD had activated some kind of Command Lists support for the 7970/GCN architecture, but only for Civilization V specifically. Another quote, from Anandtech's original 7970 review:
You will ask, why does this matter so much? The answer is that it is a good indicator of the general multithreading/API overhead situation with AMD's DX11 drivers. And that state is bad.

In this GDC presentation, given by people from NVIDIA, AMD and even Intel, you will see that it is the driver scheduler that is really responsible for almost everything regarding CPU optimization. So, you ask, how bad is it?

    The answer is really bad.

Unfortunately, Eurogamer's Digital Foundry is the only outlet I have seen doing any real work on GPU performance with slower CPUs; it would be really nice to see more work like this from sites like this one. Digital Foundry's findings have not been disputed by either NVIDIA or AMD, so I will use them here as a general guide to actual gameplay experience. The new 3DMark API test will serve as the synthetic benchmark basis.

This article (a recommendation for a console-killer build) says it all. For people with lower-end CPUs, they recommend the GTX 750 Ti over the R9 270x!
I'll assume that people reading here have an idea of the relative hardware in each card, but for the sake of readability keep in mind that the R9 270x is almost double the card the GTX 750 Ti is (and it shows in the tests with better CPUs). The actual competitor of the GTX 750 Ti should have been the R7 260x, but apparently its performance with slower CPUs is so atrocious that they could not recommend it.
    From the article:
    Click below for some screenshots and percentages from the article:
    Far Cry 4.
GTX 750 Ti Loss: 0%
R9 270x Loss: 27.27%
[image]

    Far Cry 4 again.
GTX 750 Ti Loss: 10.52%
R9 270x Loss: 46.66%
[image]

    Now the same game, with the R7 260x.

    Far Cry 4.
GTX 750 Ti Loss: 10.52%
R7 260x Loss: 13.63%
[image]
You will notice that the more low-end an AMD GPU is, the less it is bottlenecked. Keep that in mind until we get to the R9 280 numbers.

    Far Cry 4 again.
GTX 750 Ti Loss: 0%
R7 260x Loss: 4.54%
[image]
Same here, almost no bottleneck, though you will notice that the 750 Ti sits almost steadily at zero.

    Ryse: Son of Rome
    Back to the R9 270x now.
GTX 750 Ti Loss: 7.69%
R9 270x Loss: 35.29%
[image]

    More Ryse: Son of Rome
This time with the R7 260x. The frame rate loss is not great, but notice the horrible frame spikes.
GTX 750 Ti Loss: 7.69%
R7 260x Loss: 8.33%
[image]

    Call of Duty Advanced Warfare
    The R9 270x again.
GTX 750 Ti Loss: 0%
R9 270x Loss: 34.14%
[image]

    Call of Duty Advanced Warfare.
    R7 260x.
GTX 750 Ti Loss: 0%
R7 260x Loss: 20.58%
[image]

    Call of Duty Advanced Warfare moar.
This time the lower-end processor is an A10-5800K/Athlon X4 750K. The NVIDIA GPU stutters more with the AMD CPU, something the AMD GPUs do not. The higher-end R9 270x still loses more than 50% of its high-end-CPU frame rate though, which is (de)impressive.
GTX 750 Ti Loss: 16.66%
R9 270x Loss: 52.38%
[image]

    Call of Duty Advanced Warfare final.
Still with the lower-end AMD processor, now on the R7 260x. Although it becomes GPU-bound much sooner, it still manages to lose an impressive 41.17% with the slower CPU.
GTX 750 Ti Loss: 16.66%
R7 260x Loss: 41.17%
[image]

I will stop referring to Digital Foundry now, after I quote them one last time, from their GTA V PC performance article:
They are basically telling us that the driver is SO BAD that the GTX 750 Ti performs better with an i3 than the R9 280 does. If you have any concept of relative GPU performance, you understand what kind of bottleneck we are talking about here.

The CPU utilization gap was always there, but it has reached these tragic proportions since NVIDIA launched their 337.50 beta driver set, which promised (and apparently delivered) this:
[image]

There is no concrete proof of it, but it seems that NVIDIA is using Command Lists and other API optimizations inside their driver, even when the game/app does not use them. This is pure speculation for now, but the results are there, and they would explain the vast gap in overhead performance under DX11.

From the Anandtech test for GPU overhead, we can see that all the AMD cards in DX11 hit a bottleneck that stops them at 1.1 million draw calls. They don't scale at all between each other, and they don't scale regardless of whether the load is multithreaded, which clearly indicates that the bottleneck is not in the hardware.

[image: Anandtech draw call throughput results]
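To make it concrete: what such an overhead test essentially does is flood the API with tiny draws that cost the GPU almost nothing, so the number it reports is the driver's CPU-side throughput. A hypothetical, stripped-down version of that measurement loop (device, swap chain and pipeline setup omitted; the real 3DMark test is of course more sophisticated):

[CODE]
#include <d3d11.h>
#include <chrono>

// Issue `drawsPerFrame` tiny draws per frame for `frames` frames and
// return the sustained draw call rate. The draws are deliberately
// trivial for the GPU, so the CPU-side driver work dominates.
double MeasureDrawCallRate(ID3D11DeviceContext* immediate,
                           IDXGISwapChain* swapChain,
                           int drawsPerFrame, int frames)
{
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();

    for (int f = 0; f < frames; ++f) {
        for (int i = 0; i < drawsPerFrame; ++i)
            immediate->Draw(4, 0);   // tiny quad, state assumed already bound
        swapChain->Present(0, 0);    // vsync off
    }

    const std::chrono::duration<double> elapsed = clock::now() - start;
    return static_cast<double>(drawsPerFrame) * frames / elapsed.count();
}
[/CODE]

If a driver tops out at the same rate no matter how many cores are available for recording, the wall is in its submission path, which is exactly the behavior the Anandtech numbers above show.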

Now is the time when people say "DX12 is the future, why should we care", etc. My answer to that is b u l l s h i t . 99.99% of the games in the PC catalogue are going to be DX9/DX11 for at least two years after DX12 appears. DX11 optimization matters, because the PC is the only gaming platform that offers backwards compatibility. If some idiot (because they would only be that) at AMD suddenly decides that DX11 is not important any more, then we're royally screwed, gentlemen.
You might say that with that advertisement for the CPU/GPU optimization guy some months ago, something might happen. Well, I have news for you. The advertisement is still up, which means that most likely they haven't hired anybody yet. Unfortunately, not being able to find someone to hire is almost expected, especially if you read this blog post from Valve's Rich Geldreich. Although he refers to the state of the OpenGL driver stack, he gives insights into the situation at both companies. He says this about "Vendor B" (which is AMD):
    where Vendor A = NVIDIA.

And this brings us to today. The main reason for this post is that I don't want everybody to forget the importance of a good DX11 driver stack amidst the enthusiasm for DX12/Vulkan.

AMD does have a leg up in lower-level API driver development (as they should), but the resources that leg up frees should be used to add more features, make the DX11/DX11.3 driver leaner and faster, and close the driver gap with NVIDIA, a gap that is apparently widening instead of closing.

Indications of that are situations like the whole VSR/frame limiter fiasco (does anyone still remember the frame limiter?), the vsync/antialiasing controls in CCC not working with almost anything, no driver-side triple buffering support, no frame pacing in single-GPU configurations, no double/dynamic vsync; this could go on and on and on...

There was a time when there wasn't any driver gap (around 2010); now it is back with a vengeance, and something should be done about it instead of burying our heads in the sand and crying about "teh DX12s".
     
    Strange Times likes this.
  2. sammarbella

    sammarbella Guest

    Messages:
    3,929
    Likes Received:
    178
    GPU:
    290X Lightning CFX (H2O)
    Thank you very much PR. :)

    There is very good info in your post.

It points out, one more time, AMD's Achilles' heel: their software (drivers).

As you said, two more years (at least) of DX11 games.

DX12 will not improve things for AMD if they insist on having the best hardware and, at the same time, the worst software (drivers) with which to use that fantastic hardware.
     
  3. -Tj-

    -Tj- Ancient Guru

    Messages:
    18,095
    Likes Received:
    2,601
    GPU:
    3080TI iChill Black
That's true, AMD needs a better multicore driver. It is OK most of the time, but when it really needs to deliver, it fails and needs stronger per-core CPU performance to do what NVIDIA does with less.
From what I saw they're working on it; it should be better soon.

DX11 is multi-core aware, but it's still bottlenecked by DX API calls, and you need more draw-call index buffers to claw back a little overhead.
E.g., on NVIDIA, as of the 337.xx driver there was a 50% boost over Mantle in the Star Swarm benchmark, but then it got patched again and now it's only 25%, or around the same as Mantle...

BTW, AMD will run the same as NVIDIA in the DX12 API; no more driver bottlenecks like the ones you're describing now.
     
    Last edited: May 4, 2015
  4. elpsychodiablo

    elpsychodiablo Master Guru

    Messages:
    349
    Likes Received:
    0
    GPU:
    Retina Z2 + Vlab Motion
I really want to know how many devs work on drivers at ATI/NVIDIA and Intel.
     

  5. PrMinisterGR

    PrMinisterGR Ancient Guru

    Messages:
    8,125
    Likes Received:
    969
    GPU:
    Inno3D RTX 3090
A lot of people still don't understand that DirectX 11 is not going anywhere.
Along with DX12, DX11.3 is also being introduced:
DX11 will be with us for as long as indie developers are (at least). Ubisoft also hasn't spoken a word about DX12 yet.

The person they are trying to hire has obviously not been hired yet. The job posting is still up.

Firaxis managed to balance work across twelve threads, as I showed in the post above, and all of that with DX11 and proper use of it.
     
  6. freibooter

    freibooter Active Member

    Messages:
    58
    Likes Received:
    0
    GPU:
    PowerColor R9 280X 3GB OC
    Why? How many Indie developers are writing their own 3D engines these days?

    How many are writing DirectX 11 engines?

Your argument that indie developers won't use DirectX 12-capable engines is absurd, as should be self-evident from the overwhelming use of the Unity engine by the indie developer community.

    I don't think you quite understood the second part of this quote:

    The development model envisioned for Direct3D 12 is that a limited number of code gurus will be the ones writing the engines and renderers that target the new API, while everyone else will build on top of these engines.

    That's already happening and has been for a long time.

Engines will become more and more platform-agnostic and there will be even fewer in-house engines, but that trend has been underway since way before Vulkan or DirectX 12 were even a thing.
     
  7. PrMinisterGR

    PrMinisterGR Ancient Guru

    Messages:
    8,125
    Likes Received:
    969
    GPU:
    Inno3D RTX 3090
Microsoft is certain enough that a lot of game developers will not use DX12 that, for the first time in their history, they will be supporting two different versions of DX simultaneously.
Even if all game developers switch to DX12 tomorrow, you still have the bad performance in every game already released with DX11. I play on PC because of the back catalogue. This kills the back catalogue.
     
  8. -Tj-

    -Tj- Ancient Guru

    Messages:
    18,095
    Likes Received:
    2,601
    GPU:
    3080TI iChill Black
But does it eliminate the API bottleneck and add more draw calls? I doubt it, since DX11 is limited to ~17K calls, OpenGL 4 to ~26K, and DX12 goes up to 80-90K.

In the DX11 driver scenario, NVIDIA found a "trick" to add more buffers, which helped them further in 337.xx; before that, in 310.xx, they got proper DX11 threading.

AMD has some multicore support, but driver overhead is kind of limiting it at the moment.
     
    Last edited: May 4, 2015
  9. freibooter

    freibooter Active Member

    Messages:
    58
    Likes Received:
    0
    GPU:
    PowerColor R9 280X 3GB OC
    Wait, what? First time in history?

They have always done this; how on earth is this any different?

That's why old DirectX 5 games still run on Windows 10. That's why such a huge number of modern games still use DirectX 9. Heck, you can even go back further: if it's at least 32-bit and used Direct3D in a somewhat standard way, it will probably run.

Microsoft has always been all about backwards compatibility; it's one of their strongest assets, and it's not going away. Why would that change with the release of DirectX 12, and how on earth is it in any way special?

    Huh? :3eyes:
     
  10. PrMinisterGR

    PrMinisterGR Ancient Guru

    Messages:
    8,125
    Likes Received:
    969
    GPU:
    Inno3D RTX 3090
You are confusing simple driver support with active development/testing. It is the first time in history that Microsoft will have two versions of DirectX actively developed/supported in parallel. When you ask "What is the latest version of DirectX I should develop for?", you will get two answers.

What I meant is that the "DX12" argument is completely invalid regarding the state of the DX11 driver. No matter how many developers adopt DX12, we still need proper, good DX11 drivers for the whole back catalogue of games on PC. I don't even get why this is a point of contention.
     

  11. sammarbella

    sammarbella Guest

    Messages:
    3,929
    Likes Received:
    178
    GPU:
    290X Lightning CFX (H2O)
Too many people in this forum really think DX12 will be some kind of "singularity" point that, once reached, will magically solve all future and past troubles with AMD drivers.

Problems like DX11 performance, CrossFire support with FreeSync and VSR, the frame limiter, etc., will not disappear without specific driver improvements ($$$).

DX12 is fantastic right now... on paper.

DX12 by "itself" will not write the AMD driver code; AMD must do it, and do it with at LEAST similar (if not better) quality and performance compared to "Vendor A".
     
  12. xodius80

    xodius80 Master Guru

    Messages:
    790
    Likes Received:
    11
    GPU:
    GTX780Ti
I think for this issue Windows 10 needs a compatible API wrapper, sort of like the Glide wrappers used for old games that only speak the Glide API.
     
  13. PrMinisterGR

    PrMinisterGR Ancient Guru

    Messages:
    8,125
    Likes Received:
    969
    GPU:
    Inno3D RTX 3090
This will never happen. That's why Microsoft is introducing DX11.3 alongside DX12.

The closest thing we have to a wrapper is the NVIDIA driver scheduler (and it is probably the reason why NVIDIA's drivers destroy AMD's in any kind of CPU-efficiency measurement).
     
  14. Yecnot

    Yecnot Guest

    Messages:
    857
    Likes Received:
    0
    GPU:
    RTX 3080Ti
That's not important, because few if any DX11 games will be ported to DX12, meaning these performance problems will still be here.

Civ 5 represents what could be done on DX11 in spite of its inherent bottlenecks, using features AMD didn't support. Another example would be Far Cry 3, where AMD doesn't support the DX11 multithreading that was literally a selling point of DX11.

Tfw a year from now we'll probably still be able to reproduce those Digital Foundry benchmarks. I'd say going from AMD to NVIDIA is comparable to going from DX11 to Mantle/DX12.

My question about DX12, though, concerns its capability to use six threads. Are those render threads (DX9 only used one, IIRC), or just what we can expect games to use?
     
  15. PrMinisterGR

    PrMinisterGR Ancient Guru

    Messages:
    8,125
    Likes Received:
    969
    GPU:
    Inno3D RTX 3090
It is no simple trick, apparently, otherwise it would have been replicated. The reason I posted the historical material at the beginning of the thread is to show that AMD has historically had bad performance/feature adoption regarding DX11.
    My complete guess, as I state in the post, is that NVIDIA took the support they already had for Command Lists and managed to make their scheduler apply something like that to all DX11 titles.

Exactly that. It would be as if people said to drop DX9 support once DX12 was out. We still have popular DX9 titles out today. And I still want to be able to play the two original Witchers with acceptable performance, thank you.

If you notice, AMD tends to have more problems with games that have huge numbers of items/draw calls. Civ proved that you can do amazing things with DX11, and they proved again that they can do amazing things with Mantle; the problem is that under DX11 the AMD driver does not do a good job.

    What do you mean exactly? I didn't get it.
     
    Last edited: May 4, 2015

  16. Doom112

    Doom112 Guest

    Messages:
    204
    Likes Received:
    1
    GPU:
    MSI GTX 980 TF V
Very true.
I have a reference-model R9 290X and a GTX 780, and I was testing Dead Rising 3. Of course the R9 290X was faster, but when I go outside or drive a car in that game, the R9 290X drops under 30 fps at 1080p, whereas the GTX 780 stays at 40-50 fps at 1080p.
     
  17. PrMinisterGR

    PrMinisterGR Ancient Guru

    Messages:
    8,125
    Likes Received:
    969
    GPU:
    Inno3D RTX 3090
Dead Rising 3 is not the best of examples, as it runs badly on both camps' hardware. If you have actual numbers, they would be very interesting to see, and I'll attach them to the original post. :)
     
  18. xacid0

    xacid0 Guest

    Messages:
    443
    Likes Received:
    3
    GPU:
    Zotac GTX980Ti AMP! Omega
You'd better save that first post somewhere else, just in case the thread goes invisible. :p
     
  19. PrMinisterGR

    PrMinisterGR Ancient Guru

    Messages:
    8,125
    Likes Received:
    969
    GPU:
    Inno3D RTX 3090
You should read this post on gamedev.net.

    It contains some really interesting insights on the whole process.

That quote sums up why AMD has awesome compute performance. The rest of the post sums up why they have horrible multithreading performance.
     
  20. maxstep

    maxstep Guest

    Messages:
    130
    Likes Received:
    0
    GPU:
    4090 + LG OLED 65
Brilliant post, PrMinisterGR. Thank you for spreading awareness of this issue! Save and/or re-post this information everywhere it's pertinent, everyone. Awareness and voting with wallets will drive AMD to improve.
     
