High DX11 CPU overhead, very low performance.

Discussion in 'Videocards - AMD Radeon Drivers Section' started by PrMinisterGR, May 4, 2015.

  1. PrMinisterGR

    PrMinisterGR Ancient Guru

    Messages:
    7,004
    Likes Received:
    137
    GPU:
    Sapphire 7970 Quadrobake
    Well, NUMA isn't a scheduler. NUMA is a type of SMP configuration. It literally means "Non-Uniform Memory Access". Ryzen does seem to be close to a NUMA-type system, but since the memory controller is shared, things are a bit muddy there.

    There are two main types of schedulers. CPU schedulers that arrange how your CPUs are "fed" problems to solve, and I/O schedulers that handle input and output in memory and devices of all sorts. The default Linux kernel CPU scheduler is CFS. You can see that if you run:
    Code:
    sudo dmesg | grep -i 'cfs'
    or even better, you can see all of the registered schedulers (CPU and I/O), by running:

    Code:
    sudo dmesg | grep -i 'sched'
    Con Kolivas has a nice comparison about how CPU and I/O schedulers can matter in desktop interactivity (not absolute performance), here. The aim of his patchset is to create a "desktop" system that doesn't fall to its knees when heavy I/O is happening, something that has been the Achilles heel of Linux on the desktop since time immemorial. A lot of people also seem to be backing a total rewrite of the I/O subsystem of Linux, aiming to create something like fq_codel (which is a network packet scheduler, and by far the best in the business) for normal I/O.

    This is getting too technical, so let me cut to the chase.

    1) NUMA is a type of multi-processor system whose processors see different latencies to different parts of its memory. With Ryzen, all CCXs have the same access times to main memory and I/O, but slower access times to the L3 of the other CCX. So I would say that this is only partly a NUMA system. The OS CPU scheduler should be smart enough not to move tasks between CCXs, so that no costly and slow L3-to-L3 transfers occur. Ryzen providing 8 whole threads per CCX should make this trivial. This isn't as hard as it sounds: both Windows and Linux have had this capability for well over a decade, and it just needs to be patched in for Ryzen.

    2) CPU governor. This is the program that determines how to clock the CPU depending on workload. The default in Linux is "ondemand"; the one I would personally recommend is "performance" or "schedutil".

    3) CPU power/frequency driver. That's the driver of the CPU, exposing CPU states to the CPU governor and "taking orders" from the governor. Intel at this point recommends setting the governor to "performance" and letting its pstate driver do the actual job. AMD has no current driver, just the default ACPI one.

    4) The I/O scheduler determines the method used to fetch data from storage devices to the CPU, and to move it back again. Wikipedia has a very good diagram on how all this is connected. This one has a tough job, as it has to "serve" data from devices that are orders of magnitude slower than the CPU (even RAM and very fast NVMe are much, much slower), wait for the CPU to do its thing and then write the results back.
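
The governor, frequency driver and I/O scheduler from points 2 to 4 can be read back from sysfs. Below is a minimal sketch, assuming a Linux system that exposes the usual cpufreq and block-layer files. The device name `sda` is only an example, and the helper names are made up for illustration. Any file that is missing on a given machine is simply reported as None.

```python
from pathlib import Path

# Standard cpufreq directory for the first CPU (assumed to exist; it may be
# absent in VMs or when no frequency driver is loaded).
CPUFREQ = "/sys/devices/system/cpu/cpu0/cpufreq"


def read_sysfs(path):
    """Return the stripped contents of a sysfs file, or None if unreadable."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


def active_io_scheduler(line):
    """Pick the active scheduler out of a line like 'noop deadline [cfq]'.

    The kernel marks the active entry with square brackets. Some devices
    list a single unbracketed entry such as 'none'; that is returned as-is.
    """
    tokens = line.split()
    for token in tokens:
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return tokens[0] if len(tokens) == 1 else None


def report(device="sda"):
    """Collect governor, frequency driver and I/O scheduler for one device."""
    sched_line = read_sysfs(f"/sys/block/{device}/queue/scheduler")
    return {
        "governor": read_sysfs(f"{CPUFREQ}/scaling_governor"),
        "driver": read_sysfs(f"{CPUFREQ}/scaling_driver"),
        "io_scheduler": active_io_scheduler(sched_line) if sched_line else None,
    }


if __name__ == "__main__":
    for key, value in report().items():
        print(f"{key}: {value}")
```

On a system using the default ACPI driver, the "driver" entry would be expected to name that driver rather than a vendor-specific one.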

    In conclusion, I don't really see Ryzen having any problem once the schedulers are patched. Its inclusion of on-CPU I/O is much more important, imo, than the choice to go with a data fabric instead of a ring bus.
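
Until a scheduler does this on its own, a task can be kept inside one CCX by hand on Linux by pinning it to the CPUs that share an L3 cache. This is only a sketch under a few assumptions: that `cache/index3` is the L3 entry for the CPU (usual on x86, but the `level` file next to it is the way to confirm), and that `os.sched_setaffinity` is available (it is Linux-only). The helper names are hypothetical.

```python
import os
from pathlib import Path


def parse_cpu_list(text):
    """Parse a sysfs CPU list such as '0-3,8-11' into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def cpus_sharing_l3(cpu=0):
    """Return the CPUs sharing an L3 with `cpu` (assumes index3 is the L3)."""
    path = Path(f"/sys/devices/system/cpu/cpu{cpu}/cache/index3/shared_cpu_list")
    try:
        return parse_cpu_list(path.read_text())
    except OSError:
        return set()


def pin_to_l3_domain(cpu=0):
    """Pin the calling process to the L3 domain of `cpu`, if it can be found."""
    cpus = cpus_sharing_l3(cpu)
    if cpus and hasattr(os, "sched_setaffinity"):
        try:
            os.sched_setaffinity(0, cpus)  # pid 0 means the calling process
        except OSError:
            return set()  # e.g. the CPUs are not allowed for this process
    return cpus


if __name__ == "__main__":
    print("pinned to CPUs:", sorted(pin_to_l3_domain(0)))
```

Pinning by hand is a workaround, not a fix: it stops migrations across the L3 boundary but also prevents the task from using idle cores on the other CCX.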
     
    Last edited: Mar 12, 2017
  2. PrMinisterGR

    PrMinisterGR Ancient Guru

    Messages:
    7,004
    Likes Received:
    137
    GPU:
    Sapphire 7970 Quadrobake
    Well, it keeps improving, that's for sure. With the 17.3.1 driver it has markedly improved in almost all the games I play. I haven't benchmarked though, so I can't give you numbers. I believe that NVIDIA is still more efficient, but the gap isn't the Grand Canyon any longer, more like a little stream at this point.
     
  3. OnnA

    OnnA Ancient Guru

    Messages:
    9,736
    Likes Received:
    2,047
    GPU:
    Vega 64 XTX LiQuiD
    #PrMinisterGR

    Respect Bratan'

    :idea: When one has even a basic knowledge of PCs, understanding all this gets much easier....
     
  4. user1

    user1 Maha Guru

    Messages:
    1,426
    Likes Received:
    466
    GPU:
    hd 6870
    Yeah, I'm not too sure that test can be used to get accurate results between systems. My Core 2 Quad system with the 6870 gets 15.40 fps on Win 7 at the settings they use, which I find quite ridiculous. Might be worth looking into the Intel compiler patcher for that program, given how ancient it is (it uses DLLs from .NET 1.1).
     

  5. PrMinisterGR

    PrMinisterGR Ancient Guru

    Messages:
    7,004
    Likes Received:
    137
    GPU:
    Sapphire 7970 Quadrobake
    The 3DMark API test is quite accurate about this.
     
  6. Spartan

    Spartan Master Guru

    Messages:
    678
    Likes Received:
    2
    GPU:
    R9 290 PCS+
  7. MatrixNetrunner

    MatrixNetrunner Member Guru

    Messages:
    125
    Likes Received:
    0
    GPU:
    Powercolor PCS+ R9 270X
    The first test looks like an anomaly, since the API call test works roughly like this:
    Code:
    int drawcalls = 0;
    double frametime = 0.0;
    while (frametime < 1.0 / 30.0) { // Frame time for 30 fps, in seconds
        drawcalls += DRAWCALL_STEP;
        frametime = draw_frame(drawcalls); // render one frame, get its time in seconds
    }
    
    ELI5: increase the number of draw calls until the frame rate drops to 30 fps (it's more complicated than this).

    The most important number in this test should be draw calls per frame, and in your case this number is 49 152. In the anomalous test you got 1 410 328 draw calls per second in DX11 multi-threaded, but only 46 080 draw calls per frame.
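
To make the relation between the two numbers concrete, here is a rough runnable version of the loop above plus the arithmetic that links calls per second to calls per frame. It is only a sketch: the frame-time model (a fixed cost per draw call) is invented for illustration, and the step of 3072 is an assumption, picked because the per-frame figures quoted in this thread (46 080, 49 152) are all multiples of it.

```python
DRAWCALL_STEP = 3072            # assumed step size, not taken from 3DMark
TARGET_FRAME_TIME = 1.0 / 30.0  # frame time for 30 fps, in seconds


def simulated_frame_time(draw_calls, seconds_per_call):
    """Hypothetical cost model: every draw call costs the same CPU time."""
    return draw_calls * seconds_per_call


def max_draw_calls(seconds_per_call):
    """Raise the draw call count until a frame takes longer than 1/30 s."""
    draw_calls = 0
    frame_time = 0.0
    while frame_time < TARGET_FRAME_TIME:
        draw_calls += DRAWCALL_STEP
        frame_time = simulated_frame_time(draw_calls, seconds_per_call)
    return draw_calls, frame_time


def implied_fps(calls_per_second, calls_per_frame):
    """Frame rate implied by a calls-per-second and a calls-per-frame figure."""
    return calls_per_second / calls_per_frame


if __name__ == "__main__":
    calls, frame_time = max_draw_calls(seconds_per_call=1e-6)
    print(f"stopped at {calls} draw calls, frame time {frame_time * 1000:.2f} ms")
    print(f"implied fps: {implied_fps(1_410_328, 46_080):.2f}")
```

Dividing the two figures from the anomalous run, 1 410 328 / 46 080, gives roughly 30.6 frames per second, which is about where the test is supposed to stop.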
     
  8. s0x

    s0x Member Guru

    Messages:
    136
    Likes Received:
    1
    GPU:
    MSI RX 480 Gaming X 8G
    Seems to be in line with the minimum fps the test was "allowed" to run at:
    30 vs 22/23 fps in your DX11 MT test.

    I'm on a different driver, but 3 consecutive runs: http://www.3dmark.com/compare/aot/196535/aot/196536/aot/196537

    Don't mind the missing DX12 result; 3DMark somehow crashes on me after running the DX12 tests. Time Spy runs fine otherwise.

    @PrMinisterGR, yup, I did notice a change in the AMD drivers going from 16.200 to 16.300 ( http://forums.guru3d.com/showthread.php?t=409380 ) and then from 16.300 to the newer 21.Xs. But still, it's not a test to be compared between systems.

    www.3dmark.com/compare/aot/170547/aot/170541/aot/164483/aot/163545/aot/161058
     
    Last edited: Mar 13, 2017
  9. Only Intruder

    Only Intruder Maha Guru

    Messages:
    1,131
    Likes Received:
    150
    GPU:
    Sapphire Fury Nitro
    In case you're not aware, I posted this in the Fury X thread:
    The API test crashing is a known fault, which will supposedly be fixed in the upcoming update. That update should have been with us last week, so it will possibly arrive this week, and it will also include the Vulkan test.
     
  10. Spartan

    Spartan Master Guru

    Messages:
    678
    Likes Received:
    2
    GPU:
    R9 290 PCS+
    Hmm, interesting. It seems 3DMark is glitching out on me a lot; hopefully they will fix it at some point.

    http://www.3dmark.com/compare/aot/195558/aot/196125#
     

  11. s0x

    s0x Member Guru

    Messages:
    136
    Likes Received:
    1
    GPU:
    MSI RX 480 Gaming X 8G
    Hey @Only Intruder, yeah I wasn't aware of your post on the Fury X thread. Ty.
     
  12. moaka

    moaka Master Guru

    Messages:
    275
    Likes Received:
    6
    GPU:
    GTX 1080 ti MSI GX
  13. Spartan

    Spartan Master Guru

    Messages:
    678
    Likes Received:
    2
    GPU:
    R9 290 PCS+
  14. Redemption80

    Redemption80 Ancient Guru

    Messages:
    18,335
    Likes Received:
    185
    GPU:
    GALAX 970/ASUS 970
    Is this something that is showing up in games? I thought the overhead test varied too much to be comparable.

    I doubt actual games have the overhead, as this would mean AMD isn't gaining as much going to DX12, and I'm sure we would have heard about that by now.
     
  15. Turanis

    Turanis Maha Guru

    Messages:
    1,438
    Likes Received:
    182
    GPU:
    Gigabyte RX500
    If you run this in Windows 7 and in Windows 10 v.1511 (same as 8.1):

    winsat formal -restart clean (command prompt with admin),

    you can get a better 'Video Memory Throughput' result in Windows 7 than in Windows 10 (8.1), with a difference of 8000+ MB/s.

    I don't know why this happens.
     

  16. -Tj-

    -Tj- Ancient Guru

    Messages:
    16,419
    Likes Received:
    1,499
    GPU:
    Zotac GTX980Ti OC
    I get this per frame

    DirectX 11 Multi-threaded draw calls per frame
    98 304
    DirectX 11 Single-threaded draw calls per frame
    82 944

    http://www.3dmark.com/aot/143816


    This one was with an older driver @ 780, my best single-threaded result, but now it's somewhere around 83K:


    DirectX 11 Multi-threaded draw calls per frame
    98 304
    DirectX 11 Single-threaded draw calls per frame
    86 016


    http://www.3dmark.com/aot/47269
     
  17. Redemption80

    Redemption80 Ancient Guru

    Messages:
    18,335
    Likes Received:
    185
    GPU:
    GALAX 970/ASUS 970
    So is it an issue with the overhead test or the OS?

    Anyone willing to test some games, ideally ones that have DX12.
     
  18. PrMinisterGR

    PrMinisterGR Ancient Guru

    Messages:
    7,004
    Likes Received:
    137
    GPU:
    Sapphire 7970 Quadrobake
    This is very weird. Windows 10 has consistently better draw call performance due to WDDM 2.0. Are you sure your configurations are completely "vanilla"? Try restarting as recommended below.

    It's an issue with the AMD driver, and the OS has an effect on that.
     
  19. Redemption80

    Redemption80 Ancient Guru

    Messages:
    18,335
    Likes Received:
    185
    GPU:
    GALAX 970/ASUS 970
    I didn't mean the whole thread, I just meant the 3DMark overhead test issue above.

    I'm assuming, like you mentioned, that it's either a system issue or a 3DMark one.
     
  20. Spartan

    Spartan Master Guru

    Messages:
    678
    Likes Received:
    2
    GPU:
    R9 290 PCS+
    Did that on Win 10, took about 2 minutes, same results. Oh well, back to gaming.

    http://www.3dmark.com/compare/aot/197069/aot/197073/aot/197074/aot/197075/aot/197077
     
