AMD: “There’s no such thing as ‘full support’ for DX12 today”

Discussion in 'Frontpage news' started by (.)(.), Sep 1, 2015.

  1. Denial

    Denial Ancient Guru

    Messages:
    14,207
    Likes Received:
    4,121
    GPU:
    EVGA RTX 3080
    I'm familiar with Beyond3D. I'm just pointing out that there is a huge argument going on in that thread about how to interpret the results of that program. So posting the reddit thread here and saying anything definitive about the results is misleading.
     
  2. Barry J

    Barry J Ancient Guru

    Messages:
    2,803
    Likes Received:
    152
    GPU:
    RTX2080 TRIO Super
I agree, but it is interesting.
     
  3. BedantP

    BedantP Guest

    Messages:
    220
    Likes Received:
    0
    GPU:
    1660Ti
I bought a 960 because of DX12. Damn, I could've gotten a refurb 290 or a new 280X.
No real difference now, is there?
And now the AMD cards will get even more powerful because of better async support and lower latency; consoles will get a boost too. #PCMR will end soon.
     
  4. Chillin

    Chillin Ancient Guru

    Messages:
    6,814
    Likes Received:
    1
    GPU:
    -
Does any of this even matter?

Seriously, you guys are at each other's necks over HYPOTHETICAL DETAILS! FFS, even if Nvidia did support every goddamn thing in DX12(_1) and AMD only supported the most basic things, it wouldn't really matter much today. These architectures are literally a year old at this point (even older if you count that they are revisions), and will be two years old by the time some real DX12 games come around; not to mention how much longer they were in development before the final DX12 spec. If you bought a year-old architecture known to have varying levels of support right now, just to play hypothetical games a year from now with future features, then you are throwing away your money.

Pascal is what you should be screaming at if it doesn't support nearly everything, and the same goes for whatever AMD's next architecture is.
     

  5. Turanis

    Turanis Guest

    Messages:
    1,779
    Likes Received:
    489
    GPU:
    Gigabyte RX500
Pascal is like a ghost; nobody knows anything. And HBM2 is still in AMD's and Hynix's hands.
Until it comes to market we still have the old-"gen" DX11.2/DX12_0 cards, which are still good.
     
  6. Fox2232

    Fox2232 Guest

    Messages:
    11,808
    Likes Received:
    3,371
    GPU:
    6900XT+AW@240Hz
That is because they run a lot of tests for compute alone, from 1 to 128 queue depth.
Then there is just ONE value for graphics rendering time, so one should not overlook it.
And finally a 1~128 test with graphics + compute together.

The proper display for that data is three graphs:
1st: 1~128 for compute only (lowest values; the best-case scenario if async shaders were magical)
2nd: the theoretical 1~128 result obtained by taking the graphics rendering time and adding the values from the 1st graph (highest values; the worst-case scenario where execution is not parallel at all)
3rd: the 1~128 values for compute + graphics running at once (the real-world result, comparable against the best/worst cases)

And I would add a 4th graph showing the percent difference for each 1~128 value, i.e. how far each result moved from the worst case toward the best case (see the sketch below).
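Here is a minimal sketch of how those four graphs relate, in C++ with made-up placeholder timings (my illustration of the math, not the actual benchmark code):

[code]
// Graph 1: compute alone per queue depth (best case, if overlap were perfect)
// Graph 2: compute alone + the single graphics-only time (worst case, serial)
// Graph 3: the measured combined run (reality, somewhere in between)
// Graph 4: how far the real result moved from worst case toward best case
#include <cstdio>

int main() {
    const int kDepths = 128;
    double graphicsOnly = 25.0;  // measured once: graphics alone, in ms
    for (int i = 0; i < kDepths; ++i) {
        double computeOnly = 10.0 + i * 0.05;  // placeholder measurement
        double combined    = 28.0 + i * 0.05;  // placeholder measurement
        double best  = computeOnly;                 // graph 1
        double worst = computeOnly + graphicsOnly;  // graph 2
        double real  = combined;                    // graph 3
        double pct   = (worst - real) / (worst - best) * 100.0;  // graph 4
        printf("depth %3d: best %.1f worst %.1f real %.1f -> %.1f%%\n",
               i + 1, best, worst, real, pct);
    }
}
[/code]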
     
  7. degazmatic

    degazmatic Guest

    Messages:
    1
    Likes Received:
    0
    GPU:
    HD 7xxx
    send them to me.
     
  8. dr_rus

    dr_rus Ancient Guru

    Messages:
    3,924
    Likes Received:
    1,040
    GPU:
    RTX 4090
AMD's in full-scale damage control mode, it seems.

While he's technically right that supporting the full stack of DX12 features is close to impossible (some of them are actually designed for different h/w architectures, so you would need a chip that combines several of those architectures to support them all), it doesn't mean that one GPU can't support fewer DX12 features than another GPU.

So, for example, a Fermi-based GPU supports the DX12 runtime but doesn't support many features above the FL11_0 level, while Maxwell 2 supports the same runtime with the features listed in FL12_1. Does this mean they're the same because they both support the DX12 runtime?

AMD's FUD is getting tiresome to read.

This is completely wrong.
A. FL12_1 is made mostly of features which should help with performance. Not supporting FL12_1 means you'll lose more performance trying to render the same effects.
B. Asynchronous shaders aren't universally needed for performance; it depends on the architecture in question. Your architecture may be able to run the code just fine without async shaders, or with them handled in a less efficient way. It's still up for discussion how much performance async shaders will even bring to PC h/w; most estimates come from consoles, which are quite different from PCs in both the h/w and the s/w they run.
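The runtime-vs-feature-level distinction is easy to see in code. A minimal C++ sketch using the standard D3D12 API (the structure of the example is mine; both a Fermi-class and a Maxwell-2-class card would pass the device creation, and only the feature-level query tells them apart):

[code]
#include <windows.h>
#include <d3d12.h>
#include <cstdio>
#pragma comment(lib, "d3d12.lib")

int main() {
    // Any "DX12 card" can create a device on the DX12 runtime at FL11_0.
    ID3D12Device* device = nullptr;
    if (FAILED(D3D12CreateDevice(nullptr, D3D_FEATURE_LEVEL_11_0,
                                 IID_PPV_ARGS(&device))))
        return 1;

    // What the hardware actually supports is a separate question.
    const D3D_FEATURE_LEVEL levels[] = {
        D3D_FEATURE_LEVEL_11_0, D3D_FEATURE_LEVEL_11_1,
        D3D_FEATURE_LEVEL_12_0, D3D_FEATURE_LEVEL_12_1,
    };
    D3D12_FEATURE_DATA_FEATURE_LEVELS info = {};
    info.NumFeatureLevels = 4;
    info.pFeatureLevelsRequested = levels;
    device->CheckFeatureSupport(D3D12_FEATURE_FEATURE_LEVELS,
                                &info, sizeof(info));
    printf("Max supported feature level: 0x%x\n",
           info.MaxSupportedFeatureLevel);
    device->Release();
}
[/code]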
     
    Last edited: Sep 1, 2015
  9. warezme

    warezme Master Guru

    Messages:
    237
    Likes Received:
    37
    GPU:
    Evga 970GTX Classified
Lack of full DX12 is a calculated move

If you include enough features to claim DX12 support, you sell more of the "new" GPU. If you leave enough out, you can sell future full-support DX12 GPUs. Win/win. Both sides know this. Surprise!... not really.
     
  10. Fox2232

    Fox2232 Guest

    Messages:
    11,808
    Likes Received:
    3,371
    GPU:
    6900XT+AW@240Hz
One way to display it properly:
The blue bars show how big a portion (in percent) of the graphics rendering could be squeezed in between compute operations.
One should note several things:
- Compute code is for some reason executed much faster on nV hardware (leaving less room to squeeze in graphics processing); on the other hand, it had room to squeeze compute in between rendering (where space was left)
- I used an improvement scale from 0 to 100%, but in reality nV results range from negative to positive due to statistical error between runs
- The real average improvement across the entire 1~128 execution is:
- > GTX 960: 1.96%
- > GTX 980Ti: 9.81%
- > R9-390X: 92.51%
- > Fury X: 73.45% (seems like some driver/OS/other HW problem)
- nVidia's execution time for compute goes up with batch size (logical, since it is more work)
- AMD's execution time stays practically the same over the entire range, and is way too high (50~52ms) for a workload which even a GTX 960 can process in 10 to 40ms
- > it seems AMD has some static driver overhead for compute, which is why there is so much room left for rendering
- the Fury X results are not mine, since I have no access to the files on the beyond3d forums

GTX 960: [graph]
GTX 980Ti: [graph]
R9-390X: [graph]
Fury X: [graph]

Edit: To make it very easy to understand, the blue background shows what percentage of the rendering time has been absorbed by free time slots in between compute tasks.
     
    Last edited: Sep 1, 2015

  11. BedantP

    BedantP Guest

    Messages:
    220
    Likes Received:
    0
    GPU:
    1660Ti
    lol, that Fury X graph doe.
    @Chillin, you are right and wrong. Why don't we get what we *paid* for? Here, I did not waste my money.
NVIDIA's box art includes that big-****ing-font-size "DIRECTX 12", but the card does not have all the features, as mentioned.
It's how you'd feel if you got fewer potato chips in your packet.
     
  12. Dazz

    Dazz Maha Guru

    Messages:
    1,010
    Likes Received:
    131
    GPU:
    ASUS STRIX RTX 2080
Although it's hard to say; no one can say for sure how the program really works except its creator. From what I can make of the thread, NV's work is done serially, with the CPU feeding it data one batch after another, hence the smaller delay in between, since there is less delay for the CPU to look up and pull the data to feed the GPU (it's cached along the way, one batch to the next), while AMD is waiting for data to be fed by the CPU, hence the longer delay; people are saying it's executing two commands at a time (in parallel). People there are also saying there should be no variance in the delays if it's working correctly, yet with nVidia it's 10~40ms and with AMD it's a static 50ms.

    https://www.reddit.com/r/pcmasterra...e_all_jump_to_conclusions_and_crucify/cumlmwv

    Also from the original thread there is an explanation of it.

    A lower percentage is better. If it's at or near 100% it means it's doing it pretty much serially, no benefit from asynchronously running them together.
    tl;dr: OP missed the point. Maxwell is good at compute, that wasn't the point. Maxwell just cannot benefit from doing compute + graphics asynchronously. GCN can.
    Extra point: all of the NVidia cards show a linear increase in time when you increase the number of compute kernels, stepping up every 32 kernels since Maxwell has 32 thread blocks. The 980Ti took 10ms~ for 1-31 kernels, 21ms~ for 32-63 kernels, 32ms~ for 64-95 kernels, and 44ms~ for 96-127 kernels.
    The Fury X took 49ms~ for all 1...128 kernel runs, didn't even budge. It looks like the 49ms is some kind of fixed system overhead and we haven't even seen it being strained by the compute calls at all yet.
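The step pattern described in that quote is easy to model. A minimal C++ sketch (my illustrative numbers, assuming roughly 11ms per group of 32 kernels on Maxwell and a flat ~49ms on Fury X, as quoted above):

[code]
#include <cstdio>

int main() {
    for (int kernels = 1; kernels <= 128; kernels += 31) {
        // Maxwell-like: time steps up once per 32-kernel group.
        double stepped = (kernels / 32 + 1) * 11.0;
        // Fury-X-like: a fixed cost regardless of kernel count.
        double flat = 49.0;
        printf("%3d kernels: stepped ~%2.0f ms, flat ~%2.0f ms\n",
               kernels, stepped, flat);
    }
}
[/code]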
     
    Last edited: Sep 1, 2015
  13. Fox2232

    Fox2232 Guest

    Messages:
    11,808
    Likes Received:
    3,371
    GPU:
    6900XT+AW@240Hz
The delays for compute are explained incorrectly. nVidia's 'delays' (properly named: time to finish the compute batch) rise gradually because the batch is more complex and therefore takes longer to finish.
For AMD, meanwhile, it is cake, and it is finished almost instantly once dispatched. And that dispatch takes around:
Fury X: 49.66ms to finish the compute task
R9-390x: 52.28ms to finish the compute task
The 2.62ms difference comes from the performance jump between the R9-390x and the Fury X. The Fury X is 31.25% stronger in compute, so 2.62ms is that share of the entire calculation.
The entire calculation time for the Fury X is 2.62ms / 0.3125 = 8.384ms.
And that makes the stable delay, from giving the order via software to getting through the API & driver into the GPU, 49.66 - 8.384 = 41.276ms (see the sketch below).
Therefore in this test, where graphics rendering takes 25~27ms, there is no problem processing it in between compute tasks.
Which, as usual, shows that AMD has room to improve... what a faux pas again.

And btw, rendering time on the 980Ti is 17.88ms, and on the 960 it is 41.8ms. 17.88ms vs 25ms is again quite a big difference.
That means this 'benchmark' sides with AMD in showing async shader functionality, but then sides with nV in total performance (AMD's usual overhead).

Edit: In my graphs the blue background shows how much rendering you could squeeze in between compute tasks, so a higher percentage = better!!!
The guy in the link uses the simplest way to calculate the difference, so you are right that lower = better there, but calculated his way, the scale can never reach 0%.
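The overhead estimate above as explicit arithmetic, in a minimal C++ sketch (a reconstruction of the post's reasoning; the 31.25% compute advantage and the assumption that both cards share the same fixed overhead are the post's own premises):

[code]
#include <cstdio>

int main() {
    double furyTotal = 49.66;   // ms, Fury X compute batch
    double r390Total = 52.28;   // ms, R9-390x compute batch
    double advantage = 0.3125;  // Fury X assumed 31.25% stronger in compute

    // If the fixed overhead is identical on both cards, the whole
    // difference comes from the actual compute work.
    double diff        = r390Total - furyTotal;   // 2.62 ms
    double furyCompute = diff / advantage;        // 8.384 ms of real work
    double overhead    = furyTotal - furyCompute; // 41.276 ms fixed cost
    printf("real compute: %.3f ms, fixed overhead: %.3f ms\n",
           furyCompute, overhead);
}
[/code]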
     
    Last edited: Sep 1, 2015
  14. Dazz

    Dazz Maha Guru

    Messages:
    1,010
    Likes Received:
    131
    GPU:
    ASUS STRIX RTX 2080
Which is right, as the graphics part will always take longer than the compute, so it's correct that it will never reach 0%. But as the guy who created it said, it's not a benchmark; it's only meant to test the functionality.
     
  15. Fox2232

    Fox2232 Guest

    Messages:
    11,808
    Likes Received:
    3,371
    GPU:
    6900XT+AW@240Hz
Apparently I used the scenario where compute absorbs rendering, because this test is made that way, but you can easily do the reverse math and say that rendering absorbs compute in between draw calls.

You can again get from 0% on nV to 100% on AMD if a proper workload is used.
Saying it in reverse: async shaders allowed the R9-390x to absorb (on average) 48.75% of the compute workload into the rendering space,
while the GTX 980Ti could only absorb 5.61% of the compute task in between rendering calls.
But again, since AMD took so long to finish anyway, there were surely plenty of opportunities to schedule other kinds of workload in between.

Btw, does someone have a link to the benchmark download? I see another thing I would like to test myself.
     

  16. tsunami231

    tsunami231 Ancient Guru

    Messages:
    14,748
    Likes Received:
    1,868
    GPU:
    EVGA 1070Ti Black
We already knew this, or at least I already knew it; I would think most people who keep up with this stuff know there is no full support for DX12 on any card yet.
     
  17. Enticles

    Enticles Guest

    Messages:
    242
    Likes Received:
    10
    GPU:
    Asus RTX 3070ti
this made me lol.

Both companies are equally guilty of misleading their customers; it's just that this time AMD appears to be more honest about it. The next scandal will have people hating on AMD for something or other (if I had to guess, it would be AMD's performance claims getting debunked - again).

My point is, twisting the truth, and in some cases outright bull****ing, is rife in the technology marketplace. So let's all relax about what nvidia did or didn't do, because it'll be AMD's turn to mess up next! :)
     
  18. DmitryKo

    DmitryKo Master Guru

    Messages:
    446
    Likes Received:
    159
    GPU:
    ASRock RX 7800 XT
You are mixing things up. Each of the individual optional features, such as resource binding, tiled resources, conservative rasterization etc., has its own separate tiers with a well-defined set of requirements at each tier.

When you see "tier 1" and "tier 3" in the table, those are the tiers for these individual optional features - not for the Direct3D 12 API in general.

https://en.wikipedia.org/wiki/Feature_levels_in_Direct3D#matrix

All DX12 cards support Resource Binding at tier 1 and Resource Heap at tier 1; feature level 12_0 additionally requires Resource Binding at tier 2, and feature level 12_1 requires Conservative Rasterization at tier 1.

Yes, GCN supports Resource Binding tier 3, and so does Skylake integrated graphics.
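You can query those per-feature tiers directly. A minimal C++ sketch using the standard D3D12 options structure (the printout format is mine):

[code]
#include <windows.h>
#include <d3d12.h>
#include <cstdio>
#pragma comment(lib, "d3d12.lib")

int main() {
    ID3D12Device* dev = nullptr;
    if (FAILED(D3D12CreateDevice(nullptr, D3D_FEATURE_LEVEL_11_0,
                                 IID_PPV_ARGS(&dev))))
        return 1;

    // Tiers are reported per optional feature, not for "DX12" as a whole.
    D3D12_FEATURE_DATA_D3D12_OPTIONS opts = {};
    dev->CheckFeatureSupport(D3D12_FEATURE_D3D12_OPTIONS,
                             &opts, sizeof(opts));
    printf("Resource Binding tier:           %d\n", opts.ResourceBindingTier);
    printf("Tiled Resources tier:            %d\n", opts.TiledResourcesTier);
    printf("Conservative Rasterization tier: %d\n",
           opts.ConservativeRasterizationTier);
    dev->Release();
}
[/code]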


    It's simply an earlier version of the table.

No, there were actually major changes for Skylake (feature level 12_1, Resource Binding at tier 3, Tiled Resources at tier 3, Conservative Rasterization at tier 3, PS reference value), as well as a small change for Maxwell-1 (it now supports Typed UAV loads for additional formats with the latest drivers).
    https://en.wikipedia.org/wiki/Feature_levels_in_Direct3D#matrix

I removed features from the table because they are not optional in Direct3D 12 anymore.

For example:
1) "UAVs at every stage" is supported on all DX12-capable hardware (this feature was tied to the UAV slot count of 64 in Direct3D 11.2, but in Direct3D 12 these "slots" are just memory pointers and you can use them at every pipeline stage);
2) a maximum sample count of 16 for UAV-only rendering is supported by all DX12 hardware;
3) cross-node sharing is only exposed in multi-GPU configurations, and current drivers don't seem to support it, so there is no way to test the tiers supported by actual hardware;
4) "async shaders" is not an optional capability in Direct3D 12 - it's an internal hardware feature that can be exposed by the WDDM driver, [post="5094415"]as I explained in an earlier thread.[/post]

I did not actually remove the logical blend operations cap bit. Also, I didn't bother to add a few smaller cap bits from the MSDN docs.

No, someone kept adding "async shaders" to this table when [post="5094415"]there is no such optional capability in Direct3D.[/post]
     
    Last edited: Sep 1, 2015
  19. Denial

    Denial Ancient Guru

    Messages:
    14,207
    Likes Received:
    4,121
    GPU:
    EVGA RTX 3080
Since you obviously seem to know a bit about Direct3D and whatnot, could you shed some light on what Oxide is claiming about Nvidia's async stuff, or about AoS's implementation of DX12 in general?
     
  20. DmitryKo

    DmitryKo Master Guru

    Messages:
    446
    Likes Received:
    159
    GPU:
    ASRock RX 7800 XT
Anyone who is not a graphics driver developer has very little chance of getting the right answer to that question.

If I had free time to investigate, I would need to learn the internals of DXGK first, which is quite a bit lower-level than the main Direct3D API, and unfortunately the MSDN documentation for WDDM 2.0 driver development is far from complete as of now.
     
