Review: Ashes of Singularity: DX12 Benchmark II with Explicit Multi-GPU mode

Discussion in 'Frontpage news' started by Hilbert Hagedoorn, Feb 24, 2016.

  1. mohiuddin

    mohiuddin Maha Guru

    Messages:
    1,007
    Likes Received:
    206
    GPU:
    GTX670 4gb ll RX480 8gb
    I know, right? I did some research out of curiosity.
    And thanks for backing me up with proof. But it's a pity that some people prefer to put on shiny green glasses even when those glasses badly distort their vision. :p
    Though I have one hazy old pair of green glasses myself, I prefer not to. :p
     
  2. Ext3h

    Ext3h Guest

    Messages:
    13
    Likes Received:
    0
    GPU:
    Various.
    I can't tell for sure yet. Nvidia still claims that they want to get async shaders running in parallel eventually. If they manage to do that, problem solved.

    All information about what's hindering Nvidia from using that hardware is only speculation so far; we only know that they aren't doing it and what that is causing. So they still might or might not be able to get it working without CUDA.
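    To make it concrete from the API side: "async compute" in D3D12 is just a second command queue of the compute type next to the direct (3D) queue. Whether the driver actually executes it in parallel is entirely up to the hardware and driver; the application code looks the same either way. A minimal sketch, assuming an already created ID3D12Device and omitting error handling:

    Code:
    #include <windows.h>
    #include <d3d12.h>
    #include <wrl/client.h>

    using Microsoft::WRL::ComPtr;

    // Create a dedicated compute queue next to the usual direct (3D) queue.
    // Work submitted here *may* overlap with graphics work, but the API does not
    // guarantee parallel execution; that part is up to the driver and hardware.
    ComPtr<ID3D12CommandQueue> CreateComputeQueue(ID3D12Device* device)
    {
        D3D12_COMMAND_QUEUE_DESC desc = {};
        desc.Type     = D3D12_COMMAND_LIST_TYPE_COMPUTE;   // compute-only queue
        desc.Priority = D3D12_COMMAND_QUEUE_PRIORITY_NORMAL;
        desc.Flags    = D3D12_COMMAND_QUEUE_FLAG_NONE;

        ComPtr<ID3D12CommandQueue> queue;
        device->CreateCommandQueue(&desc, IID_PPV_ARGS(&queue));
        return queue;
    }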


    I'm not sure if that is related, but I'm currently also puzzled as to why they still prefer DirectFlip over DWM. DWM is already handled on a compute queue in hardware (according to GPUView it runs on the same compute engine as CUDA).

    The point is, DWM's inner workings are not much different from what you need to provide the low-latency, V-synced post-processing required for sickness-free VR.

    But if they are so reluctant to use it, there is most likely a catch of some kind.

    Against Pascal and Polaris, none of the current architectures can compete, simply because the new manufacturing process alone can account for a roughly 50% performance increase at the same power consumption.

    And yes, GCN used to be over-engineered. That's not even limited to async compute with the ACEs, which GCN has had since its first revision; it's also true for e.g. the HSA-related additions present since Tonga, which likewise lay a foundation for future applications.

    And it's not much different with Maxwell II either: some optional DX12 features like conservative rasterization remain untapped as well, until support is widespread enough for them to become a default without an alternate fallback.
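    (For reference, whether conservative rasterization is usable at all is just a capability query in D3D12, which is exactly why a fallback path is still needed. A minimal sketch, assuming an already created ID3D12Device:)

    Code:
    #include <windows.h>
    #include <d3d12.h>

    // Query the optional conservative rasterization feature. Any engine relying
    // on it still needs a fallback for hardware that reports NOT_SUPPORTED.
    bool SupportsConservativeRaster(ID3D12Device* device)
    {
        D3D12_FEATURE_DATA_D3D12_OPTIONS options = {};
        if (FAILED(device->CheckFeatureSupport(D3D12_FEATURE_D3D12_OPTIONS,
                                               &options, sizeof(options))))
            return false;
        return options.ConservativeRasterizationTier !=
               D3D12_CONSERVATIVE_RASTERIZATION_TIER_NOT_SUPPORTED;
    }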

    It's only 2-5 FPS now, but it used to be more back when AotS made more reckless use of it. However, AotS isn't the only title using async compute, and titles that pay less respect to the differences in hardware still need to be addressed properly.
     
    Last edited: Feb 27, 2016
  3. Carfax

    Carfax Ancient Guru

    Messages:
    3,956
    Likes Received:
    1,450
    GPU:
    Zotac 4090 Extreme
    Thanks for stating the truth, Ext3h. A lot of people with an agenda (especially Mahigan) are taking what you said and twisting it to suit their own purposes.

    Anyway, I have a question I'd like to ask. How do NVidia's hardware GMU and work distributor compare to AMD's ACEs, irrespective of the APIs?

    It would seem that the GMU is still very effective, given that the performance increase over Kepler is massive in PhysX workloads. The benchmark I posted on the previous page shows this: GK110 leads GK104 by a huge margin, and GM200 and GM204 lead GK110 by a huge margin as well, which definitely seems to be because of Maxwell v2's capability of running graphics and compute tasks concurrently.

    [IMG]
     
  4. zimzoid

    zimzoid Guest

    Messages:
    1,442
    Likes Received:
    25
    GPU:
    2xEVGA980TiSC+(H20) Swift
    Would be nice if we could talk about the actual game; the thread is getting boring.
     

  5. GeniusPr0

    GeniusPr0 Maha Guru

    Messages:
    1,439
    Likes Received:
    108
    GPU:
    Surpim LiquidX 4090
    The topic is the revised benchmark of AoTS.
     
  6. Ext3h

    Ext3h Guest

    Messages:
    13
    Likes Received:
    0
    GPU:
    Various.
    If you disregard synchronization with the 3D queue: At least on par.

    The original ACEs (pre-Tonga) appear to be surprisingly simple: each of them merely fetches instructions from 8 different ring buffers which must be filled by the CPU, and monitors the global data share memory to handle synchronization. Possibly they also have write access to the global data share.

    The GMU sports at least a few features the ACEs (at least the old ones) don't have, such as dynamically dispatching new commands from inside a running shader, without a roundtrip to the CPU.

    Both are capable of handling synchronization in hardware, keeping the latency barely noticeable. And since Maxwell II, both are supposed to be able to launch compute tasks in parallel, which appears to be working. It's not clear whether pre-Maxwell hardware wasn't actually capable of that as well, at least to a limited extent (it may only have been unable to launch tasks concurrently, while already running shader programs could still have executed in parallel due to pipelining effects).
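    To illustrate what that looks like from the API side: cross-queue dependencies in D3D12 are expressed with a fence that one queue signals and the other waits on, without any CPU roundtrip in between. A rough sketch, assuming the two queues and the fence already exist:

    Code:
    #include <windows.h>
    #include <d3d12.h>

    // Express a GPU-side dependency: the direct (3D) queue will not start work
    // submitted after the Wait() until the compute queue has signalled the fence.
    // How cheaply that wait resolves is up to the hardware schedulers (ACEs/GMU).
    void SyncGraphicsToCompute(ID3D12CommandQueue* computeQueue,
                               ID3D12CommandQueue* directQueue,
                               ID3D12Fence* fence,
                               UINT64 fenceValue)
    {
        computeQueue->Signal(fence, fenceValue); // compute side: signal when done
        directQueue->Wait(fence, fenceValue);    // 3D side: the GPU waits, not the CPU
    }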

    There are still some limitations with Maxwell regarding mixing 3D and compute workloads on a single SMM unit. So it can't be used to increase the utilization of an SMM unit, only to bypass bottlenecks on shared components such as the memory subsystem, Z-buffer or rasterization. It's not entirely clear where that limitation springs from.

    The new ACEs (HWS) are microprogrammable; what they can actually do depends on the firmware, and I don't know what the limits are.

    About the work distributor, I can't tell for sure. I only know that both Maxwell and GCN have at least one, and that the graphics command processor as well as the GMU and ACEs need either to have one integrated or to access a shared one. Neither vendor has published anything accurately describing this piece of hardware; for AMD only the leaked schematics for the PS4 GPU are known, and for Nvidia nothing at all.

    Well, that, and the fact that it got more complicated with modern GCN revisions in order to support real preemption of running shaders. Again, no data on what the hardware looks like.
     
    Last edited: Feb 28, 2016
  7. Fox2232

    Fox2232 Guest

    Messages:
    11,808
    Likes Received:
    3,371
    GPU:
    6900XT+AW@240Hz
    Your thesis that it is PhysX, and async execution of it on Maxwell, which makes the GTX 970 approximately 10% faster than the GTX 780 Ti can easily be checked by looking at older games with PhysX and new ones without it.

    But the general pattern shows that in older games the GTX 780 Ti is equal (rarely) or better (by up to 15%), while in new games, PhysX or not, the GTX 970 has around a 10% lead.

    I do not see your thesis as holding much ground. The same goes for the comparison of the GTX 780 Ti vs the R9 290X. The 780 Ti always led or was equal. How can it be that at 1440p the R9 290X delivers 31% higher fps in a game like Black Ops 3? And that's not the only case out there.

    My thesis is that nVidia no longer cares about Kepler and the software-based optimizations it always benefited from.
    It is clearly planned obsolescence.
     
    Last edited: Feb 28, 2016
  8. Chillin

    Chillin Ancient Guru

    Messages:
    6,814
    Likes Received:
    1
    GPU:
    -
    Again with that nonsense, show the hard apples-to-apples evidence:

    People keep talking about these magical driver increases (this goes for Nvidia as well), yet I don't see any empirical evidence of it anywhere. Here, this is the 290X with launch drivers (Oct 2013, Catalyst 13.11 Beta v5) on Crysis 3, and then on the later 15.7 drivers (July 2015):

    The 780 and the 290 Uber didn't move despite the year and a half since their release. And this is the benchmark game.

    Where are these magical increases? Hexus (entire 2014 driver set) and other sites also did comprehensive driver performance roundups and found only minor increases (around 5% or less) for both Nvidia and AMD.
     
    Last edited: Feb 28, 2016
  9. Fox2232

    Fox2232 Guest

    Messages:
    11,808
    Likes Received:
    3,371
    GPU:
    6900XT+AW@240Hz
    Please, at least write @NAME if you are unable to quote the person. I have a hard time finding anyone stating any driver-level increases for the 290X.
     
  10. Chillin

    Chillin Ancient Guru

    Messages:
    6,814
    Likes Received:
    1
    GPU:
    -
    WTF? Just look at the 780 numbers; where is the supposed crippling regression?
     

  11. Fox2232

    Fox2232 Guest

    Messages:
    11,808
    Likes Received:
    3,371
    GPU:
    6900XT+AW@240Hz
    I still have no idea what you are talking about.
     
  12. Denial

    Denial Ancient Guru

    Messages:
    14,206
    Likes Received:
    4,118
    GPU:
    EVGA RTX 3080
    He's not saying that performance decreased in the same game; he's saying that relative performance dropped in newer games versus older ones.

    Like, let's say on average the 780 Ti had a 10% advantage over the 290X in all the games released in 2013. The 290X now has a 10% advantage over the 780 Ti in all the newly released games in 2016 (I don't know what the actual % is).

    The problem is that this can be attributed to a number of things. Is it because Nvidia stopped optimizing for Kepler? Is it because Kepler cards generally have less VRAM than their AMD equivalents? Is it because newer games are just better optimized for AMD cards, with devs becoming more familiar with the architecture because of consoles?

    The other problem is quantifying it across a broad number of games. PrimeMinisterGR in the AMD forum often uses TechPowerUp's relative performance summary graph to show how it's changed. TechPowerUp's graph has a few issues, though. There was a single point in time, between two reviews, where the numbers for all the older cards shift dramatically. They added a number of different games over that period.

    https://www.techpowerup.com/reviews/MSI/GTX_980_Ti_Lightning/23.html
    https://www.techpowerup.com/reviews/ASUS/GTX_980_Ti_STRIX_Gaming/30.html

    Look specifically at the 770 and 280X. In the Strix review they're tied; in the Lightning review the 280X blows the 770 away. The games they tested changed, though, and you can't go to those specific games and see the results because they only list the newer cards. So you can't actually see where the 770 is suddenly doing significantly worse.

    But if we go to Anandtech's benchmark, we can see that a single game can make a really big difference.

    http://anandtech.com/bench/product/1494?vs=1495

    Take a look at Shadow of Mordor: the 770 performs like half a 7970 because of its 2GB of VRAM. Or just compare QHD vs FHD in general; AMD cards benefit in almost every case. So if all these newer games are using more VRAM at 1080p, and the Nvidia 2GB cards, with often slower memory performance and less RAM, have to constantly shuffle things in and out, I think that leads to an overall performance loss of a few percent, which is what we are seeing. And it may not only be that. It may be, like I said, that developers are just getting more comfortable with AMD's architecture, that AMD's architecture is slightly faster in the areas where developers are making advances in engines, or that AMD's drivers have recently extracted more performance out of games than Nvidia's recent Kepler drivers. And honestly, yeah, it may be that Nvidia just stopped optimizing newer titles for Kepler.

    My thing is, though, that unless someone does a comprehensive review that answers all of those questions, I don't think it's fair to say that Nvidia is intentionally downgrading performance. Is it possible? Sure. But I've yet to see any real evidence for it. The more I look, the more I see it's mostly just VRAM differences and a few games that completely throw off the overall percentages.
     
    Last edited: Feb 28, 2016
  13. Stormyandcold

    Stormyandcold Ancient Guru

    Messages:
    5,872
    Likes Received:
    446
    GPU:
    RTX3080ti Founders
  14. Fox2232

    Fox2232 Guest

    Messages:
    11,808
    Likes Received:
    3,371
    GPU:
    6900XT+AW@240Hz
    It is not that hard to do. For each game released after Maxwell 2.0, you take the last driver that had no optimizations for that game, and then the newest driver (from the last week or so), to see how much both architectures improved.

    Then you sort the results by driver release date. If nVidia is intentionally leaving Kepler behind, you will see a saw-tooth pattern in the Kepler vs Maxwell comparison before and after each release.
    But if Maxwell gained a ~20% boost since release due to general driver improvements for the architecture, you will see no saw-tooth; it will be a gradual improvement over time.
    (Similar to what AMD did for GCN 1.0 with the ~12.11b improvements.)
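    Something like this would do for tabulating it (a rough sketch in C++; the struct fields and the commented-out sample row are placeholders you would fill with your own measurements):

    Code:
    #include <algorithm>
    #include <cstdio>
    #include <string>
    #include <vector>

    struct Sample {
        std::string game;
        std::string driverDate;   // ISO dates sort correctly as strings
        double keplerFps;         // e.g. GTX 780 Ti
        double maxwellFps;        // e.g. GTX 970
    };

    int main() {
        std::vector<Sample> samples = {
            // { "SomeGame", "2015-06-01", 52.0, 57.0 },  // fill with measured data
        };
        std::sort(samples.begin(), samples.end(),
                  [](const Sample& a, const Sample& b) { return a.driverDate < b.driverDate; });
        for (const auto& s : samples) {
            // A recurring drop-then-recovery in this ratio around each game-ready
            // release would hint at per-game optimizations skipping Kepler;
            // a smooth trend over time would not.
            std::printf("%s  %-20s  Maxwell/Kepler = %.2f\n",
                        s.driverDate.c_str(), s.game.c_str(),
                        s.maxwellFps / s.keplerFps);
        }
        return 0;
    }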
     
  15. -Tj-

    -Tj- Ancient Guru

    Messages:
    18,097
    Likes Received:
    2,603
    GPU:
    3080TI iChill Black

    Newer games use more compute, and the 290X has more compute power in that regard. IMO this is the only reason why we sometimes see such deviations.
     

  16. Yxskaft

    Yxskaft Maha Guru

    Messages:
    1,495
    Likes Received:
    124
    GPU:
    GTX Titan Sli
    IMO this is where it gets so silly.
    For me, saying Nvidia is intentionally downgrading, gimping or crippling Kepler would mean they add code to make it perform worse.

    I prefer to say Nvidia is neglecting Kepler in favor of Maxwell. And it also goes two ways: Maxwell drivers are prioritized, and Nvidia wants GameWorks to expose Maxwell's strengths, even at the cost of Kepler performance.


    I would be pissed if it turns out that Nvidia doesn't release performance drivers for Kepler anymore.
    And one thing I do think gives credibility to the claims of neglecting Kepler is the release of TW3, where forums roared due to Kepler's bad performance, and Nvidia suddenly promised to fix Kepler and did so in a couple of days.
     
  17. TBPH

    TBPH Guest

    Messages:
    78
    Likes Received:
    0
    GPU:
    MSI GTX 970 3.5+.5GB
    I actually misread it, but even then this is with my card overclocked to 1500 MHz, and frankly 50 was being generous. When driving, it's closer to 40-45.
     
  18. Alessio1989

    Alessio1989 Ancient Guru

    Messages:
    2,941
    Likes Received:
    1,239
    GPU:
    .
    DirectX 12 allows the driver to support dFlip (direct flip) or iFlip (independent flip) for borderless maximized windows. The first was introduced with Windows 8, the second with Windows 8.1. Fullscreen is now handled by iFlip immediate. DWM now composes only for windowed applications (not maximized-borderless). At least that is what the WDDM 2.0 specifications say.
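    From the application side this mostly comes down to using a flip-model swap chain; with the older blt model DWM always has to compose. A minimal sketch, assuming a D3D12 command queue, a window handle and an IDXGIFactory2 already exist, error handling omitted:

    Code:
    #include <windows.h>
    #include <d3d12.h>
    #include <dxgi1_4.h>
    #include <wrl/client.h>

    using Microsoft::WRL::ComPtr;

    // Create a flip-model swap chain. This is what allows DWM to hand a
    // maximized borderless window over to direct flip / independent flip
    // instead of composing it.
    ComPtr<IDXGISwapChain1> CreateFlipSwapChain(IDXGIFactory2* factory,
                                                ID3D12CommandQueue* queue,
                                                HWND hwnd)
    {
        DXGI_SWAP_CHAIN_DESC1 desc = {};
        desc.Format           = DXGI_FORMAT_R8G8B8A8_UNORM;
        desc.SampleDesc.Count = 1;
        desc.BufferUsage      = DXGI_USAGE_RENDER_TARGET_OUTPUT;
        desc.BufferCount      = 2;
        desc.SwapEffect       = DXGI_SWAP_EFFECT_FLIP_DISCARD; // flip model, mandatory for D3D12
        // Width/Height left at 0 means "use the window's client area size".

        ComPtr<IDXGISwapChain1> swapChain;
        factory->CreateSwapChainForHwnd(queue, hwnd, &desc,
                                        nullptr, nullptr, &swapChain);
        return swapChain;
    }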

    I am not aware of how FCAT works, but assuming the wrong composition method would return wrong results.
     
  19. Ext3h

    Ext3h Guest

    Messages:
    13
    Likes Received:
    0
    GPU:
    Various.
    That part can't be entirely accurate, as composition is also required to provide notifications and the Xbox overlay in Windows 10, both of which show up in maximized-borderless applications as well, and these days definitely no longer by injecting into the DirectX context as they used to.
     
  20. Carfax

    Carfax Ancient Guru

    Messages:
    3,956
    Likes Received:
    1,450
    GPU:
    Zotac 4090 Extreme
    I'd love to see these benchmarks, but hardware accelerated PhysX games are fairly rare. The last Batman Arkham game before Arkham Knight was Arkham Origins, and that came out before Maxwell v2.

    Yes, because newer games are more likely to make heavy use of compute shaders, which favors architectures with strong compute performance like Hawaii, Fiji and Maxwell v2.

    Some games just have an innate preference for certain architectures. But there is a valid explanation for Maxwell v2's dominance in Batman Arkham Knight with PhysX enabled as I've already explained.

    Maxwell v2 can run graphics and compute concurrently via CUDA. Kepler on the other hand cannot.

    You can believe whatever you like, but there's no evidence for that belief. Kepler had lots of weaknesses that became more and more apparent as games began to rely more and more on compute shaders.
     
