Is it fair to say that, according to Oxide, Maxwell does not do well in Oxide's async shader implementation? And is concluding from that alone, without knowing the internals, that Maxwell does not support async shaders in general a leap?
Again, this "async compute" is not an API feature - it's not an optional capability that can be exposed to the API programmer. It's a WDDM driver/DXGK feature which can improve performance in GPU-bound scenarios. Developers just use compute shaders for lighting and global illumination, and in AMD's implementation there are 2 to 8 ACE (asynchronous compute engine) blocks - dedicated command processors that completely bypass the rasterization/setup engine for compute-only tasks. In theory this means additional compute performance without stalling the main graphics pipeline.

Parallel execution is actually [post="5094415"]a built-in feature in Direct3D 12[/post] - it's called "synchronization and multi-engine". There are three sets of functions for copy, compute and rendering, and these tasks can be parallelized by the runtime and driver when you have the right hardware. You just submit your compute shaders to the Direct3D runtime using the usual API calls, and on high-end AMD hardware with additional ACE blocks you may use larger and more complex shaders and/or create additional command queues using multiple CPU threads. This will saturate the compute pipeline, and you would still get fair performance gains compared to the traditional rendering path.

So when Oxide said they had to query hardware IDs for Nvidia cards and then disable some features in the rendering path, it makes sense. When they talk about console developers getting 30% gains by using "async compute" - i.e. using compute shaders to accelerate lighting calculations in parallel to the main rendering stack - it makes sense as well.
But when Oxide says that the 900-series (Maxwell-2) doesn't have the required hardware but the Nvidia driver still exposes the "async compute" capability, I don't think they can really tell this for sure, because this feature would be exposed through DXGK (DirectX Graphics Kernel) driver capability bits, and these are driver-level interfaces which are only visible to DXGI and the Direct3D runtime, not to the API programmer (and the MSDN hardware developer documentation for WDDM 2.0 and DXGI 1.4 does not exist yet). They are probably wrong on hardware support too, since Nvidia asserted to AnandTech that the 900-series have 32 scheduling blocks, of which 31 can be used for compute tasks. So if Nvidia really asked Oxide to disable the parallel rendering path in their in-game benchmark, that has to be some driver problem rather than missing hardware support. The Nvidia driver probably doesn't expose the "async" capabilities yet, so the Direct3D runtime cannot parallelize the compute tasks, or the driver is not fully optimized yet... I'm not really sure, but it would take enormous effort to investigate even if I had full access to the source code.
Thank you for that explanation Dmitry. I was hoping to find something useful in this thread. Most of it appears to be the typical AMD vs NVidia garbage....
Thanks DmitryKO for sharing that info. Some useful charts about GCN: http://news.softpedia.com/news/AMD-Hawaii-GPU-Diagram-Leaked-Shows-Four-Shader-Engines-390754.shtml Maxwell 2 can do compute within its 31+1 queue limit; beyond that it becomes much harder (impossible) to finish the whole compute queue in time (bottleneck), and all compute is done serially. GCN has a 64-queue limit, which is much better than Maxwell and much more consistent. If there are more than 64 tasks, the ACEs (Async Compute Engines), which act as independent schedulers, can take on more tasks without a time penalty - all compute is done in parallel. Some short explanations: http://www.overclock.net/t/1569897/various-ashes-of-the-singularity-dx12-benchmarks/1710#post_24368195 https://www.reddit.com/r/pcgaming/comments/3j87qg/nvidias_maxwell_gpus_can_do_dx12_async_shading/ Great discussions in those threads. And I really hope for a real answer from Nvidia about this issue, not an answer like the "misunderstanding" between our teams (GTX 970 memory issue).
Currently, as it stands, it does appear AMD's GCN is better positioned to make the most of DX12 capabilities/features, esp. async compute/shaders. Don't care much at this point in time, but if Pascal doesn't address the issue, I could be switching sides. Of course it's too early to tell from one alpha-stage game, and whether other DX12 titles will similarly rely as heavily on async shaders as Oxide did. I just hope the best DX12 features/capabilities are put to best use by GPU makers and NOT have one side pressure devs to exclude things that legitimately improve a game's looks and performance.
And I really hope that new DX12 games don't disable async compute and aren't "optimized" like some GameWorks titles, which don't run very well on some Nvidia cards. Pascal will be great if we learn something about it - its internal architecture - but there's nothing yet.
A few problems with that though.

1# DX12 is meant to streamline coding and make it simpler, to help games come to market quicker and tap the power of the GPU without always having to talk to the driver, avoiding the mess of DX11 which is pure driver optimization. If their async implementation doesn't meet Microsoft's guidelines then it's hardly DX12 compliant!

2# Lower overhead: games have direct access to GPU resources without the CPU overhead of talking to the driver; otherwise it's just multithreaded DX11 rather than a low-level, to-the-metal API.

3# The developers say they have had a lot more optimization help from Nvidia than AMD in this game, although it's marketed as an AMD game because it originally started out as a DX11 and Mantle product, with DX12 being added shortly after. Then again, you could say every Unreal Engine game out there is an Nvidia-marketed product, with GameWorks incorporated directly into the newer engine, which is also biased, so I fail to see your argument there.

Biased or not, developers need to streamline their games so they run across lots of different configurations; ignore these and your game really isn't going to sell very well - a great way to destroy your company. At least with GameWorks AMD can get around the performance issues by tweaking the tessellation in the drivers; sadly, if you have an Nvidia product that's not Maxwell you are well and truly screwed, just a paperweight at that point unless you disable the feature.

Looking at this more, I think the problem with Nvidia is that there are too many instructions for it to cope with. I think RTS games will need a lot more compute power, since each unit has multiple sources, and when you add more and more units to the fold - remember there will be hundreds, even thousands of units at once - it will become very intensive, far more than any FPS.
I don't really understand this or why it matters. We have an RTS game with tons of compute - it's Ashes of the Singularity. It has a benchmark that's literally designed to push massive compute utilizing async shaders. The Ti and the Fury X tie in performance. So how does Nvidia have a problem with too many instructions? OK, so maybe if we push even more instructions - like double what AoS does - maybe then AMD will show significant gains, but what game is doing that, and why? RTS is about the most demanding genre for this type of workload, AoS is literally the pinnacle of it, and it's a non-issue for Nvidia. So why do people keep pushing this like it's going to have a major impact?

On the flip side, let's say Nvidia's tessellation performance sucked. Then all of a sudden a driver came out that made it like 1203910293120x faster than AMD's. Then a game came out that just had massive geometry smothered in tessellation - literally hairworks for days, hair everywhere, x64 tessellation. And at the end of the benchmark for this game: Ti - 41.7 FPS, Fury X - 39.2. Would you care? Would any AMD owner care? Would the fate of their card suddenly be in jeopardy? Would their $650 purchase have gone to waste?
To me this shouldn't be a problem, because DX12 is still in its early stages; it will take a while for any card to support its full feature set.
Exactly, I don't think there are any games coming out anytime soon that use all the DX12 features. Denial, I didn't know the 980 Ti and Fury X were pretty much tied when it came to AoS - I never saw the two compared in any of the reviews, just found them here and there, and there really is very little difference. If that is with async compute enabled on the Fury X and disabled on the 980 Ti, it makes me curious how much of a difference it is actually making. I would like to see a comparison on AMD hardware with it on and off to see how much real-world performance is gained.