Log in or Sign up

Asynchronous Compute

Discussion in 'Videocards - NVIDIA GeForce Drivers Section' started by Carfax, Feb 25, 2016.

Page 16 of 57

aufkrawall2 Ancient Guru

Messages:

4,498

Likes Received:

1,874

GPU:

7800 XT Hellhound

dr_rus said: ↑

But you do. This is what you've stated two posts above: "So, Hitman profits by Async Compute as well if the hardware supports it, just like Ashes." This is exactly the type of speculation you're now "recommending" me to avoid.
Click to expand...

There are currently no fps gains to be expected in GPU limit, apart from Async Compute (if correctly supported by the GPU).
This is what developers say and there is no DX12 application that would run any faster than with DX11 in GPU limit on Nvidia. So I'm not speculating, i'm just describing empirical evidence.

dr_rus said: ↑

A bigger base in DX12 won't exist until 2018 probably. And by that time all modern videocards will be somewhat of a rarity and it won't matter if some of them will produce 0,2 fps more than another one which produce 2 fps. That's your future 10% difference from whatever: 2 vs 2,2 fps.
Click to expand...

:stewpid:

dr_rus said: ↑

But benchmarks do and BF4 is definitely more optimized for GCN than BF3 ever was.
Click to expand...

I don't know and I don't care. When Bf3 was released, AMD was still stuck with VLIW architecture. Doesn't say anything and Battlefront still runs much faster compared to Nvidia on AMD than BF4.

dr_rus said: ↑

That bolded part is a lie. NV doesn't need any software emulation to meet the specs of DX12, even for Kepler which doesn't support additional queues at all.
Click to expand...

It's not a lie. You are mixing up CUDA and graphics. Nvidia can do parallel queues for CUDA for a long time, but it can't do parallel queues for graphics + compute on the GPU with DX12. Thus the driver needs to emulate Async Compute on CPU, which explains the performance loss and stuttering seen in Ashes with DX12 AC on vs. off on Nvidia.
Be a bit more careful before telling someone to be liar when you easily mix up things.

dr_rus said: ↑

Any D3D12 developer should always start with a no-async-compute path which will actually be the reference path to compare to when implementing async compute path to make sure that it's not actually slowing down the h/w - AMD h/w including. So any D3D12 renderer will ALWAYS have a non-async-compute path. AotS is just an AMD sponsored technology demo, this pretty much explains everything what was happening around it.
Click to expand...

Are console ports to be known for their technical sanity?

dr_rus said: ↑

The only FUD being spread here is coming from you currently.
Click to expand...

I really think you are a Nvidia employee, as you sound exactly like their bull**** on Twitter, taunting their customers...

dr_rus said: ↑

NV doesn't need to improve "that part" of DX12, NV simply needs to launch a 10% faster cards in the next lineup refresh. Considering that most of late wins of GCN is happening in DX11 that's basically the only proper way to handle it.
Click to expand...

Async Compute is always an advantage, like HTT on CPUs when there are enough threads executed. Just listen to some GDC presentations and educate yourself.

aufkrawall2, Apr 6, 2016

#301
Alessio1989 Ancient Guru

Messages:

2,952

Likes Received:

1,244

GPU:

.

Let's do a very little technical immersion about D3D12 specifics: actually there two features that can be actually emulated via driver if not supported by hardware.

The first is implicit and is the range limit on root table SRV range: if the hardware is capable to access directly less then required SRVs descriptor tables the driver can use extra and implicit constants offset in the descriptor heap. This apply to both tier1 and tier2 of resource binding. I am not sure what GPUs need to this, probably none of the three major IHVs has a similar hardware limitation. The overhead added by this emulation is negligible, this is way it has been added. This sort of trick cannot be applied to the other restriction of the resource binding tears. Last time I checked MSDN still show the old limit (5 SRV descriptor tables), though I reported it different times.

The second emulated features is the ability to bypass the Geometry Shader stage to output viewports and render target arrays semantics to all shader stage the infamous (VPAndRTArrayIndexFromAnyShaderFeedingRasterizer performance flag). If the feature is not supported the driver will create a dummy geometry shader on the fly. AFIK older NV architectures do not support this features.+

Async Compute is always an advantage, like HTT on CPUs when there are enough threads executed. Just listen to some GDC presentations and educate yourself.
Click to expand...

It is not always. Running both compute and graphics tasks in parallel, even on different hardware resources, still requires hardware and driver synchronization. They both have a performance cost. Usually little, but it is not completely free. If the synchronization cost is greater than the benefit you will get a performance loss. The same apply to hyper-threading (though for different reason is even more rare to notice on most of games and consumer software). Note that all this does not justify some issues of current NV drivers.

Last edited: Apr 6, 2016

Alessio1989, Apr 6, 2016

#302
dr_rus Ancient Guru

Messages:

3,929

Likes Received:

1,044

GPU:

RTX 4090

aufkrawall2 said: ↑

There are currently no fps gains to be expected in GPU limit, apart from Async Compute (if correctly supported by the GPU).
Click to expand...

This is a lie as well. There are DX12 h/w features which may provide a performance gain when compared to DX11 FL11_0/1/2. GCN doesn't support them however as they are grouped in FL12_1 feature pack. The same features can be accessed in DX11.3 so they're not DX12 exclusive - which is a plus.

aufkrawall2 said: ↑

This is what developers say and there is no DX12 application that would run any faster than with DX11 in GPU limit on Nvidia. So I'm not speculating, i'm just describing empirical evidence.
Click to expand...

Yes, there is. It's those application which are using FL11_0 in DX11 and FL12_1 features in DX12.

aufkrawall2 said: ↑

:stewpid:
Click to expand...

Great reply! Always great when a person goes for personal insults when there's nothing factual to reply instead.

aufkrawall2 said: ↑

I don't know and I don't care. When Bf3 was released, AMD was still stuck with VLIW architecture. Doesn't say anything and Battlefront still runs much faster compared to Nvidia on AMD than BF4.
Click to expand...

It's pretty obvious that you don't know and don't care in general.

aufkrawall2 said: ↑

It's not a lie. You are mixing up CUDA and graphics. Nvidia can do parallel queues for CUDA for a long time, but it can't do parallel queues for graphics + compute on the GPU with DX12. Thus the driver needs to emulate Async Compute on CPU, which explains the performance loss and stuttering seen in Ashes with DX12 AC on vs. off on Nvidia.
Be a bit more careful before telling someone to be liar when you easily mix up things.
Click to expand...

I'm not mixing anything and that is a lie. Additional compute queues can be submitted to the h/w in a serial fashion and this isn't an emulation of anything and it is perfectly within DX12 spec.
Be a bit more knowledgeable next time you'll decide to go around telling people what they should call whom.

aufkrawall2 said: ↑

Are console ports to be known for their technical sanity?
Click to expand...

What's console ports have to do with it? A bad console port will perform bad on any h/w, GCN including, async compute or not. Cue GearsUE.

aufkrawall2 said: ↑

I really think you are a Nvidia employee, as you sound exactly like their bull**** on Twitter, taunting their customers...
Click to expand...

I really think that you're an AMD employee as you sound exactly like their bull**** on Twitter and in async compute PR posts, ****ting into the brains of their customers with lies, FUD and disinformation...

aufkrawall2 said: ↑

Async Compute is always an advantage, like HTT on CPUs when there are enough threads executed. Just listen to some GDC presentations and educate yourself.
Click to expand...

HTT isn't always an advantage as any person who knows what 2+2 is should be aware of. Async compute is an even smaller advantage because there are actually much less idle resources in a typical modern GPU than in a typical modern CPU due to the natively parallel nature of the former. There's also a lot of in-queue concurrency happening at any given time to make the whole secondary queues into what they really are - a hack helping AMD's poor graphics pipeline to achieve a higher GPU utilization. No other GPU on the market need it - Maxwell, Intel, PowerVR are all fine without it. Chances are that this is how it will always be - until AMD will fix their graphics pipeline makeing async compute pointless on their GPUs as well. Dixi.

dr_rus, Apr 7, 2016

#303
Alessio1989 Ancient Guru

Messages:

2,952

Likes Received:

1,244

GPU:

.

dr_rus, it is not only about increasing performance or how good/bad is the graphics pipeline of a hardware architecture. It is also (and mostly) about application rendering pipeline flexibility, avoiding potential hardware/driver stalls and actually current NV hardware/driver combo could be a lot better on that.

Alessio1989, Apr 7, 2016

#304
Yxskaft Maha Guru

Messages:

1,495

Likes Received:

124

GPU:

GTX Titan Sli

dr_rus said: ↑

NV doesn't need to improve "that part" of DX12, NV simply needs to launch a 10% faster cards in the next lineup refresh. Considering that most of late wins of GCN is happening in DX11 that's basically the only proper way to handle it.
Click to expand...

I can imagine that it's no small engineering task, but getting 5-10% extra performance in DX12 games is no small thing.
Nvidia needing to compensate by having 5-10% faster cards than AMD's equivalents would be no good for them.

Alessio, do you think Vulkan is any better off than DX12 for Nvidia's current architectures? I'm thinking of if Nvidia can create its specific Async compute extension to get the boost that it's currently not getting in DX12

Yxskaft, Apr 7, 2016

#305
-Tj- Ancient Guru

Messages:

18,103

Likes Received:

2,606

GPU:

3080TI iChill Black

Alessio1989 said: ↑

dr_rus, it is not only about increasing performance or how good/bad is the graphics pipeline of a hardware architecture. It is also (and mostly) about application rendering pipeline flexibility, avoiding potential hardware/driver stalls and actually current NV hardware/driver combo could be a lot better on that.
Click to expand...

I think it will be once directx releases SM 6.0 api update

-Tj-, Apr 7, 2016

#306
Alessio1989 Ancient Guru

Messages:

2,952

Likes Received:

1,244

GPU:

.

Yxskaft said: ↑

I can imagine that it's no small engineering task, but getting 5-10% extra performance in DX12 games is no small thing.
Nvidia needing to compensate by having 5-10% faster cards than AMD's equivalents would be no good for them.

Alessio, do you think Vulkan is any better off than DX12 for Nvidia's current architectures? I'm thinking of if Nvidia can create its specific Async compute extension to get the boost that it's currently not getting in DX12
Click to expand...

Too early to evaluate it, but Vulkan has the same OpenGL ability to become a proprietary extension clusterfùck.
D3D12 specification about multi-engine is not so strict about implementation, so it's all about hardware or driver issues.

-Tj- said: ↑

I think it will be once directx releases SM 6.0 api update
Click to expand...

SM 6.0 has nothing to do with multi-engine. But yes, more compute shader semantics will be exposed.

Last edited: Apr 7, 2016

Alessio1989, Apr 7, 2016

#307
dr_rus Ancient Guru

Messages:

3,929

Likes Received:

1,044

GPU:

RTX 4090

Alessio1989 said: ↑

dr_rus, it is not only about increasing performance or how good/bad is the graphics pipeline of a hardware architecture. It is also (and mostly) about application rendering pipeline flexibility, avoiding potential hardware/driver stalls and actually current NV hardware/driver combo could be a lot better on that.
Click to expand...

What is? I don't really get what you mean by "it" there.

Yxskaft said: ↑

I can imagine that it's no small engineering task, but getting 5-10% extra performance in DX12 games is no small thing.
Nvidia needing to compensate by having 5-10% faster cards than AMD's equivalents would be no good for them.
Click to expand...

Is would be great for all of us as them compensating this with a general +10% performance increase will mean that NV's cards will be 10% faster everywhere, not just in DX12+async renderers. And it is actually a small thing. 10% is 3 fps when the game is running at 30. A difference between 30 and 33 is hardly ground braking.

dr_rus, Apr 7, 2016

#308
Yxskaft Maha Guru

Messages:

1,495

Likes Received:

124

GPU:

GTX Titan Sli

dr_rus said: ↑

Is would be great for all of us as them compensating this with a general +10% performance increase will mean that NV's cards will be 10% faster everywhere, not just in DX12+async renderers. And it is actually a small thing. 10% is 3 fps when the game is running at 30. A difference between 30 and 33 is hardly ground braking.
Click to expand...

You seem to believe it's easy as pie for Nvidia to simply improve the general performance by 10% to compensate for DX12+Async vs AMD's equivalents.
If Nvidia hasn't made changes to Pascal to have effective support, this general 10% improvement won't magically appear out of nowhere to help Pascal out. I'm not claiming async is easy to implement either however.

If minimum FPS generally is 30 FPS, 3 FPS extra does alot to remove those 20ish FPS dips

6 FPS extra at 60 FPS will also help keep the framerate above 60

Same deal for the enthusiasts having 144 hz monitors

Last edited: Apr 7, 2016

Yxskaft, Apr 7, 2016

#309
Ieldra Banned

Messages:

3,490

Likes Received:

0

GPU:

GTX 980Ti G1 1500/8000

Yxskaft said: ↑

You seem to believe it's easy as pie for Nvidia to simply improve the general performance by 10% to compensate for DX12+Async vs AMD's equivalents.
If Nvidia hasn't made changes to Pascal to have effective support, this general 10% improvement won't magically appear out of nowhere to help Pascal out. I'm not claiming async is easy to implement either however.

If minimum FPS generally is 30 FPS, 3 FPS extra does alot to remove those 20ish FPS dips

6 FPS extra at 60 FPS will also help keep the framerate above 60

Same deal for the enthusiasts having 144 hz monitors
Click to expand...

I agree with what you're saying, the thing is

I'm gonna copy. Paste a post I made on another forum

First things first, there is only one game in which we can isolate the gain from asynchronous shaders from the gain from Dx12. Ashes.

In ashes of the singularity, a compute bound game, there is absolutely no advantage for Gcn w/Async vs maxwell at fp32 parity. This is not debatable, I have proven this.

Conversely, at fp32 parity, nvidia hw has a performance advantage vs amd hw with no async.

Doom is a vulkan title, and the developers have stated they have gotten significant gains out of async shaders for amd hw.

Despite these big gains they perform equal to their nvidia counterparts, which do not make use of asynchronous shaders.

Ergo async shaders provide no tangible benefits over NVIDIA hardware running without them.

The 10% boost is relative to amd hw running without async, not relative to nvidia hw.

Last edited: Apr 8, 2016

Ieldra, Apr 7, 2016

#310
Yxskaft Maha Guru

Messages:

1,495

Likes Received:

124

GPU:

GTX Titan Sli

The discussion was about Async possibly providing a 5-10% improvement boost for AMD hardware, and we'd ideally want that to happen for Nvidia hardware as well.

Yes, I'm aware that percentage is a ratio. dr_rus said that a 10% performance increase for a game running at 30 FPS is just 3 FPS and claims that's too small to matter. IMO he's doing wrong to scoff at that improvement.

My argument is that even 3 FPS can do alot to eliminate or minimize the drops below 30 FPS, and the same is if the game is running at 60 FPS or for the high-end gamers targeting 144 FPS. 5-10% improvement is nothing to scoff at in any case.

Last edited: Apr 7, 2016

Yxskaft, Apr 7, 2016

#311
dr_rus Ancient Guru

Messages:

3,929

Likes Received:

1,044

GPU:

RTX 4090

Yxskaft said: ↑

You seem to believe it's easy as pie for Nvidia to simply improve the general performance by 10% to compensate for DX12+Async vs AMD's equivalents.
Click to expand...

Well, yeah. Last time I checked Maxwell's energy efficiency was way more than 10% better than GCN's so it is actually easy - it's even already happened to be precise as there are almost zero stock clocked Maxwell cards on the market - most of them are factory OCed.
Granted this isn't really relevant for Pascal vs Polaris battle which is coming this year but I think that the changes in comparative power consumption won't be that big and the same argument will be applicable there as well.

Yxskaft said: ↑

If Nvidia hasn't made changes to Pascal to have effective support, this general 10% improvement won't magically appear out of nowhere to help Pascal out. I'm not claiming async is easy to implement either however.
Click to expand...

It can appear even on Maxwell, not "magically" but just from NV's general power advantage. Async has nothing to do with this really. NV will have to balance Pascal cards for modern workloads and that means that they'll have to make sure that Pascal cards of equal price aren't slower than Polaris/GCN cards on the same price points. As simple as that. The reason you're seeing NV cards loosing today lies in the fact that there was no GeForce lineup update since the launch of 980Ti a year ago.

Yxskaft said: ↑

If minimum FPS generally is 30 FPS, 3 FPS extra does alot to remove those 20ish FPS dips

6 FPS extra at 60 FPS will also help keep the framerate above 60

Same deal for the enthusiasts having 144 hz monitors
Click to expand...

In all cases 10% isn't much. The same can be gained by bumping down some quality setting which you may not even notice as a difference in a game.

dr_rus, Apr 7, 2016

#312
Alessio1989 Ancient Guru

Messages:

2,952

Likes Received:

1,244

GPU:

.

dr_rus said: ↑

What is? I don't really get what you mean by "it" there.
Click to expand...

A waste of time and const char* indeed...

Alessio1989, Apr 7, 2016

#313
dr_rus Ancient Guru

Messages:

3,929

Likes Received:

1,044

GPU:

RTX 4090

Alessio1989 said: ↑

A waste of time and const char* indeed...
Click to expand...

If that "it" was DX12 async compute queues then you're incorrect. It is mostly there to fill in the idle times of GCN's graphics pipeline. The rest of cases can be accomplished from the graphics queue or doesn't need to run in parallel to graphics at all - hence why NV's drivers+h/w combo is pretty good at this without any support for concurrent compute queue execution.

Feel free to not waste anymore time and whatever the **** "const char*" is here.

dr_rus, Apr 7, 2016

#314
Stormyandcold Ancient Guru

Messages:

5,872

Likes Received:

446

GPU:

RTX3080ti Founders

dr_rus said: ↑

It can appear even on Maxwell, not "magically" but just from NV's general power advantage. Async has nothing to do with this really. NV will have to balance Pascal cards for modern workloads and that means that they'll have to make sure that Pascal cards of equal price aren't slower than Polaris/GCN cards on the same price points. As simple as that. The reason you're seeing NV cards loosing today lies in the fact that there was no GeForce lineup update since the launch of 980Ti a year ago.
Click to expand...

I don't think that's entirely true. I've always thought we pay a slight premium for Nvidia products for better support. Nvidia still have a better "day one" experience of a game's release track-record than AMD.

Games like Gears of War, Hitman, The Division etc were better/smoother on Nvidia, on release. I don't see this changing and it's entirely reasonable for Nvidia to continue with this pricing strategy if their support also continues. In this regard; AMD has no answer, even when these games were made on GCN-based hardware.

Stormyandcold, Apr 7, 2016

#315
CrazyGenio Master Guru

Messages:

455

Likes Received:

39

GPU:

rtx 3090

async or not amd still rocking on dx12 titles

furyx https://www.youtube.com/watch?v=b3VPy5TxFCI

980ti https://www.youtube.com/watch?v=LJLz9oyAMfo

CrazyGenio, Apr 8, 2016

#316
fantaskarsef Ancient Guru

Messages:

15,750

Likes Received:

9,641

GPU:

4090@H2O

CrazyGenio said: ↑

async or not amd still rocking on dx12 titles

furyx https://www.youtube.com/watch?v=b3VPy5TxFCI

980ti https://www.youtube.com/watch?v=LJLz9oyAMfo
Click to expand...

Did I miss the fps somewhere or where they not noted? It's still so early...

fantaskarsef, Apr 8, 2016

#317
CrazyGenio Master Guru

Messages:

455

Likes Received:

39

GPU:

rtx 3090

fantaskarsef said: ↑

Did I miss the fps somewhere or where they not noted? It's still so early...
Click to expand...

early? 364.72 are game ready drivers for quantum break so i think there is nothing more to do.

the game is performing well in AMD hardware, and the game is from windows store, so it's going to be like that forever.

CrazyGenio, Apr 8, 2016

#318
fantaskarsef Ancient Guru

Messages:

15,750

Likes Received:

9,641

GPU:

4090@H2O

CrazyGenio said: ↑

early? 364.72 are game ready drivers for quantum break so i think there is nothing more to do.

the game is performing well in AMD hardware, and the game is from windows store, so it's going to be like that forever.
Click to expand...

Early in the day for me :wanker: didn't mean to comment on the drivers

Is that thing in the top left the fps counter? I'm not familiar with it tbh.

fantaskarsef, Apr 8, 2016

#319
XenthorX Ancient Guru

Messages:

5,057

Likes Received:

3,435

GPU:

MSI 4090 Suprim X

CrazyGenio said: ↑

async or not amd still rocking on dx12 titles

furyx https://www.youtube.com/watch?v=b3VPy5TxFCI

980ti https://www.youtube.com/watch?v=LJLz9oyAMfo
Click to expand...

It hurst. 980-Ti average 45 and FuryX average 59 frame per second. How the hell? :/

XenthorX, Apr 8, 2016

#320

(You must log in or sign up to reply here.)

Page 16 of 57

Share This Page