Asynchronous Compute

Discussion in 'Videocards - NVIDIA GeForce Drivers Section' started by Carfax, Feb 25, 2016.

  1. EdKiefer

    EdKiefer Ancient Guru

    Messages:
    3,141
    Likes Received:
    400
    GPU:
    ASUS TUF 3060ti
    Here's my take on asynchronous compute: the feature helps make sure the GPU stays at high utilization (GPU % usage).
    AMD seems to have a lot of efficiency problems keeping usage up, which is most likely why they're pushing it in DX12.
    If a card is already at high usage, then IMO you're not going to see improvements from this feature, and since Nvidia's DX11 driver already does very well here, along with its multi-core support, you can't get water from a rock.

    This is also probably why we saw AMD with higher power usage than Nvidia for the same performance: their cards' utilization was probably lower.

    DX12 is a double-edged sword: it's all up to the developer to code the game engine for the hardware, no matter which company's.
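
    For what it's worth, at the API level "async compute" just means the engine feeds the GPU from a second, compute-only queue so that otherwise idle units can pick up work. A minimal D3D12 sketch of the two-queue setup (illustrative only; the function name and the omission of error handling are mine, not from any shipping engine):
    Code:
    // Minimal sketch: one graphics (direct) queue plus one compute queue.
    // Work submitted to the compute queue may run concurrently with graphics
    // work, if the hardware and driver actually overlap them.
    #include <d3d12.h>
    #include <wrl/client.h>
    using Microsoft::WRL::ComPtr;

    void CreateQueues(ID3D12Device* device,
                      ComPtr<ID3D12CommandQueue>& graphicsQueue,
                      ComPtr<ID3D12CommandQueue>& computeQueue)
    {
        D3D12_COMMAND_QUEUE_DESC gfxDesc = {};
        gfxDesc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;   // graphics + compute + copy
        device->CreateCommandQueue(&gfxDesc, IID_PPV_ARGS(&graphicsQueue));

        D3D12_COMMAND_QUEUE_DESC compDesc = {};
        compDesc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE; // compute + copy only
        device->CreateCommandQueue(&compDesc, IID_PPV_ARGS(&computeQueue));
    }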
     
  2. TheRyuu

    TheRyuu Guest

    Messages:
    105
    Likes Received:
    0
    GPU:
    EVGA GTX 1080
    I don't think the sample size is large enough to draw that conclusion yet but the current evidence does seem to point in that direction. If the trend continues in other titles then maybe.

    At the same time I wouldn't discount Nvidia's DX11 performance either. They have a very efficient DX11 implementation and their general superiority in the software stack (not just the driver) can't be overstated.
     
  3. nevcairiel

    nevcairiel Master Guru

    Messages:
    875
    Likes Received:
    369
    GPU:
    4090
    Just look at the DX11 vs. DX12 comparison in the latest AT benchmark. It seems obvious to me why DX12 doesn't really benefit NV much: their DX11 already massively outperforms AMD's. Looking at overall performance, I think the NV cards land about where their raw hardware power says they should relative to AMD; it's just that AMD needs DX12 to get there, while NV can compete against that with DX11.
     
  4. -Tj-

    -Tj- Ancient Guru

    Messages:
    18,107
    Likes Received:
    2,611
    GPU:
    3080TI iChill Black

    This should explain a lot. I really doubt it's going to be any different with Pascal. It's a CUDA feature after all, one that started with GK110.

    It's hardware all the way, and apparently very powerful too; the only software part is the driver telling the hardware when to use it.

    Devs need to implement it too, kind of like the CUDA-based water in Just Cause 2, etc.

    With AMD, devs also have to implement it, only through the DX API instead of CUDA; NV will give devs instructions for its path just like MS or AMD do for async in the DX API.




    Oxide is glued to AMD, and NV had to "beg" them to use their approach, but back then it was still a bit so-so on the software/driver side. Now it's apparently fixed, but Oxide hasn't used it yet. Or something along those lines.

    I personally wouldn't jump to any conclusions based on one crappy benchmark.

    They're shady; look how they deliberately crippled Star Swarm further when NV's DX11 path ran circles around Mantle.
     
    Last edited: Feb 26, 2016

  5. otimus

    otimus Member Guru

    Messages:
    171
    Likes Received:
    1
    GPU:
    GTX 1080
    DX12 seems great and all, really, it does, but the real problem is going to be if it DOES turn out to be a real boon for AMD and not much for current Nvidia. People take this to mean good things for AMD, but really, it just means bad things for DX12 adoption. It'll be Mantle all over again until Nvidia pushes out better hardware, which in turn means almost no one will seriously use DX12 on anything worth mentioning, outside of some AMD partners and Microsoft, until, like... probably 2019. Whether AMD fans like it or not, Nvidia has a hell of a lot of market share, and expecting developers to just say "Whatever, screw everyone, let's make their games not run as well because PROGRESS! <3 AMD!" isn't very realistic at all.

    I just really hope that's not the case at all. We desperately need things like Vulkan and DX12.
     
  6. SabotageX

    SabotageX Active Member

    Messages:
    78
    Likes Received:
    16
    GPU:
    EVGA RTX 3090Ti
    Probably only Pascal will support it. They'll hold off until Pascal's launch and then finally tell us that Maxwell doesn't support it and you have to upgrade.
     
  7. dgrigo

    dgrigo Guest

    Messages:
    17
    Likes Received:
    0
    GPU:
    TitanX
  8. PrMinisterGR

    PrMinisterGR Ancient Guru

    Messages:
    8,132
    Likes Received:
    974
    GPU:
    Inno3D RTX 3090
    There is no way it will be like DX10. All the major engines support it already, and all the major developer houses too. We have the magic combination of a lower process node after almost 4 years (28 to 16/14nm) and the introduction of two mainstream low-level APIs. One of the reasons DX12 was introduced was easier porting and a reduction of the driver complexity that has reached ridiculous levels with DX11. Even niche things like emulators have already adopted it; I've been playing Mario Kart Wii using DX12 since December. There's no going back. The only company that seems reluctant about it, and to be holding it back, is NVIDIA. My guess is that it's for the reasons explained below.

    NVIDIA has had problems with context switching since Kepler. They traded away scheduling hardware for better control of scheduling via the driver, and for better thermals. NVIDIA has had simultaneous compute since Fermi; the problem is switching from one context to the other. GCN has zero performance penalty for this, while NVIDIA themselves say that context switching is a very costly operation.
    [image: NVIDIA slide on context switching]

    Make no mistake, these choices have served NVIDIA really well. They owe their current market share to them. NVIDIA GPUs have been more efficient, and therefore cooler and faster, than their AMD counterparts since Kepler, more or less. The problem NVIDIA faces now is that those cards were cooler for a reason: they lack specific hardware. Everyone is working at 28nm, there is no magic, and AMD's designers aren't idiots who simply produce hotter cards. It was a matter of choice. AMD chose to invest in a single architecture spanning the consoles and all of their GPUs; NVIDIA chose to go for increased CPU efficiency under DX11.
    The charts here are most enlightening about the differences.
    This is exactly what I believe too. As I said above, there is a reason for Maxwell's better thermals: it literally has less hardware, and that hardware is already close to 100% utilized under DX11. That means it shouldn't see much difference with the lower-level APIs. As for AMD, it seems that their deep and parallel architecture has had problems being fed (I was the guy who made the overhead thread in the AMD subforum, I would know), and the lower-level APIs are finally able to feed the cards properly, hence the tremendous performance increases, since the cards have much more hardware on board than their NVIDIA counterparts.
     
  9. Keesberenburg

    Keesberenburg Master Guru

    Messages:
    886
    Likes Received:
    45
    GPU:
    EVGA GTX 980 TI sc

    Don't be disappointed, AMD must be missing some DX12 features that run only on Nvidia and Intel. JK3 is getting those features, Tomb Raider and many more games too. :banana:
     
  10. Carfax

    Carfax Ancient Guru

    Messages:
    3,973
    Likes Received:
    1,462
    GPU:
    Zotac 4090 Extreme
    NVidia needs to find a way to enable the use of the GMU under DX12. That's the only way this thing makes sense. It seems like a glaring design oversight if they can't get it to work properly with DX12.
     

  11. RzrTrek

    RzrTrek Guest

    Messages:
    2,547
    Likes Received:
    741
    GPU:
    -
    Nvidia's shareholders agree...

    Backward compatibility?

    What's that?

    :banana:
     
  12. Carfax

    Carfax Ancient Guru

    Messages:
    3,973
    Likes Received:
    1,462
    GPU:
    Zotac 4090 Extreme
    Not really true. NVidia uses something called a GMU, or Grid Management Unit, which is hardware with a similar function to AMD's ACEs. The problem is that for some reason it's not compatible with DX12.

    Hopefully that will change in the future, because it's the only way NVidia will ever have truly concurrent graphics/compute workloads on Maxwell v2.

    GCN has a performance penalty for context switching too; it's just smaller than NVidia's.

    As I said earlier, NVidia has a functioning hardware scheduling unit called the GMU which can do concurrent graphics and compute tasks; the only problem is that it works only under CUDA at the moment.

    The GMU works, which is why Maxwell v2 has much higher performance than Kepler when using hardware-accelerated PhysX (which runs on CUDA). A single GTX 980 gets almost as many FPS as GTX 780 Ti SLI:

    [image: Gamegpu PhysX benchmark chart]

    Like I said, that's only for DX12. The GMU has some limitation, for whatever reason, that makes it incompatible with DX12.

    After thinking about it some more, I no longer believe my initial claim. Something does seem to be holding back Maxwell v2, but I seriously doubt it's because the chip is tapped out under DX11.

    GPUs are practically never tapped out, especially in DX11. And while AMD does indeed have a much more parallel architecture than Maxwell v2, it also has lower clock speeds.

    One of the reasons Maxwell v2 is so fast is that it has significantly higher clock speeds than AMD's Fury. Aftermarket GTX 980 Tis/980s/970s easily boost above 1400MHz at stock settings and sustain it, for instance.
     
  13. AsiJu

    AsiJu Ancient Guru

    Messages:
    8,958
    Likes Received:
    3,474
    GPU:
    KFA2 4070Ti EXG.v2
    - nvm -
     
    Last edited: Feb 26, 2016
  14. RzrTrek

    RzrTrek Guest

    Messages:
    2,547
    Likes Received:
    741
    GPU:
    -
    Was that image taken before or after the Kepler fix?
     
  15. Alessio1989

    Alessio1989 Ancient Guru

    Messages:
    2,959
    Likes Received:
    1,246
    GPU:
    .
    From a development point of view, Fermi drivers are more awaited than "full" hardware multi-engine support on Maxwell 2.0. On pre-Maxwell 2.0 GPUs, no improvement at all is expected when running graphics and compute jobs concurrently (just a little less driver overhead on Maxwell 1.0).
    Anyway, remember that "async compute" support can improve performance only when the graphics and compute jobs access complementary hardware resources. If both types of job contend for the same hardware resources, no performance improvement can be obtained; overlapping a geometry/ROP-bound graphics pass with an ALU-bound compute job can help, while overlapping two passes that are both bandwidth-bound will not (see the sketch below).
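
    A hedged sketch of what that overlap looks like on the D3D12 side (not from any particular engine; the SubmitFrame function, the pass names and the choice of workloads are just illustrative): independent compute work goes to the compute queue, graphics work that doesn't touch its output overlaps freely, and a fence introduces a wait only where the dependency really exists.
    Code:
    #include <d3d12.h>

    // Illustrative only: overlap an ALU-bound compute job (e.g. light culling)
    // with a geometry/ROP-bound graphics pass (e.g. shadow maps) on separate
    // queues, synchronizing only where the compute output is consumed.
    void SubmitFrame(ID3D12CommandQueue* graphicsQueue,
                     ID3D12CommandQueue* computeQueue,
                     ID3D12Fence* computeFence, UINT64 fenceValue,
                     ID3D12CommandList* cullingList,   // compute queue
                     ID3D12CommandList* shadowList,    // graphics, independent
                     ID3D12CommandList* lightingList)  // graphics, needs culling result
    {
        // Kick off the compute job; it can run alongside the graphics queue.
        computeQueue->ExecuteCommandLists(1, &cullingList);
        computeQueue->Signal(computeFence, fenceValue);

        // Graphics work with no dependency on the compute output overlaps freely.
        graphicsQueue->ExecuteCommandLists(1, &shadowList);

        // GPU-side wait only for the pass that actually consumes the result.
        graphicsQueue->Wait(computeFence, fenceValue);
        graphicsQueue->ExecuteCommandLists(1, &lightingList);
    }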
     

  16. Carfax

    Carfax Ancient Guru

    Messages:
    3,973
    Likes Received:
    1,462
    GPU:
    Zotac 4090 Extreme
    Pretty sure it was after. That benchmark was taken at the beginning of this year as part of Gamegpu's 2015 re-benchmarks. Here is the same benchmark at 1440p; the GTX 980 Ti nearly doubles the Kepler Titan:

    [image: same Gamegpu PhysX benchmark at 1440p]
     
  17. PrMinisterGR

    PrMinisterGR Ancient Guru

    Messages:
    8,132
    Likes Received:
    974
    GPU:
    Inno3D RTX 3090
    When you say "same hardware resources", and since we're talking about compute, I assume you mean the shading units, right?

    It is quite obvious that this unit handles compute queues. The problem is the switching between graphics and compute tasks. The image I put in my previous reply is NVIDIA themselves saying that they can only context switch at draw call boundaries, and even then there is a performance hit.
    [image: NVIDIA slide on context switching]

    CUDA is really a compute-only thing. It's quite obvious that that unit can't handle switching between graphics and compute queues, only between compute queues. The ACEs can do both.

    Take a look at this thread at Beyond3D. You can see that all NVIDIA hardware is very fast on compute-only or graphics-only tasks, but when switches between them are involved, the latencies go up by at least 60%. If you read the thread a bit and look at the results people are reporting, you will see that GCN has zero performance penalty for context switching. To quote one of the posts with a 290:
    Code:
    Compute only:       52.71ms
    Graphics only:      26.25ms (63.90G pixels/s)
    Graphics + compute: 53.32ms (31.47G pixels/s)
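
    (Roughly, a test like the one quoted above is structured like this; a hedged sketch, not the actual Beyond3D code, and the helper name and fence/event plumbing are mine. You time a compute-only submission, a graphics-only submission, and then both submitted to their own queues at once: if the combined time lands near the larger of the two, the GPU is overlapping them; if it lands near their sum, it is serializing.)
    Code:
    #include <windows.h>
    #include <d3d12.h>
    #include <chrono>

    // Time one submission from ExecuteCommandLists until the GPU signals the fence.
    // For the "graphics + compute" case, submit to both queues first and then wait
    // on both fences, measuring the whole thing with one pair of timestamps.
    double TimeSubmissionMs(ID3D12CommandQueue* queue, ID3D12CommandList* list,
                            ID3D12Fence* fence, UINT64 value, HANDLE doneEvent)
    {
        auto t0 = std::chrono::high_resolution_clock::now();
        queue->ExecuteCommandLists(1, &list);
        queue->Signal(fence, value);
        fence->SetEventOnCompletion(value, doneEvent);  // CPU blocks until the GPU is done
        WaitForSingleObject(doneEvent, INFINITE);
        auto t1 = std::chrono::high_resolution_clock::now();
        return std::chrono::duration<double, std::milli>(t1 - t0).count();
    }
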
    No, it cannot. It can only switch between compute tasks. That's why it's not enabled for DX12.

    The reason is that it only does compute switching, and for lightweight PhysX tasks at that. The top Tesla card for NVIDIA is a Kepler one, lest you forget. And the main differentiating factor between Maxwell and Kepler is doing away with even more hardware scheduling, making room for more efficient graphics units in Maxwell. Let me quote Anandtech's Maxwell architecture review:

    There is no shared resource management, and the scheduling units have even less capability than Kepler's (hence no Maxwell Tesla).

    You don't seem to have any faith in the people who designed this awesome hardware. All the indicators show that Maxwell 2.0 is practically fully utilized under DX11, which is a miracle on its own. If it weren't, it would see performance increases with DX12. I believe NVIDIA has a PR problem and nothing else.
     
  18. dr_rus

    dr_rus Ancient Guru

    Messages:
    3,938
    Likes Received:
    1,047
    GPU:
    RTX 4090
    H/w is just some microcode commands. What is accessible from CUDA can be made accessible from DX12 or whatever. The question is: is the HyperQ h/w fit to handle DX12's requirements? That's an interesting question, and I'm pretty sure there are some problems there, hence the long wait for a concurrent compute implementation from NV.

    In any case, it is certain that GCN has a much more advanced implementation of the feature, and due to architectural choices it will always get more performance out of it. But the whole situation is upside down at the moment thanks to AMD's rather aggressive PR on the topic.

    We have two DX12 architectures: Maxwell and GCN 1.1+.
    - Maxwell is more advanced in the rendering features it supports (FL12_1) and is able to reach its peak performance more often, in any API.
    - GCN is more advanced in how it handles task execution (ACEs), but it actually needs to run several compute tasks in parallel with a graphics task to reach its peak performance, and for that it requires the DX12 and Vulkan APIs.

    That's the main difference between the two. If NV just leaves Maxwell as-is in Pascal and only adds HBM2 and GDDR5X support, they'll need to spend more transistors on SIMDs to overtake GCN in DX12 and Vulkan. Considering that Fiji contains ~10% more transistors than GM200, this shouldn't be a problem. A Maxwell chip with ~10% more general performance would be at Fiji's performance level even in the DX12 AotS benchmark, which isn't a very representative title anyway. The same can be said about GM204 vs. the Grenada chips.

    So what we have here is an architecture which is good only in the new APIs versus an architecture which is good everywhere. And there is a rather big possibility that the latter won't be kept as-is in Pascal and will be further improved. I don't know, I don't see any big wins for AMD here; at best they'll be on par with the same chips they have now.

    Firstly, there is a Maxwell Tesla. Four of them, actually: M4, M40, M6 and M60.
    Secondly, the reason GK110/GK210 is still used in the Tesla range is that the Maxwell range has very limited support for double-precision operations, which was done because otherwise it wouldn't have been much different from Kepler in gaming performance. AMD did the very same thing with Fiji, as both were limited by the 28nm production process. Compute task scheduling has nothing to do with it, as all Maxwell chips support HyperQ just like GK110.
     
    Last edited: Feb 26, 2016
  19. PrMinisterGR

    PrMinisterGR Ancient Guru

    Messages:
    8,132
    Likes Received:
    974
    GPU:
    Inno3D RTX 3090
    There is a point about people who are buying cards this year, cards they may want to keep for 2-3 years. The way things are starting to look, AMD's offerings in every price range except the very top look better. Would anyone really consider a 980 over a 390X? Or over a Fury (as both have 4GB of VRAM)? Or a 960 over a 380X? Even further down, things get harder. The 270 demolishes the 750 Ti, and the gap only increases with DX12. The only space where an NVIDIA card might make more sense is the ultra high end, and the only reason for that is the 6GB vs. 4GB of VRAM.
     
  20. -Tj-

    -Tj- Ancient Guru

    Messages:
    18,107
    Likes Received:
    2,611
    GPU:
    3080TI iChill Black
    With Maxwell they removed DP (double precision) to give more room to SP. It's not that they crippled SP or compute in general any further.

    Also, that pic you keep posting only applies when there are loooong commands; otherwise there are no switching-overhead issues.
     
    Last edited: Feb 26, 2016
