Asynchronous Compute

Discussion in 'Videocards - NVIDIA GeForce Drivers Section' started by Carfax, Feb 25, 2016.

  1. CalinTM

    CalinTM Ancient Guru

    Messages:
    1,689
    Likes Received:
    18
    GPU:
    MSi GTX980 GAMING 1531mhz
    Yep, so true. I'm in the same boat. I bought my 980 about a month after it launched in September 2014 and have enjoyed the games since, including The Witcher 3. I will not upgrade to Polaris or the Pascal GP104. It's almost 100% confirmed that HBM2 is NOT coming on the AMD or NVIDIA summer releases. It will come on the big boys, GP100 (the 980 Ti successor) and the Fury 2 series (or whatever they're going to call it), which will compete with GP100. So I'm not buying anything now. And by the end of 2016 my 980 will be almost a low-end card, but I don't care. If I'm going to spend my money, I'm going to spend it on ALL the goodies, not just a better GPU; I want HBM too, if you get my idea. I know HBM is kind of overkill for games, but you get my point about the money spent...
     
  2. theoneofgod

    theoneofgod Ancient Guru

    Messages:
    4,677
    Likes Received:
    287
    GPU:
    RX 580 8GB

    GDDR5X looks impressive but comparing the Fury X to the 980 Ti, HBM didn't do all that much. I guess time will tell.
     
    Last edited: Mar 15, 2016
  3. PrMinisterGR

    PrMinisterGR Ancient Guru

    Messages:
    8,128
    Likes Received:
    971
    GPU:
    Inno3D RTX 3090
    I believe it's the 64 ROPs that are keeping the Furies back.
     
  4. narukun

    narukun Master Guru

    Messages:
    228
    Likes Received:
    24
    GPU:
    EVGA GTX 970 1561/7700
    Do you think NVIDIA will really implement async compute in Maxwell 2?
     

  5. dr_rus

    dr_rus Ancient Guru

    Messages:
    3,918
    Likes Received:
    1,035
    GPU:
    RTX 4090
    There's always something conveniently keeping AMD's GPUs back. But it's not a problem, because if you wait two or three years your Fury X will be 3 fps faster than a Titan Volta at 8K in a DX15 technical demo.
     
  6. PrMinisterGR

    PrMinisterGR Ancient Guru

    Messages:
    8,128
    Likes Received:
    971
    GPU:
    Inno3D RTX 3090
    I have no idea, to be honest. It doesn't seem to be an issue for them, because Maxwell's shorter pipeline looks to be much less affected than the GCN equivalent. They will probably implement something that won't provide the expected performance increase.

    hue hue hue. Not conveniently at all. It seems that GCN in general is slower in graphics tasks than the equivalent generations of NVIDIA architectures. That's not really a secret. Every piece of hardware is designed with specific balances in mind. AMD went for compute, NVIDIA for graphics.

    I love the irony about future-proofing. It is common sense that the architecture that has stayed more or less stable and is found in all consoles will be much more future-proof, as literally everyone in the industry except NVIDIA is working on it. Look at the tragedy of the 680/770. Don't tell me it's because they are 2GB parts, because the 370, which is starting to be the equivalent of the 680, is a 2GB part too.

    Again, all of these things make common sense, and again I would say that the best top-range card at the moment is the 980 Ti, the only reason being the 6GB of VRAM it has. For anything below that you would be nuts to go NVIDIA.
     
  7. dr_rus

    dr_rus Ancient Guru

    Messages:
    3,918
    Likes Received:
    1,035
    GPU:
    RTX 4090
    The obvious lack of AMD's presence in any serious GPU compute application kinda doesn't agree with this statement. Benchmarks across a wide range of GPU compute applications don't really show "AMD going for compute" either.

    The difference is better described by saying that NV's chips are better balanced for the workloads which are active at the moment of their arrival, while AMD's GPUs are more forward-looking in general and as such are overcomplicated for the tasks they mostly run during their lifetime. This has been the case since G70 vs R520 really, and it's still true for the most part.

    They are 2GB parts, and that's the main reason behind this "tragedy" of yours - which I, as a previous owner of both a 680 and a 770, seem to be completely missing. It must be something visible to Radeon owners only?

    And as for the 370 - could you point me to why you think this card is in any better shape than the 770 at the moment? And where exactly does its "starting to be the equivalent of the 680" help it achieve a playable framerate?

    [images: benchmark charts]
     
  8. Redemption80

    Redemption80 Guest

    Messages:
    18,491
    Likes Received:
    267
    GPU:
    GALAX 970/ASUS 970
    Wow, Hitman is even more of a mess than I thought.

    A horrible experience for all, but as long as it's a few % less horrible on AMD cards then that's ok???

    Async offers nothing but disappointment, but I definitely appreciate how AMD and their supporters have marketed it.
    It's a nice bit of business that we rarely see from AMD.
     
  9. PrMinisterGR

    PrMinisterGR Ancient Guru

    Messages:
    8,128
    Likes Received:
    971
    GPU:
    Inno3D RTX 3090
    You're obviously mixing up the available software stack (OpenCL, DirectCompute, CUDA) with actual hardware capability. NVIDIA has a head start in compute sales just because of CUDA.

    This is not the point I was making.

    What I'm saying is that AMD cards are geared more towards compute than graphics tasks. With that clear, let's get to the actual part where AMD is more compute-heavy and NVIDIA more graphics-oriented.

    First, there is a very interesting article with summaries of presentations from GDC. One of its most interesting pictures is a slide from the conference. As the article and the presentation put it:

    "When deciding whether to use a pixel shader or a compute shader, there are “extreme” differences in pros and cons on Nvidia and AMD cards (as shown by the table in the gallery)."


    Take a look:

    [image: GDC slide comparing pixel shader vs compute shader pros and cons on NVIDIA and AMD hardware]

    Notice the smiley face on the right side. GCN seems to have issues on the graphics side, but it seems to be perfect for compute. Meanwhile, none of the NVIDIA architectures even have out-of-order thread completion. Who would have guessed. :infinity:
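
    To make it concrete for anyone who hasn't touched the API: the choice that slide talks about is literally whether the same full-screen post-process gets issued through the graphics pipeline (pixel shader) or through the compute pipeline (compute shader). Here's a rough D3D12 sketch of the two submission paths; it's only my own illustration, not anything from the presentation, and the function names, pipeline state objects and the 8x8 thread-group size are assumptions for the example.

        #include <d3d12.h>

        // Option A: pixel-shader path - rasterize a single full-screen triangle
        // through the graphics pipeline (PSO and root signature assumed to exist).
        void RunAsPixelShader(ID3D12GraphicsCommandList* cmd,
                              ID3D12PipelineState* gfxPso,
                              ID3D12RootSignature* gfxRootSig)
        {
            cmd->SetPipelineState(gfxPso);
            cmd->SetGraphicsRootSignature(gfxRootSig);
            cmd->IASetPrimitiveTopology(D3D_PRIMITIVE_TOPOLOGY_TRIANGLELIST);
            cmd->DrawInstanced(3, 1, 0, 0);   // one triangle covering the screen
        }

        // Option B: compute-shader path - dispatch one thread per pixel,
        // assuming the shader was compiled with [numthreads(8, 8, 1)].
        void RunAsComputeShader(ID3D12GraphicsCommandList* cmd,
                                ID3D12PipelineState* csPso,
                                ID3D12RootSignature* csRootSig,
                                UINT width, UINT height)
        {
            cmd->SetPipelineState(csPso);
            cmd->SetComputeRootSignature(csRootSig);
            cmd->Dispatch((width + 7) / 8, (height + 7) / 8, 1);
        }

    Which of the two ends up faster is exactly where the per-vendor pros and cons in that slide come in.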

    The crowd-sourced compute thread on Overclock.net explains it a lot better.
    AMD paid for this with all the furnace jokes, but as I have said multiple times, it was a design choice to go for that.

    I agree completely with that statement. But because Moore's law has slowed down, and because of the 20nm fiasco, AMD hardware from the "past" has arrived in the "future" you describe. That's the only way my card can reach 780/Titan performance, which would have been unthinkable at the time I got it.

    It is also my whole point in this thread and we seem to agree.

    AMD tends to overdesign, which is bad in the short term and usually good in the long term. The "long term" has arrived. Please tell me in good conscience that you think I would have been better off with a 680 (even a 4GB one, which cost 200 euros more than my 7970) instead of the 7970.

    The problem isn't that they are both slow. The problem is that GCN 1.0-equivalent Kepler cards are getting the same performance as, or a tiny bit better than, GCN 1.0 cards a whole tier lower. Before you say anything about 2GB, observe how GCN and Maxwell cards with 2GB are faster than Kepler cards that were destroying them just a year ago. Let me give you some examples from the list of the latest game benchmarks from Guru3D. Anything I omit is because it might not include a GTX 680/770; there was no cherry-picking, just the Guru3D game benchmark list of the past months, in order.

    [image: Guru3D benchmark chart]

    The 4GB 370 is faster than the GTX 770. Before you cry foul, notice the bunch of cards that are faster than the GTX 770 while being 2GB: GTX 950, GTX 960, R9 285, R9 380. All of these cards should have been slower than the GTX 770, but they are not, and they have the same amount of VRAM. The reason is that NVIDIA designs have this forward-thinking problem, and now time has caught up. Next up:

    [image: Guru3D benchmark chart]

    Same story here, almost like clockwork. GTX 950, GTX 960, R9 285, R9 380: equal or faster. All of them 2GB. All of them "inferior" cards to the GTX 770. Next up:

    [image: Guru3D benchmark chart]

    Things are looking a bit better here. Still, at 1080p the R9 285 and the R9 380 are faster. At 1440p it's much worse: the 370 is just 2 fps behind, while the gap to the rest of the 2GB cards widens to almost 35%.

    [image: Guru3D benchmark chart]

    The 370 is faster here at all resolutions. The usual list of faster 2GB cards is present: GTX 950, GTX 960, R9 285, R9 380. All 2GB, all faster.

    Your point here is what? That Kepler is fine since the 280X is faster than the freakin' Titan? Is that your point? That the 770 is struggling against a card a whole performance class below it, and it's obviously not because of the memory, since it's getting beaten by a ton of 2GB cards it was supposed to be faster than?

    The point I'm making is that Maxwell will go the way of Kepler, because Maxwell is Kepler on steroids. Even less preemption, even less hardware scheduling, just more ROPs and more specialized graphics units.

    Those power savings came from somewhere; there is no TDP magic land.

    In my opinion NVIDIA is currently in damage-control mode over their older architectures, and Pascal will be a design much closer to GCN than Maxwell is. That will be the point where Maxwell becomes the next Kepler. We have actually already seen this during the current lifetime of the chip. The 390 is a better purchase than the 970 in raw performance terms, as is the 390X versus the 980, both of them being slightly overclocked versions of Hawaii, which was more or less ridiculed when Maxwell came out.

    A few AMD driver revisions and some console engine ports later, they became the go-to for any sane buyer. That's the same old Hawaii. Why anyone believes that Maxwell will have a better future than Kepler or Fermi, I have no idea.
     
  10. -Tj-

    -Tj- Ancient Guru

    Messages:
    18,102
    Likes Received:
    2,606
    GPU:
    3080TI iChill Black
    Yes, you have no idea, since you apparently already know Pascal will be more like GCN, while in fact, back in the past, GCN "copied" Fermi's scalar approach with extra tweaks on the compute side..


    Pascal will continue down the CUDA compute route; it's their thing, so don't expect things to change in that regard.
    It's been like that since the early G80 scalar chips and has been improved further since; the biggest shift happened from Fermi GF110 onward.
    So that "expert" at Overclock.net says it doesn't have extra L caches per block, when they've been there since Fermi, and you're misreading that chart as saying there's no out-of-order thread completion..?
    Here, again, I really suggest you read through the whitepapers, so you know what to quote and talk about on the NV side.
    http://www.nvidia.com/content/pdf/f...dia_fermi_compute_architecture_whitepaper.pdf

    https://www.nvidia.com/content/PDF/kepler/NVIDIA-Kepler-GK110-Architecture-Whitepaper.pdf

    http://www.nvidia.com/content/PDF/kepler/NV_DS_Tesla_KCompute_Arch_May_2012_LR.pdf

    https://devblogs.nvidia.com/paralle...ould-know-about-new-maxwell-gpu-architecture/

    https://www.microway.com/download/whitepaper/NVIDIA_Maxwell_GM204_Architecture_Whitepaper.pdf


    And yet you assume Maxwell will end up the same as Kepler. Sure, care to share that crystal ball? Or are you just drawing conclusions based on those Gxx04 chips.. :nerd:

    Make your conclusions when it starts happening; then I will give you a heads-up and say, hey, you were right back then. But until then you're nothing but a "smart words wise guy" talking random facts :d :)



    BTW, the 680/770 GTX, and the 970/980 GTX after them, are just midrange "performance" tier chips; it's not relevant how this midrange stacks up against AMD's offerings. Yes, they were expensive and "competitive" back then, but that was NV milking it in response to AMD's initially poor offering (utilization).

    If you must compare, then compare GF110 > GK110 > GM200 against the GCN chips..
     
    Last edited: Mar 16, 2016

  11. Stormyandcold

    Stormyandcold Ancient Guru

    Messages:
    5,872
    Likes Received:
    446
    GPU:
    RTX3080ti Founders
    It seems obvious to me that the easiest way to resolve this is to add the CUDA async compute method to the DX12 spec.
     
  12. hexaae

    hexaae Member

    Messages:
    29
    Likes Received:
    0
    GPU:
    nVidia GTX980M 4GB G-Sync
    Yeah CUDA everywhere! CUDA CUDA CUDA! CUDA-POWER!!!
    Pathetic...
     
  13. Terepin

    Terepin Guest

    Messages:
    873
    Likes Received:
    129
    GPU:
    ASUS RTX 4070 Ti
    Yes, let's add a vendor-specific feature to a standardized API. What a terrific idea! :stewpid:
     
  14. dr_rus

    dr_rus Ancient Guru

    Messages:
    3,918
    Likes Received:
    1,035
    GPU:
    RTX 4090
    I'm not mixing up anything. The h/w doesn't exist without the s/w. And NV's cards are doing just fine in compute, be it CL, CUDA or DXCS, even though it's pretty obvious that CUDA is the focus of their efforts.

    It's not. Graphics and compute aren't mutually exclusive; they run on the same h/w for the most part. The only compute-related thing which isn't really used in graphics is FP64 math. AMD's h/w is geared more towards complex computations with less fixed-function h/w usage; that's a balance of units in the h/w. It's not "more geared towards compute rather than graphics" - that's incorrect. Depending on what balance of processing the s/w expects from the h/w, the h/w may or may not be optimal for that s/w. AMD's h/w is traditionally balanced in favor of s/w which doesn't exist on the market when the h/w launches. NV's h/w is balanced towards the s/w which does exist. That's the difference, and that's part of the reason why AMD's h/w "ages" better.

    AMD's general inability to provide sound developer support, or even to build good s/w for their own h/w, isn't the same as confirmation of a compute orientation in their h/w.

    The difference between NV and AMD in that slide isn't that big, btw. It boils down to what I've already said: NV's h/w expects less complex compute in graphics code because this is how game code is right now, today, when we have 3 (three) DX12 games in total.

    And that design choice has brought them to their glorious 20% of the market share.

    Oh, we agree on that, but where we disagree is on the benefit of such overdesign. I tend to think that NV is right in its approach to h/w design, as they provide the best possible perf/watt/$ for the time when the h/w is active, and then they update the h/w heavily to reflect the changes in the s/w landscape and to provide the best possible perf/watt/$ again. Thanks to that they are constantly beating AMD in their actual product lineup, either in performance or in power consumption, while AMD is waiting for the future to come.

    The whole 680 vs 7970 argument doesn't jibe with me at all, as I don't sit on a mid-range card (680) for more than a year or two. The 680 was launched in spring 2012; that's FOUR YEARS ago. It's an eternity from the graphics card market's point of view. You can't expect a mid-range card to be sufficient after four years on the market.

    And guess what - the 7970 is not sufficient today either. There are games where it is meaningfully ahead of the 680/770 (thanks mostly to its extra 1GB of VRAM and not to AMD's arch), but in general a 7970, even when it's ahead of a 680/770, doesn't provide a playable framerate these days either. So this "win", which happened four years after the card was launched and two years after the competition launched a new architecture, doesn't really help the card at all - especially as you can't really buy either of those cards these days.

    There is no problem. NV doesn't sell Kepler anymore; NV sells Maxwell at the moment and will sell Pascal soon. This is how graphics works - a newer, more efficient architecture supplants the older, less efficient one. Whatever higher-end Keplers we still have running (770, 780, 780 Ti) are mostly fine in newer games, with several exceptions whose absolute number is rather limited (around five titles over the last two years). This "problem" and this "tragedy" exist only in the heads of AMD fans who like to compare a 7870 showing 10 fps to a 770 showing 9 fps in the same benchmark and go ballistic over this completely meaningless fact.

    Oh yeah, that's the newest evidence from the AMD fans, which is somehow being applied retroactively to all games out there, including the hundreds released during the last couple of years.

    A VRAM limitation may result in a more powerful and efficient card showing better results. It's always been like this. You can't just say that 2GB isn't an issue for Kepler because of the results of 2GB GCN and Maxwell cards - these are different architectures, and their VRAM usage differs in more ways than you can imagine.

    Take a look at 780Ti vs 970 - what do you see there?

    The Division is one of those games which don't run efficiently on Kepler h/w. This is most likely due to the game using DX11+ features which Kepler has no support for, and to the game using more than 2GB of VRAM even at 1080p. There is no other reason in the physical world why a 370 and a 950, weaker in pretty much every metric, would be faster than a 770, and why a 780 Ti would be so much slower than a 970.

    The Division is just an odd case of an engine which isn't good for Keplers in general; using it as a general confirmation of Kepler weakness is the stupidest thing you can do at the moment.

    All of these cards, severely different in their performance levels, are within 5 fps of each other, which is perfectly in line with a memory-limited scenario hitting all 2GB cards in this case. If anything, this actually proves that 2GB of VRAM is the reason for the 770's performance here.


    SWBF's FB3 edition is known to favor AMD's h/w. The 290 being above the 970 should kinda illustrate this to you. When you take this into account, there's nothing really wrong with Kepler performance in this game. The 780 Ti is at 970 level, and the 770 is just a bit slower than the 280X - exactly as it was at the launch of these cards.

    BO3 is another game which clearly doesn't run well on Kepler GPUs. Look at where the 780 Ti is compared to the 970. But that's only the second such game among what you've posted, over a half-year span during which several dozen new games were released. This really proves the opposite of what you're saying - in the vast majority of modern games Kepler is fine. Most games where Kepler takes a hit aren't running well on Maxwell either - most probably a result of console GCN optimizations kicking in and PC NV h/w optimizations being omitted because of budgets.

    For someone who doesn't "cherry-pick", you also seem to have comfortably avoided several other fresh Guru3D benchmarks showing Kepler in a much better light, like these:

    [images: Guru3D benchmark charts]

    My point is that there is nothing strange going on with Kepler - a newer architecture came, Kepler cards are EOL, and since all consoles are using AMD h/w, "the default optimizations" don't really help NV's h/w in general.

    And the 770 isn't really worse than the 7970 or 280X, as all of them are below comfortable performance levels at 1080p at the moment - and it really doesn't matter that at these uncomfortable performance levels the 280X is 50% faster because of whatever; the games are still unplayable on it in exactly the same way as on a 680/770.

    Will? A lot of games lately favor GCN over both Kepler and Maxwell. And this has nothing to do with Maxwell being anything on steroids and everything to do with GCN being the default optimization target for any multiplatform release out there. As I've said, GCN has a different balance of units compared to NV's h/w, and since GCN got the majority of the console market, that balance suddenly turned from AMD's fantasy into reality. This is the biggest win AMD got from consoles, make no mistake about that.

    But this win came just as NV is ready to make another architecture jump, and you can be sure that Pascal will be rebalanced to take the new situation into account, as this is what NV does - they always release h/w for the workloads which are prevalent on the market at the moment. And thus this window of opportunity for AMD to get anything from GCN beating both Kepler and Maxwell in newer titles is quickly closing.

    There is better-designed h/w, however, and Maxwell is better-designed h/w than GCN when it comes to power efficiency. It's rather unlikely that any of this power-efficient design is hurting performance in newer titles compared to GCN, btw.

    Well, with Kepler being mostly fine, as we've seen above, I don't see anything tragic in Maxwell becoming the next Kepler, bolded or not. The next architecture is expected to be better suited to newer titles.

    The 390 is faster than the 970 because it is an overclocked Hawaii, right. It's rather hard to find a reference-clocked 970, however, and the 390's biggest advantage is really its 8GB of VRAM.
    And the 390X isn't really better than even a reference-clocked 980, and it's hard to find one of those as well.
    What's so strange and revolutionary about AMD's reaction cards being on par with or better than the cards they were reacting to? That's pretty much what I'd expect from them; why is it suddenly a big issue? Need I remind you that less time has passed since the 300 series launch than Maxwell 2 had prior to the 300 series launch? You seem to comfortably forget all those things and assume a lot.

    I don't give a **** about Maxwell's future, as by the end of this year I'll most likely be running Pascal. And you will still be running around NV's driver forum convincing yourself that GCN h/w is somehow better than Maxwell - when Maxwell will be long dead and buried on the market and some Pascal card will be wiping the floor with any 300-series Radeon. This is how the graphics market works, not how you propose it works - sitting on one card for four years is an atypical situation here, and the results such old h/w shows in newer titles are mostly meaningless.
     
  15. Stormyandcold

    Stormyandcold Ancient Guru

    Messages:
    5,872
    Likes Received:
    446
    GPU:
    RTX3080ti Founders
    CUDA doesn't have to be used, just the method added to the spec. From what I understand, the NVIDIA method is missing something compared to the AMD method. The current method is also vendor-specific... AMD's. Therefore it's not really standardized; instead it's skewed towards AMD for this specific feature.
     

  16. dr_rus

    dr_rus Ancient Guru

    Messages:
    3,918
    Likes Received:
    1,035
    GPU:
    RTX 4090
    Not really. D3D12 multiengine is a pretty generic method from the h/w's point of view. There's a lot of FUD flying around about the h/w implementation of executing what the API submits through that method, though, with AMD trying to claim a "proper" implementation, while a "proper" h/w implementation of this is unlikely to even exist, as all of them have their own issues and benefits.

    CUDA doesn't handle graphics, so it's not really related to D3D12, which is actually about graphics only. There are ways of implementing concurrent async compute differently from how GCN does it, however, and I expect NV to use a couple of such ways at some point.
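
    For reference, "multiengine" at the API level just means the application can create more than one type of command queue and feed them independently; how concurrently the GPU actually executes them is left to the hardware and driver, which is the part everyone argues about. A minimal sketch (my own, the function names are only illustrative) using the standard D3D12 calls:

        #include <d3d12.h>
        #include <wrl/client.h>
        using Microsoft::WRL::ComPtr;

        // Create a graphics (DIRECT) queue and a separate COMPUTE queue.
        void CreateQueues(ID3D12Device* device,
                          ComPtr<ID3D12CommandQueue>& gfxQueue,
                          ComPtr<ID3D12CommandQueue>& computeQueue)
        {
            D3D12_COMMAND_QUEUE_DESC gfxDesc = {};
            gfxDesc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;    // graphics + compute + copy
            device->CreateCommandQueue(&gfxDesc, IID_PPV_ARGS(&gfxQueue));

            D3D12_COMMAND_QUEUE_DESC cmpDesc = {};
            cmpDesc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;   // compute + copy only
            device->CreateCommandQueue(&cmpDesc, IID_PPV_ARGS(&computeQueue));
        }

        // Submit work to both queues; a fence makes the graphics queue wait only
        // where it actually consumes the compute results.
        void Submit(ID3D12CommandQueue* gfxQueue, ID3D12CommandQueue* computeQueue,
                    ID3D12CommandList* gfxList, ID3D12CommandList* computeList,
                    ID3D12Fence* fence, UINT64 fenceValue)
        {
            computeQueue->ExecuteCommandLists(1, &computeList);
            computeQueue->Signal(fence, fenceValue);   // compute work finished
            gfxQueue->Wait(fence, fenceValue);         // GPU-side wait, no CPU stall
            gfxQueue->ExecuteCommandLists(1, &gfxList);
        }

    Whether the h/w overlaps the two queues on the shader cores, time-slices them, or serializes them is invisible to this code - which is why the same call pattern can show such different results per vendor.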
     
  17. PrMinisterGR

    PrMinisterGR Ancient Guru

    Messages:
    8,128
    Likes Received:
    971
    GPU:
    Inno3D RTX 3090
    This is my whole argument. Saying that you have enough money to afford $700 purchases every 14 months has nothing to do with the argument at hand. You are basically telling me that I'm right but you don't care. Whatever, dude.
     
  18. He explained to you quite succinctly why older hardware becomes obsolete and why Nvidia release their current hardware to match the needs of contemporary gaming software. I don't get why you feel this need to constantly defend AMD's offerings. It seems like most AMD fans have an underdog complex and just can't accept the fact that Nvidia holds the majority of the market share simply because they have been offering better products for quite a few years now.
     
  19. PrMinisterGR

    PrMinisterGR Ancient Guru

    Messages:
    8,128
    Likes Received:
    971
    GPU:
    Inno3D RTX 3090
    This has nothing to do with market share. I am not defending anyone's offerings. The whole argument of the thread is whether NVIDIA's current architectures are OK in a DX12 world. There is no "obsolete" hardware; that is all in the eye of the beholder in the end. There are just good and less-than-good investments. NVIDIA hardware has proven to be quite topical in terms of performance, and not forward-thinking at all. Getting stuck at 28nm exaggerated this. Current NVIDIA cards will not see the uplift that most owners expect from newer APIs. This has happened before and it is happening again with Maxwell. Async compute will not bring performance improvements to the current NVIDIA line, and neither will DX12 in most cases. Most people who have given their hard-earned money to NVIDIA expect an uplift similar to the one AMD cards will get.

    He is basically saying that the Kepler cards dropping a whole performance class down in newer games is somehow ok and that a repeat of this with Maxwell is ok again.

    In the end he's saying what I am saying, he's just pretending it's ok because somehow all of us want to give money to chip makers every year.
     
  20. EdKiefer

    EdKiefer Ancient Guru

    Messages:
    3,140
    Likes Received:
    395
    GPU:
    ASUS TUF 3060ti
    Man, you just don't get it, or don't want to understand.
    First, Maxwell is the current model being sold as "DX12 ready", not older ones like Fermi and Kepler. The older models don't get slower with age either, but again, NVIDIA puts a lot of R&D into their drivers prior to launch, so they're pretty well optimized; meaning in general you see some increase with new drivers in the first year and then not much improvement after that.
    The whole big-async-compute-increase-with-AMD thing is because AMD's drivers and HW can't keep the card fed well in DX11. NVIDIA's DX11 is very good at keeping the card fully utilized. Remember the whole Mantle vs. DX11 comparison in BF4; NVIDIA did pretty well there.
    So yeah, as far as performance goes I don't expect DX12 to bring much, but I do expect better IQ, which has been slow to come.
     
    Last edited: Mar 17, 2016
