
Nvidia Talks About Higher OC Clocks on the Founders 2080 Cards - Also PCB Photo

Discussion in 'Frontpage news' started by Hilbert Hagedoorn, Aug 21, 2018.

  1. Robbo9999

    Robbo9999 Maha Guru

    Messages:
    1,272
    Likes Received:
    207
    GPU:
    GTX1070 @2050Mhz
    Nothing wrong with the guy surmising performance based on specs, it's just that...surmising. It's interesting to speculate, nothing wrong with it.
     
  2. Fox2232

    Fox2232 Ancient Guru

    Messages:
    9,380
    Likes Received:
    2,025
    GPU:
    -NDA +AW@240Hz
    Did you buy your terascale RX570 from China for $100? No wonder you write what you did. You should have bought a 14nm Polaris-based card, not 28nm Tonga.
    Well done, trash post.

    Your Athlon X4 CPU combined with a GTX 1050 Ti speaks volumes about your understanding.
     
  3. sdamaged99

    sdamaged99 Ancient Guru

    Messages:
    2,025
    Likes Received:
    21
    GPU:
    Inno3d GTX1080 Ti
    You can estimate teraflops performance from memory speed / bandwidth and clock speed; CUDA cores are not really part of that equation. I'm not the only one who has done this - a few notable tech YouTubers have also done the same and come to similar conclusions.

    This was from Forbes (who may be quoting someone else)

    "Seriously, glance at the clock speeds for the 20 Series. Check out the unimpressive CUDA core increase over the 10 Series. Realize that the memory configuration is the same as the 10 Series (albeit with GDDR6 instead of GDDR5). Take a hard look at what the performance increase should be. Most in the tech media are putting it at maybe 10% to 15% over the 10 Series when it comes to the majority of games out there. But you'll pay 40% to 50% higher prices for this generation's replacements based on MSRP. And you know we won't be paying MSRP. . ."


    All Nvidia talked about was raytracing. Nothing on performance in games. Another red flag is a month of preorders with no performance benchmarks allowed in the meantime. It's possible that the proprietary raytracing stuff will just die a death, as it will likely absolutely murder framerates, similar to other GameWorks technologies. Let's see.
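    For what it's worth, here is a rough sketch of the kind of spec-sheet comparison being described, using the public launch specs of the GTX 1080 and RTX 2080 (reference figures only; how these ratios map to real game performance is exactly the open question):

    ```python
    # Back-of-envelope spec comparison, reference launch specs only.
    specs = {
        "GTX 1080": {"cuda_cores": 2560, "boost_mhz": 1733, "bandwidth_gbs": 320},
        "RTX 2080": {"cuda_cores": 2944, "boost_mhz": 1710, "bandwidth_gbs": 448},
    }

    old, new = specs["GTX 1080"], specs["RTX 2080"]
    for key in old:
        change = (new[key] / old[key] - 1) * 100
        print(f"{key:>13}: {old[key]:>5} -> {new[key]:>5}  ({change:+.0f}%)")
    # cuda_cores +15%, boost clock -1%, bandwidth +40%
    ```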
     
    Last edited: Aug 22, 2018
  4. Evildead666

    Evildead666 Maha Guru

    Messages:
    1,032
    Likes Received:
    156
    GPU:
    Vega64EKWB 240PP12V
    3dfx tried to cut out the board partners.
    Then Nvidia bought their remains.
    Nvidia knows all too well what happens when you cut out the AIB crowd. ;)

    They won't step on their toes too much, but I suspect people have been asking for high-build-quality cards straight from Nvidia, and they obliged.
     

  5. metagamer

    metagamer Maha Guru

    Messages:
    1,112
    Likes Received:
    420
    GPU:
    Palit GameRock 2080
    The 970 matched the 780 Ti with a lot fewer CUDA cores, and with a 256-bit memory bus *cough* vs a 384-bit memory bus its memory bandwidth was nowhere near. It did run around 200 MHz faster than the 780 Ti, though, and it matched the 780 Ti more often than not.

    A new architecture can also be a lot more efficient, you see. So fewer CUDA cores and lower core clocks don't necessarily mean lower performance. Let's not forget that the slower core clocks will be offset by higher memory clocks, up to a point.
     
  6. fantaskarsef

    fantaskarsef Ancient Guru

    Messages:
    10,476
    Likes Received:
    2,705
    GPU:
    1080Ti @h2o
    Just to add to that, IIRC Maxwell had better compression algorithms than the generations before; that's how it was able to cope with less memory bandwidth.
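    (To illustrate the idea only - a toy sketch of delta compression, nothing like Nvidia's actual scheme: store one base value per block plus small per-pixel deltas, so a block of similar pixels costs far fewer bytes on the memory bus.)

    ```python
    # Toy delta compression: one 4-byte base value per block plus 1-byte deltas,
    # falling back to raw 4-byte pixels when the deltas don't fit.
    def compressed_size(pixels):
        base = pixels[0]
        deltas = [p - base for p in pixels]
        if all(-128 <= d <= 127 for d in deltas):
            return 4 + len(deltas)      # base + 1 byte per pixel
        return 4 * len(pixels)          # incompressible block, raw cost

    block = [1000, 1002, 999, 1001, 1003, 1000, 998, 1001]   # similar neighbouring pixels
    print(compressed_size(block), "bytes vs", 4 * len(block), "bytes raw")   # 12 vs 32
    ```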
     
    Denial and metagamer like this.
  7. metagamer

    metagamer Maha Guru

    Messages:
    1,112
    Likes Received:
    420
    GPU:
    Palit GameRock 2080
    Yes, you're right. You never know, Turing might just have some wizardry up its sleeve too. But just looking at the numbers and determining performance solely from that is silly, so I don't know why people do it.
     
    fantaskarsef likes this.
  8. Denial

    Denial Ancient Guru

    Messages:
    12,164
    Likes Received:
    1,340
    GPU:
    EVGA 1080Ti
    Turing has double the L1/L2 cache of Pascal, which is going to alleviate hits to memory, along with Variable Rate Shading and Texture-Space Shading, both of which should make the overall process easier on memory bandwidth. And then yeah, whatever changes they made to delta compression.
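    (Rough illustration of why variable rate shading saves work: at a coarser shading rate, one shader invocation covers several screen pixels. The resolution and rates below are just example numbers.)

    ```python
    # One pixel-shader invocation per rx-by-ry block of pixels.
    width, height = 2560, 1440
    pixels = width * height

    for rate in ["1x1", "1x2", "2x2", "4x4"]:
        rx, ry = (int(c) for c in rate.split("x"))
        invocations = pixels // (rx * ry)
        print(f"rate {rate}: {invocations:>9,} invocations ({invocations / pixels:.0%} of full rate)")
    ```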
     
    fantaskarsef likes this.
  9. sdamaged99

    sdamaged99 Ancient Guru

    Messages:
    2,025
    Likes Received:
    21
    GPU:
    Inno3d GTX1080 Ti
    If it is indeed faster though, why not show some benchmarks? It seems like this has been a deliberate move, which doesn't bode well
     
  10. Denial

    Denial Ancient Guru

    Messages:
    12,164
    Likes Received:
    1,340
    GPU:
    EVGA 1080Ti
    Because I don't think it's that much faster. The only comparison they showed in regular workloads was DLAA vs TAA in the Infiltrator demo, where the framerate was doubled. You'll probably get some other titles that utilize DLAA - from what I understand it isn't that hard to implement - but aside from that I only expect a 25-30% performance uplift over the 1080 Ti at default clocks. Perhaps with overclocking - 2 GHz, etc. - it will start to shine, but idk. Like I said in the other thread, this entire series is going to be about value-add features. The NGX stuff looks like an avenue for Nvidia to add a bunch of features to the card over its lifespan.
     
    fantaskarsef likes this.

  11. fantaskarsef

    fantaskarsef Ancient Guru

    Messages:
    10,476
    Likes Received:
    2,705
    GPU:
    1080Ti @h2o
    I agree. I think we will only see what Turing is really worth to us if we look at its performance compared to Pascal in DX11 (which won't do miracles, I'm afraid). DLAA / DLSS is just a method of reducing workload on the GPU by using an approximate algorithm instead of brute-force calculation; that's how they got such a boost in that scenario. They compared it to TAA though, which, because of its performance hit, isn't the usual "gamer's choice". I think we need to see overclocking performance too, since that's what "we" gamers will run, and that's where the real worth of an upgrade will be determined, not whether you need RTX / DXR or not.
     
  12. Fox2232

    Fox2232 Ancient Guru

    Messages:
    9,380
    Likes Received:
    2,025
    GPU:
    -NDA +AW@240Hz
    But take it to the real world. If there is TAA with a huge time per frame and then DLAA with just half the frame time (doubling fps), what's the performance without AA?

    I would easily disable AA entirely in an FPS game if it meant more than double the fps. And I would likely go for downsampling instead, as a higher resolution means more detail in the distance and higher per-pixel precision for textures.
    Plus I do not like TAA much anyway. It blurs things a bit more than I am comfortable with in almost every game. Its only real benefit is the complete removal of shimmering.
     
  13. Noisiv

    Noisiv Ancient Guru

    Messages:
    6,611
    Likes Received:
    463
    GPU:
    RTX 2070 Strix
    Pure bandwidth alone should be a hint at performance.
    • 2070 / 1070 Ti = +75%
    • 2070 / 1080 = +40%
    • 2070 / 1080 (14 Gbps) = +27%
    • 2080 Ti / 1080 Ti = +27%

    We know all too well that Nvidia does not throw in additional bandwidth unless it's needed (omg, only 192-bit, remember?). It hurts power, it adds complexity and it's wasted.
    So I am not worried one bit about the mandatory performance uplift compared to Pascal.
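    (For reference, memory bandwidth is just bus width / 8 × data rate; a quick sketch with the public reference specs that reproduces those percentages:)

    ```python
    # Bandwidth in GB/s = bus width (bits) / 8 * data rate (Gbps), reference specs.
    def bandwidth_gbs(bus_bits, gbps):
        return bus_bits / 8 * gbps

    cards = {
        "GTX 1070 Ti": bandwidth_gbs(256,  8),   # 256 GB/s
        "GTX 1080":    bandwidth_gbs(256, 10),   # 320 GB/s (352 GB/s on 11 Gbps cards)
        "GTX 1080 Ti": bandwidth_gbs(352, 11),   # 484 GB/s
        "RTX 2070":    bandwidth_gbs(256, 14),   # 448 GB/s
        "RTX 2080 Ti": bandwidth_gbs(352, 14),   # 616 GB/s
    }

    print(f"2070 vs 1070 Ti:    {cards['RTX 2070'] / cards['GTX 1070 Ti'] - 1:+.0%}")     # +75%
    print(f"2070 vs 1080:       {cards['RTX 2070'] / cards['GTX 1080'] - 1:+.0%}")        # +40%
    print(f"2080 Ti vs 1080 Ti: {cards['RTX 2080 Ti'] / cards['GTX 1080 Ti'] - 1:+.0%}")  # +27%
    ```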
     
  14. Denial

    Denial Ancient Guru

    Messages:
    12,164
    Likes Received:
    1,340
    GPU:
    EVGA 1080Ti
    There are other things in the architecture that can be leveraged for more performance. For example, Vega shipped with RPM for FP16 calcs, which AFAIK was only utilized in one game (Far Cry 5), but Nvidia has a similar feature now (they actually had it in GP100's SM, just not in consumer Pascal), so hopefully more games will utilize it now that both vendors support it.

    Also, didn't the Quadro RTX slides say 1.5x from the architecture itself? I can't find the slide now. I know someone posted it here showing ATAA vs whatever for UE, but I'm pretty sure it showed a base 1.5x increase in performance with AA disabled entirely.

    This release may be different due to the RTX stuff though, which AFAIK is extremely bandwidth dependent. For example this slide:

    https://pbs.twimg.com/media/DkqLJVjUcAAjJOj.jpg

    Titan V's numbers are 9.1 / 9.7 / 18.8 - Turing's theoretical speed should be faster than what it shows here, but according to Morgan McGuire (an engineer at Nvidia) some shaders see less of a speedup from the RT cores compared to Volta due to HBM vs GDDR6. So it seems like raytracing is bottlenecked by memory bandwidth more so than traditional workloads.

    Also, I found the other slide I was talking about:

    https://pbs.twimg.com/media/DkqK-93UYAEGq8B.jpg:large

    So out of the box Nvidia is claiming the Quadro RTX 6000 is 1.5x faster than a Titan V in raster workloads. The 2080 Ti is slightly cut down but will have faster clocks, so I guess 50% is roughly what we should expect for regular workloads. Idk, I expect less, but WE'LL SEE.
     
    Last edited: Aug 22, 2018
    fantaskarsef likes this.
  15. Texter

    Texter Ancient Guru

    Messages:
    2,914
    Likes Received:
    98
    GPU:
    Club3d GF6800GT 256MB AGP
    It's not even reducing the workload; it's merely shifting the workload to the Tensor cores, as far as I can tell, and it could actually be four times as intensive as TAA for all we know, except that it frees up CUDA core time for rendering. So that should work well, unless you're also trying to run ray tracing and the performance hit in Tomb Raider is as good as it gets. I have no idea how many G-rays, Tensor cores and RT engines, let alone JHH's f@cking axis-fluid RT units, you need for decent RTX... I'm just a consumer, I'll wait for some reviews.
     

  16. Fox2232

    Fox2232 Ancient Guru

    Messages:
    9,380
    Likes Received:
    2,025
    GPU:
    -NDA +AW@240Hz
    It is understandable. When you run shader code on part of a texture, you load the data for all of it in a few blocks.
    But raytracing with 6 samples per pixel can use data from 6 completely different places in the scene, and if it goes for multiple bounces... Basically I think it would be more latency sensitive than bandwidth sensitive, if the data store were optimized for raytracing delivery.

    But can you really deliver to the GPU cache just the information that one pixel needs from each ray, or are you pulling bigger data blocks from memory?
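    (A crude way to put numbers on that question: if every ray sample lands in a different cache line, the GPU fetches a whole line to consume a few bytes. All the figures below are illustrative guesses, not measurements.)

    ```python
    # Worst case: each ray hit touches its own cache line, so bytes transferred
    # far exceed the bytes the shader actually needs.
    CACHE_LINE   = 128    # bytes per fetch granule (illustrative)
    BYTES_NEEDED = 32     # bytes a hit actually consumes (illustrative)

    pixels  = 2560 * 1440
    samples = 6           # samples per pixel, as above
    bounces = 2           # illustrative

    hits        = pixels * samples * bounces
    useful      = hits * BYTES_NEEDED
    transferred = hits * CACHE_LINE

    print(f"useful data:      {useful / 2**30:.1f} GiB per frame")
    print(f"actually fetched: {transferred / 2**30:.1f} GiB per frame "
          f"({transferred // useful}x overhead)")
    ```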
     
    Denial likes this.
  17. Denial

    Denial Ancient Guru

    Messages:
    12,164
    Likes Received:
    1,340
    GPU:
    EVGA 1080Ti
    I could be wrong about this, and I know people keep posting block diagrams, but I'm 85% sure that the Tensor/RT cores are not discrete cores. Basically the SMs get partitioned and those portions "become" Tensor/RT cores. So I don't think you actually free anything up when running stuff on Tensor; on the flip side, you also don't lose anything when not running Tensor.

    I think this for two reasons: 1. the RT/Tensor changes are linked to the INT/FP separation in the SM itself plus the ALU changes, and 2. there is no way they managed to get ~4,300 CUDA cores into a 250 W TDP but then also turn on some magical other cores and keep the TDP similar.

    85% sure, but yeah, could be wrong.

    You're right - it could be latency. I just assumed bandwidth, but HBM does have a massive latency advantage as well. He didn't clarify in his tweet, just said the difference was HBM vs GDDR6.
     
  18. Agent-A01

    Agent-A01 Ancient Guru

    Messages:
    11,350
    Likes Received:
    871
    GPU:
    1080Ti H20
    How about you tone it down some? You come across as an asshat in all your posts.

    I expect the difference in normal games to be driven by the single-precision difference:

    11.4 TFLOPS vs 16 TFLOPS.

    If the 2080 Ti clocks as high as Pascal, you can expect >20 TFLOPS with an OC.
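    (For reference, the single-precision figure is just CUDA cores × 2 FLOPs per clock (FMA) × clock speed, so the ">20 TFLOPS" figure falls out of the 2080 Ti's core count and an assumed ~2.3 GHz overclock - the clocks below are illustrative:)

    ```python
    # FP32 throughput = CUDA cores * 2 FLOPs/clock (one FMA) * clock in GHz.
    def tflops(cuda_cores, clock_ghz):
        return cuda_cores * 2 * clock_ghz / 1000

    print(f"1080 Ti @ 1.60 GHz: {tflops(3584, 1.60):.1f} TFLOPS")   # ~11.5
    print(f"2080 Ti @ 2.30 GHz: {tflops(4352, 2.30):.1f} TFLOPS")   # ~20.0 with a Pascal-like OC
    ```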
     
    Fox2232 likes this.
  19. wavetrex

    wavetrex Master Guru

    Messages:
    457
    Likes Received:
    212
    GPU:
    Zotac GTX1080 AMP!
    Fixed-function binary electronics require very few transistors to implement compared to general-purpose logic, which has to adapt to whatever code is being pushed through it.

    Those "tensor cores" are array addition+multiplication circuits, basically running the same operation over and over again ad infinitum.
    They are, in a sense, very similar to the days of old when the first "hardware accelerated graphics" were implemented, where the 3D chip was basically just doing lots of identical calculations very fast (for the time), leaving the CPU to push the stream of numbers into it and interpret the results.
    If you want a bit of brain explosion, look at this: http://www.felixcloutier.com/x86/FMUL:FMULP:FIMUL.html - that's not exactly what they are doing, but pretty close.

    The CUDA cores, on the other hand, are quite close to a CPU's floating-point unit, running all kinds of operations: addition, multiplication, inverse (1/x), square root, trigonometry, and of course... memory access, decisions (IF, CASE), jumps... and so on, without getting too much into detail here.

    It is also the reason why bitcoin mining has moved from CPUs to GPUs to specialized ASICs, as simpler electronics can do the same few operations MUCH faster than more complex electronics which need to adapt to the incoming code.

    Nvidia could probably increase the number of "Tensor Cores" tenfold while adding only 10% to the total transistor budget, but that's not very useful if the other parts of the chip can't feed those cores.
    It's all about balance (which is what makes this advanced micro-engineering so hard).
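    (The fixed operation in question is essentially a small fused matrix multiply-accumulate - Nvidia describes it as a 4x4 FP16 multiply with FP32 accumulation. A NumPy sketch of just that operation:)

    ```python
    import numpy as np

    # The one operation a tensor core repeats: D = A @ B + C on small 4x4 tiles,
    # FP16 inputs with a wider (FP32) accumulator.
    A = np.random.rand(4, 4).astype(np.float16)
    B = np.random.rand(4, 4).astype(np.float16)
    C = np.random.rand(4, 4).astype(np.float32)

    D = A.astype(np.float32) @ B.astype(np.float32) + C
    print(D)
    ```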
     
    Embra and Noisiv like this.
  20. TheDeeGee

    TheDeeGee Ancient Guru

    Messages:
    5,981
    Likes Received:
    458
    GPU:
    MSI GTX 1070
    First the 10-series price bumping and now this...

    I liked Nvidia, you know.
     
