Why exactly are the 3D V-Cache chips so much faster for some games? Think we'll get a 7900X3D?

Discussion in 'Processors and motherboards AMD' started by BlindBison, Aug 5, 2022.

  1. BlindBison

    BlindBison Maha Guru

    Messages:
    1,449
    Likes Received:
    530
    GPU:
    RTX 2080 Super
    I've watched a lot of tests of the 5800X3D vs the 5800X or 5900X, and it's interesting how in some games it only slightly outpaces the non-3D-cache version, while other titles (especially the badly optimized stutter-fest games, it seems) see HUGE uplifts from the 3D V-Cache (games like Elden Ring, some UE4 stutter-fest ports, that kind of thing).

    It "seems like" the extra cache is basically letting the CPU "brute force" past bad porting efforts, specific memory bottlenecks, or games that constantly have to go out to RAM, or some such. Maybe my observation is wrong, that could well be the case (which is why I'm asking).

    The weird thing is I remember TechDeals and even Ian from TechTechPotato saying they didn't expect the extra cache to be such a huge deal in general terms (maybe I'm misrepresenting them, but that's my recollection). Yet in gaming it seems to be a genuine game changer, since even with meaningfully lower clocks it still clearly crushes the standard 5800X.

    It's so much of a game changer, in fact, that I'm sitting here scratching my head wondering why CPU manufacturers haven't attempted something like this much sooner. I'm no expert, so I imagine they have their reasons. In any case, assuming the next desktop Zen 4 generation retains the same core counts, a theoretical 7900X3D (12 cores / 24 threads, two CCDs) with 3D V-Cache on each CCD, or a 7950X3D, seems like it would be an incredible chip on paper.

    I remember that prior to launch, tech channels and the like suggested the reason this wasn't done before is that the tech is expensive, but the way AMD has done it, that really doesn't seem to be the case, since the 5800X3D matches or beats Intel's top offering in most cases at a much lower price point. Perhaps I'm overlooking or misunderstanding something, though. Thanks!
     
    akbaar likes this.
  2. vestibule

    vestibule Master Guru

    Messages:
    885
    Likes Received:
    336
    GPU:
    Radeon RX6600XT
    Yep, the 3D V-Cache does seem especially good at assisting with the heavy lifting.
    AMD seem to have quite the ecosystem going on with their hardware.
    Myself, when I look at the price of RAM and a motherboard at B450 levels and then subtract that cost from a 5800X3D, it looks fully viable. But I have to wait for the next-gen kit to come out first.
     
    BlindBison likes this.
  3. Horus-Anhur

    Horus-Anhur Ancient Guru

    Messages:
    4,650
    Likes Received:
    5,450
    GPU:
    RTX 2070 Super
    When a CPU is working, it needs data. Ideally we would have all the data accessible to the CPU instantly, but that is impossible because of size and cost.
    If the CPU doesn't have the data it needs to work on, it sits idle until it gets it.
    RAM is relatively slow, so we use caches of very fast memory inside or near the CPU.

    For comparison, the L3 cache on a 5800X3D has a bandwidth of roughly 600 GB/s. DDR4-3200 in dual channel has a theoretical bandwidth of about 51 GB/s; in reality it's a few GB/s lower.
    In terms of latency, memory sits at around 60-70 ns, maybe in the 50s with a well-optimized kit, while the L3 cache on a 5800X3D is around 12 ns.
    So RAM is much slower than cache, in both latency and bandwidth.
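
    To put those latencies into core cycles, here's a rough back-of-envelope sketch; the ~4.5 GHz clock is just an assumed figure for illustration, not a measured one:

    ```cpp
    // Back-of-envelope only: converts the latencies quoted above into core cycles,
    // assuming a ~4.5 GHz core clock (an assumption for illustration).
    #include <cstdio>

    int main() {
        const double clock_ghz      = 4.5;  // GHz, i.e. cycles per nanosecond
        const double l3_latency_ns  = 12.0; // ~L3 hit on a 5800X3D
        const double ram_latency_ns = 65.0; // typical DDR4 round trip

        std::printf("L3 hit  : ~%.0f cycles of waiting\n", l3_latency_ns * clock_ghz);
        std::printf("RAM miss: ~%.0f cycles of waiting\n", ram_latency_ns * clock_ghz);
        return 0;
    }
    ```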

    So having more cache means we can keep more data in it, reducing the probability of a cache miss and of having to go out to memory.
    Fetching data from the L3 cache is much faster than fetching it from RAM, so the CPU spends fewer cycles idling.
    The CPU has a unit whose job is to predict which instructions will be needed in advance, so it can fetch the data into the caches in time for execution. But because a lot of the data a CPU works on is branchy and full of dependencies, it comes down to the probability of the needed data already being in a cache or not.
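
    A toy sketch of that difference (not a real benchmark, and the array size is just picked so the data clearly doesn't fit in any L3): a sequential walk is easy for the hardware prefetcher to stay ahead of, while dependent "pointer chasing" loads can't be predicted, so every miss pays the full trip to RAM:

    ```cpp
    // Toy illustration, not a proper benchmark.
    #include <algorithm>
    #include <cstddef>
    #include <numeric>
    #include <random>
    #include <vector>

    int main() {
        const std::size_t n = 1u << 24;  // 16M entries * 8 bytes = 128 MB, bigger than any L3
        std::vector<std::size_t> next(n);
        std::iota(next.begin(), next.end(), std::size_t{0});
        std::shuffle(next.begin(), next.end(), std::mt19937_64{42});

        // Prefetch-friendly: addresses are predictable, so RAM latency is mostly hidden.
        std::size_t sum = 0;
        for (std::size_t i = 0; i < n; ++i) sum += next[i];

        // Prefetch-hostile: each load depends on the previous result (pointer chasing),
        // so the core mostly sits waiting on misses.
        std::size_t idx = 0;
        for (std::size_t i = 0; i < n; ++i) idx = next[idx];

        return static_cast<int>((sum ^ idx) & 0xFF); // keep the loops from being optimized away
    }
    ```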

    The thing is that not all programs and games have the same memory requirements.
    Some programs have critical data sets that fit inside small caches, but others require bigger ones.
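
    A minimal sketch of the textbook average-memory-access-time estimate shows why that matters; the miss rates below are made-up illustrative numbers, not measurements of any real game:

    ```cpp
    // AMAT = hit_time + miss_rate * miss_penalty (textbook estimate).
    // Miss rates here are invented purely for illustration.
    #include <cstdio>

    double amat_ns(double hit_ns, double miss_rate, double miss_penalty_ns) {
        return hit_ns + miss_rate * miss_penalty_ns;
    }

    int main() {
        // Hypothetical hot data set: a 32 MB L3 misses more often than a 96 MB one.
        std::printf("32 MB L3, 10%% miss rate: %.1f ns per access\n", amat_ns(12.0, 0.10, 65.0));
        std::printf("96 MB L3,  2%% miss rate: %.1f ns per access\n", amat_ns(12.0, 0.02, 65.0));
        return 0;
    }
    ```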
     
    Last edited: Aug 7, 2022
    Zenoth, yasamoka, Valken and 3 others like this.
  4. vestibule

    vestibule Master Guru

    Messages:
    885
    Likes Received:
    336
    GPU:
    Radeon RX6600XT
    Apparently that is true AI, or so I have been told. CPUs have been doing this for a long time now, and these days they are considered to be really good at it.
    I wonder what the latency is, outside of the average, when the prediction is wrong.
    Not quite the learning CPU, though. :p
     
    BlindBison likes this.

  5. BlindBison

    BlindBison Maha Guru

    Messages:
    1,449
    Likes Received:
    530
    GPU:
    RTX 2080 Super
    Thanks, that's very helpful!
     
  6. Kool64

    Kool64 Maha Guru

    Messages:
    1,397
    Likes Received:
    604
    GPU:
    RTX 3090
    The prediction function is one of the major causes of vulnerabilities, so sadly they can't predict too well lol
     
  7. vestibule

    vestibule Master Guru

    Messages:
    885
    Likes Received:
    336
    GPU:
    Radeon RX6600XT
    OK, show us the proof, as knowledge is power and none of us on this forum want to be left in a misinformed situation.
     
  8. Horus-Anhur

    Horus-Anhur Ancient Guru

    Messages:
    4,650
    Likes Received:
    5,450
    GPU:
    RTX 2070 Super
    Not exactly branch prediction, but out-of-order execution as a whole, and associated things like speculative execution, simultaneous multithreading, cache isolation, etc.
     
  9. Horus-Anhur

    Horus-Anhur Ancient Guru

    Messages:
    4,650
    Likes Received:
    5,450
    GPU:
    RTX 2070 Super
  10. cucaulay malkin

    cucaulay malkin Ancient Guru

    Messages:
    6,118
    Likes Received:
    3,343
    GPU:
    RTX 3060 Ti
    Last edited: Aug 12, 2022
    BlindBison likes this.

  11. Zenoth

    Zenoth Maha Guru

    Messages:
    1,342
    Likes Received:
    283
    GPU:
    MSI RTX 3080 12GB
    So is that a case of system memory having a lot to 'catch up' to? Or is it a case of CPU Cache having been developed to go that fast "too soon"?

    I don't understand the reason(s) behind the huge gap between dual-channel memory (be it DDR4 or even DDR5) and CPU cache, both in bandwidth and in average latency.

    Basically, what explains that 'gap'? It's enormous. Is it something in terms of engineering and industry standards? I feel like system memory at this point in time should be much faster compared to CPU cache. But on the other hand, if it's now possible to provide significantly more cache for CPUs, then what does that mean for the future of system memory? If CPUs (or at least AMD ones) keep getting more and more (and faster) 'extra' L3 cache in their 3D variants, then wouldn't that eventually make non-3D variants practically obsolete? I mean, I know, I'm not that naive; it could still take 4 or 5 CPU generations to supersede non-3D CPUs, but I feel like one day it's bound to happen.

    Because I don't see the point of going to DDR5, then DDR6, then DDR7, if every single time we move up one notch in system memory generation the gap stays proportionally the same as the previous generation's, or even grows. In other words, it feels like the '3D' variants of AMD CPUs act like a band-aid because system memory is increasingly slow compared to what CPUs should actually be able to work with (memory fast enough to feed them data at something like L3 cache speeds).
     
    BlindBison likes this.
