The AMD EPYC Milan-X server CPU platform has a stacked L3 cache, which significantly increases the amount of available cache, bringing it to a maximum of 768MB. AMD EPYC Milan-X specifications leak, revealing up to 64 cores, 280W TDP, and 768MB of L3 cache.
I wonder if they are going to go the IBM route with virtual caches in the future. Seems like a much better solution for enterprise.
I think AMD is really onto something now with all the add-on cache they're going to be throwing onto even their desktop CPUs.
That's a LOOOOOOT of cache! Is this like the L4 that Intel stuck onto the desktop Broadwell parts then, or something else? I remember that giving a rather spicy boost to performance that kept the architecture on par with far more recent things...
Yes, for traditional enterprise you are probably right, as long as "better" means cheaper. But for high-tech enterprise nothing beats on-die cache, and massive on-die cache at that.
It is on-die: 256MB of shared L2/L3 virtual cache per 8-core chip. It obviously comes with a latency penalty, but the benefit of the L3 being shared with the L2 supposedly outweighs that.
This is better: this is legit L3 cache that's added on top of the existing cache and connected via TSVs (through-silicon vias). There's no latency penalty, unlike an L4 cache, which would be slower.
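For anyone curious how latency claims like these actually get measured: the standard tool is a pointer-chasing microbenchmark, where each load depends on the previous one, so the measured time per step reflects memory latency for a given working-set size. Here's a minimal sketch in Python (in a real measurement you'd use C to avoid interpreter overhead dominating, and the numbers are entirely machine-dependent; `make_chain`/`chase` are just illustrative names):

```python
import random
import time

def make_chain(n_slots):
    """Build a random cyclic permutation: slot i holds the index of the
    next slot to visit, so every load depends on the previous one and
    the CPU can't prefetch or overlap the accesses."""
    order = list(range(n_slots))
    random.shuffle(order)
    chain = [0] * n_slots
    for i in range(n_slots):
        chain[order[i]] = order[(i + 1) % n_slots]
    return chain

def chase(chain, steps):
    """Follow the chain for `steps` hops; return nanoseconds per hop.
    (Machine- and interpreter-dependent; illustrative only.)"""
    idx = 0
    start = time.perf_counter_ns()
    for _ in range(steps):
        idx = chain[idx]
    return (time.perf_counter_ns() - start) / steps
```

Run `chase` over chains of increasing size and you'd expect the ns/hop figure to step up each time the working set spills out of a cache level; a 768MB L3 pushes that last step (into DRAM latency) out to far larger working sets.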
I'd be really curious to see what would happen if someone were to [somehow] put a 64MB (not a typo) stick of RAM in, install Windows XP, and play some early-2000s game just to see how it runs. Obviously it's utterly useless, but it's weird to think about having more cache than you have RAM, and enough cache to actually run your entire system and a simple game. Hard to wrap your head around.
Yeah - the idea behind IBM's implementation is that it massively reduces L3 latency at the expense of slightly increased L2 latency. If you get lucky, you can hit L3 at nearly identical latency to L2, and in the vast majority of scenarios your L3 latency is going to be significantly lower than that of a traditional L3 - for example, the L3 seen here. In traditional PC workloads you're not going to see much of a benefit, but in the kinds of use cases IBM's cloud servers or EPYC servers are used for, having a fast L3 is extremely useful. Keep in mind IBM's system also scales up to ~512MB of combined L2/L3 per module and 8GB of cache (a virtual L4 in this case) in a 32-chip system. AMD's stacking here would be adjacent to this.
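To make the "virtual L3" idea concrete: instead of a dedicated L3 array, a line evicted from one core's private L2 gets parked in another core's L2, so a later miss can be serviced from that remote L2 at better-than-DRAM latency. This is a deliberately simplified toy model (class and method names are mine, and real hardware uses far more sophisticated placement and coherence than the "spill to the next core" rule sketched here):

```python
from collections import OrderedDict

class VirtualL3Chip:
    """Toy model of a virtual L3: per-core private L2s where a line
    evicted from one L2 is parked in a neighbouring core's L2 instead
    of a separate L3 array. A hit in a remote L2 plays the role of an
    L3 hit. (Simplified sketch, not the real protocol.)"""

    def __init__(self, n_cores=8, l2_lines=4):
        self.n_cores = n_cores
        self.l2_lines = l2_lines
        # one LRU-ordered set of cached addresses per core
        self.l2 = [OrderedDict() for _ in range(n_cores)]

    def _install(self, core_id, addr):
        cache = self.l2[core_id]
        cache[addr] = None
        if len(cache) > self.l2_lines:
            victim, _ = cache.popitem(last=False)   # evict local LRU line
            # spill the victim into the next core's L2 (the "virtual L3");
            # if that overflows too, the line is dropped for real
            neighbour = self.l2[(core_id + 1) % self.n_cores]
            if victim not in neighbour:
                neighbour[victim] = None
                if len(neighbour) > self.l2_lines:
                    neighbour.popitem(last=False)

    def access(self, core_id, addr):
        if addr in self.l2[core_id]:                # local L2 hit: fastest
            self.l2[core_id].move_to_end(addr)
            return "L2"
        for other in range(self.n_cores):           # remote L2 hit = virtual L3
            if other != core_id and addr in self.l2[other]:
                del self.l2[other][addr]
                self._install(core_id, addr)
                return "vL3"
        self._install(core_id, addr)                # miss all the way to memory
        return "MEM"
```

With a 4-line L2, a core that touches six addresses spills the two oldest into its neighbour; touching the oldest address again then comes back as a "vL3" hit rather than a full memory miss, which is exactly the latency win being described.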