Review: Intel Core i9-14900KS processor

Discussion in 'Frontpage news' started by Hilbert Hagedoorn, Mar 27, 2024.

  1. TLD LARS

    TLD LARS Master Guru

    Messages:
    804
    Likes Received:
    381
    GPU:
    AMD 6900XT
    This feels a bit like: "Your findings are irrelevant because my findings show that you are using your pc wrong".
    I hit 28GB usage yesterday in Darktide and From the Depths,
    25GB in Satisfactory,
    24GB in Cities: Skylines 2.
     
  2. Agent-A01

    Agent-A01 Ancient Guru

    Messages:
    11,640
    Likes Received:
    1,143
    GPU:
    4090 FE H20
    Not sure how you "feel" like that's what I am saying.

    It "feels" like you are ignoring my points completely and are falling back to a straw man. Cherry picking statements while ignoring the broader scope of the context.
     
  3. TLD LARS

    TLD LARS Master Guru

    Messages:
    804
    Likes Received:
    381
    GPU:
    AMD 6900XT
    Ok. I will go and work overtime to pay for my expensive nonsense memory kit.

    In the meantime you could bring us some numbers instead of the very vague numbers you brought earlier:

     
  4. user1

    user1 Ancient Guru

    Messages:
    2,830
    Likes Received:
    1,346
    GPU:
    Mi25/IGP
    This isn't entirely true. The IF links are asymmetrical just like Zen 3, but with writes being double the reads instead (according to AIDA anyway). 1 CCD with an FCLK of 2000 is about 64GB/s read/copy and 128GB/s write theoretical max, and this is true regardless of whether the memory controller runs in 1:1 or 1:2.
    So long as the FCLK remains equal to or greater than the UCLK, there is no significant bandwidth penalty running speeds of 8000MT/s or greater, as the UCLK operates at 2000MHz or more (for the fast side of the link anyway), and running at 8000+ is enough to overcome the latency penalty of running the UCLK at a lower speed.

    It's very similar to Zen 3, with the difference being that you can actually run fast enough speeds to overcome the penalty, and there is room for more memory bandwidth.

    Obviously 6000MT/s is going to be a lot easier than running 8000+, but there is performance to be had.

    To add, if you have a Zen 4 that does 2133 FCLK, then the maximum theoretical bandwidth is going to be about 136GB/s for the fast side of the link, which equates to 8533MT/s where the fabric is synced to the UCLK.
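The figures in this post follow from multiplying the fabric clock by the bytes moved per cycle. A minimal sketch of that arithmetic, assuming 32 bytes/cycle for the slow side and 64 bytes/cycle for the fast side as the post claims (the function names are made up for illustration, and the link widths themselves are disputed later in the thread):

```python
def link_bandwidth_gbs(fclk_mhz: float, bytes_per_cycle: int) -> float:
    """Peak link bandwidth in GB/s: clock in MHz times bytes moved per cycle."""
    return fclk_mhz * bytes_per_cycle / 1000


def synced_memory_speed(fclk_mhz: float) -> float:
    """DDR5 rate in MT/s at which UCLK in 1:2 mode equals the given FCLK.

    Assumes MCLK = MT/s / 2 (double data rate) and UCLK = MCLK / 2.
    """
    return fclk_mhz * 4


for fclk in (2000, 2133):
    slow = link_bandwidth_gbs(fclk, 32)  # assumed 32 B/cycle side
    fast = link_bandwidth_gbs(fclk, 64)  # assumed 64 B/cycle side
    print(f"FCLK {fclk}: slow {slow:.1f} GB/s, fast {fast:.1f} GB/s, "
          f"synced at {synced_memory_speed(fclk):.0f} MT/s")
```

Entering the clock as 2133 gives 8532MT/s rather than 8533, presumably because the real clock is 2133.33MHz.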
     
    Last edited: Apr 2, 2024

  5. Carfax

    Carfax Ancient Guru

    Messages:
    4,003
    Likes Received:
    1,476
    GPU:
    Zotac 4090 Extreme
    Whenever I've seen comparisons (especially in gaming) between 1:1 and 1:2 ratio with Zen 4, the former is almost always faster than the latter despite the bandwidth advantages gained.

    There is a reason why AMD themselves said that DDR5 6000 is the "sweet spot" for Zen 4, because the CPU performs best when the memory controller is running at the same speed as the RAM.

    AMD Confirms DDR5-6000 RAM Is The Sweet Spot For Ryzen 7000 CPUs | Tom's Hardware (tomshardware.com)

    This is in contrast to Intel CPUs, where the RAM is running at twice the speed of the memory controller when using DDR5.
     
    tunejunky likes this.
  6. Agent-A01

    Agent-A01 Ancient Guru

    Messages:
    11,640
    Likes Received:
    1,143
    GPU:
    4090 FE H20

    Right. I'll get on that after you show how a slow 64GB kit is going to be faster or worthwhile for any gamer.

    Clearly you're just here to argue over nothing. So moving on.

    Check out this visual representation of a dual CCD CPU.

    https://www.reddit.com/r/overclocki...sual_explanation_of_why_higher_memory_clocks/

    Regarding your bandwidth numbers, take your theoretical numbers and cut them in half.
    Yes, they are asymmetrical. But you have them reversed. Writes are 1/2 the speed of reads.

    64 GB/s is the maximum theoretical amount with 1:1 FCLK mode at 2000 from a single FCLK link.

    With dual CCD there are 2 FCLK links, so there is more bandwidth. But that doesn't matter if a game is forced to a single CCD (which is 99% of games).

    In the case of a single CCD, having 8000 doesn't help at all because the bandwidth is already saturated while also penalizing latency.
    So there's always going to be a deficit in performance when latency matters.
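For scale, the peak DRAM bandwidth of dual-channel DDR5 can be set against the 64GB/s single-link figure quoted in this post. A rough sketch, assuming a 128-bit (16-byte) memory bus and taking the 64GB/s link limit as given rather than verified (all names are illustrative only):

```python
BUS_BYTES = 16          # dual-channel DDR5: 128-bit data bus = 16 bytes per transfer
LINK_READ_GBS = 64.0    # single-CCD read limit at FCLK 2000, as stated in the post


def dram_peak_gbs(mt_per_s: int) -> float:
    """Theoretical peak DRAM bandwidth in GB/s for a given transfer rate."""
    return mt_per_s * BUS_BYTES / 1000


def ccd_read_ceiling_gbs(mt_per_s: int) -> float:
    """Bandwidth one CCD could see: the smaller of the DRAM peak and the link limit."""
    return min(dram_peak_gbs(mt_per_s), LINK_READ_GBS)


for speed in (6000, 8000):
    print(f"DDR5-{speed}: DRAM peak {dram_peak_gbs(speed):.0f} GB/s, "
          f"single-CCD ceiling {ccd_read_ceiling_gbs(speed):.0f} GB/s")
```

Under these assumptions both speeds already exceed what one link can carry, which is the saturation argument made above.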
     
  7. user1

    user1 Ancient Guru

    Messages:
    2,830
    Likes Received:
    1,346
    GPU:
    Mi25/IGP
    I don't contest that the dual CCD has more bandwidth due to the additional links, but it's not quite that simple. The visual you're looking at is exactly what I'm talking about: you can see that each link is asymmetrical, 32B/cycle read and 16B/cycle write.
    [​IMG]
    [​IMG]
    Now there is an issue, because AIDA shows the opposite behaviour. I don't know if it's an AIDA bug or the AMD slides are wrong. Anyway, in effect the slow side of the link is 64GB/s and the fast side is 128GB/s; this is because the GMI links are faster on Zen 4 and the fabric is "double pumped", aka 256bit x2.

    If the GMI links were the same as Zen 2/3, which is what you're claiming when you say 2000 FCLK is 64GB/s max (because that's how it was on Zen 2/3), then you would see 64GB/s + 32GB/s just like the single-CCD chips on Zen 2/3, and 128GB/s + 64GB/s max for dual-die chips, which is not what happens.


    And lastly, 8000MT/s is actually fast enough to overcome the latency penalty of the slower memory controller: you can achieve the same or lower latencies than 6000MT/s in 1:2 mode, and it needs less SoC voltage and consumes less power, so in general it would be desirable to run 1:2 if you can.
    And if next-generation boards can run higher speeds, say 8533MT/s, then you might see a larger improvement, because 8000 with synced UCLK is just barely better than or equal to a 6000 setup.
     
    Last edited: Apr 3, 2024
  8. tunejunky

    tunejunky Ancient Guru

    Messages:
    4,638
    Likes Received:
    3,273
    GPU:
    7900xtx/7900xt
    the mobos necessary for these feats of tweaking are all flagships. i got a B650E for extra stability but there's nothing for me to be had over 6400
    and there is definitely a mobo lottery as well as RAM lottery
     
    Carfax likes this.
  9. Agent-A01

    Agent-A01 Ancient Guru

    Messages:
    11,640
    Likes Received:
    1,143
    GPU:
    4090 FE H20
    GMI is just another acronym for Infinity Fabric. What you are describing as "double pumped" is the FPU: dual 256-bit FPUs. AMD used that terminology when they announced AVX-512 support. They did that instead of adding a 512-bit FPU like older Intel chips used.
    That does not mean it doubles the IF bandwidth. It is still 64GB/s / 32GB/s read/write.
    Now you could also be talking about EPYC parts that have dual-link IF between CCDs. That is a doubling of bandwidth, but that does not exist on Raphael / AM5 consumer parts.

    It is the same. See above. 64GB/s / 32GB/s for Raphael parts (7xxx series).

    6400-6800 (depending on IMC quality) is the sweet spot, in 1:1:1 mode. 8000 will require UCLK at 1:2, which means the memory controller runs at 1/2 speed, in this case 2000MHz. At 6600, the memory controller is running at 3300MHz.
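The clock relationship described here can be worked out directly. A small sketch, assuming MCLK is half the DDR5 transfer rate and UCLK is MCLK divided by the ratio (`memory_clocks` is a made-up helper, not part of any tool):

```python
def memory_clocks(mt_per_s: int, uclk_divider: int) -> tuple[float, float]:
    """Return (MCLK, UCLK) in MHz for a DDR5 speed and a UCLK:MCLK ratio of 1:divider."""
    mclk = mt_per_s / 2          # DDR transfers twice per memory clock
    uclk = mclk / uclk_divider   # memory controller clock: 1:1 or 1:2
    return mclk, uclk


for speed, divider in ((6600, 1), (8000, 2)):
    mclk, uclk = memory_clocks(speed, divider)
    print(f"DDR5-{speed} at 1:{divider}: MCLK {mclk:.0f} MHz, UCLK {uclk:.0f} MHz")
```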

    It's not any different with regard to Intel. Alder Lake introduced a decoupled memory controller and memory frequency.
    DDR4 could run at 1:1, 1:2, or 1:4.

    DDR4 4000-4266 is what most good IMCs could run. But DDR4 4800 or so was possible.
    1:1 mode was always faster in games despite the added bandwidth. Since the IMC ran at 1:2 mode at 4800 it actually caused worse performance because of the latency penalty.
    Same thing applies here outside of 1:1 UCLK. Faster memory controller = less latency.
     
    Carfax and Krizby like this.
  10. user1

    user1 Ancient Guru

    Messages:
    2,830
    Likes Received:
    1,346
    GPU:
    Mi25/IGP
    The GMI links are implemented via PHYs that provide connectivity; they are not the data fabric on the IO die. Both implement the Infinity Fabric protocol, but they are physically not the same. On older Zen chips the GMI links, the data fabric and the memory controller all operated at the same speed. On Zen 4, however, the memory controller and data fabric are decoupled, and the GMI links seem to operate at different speeds on some products.

    On the EPYC Genoa CPUs, the GMI links are only 36GB/s, and 2 links are required for 72GB/s throughput (https://www.amd.com/system/files/documents/4th-gen-epyc-processor-architecture-white-paper.pdf). Given 32 bytes per clock, this puts the GMI links for those processors at only around 1.125GHz, and the data fabric of the IO die probably doesn't operate at such a low speed. Genoa isn't Raphael, obviously, but I think it's worth mentioning.
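The per-link figure can be turned into an implied link clock by dividing by the bytes moved per clock. A quick sketch of that estimate, taking the post's 36GB/s and 32 bytes/clock at face value (both are the poster's reading of the white paper, and `implied_clock_ghz` is a hypothetical helper):

```python
def implied_clock_ghz(bandwidth_gbs: float, bytes_per_clock: int) -> float:
    """Clock in GHz needed to reach a bandwidth at a given link width."""
    return bandwidth_gbs / bytes_per_clock


single_link = implied_clock_ghz(36, 32)   # one GMI link, per the post's figures
print(f"Implied GMI clock: {single_link:.3f} GHz")

# Two links double the throughput at the same clock, matching the 72GB/s figure.
print(f"Two links at that clock: {2 * 32 * single_link:.0f} GB/s")
```

This works out to about 1.125GHz under those assumptions.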

    If Raphael does not have a different speed for the GMI links, then it is physically impossible for any Zen 4 Ryzen to hit >100GB/s for both writes and reads; it would be 128GB/s and 64GB/s theoretical max for dual-CCD processors running 2000 FCLK, so if that is the case then AIDA must be bugged, basically. However, if the links did run at a different speed, it might also explain why getting much more than 100GB/s is difficult, because the GMI link speed could be capping it.

    To summarize:
    The data fabric on the IO die is 64B/cycle, which is what allows it to feed the 2x 32B/cycle reads via the GMI links and the 64B/cycle link to the IO controller. If the data fabric were 32 bytes a cycle, aka 256-bit, then you would have the Zen 3 bandwidth, which would never exceed 64GB/s read or write with 2000 FCLK, dual die or not.

    The link between the memory controller and the data fabric is 32 bytes a cycle but operates at the speed of the memory, not the memory controller; otherwise you'd see a halving of the bandwidth in 1:2 mode.

    The GMI links may operate on their own clock or a divider/multiplier that is not exposed to the user. Evidence supporting this: the high writes in AIDA, Genoa's configuration, and the 100GB/s wall.
    Evidence against it is this article on Chips and Cheese, which shows them not being able to exceed 64GB/s write on a 7950X with 2000 FCLK and 6000MT/s memory. However, it's worth noting that they only manage 73GB/s read, which is low even for Zen 4, so it's hard to say whether their conclusions are valid or the result of some other issue: https://chipsandcheese.com/2022/11/08/amds-zen-4-part-2-memory-subsystem-and-conclusion/

    I will say that, given the Chips and Cheese article, it does seem more like an AIDA bug, in which case people should probably stop using AIDA as a memory benchmark, since it's been 18 months since Zen 4 launched and it apparently hasn't been fixed.



    I am familiar with the UCLK/MCLK relationship. On Zen 4 the penalty of running 1/2 UCLK is very small, much less than the impact of running Alder Lake in Gear 1 vs Gear 2.
    [​IMG]

    [​IMG]
    Being able to run something like 8000MT/s CL36 is probably going to give better performance than something like 6000 CL30, though maybe not better than a highly tuned setup. Obviously, as you mentioned, many chips can do more than 6000MT/s, with good samples doing 6600MT/s in 1:1, but that's a bit of luck, as some won't even do 6400. However, all of them will do 8000 in 1:2 mode with a capable motherboard, and if you could get even higher speeds working, which does seem to be possible on a few boards, then the latency benefit would probably yield something more tangible.


    If anybody knows the stock FCLK of the 7995WX or the 96-core Genoa CPU, that would be helpful in clearing up the GMI thing.

    I apologize for the long-winded off-topic posts :oops:
     
    tunejunky likes this.

  11. Carfax

    Carfax Ancient Guru

    Messages:
    4,003
    Likes Received:
    1,476
    GPU:
    Zotac 4090 Extreme
    This just goes to show how inefficient Zen 4's memory controller is compared to Intel's, or maybe the chiplet design is also reducing the efficiency of Zen 4's memory performance.

    My 14900KF gets this score at DDR5 7400 CL34-44-44-34 CR2. If I were capable of running my RAM at the same speed and timings as that dude, my latency would be in the mid to low 40s.

    [​IMG]
     
    tunejunky and user1 like this.
  12. user1

    user1 Ancient Guru

    Messages:
    2,830
    Likes Received:
    1,346
    GPU:
    Mi25/IGP
    It's probably the GMI links connecting the IO die to the CCDs; the Raptor Lake/Alder Lake chips have a better data path, being monolithic and all. On Zen 4, bandwidth to the IGP isn't nearly as bad as to the CCDs/CCXs.
    I am tempted to upgrade and buy that ASRock B650M-HDV board that guy is using for the APU with 10600MT/s. It's only like $100 USD in my country, and seems like it would be fun to play with. :D
     
    Carfax and tunejunky like this.
  13. Agent-A01

    Agent-A01 Ancient Guru

    Messages:
    11,640
    Likes Received:
    1,143
    GPU:
    4090 FE H20
    That is correct for dual CCDs. 128/64. 64/32 for single CCD @ 2000 FCLK.

    And you are also correct about AIDA: it is bugged. I don't use it anymore as it gives erroneous data even for Intel. Change BCLK just a little? Numbers go crazy.


    It really depends on the game. Some show minor differences, some more so.

    It's been tested: save for a few games that are bandwidth starved, lower latency is best in most cases. I see a lot of people on OCN saying 8000 is slower in games than 1:1:1 mode after multiple tests.
     
