PSA: 535 system stability concerns.

Discussion in 'Videocards - NVIDIA GeForce Drivers Section' started by Astyanax, Jun 5, 2023.

  1. WontonNoodle

    WontonNoodle Active Member

    Messages:
    68
    Likes Received:
    7
    GPU:
    Nvidia GTX 3070M
    If the system was stable before this driver, then it's the driver's fault. How can you say someone's RAM wasn't stable when it was fine before this driver?
     
  2. Astyanax

    Astyanax Ancient Guru

    Messages:
    17,011
    Likes Received:
    7,351
    GPU:
    GTX 1080ti
    BUZZ, Wrong.

    By understanding the technical properties of silicon electronics and their unstable-by-design nature.

    Stability can only be suggested, not confirmed, by testing performed with the software environments and tools currently available.

    There are always uncontrollable edge cases that are missed once you operate beyond the design specifications of the processor.
     
    Last edited: Jun 6, 2023
    Xtreme512 and chinobino like this.
  3. Krizby

    Krizby Ancient Guru

    Messages:
    3,067
    Likes Received:
    1,741
    GPU:
    Asus RTX 4090 TUF
    Or mount a fan on those hot DDR5 sticks; they tend to throw errors when the temperature is above 50C, at least in my case.
     
    Xtreme512, Carfax and OnnA like this.
  4. sertopico

    sertopico Maha Guru

    Messages:
    1,444
    Likes Received:
    374
    GPU:
    Palit Gamerock 4090
    Thank you.

    I read a lot about it. From what I understand, this was introduced after Intel patched the Spectre/Meltdown vulnerabilities some years ago. I have now increased the Ring PLL SFR Voltage from 0.9 (default) to 0.93, let's see. Hogwarts Legacy is the only game that triggers this WHEA error anyway. Not too bad; the game is average to say the least, technically as well as gameplay-wise.

    As soon as Intel releases Raptor Lake refresh I will finally change platform and let the 4090 express its full potential. :D
     
    Last edited: Jun 6, 2023

  5. janos666

    janos666 Ancient Guru

    Messages:
    1,648
    Likes Received:
    405
    GPU:
    MSI RTX3080 10Gb
    Awhh... I was so excited to see some actual, interesting-looking, playable game (I liked the Portal games) with UE5 and Lumen. But it's a bit disappointing.
    The first thing I noticed was how "noisy" the shadows and reflections are unless the corresponding quality settings are set close to or at maximum where the performance is not as great as I hoped. The next thing I noticed is that reflections are just plain broken in some cases, even with "hardware lumen" (which looks significantly better but is still far from perfect). Look at the triangle in the middle: https://ibb.co/GCy45KT
     
  6. Astyanax

    Astyanax Ancient Guru

    Messages:
    17,011
    Likes Received:
    7,351
    GPU:
    GTX 1080ti
    Sadly, if it's D3D12, most of the causes are things like a missing or evicted resource that the engine still believed was present in memory. It looks like this rise in reports started before this new driver, though, so it's probably on the game developers to identify the issue with DRED and Nsight.
     
  7. Carfax

    Carfax Ancient Guru

    Messages:
    3,956
    Likes Received:
    1,450
    GPU:
    Zotac 4090 Extreme
    I'm using a Fractal Design Torrent case which has the best airflow, so the DIMMs are never hot. I think the highest I've seen them go is low 40s.

    Increasing the voltage seems to have fixed Dead Space remake crashing though, as I played for about an hour straight and had no crashes. Took Isaac all over the ship as well trying to trigger a crash, but nothing happened.

    Voltage before was 1.47v, now it's 1.49v.
     
    Krizby likes this.
  8. donnieyeen

    donnieyeen Member

    Messages:
    14
    Likes Received:
    4
    GPU:
    3080RTX 10GB
    As I said in the other thread, do not "guess" about things. Test your memory with TM5 or Karhu - and you will quickly see if everything is "stable". You don't guess stability - you test it with proven tools. Or test the driver on STOCK.
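    To show what "testing instead of guessing" means in principle, here is a toy write/read-back pattern check in Python. It is only a sketch of the idea and no substitute for TM5 or Karhu: a script like this cannot control which physical memory it touches, and CPU caches hide most of what a real tester exercises. The function names and the pattern set are illustrative assumptions.

```python
def fill(size, pattern):
    """Return a buffer of `size` bytes, each set to the byte value `pattern`."""
    return bytearray([pattern]) * size


def count_mismatches(buf, pattern):
    """Count bytes in `buf` that no longer equal `pattern`."""
    return len(buf) - buf.count(pattern)


def pattern_test(size=1 << 20, patterns=(0x00, 0xFF, 0xAA, 0x55)):
    """Write each pattern across a fresh buffer, read it back, and return
    the total number of mismatching bytes (0 means nothing was detected)."""
    total = 0
    for pattern in patterns:
        buf = fill(size, pattern)
        total += count_mismatches(buf, pattern)
    return total


if __name__ == "__main__":
    print("mismatched bytes:", pattern_test())
```

    A result of zero only means this pass found nothing, which is why the dedicated tools run many patterns for hours.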

    Also, one thing I doubt is that RAM is the cause of the crashes. Something has to be very fundamentally wrong if you're crashing purely because of the DDR5.

    DDR5 has hugely revamped ECC. ECC is a real thing on DDR5, it's very good, and it will prevent a lot of crashes by simply correcting for bad timings and other things you did wrong, at the cost of performance, that is. Unless your voltages are off, of course.
     
  9. Archvile82

    Archvile82 Master Guru

    Messages:
    542
    Likes Received:
    334
    GPU:
    ROG STRIX 4090 OC
    I noticed that with this set of drivers, apps were reporting failures on boot (Logitech hub, iCUE), then something Nvidia-related with live kernel. I reverted to the Redfall driver, as that set was working well on my system. I can confirm I had my RAM set to CR 1, so I have changed that back to CR 2 and will give the drivers another go.
     
  10. Carfax

    Carfax Ancient Guru

    Messages:
    3,956
    Likes Received:
    1,450
    GPU:
    Zotac 4090 Extreme
    My memory was already stress tested before, but these new timings I've used are fairly new and I haven't stress tested them as that can be time consuming to do properly. I've never messed with the tertiary timings, but I've slowly been adjusting them over the past few weeks. All my games were stable on 535.98 drivers except for Dead Space, and honestly I'm not surprised because Dead Space constantly moves data around between SSD, RAM and VRAM due to being totally seamless.

    Dead Space is actually a great stability test, much like the Battlefield games.

    It's definitely the RAM. I already suspected it even before @Astyanax commented on it, because the main modus operandi of memory instability is crashes and freezes with no errors in the event log.

    Yep, my RAM is fairly heavily tweaked and overclocked and has been exceptionally stable since I've had it, but memory tweaking can be tricky as there are so many variables involved.
     
    Last edited: Jun 7, 2023

  11. Astyanax

    Astyanax Ancient Guru

    Messages:
    17,011
    Likes Received:
    7,351
    GPU:
    GTX 1080ti
    You misunderstand the ECC on DDR5. It is not CPU-interactive (sideband) like true ECC is; it only corrects data corruption on READS from the sticks, not on writes to them.

    [Image]

    So if your data is corrupted at write time, or due to CR1 failing the first bit at latch time, the bad data gets written and ECC is of no use, because on-die ECC thinks the bad data is correct.
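    A toy model of that behaviour, as a rough sketch only: it uses a tiny Hamming(7,4) single-error-correcting code as a stand-in for the much wider code real DDR5 dies use, and the `OnDieEccCell` class is a made-up illustration, not how any actual DRAM is organised. The check bits are computed inside the "die" from whatever data arrives, so a flip in the stored cell gets corrected on read, while data corrupted before it arrives is stored as perfectly valid.

```python
def encode(d):
    """Hamming(7,4): 4 data bits -> 7-bit codeword [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p3, d2, d3, d4]


def decode(codeword):
    """Correct at most one flipped bit; return (data_bits, was_corrected)."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3  # 1-based position of the bad bit, 0 if clean
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]], bool(syndrome)


class OnDieEccCell:
    """Hypothetical DRAM cell: check bits are generated on the die at write time."""

    def write(self, data_bits):
        self.stored = encode(data_bits)

    def read(self):
        return decode(self.stored)


intended = [1, 0, 1, 1]

# Case A: the bit flips while sitting in the array -> corrected on read.
cell_a = OnDieEccCell()
cell_a.write(intended)
cell_a.stored[4] ^= 1
data_a, corrected_a = cell_a.read()

# Case B: the bit flips on the way to the die -> stored and read back as "good".
arrived = list(intended)
arrived[1] ^= 1
cell_b = OnDieEccCell()
cell_b.write(arrived)
data_b, corrected_b = cell_b.read()

if __name__ == "__main__":
    print("A:", data_a == intended, "corrected:", corrected_a)
    print("B:", data_b == intended, "corrected:", corrected_b)
```

    In case B the read reports no error at all, which is the point: the die has no way to know the data was already wrong when it arrived.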
     
    Last edited: Jun 7, 2023
    386SX likes this.
  12. snight01

    snight01 Master Guru

    Messages:
    453
    Likes Received:
    87
    GPU:
    GB RTX 4090gamingOC
    These drivers actually helped me find a stable OC on my card.
     
  13. donnieyeen

    donnieyeen Member

    Messages:
    14
    Likes Received:
    4
    GPU:
    3080RTX 10GB
    What is your kit, and is it an AMD or Intel system? If you want, you can post a ZenTimings screenshot, or better yet join the ZenTimings Discord and get help there with a proper timings setup.
     
  14. janos666

    janos666 Ancient Guru

    Messages:
    1,648
    Likes Received:
    405
    GPU:
    MSI RTX3080 10Gb
    This makes me wonder why they didn't just transition to full ECC with DDR5 (or even as soon as DDR4, where "training" became so important for regular operation). My little Skylake-Pentium home server runs with ECC RAM in a semi-consumer Gigabyte motherboard (C232 workstation chipset "rebranded" as X150). It's so stupid how the cheapest Pentium CPUs offered full ECC but "Core" models don't. As far as I know, pretty much all Ryzen CPUs support ECC RAM, but only unofficially. And since it's unofficial, most motherboards lack the required traces on their PCBs.
    Contrary to what most people would think, full ECC would be a VERY handy tool for memory/controller overclocking, since you could catch the slightest errors with confidence during a stress test or normal operation, simply by looking at a reliable counter (instead of waiting for a complete crash which can be masked by the "partial ECC" methods for some time). Instead, they use a "hidden" and less reliable method.
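    As an example of such a counter: on Linux, when the platform has real ECC and an EDAC driver is loaded, the kernel exposes corrected and uncorrected error counts per memory controller under `/sys/devices/system/edac/mc/`. A minimal sketch for reading them follows; the function name is my own, and it assumes the usual `mcN/ce_count` and `mcN/ue_count` layout. On a system without ECC or without EDAC support the directory simply will not be there.

```python
from pathlib import Path


def read_edac_counts(base="/sys/devices/system/edac/mc"):
    """Return {controller: {"ce_count": int|None, "ue_count": int|None}}.

    ce_count = corrected errors, ue_count = uncorrected errors.
    Returns an empty dict when no EDAC memory controllers are exposed.
    """
    root = Path(base)
    if not root.is_dir():
        return {}
    counts = {}
    for mc in sorted(root.glob("mc[0-9]*")):
        entry = {}
        for name in ("ce_count", "ue_count"):
            try:
                entry[name] = int((mc / name).read_text().strip())
            except (OSError, ValueError):
                entry[name] = None  # counter missing or unreadable
        counts[mc.name] = entry
    return counts


if __name__ == "__main__":
    result = read_edac_counts()
    if not result:
        print("No EDAC memory controllers found")
    for controller, entry in result.items():
        print(controller, entry)
```

    Checking these before and after a stress run would show a rising corrected-error count well before anything actually crashes.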
     
  15. Goras

    Goras New Member

    Messages:
    2
    Likes Received:
    0
    GPU:
    3060ti
    Same here on RDR2 (vulkan) with no XMP nor OC

    wtf.. I will try
     

  16. Light35

    Light35 Member

    Messages:
    48
    Likes Received:
    28
    GPU:
    3070 ti
    Price.
    Because true ECC ram is much more expensive and not feasible for the consumer market.
     
  17. janos666

    janos666 Ancient Guru

    Messages:
    1,648
    Likes Received:
    405
    GPU:
    MSI RTX3080 10Gb
    Why would it be inherently much more expensive? Look at the graph in #31. The extra ECC bits are already stored in the RAM, so the "extra" capacity is already there (just in a different manner). But it would be more fault-proof if the IMC handled the ECC calculations (back and forth). And as I already hinted above, pretty much all current consumer desktop CPU IMCs can handle "real" ECC RAM; it's just artificially disabled on some models. For example, the Skylake Pentium G4400 I have in my home server officially supports ECC RAM when paired with the approved workstation chipset, and thus the extra wires are there on the motherboard PCB. That motherboard wasn't expensive at all, cheaper than many "gamer" boards with a very similar design but no ECC support. All it would really take to do "full" ECC is some extra wires on all motherboards (not just server and workstation models) and putting one extra storage chip on the RAM PCB instead of building the extra capacity into all chips.
    Alternatively, you could cheap out on the extra RAM chip and wires and still have the CPU IMC do the work, but that could "waste" a bit of raw bandwidth (depending on design choices). But this would need a brand new design instead of utilizing something that's already there anyway (it sounds easier to unify desktop and server/workstation, doesn't it?).
     
  18. Astyanax

    Astyanax Ancient Guru

    Messages:
    17,011
    Likes Received:
    7,351
    GPU:
    GTX 1080ti
    ECC requires additional traces and, for DDR5, for the first time ever, additional pins versus the non-ECC variant.
     
  19. janos666

    janos666 Ancient Guru

    Messages:
    1,648
    Likes Received:
    405
    GPU:
    MSI RTX3080 10Gb
    So do it like PCI-e slots: you can put all kinds of 1x, 4x, 8x, and 16x long cards into 16x long slots (irrelevant here, but technically you can also plug 16x long cards into 1x slots if the slot's end is left open; some boards do that).
     
  20. Astyanax

    Astyanax Ancient Guru

    Messages:
    17,011
    Likes Received:
    7,351
    GPU:
    GTX 1080ti
    Can't. Leaving unused pins in the trace and socket allows noise to affect speeds and stability.
     
