Fix game stutter on Win 10 1703-1809

Discussion in 'Videocards - NVIDIA GeForce Drivers Section' started by Mott, Apr 3, 2018.

  1. Astyanax

    Astyanax Ancient Guru

    GPU:
    GTX 1080ti
    Microsoft said they fixed it:

    • Addresses an issue that ignores the MM_DONT_ZERO_ALLOCATION flag. This issue leads to degraded performance, and, occasionally, error 0x139 appears.
    The standby list uses this flag; regions of memory in use by the standby list are available for immediate overwriting with new process data.

    The issue was mitigated to a high degree by this fix; what remains is that, for affected users, the standby cacher is extremely aggressive, resulting in janky memory mapping.
  2. mbk1969

    mbk1969 Ancient Guru

    GPU:
    GF RTX 2070 Super
    I am a bit skeptical about this statement. When code allocates memory, it can use the MM_DONT_ZERO_ALLOCATION flag to prevent the memory manager from zeroing the allocated chunk. But how that flag could be used by the memory manager for the standby list is beyond me. When memory is claimed from the standby list, it should be zeroed and placed on the zeroed list; even if the chunk was originally allocated with the MM_DONT_ZERO_ALLOCATION flag, that does not mean the new allocation specifies the flag again.
    I am sure the MM_DONT_ZERO_ALLOCATION flag influences how memory is allocated, but the application code has to specify it explicitly.
    I am not sure the memory manager needs that flag when it moves chunks to the standby list.
    If you have evidence, then post it.

    Update: And even more
    https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/wdm/nf-wdm-mmallocatepagesformdlex
    - so this flag is about nonpaged physical memory in kernel space (mostly for device drivers), while the standby list is about paged memory for processes
    https://docs.microsoft.com/en-us/ar...f-windows-memory-management-revealed-part-two

    Last edited: Aug 20, 2020
  3. Astyanax

    Astyanax Ancient Guru

    GPU:
    GTX 1080ti
    No, the allocated memory is dereferenced without being overwritten; having to zero-fill a chunk of memory has CPU overhead.

    This is not entirely correct: the standby list is an extension of the file system, operates in kernel memory and has nothing to do with process memory.

    It accelerates the early read-in of executables, DLLs and various other files traced by the ReadyBoot ETL and layout.ini.

    Last edited: Aug 20, 2020
  4. mbk1969

    mbk1969 Ancient Guru

    GPU:
    GF RTX 2070 Super
    @Astyanax

    Did you read the words "working set" in your quote?
    https://docs.microsoft.com/en-us/windows/win32/memory/working-set

    From "Windows Internals":
    Page frame number database
    Several previous sections concentrated on the virtual view of a Windows process—page tables, PTEs, and VADs. The remainder of this chapter will explain how Windows manages physical memory, starting with how Windows keeps track of physical memory. Whereas working sets describe the resident pages owned by a process or the system, the PFN database describes the state of each page in physical memory. The page states are listed in Table 5-19.
    TABLE 5-19 Physical page states
    ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    • Active (also called valid)
    The page is part of a working set (either a process working set, a session working set, or a system working set), or it’s not in any working set (for example, a non-paged kernel page) and a valid PTE usually points to it.
    • Transition
    This is a temporary state for a page that isn’t owned by a working set and isn’t on any paging list. A page is in this state when an I/O to the page is in progress. The PTE is encoded so that collided page faults can be recognized and handled properly. (This use of the term transition differs from the use of the word in the section on invalid PTEs. An invalid transition PTE refers to a page on the standby or modified list.)
    • Standby
    The page previously belonged to a working set but was removed or was prefetched/clustered directly into the standby list. The page wasn’t modified since it was last written to disk. The PTE still refers to the physical page but it is marked invalid and in transition.
    • Modified
    The page previously belonged to a working set but was removed. However, the page was modified while it was in use and its current contents haven’t yet been written to disk or remote storage. The PTE still refers to the physical page but is marked invalid and in transition. It must be written to the backing store before the physical page can be reused.
    • Modified no-write
    This is the same as a modified page except that the page has been marked so that the memory manager’s modified page writer won’t write it to disk. The cache manager marks pages as modified no-write at the request of file system drivers. For example, NTFS uses this state for pages containing file system metadata so that it can first ensure that transaction log entries are flushed to disk before the pages they are protecting are written to disk. (NTFS transaction logging is explained in Chapter 13, “File systems,” in Part 2.)
    • Free
    The page is free but has unspecified dirty data in it. For security reasons, these pages can’t be given as a user page to a user process without being initialized with zeroes, but they can be overwritten with new data (for example, from a file) before being given to a user process.
    • Zeroed
    The page is free and has been initialized with zeroes by the zero page thread or was determined to already contain zeroes.
    • Rom
    The page represents read-only memory.
    • Bad
    The page has generated parity or other hardware errors and can’t be used (or used as part of an enclave).
    ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
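A minimal Python sketch of the Free and Zeroed rows above, for illustration only: `PageState` simply names the states in Table 5-19, and `can_give_to_user` is a hypothetical helper (not a Windows API) that encodes just the quoted security rule, namely that a free page must be zeroed or fully overwritten (for example, with file data) before a user process sees it.

```python
from enum import Enum, auto


class PageState(Enum):
    """Physical page states from Table 5-19 (names only, no kernel semantics)."""
    ACTIVE = auto()
    TRANSITION = auto()
    STANDBY = auto()
    MODIFIED = auto()
    MODIFIED_NO_WRITE = auto()
    FREE = auto()
    ZEROED = auto()
    ROM = auto()
    BAD = auto()


def can_give_to_user(state: PageState, overwritten_with_new_data: bool) -> bool:
    """Hypothetical check: may this page be handed to a user process?

    Zeroed pages are safe as-is. Free pages hold unspecified dirty data, so
    they are safe only if they will be completely overwritten first (e.g. by
    a read from a file). All other states are out of scope for this sketch
    and are reported as not directly usable.
    """
    if state is PageState.ZEROED:
        return True
    if state is PageState.FREE:
        return overwritten_with_new_data
    return False
```

For example, `can_give_to_user(PageState.FREE, False)` is `False`, which is the case the zero page thread exists to avoid.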
    ...
    Of the page states listed in Table 5-19, six are organized into linked lists so that the memory manager can quickly locate pages of a specific type. (Active/valid pages, transition pages, and overloaded “bad” pages aren’t in any system-wide page list.) Additionally, the standby state is associated with eight different lists ordered by priority.
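The eight prioritized standby lists can be pictured as an array of queues. This is a toy Python model, not the PFN database; it assumes (based on Windows Internals, not on anything in this thread) priorities 0 to 7, and that repurposing takes a page from the lowest-priority non-empty list first, oldest page first. `StandbyLists` and its methods are hypothetical names.

```python
from collections import deque
from typing import Optional

NUM_STANDBY_PRIORITIES = 8  # assumed: 0 (lowest) .. 7 (highest)


class StandbyLists:
    """Toy model: one FIFO queue of page frame numbers per priority."""

    def __init__(self) -> None:
        self.lists = [deque() for _ in range(NUM_STANDBY_PRIORITIES)]

    def add(self, pfn: int, priority: int) -> None:
        """Put a clean page frame on the standby list for its priority."""
        if not 0 <= priority < NUM_STANDBY_PRIORITIES:
            raise ValueError("priority must be in 0..7")
        self.lists[priority].append(pfn)

    def repurpose(self) -> Optional[int]:
        """Take the oldest page from the lowest-priority non-empty list."""
        for queue in self.lists:
            if queue:
                return queue.popleft()
        return None  # every standby list is empty

    def total(self) -> int:
        return sum(len(queue) for queue in self.lists)
```

With pages 100 (priority 5), 200 and 300 (both priority 1) on standby, repeated `repurpose()` calls return 200, 300, then 100, so low-priority cached data is sacrificed before high-priority data.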

    Page list dynamics
    Figure 5-37 shows a state diagram for page frame transitions. For simplicity, the modified-no-write, bad and ROM lists aren’t shown.
    [Figure 5-37: state diagram for page frame transitions; image not reproduced]

    Page frames move between the paging lists in the following ways:
    ■ When the memory manager needs a zero-initialized page to service a demand-zero page fault (a reference to a page that is defined to be all zeroes or to a user-mode committed private page that has never been accessed), it first attempts to get one from the zero page list. If the list is empty, it gets one from the free page list and zeroes the page. If the free list is empty, it goes to the standby list and zeroes that page.
    One reason zero-initialized pages are needed is to meet security requirements such as the Common Criteria (CC). Most CC profiles specify that user-mode processes be given initialized page frames to prevent them from reading a previous process’s memory contents. Thus, the memory manager gives user-mode processes zeroed page frames unless the page is being read in from a backing store. In that case, the memory manager prefers to use non-zeroed page frames, initializing them with the data off the disk or remote storage. The zero page list is populated from the free list by the zero page thread system thread (thread 0 in the System process). The zero page thread waits on a gate object to signal it to go to work. When the free list has eight or more pages, this gate is signaled. However, the zero page thread will run only if at least one processor has no other threads running, because the zero page thread runs at priority 0 and the lowest priority that a user thread can be set to is 1.
    ■ When the memory manager doesn’t require a zero-initialized page, it goes first to the free list. If that’s empty, it goes to the zeroed list. If the zeroed list is empty, it goes to the standby lists. Before the memory manager can use a page frame from the standby lists, it must first backtrack and remove the reference from the invalid PTE (or prototype PTE) that still points to the page frame. Because entries in the PFN database contain pointers back to the previous user’s page table page (or to a page of prototype PTE pool for shared pages), the memory manager can quickly find the PTE and make the appropriate change.
    ■ When a process must give up a page out of its working set either because it referenced a new page and its working set was full or the memory manager trimmed its working set, the page goes to the standby lists if the page was clean (not modified) or to the modified list if the page was modified while it was resident.
    ■ When a process exits, all the private pages go to the free list. Also, when the last reference to a page-file-backed section is closed, and the section has no remaining mapped views, these pages also go to the free list.
    ...
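The four transitions above can be condensed into a toy allocator. This Python sketch follows only the quoted order of lookups; it leaves out the modified-no-write, bad and ROM lists, the eight standby priorities, PTE fix-ups, the modified page writer and the zero page thread. Pages are modeled as tiny bytearrays, and every name is hypothetical.

```python
from collections import deque

PAGE_SIZE = 16  # tiny pages keep the example readable; real pages are 4 KiB


class PageLists:
    """Toy model of the zeroed, free, standby and modified lists."""

    def __init__(self, n_pages: int) -> None:
        self.zeroed = deque(bytearray(PAGE_SIZE) for _ in range(n_pages))
        self.free = deque()
        self.standby = deque()
        self.modified = deque()

    @staticmethod
    def _zero(page: bytearray) -> bytearray:
        page[:] = bytes(PAGE_SIZE)  # wipe the previous owner's data
        return page

    def alloc_zeroed(self) -> bytearray:
        """Demand-zero fault: zeroed list, then free (zero it), then standby (zero it)."""
        if self.zeroed:
            return self.zeroed.popleft()
        if self.free:
            return self._zero(self.free.popleft())
        if self.standby:
            return self._zero(self.standby.popleft())
        raise MemoryError("no pages available")

    def alloc_for_overwrite(self, data: bytes) -> bytearray:
        """No zeroing needed (e.g. read from disk): free, then zeroed, then standby.

        The whole page is overwritten (data padded with zeros or truncated),
        so no stale bytes survive.
        """
        for source in (self.free, self.zeroed, self.standby):
            if source:
                page = source.popleft()
                page[:] = data.ljust(PAGE_SIZE, b"\0")[:PAGE_SIZE]
                return page
        raise MemoryError("no pages available")

    def trim(self, page: bytearray, dirty: bool) -> None:
        """Working-set trim: clean pages go to standby, dirty ones to modified."""
        (self.modified if dirty else self.standby).append(page)

    def release_private(self, page: bytearray) -> None:
        """Process exit: private pages go to the free list."""
        self.free.append(page)
```

For example, a clean page trimmed from one working set lands on the standby list, and a later demand-zero fault that finds the zeroed and free lists empty takes that page and zeroes it before handing it out.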

    and
    https://forums.guru3d.com/threads/a-bit-detailed-info-about-superfetch-in-windows.419263/
     

  5. mbk1969

    mbk1969 Ancient Guru

    GPU:
    GF RTX 2070 Super
    @Astyanax

    And about cache manager:
    Chapter 11. Cache Manager
    The cache manager is a set of kernel-mode functions and system threads that cooperate with the memory manager to provide data caching for all Windows file system drivers (both local and network). In this chapter, we’ll explain how the cache manager, including its key internal data structures and functions, works; how it is sized at system initialization time; how it interacts with other elements of the operating system; and how you can observe its activity through performance counters. We’ll also describe the five flags on the Windows CreateFile function that affect file caching.

    Key Features of the Cache Manager
    The cache manager has several key features:
    • Supports all file system types (both local and network), thus removing the need for each file system to implement its own cache management code
    • Uses the memory manager to control which parts of which files are in physical memory (trading off demands for physical memory between user processes and the operating system)
    • Caches data on a virtual block basis (offsets within a file)—in contrast to many caching systems, which cache on a logical block basis (offsets within a disk volume)—allowing for intelligent read-ahead and high-speed access to the cache without involving file system drivers (This method of caching, called fast I/O, is described later in this chapter.)
    • Supports “hints” passed by applications at file open time (such as random versus sequential access, temporary file creation, and so on)
    • Supports recoverable file systems (for example, those that use transaction logging) to recover data after a system failure
    Although we’ll talk more throughout this chapter about how these features are used in the cache manager, in this section we’ll introduce you to the concepts behind these features.

    Single, Centralized System Cache
    Some operating systems rely on each individual file system to cache data, a practice that results either in duplicated caching and memory management code in the operating system or in limitations on the kinds of data that can be cached. In contrast, Windows offers a centralized caching facility that caches all externally stored data, whether on local hard disks, floppy disks, network file servers, or CD-ROMs. Any data can be cached, whether it’s user data streams (the contents of a file and the ongoing read and write activity to that file) or file system metadata (such as directory and file headers). As you’ll discover in this chapter, the method Windows uses to access the cache depends on the type of data being cached.

    The Memory Manager
    One unusual aspect of the cache manager is that it never knows how much cached data is actually in physical memory. This statement might sound strange because the purpose of a cache is to keep a subset of frequently accessed data in physical memory as a way to improve I/O performance. The reason the cache manager doesn’t know how much data is in physical memory is that it accesses data by mapping views of files into system virtual address spaces, using standard section objects (file mapping objects in Windows API terminology). (Section objects are the basic primitive of the memory manager and are explained in detail in Chapter 10.) As addresses in these mapped views are accessed, the memory manager pages in blocks that aren’t in physical memory. And when memory demands dictate, the memory manager unmaps these pages out of the cache and, if the data has changed, pages the data back to the files.

    By caching on the basis of a virtual address space using mapped files, the cache manager avoids generating read or write I/O request packets (IRPs) to access the data for files it’s caching. Instead, it simply copies data to or from the virtual addresses where the portion of the cached file is mapped and relies on the memory manager to fault in (or out) the data into (or out of) memory as needed. This process allows the memory manager to make global trade-offs on how much memory to give to the system cache versus how much to give to user processes. (The cache manager also initiates I/O, such as lazy writing, which is described later in this chapter; however, it calls the memory manager to write the pages.) Also, as you’ll learn in the next section, this design makes it possible for processes that open cached files to see the same data as do processes that are mapping the same files into their user address spaces.
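A user-mode analogue of "accessing data by mapping views of files and letting the memory manager page it in" is an ordinary memory-mapped file. The Python sketch below uses the standard `mmap` module; it only illustrates the mechanism the quote describes and is not the cache manager itself. `read_via_mapping` is a hypothetical helper, and the file is a throwaway temp file.

```python
import mmap
import os
import tempfile


def read_via_mapping(path: str, offset: int, length: int) -> bytes:
    """Copy bytes out of a mapped view instead of calling read() on the file.

    Touching the mapped addresses causes page faults, and the OS memory
    manager brings the needed file pages into physical memory.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            return view[offset:offset + length]


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"header" + b"\0" * 4096 + b"payload")
        print(read_via_mapping(path, 4102, 7))  # b'payload'
    finally:
        os.remove(path)
```

The caller never issues an explicit read at the requested offset; the bytes are simply copied from the mapped addresses, which is the same division of labor the quote describes between the cache manager and the memory manager.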
     
  6. AveYo

    AveYo Member

    GPU:
    8800GS 384MB
    You're overthinking it; the flag description really has the gist of the bug:
    Games (and programs such as 7z) were doing the right thing, setting the flag when allocating their internal cache to gain I/O performance (or, more often, asking a driver to do it for them).
    Nobody realized for 2+ years that under low memory, Windows was mapping such allocated pages to the standby list while ignoring the flag, effectively disobeying the one mandatory condition: never expose such pages to user-mode programs unless they have been overwritten beforehand.
    The end result was that the pages were deemed tainted and the game had to re-allocate instead of simply overwriting the already allocated area, defeating the purpose of setting the flag in the first place.

    But you can't talk about the standby memory bug without mentioning the massive failure to automatically clear the dirty standby cache of gigabytes of useless stuff, such as previously viewed movies, pictures, documents, updates, network files and telemetry data, mostly coming from the OS's built-in UWP apps and system processes. These went hand in hand.

    Nowadays there is no point in running the most affected versions, 1703 and 1709, over 1607 or 1803+,
    though NVIDIA said it patched the issue driver-side for any version (not available for the 500 series and lower),
    and recent OS and Store updates no longer have standby cache quirks. 20H2 looks very promising atm, even on potato PCs, if you manage to get through setup :)
  7. Smough

    Smough Master Guru

    GPU:
    GTX 1060 3GB
    I tried 1607, but imo the best version of Windows I tried was 1803, no contest. I decided to install 1809 because I was having stutter issues in some games on 1803, but looking back, maybe those games were just unoptimized AF, since all the others ran perfectly, and I noticed more issues with those very same games on 1809. Odd stuff. For example, The Division 2 has massive stutter in fullscreen but not in borderless, and SW Battlefront 2 also seems to run a bit worse, with a few random stutters here and there; very rare ones, but they do happen. 1803 did not suffer from these strange issues, and I am sure it's 1809: I reinstalled it a few times just to see if that would fix it, but it does not. That means at some point I will have to go back to 1803. I guess I will make a partition for it or just put it on my secondary SSD and try it.
  8. mbk1969

    mbk1969 Ancient Guru

    GPU:
    GF RTX 2070 Super
    No. It is you who is overthinking this. The MM_DONT_ZERO_ALLOCATION flag is for functions meant for device drivers (for kernel code overall), so an app will not use it. Read the MSDN doc carefully. This flag is used in functions from the DDK (Driver Development Kit).
    Also, zeroing memory during allocation has nothing to do with the standby list. Try to connect these two stages of a page's life for me, if you see how.

    Last edited: Aug 30, 2020
  9. AveYo

    AveYo Member

    GPU:
    8800GS 384MB
    I did mention games asking the driver to do it for them. Some games requiring admin rights do "pesky" stuff themselves to squeeze out performance and/or fight piracy/cheats. In this very thread we use kernel functions via P/Invoke running as SYSTEM; it's actually the rule, not the exception, if you want to get things done fast in Windows instead of relying on the over-zealous, "safe", slow APIs.
    Let's consider game assets, like maps or large textures. When uncompressed, these are usually padded to a fixed size to facilitate mapping them on the fly to a RAM/disk cache. There is no point in zeroing that cache memory beforehand if it's gonna be overwritten afterwards.
    All is fine and dandy until you get low on memory and the Windows system starts marking such unused or rarely used pages as standby, without noticing the flag screaming "hey, I'm private dirty memory, please clear me before you set me free". The Windows system can do that because:
    [image not reproduced]
    And that's the connection between these two stages of page life!

    Last edited: Aug 29, 2020
  10. mbk1969

    mbk1969 Ancient Guru

    GPU:
    GF RTX 2070 Super
    How exactly?

    No, my friend. Those functions are Win API functions (available in Win SDK - Software Development Kit). They are not kernel functions.

    Which code loads assets: game code (user space) or driver code (kernel code)? I think it is game code that loads assets from files and then passes them to the video card drivers.

    You understand (of course) that the Windows SDK is installed on my rig(s), don't you? But I have never developed drivers (for modern Windows), so the DDK is not installed here.
    This is the search result for "MM_DONT_ZERO_ALLOCATION" among .h, .c and .cpp files in the two folders "C:\Program Files" and "C:\Program Files (x86)":
    [image: search results, not reproduced]

    And these are the flags for the VirtualAlloc functions in the SDK:
    [image: VirtualAlloc flags, not reproduced]

    Last edited: Aug 30, 2020

  11. mbk1969

    mbk1969 Ancient Guru

    GPU:
    GF RTX 2070 Super
    1. I found only one function that uses the "MM_DONT_ZERO_ALLOCATION" flag - "MmAllocatePagesForMdlEx":
    2. From Overview of Windows Memory Space for drivers we read:
    So if the function MmAllocatePagesForMdlEx allocates only nonpaged memory, this memory will never be placed on any of the lists (free, standby, zeroed) maintained by the memory manager for pageable memory.

    PS And in Memory Management for Windows Drivers they write:
  12. AveYo

    AveYo Member

    GPU:
    8800GS 384MB
    The biggest rift in W10 hardware compatibility comes with version 1703, and the weirdest comes with version 1709.
    OEMs and the corporate segment did not have enough time to pressure Microsoft into fixing their crap,
    as they had to plan an upgrade anyway in light of the Meltdown and Spectre discoveries (thanks to... the 1703 release).
    Intel stole the horror show, so Microsoft got away with it, and the underlying issue lingered for 18 months.

    - standby memory was visibly not cleared of junk while under low physical memory
    - some programs kept increasing their allocation over time until crashing (for example, DOTA 2)
    - some UWP apps never released huge media files previously opened, but at least did not leak further
    - standby-clearing solutions proved effective and got popular
    - Microsoft and NVIDIA could not add 1 + 1 and kept ignoring "gamers"; Valve implemented engine-side workarounds
    - and then 7-Zip exposed large-page addressing being compromised, pointing in the right direction to get it fixed
    - going with bigger pages than the default 4K and other AWE stuff for RAM caches is something that game APIs do often
    - there were other reports, such as compiler failures, that after inspection revealed zeroed data in the output
    - there was malware in the wild grabbing private memory by simply doing repeated allocations

    It's so very Microsoft that after they patched it, they cherry-picked the most obscure and "benign" thing to publish,
    when this crap had even worse security and data-reliability implications than the CPU vulnerabilities.
    And since it took them so long, there will never be a driver update for all the hardware stuck on the Anniversary Update.
    It's not worth fixating on a certain flag on a certain API that may or may not be the sole culprit, even if Microsoft says so. Unfortunately, we don't have a PoC, and I'm not familiar with DX programming.
    Still, their explanation aligns perfectly with what was observed; the devil is not even hidden in the details.
    The system itself trashed security boundaries behind drivers' and applications' backs, causing painful reallocation and memory leaks.
  13. mbk1969

    mbk1969 Ancient Guru

    GPU:
    GF RTX 2070 Super
    @AveYo

    We are just trying to separate speculation from what is actually possible. The "MM_DONT_ZERO_ALLOCATION" flag bug fix is big enough without trying to attach it to the standby list problems. MS could have silently fixed something in the memory manager that affected user processes' memory dynamics, standby lists included.
    We are programmers, and we should seek knowledge and understanding of the technologies (to show off, at least).
  14. AveYo

    AveYo Member

    GPU:
    8800GS 384MB
    I don't think you can casually achieve multiple virtual allocations pointing to the same physical page, overwrite access rights and overwrite mapped data by only using canonical functions like VirtualAlloc & co. to manipulate memory, all while flagging the operation correctly. But the bastard standby zeroing daemon did so with impunity for 2 years, according to Microsoft. People who have definitive information are probably under embargo, so for now (or forever) that's all we've got.
    What they did not mention is that they addressed the large-pages issue at the same time. Poorly, since it's still there in 1803 and 1809 for many systems, with only 1903+ being symptom-free. LTS* support also means not forcing OEMs to redo drivers, so there are no fundamental back-ports on the horizon.

    I guess doing some traces with older and current NVIDIA drivers on 1803 RTM and a game demo would shed some programming light on how they've fixed the stutters driver-side and which APIs were involved. But how come AMD drivers have been more resilient? They don't skip safety checks, probably.
    I agree there is value in busting this, but I can't do much more than "speculate", as I lack the hardware. COVID forced me onto an i3-2120, GTX 560, 4GB DDR3, 1TB 5400rpm HDD, running 19042.487. This cat is on its 9th life and might go out with a bang at any time to celebrate the best year ever :rolleyes:
    I'm happy that 20H2 appears robust and snappy atm: Firefox is smooth, and DOTA, CS:GO and RE2 are still playable. But I do miss my ROG laptop; may its new owner enjoy using it for a long time.
  15. mbk1969

    mbk1969 Ancient Guru

    GPU:
    GF RTX 2070 Super
    Do not forget that user-space VirtualAlloc functions are one thing, while the MS zeroing daemon, which is part of the OS memory manager itself and works in kernel mode, is a completely different thing.

    Also, we should not forget that not all apps use Win API memory functions directly. Many apps use their language's runtime frameworks/environments, like the C and C++ (MSVC) runtimes. This helps with portability.

  16. Smough

    Smough Master Guru

    GPU:
    GTX 1060 3GB
    So, closing up. 1703 and 1709 would be the "worst" Windows versions because they are doomed with standby memory issues?
     
  17. mbk1969

    mbk1969 Ancient Guru

    GPU:
    GF RTX 2070 Super
    Not on all rigs.
     
  18. Astyanax

    Astyanax Ancient Guru

    GPU:
    GTX 1080ti
    You can definitely outgrow the potential for issues by having enough memory for the standby list to populate completely while still leaving free memory for new allocations.
  19. S3r1ous

    S3r1ous Member Guru

    GPU:
    Palit 1070Ti JS
    I've stuck with 1809 LTSC for a long time, but recently installed 1909
    (waiting until it passes 1k in build numbers/updates seems a safe bet for stability),
    and it's been really good; I would say it even feels better to use than any build before it.
    I still get stutters from time to time, but that's alleviated with carefully selected customization/tuning.
    I started using DXVK too, whenever possible, to make those DX9/11 games with issues more stable.
  20. Smough

    Smough Master Guru

    GPU:
    GTX 1060 3GB
    What sort of tuning?
     
