Another look at HPET High Precision Event Timer

Discussion in 'Videocards - NVIDIA GeForce Drivers Section' started by Bukkake, Sep 18, 2012.

  1. Smough

    Smough Master Guru

    Messages:
    622
    Likes Received:
    107
    GPU:
    GTX 1060 3GB
    I tested Windows 10 1909, which has the 10 MHz QPC. Latency measured with LatencyMon was as low as, if not lower than, 1803 with some tweaks I do. Very good in that regard, but the system felt "muddy": most of what I did had a small delay before responding. Fast, sure, but not as snappy as 1709 and 1803. Even with the Spectre/Meltdown mitigations disabled, the 10 MHz QPC remains; there is no way to change it. Speaking in general terms, it won't affect the average user, because not everyone notices these changes the way some of us do. Gaming felt the same, no difference that I can remember.
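
    (For reference, a quick way to check what a given build reports is the plain Win32 API. This is just a minimal sketch that prints the values; on 1809 and newer using the TSC it prints 10,000,000 for QPF.)

    Code:
        #include <stdio.h>
        #include <windows.h>

        int main(void)
        {
            LARGE_INTEGER freq, count;
            QueryPerformanceFrequency(&freq);   /* ticks per second the OS reports */
            QueryPerformanceCounter(&count);    /* current tick count */
            printf("QPF: %lld ticks/s\n", freq.QuadPart);
            printf("QPC: %lld\n", count.QuadPart);
            return 0;
        }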

    I decided to go back to 1803 and everything feels OK; it's a bit snappier. When Windows 2004 comes out I will try it. It promises a lot, but we'll have to see.

    Just stay on whatever version you have, don't roll back, and tweak it as much as possible. fr33thy's guides are OK; even though he doesn't explain certain things, some of that stuff just works. Keep drivers up to date, and use DDU if you think your GPU driver is giving you issues, then install the newest one. ISLC is not needed anymore IMO, but if you use Windows Defender and feel your games have problems, disable all the ASLR security layers. You won't get hacked or anything by disabling this, don't worry.

    Remember MSI modes (guide from mbk1969) here: https://forums.guru3d.com/threads/w...ge-signaled-based-interrupts-msi-tool.378044/

    Set CPU affinities for games and background processes where it helps (see the small sketch after this list).

    Try using Park Control, great tool: https://bitsum.com/product-update/parkcontrol-v1-0-3-0-released/

    Disable Windows spying "features": https://www.oo-software.com/en/shutup10

    Optimize the visual effects to minimize system lag due to the graphics interface.

    Google unneeded Windows 10 services and disable them slowly: check what each one is for, disable it if you don't need it, then test your system, and so on. Generally this lowers latency a bit and reduces RAM usage at idle.

    CREATE SYSTEM RESTORE POINTS BEFORE DISABLING WINDOWS SERVICES, because if Windows stops booting you will be able to recover. If not, you will have to reinstall.
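
    About the affinities item above: this is normally done from Task Manager, but it can also be done programmatically. A minimal sketch (the 0x0F mask pinning the current process to the first four logical CPUs is only an example; pick a mask for your own topology):

    Code:
        #include <stdio.h>
        #include <windows.h>

        int main(void)
        {
            /* Example only: restrict the current process to logical CPUs 0-3. */
            DWORD_PTR mask = 0x0F;
            if (!SetProcessAffinityMask(GetCurrentProcess(), mask))
                printf("SetProcessAffinityMask failed: %lu\n", GetLastError());
            else
                printf("Affinity mask set to 0x%llX\n", (unsigned long long)mask);
            return 0;
        }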
     
    Last edited: Mar 21, 2020
  2. HeavyHemi

    HeavyHemi Ancient Guru

    Messages:
    6,954
    Likes Received:
    959
    GPU:
    GTX1080Ti

    Wow...are you really that thick? I specifically said it was the FIRST POST IN THIS THREAD. The thread we are in now. Here's the link, post one: https://forums.guru3d.com/threads/another-look-at-hpet-high-precision-event-timer.368604/
    Holy cow. WTF are you even babbling at me about? Calm down and read for content. Like I said, this thread is a running gag that keeps recycling every so often, because nobody reads back more than a page, if that. Eight friggin' years of the same crap over and over. Derp.
     
  3. aufkrawall2

    aufkrawall2 Master Guru

    Messages:
    772
    Likes Received:
    107
    GPU:
    6800 reference UV
    Some components like Explorer definitely got more sluggish with 1903 vs. 1809, but that doesn't mean there is a general degradation in the performance of other applications.
     
  4. Groot

    Groot Member

    Messages:
    26
    Likes Received:
    4
    GPU:
    GTX 1080

  5. mbk1969

    mbk1969 Ancient Guru

    Messages:
    10,686
    Likes Received:
    7,919
    GPU:
    GF RTX 2070 Super
    Funny thing is that in both screenshots of assembly code the code is completely the same.
     
  6. Groot

    Groot Member

    Messages:
    26
    Likes Received:
    4
    GPU:
    GTX 1080
    :D Just exercising his right to be human, but it does show the new way.
     
    Last edited: Mar 22, 2020
  7. BetA

    BetA Ancient Guru

    Messages:
    4,351
    Likes Received:
    299
    GPU:
    G1-GTX980@1400Mhz
    Yeah, I know, he did say:

    I'm sure that if I ask him he could give me some more information on this.

    Best Regards
     
  8. mbk1969

    mbk1969 Ancient Guru

    Messages:
    10,686
    Likes Received:
    7,919
    GPU:
    GF RTX 2070 Super
    He described the difference in words, so we can trust him. I am just mildly curious (and too lazy to disassemble the code myself).
     
  9. Groot

    Groot Member

    Messages:
    26
    Likes Received:
    4
    GPU:
    GTX 1080
    Before, 1607
    Code:
    Old way, TSC divide by 1024
    
            mov     r11, [7FFE03B8H]   ; qpcbias
            rdtsc                      ; Read TSC to EDX:EAX
            shl     rdx, 32
            or      rdx, rax           ; EDX:EAX to RDX
    ;===============================
            lea     rax, [rdx+r11]     ; rax = tsc + bias
            mov     cl, [7FFE03C7H]    ; (10 for me)
            shr     rax, cl            ; divide by 1024
            mov     [QPC], rax         ; store result
    
    1903
    Code:
    New way, convert TSC to 10MHz
    
            mov     r11, [7FFE03B8H]   ; qpcbias
            rdtscp                     ; Read TSC to EDX:EAX
            shl     rdx, 32
            or      rdx, rax           ; EDX:EAX to RDX
    ;-----------------------------
            mov     rax, [r9+8H]       ; Magic Number, (10000000 * 2^64) / TSC Frequency
            mov     rcx, [r9+10H]      ; Offset (zero for me)
            mul     rdx                ; Convert TSC to 10MHz
            add     rdx, rcx           ; Apply offset (none for me)
    ;-----------------------------
            lea     rax, [rdx+r11]     ; rax = tsc + bias
            mov     cl, [7FFE03C7H]    ; (0 for me)
            shr     rax, cl            ; zero shift
            mov     [QPC], rax         ; store result
     
    I've left out some conditional code and renamed things to make the comparison simpler. There are no serializing instructions in the earlier code, but that may not matter much since the TSC resolution is being cut so heavily. A 32-bit OS would be somewhat more convoluted with the multiply.
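
    Roughly the same arithmetic in C, for anyone who prefers reading it that way (a sketch only: the bias, shift, magic and offset values come from the shared-data and HAL structures in the listings above and are just parameters here, and unsigned __int128 is the GCC/Clang stand-in for the RDX:RAX pair):

    Code:
        #include <stdint.h>

        /* Old path (pre-1809): QPC = (tsc + bias) >> shift, with shift = 10, i.e. divide by 1024. */
        static uint64_t qpc_old(uint64_t tsc, uint64_t bias, unsigned shift)
        {
            return (tsc + bias) >> shift;
        }

        /* New path (1809+): scale the TSC to 10 MHz with a 64x64->128 multiply by
           magic = (10000000 * 2^64) / tsc_frequency, keep the high 64 bits, then add
           the offset and bias; the shift is 0 on this path. */
        static uint64_t qpc_new(uint64_t tsc, uint64_t magic, uint64_t offset, uint64_t bias)
        {
            uint64_t scaled = (uint64_t)(((unsigned __int128)tsc * magic) >> 64);
            return scaled + offset + bias;
        }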

    Hope it helps.
     
    Nastya, BetA and mbk1969 like this.
  10. Smough

    Smough Master Guru

    Messages:
    622
    Likes Received:
    107
    GPU:
    GTX 1060 3GB
    So is it "better", just the same, or slower? Also, 1709 and 1803 use the old way; it's from 1809 upwards that the QPF is 10 MHz.
     
    Last edited: Mar 23, 2020

  11. janos666

    janos666 Master Guru

    Messages:
    998
    Likes Received:
    159
    GPU:
    MSI RTX3080 10Gb
    Same hardware source, probably mostly the same code, roughly 2-3 times higher effective frequency... Why would you assume it to be worse?
     
  12. Smough

    Smough Master Guru

    Messages:
    622
    Likes Received:
    107
    GPU:
    GTX 1060 3GB
    Well, a higher QPF in theory leads to more latency. Also, why was it raised in the first place? If it's for security reasons or whatever, the user should still have the right to choose, not have it shoved down their throat. Since this Spectre/Meltdown obsession started, it seems like you must accept the security even if you want to get rid of it.
     
  13. mbk1969

    mbk1969 Ancient Guru

    Messages:
    10,686
    Likes Received:
    7,919
    GPU:
    GF RTX 2070 Super

    One note: can the value of (10000000 * 2^64) even be stored in a 64-bit register? 10000000 left-shifted by 64 bits would leave only zeros, IMO.

    PS Second note: this is the code for QPC, and I was questioning the code for QPF.
     
    Last edited: Mar 23, 2020
  14. mbk1969

    mbk1969 Ancient Guru

    Messages:
    10,686
    Likes Received:
    7,919
    GPU:
    GF RTX 2070 Super
    It was for time maintenance (synchronization) reasons. There was a link here or in the standby memory fix thread.
    Anyway, if it is still the TSC then there is nothing to worry about. A little increase in QPF can't cause huge problems in existing code.
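
    As long as code derives elapsed time from the reported QPF instead of assuming a fixed tick rate, the frequency change is invisible to it. A minimal sketch of that pattern (nothing here depends on the Windows build):

    Code:
        #include <stdio.h>
        #include <windows.h>

        int main(void)
        {
            LARGE_INTEGER freq, t0, t1;
            QueryPerformanceFrequency(&freq);   /* 10 MHz, ~3.x MHz, whatever the OS reports */
            QueryPerformanceCounter(&t0);
            Sleep(100);                         /* stand-in for the work being timed */
            QueryPerformanceCounter(&t1);
            /* Dividing by the reported frequency makes the result independent of its value. */
            double ms = (double)(t1.QuadPart - t0.QuadPart) * 1000.0 / (double)freq.QuadPart;
            printf("elapsed: %.3f ms\n", ms);
            return 0;
        }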
     
  15. Groot

    Groot Member

    Messages:
    26
    Likes Received:
    4
    GPU:
    GTX 1080
    One would have to test it.

    It's not the frequency but the time taken to get a result.

    Yes and yes. 64-bit integer division uses RDX as the upper 64 bits of the dividend and RAX as the lower 64 bits. Integer-dividing 10,000,000 by the TSC frequency alone would usually give zero, so we multiply 10,000,000 by 2^64, which in this case simply means putting it in RDX. It also means we don't have to divide the result by 2^64 (a 64-bit right shift); we just take it straight from RDX.

    QPF is just a hard-coded value, with no calculation done. It could easily be something else if wanted.
    Maybe something like
    Code:
            mov     rcx,TSCF           ; Time Stamp Counter Frequency
            mov     rdx,10000000       ; The 10MHz QPF MS wants
            xor     eax,eax            ;
            div     rcx                ;
            mov     [MagicNumber],rax  ; Store the result in HalT and share for use by QPC
                                       ; note, maybe some adjustment for rounding or not?
    
                                       ; Example, if TSCF = 3.0GHz then result is 0xDA740DA740DA74
    
                                       ; If TSC reads 6,000,000,000 then QPC adjusted would be
            mov     rcx,6000000000     ; TSC
            mov     rax,[MagicNumber]  ; 0xDA740DA740DA74
            mul     rcx                ; rdx = 19,999,999
                                       ; as QPC = QPF * TSC / TSCF
                                       ; 10,000,000 * 6,000,000,000 / 3,000,000,000 = 20,000,000
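
    The same calculation in C, for anyone who wants to check the numbers (a sketch using GCC/Clang's unsigned __int128 in place of the RDX:RAX pair; the 3.0 GHz TSC frequency and the 6,000,000,000 reading are just the example values above):

    Code:
        #include <stdint.h>
        #include <stdio.h>

        int main(void)
        {
            uint64_t tscf = 3000000000ull;   /* example TSC frequency: 3.0 GHz */
            uint64_t qpf  = 10000000ull;     /* the 10 MHz QPF MS wants */

            /* magic = (qpf * 2^64) / tscf -- the 128-bit dividend has qpf in its high half */
            uint64_t magic = (uint64_t)(((unsigned __int128)qpf << 64) / tscf);

            uint64_t tsc = 6000000000ull;    /* example TSC reading (2 seconds at 3.0 GHz) */
            uint64_t qpc = (uint64_t)(((unsigned __int128)tsc * magic) >> 64);

            printf("magic = 0x%llX\n", (unsigned long long)magic);  /* 0xDA740DA740DA74 */
            printf("qpc   = %llu\n",   (unsigned long long)qpc);    /* 19999999: truncated, exact would be 20,000,000 */
            return 0;
        }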
    
     

  16. mbk1969

    mbk1969 Ancient Guru

    Messages:
    10,686
    Likes Received:
    7,919
    GPU:
    GF RTX 2070 Super
    I suspect you are talking about different latencies. You mean that, due to changes in the latest Win10 builds, QueryPerformanceCounter will take a bit longer to execute (in the TSC code path), while I don't really get what latency Smough meant, because he only mentioned QueryPerformanceFrequency, as if the new result of 10 MHz would increase latency compared to the old 3.x MHz.
     
  17. Groot

    Groot Member

    Messages:
    26
    Likes Received:
    4
    GPU:
    GTX 1080
    Some QPC results with Haswell and TSC

    Code:
               QPC Calls/s   |    Relative
                 Millions    |   Speed to W7
    W7  SP1         192      |      100%
    W10 1703        192      |      100%
    W10 1709        118      |       61%
    W10 1903         87      |       45%     
    It seems 1709 introduced serializing for the TSC, and RtlQPC runs a little quicker than QPC, at 122 million calls per second. Results may vary with different hardware, code paths and cache retention. For comparison, HPET runs at around 1.67 million calls per second over a single I/O, while the TSC can be read concurrently on each thread; better not get them out of sync, though.
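
    A crude way to get a comparable number on your own machine (single-threaded, not a rigorous benchmark; it just counts how many QPC calls fit into roughly one second):

    Code:
        #include <stdio.h>
        #include <windows.h>

        int main(void)
        {
            LARGE_INTEGER freq, start, now;
            long long calls = 0;
            QueryPerformanceFrequency(&freq);
            QueryPerformanceCounter(&start);
            do {
                QueryPerformanceCounter(&now);
                calls++;
            } while (now.QuadPart - start.QuadPart < freq.QuadPart);   /* run for ~1 second */
            printf("~%lld QPC calls/s\n", calls);
            return 0;
        }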
     
  18. Marctraider

    Marctraider Member

    Messages:
    17
    Likes Received:
    6
    GPU:
    670GTX
    Seems everyone is overthinking this, and Fr33thy doesn't seem to know what forcing the maximum 0.5 ms timer resolution does. Some of his explanations have merit, but I'm also seeing lower CPU-Z scores (around 5%) when changing to useplatformtick yes. Could also be a quirk on CPU-Z's side, though.

    Forcing the maximum timer resolution can actually slightly decrease throughput (obviously, but probably within the margin of error on high-end systems), while it increases the granularity of a lot of things that depend on the timer resolution.

    Most prominently in-game fps limiters. So lets say you have a 120hz screen, trying to run consistently on 240fps (with in-game cap), you will get much closer to that 240 fps target due to the increased precision/granularity the timer gives, resulting in less tearing and smoother framepacing. Observed this behavior in both CSGO and various Cryengine games.

    But I don't see how this actually improves FPS, even in CPU-bound games, unless they are really badly programmed and rely on the timer tick for their speed.

    He also advises using disabledynamictick yes; so far all my tests have shown slightly worse results across the board, and again lowered throughput in games/benchmarks.

    Soon I'll have a new Z390 board installed with a custom BIOS that exposes HPET; it will be interesting to see if I can reproduce his findings.
     
  19. X7007

    X7007 Ancient Guru

    Messages:
    1,577
    Likes Received:
    29
    GPU:
    Sapphire 6900XT
    Can you say what exactly you saw with disabledynamictick yes?
     
  20. Marctraider

    Marctraider Member

    Messages:
    17
    Likes Received:
    6
    GPU:
    670GTX
    I see decreased throughput in CPU-heavy benchmarking, albeit a percentage or two at best; it is always negative, never positive. And this is to be expected. Look at Linux, for instance, where you can compile the kernel with a 1000 Hz tick rate versus a lower tick rate or tickless: there is a difference between throughput and, I guess, 'deterministic' latency. It won't actually improve input latency in games, for example. As for mouse polling results, I have never seen them get better due to this tweak either. Last but not least, dynamic ticks have been in place since Windows 8, and who knows how much software and how many drivers are now written with this mode in mind. Going back to 'legacy' mode could actually cause more harm than good.

    On a completely different note: some have argued it is better to leave HPET ON in the BIOS and disable it in Windows. It looks like this is ONLY the case when you use 'useplatformtick yes'. If you're not using that option, you won't see a regression from keeping HPET off in the BIOS.

    But why use 'useplatformtick yes' anyway? It causes stuttering in certain conditions, and I've seen consistently lower single-threaded CPU-Z scores with 'useplatformtick yes' at a 0.5/1.0 ms Windows timer resolution. Not to mention another bug that makes windows move jerkily while music is playing: it looks like the OS starts to dynamically adjust the timer resolution on the fly with this option enabled, instead of lowering it to a fixed value while an audio stream is playing.

    I really can't recommend doing this just because a tool shows 0.500/1.000 instead of 0.496/0.997.
    It is a debug command for a reason.
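
    If you want to see those numbers without a third-party tool, ntdll exports NtQueryTimerResolution (not declared in the SDK headers; values are in 100 ns units). A minimal sketch that loads it dynamically and prints the resolutions:

    Code:
        #include <stdio.h>
        #include <windows.h>

        /* The NT "maximum resolution" is the finest interval (e.g. 5000 = 0.5 ms),
           the "minimum resolution" the coarsest (e.g. 156250 = 15.625 ms). */
        typedef LONG (WINAPI *NtQueryTimerResolution_t)(PULONG min, PULONG max, PULONG cur);

        int main(void)
        {
            NtQueryTimerResolution_t query = (NtQueryTimerResolution_t)
                GetProcAddress(GetModuleHandleA("ntdll.dll"), "NtQueryTimerResolution");
            ULONG min = 0, max = 0, cur = 0;
            if (query && query(&min, &max, &cur) == 0)
                printf("timer resolution (ms): coarsest %.4f, finest %.4f, current %.4f\n",
                       min / 10000.0, max / 10000.0, cur / 10000.0);
            return 0;
        }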
     
    Last edited: Apr 25, 2020
    enkoo1, aufkrawall2 and artina90 like this.
