How does capping framerate reduce input lag?

Discussion in 'Rivatuner Statistics Server (RTSS) Forum' started by BlindBison, Oct 2, 2018.

  1. BlindBison

    BlindBison Ancient Guru

    Messages:
    2,414
    Likes Received:
    1,139
    GPU:
    RTX 3070
    Hi there guys,

    I read here: https://forums.guru3d.com/threads/rtss-vs-in-engine-framerate-capping.415494/

    that frame limiters can reduce input lag by 1 or 2 frames (1 if it's Rivatuner -- 2 if it's an in-game frame limiter), meaning that if you hit 70 fps uncapped, your input lag will be greater than if you hit 70 fps capped.

    This latency reduction also goes a long way towards explaining the low-lag traditional vsync setups described on Blur Busters, which involve capping your framerate to keep the buffers empty (as I understand it).

    My question is in the title -- why, or rather how, does capping your framerate reduce latency compared to running uncapped at the same frames-per-second value?

    Is it because the game can measure input closer to the delivery of the frame since it knows exactly when to expect it?

    I don't really get how something like this is possible and I'd really like to know how it works. Thank you very much, I really appreciate it.
     
  2. BlindBison

    BlindBison Ancient Guru

    Messages:
    2,414
    Likes Received:
    1,139
    GPU:
    RTX 3070
    @RealNC

    Tagging you in case you have a chance to take a look -- I'm basing this question largely on your answers from the other thread linked in the OP. Thanks for your time.
     
  3. RealNC

    RealNC Ancient Guru

    Messages:
    4,954
    Likes Received:
    3,233
    GPU:
    4070 Ti Super
    I don't know 100% for sure, but I have a pretty good guess: when the GPU is completely saturated, the CPU can run ahead of the GPU. The game will start preparing new frames faster than the GPU can render them. This means that the CPU will need to wait at some point until the GPU is ready to render the next frame. (It's a bit like vsync backpressure, but not as bad.) The higher "max pre-rendered frames" is set to, the worse the lag gets.

    A frame limiter prevents that. It keeps the CPU from preparing more frames in advance. The lowest input lag is achieved when the CPU is not outpacing the GPU.
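
    To make that concrete, here's a minimal sketch of a CPU-side limiter (purely illustrative, not RTSS's actual code): the game thread is parked until the next frame slot, so it physically can't get ahead of the GPU and queue up stale frames.

    ```cpp
    // Illustrative only -- not RTSS source. A CPU-side frame limiter: block
    // the game thread until the next frame slot, so it can't run ahead of the GPU.
    #include <chrono>
    #include <thread>

    void limit_frame(double target_fps)
    {
        using clock = std::chrono::steady_clock;
        static clock::time_point next_slot = clock::now();

        const auto frame_time = std::chrono::duration_cast<clock::duration>(
            std::chrono::duration<double>(1.0 / target_fps));

        next_slot += frame_time;
        // While the game thread sleeps here it cannot prepare another frame,
        // so no pre-rendered frames pile up in front of a saturated GPU.
        std::this_thread::sleep_until(next_slot);

        // If we fell behind the cap (e.g. a GPU-bound dip), resync rather
        // than bursting out early frames to "catch up".
        if (clock::now() > next_slot)
            next_slot = clock::now();
    }
    ```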
     
    BlindBison likes this.
  4. BlindBison

    BlindBison Ancient Guru

    Messages:
    2,414
    Likes Received:
    1,139
    GPU:
    RTX 3070
    @RealNC

    Thanks a lot, that's helpful.

    One thing that's confusing me is the "Max Prerendered Frames" setting (Nvidia control panel).

    Intuitively I would think setting this to 1 (the minimum available) would have the same effect as limiting your framerate, or capping just under refresh (59.9 on a 60 Hz panel) for the Low Lag Vsync described on Blur Busters, since it should force the CPU to stop preparing extra frames.

    What does capping your framerate achieve that reducing the max pre-rendered frames setting does not? If MPRF is set to 1, what is frame-limiting doing to further reduce latency? (I mean, how can it reduce the queue below the single frame that is already being enforced?)

    For example:

    1) Framelimit just beneath refresh rate (59.92 on 59.935 Hz panel) w/ Rivatuner
    2) Set max prerendered frames to 1
    3) Use double buffered, not triple buffered Vsync to reduce buffered frames/input lag

    This seems to be the proper approach for lowering input latency, but I have no idea why it works. Going purely off what I've read so far about the issue being the CPU preparing extra frames, I would think that setting MPRF to 1 would do all of the above by itself, but that simply doesn't seem to be the case (I've tested this myself, and each step does seem to noticeably help).
     
    Last edited: Oct 3, 2018

  5. CaptaPraelium

    CaptaPraelium Guest

    Messages:
    229
    Likes Received:
    63
    GPU:
    1070
    Don't forget what was said regarding OW/CSGO and co, where lower 'pressure' on the GPU nullified these effects. Related is the need for the 150% in-game resolution scaling to obtain these results.
    Everything RealNC has said so far is correct, but given the audience at these forums I think it's important to emphasise one word: "can". As in, "...can reduce input lag" doesn't always mean it "will".
    To be clear, this isn't a correction; I'm just trying to draw attention to part of what you're saying here so it isn't misunderstood.
     
  6. RealNC

    RealNC Ancient Guru

    Messages:
    4,954
    Likes Received:
    3,233
    GPU:
    4070 Ti Super
    With a frame limiter, it's as if you were using MPRF 0. Meaning the CPU will not even try to prepare a single frame in advance.
     
  7. RealNC

    RealNC Ancient Guru

    Messages:
    4,954
    Likes Received:
    3,233
    GPU:
    4070 Ti Super
    That is true. In games like CS:GO and OW, this isn't much of a problem. In other games though, it's a bigger problem. Games like Witcher 3, the Tomb Raider games, Skyrim SE, stuff like that, will generally max out your GPU, especially at 1440p and 4K. This is where it can help a lot to tune the graphics settings of the game in order to reach a target FPS cap most of the time.
     
  8. CaptaPraelium

    CaptaPraelium Guest

    Messages:
    229
    Likes Received:
    63
    GPU:
    1070
    Right. So it's important for people to realise that if they are trying to reduce latency on some GPU-heavy game on ultra settings at high res, then there is benefit to this, but perhaps in other cases such as playing FPS games with settings reduced to low for higher framerates, not so much. It's very much a case-by-case situation.

    The CPU always has to prepare the frame in advance of the GPU.
    The answer to the OP's question is that MPRF 1 limits the CPU's framerate to whatever the GPU can maintain, but that doesn't always equate to a framerate limit below the refresh rate, nor does it mean vsync will have the effect of limiting framerates. Your post linked from the Blur Busters article he referred to explains this: https://forums.guru3d.com/threads/the-truth-about-pre-rendering-0.365860/page-12#post-5380262 ...Edit: Well, that's an educated guess, since we don't have any actual data to work with here.
     
    Last edited: Oct 3, 2018
  9. RealNC

    RealNC Ancient Guru

    Messages:
    4,954
    Likes Received:
    3,233
    GPU:
    4070 Ti Super
    When not using a CPU-based frame limiter, that's true. With RTSS at least, it isn't. It will just block the game, and thus it won't be able to prepare a frame in advance. It's as if you could set MPRF to 0. Like fully synchronous rendering. Kind of.

    Normally, because we're using at least two CPU cores these days, the CPU and GPU are asynchronous. The game can prepare a frame on one thread, the GPU driver can do stuff on another, and stuff happens in parallel. Which is fine. Unless the GPU is fully maxed out, in which case the CPU doing stuff in advance becomes a bad thing, because the work it did (like reading the player's mouse input) will have gone stale by the time the GPU gets to it. So that 1 pre-rendered frame will then actually have an impact on input lag. With a frame limiter this won't happen, since there isn't even one pre-rendered frame prepared in advance. It will only be prepared when the GPU is actually ready to render it immediately afterwards.

    You would think that this would impact performance in a negative way. But remember that if the FPS cap is reached, you're there already, you don't need more FPS. If the FPS cap is not reached, then the frame limiter won't activate anyway, and you have normal asynchronous frame preparation as usual. Magic :p
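
    As a rough picture of where a present-time limiter blocks (hypothetical loop and made-up function names, not RTSS internals), assuming the limiter sketch from earlier in the thread:

    ```cpp
    // Illustrative game loop only -- function names are invented for the example.
    #include <chrono>
    #include <thread>

    static void read_input()    { /* sample mouse/keyboard */ }
    static void prepare_frame() { /* simulation + draw call recording (the CPU's "pre-render" work) */ }
    static void present_frame() { /* hand the finished frame to the driver/GPU */ }

    static void block_until_next_slot(double target_fps)
    {
        // Same idea as the limiter sketch above: park the thread until the
        // next frame is due.
        using clock = std::chrono::steady_clock;
        static auto next = clock::now();
        next += std::chrono::duration_cast<clock::duration>(
            std::chrono::duration<double>(1.0 / target_fps));
        std::this_thread::sleep_until(next);
        if (clock::now() > next) next = clock::now();
    }

    int main()
    {
        for (;;)
        {
            read_input();                 // goes stale if the frame later queues behind a busy GPU
            prepare_frame();
            block_until_next_slot(60.0);  // a present-time limiter (RTSS-style hook) blocks roughly
            present_frame();              // here, so the GPU can start the frame immediately -- no
                                          // queued pre-rendered frame, i.e. the "as if MPRF 0" effect
        }
    }
    ```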
     
    Last edited: Oct 3, 2018
    BlindBison likes this.
  10. BlindBison

    BlindBison Ancient Guru

    Messages:
    2,414
    Likes Received:
    1,139
    GPU:
    RTX 3070
    @RealNC @CaptaPraelium

    Thank you for your replies, that's very helpful.

    I'm left with just one remaining question here then and it's regarding Rivatuner VS in-game frame limiters and the latency disparity there.

    Alright, so for example -- we have our Max pre-rendered frames set to 1 in the Nvidia Control Panel and we use RTSS to limit the framerate resulting in the minimum latency at that target framerate while we're hitting it -- effectively we have 0 prerendered frames as stated/explained above.

    How then do In-game frame limiters push input lag/latency down even further than this? Or, said another way -- how can we get lower than 0 Pre-rendered frames with the in-game limiter?

    I had read previously in one of Durante's posts about his GeDoSaTo tool that one thing a frame limiter can do is read input from the player closer to the delivery of the actual frame, since it knows exactly when that frame will be delivered. Is this what in-game limiters are doing that Rivatuner is not, then?

    How is it that in-game limiters can reduce latency by more than RTSS if RTSS + NCP Max Prerendered Frames set to 1 already gives 0 prerendered frames?

    Thanks for your time, I really appreciate it.
     

  11. RealNC

    RealNC Ancient Guru

    Messages:
    4,954
    Likes Received:
    3,233
    GPU:
    4070 Ti Super
    You're answering your own question below :)

    That is most probably the reason. Actually not even just "probably", but it looks like it has to be the main reason. An in-game limiter is part of the game itself, so the developers of the game know exactly when to wait. A good frame limiter will not read input from the player first, and wait afterwards. It will first wait if needed, and only then read player input. And a really good in-game limiter will actually wait even more, predicting how long the next frame will take to render. Almost no games do this, however; if done though, it can reduce input lag as close to zero as the frame render time can allow. I think a third party Quake engine implementation has this, and Durante added an option for this in GeDoSaTo. I don't know of any others.

    RTSS doesn't do that (it's not an easy thing to do externally.) All it can do is block the game at the time it tries to present a new frame. Which still gets rid of the pre-rendered frame backpressure when the GPU is maxed out, so input lag is lowered. Just not as much as an in-game limiter could. (If the in-game limiter is actually implemented correctly. It could be that some games have bad frame limiters that don't provide any advantage over RTSS.)
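
    To illustrate the ordering difference (hypothetical function names only, not any real engine's API), here's a minimal sketch -- the work is the same in both cases; only where the wait sits relative to reading input changes:

    ```cpp
    // Illustrative only.
    void wait_for_frame_slot()     { /* block until the next frame is due */ }
    void read_input()              { /* sample mouse/keyboard */ }
    void build_and_present_frame() { /* simulate, render, Present() */ }

    // External limiter (RTSS-style): it can only block around Present(), so
    // the input sampled at the top of the frame ages while the thread waits.
    void frame_with_external_cap()
    {
        read_input();
        build_and_present_frame();
        wait_for_frame_slot();
    }

    // Good in-game limiter: wait first, sample input last, so the input baked
    // into the frame is as fresh as the frame time allows. A predictive
    // limiter (GeDoSaTo-style) would wait even longer, by an estimate of the
    // next frame's render time, pushing the read still closer to scan-out.
    void frame_with_ingame_cap()
    {
        wait_for_frame_slot();
        read_input();
        build_and_present_frame();
    }
    ```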
     
    Last edited: Oct 3, 2018
    BlindBison likes this.
  12. BlindBison

    BlindBison Ancient Guru

    Messages:
    2,414
    Likes Received:
    1,139
    GPU:
    RTX 3070
    @RealNC

    Thanks a lot, that's helpful.

    I wonder if that's what many 30 fps console titles do -- read input very close to the frame delivery since they know exactly when to expect the next frame.

    Based on the test BattleNonsense (YouTube channel) ran on RTSS and Overwatch's in-game frame limiter, Overwatch seems to do something like this, since its in-game limiter had lower latency than capping FPS via Rivatuner (though the frame persistence/pacing was not as accurate as RTSS's).

    I think it would be so awesome if RTSS had a feature like this (like GeDoSaTo/some In-game limiters) -- where you could delay reading input so it's read closer to the delivery of the frame.

    Especially since RTSS is so accurate in its frame times, this would seem to me like the logical next step in further reducing latency, if it's at all possible. Maybe someday :p I can dream.

    Durante seems to have managed it with GeDoSaTo so it should be possible I would think (though I've never used GeDoSaTo before).
     
    Last edited: Oct 3, 2018
  13. RealNC

    RealNC Ancient Guru

    Messages:
    4,954
    Likes Received:
    3,233
    GPU:
    4070 Ti Super
    I don't know. I suspect it's not doing any frame render time prediction (that can go wrong, especially at such low frame rates with huge frame render time variations). Limiting the frame rate to 30 FPS in PC games with RTSS does actually give you quite OK input lag. With an in-game limiter it's even lower. And that's with a mouse, even. With a controller, games become quite playable, as input lag is not very apparent. Mouse look is where you really feel input lag the most. Using analog sticks feels very disconnected from the game to begin with, so input lag is not too important.

    That would be great for vsync users. Or get a g-sync display, which solves most of the problem :p Any further reductions in latency through predictive frame limiting are then in the realm of diminishing returns.
     
    BlindBison likes this.
  14. BlindBison

    BlindBison Ancient Guru

    Messages:
    2,414
    Likes Received:
    1,139
    GPU:
    RTX 3070
    @RealNC

    Yeah, I was planning to pick up a G-sync monitor this fall + upgrade my PC, but I'm underwhelmed by the RTX series of cards.

    Their price-to-performance ratio seems poor, and the RTX cores + DLSS I probably wouldn't even use. Ray tracing, based on the demos and benchmarks we've seen so far, simply costs too much for anyone playing at 1440p or certainly 4K, and I've read from two sources now that 4K DLSS is almost identical to 1800p + TAA/SMAA, so what's the point? Why not just use those settings instead?

    Anyway, I'm hoping for a 1080 Ti price cut soon -- will probably pick up a 1440p G-sync monitor and a 1080 Ti / 8700K prebuilt if I find one for a good price.

    But yes, especially for traditional V-sync users, that delayed-input-read frame limit feature would be very helpful if it worked correctly -- it's hard to imagine the technical reasons for it being so difficult, given that an accurate frame limiter delivering frames at the target framerate should, I would think, be able to get away with reading input very close to the actual frame delivery.

    Do you know if there's anywhere I could message/communicate with the RTSS developers about this kind of thing? Couldn't hurt to make a feature request.

    Of course they may already be aware of that approach, and it's possible it has been ruled out for a variety of technical reasons, but personally I'd feel a lot better knowing they're aware of it and have considered it, or knowing why it hasn't been implemented similarly to Durante's solution.
     
  15. RealNC

    RealNC Ancient Guru

    Messages:
    4,954
    Likes Received:
    3,233
    GPU:
    4070 Ti Super
    It might actually be a good idea to grab a 1080 Ti sooner rather than later, since they will become expensive in a while. Even though it doesn't make sense, previous-gen GPUs become more expensive rather than cheaper as time goes on; it's weird, but it always happens. They were overstocked a while ago (due to the mining craze), and NVidia forced all that stock onto their partners, but they have actually halted production of new Pascal GPUs. It might be that prices right now are very close to the lowest they will ever be, since the available stock is at its maximum right now, and it's very unlikely that NVidia will ever produce more Pascals in the future.

    Also, getting a g-sync monitor feels like a GPU upgrade anyway. At least if you're a vsync user. Playing at 90FPS g-sync feels like 120FPS/120Hz vsync, and there are no stutters in FPS < Hz situations, so it makes it feel like you have a more powerful GPU than you actually do. A 1080 Ti with a g-sync display is very future-proof, especially with a 1440p display (as opposed to 4K).

    Well, you can post a thread in this section:

    https://forums.guru3d.com/forums/rivatuner-statistics-server-rtss-forum.54/
     
    Last edited: Oct 3, 2018
    BlindBison likes this.

  16. CaptaPraelium

    CaptaPraelium Guest

    Messages:
    229
    Likes Received:
    63
    GPU:
    1070
    Multithreading still exists (existed? LOL, you get me) on single-core processors. Anyway, that's beside your point, I'll get back on track.... The asynchronism here is due to the GPU hardware being separate; it's not the two CPU cores, it's the one CPU and the one GPU.

    API calls must be converted into hardware-specific instructions by the driver before the GPU hardware can receive them. You must always have one pre-rendered frame.

    So as per the above, it's still pre-rendered -- it just waits until the GPU queue is empty, so there is no wait (i.e. latency) for the GPU -- and as mentioned above, this will only have a positive impact if you are waiting for the GPU.
    This is why, in the above-linked thread, the in-game resolution scale had to be raised to 150% to see the benefit.
    It's not the additional pre-rendered frames that are doing this -- you always need the frame prepared by the CPU first -- it's the lack of waiting for the GPU.
    Like you said:
    Even still, it would have the negative impact of reducing your framerate -- by as much as half -- since the CPU is not allowed to do its work while the GPU is busy, so now the GPU will have to wait for the CPU to prepare (pre-render) the frame before it can carry on.... but of course, there's this:
    So if you can always maintain a framerate above the cap you want, then it's always good. If you're dipping below the cap, then MPRF 1 will take care of ensuring no additional pre-rendered frames.

    So... it can (there goes that word again) but it really depends. In a perfect world, our CPU pre-render occurs in the same time frame as the GPU rendering and neither ever has to wait for the other and we maintain effective asynchronous operation with minimal input lag. Of course, in the real world, we can become GPU or CPU bound and then things change.

    In OP's case, with a 770 and a 75Hz display with vsync on, I'm sure this will be beneficial to him. I'm not arguing the advice you're giving to him, just being picky with some details because I know that gamers are prone to taking snake oil and might mistakenly think that this is always best for everyone.

    XD
    I have been waiting for @Unwinder to not be busy with Turing, but a related feature I wanted to request is context queue depth monitoring. Would be very handy for fine-tuning stuff like we've been discussing, and I think it may be necessary in order to implement blocking simulation threads (like reading input) as requested by OP.
     
    BlindBison and RealNC like this.
  17. RealNC

    RealNC Ancient Guru

    Messages:
    4,954
    Likes Received:
    3,233
    GPU:
    4070 Ti Super
    The issue is that dipping below the cap happens for a reason. If that reason is a GPU bottleneck, then you get the sudden input lag increase.

    Unwinder added an experimental throttling option that we suggested, to have RTSS throttle the game at all times, even if the cap wasn't reached, in the hope that this would always prevent the CPU from having to wait on the GPU (meaning it's as if the game were always capped, even when it dips below the cap), but it didn't work. No idea why, so it wasn't worth pursuing further. Someone smarter than me could perhaps come up with a "throttling" limiter that achieves this.

    Unreal Engine's frame limiter has a feature called "smoothed framerate" which does something similar, but for entirely different reasons. Maybe that would be the way to do this: take the average FPS of the past 20 frames or so (a number I just pulled out of my ass) and apply a dynamic cap based on that average.
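
    Something like this, maybe (just my reading of the idea, not Unreal's actual smoothed-framerate code): keep a rolling average of recent frame times and throttle to that, so the limiter keeps engaging even when the game dips below a fixed cap.

    ```cpp
    // Rough sketch of a dynamic cap -- illustrative, not Unreal Engine code.
    #include <cstddef>
    #include <deque>
    #include <numeric>

    class DynamicCap
    {
    public:
        explicit DynamicCap(std::size_t window = 20) : window_(window) {}

        // Feed the measured duration of each finished frame (seconds);
        // returns the frame-time target to throttle the next frame to.
        double update(double last_frame_seconds)
        {
            history_.push_back(last_frame_seconds);
            if (history_.size() > window_)
                history_.pop_front();

            return std::accumulate(history_.begin(), history_.end(), 0.0)
                   / static_cast<double>(history_.size());
        }

    private:
        std::size_t window_;          // e.g. the "past 20 frames or so"
        std::deque<double> history_;
    };
    ```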
     
    Last edited: Oct 3, 2018
    BlindBison likes this.
  18. BlindBison

    BlindBison Ancient Guru

    Messages:
    2,414
    Likes Received:
    1,139
    GPU:
    RTX 3070
  19. CaptaPraelium

    CaptaPraelium Guest

    Messages:
    229
    Likes Received:
    63
    GPU:
    1070
    Since you're capped at a framerate approaching half the framerate the card is capable of (as a result of synchronous operation), this would require a sudden jump in GPU frametime for that frame, approaching double what you've capped it to. This is a stutter anyway and a problem that should be fixed elsewhere.

    The trouble with predictive frame capping like we've discussed here is that, like most performance tweaks, it's a trade-off... The problem is what we're trading. If we under-predict the blocking time, then it's no big issue; it just wastes some opportunity to further reduce input lag. But if we over-predict it, we're looking at duplicate frames. Not good. The only way to really ensure it is to be so conservative with blocking times that we lose so much of the opportunity to reduce input lag that we're not saving much at all. Furthermore, we have now reduced the input lag by some small amount (at 60 FPS we might save a few ms at best), but we have introduced a variance in input lag, whereas before we might have had 2-3 ms more lag but it was the same amount of lag on every frame. So we've saved 5 ms sometimes, but at the cost of it varying by 5 ms all over the place. Is that really worth it?
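
    That trade-off is easy to see in a sketch (hypothetical names and numbers, not any shipping limiter): the extra wait is the frame budget minus the predicted frame cost minus a safety margin, and the margin is exactly what eats the latency win.

    ```cpp
    // Illustrative only: how much extra a predictive limiter dares to wait.
    #include <algorithm>

    double extra_wait_seconds(double frame_budget_s,   // e.g. 1.0 / 60
                              double predicted_cost_s, // guessed CPU+GPU time for the next frame
                              double safety_margin_s)  // conservatism against over-prediction
    {
        // Waiting too little (a conservative guess) just leaves latency
        // savings on the table. Waiting too long (the frame cost turned out
        // higher than predicted) blows the budget -- a duplicate frame under
        // vsync. The safety margin protects against that, but every ms of
        // margin is a ms of latency not saved, and prediction error shows up
        // as frame-to-frame lag variance.
        return std::max(0.0, frame_budget_s - predicted_cost_s - safety_margin_s);
    }
    ```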

    Certainly we would be better off for input lag by not capping the framerate, given the capability of the GPU. Of course, there is the issue of tearing, but if you've spent enough on the GPU that it has almost double the required power to meet your desired refresh rate, perhaps some of that money would have been better spent on an adaptive sync monitor.... Or perhaps it's worth considering reducing a graphics option or two to eke out that extra little bit of performance. Of course, then there's the argument that it doesn't look like you want it to look.... I could go on and on here, but the end result is that we would need to trade one thing for another, and none of the options are really going to get us what we want. We always end up sacrificing framerate or input lag or latency variation or graphics quality or resolution or dropped frames or something.

    I hate being the guy to say this kind of stuff but here goes.....
    I think there comes a time when we have to accept that in order to achieve certain desired performance, appropriate hardware is required. If the hardware isn't available, we need to lower our expectations to meet its capability. Basically I'm saying, it's time to either turn down graphics options or buy new gear.
    Like RealNC said very early in the thread,
    Or a better GPU, or, or, or.... There are only minor tweaks we can really get through software before the cost approaches the benefit.
    Pls no shoot messenger :(
     
  20. RealNC

    RealNC Ancient Guru

    Messages:
    4,954
    Likes Received:
    3,233
    GPU:
    4070 Ti Super
    @CaptaPraelium I wasn't talking about predictive limiting. Just normal limiting, and why it's beneficial in the majority of cases when you play GPU-bound games.
     
