Why not always have triple buffering on?

Discussion in 'Videocards - AMD Radeon Drivers Section' started by Coldblackice, Mar 28, 2013.

  1. Coldblackice

    Coldblackice Member Guru

    Messages:
    129
    Likes Received:
    0
    GPU:
    EVGA 3080 FTW3
    Is there any reason to not have triple buffering permanently enabled (through either D3DOverrider or RadeonPro)?

    Seems like there are few downsides to doing so, but many upsides.
     
  2. yasamoka

    yasamoka Ancient Guru

    Messages:
    4,875
    Likes Received:
    259
    GPU:
    Zotac RTX 3090
    An additional frame of input lag. This is where Adaptive / Dynamic VSync comes in. You will always have compromises when syncing to refresh rate with the framerate lower than the refresh rate.
     
  3. kevsamiga1974

    kevsamiga1974 Master Guru

    Messages:
    882
    Likes Received:
    1
    GPU:
    EVGA GTX 580 SC
    Increased VRAM use...
     
  4. The Mac

    The Mac Guest

    Messages:
    4,404
    Likes Received:
    0
    GPU:
    Sapphire R9-290 Vapor-X
    Incorrect.

    TB counteracts input lag caused by VSync, not the other way around.

    The only disadvantage is, as kevsamiga has said, increased VRAM usage.

    If you have a 2 or 3 GB card, however, it's minimal and will not impact anything.

    So the answer is: as long as you have over 1 GB of VRAM on your card (you appear to have 3 GB), leave it on.
     
    Last edited: Mar 28, 2013

  5. Fox2232

    Fox2232 Guest

    Messages:
    11,808
    Likes Received:
    3,371
    GPU:
    6900XT+AW@240Hz
    The only real downside is additional lag of up to 1 frame:
    1s / selected display frequency

    60 Hz: +16.7ms
    120 Hz: +8.3ms
    increase in lag. No problem at all in most single-player games; only an issue in highly competitive online FPS games.
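    The per-refresh figures above can be reproduced with a tiny sketch (my own helper, not from the post; the function name is made up for illustration):

    ```python
    # One extra buffered frame adds at most one refresh period of lag:
    # 1 second divided by the display's refresh rate, in milliseconds.
    def one_frame_lag_ms(refresh_hz: float) -> float:
        return 1000.0 / refresh_hz

    for hz in (60, 120):
        print(f"{hz} Hz: +{one_frame_lag_ms(hz):.1f} ms")
    # -> 60 Hz: +16.7 ms
    # -> 120 Hz: +8.3 ms
    ```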
     
  6. Espionage724

    Espionage724 Guest

    Going by what some game said in an in-game tooltip about triple buffering in the past: it decreases input latency while increasing GPU usage.
     
  7. The Mac

    The Mac Guest

    Messages:
    4,404
    Likes Received:
    0
    GPU:
    Sapphire R9-290 Vapor-X
    Incorrect.

    That applies to the "Flip Queue" or Render Ahead, NOT DirectX triple buffering...

    DirectX triple buffering under no circumstances introduces any lag whatsoever.

    Anyone who tells you otherwise is confusing Render Ahead buffers with DirectX triple buffering.
     
    Last edited: Mar 28, 2013
  8. yasamoka

    yasamoka Ancient Guru

    Messages:
    4,875
    Likes Received:
    259
    GPU:
    Zotac RTX 3090
    How does it counteract input lag caused by VSync?

    In all my readings about it, I've seen that it causes an additional frame of input lag.

    The way I understand it, having two back buffers and having to display the older frame without dropping it, means that an additional frame of input lag is added.

    I also felt the added lag in Crysis 2, and had to lower flip queue size, which you mentioned has no effect in newer drivers, yet subjectively it did, but I do not feel any added input lag in BF3 (all done with VSync on, of course).

    I sure hope what you're saying is true. May I hear your explanation please?

    Thanks.
     
  9. ArgonV

    ArgonV Master Guru

    Messages:
    412
    Likes Received:
    6
    GPU:
    AMD XFX 7900 XTX
    Here's a good explanation on wiki of how and why triple buffering helps reduce input lag in some cases:

    In computer graphics, triple buffering is similar to double buffering but provides a speed improvement. In double buffering the program must wait until the finished drawing is copied or swapped before starting the next drawing. This waiting period could be several milliseconds during which neither buffer can be touched.

    In triple buffering the program has two back buffers and can immediately start drawing in the one that is not involved in such copying. The third buffer, the front buffer, is read by the graphics card to display the image on the monitor. Once the monitor has been drawn, the front buffer is flipped with (or copied from) the back buffer holding the last complete screen. Since one of the back buffers is always complete, the graphics card never has to wait for the software to complete. Consequently, the software and the graphics card are completely independent, and can run at their own pace. Finally, the displayed image was started without waiting for synchronization and thus with minimum lag.[1]

    Due to the software algorithm not having to poll the graphics hardware for monitor refresh events, the algorithm is free to run as fast as possible. This can mean that several drawings that are never displayed are written to the back buffers. This is not the only method of triple buffering available, but is the most prevalent on the PC architecture where the speed of the target machine is highly variable.

    Another method of triple buffering involves synchronizing with the monitor frame rate. Drawing is not done if both back buffers contain finished images that have not been displayed yet. This avoids wasting CPU drawing undisplayed images and also results in a more constant frame rate (smoother movement of moving objects), but with increased latency.[1] This is the case when using triple buffering in DirectX, where a chain of 3 buffers are rendered and always displayed.

    Triple buffering implies three buffers, but the method can be extended to as many buffers as is practical for the application. Usually, there is no advantage to using more than three buffers.
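    The flip logic the quote describes can be sketched in a few lines (my own illustration, not from the article; class and method names are hypothetical): two back buffers plus a front buffer, swapped by pointer, so the renderer never waits on the display.

    ```python
    # Minimal model of triple buffering: buffers are swapped by name
    # (i.e. by pointer), never copied, as the quoted text describes.
    class TripleBuffer:
        def __init__(self):
            self.front = "A"      # buffer currently scanned out to the monitor
            self.ready = "B"      # most recently completed frame
            self.drawing = "C"    # buffer the renderer is writing into

        def frame_complete(self):
            # Renderer finished a frame: its buffer becomes the newest
            # "ready" frame, and it starts redrawing into the stale one.
            self.ready, self.drawing = self.drawing, self.ready

        def vblank(self):
            # Display refresh: flip the front buffer with the newest
            # complete frame. A ready frame always exists, so the GPU
            # never waits for the software.
            self.front, self.ready = self.ready, self.front

    tb = TripleBuffer()
    tb.frame_complete()   # "C" is now the ready frame
    tb.vblank()           # "C" becomes the front buffer
    print(tb.front)       # -> C
    ```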
     
    Last edited: Mar 28, 2013
  10. yasamoka

    yasamoka Ancient Guru

    Messages:
    4,875
    Likes Received:
    259
    GPU:
    Zotac RTX 3090
    Thank you for the explanation.

    Indeed, I am testing now with BF3. VSync off vs. VSync on + TB: there's almost no difference (none?) in input lag. It seems the perceived (slight) additional input lag may be due to the frames being in sync with the monitor vs. out of sync (VSync off), if you get what I mean.

    I'm testing with an FPS cap, though, at my refresh rate. Without this FPS cap, VSync's input lag is unbearable, and Triple Buffering does seem to reduce this input lag, though not to the same degree as with the FPS cap on.

    Any reason for this?
     

  11. The Postman

    The Postman Ancient Guru

    Messages:
    1,773
    Likes Received:
    0
    GPU:
    MSI 980 TI Gaming
    CFX systems don't need to force triple-buffered VSync, since it is done by default when you enable VSync. Double buffered should be enough.

    Please someone correct me if I am wrong.
     
  12. yasamoka

    yasamoka Ancient Guru

    Messages:
    4,875
    Likes Received:
    259
    GPU:
    Zotac RTX 3090
    That's true. The way CFX/SLI in AFR works is similar to Triple Buffering, so it doesn't need it enabled.

    There is no double buffering possible with CFX/SLI in AFR.
     
  13. Coldblackice

    Coldblackice Member Guru

    Messages:
    129
    Likes Received:
    0
    GPU:
    EVGA 3080 FTW3
    So which triple buffering method is preferable -- RadeonPro? D3DOverrider? Other?

    And should one first try to determine if the game has native triple buffering beforehand (and then how to decide which is preferable to use -- native or non-native)?
     
  14. Fox2232

    Fox2232 Guest

    Messages:
    11,808
    Likes Received:
    3,371
    GPU:
    6900XT+AW@240Hz
    The theory of VSynced double / triple buffering is one thing; real-world implementation is another story.
    If double buffering waited for the monitor to redraw the image before flipping and rendering a new image to the back buffer, then it would completely eliminate frame tearing, at the cost of increased delay between what's on the LCD and the actual state of the engine. But it does not sync with monitor refreshes, in order to keep speed up and the image as current as possible.

    Triple buffering does not sync to the monitor properly either, but the 2 back buffers remove tearing at the PC buffer level.

    There is:
    Buffer in LCD = Front Buffer in PC <=> Backbuffer 1 / Backbuffer 2
    The buffer in the LCD and the front buffer practically work as shared memory and are always exactly the same.
    Backbuffer 2 flips with the front buffer only when it's ready / completely rendered.
    Same goes for BB1. A flip is not done by copying, which takes time; it simply changes a small address (pointer) to where each buffer is held.

    Technically speaking, normal no-VSync and double-buffered VSync work the very same way, since both use double buffering.

    Triple buffering can be used without VSync. And as I wrote, if VSync flipped the image at the time of the vertical blanking period (the time when the monitor renders "empty" rows below/above the visible area), then triple buffering would not be necessary for single-display environments.

    Edit: I personally see TB as a "solution/override" to broken, fast DB VSync.
     
    Last edited: Mar 29, 2013
  15. Fox2232

    Fox2232 Guest

    Messages:
    11,808
    Likes Received:
    3,371
    GPU:
    6900XT+AW@240Hz
    <I guess it's too long to read, so skip to the bold text down there.> :bang:
    Just to add about tearing (as it should be in theory; implementation is another thing):
    NoVSync DB - unlimited number of tears, based on how many times the actual fps is higher than the LCD refresh rate [Hz], +1 for bad LCD timing sync (can be hidden by chance)
    NoVSync TB - one tear for bad LCD timing sync (can be hidden by chance)
    VSync DB - unlimited number of tears, based on how many times the actual fps is higher than the LCD refresh rate [Hz]
    VSync TB - no tearing

    As for delay (lag), let's make a model situation at 50Hz to keep it simple:
    Triple buffering has 2 back buffers:
    Backbuffer 1 is fully rendered and is "Aging", while Backbuffer 2, the "Split" buffer, is being overwritten; it becomes the Aging buffer once its content is complete, and BB1 starts its overwrite.
    Once the "Aging" buffer is flipped, the "Split" buffer completes its render cycle and starts aging, while BB2 (previously the FB memory region) is rendered again and again to have data as fresh as possible.
    You can imagine it as one complete image, plus a second image whose bottom half is newer than the full image and whose upper half is from the present time.

    Without VSync the aging period is out of business, so technically the time delay is not increased.
    With VSync:
    The monitor refreshes every 20ms. If a new image (the aging buffer) is ready on refresh, it's flipped. Let's have it always ready, since we have a powerful PC capable of 100fps.
    What's happening in the BBs in those 20ms after a flip:
    Once BB2 is no longer half old / half new but just new, it starts aging. Its ready time on a 100fps system is 0.01~10ms, since the flip could have come at the last pixel of the image.
    BB1 is the new halved buffer, half blank, half rendered; at first it will take 10ms to fill.
    Afterwards the cycle continues with BB2 again, until the monitor's 20ms cycle comes around and asks for data, which is provided not from the "Split" buffer but from the "Aging" buffer.
    In this model, at the time of a monitor refresh the "Aging" buffer is 10~20ms old, while the "Split" buffer is between 0.01~10ms old but contains proportionally older data (at 5ms it has 50% new content and 50% content older than the "Aging" buffer).

    A double buffer only has that "Split" buffer, which in reality is always partially filled with older and newer data.
    Since our model is a 100fps system, the upper part is the new image and the lower part is a 10ms-older image.
    So the situation varies between two extremes at the time of the flip:
    1) The new part is just 1 pixel, and that is 0.01ms old. Everything below is from the previous image and is therefore 10.01ms old.
    2) The new part covers everything except 1 pixel; it's new, but rendering all those pixels took 9.99ms anyway, and the last pixel from the previous image is 20ms old.
    Usual situation:
    upper 50% rendered: the new part is 5ms old, since it took that much time to render half the image; the lower part is 15ms old.

    If you average the age of each pixel, you get to 10ms with DB, with a nasty 10ms discontinuity somewhere in the middle at any time.
    With TB, the age of the ready-to-flip ("Aging") buffer in this situation is between 10~20ms, averaging 15ms.


    So it's not really correct to add one whole frame time to the TB process; it still increases lag, but by 1/(2*FPS). So on a system with a 60Hz LCD and hardware capable of an average 60fps, TB adds on average 8.3ms instead of the stated 16.7ms.
    And real-world tearing behaves differently than the theoretical kind too, and varies between Intel/AMD/nVidia.
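    Fox2232's averaging argument above boils down to half a frame-time of added lag; here is a quick numbers check (my own sketch and my own function name, not an authoritative model):

    ```python
    # With TB the ready-to-flip buffer is between one and two frame-times
    # old at the moment of the flip, vs. one frame-time on average for DB,
    # so the *average* extra lag is half a frame: 1 / (2 * fps).
    def avg_tb_added_lag_ms(fps: float) -> float:
        frame_time_ms = 1000.0 / fps
        return frame_time_ms / 2.0

    print(avg_tb_added_lag_ms(60))   # half of 16.7 ms, i.e. ~8.3 ms
    print(avg_tb_added_lag_ms(50))   # the 50Hz model above: 10 ms
    ```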
     

  16. The Mac

    The Mac Guest

    Messages:
    4,404
    Likes Received:
    0
    GPU:
    Sapphire R9-290 Vapor-X
    Here is an easy-to-understand explanation:

    http://www.anandtech.com/show/2794

    @fox2232: Do not confuse "input lag" with "rendering latency". They are not the same thing.

    Personally, I find the versatility of Radeon Pro far superior to DXTory. There are a whole host of other visual enhancements in Radeon Pro that are not available elsewhere without mucking about with DLL injectors (SMAA, SweetFX, FXAA, HBAO, etc.)
     
    Last edited: Mar 29, 2013
  17. Fox2232

    Fox2232 Guest

    Messages:
    11,808
    Likes Received:
    3,371
    GPU:
    6900XT+AW@240Hz
    I've read that exact article when they published it. With pictures it's easier to explain. But their claim that DB with VSync completely eliminates tearing unfortunately doesn't hold in the real world.

    And the relation between the front buffer and the monitor buffer is different in their theory too. In reality, what's in the FB goes immediately and repeatedly to the display until the FB changes.
    Their images give the impression that the monitor politely asks for a new full image (which adds one more theoretical buffer).

    But in general it's the nicest and most user-friendly explanation I've seen on the net anyway.
    Mine is pretty rough without images. If I drew them, I could skip the parts where I wanted people to imagine what the contents of the buffers would be at different times, and what the methods are to sync flipping.
     
  18. The Mac

    The Mac Guest

    Messages:
    4,404
    Likes Received:
    0
    GPU:
    Sapphire R9-290 Vapor-X
    I would agree, it's not complete.

    But it would be difficult to go into more detail without some serious engineering knowledge.

    Trying to analyse the latency between front buffer availability and actual on-screen display is picking some very large nits.

    As far as mitigating any additional screen tearing goes, a simple frame limiter is all that is needed.

    Most people find that VSync + TB + a 59fps limit mitigates 99% of tearing and introduces no discernible input lag.

    This is achieved in Radeon Pro by setting VSync to on, TB on, and DFC to 59fps.
     
    Last edited: Mar 29, 2013
  19. Fox2232

    Fox2232 Guest

    Messages:
    11,808
    Likes Received:
    3,371
    GPU:
    6900XT+AW@240Hz
    Fully agree on that; I use VS + TB + DFC 120fps @ 120Hz, for one additional reason:
    Lower fps than Hz introduces an occasional double wait time.
    Situation:
    The LCD refreshes every 10ms. An image is ready every 11ms, offset by 0.5ms.
    Code:
    Frame	ready time [ms]
    1	10.5
    2	21.5
    ...
    9	98.5
    10	109.5
    11	120.5
    12	131.5
    
    The 10th frame will be displayed at the 110th ms from the start, but the 11th image missed the time frame to be displayed at the 120th ms and will be shown at the 130th ms.

    This is as rare as once per second if DFC is set 1 lower than the display Hz.
    (And it's not very noticeable, and impossible to measure with the fraps/RPro frametime benchmark. If they measured the times when the FB pointer changes, instead of the times when new calls are made into the pipeline, then they would measure this,
    and real outputted frame times would provide real insight into what's coming out of the GPU, instead of what's going in and at what rate.)

    Having fps and Hz the same does not remove this, just reduces its rate, since there are times when frame times are longer than required to achieve a true 120fps.
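    The timing table above can be checked with a few lines (my own sketch; the helper name and the "next refresh tick" rounding rule are my assumptions, matching the model in the post):

    ```python
    import math

    REFRESH_MS = 10.0   # LCD refreshes every 10 ms

    def display_time_ms(frame: int) -> float:
        # Frame n is ready at 11*n - 0.5 ms (the table above: 10.5, 21.5, ...)
        ready = 11.0 * frame - 0.5
        # It is shown at the next 10 ms refresh tick at or after that moment.
        return math.ceil(ready / REFRESH_MS) * REFRESH_MS

    for n in (10, 11):
        ready = 11.0 * n - 0.5
        print(f"frame {n}: ready {ready} ms, displayed {display_time_ms(n):.0f} ms")
    # -> frame 10: ready 109.5 ms, displayed 110 ms
    # -> frame 11: ready 120.5 ms, displayed 130 ms  (missed the 120 ms tick)
    ```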
     
  20. The Mac

    The Mac Guest

    Messages:
    4,404
    Likes Received:
    0
    GPU:
    Sapphire R9-290 Vapor-X
    At 120Hz this may be the case, but most of the world is lucky to get 60fps.

    Yes, at a 59fps limit you are effectively skipping one frame every second, but it guarantees there will always be a full frame available to the front buffer, avoiding tearing. All it's really doing is forcing the back buffers to further decouple from VSync (TB is supposed to do that, but in real-world scenarios it isn't always perfect). Qualitatively, it's not going to be noticeable.

    From the interview with AMD yesterday, I'm sure it's only a matter of time before some tools appear that can measure pipeline throughput (i.e. INTO the pipeline, as opposed to Fraps, which measures the front buffer flip).
     
    Last edited: Mar 29, 2013
