MSI AB / RTSS development news thread

Discussion in 'MSI AfterBurner Application Development Forum' started by Unwinder, Feb 20, 2017.

  1. Unwinder

    Unwinder Ancient Guru Staff Member

    Messages:
    17,250
    Likes Received:
    7,018
    Forgot to document one more change I implemented in RTSS 7.3.5 beta 4 build 27700. Those who created their own overlays with OverlayEditor most likely know that overlay layouts look best when you use them with the same font and zooming ratio as the creator. Overlay layer sizes are specified in font width/height based units, so if you use a font with a different width/height ratio, overlay proportions can be slightly distorted, some overlay elements may look too small etc.
    To let you see the overlays as intended by the author (i.e. with the original creator's font/size settings), each .ovl file stores the original creator's font settings, and you may always apply them after loading an overlay layout on your system with Layouts -> Edit -> Master settings (the same can be done via the Ctrl + Shift + M keyboard shortcut from the editor). However, many users miss that fact and never do it. :) So now OverlayEditor automatically applies master settings when you load a new overlay layout from the GUI menu.
     
    Last edited: Sep 7, 2023
  2. Unwinder

    Unwinder Ancient Guru Staff Member

    7.3.5 beta 4 seems to be rather mature for public release, so most likely it will become the first 7.3.5 beta build available in the Guru3D downloads section. I plan to put it on the Guru3D news and download page in a few days to make it visible to a bigger user base.
     
  3. Unwinder

    Unwinder Ancient Guru Staff Member

    After thinking a bit more, I decided to roll back this change and use the previous manual master settings apply mode instead of applying them on overlay load automatically. Automatically applying master settings can be annoying for those who peek into the editor to preview the available built-in layouts without intending to use them. The resulting automatic font name and size change in this case is rather confusing and undesired.
     
  4. Unwinder

    Unwinder Ancient Guru Staff Member

    I mentioned that I plan to launch the current RTSS 7.3.5 beta in the news/download section in the next few days. The previous version released there was the official 7.3.4, so we need to prepare a cumulative changes list for all the previous 7.3.5 betas. The full changes list compared to 7.3.4 includes the following:

    · Ported to VC++ 2022 compiler. Please take a note that due to this change RivaTuner Statistics Server will no longer be able to start under Windows XP. Please stay on the previous versions of the product if you need this OS support.
    · Please note that the size of the mandatory VC++ 2022 runtime redistributables roughly doubled compared to the previously used VC++ 2008 redistributables, and we’d like to avoid providing an overblown application distribution, drastically increased in size due to bundling newer and much heavier VC++ redistributables with it. To deal with this issue we provide our own original tiny web installer for VC++ redistributables, which allowed decreasing the size of the final application distribution drastically, even compared to the previous VC++ 2008 based version. Please note that install time can be increased slightly due to downloading VC++ 2022 runtime redistributables on the fly during installation. If you install RivaTuner Statistics Server offline, you can always deploy the required VC++ 2022 redistributables later with the web installer by launching .\Redist\VCRedistDeploy.bat
    · Fixed issue in asynchronous skin scaling implementation, which could cause deadlocked RTSS.exe to stay in memory after closing application with [x] button from skinned GUI when skin scaling was enabled
    · Now uninstaller removes configuration files for OverlayEditor, HotkeyHandler and DesktopOverlayHost when you choose clean uninstallation mode. Please take a note that your own overlay layouts stored inside .\Plugins\Client\Overlays folder will never be removed during uninstallation by design
    · Now RivaTuner Statistics Server ignores its own process and DesktopOverlayHost in screen and videocapture requests. So you no longer see unwanted screenshots or videos captured from OverlayEditor's or DesktopOverlayHost's 3D windows when you open them simultaneously with other 3D applications and initiate screen or video capture
    · Improved hypertext parser:
    o Image loading <LI> hypertext tag handler has been improved to allow loading embedded images from external folders
    o Now hypertext parser supports application specific embedded images. You may use this feature to display game specific logos in your overlay layouts. Sample.ovl layout included into distributive demonstrates this technique by displaying game specific logos for Escape From Tarkov, Forza Horizon 5 and Ratchet and Clank : Rift Apart
    o Now hypertext parser accepts both ANSI and UTF-8 encoded degree Celsius symbol
    · Improved OverlayEditor plugin:
    o Fixed keyboard based layer position adjustment when “Snap to grid” option is disabled
    o Fixed buffer overrun in OverlayEditor's GUI, causing it to crash when total text length displayed in "Cell" column of "Text table properties" window was longer than 260 symbols
    o Fixed status bar panes and hypertext debugger panel rendering for high DPI scaling modes
    o Now OverlayEditor supports saving overlay layouts to or loading overlay layouts from external folders. To allow you to differentiate local (i.e. stored inside .\Plugins\Client\Overlays) and external layouts, local layouts will be displayed as naked filename only in editor's window caption (e.g. "Overlay editor - sample.ovl"), while external layouts will be displayed with full filepath
    o Now OverlayEditor supports context highlighting for text file embedding <F=textfile.txt> hypertext tag. Visual tag browser displayed when you type in <> in hypertext field also supports inserting <F> tag
    o OverlayEditor is no longer rendering the very first frame with no sensor data displayed, now it is always rendering the first frame after polling all data sources
    o %CPUShort% macro is additionally packing Ryzen CPU names now. "Ryzen N" name is packed to "RN", and "Ryzen Threadripper" is packed to "TR" with this macro
    o Added conditional layers support. This powerful feature allows you to add programmability to your overlays and make them look different depending on some conditions, which you can program yourself. For example, you may create differently looking overlay on AMD and NVIDIA GPUs, you may create different representation of CPU usage bars depending on logical CPU count, add special layers displaying thermal alerts when either CPU or GPU is overheating, add layers visible depending on PTT-styled keyboard polling and so on. Conditional layers support is based on two key components:
    § Seriously improved correction formula parser in data source settings window, which is allowing you to program complex logical conditions and create so called boolean data sources, which report true (1) or false (0) depending on condition you define:
    · Relational operators support: <,>,<=,>=,== and !=. Result is boolean true (1) or false (0). For example, you may define boolean data source called IsGpuOverheating with correction formula “GPU1 temperature” >= 80, which will return 1 when GPU temperature is above or equal 80C, otherwise it will return 0
    · Logical operators support: ||, &&, !. Result is boolean true (1) or false (0). Logical operators allow you to combine multiple conditions, for example you may define boolean data source called IsGpuFanFail with correction formula (“GPU1 temperature” >= 80) && (“GPU1 fan tachometer” == 0), which will return 1 when GPU fan is not spinning AND temperature is critical (so it is not an idle fan stop case)
    · Bitwise operators support: &, |, ^, ~, << and >>. Please take a note that ^ was used for power operator before, however we see no practical use for power operators in correction formulas so it is no longer supported. ^ is used for bitwise XOR operator now
    · Ternary conditional operator support: condition ? expression1 : expression2. Result is expression1 if condition is true, otherwise it is expression2. Please take a note that Basic-styled syntax for ternary conditional operator syntax is also supported, so you can also use alternate if(condition, expression1, expression2) syntax depending on your preferences
    · Hexadecimal const values support. C-styled syntax with 0x prefix is supported, for example x+10 formula can be also represented as x+0xa
    · New cpuvendor, gpuvendor and cpucount variables support allow you to check CPU/GPU vendor identifiers or logical CPU count and use it in your overlay. For example, you may define IsNVIDIAGpu boolean data source and set correction formula to gpuvendor == 0x10de, then use it to display NVIDIA logo in your overlay only when NVIDIA GPU is installed on end user’s PC. Modified sample.ovl overlay layout demonstrates this technique to display AMD/Intel CPU logos depending on CPU vendor id and use different CPU usage bars layouts depending on logical CPU count
    · New rtssflags variable support allows you to check some global RTSS flags. It allows you to check if framerate limiter, video capture or benchmark mode is currently active. For example, you may define boolean data source called IsBenchmarkActive and set correction formula to (rtssflags & 0x100) != 0 to check state of benchmark mode activity bit
    · New validate(expression) function returns boolean true (1) when expression result is valid, otherwise it returns 0. For example, you may use it to check if some data source is physically supported on your system (unsupported values are invalid and reported as N/A). If you’re importing a data source from external provider, e.g. HwInfo, data can be invalid and reported as N/A when provider application is not running, so you may also effectively use validate() function to check if data is currently available. This function is useful when you combine it with ternary conditional operator, for example you may define formula validate(x) ? x : 0 for a data source importing data from HwInfo to make sensor report 0 when HwInfo is not running
    · New key(vkcode) function allows you to poll keyboard and returns key press counter and current key up/down state bit. Please take a note that OverlayEditor uses new dedicated HotkeyHandler’s interface to access its low-latency keyboard polling status, so HotkeyHandler must be also active for this function to work. For example, you may define boolean data source called IsKeyDown and set correction formula to (key(0x41) & 0x80000000) != 0 to report 1 when keyboard key ‘A’ is down, then use it to apply PTT-styled visibility to show some specific layer. Alternately, you may define boolean data source called IsKeyToggled and set correction formula to (key(0x41) & 1) != 0 to check bit 0 of key press counter, which is incremented each time you press it. This way you can effectively implement some layer visibility toggle depending on this pre-programmed key in your overlay
    § New “Visibility source” setting in layer properties allows you to use one of boolean data sources defined in your overlay and representing some logical condition to show or hide the layer depending on it. If there is no binding in “Visibility source” setting, the layer will be always visible as before. Otherwise it will be visible only when visibility source reports a value different from zero
    § Added <IF>/<ELSE> and <SWITCH>/<CASE> hypertext extension tags support. Power users may embed these extension tags directly into hypertext instead of “Visibility source” setting to make some parts of layer visible depending on some condition. Please take a note that nested conditional blocks are not supported, so new <IF> tag always closes the previous open conditional block or immediately opens new one. Also, more complex expressions are not allowed in hypertext either, you can only use boolean data sources there. The only exception is ! (NOT) symbol, which allows you to invert value reported by boolean data source. Also, please take a note that <IF>/<ELSE>/<SWITCH>/<CASE> tags are extension tags parsed at OverlayEditor plugin level. They are not native hypertext tags, so you cannot use them to format hypertext inside external applications like CapFrameX or AIDA
    o Added PresentMon data provider. Now presentation related metrics from Intel PresentMon (including GPU busy, introduced in PresentMon 1.9.0) can be displayed directly in OverlayEditor’s layouts:
    § Added helper PresentMonDataProvider.exe application, which localizes all PresentMon interoperability in a separate process. PresentMonDataProvider supports PresentMon data streaming either from modern independently installable PresentMon service (downloadable from https://game.intel.com/story/intel-presentmon/) or from legacy PresentMon console application, bundled with the plugin. Please take a note that modern PresentMon service provides additional CPU/GPU telemetry, so this data is not available in OverlayEditor’s PresentMon data provider if you don’t install the service and stream it from legacy console PresentMon
    § Please take a note that PresentMon reports data with noticeable time lag, which varies from 0.5 to 2.5 seconds on our hi-end test system. We added our own msReportingLag to PresentMon data provider, so you may see it in your overlay layouts. Lag is just a part of problem, the worst thing is that the lag is not static due to batching streamed frames inside PresentMon (which means that it may collect a few frames then stream them all at once). So, if you try to render PresentMon's frametime graph in realtime using streamed data as soon as you receive it, graph scrolling will be extremely jerky due to batching. However, it is still possible to implement smooth scrolling of PresentMon's frametimes with a simple trick, if you apply some fixed delay to it. Delay must be large enough to compensate the maximum PresentMon's reporting lag. We selected fixed 3000ms delay in our implementation, which allows smooth scrolling. Delay is not hardcoded, it is defined by PM_DisplayDelay overlay environment variable (can be edited in Layouts -> Edit -> Environment variables)
    § Added new built-in overlay layouts demonstrating PresentMon integration functionality and displaying native realtime RivaTuner Statistics Server’s frametime graph on top and overlapped PresentMon’s frametime and GPU busy graphs below. Most of reviewers prefer to see the frametime graph displayed on per-frame scale, as it is the only real way to diagnose and see single frame stutters. However, native Intel's PresentMon overlay displays it on averaged time scale. So to allow you to compare apples to apples we included two different versions of overlay layouts for PresentMon in RivaTuner Statistics Server distributive: presentmon_frame_scale.ovl and presentmon_time_scale.ovl. presentmon_frame_scale.ovl displays PresentMon's frametimes on per-frame scale, similar to native RivaTuner Statistics Server’s frametime graph. presentmon_time_scale.ovl displays PresentMon's frametimes on fixed time scale, defined by user adjustable overlay refresh period (33ms by default). Averaging window for this overlay layout is adjustable via environment PM_AveragingWindow variable and it is set to double refresh period (66ms) by default. Both layouts display PresentMon's data with fixed 3000ms display delay to allow smooth scrolling
    § Both built-in PresentMon based overlay layouts use new conditional layers functionality to display dynamic “Limited by CPU/GPU” bottleneck indicator. The indicator is based on boolean IsGpuLimited data source applied to PresentMon’s frametime and GPU busy streams and defined as (msGpuActive / msBetweenPresents) >= 0.75. In ideal GPU limited case this ratio should be as close to 1 as it is possible, but in reality there is always some CPU overhead so the threshold was reduced to 0.75 to take overhead into account. Please don’t forget that you can always edit built-in layout and increase the ratio inside IsGpuLimited data source’s formula, if you find the threshold too low
    o Now OverlayEditor supports environment variables for overlay layout. The variables can be changed in Layouts -> Edit -> Environment variables field. Currently environment variables are used to tune advanced properties of PresentMon data provider. Power users may also use environment variables during development of complex overlay layouts with hardware dependent conditional layers (e.g. sample.ovl, which is displaying Intel or AMD logo depending on CPU vendor). In such usage scenario you may use overlay environment variables to emulate different hardware and test your overlay look on it (e.g. set environment variables to "cpuvendor=0x1022;cpucount=8;gpuvendor=0x1002" to emulate a system with 8 thread AMD CPU and AMD/ATI GPU on a PC with Intel CPU and NVIDIA GPU)
    o Minimum refresh period for overlay layout is no longer limited by 100ms. Now you can decrease it down to 16ms. Please take a note that such low refresh period is intended to be used with PresentMon's data sources only. Use it with caution and avoid defining such low refresh period values for overlays using other types of sources, which poll hardware on each refresh period and may decrease system performance due to such high polling rate
    o Improved OverlayEditor’s data sources polling architecture. Now each data source can be polled asynchronously with its own refresh period. This feature is currently reserved for new PresentMon’s data sources only, which can be polled and updated with independent refresh period. New PM_RefreshPeriod environment variable defines asynchronous refresh period for all PresentMon’s data sources at once. If PM_RefreshPeriod is not defined or set to 0 in environment variables, PresentMon's data sources will be also polled synchronously with the rest of the data sources
    o Added power user oriented config file switch allowing using idle based rendering loop for OverlayEditor’s window instead of default timer based rendering loop
    · Improved HotkeyHandler plugin:
    o Added asynchronous keyboard status polling interface for interoperability with new OverlayEditor plugin
    · Bundled DesktopOverlayHost tool has been upgraded to v1.3.3:
    o DesktopOverlayHost is now compiled as UIAccess process, which allows it to be displayed on top of most of modern fullscreen applications similar to Xbox Game Bar gadgets. You may use it to display mirrored overlay copy on top of applications like Destiny 2 and CSGO, which do not allow traditional hook based overlay rendering. Please take a note that Microsoft is allowing UIAccess window to be displayed on top of normal windows (including fullscreen game applications) only when the process is installed in secure location (i.e. Program files and subfolders). So you won't be able to use UIAccess topmost rendering functionality if you install RivaTuner Statistics Server inside some custom location (e.g. D:\Applications\RTSS)
    o Added tiny DesktopOverlayHostLoader.exe companion process. UIAccess processes cannot be launched by Windows task scheduler, so companion loader process is necessary to start DesktopOverlayHost at Windows startup
    o Added power user oriented config file switch allowing enabling flip model for DesktopOverlayHost’s Direct3D11 rendering backend
    · ReShade compatibility related D3D1xDevicePriority setting has been reverted to select old ascending D3D1x device selection priority by default. So it is no longer necessary to change this setting to unlock overlay support in D3D10 applications
    · Slightly changed Vulkan layer to improve conformance to Vulkan specs
    · Added experimental support for "Beta : Use Unicode UTF-8 for global language support" option enabled in administrative regional OS settings. Now each localization description file contains additional "Codepage" field, defining runtime ANSI to UTF8 conversion rule for selected language pack
    · Seriously revamped German localization by Klaus Luppert
    · Added target process filtering support for debug logging system
    · Added On-Screen Display profile for The Texas Chainsaw Massacre and common technique aimed to improve stability in future applications using similar behaviors
    · Updated profiles list
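
For readers new to the conditional layer formulas in the list above, the sketch below mirrors a few of them as plain Python functions. It is not RTSS code and calls no RTSS API: the function and parameter names are hypothetical, the key_state argument merely stands in for whatever key(vkcode) returns, and None is used here to model an N/A value. Only the formulas themselves (the 80C threshold, vendor id 0x10de, the 0x100 benchmark bit, the 0x80000000 key-down bit, bit 0 of the press counter and validate(x) ? x : 0) come from the changes list.

```python
from typing import Optional


def is_gpu_overheating(gpu_temp_c: float) -> int:
    # Mirrors: "GPU1 temperature" >= 80
    return int(gpu_temp_c >= 80)


def is_gpu_fan_fail(gpu_temp_c: float, fan_rpm: float) -> int:
    # Mirrors: ("GPU1 temperature" >= 80) && ("GPU1 fan tachometer" == 0)
    return int(gpu_temp_c >= 80 and fan_rpm == 0)


def is_nvidia_gpu(gpuvendor: int) -> int:
    # Mirrors: gpuvendor == 0x10de
    return int(gpuvendor == 0x10DE)


def is_benchmark_active(rtssflags: int) -> int:
    # Mirrors: (rtssflags & 0x100) != 0
    return int((rtssflags & 0x100) != 0)


def is_key_down(key_state: int) -> int:
    # Mirrors: (key(0x41) & 0x80000000) != 0, i.e. the up/down state bit
    return int((key_state & 0x80000000) != 0)


def is_key_toggled(key_state: int) -> int:
    # Mirrors: (key(0x41) & 1) != 0, i.e. bit 0 of the key press counter
    return int((key_state & 1) != 0)


def value_or_zero(x: Optional[float]) -> float:
    # Mirrors: validate(x) ? x : 0 (None models an invalid / N/A reading)
    return x if x is not None else 0.0
```

Each function returns 1 or 0 like a boolean data source would, so the result could be bound to a layer's “Visibility source” in the same way the changes list describes.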
     
    Last edited: Sep 13, 2023

  5. Unwinder

    Unwinder Ancient Guru Staff Member

    Next beta of RTSS will also contain new power user oriented feature in OverlayEditor, which I added yesterday and which is already reflected in changes list:

    o Added power user oriented config file switch allowing using idle based rendering loop for OverlayEditor’s window instead of default timer based rendering loop

    It is not a functionality related improvement, it is more like educational feature aimed to help those who want to use OverlayEditor to learn behaviors of different framerate limiting modes (async vs front edge sync vs back edge sync) and framerate sampling modes (frame start vs frame presentation). If you use specific combination of framerate limiting and frametime calculation settings, you can be puzzled with rather untypical difference pattern observed in OverlayEditor's start-to-start (top) and present-to-present (bottom) frametime graphs. You may see something like that in OverlayEditor's own frametimes if you try to limit framerate to 60 FPS with async or back edge sync mode:

    upload_2023-9-12_7-1-56.png

    For engaged async or back edge sync framerate limiter modes you'd normally expect close to flat start-to-start frametimes and jittering present-to-present frametimes, but instead of that you see periodic jigsaw-styled spikes there. But it is an expected effect of the synchronous timer based render loop implementation with timer tick discarding support in OverlayEditor. Timer based render loops are untypical for games, but they are frequently used for mixed 2D/3D applications, such as editors, which combine a passively rendered 3D view with 2D GUI controls processed in the same thread. Such a render loop in OverlayEditor runs at a fixed 16ms tickrate (which results in 1000/16 ~= 63 FPS framerate) and may discard timer ticks taking too much time, to leave enough time for 2D GUI processing. An attempt to sync it to a different (but close) tickrate with the async/back edge sync limiter results in periodic timer tick discard events, and that's exactly what you're seeing on the bottom present-to-present frametime graph.
    Games and 3D only applications traditionally use different render loop implementation, idle based. Which means that rendering is not synchronized to fixed timer events and render loop just eats all idle time to render as many frames as it can. New power user oriented setting (RenderLoop=1), which I added to OverlayEditor.cfg, allows it to use similar idle based render loop implementation for editor, so you see much more expected result on present-to-present frametimes:

    upload_2023-9-12_7-21-16.png
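
The periodic tick discards described for the timer based loop follow from a simple counting argument: a 16ms timer offers 62.5 render opportunities per second, while a 60 FPS limiter lets at most 60 frames through. The sketch below does that arithmetic. It assumes an idealized model in which every rendered frame consumes exactly one timer tick; real OverlayEditor timing is certainly more nuanced, and the function name is hypothetical.

```python
def tick_discard_stats(tick_ms: float, limit_fps: float) -> dict:
    """Estimate how often a timer driven loop must skip a tick under a limiter.

    Idealized assumption: one rendered frame per consumed tick, so any ticks
    beyond the limiter's frame budget have to be discarded.
    """
    tick_rate = 1000.0 / tick_ms                      # ticks offered per second
    discards_per_sec = max(0.0, tick_rate - limit_fps)
    if discards_per_sec > 0:
        discard_period_ms = 1000.0 / discards_per_sec
    else:
        discard_period_ms = float("inf")              # limiter never forces a skip
    return {
        "tick_rate": tick_rate,
        "discards_per_sec": discards_per_sec,
        "discard_period_ms": discard_period_ms,
    }


stats = tick_discard_stats(tick_ms=16.0, limit_fps=60.0)
print(stats)
```

Under this model a 16ms timer limited to 60 FPS has to drop about 2.5 ticks per second, i.e. roughly one every 400ms, which is consistent with spikes that repeat at a regular interval rather than random jitter.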
     
    Last edited: Sep 12, 2023
  6. Unwinder

    Unwinder Ancient Guru Staff Member

    We've recompiled new build with updated PDF release notes included in distributive and small change in OverlayEditor, which I documented in the previous post. Also Intel recently upgraded console PresentMon to v1.9.2, so I updated bundled console version (no serious changes on that side too, they just fixed their exclusion list functionality, which I don't use at all). RTSS core didn't change, I just recompiled it to increment build number.

    RTSS 7.3.5 Beta 5 Build 27701 will be published in main Guru3D downloads page shortly.
     
  7. Unwinder

    Unwinder Ancient Guru Staff Member

  8. Klaus Luppert

    Klaus Luppert Active Member

    Messages:
    85
    Likes Received:
    24
    GPU:
    RTX 4090 FE
    Thanks. But my name is "Klaus Luppert". I did a complete new German translation for both RTSS and MSI AB. Please correct my name in the changelog. I want to be correctly mentioned, nothing more ;)

    This is wrong:
    • Seriously revamped German localization by Klaus Grosser
     
  9. Unwinder

    Unwinder Ancient Guru Staff Member

    My bad, sorry. I'll fix it so it will be corrected in the next and final builds. Now I'm really puzzled how it got transformed to Grosser in the release notes.
     
  10. Unwinder

    Unwinder Ancient Guru Staff Member

    Ah, I think I know the source of this typo. There are two Klaus users registered here at Guru3D and I seem to have selected the wrong one in the forum dropdown when trying to tag you with the @ symbol. :)

    I've fixed it in readme (so it will get into newer versions) and fixed your name in changes list provided in this thread.
    @Hilbert Hagedoorn, can we please fix guy's name in changes list provided in download page too?
     

  11. Unwinder

    Unwinder Ancient Guru Staff Member

    7.3.5 beta is now public and available to masses, so I think it is worth discussing some additional specifics related to it, which may confuse some users. If you take screenshot from one of my previous posts, some users may still fail to understand why it is normal (and even expected) to see nearly flat line on top frametime graph and jittering on the bottom one when framerate limiter is enabled:

    upload_2023-9-13_23-7-5.png

    As I explained in my post a few weeks ago, the top and bottom graphs reflect deltas between slightly different time points. To understand it better, let's look at the slide from Tom's presentation again:

    [​IMG]

    As you can see there, a typical game loop in Intel's slide is represented by 3 abstract time intervals: [Game], [Render] and [Wait]. The bottom graph displays deltas calculated between timings of <Present> points located between [Render] and [Wait] intervals of each frame (i.e. present-to-present delta). The top graph displays deltas calculated between timings of <Start> (or <Present return>, as they effectively match) points located between [Wait] and [Game] intervals of each frame (i.e. start-to-start delta). The length of [Game], [Render] and [Wait] is variable on each frame, so we get two slightly different representations of frametime (start-to-start vs present-to-present), which are close enough but never match because they are calculated on different time points.
    The difference becomes most visible when you enable the framerate limiter, and the difference pattern directly depends on the framerate limiting mode you use. For example, RTSS async and back edge sync modes align timings of <Start> points of each frame, trying to make the difference between them as close to the fixed target as possible. So you're seeing a close to flat line on start-to-start timings and jitter on present-to-present timings. If you switch the limiter to front edge sync mode, it will try to align timings of <Present> points of each frame instead of <Start>. So the difference pattern will be inverted: you'll start seeing jitter on top (start-to-start graph), and the bottom (present-to-present graph) will be nearly flat. So the new overlay layout is also a good toolset helping you to understand differences between different framerate limiting modes.
    If you want to read more about async/front edge sync/back edge sync modes - you may give this thread a read.
    And by the way, this slide will probably give you better understanding of RTSS framerate limiting modes naming logic. Front edge sync and back edge sync in this context of this slide apply to front edge and back edge of [Wait] interval. Front edge sync mode makes front edges of [Wait] intervals synchronous on each frame (i.e. it waits immediately before presenting a frame to synchronize front edges). Back edge sync mode makes back edges of [Wait] intervals synchronous on each frame (i.e. it waits immediately after presenting a frame to synchronize back edges).
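
The inverted jitter pattern between the two limiter families can be reproduced with a toy model. The sketch below is only an illustration under stated assumptions: the per-frame [Game]+[Render] and [Wait] durations are made-up numbers, back edge sync is modelled as pinning <Start> points to a 60 FPS grid, front edge sync as pinning <Present> points to the same grid, and all function names are hypothetical. It is not how RTSS implements its limiter.

```python
from typing import List, Tuple

TARGET_MS = 1000.0 / 60.0  # 60 FPS limiter target

# Hypothetical per-frame durations in milliseconds
work_ms = [6.0, 9.0, 7.0, 11.0, 8.0]  # [Game] + [Render]
wait_ms = [1.0, 3.0, 2.0, 4.0, 1.5]   # [Wait], from <Present> to <Present return>


def deltas(timestamps: List[float]) -> List[float]:
    # Frametime as the difference between consecutive time points
    return [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]


def spread(values: List[float]) -> float:
    # Peak-to-peak variation, used here as a crude jitter measure
    return max(values) - min(values)


def back_edge_sync(work: List[float]) -> Tuple[List[float], List[float]]:
    # <Start> points sit on the target grid; <Present> follows after variable work
    starts = [n * TARGET_MS for n in range(len(work))]
    presents = [s + w for s, w in zip(starts, work)]
    return starts, presents


def front_edge_sync(wait: List[float]) -> Tuple[List[float], List[float]]:
    # <Present> points sit on the target grid; the following frame's <Start>
    # happens when Present returns, after a variable [Wait]
    presents = [(n + 1) * TARGET_MS for n in range(len(wait))]
    starts = [p + w for p, w in zip(presents, wait)]
    return starts, presents


be_starts, be_presents = back_edge_sync(work_ms)
fe_starts, fe_presents = front_edge_sync(wait_ms)

print("back edge  start-to-start jitter:    ", round(spread(deltas(be_starts)), 3))
print("back edge  present-to-present jitter:", round(spread(deltas(be_presents)), 3))
print("front edge start-to-start jitter:    ", round(spread(deltas(fe_starts)), 3))
print("front edge present-to-present jitter:", round(spread(deltas(fe_presents)), 3))
```

In this model the aligned set of time points always produces a flat frametime line, and all per-frame variation ends up in the other set, which matches the top/bottom graph behavior described in the post.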
     
    Last edited: Sep 14, 2023
  12. Unwinder

    Unwinder Ancient Guru Staff Member

    The second thing worth mentioning is related to 0.75 threshold used in dynamic CPU/GPU bottleneck indicators. Some users believe that it should be set to something closer to 1. Here is useful post from different thread, explaining why lower threshold might be necessary (and mandatory). Anyway, you can always edit and increment it on your side if you find it necessary.
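
As a reminder of what the threshold controls, the indicator formula from the changes list, (msGpuActive / msBetweenPresents) >= 0.75, can be written as a small Python function. This is a sketch only: the function name is hypothetical, and the guard against a zero frametime is an added assumption that is not part of the documented formula.

```python
def is_gpu_limited(ms_gpu_active: float,
                   ms_between_presents: float,
                   threshold: float = 0.75) -> int:
    """Return 1 when GPU busy time covers at least `threshold` of the frametime."""
    if ms_between_presents <= 0:
        # Assumption: treat missing or zero frametime as "not GPU limited"
        return 0
    return int(ms_gpu_active / ms_between_presents >= threshold)


# Same frame, two thresholds: 14.0 / 16.7 is roughly 0.84
print(is_gpu_limited(14.0, 16.7))                  # default 0.75 threshold
print(is_gpu_limited(14.0, 16.7, threshold=0.9))   # stricter threshold
```

Raising the threshold makes the indicator flip to "Limited by CPU" on frames where the GPU is busy most of the time but CPU overhead still takes a visible share of the frame.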
     
  13. Unwinder

    Unwinder Ancient Guru Staff Member

    And finally, the third thing definitely worth mentioning is related to PresentMon's application input latency metric (represented by the msInputLatency counter available in PresentMon data provider). Please always keep in mind that it is an approximated value, so always interpret it with a grain of salt.
    This counter consists of two independent parts. The first one is a truly measured value, it is present-to-display latency (represented by the msUntilDisplayed counter). The second part is start-to-present latency. To measure it correctly it is necessary to know when exactly the game is sampling the input used to render this specific frame. But this data is not available to PresentMon, so start-to-present latency is approximated assuming that the game loop samples input immediately after presenting the previous frame. Which may or may not be the case. For example, if the async or back edge sync limiter is active, this approximated start-to-present latency will include extra time spent inside the framerate limiter, so the real latency will be much lower than the approximated one. And finally, the game may use a different input sampling approach (e.g. asynchronous), so such approximation will be just plain wrong.
    So once again, take msInputLatency with a grain of salt and always remember that it includes an approximated portion. If you need something more strict, use raw msUntilDisplayed and call it "Display latency". It will be less than actual application latency but it will contain no approximated chunk.
     
  14. Ichisich

    Ichisich Member

    Messages:
    10
    Likes Received:
    5
    GPU:
    AMD RX 580
    So when using a framerate-limiter the actual latency can be assumed to be between msUntilDisplayed and msInputLatency + hardware latency (mouse+monitor+...)?
     
  15. Unwinder

    Unwinder Ancient Guru Staff Member

    Not exactly. You can only reliably assume frame_start_to_display latency, which is a part of actual latency.
    For front edge sync framerate limiter mode, scanline sync and similar third party framerate limiters (i.e. framerate limiters which wait in Present() hook _before_ presenting the frame) actual frame_start_to_display latency should be close to what you see in msInputLatency.
    For async, back edge sync and similar third party framerate limiters (i.e. framerate limiters which wait in Present() hook _after_ presenting the frame) actual frame_start_to_display latency is much lower (up to 1 frame lower) than msInputLatency and can be assumed to be between msUntilDisplayed and msInputLatency.
    In both cases, actual latency should additionally include input_to_frame_start and hw_latency. You cannot reliably guess what input_to_frame_start is because it is specific to input sampling implementation inside each application.
    It will be close to 0 if application samples input immediately after submitting the previous frame, but application's input sampling approach may be different.
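    The two cases above can be summarized in a small sketch. The function and parameter names are hypothetical, and the result covers frame_start_to_display only; input_to_frame_start and hw_latency still have to be added on top and cannot be derived from these counters.

```python
def frame_start_to_display_range(ms_until_displayed, ms_input_latency,
                                 limiter_waits_before_present):
    """Return (low, high) bounds in ms for frame_start_to_display latency.

    limiter_waits_before_present=True models front edge sync, scanline
    sync and similar limiters; False models async, back edge sync and
    similar limiters that wait after presenting the frame.
    """
    if limiter_waits_before_present:
        # Expected to be close to msInputLatency
        return (ms_input_latency, ms_input_latency)
    # Expected somewhere between display latency and msInputLatency
    return (ms_until_displayed, ms_input_latency)


print(frame_start_to_display_range(5.4, 21.7, limiter_waits_before_present=False))
print(frame_start_to_display_range(5.4, 21.7, limiter_waits_before_present=True))
```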
     
    Last edited: Sep 19, 2023
    SanokKule and Ichisich like this.

  16. Unwinder

    Unwinder Ancient Guru Staff Member

    Messages:
    17,250
    Likes Received:
    7,018
    Some users requested this tiny feature after I integrated asynchronous PresentMon data sources in the previous beta. Now you can use the layer's refresh period setting to slow down layer updates. Initially this setting was intended only for speeding up layer updates when you were rendering timer driven sprite animations there (the approach was demonstrated in detail in one of my previous videos). This short video summarizes an alternative usage strategy for this setting.



    Also, when recording this video I noticed that RTSS desktop videocapture module captured some desktop cursor shapes (e.g. cursor displayed when you hover mouse over edit boxes) improperly, so I fixed that too.
     
  17. Unwinder

    Unwinder Ancient Guru Staff Member

    Messages:
    17,250
    Likes Received:
    7,018
    Useful addition to my previous post. This screenshot, captured in Forza Horizon 5, applies to the part of my quote marked in bold. It is a case of an enabled async limiter (which can be confirmed by the ideally flat start-to-start times on the top frametime graph and the jittering present-to-present times on the bottom frametime graph), so frame_start_to_display latency is expected to be much lower than PresentMon's application latency approximation. You can roughly estimate the error in application latency by enabling RTSS's own performance profiler panel. It is my own debug panel, which allows me to track the efficiency of the overlay implementation and see how much CPU and GPU time overlay rendering takes at each stage, to keep it as efficient as possible. This performance profiler panel also includes a "CPU wait" counter, which tells how much time RTSS waits inside the framerate limiter, and we can use this value to estimate the application latency reporting error. Our points of interest are marked with red boxes. On this screenshot PresentMon estimates app latency as 21.7ms. As I mentioned before, in reality for such a framerate limiting mode it also includes the framerate limiter's wait time, so the real latency is lower and we can expect it to be between PresentMon's display latency (msUntilDisplayed = 5.4ms) and application latency (21.7ms). As you can see in the RTSS performance profiler panel, it spends about 11ms per frame in the framerate limiter's wait loop, so the approximated application latency should be about 11ms lower (21.7 - 11 = 10.7ms). Such an estimation is rather rough because, as you probably remember, PresentMon's data is reported with a 3000ms lag, while the RTSS performance counters apply to the current frame, but at least it allows you to estimate the error level.
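    The same arithmetic as a tiny sketch, using the numbers from the screenshot. The function name is hypothetical, and clamping the result to the display latency is an extra assumption added here (the corrected value should not fall below msUntilDisplayed); it is not something RTSS or PresentMon does.

```python
def corrected_app_latency_ms(app_latency_ms, limiter_wait_ms, display_latency_ms):
    """Rough correction of the approximated application latency.

    Subtracts the time spent waiting inside the framerate limiter
    (the "CPU wait" counter) from the approximated application latency.
    """
    corrected = app_latency_ms - limiter_wait_ms
    # Assumption: never report less than the measured display latency
    return max(corrected, display_latency_ms)


estimate = corrected_app_latency_ms(app_latency_ms=21.7,
                                    limiter_wait_ms=11.0,
                                    display_latency_ms=5.4)
print(f"{estimate:.1f} ms")  # about 10.7 ms for the values above
```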
    Also, it is rather interesting to see the two values marked with green boxes. The top one is Intel's brand new GPU busy, the bottom one is... exactly the same "GPU busy", the part of it related to the RTSS overlay renderer, and it has been available for profiling for decades. Some reviewers like GN are raving about GPU busy and present it as something revolutionary. But that is not true. While "GPU busy" is indeed new for end users and reviewers, every single 3D developer could easily track GPU render times with Direct3D queries and similar mechanisms, which have existed in every 3D API since Direct3D8 times.

    upload_2023-9-20_8-40-37.png
     
    SanokKule, Ichisich, Undying and 4 others like this.
  18. hitzz

    hitzz Member

    Messages:
    13
    Likes Received:
    1
    GPU:
    RTX 3090
    Hi, I have a few questions regarding GPU busy and frametime statistics; maybe you could shed some light here. Recently I've been testing the new AMD frame generation, and when comparing it off and on, another user and I noticed a few things.
    Here is the screenshot of "OFF": https://imgur.com/a/cNRyaRw
    You can see "GPU Busy" in green and "FT" (the frametime metric) in red at the bottom of the overlay. Sometimes, when both values are very close to each other, the GPU Busy value can be higher than frametime; in this case FT was 17ms and GPU Busy was 19ms. Could you explain the reason for that?

    The second behavior I noticed while testing AMD frame generation is not exclusive to this feature; it can happen in other games, such as NFS Most Wanted 2005. When frametime is very erratic, the displayed value is sometimes incorrect, as you can see in this screenshot: https://imgur.com/a/zd6s7Lu
    In this example even the RTSS default frametime says 0.5ms, which is not possible at that framerate. "FT" and "GPU Busy" also seem to have a hard time displaying real values, although sometimes, for a split second, I can see the correct value, which is around 20ms.
    Hopefully I explained it well enough, and hopefully you can point me in the right direction on these questions.
    Thanks :)

    I can also provide video footage if needed.
     
  19. Unwinder

    Unwinder Ancient Guru Staff Member

    Messages:
    17,250
    Likes Received:
    7,018
    Sorry, I cannot help with that. Both games with AMD's frame generation are restricted here due to sanctions, even the free demo of Forspoken. So I cannot peek inside to investigate anything, and if game developers don't care, I absolutely don't either.

    It is easily possible if AMD presents the generated frame almost immediately after the real one. But I can only guess without seeing the internals myself.
     
    SanokKule and hitzz like this.
  20. Unwinder

    Unwinder Ancient Guru Staff Member

    Messages:
    17,250
    Likes Received:
    7,018
    The GPU asynchronously processes frames submitted by the CPU, so you cannot expect GPU busy to always be lower than frametime. It is an absolutely realistic case for the GPU to still be processing the previous frame when the CPU is submitting a new one. It is normal for GPU busy to be higher than CPU frametime sometimes.
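    A toy timeline illustrates this. It is a deliberately simplified model with made-up numbers, not a description of any real driver: it assumes the CPU never blocks on the GPU and that the GPU handles frames strictly in submission order.

```python
def simulate(cpu_frametimes_ms, gpu_busy_ms):
    """Return (cpu_frametime, gpu_busy, gpu_start, gpu_done) per frame."""
    submit = 0.0    # time at which the CPU submits the current frame
    gpu_free = 0.0  # time at which the GPU finishes its previous frame
    rows = []
    for cpu_ft, busy in zip(cpu_frametimes_ms, gpu_busy_ms):
        submit += cpu_ft
        # GPU starts once the frame is submitted AND the GPU is idle
        gpu_start = max(submit, gpu_free)
        gpu_free = gpu_start + busy
        rows.append((cpu_ft, busy, gpu_start, gpu_free))
    return rows


# CPU submits a frame every 17 ms; the second frame needs 19 ms of GPU time
for row in simulate([17, 17, 17], [15, 19, 14]):
    print(row)
```

    In this run the second frame has a GPU busy time of 19ms against a 17ms CPU frametime, and the GPU is still working on it when the CPU submits the third frame, so the third frame starts on the GPU 2ms after its submission.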
     
    SanokKule and hitzz like this.
