PCI Bus Usage sensor inconsistent?

Discussion in 'MSI AfterBurner Application Development Forum' started by Digika, Jan 23, 2022.

  1. Digika

    Digika Member Guru

    Messages:
    149
    Likes Received:
    1
    GPU:
    GTX 670
    MSIAb 4.6.4 b4, RTSS 7.3.2 b5, Win10 20H2/GTX 1070

    After extensive testing, I'm not sure the "PCI Bus Usage" sensor in RTSS works properly. This is my usage all the time, no matter what game I run:
    https://i.imgur.com/OztmRVu.png

    This happens both with the global profile and with a per-game profile. The only time it works (as in: goes to 92%) is when I run the synthetic PCMark test. The Hairy Donut also does not trigger any changes in the sensor.

    Am I reading it wrong? To me "PCI Bus" implies the PCIe bus used by the GPU. Maybe I'm wrong in my assumption and it is an approximation of the total system PCIe lane bandwidth? But even then, random 1-3% spikes in games don't make sense.
     
  2. Astyanax

    Astyanax Ancient Guru

    Messages:
    16,998
    Likes Received:
    7,340
    GPU:
    GTX 1080ti
    Unless you have a recent graphics card and platform, all PCIe transactions are 256MB max.
     
  3. Digika

    "Recent" is a fairly loose definition in this context. How recent are we talking about? The platform is from 2019, if you mean the CPU/motherboard combo. The GPU is a GTX 1070.
     
  4. Unwinder

    Unwinder Ancient Guru Staff Member

    Messages:
    17,083
    Likes Received:
    6,567
    It works properly and makes perfect sense if you have a basic understanding of a typical 3D game workflow. "Hairy Donut" is designed to be bottlenecked by the GPU; it transfers close to no data from RAM to VRAM in realtime. The same applies to any properly designed 3D application: realtime RAM->VRAM or VRAM->RAM transfers (which are what you see on the GPU bus usage graph) are rather slow, and any developer in their right mind avoids them like the plague. Huge transfers should take place during the initialization or load stage only, not in realtime, so it is normal and expected to see this performance counter close to static and rather low in realtime, with peaks during loading only. You will only see it high if you run artificial tests designed to transfer a lot of data on purpose. You can also see it increase when you capture video (which results in transferring additional data from VRAM to RAM).
     

  5. Digika

    Yes, I understand all of that, however:
    This never happens. I've enabled logging, tested multiple games in a row, gone through the log and seen that it just fluctuates between 1% and 3% absolutely randomly (it does this at system idle as well). Hence the question.

    Oh, good point, didn't think to test that, will do now.
     
  6. Digika

    Recording with NVENC H.264 barely does anything; it is still about 1-3%.
    I've tested multiple modern games and tried some project demos in Unreal Editor and Unity, adapting them to load new stuff constantly. Still no effect on the PCI Bus Usage sensor.
     
  7. Unwinder

    It is not enough just to try to record something. Record and test wisely. Use software encoding to ensure that framebuffer data is transferred from VRAM to RAM for encoding. Use as high a target video resolution as possible. Use as high a target video framerate as possible. Otherwise you transfer no data, or something pretty minor: for example, recording low-res 1080p video at a low 30 FPS results in just ~8MB * 30 = 240MB per second of bus transfers.
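As a rough check of the 1080p arithmetic above, the readback traffic for an uncompressed capture can be estimated as below. This is only a sketch: the function names are hypothetical, and it assumes every captured frame is copied from VRAM to RAM as 4 bytes per pixel, which a real capture path may not do.

```cpp
#include <cstdint>

// Bytes needed for one uncompressed frame.
// ASSUMPTION: 4 bytes per pixel (32-bit color), as in the ~8MB per 1080p frame figure.
constexpr std::uint64_t frame_bytes(std::uint64_t width, std::uint64_t height,
                                    std::uint64_t bytes_per_pixel = 4) {
    return width * height * bytes_per_pixel;
}

// Bytes per second copied VRAM -> RAM if every captured frame is read back.
constexpr std::uint64_t readback_bytes_per_second(std::uint64_t width,
                                                  std::uint64_t height,
                                                  std::uint64_t fps,
                                                  std::uint64_t bytes_per_pixel = 4) {
    return frame_bytes(width, height, bytes_per_pixel) * fps;
}

// 1920x1080 at 30 FPS: 8,294,400 bytes per frame, 248,832,000 bytes per second.
static_assert(frame_bytes(1920, 1080) == 8294400, "1080p frame size");
static_assert(readback_bytes_per_second(1920, 1080, 30) == 248832000,
              "1080p30 readback rate");
```

Under the same assumptions, 3840x2160 at 60 FPS comes to roughly 2 GB per second, which is why a higher capture resolution and framerate should make the counter move more visibly.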
    The best way to test it is to write your own simple 3D application that transfers a known amount of data via the bus. If that is too complex for you, you can easily modify the RTSS OverlayEditor sources and recompile them to do that. Just declare two global D3D9 surface pointers in OverlayEditorWnd.cpp, right before void COverlayEditorWnd::Render(BOOL bUpdateLayerRects):

    IDirect3DSurface9* g_lpVidMemSurface = NULL;
    IDirect3DSurface9* g_lpSysMemSurface = NULL;

    Then add the following to the COverlayEditorWnd::Render function body, right after the m_lpd3dDevice->Clear(0, NULL, D3DCLEAR_TARGET|D3DCLEAR_ZBUFFER, m_dwBgndColor, 1.0f, 0) line:

    // Create the VRAM and system memory surfaces if they do not exist yet
    if (!g_lpVidMemSurface)
        m_lpd3dDevice->CreateRenderTarget(4096, 4096, D3DFMT_A8R8G8B8, D3DMULTISAMPLE_NONE, 0, FALSE, &g_lpVidMemSurface, NULL);
    if (!g_lpSysMemSurface)
        m_lpd3dDevice->CreateOffscreenPlainSurface(4096, 4096, D3DFMT_A8R8G8B8, D3DPOOL_SYSTEMMEM, &g_lpSysMemSurface, NULL);
    // Copy the system memory surface to the VRAM surface on every rendered frame
    if (g_lpVidMemSurface && g_lpSysMemSurface)
        m_lpd3dDevice->UpdateSurface(g_lpSysMemSurface, NULL, g_lpVidMemSurface, NULL);

    This creates two 4096*4096 surfaces (64MB each), one in VRAM and one in system memory, and performs a DMA copy from the system memory surface to the VRAM surface on each frame while the RTSS OverlayEditor window is active. Considering that the editor window runs at approximately 63 FPS by default (rendering is driven by a 16 ms timer), this results in transferring 64MB * 63 ~= 4GB per second via the bus. That is immediately and correctly reflected by the bus usage performance counter, which jumps to the 3x% range here.
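The surface size and transfer rate quoted above can be reproduced with a few lines of integer arithmetic. The sketch below uses hypothetical helper names. The PCIe 3.0 link rate is an assumption that does not come from the thread: the thread states neither the link configuration of the test machine nor how the counter is computed, so the percentage is only a ballpark and need not match the reported 3x%.

```cpp
#include <cstdint>

// Size in bytes of a width x height surface; D3DFMT_A8R8G8B8 is 4 bytes per pixel.
constexpr std::uint64_t surface_bytes(std::uint64_t width, std::uint64_t height,
                                      std::uint64_t bytes_per_pixel = 4) {
    return width * height * bytes_per_pixel;
}

// Bytes per second when one surface is copied on every tick of a periodic timer.
constexpr std::uint64_t copy_bytes_per_second(std::uint64_t bytes_per_copy,
                                              std::uint64_t timer_period_ms) {
    return bytes_per_copy * 1000 / timer_period_ms;
}

// ASSUMPTION: one-direction rate of a PCIe 3.0 link after 128b/130b line
// encoding (8 GT/s per lane), before any packet overhead.
constexpr std::uint64_t pcie3_bytes_per_second(std::uint64_t lanes) {
    return 8000000000ULL * 128 / 130 / 8 * lanes;
}

// Whole-number percentage of a link consumed by a given transfer rate.
constexpr std::uint64_t link_usage_percent(std::uint64_t bytes_per_second,
                                           std::uint64_t link_bytes_per_second) {
    return bytes_per_second * 100 / link_bytes_per_second;
}

// 4096 * 4096 * 4 bytes = 64 MiB per surface.
static_assert(surface_bytes(4096, 4096) == 64ULL * 1024 * 1024, "64 MiB surface");
// One copy every 16 ms (62.5 copies per second) is about 4.19e9 bytes per second.
static_assert(copy_bytes_per_second(surface_bytes(4096, 4096), 16) == 4194304000ULL,
              "about 4 GB per second");
```

Calling `link_usage_percent(copy_bytes_per_second(surface_bytes(4096, 4096), 16), pcie3_bytes_per_second(16))` estimates the share of an assumed PCIe 3.0 x16 link taken by this test. Passing a different lane count shows how strongly the percentage depends on the link width.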
     
  8. Digika

    This is super useful and handy, thanks!
     
