Understanding memory bandwidth.

Discussion in 'Videocards - AMD Radeon' started by guppysb, Apr 16, 2015.

  1. guppysb

    guppysb Active Member

    Messages:
    59
    Likes Received:
    3
    GPU:
    2x 290x, Qnix 2710 100hz
    Hey Guys,

    I currently have an R9 290X with 4GB. I am looking to buy a second R9 290X next month and go the CrossFire route for Witcher 3 on my 1440p monitor (I am open to alternative suggestions as well).

    Reading this article, it appears that PCI-Express 3.0 x16 can handle 32 GBps bi-directional.
    http://www.guru3d.com/articles_pages/pci_express_scaling_game_performance_analysis_review,2.html

    However, the R9 290X has a memory bandwidth of 320 GBps. How is it that the 32 GBps PCI-Express 3.0 x16 link isn't a huge bottleneck for the R9 290X's 320 GBps memory bandwidth?

    In the scaling article linked above, there is almost no difference between x8 and x16 PCI-Express 3.0 links for the GPU. I have to be missing something here.
     
  2. Fox2232

    Fox2232 Guest

    Messages:
    11,808
    Likes Received:
    3,371
    GPU:
    6900XT+AW@240Hz
    1st) PCIe 3.0 x16 has circa 16GB/s and PCIe 4.0 x16 has circa 32GB/s in one direction. The 32GB/s figure quoted for PCIe 3.0 x16 counts both directions together.
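
    For anyone who wants to check those per-direction numbers, here is a rough sketch. It assumes the published per-lane rates (8 GT/s for PCIe 3.0, 16 GT/s for PCIe 4.0) and 128b/130b line encoding, and it ignores packet/protocol overhead, so real throughput will be somewhat lower. The function name is just made up for illustration.

```python
# Per-direction PCIe bandwidth from the per-lane transfer rate and line encoding.
# Assumption: published PCIe 3.0/4.0 rates; packet/protocol overhead is ignored.

def pcie_gb_per_s(gigatransfers_per_s: float, lanes: int) -> float:
    """Approximate usable GB/s in one direction for a 128b/130b-encoded link."""
    payload_fraction = 128 / 130  # 128 data bits for every 130 bits on the wire
    bits_per_s = gigatransfers_per_s * 1e9 * payload_fraction * lanes
    return bits_per_s / 8 / 1e9   # bits -> bytes -> GB

gen3_x16 = pcie_gb_per_s(8.0, 16)   # about 15.75 GB/s
gen3_x8 = pcie_gb_per_s(8.0, 8)     # about 7.88 GB/s
gen4_x16 = pcie_gb_per_s(16.0, 16)  # about 31.51 GB/s

print(f"PCIe 3.0 x16: {gen3_x16:.2f} GB/s per direction")
print(f"PCIe 3.0 x8:  {gen3_x8:.2f} GB/s per direction")
print(f"PCIe 4.0 x16: {gen4_x16:.2f} GB/s per direction")
```

    Doubling the PCIe 3.0 x16 result gives the roughly 32 GB/s "bi-directional" figure from the article.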

    Then there is the way the PC uses each interface (PCIe vs GDDR5). When you load a level, the game's textures go to system memory (pre-caching), and the set that should be used for rendering then goes to graphics memory (also pre-caching).
    This move from system memory to graphics memory happens only once, unless the data is dropped in favor of other data and is needed again.

    Example:
    A game loads 2GB of data, which takes 1/8th of a second to get to graphics memory at 16GB/s. After that, the additional pieces of data that are pre-cached as needed are quite small, like 40MB, which takes about 0.0024s to load. That is roughly 1/410th of a second.
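
    The same arithmetic as a small sketch, assuming a flat 16 GB/s link with no latency or overhead and 1 GB = 1024 MB. The 60 fps frame budget is my own addition for comparison, not something measured.

```python
# Time to push data over PCIe 3.0 x16, using the round 16 GB/s figure.
# Assumptions: constant throughput, no latency, 1 GB = 1024 MB.

LINK_GB_PER_S = 16.0

def transfer_seconds(size_mb: float, link_gb_per_s: float = LINK_GB_PER_S) -> float:
    """Seconds needed to move size_mb megabytes over the link."""
    return (size_mb / 1024) / link_gb_per_s

level_load = transfer_seconds(2 * 1024)  # 2 GB at level load
streamed = transfer_seconds(40)          # 40 MB streamed in during play
frame_budget = 1 / 60                    # one frame at 60 fps (illustrative)

print(f"2 GB level load: {level_load:.3f} s")
print(f"40 MB streamed:  {streamed * 1000:.2f} ms")
print(f"Share of a 60 fps frame: {streamed / frame_budget:.1%}")
```

    So a 40 MB top-up fits well inside a single frame even at 60 fps, which is why the PCIe link rarely shows up as the bottleneck.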

    Loading into system memory is itself another pre-caching step, since SSDs today deliver up to 540MB/s via the SATA III interface, while standard HDDs only do around 120-160MB/s.

    As you can see it all boils down to smart logistics and knowledge of what will be needed.
    The biggest slowdown/stutter caused by a lack of resources (bad pre-caching) comes from the HDD, and its effects can be reduced by using faster drives.

    And yes, there are games which stutter or drop frame rate if you turn around too fast, because a lot of high resolution textures have to be loaded into VRAM from system memory even if you have free space. That is not a fault of the graphics card; it comes down to how the game was programmed.

    Getting a 2nd card, which will slow down the transfer rate (each card typically drops from x16 to x8), will increase the effect of this "turn around issue".
    Lowering texture resolution a bit will decrease its effect.
     
  3. guppysb

    guppysb Active Member

    Messages:
    59
    Likes Received:
    3
    GPU:
    2x 290x, Qnix 2710 100hz
    Thanks for the response. Is it true that the second card doesn't increase overall vmem either? Meaning 2x 4GB GPUs can still only load 4GB worth of textures into vmem? From what I understand, the second GPU is mostly used for calculations to increase the overall fps.
     
  4. red6joker

    red6joker Guest

    Messages:
    572
    Likes Received:
    0
    GPU:
    MSI GTX 980TI Gaming
    Yes, that is how it is right now. When Windows 10 and DX12 come out, that is "supposed" to be changing. I do not think it's going to happen at the launch of Win10, but I could be wrong.
     

  5. MacT

    MacT Member Guru

    Messages:
    184
    Likes Received:
    0
    GPU:
    2 x Sapphire HD 7970 OC
    PCIe lane throughput and GPU memory bandwidth are completely separate things. They don't interact as such under normal circumstances.

    PCIe lanes handle communication between the Northbridge and any PCIe cards installed, like the GPU (and, with some later model GPUs in CrossFire, also GPU to GPU communication).

    GPU memory bandwidth relates to the data transfer speeds between the GPU and its dedicated memory.

    So information is supplied to and received from the GPU over PCIe at up to 32 GBps bi-directional (about 16 GBps each way), which is more than enough. The GPU does its processing, and it definitely needs that 320 GBps transfer rate between it and its own dedicated VRAM to keep the image processing fed.
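
    To put the two numbers side by side, here is a quick sketch. It assumes the R9 290X's published memory specs (512-bit bus, GDDR5 at 5 Gbps effective per pin) and the round 16 GB/s per-direction PCIe 3.0 x16 figure. The per-frame values are theoretical ceilings at 60 fps, not measurements.

```python
# Where the 320 GB/s VRAM figure comes from, and how it compares to PCIe.
# Assumptions: published R9 290X memory specs; 16 GB/s PCIe 3.0 x16 per direction.

BUS_WIDTH_BITS = 512          # memory bus width
DATA_RATE_GBIT_PER_PIN = 5.0  # effective GDDR5 data rate per pin
PCIE_GB_PER_S = 16.0          # PCIe 3.0 x16, one direction (rounded)
FPS = 60                      # illustrative target frame rate

vram_gb_per_s = BUS_WIDTH_BITS * DATA_RATE_GBIT_PER_PIN / 8  # bits -> bytes
ratio = vram_gb_per_s / PCIE_GB_PER_S

vram_gb_per_frame = vram_gb_per_s / FPS  # ceiling on VRAM traffic per frame
pcie_gb_per_frame = PCIE_GB_PER_S / FPS  # ceiling on PCIe traffic per frame

print(f"VRAM bandwidth: {vram_gb_per_s:.0f} GB/s ({ratio:.0f}x the PCIe link)")
print(f"Per frame at {FPS} fps: {vram_gb_per_frame:.2f} GB VRAM vs {pcie_gb_per_frame:.2f} GB PCIe")
```

    The GPU re-reads textures and buffers from VRAM many times per frame, so it leans on the fast local bus, while PCIe only has to carry what is new.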

    Usually with CrossFire/SLI, the rendering method used is AFR (alternate frame rendering). Each video card renders the frame that follows the one the other GPU has just submitted. So the GPUs pretty much hold near identical 'mirrored' information in their VRAM, which lets each one render the next frame in the alternating process.
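
    A toy model of that, just to show why 2x 4GB still behaves like 4GB under AFR. The function names are made up for illustration and this is not how any driver actually schedules work.

```python
# Toy model of AFR: frames alternate between GPUs, and every GPU keeps
# its own full copy of the assets. Hypothetical helper names throughout.

def gpu_for_frame(frame_index: int, gpu_count: int) -> int:
    """Under AFR, frame N goes to GPU N modulo the number of GPUs."""
    return frame_index % gpu_count

def usable_vram_gb(per_card_gb: list) -> float:
    """With mirrored assets the usable pool is one card's VRAM
    (the smallest card), not the sum over all cards."""
    return min(per_card_gb)

cards = [4.0, 4.0]  # two 4GB R9 290X cards
schedule = [gpu_for_frame(i, len(cards)) for i in range(6)]

print("Frame -> GPU:", schedule)
print("Usable VRAM:", usable_vram_gb(cards), "GB")
```

    The frame rate can go up because the work alternates between the cards, but the texture budget stays at what a single card can hold.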
     
