NVIDIA announces RTX IO, GPU to Directly Access SSD

Discussion in 'Frontpage news' started by Hilbert Hagedoorn, Sep 2, 2020.

  1. wavetrex

    wavetrex Maha Guru

    Messages:
    1,416
    Likes Received:
    1,050
    GPU:
    Zotac GTX1080 AMP!
    PCI to PCI device communication has been a thing since ages ago (as old as Pentium 1)

    All PCI (and by extension PCIe) devices have virtual memory mapping. Especially on x64 systems, the hardware can be mapped to an address outside the system RAM address range, and then Device 1 can just grab data from Device 2, if both support DMA (which they do).
    The CPU only tells the PCI devices where to look and what they are allowed to access. (But that shouldn't be an issue, since both the NVMe driver and the GPU driver are kernel drivers.)

    Nvidia "simply" made a connection between the two, and Microsoft is probably involved as well so I'm guessing an update to the OS will be needed in order to permit the two drivers to talk to each other.
     
  2. schmidtbag

    schmidtbag Ancient Guru

    Messages:
    5,816
    Likes Received:
    2,241
    GPU:
    HIS R9 290
    I'm not surprised - storage is hardly a bottleneck in games nowadays. I'm still using SATA because I know most games barely load faster with NVMe. It's everything that comes after storage (decompression, transferring over PCIe, dropping into VRAM, etc) that slows things down. As the article mentioned, you could speed things up by taking out some of this overhead. For games that don't have official support, my "prefetch" idea (I meant to say prefetch, not paging file) with already decompressed data could make a measurable performance improvement.

    Storage in theory should still be the bottleneck. But if you eliminate the very long and complicated path that game data takes to reach its destination, it will likely become the bottleneck in practice. That isn't such a bad thing either - you want the slowest part in the system to be running at 100% load. The fact that it isn't is a problem. DirectStorage (DS) can help alleviate that problem.
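
    The bottleneck argument above can be put in numbers. This is only a minimal sketch, assuming a simple serial pipeline whose sustained throughput is set by its slowest stage. The stage names and GB/s figures are made up for illustration and are not measurements.

```python
def bottleneck(stages):
    """Return (name, rate) of the slowest stage in a serial pipeline."""
    name = min(stages, key=stages.get)
    return name, stages[name]


def utilization(stages):
    """Fraction of each stage's capacity in use when the whole
    pipeline runs at the rate of its slowest stage."""
    _, rate = bottleneck(stages)
    return {name: rate / cap for name, cap in stages.items()}


# Hypothetical stage rates in GB/s (assumptions, not benchmarks).
traditional = {"nvme_read": 3.5, "cpu_decompress": 1.0, "copy_to_vram": 12.0}
direct = {"nvme_read": 3.5, "gpu_decompress": 14.0, "copy_to_vram": 12.0}

for label, stages in (("traditional", traditional), ("direct", direct)):
    name, rate = bottleneck(stages)
    busy = utilization(stages)["nvme_read"]
    print(f"{label}: limited by {name} at {rate} GB/s, SSD {busy:.0%} busy")
```

    With these made-up figures, the SSD sits mostly idle on the traditional path and only becomes the limiting (fully loaded) part once the slow middle stage is removed.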
     
  3. pharma

    pharma Ancient Guru

    Messages:
    1,687
    Likes Received:
    493
    GPU:
    Asus Strix GTX 1080
    DirectStorage is coming to PC
    Sept 1, 2020

    https://devblogs.microsoft.com/directx/directstorage-is-coming-to-pc/
     
  4. Cplifj

    Cplifj Active Member

    Messages:
    87
    Likes Received:
    22
    GPU:
    290X
    Did Nvidia just copy or license AMD's HBCC technology?
     

  5. Denial

    Denial Ancient Guru

    Messages:
    13,323
    Likes Received:
    2,823
    GPU:
    EVGA RTX 3080
    It's not the same. GPUDirect completely bypasses system memory, allowing the GPU to pull from the hard disk. HBCC uses system memory as a VRAM cache and intelligently pulls from it. They're two completely different technologies. Also, DirectStorage is made by Microsoft, not AMD/Nvidia.

    As for everyone else talking about how they copied AMD, Nvidia announced GPUDirect as part of its Magnum IO API stack back in November last year.
     
    Last edited: Sep 2, 2020
    semantics, PrMinisterGR and pharma like this.
  6. tsunami231

    tsunami231 Ancient Guru

    Messages:
    11,606
    Likes Received:
    816
    GPU:
    EVGA 1070Ti Black
    So is this going to be the answer to the faster loading on consoles (PS5/Xbox)? Is this all built into the drivers and Windows, or is it "extra" software that needs to be installed, like say "DirectX"?

    And seeing as it involves DX12, do I need a newer version of Windows? I'm still on 1907 here. Is this going to be a universal thing, meaning old games will support it? Or will each game have to be patched to support it, seeing as it involves DX12? What about DX9/10/11 games? Yes, games are still using DX9 to this day, DX10 to a lesser degree, and DX11 more than the other two, as far as I can tell.
     
    Last edited: Sep 2, 2020
  7. Cplifj

    Cplifj Active Member

    Messages:
    87
    Likes Received:
    22
    GPU:
    290X
    Radeon Pro SSG did something similar, using an SSD of up to 1TB for storage via its own M.2 slot. I'd call that similar to this Nvidia tech, just slightly different since Nvidia uses the system SSD.
     
    PrMinisterGR likes this.
  8. Denial

    Denial Ancient Guru

    Messages:
    13,323
    Likes Received:
    2,823
    GPU:
    EVGA RTX 3080
    The SSG with the 1TB storage was kind of similar - it had a PCI-E switch on it and essentially bypassed the CPU to write data directly to the SSD - but it's not like Windows could see that drive or you could install games to it.
     
  9. richto

    richto Active Member

    Messages:
    80
    Likes Received:
    2
    GPU:
    2 x 7900GX2 GTX DUOs in Quad SLi
    Just to note that the Xbox Series X has similar nvme4 accelerated decompression. It's not just on the PS5.
     
    Undying likes this.
  10. user1

    user1 Ancient Guru

    Messages:
    1,636
    Likes Received:
    554
    GPU:
    hd 6870
    The HBCC can pull directly from any storage device according to some early slides; it specifically can use any storage as a cache (including things like network storage). The HBCC is aware of the different available memory pools and uses a tiered-storage-like solution presented as VRAM. The whitepaper doesn't detail using anything other than NVRAM or RAM, so maybe it was cancelled or subject to some erratum.
    [IMG]

    It's not the same as RTX IO/DirectStorage, though maybe AMD can implement support for DirectStorage in the same or a similar way.
     

  11. Denial

    Denial Ancient Guru

    Messages:
    13,323
    Likes Received:
    2,823
    GPU:
    EVGA RTX 3080

    HBCC creates what AMD calls an HBC (High Bandwidth Cache), which resides in both VRAM/SDRAM in a tiered hierarchy, with VRAM as the last level cache. If the GPU requires an asset that's outside of this cache, the controller can request the CPU to fetch it and pull it into the HBC, then the GPU can utilize it.

    So while it can request data from any location, the data is moved into the HBC first and it's all done by the CPU. It's really not that much different than how GPUs worked prior to HBCC, but HBCC creates storage tiers and manages pages/swaps/etc for the developer.

    https://www.reddit.com/r/Amd/comments/7x552w/exploring_vega_hbcc_and_its_effect_on_the_system/

    This post does a good job investigating the effects of HBCC on the CPU.

    _

    GPUDirect Storage, on the other hand, allows the DMA engine on the NVMe drive to push the requested data directly into the GPU's memory, bypassing system memory, the CPU and the GPU's DMA engine entirely.

    I think this section from Nvidia explains it pretty well:

    The technologies are similar in that they both work to provide data to the GPU, but the similarities kind of end there. HBCC creates a tiered VRAM/SDRAM cache and simply requests data the traditional way, but intelligently manages this cache. GPUDirect Storage allows what I think is any device with a DMA engine to write its data directly to the GPU's memory.
     
  12. sykozis

    sykozis Ancient Guru

    Messages:
    21,799
    Likes Received:
    1,056
    GPU:
    MSI RX5700
    I'd like to see a solution where an SSD is installed on the graphics card and accessible by Windows.....
     
  13. user1

    user1 Ancient Guru

    Messages:
    1,636
    Likes Received:
    554
    GPU:
    hd 6870
    The thing is that accessing system memory in any way requires using the CPU. It's not really useful to show that turning on HBCC uses more CPU energy/cycles, since fundamentally there is no other way to access that memory. The fact that the SSG variant has its own SSD that it can read from via PCIe and that is managed by the HBCC, and that the slides show network access, PCIe, XDMA, etc., strongly suggests that it doesn't have to talk to the CPU in order to use storage as a cache. Kind of like how AMD used to use XDMA engines for CrossFire over the PCIe bus without CPU involvement.

    Also found this slide from the SSG press release:
    [IMG]
    So the question remains whether the inclusion of the CPU block in this diagram for accessing "storage" is due to a lack of API/OS support or a hard limitation.
     
  14. wavetrex

    wavetrex Maha Guru

    Messages:
    1,416
    Likes Received:
    1,050
    GPU:
    Zotac GTX1080 AMP!
    Don't forget that the GPU is physically connected to the CPU... the 16 lanes come from the CPU's I/O area (internal North Bridge), and in the case of Zen 2, it's a dedicated die.

    Even if the GPU accesses the SSD -directly-, without involving the CPU cores, it will still happen through the CPU I/O (but not through execution of CPU code)
     
  15. Monolyth

    Monolyth Meow Mix Kills

    Messages:
    131
    Likes Received:
    7
    GPU:
    EVGA 3090 FTW3 24GB
    This is a pretty big game changer regardless of who got there first. It may not be as sexy as ray tracing to demo but this kind of tech will be the unsung hero as textures get ever larger over the foreseeable future.

    And I agree that we will probably see it sooner than we expect. These kinds of low level features and enhancements can be added without necessarily altering core storage access APIs.
     

  16. NewTRUMP Order

    NewTRUMP Order Master Guru

    Messages:
    495
    Likes Received:
    125
    GPU:
    STRIX GTX 1080
    Excuse my ignorance on the subject, but can someone tell me how much of a difference it makes compared with the other way of going through the CPU? Seconds, milliseconds, can / can't tell the difference while gaming? Will it give you an edge over someone online using the CPU method? Is this a game changer, pardon the pun, or who cares?
     
  17. Mufflore

    Mufflore Ancient Guru

    Messages:
    12,695
    Likes Received:
    1,290
    GPU:
    Aorus 3090 Xtreme
    I saw a mention that it is 10% faster getting the data directly to the GPU vs going through RAM and using the CPU.
    Plus there will be benefits from using less CPU and RAM bandwidth/space.
     
  18. Fox2232

    Fox2232 Ancient Guru

    Messages:
    11,804
    Likes Received:
    3,359
    GPU:
    6900XT+AW@240Hz
    Online games usually preload all data for a given level (loading screen with a progress bar for each player). That means no benefit at all unless everyone has the same loading capability. (Except for the feeling that you were fastest.)
    But there are games which take like 5~8 seconds to load even from NVMe because the CPU is the limiting factor. If there were no CPU bottleneck, such a game would load within a second.
    Then there is the compression ratio. Once the GPU takes care of the data, the compression used can be stronger, which means that even in situations where storage is the limiting factor, more data will be extracted per second.
    But the problem is again with people who have no access to this decompression. So it either has to use dynamic compression decided on a per-system basis, or decompression can't exceed reasonable CPU requirements.
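
    A rough sketch of the compression point above: when storage is the limit, the uncompressed data delivered per second is the raw read rate times the compression ratio, capped by how fast the decompressor can produce output. The function names and all numbers here are hypothetical, chosen only to illustrate the arithmetic.

```python
def effective_rate(read_gbps, ratio, decompress_out_gbps):
    """Uncompressed GB/s actually delivered.

    read_gbps:           raw read rate from storage (compressed bytes)
    ratio:               uncompressed size / compressed size
    decompress_out_gbps: maximum uncompressed output rate of the decompressor
    """
    return min(read_gbps * ratio, decompress_out_gbps)


def load_time(asset_gb, read_gbps, ratio, decompress_out_gbps):
    """Seconds to deliver asset_gb of uncompressed data."""
    return asset_gb / effective_rate(read_gbps, ratio, decompress_out_gbps)


# Hypothetical figures: same SSD and same 2:1 compression,
# but a slow (CPU) versus a fast (GPU) decompressor.
cpu_path = load_time(7.0, read_gbps=3.5, ratio=2.0, decompress_out_gbps=1.5)
gpu_path = load_time(7.0, read_gbps=3.5, ratio=2.0, decompress_out_gbps=14.0)
print(f"CPU decompression: {cpu_path:.2f} s, GPU decompression: {gpu_path:.2f} s")
```

    In this toy model, a higher compression ratio only helps once the decompressor is no longer the cap, which is the per-system problem described above.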
     
  19. mbk1969

    mbk1969 Ancient Guru

    Messages:
    10,658
    Likes Received:
    7,902
    GPU:
    GF RTX 2070 Super
    So how many people in the world have NVMe disks in their rigs? 100%?
     
  20. Astyanax

    Astyanax Ancient Guru

    Messages:
    10,313
    Likes Received:
    3,705
    GPU:
    GTX 1080ti
    Well, it's more the fact that the current method uses the CPU for decompression, which adds latency to getting the data onto the GPU.
     
