
NVIDIA announces RTX IO, GPU to Directly Access SSD

Discussion in 'Frontpage news' started by Hilbert Hagedoorn, Sep 2, 2020.

Page 2 of 3

wavetrex Ancient Guru

Messages:

2,465

Likes Received:

2,578

GPU:

ROG RTX 6090 Ultra

Hilbert Hagedoorn said: ↑

at first glance it does not seem possible to have SSDs communicate with the video card without the intervention of the CPU

PCI to PCI device communication has been a thing since ages ago (as old as Pentium 1)

All PCI (and by extension PCIe) devices are memory-mapped. Especially on x64 systems, the hardware can be mapped to addresses outside the system RAM range, and then Device 1 can just grab data from Device 2 if both support DMA (which they do).
The CPU only tells the PCI devices where to look and what they are allowed to access. (But that shouldn't be an issue since both the NVMe driver and the GPU driver are kernel drivers.)

Nvidia "simply" made a connection between the two, and Microsoft is probably involved as well so I'm guessing an update to the OS will be needed in order to permit the two drivers to talk to each other.
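To make the address-mapping idea above a bit more concrete, here is a toy model in Python. It is only a sketch: nothing in it touches real hardware, and the `Device` and `Bus` classes, the device names and the address ranges are all made up for illustration. It just shows a copy being routed from one device's mapped window to another's, with the "CPU" doing nothing except saying where to read and where to write.

```python
# Toy model of peer-to-peer DMA over a shared bus address space.
# Hypothetical names and addresses; this is not how any real driver is written.

class Device:
    def __init__(self, name, base, size):
        self.name = name
        self.base = base                # start of the window mapped on the bus
        self.memory = bytearray(size)   # device-local memory exposed via that window

    def contains(self, addr, length):
        # True if the whole access falls inside this device's window.
        return self.base <= addr and addr + length <= self.base + len(self.memory)


class Bus:
    def __init__(self):
        self.devices = []

    def attach(self, device):
        self.devices.append(device)

    def route(self, addr, length):
        # Find the device whose mapped window covers the access.
        for dev in self.devices:
            if dev.contains(addr, length):
                return dev
        raise ValueError(f"no device mapped at {addr:#x}")

    def dma_copy(self, src_addr, dst_addr, length):
        # Window-to-window copy; system RAM is never involved.
        src = self.route(src_addr, length)
        dst = self.route(dst_addr, length)
        s = src_addr - src.base
        d = dst_addr - dst.base
        dst.memory[d:d + length] = src.memory[s:s + length]
        return src.name, dst.name


bus = Bus()
ssd = Device("ssd", base=0x1_0000_0000, size=4096)
gpu = Device("gpu", base=0x2_0000_0000, size=4096)
bus.attach(ssd)
bus.attach(gpu)

# The "CPU" only sets up the transfer: source address, destination address, length.
ssd.memory[0:5] = b"asset"
path = bus.dma_copy(ssd.base, gpu.base + 128, 5)
```

After the copy, the five bytes sit at offset 128 in the toy GPU's memory and `path` records that the transfer went from `ssd` to `gpu`.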

wavetrex, Sep 2, 2020

#21
schmidtbag Ancient Guru

Messages:

8,018

Likes Received:

4,396

GPU:

Asrock 7700XT

Fox2232 said: ↑

I used to have storage statistics via MSI Afterburner. So I kind of know at what rate games loaded data and when. And games gained almost nothing from my NVMe drives.

I'm not surprised - storage is hardly a bottleneck in games nowadays. I'm still using SATA because I know most games barely load faster with NVMe. It's everything that comes after storage (decompression, transferring over PCIe, dropping into VRAM, etc) that slows things down. As the article mentioned, you could speed things up by taking out some of this overhead. For games that don't have official support, my "prefetch" idea (I meant to say prefetch, not paging file) with already decompressed data could make a measurable performance improvement.

Storage in theory should be the bottleneck. If you eliminate the very long and complicated path that game data takes to reach its destination, it likely will become the bottleneck. That isn't such a bad thing either - you want the slowest part in the system to be at 100% load. The fact that it isn't is a problem. DirectStorage (DS) can help alleviate that problem.
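A quick way to picture the bottleneck argument above: in a streaming pipeline, the slowest stage sets the overall rate. The sketch below uses made-up stage rates (they are illustrative assumptions, not measurements), and it assumes the stages overlap perfectly and that GPU-side decompression in the direct path is fast enough not to be the limit.

```python
# Hypothetical stage rates in GB/s, chosen only to illustrate the point.

def bottleneck(stages):
    """Return (name, rate) of the slowest stage in a streaming pipeline."""
    name = min(stages, key=stages.get)
    return name, stages[name]


def load_time(size_gb, stages):
    """Seconds to stream size_gb through a fully overlapped pipeline."""
    _, rate = bottleneck(stages)
    return size_gb / rate


# Traditional path: read from SSD, decompress on the CPU, copy over PCIe.
traditional = {"ssd_read": 3.5, "cpu_decompress": 1.0, "pcie_copy": 12.0}

# Direct path: the CPU decompression stage is gone from the critical path.
direct = {"ssd_read": 3.5, "pcie_copy": 12.0}

slow_stage = bottleneck(traditional)
fast_stage = bottleneck(direct)
```

With these numbers the traditional path is limited by `cpu_decompress`, and once that stage is removed the SSD itself becomes the limit, which is where you want the limit to be.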

schmidtbag, Sep 2, 2020

#22
pharma Ancient Guru

Messages:

2,496

Likes Received:

1,197

GPU:

Asus Strix GTX 1080

DirectStorage is coming to PC
Sept 1, 2020

We’re excited to bring DirectStorage, an API in the DirectX family originally designed for the Velocity Architecture to Windows PCs! DirectStorage will bring best-in-class IO tech to both PC and console just as DirectX 12 Ultimate does with rendering tech. With a DirectStorage capable PC and a DirectStorage enabled game, you can look forward to vastly reduced load times and virtual worlds that are more expansive and detailed than ever.

https://devblogs.microsoft.com/directx/directstorage-is-coming-to-pc/

pharma, Sep 2, 2020

#23
Denial Ancient Guru

Messages:

14,207

Likes Received:

4,121

GPU:

EVGA RTX 3080

Cplifj said: ↑

Did nvidia just copy or license the AMD HBCC technology ?

It's not the same. GPUDirect completely bypasses system memory, allowing the GPU to pull from the hard disk. HBCC uses system memory as a VRAM cache and intelligently pulls from it. They're two completely different technologies. Also DirectStorage is made by Microsoft, not AMD/Nvidia.

As for everyone else talking about how they copied AMD, Nvidia announced GPUDirect as part of its Magnum IO API stack back in November last year.

Last edited: Sep 2, 2020

Denial, Sep 2, 2020

#24

semantics, PrMinisterGR and pharma like this.
tsunami231 Ancient Guru

Messages:

14,750

Likes Received:

1,868

GPU:

EVGA 1070Ti Black

So is this going to be the answer to the faster loading on consoles (PS5/Xbox)? Is this all built into the drivers and Windows, or is it "extra" software that needs to be installed, like say "drivex"?

And seeing as it involves DX12, do I need a newer version of Windows? I'm still on 1907 here. Is this going to be a universal thing, meaning old games will support it? Or will games have to be patched to support it, seeing as it involves DX12? What about DX9/10/11 games? Yes, games are still using DX9 to this day, DX10 to a lesser degree, and DX11 more than the other two, as far as I can tell.

Last edited: Sep 2, 2020

tsunami231, Sep 2, 2020

#25
Denial Ancient Guru

Messages:

14,207

Likes Received:

4,121

GPU:

EVGA RTX 3080

Cplifj said: ↑

Radeon Pro SSG did something similar, using an SSD of up to 1TB for storage via its own M.2 slot. That I call similar to this Nvidia tech, just slightly different since Nvidia uses the system SSD.

The SSG with the 1TB storage was kind of similar - it had a PCI-E switch on it and essentially bypassed the CPU to write data directly to the SSD - but it's not like Windows could see that drive or you could install games to it.

Denial, Sep 2, 2020

#26
richto Guest

Messages:

114

Likes Received:

11

GPU:

2 x 7900GX2 GTX DUOs in Quad SLi

Undying said: ↑

Ps5 we have instant loading!
Nvidia hold my beer...

Just to note that the Xbox Series X has similar nvme4 accelerated decompression. It's not just on the PS5.

richto, Sep 3, 2020

#27

Undying likes this.
user1 Ancient Guru

Messages:

2,782

Likes Received:

1,305

GPU:

Mi25/IGP

Denial said: ↑

It's not the same. GPUDirect completely bypasses system memory, allowing the GPU to pull from the hard disk. HBCC uses system memory as a VRAM cache and intelligently pulls from it. They're two completely different technologies. Also DirectStorage is made by Microsoft, not AMD/Nvidia.

As for everyone else talking about how they copied AMD, Nvidia announced GPUDirect as part of its Magnum IO API stack back in November last year.

The HBCC can pull directly from any storage device according to some early slides. It specifically can use any storage as a cache (including things like network storage). The HBCC is aware of the different available memory pools and uses a tiered-storage-like solution presented as VRAM. The whitepaper doesn't detail using anything other than NVRAM or RAM, so maybe it was cancelled or subject to some erratum.

It's not the same as RTX IO/DirectStorage, though maybe AMD can implement support for DirectStorage in the same or a similar way.

user1, Sep 3, 2020

#28
Denial Ancient Guru

Messages:

14,207

Likes Received:

4,121

GPU:

EVGA RTX 3080

user1 said: ↑

The HBCC can pull directly from any storage device according to some early slides. It specifically can use any storage as a cache (including things like network storage). The HBCC is aware of the different available memory pools and uses a tiered-storage-like solution presented as VRAM. The whitepaper doesn't detail using anything other than NVRAM or RAM, so maybe it was cancelled or subject to some erratum.

It's not the same as RTX IO/DirectStorage, though maybe AMD can implement support for DirectStorage in the same or a similar way.

HBCC creates what AMD calls an HBC (High Bandwidth Cache), which resides in both VRAM/SDRAM in a tiered hierarchy, with VRAM as the last level cache. If the GPU requires an asset that's outside of this cache, the controller can request the CPU to fetch it and pull it into the HBC, then the GPU can utilize it.

So while it can request data from any location, the data is moved into the HBC first and it's all done by the CPU. It's really not that much different than how GPUs worked prior to HBCC, but HBCC creates storage tiers and manages pages/swaps/etc for the developer.

https://www.reddit.com/r/Amd/comments/7x552w/exploring_vega_hbcc_and_its_effect_on_the_system/

This post does a good job investigating the effects of HBCC on the CPU.
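For anyone who wants the tiering idea spelled out, here is a very loose toy model. It is not how AMD implements HBCC; the `TieredCache` class, its page counts and the `fetch` callback (standing in for the CPU-driven load from storage) are all assumptions made up for illustration. It only shows the general shape: a small fast tier, a larger slow tier behind it, and a CPU fetch on a full miss.

```python
from collections import OrderedDict


class TieredCache:
    """Toy two-tier page cache: a small fast tier backed by a larger slow tier."""

    def __init__(self, vram_pages, ram_pages, fetch):
        self.vram = OrderedDict()   # fast tier, least recently used first
        self.ram = OrderedDict()    # slow tier, least recently demoted first
        self.vram_pages = vram_pages
        self.ram_pages = ram_pages
        self.fetch = fetch          # hypothetical stand-in for a CPU-driven load
        self.cpu_fetches = 0

    def read(self, page):
        if page in self.vram:
            # Hit in the fast tier: just mark it as recently used.
            self.vram.move_to_end(page)
            return self.vram[page]
        if page in self.ram:
            # Hit in the slow tier: pull it up without going to storage.
            data = self.ram.pop(page)
        else:
            # Full miss: the CPU has to fetch the page from storage.
            data = self.fetch(page)
            self.cpu_fetches += 1
        self._promote(page, data)
        return data

    def _promote(self, page, data):
        self.vram[page] = data
        if len(self.vram) > self.vram_pages:
            # Demote the least recently used fast-tier page to the slow tier.
            old_page, old_data = self.vram.popitem(last=False)
            self.ram[old_page] = old_data
            if len(self.ram) > self.ram_pages:
                self.ram.popitem(last=False)


cache = TieredCache(vram_pages=2, ram_pages=2, fetch=lambda p: f"page-{p}")
for p in [1, 2, 3, 1]:
    cache.read(p)
```

In this run the second read of page 1 is served from the slow tier, so only three of the four reads cost a CPU fetch.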

_

GPUDirect Storage, on the other hand, allows the DMA engine on the NVMe drive to push the requested data directly into the GPU's memory, bypassing system memory, the CPU and the GPU's DMA engine entirely.

I think this section from Nvidia explains it pretty well:

The PCI Express (PCIe) interface connects high-speed peripherals such as networking cards, RAID/NVMe storage, and GPUs to CPUs. PCIe Gen3, the system interface for Volta GPUs, delivers an aggregated maximum bandwidth of 16 GB/s. Once the protocol inefficiencies of headers and other overheads are factored out, the maximum achievable data rate is over 14 GB/s.

Direct memory access (DMA) uses a copy engine to asynchronously move large blocks of data over PCIe rather than loads and stores. It offloads computing elements, leaving them free for other work. There are DMA engines in GPUs and storage-related devices like NVMe drivers and storage controllers but generally not in CPUs. In some cases, the DMA engine cannot be programmed for a given destination; for example, GPU DMA engines cannot target storage. Storage DMA engines cannot target GPU memory through the file system without GPUDirect Storage.

DMA engines, however, need to be programmed by a driver on the CPU. When the CPU programs the GPU’s DMA, the commands from the CPU to GPU can interfere with other commands to the GPU. If a DMA engine in an NVMe drive or elsewhere near storage can be used to move data instead of the GPU’s DMA engine, then there’s no interference in the path between the CPU and GPU. Our use of DMA engines on local NVMe drives vs. the GPU’s DMA engines increased I/O bandwidth to 13.3 GB/s, which yielded around a 10% performance improvement relative to the CPU to GPU memory transfer rate of 12.0 GB/s shown in Table 1 below.

The technologies are similar in that they both work to provide data to the GPU, but the similarities kind of end there. HBCC creates a tiered VRAM/SDRAM cache and simply requests data the traditional way, but intelligently manages this cache. GPUDirect Storage allows what I think is any device with a DMA engine to write its data directly to the GPU's memory.
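The figures in the Nvidia quote above can be sanity-checked with a little arithmetic. The sketch below assumes the standard PCIe Gen3 signalling rate of 8 GT/s per lane with 128b/130b encoding and an x16 link; the function names are just made up for this example.

```python
def pcie_gen3_bandwidth_gbs(lanes=16):
    """Theoretical one-direction PCIe Gen3 bandwidth in GB/s."""
    transfers_per_s = 8e9       # 8 GT/s per lane
    encoding = 128 / 130        # 128b/130b line encoding
    bits_per_s = transfers_per_s * encoding * lanes
    return bits_per_s / 8 / 1e9


def improvement_pct(new, old):
    """Relative improvement of new over old, in percent."""
    return (new - old) / old * 100


raw = pcie_gen3_bandwidth_gbs()        # roughly 15.75 GB/s, i.e. the "16 GB/s" figure
gain = improvement_pct(13.3, 12.0)     # roughly 10.8%, i.e. the "around 10%" figure
```

So the quoted 13.3 GB/s is already fairly close to what the link can carry once protocol overhead is taken out.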

Denial, Sep 3, 2020

#29
sykozis Ancient Guru

Messages:

22,492

Likes Received:

1,537

GPU:

Asus RX6700XT

Denial said: ↑

The SSG with the 1TB storage was kind of similar - it had a PCI-E switch on it and essentially bypassed the CPU to write data directly to the SSD - but it's not like Windows could see that drive or you could install games to it.

I'd like to see a solution where an SSD is installed on the graphics card and accessible by Windows.....

sykozis, Sep 3, 2020

#30
user1 Ancient Guru

Messages:

2,782

Likes Received:

1,305

GPU:

Mi25/IGP

Denial said: ↑

HBCC creates what AMD calls an HBC (High Bandwidth Cache), which resides in both VRAM/SDRAM in a tiered hierarchy, with VRAM as the last level cache. If the GPU requires an asset that's outside of this cache, the controller can request the CPU to fetch it and pull it into the HBC, then the GPU can utilize it.

So while it can request data from any location, the data is moved into the HBC first and it's all done by the CPU. It's really not that much different than how GPUs worked prior to HBCC, but HBCC creates storage tiers and manages pages/swaps/etc for the developer.

https://www.reddit.com/r/Amd/comments/7x552w/exploring_vega_hbcc_and_its_effect_on_the_system/

This post does a good job investigating the effects of HBCC on the CPU.

_

GPUDirect Storage, on the other hand, allows the DMA engine on the NVMe drive to push the requested data directly into the GPU's memory, bypassing system memory, the CPU and the GPU's DMA engine entirely.

I think this section from Nvidia explains it pretty well:

The technologies are similar in that they both work to provide data to the GPU, but the similarities kind of end there. HBCC creates a tiered VRAM/SDRAM cache and simply requests data the traditional way, but intelligently manages this cache. GPUDirect Storage allows what I think is any device with a DMA engine to write its data directly to the GPU's memory.

Thing is that accessing system memory in any way requires using the CPU. It's not really useful to show that turning on HBCC uses more CPU energy/cycles, since fundamentally there is no other way to access that memory. The fact that the SSG variant has its own SSD it can read from via PCIe, that it is managed by the HBCC, and that the slides show network access, PCIe, XDMA, etc., strongly suggests that it doesn't have to talk to the CPU in order to use storage as a cache. Kinda like how AMD used to use XDMA engines for CrossFire over the PCIe bus without CPU involvement.

also found this slide from the SSG press release

So the question remains whether the inclusion of the CPU block in this diagram for accessing "storage" is due to missing API/OS support or a hard limitation.

user1, Sep 3, 2020

#31
wavetrex Ancient Guru

Messages:

2,465

Likes Received:

2,578

GPU:

ROG RTX 6090 Ultra

Don't forget that GPU is physically connected to the CPU... the 16 lanes come from the CPU's I/O area (internal North Bridge), and in case of Zen 2, it's a dedicated die.

Even if the GPU accesses the SSD -directly-, without involving the CPU cores, it will still happen through the CPU I/O (but not through execution of CPU code)

wavetrex, Sep 3, 2020

#32
Monolyth Meow Mix Kills

Messages:

164

Likes Received:

35

GPU:

Gigabyte RTX 3090Ti

This is a pretty big game changer regardless of who got there first. It may not be as sexy as ray tracing to demo but this kind of tech will be the unsung hero as textures get ever larger over the foreseeable future.

And I agree that we will probably see it sooner than we expect. These kinds of low level features and enhancements can be added without necessarily altering core storage access APIs.

Monolyth, Sep 3, 2020

#33
NewTRUMP Order Master Guru

Messages:

727

Likes Received:

314

GPU:

rtx 3080

Excuse my ignorance on the subject but can someone tell me how much of a difference it makes from the other way of going thru the cpu? Seconds, milliseconds, can / can't tell the difference while gaming? Will it give you an edge over someone online using the cpu method? Is this a game changer, pardon the pun, or who cares?

NewTRUMP Order, Sep 3, 2020

#34
Mufflore Ancient Guru

Messages:

14,732

Likes Received:

2,701

GPU:

Aorus 3090 Xtreme

NewTRUMP Order said: ↑

Excuse my ignorance on the subject but can someone tell me how much of a difference it makes from the other way of going thru the cpu? Seconds, milliseconds, can / can't tell the difference while gaming? Will it give you an edge over someone online using the cpu method? Is this a game changer, pardon the pun, or who cares?

I saw mention it is 10% faster getting the data directly to the GPU vs going through ram+using the CPU.
Plus there will be benefits from using less CPU and ram bandwidth/space.

Mufflore, Sep 3, 2020

#35
Fox2232 Guest

Messages:

11,808

Likes Received:

3,371

GPU:

6900XT+AW@240Hz

NewTRUMP Order said: ↑

Excuse my ignorance on the subject but can someone tell me how much of a difference it makes from the other way of going thru the cpu? Seconds, milliseconds, can / can't tell the difference while gaming? Will it give you an edge over someone online using the cpu method? Is this a game changer, pardon the pun, or who cares?

Online games usually preload all data for a given level (loading screen with a progress bar for each player). That means no benefit at all unless everyone has the same loading capability (except for the feeling that you were fastest).
But there are games which take like 5~8 seconds to load even from NVMe because the CPU is the limiting factor. Without the CPU bottleneck, such a game would load within a second.
Then there is compression ratio. Once the GPU takes care of the data, stronger compression can be used, which means that even where storage is the limiting factor, more data will be extracted per second.
But the problem is again with people who have no access to this decompression. So either the compression has to be chosen dynamically on a per-system basis, or decompression can't exceed reasonable CPU requirements.
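The compression-ratio point above works out as simple arithmetic: the uncompressed data you get per second is the storage rate times the compression ratio, capped by how fast the decompressor can produce output. The numbers below are hypothetical, picked only to show the effect, and `effective_rate` is a made-up helper.

```python
def effective_rate(storage_gbs, ratio, decompress_gbs):
    """Uncompressed GB/s delivered.

    ratio is uncompressed size divided by compressed size;
    decompress_gbs is the decompressor's maximum output rate.
    """
    return min(storage_gbs * ratio, decompress_gbs)


# Same SSD and same 2:1 compression, different decompressors (illustrative values).
cpu_path = effective_rate(storage_gbs=3.5, ratio=2.0, decompress_gbs=1.5)
gpu_path = effective_rate(storage_gbs=3.5, ratio=2.0, decompress_gbs=20.0)
```

With a slow decompressor the extra compression is wasted, which is the problem for systems without GPU decompression; with a fast one the same drive effectively delivers twice its raw rate.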

Fox2232, Sep 3, 2020

#36
mbk1969 Ancient Guru

Messages:

15,605

Likes Received:

13,614

GPU:

GF RTX 4070

So how many people in the world have NVMe disks in their rigs? 100%?

mbk1969, Sep 3, 2020

#37
Astyanax Ancient Guru

Messages:

17,040

Likes Received:

7,380

GPU:

GTX 1080ti

Mufflore said: ↑

I saw mention it is 10% faster getting the data directly to the GPU vs going through ram+using the CPU.
Plus there will be benefits from using less CPU and ram bandwidth/space.

Well, it's more the fact that the current method uses the CPU for decompression, which adds latency to getting the data onto the GPU.

Astyanax, Sep 3, 2020

#38
Deleted member 213629 Guest

Man, now this is interesting - I suppose this means I'm late to the party as far as commenting on NVIDIA saying PS5 what now ehh

Is this all going to work with NTFS?

EDIT: from the crash course I just read on Magnum IO - it's file system agnostic in that sense, I guess you could say, as long as there's support for it, but man NTFS is outdated as crap

I'm a visuals person:

Last edited by a moderator: Sep 3, 2020

Deleted member 213629, Sep 3, 2020

#39
Astyanax Ancient Guru

Messages:

17,040

Likes Received:

7,380

GPU:

GTX 1080ti

GPUDirect Storage is not RTX IO.

RTX IO is derived from it to a degree, but whereas GPUDirect Storage is a full-stack Nvidia implementation, RTX IO cuts out the front end and replaces it with Microsoft's DirectStorage API.

Astyanax, Sep 3, 2020

#40

Caesar likes this.
