When can we expect to see Microsoft DirectStorage in real games?

Discussion in 'Videocards - NVIDIA GeForce Drivers Section' started by BlindBison, Jan 13, 2022.

  1. Glowtape

    Glowtape Active Member

    Messages:
    54
    Likes Received:
    21
    GPU:
    RTX 3080
    I presume DirectStorage and RTX IO are only capable of decompressing specific, and probably simpler, easier-to-implement formats. You have to consider: what good is a mediocre compression format when I have two idle cores that I could occupy with, say, running a Zstd decompressor on better-compressed data?

    Just because DirectStorage etc. avoids any direct CPU involvement via DMA doesn't mean the memory controller is any less busy (cache coherency BS). If I have spare CPU cycles, I might as well continue to run the decompression on the CPU. Things like Zstd are super-fast (at least in decompression), regardless.

    Anyone who'd care about DirectStorage and/or has a capable GPU probably also has a half-decent CPU.
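
    A rough sketch of the compression-ratio point, under loud assumptions: Zstd isn't in the Python standard library on most versions, so LZMA stands in here as the "more modern" codec, and the payload is synthetic rather than real game data. The names make_payload and compare are made up for illustration. One known reason newer formats can beat DEFLATE is that DEFLATE can only reference the last 32 KB of data, so the payload below repeats a block at a 64 KB distance, which DEFLATE cannot see but a codec with a larger window can.

```python
import lzma
import random
import zlib


def make_payload(block_size=64 * 1024, repeats=16, seed=0):
    """Synthetic 'asset': one incompressible block repeated several times."""
    rng = random.Random(seed)
    block = bytes(rng.getrandbits(8) for _ in range(block_size))
    return block * repeats


def compare(payload):
    """Return raw and compressed sizes for DEFLATE (zlib) and LZMA."""
    deflated = zlib.compress(payload, 9)  # DEFLATE, 32 KB window
    xz = lzma.compress(payload)           # stand-in for a larger-window codec
    # Both codecs must round-trip to the original bytes.
    assert zlib.decompress(deflated) == payload
    assert lzma.decompress(xz) == payload
    return {"raw": len(payload), "deflate": len(deflated), "lzma": len(xz)}


if __name__ == "__main__":
    for name, size in compare(make_payload()).items():
        print(f"{name:8s} {size:>9d} bytes")
```

    This only illustrates the window-size effect on contrived data; real assets and a real Zstd build would give different numbers.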
     
  2. Astyanax

    Astyanax Ancient Guru

    Messages:
    14,978
    Likes Received:
    6,139
    GPU:
    GTX 1080ti
    DirectStorage will be unpacking into local graphics memory.
     
    BlindBison likes this.
  3. Glowtape

    Glowtape Active Member

    Messages:
    54
    Likes Received:
    21
    GPU:
    RTX 3080
    So? Anything involving memory also involves the (IO)MMU on the CPU, and that includes DMA, whether the target is RAM or not. The memory controller will be busy validating the transfers against all the MMIO registers, memory mappings, BARs and whatnot. I might as well write it out to RAM and process it with a decent algorithm that achieves higher compression ratios, because last I knew, DirectStorage uses DEFLATE, which is the PKZip algorithm. Whoopty-doo, welcome to 1989.

    Also, you should rename yourself to Beetlejuice or something, because it's eerie how you show up every time I start complaining about things.
     
  4. Astyanax

    Astyanax Ancient Guru

    Messages:
    14,978
    Likes Received:
    6,139
    GPU:
    GTX 1080ti
    Not true. Modern graphics cards have onboard IOMMU support, so they can bypass system memory entirely apart from basic driver state.
     
    BlindBison likes this.

  5. dr_rus

    dr_rus Ancient Guru

    Messages:
    3,407
    Likes Received:
    661
    GPU:
    RTX 4090
    AFAIR there are no clear specs on PC DirectStorage yet, and there is no clear understanding of how it will work either.
    The latest schemes I've seen suggest that the decompression will happen when data is read from system RAM into VRAM, which means that it will still have to be read into system RAM first.
     
  6. Serotonin

    Serotonin Ancient Guru

    Messages:
    4,337
    Likes Received:
    1,739
    GPU:
    Asus RTX 4080 16GB
    To take away from you anytime they feel like. Scary. I learned this years ago when Bioshock was just taken off the App Store and unable to be played on my phone. Digital is a murky place with content.
     
    BlindBison likes this.
  7. cucaulay malkin

    cucaulay malkin Ancient Guru

    Messages:
    7,201
    Likes Received:
    4,227
    GPU:
    RTX 3060 Ti
    you need more VRAM then, I suppose
    how much more though? extra 2GB? 4? 8?
     
  8. RealNC

    RealNC Ancient Guru

    Messages:
    3,939
    Likes Received:
    2,141
    GPU:
    EVGA GTX 980 Ti FTW
    I don't think so. DS will be unpacking textures directly into VRAM rather than loading them into RAM first and then copying them to VRAM. It's the same data, it just gets to VRAM faster.
     
    BlindBison likes this.
  9. cucaulay malkin

    cucaulay malkin Ancient Guru

    Messages:
    7,201
    Likes Received:
    4,227
    GPU:
    RTX 3060 Ti
    ah
     
  10. Astyanax

    Astyanax Ancient Guru

    Messages:
    14,978
    Likes Received:
    6,139
    GPU:
    GTX 1080ti
    it'll mean fewer textures too.
     

  11. Glowtape

    Glowtape Active Member

    Messages:
    54
    Likes Received:
    21
    GPU:
    RTX 3080
    DEFLATE isn't that complicated an algorithm, nor does it need much memory to operate. It easily ran on old computers with 640KB RAM (total, I might add; less was actually available to run). Nor does it require all data to be present to start working; things can just be streamed and decompressed in small chunks. So, no, it doesn't mean fewer textures.
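
    To illustrate the streaming point, here is a minimal sketch using Python's zlib module (a DEFLATE implementation). It is not how DirectStorage does it; the function name stream_inflate and the chunk sizes are arbitrary choices for the example. Input is fed in small slices and output is handed back in capped pieces, so neither the whole compressed file nor the whole result has to sit in memory at once.

```python
import zlib


def stream_inflate(compressed, in_chunk=4096, out_chunk=16384):
    """Yield decompressed pieces while feeding the input in small slices."""
    inflater = zlib.decompressobj()
    for start in range(0, len(compressed), in_chunk):
        data = compressed[start:start + in_chunk]
        while data:
            # Produce at most out_chunk bytes per call; input that was not
            # consumed yet is kept in unconsumed_tail and fed back in.
            piece = inflater.decompress(data, out_chunk)
            if piece:
                yield piece
            data = inflater.unconsumed_tail
    # Return whatever output is still pending at the end of the stream.
    tail = inflater.flush()
    if tail:
        yield tail


if __name__ == "__main__":
    original = b"texel data " * 100_000
    packed = zlib.compress(original)
    restored = b"".join(stream_inflate(packed, in_chunk=1024))
    print(restored == original)
```

    Joining the pieces is only done here to check the round trip; a real consumer would process each piece as it arrives.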

    Everything so far indicates that it uses DMA transfers. That means there's a controller in the CPU that you can tell to copy data between memory locations and/or IO ports, and it does so independently of the CPU. It's typically reserved for kernel mode, for drivers to use. Userland memory transfers are code that executes and moves the data around manually, word by word.

    If the supported NVMe SSD signals that data is available, a DMA transfer can be set up, from the source memory location to a target memory location, and data gets moved without further involvement of the CPU (other than signaling it's done). If the target happens to be in the memory region the GPU is mapping itself to, it will be redirected there. It can bypass RAM just fine, but it still needs coordination of the CPU (or rather its MMU), which still is the arbiter of all things memory. Once a transfer is done, the GPU can be notified to do its thing with the data.
     
    Last edited: Jan 18, 2022
    BlindBison likes this.
  12. BlindBison

    BlindBison Ancient Guru

    Messages:
    1,794
    Likes Received:
    729
    GPU:
    RTX 3070
    Genuine question, I'm not doubting you, probably just a gap in my knowledge, but why wouldn't PCs need dedicated decompression hardware but consoles would? Aren't consoles just PCs with unified memory?

    I would've thought that PCs could benefit from decompression hardware to make streaming assets into memory quicker/more efficient.
     
    PrMinisterGR likes this.
  13. janos666

    janos666 Maha Guru

    Messages:
    1,170
    Likes Received:
    215
    GPU:
    MSI RTX3080 10Gb
    High-end GPUs have many more processing units ("CUDA cores" in nV terms) than console iGPUs ("shader units" in AMD terms), so even decompressing through "CUDA cores" should be faster than decompressing on the CPU and copying (from CPU RAM to GPU RAM). But I think (based on nVidia's marketing materials) RTX GPUs have some dedicated decompression capabilities.
     
    BlindBison likes this.
  14. dr_rus

    dr_rus Ancient Guru

    Messages:
    3,407
    Likes Received:
    661
    GPU:
    RTX 4090
    This is the last info I've seen on the matter:
    [Image]
    It implies that the data read will be CPU driven with GPU then doing a local copy and decompression.

    I also find it rather interesting that there are still no benchmarks of any sort despite DS being in dev preview for more than a year now. It's like there's nothing to show for it.

    Dedicated h/w is needed when you need to improve performance per watt and/or area.
    In this case it means that instead of doing these tasks on CPUs or GPUs they are done on dedicated units.
    On PC this isn't required, since the configurations are different enough that dedicated h/w can't guarantee the same things it does on consoles.
    It also tends to be fairly rigid and limited in its functionality leading to s/w overflowing onto general purpose execution units anyway, to a point where this dedicated h/w just sits idle.
    So basically, even if some vendor adds it, there will still be PCs without it (or with some different version of it), meaning that s/w won't be able to rely on its presence and will still need to run without it. Which makes it kinda pointless.
     
    BlindBison likes this.
  15. Glowtape

    Glowtape Active Member

    Messages:
    54
    Likes Received:
    21
    GPU:
    RTX 3080
    Interesting. I guess the DMA is from the SSD straight into main memory instead (well, IIRC that's the standard MO anyway). Still more efficient than a code-based memory copy routine. I still think that anyone who cares about this (or, the other way around, anyone in the target group) also has enough resources to keep running it on the CPU. Which wouldn't restrict you to silly Zip compression either. Considering how ridiculously large games are getting, if a more modern algorithm can carve another considerable bit out of a compressed file, all the better.

    Also, if your game needs 5GB/s sustained IO to run, which is part of the raison d'être for all this stuff, it's a broken engine/game IMO.

    As far as benchmarks go, is it available to anyone or just select developers? In the latter case, I guess there's an NDA.
     
    Keitosha likes this.

  16. NEP6XSBW

    NEP6XSBW Member Guru

    Messages:
    105
    Likes Received:
    46
    GPU:
    RTX 3080 Ti
    It won't be CUDA cores, it will be compute shaders via DirectCompute.

    [Image]
     
    PrMinisterGR and BlindBison like this.
  17. dr_rus

    dr_rus Ancient Guru

    Messages:
    3,407
    Likes Received:
    661
    GPU:
    RTX 4090
    Compute shaders are executed on the "CUDA cores".

    I mean there's a difference between needing something to run and using something to be able to provide some gameplay experience - like total lack of any loading screens or slow elevators and such. The difference boils down to a constant vs burst I/O speed.
    The question is whether PCs even require a new API for that. Chances are that the current Windows I/O stack is good enough for that and the only "real" improvement here will come from using GPUs to preprocess (decompress) the assets.
    Which makes DS a weird combo since you'd expect these two things to be separate instead of being packaged into one API.
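
    The constant vs burst distinction is easy to put into numbers. A back-of-the-envelope sketch, where the 8 GB level size and the one-minute window are purely hypothetical and only the 5 GB/s figure comes from the post above:

```python
def burst_load_seconds(asset_gb, drive_gb_per_s):
    """Seconds to pull one batch of assets at the drive's peak rate."""
    return asset_gb / drive_gb_per_s


def sustained_volume_gb(rate_gb_per_s, seconds):
    """Total data read if the same rate were held for the whole period."""
    return rate_gb_per_s * seconds


if __name__ == "__main__":
    # Burst: a hypothetical 8 GB level on a 5 GB/s NVMe drive.
    print(burst_load_seconds(8, 5))
    # Sustained: the same 5 GB/s held for a full minute of gameplay.
    print(sustained_volume_gb(5, 60))
```

    A short burst hides a loading screen; holding the same rate for a minute would mean reading 300 GB, which is the case being called a broken engine.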
     
  18. S3r1ous

    S3r1ous Member Guru

    Messages:
    134
    Likes Received:
    21
    GPU:
    Sapphire RX 6700
    This is all fascinating. I guess that to see any gains, this technology and anything related to it has to solve some bottleneck in the game engine, and it has to be a big enough problem to make a noticeable difference in frametimes, load times, etc.
     
  19. Mineria

    Mineria Ancient Guru

    Messages:
    5,527
    Likes Received:
    690
    GPU:
    Asus RTX 3080 Ti
    If you look at the Xbox Series X, it seems to affect loading times more than anything else, especially when switching between games.
    It might give more stable framerates on PC, but what do I know.
     
  20. dr_rus

    dr_rus Ancient Guru

    Messages:
    3,407
    Likes Received:
    661
    GPU:
    RTX 4090
    PCs with NVMe drives tend to beat XSX loading times right now, without any DS being used.
     
