AMD Could Do DLSS Alternative with Radeon VII through DirectML API

Discussion in 'Frontpage news' started by Hilbert Hagedoorn, Jan 17, 2019.

  1. Denial

    Denial Ancient Guru

    Messages:
    13,326
    Likes Received:
    2,827
    GPU:
    EVGA RTX 3080
    I keep seeing this idea that RT/Tensor cores take up "a lot of space", but I really don't see any evidence of that at all. Turing has the same CUDA cores per mm2 as GP100, but it manages that with Tensor cores, RT cores, double the cache and twice as many dispatch units, on a process of roughly the same density. They take up space, sure - they definitely don't take up "a lot of space". Regardless, I'm responding to people comparing this to FreeSync vs G-Sync - RPM has a fixed die cost as well, and it has sat idle with the exception of Far Cry 5 - so looked at your way, they both cost die space for a feature used in relatively few titles.

    As far as quality goes, DLSS utilizes an autoencoder, which is basically the same implementation Microsoft demonstrated for their upscaler on DirectML early last year, and will most likely be the same approach AMD uses. You can tweak the weights, train longer, etc., to improve quality. With only one example, in a game that seems to be somewhat abandoned, it's hard to say what DLSS or any AI upscaler will be like.
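    For what it's worth, a minimal sketch of what an autoencoder-style upscaler could look like - purely illustrative PyTorch, not Microsoft's or NVIDIA's actual network, and the layer counts/widths are made up:

    # Hypothetical autoencoder-style 2x upscaler, roughly in the spirit of the
    # DirectML super-resolution demo. Layer sizes are invented for illustration.
    import torch
    import torch.nn as nn

    class TinyUpscaler(nn.Module):
        def __init__(self):
            super().__init__()
            # Encoder: compress the low-res frame into a feature representation.
            self.encoder = nn.Sequential(
                nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            )
            # Decoder: expand features and pixel-shuffle up to 2x resolution.
            self.decoder = nn.Sequential(
                nn.Conv2d(64, 3 * 4, kernel_size=3, padding=1),
                nn.PixelShuffle(2),  # (3*4, H, W) -> (3, 2H, 2W)
            )

        def forward(self, x):
            return self.decoder(self.encoder(x))

    model = TinyUpscaler()
    low_res = torch.rand(1, 3, 540, 960)   # a 960x540 frame
    high_res = model(low_res)              # -> (1, 3, 1080, 1920)
    print(high_res.shape)
    # Training against full-resolution "ground truth" frames (e.g. with an
    # L1/L2 loss) is where the weight tweaking and longer training mentioned
    # above would come in.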
     
  2. dr_rus

    dr_rus Ancient Guru

    Messages:
    2,985
    Likes Received:
    363
    GPU:
    RTX 2080 OC
    Source on that please.
     
  3. Alessio1989

    Alessio1989 Ancient Guru

    Messages:
    1,959
    Likes Received:
    548
    GPU:
    .
  4. BlackZero

    BlackZero Ancient Guru

    Messages:
    8,878
    Likes Received:
    479
    GPU:
    RX Vega
    The full DirectX 12 feature set was only included in the last few generations, so they may well do. Also, having it run on the CPU as well could be of huge benefit for older cards, if they could run it concurrently on the GPU and CPU.

    Of course, that's all wishful thinking at this point.
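    Just to make the "run it where you can" part concrete: with ONNX Runtime's DirectML build you already choose the device via execution providers. This is only a sketch - the model file name and input handling are placeholders, and splitting one frame across GPU and CPU concurrently (the wishful part) is not shown:

    # Sketch: choosing GPU (DirectML) vs CPU inference with ONNX Runtime.
    # "upscaler.onnx" is a placeholder model file, not a real product.
    import numpy as np
    import onnxruntime as ort

    # Prefer the DirectML provider (any DX12-capable GPU), fall back to CPU.
    session = ort.InferenceSession(
        "upscaler.onnx",
        providers=["DmlExecutionProvider", "CPUExecutionProvider"],
    )

    low_res = np.random.rand(1, 3, 540, 960).astype(np.float32)
    input_name = session.get_inputs()[0].name
    outputs = session.run(None, {input_name: low_res})
    print(outputs[0].shape)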
     

  5. schmidtbag

    schmidtbag Ancient Guru

    Messages:
    5,821
    Likes Received:
    2,243
    GPU:
    HIS R9 290
    Part of me feels they did this just to spite Huang's comment.

    Not that I'm complaining.
     
  6. lord_zed

    lord_zed Active Member

    Messages:
    55
    Likes Received:
    3
    GPU:
    980ti MOD 1500/8000 LC
    ATM, after moving from AMD to NV, it looks like this to me:
    NV comes up with G-Sync and charges for it. AMD sees it and, not wanting to be behind, introduces FreeSync.
    NV starts to push ray tracing. AMD says we can do this too.
    NV comes up with DLSS and AMD says we can do that too.

    It's hard for me to think of a tech that AMD came up with that NV ripped off in the last few years. Well, I purchased a 290X because Mantle and TrueAudio were things that AMD came up with. I played BF4 on Mantle, and TrueAudio was DOA - I haven't played anything that uses it.
    I guess this is what happens when you are the market leader: you put some tech out and the chasing competition needs to adopt the technology or they will lose market share.


    I knew NV would keep the FreeSync option as an ace card for when they needed it, i.e. when RTX doesn't sell as well as they expected.
     
  7. schmidtbag

    schmidtbag Ancient Guru

    Messages:
    5,821
    Likes Received:
    2,243
    GPU:
    HIS R9 290
    Not sure what your point is. Are you suggesting that these royalty-free alternatives are a problem? Are you suggesting these technologies are the only reason to buy a product?
    AMD doesn't typically come up with new technologies because they don't have the time and money to research such things. Their first priority is (or should be, anyway) to get something with good all-around performance. I don't see that as a problem. I appreciate Nvidia trying to push new technologies, but I personally have no interest in funding them if they're proprietary.

    But anyway, I'm pretty sure AMD knew Mantle was DOA before they even released it; it was supposed to be a proof of concept, at which it was a success. Thanks to Mantle, we have DX12 and Vulkan. As far as I'm concerned, that [so far] was a greater success than Raytracing or DLSS.
     
    INSTG8R and moo100times like this.
  8. dr_rus

    dr_rus Ancient Guru

    Messages:
    2,985
    Likes Received:
    363
    GPU:
    RTX 2080 OC
    Yeah, good luck running ML matrix multiplication load on CPU with any kind of satisfactory performance.
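    For a sense of the gap, here is a toy timing sketch - plain PyTorch, not DirectML, and the actual numbers will vary wildly with hardware; the point is the order of magnitude:

    # Rough sketch: time a 4096x4096 matrix multiply on CPU vs GPU.
    import time
    import torch

    def time_matmul(device, n=4096, reps=10):
        a = torch.rand(n, n, device=device)
        b = torch.rand(n, n, device=device)
        torch.matmul(a, b)                      # warm-up
        if device == "cuda":
            torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(reps):
            torch.matmul(a, b)
        if device == "cuda":
            torch.cuda.synchronize()            # wait for queued GPU work
        return (time.perf_counter() - start) / reps

    print(f"CPU: {time_matmul('cpu') * 1000:.1f} ms per matmul")
    if torch.cuda.is_available():
        print(f"GPU: {time_matmul('cuda') * 1000:.1f} ms per matmul")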
     
  9. tunejunky

    tunejunky Maha Guru

    Messages:
    1,240
    Likes Received:
    440
    GPU:
    RadeonVII RTX 2070
    IMHO, AMD was probably pushed a tad by Google. They've been working very closely together on Project Stream... which ATM is damn good (still beta). The racks and racks of servers all running Radeon Pro, running AC: Origins @1080p at 80+ fps in a browser. My work computer sees AC: Origins looking as good in a browser as it does from the SSD and an RX 580... but then again I live very close to Google and the server farm.
    The "DLSS" feature could be implemented through streaming with no performance hit (other than whatever latency you get from your ISP and the "distance" from the server).
     
  10. BlackZero

    BlackZero Ancient Guru

    Messages:
    8,878
    Likes Received:
    479
    GPU:
    RX Vega
    I did say concurrently.
     

  11. Alessio1989

    Alessio1989 Ancient Guru

    Messages:
    1,959
    Likes Received:
    548
    GPU:
    .
    That's the least of the problems. GPU-to-CPU readback followed by CPU-to-GPU upload is what would make it unsuitable for gaming.
     
  12. BlackZero

    BlackZero Ancient Guru

    Messages:
    8,878
    Likes Received:
    479
    GPU:
    RX Vega
    Because sending a lot of 0s and 1s takes up huge amounts of CPU time and bandwidth?

    Anyway, if they could make it happen, it could be useful.
     
  13. Denial

    Denial Ancient Guru

    Messages:
    13,326
    Likes Received:
    2,827
    GPU:
    EVGA RTX 3080
    It's the latency: it's syncing the data to the CPU, running code, sending it back to the GPU, recombining it with GPU data, and doing all of that in 2-3ms before you render the frame out. It's a nightmare to code for, and at best you gain basically no performance because the GPU/CPU copy takes longer than just letting the GPU spend another ms on the task and keeping it all there.

    It's the same reason why mGPU will never take off. It takes too long to transfer the data, and when you only have 16ms to do it plus recombine, it's just not worth the effort. That's why they just do alternate frame rendering, but that's basically broken with any interframe post-process effect.
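    Rough numbers for that, assuming PCIe 3.0 x16 at ~16 GB/s each way and a 1080p RGB FP16 buffer (both assumptions, pick your own):

    # Back-of-the-envelope: round-trip cost of bouncing a frame's worth of data
    # to the CPU and back over PCIe 3.0 x16 (~16 GB/s each way, assumed).
    BYTES_PER_PIXEL = 3 * 2                  # RGB, FP16
    FRAME_BYTES = 1920 * 1080 * BYTES_PER_PIXEL
    PCIE_BW = 16e9                           # bytes/s, roughly theoretical

    one_way_ms = FRAME_BYTES / PCIE_BW * 1000
    round_trip_ms = 2 * one_way_ms
    print(f"frame size:  {FRAME_BYTES / 1e6:.1f} MB")
    print(f"round trip:  {round_trip_ms:.2f} ms (before any CPU work at all)")
    # Roughly 1.6 ms of a 2-3 ms post-process budget gone purely on copies,
    # ignoring driver and sync overhead, which in practice is often the
    # bigger hit.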
     
    Last edited: Jan 17, 2019
    BlackZero and fantaskarsef like this.
  14. BlackZero

    BlackZero Ancient Guru

    Messages:
    8,878
    Likes Received:
    479
    GPU:
    RX Vega
    Of course, if the process doesn't justify the end result, it's not worth the effort. Although the point remains: latency is relative to frame rate.

    Having said that, I can completely understand that it could prove not to be worth the effort due to complexity, especially with reference to past experience with mGPU.
     
  15. xrodney

    xrodney Master Guru

    Messages:
    354
    Likes Received:
    58
    GPU:
    Gigabyte 3080 Eagle
    Isn't this one of the reasons AMD is working on HBCC and IF, to be able to share data with minimal latency? Plus, I am pretty sure data transfers and operations between CPU and GPU take microseconds and not milliseconds, unless said operation takes hundreds of clock cycles.
    Especially on Zen, where through IF different resources could have direct access without needing to wait for the CPU cores to do all the actions.

    I am not saying the CPU must be able to do it (performance-wise), but latency between CPU, GPU, memory and cache should not be a big deal, unlike the operations themselves.
    The question here is about controlling all of this, but that is something that needs to be solved anyway to be able to use chiplets in a GPU and still appear as one GPU, unlike current SLI/CF - and it's something AMD is likely working on, and Nvidia probably too.

    BTW, in big data and ERP multi-node systems you have server-to-server (each a different physical frame) data latency in the range of 100+ microseconds, and that's for two systems that have to talk over a network where network latency is the bottleneck.
     
    Last edited: Jan 17, 2019

  16. Fox2232

    Fox2232 Ancient Guru

    Messages:
    11,804
    Likes Received:
    3,359
    GPU:
    6900XT+AW@240Hz
    Read the Direct3D changelog for feature levels. That's not Microsoft's wish list. That's what has been developed in cooperation with AMD/Intel/nVidia and game studios, based on it being feasible for HW implementation down the road, or on HW already being ready for such operations.

    Just because AMD does not publicly trumpet each and every feat of technology/software does not mean they sit idle. Quite the contrary: a great number of revolutionary technologies which are actually important came from AMD. And not some petty "Let's try raytracing again" or "new way to do image filtering/upscaling"...
    AMD's feats are closer to the core of innovation itself. Here's a bit: https://developer.amd.com/tools-and-sdks/ or https://www.amd.com/en/technologies/store-mi
    From recent years: HBM, interposers, chiplets, real working MCM for desktops.
    Going further back: AMD64, HSA, ...

    I wonder how you would play games on IA-64 processors. That extra compute performance required for raytracing - AMD pushed that kind of compute in the consumer market long before nVidia.

    As for TrueAudio: it has seen next to no adoption, but it is a technology that delivers exactly what it promises. I would prefer that in games instead of "too little, too soon" raytracing.
    Good audio realism provides better immersion than slightly more accurate reflections of ugly objects.
     
    moo100times and carnivore like this.
  17. Alessio1989

    Alessio1989 Ancient Guru

    Messages:
    1,959
    Likes Received:
    548
    GPU:
    .
    It's more about transfer time (plus any decoding) than computation time. Especially from GPU to CPU, readback operations can easily become a bottleneck since they break the rendering pipeline. Also, abusing CPU-to-GPU uploads can become a problem too, especially on discrete GPUs.
     
    BlackZero likes this.
  18. -Tj-

    -Tj- Ancient Guru

    Messages:
    17,173
    Likes Received:
    1,921
    GPU:
    Zotac GTX980Ti OC
    I'd rather have DirectML than DLSS - at least going by that car reconstruction picture.


    The biggest reason is quality, unless you use DLSS 2x to get past that upsampling, but then it's kind of a moot point - no perf boost..

    I saw a really detailed review of DLSS in FFXV and, to be honest, it looked like crap 90% of the time.
    The worst part was fence lines shimmering and smeared pixels, with loss of texture detail and even object detail in the distance.
     
  19. BlackZero

    BlackZero Ancient Guru

    Messages:
    8,878
    Likes Received:
    479
    GPU:
    RX Vega
    Clearly, all this adds latency, but we were discussing older GPUs that probably aren't achieving 60 FPS in the first place.

    Whether it's worth developer time would, I suppose, depend on whether any real benefit can be attained.
     
  20. Denial

    Denial Ancient Guru

    Messages:
    13,326
    Likes Received:
    2,827
    GPU:
    EVGA RTX 3080
    Yes.

    The latency would depend on the size of the data, but it's not really relevant. In this case Microsoft found GPU processing on DirectML with metacommands enabled to be 275x faster than running it on the CPU.

    http://on-demand.gputechconf.com/si...-gpu-inferencing-directml-and-directx-12.html - @24 minutes into the presentation - the entire presentation is good though and covers a lot of the stuff being said here.

    The point is that even if the latency is only 100-200us to transfer to the CPU, the GPU could have performed whatever operation was sent to the CPU multiple times over. The more data you send, the longer it takes to get it back. It's simply never worth sending it there - especially with the order-of-magnitude difference in performance.
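    To put that 275x figure in perspective, a quick illustration - the 1 ms GPU pass time and 200us transfer are assumed numbers, only the 275x comes from the presentation linked above:

    # Illustrative only: if a GPU inference pass takes ~1 ms and the CPU path
    # is ~275x slower, transfer latency is almost beside the point - the
    # compute gap alone sinks the idea.
    GPU_PASS_MS = 1.0          # assumed GPU time for the offloaded work
    CPU_SLOWDOWN = 275         # factor from the DirectML presentation
    TRANSFER_MS = 0.2          # optimistic 200 us round trip (assumed)

    cpu_total = GPU_PASS_MS * CPU_SLOWDOWN + TRANSFER_MS
    print(f"GPU keeps it local: {GPU_PASS_MS:.1f} ms")
    print(f"Ship it to the CPU: {cpu_total:.1f} ms")
    print(f"Frames the GPU could have rendered in that time at 60 fps: "
          f"{cpu_total / 16.7:.1f}")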

    https://hps.ece.utexas.edu/people/ebrahimi/pub/milic_micro17.pdf

    They are both working on it, but it requires massive amounts of bandwidth, changes to the scheduling, etc., and even then it still doesn't scale perfectly in terms of performance.

    Do you have a source for 100 microseconds? Typically the latency between two multi-node systems is ~350-400us for the network alone - but admittedly it's been a while since I worked on anything like this (2011/12 @ RIT).
     
    Fox2232 likes this.
