Discussion in 'Frontpage news' started by Hilbert Hagedoorn, Jan 9, 2019.
What does this mean?
Not that we know much right now. It's hard to say; with what little information we have about the tricks up AMD's sleeve, it could just be marketing talk.
From one side (game devs) we know:
From the other side (AMD) we now hear this:
I think AMD has some more tricks up their sleeves for 2019.
P.S. Why do I have a feeling that DLSS is going to be another "G-Sync vs. FreeSync"?
Did NV make gamers pay for it before MS released DirectML for free?
P.P.S. I read on AnandTech that DirectML will use FP16 for better performance.
Look at this chart:
All Pascal-based GeForce GPUs run FP16 at 1/64 of their FP32 rate:
1080 Ti: 177.2 GFLOPS (1:64)!!
while Vega offers 2x (25,166 GFLOPS (2:1)) and Polaris offers 1:1 (6,589 GFLOPS (1:1))!
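To put the ratios above in context, here is a rough sketch that works back from each quoted FP16 figure to the FP32 rate it implies. It only uses the numbers from the post; the function and dictionary names are made up for illustration.

```python
# FP16 figures (GFLOPS) and FP16:FP32 ratios exactly as quoted in the post.
QUOTED = {
    "GTX 1080 Ti (Pascal)": (177.2, 1 / 64),
    "Vega": (25166.0, 2.0),
    "Polaris": (6589.0, 1.0),
}


def implied_fp32_gflops(fp16_gflops, fp16_to_fp32_ratio):
    """FP32 rate implied by an FP16 rate and an FP16:FP32 ratio."""
    return fp16_gflops / fp16_to_fp32_ratio


for name, (fp16, ratio) in QUOTED.items():
    fp32 = implied_fp32_gflops(fp16, ratio)
    print(f"{name}: FP16 {fp16:,.1f} GFLOPS, implied FP32 {fp32:,.1f} GFLOPS")
```

The point of the exercise: the 1080 Ti's implied FP32 rate is in the same ballpark as Vega's, so the gap only opens up when a workload actually runs in FP16.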
HWgeek, AMD already uses their FP16 in games like Far Cry 5 and Wolfenstein 2. It's no surprise that they'd extend the usefulness of FP16.
However, when we look at those games, it's clear that the GTX 1080 Ti still trounces the Vega 64.
Finally, DirectML works across the board, so it's not going to be a unique technology for AMD, whereas Nvidia now has the option of both their DLSS implementation and DirectML.
No tricks up the sleeve if the competition already has it...
The Radeon VII's 16GB of HBM2 memory costs around $320.
If true, I think it's a safe bet to assume no HBM2 in Navi, if they want to make any money on it.
Not sure if this is old news -- however I just noticed on AMD's Radeon VII product page the disclaimer: "GPU specifications and features may vary by OEM configuration." I don't remember that the wording was there before.
This implies the OEM cards could be different from the reference design.
As I wrote before, an 8GB variant of the Radeon 7 could be $150 cheaper. And at that price, even I would seriously consider it. 8GB of VRAM is plenty more than I need for my 1080p gaming.
The way I understand it, it's not so easy to do that without a redesign and a new interposer, which would make it just as costly as leaving it at 16GB. 8GB of HBM2 would make sense on an entirely new card design rather than reconfiguring an existing 16GB design for 8GB. Other types of VRAM would not have this problem.
It would still be 1024 pins per HBM2 stack and the same bandwidth. I did post a slide from one of the HBM2 manufacturers showing 2GB stacks having exactly the same bandwidth and pin count as 4GB stacks.
It should be noted that an HBM stack has one die at the bottom, which is a sort of controller, and above it multiple dies which are the actual memory.
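A minimal sketch of why stack capacity and bandwidth are independent: bandwidth follows from the pin count and the per-pin data rate, while capacity follows from the size of each stack. The 1024 pins per stack comes from the post above; the four-stack layout and the 2.0 Gbps per-pin rate are assumptions for illustration, not figures from this thread.

```python
PINS_PER_STACK = 1024  # from the post: 1024 data pins per HBM2 stack


def hbm2_bandwidth_gbs(stacks, pin_rate_gbps, pins_per_stack=PINS_PER_STACK):
    """Peak bandwidth in GB/s: total pins * Gbit/s per pin / 8 bits per byte."""
    return stacks * pins_per_stack * pin_rate_gbps / 8


def hbm2_capacity_gb(stacks, gb_per_stack):
    """Total capacity in GB."""
    return stacks * gb_per_stack


# Assumed configuration: 4 stacks at 2.0 Gbps per pin (hypothetical values).
for gb_per_stack in (4, 2):
    cap = hbm2_capacity_gb(4, gb_per_stack)
    bw = hbm2_bandwidth_gbs(4, 2.0)
    print(f"4 x {gb_per_stack}GB stacks: {cap}GB, {bw:.0f} GB/s")
```

Under these assumptions, swapping 4GB stacks for 2GB stacks halves the capacity and leaves the bandwidth untouched, which is the argument being made. It says nothing about whether the interposer would need rework.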
Well, why haven't they offered an 8GB version of the R VII? If they did, at $150 cheaper, it would be a killer card for the money.
Sounds appealing. Where can I get it?
NVIDIA has been pushing into machine learning (ML) hard. Calling that a trick up their sleeves for AMD is not very accurate.
On top of 2x FP16 performance on Turing, NVIDIA also has the Tensor Cores, which can do a lot of ML work "for free", while on AMD it costs shader performance. More machine learning tasks in mainstream software and games only play to NVIDIA's advantages in Turing, if anything.
We have all seen the marketing, but nobody has shown an actual reproducible benchmark. So by how many TFLOPs do all those Tensor Cores boost the 2080 Ti's FP16 (27 TFLOPs)? Is the card's actual total FP16 performance 30 TFLOPs? I do not think so; otherwise it would be marketed as such.
Or in reverse: if you take a general ML workload and run it on the 27 TFLOPs of shaders, then run it on the Tensor Cores, how much faster will it be?
Spoiler: Well, Anand did this:
Titan V has 30 TFLOPs of FP16 and 640 Tensor Cores. The Tensor Cores account for additional throughput equal to 10.7 TFLOPs of FP16.
RTX 2060 has 240 Tensor Cores, which would be around 4 TFLOPs of FP16 for ML, for a total FP16 ML throughput of 16.9 TFLOPs.
RTX 2070 has 288 Tensor Cores, which would be around 4.8 TFLOPs of FP16 for ML, for a total FP16 ML throughput of 19.7 TFLOPs.
RTX 2080 has 368 Tensor Cores, which would be around 6.1 TFLOPs of FP16 for ML, for a total FP16 ML throughput of 26.2 TFLOPs.
RTX 2080 Ti has 544 Tensor Cores, which would be around 9 TFLOPs of FP16 for ML, for a total FP16 ML throughput of 35.9 TFLOPs.
Those are not exactly crazy high values in comparison to Vega 64 with 25 TFLOPs of FP16 and Radeon 7 with 27 TFLOPs of FP16.
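For anyone who wants to check the per-card numbers, this is a rough sketch of the scaling used above: take the Titan V's 10.7 TFLOPs across 640 Tensor Cores and scale linearly by core count. Linear scaling (ignoring clock differences between cards) is the post's own simplification, and the names in the code are hypothetical.

```python
TITAN_V_TENSOR_CORES = 640
TITAN_V_TENSOR_TFLOPS = 10.7  # extra FP16-equivalent throughput, per the post

# Tensor Core counts as listed in the post.
TENSOR_CORES = {
    "RTX 2060": 240,
    "RTX 2070": 288,
    "RTX 2080": 368,
    "RTX 2080 Ti": 544,
}


def tensor_tflops(cores):
    """Scale the Titan V figure linearly by Tensor Core count."""
    return cores * TITAN_V_TENSOR_TFLOPS / TITAN_V_TENSOR_CORES


for card, cores in TENSOR_CORES.items():
    print(f"{card}: {cores} Tensor Cores -> about {tensor_tflops(cores):.2f} TFLOPs")
```

The results land within roughly 0.1 TFLOPs of the figures quoted above; the small differences come down to rounding.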
= = = =
Now, as for the actual statement that running something on Tensor Cores comes "for free": it kind of does, but how much would it cost AMD?
RTX 2080, which is comparable in price to Radeon 7, has 6.1 TFLOPs of FP16 "for free" outside the shaders. For Radeon 7, this means it will have 20.9 TFLOPs left after doing the same ML workload.
(A sacrifice of 22.6% of shader time/performance.)
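The arithmetic behind that percentage, as a small sketch using only the figures from this post (27 TFLOPs of FP16 on Radeon 7, 6.1 TFLOPs of ML work); the function name is made up for illustration.

```python
def shader_cost(total_fp16_tflops, ml_workload_tflops):
    """Return (TFLOPs left for other work, fraction of shader throughput spent on ML)."""
    remaining = total_fp16_tflops - ml_workload_tflops
    fraction = ml_workload_tflops / total_fp16_tflops
    return remaining, fraction


remaining, fraction = shader_cost(27.0, 6.1)
print(f"Left for rendering: {remaining:.1f} TFLOPs")
print(f"Shader throughput spent on ML: {fraction * 100:.1f}%")
```

This treats the ML workload as a fixed 6.1 TFLOPs slice of peak FP16 throughput, which is a simplification; real workloads rarely hit peak numbers on either vendor.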
Could they not also do a 12GB card? 12GB would cover needs for even longer, for a bit more money than 8GB.
I think the issue is that no one is making 2GB or 3GB stacks of HBM2 memory. As far as I know it's 4GB only. I have no idea of the costs involved, but it might end up being too costly to change, and that is why we are only getting 16GB of HBM2 at this stage.