AMD Greenland Vega10 Silicon To Have 4096 Stream Processors?

Discussion in 'Frontpage news' started by Hilbert Hagedoorn, Mar 28, 2016.

  1. Hilbert Hagedoorn

    Hilbert Hagedoorn Don Vito Corleone Staff Member

    Messages:
    42,071
    Likes Received:
    10,075
    GPU:
    AMD | NVIDIA
  2. Fox2232

    Fox2232 Ancient Guru

    Messages:
    11,809
    Likes Received:
    3,366
    GPU:
    6900XT+AW@240Hz
    This year we do not need more SPs/ROPs/TMUs, as 14/16nm is supposed to bring higher clocks.

    The question is how much higher. For AMD, sitting around 950~1050MHz at stock and 1100~1200MHz when overclocked, it may be quite a jump up.
    For nVidia, which likes to print a low number like 1040~1070MHz on the box while its GPUs boost to 1350~1450MHz without any user OC, this jump may be just a small step up.
     
    Last edited: Mar 28, 2016
  3. lantian

    lantian Master Guru

    Messages:
    292
    Likes Received:
    1
    GPU:
    MSI GTX 970 GAMING 4gb
    I am sorry, but what? If anything it will bring lower clocks. You do realize that all of those processes are optimized for low-power chips with low clock speeds? A GPU is by principle a high-power part. Not to mention it is a new process that is having big problems getting off the ground. The previous nodes were not built exclusively with low-power devices in mind. Also, when the first 28nm card launched, it had a clock speed advantage of 25MHz over the last-gen card on the previous process node.
     
  4. Noisiv

    Noisiv Ancient Guru

    Messages:
    8,043
    Likes Received:
    1,333
    GPU:
    2070 Super
    I've been following the discussion along those lines on b3d, and there is exactly zero evidence suggesting that, other than some guys' feelings.

    TSMC 16FF+ provides around 65% higher speed than 28HPM at the same power, and there is no way around that (except a radically different architecture with higher IPC).
    In fact, that's the whole reason for deploying a new process!

    In the end, clocks do not matter at all. All that matters is perf (/W).
    Matter of fact, it would be super-sexy if GP104 was clocked at, say... 275MHz :banana:,
    with 30% higher perf than the 980Ti @170W
     
    Last edited: Mar 28, 2016

  5. Fox2232

    Fox2232 Ancient Guru

    Are you a time traveler from 2013 talking about 20nm? Because the official GloFo materials are jaw-dropping when you actually read what they say.
    And anyone who can normalize those values for comparison is going to come away with a lasting impression and anticipation of something really good.

    I guess you can understand the following, as it is relatively simple.
    28HPP reasonable max clock is 1.2GHz (according to GloFo materials).
    14LPP reasonable max clock is 2.48GHz (according to the same GloFo materials).


    GPUs ended up somewhere around that point on 28nm, even though that was with TSMC.
    My guess is that we may see GPUs clocked around 1600MHz at 14/16nm.
    Going to 1600 from 1050 is a 52% boost in clock. Going to 1600 from 1400 is a 14% boost.
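
    For anyone who wants to redo those percentages with other clocks, here is a minimal sketch. The function name is made up for illustration, and the 1600MHz target is only the guess from the post, not a confirmed spec.

```python
def clock_uplift_percent(old_mhz: float, new_mhz: float) -> float:
    """Relative clock increase from old_mhz to new_mhz, in percent."""
    return (new_mhz / old_mhz - 1.0) * 100.0


if __name__ == "__main__":
    target_mhz = 1600  # the post's guess for 14/16nm, an assumption
    for label, base_mhz in (("1050 MHz start", 1050), ("1400 MHz start", 1400)):
        uplift = clock_uplift_percent(base_mhz, target_mhz)
        print(f"{label}: +{uplift:.0f}% to reach {target_mhz} MHz")
```

    The same 1600MHz target is a much bigger relative jump for a chip starting at 1050MHz than for one already boosting to 1400MHz, which is the whole point of the comparison.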

    I guess you know that manufacturing Fiji silicon costs around the same as manufacturing GM200 (GTX 980Ti). GM200 is a bit stronger than Fiji in most games, but clock for clock it is much weaker (practically everywhere).

    Once you realize how much of a boost each of those chips can get just from clocking up, you see that AMD only needs to adapt Fiji to 14/16nm and make it stable (minimal cost).
    But nVidia can't go with just a GM200 die shrink and a small clock bump, as it would not be enough to compete with a 14/16nm Fiji. nVidia has to come up with much better technology or increase SPs/TMUs/ROPs a lot (either way it means higher cost to market or higher manufacturing cost).

    It looks like AMD has to do minimal work to be in front of nVidia this round. And that's without factoring in that AMD is not sitting on its hands, but improving/evolving GCN as much as nVidia does, since both companies are working towards all those small new things their GPUs are not yet capable of.
     
  6. Fox2232

    Fox2232 Ancient Guru

    The removed part is where you probably misunderstood me. The bold part is where you could not be more wrong.
    Your dream 275MHz power-efficient GPU would have to be so beefy that it would be manufactured the same way as Core2Duo, but with 4 GM200 GPUs, or maybe 2 double-sized GPUs, as 16nm can make ASICs with more transistors.
    But the fact remains that the price of that thing would be at least 4 times as high as one GM200, and it would deliver the same performance.

    Power efficiency is mostly bogus. If you pay $200 more for a slightly more power-efficient GPU than for one with exactly the same performance, and over the entire time you use it it only saves you $150, you made a bad deal.
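
    To put numbers on that trade-off, here is a rough sketch. The $200 and $150 figures are the ones from the post; the wattage, daily hours, lifetime and electricity price in the second example are made-up inputs for illustration only, and the function names are hypothetical.

```python
def lifetime_energy_savings(watts_saved: float, hours_per_day: float,
                            years: float, price_per_kwh: float) -> float:
    """Money saved on electricity over the card's lifetime."""
    kwh_saved = watts_saved / 1000.0 * hours_per_day * 365 * years
    return kwh_saved * price_per_kwh


def net_benefit(price_premium: float, savings: float) -> float:
    """Positive: the efficient card paid for itself. Negative: it did not."""
    return savings - price_premium


if __name__ == "__main__":
    # The post's own example: $200 premium, $150 saved over the card's life.
    print("post example:", net_benefit(200, 150))

    # Hypothetical usage: 100 W saved, 4 h/day, 4 years, $0.25 per kWh.
    savings = lifetime_energy_savings(100, 4, 4, 0.25)
    print("hypothetical:", round(savings, 2), round(net_benefit(200, savings), 2))
```

    This only covers the power bill argument. It says nothing about thermals, noise or TDP headroom, which is where the rest of the thread goes.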

    Yesterday and today I was talking with one of our members, and he brought to my attention that the Nano costs as little as the R9-390X. Now that's where you have power efficiency at the same price with very similar performance. But dreams about low-clocked monsters... Especially since power consumption is quite linear with clock until you reach the point where the chip is clocked out of the comfort zone of the manufacturing process and power consumption spikes to the sky.

    In other words, you can have your power-efficient GPU: just get a GTX980Ti and limit its clock to 900~1000MHz. Still a good performer, but at nice low power consumption.
     
  7. Noisiv

    Noisiv Ancient Guru

    275MHz <- I was just dreaming out loud. I thought that was obvious :)

    Power efficiency bogus? It might be if you are talking about power bill savings, but I am not interested in the power bill. We are talking about arch, are we not?

    And if power efficiency was bogus, why is Raja all about perf/W all of a sudden?
    Because perf/W = perf. for all intents and purposes.

    If your arch has good perf/W, all you need to do is to clock higher or expand your die to get more perf.
    But if your perf/W sucks, you'll end up TDP limited that much sooner.

    perf/W + mm2 = ALL
     
  8. Fox2232

    Fox2232 Ancient Guru

    Perf/W matters only for mobile devices. That's why Raja is all over it: he wants to recapture that market, which is completely lost and where AMD had nearly no presence.

    But when we look at the desktop, where most of us live, the GTX 970 obliterates the GTX 980Ti in perf/W. Yet people buy 2~3 GM200 chips with a smile on their face.

    Anyway, what all those posts were meant to show is that nVidia used high clocks while AMD used a beefy architecture to reach the same goal (high performance).
    And with 14/16nm, this high-clock field is leveled and they start anew.

    It is like AMD having a big engine in a jeep while nVidia has a high-rpm engine in a bike, with games as the difficult terrain. The bike got the better of it because that terrain was not difficult enough. But this time the jeep gets to the same rpm as the bike while keeping that massive engine.

    I look at things as big equations with a lot of variables and some constants, and try to predict which variable will change in what way.
    In the GCN to Polaris thread I showed how big a step up GloFo promises over 28nm,
    and how wide a variety of GPU size, clock (performance) and power efficiency we may expect.

    Each of the clock, transistor count and efficiency ranges is at least twice as wide with 14nm as it was with 28nm.
     
  9. Noisiv

    Noisiv Ancient Guru

    No it does not.

    Perf/W matters only for mobile, you say. But if AMD needs it "only for mobile", then it's not bogus.
    And if they got it for mobile, then they got it, period, because it's the same ****ing chip.

    You're really defending "power efficiency is bogus"? Jesus...
    Power efficiency was not something invented by Nvidia. It's pretty much The metric. As seen above.


    No it's not. Because they do not start from nothing.
    Corresponding architectures and their inherent characteristics remain, and they are the starting point in new design for both AMD/NV.

    In AMD's case, they needed to catch up, and that is exactly what they did. So AMD might gain more on clocks, but Nvidia's higher clocks are guaranteed by stepping down not once but twice in TSMC's lithography process.

    TSMC's 16FF+ (FinFET Plus) technology can provide above 65 percent higher speed, around 2 times the density, or 70 percent less power than its 28HPM technology.

    All this tells us is how an ideal shrink of the previous product would look at 16FF+. No clock field leveling required.
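
    As a rough sketch of what that TSMC quote implies for an ideal shrink, the snippet below applies the quoted factors (65% higher speed at the same power, or 70% less power at the same speed) to a starting point. The 1400MHz and 250W inputs are illustrative assumptions, not figures for any announced product, and the function names are made up for this example.

```python
# Factors taken from the TSMC 16FF+ vs 28HPM quote above.
SPEED_GAIN_SAME_POWER = 0.65  # "above 65 percent higher speed"
POWER_CUT_SAME_SPEED = 0.70   # "70 percent less power"


def ideal_clock_same_power(clock_mhz: float) -> float:
    """Clock of an ideal shrink that keeps the old power budget."""
    return clock_mhz * (1.0 + SPEED_GAIN_SAME_POWER)


def ideal_power_same_clock(power_w: float) -> float:
    """Power of an ideal shrink that keeps the old clock."""
    return power_w * (1.0 - POWER_CUT_SAME_SPEED)


if __name__ == "__main__":
    # Assumed starting point: a 28nm GPU boosting to 1400 MHz at 250 W.
    print(f"same power: {ideal_clock_same_power(1400):.0f} MHz")
    print(f"same clock: {ideal_power_same_clock(250):.0f} W")
```

    These are the two extremes of the process claim. A real product would sit somewhere between them and would also spend part of the gain on more transistors.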
     
    Last edited: Mar 28, 2016
  10. Fox2232

    Fox2232 Ancient Guru

    I love the way you use the term speed (a very safe choice instead of clock or real-world gaming performance; it can easily be substituted for either as needed later).
    I see no reason to agitate you as much as your writing indicates, therefore I apologize, and you are right in all you wrote.

    I'll keep my reality the way it is (potentially completely wrong). And I will not get mad if time proves me wrong, because I made assumptions based on a lot of information available to me, and even so I may be wrong in all of them.
     

  11. Ieldra

    Ieldra Banned

    Messages:
    3,490
    Likes Received:
    0
    GPU:
    GTX 980Ti G1 1500/8000
    A few things need pointing out here. perf/W != perf. It doesn't scale linearly: a 250mm^2 GPU can be the perf/W king, but that same GPU scaled up to 600mm^2 could be terribly inefficient.

    As for the clocks, I'm confused frankly. AMD has historically gone for larger designs with more cores, while NV has gone with higher clocks.

    1600 SPs at 1500MHz == 2400 SPs at 1000MHz, simple arithmetic guys.
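
    That arithmetic can be written down as a small sketch. Peak FP32 throughput is commonly estimated as 2 x shaders x clock, counting one fused multiply-add per shader per clock as two operations. The function name is made up for illustration, and real game performance depends on far more than this peak number.

```python
def peak_gflops(stream_processors: int, clock_mhz: float) -> float:
    """Theoretical peak FP32 throughput in GFLOPS (2 ops per SP per clock)."""
    return 2 * stream_processors * clock_mhz / 1000.0


if __name__ == "__main__":
    narrow_fast = peak_gflops(1600, 1500)  # fewer SPs, higher clock
    wide_slow = peak_gflops(2400, 1000)    # more SPs, lower clock
    print(narrow_fast, wide_slow, narrow_fast == wide_slow)
```

    On paper the two configurations are identical, so which one is better comes down to die area, power and how well the workload keeps the shaders busy.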

    My favorite performance metric would have to be Tflops/mm^2

    The relationship between power consumption and clock speed is most certainly not linear.

    I'm struggling to understand what you're saying. The Nano is marketed as a power-efficient part SPECIFICALLY because clock speed does not scale linearly with power draw, and it throttles heavily to remain within its TDP.

    Why would you ever level the clock playing field to compare two processors? That makes no sense at all. I buy a GPU that runs at 50% higher clocks than its competition, so you compare it to its competitor at 66% of its max? That's just silly. Very, very silly.

    Imagine we did that with the FX9590, comparing it to Haswell clock for clock. That'll be good for a laugh.
     
    Last edited: Mar 28, 2016
  12. Fox2232

    Fox2232 Ancient Guru

    That's because on 28nm AMD and nVidia went very different ways to achieve performance for the end user.

    But the tricks (know-how) nVidia used on 28nm, which allowed them to clock much higher than AMD, are not directly translatable to 14/16nm. Those processes behave differently: the frequency at which the inflection point for power efficiency sits is in a different place, and it may or may not be as forgiving as on 28nm, where it was very unforgiving.

    But the belief that AMD's limit of around 1200MHz (at best) for a big 28nm GPU and nVidia's of around 1500MHz (a 25% difference) will translate to a 1600MHz AMD GPU and a 2GHz nVidia GPU after the die shrink is way out of this world.

    Especially since there are figures indicating the thermal density and the kind of power consumption an ASIC gets upon reaching high clocks. That's why I think we are going to see big GPUs around 1500~1600MHz even if the technology allows for 2.4GHz, because at that point you would reach 7~8 times higher thermal density than 28nm did (again according to GloFo materials).
     
  13. PrMinisterGR

    PrMinisterGR Ancient Guru

    Messages:
    7,889
    Likes Received:
    755
    GPU:
    Inno3D RTX 3090
    AMD is probably investing in smaller chips this time around. They are going to take the Maxwell approach with GCN, is my guess. I hope that the good parts of the architecture are not lost in translation.

    In my uneducated opinion GCN on the hardware side is ROP/TMU limited. I wouldn't mind seeing a Fiji-like configuration with 128ROPs.

    As for people whining about performance/watt and saying it's only for mobiles etc: If a chip is "cold" as a design you can clock it much higher than a "hot" chip, cram more hardware in, or a combination of both. Hawaii was a hot chip and it was a problem for AMD in that respect, until games started having more GCN-tuned engines and AMD themselves improved the drivers.

    What I find curious is that supposedly the lower end parts will use GDDR5X. Now I might be wrong, but I have the feeling that ALL AMD parts will use HBM, and the lower end of the scale will go from that to GDDR5. Combined with the recent Micron news, and with NVIDIA being the only company talking about GDDR5X, I can't see AMD using it.

    Also, if the memory bus is indeed 256-bit for a 2,300+ shader product, I would be:
    [IMG]
     
  14. Fox2232

    Fox2232 Ancient Guru

    When people talk about 28nm OC, they talk about 10%, and their chips go from hot to melting in the process, because AMD and nVidia sell them clocked way past the inflection point for that non-existent power efficiency we have on PC. Power-efficient 28nm is 400~600MHz for a GPU, as can be seen in mobile devices.

    14/16nm cold/hot and power efficiency is the story of that thermal density.
    I'll quote myself here:
    So, how forgiving will 16/14nm be once a GPU goes over the breaking point, and from a nearly linear dependency of power consumption on clock we reach exponentially increasing power consumption (read: electricity turned into heat)?

    What kind of cooling will we need to keep a low-clocked 1300MHz chip with 2 times higher thermal density (compared to 28nm) "cold"?
    What kind of cooling will we need to reach 1800MHz, where we can expect 3.5 times higher thermal density compared to 28nm?
    Do we even have cooling capable of efficiently removing 250W of heat from a 75mm^2 GPU?

    Because the 28nm GPU reached its peak at 600mm^2 while eating 275W, and Fiji is kept under 52°C in extreme situations with increased ambient temperature.
    But 14nm at its peak clock has 7.25 times higher power consumption per mm^2. So while Fiji produced 0.46W per mm^2, 14nm at 2.4GHz would produce about 3.3W per mm^2.
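
    The power density figures can be rechecked with a few lines. This is only a sketch using the numbers quoted in this post (275W over 600mm^2, the 7.25x factor, and the 250W over 75mm^2 question); the function name is made up for illustration.

```python
def power_density(watts: float, area_mm2: float) -> float:
    """Heat output per unit of die area, in W/mm^2."""
    return watts / area_mm2


if __name__ == "__main__":
    fiji = power_density(275, 600)            # big 28nm GPU from the post
    peak_14nm = fiji * 7.25                   # factor quoted in the post
    small_hot_die = power_density(250, 75)    # the "250W from 75mm^2" case

    print(f"Fiji:            {fiji:.2f} W/mm^2")
    print(f"14nm at peak:    {peak_14nm:.2f} W/mm^2")
    print(f"250 W on 75mm^2: {small_hot_die:.2f} W/mm^2")
```

    The last two values land close to each other, so the 250W on 75mm^2 question is just the 7.25x factor applied to a Fiji-class power budget.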

    The materials show a power-efficient technology which is not easy to cool and can therefore hardly be called "cold", and once you cross its limit, the advanced heatpipe cooling we have enjoyed in the last few years won't help a bit.
     
  15. Ieldra

    Ieldra Banned

    Lithography Wars Episode 14nm: return of the heatspreader

    A little confused by your calculations, maybe it's the alcohol, but if we have 1B transistors at 28nm in a 100mm^2 die area consuming 10W, then by your calculations with 14nm FinFET we'd have 4B transistors in 100mm^2 consuming 20W. Looks like double the thermal density to me, not 8x.
     
    Last edited: Mar 28, 2016

  16. Noisiv

    Noisiv Ancient Guru

    Pretty simple, right?
    Or you can throw away your perf/W and go nuts with pure performance.
    But in order to throw it away, you have to have it in the 1st place jebus...

    Except I've said nothing of the kind. So why you're "debunking" this 2GHz/1.6GHz is beyond me.
    Quite the opposite - I've allowed for AMD gaining more in regards to clocks.
    Because they had more homework to do efficiency-wise. Not because they start anew. They don't.
    But for all we know they might deploy a higher-IPC design, and even "lose" in clocks by more than the old 25%.
    Who actually gives a **** about clocks??

    Nvidia is, according to all sources, certainly not starting anew.
    It seems they will be carrying lots of Maxwell forward, and you can read directly from the TSMC 16FF+ quote what an ideal Maxwell shrink would look like in terms of clocks.
    There is nothing clean-slate about this.

    And I still don't give a damn about final clocks :)
     
    Last edited: Mar 28, 2016
  17. Fox2232

    Fox2232 Ancient Guru

    Unfortunately not. They fit the same number of transistors into a 3.7 times smaller area, and that same number of transistors will eat 1.96 times more power at maximum clock compared to those 28nm transistors ticking at their maximum official clock.

    We take your values (ignoring any errors in transistor density, power consumption per transistor per clock, or voltage):
    28nm: 1B transistors in 100mm^2 (@1GHz) consumes 10W
    14nm: 1B transistors in 27mm^2 (@2.4GHz) consumes 19.6W

    Now that is power efficient: 2.4x higher clock and only 1.96x higher power consumption (22% more power efficient at peak clock, and even more power efficient at any lower clock).

    If you wanted those 4B transistors:
    14nm: 4B transistors in 108mm^2 (@2.4GHz) consumes 78.4W
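
    The three cases above can be reproduced with a short sketch. The scaling factors are the ones quoted in this post (3.7x smaller area, 1.96x power at max clock, 2.4x clock); the class and function names are made up for illustration, and the model ignores everything the post already says it ignores.

```python
from dataclasses import dataclass

# Scaling factors as quoted in the post (attributed there to GloFo materials).
AREA_SHRINK = 3.7    # same transistors fit into a 3.7x smaller area
POWER_FACTOR = 1.96  # power at max clock vs 28nm at its max clock
CLOCK_FACTOR = 2.4   # max clock vs 28nm max clock


@dataclass(frozen=True)
class Chip:
    transistors_bn: float
    area_mm2: float
    clock_ghz: float
    power_w: float

    @property
    def power_density(self) -> float:
        """Watts per mm^2."""
        return self.power_w / self.area_mm2


def shrink(chip: Chip, copies: int = 1) -> Chip:
    """Port `chip` to the smaller node at max clock, replicated `copies` times."""
    return Chip(
        transistors_bn=chip.transistors_bn * copies,
        area_mm2=chip.area_mm2 / AREA_SHRINK * copies,
        clock_ghz=chip.clock_ghz * CLOCK_FACTOR,
        power_w=chip.power_w * POWER_FACTOR * copies,
    )


if __name__ == "__main__":
    base = Chip(transistors_bn=1, area_mm2=100, clock_ghz=1.0, power_w=10)
    for chip in (base, shrink(base), shrink(base, copies=4)):
        print(f"{chip.transistors_bn:.0f}B, {chip.area_mm2:.0f} mm^2, "
              f"{chip.clock_ghz:.1f} GHz, {chip.power_w:.1f} W, "
              f"{chip.power_density:.2f} W/mm^2")
```

    Power density comes out the same for the 1B and 4B shrinks, a bit over 7 times the 28nm value, because replicating the die scales power and area together.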

    The bad thing about this die shrink: more heat in a smaller area.

    The good thing about this die shrink is that they promise high clocks (I add: "As long as you can cool it").

    Here you have part of it.
     
  18. Ieldra

    Ieldra Banned

    I understand what you mean, but clocks are still important, not alone, but in the context of the design.

    I'm not saying nvidia will clock higher than AMD or vice versa. I'm saying I care about the throughput; whether it's achieved through a big slow design or a small fast design doesn't affect me in the slightest.

    The only way I can justify the 8x thermal density is if 14nm transistors consume double the power of the 28nm ones, which would be pretty funny. 4x transistor density, 2x power = 8x thermal density. How I actually think it is: 4x density, 0.5x power.
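
    Put as a one-line model, thermal density scales with transistor density times power per transistor. This is only a sketch of the two scenarios named in this post; the function name is made up for illustration.

```python
def thermal_density_factor(density_factor: float,
                           power_per_transistor_factor: float) -> float:
    """Change in W/mm^2 relative to the old node."""
    return density_factor * power_per_transistor_factor


if __name__ == "__main__":
    # 4x density with each transistor drawing 2x the power.
    print(thermal_density_factor(4, 2))
    # 4x density with each transistor drawing half the power.
    print(thermal_density_factor(4, 0.5))
```

    The disagreement in the thread is entirely about the second argument: power per transistor only goes up if the chip is pushed to much higher clocks.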

    Just saw your comment, Fox. Okay, yeah, if you're talking 150% higher clocks then it makes sense, it's just that... that's not gonna happen.
     
    Last edited: Mar 28, 2016
  19. PrMinisterGR

    PrMinisterGR Ancient Guru

    Thermal density is a problem indeed, and a heatspreader might work, or even direct water cooling.
     
  20. Ieldra

    Ieldra Banned

    I was reading about experimental research into integrating pipes for liquid cooling into the die, something like 100x surface area for cooling.

    Damn that sounds so sexy.
     
