AMD/ATI started to work on HBM technology nearly a decade ago

Discussion in 'Videocards - AMD Radeon' started by OnnA, Aug 30, 2015.

  1. OnnA

    OnnA Ancient Guru

    Messages:
    17,793
    Likes Received:
    6,691
    GPU:
    TiTan RTX Ampere UV
    The high-bandwidth memory (HBM) introduced along with AMD’s code-named “Fiji” graphics processing unit radically changes the way graphics adapters are built and also dramatically improves the potential performance of future graphics processing units. But while HBM looks ingeniously simple on paper, it was extremely hard to develop and is not easy to build. In fact, AMD started to work on what is now known as HBM as early as 2006 – 2007.

    The need for speed

    Memory bandwidth has been a performance-limiting factor for graphics processors since the introduction of the first gaming-grade graphics cards, three-dimensional games and 32-bit colour back in the nineties. In a bid to considerably increase the performance of graphics adapters, IHVs [independent hardware vendors] had to bolster the bandwidth of their DRAM [dynamic random access memory], which was not always easy.

    There are several ways to increase memory bandwidth on a graphics card: raise the memory clock rate, widen the memory interface, or use a more efficient memory technology. After increasing the frequencies of graphics DRAM to rather high levels in 1997 – 1998, Nvidia Corp. became the first company to use double data rate (DDR) memory (which transfers data on both the rising and falling edges of the clock signal, a technique known as double-pumping) on its GeForce 256 DDR graphics cards in 1999, doubling the bandwidth available to the GPU. ATI Technologies introduced the world’s first graphics card with a 256-bit memory bus in 2002 and doubled the memory bandwidth of graphics processors once again. In 2002 – 2003, new memory technologies designed specifically for GPUs and supporting quad-pumping – GDDR2 and GDDR3 – were introduced and doubled the available bandwidth yet another time.

    But memory bandwidth improvements in the early 2000s did not come for free. Increases in clock rates and data rates amplified the power consumption of memory chips. Wider memory interfaces required more memory ICs, which also increased the power requirements of add-in boards.

    By 2006 – 2007, when work on the ATI R600 graphics processor with its 512-bit memory bus, as well as on the GDDR4 and GDDR5 memory technologies, was essentially complete, it became clear that memory already consumed a lot of power and would consume even more over time. Since ATI and Nvidia planned to use their GPUs for high-performance computing (HPC) applications, which require a lot of local memory, it was obvious that the power consumption of GDDR was going to become a problem.

    At the time, a memory technology development team at ATI Technologies, led by Joe Macri, came up with the idea of a brand-new memory technology that could provide extreme bandwidth while consuming little energy. The key elements of the new technology were multi-layer memory devices with ultra-wide interfaces that used a silicon interposer to connect to a processing device.

    Brief history of HBM

    Modern technologies take a long time to develop. For example, work on DDR4 started back in 2005, a couple of years before DDR3 was commercially launched. Similarly, ATI Technologies (which AMD acquired in 2006) started to think about high-bandwidth memory with low power consumption about a decade ago, before the company helped to commercialize GDDR4 in 2006 and GDDR5 in 2008. Work on what is now known as HBM began sometime in 2006 – 2007, and in 2013 the technology became an industry standard.

    Architecturally, the first-generation high-bandwidth memory (JESD235) uses a protocol similar to that of the original DDR, whose development kicked off in 1996 and concluded in mid-2000. But in a bid to finish the new standard, AMD, SK Hynix and other developers had to create a massive number of additional technologies that ultimately facilitated the creation of graphics processors like AMD’s “Fiji”.

    There are several key technologies that empower HBM:

    Memory chips with multiple vertically stacked memory devices interconnected using through-silicon vias (TSVs) and microbumps, placed on a base logic die.
    A silicon interposer that connects the memory ICs to the host processor using an ultra-wide interface. The silicon interposer is made using photolithography in a semiconductor fabrication plant.
    A host processor with an ultra-wide memory interface.

    Development of new technologies generally requires a lot of prototyping.
    Before AMD and SK Hynix proceeded to standardize their HBM memory with JEDEC in 2010, the companies had to design multiple implementations of their new technologies and learn how they operated in real life.

    AMD started to experiment with interposers and processors back in 2007. The first GPU to connect to memory using an interposer was the RV635, which powered ATI Radeon HD 3650/3670 graphics adapters. AMD later also experimented with interposers and “Cypress”, the world’s first DirectX 11-supporting graphics processor. Both the RV635 and “Cypress” were based on the TeraScale architecture (gen 1 and gen 2, respectively), which was succeeded by the GCN [graphics core next] architecture in 2012.

    The JESD235 standard was published in October 2013, when work on AMD’s “Fiji” was well underway and the graphics processing unit was months away from tape-out.

    The HBM saga continues

    The first-generation HBM (HBM1) stacks four DRAM dies, each with two independent 128-bit channels, on a base logic die, creating a memory device with a 1024-bit interface. Each channel supports 1Gb of capacity (2Gb per die), features 8 banks and can operate at a 1Gb/s data rate (1GHz effective DDR clock rate). As a result, each 4Hi (four-high) HBM stack package can provide 1GB of capacity and 128GB/s of memory bandwidth. AMD’s Radeon R9 Fury X flagship graphics adapter features 4GB of HBM memory with an unprecedented bandwidth of 512GB/s. While first-gen HBM has limitations when it comes to capacity, it allows the creation of very small, very high-performance graphics solutions thanks to the fact that HBM chips are smaller than GDDR5 ICs.
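
    As a quick sanity check of those figures, here is a minimal back-of-the-envelope sketch in Python (my own illustration, not anything from the article) that derives the per-stack and per-card bandwidth from the 1024-bit interface and 1Gb/s per-pin data rate quoted above:

    # Peak bandwidth of one HBM stack: interface width (pins) * per-pin data rate, converted to bytes.
    def stack_bandwidth_gbs(bus_width_bits: int, data_rate_gbps_per_pin: float) -> float:
        return bus_width_bits * data_rate_gbps_per_pin / 8  # 8 bits per byte

    hbm1_stack = stack_bandwidth_gbs(1024, 1.0)   # 128.0 GB/s per 4Hi stack
    fury_x_total = 4 * hbm1_stack                 # 512.0 GB/s across the Fury X's four stacks
    print(f"HBM1 per stack: {hbm1_stack:.0f} GB/s, Fury X total: {fury_x_total:.0f} GB/s")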

    The second-generation HBM (HBM2) utilizes 8Gb dies with two 128-bit channels, each featuring 16 banks and sporting data rates of up to 2Gb/s (2GHz effective DDR frequency). The HBM2 architecture will let manufacturers build not only 4Hi (four-high) stack packages, but also 2Hi and 8Hi devices. As a result, memory producers will be able to assemble HBM2 memory chips with up to 8GB of capacity (8Hi stack) and up to 256GB/s of bandwidth (2Gb/s data rate, 1024-bit bus).
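
    Under the same simple model, and taking the per-pin rate, die density and stack heights quoted above as given, the HBM2 figures work out as follows (again just illustrative arithmetic, not vendor data):

    # HBM2 per the figures above: 1024-bit stack interface, up to 2 Gb/s per pin, 8Gb dies, up to 8Hi stacks.
    def stack_capacity_gb(die_capacity_gbit: float, dies_per_stack: int) -> float:
        return die_capacity_gbit * dies_per_stack / 8  # gigabits -> gigabytes

    hbm2_stack_bw = 1024 * 2.0 / 8               # 256.0 GB/s per stack at 2 Gb/s per pin
    hbm2_8hi_capacity = stack_capacity_gb(8, 8)  # 8.0 GB per 8Hi stack of 8Gb dies
    four_stack_card = 4 * hbm2_stack_bw          # 1024 GB/s, i.e. ~1 TB/s on a 4096-bit, four-stack card
    print(hbm2_stack_bw, hbm2_8hi_capacity, four_stack_card)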

    The architectural advantages of HBM2 will allow GPU developers to use it not only for ultra-high-end applications with a 4096-bit memory bus, but also for adapters that do not require extreme performance. Next-generation enthusiast-class graphics cards based on AMD’s “Greenland” graphics processors, as well as Nvidia’s GP100 (“Pascal”) GPUs, will feature 8GB – 16GB of HBM memory with up to 1TB/s of bandwidth. Samsung Electronics forecasts that over time HBM will enable add-in boards with up to 48GB of memory.

    The third-generation HBM is in development, and engineers are not sharing any information about it yet. It is logical to expect further increases in capacity as well as performance. While we have no idea how the additional capacity and performance will be achieved, we are pretty sure that engineers at companies like AMD are already playing not only with prototypes of future implementations of HBM, but also with something that will succeed them further down the road.

    Please Keep Up with TOPIC :banana:

    Here -> www kitguru_net
     
  2. mjorgenson

    mjorgenson Guest

    Messages:
    257
    Likes Received:
    0
    GPU:
    EVGA GTX 1070 FTW-RX 480
    I find HBM a very fascinating technology. I'll jump on it when they get HBM2 out the door....

    HBM just makes sense.
     
  3. OnnA

    OnnA Ancient Guru

    Messages:
    17,793
    Likes Received:
    6,691
    GPU:
    TiTan RTX Ampere UV
    Yep bro :)
    But for HBM2 you'll wait some time (3-5 years).
    Remember GDDR3/4/5? Look at the timeline.
    I'm in need of a Fury X or Nano :) but I'm all X :grin2:
     
  4. The Mac

    The Mac Guest

    Messages:
    4,404
    Likes Received:
    0
    GPU:
    Sapphire R9-290 Vapor-X
    Supposedly HBM2 will be on next year's GPUs.
     

  5. OnnA

    OnnA Ancient Guru

    Messages:
    17,793
    Likes Received:
    6,691
    GPU:
    TiTan RTX Ampere UV
    In our dreams :nerd: :wanker: :)

    AMD + Hynix R&D must turn a profit (so many years of development).
    First, as always, will be HBM1, then after some years HBM2, and so on.
    IMO
     
  6. The Mac

    The Mac Guest

    Messages:
    4,404
    Likes Received:
    0
    GPU:
    Sapphire R9-290 Vapor-X
    Nope.

    Both Pascal and Arctic Islands will supposedly have it.

    Google AMD HBM2.
     
    Last edited: Sep 1, 2015
  7. OnnA

    OnnA Ancient Guru

    Messages:
    17,793
    Likes Received:
    6,691
    GPU:
    TiTan RTX Ampere UV
    I'm OK with that, but I know how the industry works ;-)
     
  8. PrMinisterGR

    PrMinisterGR Ancient Guru

    Messages:
    8,125
    Likes Received:
    969
    GPU:
    Inno3D RTX 3090
    Do you have any idea what has happened to Volta? Wasn't it supposed to be out this year?
     
  9. Kohlendioxidus

    Kohlendioxidus Guest

    Messages:
    1,399
    Likes Received:
    13
    GPU:
    Sapphire Vega 56 Pu
    Nice post. I'm also waiting for HBM2 :nerd: or at least a <14 nm GPU before upgrading, but that is second on the list.

    The first thing on my list is a CPU upgrade from AMD anyway, and I hope they deliver next year...
     
  10. Fox2232

    Fox2232 Guest

    Messages:
    11,808
    Likes Received:
    3,371
    GPU:
    6900XT+AW@240Hz
    HBM2 is not limited by the development of internal logic, but by the ability to stack 8 layers instead of 4 and to at least double the capacity of each layer.

    For now, a 4-layer HBM1 stack sits just a bit lower than the GPU die; having 8 layers may prove to be a nightmare for assembly, or will require some additional riser on the interposer for the GPU so there is no risk of damaging the HBM.

    I actually believe that we could have had HBM1 with 4 layers and double the density per layer, but it would have been costly and would have given the Fury X no real benefit.
    If it had 6~8 layers, the capacity would be bigger too, but in the process bandwidth would go up by 50~100% (768~1024GB/s), and that would raise Fury X performance by circa 15%.

    So the next generation will not only get a proportional increase in performance from architectural changes (more work per cycle; more shaders/CUs thanks to 14nm transistor availability), but will also get a free 15% bonus just from the removal of the memory bandwidth bottleneck.
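
    To make that arithmetic concrete, here is a tiny sketch of the scaling assumption in the post above (my own illustration, taking the poster's premise that per-card bandwidth grows linearly with layer count from the shipping Fury X's 512GB/s baseline):

    # Poster's assumption: bandwidth scales linearly with the number of DRAM layers per stack.
    fury_x_bandwidth_gbs = 512   # shipping Fury X: four 4Hi stacks
    baseline_layers = 4

    for layers in (6, 8):
        scaled = fury_x_bandwidth_gbs * layers / baseline_layers
        print(f"{layers} layers -> {scaled:.0f} GB/s")  # prints 768 and 1024 GB/s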
     

  11. mjorgenson

    mjorgenson Guest

    Messages:
    257
    Likes Received:
    0
    GPU:
    EVGA GTX 1070 FTW-RX 480
    Can't find the review, but Sapphire told reviewers not to take apart the Fury, as the cooling plate was machined in such a way as to touch both the HBM and the GPU because the HBM sat higher.
     
