With the 3D stacking manufacturing process, the GPU and memory chips are packed closely together, so the connections between them are shorter => higher bandwidth at lower power. With the first generation of HBM you'll get 2GB at 256GB/s or 4GB at 512GB/s per core. Later versions can scale up to 8GB/core at 2TB/s (2016). Edit: Actually it is up to 64GB at 2TB/s with 2nd-generation HBM. http://www.overclock.net/t/1488641/next-r9-r7-graphics-to-use-hbm
From these leaked slides it doesn't look like it: they show a limit of 4 stacks at 128GB/s each (4x128=512GB/s) with 1GB/stack maximum for the first generation. In fact the HBM bus width is 1024-bit even for the first generation, so make that 1TB/s.
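For anyone who wants to check the arithmetic in the posts above, here is a rough sketch. Peak bandwidth is just stacks x bus width x per-pin data rate / 8. The 1024-bit width and the 4-stack limit come from the thread; the 1 Gbit/s per-pin rate is my assumption (it is not stated in the slides as quoted here), and the function name `hbm_bandwidth_gb_s` is made up for illustration. Under that assumption one stack lands on the 128GB/s figure and four stacks on 512GB/s; getting to roughly 1TB/s over four 1024-bit stacks would need about 2 Gbit/s per pin.

```python
def hbm_bandwidth_gb_s(stacks, bus_width_bits=1024, pin_rate_gbit_s=1.0):
    """Peak bandwidth in GB/s for a given number of HBM stacks.

    bus_width_bits: interface width per stack (1024-bit, as stated in the thread).
    pin_rate_gbit_s: per-pin data rate in Gbit/s (ASSUMED, not from the thread).
    """
    # bits per second across all pins and stacks, divided by 8 to get bytes
    return stacks * bus_width_bits * pin_rate_gbit_s / 8


one_stack = hbm_bandwidth_gb_s(1)        # single stack at the assumed 1 Gbit/s
four_stacks = hbm_bandwidth_gb_s(4)      # the 4-stack limit from the slides
four_stacks_2g = hbm_bandwidth_gb_s(4, pin_rate_gbit_s=2.0)  # hypothetical doubled pin rate

print(one_stack, four_stacks, four_stacks_2g)
```

So whether the total is 512GB/s or 1TB/s depends on the per-pin rate you plug in, not on the bus width alone.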
Mantle V1 is good enough; the API doesn't need adjustments like TressFX or PhysX. OT: good, hopefully it will run cooler too.
Luckily, in this case they just need to get the card running to see the benefits; no special drivers are required.