Discussion in 'Videocards - AMD Radeon' started by OnnA, Jul 9, 2016.
RUMOR: AMD Radeon RX Vega 64 to be great for mining
Source: VideoCardz.
Well profits for AMD. No cards for gamers np.
Let's hope that's not true. I remember the 1070 getting a price cut to $349, and now you struggle to get one for under $449.
It explains in the article why it's not just all about profits for AMD.
Long term, a high volume of returned cards can't be good for the manufacturers or consumers. Short term, instead of buying new cards, people could end up buying second hand - possibly even a card that mining has ruined - and that leaves a bad impression on the gamer.
It's not like there was ever going to be a huge amount of cards available, so they would have sold out to gamers anyway.
I suspect the vendors have been informed that limited amounts should be sold for the launch, at least, because they need the positive feedback from the reviews, and the positive feedback on the forums and web pages.
If it's left up to the vendors to decide, we're F_ed.
That is terrible news for us gamers.
Those numbers look fake. A Fury X will do ~23 MH/s and a Vega FE will do ~30 MH/s. I highly doubt a normal Vega will do three times more than a Vega FE.
Newer drivers I suspect.
Did no one see the bit in the AMD presentation about new instructions specifically for coin mining?
edit: Imagine EVERY GPU coin miner in the world suddenly had to replace ALL their cards with Vegas.
AMD can sell as many as they can make.
A Radeon R9 295X2 can do about 50-60 MH/s and a 7990 around 40-50. It's definitely strange that Vega is supposedly MUCH higher than the FE, but it's not impossible for a new card to reach 70-100 MH/s when we've seen older cards do 60.
AMD Vega Microarchitecture Technical Overview
1. Memory System & 2. Next Generation Compute Unit
Vega is a big die, and AMD helped here by clearly marking the sectors where the memory system is located. The biggest memory-related point for Vega is the introduction of HBM2 to the retail consumer market - 8 GB of HBM2, which was otherwise only available in high-end professional solutions costing an order of magnitude more. HBM2 has higher capacity per stack than HBM1, which in turn raises the maximum memory capacity as well.
AMD has also provided a comparison to GDDR5 (as opposed to GDDR5X, which NVIDIA uses in its competing GeForce solutions) to highlight HBM2's higher efficiency and smaller footprint. The latter holds true even against GDDR5X; however, as our RX Vega preview indicates, it appears to have been for nought, given the card's PCB is longer than what we saw with the R9 Fury series that first introduced HBM.
In order to make the best use of the higher bandwidth available with HBM2, AMD's Radeon Technologies Group (RTG) devised a brand new High-Bandwidth Cache Controller (HBCC) to help maximize GPU VRAM utilization across a tiered memory hierarchy.
Here, VRAM is used as a cache device for system memory and/or disk storage, and HBCC controls data movement in an intelligent manner.
As a quick visualization of how memory management is otherwise done, AMD is showing how HBCC can help with a page-based management system wherein data segments are handled individually rather than as complete chunks with active pages residing in the high-bandwidth cache and inactive pages in the slower memory.
This can be especially handy when a program loads resources into memory that it considers relevant to the 3D scene being rendered, even though it does not need access to all of them for every single frame. This disparity hampers the otherwise high memory bandwidth and wastes resources moving said data.
On large working sets, this also brings the risk that the physical GPU memory overflows, causing expensive swapping operations to happen in an unorganized manner. By treating Vega's local memory as a high-bandwidth cache, AMD is tackling this with a direct hardware solution, and this is where the HBCC comes in.
The example above showed uniformly sized pages, but the high-bandwidth cache controller is designed to handle irregularly sized memory pages as well. Typical page sizes range between 4 KB and 128 KB.
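To make the page-based idea concrete, here is a minimal software sketch of the concept the HBCC implements in hardware: hot pages live in a small, fast tier, and cold pages are evicted to a slower backing store. The class and names are invented for this illustration - the real HBCC is a hardware unit, not software.

```python
from collections import OrderedDict

class PageCache:
    """Toy model of page-based caching: hot pages stay resident in
    the fast tier (the 'high-bandwidth cache'), cold pages get
    evicted to a slower backing store (system RAM / disk)."""

    def __init__(self, capacity_pages):
        self.capacity = capacity_pages
        self.hot = OrderedDict()   # page_id -> data, kept in LRU order
        self.backing = {}          # slower tier

    def access(self, page_id):
        if page_id in self.hot:            # hit: refresh recency
            self.hot.move_to_end(page_id)
            return "hit"
        # miss: fetch from the slow tier (or materialize the page)
        data = self.backing.pop(page_id, f"page-{page_id}")
        self.hot[page_id] = data
        if len(self.hot) > self.capacity:  # evict least-recent page
            victim, vdata = self.hot.popitem(last=False)
            self.backing[victim] = vdata
        return "miss"

cache = PageCache(capacity_pages=2)
print(cache.access(1))  # miss
print(cache.access(2))  # miss
print(cache.access(1))  # hit -- page 1 stayed resident
print(cache.access(3))  # miss -- evicts page 2
print(cache.access(2))  # miss -- page 2 comes back from the backing store
```

The point of the model: only the actively touched pages occupy the fast tier, so a working set much larger than VRAM can still be served mostly at cache speed.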
It can access not just system memory and storage, but non-volatile RAM as well, such as Intel's new Optane-based SSDs. If you have ever used a small SSD as a scratch/cache disk in front of a spinning drive, think of the practical benefits you got there.
The design of the high-bandwidth cache controller is also handy in that AMD now has a platform to reuse this concept in new microarchitectures or scaled-up silicon, and to expand upon the same functionality.
As it stands, it allows as much as 27 GB worth of assets to be used, enabling real-time OpenGL rendering of ~500 million triangles. AMD estimates this can scale to as much as 512 TB of virtual address space.
Call it GCN 5.0, or GCN 1.6, or even Next Gen GCN, it is clear that Vega builds upon the existing GCN microarchitecture with some improvements added. AMD distinguishes this by referring to their compute units as "Next Generation Compute units" or NGCUs.
This is where the bulk of the magic.. err.. the engineering has happened. AMD cannot simply turn its back on GCN because the architecture is used in millions of consoles, which helps developers port their tech to PC in a more time- and cost-efficient way.
AMD has added support for 8-bit operations with the NGCU, retained the 16-bit floating-point operations from Polaris, and continues to support FP32 and FP64 operations as well. One new feature here is Rapid Packed Math, wherein two 16-bit operations can be packed into, and executed within, a single 32-bit lane.
If a task has complex 32-bit operations where precision is key, nothing changes. However, if your application is not demanding on precision - for example, a lighting effect transitioning from one state to another - you can use Rapid Packed Math to perform said operation in 16-bit, which takes up fewer resources and increases throughput.
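The packing idea itself is simple to sketch. Here is an illustrative model using integer lanes for clarity (IEEE FP16 packing behaves analogously in hardware); the function names are invented for this example, and the key detail is that carries must not leak between the two 16-bit lanes:

```python
MASK16 = 0xFFFF

def pack2(lo, hi):
    """Pack two 16-bit values into one 32-bit word."""
    return (hi & MASK16) << 16 | (lo & MASK16)

def packed_add(a, b):
    """Add two packed pairs lane-wise, the way a packed-math ALU
    performs two 16-bit operations in one 32-bit issue slot.
    Each lane is masked independently so overflow in the low lane
    cannot carry into the high lane."""
    lo = ((a & MASK16) + (b & MASK16)) & MASK16
    hi = (((a >> 16) & MASK16) + ((b >> 16) & MASK16)) & MASK16
    return hi << 16 | lo

a = pack2(100, 7)
b = pack2(25, 3)
r = packed_add(a, b)
print(r & MASK16, (r >> 16) & MASK16)  # 125 10
```

Two results per 32-bit slot is where the doubled 16-bit throughput comes from.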
AMD estimates a Vega NGCU to be able to handle 4-5x the number of operations per clock cycle relative to the previous CUs in Polaris.
They demonstrate a use case of Rapid Packed Math using 3DMark Serra - a custom demo created by Futuremark for AMD to show off this technology - wherein 16-bit integer and floating point operations result in as much as a 25% benefit in operation count.
AMD encourages developers to take a good look at their shaders and consider where they truly need full 32-bit precision and where they can opt for 16-bit, maintaining the same visual fidelity while gaining significant performance. For example, a noise-generating shader doesn't need 32-bit precision; 16-bit is perfectly fine and still provides a value range large and differentiated enough for a decent noise effect.
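To make the precision argument concrete, here is a quick check - using Python's half-float struct format, nothing Vega-specific - that FP16 retains roughly three decimal digits, which is ample for a noise value in [0, 1):

```python
import struct

def to_fp16(x):
    # round-trip a Python float through IEEE 754 half precision
    # (struct format 'e', available since Python 3.6)
    return struct.unpack('e', struct.pack('e', x))[0]

# fp16 has a 10-bit mantissa, so values near 0.123 quantize in
# steps of about 6e-5 -- far below anything visible in a noise texture.
x = 0.123456789
err = abs(to_fp16(x) - x)
print(err < 1e-3)  # True
```

The same check at larger magnitudes would show the step growing, which is why precision-critical 32-bit math is left untouched.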
Aiding computation in the Vega NGCU is support for over 40 new ISA instructions, which also take advantage of the increased IPC over Polaris. Here's the thing - some of these are very relevant to GPU mining.
Need I say more on where this goes? AMD estimates a single NGCU to be able to handle as many as 512 simultaneous 8-bit operations.
New Graphics Features
AMD has traditionally been at the forefront of introducing new APIs and supporting others, with Mantle being a key focus during the launch of their "Hawaii" microarchitecture.
While Mantle as it was is practically dead now, most of it lives on in the DX12 and Vulkan APIs, and AMD designed Vega to provide the best feature support for these modern APIs of any consumer GPU architecture. With DX12, the higher the tier of support, the better, and a quick look at the table above shows how Vega exceeds both AMD Polaris and NVIDIA Pascal here.
NVIDIA has promised a higher level of support with upcoming microarchitectures of their own, but for now, AMD is Lord of the DX12 manor again. Continuing the extended support path, AMD has finally added support for conservative rasterization with Vega - a Direct3D 11.3/12 feature NVIDIA has had since Maxwell.
Conservative rasterization means that every pixel at least partially covered by a rendered primitive is rasterized - that is, the pixel shader is invoked for it. The normal behavior is to rasterize a pixel only if the primitive covers its sample point, and that test is bypassed when conservative rasterization is enabled. This is especially handy in situations involving collision detection, shadows, occlusion culling, and visibility detection.
With Direct3D 12, additional control over overestimated or underestimated conservative rasterization has been added, which Vega also supports (at Tier 3). In underestimated mode, only the pixels fully covered by a rendered primitive are rasterized. Underestimated coverage information is available to the pixel shader via input coverage data, whereas only overestimated conservative rasterization is available as a regular rasterization mode.
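The three coverage policies can be sketched in a few lines. This is a simplified software model with invented names: it classifies a 1x1 pixel against a triangle using edge functions, and it uses pixel-corner tests as an approximation (a thin triangle can slice a pixel without covering any corner; real hardware is exact).

```python
def edge(px, py, ax, ay, bx, by):
    """Signed-area test: >= 0 means (px, py) lies on the inside of
    the directed edge A->B for counter-clockwise winding."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)

def coverage(tri, x, y, mode):
    """Classify the 1x1 pixel at (x, y) against triangle `tri`.
    'standard'      -> sample the pixel center
    'overestimate'  -> rasterize if any pixel corner is inside
    'underestimate' -> rasterize only if all corners are inside"""
    (ax, ay), (bx, by), (cx, cy) = tri
    def inside(px, py):
        return (edge(px, py, ax, ay, bx, by) >= 0 and
                edge(px, py, bx, by, cx, cy) >= 0 and
                edge(px, py, cx, cy, ax, ay) >= 0)
    corners = [(x, y), (x + 1, y), (x, y + 1), (x + 1, y + 1)]
    if mode == "standard":
        return inside(x + 0.5, y + 0.5)
    if mode == "overestimate":
        return any(inside(*c) for c in corners)
    return all(inside(*c) for c in corners)

tri = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
# pixel (3, 1): the triangle only grazes its (3, 1) corner
print(coverage(tri, 3, 1, "standard"))       # False -- center misses
print(coverage(tri, 3, 1, "overestimate"))   # True  -- partial coverage counts
print(coverage(tri, 3, 1, "underestimate"))  # False -- not fully covered
```

This is exactly the distinction that matters for occlusion and visibility queries: overestimation never misses a potentially visible pixel, underestimation never reports a pixel that is not fully covered.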
With Vega, AMD has also devised a new method to deal with the geometry pipeline.
This also comes down to effective pixel shading and rasterization: the new "Primitive Shader" combines geometry and vertex shader functionality to increase peak throughput by as much as 100% in the native pipeline relative to Fiji. The improvement immediately helps in rendering scenes with millions of polygons where only a fraction is visible on screen at any time - a video game environment is a prime example, with objects in front of others. Implementing primitive shader support comes partly with DX12 and Vulkan, but ultimately falls to developers again, which can limit the applications that make use of it.
To aid in its adoption, AMD has increased the discard rate for the native pipeline to ~2x that of Fiji and, more importantly, to as much as 5x via the Vega NGG fast path implementation. Again, there has been no mention of the NGG fast path being available any time soon, so it is a feature that may end up being theoretical only.

Ah, asynchronous compute - the one DX12 feature that caught NVIDIA unaware, to the point where Ashes of the Singularity is still used by AMD to demonstrate their prowess here.
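The kinds of primitives a geometry front end can discard early are easy to illustrate. The following is a sketch of generic culling tests - not AMD's actual pipeline logic, and the function name is invented - showing why so many triangles in a dense scene never need rasterizing at all:

```python
def should_discard(tri, width, height):
    """Cheap pre-rasterization culling tests (illustrative only):
    degenerate or back-facing triangles and triangles entirely
    outside the viewport contribute no pixels and can be dropped
    before they reach the rasterizer."""
    (ax, ay), (bx, by), (cx, cy) = tri
    # twice the signed area: 0 = degenerate, negative = back-facing
    # (assuming counter-clockwise front faces)
    area2 = (bx - ax) * (cy - ay) - (by - ay) * (cx - ax)
    if area2 <= 0:
        return True
    # trivially off-screen: bounding box outside the viewport
    xs, ys = (ax, bx, cx), (ay, by, cy)
    if max(xs) < 0 or min(xs) > width or max(ys) < 0 or min(ys) > height:
        return True
    # a tiny triangle that misses every sample point could also go;
    # real hardware checks sample coverage, which is omitted here
    return False

print(should_discard([(0, 0), (10, 0), (0, 10)], 640, 480))         # False: visible
print(should_discard([(0, 0), (0, 10), (10, 0)], 640, 480))         # True: back-facing
print(should_discard([(700, 10), (710, 10), (700, 20)], 640, 480))  # True: off-screen
```

A higher discard rate simply means more of these dead triangles get rejected per clock before they consume shading resources.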
With Vega, async compute continues to allow for both graphical and compute workloads to be processed simultaneously. Nothing new has been added specifically to Vega over GCN as a whole, with AMD claiming their architecture continues to handle it better and better.
GPUOpen continues to be supported by AMD, and we touched on this when we covered AMD's Radeon Software Crimson ReLive Edition 17.7.2. Open-source shader functions developed by AMD as part of the initiative, in collaboration with industry partners including DICE and id Software, have helped optimize GCN-based shader units for FP16 operations.
This again is not necessarily Vega exclusive as the data used to quantify the optimization comes from a Doom (2016) developer presentation at SIGGRAPH 2016 for AMD's GCN architecture as a whole.
More here ->
Display Engine, Virtualization and Security Engines
Vega has a new display engine, and with it comes native DisplayPort 1.4 support with High Bit Rate 3 (HBR3, 32.4 Gb/s), multi-stream transport, and high dynamic range. HDMI 2.0 allows UHD resolution at 60 Hz with 12-bit HDR and 4:2:0 encoding, while HDCP 2.2 and FreeSync are supported on all DisplayPort and HDMI outputs.
As such, the total bandwidth for video transmission, along with MST hub support, enables Vega GPUs to drive even more displays simultaneously than Polaris, which brought native 4K/120 Hz (non-HDR) support. Add in HDR, and Vega now looks like the only option from AMD for 4K/120 Hz and 5K/60 Hz displays, although with display technology lagging in monitors and TVs alike, this will not be a real bottleneck any time soon.
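Back-of-the-envelope arithmetic shows why these modes push against the link budget. The sketch below computes raw pixel-data rates only - it ignores blanking and protocol overhead, so real requirements are somewhat higher - against HBR3's effective payload after 8b/10b line coding:

```python
def video_bitrate_gbps(w, h, hz, bpc, components=3):
    """Uncompressed pixel-data rate in Gb/s (blanking overhead ignored)."""
    return w * h * hz * bpc * components / 1e9

# DP 1.4 HBR3 raw rate is 32.4 Gb/s; 8b/10b coding leaves 80% for payload.
HBR3_EFFECTIVE = 32.4 * 8 / 10   # 25.92 Gb/s

for w, h, hz, bpc in [(3840, 2160, 120, 8),   # 4K/120, 8-bit
                      (5120, 2880, 60, 8),    # 5K/60, 8-bit
                      (3840, 2160, 120, 10)]: # 4K/120, 10-bit HDR
    rate = video_bitrate_gbps(w, h, hz, bpc)
    verdict = "fits" if rate <= HBR3_EFFECTIVE else "needs reduced blanking or chroma subsampling"
    print(f"{w}x{h}@{hz} {bpc}bpc: {rate:.2f} Gb/s -> {verdict}")
```

The 8-bit 4K/120 and 5K/60 modes land under ~26 Gb/s, while 10-bit HDR at 4K/120 overshoots it, which is why HDR high-refresh modes are the demanding case.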
There is something else AMD did with the display engine that is not covered in their slides: Vega has the best FreeSync implementation of any AMD architecture so far, and it matters. Until now, Low Framerate Compensation (LFC) required a display whose maximum refresh rate is at least 2.5x its minimum, and only recently have we seen many displays with such a wide FreeSync range.
A lot of monitors, including the Samsung CF791 AMD is bundling as part of their Radeon Packs, only have a ~2x ratio here (48-100 Hz for the CF791, for instance). Vega lowers the LFC requirement to a 2x ratio, improving the experience at the low end of the refresh range, especially combined with Enhanced Sync as a software solution.
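The frame-multiplication idea behind LFC can be sketched with simple arithmetic. This is an illustrative model, not AMD's driver logic; it assumes a max/min ratio of at least 2 so that an integer multiple of the content frame rate always lands back inside the variable-refresh window:

```python
def lfc_refresh(content_fps, vmin, vmax):
    """Pick a panel refresh rate for a given content frame rate.
    Inside the VRR window the panel simply follows the content;
    below it, low-framerate compensation repeats each frame n
    times so the effective refresh (fps * n) re-enters the window.
    Assumes vmax / vmin >= 2 so a valid multiple always exists."""
    if content_fps >= vmin:
        return min(content_fps, vmax)
    n = 2
    while content_fps * n < vmin:
        n += 1
    return content_fps * n

# Samsung CF791-style window: 48-100 Hz (a ratio just over 2x)
print(lfc_refresh(70, 48, 100))  # 70 -- inside the window, panel tracks content
print(lfc_refresh(30, 48, 100))  # 60 -- each frame shown twice
print(lfc_refresh(20, 48, 100))  # 60 -- each frame shown three times
```

This is why the ratio requirement exists at all: with too narrow a window, doubling a just-below-minimum frame rate would overshoot the maximum.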
Performance and Power Management
Having spoken to RTG members over the course of the weekend, it seems obvious that a good fraction of the 12.5 billion transistors was dedicated to increasing the base and boost clocks Vega is capable of relative to Polaris, and also to Fiji, which is the most direct comparison given the similar die sizes.
As such, AMD is targeting GPU core frequencies on the order of 1.7 GHz with Vega 10, which is a huge improvement over the previous two microarchitectures and can help tremendously if IPC is on par with the highly overclockable NVIDIA GP104 die-based cards.
This increased transistor count coupled with a smaller die size relative to Fiji is a result of an optimized general-purpose register design with Vega, wherein AMD claims collaboration with their Ryzen CPU team to have helped with the transistor density and power savings.
For example, the company leveraged options in GlobalFoundries' process technology that allow the use of wider transistors where the designer chooses, without compromising leakage or parasitics. AMD has also used improved synthesis tools for circuit design and paid closer attention to its cell library.
AMD also updated the cache hierarchy to improve the performance of programs that use deferred shading. The geometry pipeline, the compute engine, and the pixel engine's render back-ends are now clients of the L2 cache, which has in turn been doubled from 2 MB to 4 MB to cater to these changes.
Ah, it was only a few weeks ago that AMD launched the Radeon Vega Frontier Edition, and tests quickly revealed that draw-stream binning rasterization (DSBR) was not enabled on it despite the Vega architecture supporting it.
AMD today confirmed that Vega 10 does indeed support it and that RX Vega SKUs should too.
We are not sure yet if there will be a Radeon Pro software driver update to help enable it with the prosumer Vega Frontier Edition at this point.
DSBR is a tile-based pixel-shading/rendering approach wherein the GPU can render complex pixels far more efficiently than previous generations. The GPU fetches the overlapping geometry for a tile only once and performs the pixel shading of those overlaps only once, so overlapped or invisible pixels are never shaded, saving power and time.
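The binning step itself can be illustrated with a toy pass that assigns triangles to the screen tiles their bounding boxes touch. This is a simplified sketch with invented names - real DSBR bins in fixed-function hardware and also culls occluded pixels within each bin:

```python
def bin_triangles(tris, screen_w, screen_h, tile=32):
    """Assign each triangle (by index) to every `tile`-sized screen
    tile its bounding box overlaps. A binning rasterizer then shades
    one tile's worth of work at a time, so overlapping geometry is
    fetched and resolved per tile rather than streamed frame-wide."""
    bins = {}
    for tid, tri in enumerate(tris):
        xs = [p[0] for p in tri]
        ys = [p[1] for p in tri]
        x0, x1 = int(min(xs)) // tile, int(max(xs)) // tile
        y0, y1 = int(min(ys)) // tile, int(max(ys)) // tile
        for ty in range(max(y0, 0), min(y1, screen_h // tile) + 1):
            for tx in range(max(x0, 0), min(x1, screen_w // tile) + 1):
                bins.setdefault((tx, ty), []).append(tid)
    return bins

tris = [[(5, 5), (20, 5), (5, 20)],      # fits inside tile (0, 0)
        [(30, 30), (60, 30), (30, 60)]]  # spans four tiles
bins = bin_triangles(tris, 128, 128)
print(sorted(bins))   # the tiles that received any work
print(bins[(0, 0)])   # both triangles touch tile (0, 0)
```

Once triangles are grouped per tile, the rasterizer can resolve each tile's overlaps in on-chip storage before touching memory, which is where the power and bandwidth savings come from.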
AMD has provided some internal testing results to help demonstrate said power savings and performance benefits in synthetic and real-world rendering loads.

The power savings with Vega continue with the addition of an updated micro-controller unit (SMC MCU) dedicated to power management.
Vega supports Infinity Fabric, although in ways not yet fully known, and one of the ways the MCU helps is by improving idle power draw - switching the GPU core to a sleep state and the HBM2 memory to an ultra-low operating frequency.
AMD is using a fairly old 3DMark workload, the Perlin Noise test, which generates solid procedural textures, to demonstrate the power savings in action. This does seem like a stretch, but quantifying idle behavior is not easy to begin with.
4K @ 120 FPS and 8K @ 60 FPS! Dayum.... I would CFX Vega if it really can pull this off!
Q: I don't get it. If the FE does 30 MH/s, how can the RX Vega do 70-100? What's different? Those cards are very similar, aren't they?
Because RTG were sandbagging in the hope that miners would ignore Vega.
Vega 64 "Vega XT" (Air):
Base Clock: 1247 MHz
Boost Clock: 1546 MHz
Max Clock (DPM7): 1630 MHz

Vega 64 "Vega XTX" (Liquid):
Base Clock: 1406 MHz
Boost Clock: 1677 MHz
Max Clock (DPM7): 1750 MHz
Nice. Hopefully the liquid cooled version can hold those clocks. I'll still wait for reviews, but hopefully that will be the card to replace my rx480.
Vega Frontier OC LN2 (stock) by Buildzoid : https://cxzoid.blogspot.fr/2017/08/first-impressions-of-vega-fe-on-ln2.html
The interview was sparked from talk about Vega’s primitive shader (or “prim shader”), draw-stream binning rasterization (DSBR), and smart small primitive discarding. We’ve transcribed large portions of the first half below, leaving the rest in video format. GN’s Andrew Coleman used Unreal Engine and Blender to demonstrate key concepts as Mantor explained them, so we’d encourage watching the video to better conceptualize the more abstract elements of the conversation.
Draw-stream binning rasterization (DSBR)
Smart primitive discarding
It unfortunately seems that it will take at least a year before anything in Vega is properly tapped.