Discussion in 'Videocards - AMD Radeon' started by WhiteLightning, Sep 28, 2018.
Raytracing and... via SP+TMU+VGPR?
Get them out before the holidays!!
Seeing as that is a MacOS driver, couldn't they just be all the Mac Pro cards? They were Vega-based, iirc.
So basically hardware-based RT acceleration (TMUs), likely available in RDNA2.
From the text it appears that the only extra changes required are in the TMUs. The patent was filed about 10 months before the alleged 1st Navi tapeout.
Could AMD have put it there? Possibly. Could AMD have added it between the 1st and current tapeout? Possibly.
But I expect that it is what AMD has in consoles with Navi and those are scheduled for late 2020. Therefore there is some small chance that RX 5700 XT will have it. (Maybe driver is not ready.) But I expect that Bigger Navi chips next year will have it as they are too close to actual deadline for console tapeouts.
Edit: But taking into consideration the following slide, I would guess that since the 1st generation of RDNA is in the same bag as GCN for raytracing, this improvement is going to be for the 2nd generation/bigger Navi:
AMD Files a Patent for Cooling of 3D Stacked Memory
Scaling and manufacturing of ever shrinking semiconductor devices is becoming more challenging as smaller nodes are introduced.
As we have approached 7 nanometers, economies of scale are becoming more influential than scales of manufacturing.
For example, development of the 7 nm node cost more than 3 billion USD, while smaller nodes are expected to see that price cross the 5 billion USD mark.
So given that we are approaching the limit where we can't squeeze more transistors into two-dimensional space without huge economic impact, we have to utilize another dimension in order to keep performance improvements coming.
AMD has filed a patent for cooling a 3D stacked memory with thermo-electric coolers - TECs, also known as Peltier devices.
Being that TECs are made out of P-type and N-type semiconductors, they can easily be integrated into existing silicon manufacturing methods and controlled like a regular device.
The process AMD has patented basically describes how to insert the TEC between memory and logic devices, where it draws heat from either logic or memory with each side being able to dissipate the heat.
That effect is possible due to the nature of a TEC, where the direction of heat flow is changed by inverting the voltage.
As you can see, this is the high-level overview of what AMD proposes, with constant measurements of both the logic stack and memory stack, to determine which one is hotter.
The hotter side gets heat drawn away from it to the colder side, which can dissipate that heat.
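The measure-and-compare loop described above can be sketched roughly as follows. This is only an illustration of the idea, not AMD's implementation: the function name `tec_step`, the `TecCommand` type, the polarity convention and the `deadband_c` threshold are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class TecCommand:
    polarity: int   # +1 pumps heat logic -> memory, -1 memory -> logic, 0 = off (assumed convention)
    hot_side: str   # which stack is currently being cooled


def tec_step(logic_temp_c: float, memory_temp_c: float,
             deadband_c: float = 1.0) -> TecCommand:
    """One control step: find the hotter stack and pick the voltage polarity
    that moves heat from it to the cooler stack."""
    diff = logic_temp_c - memory_temp_c
    if abs(diff) <= deadband_c:
        # Temperatures are close enough; the deadband is an assumption added
        # here to avoid flipping polarity on every tiny fluctuation.
        return TecCommand(0, "none")
    if diff > 0:
        return TecCommand(+1, "logic")
    return TecCommand(-1, "memory")


if __name__ == "__main__":
    print(tec_step(85.0, 70.0))   # logic stack is hotter
    print(tec_step(60.0, 75.0))   # memory stack is hotter
```

Inverting `polarity` stands in for inverting the voltage across the TEC, which is what reverses the direction of heat flow.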
AMD Patents a New Method for GPU Instruction Scheduling
With growing revenues coming from strong sales of Ryzen and Radeon products, AMD is more focused on innovation than ever.
It is important for any company to re-invest its capital into R&D, to stay ahead. And that is exactly what AMD is doing by focusing on future technologies, while constantly improving existing solutions.
On June 13th, AMD published a new method for instruction scheduling of shader programs for a GPU.
The method operates on a fixed number of registers. It works in five stages:
Compute liveness-based register usage across all basic blocks
Compute range of numbers of waves for shader program
Assess the impact of available post-register allocation optimizations
Compute the scoring data based on number of waves of the plurality of registers
Compute optimal number of waves
It is important to note that the "liveness" of registers is most probably a reference to register utilization, while the term "wave" refers to the machine states, like for example EOP (End Of Pipe) and DRAW which draws the shader.
There are of course many more states, but these are just a few examples from AMD's "GPU Open" documentation.
The new method is supposed to bring additional performance improvements and reduce latency by making data (machine states in this case) like a wave that is stored in a register.
You can find out more about it here.
^ That looks like something to prevent unnecessary fetching of data from higher-level cache/memory, and to use more cache when it concludes that doing so lets the work complete earlier without causing those extra fetches.
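One possible way to read the five stages is as a trade-off between registers per thread and how many waves fit on a SIMD at once. The sketch below takes "liveness" in the usual compiler sense (a register is live while its value may still be read) and "waves" as wavefronts resident per SIMD, which is my assumption and differs from the article's interpretation. The 256-register budget and 10-wave cap are GCN-like figures used as assumptions, and the scoring formula and all function names are hypothetical.

```python
def max_live_registers(instructions):
    """instructions: list of (dest, [sources]) in program order for one basic block.
    Walks backwards tracking the live set and returns the peak register count.
    Assumes nothing is live after the end of the block."""
    live = set()
    peak = 0
    for dest, sources in reversed(instructions):
        peak = max(peak, len(live | {dest}))  # registers needed just after the instruction
        live.discard(dest)                    # dest is defined here, so not live before it
        live.update(sources)                  # sources must be live before the instruction
        peak = max(peak, len(live))           # registers needed just before the instruction
    return peak


def waves_for(regs_per_thread, vgpr_budget=256, max_waves=10):
    """How many waves fit, given a fixed register budget (assumed GCN-like numbers)."""
    if regs_per_thread <= 0:
        return max_waves
    return min(max_waves, vgpr_budget // regs_per_thread)


def pick_schedule(candidates, vgpr_budget=256, max_waves=10):
    """candidates: list of (name, regs_per_thread, est_cycles_per_wave).
    Scores each candidate with a toy throughput proxy (waves / cycles) and
    returns (name, waves) for the best one."""
    best = None
    for name, regs, cycles in candidates:
        waves = waves_for(regs, vgpr_budget, max_waves)
        score = waves / cycles  # hypothetical score, not the patent's formula
        if best is None or score > best[2]:
            best = (name, waves, score)
    return best[0], best[1]


if __name__ == "__main__":
    block = [("a", []), ("b", []), ("c", []), ("d", ["a", "b"]), ("e", ["d", "c"])]
    print(max_live_registers(block))
    print(pick_schedule([("aggressive", 84, 100), ("compact", 32, 130)]))
```

In this toy setup a schedule that is slower per wave can still win if it uses few enough registers to keep many more waves in flight.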
AMD is working on next-gen software/hardware hybrid ray tracing technology!
According to AMD, this hybrid approach will address some issues that can be found with solely hardware-based ray tracing solutions, and will bring major performance improvements to games taking advantage of it.
As AMD detailed:
“The hybrid approach (doing fixed function acceleration for a single node of the BVH tree and using a shader unit to schedule the processing) addresses the issues with solely hardware based and/or solely software based solutions.
Flexibility is preserved since the shader unit can still control the overall calculation and can bypass the fixed function hardware where needed and still get the performance advantage of the fixed function hardware.
In addition, by utilizing the texture processor infrastructure, large buffers for ray storage and BVH caching are eliminated that are typically required in a hardware raytracing solution as the existing VGPRs and texture cache can be used in its place,
which substantially saves area and complexity of the hardware solution.”
AMD has patented this hybrid approach as “Texture processor based ray tracing accelerator method and system.”
“The system includes a shader, texture processor (TP) and cache, which are interconnected. The TP includes a texture address unit (TA), a texture cache processor (TCP), a filter pipeline unit and a ray intersection engine.
The shader sends a texture instruction which contains ray data and a pointer to a bounded volume hierarchy (BVH) node to the TA. The TCP uses an address provided by the TA to fetch BVH node data from the cache.
The ray intersection engine performs ray-BVH node type intersection testing using the ray data and the BVH node data. The intersection testing results and indications for BVH traversal are returned to the shader via a texture data return path.
The shader reviews the intersection results and the indications to decide how to traverse to the next BVH node.”
AMD has also shared some more details about this hybrid model that you can read below (or you can visit this link that features the entire patent).
“A texture processor based ray tracing acceleration method and system are described herein. A fixed function BVH intersection testing and traversal (a common and expensive operation in ray tracers) logic is implemented on texture processors.
This enables the performance and power efficiency of the ray tracing to be substantially improved without expanding high area and effort costs.
High bandwidth paths within the texture processor and shader units that are used for texture processing are reused for BVH intersection testing and traversal. In general, a texture processor receives an instruction from the shader unit that includes ray data and BVH node pointer information.
The texture processor fetches the BVH node data from memory using, for example, 16 double word (DW) block loads. The texture processor performs four ray-box intersections and children sorting for box nodes and 1 ray-triangle intersection for triangle nodes.
The intersection results are returned to the shader unit.
In particular, a fixed function ray intersection engine is added in parallel to a texture filter pipeline in a texture processor. This enables the shader unit to issue a texture instruction which contains the ray data (ray origin and ray direction) and a pointer to the BVH node in the BVH tree.
The texture processor can fetch the BVH node data from memory and supply both the data from the BVH node and the ray data to the fixed function ray intersection engine.
The ray intersection engine looks at the data for the BVH node and determines whether it needs to do ray-box intersection or ray-triangle intersection testing.
The ray intersection engine configures its ALUs or compute units accordingly and passes the ray data and BVH node data through the configured internal ALUs or compute units to calculate the intersection results.
Based on the results of the intersection testing, a state machine determines how the shader unit should advance its internal stack (traversal stack) and traverse the BVH tree. The state machine can be fixed function or programmable.
The intersection testing results and/or a list of node pointers which need to be traversed next (in the order they need to be traversed) are returned to the shader unit using the texture data return path.
The shader unit reviews the results of the intersection and the indications received to decide how to traverse to the next node in the BVH tree.”
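The split described in the patent text (the shader owns the traversal stack, a fixed-function engine tests one BVH node at a time) can be sketched in software like this. It is only a toy model under stated assumptions: the node layout (dicts with `"type"`, `"children"`, `"verts"`) and every function name are made up for illustration, `intersect_node` merely stands in for the texture-processor path, and the ray-triangle test uses the well-known Möller-Trumbore method, which the patent text does not name.

```python
import math


def sub(a, b): return tuple(x - y for x, y in zip(a, b))
def dot(a, b): return sum(x * y for x, y in zip(a, b))
def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def ray_box(origin, direction, lo, hi):
    """Slab test against an axis-aligned box; returns entry distance or None."""
    t_near, t_far = 0.0, math.inf
    for o, d, l, h in zip(origin, direction, lo, hi):
        if d == 0.0:
            if o < l or o > h:
                return None          # parallel to this slab and outside it
            continue
        t0, t1 = (l - o) / d, (h - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
    return t_near if t_near <= t_far else None


def ray_triangle(origin, direction, v0, v1, v2, eps=1e-9):
    """Moller-Trumbore ray-triangle test; returns hit distance or None."""
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < eps:
        return None                  # ray is parallel to the triangle plane
    s = sub(origin, v0)
    u = dot(s, p) / det
    q = cross(s, e1)
    v = dot(direction, q) / det
    if u < 0 or v < 0 or u + v > 1:
        return None
    t = dot(e2, q) / det
    return t if t > eps else None


def intersect_node(nodes, node_ptr, origin, direction):
    """Stand-in for the fixed-function engine: tests ONE node and returns
    (triangle_hit_t, child pointers to visit next, nearest first)."""
    node = nodes[node_ptr]
    if node["type"] == "tri":        # triangle node: one ray-triangle test
        return ray_triangle(origin, direction, *node["verts"]), []
    hits = []
    for child_ptr, lo, hi in node["children"][:4]:   # box node: up to four ray-box tests
        t = ray_box(origin, direction, lo, hi)
        if t is not None:
            hits.append((t, child_ptr))
    hits.sort()                      # children sorting by entry distance
    return None, [ptr for _, ptr in hits]


def trace(nodes, origin, direction, root=0):
    """Shader-side loop: owns the traversal stack and decides what to visit next."""
    stack, closest = [root], None
    while stack:
        hit_t, children = intersect_node(nodes, stack.pop(), origin, direction)
        if hit_t is not None and (closest is None or hit_t < closest):
            closest = hit_t
        stack.extend(reversed(children))   # so the nearest child is popped first
    return closest


if __name__ == "__main__":
    scene = [
        {"type": "box", "children": [(1, (-1, -1, 4), (1, 1, 6)), (2, (-1, -1, 9), (1, 1, 11))]},
        {"type": "tri", "verts": ((-1, -1, 5), (1, -1, 5), (0, 1, 5))},
        {"type": "tri", "verts": ((-1, -1, 10), (1, -1, 10), (0, 1, 10))},
    ]
    print(trace(scene, (0, 0, 0), (0, 0, 1)))
```

Because the loop in `trace` is ordinary code, it could skip `intersect_node` or reorder the stack however it likes, which is roughly the flexibility argument made for the hybrid approach. The sketch does not cull boxes that lie beyond the closest hit found so far.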
RAGE 2 now supports AMD’s new FidelityFX technology:
Avalanche Studios has released a new patch that officially adds support for AMD’s new FidelityFX technology.
FidelityFX is a collection of high-quality post-process effects that automatically collapse multiple effects into fewer shader passes to reduce overhead and free up your GPU.
Great YT for knowledge.
So probably this is what Crytek is doing with the V56 in its demo? If so, this works on the Vega arch.
Vega/1st-gen Navi appear to do it via shaders, while this new thing uses shaders for the logic and has the actual checks done by the TMUs, which will need changes to enable it.
But is it still possible on Vega if enabled in drivers?
I could be wrong but I don't think so. Whenever you see the mention of "fixed function" acceleration they are referring to hardware acceleration.
Radeon Rx5700 & 5700XT review ....
Edit: The review was pulled. Some saved screenshots can be found here.
^^ Not bad at all, now we need some aggro pricing and we have a winner
Maybe it really is an HD5xxx comeback... a 250mm2 chip as good as a 541mm2 one....
But still i have NP to score ~28k (1732MHz/1150HBM2)
Spoiler: FS GPU
Wolf clearly shows that Vega is still in good shape.
Spoiler: Wolf 2
Power tW is great but still this is 7nm and 250mm2 chip, so no biggie here.
Up to 150tW for 5700XT
Spoiler: Power tW
From all I have seen, Vega/RDNA1 has to do it the hard way. Vega has good compute potential, and if AMD manages some driver-level magic for proper scheduling, it may not be the worst.
But I expect that RDNA2 will smash Vega in this like a bug. Especially in terms of completed operations per Watt-hour. (energy efficiency)
There, RDNA2 should have much higher potential within a similar TDP limit.