Undervolting Vega 56

Discussion in 'Videocards - AMD Radeon' started by Dener de Paula Pereira, Sep 25, 2019.

  1. Dener de Paula Pereira

    Dener de Paula Pereira Active Member

    Messages:
    59
    Likes Received:
    5
    GPU:
    Vega 56 Pulse
    I've undervolted my Sapphire Vega 56 (+1.5% core frequency, 900 MHz HBM, +50% power limit, 950 mV); it runs at ~1495 MHz on the core.
    My question is: why does undervolting improve performance? And why didn't all Vegas ship to end users undervolted?
     
  2. GeniusPr0

    GeniusPr0 Maha Guru

    Messages:
    1,236
    Likes Received:
    11
    GPU:
    RTX 2080
    Because Vega is power limited by default at 1.2 V. Undervolting lowers power usage, which makes a +50% power limit unnecessary.

    I've had about 5-6 Vega cards. One was a serious dud and couldn't undervolt to save its life. That's probably why they don't ship undervolted.
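
    A rough way to see why the undervolt frees up power budget: to a first approximation, dynamic power scales with voltage squared times clock. This is a minimal sketch under that textbook assumption; it ignores leakage, memory power and everything else on the board, and the function name is made up for illustration. The numbers plugged in are the 950 mV and ~1495 MHz from the first post against the 1.2 V default.

```python
def relative_dynamic_power(v_new_mv, v_old_mv, f_new_mhz, f_old_mhz):
    """Ratio of new to old dynamic power, assuming P is proportional to V^2 * f.

    First-order model only: static/leakage power and HBM power are ignored.
    """
    return (v_new_mv / v_old_mv) ** 2 * (f_new_mhz / f_old_mhz)


# Same core clock, 950 mV instead of 1200 mV.
ratio = relative_dynamic_power(950, 1200, 1495, 1495)
print(f"core dynamic power at about {ratio:.0%} of the 1.2 V figure")
```

    Under this model the core needs noticeably less power for the same clock, so it stays under the power limit and holds its higher P-states for longer instead of dropping down.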
     
  3. Dener de Paula Pereira

    Dener de Paula Pereira Active Member

    Messages:
    59
    Likes Received:
    5
    GPU:
    Vega 56 Pulse
    So when undervolting I don't need to touch the power limit?
     
  4. GlennB

    GlennB Master Guru

    Messages:
    207
    Likes Received:
    51
    GPU:
    Sapphire Vega 56 EK
    Undervolting lowers power consumption, which in turn lowers temps. The higher P-states function as normal as long as temperatures are fine; once the card goes over the limit, it drops down one P-state until temps/load allow for a higher one again.

    AMD probably had problems with some early cards requiring these voltages to work properly, so it had to force these voltages on all the cards.
     

  5. JonasBeckman

    JonasBeckman Ancient Guru

    Messages:
    15,694
    Likes Received:
    1,699
    GPU:
    Sapphire 5700XT P.
    It varies extensively from one GPU to another, and this goes back to at least the 7000 series and GCN 1.0, if not before. Wattman makes things simpler, but testing and monitoring are still needed because driver behavior changes. For Vega the overclocking parameters have changed four times or so, getting both better and worse, though I believe most of the quirks are now under control. Behavior can also change depending on GPU workload and the API used, from D3D9 - 11 to the low-level D3D12 and Vulkan.

    Reports since 19.6.1 also had Vega pulling a bit more voltage than set in Wattman (generally around +20 mV), and in 19.8.1 users reported that the set voltage was simply ignored and the GPUs just ran at 1.2 V regardless, or at whatever the BIOS parameters specify, since this differs slightly between cards, mainly with the Vega 64 Liquid, which I believe is at 1.250 V instead of 1.200 V, as I recall.

    Hence the need to monitor with HWInfo or another program, to get data from outside Wattman itself on what the card is actually doing.
    (Or what it's drawing I suppose ha ha, well hopefully that's resolved.)
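
    A minimal sketch of that check, assuming you have already copied the reported core voltage readings out of a monitoring log by hand: it compares them with the value set in Wattman. The function names, the 10 mV tolerance and the sample readings are all made up for illustration and are not taken from any tool or real log.

```python
from statistics import mean


def voltage_offset(set_mv, samples_mv):
    """Average reported voltage minus the voltage set in the tuning tool (mV)."""
    return mean(samples_mv) - set_mv


def setting_overshot(set_mv, samples_mv, tolerance_mv=10):
    """True if the card runs more than tolerance_mv above the set voltage on average."""
    return voltage_offset(set_mv, samples_mv) > tolerance_mv


# Hypothetical load readings with 1050 mV set.
samples = [1068, 1071, 1070, 1072]
print(voltage_offset(1050, samples), setting_overshot(1050, samples))
```

    An offset of around +20 mV would match the driver behavior described above, while readings stuck near 1200 mV would suggest the setting is being ignored altogether.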

    And then there are whatever limits apply to the individual card. Some users only get -20 mV even when downclocking, while others can go to almost -200 mV while barely having to lower clock speeds at all. Depending on Vega 56 or Vega 64, memory clocks and memory type, the performance impact of actually lowering clock speeds isn't much, or even noticeable at all, until you drop to more significant values. I believe this changes a bit with the Radeon VII, although it also uses the freer states and scaling rather than the earlier P-states, same as Navi and likely later cards. (These undervolt too, and in much the same way, so it's very possible to hit -100 to -150 mV, though again it depends on the GPU, as every one behaves a bit differently, plus any overclock or underclock. And again, it doesn't have to impact performance in any perceivable way, or only by a very small amount, in exchange for the reduction in voltage, temps and resulting power draw.)

    It would have been nice if the GPUs had run at 1.1 V instead of 1.2 V, maybe hitting 1500 - 1550 MHz and boosting a bit when headroom was available. But AMD went for near 1600 MHz, a bit higher for the Vega 64 and higher again for its liquid version, and then 1.2 V. This works too, but it runs a bit hotter depending on cooling and ambient temps, and it uses more power. With the Radeon VII, where the memory situation is less of a bottleneck, the higher core clocks actually amount to a bit more. :D
    (Well, it can help Vega too, but 2 - 4% from overclocking isn't a big gain. Some manage a bit better, which again varies from card to card because of binning and the silicon lottery, as it's called.)

    For the Vega 64 Pulse with Hynix memory (so not Samsung, which commonly had better parameters) I hit a small but respectable 900 MHz memory speed, testing in +10 to +15 MHz steps, stopping first at 850 and then pushing a bit higher. That was good enough without hitting artifacts or errors, or the point where it works but performance actually degrades instead (also important to test). Then, instead of, let's see, somewhere around 1600 on the core, I made a small reduction to 1550 MHz.

    -150 mV worked just fine. I tested with a -25 mV reduction each time, and after hitting 1050 mV instead of 1200 mV that felt like a good balance: a lower voltage, but kept a bit above where it likely could have gone if I had min/maxed the parameters to the limits, in case the card has a little moment and boosts or the drivers require a bit more. That worked until the Navi GPU arrived as the start of the next system build, because it's about time really, so GPU and CPU first and then the rest will follow shortly after. :p
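
    The step-down procedure can be sketched roughly like this. It is only an outline under stated assumptions: `is_stable` is a hypothetical stand-in for actually applying the voltage and running a real stress test or benchmark, and the 1000 mV stability edge in the example is invented so that the numbers land on the 1050 mV mentioned above.

```python
def find_undervolt(start_mv, step_mv, floor_mv, is_stable, margin_mv=0):
    """Lower the voltage step by step while is_stable(mv) keeps passing.

    Returns the last stable voltage plus a safety margin, never above start_mv.
    is_stable is a caller-supplied callback (hypothetical here).
    """
    last_stable = start_mv
    mv = start_mv - step_mv
    while mv >= floor_mv and is_stable(mv):
        last_stable = mv
        mv -= step_mv
    return min(start_mv, last_stable + margin_mv)


# Pretend this particular card starts failing below 1000 mV (made-up value).
def pretend_stable(mv):
    return mv >= 1000


print(find_undervolt(1200, 25, 900, pretend_stable, margin_mv=50))
```

    The margin is the "kept a bit above" part: the search finds the edge, and the final setting backs off from it so that a boost spike or a driver change doesn't tip the card over.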

    With the Navi card I repeated almost the exact same procedure, going from a silly 2100 MHz boost at 1250 mV, which it could never realistically hit, down to 1950 MHz at 1050 mV. It was completely stable after eight hours of testing, with only a mild performance drop in exchange for the gains in junction and core temps, plus memory up from 875 to 900 MHz. That isn't much, but the cards are sensitive and early drivers could be prone to errors or other issues, so I'm keeping it here for now.

    Memory might show crashes or artifacts, or just a performance drop or a stop in scaling, so it needs careful testing. Core clocks and boost still hit around 2 GHz by default, and this keeps it within a similar range, just a tad lower, so performance is fairly similar. The card has a bit more reach than the Vega for overclocking further if users want to.
    (Well, it can, but I'd say the stock settings were kinda pushing it already for this card and what it positions itself as. Nitro and the Devil might be a tad more flexible, but settings and overall limits are still what they are.)



    EDIT: Memory sensitivity goes for Vega and HBM2 too, but as long as increases were small and tested and re-tested, it mostly showed graphical glitches or corruption before it got crash happy. Memory overclocks also showed performance gains, since Vega is memory bottlenecked, at least until Vega 20 and the VII improved the situation a bit. What was achievable could differ between the Hynix and Samsung types, and between the 56 and the 64, which runs at a slightly higher memory voltage.
    (Still capped, though, and despite Wattman calling it memory voltage, that setting operates as a different parameter entirely: a voltage minimum or floor for some earlier P-state and possibly overall, for the GPU core and not the memory.)

    So yeah, regardless, AMD is still kinda pushing the settings and voltage, but that does allow every GPU to hit spec, and at least it's easy and flexible enough to lower a notch, which might not affect performance at all. Shaving off voltage in the hundreds feels a bit funny, but eh, it's a thing, and it's pretty useful if the card shows no issues with a nice reduction here, plus it lowers power draw and improves core temps at least a little bit. :D


    And per the first paragraph, it just varies from card to card: some hit higher undervolt results and others might not allow for as much of a decrease, because silicon lottery and binning are a thing. Every single chip comes out a bit different, and even if the variance isn't very big it can still affect overall parameters by at least some margin.
    (And the cards are pushed quite close to the limits already for clock speeds, though there's room for both underclocks and overclocks, and voltage adjustments too.)
     
    Last edited: Sep 26, 2019
    Dener de Paula Pereira and OnnA like this.
  6. Dener de Paula Pereira

    Dener de Paula Pereira Active Member

    Messages:
    59
    Likes Received:
    5
    GPU:
    Vega 56 Pulse
    Got it.
    So I don't need to increase the power limit in order to UV the card.
    Thanks.
     
  7. JonasBeckman

    JonasBeckman Ancient Guru

    Messages:
    15,694
    Likes Received:
    1,699
    GPU:
    Sapphire 5700XT P.
    Pretty sure it only allows the card to draw more power, if it needs to, without throttling. But that also counters the undervolt a bit by having it draw more over PCIe, and if the card doesn't need more power than stock there's no need to increase the slider at all.

    With the newer drivers it might also try to boost or push above the specified settings, or have small spikes, so that might affect the behavior of this setting too if it keeps pushing higher clocks because there's additional headroom before the GPU starts throttling, which includes the power limit. I would still keep the slider at stock, as it's mostly with the default settings that the card even hits the default threshold and starts throttling.
    (And even then it's more like 10 or 20% that gives it enough headroom; the full 50% is a bit redundant short of overclocking, or of the higher-end Vega 64 models whose BIOS parameters allow more voltage, with 1.250 V instead of 1.200 V being the cap.)

    Setting a negative 10 - 20% can also work, ensuring the card doesn't try to draw additional power, but then it needs more careful measuring and comparing of how it draws, or at least performs, so that it doesn't throttle, stutter or lose performance when the GPU downclocks during busier workloads because it's now hitting the power threshold.

    Sigh, well, it's mostly the stock settings and AMD pushing the card for all they can that makes increasing this effective in the first place. It's also why power draw can be measured going from what, 350 to almost 400 watts, was it, or maybe higher, down to around 300 or lower, depending on how the card takes undervolting. The power slider allows a bit extra, though from how I remember it working it's a very mild modification even if it says 50%, and then there's a difference between GPU core power and total GPU power draw. But with how much the GPU can usually be lowered, it's still a noticeable total reduction. That also keeps the card a bit cooler and can allow a quieter fan curve without hitting the temperature thresholds: the GPU core has one value, and a bit below that is the one for the HBM2 chips, which first loosen their timings and then clock down. :)

    Let's see, it's 65 degrees Celsius for the HBM2 timings I think, and 80 degrees Celsius for the GPU core in the older drivers and 75 in the newer ones. (The "hot spot" temp is going to be a bit above that, and it throttles at around 105 to 110 Celsius, but the usual GPU temp sensor won't be in the 100s or anything just because the hot spot is climbing near 90 C, or wherever it ends up during heavier operations and full GPU load.)
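
    As a small sketch of how those numbers could be used when looking over a sensor log: the limits below are just the figures quoted from memory in this post (65 C for HBM2 timings, 75 C core on newer drivers, 105 C hot spot), not official values, and the sensor names and readings are made up for illustration.

```python
# Thresholds as recalled in the post above; treat them as assumptions.
LIMITS_C = {"hbm": 65, "core": 75, "hotspot": 105}


def over_limit(readings_c, limits_c=LIMITS_C):
    """Return the sorted names of sensors at or above their threshold."""
    return sorted(
        name
        for name, temp in readings_c.items()
        if name in limits_c and temp >= limits_c[name]
    )


# Hypothetical load readings: core and hot spot are fine, HBM is past 65 C.
print(over_limit({"core": 72, "hbm": 66, "hotspot": 90}))
```

    In a case like the example the core looks fine, but the memory would already be loosening its timings, which is the sort of thing the plain GPU temperature reading doesn't show.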


    That's a bit extra though. Benchmark and test the card under the new settings; something like GPU-Z or HWInfo can be used to see other sensor information if needed, though temps, performance and of course stability also work really well for measuring how the new settings are doing, and from there it can be fine-tuned further. :)


    EDIT: Vega having some cards hitting 1.0 V or lower while maintaining near or above 1500 MHz is quite something. My own results were 1.000 V at 1450 MHz for P6 and 1500 MHz for P7, up to 1.050 V at 1500 and 1550 MHz when I was testing. I could probably have lowered the voltage further without hitting any issues, but those were pretty good results and it barely affected performance at all. Then again, the Hynix memory, running first at 850 MHz and then at 900 MHz, was holding back the GPU core a bit as well.
    (Then again, even the Vega 64s get a whopping 6% or so increase from pushing the core and memory higher, although their core clocks are also higher at stock, so it's mostly memory and limitations here.)


    EDIT: I think the default P7 state I had for the Pulse Vega 56 was 1593 MHz or 1597 MHz, so I trimmed it down a bit and then increased it slightly, landing at 1552 MHz (it always rounded up that last 2 for some reason). From my own testing it wasn't until dropping core clocks below 1400 MHz that performance really started showing a decrease, though that will differ depending on software, and benchmarks in particular. Behavior can also be more varied with DirectX 12 and Vulkan, which allow the GPU to hit higher efficiency and not be held back as much compared to DirectX 11. :)

    That also affects testing and how stable the card is if it's right on the limits when tested under D3D11, though it's usually not too problematic and a slight voltage increase can be enough. It's simple enough to test and monitor the card under both the older APIs and the newer ones and see how it behaves.
    (And adjusting anything if needed.)
     
    Last edited: Sep 30, 2019