Discussion in 'Videocards - AMD Radeon' started by OnnA, Aug 15, 2017.
FWIW I noticed some strange behavior regarding GPU crashes/recovery during my initial stress testing phase of my V56.
When the GPU crashed and recovered I did much as you have, reset mostly the same variables and continued testing.
However, I noticed that even with the same settings the card was no longer performing the same.
Not uncommonly the idle VID increased from ~775/800 to ~900/950 mV.
Regardless of further stability issues the card performed measurably, and consistently, worse in Time Spy despite running identical settings as pre-crash.
Reclaiming the lost performance required a reboot and settings reset.
I'm not saying this is absolutely what's happening on your end but it may be worth benching some to find out. My performance disparity wasn't huge, 150-250 points in Time Spy or thereabouts, but it was consistent.
As for the reason I can only speculate but it seems the GPU gets put into some sort of "safer" power state after a crash, unaffected by the regular settings in WattMan or the like.
So it's possible that the reason you're no longer having issues post-crash is because the GPU is running at a different power/performance level than before.
When it comes to the reason for the initial crashes, it kinda sounds like a power spike. Vega (or any GPU for that matter) can draw notably more than its average load consumption for a very brief moment, and this can leave some PSUs unable to keep up, or even trip overcurrent protection in bad cases. The reason this would be my first guess is that you mentioned cold boot: under ideal circumstances like that the GPU is more likely to want to turbo up to higher clock/voltage states than after an extended run, because it sees more cooling headroom. That it happens on the initial launch of the benchmark is also telling.
YMMV, hopefully some of that is useful to you!
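For anyone who wants to check for the before/after Time Spy disparity described above, here is a minimal sketch of the comparison. The function name and the example scores are made up for illustration (they are not measurements from this thread); the only idea taken from the post is to bench the same settings before and after a crash and compare.

```python
from statistics import mean


def score_drop(pre_crash, post_crash):
    """Return (average drop in points, drop as a percentage of the pre-crash average)."""
    pre = mean(pre_crash)
    post = mean(post_crash)
    drop = pre - post
    return drop, 100.0 * drop / pre


# Hypothetical Time Spy scores, NOT measurements from this thread.
pre_runs = [7000, 7020, 6980]
post_runs = [6800, 6810, 6790]

drop, pct = score_drop(pre_runs, post_runs)
print(f"average drop: {drop:.0f} points ({pct:.1f}%)")
```

A few runs per state helps separate a consistent drop from normal run-to-run variance.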
If it crashes (only in 3DMark, BTW)
then I always re-apply settings in both WattMan and the OC tool.
No disparity then.
Most likely a power spike, and the card is hitting overcurrent protection of some sort. My card is the same way. After a restart or reboot (or sleep mode) it will run about 10-15MHz higher (1655MHz vs 1670MHz), then spike to 1742-1747MHz with a 1180-1190mV reading and crash the driver. But after the crash I just reapply settings and it's stable at 1650-1655MHz for as long as I don't restart, reboot or sleep. I have my PL at 30%; if I drop to 15% the issue goes away. But at that PL level my core clock jumps around a lot more (1600-1660MHz).
On a side note, if I do a PPT mod and set it to 142% with 1.23V I can run stable at 1675-1700MHz without that issue popping up. But it's loud and hot at those settings. Fans at 85-100%, and it hits 70+C on core and 78-80C on HBM.
Well, if that is so then I have nothing left to do, right? I don't see a power spike, more likely a clock spike. For example, today I turned on my PC, launched R6 Siege and ran its built-in benchmark. Almost instantly I got a crash: game closed, GPU driver reset, and HWiNFO showed 1770MHz on core clock (P7 set to 1727MHz - 1168mV | PL 24%). So I reloaded WattMan settings, ran the game again and was able to play for about 2 hours straight with a very solid core clock.
After that, I just restarted Windows and was able to run the R6 Siege built-in benchmark 3 times: no crash, clocks close to 1700MHz.
So that gives me a hint that the problem only occurs on cold boot, but that's a PITA. How am I supposed to run the card like that? Every time I turn on my PC I'll have to first run some 3D benchmark to get the driver crash, then reload WattMan settings, and only then can I use the PC? That's sad! (Which is what I've been doing for the last couple of days.)
I don't see any performance drops (I'll do more testing later). I'm getting around 1680-1690MHz with P7 set to 1727MHz, so it's quite acceptable, and mem at 1100MHz.
I'll try PPT later since I have my card on water, but I have no clue how to do it. I'll search about it.
You have to restart after applying settings for the first time in WattMan, otherwise you get crashes due to voltages not being correctly set.
As already pointed out, this can be seen by looking at idle voltages being much lower if booting with default settings in WattMan. However, once you boot with settings already saved, you can make changes on the fly, even after a driver crash.
No luck for me. I ran DDU, reinstalled the 19.3.3 driver, set my settings in WattMan, then restarted, ran a 3D benchmark test and got a crash. After the crash I reloaded WattMan settings, re-ran the benchmark and everything went fine.
All this with the 220W BIOS (switch towards the display ports). I'll try the other one later.
Either way, settings entered should be stable after a reboot.
If they are not, you'll just end up going around in circles whether it's because of incorrect settings or a software quirk.
Also, the profile needs to be saved after rebooting.
My Sapphire has a 240W BIOS and the same thing happens. I did some testing to see if I could pinpoint the issue. So far I'm thinking it's a power issue.
So I started with default settings and was getting a stable 1475-1480MHz in games. Then I bumped memory up to 1100MHz, which caused clocks to drop to 1380-1385MHz in game. The perf. overlay showed wattage hitting the 240W BIOS limit. No crashes. Then I started to undervolt. This brought total wattage down, which pushed clocks back up to 1475-1480MHz. Bumped clocks to a stable 1540-1560MHz, but again reached 240W. So then I started on Power Limit. Went to 10% and adjusted clocks (P6email@example.com/P7-1700mhz@1145) to a stable 1605-1625MHz in game. Hit 264W. No crash. Bumped to 15% (276W) power limit. Crash. Messed with clocks and voltages at that point, and dropping voltage seems to fix the issue but gives no clock increase. Bumped to 20% (288W): crash. But like you said, reload settings and it's stable. So I think the power limit is somehow spiking clocks/voltage and tripping some safety limit. When I crash I get the same clock spike (1740+MHz), but I noticed in HWiNFO that the voltage maxes out at 1182-1190mV.
Just tested again with following:
Ran fine for 2-4 min at 1620MHz @ 1031mV, 288W, then spiked to 1725MHz @ 1138mV, 289W, and crashed. Reloaded settings and it's been stable for the last 30 min of gaming.
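The wattage figures quoted for each Power Limit step on the 240W BIOS (264W, 276W, 288W) line up with a simple percentage offset on the BIOS limit. A small sketch of that arithmetic follows, assuming the slider scales the limit linearly; the function name is just illustrative, and this is not a statement about how the driver actually computes it.

```python
def power_budget_w(bios_limit_w, power_limit_pct):
    """Board power budget in watts, assuming PL is a linear percentage offset on the BIOS limit."""
    return bios_limit_w * (1 + power_limit_pct / 100)


# 240 W is the BIOS limit quoted in the post above.
for pl in (0, 10, 15, 20):
    print(f"PL +{pl}%: {power_budget_w(240, pl):.0f} W")
```

The same function works for negative offsets, so a reduced Power Limit can be checked the same way.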
I’ve beaten my card OC record! 1600-1650MHz depending on the scene of Superposition.
My Vega 64 has the stock BIOS (300W maximum power limit, so can’t give more voltage to be stable at higher clocks without throttling for power limit), and it’s an awful overclocker (but awesome undervolter). By the way, 4673 at my daily UV settings (stock freq/950mV core, 1100Mhz/860mV/level2 timings mem; 170-180w)
-GPU: locked at P8, 1702MHz, 1150mV (anything beyond that crashes, I tried with various MHz/mV settings)
-Power Limit: +50%
-Mem: Level 2 timings, 1100MHz, 960mV (could go a bit lower on voltage, but I was tired. Even 20MHz more on the mem and my card crashes).
To underline how awfully my card OCs: +7.5% score requires +75% power.
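To put that trade-off in points-per-watt terms, here is a rough sketch using only the two percentages quoted above. The function name is made up for illustration, and it assumes score and power are compared against the same baseline (the daily undervolt).

```python
def relative_efficiency(score_gain_pct, power_gain_pct):
    """Points-per-watt relative to the baseline (1.0 means unchanged efficiency)."""
    return (1 + score_gain_pct / 100) / (1 + power_gain_pct / 100)


ratio = relative_efficiency(7.5, 75)
print(f"OC points-per-watt is {ratio:.2f}x the baseline")
```

Anything below 1.0 means the overclock costs more power than it returns in score.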
Best for Vega uArch is:
HBM2 1080->1135MHz (Lev.1)
Good results on my XTX are (AC:O, custom settings w/ ReShade, FPS in mind):
1717MHz/1.094V | HBM 1150MHz/975mV, POW +1% or 0% (Pow. Spike is at ~180-214tW)
Holy moly, you've got a good chip!!!!!!
Last batch of original XTX LE + it's pre-tested for me (best of 5 tested by the shop)
I will change it for Arcturus HBM3
I'm starting to think that the problem has to do with Power Limit (a driver bug or something like that).
Stock Bios (220W) + PL 24% = 272W
With the settings already mentioned above (clock/voltage) I have this situation:
-> 3D load on GPU, driver crash, reload settings, I'm good to go
Now I applied the SapphireRXVega64.8176.170811(LC) PowerPlay table via Reg, so:
Stock Bios turned out to be 264W, so I left PL 0 % with the same Clock/Voltage settings
-> I don't have any problems at all with a Cold Boot + Settings applied on Wattman AND get basically the same performance as 220W+PL 24%
I played 2 full ranked in R6 Siege with no problem at all (264W PPT reg + PL 0)
Then I changed PL to -10% to see how it would perform and... CRASH! P7 spiked to 1760Mhz... definitely all of this had to do with PowerLimit.
I won't even bother messing with the other BIOS like I said.
Since the 19.3.3 driver update my V64 LC Sapphire can hit the same UV settings as OnnA's.
Stable, without any kind of problems.
Anyone have an idea of how or why these cards spike above the set P7 limit when OCing with PL increased, which causes a driver crash? I have the card set to 1702MHz @ 1130mV and it will spike to 1726MHz @ 1190mV, then crash. Anything above 1% PL causes this issue. Looks like it jumps about 15-24MHz over whatever P7 is set at and just under the P7 voltage.
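One way to pin down how far over P7 the card actually goes is to scan a sensor log for readings above the setpoint. Below is a minimal sketch, assuming the core clock readings have already been pulled into a plain list; the function name and the sample list are hypothetical, and only the 1702MHz setpoint and the 1726MHz spike come from the post above.

```python
def find_overshoots(clock_samples_mhz, p7_mhz):
    """Return (sample index, MHz over P7) for every reading above the P7 setpoint."""
    return [
        (i, clock - p7_mhz)
        for i, clock in enumerate(clock_samples_mhz)
        if clock > p7_mhz
    ]


# Hypothetical log: only the 1702 setpoint and the 1726 spike are from the thread.
samples = [1650, 1655, 1652, 1726, 1654]
print(find_overshoots(samples, p7_mhz=1702))
```

Running it against logs taken at different Power Limit values would show whether the overshoot really tracks PL.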
Well updated to new drivers.... and OCing is worse. Now it spikes and crashes driver with no PL.
P6-1662mhz @ 1050mv
P7-1702mhz @ 1150mv
Mem 1100 @ 975mv
Spike crash max 1745mhz @ 1138mv
Card is starting to piss me off. Almost wishing I didn't leave Team Green...
Just wondering if it's the spike that causes the crash, or the crash that causes the spike?
It's an unstable overclock, which results in a crash.
Pretty standard behaviour.
IMO it's too much OC.
Try these settings instead:
P5 1550MHz 1.00v
P6 1612MHz 1.094v or more if needed.
P7 1692MHz 1.125v ^^
HBM2 set it to safe 1080MHz at 975mV
POW at 0% for testing then adjust no more than +25%
Hard reset crash -> way beyond a stable OC
Silent crash (black screen, then back to desktop) -> you're close to a stable OC.
Just seems like it's gotten pickier with OC and UV. Latest drivers made UV worse. All my crashes have been silent crashes. It's odd though. Settings that were stable (4hr loops in Heaven/Time Spy) now crash 30 sec in. Also new to this card, so I don't know what standard behaviour would entail. Learning though...lol
So I found these stable last night:
In games/benchs- 1633-1688mhz
No crashes. But it gets hot: