Hello everyone,

Since the moment I assembled my PC 5 years ago I have run into quite a few problems with my R9 Fury: severe texture streaming issues and z-buffer issues in almost every 3D environment, coupled with constant CTDs and random shutdowns. The solution that originally worked for me was to use the 15.7.1 Catalyst drivers with partially deleted drivers from 17.11.2 Crimson, on top of forcing every EnableUlps registry value to 0, running ClockBlocker at the same time, and undervolting and underclocking with MSI Afterburner, just to get it to work. (The funny thing is that benchmarks ran perfectly fine and reported 0 errors.) After roughly 3 months of wrestling with my PC I was finally able to use Adobe Premiere, D3D11 games and 3D applications without crashing to desktop for longer than 15 minutes, up to 8 hours and more when needed.

Recently, though, I had to replace the fans on the GPU, and now I have this new issue, which is basically what I had when I first assembled my PC. After a while (8-15 minutes) 90% of 3D applications crash, and the log just says: "Display driver amdkmdap stopped responding and has successfully recovered." If you're unlucky the system freezes as well (but the sound is fine and I can hear that things are still running, or even use Skype/Discord, so I thought it was a driver issue).

MSI Afterburner shows normal temperature, never going above 83 degrees. One thing I noticed is that crashes frequently occur when GPU load suddenly spikes to 100%, then back to 0%, then to 100% again, and then the display driver crashes and restarts. Voltage spikes down and back up too (it begins by dropping for about 2.5 seconds). It's 100% not a temperature issue, because benchmarks still run perfectly fine, and besides, temperature never goes above 85. Next in line was voltage: I tried undervolting, no result; tried cranking the voltage up, no result. Reduced core clock from 1000 MHz to 800 MHz, no result.
It looked to me like a ULPS issue or one of those AMD power saving tricks, but I thought I had disabled it in the registry, and ClockBlocker is running too, which is supposed to prevent the underclocking that AMD loves so much (for reasons that elude me, even though every friend I know has issues with underclocking). After tinkering with settings for something like 100 hours I get things to work, and I have no crashes in 3D apps until I shut down my PC; then it's crashtown all over again. I thought it might be my PSU, but why on earth does it work after I dance around it for 20 hours, then? Thought it was faulty RAM, still nothing.

My system:
- Asus Z170 Pro Gaming
- R9 Fury 4 GB HBM with Strix cooling
- Core i7 6700 OEM 3.4 GHz
- DDR4 DIMM 16 GB (4 x 4 GB) Kingston HyperX Fury Black 2133 MHz
- Windows 7 Ultimate
- PSU: Thermaltake TP-850ah5ceg-a 850 W
- WD2003FZEX Western Digital Black 2 TB 3.5" 7200 RPM

Hopefully someone can help me solve this mystery, or at least tell me where to look for a solution.
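One way to double-check that every EnableUlps value really is 0 is to export the relevant registry branch with regedit and scan the text, instead of clicking through the keys by hand. Below is a rough Python sketch of that idea. It assumes the export is a standard "Windows Registry Editor Version 5.00" .reg file (regedit writes these as UTF-16); the function names are made up for illustration and are not part of any AMD or Microsoft tool.

```python
import re

# Matches a value line such as: "EnableUlps"=dword:00000001
VALUE_RE = re.compile(r'^"EnableUlps"=dword:([0-9a-fA-F]{8})$')


def find_enabled_ulps(reg_text):
    """Return (registry_key, value) for every EnableUlps entry that is not 0."""
    hits = []
    current_key = None
    for raw_line in reg_text.splitlines():
        line = raw_line.strip()
        # A section header looks like [HKEY_LOCAL_MACHINE\...\0000]
        if line.startswith("[") and line.endswith("]"):
            current_key = line[1:-1]
            continue
        match = VALUE_RE.match(line)
        if match:
            value = int(match.group(1), 16)
            if value != 0:
                hits.append((current_key, value))
    return hits


def scan_reg_file(path, encoding="utf-16"):
    """Read an exported .reg file and report non-zero EnableUlps entries."""
    with open(path, encoding=encoding) as f:
        return find_enabled_ulps(f.read())
```

An empty list from `scan_reg_file` would mean no exported key still has EnableUlps switched on. This only reads the export; it does not change the registry.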
I don't have anything to help you solve the issue. My question is: did you never try to RMA the card? That's the first thing I would have done.
Well, it has been over 5 years since purchase, and proving that it was faulty from the get-go is a legal nightmare that would cost me 5 nuclear submarines. Is there any kind of diagnostics tool that would carry its stats over through the crash? So that I could at least spot whether it's the PSU or the GPU or the drivers or whatever the hell this thing is.
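On the question of a log that survives the crash: the usual trick is to write every sample to disk immediately instead of buffering it, so the last rows before a freeze are still in the file after a reboot. Here is a minimal Python sketch of that idea. Note that `read_sensors` is a hypothetical placeholder (it is not part of any real monitoring tool); you would have to supply a function that returns your readings from whatever source you have.

```python
import csv
import os
import time


def log_samples(path, read_sensors, interval_s=1.0, max_samples=None):
    """Append timestamped sensor rows to a CSV, forcing each row to disk.

    read_sensors is a caller-supplied function (hypothetical here) that
    returns a dict such as {"gpu_load": 99, "temp_c": 82}.
    """
    # Only write the header when the file is new or empty.
    need_header = not os.path.exists(path) or os.path.getsize(path) == 0
    count = 0
    with open(path, "a", newline="") as f:
        writer = None
        while max_samples is None or count < max_samples:
            sample = {"time": time.time()}
            sample.update(read_sensors())
            if writer is None:
                writer = csv.DictWriter(f, fieldnames=list(sample))
                if need_header:
                    writer.writeheader()
            writer.writerow(sample)
            f.flush()             # push Python's buffer to the OS
            os.fsync(f.fileno())  # ask the OS to write it to the disk
            count += 1
            if max_samples is None or count < max_samples:
                time.sleep(interval_s)
    return count
```

With `max_samples=None` it runs until the machine dies or you stop it. The `fsync` on every row costs some disk activity, but that is the point: whatever was measured in the last second before the crash should already be on disk.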
Well, thing is, back when I was first assembling my PC, I tried both Windows 10 and Windows 7. Windows 10 just had awful performance overall, and the more I hear from my friends, the clearer it is that no one is happy about choosing Windows 10. Back then Windows 10 had even more problems with my rig: more lag spikes and even more graphical issues, along with random weird software issues like just deleting parts of drivers. (And I'm talking about the official licensed copy. Surprisingly, a pirated copy had fewer issues, but it was a far cry from the performance of Windows 7, which just crashed without deleting stuff on my PC.) That was actually the reason I looked at Windows 7 in the first place. I might not have if 10 had worked fine.
You need to do a fresh install of Windows to make sure it isn't a software problem, as mentioned. It's a pain but necessary; otherwise you will only continue to fight an unstable system, wasting your time without having a good idea as to why.
How have you had the Fury for 5 years? They did not launch until June 2015. Not even 4 years old yet.... Win 10 runs fantastic for me, on all of my computers, even my trusty old Q6600 rig that is 11 years old. I've had a Fury, and even in a case with crappy airflow it never went over 63°C with a custom fan curve, without undervolting. If it's not PSU related, Windows related or driver related, you have a dud card. Simple as that. If the GPU does it in another rig, it's the card, sadly.
I would say it is in power delivery. A few points:
- 83/85°C is not fine for a Fury (X), because the HBM sits right next to the GPU and uses the same heatsink. (In the worst case the HBM does not even touch the heatsink.)
- A crash on a "load spike"/"clock spike" hints that power delivery does not react fast enough and cannot provide sufficient voltage during that spike. (Hence the ClockBlocker/static clock "fix".)
- It was OKish, then stopped being OKish. (Physical manipulation of some component. Maybe the heatsink moved?)
- From tests, the PSU looks solid.

1st, I would check how well the cables between the PSU and the graphics card are plugged in. If OK, I would inspect the actual pins on those cables. If OK, I would try different sockets on the PSU, as it has 4. (Btw, are you using 2 separate cables or one that is split at the end?)

2nd, did you ever take the heatsink off or wiggle it a lot? (Fan replacement was mentioned, and thermals are way too high.)
- Even experienced people have managed to do some (irreparable) damage to the HBM traces on the interposer.
- Did you apply TIM on the GPU and the HBMs?

3rd, are you using the original vBIOS or a modded one for any reason?
- Not saying it is bad, but we should be aware, since unlocking bad shaders may or may not cause this.

4th, do you get errors when you run memtestCL on the GPU? (There is a broken version floating around which shows millions of errors. So if a crazy-high count shows up immediately, look for the fixed version.)

Here is a nightmarish image of what may be in between the heatsink and the GPU/HBM on your card.
For me, since some 18.x.x drivers, no monitoring program can show GPU usage properly, not even Radeon Overlay. Yeah, it goes from 0 to 100% and back to 0%, but performance is fine. You need to lower the temperature. It's really high. My card starts throttling when it reaches 80°C, which happens very easily on default settings, because Sapphire Intelligent Fan Control is stupid as... and runs only the central fan.
Funny thing is, I actually did change the TIM on the GPU, and I did remove the cooler to do that. That special GPU paste was kinda solidish and whacky looking, but not as bad as what's shown in the picture. But why wouldn't benchmarks show the issue, then? I might have wiggled it a bit too.

Sockets are a sound idea, didn't think of that one. I'm using 2 separate cables that split into 6 and 2 at the end. I'm not using a modded vBIOS, not that I know of at least, though I did think about starting to. The wires seem to be solid enough (not that I have a voltmeter lying around to check); they hold pretty firmly.

Ran memtestCL and Video Memory Stress Test: 0 errors on both. Ran the FurMark GPU stress test for over an hour: top temperature 82, average 79, core clock around 500-650, memory clock 500, stable 82 fps for over an hour with no issues whatsoever.
Yeah, well, thing is, it's sudden. Usually the load just goes like 50-70-90, then it stays at around 99 for like 5 seconds, drops to 98 for 1 second and then goes back to 99. It begins to spike only when it's about to crash, and that happens at random intervals. For a split second things look normal, then out of the blue it spikes to 0, then back to 100, then back to 0 and bam, crash. I can play or do my job for 20 minutes and it's fine, then it crashes. Or it crashes at the start, or after like 5-6 minutes.
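If the load readings end up in a log, the "99, then out of the blue 0" pattern described above can be picked out automatically, which makes it easier to line the drops up with temperature or voltage at the same moment. A small Python sketch; the thresholds (90 and 5) are arbitrary assumptions chosen for illustration, and the function name is made up.

```python
def find_load_drops(loads, high=90, low=5):
    """Return the indices where GPU load falls from >= high to <= low
    between two consecutive samples."""
    return [
        i
        for i in range(1, len(loads))
        if loads[i - 1] >= high and loads[i] <= low
    ]
```

For a series like `[50, 70, 90, 99, 99, 98, 99, 0, 100, 0]` it flags positions 7 and 9, the two samples where the load collapses. Normal ramping (50-70-90) and small dips (99 to 98) are ignored.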
Actually it's even less, it's been out for like 3.5 years. Oh well, seems like time flows really slowly for me, oops. Well, I wanted to use Win 10 from the beginning, and like I said I had a lot of performance issues with it. Maybe things have changed since then, but back when I tried it I couldn't get it to run at all, neither the licensed copy nor the pirated one. About the card, well... most of my friends have either notebooks or iMacs, sooooo... it's kinda hard to just barge into their place and ask for a spare PSU, or ask them to shuffle GPUs to check them. Besides, right now it's going to be pretty difficult to just reinstall the system.
Tried to record what's happening during a crash with MSI Afterburner, and was successful to an extent: https://imgur.com/lYYaTSP

Everything looks kinda fine. Voltage seems fine; it's the line labelled "Энергопотребление ЦП" (CPU power consumption), and when it crashes it goes down to just 32. But as you can see, GPU load holds at a steady 100%, then for some weird reason it just drops down to 0 and crashes the entire thing. There was a small temperature spike, but why on earth does it drop its GPU load to 0? Especially when voltage during the event was at 39??? Maybe MSI Afterburner is feeding me wrong info, I'm not sure myself, but it doesn't look like a voltage issue. Maybe it's the drivers' fault? I have no idea at this point.
unstable GPU => driver reset => 0% utilization => lower temperature

That "Voltage" of 32 is a magical number. What is its unit? 1/30th of a volt?

Even if I ignore the bad units on Voltage entirely, it is clear that as temperature goes up, Voltage goes down. The graphs are crystal clear about it.
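The "temperature goes up, voltage goes down" reading of the graphs can also be checked with numbers if both series are logged at the same timestamps. A rough Python sketch using a plain Pearson correlation coefficient, written by hand so it needs nothing beyond the standard library. It assumes two equally long lists of samples; the function name is made up for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long number series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of the same length (at least 2 samples)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / (spread_x * spread_y)
```

A result close to -1 for `pearson(temperatures, voltages)` would back up the observation from the graphs; a result near 0 would suggest the two are not moving together. It says nothing about which one causes the other.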
Power consumption is shown in watts, if I'm reading this right. Is there any way to check what is going on with power? Or is it an overheating issue? I found a way to record GPU voltage, maybe that'll do the trick. But then again, OK, if it's an overheating issue, why didn't it crash on a benchmark?