Failed/Bad Asrock Phantom Radeon 6800?

Discussion in 'Videocards - AMD Radeon' started by MrSleezeBag1964, Dec 29, 2020.

  1. MrSleezeBag1964

    MrSleezeBag1964 Member

    Messages:
    29
    Likes Received:
    3
    GPU:
    Radeon 6800 16 GB
    I was lucky (unlucky?) enough to buy the Asrock Phantom Gaming 6800 from Newegg about 2 weeks ago. However, I've had nothing but trouble and I'm not really sure why.

    TL;DR - System randomly crashes, and crashes hard when gaming. Erratic time to failure. Nothing seems out of the ordinary. No OC. Temps good. Mobo VGA error LED lights up after crashes. Zero visual artifacts, etc. GPU fans will usually spike to 100% after the system has failed.

    Recently installed my ASRock Phantom 6800. During games (any game really, except MS Flight Sim so far), the GPU crashes, the mobo VGA error light comes on, and the GPU fans spike to 100%. Operating temps are all normal in software, and as measured with an infrared heat gun on the heat plate. No junction temp above 65° ever seen in the Radeon drivers.

    Computer runs totally fine outside of gaming. It can even complete Time Spy Extreme in 3DMark.

    Coming from an RX 480. (Damn good little card. 0 Issues in almost 5 years and paid $215!)

    Hardware:

    Asus X570 Tuf Wifi
    Ryzen 3800X
    32 GB RAM (3600 MHz CL16 - SK Hynix modules, sold as Trident Neo)
    Seasonic Focus GX 750 Watt Gold PSU (3 months old)
    GPU connected with 2X PCIe 8-pin cables.
    (One cable is an aftermarket cable from Amazon due to giving my OEM cable to relative when building his first PC. Measured ~12.5 Volts on this cable using multimeter)
    (Replacement cables ordered from OEM Seasonic supplier today just in case).


    Software:
    OS - Windows 10 (10.0.19042, Build 19042).
    Radeon Version 20.12.1

    Tuning:
    OC profile using 1Usmus Clock Tuner for Ryzen
    Manually input timings from 1Usmus DRAM Calculator. (Auto 3600 MHz profile.)

    Troubleshooting:
    Ran AMD Cleanup Utility, re-installed Radeon. Re-seated GPU, moved PSU connection port, tried both plugs on the PCIe GPU side. Confirmed voltage with multimeter. Nothing seems to have helped so far.

    Am I overlooking anything? Last troubleshooting steps will be new OEM PSU cables and a complete DDU pass, along with trying the card in the GF's rig. Otherwise, I will have to go through the Newegg refund process and get back to shopping!

    Appreciate any feedback!
     
  2. BuildeR2

    BuildeR2 Ancient Guru

    Messages:
    2,899
    Likes Received:
    150
    GPU:
    MSI 2080 Ti GX Trio
    Well, you mention testing with no overclock near the start but then also say you've manually input DRAM timings using the Calc. Have you tested gaming with your RAM at JEDEC speeds? I don't have any personal experience with the AMD 5000/6000 series other than building and testing PCs for people, but they tend to be more susceptible to RAM instability causing issues with the GPU.

    If you have already done that AND re-pasted the GPU then you may have to go with the nuclear option.
     
  3. MrSleezeBag1964

    MrSleezeBag1964 Member

    Hi BuildeR2, after re-reading, that sounded way more confusing than when I was writing it. By "no OC", I specifically meant no GPU OC. When I first experienced crashes, I was simply using the Radeon Auto-OC setting and thought that may be a contributing factor. However, even after my Radeon re-install and going back to the normal performance profile, I don't see any added stability. I only notice that Radeon reports a max of about 228 watts of power consumption vs. 240 watts when the Auto OC is in use.

    Just to clarify, when you say JEDEC speeds, do you mean to just run the RAM at the default 2133 MHz? Or simply run it at the advertised 3600 MHz without manual timings?

    FYI - I chatted with Newegg support and their GPU refund policy is "Open GPUs do not qualify for a refund." However, they said that "if the GPU is out of stock when a return is processed, they automatically change to a refund."
     
  4. BuildeR2

    BuildeR2 Ancient Guru

    Yep. In my mind JEDEC just means whatever the motherboard runs the RAM at stock. Or you can check the JEDEC timings via software like CPU-Z and manually set those while you test for GPU stability. If that doesn't seem to make any difference, you can start downclocking the GPU core and/or memory speeds by 100 MHz at a time and testing. That sometimes helps narrow down whether the GPU is having trouble even running at stock speeds. I'm guessing you haven't physically removed the cooler+fan assembly to check thermal paste coverage, since that might remove your ability to return it?

    The reason I even bring it up is because my 2080 Ti was giving me months on end of black screens and CTDs even with temps well below acceptable levels. In the end I broke down and ripped the whole card apart to find that only ~55% of the GPU die actually had thermal paste on it, so the parts that were overheating were causing all the issues. It has been smooth sailing since then (and with high overclocks!) even though I technically had to break the "WARRANTY VOID" sticker that isn't enforceable here in the USA. Good luck either way!
     

  5. suty455

    suty455 Master Guru

    Messages:
    378
    Likes Received:
    152
    GPU:
    Nvidia 3090
    Personally I would go with the DDU option and fresh drivers, both platform and GPU, and return the system to default, i.e. no external tuners, just the standard BIOS tune settings (PBO etc.).
    If you stress test on a loop with Time Spy, what happens?
    It depends when your cables arrive, but it may be worth waiting on all this until you replace those. Although you get good measurements, the actual quality of the power signal may be poor, and the resistance will also change as the cable heats up. It does not take a lot to trigger a failure when a GPU has an imperfect feed.
    If you have one OE cable left, I would run it with just that cable piggybacked to both plugs. Not ideal, but at least you will have only OE cables, and the Seasonic should be able to handle that anyway.
    You could of course visit your relative and swap the Amazon cable with yours :)
     
  6. MrSleezeBag1964

    MrSleezeBag1964 Member

    Unfortunately, I know this will result in more of the same. The first night I had the card, I ran it with a single cable and both 8-pins connected. It did a complete shutdown pretty quickly after some game graphics settings testing. So I swapped it out and ran my old RX 480 for 2 more days until the Amazon Chinese cable arrived.

    I couldn't do much last night, but I backed up my BIOS settings and returned all RAM timings and CPU OC settings to normal.

    I've been trying to log with HWiNFO64 while running Time Spy, but need to pay more attention. I ran Time Spy once (not the stress test) and it failed with a hard shutdown, but I had forgotten to click start on logging. I ran Time Spy again and DID log, but this time it ran just fine and completed, and I logged 2 hours of nothing after the test was done.
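
    For anyone wanting to do the same, pulling the last few samples before a crash out of a logged CSV can be sketched roughly like this. It is only a minimal Python sketch: the column names and sample rows are made up for illustration, and a real sensor log names its columns after whatever sensors are present, so adjust them to match.

```python
import csv
import io

# Hypothetical sample data standing in for a sensor log exported as CSV.
# Real logs will have different (and many more) column names.
SAMPLE_LOG = """Time,GPU Temperature [C],GPU Power [W],+12V [V]
20:01:01,52,180,12.10
20:01:02,55,210,12.08
20:01:03,58,228,12.05
20:01:04,59,231,12.04
"""


def last_samples(csv_text, columns, count=3):
    """Return the last `count` rows, keeping only the requested columns."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return [{name: row[name] for name in columns} for row in rows[-count:]]


if __name__ == "__main__":
    # Print the final samples, i.e. the readings closest to a shutdown.
    for sample in last_samples(SAMPLE_LOG, ["Time", "GPU Power [W]", "+12V [V]"]):
        print(sample)
```

    The point is simply that the rows nearest the end of the log are the ones worth reading first after a hard shutdown.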

    Will test more tonight, last night was too busy with pesky "real life" commitments!
     
  7. suty455

    suty455 Master Guru

    That suggests your PSU is either overloaded or has a problem. If your cable cannot deliver enough power when the GPU ramps up and spikes, it can trigger the overload protection. Can you test the GPU in another system? Or remove everything apart from the basics you need to test the GPU. It's best to go back to basics: reset the BIOS, remove all overclocks and tuners, DDU and new drivers for GPU and board, and plug the GPU directly into the motherboard, i.e. no risers etc.
    Of all the PC issues I have had over the years, 95% of them could be traced to either a cable or PSU issue.

    Your old GPU would use a lot less power than this one, no doubt.
     
  8. MrSleezeBag1964

    MrSleezeBag1964 Member

    Thankfully, my girlfriend has a good system I can test in. She's running a ROG Strix Vega 56 so we're confident that it can supply enough power for a 6800 to run.

    I emailed Seasonic support about determining if the PSU is bad. It is only 3 months old, but a natural disaster back in October caused some massive electricity problems for my house. Brownouts, rapid on/off of electricity, trees down on power lines catching on fire while covered in ice, etc. I was running my PC working from home when we finally lost power. Hers was off, so perhaps it was spared.
     
  9. Chastity

    Chastity Ancient Guru

    Messages:
    2,646
    Likes Received:
    862
    GPU:
    Nitro 5700 XT
    These are the reasons why I insist that having a UPS is required.
     
  10. MrSleezeBag1964

    MrSleezeBag1964 Member

    Correct, if it were older and out of warranty, I'd probably tear it down to see what I can find. Interesting about the 2080ti though... I guess even top of the line products can suffer manufacturing/assembly tolerance issues.

    I really should look at getting a few UPS units. Any you would suggest? My house was built in the 1960s and I can tell my "old" house that was built in the early 2000s had significantly better electrical regulations. I've recently bought new surge protectors for everything, but no UPS.

    While I'm pretty sure I've found the issue, I can't yet seem to replicate it. I did some more testing with a multimeter last night after Seasonic support replied about what voltages you should get on PCIe cables (https://www.lifewire.com/power-supply-voltage-tolerances-2624583). I briefly saw over 16 volts during 3DMark tests! I tried to do the same today and didn't measure anything over ~12.5 V. Seasonic said they can RMA the power supply, so I'll probably go that route.
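
    For reference, the tolerance check behind those readings can be sketched as below. This is a minimal Python sketch assuming the usual ATX tolerance of ±5% on the +12 V rail (11.40 V to 12.60 V); the function name is only illustrative, and the two sample readings are the values mentioned above.

```python
# Minimal sketch: check multimeter readings against the ATX +12 V tolerance.
# Assumption: the +12 V rail is allowed +/-5 % (11.40 V to 12.60 V).

NOMINAL_12V = 12.0
TOLERANCE = 0.05  # +/-5 %


def within_atx_12v(reading_volts: float) -> bool:
    """Return True if a +12 V reading is inside the +/-5 % window."""
    low = NOMINAL_12V * (1 - TOLERANCE)   # about 11.40 V
    high = NOMINAL_12V * (1 + TOLERANCE)  # about 12.60 V
    return low <= reading_volts <= high


if __name__ == "__main__":
    for reading in (12.5, 16.0):
        status = "OK" if within_atx_12v(reading) else "OUT OF SPEC"
        print(f"{reading:.2f} V -> {status}")
```

    By that window a ~12.5 V reading is on the high side but acceptable, while anything near 16 V is far outside it.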
     

  11. MrSleezeBag1964

    MrSleezeBag1964 Member

    Forgot I snapped a few pictures. I'll admit I'm no pro at using a multimeter, but I doubt user error could account for these variances.

    [​IMG]

    [​IMG]
     
  12. MrSleezeBag1964

    MrSleezeBag1964 Member

    The journey continues...

    Since my last post I've run a full DDU pass in safe mode, and I've installed a new power supply after going through the RMA process with Seasonic. They were even so kind as to let me upgrade to the 850 watt version for $20.

    However, upon running the OCCT power test, I'm still getting errors. The system was stable without crashing, but I still got ~1,000,000 errors in a 5-minute test. 3DMark Time Spy, VRS, and PCI Bandwidth tests all ran perfectly multiple times.

    Such a confusing problem. Upon testing in Apex Legends, the system crashed the second the game loaded. Black screen, motherboard VGA error LED comes on. Requires a full system restart.

    Upon restart, I spent about 20 minutes in Apex firing range and everything was perfect. Then I played Teardown for ~30 minutes and again no crashes. More testing to come.
     
  13. suty455

    suty455 Master Guru

    Is anything overclocked? Especially RAM? And have you tried swapping any SATA cables, if the game is on a SATA drive?
    Try a short run of Furmark. It doesn't stress the RAM or CPU as much, mostly the GPU.
     
  14. MrSleezeBag1964

    MrSleezeBag1964 Member

    Ahh, forgot to mention, I was running Clock Tuner for Ryzen, along with the suggested mobo power settings. I restored all settings in BIOS to stock and am no longer running the Clock Tuner for Ryzen profile. Also updated to the latest Asus BIOS.

    The only thing that isn't default is the RAM speed setting in BIOS. It is set to Auto 3600 MHz, with the Infinity Fabric clock at 1800 MHz.
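
    For reference, the 3600 / 1800 pairing is just DDR arithmetic: the memory clock is half the transfer rate, and a 1:1 setup runs Infinity Fabric at that same clock. A minimal sketch, with function names that are purely illustrative:

```python
def memclk_mhz(ddr_rate: int) -> float:
    """Memory clock in MHz for a DDR transfer rate (two transfers per clock)."""
    return ddr_rate / 2


def is_one_to_one(ddr_rate: int, fclk_mhz: float) -> bool:
    """True when the Infinity Fabric clock matches the memory clock (1:1)."""
    return memclk_mhz(ddr_rate) == fclk_mhz


if __name__ == "__main__":
    for rate, fclk in ((3600, 1800), (3800, 1800)):
        print(f"DDR4-{rate} with FCLK {fclk}: 1:1 = {is_one_to_one(rate, fclk)}")
```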
     
  15. suty455

    suty455 Master Guru

    So you have XMP enabled in BIOS? Not sure if you have the XMP2 option; if you do, I would normally use that. Go ahead and try Furmark. It super loads the GPU but tends to leave the CPU and RAM alone. Only download it from
    https://geeks3d.com/furmark/
    as apparently there may be compromised versions elsewhere. Just run it at your default desktop settings and see what happens; any weakness in the GPU will quickly show up.
    If it crashes again I would suggest your card is bad, but again, try it in another PC with the same test, or in another PCIe slot on your board, as it could be the board as well. However, from what you're saying and looking at the stuff you have eliminated: I once had a card do exactly the same, just random crashes for no reason at all when put under load, yet it would then run the same application with no issues. That was an Asus RX 480 Strix; its RMA replacement was a Gigabyte and never had any issues.
     

  16. MrSleezeBag1964

    MrSleezeBag1964 Member

    Furmark has generally seemed to run without any issues. That's the puzzling thing about this whole ordeal. Furmark, OCCT 3D test, and OCCT VRAM test have absolutely no problems at all. It seems that GPU alone can be maxed.

    Furmark Results:
    [​IMG]

    OCCT 3D Results (8 Minutes):
    [​IMG]

    OCCT VRAM Results (8 minutes, 90% load):
    [​IMG]
    [​IMG]

    OCCT System RAM (90%, 8 Minutes, Auto Threads & Instruction Set)
    [​IMG]
     
  17. JonasBeckman

    JonasBeckman Ancient Guru

    Messages:
    17,293
    Likes Received:
    2,628
    GPU:
    MSI 6800 "Vanilla"
    Going by the Furmark result (~1400 MHz), the GPU seems to be doing the same thing it does under OCCT here, though being Furmark it is even more bottlenecked.

    Something is holding the card back, either the way the GPU handles this workload or the various thresholds that control clock speed (voltage, power draw and temperature).
    In my case OCCT pushes the junction temperature over 80 degrees, close to 90 in a prolonged test (so ~70 - 75 edge/surface temperature for the GPU core), and as a result the card scales back.

    It doesn't scale back as much in gaming, where it only hits the temperature stepping, but 2.0 to 2.1 GHz is the best I've gotten it to sustain in OCCT.
    In-game it's 2350 to 2450 MHz; the cooler can't quite dissipate the heat to stabilize the higher speeds when the card is really being pushed. :)

    Furmark at least also power throttles, so GPU testing with it is not very effective. It's hard to fully pressure the card that way, though you might get it to consume a lot of power or hit the voltage limit and see how that works.
    This wasn't a problem with the 5700 XT, but even after backing down on the complexity the card still doesn't boost in OCCT, so I mostly do in-game testing.


    Watch_Dogs Legion with ray tracing enabled, driving around the river, gets the GPU going (and everything in-game is way too shiny. :p ). Any Vulkan or D3D12 game capable of utilizing the GPU should work, though ray tracing is a nice extra to really hit the GPU hard and see how stable it is when working close to 100%.
    (D3D11 not so much. Vulkan should also be a fairly good API, I would think, version 1.1 and newer, so roughly the last year or two.)

    I haven't managed to get the GPU driver to crash, but I also haven't clocked it up to its higher theoretical limits. Most benchmarks land around 2.3 to 2.4 GHz without additional tweaks like More Power Tool, so I started with that.


    Instead the GPU just clocks down and limits speeds, as long as the minimum boost clock is a bit more relaxed. Since the minimum and maximum can't be set too close to one another anyway, it hasn't been an issue so far.
    The card starts throttling down somewhat once it passes 74 - 75 degrees. The TJunction limit is around 105 - 110, but the card gradually limits down before hitting that max value.


    I would say the algorithm and the way AMD handles this are a lot better than on the 5700 XT, especially its initial behavior, but faulty hardware might throw it off a bit.

    I'm not sure there's a good benchmark for this, but it's not Furmark or even OCCT. Both keep the GPU from performing at peak, so you don't get the best stability-testing results, at least for the graphics card.
    (OCCT can still be a useful overall system test suite.)


    EDIT: So I usually pick a good D3D12 or Vulkan title and keep Wattman open in the background (borderless display mode) to compare what it reads out. I can also adjust settings directly and see where performance starts slowing down or, at worst, where I hit artifacts, software crashes, or possibly even driver crashes if I push the settings too high at once. :)


    The idea is to get near 100% GPU utilization while maintaining the full boost clock. I figured ray tracing might hinder performance, but it also works really well for stability testing because it gives the GPU an alternate workload to deal with. Some games are already very memory sensitive, so memory clocks, including the motherboard and RAM with an XMP profile active, are a factor if stability is iffy even with the GPU at stock settings.


    EDIT: For AMD systems there's also the matter of AGESA.

    1.1.9.0 is a beta implementation. ASUS already pulled at least one version and replaced it with another. MSI and a few other board vendors jumped on early too, while Gigabyte and some others are waiting, likely for newer code from AMD or for their own internal testing, before risking public beta testing with these.

    1.1.0.0 Patch D has its own ongoing issues, and overall compatibility with the 400 series motherboards and 5000 series CPUs isn't fully resolved on either.

    1.1.8.0 was a beta and is now regarded as a poor choice, so 1.1.9.0 should eventually replace it, along with the BIOS builds that implemented it despite AMD recommending against it.


    It might not seem like much, but SOC voltage and a number of other figures, plus overall memory compatibility and Zen 3 (even Zen 2) CPU compatibility, are touchy depending on the firmware code, which complicates testing on AMD motherboards right now. :)
     
    Last edited: Jan 9, 2021
  18. JonasBeckman

    JonasBeckman Ancient Guru

    I can't say much on the PSU, as long as it's connected properly. For the GPU, depending on whether the PSU is multi-rail, single rail, or single rail with a multi-rail divider, there is also recommended connectivity for some of the cables, mainly the PCI-E ones.
    Plenty of power is available, and it's a newer unit too via that upgrade, so that should be fine hardware wise. :)

    The 5700 XT was again really sensitive to this, and the 6800 and 6800 XT, which draw more, also don't like splitters and could cause the PSU to shut off, but I don't think that's the problem here.
    The manual or a diagram on the PSU itself tends to show how it should be connected if you want to test that.
    (Or, if there's a cable, setting or switch for it, there's also single rail mode, in which case this doesn't generally matter, again as long as it's not via a splitter-type cable.)


    EDIT:
    AGESA code and overall memory compatibility can be a mess, though. I have been fairly lucky with the X570 Master here. Gigabyte runs a few things above AMD's standard values, which does help with memory compatibility, though there are various known bugs and issues, both Gigabyte-specific and general AMD ones.

    1.1.0.0 Patch D would be my recommendation for Zen 2, though for Zen 3 1.1.9.0 seems best, if the current implementation is stable on the motherboard models that already have it.


    EDIT:
    For XMP and memory I would apply the profile, and if that boots and no random error logs or anything pop up during testing, it should be fine. I'm using a slightly higher voltage just in case, and again Gigabyte pushes a couple of things like SOC voltage above normal defaults too.
    The boards can take it, though. On other boards it's a bit variable what the values with 1.1.9.0 are and how well it works, but other than when trying for 1866+ Infinity Fabric (1900 or higher) with DDR4 RAM to match, it seems to work fairly well. :)

    If there are any problems, though, I would start from AMD's recommended values: test DDR4 3200 first, then go up to 3600 if that's stable, then 3800 after that.

    See what memory chips the kit uses if that info is available (Micron E-die and so on), then maybe test looser timings or whether a little extra voltage helps. Be careful with what the motherboard actually applies once this is set, as it can vary, and you don't want to push too far on kits that can't handle higher voltages, since XMP is already using 1.35 V by default.


    What I'm trying to say is that there's a bit of a mess of ongoing issues and compatibility problems here. Even with the newer BIOS code and latest AGESA firmware from AMD it's not entirely perfect, and not just for those overclocking above the otherwise maximum supported DDR4 3600 / Infinity Fabric 1800 either.
    AMD semi-promised 1900 or even 2000 IF speeds for Zen 3, which I feel they should not have done, because it's clearly very much hit or miss and not really achievable as a standard speed.
     
    Last edited: Jan 9, 2021
  19. suty455

    suty455 Master Guru

    I'm not sure I understand what you mean by full utilisation, so I just ran the OCCT power test cycle myself and saw 100% utilisation across all 32 CPU threads and 99% on my 3090. The RAM on the 3090 only went to 20% used, but the GPU core was flat out and boosted to 1985 MHz, with a 57°C max on the GPU and an 83°C max on the CPU, under water with a loop temp of 36°C. So I know it was working damn hard. I never see those results in any game or app, though I do see more GPU RAM use.

    I personally am inclined to suspect you have a duff card and should RMA it. Either of those programs should max out the card, if my current and past results are anything to go by. It could be that your driver sees the card and allocates the resource but the card can't deliver, hence the crash. Either way it's a headache you don't need after spending all that money on it.
     
  20. JonasBeckman

    JonasBeckman Ancient Guru

    Right I'm not too great at explanations. GPU utilization for OCCT does go to 99% - 100% but the GPU clocks are held back from hitting the full speed when it's boosting compared to in-game testing. :)

    For my results with the 5700 XT (Pulse model): it pushed pretty aggressively in its boost behavior, especially on the earlier drivers before AMD toned it down a bit, so with enough power, the voltage the GPU needs, and a low enough temperature, testing could still hit the peak 2050 - 2100 MHz.

    The 6800 can't do that under OCCT. It hits up to 2400 MHz in-game just fine, but in OCCT 3D testing my best is 2100 MHz; even with the junction temperature under control it won't exceed that speed.
    I can't see the OCCT results, but the Furmark results in the screenshots above are telling. It's throttling down a lot, actually maybe too much, come to think of it; the card should be closer to 2 GHz at least.

    But AMD has some heavier profiles for benchmarking software, and if Furmark is OpenGL based you're not going to get good utilization out of it (thanks, drivers!). OCCT might be capable, but I have a feeling it too holds back from the full speeds.
    (Furmark in particular also pulls a ton of power and voltage, which throttles it further. It's not quite ideal, though it can perhaps serve as a torture test for power draw or voltage.)


    As a result the GPU isn't under maximum load, so stress testing isn't as thorough and might miss errors and faults that would happen if it were operating at peak performance. :)

    How the GPU boosts and the numerous parameters involved also differ here. It's a lot better but also somewhat more complicated on Navi20, and it's a bit limited by what AMD enforces in the drivers and BIOS.
    (Not all of this can currently be worked around, but work on that is ongoing.)


    As a result, if Furmark demands more power, the GPU is going to throttle because the maximum limit is too low for anything more.
    You get 15% extra power limit now instead of the 50% allowed with Navi10 GPUs, so yeah, it will be a limitation, plus the voltage cap and then temperature as another factor; all of them have to be optimal for scaling to work at its best.

    Well, 20% for some of the 5000 series cards instead of the full 50%, but 15% is still the lowest AMD has given, and it applies to the entire 6000 series, though the 6800 XT at least has a higher initial power draw target that this scales from.
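
    To put rough numbers on that, here is a minimal sketch of the headroom arithmetic. The 200 W base target is a made-up figure for illustration and not any real card's limit; only the 15%, 20% and 50% maximums come from the text above, and the function name is hypothetical.

```python
def max_power_watts(base_target_watts: float, extra_percent: float) -> float:
    """Highest power cap reachable when the limit can be raised by extra_percent."""
    return base_target_watts + base_target_watts * extra_percent / 100


if __name__ == "__main__":
    base = 200  # hypothetical base power target in watts, for illustration only
    for percent in (15, 20, 50):
        cap = max_power_watts(base, percent)
        print(f"+{percent}% on a {base} W target allows up to {cap:.0f} W")
```

    The same base target ends up with far less room to grow at +15% than at +50%, which is why a power-hungry test hits the cap and throttles sooner.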



    EDIT: Ah, and I do agree about going through an RMA as well. It might take a while to get a replacement, but if it's covered under warranty it should be a straightforward process, and you get either your money back or a new GPU that should function without any issues.

    A small number of cards will be defective after all. Unfortunately that's just how it is, and it sucks when the one you pick up happens to be one of them.

    Best to just get it replaced. There's not much more to be done, and delaying risks running out the open purchase or return period, depending on how your country handles this.
    (Two weeks here, though you also have the general two-year standard warranty after that if the hardware fails.)
     
    Last edited: Jan 9, 2021
