Vega64 FE woes ("Thread Stuck in Device Driver")

Discussion in 'Videocards - AMD Radeon Drivers Section' started by A2Razor, Feb 18, 2019.

  1. A2Razor

    A2Razor Guest

    Messages:
    543
    Likes Received:
    110
    GPU:
    6800XT, XFX Merc319
    Hello all,

    Finally upgraded and retired my old Fury, and I need a bit of help here. I recently managed to get my hands on a "new" Vega FE Liquid at a steal of a price. Since installing that card, I've been having nonstop "Thread Stuck in Device Driver" BSODs.

    Win10 1809
    ASROCK X399 Professional Gaming
    Threadripper 1950X (@ stock, tested stable)
    64GB DDR4 2666 ECC (@ stock, no errors)
    BenQ XL2730Z, also tested an Acer display.
    Vega FE Liquid <== The new upgrade.

    Problem: Any time the display enters or exits standby mode (OR is connected to the PC), there's what I'd call a 50/50 chance of getting the above BSOD (Thread Stuck in Device Driver). The card is otherwise 100% stable in games or on the desktop, as long as the monitor connection is not touched and the display is not allowed to enter standby or get turned off.

    Locking clock speeds at the maximum via WattMan or ClockBlocker (cough) does not improve those odds.

    So far I've tried 18.8.2 & 18.9.3 (Game Mode), Q1-2019 pro-drivers, 18.12.1 (as professional mode), 18.12.2, 18.12.3, 19.1.1, 19.1.2, 19.2.1, 19.2.2. Same situation on all of them. I've also tried DDU, cleaning up between installations, and a fresh install of 1809 before the latest attempt with 19.2.2. I've swapped in different DisplayPort cables and an HDMI cable as well.

    As a last-ditch effort, I updated that ASROCK board to the 3.50 BIOS (did nothing).
    I also lowered the PCI-E slot to 2.0 and tried different PCI-E slots (physically moving the card).


    Any suggestions short of testing the card in other computers?

    It seems strange if it's a dud considering it's load stable, including in low-load games where the clocks bounce all over the place. Pretty stumped here.
     
    Last edited: Feb 18, 2019
  2. Unwinder

    Unwinder Ancient Guru Staff Member

    Messages:
    17,083
    Likes Received:
    6,567
    Try to disable ULPS
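
    For anyone following along: ULPS (Ultra Low Power State) is usually toggled through the `EnableUlps` registry value under the display adapter class key. Below is a minimal sketch in Python that only builds the text of a .reg file; it does not touch the registry itself. The function name `build_ulps_reg` is made up for illustration, and the adapter subkey index (0000, 0001, ...) is an assumption that varies per system, so check which subkey belongs to the Vega before importing anything.

```python
# Display adapter device class key (standard Windows class GUID for display adapters).
DISPLAY_CLASS_KEY = (
    r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Class"
    r"\{4d36e968-e325-11ce-bfc1-08002be10318}"
)


def build_ulps_reg(adapter_index: int = 0, enable: bool = False) -> str:
    """Return .reg file text that sets EnableUlps for one adapter subkey.

    adapter_index is an assumption: the subkey (0000, 0001, ...) that
    actually belongs to the card differs from system to system.
    """
    subkey = f"{DISPLAY_CLASS_KEY}\\{adapter_index:04d}"
    value = 1 if enable else 0
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{subkey}]",
        f'"EnableUlps"=dword:{value:08x}',
        "",
    ]
    return "\r\n".join(lines)


if __name__ == "__main__":
    # Print a fragment that disables ULPS on subkey 0000 (hypothetical index).
    print(build_ulps_reg(0, enable=False))
```

    Save the printed text as a .reg file and import it only after confirming the subkey; a reboot is normally needed before the driver picks up the change.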
     
  3. Fox2232

    Fox2232 Guest

    Messages:
    11,808
    Likes Received:
    3,371
    GPU:
    6900XT+AW@240Hz
    Since the issue occurs at a pretty high rate, I would look at:
    - Windows: just do a clean install on some spare media and test there
    - PC and monitor on the same power base: sometimes one device is behind a UPS or another device that alters the shape or phase of the current while the other device is not fed the same way => this usually causes more issues with sound
    - since you mentioned "display... OR is connected to the PC": try that in the BIOS to see if there are conditions under which the issue does not occur
     
  4. A2Razor

    A2Razor Guest

    This was a good call, yet sadly it doesn't seem to have done the trick. (At least it's very easy to reproduce for testing.)

    I also logged the reported clocks this time around (with and without ULPS) while no monitor is connected. In both cases it looks like the card throttles down to 27 MHz core, 167 MHz HBM if nothing else is enforced. It does seem that my OpenCL approach doesn't work with ULPS active and without a screen attached; I'll have to look at that one.
     

  5. A2Razor

    A2Razor Guest

    Fresh install of Windows has been tested, though I'm open to trying drastic stuff as I have plenty of backups. (have not tested earlier versions of Windows)

    Both the computer and monitor are behind an 'APC Smart-UPS 2200VA LCD'. By "OR is connected to the PC", I mean more or less any time I plug the connector in. E.g., hitting the power button on/off on the monitor will eventually cause a crash. The same goes if I pull the monitor cable and reconnect it repeatedly.


    - The crashes only occur if the AMD video drivers are installed. If I'm running the standard VGA drivers, there are no issues with repeatedly reconnecting the display or power cycling it. (The same applies prior to the OS, such as during POST or in the BIOS.)


    ** Decided to test an old 1080p screen on an HDMI to DVI connector. Lo and behold, no crash. So this isn't happening on "all" monitors that are connected, only on my 144 Hz FreeSync displays so far.
     
    Last edited: Feb 18, 2019
  6. Fox2232

    Fox2232 Guest

    Yes, "OR is connected to the PC", I understood correctly.
    So test this behavior in BIOS:
    - enter BIOS
    - unplug/turn OFF display
    - reconnect
    - repeat to observe crash
     
  7. mikeysg

    mikeysg Ancient Guru

    Messages:
    3,286
    Likes Received:
    740
    GPU:
    MERC310 RX 7900 XTX
    I recently had this same crashing issue with the green screen message 'Thread stuck in device driver', and I found the cause in the strangest place. One of my HDDs was failing, and my system kept rebooting with that error message. I replaced that particular HDD and haven't had the problem since. I don't know what the heck a failing HDD has to do with that particular error, but hey, live and learn.
     
  8. PrEzi

    PrEzi Master Guru

    Messages:
    723
    Likes Received:
    585
    GPU:
    XFX MERC310 7900XTX
    Do me a favour, mate, and lower your FreeSync monitor's max refresh rate to something like 90 Hz and retry. I also have a 144 Hz FreeSync monitor and don't have that issue, but since mine has a 34-90 Hz FreeSync range, I am limiting it to 90 Hz tops (in the Windows refresh rate settings for the monitor).
     
  9. A2Razor

    A2Razor Guest

    Tried changing the FreeSync range to 34-90 using CRU, and also to 40-90 and 40-60. (40-144 is the normal supported range.)
    - Sadly this still seems to cause the BSOD. However, I've found that if I remove FreeSync support completely, the BSODs stop on display detection (as in, I can unplug and reconnect the monitors hundreds of times without a crash).


    So no freesync = no crashes.
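
    As a rough sanity check on that conclusion, here is a minimal Python sketch of how unlikely a long crash-free run would be if the fault were still present. It assumes each reconnect is an independent trial with a fixed crash chance, which is a simplification; the function name is made up for illustration, and the 0.5 figure is just the "50/50" estimate from the first post.

```python
def survival_probability(p_crash: float, trials: int) -> float:
    """Chance of seeing zero crashes in `trials` independent reconnects,
    assuming each reconnect crashes with probability `p_crash`."""
    if not 0.0 <= p_crash <= 1.0:
        raise ValueError("p_crash must be between 0 and 1")
    if trials < 0:
        raise ValueError("trials must be non-negative")
    return (1.0 - p_crash) ** trials


if __name__ == "__main__":
    # The "50/50" estimate, plus a much more pessimistic 5% per reconnect.
    for p in (0.5, 0.05):
        print(f"p_crash={p}: P(no crash in 100 reconnects) = "
              f"{survival_probability(p, 100):.3g}")
```

    Even with a crash chance as low as 5% per reconnect, 100 clean reconnects in a row would happen well under 1% of the time, so hundreds of clean reconnects point strongly at FreeSync being involved.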

    Yep, it's definitely something specific to my Threadripper build. I tested my old E5 earlier and found that the Vega FE works flawlessly there (no crashes with FreeSync-enabled displays connected).
    I've tried swapping their PSUs (so that's been ruled out), and disconnecting everything unnecessary from the 1950X (one drive, 8GB memory, just the video card). Even with it stripped down to the basics, it's still happening.

    May have to try getting my hands on another brand of motherboard... The strange part is that the Fury-X is completely solid in there (including with FreeSync active), same with my older GeForce 980 (on G-Sync).
     
    Last edited: Feb 19, 2019
  10. PrEzi

    PrEzi Master Guru

    I also have a TR 1950 and 64GB RAM @3200 (and a Vega 64 Liquid) and no crashes on my side. But like I've said, I have set/locked the refresh rate in Windows to 90.
     

  11. Erick

    Erick Member Guru

    Messages:
    127
    Likes Received:
    21
    GPU:
    RTX 3080 Ti 12 GB
    Oooh....this is pretty bad. You should throw this directly into AMD's Red Team forum...unless you already have.
     
  12. A2Razor

    A2Razor Guest

    With more experimenting I've also found (so far) that my locking of the clock frequency was insufficient, and that with the HBM clock now locked at 945 I am unable to cause the BSOD when connecting monitors on the 1950X rig.
    Had a read through this thread, in which a lot of the replies sounded mighty similar to my situation: https://community.amd.com/thread/223844

    May go for an RMA, yet considering it works properly on the E5 with no adjustments, that sounds pointless, or at least suggests it's not the card per se.


    Played with Soft PowerPlay through the registry and tried kicking the speed of the lower HBM states up, in addition to raising the lower voltages to match state 4. Yet this seems to result in the system hard-locking immediately on startup (I doubt that the voltage is actually changed).... For now I seem limited to the game-mode drivers for ADL.

    Is there any way to disable the lower VRAM states via Soft PowerPlay? (e.g., so that the card runs at a constant 945 HBM even in P0)


    - Goal is to make this "permanent" where a TDR, etc, cannot cause the settings to be wiped.


    EDIT: Changing table indices rather than directly touching the voltage of the lower states gets Windows to boot ... sometimes. Still not consistent, so I was missing something here. Changed drivers back up to the latest, **WORKS**.


    Just did the other day, no responses yet though.
     
    Last edited: Feb 22, 2019
