970 memory allocation issue revisited

Discussion in 'Videocards - NVIDIA GeForce' started by alanm, Jan 23, 2015.

Thread Status:
Not open for further replies.
  1. alanm

    alanm Ancient Guru

    Messages:
    12,220
    Likes Received:
    4,407
    GPU:
    RTX 4080
    There could be something to it. Not the silly smaller-bus-width or less-physical-VRAM arguments, of course, but the way the 970 handles VRAM once it approaches the 3.5GB mark. It's picked up steam at other forums, and ManuelG said a few hours ago that they are looking into it. A German programmer (Nai) has made a small program that benchmarks VRAM performance, and we can see the 970's memory bandwidth tanking around the 3.3GB mark, unlike the 980:

    http://www.guru3d.com/news-story/does-the-geforce-gtx-970-have-a-memory-allocation-bug.html


    I'm not concerned, because no game has let me down in performance (yet), and I don't play Skyrim with ultra texture packs or use exaggerated AA settings in games. The benchmark also seems to affect the 780 Ti in the last couple hundred MB of its 3GB of VRAM (according to a 780 Ti owner who ran it).
     
    Last edited: Jan 26, 2015
  2. Headd

    Headd Active Member

    Messages:
    75
    Likes Received:
    5
    GPU:
    GTX970
    There is an explanation on the ocnet forum:

    GTX970=208bit card.
     
  3. Undying

    Undying Ancient Guru

    Messages:
    25,205
    Likes Received:
    12,611
    GPU:
    XFX RX6800XT 16GB
    Wow, that is a hell of a bandwidth drop.

    So, NVIDIA was selling 3GB/208bit cards as 4GB/256bit? Oh, my... :D
     
  4. JohnLai

    JohnLai Guest

    Messages:
    136
    Likes Received:
    7
    GPU:
    ASUS GTX 970 3.5+0.5GB
    Fresh user.....T_T....okay....
    Pill monster, could you take a look at Nai's source code and see if there is any issue with it? I admit I am not coding-literate.
     

  5. demise

    demise Member

    Messages:
    18
    Likes Received:
    2
    GPU:
    Pulse RX 6700 10GB
    Testing methods are all over the place, so there is not really anything conclusive there. I personally won't bother testing until Witcher 3 comes out. Most of these other games are questionable console ports from Ubisoft, or Shadow of Mordor, which I can't be bothered to re-download.

    Interested to see how this concludes myself. Not too worried about it for the moment though. The 970 is still a massive improvement over the 560Ti I was using previously.
     
    Last edited: Jan 23, 2015
  6. Fox2232

    Fox2232 Guest

    Messages:
    11,808
    Likes Received:
    3,371
    GPU:
    6900XT+AW@240Hz
    No, they are selling a 4GB/256bit card: there are 4GB physically, and each of the 8 chips has a 32bit bus (8 × 32 = 256).
    But going by their assumption, it is quite possible that while some parts of memory can be accessed directly, others are accessed via shared switching infrastructure, since crucial parts of the GPU are cut.

    Get me the link. While I don't do CUDA, I can check for obvious deviations.
     
  7. JohnLai

    JohnLai Guest

    Messages:
    136
    Likes Received:
    7
    GPU:
    ASUS GTX 970 3.5+0.5GB
    http://www.computerbase.de/forum/showthread.php?t=1435408&p=16868213#post16868213

    Please do check, I appreciate it ^.^
    This is Nai's source code.
    The only catch: it is preferable to drive the display from the iGPU and put the GTX 970 in headless display mode when running the benchmark. Otherwise, the result might be inaccurate because web browsers and Windows compositing reserve and use some of the VRAM.
     
  8. skacikpl

    skacikpl Maha Guru

    Messages:
    1,205
    Likes Received:
    594
    GPU:
    Inno3D RTX 4090
    MSI GTX 970 4G here:
    [Image: benchmark result screenshot]

    Right as it reaches the 3rd gigabyte, the bandwidth starts going face down, ass up.
    The heck, NVIDIA?
     
  9. JohnLai

    JohnLai Guest

    Messages:
    136
    Likes Received:
    7
    GPU:
    ASUS GTX 970 3.5+0.5GB
    Are you running the bench with the NVIDIA GPU in headless display mode?
     
  10. Im2bad

    Im2bad Guest

    Messages:
    791
    Likes Received:
    0
    GPU:
    3080 Gaming X Trio
    Same card as above and can confirm as well.

    Certainly odd.

    Edit: Noticed someone had posted while I was testing. Started wondering and checked Afterburner, and what do you know: 4038MB of VRAM allocated during the test. Still, it's odd that the 980 didn't show it in the test.
     
    Last edited: Jan 23, 2015

  11. skacikpl

    skacikpl Maha Guru

    Messages:
    1,205
    Likes Received:
    594
    GPU:
    Inno3D RTX 4090
    No, not really, but I could do that, or retry the test with as much VRAM left free as I can.

    Generally, I believe that there's something wrong here.
     
  12. JohnLai

    JohnLai Guest

    Messages:
    136
    Likes Received:
    7
    GPU:
    ASUS GTX 970 3.5+0.5GB
    =.= Sigh, when you guys run the benchmark, please mention whether you are running it with the NVIDIA GPU in headless display mode!
    Otherwise, Windows compositing plus the web browser will reserve some portion of VRAM and skew the result.
     
  13. skacikpl

    skacikpl Maha Guru

    Messages:
    1,205
    Likes Received:
    594
    GPU:
    Inno3D RTX 4090
    In a normal scenario (games/rendering) nobody is going to run the display on the iGPU. The drop in bandwidth is dramatic, and I guess that something IS wrong here.

    Technically, somebody could try running it clean just to prove it once and for all.
    Though, in normal usage, even if some drop in performance is expected, come on: DRAM bandwidth drops from ~150 to ~16/~20 GB/s, and cache bandwidth goes from ~422 to ~16/~25/~77 GB/s. That much of a performance drop is suspicious.

    Also, I'm not an expert in VRAM allocation, but I doubt that Windows and normal desktop programs would somehow impact the last gigabyte of VRAM while leaving the other three without any issues.
     
  14. Im2bad

    Im2bad Guest

    Messages:
    791
    Likes Received:
    0
    GPU:
    3080 Gaming X Trio
    If I understand that mode correctly, what you're asking for is impossible with single GPU configurations.
     
  15. Fox2232

    Fox2232 Guest

    Messages:
    11,808
    Likes Received:
    3,371
    GPU:
    6900XT+AW@240Hz
    The code looks solid, no logical/math errors.
    1. The code allocates memory in 128MB chunks until the card runs out (the last block smaller than 128MB is not allocated). Therefore, if some memory is already allocated when you run it, it simply allocates whatever remains, and the test should not be affected.
    2. There were some experimental rewrites. I picked out 3 definitions: 2 of them are fixed values, while the 3rd is derived from the previous two. I am really not sure why the 3rd is not a fixed value as well, since the two it is based on are never altered by the code.
    3. Only question I have is
    Code:
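    // Assumption: make_float4(float), operator+= and length() for float4 are not CUDA built-ins;
    // they are presumably provided by a helper header such as helper_math.h from the CUDA samples.
    __global__ void BenchMarkDRAMKernel(float4* In)
    {
        // Global thread index: each thread reads one 16-byte float4 element.
        int ThreadID = blockDim.x * blockIdx.x + threadIdx.x;
        float4 Temp = make_float4(1);
        Temp += In[ThreadID];
        // Never true (a length cannot be negative); the dummy store only keeps
        // the compiler from optimizing away the load above.
        if (length(Temp) == -12354)
            In[0] = Temp;
    }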
    and its cache bench counterpart, as I am not sure what kind of overhead "blockDim.x * blockIdx.x + threadIdx.x" has, since those are CUDA-related allocations, or where those values are held.

    I would rather make something that allocates the entire block, fills it with random incompressible data, and then benchmarks some simple math operation over each chunk, such as negation, since it is repeatable and always gives the same result.

    It would be much slower and would not give bandwidth, but it would show whether each chunk gets processed in the same amount of time.
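
A minimal sketch of the per-chunk timing test suggested above, not Nai's benchmark. The names `NegateChunk`, `kChunkBytes` and `kWordsPerChunk` are made up for illustration, the 128MB chunk size is taken from the description of the original code, and error handling is kept to the bare minimum. Note that allocation order is only assumed to roughly follow the physical layout; the driver makes no such promise.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical chunk size, mirroring the 128MB chunks described above.
constexpr size_t kChunkBytes = 128ull * 1024 * 1024;
constexpr size_t kWordsPerChunk = kChunkBytes / sizeof(unsigned int);

// Bitwise negation of every word in one chunk: one read and one write per element.
__global__ void NegateChunk(unsigned int* data, size_t n)
{
    size_t i = static_cast<size_t>(blockDim.x) * blockIdx.x + threadIdx.x;
    if (i < n)
        data[i] = ~data[i];
}

int main()
{
    // Pseudo-random host data, reused to fill every chunk.
    std::vector<unsigned int> host(kWordsPerChunk);
    for (auto& w : host)
        w = (static_cast<unsigned int>(rand()) << 16) ^ static_cast<unsigned int>(rand());

    // Grab 128MB chunks until cudaMalloc fails.
    std::vector<unsigned int*> chunks;
    for (;;) {
        unsigned int* p = nullptr;
        if (cudaMalloc(&p, kChunkBytes) != cudaSuccess) {
            cudaGetLastError(); // clear the expected out-of-memory error
            break;
        }
        cudaMemcpy(p, host.data(), kChunkBytes, cudaMemcpyHostToDevice);
        chunks.push_back(p);
    }

    const int threads = 256;
    const unsigned int blocks =
        static_cast<unsigned int>((kWordsPerChunk + threads - 1) / threads);

    // Warm-up launch so one-time startup cost does not land on the first timed chunk.
    if (!chunks.empty()) {
        NegateChunk<<<blocks, threads>>>(chunks[0], kWordsPerChunk);
        cudaDeviceSynchronize();
    }

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Time the same kernel on each chunk; a slow region shows up as a longer time.
    for (size_t c = 0; c < chunks.size(); ++c) {
        cudaEventRecord(start);
        NegateChunk<<<blocks, threads>>>(chunks[c], kWordsPerChunk);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);
        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        printf("chunk %3zu (%5zu-%5zu MB): %.3f ms\n", c, c * 128, (c + 1) * 128, ms);
    }

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    for (unsigned int* p : chunks)
        cudaFree(p);
    return 0;
}
```

If every chunk takes about the same time, the card treats all of its memory alike; if the last few chunks take several times longer, the slowdown is tied to where the data sits rather than to the bandwidth math of the original benchmark.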
     

  16. Kashinoda

    Kashinoda Guest

    Messages:
    25
    Likes Received:
    0
    GPU:
    AMD R9 Fury
    Wouldn't most people have an IGP on their motherboard, except for maybe older i7s? Easily tested.
     
  17. Im2bad

    Im2bad Guest

    Messages:
    791
    Likes Received:
    0
    GPU:
    3080 Gaming X Trio
    Of course, yes. Easy to forget that fact when you don't really ever need it.
     
  18. Fox2232

    Fox2232 Guest

    Messages:
    11,808
    Likes Received:
    3,371
    GPU:
    6900XT+AW@240Hz
    As you are testing, I would ask you to do exactly the opposite.
    Do not try to have a minimal allocation before you start the test.

    Do allocate even 1GB of VRAM before the test, and terminate the game after the test has allocated the whole remainder of the 4GB.
    (If it allocates only during the bench itself and not in the earlier part where it pauses, then kill the game as soon as the chunks start to get tested.)

    This way you free up additional space that the benchmark has not allocated.

    And if the test does not show a drop in performance once that additional VRAM is free, the issue is due to overhead.
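
For anyone without a game handy, a small stand-in for the "allocate 1GB first" step could look like the sketch below. It is only an assumption of how one might do it: it holds 1GB of VRAM through the CUDA runtime (`cudaMalloc`, `cudaMemGetInfo`) and releases it when Enter is pressed, which plays the role of terminating the game. The variable name `holdBytes` is made up for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    // 1GB placeholder allocation, standing in for the game mentioned above.
    const size_t holdBytes = 1024ull * 1024 * 1024;

    size_t freeBytes = 0, totalBytes = 0;
    cudaMemGetInfo(&freeBytes, &totalBytes);
    printf("before: %zu MB free of %zu MB\n", freeBytes >> 20, totalBytes >> 20);

    void* block = nullptr;
    if (cudaMalloc(&block, holdBytes) != cudaSuccess) {
        fprintf(stderr, "could not reserve 1GB of VRAM\n");
        return 1;
    }
    // Touch the memory so the allocation is actually backed by VRAM.
    cudaMemset(block, 0, holdBytes);
    cudaDeviceSynchronize();

    cudaMemGetInfo(&freeBytes, &totalBytes);
    printf("holding 1GB: %zu MB free of %zu MB\n", freeBytes >> 20, totalBytes >> 20);
    printf("start the benchmark now; press Enter once it has allocated its chunks\n");
    getchar();

    // Releasing the block frees space the benchmark never allocated.
    cudaFree(block);
    cudaMemGetInfo(&freeBytes, &totalBytes);
    printf("released: %zu MB free of %zu MB\n", freeBytes >> 20, totalBytes >> 20);
    return 0;
}
```

Run it in one window, start the benchmark in another, then press Enter here as soon as the benchmark begins testing chunks.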
     
  19. JohnLai

    JohnLai Guest

    Messages:
    136
    Likes Received:
    7
    GPU:
    ASUS GTX 970 3.5+0.5GB
    Hmm, I see....Perhaps you would like to pass this crucial information on to Nai at that forum. I can't speak German.......:bang:
     
  20. sykozis

    sykozis Ancient Guru

    Messages:
    22,492
    Likes Received:
    1,537
    GPU:
    Asus RX6700XT
    Why should people have to alter their configuration just to test a theory? Especially a theory that cannot be definitively proven?

    While I trust Fox's evaluation of the source code, even he can't guarantee that there is not a flaw somewhere causing "unusual" results. If there was actually a "flaw", then all the results would fall within a respectable margin of error, which, based on the screenshots posted here, is not happening.

    Using CUDA to demonstrate a flaw already introduces a flaw into the testing to begin with. CUDA is NVIDIA IP, so it should not have been used, and any results derived from a CUDA-based test should be disregarded.
     