970 memory allocation issue revisited

Discussion in 'Videocards - NVIDIA GeForce' started by alanm, Jan 23, 2015.

Thread Status:
Not open for further replies.
  1. Pill Monster

    Pill Monster Banned

    Messages:
    25,214
    Likes Received:
    8
    GPU:
    7950 Vapor-X 1100/1500
    Just to clarify, what that OCN guy said regarding fresh users wasn't directed at you specifically, nor did I mean it that way. :)

    Instead, it was highlighting a theme that keeps recurring in this situation. Similar comments have also been made on AnandTech...


    And I don't know anything about coding, so I can't help you there, sorry. Coding is one of the things I know least about, tbh.
     
  2. skacikpl

    skacikpl Master Guru

    Messages:
    344
    Likes Received:
    111
    GPU:
    MSI RTX 2070 8G
    I'll try that.
    //
    Initial test - Lords of the Fallen, maxed out (3.5GB VRAM used): benchmark crashes right away.
    BF4 (1.1GB VRAM used) ran alongside the benchmark.
    BF4 (1.1GB VRAM used) killed ASAP after the bench allocates memory.
     
    Last edited: Jan 23, 2015
  3. Fox2232

    Fox2232 Ancient Guru

    Messages:
    11,810
    Likes Received:
    3,363
    GPU:
    6900XT+AW@240Hz
    You are right, I do not know the inner workings of CUDA. That is why I want to preallocate 1GB and let the test allocate the remaining 3GB.
    Then kill the first 1GB allocation, leaving free space for the bench itself. If it indeed has CUDA-based memory overhead, 1GB of VRAM should be enough to accommodate it, and the test would show only 22/23 chunks, but they would all perform well.
    If the end blocks still perform badly even with a free 1GB block, then it is not caused by the code.
     
  4. Fox2232

    Fox2232 Ancient Guru

    Messages:
    11,810
    Likes Received:
    3,363
    GPU:
    6900XT+AW@240Hz
    It likely crashed because it could not allocate even the 0th block and then tried to run the bench on it anyway, since there is no check that at least one block got allocated.
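
    A minimal sketch of the kind of guard described here, assuming the benchmark grabs fixed-size chunks until an allocation fails. The names `AllocFn`, `allocateChunks`, `canRunBenchmark` and `makeFakeAllocator` are hypothetical and not taken from the benchmark source; the fake allocator only stands in for a real device allocation so the logic can be tried without a GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical allocator signature: returns nullptr on failure, similar to
// how a failed device allocation would be reported. Not from the benchmark.
using AllocFn = std::function<void*(std::size_t)>;

// Grab fixed-size chunks until the allocator refuses to hand out more.
std::vector<void*> allocateChunks(const AllocFn& alloc, std::size_t chunkBytes) {
    assert(chunkBytes > 0);
    std::vector<void*> chunks;
    while (void* p = alloc(chunkBytes)) {
        chunks.push_back(p);
    }
    return chunks;
}

// The guard the post says is missing: refuse to benchmark when not even
// chunk 0 could be allocated.
bool canRunBenchmark(const std::vector<void*>& chunks) {
    return !chunks.empty();
}

// Illustration-only helper (an assumption): pretends the device has room for
// `capacity` chunks and hands out dummy non-null addresses.
AllocFn makeFakeAllocator(std::size_t capacity) {
    std::size_t used = 0;
    return [capacity, used](std::size_t) mutable -> void* {
        if (used >= capacity) return nullptr;
        ++used;
        return reinterpret_cast<void*>(used);  // fake address, never dereferenced
    };
}
```

    With a fake allocator of capacity 0 (VRAM already full), `allocateChunks` returns an empty list and `canRunBenchmark` reports false, so the bench can exit with a message instead of running on a block that was never allocated.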
     

  5. JohnLai

    JohnLai Member Guru

    Messages:
    136
    Likes Received:
    7
    GPU:
    ASUS GTX 970 3.5+0.5GB
    Thanks, I really appreciate it.
     
  6. sykozis

    sykozis Ancient Guru

    Messages:
    21,800
    Likes Received:
    1,056
    GPU:
    MSI RX5700
    The graphics card needs memory to perform operations, since that's where the data the GPU needs for the requested operations is stored. The more data you force into memory, the less is available to continue performing operations. The memory bandwidth is going to drop. Even an OpenCL-based application would show this occurring. Being able to run from the CPU gives OpenCL the advantage of less overhead, which would mean less of a drop in measured bandwidth.

    When you execute this "test", the instructions required for the operations are loaded into graphics memory. From there, the instructions are processed. For each function, more instructions have to be loaded into memory. When you flood the memory, there's no place to store the next set of instructions, so the necessary space has to be flushed, which negatively affects memory bandwidth.

    The only way to avoid this would be for NVIDIA to partition the RAM to give CUDA its own dedicated memory partition, which just isn't feasible on a consumer graphics card. For the Quadro line it might be, but it would provide limited benefit to consumers compared to the cost of doing so.
     
    Last edited: Jan 23, 2015
  7. Fox2232

    Fox2232 Ancient Guru

    Messages:
    11,810
    Likes Received:
    3,363
    GPU:
    6900XT+AW@240Hz
    This overhead is extremely small compared to the sizes it allocates. And the test is smart in how it initially allocates memory regions, as it stops once an allocation fails.
    That is why I want to free some memory after allocation. The test will still run on the same blocks, so results should not be affected.

    And there is one last thing someone could have missed: run the test with low VRAM allocation (desktop only) so it can get the most. But monitor VRAM and system RAM usage, as the performance drop may be caused by some emergency allocation from system RAM.
     
  8. -Tj-

    -Tj- Ancient Guru

    Messages:
    17,182
    Likes Received:
    1,928
    GPU:
    Zotac GTX980Ti OC
    Well, mine apparently also drops at ~2560MB with both, but I go past this line all the time and performance doesn't drop, so idk how legit this test really is.


    EDIT: I see it's using 128MB chunks; could be something to do with that. What if it used 64MB chunks? I would like to test that as well, but idk how to change that..
     
    Last edited: Jan 23, 2015
  9. JohnLai

    JohnLai Member Guru

    Messages:
    136
    Likes Received:
    7
    GPU:
    ASUS GTX 970 3.5+0.5GB
    Are you using a Zotac GTX 780 3GB GDDR5?
    You probably didn't run the bench in headless mode for the GPU.
    I doubt the GTX 780 has the issue in the first place, though.
    Try unplugging the HDMI/DVI cable from your GPU and plugging it into your motherboard's IGPU output. Make sure your primary display is using the IGPU. Then re-run the bench; your model probably doesn't have any bandwidth drop issue. :)
     
  10. sykozis

    sykozis Ancient Guru

    Messages:
    21,800
    Likes Received:
    1,056
    GPU:
    MSI RX5700
    That's what Fox and I are discussing. He's got a good theory if you're able to test it and post back with results.

    In theory, the test should be able to fill all 4GB of ram without error. If you start with 1GB pre-allocated, it shouldn't crash. It should just loop around and fill the 1GB that was allocated when the test started. In your case, it would only be 3GB, but would still prove or disprove Fox's theory.
     

  11. Noisiv

    Noisiv Ancient Guru

    Messages:
    7,860
    Likes Received:
    1,183
    GPU:
    2070 Super
  12. alanm

    alanm Ancient Guru

    Messages:
    10,208
    Likes Received:
    2,365
    GPU:
    Asus 2080 Dual OC
    The test as it applies to the 970 is nothing by itself. It only matters when the 980 runs the same test but retains full bandwidth past the 3.2GB point and doesn't choke like the 970 does. Haven't run too many games since getting my 970, but FC4 is about as tough as it gets, and I run that maxed other than modest AA levels. Don't care at all if the card is gimped past the 3.2GB point; would probably have bought it anyway even if it was a 3GB card. Interesting to see what ManuelG comes up with when he gets answers.
     
  13. alanm

    alanm Ancient Guru

    Messages:
    10,208
    Likes Received:
    2,365
    GPU:
    Asus 2080 Dual OC
  14. Fox2232

    Fox2232 Ancient Guru

    Messages:
    11,810
    Likes Received:
    3,363
    GPU:
    6900XT+AW@240Hz
    Here are the chunk size definitions:
    Code:
    int Float4Count = 8 * 1024 * 1024;
    int ChunkSize = Float4Count*sizeof(float4);
    sizeof(float4) should return 16 bytes, so if you halve the multiplier for Float4Count above ( 4 * 1024 * 1024 ), it will take chunks of 64MB.

    But it will still go on until it has as much memory as it can get.
    Then you have to change:
    Code:
    int BlockSize = 128;
    to 64, as otherwise the benchmark part will try to pump 128MB into each 64MB chunk.

    This should be all.
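
    The chunk size arithmetic above can be checked with a small sketch. The `float4` struct here is only a stand-in for CUDA's built-in type (an assumption so this compiles without the CUDA headers), and `chunkBytes` / `chunkMiB` are hypothetical helper names, not part of the benchmark.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for CUDA's float4 (four 32-bit floats). Defined here only so the
// sketch compiles without the CUDA headers.
struct float4 { float x, y, z, w; };
static_assert(sizeof(float4) == 16, "float4 is expected to be 16 bytes");

// Size in bytes of one chunk holding `float4Count` elements,
// mirroring ChunkSize = Float4Count * sizeof(float4).
std::size_t chunkBytes(std::size_t float4Count) {
    assert(float4Count > 0);
    return float4Count * sizeof(float4);
}

// The same chunk size expressed in MiB.
std::size_t chunkMiB(std::size_t float4Count) {
    return chunkBytes(float4Count) / (1024u * 1024u);
}
```

    `chunkMiB(8 * 1024 * 1024)` gives 128 and `chunkMiB(4 * 1024 * 1024)` gives 64, matching the two values discussed in the post.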
     
  15. palvo23

    palvo23 Member

    Messages:
    14
    Likes Received:
    0
    GPU:
    MSI GTX970 4G OC
    I can tell you it's not Aero.

    The odd thing is that when I turn all textures to Extra in CoD: AW, the game starts to microstutter. VRAM usage hovers around 3400MB.

    However, when I lower just one of the textures to High, the microstuttering is gone, and VRAM usage hovers around 3100MB.

    I just plugged in 4GB more RAM, so I have 12GB of RAM total; I don't think that's the issue.

    Maybe it's just an app-specific problem. I don't really have other games to test with, nor am I tech savvy about these problems, but I just want to enjoy stutter-free gaming!

    Hopefully Nvidia has some answers soon :pc1:
     
    Last edited: Jan 23, 2015

  16. skacikpl

    skacikpl Master Guru

    Messages:
    344
    Likes Received:
    111
    GPU:
    MSI RTX 2070 8G
  17. -Tj-

    -Tj- Ancient Guru

    Messages:
    17,182
    Likes Received:
    1,928
    GPU:
    Zotac GTX980Ti OC
    Yeah, the 780 doesn't have this crashing or performance issue if I hit max 3040MB in games (Kombustor or that cube VRAM test), but this bandwidth drop is here nonetheless.. @ Win8.1, I can't test @ IGPU port atm.

    [IMG]



    I think it's also the way it allocates these memory blocks, e.g. 128MiB; if it used 64MiB I'm sure the test would be a little different.


    Although it's kinda strange that the GTX 980 isn't affected like that.. If there is this SMX/256-bit thing, then this test really has to use 64MiB blocks or it's not really legit.


    24 x 128 = 3072
    23 x 128 = 2944 << imo it should use this, not stop at 22 blocks; I closed FF and still ended at 22 blocks.

    47 x 64 = 3008
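
    A rough sketch of the arithmetic behind these figures: only whole chunks can be allocated, so smaller chunks leave less memory uncovered at the tail. The helper names `wholeChunks` and `coveredMiB` are hypothetical, and using 3040 as the free amount is an assumption for illustration (it is the maximum usage figure mentioned above and happens to reproduce the 2944 and 3008 values).

```cpp
#include <cassert>
#include <cstddef>

// Number of whole chunks that fit in `freeMiB` (integer division rounds down).
std::size_t wholeChunks(std::size_t freeMiB, std::size_t chunkMiB) {
    assert(chunkMiB > 0);
    return freeMiB / chunkMiB;
}

// Memory actually covered by those whole chunks, in MiB.
std::size_t coveredMiB(std::size_t freeMiB, std::size_t chunkMiB) {
    return wholeChunks(freeMiB, chunkMiB) * chunkMiB;
}
```

    With 3040 free, 128-sized chunks give 23 chunks covering 2944, while 64-sized chunks give 47 chunks covering 3008. A full 3072 would fit exactly 24 chunks of 128.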


    @ Fox2232
    interesting, but idk how to input this in exe :D
     
    Last edited: Jan 23, 2015
  18. JohnLai

    JohnLai Member Guru

    Messages:
    136
    Likes Received:
    7
    GPU:
    ASUS GTX 970 3.5+0.5GB
    Yeah, those last 2 chunks are definitely caused by Windows reserving GPU VRAM for desktop compositing.

    Don't fret about it. Once you can test it with IGPU as primary display, feel free to report back. :)

    Your current result is quite okay.
     
  19. -Tj-

    -Tj- Ancient Guru

    Messages:
    17,182
    Likes Received:
    1,928
    GPU:
    Zotac GTX980Ti OC
    Ah I see, ok thanks. So isn't this the case with the GTX 970 as well, Windows allocation?

    I mean, it apparently allocates 512MB for it? I get a drop @ 2560MB, and 3072MB - 2560MB = 512MB :nerd:
     
  20. JohnLai

    JohnLai Member Guru

    Messages:
    136
    Likes Received:
    7
    GPU:
    ASUS GTX 970 3.5+0.5GB
    Ideally, this bench should be run on the GTX 970/980 in headless mode to eliminate the Windows desktop compositing VRAM allocation as well as other VRAM overhead.

    I have seen a few users who ran this bench in headless mode. GTX 980 = bandwidth maxed all the way to 4GB. GTX 970 = unfortunately, memory bandwidth dropped by a large margin starting somewhere in the 3.2GB - 3.5GB range.

    This is not to say the GTX 970 can't use the full 4GB; it's just that the memory bandwidth beyond that range is too slow.
     