970 memory allocation issue revisited

Discussion in 'Videocards - NVIDIA GeForce' started by alanm, Jan 23, 2015.

Thread Status:
Not open for further replies.
  1. Fox2232

    Fox2232 Guest

    Messages:
    11,808
    Likes Received:
    3,371
    GPU:
    6900XT+AW@240Hz
    I did miss it, and it is interesting. There are 6 regions that show this bad performance, even though one of the tests had enough free VRAM to accommodate any overhead caused by a CUDA error.

    Interestingly enough, in the test where BF was running in the background, some bad regions were allocated before there was any need to do so.
    And all bad regions had the same performance in both tests: 1 better than average, 3 average, 1 worse than average, and 1 even worse than that.

    At this point I would say that this CUDA test is not defective and truly points towards some issue.

    If anyone has an environment for compiling CUDA code, I will modify it to eat exactly 3GB of VRAM.
    Then the victim should try to start some game, even a small one that needs around 500MB of VRAM, in the remaining (presumably bad) memory region.
    That should prove for sure that there is something very bad.
     
    Last edited: Jan 23, 2015
  2. Fox2232

    Fox2232 Guest

    Messages:
    11,808
    Likes Received:
    3,371
    GPU:
    6900XT+AW@240Hz
    I am sure it is not due to excessive allocation from other applications or the OS,
    because this bench basically asks for 128MB of VRAM again and again until it gets a refusal error. If you preallocate 2 of 3GB, then this test will eat only the last 1GB.
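
To make the "ask for 128MB until refused" reasoning concrete, here is a rough sketch in Python. It is a toy model only, not Nai's actual code: the function name `chunks_until_refusal` is made up for illustration, and it assumes no driver overhead or fragmentation.

```python
CHUNK_MIB = 128  # chunk size the bench requests each time


def chunks_until_refusal(total_mib, preallocated_mib, chunk_mib=CHUNK_MIB):
    """Count how many whole chunks fit before a request would be refused.

    Toy model (assumption): ignores driver overhead and fragmentation.
    """
    free_mib = total_mib - preallocated_mib
    allocated = 0
    while free_mib >= chunk_mib:  # the next request still fits
        free_mib -= chunk_mib
        allocated += 1
    return allocated


# The example from the post: 2 of 3GB already taken by something else.
n = chunks_until_refusal(total_mib=3072, preallocated_mib=2048)
print(n, "chunks =", n * CHUNK_MIB, "MiB")
```

Under those assumptions the bench only ever touches whatever is left over, so whatever was preallocated decides which part of the memory gets measured.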
     
  3. Fox2232

    Fox2232 Guest

    Messages:
    11,808
    Likes Received:
    3,371
    GPU:
    6900XT+AW@240Hz
    @skacikpl: Can you please rerun the bench with and without BF killed? But have the game running in windowed mode, as once fullscreen games are minimized to the tray, VRAM is quite often unloaded to system RAM.
     
    Last edited: Jan 23, 2015
  4. JohnLai

    JohnLai Guest

    Messages:
    136
    Likes Received:
    7
    GPU:
    ASUS GTX 970 3.5+0.5GB
    Eh? Turns out Nai has updated his source code in post #20.

    http://www.computerbase.de/forum/showthread.php?t=1435408&p=16873496#post16873496

    It is hidden under a spoiler tag.
     

  5. Pill Monster

    Pill Monster Banned

    Messages:
    25,211
    Likes Received:
    9
    GPU:
    7950 Vapor-X 1100/1500
    I'm pretty sure Composition Mode (DWM) can't be disabled in W8. Might want to double check on that though.
     
  6. skacikpl

    skacikpl Maha Guru

    Messages:
    1,227
    Likes Received:
    609
    GPU:
    Inno3D RTX 4090
    I'll do that a bit later.
     
  7. sykozis

    sykozis Ancient Guru

    Messages:
    22,492
    Likes Received:
    1,537
    GPU:
    Asus RX6700XT
    -TJ- ran the same test on a GTX 780 (a cut-down GK110-based card) with similar results, which indicates that this may be normal behavior for cards using "cut-down" GPUs.

    @-TJ-, I appreciate you taking the time to run the test and post the results. I agree, smaller blocks very well may show different behavior.

    I know some people are going to argue against using a Kepler based card in this "testing", but to eliminate previous generation hardware is to admit that no problem exists. You have to have a "control group" in any real testing method if your goal is to actually prove anything. Now, if we can get a 470 and 670 involved that show the same behavior, there's no reason to continue this thread.
     
    Last edited: Jan 23, 2015
  8. JohnLai

    JohnLai Guest

    Messages:
    136
    Likes Received:
    7
    GPU:
    ASUS GTX 970 3.5+0.5GB
    I kinda agree, but the control group must run the bench in headless mode!
     
  9. FDisk

    FDisk Guest

    Messages:
    766
    Likes Received:
    0
    GPU:
    ASUS STRIX GTX970 OC 4GB
    Here are my results with ASUS STRIX GTX970:

    Code:
    Nai's Benchmark
    Allocating Memory . . .
    Chunk Size = 128 MiByte
    Press any key to continue . . .
    Allocated 30 Chunks
    Allocated 3840 MiByte
    Benchmarking DRAM
    Press any key to continue . . .
    DRAM-Bandwidth of Chunk no. 0 (0 MiByte to 128 MiByte): 155 GByte/s
    DRAM-Bandwidth of Chunk no. 1 (128 MiByte to 256 MiByte): 155 GByte/s
    DRAM-Bandwidth of Chunk no. 2 (256 MiByte to 384 MiByte): 153 GByte/s
    DRAM-Bandwidth of Chunk no. 3 (384 MiByte to 512 MiByte): 155 GByte/s
    DRAM-Bandwidth of Chunk no. 4 (512 MiByte to 640 MiByte): 155 GByte/s
    DRAM-Bandwidth of Chunk no. 5 (640 MiByte to 768 MiByte): 155 GByte/s
    DRAM-Bandwidth of Chunk no. 6 (768 MiByte to 896 MiByte): 153 GByte/s
    DRAM-Bandwidth of Chunk no. 7 (896 MiByte to 1024 MiByte): 155 GByte/s
    DRAM-Bandwidth of Chunk no. 8 (1024 MiByte to 1152 MiByte): 153 GByte/s
    DRAM-Bandwidth of Chunk no. 9 (1152 MiByte to 1280 MiByte): 155 GByte/s
    DRAM-Bandwidth of Chunk no. 10 (1280 MiByte to 1408 MiByte): 155 GByte/s
    DRAM-Bandwidth of Chunk no. 11 (1408 MiByte to 1536 MiByte): 154 GByte/s
    DRAM-Bandwidth of Chunk no. 12 (1536 MiByte to 1664 MiByte): 155 GByte/s
    DRAM-Bandwidth of Chunk no. 13 (1664 MiByte to 1792 MiByte): 153 GByte/s
    DRAM-Bandwidth of Chunk no. 14 (1792 MiByte to 1920 MiByte): 155 GByte/s
    DRAM-Bandwidth of Chunk no. 15 (1920 MiByte to 2048 MiByte): 153 GByte/s
    DRAM-Bandwidth of Chunk no. 16 (2048 MiByte to 2176 MiByte): 155 GByte/s
    DRAM-Bandwidth of Chunk no. 17 (2176 MiByte to 2304 MiByte): 153 GByte/s
    DRAM-Bandwidth of Chunk no. 18 (2304 MiByte to 2432 MiByte): 155 GByte/s
    DRAM-Bandwidth of Chunk no. 19 (2432 MiByte to 2560 MiByte): 153 GByte/s
    DRAM-Bandwidth of Chunk no. 20 (2560 MiByte to 2688 MiByte): 155 GByte/s
    DRAM-Bandwidth of Chunk no. 21 (2688 MiByte to 2816 MiByte): 153 GByte/s
    DRAM-Bandwidth of Chunk no. 22 (2816 MiByte to 2944 MiByte): 155 GByte/s
    DRAM-Bandwidth of Chunk no. 23 (2944 MiByte to 3072 MiByte): 27 GByte/s
    DRAM-Bandwidth of Chunk no. 24 (3072 MiByte to 3200 MiByte): 22 GByte/s
    DRAM-Bandwidth of Chunk no. 25 (3200 MiByte to 3328 MiByte): 22 GByte/s
    DRAM-Bandwidth of Chunk no. 26 (3328 MiByte to 3456 MiByte): 22 GByte/s
    DRAM-Bandwidth of Chunk no. 27 (3456 MiByte to 3584 MiByte): 7 GByte/s
    DRAM-Bandwidth of Chunk no. 28 (3584 MiByte to 3712 MiByte): 6 GByte/s
    DRAM-Bandwidth of Chunk no. 29 (3712 MiByte to 3840 MiByte): 6 GByte/s
    Press any key to continue . . .
    Benchmarking L2-Cache
    Press any key to continue . . .
    L2-Cache-Bandwidth of Chunk no. 0 (0 MiByte to 128 MiByte): 348 GByte/s
    L2-Cache-Bandwidth of Chunk no. 1 (128 MiByte to 256 MiByte): 344 GByte/s
    L2-Cache-Bandwidth of Chunk no. 2 (256 MiByte to 384 MiByte): 347 GByte/s
    L2-Cache-Bandwidth of Chunk no. 3 (384 MiByte to 512 MiByte): 344 GByte/s
    L2-Cache-Bandwidth of Chunk no. 4 (512 MiByte to 640 MiByte): 344 GByte/s
    L2-Cache-Bandwidth of Chunk no. 5 (640 MiByte to 768 MiByte): 347 GByte/s
    L2-Cache-Bandwidth of Chunk no. 6 (768 MiByte to 896 MiByte): 359 GByte/s
    L2-Cache-Bandwidth of Chunk no. 7 (896 MiByte to 1024 MiByte): 355 GByte/s
    L2-Cache-Bandwidth of Chunk no. 8 (1024 MiByte to 1152 MiByte): 356 GByte/s
    L2-Cache-Bandwidth of Chunk no. 9 (1152 MiByte to 1280 MiByte): 358 GByte/s
    L2-Cache-Bandwidth of Chunk no. 10 (1280 MiByte to 1408 MiByte): 354 GByte/s
    L2-Cache-Bandwidth of Chunk no. 11 (1408 MiByte to 1536 MiByte): 356 GByte/s
    L2-Cache-Bandwidth of Chunk no. 12 (1536 MiByte to 1664 MiByte): 359 GByte/s
    L2-Cache-Bandwidth of Chunk no. 13 (1664 MiByte to 1792 MiByte): 358 GByte/s
    L2-Cache-Bandwidth of Chunk no. 14 (1792 MiByte to 1920 MiByte): 355 GByte/s
    L2-Cache-Bandwidth of Chunk no. 15 (1920 MiByte to 2048 MiByte): 359 GByte/s
    L2-Cache-Bandwidth of Chunk no. 16 (2048 MiByte to 2176 MiByte): 358 GByte/s
    L2-Cache-Bandwidth of Chunk no. 17 (2176 MiByte to 2304 MiByte): 355 GByte/s
    L2-Cache-Bandwidth of Chunk no. 18 (2304 MiByte to 2432 MiByte): 356 GByte/s
    L2-Cache-Bandwidth of Chunk no. 19 (2432 MiByte to 2560 MiByte): 358 GByte/s
    L2-Cache-Bandwidth of Chunk no. 20 (2560 MiByte to 2688 MiByte): 385 GByte/s
    L2-Cache-Bandwidth of Chunk no. 21 (2688 MiByte to 2816 MiByte): 395 GByte/s
    L2-Cache-Bandwidth of Chunk no. 22 (2816 MiByte to 2944 MiByte): 398 GByte/s
    L2-Cache-Bandwidth of Chunk no. 23 (2944 MiByte to 3072 MiByte): 81 GByte/s
    L2-Cache-Bandwidth of Chunk no. 24 (3072 MiByte to 3200 MiByte): 66 GByte/s
    L2-Cache-Bandwidth of Chunk no. 25 (3200 MiByte to 3328 MiByte): 66 GByte/s
    L2-Cache-Bandwidth of Chunk no. 26 (3328 MiByte to 3456 MiByte): 66 GByte/s
    L2-Cache-Bandwidth of Chunk no. 27 (3456 MiByte to 3584 MiByte): 7 GByte/s
    L2-Cache-Bandwidth of Chunk no. 28 (3584 MiByte to 3712 MiByte): 6 GByte/s
    L2-Cache-Bandwidth of Chunk no. 29 (3712 MiByte to 3840 MiByte): 6 GByte/s
    Press any key to continue . . .
    Ouch! That drop :3eyes:
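
For anyone who wants to compare pasted results without eyeballing them, here is a small Python sketch that pulls the numbers out of the output format above and flags the slow chunks. The helper names `parse_bandwidth` and `slow_chunks` are made up for this example, and the cutoff (below half of the median) is just an assumption, not something the benchmark itself defines.

```python
import re
from statistics import median

# Matches both output styles seen in this thread:
# "...(0 MiByte to 128 MiByte): 155 GByte/s" and "...):190.08 GByte/s"
LINE_RE = re.compile(
    r"(DRAM|L2-Cache)-Bandwidth of Chunk no\. (\d+) "
    r"\((\d+) MiByte to (\d+) MiByte\):\s*([\d.]+) GByte/s"
)


def parse_bandwidth(text, kind="DRAM"):
    """Return a list of (chunk_no, start_mib, gbyte_per_s) for one test kind."""
    rows = []
    for m in LINE_RE.finditer(text):
        if m.group(1) == kind:
            rows.append((int(m.group(2)), int(m.group(3)), float(m.group(5))))
    return rows


def slow_chunks(rows, fraction=0.5):
    """Chunks whose bandwidth is below `fraction` of the median (assumed cutoff)."""
    cutoff = fraction * median(bw for _, _, bw in rows)
    return [(no, start, bw) for no, start, bw in rows if bw < cutoff]


# A few lines copied from the result above.
sample = """\
DRAM-Bandwidth of Chunk no. 20 (2560 MiByte to 2688 MiByte): 155 GByte/s
DRAM-Bandwidth of Chunk no. 21 (2688 MiByte to 2816 MiByte): 153 GByte/s
DRAM-Bandwidth of Chunk no. 22 (2816 MiByte to 2944 MiByte): 155 GByte/s
DRAM-Bandwidth of Chunk no. 23 (2944 MiByte to 3072 MiByte): 27 GByte/s
DRAM-Bandwidth of Chunk no. 24 (3072 MiByte to 3200 MiByte): 22 GByte/s
"""
for no, start, bw in slow_chunks(parse_bandwidth(sample)):
    print(f"chunk {no} (from {start} MiByte): {bw} GByte/s")
```

Pass `kind="L2-Cache"` to check the second half of the output the same way.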
     
  10. UZ7

    UZ7 Ancient Guru

    Messages:
    5,537
    Likes Received:
    74
    GPU:
    nVidia RTX 4080 FE
    I do notice when I play Unity that VRAM usage hovers at no more than 3500MB; any change in AA causes it to use more, and the fps drops drastically to a crawl.
     

  11. sykozis

    sykozis Ancient Guru

    Messages:
    22,492
    Likes Received:
    1,537
    GPU:
    Asus RX6700XT
    I'm not here to prove or disprove anything. I actually have no stake in the outcome of these "tests". I'm simply here to see how things unfold and assist in getting to a final outcome. The games I most commonly play aren't affected in the least, as they use less than 1GB of VRAM.

    Not true at all. The point of a control group is simply to maintain a constant condition. As long as all "questionable" cards are run under the same conditions as the "control group" then the test is valid. We should run the test under both conditions to see how things shape up though.

    I will say that it should be done after a reboot though, and not after the system has been run through a plethora of games. We do want VRAM as clean as possible outside of the testing scenarios presented by Fox, where he's asking for a % of memory to be pre-allocated to see how the "benchmark" fares.

    Adding AA will reduce framerates anyways. That's been a constant over the years.
     
    Last edited: Jan 23, 2015
  12. JohnLai

    JohnLai Guest

    Messages:
    136
    Likes Received:
    7
    GPU:
    ASUS GTX 970 3.5+0.5GB
    Oh....I see your point.

    But who is going to design a program that can pre-allocate the exact required memory before running the bench?
     
  13. sykozis

    sykozis Ancient Guru

    Messages:
    22,492
    Likes Received:
    1,537
    GPU:
    Asus RX6700XT
    We don't really have to be "exact" here. Reasonably close will work. As long as those testing can stay within an acceptable margin of the pre-allocation limit that Fox is asking for, the results should be viable. If Fox is asking for 1GB, 1.5+ GB would skew the results but 900MB - 1.1GB would be viable.
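
As a quick sketch of that margin check in Python: the 10% figure is only inferred from the 900MB - 1.1GB example for a 1GB target, and `within_margin` is a made-up helper name, so treat both as assumptions.

```python
def within_margin(actual_mb, target_mb, tolerance_pct=10):
    """True if the actual pre-allocation is within +/- tolerance_pct of the target.

    Integer arithmetic is used so the boundary values compare exactly.
    """
    return abs(actual_mb - target_mb) * 100 <= tolerance_pct * target_mb


target = 1000  # "1GB" requested, expressed in MB
for actual in (900, 1100, 1500):
    verdict = "viable" if within_margin(actual, target) else "skewed"
    print(actual, "MB ->", verdict)
```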
     
  14. UZ7

    UZ7 Ancient Guru

    Messages:
    5,537
    Likes Received:
    74
    GPU:
    nVidia RTX 4080 FE
    Oh I know that, it's pretty obvious lol.. But if the game runs at 50-60FPS and then hits 5-11FPS when it starts using more than 3500MB of VRAM, then it's something else? :p Granted, Unity is not the greatest game to test with, as the game is buggy in itself.
     
  15. nanogenesis

    nanogenesis Guest

    Messages:
    1,288
    Likes Received:
    6
    GPU:
    MSI R9 390X 1178|6350
    My 970 put through the test.

    Code:
    Nai's Benchmark
    Allocating Memory . . .
    Chunk Size: 128 MiByte
    Allocated 30 Chunks
    Allocated 3840 MiByte
    Benchmarking DRAM
    DRAM-Bandwidth of Chunk no. 0 (0 MiByte to 128 MiByte):190.08 GByte/s
    DRAM-Bandwidth of Chunk no. 1 (128 MiByte to 256 MiByte):189.87 GByte/s
    DRAM-Bandwidth of Chunk no. 2 (256 MiByte to 384 MiByte):189.90 GByte/s
    DRAM-Bandwidth of Chunk no. 3 (384 MiByte to 512 MiByte):190.43 GByte/s
    DRAM-Bandwidth of Chunk no. 4 (512 MiByte to 640 MiByte):190.01 GByte/s
    DRAM-Bandwidth of Chunk no. 5 (640 MiByte to 768 MiByte):190.37 GByte/s
    DRAM-Bandwidth of Chunk no. 6 (768 MiByte to 896 MiByte):190.35 GByte/s
    DRAM-Bandwidth of Chunk no. 7 (896 MiByte to 1024 MiByte):190.26 GByte/s
    DRAM-Bandwidth of Chunk no. 8 (1024 MiByte to 1152 MiByte):189.91 GByte/s
    DRAM-Bandwidth of Chunk no. 9 (1152 MiByte to 1280 MiByte):190.06 GByte/s
    DRAM-Bandwidth of Chunk no. 10 (1280 MiByte to 1408 MiByte):190.24 GByte/s
    DRAM-Bandwidth of Chunk no. 11 (1408 MiByte to 1536 MiByte):190.30 GByte/s
    DRAM-Bandwidth of Chunk no. 12 (1536 MiByte to 1664 MiByte):190.46 GByte/s
    DRAM-Bandwidth of Chunk no. 13 (1664 MiByte to 1792 MiByte):190.03 GByte/s
    DRAM-Bandwidth of Chunk no. 14 (1792 MiByte to 1920 MiByte):190.00 GByte/s
    DRAM-Bandwidth of Chunk no. 15 (1920 MiByte to 2048 MiByte):190.12 GByte/s
    DRAM-Bandwidth of Chunk no. 16 (2048 MiByte to 2176 MiByte):190.35 GByte/s
    DRAM-Bandwidth of Chunk no. 17 (2176 MiByte to 2304 MiByte):189.85 GByte/s
    DRAM-Bandwidth of Chunk no. 18 (2304 MiByte to 2432 MiByte):189.93 GByte/s
    DRAM-Bandwidth of Chunk no. 19 (2432 MiByte to 2560 MiByte):190.30 GByte/s
    DRAM-Bandwidth of Chunk no. 20 (2560 MiByte to 2688 MiByte):189.69 GByte/s
    DRAM-Bandwidth of Chunk no. 21 (2688 MiByte to 2816 MiByte):190.08 GByte/s
    DRAM-Bandwidth of Chunk no. 22 (2816 MiByte to 2944 MiByte):190.47 GByte/s
    DRAM-Bandwidth of Chunk no. 23 (2944 MiByte to 3072 MiByte):190.26 GByte/s
    DRAM-Bandwidth of Chunk no. 24 (3072 MiByte to 3200 MiByte):189.55 GByte/s
    DRAM-Bandwidth of Chunk no. 25 (3200 MiByte to 3328 MiByte):58.57 GByte/s
    DRAM-Bandwidth of Chunk no. 26 (3328 MiByte to 3456 MiByte):28.13 GByte/s
    DRAM-Bandwidth of Chunk no. 27 (3456 MiByte to 3584 MiByte):28.13 GByte/s
    DRAM-Bandwidth of Chunk no. 28 (3584 MiByte to 3712 MiByte):28.13 GByte/s
    DRAM-Bandwidth of Chunk no. 29 (3712 MiByte to 3840 MiByte):21.82 GByte/s
    Benchmarking L2-Cache
    L2-Cache-Bandwidth of Chunk no. 0 (0 MiByte to 128 MiByte):487.12 GByte/s
    L2-Cache-Bandwidth of Chunk no. 1 (128 MiByte to 256 MiByte):486.94 GByte/s
    L2-Cache-Bandwidth of Chunk no. 2 (256 MiByte to 384 MiByte):487.03 GByte/s
    L2-Cache-Bandwidth of Chunk no. 3 (384 MiByte to 512 MiByte):487.11 GByte/s
    L2-Cache-Bandwidth of Chunk no. 4 (512 MiByte to 640 MiByte):486.95 GByte/s
    L2-Cache-Bandwidth of Chunk no. 5 (640 MiByte to 768 MiByte):487.33 GByte/s
    L2-Cache-Bandwidth of Chunk no. 6 (768 MiByte to 896 MiByte):487.00 GByte/s
    L2-Cache-Bandwidth of Chunk no. 7 (896 MiByte to 1024 MiByte):487.12 GByte/s
    L2-Cache-Bandwidth of Chunk no. 8 (1024 MiByte to 1152 MiByte):487.17 GByte/s
    L2-Cache-Bandwidth of Chunk no. 9 (1152 MiByte to 1280 MiByte):487.12 GByte/s
    L2-Cache-Bandwidth of Chunk no. 10 (1280 MiByte to 1408 MiByte):486.99 GByte/s
    L2-Cache-Bandwidth of Chunk no. 11 (1408 MiByte to 1536 MiByte):487.20 GByte/s
    L2-Cache-Bandwidth of Chunk no. 12 (1536 MiByte to 1664 MiByte):487.05 GByte/s
    L2-Cache-Bandwidth of Chunk no. 13 (1664 MiByte to 1792 MiByte):487.06 GByte/s
    L2-Cache-Bandwidth of Chunk no. 14 (1792 MiByte to 1920 MiByte):486.89 GByte/s
    L2-Cache-Bandwidth of Chunk no. 15 (1920 MiByte to 2048 MiByte):487.28 GByte/s
    L2-Cache-Bandwidth of Chunk no. 16 (2048 MiByte to 2176 MiByte):487.09 GByte/s
    L2-Cache-Bandwidth of Chunk no. 17 (2176 MiByte to 2304 MiByte):487.18 GByte/s
    L2-Cache-Bandwidth of Chunk no. 18 (2304 MiByte to 2432 MiByte):486.88 GByte/s
    L2-Cache-Bandwidth of Chunk no. 19 (2432 MiByte to 2560 MiByte):487.04 GByte/s
    L2-Cache-Bandwidth of Chunk no. 20 (2560 MiByte to 2688 MiByte):487.17 GByte/s
    L2-Cache-Bandwidth of Chunk no. 21 (2688 MiByte to 2816 MiByte):486.97 GByte/s
    L2-Cache-Bandwidth of Chunk no. 22 (2816 MiByte to 2944 MiByte):487.18 GByte/s
    L2-Cache-Bandwidth of Chunk no. 23 (2944 MiByte to 3072 MiByte):486.95 GByte/s
    L2-Cache-Bandwidth of Chunk no. 24 (3072 MiByte to 3200 MiByte):487.02 GByte/s
    L2-Cache-Bandwidth of Chunk no. 25 (3200 MiByte to 3328 MiByte):174.77 GByte/s
    L2-Cache-Bandwidth of Chunk no. 26 (3328 MiByte to 3456 MiByte):87.34 GByte/s
    L2-Cache-Bandwidth of Chunk no. 27 (3456 MiByte to 3584 MiByte):87.34 GByte/s
    L2-Cache-Bandwidth of Chunk no. 28 (3584 MiByte to 3712 MiByte):87.34 GByte/s
    L2-Cache-Bandwidth of Chunk no. 29 (3712 MiByte to 3840 MiByte):27.86 GByte/s
    Press any key to continue . . .
    
    Windows 7 headless.
     

  16. VultureX

    VultureX Banned

    Messages:
    2,577
    Likes Received:
    0
    GPU:
    MSI GTX970 SLI
    I could compile it and give it some run-time arguments to specify block size; however, I found out that this http://www.computerbase.de/forum/showthread.php?t=1435408&p=16868213#post16868213 is not the latest version of the source code.

    Where is the version with the additions for readability?
     
  17. Russ369

    Russ369 Guest

    Messages:
    621
    Likes Received:
    0
    GPU:
    EVGA GTX 1070 FE
    Just ran the bench, get the same drop... GG fellas
     
  18. JohnLai

    JohnLai Guest

    Messages:
    136
    Likes Received:
    7
    GPU:
    ASUS GTX 970 3.5+0.5GB
    I only know post #34 has the latest version.

    Post #20 has the source code; not sure if that is the latest. It is hidden under a spoiler tag.
     
  19. JohnLai

    JohnLai Guest

    Messages:
    136
    Likes Received:
    7
    GPU:
    ASUS GTX 970 3.5+0.5GB
    So, nanogenesis, do you have a GTX 980 for testing (maybe borrow one from a friend? XD) so that we can have a comparison between the 970 and the 980?
     
  20. Fox2232

    Fox2232 Guest

    Messages:
    11,808
    Likes Received:
    3,371
    GPU:
    6900XT+AW@240Hz
    If anyone has a CUDA compiler, I can simply modify Nai's code to reserve the desired amount.
    I would really like to see 3GB pre-allocated and, instead of benching, a game running in the remaining part
    (something small that fits in around 500MB).

    And btw, the changes Nai made there mainly increase the accuracy of the bench (number of cycles) and change the benching method a bit in those elements I was not sure about (this stays, I simply do not know the inner workings of CUDA).
     