970 memory allocation issue revisited

Discussion in 'Videocards - NVIDIA GeForce' started by alanm, Jan 23, 2015.

Thread Status:
Not open for further replies.
  1. 4KOLED

    4KOLED Guest

    Messages:
    20
    Likes Received:
    0
    GPU:
    SLI GTX 970 EVGA FTW
    Sorry, I don't want to read all 28 pages, but did you notice the same issue with the 980?

    Because someone has...
     
  2. sykozis

    sykozis Ancient Guru

    Messages:
    22,492
    Likes Received:
    1,537
    GPU:
    Asus RX6700XT
    It's actually a drawback of the architecture and how NVIDIA chose to design/implement the memory controller. When they cut down the GPU, it loses memory bandwidth and a portion of the memory becomes "slower" as a result. The memory bus is in fact 256-bit. The memory controller simply isn't able to use the full bandwidth on a small portion of the memory because some SMMs are disabled.
     
  3. Monchis

    Monchis Guest

    Messages:
    1,303
    Likes Received:
    36
    GPU:
    GTX 950
    Maybe they could make a test driver that uses those 500 MB as the first resource, to calm the paranoia? :)

    Or maybe be clearer in their specs next time around?
     
    Last edited: Jan 25, 2015
  4. sykozis

    sykozis Ancient Guru

    Messages:
    22,492
    Likes Received:
    1,537
    GPU:
    Asus RX6700XT
    Exactly what "issue" are you referring to?
     

  5. 4KOLED

    4KOLED Guest

    Messages:
    20
    Likes Received:
    0
    GPU:
    SLI GTX 970 EVGA FTW
    The same story about VRAM slowing down once usage reaches ~3.5 GB.
     
  6. sykozis

    sykozis Ancient Guru

    Messages:
    22,492
    Likes Received:
    1,537
    GPU:
    Asus RX6700XT
    The GTX460 768mb cards had the same behavior. They'd fill the first 512mb of VRAM and wouldn't touch the last 256mb unless absolutely necessary because it was slower. They used a 128+64bit implementation to get the "192bit" memory bus on those cards. Nobody made a huge fuss over it either....

    Read post #690 from Hilbert Hagedoorn. It's explained there.
     
  7. Monchis

    Monchis Guest

    Messages:
    1,303
    Likes Received:
    36
    GPU:
    GTX 950
    I know of a couple of forums where they made a huge fuss about the GTX 660.
     
  8. sykozis

    sykozis Ancient Guru

    Messages:
    22,492
    Likes Received:
    1,537
    GPU:
    Asus RX6700XT
    Similar design there to the GTX460 768mb cards. In defense of the GTX660 though, it was crap to begin with. Most of us just upgraded from cards slow enough to make the 660 look better than it really was. In my case, it was 10-50% slower than the 560Ti I replaced in most instances.
     
  9. 4KOLED

    4KOLED Guest

    Messages:
    20
    Likes Received:
    0
    GPU:
    SLI GTX 970 EVGA FTW
    Hilbert says the slowdown only starts above 4 GB on the 980, but I found a 980 whose bandwidth slows down from 3.5 GB. Is this the first one, or did you already know about it?
     
  10. Fox2232

    Fox2232 Guest

    Messages:
    11,808
    Likes Received:
    3,371
    GPU:
    6900XT+AW@240Hz
    Sure. But it leaves quite some uncertainty, since utilization was at max for only 2 points and the rest was well below.

    But to add actual value, since only one person went and created my synthetic scenario to see the possible impact (not a regular one), I'll show some math.

    The GTX 970 has an official memory bandwidth of 224 GB/s.
    Nai's Bench shows 150 GB/s for the fast part and 6 to 22 GB/s for the slow part. Normalizing this to:
    224 GB/s for the fast and 21 GB/s for the slow address space.
    Scenario 1: A big game uses 3.2 GB, but only 768 MB (0.75 GB) per frame
    - best case: 0.75 / 224 = 0.00335 s ~ max achievable fps = 298.67
    - worst case: 0.5 / 21 + 0.25 / 224 = 0.0249 s ~ max achievable fps = 40.12
    - average case: 0.09375 / 21 + 0.65625 / 224 = 0.0074 s ~ max achievable fps = 135

    Scenario 2: A very small game utilizing 0.75 GB, but using exactly 0.5 GB per frame (small arena game)
    - best case: 0.5 / 224 = 0.002232 s ~ max fps = 448
    - worst case: 0.5 / 21 = 0.02381 s ~ max fps = 42
    - average case: 0.0625 / 21 + 0.4375 / 224 = 0.004929 s ~ max fps = 203

    Explanation for the average case:
    Since only 0.5 GB out of 4 GB is slow, we have (if a random distribution applies) a 12.5% chance of putting data into the slow region. For 0.75 GB per frame that means 0.09375 GB slow and 0.65625 GB fast; for 0.5 GB per frame it means 0.0625 GB slow and 0.4375 GB fast.

    And as you can see, since games which use 3.5 GB of VRAM on their own already run in the range of 25 to 45 fps, the impact of this slow part is minimal.

    I believe NVIDIA did the best thing they could by separating the address spaces this way. Otherwise there would be cases where games that run perfectly fine end up with huge framerate drops. Imagine going from 120 to 40 fps with no apparent visual effects on screen, just by entering a room full of slow textures.
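
    The arithmetic above can be reproduced with a short script. This is only a sketch of the same back-of-the-envelope model: it assumes frame time is bounded purely by reading the per-frame data once at the quoted bandwidths, and the names `max_fps` and `scenario` are made up for illustration.

```python
# Toy model of the estimate above: frame time is bounded by how long it takes
# to read the per-frame working set once at the given bandwidths.
FAST_GBPS = 224.0  # official GTX 970 figure quoted in the post
SLOW_GBPS = 21.0   # normalized slow-segment figure from the post


def max_fps(per_frame_gb, slow_gb):
    """Upper bound on fps when slow_gb of the per-frame data is in the slow segment."""
    fast_gb = per_frame_gb - slow_gb
    frame_time_s = slow_gb / SLOW_GBPS + fast_gb / FAST_GBPS
    return 1.0 / frame_time_s


def scenario(per_frame_gb, worst_slow_gb=0.5, slow_fraction=0.5 / 4.0):
    """Return (best, worst, average) fps bounds for one per-frame working set."""
    best = max_fps(per_frame_gb, 0.0)
    # Worst case: as much per-frame data as possible sits in the 0.5 GB slow part.
    worst = max_fps(per_frame_gb, min(worst_slow_gb, per_frame_gb))
    # Average case: data is spread randomly, so 12.5% of it lands in the slow part.
    average = max_fps(per_frame_gb, per_frame_gb * slow_fraction)
    return best, worst, average


for name, gb in (("Scenario 1", 0.75), ("Scenario 2", 0.5)):
    best, worst, avg = scenario(gb)
    print(f"{name}: best {best:.1f}, worst {worst:.1f}, average {avg:.1f} fps")
```

    Changing `SLOW_GBPS` to the low end of the measured range (6 GB/s) shows how sensitive the worst case is to the slow-segment figure.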
     

  11. rflair

    rflair Don Coleus Staff Member

    Messages:
    4,854
    Likes Received:
    1,725
    GPU:
    5700XT
    The first paragraph makes no sense at all.

    The second is naturally how the chip is configured. The GTX 980 is a full GM204 configuration; the GTX 970 will never be better.

    The last part makes no sense: it is a 256-bit card, look at the chip layout. The thing is that NVIDIA ties the SMMs/CUDA cores to a certain amount of memory each, and this is where the memory segmentation happens. What has me wondering is how the segmented memory is used: is it one large block of memory, or little packets?
     
  12. alanm

    alanm Ancient Guru

    Messages:
    12,235
    Likes Received:
    4,437
    GPU:
    RTX 4080
    Someone elsewhere posted this on what may be going on, that the card can "load 4GB but only 3.5 can be sent to the SMMs. The data in the remaining 0.5GB can't be sent for execution to the SMMs used for executing the data within the 3.5GB. It's only a theory, but the GPU design and the presence of a separate partition point to this implementation. What transpires is that each SMM has a fixed address space in the RAM, and data meant to be executed by a given SMM must be retired in the relevant address space. If this given SMM is disabled there's no means to send the data to another SMM, so as said the whole 4GB is addressable but only 3.5GB can be executed by the GPU computing units..."

    Which in effect may indeed make it a 3.5 GB card. Although I still don't believe that is the cause of the crippled performance in the examples others have presented here and elsewhere.
     
  13. alanm

    alanm Ancient Guru

    Messages:
    12,235
    Likes Received:
    4,437
    GPU:
    RTX 4080
    It may have, but it's probably due to very high settings that can slow things down regardless of VRAM usage.
     
  14. Monchis

    Monchis Guest

    Messages:
    1,303
    Likes Received:
    36
    GPU:
    GTX 950
    Your 560ti must have been a golden monster overclocker I guess.

    Anyway, only using the last 500 MB once the first 3500 MB start choking doesn't look like the most seamless method out there. So maybe in some configurations it gets stuck? In some videos it looks like it gets stuck around 3.5 GB.
     
  15. Fox2232

    Fox2232 Guest

    Messages:
    11,808
    Likes Received:
    3,371
    GPU:
    6900XT+AW@240Hz
    The driver/card tries to prevent applications from using more than 3.5 GB, which is the reason people started this investigation in the first place. They ran games with the same settings on a 980 and a 970 and noticed that the 970 had 3.5 GB in use while the 980 always went over this limit.
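
    As a rough illustration of that behaviour, here is a toy allocator that fills a fast segment first and spills into a slow segment only when the fast one is full. It is purely a hypothetical model for discussion, not NVIDIA's actual driver logic; the class name `SegmentedVram` and the 3584 MB / 512 MB split are assumptions taken from the 3.5 GB + 0.5 GB figures in this thread.

```python
# Hypothetical toy model, not NVIDIA's driver logic: allocations go to the
# fast segment first and spill into the slow segment only when it is full.
FAST_MB = 3584  # 3.5 GB fast segment (assumed split)
SLOW_MB = 512   # 0.5 GB slow segment (assumed split)


class SegmentedVram:
    def __init__(self, fast_mb=FAST_MB, slow_mb=SLOW_MB):
        self.total_mb = fast_mb + slow_mb
        self.free = {"fast": fast_mb, "slow": slow_mb}

    def alloc(self, size_mb):
        """Return a list of (segment, mb) pieces, preferring the fast segment."""
        if size_mb > self.free["fast"] + self.free["slow"]:
            raise MemoryError("out of VRAM")
        pieces = []
        for segment in ("fast", "slow"):
            take = min(size_mb, self.free[segment])
            if take > 0:
                self.free[segment] -= take
                pieces.append((segment, take))
                size_mb -= take
        return pieces

    def used_mb(self):
        """Total megabytes currently allocated across both segments."""
        return self.total_mb - self.free["fast"] - self.free["slow"]


vram = SegmentedVram()
print(vram.alloc(3000))  # fits entirely in the fast segment
print(vram.alloc(1000))  # fills the rest of the fast segment, then spills over
print(vram.used_mb())
```

    In this model a monitoring tool would see usage sit at or below 3.5 GB until an allocation no longer fits in the fast segment, which matches what people reported seeing on the 970.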
     

  16. 4KOLED

    4KOLED Guest

    Messages:
    20
    Likes Received:
    0
    GPU:
    SLI GTX 970 EVGA FTW
    Just by doing the Nai's test ....
     
  17. bwat47

    bwat47 Guest

    Messages:
    79
    Likes Received:
    0
    GPU:
    MSI GTX 970
  18. sykozis

    sykozis Ancient Guru

    Messages:
    22,492
    Likes Received:
    1,537
    GPU:
    Asus RX6700XT
    My 560Ti OC'd to 1050 MHz with only a 0.025 V increase. The card was rock solid. Then NVIDIA released drivers to limit overclocking, at which point the card wasn't even stable at its default clocks. (It was a Gigabyte GTX560Ti OC)

    If that last 500mb couldn't be accessed for processing, we'd be seeing graphical anomalies from where graphics data is missing.
     
  19. alanm

    alanm Ancient Guru

    Messages:
    12,235
    Likes Received:
    4,437
    GPU:
    RTX 4080
    Oh. But the Nai test has proven inconsistent on many other non-970 cards, including 980s and 780s, mostly depending on how you run it, i.e. headless/IGP mode.
     
  20. Raider0001

    Raider0001 Master Guru

    Messages:
    521
    Likes Received:
    45
    GPU:
    RX 7900 XTX Nitro+
    If the GTX 960 is exactly half of the 980, why didn't NVIDIA cut the 970 down to 3 GB, 192-bit, 1536 CUDA cores?
     