PCIe 6.0 Specification to be finalized in 2021 and 4 times faster than PCIe 4.0

Discussion in 'Frontpage news' started by Hilbert Hagedoorn, Jun 11, 2020.

  1. schmidtbag

    schmidtbag Ancient Guru

    Messages:
    5,748
    Likes Received:
    2,193
    GPU:
    HIS R9 290
    It really boggles my mind how stubborn you are and how much you keep feeding into my points. Loading data in the background is another way to reduce bandwidth demand, because you're spreading out the load.
    If you actually had any real experience with this kind of stuff, you'd realize that real-world results (techpowerup.com has them, in case you're interested) show that you can get very good results without 16 lanes. Not perfect results, but very good ones. And this is based on the aging PCIe 3.0.

    I'm not pulling my claims out of my ass, the data is there. Again, servers are a bit of an exception.
     
  2. Carfax

    Carfax Ancient Guru

    Messages:
    2,915
    Likes Received:
    465
    GPU:
    NVidia Titan Xp
    Well, if that is what he's saying, he's quite welcome to put his GPU into an x8 slot, while I and many others will continue to plug ours into an x16 slot.
     
  3. Carfax

    Carfax Ancient Guru

    Messages:
    2,915
    Likes Received:
    465
    GPU:
    NVidia Titan Xp
    What do you think PCIe is? PCIe fulfills the same function as NVLink, it's just much slower. Do you even realize that AMD uses PCIe to connect their GPUs in crossfire or DX12 multi GPU mode?

    This is exactly why NVLink and other technologies like it were developed. PCIe was just too slow for doing serious work with GPUs. But this doesn't negate the fact that PCIe has the same basic functionality as NVLink.

    I'll say it again: PCIe is the main bus protocol standard for PCs and allows data access and transfer between the CPU and GPU, but it is inadequate for serious commercial and/or scientific work, which is why every generation targets a doubling of the speed.
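
    If anyone wants to sanity-check that doubling, here's a quick back-of-the-envelope sketch of what it means for an x16 slot. The figures are the commonly cited approximate per-direction rates, not exact effective throughput after protocol overhead:

    ```python
    # Rough per-generation PCIe x16 throughput per direction.
    # Approximate, commonly quoted values; protocol overhead is ignored.
    pcie_gens = {
        "1.0": 4.0,
        "2.0": 8.0,
        "3.0": 15.75,
        "4.0": 31.5,
        "5.0": 63.0,
        "6.0": 126.0,
    }

    for gen, gbs in pcie_gens.items():
        print(f"PCIe {gen} x16: ~{gbs:g} GB/s per direction")
    ```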

    Nobody can say for sure what the future will bring. The consoles have finally gotten off of HDDs, and that alone will bring a ton of changes to how developers code their engines/games.

    One of the main reasons why GPUs are so slow to saturate older PCIe standards is because games and game engines have been designed around the lowest common denominator for decades....the HDD.

    Ever played a game and wondered why you are still getting texture and object pop-in very close to the camera, and then looked at your VRAM usage and seen that only a third of it is even being utilized? That's because game engines and games are programmed around the use of HDDs and stream data in slowly. Now that both Sony and Microsoft have switched over to SSDs, this is going to create a new paradigm with a huge impact on game and engine development, because data will be able to be streamed in at much faster rates, which will definitely increase the utilization of PCIe x16 bandwidth.
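
    To put rough numbers on the gap being argued about here (all figures are ballpark sequential-read rates for illustration, not from any specific benchmark):

    ```python
    # Approximate sustained read rates of storage vs. PCIe 3.0 x16 bus capacity.
    sources_gb_s = {
        "7200rpm HDD": 0.15,
        "SATA SSD": 0.55,
        "PCIe 3.0 NVMe SSD": 3.0,
        "Console-class NVMe SSD": 5.5,
    }
    pcie3_x16 = 15.75  # GB/s per direction, approximate

    for name, rate in sources_gb_s.items():
        print(f"{name}: ~{rate} GB/s, i.e. {rate / pcie3_x16:.0%} of a PCIe 3.0 x16 link")
    ```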

    So the assumptions of the past may no longer be relevant for the future.
     
  4. Aura89

    Aura89 Ancient Guru

    Messages:
    8,157
    Likes Received:
    1,274
    GPU:
    -
    Literally this comment makes zero sense, but okay. You do you.

    You really seem to be fighting yourself here; you keep replying to everyone with statements or topics they didn't make.

    Like your quote here:


    He specifically asked how adding more lanes would help with latency, and you replied to that by talking about the speed of new generations doubling.

    Completely different topics, yet you're the one trying to argue against statements that weren't made.

    I don't get it; I don't know if you're just arguing for the sake of arguing or what.
     
    Last edited: Jun 16, 2020
    schmidtbag and Fox2232 like this.

  5. MyEinsamkeit

    MyEinsamkeit Member Guru

    Messages:
    180
    Likes Received:
    67
    GPU:
    Radeon Pro W5500
    PCIe 3.0 runs every game perfectly fine; I'm not sure why they're even upgrading to 6.0, or why they even bothered with 4.0. I don't get it.
     
  6. Aura89

    Aura89 Ancient Guru

    Messages:
    8,157
    Likes Received:
    1,274
    GPU:
    -
    PCI-Express isn't only about GPUs, not sure why you're confused there.
     
  7. Carfax

    Carfax Ancient Guru

    Messages:
    2,915
    Likes Received:
    465
    GPU:
    NVidia Titan Xp
    Serious question, but are you his brother or something? He's a grown man; I'm sure he doesn't need you white-knighting on his behalf.

    And you don't even bother pointing out any of his inconsistencies, I see.

    That's because you haven't been following the discussion. You're just taking isolated quotes of what I've said, without looking at the greater context of the overall discussion which has been occurring over several posts.

    schmidtbag seems to be confused on what PCIe does in comparison to NVLink. He seemingly thinks the latter requires huge amounts of bandwidth because the GPUs can share VRAM. However, PCIe also allows GPUs to share and mirror each other's memory.

    PCIe has the same core functionality as NVLink, and vice versa.
     
  8. DmitryKo

    DmitryKo Master Guru

    Messages:
    376
    Likes Received:
    122
    GPU:
    ASUS RX 5700 XT TUF
    Now test it on a PCIe 1.0 motherboard and a GPU with 2 GBytes of video RAM.

    Only if you can hold every resource in VRAM, but unfortunately graphics memory isn't as cheap as system RAM, and there's not going to be enough fast dedicated video RAM to hold everything. Video memory size has been about 1/4th of system memory for the last 15 years, and the shared system pool is typically twice the size of the dedicated video pool (it's limited to half of system RAM).
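
    A minimal sketch of that sizing rule of thumb, using a hypothetical 32 GB RAM / 8 GB VRAM system (the specific numbers are just an example, the ratios are the ones described above):

    ```python
    # Illustrative sizing from the rule of thumb above (hypothetical system).
    system_ram_gb = 32
    dedicated_vram_gb = system_ram_gb / 4          # "1/4th of system memory"
    shared_pool_gb = min(2 * dedicated_vram_gb,    # "typically twice the dedicated pool"
                         system_ram_gb / 2)        # "limited to half of system RAM"

    print(f"Dedicated VRAM: {dedicated_vram_gb:g} GB")
    print(f"System shared pool: {shared_pool_gb:g} GB")
    print(f"Total graphics-addressable memory: {dedicated_vram_gb + shared_pool_gb:g} GB")
    ```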

    Storage is not the bottleneck for rendering, since you don't stream resources from disk for each frame. That's a PC, not a gaming console. You pre-load your resources into system memory (letting the virtual memory manager handle disk paging), then move them into graphics memory - either dedicated video (fast) or system shared (slow) memory, according to your usage patterns. That's how things have been on the PC since at least AGP 2.0 and Direct3D 7.

    SLI as a synonym for split-frame or alternate-frame rendering was made obsolete by Direct3D 12 and WDDM 2.0 and was replaced with explicit multi-GPU and adapter nodes for similar adapters. It has become too hard for vendors to efficiently balance the workloads in their drivers.
    SLI as a multi-adapter configuration is still supported and you can install two RTX 2000-series cards and connect them with an NVLink bridge, or two GTX 1000-series cards with an SLI-HB bridge.

    I have not said anything like that. That's a discrete GPU, not a gaming console with unified memory. Streaming from a slow NVMe disk (~3 GByte/s sequential) doesn't make sense when you can use system shared memory with much better throughput (31.5 GByte/s over PCIe 4.0 x16). It won't avoid the PCIe bus either.
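
    For reference, the 31.5 GByte/s figure falls straight out of the signalling rate and the 128b/130b line encoding; a quick sketch, not accounting for packet/protocol overhead:

    ```python
    # PCIe 4.0 x16 per-direction bandwidth from first principles.
    gt_per_s = 16          # GT/s per lane for PCIe 4.0
    lanes = 16
    encoding = 128 / 130   # 128b/130b line encoding used since PCIe 3.0

    bytes_per_s = gt_per_s * 1e9 * encoding * lanes / 8
    print(f"PCIe 4.0 x16: ~{bytes_per_s / 1e9:.1f} GB/s per direction")  # ~31.5 GB/s
    ```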

    I understand that it's just some arbitrary assumption which you were unable to support with convincing technical arguments.
     
    Last edited: Jun 16, 2020
    Fox2232 and Carfax like this.
  9. schmidtbag

    schmidtbag Ancient Guru

    Messages:
    5,748
    Likes Received:
    2,193
    GPU:
    HIS R9 290
    lol no... it really doesn't function the same at all. NVLink is functionally more akin to PCI than PCIe, and even then, not really.
    I am well aware AMD uses PCIe for crossfire. I suppose you think crossfire functions the same way as NVLink? Because it doesn't. AMD got away with using PCIe lanes because there just needs to be enough communication between the GPUs to make sure they're synced with the same task at the same time. Why use PCIe instead of the old bridge? Because there was excess bandwidth. And... probably because it was cheaper.
    wtf is your point here? This argument isn't about NVLink. We're not disagreeing on anything [significant] about NVLink.
    The argument at hand is that PCIe 6.0 @ x16 is unnecessary. Sure, it has comparable performance to NVLink (at least in one direction), but you're an idiot if you think all of that bandwidth is freely available to the GPUs, because a lot of it will be saturated by communication with the CPU. It's still not enough bandwidth.
    By the time PCIe 7.0 is released, Nvidia is likely going to have products that still demand even more bandwidth.
    Nvidia spent a lot of money on NVLink because they knew PCIe was never going to work out for them.
    That's a load of crap that you otherwise need to prove. Last I checked (today), the fastest NICs available supply 40Gbps. These are both physical and electrical x8 slots. Most server-based GPU workloads that need to share data would do so through something like NVLink. Of the ones that don't, they tend to have a heavy enough workload that their communication over PCIe (while running) is minimal; they just need a burst of bandwidth to get started.
    Regardless, we are getting close to saturating PCIe 3.0 @ x16, and servers tend to have fancier hardware, so currently it makes sense why they need the full slots. My argument does not speak on behalf of PCIe 3.0; I'm talking about generations down the line with 8x the performance per lane.
    Y'know why PCIe 3.0 lasted so long? Because we don't need so much bandwidth. The reason it's being phased out is because the lesser slots like x1 or M.2 need more bandwidth, not because x16 needs more bandwidth.
    Yes, and the new lowest common denominator for games will be SSDs... Storage is still the slowest.
    No... it isn't... Both the high-res and low-res textures are loaded into VRAM at the same time. The reason the textures change like that is to optimize GPU load. Textures don't make a big difference in processing performance but when you're loading 4K textures for billions of triangles off in the distance, that's going to overwhelm your GPU with no visual gain.
    What is a symptom of textures popping in due to slow HDDs is when games built on Unreal Engine 3 are first loading up a level. The Borderlands series is a great example of this for the first few seconds when you first load up a new area. Getting the game started sooner and loading up crappy textures is better than waiting longer for the full-res textures to load.
     
    Last edited: Jun 16, 2020
  10. schmidtbag

    schmidtbag Ancient Guru

    Messages:
    5,748
    Likes Received:
    2,193
    GPU:
    HIS R9 290
    You are digging yourself deeper and deeper into not understanding what it is you're arguing here. The argument at hand is that future x16 slots will no longer be necessary. Existing ones obviously are, and I'd say even 4.0 @ x16 is necessary.
    You don't need to. VRAM seems to keep up with what games demand fairly well. Once GPUs are powerful enough, they can actually save VRAM by not loading in lower-res textures or lower-poly meshes.
    You're right - storage isn't the bottleneck for rendering. At that point the bottleneck is either VRAM or the GPU core. While the GPU is crunching numbers, the PCIe lanes go relatively quiet.
    You can, but you're not going to accomplish a whole lot in doing so since most applications are no longer properly optimized for it. You might as well not even bother.
    That doesn't make sense either. The only reason certain assets are loaded into system memory is that they then need to be dumped into the GPU. You'll improve loading times and significantly reduce bus bandwidth if the GPU gets to access the data directly.
    Says the person who thinks that assets will still be loaded into system memory if the GPU has on-board storage...
     
    Last edited: Jun 16, 2020

  11. MyEinsamkeit

    MyEinsamkeit Member Guru

    Messages:
    180
    Likes Received:
    67
    GPU:
    Radeon Pro W5500
    feel free to explain it to me :)
     
  12. Aura89

    Aura89 Ancient Guru

    Messages:
    8,157
    Likes Received:
    1,274
    GPU:
    -
    Are you being serious?

    If so, just look through this very thread, you'll find plenty of information.
     
    MyEinsamkeit likes this.
  13. MyEinsamkeit

    MyEinsamkeit Member Guru

    Messages:
    180
    Likes Received:
    67
    GPU:
    Radeon Pro W5500
    I'm too lazy to read lol. but i understand, thanks.
     
  14. Aura89

    Aura89 Ancient Guru

    Messages:
    8,157
    Likes Received:
    1,274
    GPU:
    -
    If you really want to know, it's not needed by consumers 99% of the time; it's needed by servers, which need massively fast channels so they have no downtime/wait time.

    What we do with our computers and what servers do with theirs can't really be compared, lol. But realistically speaking, GPUs are the least of PCI Express's worries when it comes to needing more speed, at least with how they are currently developed.
     
    MyEinsamkeit likes this.
  15. MyEinsamkeit

    MyEinsamkeit Member Guru

    Messages:
    180
    Likes Received:
    67
    GPU:
    Radeon Pro W5500
    True, makes sense :)
     

  16. Carfax

    Carfax Ancient Guru

    Messages:
    2,915
    Likes Received:
    465
    GPU:
    NVidia Titan Xp
    You come across like you know what you're talking about, but this entire conversation is clearly over your head. Well, you're just making yourself look like a fool but by all means continue.

    PCIe and NVLink have the SAME core functionality because they share the same basic core technology. Both PCIe and NVLink use serial point-to-point links to achieve much higher performance than the old PCI interface, which used a parallel bus.

    The fact that you think NVLink is functionally more akin to PCI rather than PCIe demonstrates handily that you are totally clueless.

    Source #1

    Source #2

    Excerpt from source #1:

    Excerpt from source #2:

    The main differences between NVLink and PCIe are that the former has more links and the devices use a mesh topology.

    This totally eluded you and caught you in a logic trap. You initially claimed that the reason why NVLink requires so much bandwidth is that the GPUs need to share and have access to each other's VRAM. Well, PCIe allows the exact same thing, which totally blew your argument out of the window.

    My point is that PCIe provides the same core functionality as NVLink, and as NVLink has gotten faster to keep up with the demands, so has PCIe. It's impossible to say that PCIe 6.0 x16 will be unnecessary because in 3 or so years the hardware will be much more powerful and the games will be bigger and more complex than anything out today.

    Also, you're flat out wrong when you say that communication with the CPU will saturate a significant amount of the bandwidth. This can easily be proved by running Aida64 GPGPU benchmark:

    PCIe 3.0 x8

    [benchmark screenshot]

    PCIe 4.0 x8

    [benchmark screenshot]

    Both PCIe 3.0 x8 and PCIe 4.0 x8 come very close to their theoretical bandwidth limits in this benchmark, so apparently, you're the idiot.
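
    Those theoretical limits work out roughly as follows (a sketch; the actual Aida64 copy numbers land a bit below these once protocol overhead is included):

    ```python
    # Theoretical per-direction limits for the two slot configurations above,
    # using 8 GT/s (PCIe 3.0) and 16 GT/s (PCIe 4.0) with 128b/130b encoding.
    def pcie_bw_gb_s(gt_per_s, lanes):
        return gt_per_s * (128 / 130) * lanes / 8

    print(f"PCIe 3.0 x8: ~{pcie_bw_gb_s(8, 8):.2f} GB/s")   # ~7.88 GB/s
    print(f"PCIe 4.0 x8: ~{pcie_bw_gb_s(16, 8):.2f} GB/s")  # ~15.75 GB/s
    ```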

    As I said before, if you think future PCIe 5.0 or 6.0 x16 slots will be overkill for your GPUs, plug them into an x8 slot instead. The rest of us will continue to use the x16 slot like good PC enthusiasts.

    So although my GPU is only using 2.5GB out of 12GB of VRAM and isn't even close to being maxed out, you're telling me that the reason this happens is to optimize GPU load? What a bunch of B.S.

    It's clearly a software problem, not a hardware one. Textures remain in a compressed state in VRAM until they are ready to be used, and they can be decompressed very quickly by the GPU in hardware. Furthermore, streaming performance varies greatly by engine.
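
    For a rough sense of what "compressed in VRAM" buys you: block-compressed formats like BC1 or BC7 store 4 or 8 bits per texel versus 32 bits for uncompressed RGBA8. A quick sketch with typical format sizes (single 4K texture, mip chain ignored):

    ```python
    # Approximate VRAM footprint of a single 4096x4096 texture in common formats.
    width = height = 4096
    texels = width * height

    formats_bits_per_texel = {
        "RGBA8 (uncompressed)": 32,
        "BC1 / DXT1": 4,
        "BC3 / DXT5": 8,
        "BC7": 8,
    }

    for name, bits in formats_bits_per_texel.items():
        mb = texels * bits / 8 / (1024 ** 2)
        print(f"{name}: {mb:.0f} MB")
    ```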

    Witcher 3's Red Engine 3.0 had horrible streaming performance compared to the big engines like Frostbite 3.5 or UE4 for instance.

    This is all going to change with the next generation of 3D engines, which will be fundamentally programmed to be SSD-aware, loading and streaming assets in the background at 2 GB/s rather than a few hundred MB/s like they do now.
     
  17. Carfax

    Carfax Ancient Guru

    Messages:
    2,915
    Likes Received:
    465
    GPU:
    NVidia Titan Xp
    You should reserve your criticism about digging yourself into a deeper hole for yourself. What DmitryKo said was true. The system memory acts as a large cache for the VRAM, so whatever is in the VRAM is also in system memory. The only things the VRAM contains are graphics-related, i.e. shaders, textures, meshes, etcetera, while system RAM will hold all that and more.

    That's the reason why PC gamers typically recommend having at least twice as much system RAM as VRAM.
     
    Undying likes this.
  18. schmidtbag

    schmidtbag Ancient Guru

    Messages:
    5,748
    Likes Received:
    2,193
    GPU:
    HIS R9 290
    I could say the same of you. Your insults are a meaningless distraction and say nothing about what you actually know.
    Stay focused on the subject, not me.
    By your logic, that means it's also similar to RS-232 or any other serial point-to-point link. That's not how it works.
    Electrically, I stand by my point. The similarity is how NVLink, like PCI, is a shared bus between all connected devices. PCIe uses discrete lanes per device.
    Also, although I said it's more functionally akin to PCI, I also said it isn't really similar even to that. A dog and a cat are more similar to each other than to a fish, but they're still completely different.
    Those are some pretty damn big differences.
    wtf is your nonsense logic here? First of all, I'm not the one who claimed that. Second, my point remains exactly the same: PCIe doesn't allow the exact same thing and you did nothing to prove otherwise. There isn't sufficient bandwidth over PCIe, which is why NVLink was made. Not even PCIe 6.0 @ x16 is enough. How dense can you possibly get to not understand this very simple point?
    The same core functionality is moot. A serial port provides the same core functionality too: transmitting point-to-point data in a linear fashion.
    As I stated before, a revolutionary breakthrough will be necessary for PCIe 6.0 @ x16 to be needed. I don't think you understand the magnitude of additional bandwidth it offers over the PCIe of today. At the pace hardware is improving, not even PCIe 4.0 @ x16 is going to get saturated by consumer GPUs for many years. And even then, I'm not so sure bandwidth scales up proportionately with performance, because whether you're at 720p or 4K, once the assets are loaded, much (not all) of the data being streamed over the bus per frame is the same.
    Then why the hell are you advocating for 6.0 @ x16 if you're so sure that the CPU isn't using significant bandwidth? We've already established NVLink negates the need of PCIe for inter-communication between GPUs.
    No crap, but nobody cares about theoretical limits because in the real world, that's not how it works.
    What you're showing there is basically the VRAM equivalent of BogoMIPS, which pretty much anyone with half a brain would agree is a meaningless metric of performance.
    I don't need to. As I said before, techpowerup has benchmarks of running high-end GPUs at PCIe 3.0 and in most cases, the differences are minimal at x8.
    But, you clearly don't care about real-world results. You just care about theory, as though that accomplishes anything.
    I said GPU load, as in, the processing cores. Not the VRAM. Holy crap you are frustratingly dense.
    Depends on how you look at it, but games are coded to behave that way deliberately.
    And what does this have anything to do with what we're discussing? That's still negligible compared to the total available bandwidth.

    It's not that it isn't true, it's that it's completely irrelevant. He might as well have said "yeah, but the sky is blue and water is wet!" That's not wrong, but it isn't a counter-argument. Back in the PCIe 1.0 days or the 2GB-VRAM days, yeah, you wanted all the PCIe lanes you could get. But today is completely different. Now we have GPUs with a large surplus of VRAM. VRAM is now compressed (it wasn't back then). Modern APIs like DX12 and Vulkan reduce the overhead of their predecessors for the same workload. Next-gen GPUs will have onboard storage to further reduce bandwidth. And GPUs aren't multiplying in power the way they used to.
    I'm talking about the future of PCIe and he (and you) are arguing about running outdated tech at x8. That's not how you argue successfully.
     
    Last edited: Jun 18, 2020
  19. DmitryKo

    DmitryKo Master Guru

    Messages:
    376
    Likes Received:
    122
    GPU:
    ASUS RX 5700 XT TUF
    No, it's not how video memory allocation is designed in Windows.

    Windows display driver model always used both local video and shared (system) memory pools to hold graphics assets. This is not just plain system memory - it's a separate memory pool managed by the graphics driver.

    This is how it was designed for Direct3D 5 in Windows 9x for AGP cards, it did not change with WDDM 1.x for Direct3D 9ex/10/11, and it still remains valid for WDDM 2.x and Direct3D 12.

    The only substantial change is the management of resources in virtual memory. While WDDM 1.x in Vista used a linear video memory model, so the runtime had to patch virtual addresses, WDDM 2.x uses a GPU virtual address space with abstracted page tables that map physical allocations into either dedicated video memory or system shared memory, and each resource is assigned a virtual video memory address. WDDM 2.x also maps the entire local video memory into the CPU virtual address space to make it accessible by the CPU, and the driver manages GPU access to system memory with PCIe bus-master DMA transactions (paging buffers).
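
    A toy model of that page-table indirection, purely illustrative - the real tables live in the driver and kernel, and the page size, pool names, and "migrate" helper here are made up for the sketch:

    ```python
    # Toy illustration of WDDM 2.x-style GPU virtual addressing: each virtual
    # page of a resource maps to a physical page in either the dedicated
    # (local video) pool or the shared (system) pool, and the mapping can be
    # repointed when the memory manager pages data between pools.
    PAGE_SIZE = 64 * 1024  # hypothetical page granularity

    class GpuVaSpace:
        def __init__(self):
            self.page_table = {}  # virtual page index -> (pool, physical page index)

        def map_resource(self, first_page, num_pages, pool, phys_pages):
            for i in range(num_pages):
                self.page_table[first_page + i] = (pool, phys_pages[i])

        def migrate_page(self, vpage, new_pool, new_phys):
            # e.g. demotion from "local" to "shared" under overcommitment
            self.page_table[vpage] = (new_pool, new_phys)

    va = GpuVaSpace()
    va.map_resource(first_page=0, num_pages=2, pool="local", phys_pages=[10, 11])
    va.migrate_page(0, "shared", 500)   # memory manager pages one allocation out
    print(va.page_table)                # {0: ('shared', 500), 1: ('local', 11)}
    ```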

    Even if you instruct video memory manager to allocate your heaps in dedicated local video memory, it can still page physical memory between system shared and dedicated video pools when overcommitment happens, such as some other application requesting video memory.


    https://gpuopen.com/events/gdc-2018-presentations/
    Memory management in Vulkan and DX12
    Adam Sawicki (AMD)


    That's pure fantasy. Nobody ever promised discrete GPUs with onboard NVMe drives or dedicated flash memory that would somehow rival system DRAM bandwidth over PCIe. You are confusing desktop gaming parts with the Radeon Pro SSG 2TB, a professional VR accelerator that retailed for $6,500 and contained dedicated hardware to provide an NVMe disk cache.

    This is not how next-gen consoles work either.


    Except that NVMe disks have a real-world bandwidth of 0.3 to 3 GByte/s, while system RAM can sustain about 32 GByte/s over PCIe 4.0 x16.
     
    Last edited: Jun 19, 2020
  20. DmitryKo

    DmitryKo Master Guru

    Messages:
    376
    Likes Received:
    122
    GPU:
    ASUS RX 5700 XT TUF
    Yes, but on Windows the GPU can actually use system shared RAM to read or write data - though local video RAM is best for resources that require read-modify-write access from shaders.

    PCIe 5.0 x16 and NVLink 3.0 full-link do have very similar bandwidth, but PCIe implementations do not use a common CPU cache coherence protocol yet (unlike NVLink, which is supported by IBM POWER). This will change with Zen 4 and CDNA, which were selected for the El Capitan supercomputer.

    100G/200G InfiniBand and 100G/200G Ethernet transceiver silicon is commercially available.

    Ever heard of mipmaps and anisotropic filtering?
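
    On the mipmap point, a full mip chain only adds about a third on top of the base level, which is a small price for the filtering and bandwidth benefits. A quick sketch for a 4K RGBA8 texture:

    ```python
    # Memory cost of a full mip chain for a 4096x4096 RGBA8 texture.
    bytes_per_texel = 4
    size = 4096

    base = size * size * bytes_per_texel
    total = 0
    w = size
    while w >= 1:
        total += w * w * bytes_per_texel
        w //= 2

    print(f"Base level: {base / 2**20:.1f} MiB")
    print(f"Full mip chain: {total / 2**20:.1f} MiB (+{(total / base - 1):.0%})")
    ```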

    NVLink is just a simple, proprietary point-to-point interface, and it has to be used with proprietary multi-point NVSwitch cross-bar controllers in multi-GPU configurations - which kind of defeats the point of a simple protocol.

    No, you're just making random unsubstantiated claims and keep repeating them to prove your point.
     
    Last edited: Jun 19, 2020
    Carfax likes this.
