PCIe 6.0 Specification finalized in 2021 and 4 times faster than PCIe 4.0

Discussion in 'Frontpage news' started by Hilbert Hagedoorn, Jun 11, 2020.

  1. Fox2232

    Fox2232 Ancient Guru

    Messages:
    11,803
    Likes Received:
    3,358
    GPU:
    6900XT+AW@240Hz
    I think AMD will be ready. But I also think they won't release products for which they expect very low sales.
    And that brings us to DDR4 vs DDR5 performance per $.
    People can buy DDR4 4000MHz CL19 for just 50% more than 3600MHz CL17. Both at lower end of price spectrum.
    If DDR5 can match performance per $ there, everything will be ready for release.
    We'll get Zen3... higher IPC, likely higher clock => higher potential fps.
    Then AM5 gets Zen4... rinse and repeat above.
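A rough way to put numbers on the performance-per-$ comparison above: the sketch below computes first-word latency and dual-channel peak bandwidth for the two DDR4 kits. The "MHz" ratings are really transfer rates (MT/s). The 1.5 relative price comes from the post (absolute prices are not given), and treating peak bandwidth as "performance" is an assumption; real fps depends on much more.

```python
def first_word_latency_ns(mt_per_s: float, cas_latency: int) -> float:
    # DDR transfers twice per I/O clock, so one clock lasts 2000 / (MT/s) ns.
    return cas_latency * 2000 / mt_per_s


def dual_channel_gbs(mt_per_s: float) -> float:
    # Two 64-bit (8-byte) channels, peak theoretical bandwidth in GB/s.
    return mt_per_s * 8 * 2 / 1000


# (MT/s, CAS latency, price relative to the 3600 kit); 1.5 = "50% more" from the post.
KITS = {
    "DDR4-3600 CL17": (3600, 17, 1.0),
    "DDR4-4000 CL19": (4000, 19, 1.5),
}

for name, (rate, cl, price) in KITS.items():
    bw = dual_channel_gbs(rate)
    print(f"{name}: {first_word_latency_ns(rate, cl):.2f} ns, "
          f"{bw:.1f} GB/s, {bw / price:.1f} GB/s per price unit")
```

With these inputs the 4000 CL19 kit gives about 11% more peak bandwidth at essentially the same first-word latency (9.50 ns vs about 9.44 ns), so its bandwidth per unit of price is clearly lower.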
     
  2. Alessio1989

    Alessio1989 Ancient Guru

    Messages:
    1,956
    Likes Received:
    547
    GPU:
    .
    By the time PCI-E 6.0 is released for consumer devices, cards like the 2080 Ti will no longer be the reference point.
     
  3. schmidtbag

    schmidtbag Ancient Guru

    Messages:
    5,812
    Likes Received:
    2,238
    GPU:
    HIS R9 290
    Agreed; exactly my point.
     
  4. DmitryKo

    DmitryKo Master Guru

    Messages:
    376
    Likes Received:
    123
    GPU:
    ASUS RX 5700 XT TUF
    It's digital since it consumes a stream of bits. Most digital modulation schemes encode several bits per symbol, but internal buses typically use a simpler line code with a baseband (unmodulated) signal.
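PCIe 6.0 is where this changes: it keeps the same 32 GBaud symbol rate as PCIe 5.0 but moves from two-level NRZ to four-level PAM4, which is still a baseband line code but carries two bits per symbol. The sketch below shows the difference; the specific Gray-coded level mapping is an assumption for illustration, not copied from the specification.

```python
# Gray-coded PAM4 mapping (assumed for illustration): neighbouring voltage levels
# differ by exactly one bit, so a one-level slicing error corrupts only one bit.
GRAY_PAM4 = {(0, 0): 0, (0, 1): 1, (1, 1): 2, (1, 0): 3}


def nrz_encode(bits: list[int]) -> list[int]:
    """NRZ: two signal levels, one bit per symbol (PCIe 1.0 through 5.0)."""
    return [1 if b else 0 for b in bits]


def pam4_encode(bits: list[int]) -> list[int]:
    """PAM4: four signal levels, two bits per symbol (PCIe 6.0)."""
    if len(bits) % 2:
        raise ValueError("PAM4 encodes bit pairs; pad to an even length")
    return [GRAY_PAM4[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2)]


def lane_gbit_per_s(gbaud: float, bits_per_symbol: int) -> float:
    """Raw lane data rate = symbol rate x bits carried per symbol."""
    return gbaud * bits_per_symbol


bits = [1, 0, 1, 1, 0, 0, 0, 1]
print(nrz_encode(bits))   # 8 symbols on the wire
print(pam4_encode(bits))  # 4 symbols on the wire
print(lane_gbit_per_s(32, 1), lane_gbit_per_s(32, 2))  # PCIe 5.0 vs 6.0 per lane
```

The symbol rate (and therefore much of the channel's frequency content) stays the same; the extra bandwidth comes from packing more bits into each symbol, at the cost of smaller voltage margins between levels.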

    It's 128 GByte/s in one direction.
    One possible application is seamless texture and geometry streaming from disk, where you map all your resources into virtual memory and let the OS memory manager physically load them on-demand. Or a local NVMe/Flash memory buffer to store textures on the video card, like in the Radeon Pro SSG.
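The "map all your resources into virtual memory and let the OS load them on demand" idea can be tried on the CPU side with a memory-mapped file. This sketch uses Python's standard `mmap` module; the single-file "asset pack" layout and the helper names are hypothetical, and it only demonstrates OS demand paging into system RAM, not GPU-side streaming or on-card storage like the Radeon Pro SSG.

```python
import mmap
import os
import tempfile


def write_demo_pack(chunks: list[bytes]) -> str:
    """Hypothetical asset pack: chunks concatenated into one temporary file."""
    fd, path = tempfile.mkstemp(suffix=".pack")
    with os.fdopen(fd, "wb") as f:
        for chunk in chunks:
            f.write(chunk)
    return path


def read_asset(pack_path: str, offset: int, size: int) -> bytes:
    """Return `size` bytes at `offset` from a memory-mapped asset pack.

    The whole file is mapped into the process's virtual address space, but the
    OS only reads in the pages that this slice actually touches.
    """
    with open(pack_path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as pack:
            return pack[offset:offset + size]


path = write_demo_pack([b"HDR!", b"TEXTURE0", b"MESH0001"])
print(read_asset(path, 4, 8))
os.remove(path)
```

A real streamer would keep the mapping open for the lifetime of a level instead of reopening the file per request; it is reopened here only to keep the example short.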

    It did stick because there was no progress on the next version of the standard.

    On the desktop it's always going to be the latest and the greatest - that's PCIe 5.0 in early 2022, PCIe 6.0 in 2024-2025.

    Because vendors are skipping PCIe 4.0 and going directly to PCIe 5.0 / 6.0.


    Like 640 KB in DOS?

    It's not about reducing physical lanes, it's about increasing the actual bandwidth. 16-lane slots and 16-lane connectors are not going anywhere.
     
    Last edited: Jun 13, 2020
    Alessio1989 likes this.

  5. Alessio1989

    Alessio1989 Ancient Guru

    Messages:
    1,956
    Likes Received:
    547
    GPU:
    .
    That's exactly the opposite. You cannot know yet when 16-lane PCIe 6.0 will become necessary, or for which devices. Moreover, 16 electrical lanes also matter for full-speed backward compatibility: a hypothetical 16-lane PCIe 5.0 GPU (one that could take advantage of that bandwidth) would never get its full bandwidth from an 8-lane PCIe 6.0 connection, since the card itself can't run at PCIe 6.0 speeds.
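The backward-compatibility argument can be made concrete: a PCIe link trains to the highest generation and the widest width that both ends support, so the weaker end decides both. The sketch below is a simplified model (it ignores that only certain widths such as x1/x4/x8/x16 are valid, and ignores packet and FLIT overhead); the function names are illustrative.

```python
# Per-lane transfer rate (GT/s) and line-code efficiency for the generations discussed.
LANE_RATE = {
    3: (8.0, 128 / 130),
    4: (16.0, 128 / 130),
    5: (32.0, 128 / 130),
    6: (64.0, 1.0),  # PAM4 + FLIT mode; FLIT CRC/FEC overhead ignored
}


def negotiated_link(device: tuple[int, int], slot: tuple[int, int]) -> tuple[int, int]:
    """(generation, lanes) both ends can run: the lower gen and the narrower width."""
    (dev_gen, dev_lanes), (slot_gen, slot_lanes) = device, slot
    return min(dev_gen, slot_gen), min(dev_lanes, slot_lanes)


def link_gbytes_per_s(gen: int, lanes: int) -> float:
    """Approximate per-direction bandwidth in GB/s."""
    rate, efficiency = LANE_RATE[gen]
    return rate * efficiency * lanes / 8


gpu, slot = (5, 16), (6, 8)  # hypothetical 5.0 x16 GPU in a 6.0 x8 slot
gen, lanes = negotiated_link(gpu, slot)
print(f"Link trains as {gen}.0 x{lanes}: {link_gbytes_per_s(gen, lanes):.1f} GB/s")
print(f"Same GPU in a 5.0 x16 slot: {link_gbytes_per_s(5, 16):.1f} GB/s")
```

Under this model the card gets about 31.5 GB/s instead of about 63 GB/s, even though the 6.0 x8 slot could carry 64 GB/s with a 6.0 device.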
     
  6. schmidtbag

    schmidtbag Ancient Guru

    Messages:
    5,812
    Likes Received:
    2,238
    GPU:
    HIS R9 290
    Be realistic... As stated several times before, no single device is known to fully saturate 3.0 @ x16, let alone a GPU. That means we're not saturating x8 lanes on 4.0 either. And you're somehow concerned about 5.0, or 6.0? Each time a new generation comes out, we're basically doubling the bandwidth per lane. That means 6.0 will have quadruple the per-lane bandwidth of 4.0, which is already way faster than we really need for current hardware. There will need to be some mind-blowing, revolutionary breakthrough in expansion card technology to demand more than 64GB/s (6.0 @ x8) by the time 6.0 is released. It's just not going to happen, especially not for consumer-grade hardware. I'm confident that isn't even going to happen with 5.0 either, unless it lives for as long as 3.0 did.
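The per-generation doubling and the x8 figures quoted here can be checked with a few lines of arithmetic. This sketch assumes per-direction bandwidth = transfer rate x line-code efficiency x lanes / 8: it counts the 8b/10b and 128b/130b encoding overhead but not packet/protocol overhead, and for 6.0 it treats FLIT mode as lossless, which slightly overstates the real figure.

```python
# (transfer rate in GT/s per lane, line-code efficiency) for each generation.
PCIE_GENS = {
    "1.0": (2.5, 8 / 10),
    "2.0": (5.0, 8 / 10),
    "3.0": (8.0, 128 / 130),
    "4.0": (16.0, 128 / 130),
    "5.0": (32.0, 128 / 130),
    "6.0": (64.0, 1.0),  # PAM4 + FLIT; CRC/FEC overhead ignored here
}


def pcie_gbytes_per_s(gen: str, lanes: int) -> float:
    """Per-direction bandwidth in GB/s after line coding, before protocol overhead."""
    rate_gt, efficiency = PCIE_GENS[gen]
    return rate_gt * efficiency * lanes / 8


for gen in PCIE_GENS:
    widths = {f"x{n}": round(pcie_gbytes_per_s(gen, n), 1) for n in (4, 8, 16)}
    print(gen, widths)
```

Note that 3.0 only raised the raw rate from 5 to 8 GT/s; the switch from 8b/10b to 128b/130b is what brought it close to double 2.0's usable bandwidth, so "nearly doubles" is the more precise phrasing. The same table gives about 31.5 GB/s for both 4.0 x16 and 5.0 x8, and 32 GB/s for 6.0 x4.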

    Even GPUs that demand more than 3.0 @ x8 are not so heavily crippled compared to x16. In some cases, they're actually still very much usable in x4. So, I really don't understand why you're so concerned over a loss of lanes when there is no sign of x16 slots being a necessity in the future.
     
  7. Carfax

    Carfax Ancient Guru

    Messages:
    2,913
    Likes Received:
    465
    GPU:
    NVidia Titan Xp
    I'll say it again. PCIe is not used purely for desktops. It's used heavily in HPC and datacenters where massive amounts of data are moving between the CPU, GPU and RAM. So while PCIe 6.0 might seem like overkill to us, it's definitely not in other sectors that are heavily reliant on bandwidth.
     
    DmitryKo and Alessio1989 like this.
  8. schmidtbag

    schmidtbag Ancient Guru

    Messages:
    5,812
    Likes Received:
    2,238
    GPU:
    HIS R9 290
    I'm aware.... I have said several times in most of my posts that consumer grade hardware doesn't need x16. I am aware of the demand for servers and mainframes. I never said or even implied otherwise.

    However, I have doubts that even server hardware will demand 6.0 @ x16. But that's a long way away, so who knows how demands might change by then.
     
    Aura89 likes this.
  9. Carfax

    Carfax Ancient Guru

    Messages:
    2,913
    Likes Received:
    465
    GPU:
    NVidia Titan Xp
    You've heard of NVLink right? Nvidia designed it because PCIe was too slow for them. When Ampere launches, the NVLink and NVSwitch technology will be capable of bandwidths up to 600GB/s....and even that is not enough for certain workloads.

    At any rate, it's not just about bandwidth. PCIe 5.0 and 6.0 will also reduce latency, which is more of a problem on desktop. The faster data can flow between CPU, GPU and RAM, the better I say.
     
  10. bobblunderton

    bobblunderton Master Guru

    Messages:
    406
    Likes Received:
    192
    GPU:
    EVGA 2070 Super 8gb
    Essentially the AM4 Ryzen 3xxx models (and future 4xxx models) have twice the bandwidth per lane thanks to PCI-E 4.0. That precise thing you said about 'too few lanes on consumer CPUs' is exactly what irritated me on Z97 - it was so limiting. You now won't need as many lanes to get the bandwidth you need - and the connection to the chipset has twice the bandwidth.
    Only catch? You need PCI-E 4.0 devices to go with it, and those will be slow to catch on until PCI-E 4.0 motherboards have a good amount of market penetration.
    So, AMD's X570 is PCI-E 4.0 throughout, while B550 boards have the top half at 4.0 and the bottom half at 3.0. Intel's LGA1200 boards have PCI-E 4.0-ready slots, but there are no CPUs yet that output a clean/strong enough signal to meet the PCI-SIG's PCI-E 4.0 standard.
    New standards are always being worked on, and deployed when needed all over the industry.
    Otherwise, if the standards came too late, we would have too many issues like things working with only one manufacturer's tech and not the other's. Never mind all the bottlenecks we would "potentially" have until the standards got here. We needn't have anything else slowing down / stagnating innovation besides Intel over the last 6-7 years!

    PCI-E 3.0 x16 is not a bottleneck until you're reading from / writing to system RAM, which will push a full slot's worth of traffic without much issue (provided half-decent system RAM speed!).
    If everything fits in the card's memory, though, you're fine with PCI-E 3.0 x16, even x8 most of the time (aside from maybe 1-3 fps).
    The biggest problem will be maintaining signal integrity, which is already much worse with PCI-E 4.0: signal loss beyond about 6 inches, versus 3+ feet on PCI-E 3.0.

    I'm just a wee bit perturbed that the GPU doesn't have 100% full-speed access to system RAM yet. That would cost a pretty penny but would sure be worth it - cost being the prohibitive factor.
    I completely removed the top x16 slot of an Asus Maximus Z97 Hero motherboard last December.
    Why? It wouldn't release my spare GPU and I needed it for work (too many RX 480 GPU driver errors!!!), and I couldn't get anything down to the release mechanism without taking the whole computer apart, or risking scratching the board if the tool slipped. Well, I got my card out, slot and all.
    I will never, ever, ever buy another Asus board, as the other boards just have simple retention mechanisms that pop off. I bought Asus boards almost exclusively for 20-25 years. They're over-engineered; good boards, surely. No thank you, though: this simple ASRock board has the regular clips that just break off, and that's fine with me. After that, I broke every clip on every board I owned. You don't need them unless you're banging the tower around a lot, and I can't get my 40-year-old hands into those small spaces.
    I'm just sorry it took until 6 months ago for me to snap all the retention mechanisms off, and that I wasn't doing it out of the box for the last 15 years we've had PCI-E slots.
     
    Last edited: Jun 14, 2020

  11. schmidtbag

    schmidtbag Ancient Guru

    Messages:
    5,812
    Likes Received:
    2,238
    GPU:
    HIS R9 290
    Do you know what NVLink actually does? Not even PCIe 6.0 @ x16 is sufficient for what Nvidia wants. They need the bandwidth because the memory is shared. PCIe is not meant to compete with memory bandwidth.
    Agreed, and you don't need x16 lanes to do that.
     
  12. Carfax

    Carfax Ancient Guru

    Messages:
    2,913
    Likes Received:
    465
    GPU:
    NVidia Titan Xp
    Yes, do you?

    This is exactly my point. The workload dictates what sort of bandwidth is actually needed. NVLink allows memory access/sharing/transfers of data between the GPUs and the CPUs, which is the largest bottleneck in accelerated computing because the CPU and GPU use disparate memory pools.

    Memory bandwidth is just one part of the problem. The other part is latency. If you have a multi-GPU setup paired with a couple of CPUs and you want the GPUs to be able to access each other's VRAM as well as the system memory that the CPUs use and vice versa, you want a high-bandwidth, low-latency solution. While PCIe can work for the vast majority of consumer workloads, it starts to buckle under commercial and scientific workloads because the working sets are so much larger, i.e., terabytes of data or more.

    Perhaps not, but PCIe is a standard that has continued to become stronger over the years. PCIe's mandate was always to double the bandwidth with every generation. They shouldn't stop adhering to that now just because the standard is outrunning the needs of the consumer market.

    Think of it this way. The most intensive consumer applications are games, and games have continued to become larger and larger over the years. Having a faster PCIe protocol will allow faster streaming and more advanced physics due to the elimination of latency and bandwidth bottlenecks between the CPU, GPU and RAM.
     
    DmitryKo and Alessio1989 like this.
  13. Aura89

    Aura89 Ancient Guru

    Messages:
    8,161
    Likes Received:
    1,274
    GPU:
    -
    I'll be honest, I haven't read every single thing written in your conversation; however, I'm not sure why you're making this exact statement.

    I have not seen him say PCI Express shouldn't get better, only that the more it advances, the fewer lanes a typical PC will need for its components, such as the GPU, which could theoretically be just fine with PCIe 6.0 x4 (as things currently stand, which is correct, and unless something drastic happens, that'll stay correct for a while).

    Maybe I am wrong, maybe he did state that and I glossed over it. Who knows. But to me it seems you two are discussing different topics, with at least one of you thinking the other is saying something he isn't.
     
  14. Fox2232

    Fox2232 Ancient Guru

    Messages:
    11,803
    Likes Received:
    3,358
    GPU:
    6900XT+AW@240Hz
    I wrote something similar. There is no reason to have an interconnect between the GPU and system memory that's faster than the memory itself. At best, it may benefit a tiny bit from reduced latency.
    That's why I wrote that fast PCIe would make CPU to VRAM access faster than CPU to its system memory. (Latency ignored.)
    Maybe there would be benefits for GPU to GPU communication.

    @Carfax : Sharing VRAM between GPUs is a problem for applications (and games) that need bandwidth. With a GPU whose IMC and memory can read 256GB/s, you get a certain performance drop once that bandwidth is saturated and the GPU is able to process more data than its VRAM can provide.
    The moment a 2nd GPU accesses the same data too, it eats bandwidth that the GPU owning the VRAM needs for itself.
    Yet, at the same time, when the GPU owning the VRAM needs some 160GB/s to work at full speed, the secondary GPU with direct access will need the same 160GB/s from board to board to help with the same work.

    That makes for a pretty unpleasant situation: a single GPU using its own VRAM can get away with memory bandwidth X, but for two GPUs accessing each other's memory at full performance, each of them would need memory bandwidth of 2X and an interconnect capable of X. (That's the optimal situation, in which multi-GPU would fully benefit.)
    Today, GDDR6 at 16Gbps per pin can do a theoretical 768GB/s via a 384-bit IMC. An interconnect between 2 GPUs capable of that in both directions is not a small (cheap) thing. And neither would be upgrading each GPU's IMC to provide twice as much data.
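The arithmetic in this post can be written out. This sketch computes GDDR peak bandwidth from the per-pin rate and bus width, and encodes the post's two-GPU model (each GPU needs 2X from its own VRAM plus an interconnect capable of X). The model is the poster's simplification; the 128 GB/s comparison figure assumes PCIe 6.0 x16 per direction before FLIT overhead.

```python
def gddr_peak_gbs(gbps_per_pin: float, bus_width_bits: int) -> float:
    """Peak memory bandwidth in GB/s: per-pin rate x pins / 8 bits per byte."""
    return gbps_per_pin * bus_width_bits / 8


def shared_vram_needs(x_gbs: float) -> dict[str, float]:
    """The post's model for two GPUs each working at full speed on shared data.

    Each GPU still needs X for its own work, plus another X that the peer GPU
    reads from the same VRAM over the interconnect.
    """
    return {
        "single_gpu_vram": x_gbs,
        "per_gpu_vram_when_shared": 2 * x_gbs,
        "interconnect_each_direction": x_gbs,
    }


PCIE6_X16_GBS = 128.0  # raw per-direction figure, FLIT overhead ignored

x = gddr_peak_gbs(16, 384)
needs = shared_vram_needs(x)
print(f"GDDR6 16 Gbps x 384-bit: {x:.0f} GB/s")
print(needs)
ratio = needs["interconnect_each_direction"] / PCIE6_X16_GBS
print(f"Interconnect needed vs PCIe 6.0 x16: {ratio:.1f}x")
```

Plugging in the card's full 768 GB/s shows why the post calls such an interconnect expensive: it would need six times the raw per-direction bandwidth of a PCIe 6.0 x16 link.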

    Secondly, such GPUs would be a big waste if used as single parts. This endeavour would be very, very situational. And the only benefit over a chiplet-based GPU (bigger in total) with a shared, bigger VRAM would be that the heat comes from 2 separate cards and not from one.

    And I tend to believe AMD's success with chiplet CPU design will enable them to move it to GPUs. At first maybe for CDNA, but in time it may be ready for RDNA too.
    And same applies to nVidia. They know that chiplets will result in bigger profits in long run.
     
    Carfax likes this.
  15. PrMinisterGR

    PrMinisterGR Ancient Guru

    Messages:
    7,670
    Likes Received:
    591
    GPU:
    Inno3D RTX 3090
    That's not very accurate for a lot of open world games, for example. And with DirectStorage which is already here (and in Ampere and RDNA 2.0, as rumoured), the GPU will need to talk to NVMe storage directly, so this will change quite fast.
     

  16. schmidtbag

    schmidtbag Ancient Guru

    Messages:
    5,812
    Likes Received:
    2,238
    GPU:
    HIS R9 290
    Seeing as I understand why it needs more bandwidth than 6.0 @ x16, yes. I don't get why you think PCIe would have ever been a viable option.
    Right, so how exactly does PCIe come into play here when it provides nowhere near enough bandwidth, even if the communication from the CPU is entirely silent?
    Again.... how exactly does PCIe come into play here? Adding more lanes doesn't improve latency.
    I completely agree. Where did I say otherwise? What exactly do you think you're arguing against here? Because for the most part you're not saying anything we disagree on or goes against my point [that we very likely won't need 6.0 @ x16 for consumer-grade hardware].
    No.... it won't. As stated several times already, even the best of gaming GPUs haven't saturated 3.0 @ x16. Drop down to x8 and the performance loss in many cases is minimal, if not negligible. At the pace GPU technology is evolving, we are years away from putting use to 4.0 @ x16, and you're arguing that we're going to need x16 for 5.0 or 6.0? By the time a GPU would saturate a 6.0 @ x8 slot, PCIe 7.0 will probably have been released multiple years prior.


    None of that really changes my point. In fact, the thing about DirectStorage would actually reduce the need of extra bandwidth over the PCIe bus. Sure, it'll soak up a lot of bandwidth at first while the storage drive is being filled up, but worst-case scenario, it'd use half the available bandwidth (but in modern times it's more like 1/4) and the main bottleneck will be filling up the drive.
     
    Last edited: Jun 15, 2020
  17. Astyanax

    Astyanax Ancient Guru

    Messages:
    10,305
    Likes Received:
    3,703
    GPU:
    GTX 1080ti
    NVLink is as much about latency as it is about bandwidth.
     
  18. DmitryKo

    DmitryKo Master Guru

    Messages:
    376
    Likes Received:
    123
    GPU:
    ASUS RX 5700 XT TUF
    Top graphics cards will soon use HBM2E memories with >1 TByte/s bandwidth - so even 126 GByte/s available with 16-lane PCIe 6.0 (and two-channel DDR5-8000 memory) would be an order of magnitude slower.

    Also graphics cards on the desktop (and fibre-optic network cards on server platforms) have long settled on the PCIe x16 connector (just like the few remaining addon cards settled on PCIe x1 connector, and M.2 settled on 4 PCIe lanes). These form factors are here to stay for the foreseeable future, long after PCIe 6.0 is initially available.

    Using PCIe x8/x4 connectors, or x16 connectors with fewer physical lanes, makes no sense for new platforms and extension cards for compatibility reasons (newer x8 cards will be handicapped on old x16 systems and old x16 cards will be handicapped on newer x8 systems).


    PCIe 6.0 is not faster than system memory - 126 GByte/s in one direction is the same as dual-channel DDR5-8000 (which should be common by 2025).
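Both comparisons in this post reduce to peak-bandwidth arithmetic. The sketch assumes a desktop memory channel is 64 bits wide (DDR5 splits each DIMM into two 32-bit subchannels, which adds up to the same 64 bits) and uses a hypothetical HBM2E configuration of three 1024-bit stacks at 3.2 Gbps per pin to stand in for ">1 TByte/s"; the 126 GByte/s PCIe 6.0 figure is the one quoted in the post.

```python
def ddr_peak_gbs(mt_per_s: float, channels: int, bits_per_channel: int = 64) -> float:
    """Peak DRAM bandwidth in GB/s: transfers/s x total bus width in bytes."""
    return mt_per_s * channels * bits_per_channel / 8 / 1000


def hbm_peak_gbs(gbps_per_pin: float, stacks: int, bits_per_stack: int = 1024) -> float:
    """Peak HBM bandwidth in GB/s: per-pin rate x total pins / 8."""
    return gbps_per_pin * stacks * bits_per_stack / 8


PCIE6_X16_GBS = 126.0  # per-direction figure quoted in the post

ddr5 = ddr_peak_gbs(8000, channels=2)
hbm2e = hbm_peak_gbs(3.2, stacks=3)  # hypothetical 3-stack card
print(f"Dual-channel DDR5-8000: {ddr5:.0f} GB/s")
print(f"3-stack HBM2E @ 3.2 Gbps: {hbm2e:.0f} GB/s, "
      f"{hbm2e / PCIE6_X16_GBS:.1f}x PCIe 6.0 x16")
```

Under these assumptions dual-channel DDR5-8000 lands at 128 GB/s, essentially matching the PCIe 6.0 x16 figure, while the hypothetical HBM2E card is close to ten times faster than either.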

    Multi-link NVLink/NVSwitch are designed for professional graphics cards and supercomputers with multiple embedded GPUs (and preferably embedded POWER9 processors). The only viable alternative on the desktop is cross-switched PCIe with CXL/Gen-Z on top.

    We'll see how it stands for next-gen console games with several dozen GBytes of graphics assets.

    No. DirectStorage is about reducing loading times from disk, but using shared system memory will still be an order of magnitude faster than streaming from an NVMe disk to local video memory.

    (Video memory on the PC encompasses both dedicated video memory and system shared memory which is half of system RAM).

    I doubt 32 GByte/s of PCIe bandwidth will be enough for Unreal Engine 5 games released in 2025, unless video cards come with 64 GBytes of local memory and NVMe drives switch to Optane (3D XPoint).
     
    Last edited: Jun 30, 2020
    PrMinisterGR, Carfax and Alessio1989 like this.
  19. schmidtbag

    schmidtbag Ancient Guru

    Messages:
    5,812
    Likes Received:
    2,238
    GPU:
    HIS R9 290
    Uh... so? What does memory bandwidth have to do with this? The whole point of VRAM is to feed the GPU directly, so you don't have to feed data through PCIe. Even with HBM2 GPUs today, the bottleneck isn't PCIe, it's the storage devices.
    wtf? I know x16 is "long settled"... My point is it doesn't need to be anymore for consumer-grade hardware. It's really not a hard concept to grasp and it's mind boggling the kinds of arguments thrown my way as though they change my point in any way at all.
    Yes it does, you're just too dense to understand it. Here, I'll spoonfeed you the thought process:
    * Currently, nothing saturates PCIe 3.0 @ x16. It is possible for devices to get saturated at x8, but not many and not often.
    * GPU technology is evolving very slowly; next-gen GPUs (configured for PCIe 4.0) are still very unlikely to saturate 3.0 @ x16. Therefore, they are also unlikely to saturate 4.0 @ x8.
    * At this rate, it will take years until 4.0 @ x16 will be saturated by consumer-grade hardware, and very likely some server-grade hardware too.
    * Despite how 4.0 has a very long shelf life, it's already obsoleted by 5.0 (Intel appears to be skipping 4.0). Remember, each generation [nearly] doubles the bandwidth per-lane.
    * Considering all of the above, 5.0 @ x8 is already excessive by today's standards, and seriously overkill by consumer-grade standards. A major breakthrough in technology will need to occur for 5.0 @ x8 to be insufficient. That could be the better part of a decade away from us.
    * 6.0 spec is nearly finalized. By the time 5.0 @ x8 gets anywhere close to saturation, manufacturers can move onto 6.0 when necessary, and that will be a long way away.
    * As stated before, nobody is going to use a 2080 Ti on a PCIe 6.0 board, and if they do, they probably don't care about the small handful of FPS they might lose at x8.
    PCIe 6.0 just doesn't need x16, and I'm willing to bet 5.0 doesn't either for consumer-grade hardware.
    Right, and clearly Nvidia gave up on that, which is why SLI isn't really a thing anymore and why you're not going to see GTX/RTX GPUs with NVLink any time soon.
    As stated before, the bottleneck in this situation is the storage. Even if it wasn't, are you really so impatient that you can't wait 1 or 2 seconds for tens of gigabytes to be written during a loading screen?
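A quick way to check the "1 or 2 seconds for tens of gigabytes" figure: a drive-to-GPU transfer can go no faster than its slowest hop. The drive speed and slot bandwidths below are hypothetical round numbers (an assumed 5 GB/s PCIe 4.0 NVMe drive, PCIe 4.0 x16 and x8 slots at about 31.5 and 15.75 GB/s), and latency and decompression are ignored.

```python
def load_seconds(data_gb: float, drive_gbs: float, link_gbs: float) -> float:
    """Time to move data from drive to GPU when the slowest hop sets the pace."""
    return data_gb / min(drive_gbs, link_gbs)


def required_gbs(data_gb: float, seconds: float) -> float:
    """Sustained throughput needed to move data_gb within the given time."""
    return data_gb / seconds


DRIVE_GBS = 5.0  # hypothetical fast PCIe 4.0 x4 NVMe drive, sequential read
for slot, link in (("4.0 x16", 31.5), ("4.0 x8", 15.75)):
    print(f"20 GB via {slot}: {load_seconds(20, DRIVE_GBS, link):.1f} s")
print(f"20 GB in 2 s needs {required_gbs(20, 2):.0f} GB/s sustained")
```

Under these assumptions the drive, not the slot width, sets the load time (4 s for 20 GB either way), and moving tens of gigabytes in 1 to 2 seconds would take a sustained 10 GB/s or more.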
    Seriously, do you not think at all about what point I'm trying to make? You literally just proved my point. As you said, the NVMe disk is local storage for the GPU. Meaning, that is data that doesn't need to be re-loaded over PCIe. Therefore, it reduces bandwidth demand. It further obsoletes the need for x16 slots.
     
    Fox2232 likes this.
  20. Alessio1989

    Alessio1989 Ancient Guru

    Messages:
    1,956
    Likes Received:
    547
    GPU:
    .
    You load data from storage devices into system memory in the background, before you need it (e.g., level loading); in video games, the bottleneck for streaming is the bus (both bandwidth and latency) between the GPU and system memory, at least on dedicated GPUs. You always load data along the chain of smaller bottlenecks (secondary memory -> system memory <-> processor). Your points come from the point of view of someone who has never written a single line of code for real-time interactive software. Your rants will never ever remove 16-lane PCI-Express slots.
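The staging order described here (storage -> system memory ahead of time, then system memory <-> GPU while playing) can be sketched as two transfer paths, each limited by its slowest hop, plus the per-frame budget the RAM-to-VRAM link leaves for streaming. All bandwidth figures are hypothetical round numbers (PCIe 3.0 x16 taken as about 15.75 GB/s per direction), and latency, which the post also stresses, is not modelled.

```python
def bottleneck(path: dict[str, float]) -> tuple[str, float]:
    """Return the slowest hop (name, GB/s) of a transfer path."""
    slowest = min(path, key=path.__getitem__)
    return slowest, path[slowest]


def per_frame_budget_mb(link_gbs: float, fps: float) -> float:
    """Megabytes that can cross a link during one frame at a given frame rate."""
    return link_gbs * 1000 / fps


# Level load: done in the background, before the data is needed.
LEVEL_LOAD = {
    "NVMe -> system RAM": 3.5,
    "system RAM -> VRAM (PCIe 3.0 x16)": 15.75,
}
# In-game streaming: data already resident in system RAM.
IN_GAME = {
    "system RAM (dual-channel DDR4-3200)": 51.2,
    "system RAM -> VRAM (PCIe 3.0 x16)": 15.75,
}

print(bottleneck(LEVEL_LOAD))
print(bottleneck(IN_GAME))
for fps in (60, 144):
    print(f"{fps} fps: {per_frame_budget_mb(15.75, fps):.0f} MB per frame over PCIe")
```

With these numbers the drive limits level loading, but once assets sit in system RAM the PCIe link becomes the narrowest hop, and the per-frame streaming budget shrinks as the frame rate rises.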
     
    Last edited: Jun 16, 2020
    DmitryKo and Carfax like this.
