Samsung 980 PRO M.2 gets PCIe Gen4 and performance up to 6,500 MB/s

Discussion in 'Frontpage news' started by Hilbert Hagedoorn, Jan 9, 2020.

  1. wavetrex

    wavetrex Ancient Guru

    Messages:
    2,464
    Likes Received:
    2,574
    GPU:
    ROG RTX 6090 Ultra
  2. DmitryKo

    DmitryKo Master Guru

    Messages:
    447
    Likes Received:
    159
    GPU:
    ASRock RX 7800 XT
    That would require an overhaul of the disk IO subsystems and file systems, which were designed in the days of 100 MB (megabyte) hard drives spinning at 1700 RPM with access times of 30 ms (millisecond) and transfer rates of 1 MB/s (megabyte) - as opposed to 1 TB (terabyte) SSDs with almost instant access time and several hundred thousand IOPS.


    Namely, sector sizes on SSDs need to get much bigger than allowed by the 512e or 4Kn Advanced Format, which was designed for hard drives in file servers.
    Typical flash memory has a write page size of 8-16 KB and an erase block size of 128-256 pages (512-2048 KB) - so disk IO structures and filesystem sector/cluster sizes need to match or exceed these sizes in order to improve random read/write performance.
    The good thing is that NVMe 1.x supports arbitrary sector sizes through the 'LBA Data Size (LBADS)' field, which holds the sector size as a power-of-two exponent (i.e. 2^N) - so a value of 9 gives 2^9 = 512-byte sectors, while a value of 20 gives 2^20 = 1 MB sectors.
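
    To make the arithmetic concrete, here is a minimal Python sketch (purely illustrative, using the write page and erase block sizes quoted above) of how the power-of-two LBADS exponent maps to a sector size, and how each size lines up with the flash granularity:

    # Illustrative only: sector size from a power-of-two 'LBA Data Size' exponent,
    # compared against assumed flash geometry (8 KB pages, 256 pages per erase block).
    FLASH_PAGE = 8 * 1024                 # assumed write page size
    ERASE_BLOCK = 256 * FLASH_PAGE        # assumed erase block -> 2048 KB

    def sector_size(lbads_exponent: int) -> int:
        """NVMe stores the sector size as an exponent: size = 2^LBADS bytes."""
        return 1 << lbads_exponent

    for exp in (9, 12, 16, 20):
        size = sector_size(exp)
        note = "matches/exceeds the write page" if size >= FLASH_PAGE else "smaller than the write page"
        print(f"LBADS={exp:2d} -> {size:>9,} byte sectors ({note})")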


    Also, we'd need a WORM (write once read many) based filesystem that takes advantage of contiguous block allocations and uses substantially larger blocks. There's some improvement on this in the latest versions of Windows.


    First, since Windows 10 the built-in compact.exe command supports the new 'CompactOS' NTFS file compression - this new algorithm works with arbitrary cluster sizes and writes the compressed data to a contiguous block of new clusters. This is unlike the original NTFS compression from 1992, which only works on 4 KB clusters and brutally chops the compressed file into pieces in the process, resulting in heavy fragmentation.
    Unfortunately this compression only persists during read-only access - any write will automatically decompress the file.
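
    As a sketch only (the file path here is made up), the newer per-file compression can be invoked through the Windows 10 compact.exe command, e.g. from Python:

    # Hypothetical example: compress a rarely-written file with the newer LZX
    # algorithm; any write to the file will transparently decompress it again.
    import subprocess

    target = r"C:\Games\Example\big_readonly_asset.dat"   # made-up path
    subprocess.run(["compact", "/C", "/EXE:LZX", target], check=True)
    subprocess.run(["compact", target], check=True)       # query compression state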


    Second, exFAT and NTFS do support large clusters - exFAT up to 32 MByte clusters, and NTFS up to 2 MByte clusters since Windows 10 version 1709.
    Windows 10 automatically allocates 128 KByte clusters for exFAT volumes below 512 GB, then uses a progression of 256/512/1024/2048 KB etc. cluster sizes for volumes of 512 GB / 1 / 2 / 4 TB etc.

    Unfortunately, NTFS still uses 4 KB clusters by default, which is still the Microsoft-recommended size; you need to manually run the command-line format tool and specify these large cluster sizes, since the File Explorer UI does not support them.
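
    As an illustration, a small Python sketch of the default exFAT cluster-size progression described above, plus the command-line format invocation needed to get large NTFS clusters (the drive letter is just an example):

    # Cluster-size progression as described above: 128 KB below 512 GB,
    # then doubling at 512 GB, 1 TB, 2 TB, 4 TB, ...
    def exfat_default_cluster_kb(volume_gb: int) -> int:
        cluster_kb, threshold_gb = 128, 512
        while volume_gb >= threshold_gb:
            cluster_kb *= 2
            threshold_gb *= 2
        return cluster_kb

    for size_gb in (256, 512, 1024, 2048, 4096):
        print(f"{size_gb:>5} GB exFAT volume -> {exfat_default_cluster_kb(size_gb)} KB clusters")

    # NTFS stays at 4 KB unless you explicitly ask for bigger clusters:
    print("format D: /FS:NTFS /A:2M /Q")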


    I believe the 4 KB limitation is related to virtual memory paging (pagefile.sys). Windows has always used a page size of 4 KB on x86 and x64 (x86-64), and now on ARM64. Even though all x64 (Intel 64/x86-64) processors support a 2 MB page size (Page Size Extension), and most recent ones support a 1 GB page size (the pdpe1gb flag), the OS does not use them by default due to possible memory fragmentation - and employing sector/cluster sizes larger than the processor page size is probably a bad idea from a performance standpoint.
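
    For a sense of scale, the offset bits and page counts for those page sizes (simple arithmetic, mapping a 1 GB file):

    # Worked numbers: bigger pages mean more offset bits and far fewer
    # pages (and TLB entries) needed to map the same amount of memory.
    FILE_SIZE = 1 << 30                                    # 1 GB
    for name, size in (("4 KB", 1 << 12), ("2 MB", 1 << 21), ("1 GB", 1 << 30)):
        offset_bits = size.bit_length() - 1
        print(f"{name:>4} pages: {offset_bits} offset bits, {FILE_SIZE // size:>7,} pages per GB")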

    There's a recent Intel patent, US9858198 - a 64 KB page system that supports 4 KB page operations; it proposes extensions to the 5-level paging and caching system, so that legacy 4 KB pages would be mapped into 64 KB allocation units and the virtual address is extended to the full 64 bits.
    5-level paging is available since Ice Lake and supports a 57-bit 'canonical' address, while current 4-level paging implementations of the virtual address space use a 48-bit 'canonical' address.
    But it's probably years before 64 KB pages are implemented in actual CPUs, and even longer before the OS starts taking advantage of them.
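
    The address-width arithmetic behind this is simple - each paging level indexes 512 entries (9 bits) on top of the 12-bit in-page offset:

    # 4 levels * 9 bits + 12-bit offset = 48-bit virtual addresses;
    # 5 levels gives 57 bits (the LA57 extension available since Ice Lake).
    def virtual_address_bits(levels: int, bits_per_level: int = 9, offset_bits: int = 12) -> int:
        return levels * bits_per_level + offset_bits

    print(virtual_address_bits(4))   # 48
    print(virtual_address_bits(5))   # 57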


    So it's a long way to fully exploiting the performance improvements of recent NVMe SSDs on the PCIe 4.0 bus.


    The Phison PS5018-E18 controller is rated for up to 7.0/7.0 GBytes/s and 1M IOPS.

    Samsung's in-house controllers tend to perform better than the competition in real-world loads, though.


    It looks like some disk IO subsystem limitation, probably some internal PCIe or memory bus clock/timer.
     
    Last edited: Jan 16, 2020
    Carfax, JonasBeckman and Venix like this.
  3. nosirrahx

    nosirrahx Master Guru

    Messages:
    450
    Likes Received:
    139
    GPU:
    HD7700
    If that were the whole story then Intel could not have smashed right through 200 MB/s 4KQ1T1 with their 900P/905P drives. The storage medium and the controller matter a lot in the equation.

    All 3 of these are Optane driven systems I personally own:

    [image: benchmark results from the three Optane systems]

    As far as PCIe 4.0 goes, I am interested in seeing whether the faster frequency can result in faster 4KQ1T1, since the NAND latency hit technically gets a half-cycle earlier start over PCIe 3.0.

    We see something similar with RAM and latency: reducing latency helps, but a much faster frequency can actually trump latency improvements in many cases.
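
    A rough back-of-the-envelope (the latencies and link speeds below are ballpark assumptions, not measurements) shows why the media latency, not the link speed, dominates 4KQ1T1:

    # At QD1 each 4 KB read pays the full media latency before the next one starts,
    # so doubling the link speed barely moves the result. Numbers are assumptions.
    BLOCK = 4096
    media_latency = {"NAND": 80e-6, "Optane": 10e-6}           # assumed seconds per read
    link_speed = {"PCIe 3.0 x4": 3.9e9, "PCIe 4.0 x4": 7.9e9}  # approx. bytes/s

    for link, bps in link_speed.items():
        for media, lat in media_latency.items():
            qd1 = BLOCK / (lat + BLOCK / bps) / 1e6
            print(f"{link} + {media:6s}: ~{qd1:5.1f} MB/s at 4KQ1T1")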
     
  4. nosirrahx

    nosirrahx Master Guru

    Messages:
    450
    Likes Received:
    139
    GPU:
    HD7700

    I don't either. I want to see 4TB+ M.2 NVMe drives with very good 4KQ1T1 (100 MB/s+). High sequential speed is the icing on the cake but certainly is not the actual cake.
     
    angelgraves13 likes this.

  5. nizzen

    nizzen Ancient Guru

    Messages:
    2,419
    Likes Received:
    1,157
    GPU:
    3x3090/3060ti/2080t
    Last edited: Jan 13, 2020
  6. nosirrahx

    nosirrahx Master Guru

    Messages:
    450
    Likes Received:
    139
    GPU:
    HD7700
    Those are both wider and longer than conventional 2280 M.2 slots, so compatibility would be pretty limited.

    If you had a wide open 22110 M.2 port (not sandwiched between 2 PCIe slots) you might be able to use one of these.

    EDIT :

    I would be willing to bet that this would allow you to use these drives in most PCs:

    https://www.xt-xinte.com/JEYI-SK18-...ze-M-2-SSD-High-Speed-Riser-Card-p588659.html

    The centrally located M.2 port would allow plenty of room for the width and the 22110 length is also supported.
     
    Last edited: Jan 13, 2020
  7. DmitryKo

    DmitryKo Master Guru

    Messages:
    447
    Likes Received:
    159
    GPU:
    ASRock RX 7800 XT
    Well, Optane (3D XPoint) is not flash memory; it's half-way from NAND flash to DRAM. The technology is quite different - instead of a grid of MOSFET transistors with floating gates that can hold an isolated charge, Optane uses a phase-change material (chalcogenide glass) which switches between an amorphous (high resistance) and a crystalline (low resistance) state when heated by a high electric current.


    Thus Optane has a significantly faster access time, on the order of dozens of nanoseconds as opposed to hundreds of microseconds for NAND flash - that's 3-4 orders of magnitude (i.e. 1,000-10,000 times) faster than SSDs when used in Direct mode.
    That's why Intel is actually selling Optane DC Persistent Memory modules - DIMM memory for servers which works as slower 'far' memory (replacing much more expensive local DRAM as a file cache in storage-access-bound applications).
    And even when Optane is used in NVMe SSDs - which is suboptimal due to controller/disk IO/file system overhead - latency is still 5-10 times better than NAND flash SSDs.


    Additionally, read bandwidth is several times higher than typical flash SSDs (though uncached writes to Optane are several times slower than reads, unlike NAND flash).
    Longevity is also several times better than typical flash SSDs - though enterprise-grade flash-based SSDs now approach similar figures.
    And last but not least, Optane is individually bit/word addressable, so writes do not have to consider write page granularity (8-16 KBytes) or erase block granularity (512-2048 KB), as in NAND flash.
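
    A toy comparison (purely illustrative - real SSD firmware hides the worst case behind mapping tables and garbage collection) of what a small 4 KB update can cost under the granularities above:

    # Assumed NAND geometry from the post above: 16 KB pages, 128 pages per erase block.
    UPDATE = 4 * 1024
    NAND_PAGE = 16 * 1024
    ERASE_BLOCK = 128 * NAND_PAGE        # 2048 KB

    print(f"NAND worst case  : {ERASE_BLOCK // 1024} KB rewritten (whole erase block)")
    print(f"NAND best case   : {NAND_PAGE // 1024} KB written (one full page)")
    print(f"Byte-addressable : {UPDATE // 1024} KB written (just the update)")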

    https://software.intel.com/en-us/videos/operating-modes-of-intel-optane-dc-persistent-memory
    https://www.intel.com/content/www/u...0/memory-and-storage/intel-optane-memory.html

    https://blocksandfiles.com/2019/07/02/optane-dimm-access-modes/
    https://thessdguy.com/intels-optane-two-confusing-modes-part-4-comparing-the-modes/
    https://www.researchgate.net/public..._the_Intel_Optane_DC_Persistent_Memory_Module



    On the downside, Optane Memory drives are much more expensive than NAND flash SSDs, and also significantly bulkier - while 1 TB NVMe NAND flash SSDs are made with a few flash chips in the M.2 2280 form factor and currently retail for $150-250, the 960 GB Optane 900P/905P NVMe SSDs are only available in 2.5" U.2 and PCIe x4 add-in card form factors and retail for $1250, while the 380 GB models are available in the M.2 22110 form factor (110 mm length) for $500.
    And Optane DC Persistent Memory modules require very expensive server LGA-3647 motherboards and processors, and only come as expensive 512 and 768 GByte DIMM modules.

    I wouldn't really mind having an Optane-based PCIe 5.0 NVMe SSD - or even better, a 1 TB Optane DIMM - in my desktop by 2022, but for now it looks like NVMe SSDs with 3D stacked flash memory have much better application support and a much better price-performance ratio, and that will probably continue well into the 2030s, when the 3D XPoint patent is set to expire...
     
    Last edited: Mar 25, 2021
  8. nosirrahx

    nosirrahx Master Guru

    Messages:
    450
    Likes Received:
    139
    GPU:
    HD7700
    There actually is a compelling use case now, but Intel does not officially support it. You can cache a cheap 2TB SATA SSD against a 58GB 800P module and get both decent capacity and great 4KQ1T1 performance for less than a higher-end 2TB NVMe drive. Unfortunately Intel made so many mistakes with Optane that the latest developments are the canceled M15 and 815P Optane drives. The Optane DIMMs coming to workstations will be pretty cool, but I hope by then there are a few other options for low-latency storage; I know Samsung is working on something.

    The funny part about Intel's unofficial support of Optane caching is that it is actually unlimited. I tried using a 240GB 900P U.2 drive adapted to M.2 as a cache for a SATA drive. Their software did not even blink: the cache was set up, and after a reboot I had a massive 240GB Optane cache available. They don't talk about it, but in a system with a lot of RAM and a bigger Optane cache the software pulls some strange voodoo, giving pretty amazing performance above what seems possible. This is my Vivobook Pro laptop:

    [image: benchmark screenshot from the Vivobook Pro laptop]
    The numbers are great, but look at the sequential read - PCIe 3.0 is not even capable of that.
     
  9. Venix

    Venix Ancient Guru

    Messages:
    3,472
    Likes Received:
    1,972
    GPU:
    Rtx 4070 super
    @DmitryKo this was an interesting read ! Thank you for taking the time to write it!
     
  10. DmitryKo

    DmitryKo Master Guru

    Messages:
    447
    Likes Received:
    159
    GPU:
    ASRock RX 7800 XT
    As long as there is LBA translation overhead for 512B sectors, real-world benefits from such "hybrid SSDs" will be minimal.

    There can be sizeable improvements from read caching of substantially slower drives, like actual HDDs.
    As for SSDs, for the price difference you can get better results by upgrading to a faster NVMe SSD.
    See Intel H10, an Optane Memory and QLC SSD hybrid with the two devices on the same M.2 card:
    https://www.anandtech.com/show/14249/the-intel-optane-memory-h10-review-two-ssds-in-one/5

    Also Optane Memory disk cache requires a supported Intel chipset and Intel RST drivers, which rules out AMD platforms - though it could be used with StoreMI, a free version of Enmotus Fuzedrive storage tiering technology:
    https://www.amd.com/en/technologies/store-mi

    Yep, looks like some side effects of RAM caching in the Intel RST driver - the Intel 900P cannot possibly sustain that kind of bandwidth, it's a PCIe 3.0 x4 drive.
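
    Quick sanity check on that ceiling - PCIe 3.0 runs at 8 GT/s per lane with 128b/130b encoding, so a x4 link tops out just under 4 GB/s before protocol overhead:

    # 4 lanes * 8e9 transfers/s * (128/130 encoding) / 8 bits per byte ~= 3.94 GB/s
    lanes, gt_per_s, encoding = 4, 8e9, 128 / 130
    print(f"{lanes * gt_per_s * encoding / 8 / 1e9:.2f} GB/s")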

    No problem. I just took some time to research if 64KB disk sectors would be practical to implement - and found out Intel is already thinking about some future extensions which allow exactly this.
    Unfortunately they still remain unannounced as of now, so supporting CPUs are probably at least 5 years away.
     
    Last edited: Jan 17, 2020
    Venix likes this.

  11. nizzen

    nizzen Ancient Guru

    Messages:
    2,419
    Likes Received:
    1,157
    GPU:
    3x3090/3060ti/2080t
    I'm using Intel Optane 900p for my Threadripper. Works pretty well :D
     
  12. nosirrahx

    nosirrahx Master Guru

    Messages:
    450
    Likes Received:
    139
    GPU:
    HD7700
    I did not find that to be the case. You can see in the allocation breakdown where the Optane cache prioritizes OS files. In particular booting is noticeably faster.

    For a time it was actually about $100 cheaper to combine a 2TB SATA SSD and 58GB 800P than it was to buy a 2TB NVMe drive.

    We know all about these drives, what a terrible idea from Intel. The combination of requiring a modern Intel chipset and limiting capacity to 1TB gave users almost none of what they wanted.

    Talking about Optane, we should probably mention another area where Intel shot themselves in the foot: mitigations. The way the Optane software works must rely heavily on the same mechanisms patched by the mitigations, given the 25% to 40% reduction in performance I saw when I tested this:

    https://forums.guru3d.com/threads/m...ormance-hit-tested-on-optane-and-vroc.421594/

    4KQ1T1 performance took a huge hit, and this was before several more rounds of mitigations.
     
  13. DmitryKo

    DmitryKo Master Guru

    Messages:
    447
    Likes Received:
    159
    GPU:
    ASRock RX 7800 XT
    Booting already takes just a few seconds, and the practical cache size is too limited to make an impact on sustained transfers. Whereas disk tests like ATTO show how 64 KB blocks really saturate the transfer rate even at low queue depth, compared to 512 B and 4 KB blocks.

    You used the Optane 900P in a software RAID implemented by Intel RST drivers (VROC - Virtual RAID on CPU) - and Intel Rapid Storage has quite a history of quirks plaguing different driver releases.
     
  14. nosirrahx

    nosirrahx Master Guru

    Messages:
    450
    Likes Received:
    139
    GPU:
    HD7700
    Check my post again, there were 3 scenarios, all showing big decreases in performance:

    VROC (4X RAID 0)
    RST (2X RAID 0)
    Pure NVMe (Single 905P)

    The best performance was with the newest VROC UEFI module and mitigations turned off.
     
