Generate Groundbreaking Ray-Traced Images with Next-Generation NVIDIA DLSS | NVIDIA Technical Blog
August 25, 2023

Interesting tidbit from the Nvidia developer blog on DLSS 3.5...

"DLSS 3.5 also adds Auto Scene Change Detection to Frame Generation. This feature aims to automatically prevent Frame Generation from producing difficult-to-create frames across a substantial scene change. It does this by analyzing the in-game camera orientation on every DLSS Frame Generation frame pair. Auto Scene Change Detection eases integration of new DLSS 3 titles, is backwards compatible with all DLSS 3 integrations, and supports all rendering platforms. In SDK build variants, the scene change detector provides onscreen aids to indicate when a scene change is detected so the developer can pass in the reset flag."
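The quote doesn't expose the SDK interface, but the underlying idea is simple enough to sketch: compare the camera orientation between the two frames of each frame-generation pair and raise the reset flag when the rotation is too large to sensibly interpolate across. Everything below (function names, the threshold) is a hypothetical illustration in Python, not the actual NGX/Streamline API:

```python
# Illustrative sketch only -- NOT the actual DLSS/Streamline API.
# Idea: compare the camera orientation of consecutive frame-generation
# frame pairs and raise a "reset" flag when the rotation is too large
# to interpolate across (i.e. a hard scene cut).
import math

def quat_angle_deg(q1, q2):
    """Angle in degrees between two unit quaternions (w, x, y, z)."""
    dot = abs(sum(a * b for a, b in zip(q1, q2)))
    dot = min(1.0, dot)  # guard against floating-point overshoot
    return math.degrees(2.0 * math.acos(dot))

SCENE_CUT_THRESHOLD_DEG = 60.0   # hypothetical tuning value

def frame_generation_reset_flag(prev_cam_quat, curr_cam_quat):
    """Return True if frame generation should be reset for this frame pair."""
    return quat_angle_deg(prev_cam_quat, curr_cam_quat) > SCENE_CUT_THRESHOLD_DEG

# Example: identity orientation vs. a 180-degree turn about the Y axis -> reset
prev = (1.0, 0.0, 0.0, 0.0)
curr = (0.0, 0.0, 1.0, 0.0)
print(frame_generation_reset_flag(prev, curr))  # True
```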
NVIDIA DLSS 3.5 SDK Published Along With Supporting Docs, Coming To AAA Games Soon
August 26, 2023

What makes DLSS 3.5 fundamentally different from AMD's FSR 3 is that DLSS 3.5 is an extension of DLSS 3, which already introduced frame generation via frame interpolation. The technology has already improved with the DLSS 3.1 revision, and DLSS 3.5 aims to further enhance ray tracing quality through a new technique called Ray Reconstruction. On a technical level, AMD's FSR 3 would be the competitor to DLSS 3/3.1, not to DLSS 3.5, which is far more advanced in what it sets out to achieve. DLSS 3 has also had a far faster adoption rate than DLSS 2 and is already featured in a range of titles, whereas AMD has so far only promised support in two AAA titles for fall 2023 and at least ten more titles coming in 2024. AMD did announce that its HYPR-RX technology will enable fluid motion frames across DX11/DX12 games, but that will be limited to Radeon RX 7000 GPUs and uses RSR rather than FSR, which may have some impact on the game's visual quality.
Tesla Launches New $300M AI Cluster for Advanced Computation (hpcwire.com)

Based on the Nvidia H100 platform, it is expected to be a noteworthy addition to the industry. The system is equipped with 10,000 Nvidia H100 GPUs, enabling it to potentially reach a peak of 340 FP64 PFLOPS for technical computing and 39.58 INT8 exaOPS for AI applications. Tesla's 340 FP64 PFLOPS is higher than the 304 FP64 PFLOPS of CINECA's Leonardo supercomputer, the fourth fastest in the world. ... A significant focus of this new cluster is bolstering Tesla's full self-driving (FSD) technology. With Nvidia currently facing supply limitations, Tesla has also diversified its approach with a strategic investment in its own supercomputer, Dojo, which is anticipated to work in tandem with the Nvidia H100 GPU cluster.
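As a quick sanity check on those aggregate figures, dividing them by the quoted 10,000 GPUs gives the implied per-GPU throughput, which lands right on NVIDIA's published H100 SXM peaks (~34 FP64 TFLOPS and ~3,958 INT8 TOPS with sparsity) — in other words, the totals are just linear scaling of single-GPU peaks:

```python
# Back-of-the-envelope check of the quoted cluster totals,
# assuming all 10,000 GPUs contribute their full peak throughput.
NUM_GPUS = 10_000

cluster_fp64_pflops = 340.0   # quoted peak for technical computing
cluster_int8_exaops = 39.58   # quoted peak for AI (INT8)

per_gpu_fp64_tflops = cluster_fp64_pflops * 1e3 / NUM_GPUS   # PFLOPS -> TFLOPS
per_gpu_int8_tops   = cluster_int8_exaops * 1e6 / NUM_GPUS   # exaOPS -> TOPS

print(f"Implied per-GPU FP64: {per_gpu_fp64_tflops:.0f} TFLOPS")  # ~34
print(f"Implied per-GPU INT8: {per_gpu_int8_tops:.0f} TOPS")      # ~3958
# ~34 FP64 TFLOPS and ~3,958 INT8 TOPS (with sparsity) match NVIDIA's
# published H100 SXM peak figures.
```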
Indeed, at least they have foresight. They know the car business won't get any better in the coming years, and the home battery / solar roof business isn't really picking up that much. That said, yet another cloud computing provider... anything else I missed?
True, unless Tesla has some unknown innovations that keep the company ahead of the EV pack. I'm just waiting for one EV company to do away with the subscription-based, pay-as-you-go feature scheme (i.e., heated seats, A/C, etc.) and include all features in the MSRP. I'm not sure Tesla will open its cloud capacity to outside companies. The fact that they also need Dojo to meet their requirements might mean they anticipate not having much excess capacity, but it will be interesting to see whether they need additional compute power next year.
I'm afraid that Pandora's box is already open, since high-end manufacturers of traditional cars have adopted this too, e.g. Mercedes and BMW. But I'm fully with you on this one, and I will have to buy a car a few years down the road...
Not really an EV company, but my Hyundai Ioniq 5 has no subscriptions, heated/cooled seats, and a heated steering wheel. HDA II (Hyundai's Highway Driving Assist) is really good on geofenced roads (not as good as Tesla's system, but IMO it makes a long drive down the NJ Turnpike as easy as can be), and the steering assist/radar cruise works great on non-geofenced roads. Plus it has tons of other cool features (HUD, super fast charging, etc.) that the Tesla doesn't have, and it was relatively affordable. After what my sister-in-law went through with a Model Y, I'll never own a Tesla. She had one of the earliest builds of the Model Y, so it was kind of expected to have some issues, but the car basically fell apart over the three years she leased it, and Tesla treated her like crap through the entire process. Her husband at one point was going to get a P100D and ended up getting a Lucid instead because of how bad the experience with the Y was.
Just did some reading on the Hyundai EV, and it seems to be very highly rated. I'll be looking at this model when I'm in the market. It seems the 2024 model also uses the complete Nvidia stack.
Insane. And here I am contemplating either a Toyota GR86 or a Civic Si for my next get-around-town car. Totally the other end of the spectrum.
Unreal Engine 5.3 Release Notes regarding HWRT:

- Lumen Reflections support more than one bounce when hardware ray tracing (HWRT) is enabled with Hit Lighting and the Max Reflection Bounces setting in the post process volume is set to 2 or greater. This can prevent black areas in reflections when there is enough performance budget to allow it.
- Lumen Reflections can now be used without Lumen GI, for games and applications which use static lighting but wish to scale up in reflection quality beyond reflection captures. Standalone Lumen Reflections only work when HWRT is enabled, and it will enable Hit Lighting automatically, as Lumen's surface cache optimization is not available when Lumen GI is disabled.
- We made significant Lumen HWRT optimizations, including enabling async compute for HWRT by default on consoles.

Unreal Engine 5.3 Release Notes | Unreal Engine 5.3 Documentation
Nintendo showed Switch 2 demos at Gamescom | VGC (videogameschronicle.com)

According to the publication, Nintendo privately showed invited developers specially prepared tech demos for its next-generation games console, which could launch next year. One ‘Switch 2’ demo is understood to have been an improved version of the Switch launch title Zelda: Breath of the Wild, running at a higher framerate and resolution than the original game did, on hardware targeting the new console’s specs (but there was no suggestion the game will actually be re-released). The demo is said to have been running using Nvidia’s DLSS upscaling technology, with advanced ray tracing enabled and visuals comparable to Sony’s and Microsoft’s current-gen consoles (however, it should be noted this does not mean the Switch successor will sport raw power anywhere near that of PS5 or Xbox Series X, which aren’t portable devices).
Nvidia Says New Software Will Double LLM Inference Speed On H100 GPU | CRN

Nvidia said it plans to release new open-source software that will significantly speed up live applications running on large language models powered by its GPUs, including the flagship H100 accelerator. The Santa Clara, Calif.-based AI chip giant said on Friday that the software library, TensorRT-LLM, will double the H100’s performance for running inference on leading large language models (LLMs) when it comes out next month. Nvidia plans to integrate the software, which is available in early access, into its Nvidia NeMo LLM framework as part of the Nvidia AI Enterprise software suite. ...

Nvidia teased TensorRT-LLM last month as part of the recently announced VMware Private AI Foundation platform, which will let VMware customers use their proprietary data to build custom LLMs and run generative AI apps using Nvidia AI Enterprise on VMware Cloud Foundation. Nvidia's Ian Buck said TensorRT-LLM will support several Nvidia GPUs beyond the H100, including its previous flagship data center accelerator, the A100, as well as the L4, L40, L40S and the forthcoming Grace Hopper Superchip, which combines an H100 GPU with its 72-core Grace CPU. Nvidia said it worked closely with several major AI ecosystem players—including Facebook parent company Meta and MosaicML, the generative AI platform vendor recently acquired by Databricks—on the LLM inference optimizations that went into the open-source TensorRT-LLM. “Everyone can get the benefit of getting the best possible performance out of Hopper and, of course, other data center GPUs for large language model inference,” Buck said. TensorRT-LLM optimizes LLM inference performance on Nvidia GPUs in four ways, according to Buck. ...

In two charts shared by Nvidia, the company demonstrated that the TensorRT-LLM optimizations allow the H100 to provide significantly higher performance for popular LLMs. For the GPT-J 6B LLM, Nvidia showed that an H100 enabled with TensorRT-LLM can perform inference two times faster than a regular H100 and eight times faster than the previous-generation A100. For Meta’s Llama 2 LLM, the company showed the optimized H100 running nearly 77 percent faster than the vanilla H100 and 4.6 times faster than the A100. Buck said the performance gains translate into improved power efficiency, with the H100 using the same power to complete twice as many tasks as before thanks to TensorRT-LLM. The last critical aspect of TensorRT-LLM is that it’s optimized to take advantage of the H100’s Transformer Engine, which automatically converts LLMs trained in a 16-bit floating point format to a compact 8-bit (FP8) format that takes up less space in GPU memory.
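Putting the figures from those two charts side by side (taking the A100 as the common baseline), the ratios the article implies work out as follows. The values are NVIDIA's own claims, not independent benchmarks:

```python
# Relative inference throughput implied by the article's figures,
# normalized against the A100 (values are NVIDIA's claims, not benchmarks).
claims = {
    "GPT-J 6B": {"trt_llm_vs_h100": 2.0,  "trt_llm_vs_a100": 8.0},
    "Llama 2":  {"trt_llm_vs_h100": 1.77, "trt_llm_vs_a100": 4.6},
}

for model, c in claims.items():
    # The plain-H100 speedup over the A100 falls out of the ratio of the two claims.
    h100_vs_a100 = c["trt_llm_vs_a100"] / c["trt_llm_vs_h100"]
    print(f"{model}: plain H100 is ~{h100_vs_a100:.1f}x an A100; "
          f"TensorRT-LLM adds another ~{c['trt_llm_vs_h100']:.2f}x on top.")
```

Since Buck says power draw stays the same, the 2x throughput claim for GPT-J also reads as roughly 2x inference per watt.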
FWIW... the 4060 Ti is finally on the Steam HW survey charts https://store.steampowered.com/hwsurvey/videocard/
Rather than performance optimizations, I think Nvidia's game-ready drivers mostly provide some level of driver validation so newly launched games actually run on NV cards. That way they don't find themselves in a situation like Intel's with Arc cards and Starfield, which didn't even work at all.
GeForce RTX 5090 rumors take shape

The Chiphell leaker who revealed the first details of NVIDIA's next-gen consumer GPU lineup now shares new rumors about the capabilities of the new architecture. As a reminder, Panzerlied was the individual who initially revealed that NVIDIA plans to skip the XXX04-class GPU in its upcoming gaming product series. Upon this disclosure, we reached out to Kopite7kimi, a well-respected NVIDIA insider, who subsequently confirmed these rumors. Kopite7kimi also informed us that the upcoming NVIDIA RTX 50 series would adopt the GB2XX naming convention.

Today, Kopite7kimi has shared new details about the Blackwell series, the codename for NVIDIA's next-generation GPU lineup. It is now anticipated that Blackwell will encompass both data-center and gaming series; however, there will be distinct naming schemes for the two, with GB1XX designated for high-performance computing (HPC) and GB2XX for gaming GPUs. Meanwhile, Panzerlied has provided some insight into what graphics enthusiasts can expect from the next-generation NVIDIA lineup. Instead of providing specific numerical values, Panzerlied is sharing percentage improvements across various aspects of the Blackwell family.

NVIDIA RTX 5090 vs. RTX 4090:
- 50% increase in scale (presumably cores)
- 52% increase in memory bandwidth
- 78% increase in cache (presumably L2 cache)
- 15% increase in frequency (presumably GPU boost)
- 1.7x improvement (presumably performance)

Panzerlied later clarified in the thread that those claims are in reference to RTX 4090 specs, not AD102. If we consider that the RTX 4090 with 21 Gbps memory would see an upgrade to 32 Gbps (a 52.4% increase), this would suggest that the RTX 50 series might feature GDDR7 technology. It's worth mentioning that the successor to the AD102 is also rumored to include a 512-bit memory bus, although it may not necessarily be used in the RTX 5090 specifically; it could be reserved for a TITAN/RTX workstation card or a future 5090 Ti variant. It seems unlikely that NVIDIA would adopt the fastest GDDR7 memory on day one, so RTX 5090 configurations such as 512-bit/24 Gbps or 448-bit/28 Gbps could also be considered. Assuming the other claims are also based on the RTX 4090 as a reference point, a 15% increase in frequency would translate to a 2.9 GHz boost clock, with actual workloads likely reaching 3.0 GHz or higher. Additionally, a 78% increase in cache suggests that the GB202 GPU would feature 128 MB of L2 cache. We are more than a year away from the potential launch of the next-gen GeForce series, but just for fun, if we were to extrapolate these specifications to a potential RTX 5090, it might look something like this: -> https://videocardz.com/newz/nvidia-...lock-1-5-tb-s-bandwidth-and-128mb-of-l2-cache
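For fun, here's that same extrapolation spelled out in Python. The baselines are the RTX 4090's published specs (16,384 CUDA cores, ~2.52 GHz boost, 72 MB of L2, 21 Gbps GDDR6X on a 384-bit bus ≈ 1,008 GB/s); the multipliers are nothing more than the rumored percentages above, so treat the output as speculation, not a spec sheet:

```python
# Applying Panzerlied's rumored uplifts to the RTX 4090's published specs.
# Everything below is rumor-driven extrapolation, not confirmed RTX 5090 data.
rtx4090 = {
    "cores": 16384,                   # CUDA cores
    "boost_ghz": 2.52,                # boost clock
    "l2_mb": 72,                      # L2 cache
    "bandwidth_gbps": 21 * 384 / 8,   # 21 Gbps on a 384-bit bus = 1008 GB/s
}

# Rumored uplifts: +50% scale, +15% frequency, +78% cache, +52% bandwidth
uplift = {"cores": 1.50, "boost_ghz": 1.15, "l2_mb": 1.78, "bandwidth_gbps": 1.52}

rtx5090_guess = {k: rtx4090[k] * uplift[k] for k in rtx4090}

for key, value in rtx5090_guess.items():
    print(f"{key}: {value:,.2f}")
# cores:          ~24,576
# boost_ghz:      ~2.90
# l2_mb:          ~128
# bandwidth_gbps: ~1,532 (about 1.53 TB/s)
```

The derived ~128 MB of L2 and ~1.5 TB/s of bandwidth line up with the figures in the linked VideoCardz headline.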