Nvidia says its Blackwell chips lead benchmarks in training AI LLMs

Nvidia is rolling out its AI chips to data centers and to what it calls AI factories all over the world, and the company announced today that its Blackwell chips lead the AI benchmarks.

Nvidia and its partners are accelerating the training and deployment of next-generation AI applications that use the latest advances in training and inference.

The Nvidia Blackwell architecture is built to meet the increased performance requirements of these new applications. In the latest round of MLPerf Training, the 12th since the benchmark's introduction in 2018, the Nvidia AI platform delivered the highest performance at scale on every benchmark and powered every result submitted on the benchmark's toughest large language model (LLM)-focused test.

Nvidia touted its performance in the MLPerf Training benchmarks.

The Nvidia platform was the only one to submit results on every MLPerf Training v5.0 benchmark, underscoring its exceptional performance and versatility across a wide range of AI workloads, spanning LLMs, recommendation systems, multimodal LLMs, object detection and graph neural networks.

The at-scale submissions used two AI supercomputers powered by the Nvidia Blackwell platform: Tyche, built using Nvidia GB200 NVL72 rack-scale systems, and Nyx, based on Nvidia DGX B200 systems. In addition, Nvidia worked with CoreWeave and IBM to submit GB200 NVL72 results using a total of 2,496 Blackwell GPUs and 1,248 Nvidia Grace CPUs.

On the new Llama 3.1 405B pre-training benchmark, Blackwell delivered 2.2 times greater performance than the previous-generation architecture at the same scale.

Nvidia Blackwell drives AI factories.

On the Llama 2 70B LoRA fine-tuning benchmark, Nvidia DGX B200 systems, powered by eight Blackwell GPUs, delivered 2.5 times more performance than a submission using the same number of GPUs in the previous round.

These achievements highlight advances in the Blackwell architecture, including high-density liquid-cooled racks, 13.4TB of coherent memory per rack, fifth-generation Nvidia NVLink and Nvidia NVLink Switch interconnect technologies for scale-up, and Nvidia Quantum-2 InfiniBand networking for scale-out. In addition, innovations in the Nvidia NeMo Framework software stack raise the bar for next-generation multimodal LLM training, critical for bringing agentic AI applications to market.

These agentic AI-powered applications will one day run in AI factories, the engines of the agentic AI economy. These new applications will produce tokens and valuable intelligence that can be applied to almost every industry and academic domain.

The Nvidia data center platform includes GPUs, CPUs, high-speed fabrics and networking, as well as a wide range of software such as the Nvidia CUDA-X libraries, the NeMo Framework, Nvidia TensorRT-LLM and Nvidia Dynamo. This highly tuned ensemble of hardware and software technologies enables organizations to train and deploy models faster, dramatically accelerating time to value.

Blackwell comfortably beats its predecessor Hopper in AI training.

The Nvidia partner ecosystem participated extensively in this MLPerf round. Beyond the submission with CoreWeave and IBM, there were other compelling submissions from Asus, Cisco, Giga Computing, Lambda, Lenovo, Quanta Cloud Technology and Supermicro.

This round featured the first MLPerf Training submissions using GB200 systems. The MLPerf benchmarks are developed by the MLCommons Association, which has more than 125 members and affiliates. The time-to-train metric ensures that the training process produces a model that meets the required accuracy, and standardized benchmark run rules ensure apples-to-apples performance comparisons. Results are peer-reviewed before publication.

The fundamentals behind training benchmarks

Nvidia achieves great scale on its latest AI processors.

Dave Salvator is someone I knew when he was part of the technical press. Now he is director of accelerated computing products in the Accelerated Computing Group at Nvidia. In a press briefing, Salvator noted that Nvidia CEO Jensen Huang talks about the notion of scaling laws for AI. They include pre-training, where you actually teach the AI model knowledge. That starts from zero. It is a heavy computational lift that is the backbone of AI, Salvator said.
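To make that concrete, here is a minimal sketch of the pre-training objective Salvator is describing: next-token prediction trained with a cross-entropy loss. The tiny toy model and sizes below are illustrative assumptions, nowhere near a real LLM configuration.

```python
# A toy version of LLM pre-training: predict each next token and
# backpropagate cross-entropy loss. Sizes are illustrative only.
import torch
import torch.nn as nn

vocab, dim = 32_000, 512
model = nn.Sequential(nn.Embedding(vocab, dim), nn.Linear(dim, vocab))

tokens = torch.randint(0, vocab, (8, 128))   # a batch of token ids
logits = model(tokens[:, :-1])               # predict the following token
loss = nn.functional.cross_entropy(
    logits.reshape(-1, vocab),               # (batch * seq, vocab)
    tokens[:, 1:].reshape(-1),               # shifted targets
)
loss.backward()  # one compute-heavy step, repeated at enormous scale
```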

From there, Nvidia moves to post-training scaling. This is where models go to school a bit, and it is where you can do things such as fine-tuning, where you bring in a different data set to teach a pre-trained model, one trained up to a point, and give it additional domain knowledge from your specific data set.
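For readers unfamiliar with the technique, here is a minimal sketch of LoRA fine-tuning using the Hugging Face PEFT library; gpt2 stands in as a small model so the sketch runs anywhere, whereas the benchmark itself fine-tuned Llama 2 70B. None of this reflects Nvidia's actual benchmark code or configuration.

```python
# Minimal LoRA fine-tuning setup with Hugging Face PEFT.
# LoRA freezes the base weights and trains small low-rank adapter
# matrices injected into the attention projections.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")  # small stand-in model

config = LoraConfig(
    r=16,                        # adapter rank
    lora_alpha=32,               # adapter scaling factor
    target_modules=["c_attn"],   # gpt2's fused attention projection
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # a tiny fraction of all weights
```

Because only the adapter weights are trained, fine-tuning touches far fewer parameters than pre-training, which is why it appears in MLPerf as a distinct, lighter workload.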

Nvidia has moved beyond chips alone to building AI infrastructure.

And then finally, there is test-time scaling, also known as reasoning, or sometimes long thinking. Another term it goes by is agentic AI. It is AI that can actually think, reason and solve problems. Where basic AI has you ask a question and get a relatively simple answer, test-time scaling and reasoning can take on much more complicated tasks and deliver rich analysis.

And then there is also generative AI, which can generate content on an as-needed basis, including text summarization and translation, but also visual content and even audio content. There are many types of scaling going on in the AI world. For the benchmarks, Nvidia focused on the results for pre-training and post-training.

“That is where AI training starts, in what we call the investment phase of AI. And then when you start inferencing those models and using them, and then generating those tokens, you start getting your return on your investment in AI,” he said.

The MLPerf benchmark is in its 12th round and dates back to 2018. The consortium backing it has more than 125 members, and the benchmark has been used for both inference and training tests. The industry regards the benchmarks as robust.

“As I'm sure many of you know, sometimes performance claims in the world of AI can be a bit of a wild west. MLPerf seeks to bring some order to that chaos,” Salvator said. “Everyone has to do the same amount of work. Everyone is held to the same standard in terms of convergence. And once results are submitted, they are reviewed and vetted by all the other submitters, and people can ask questions and even challenge results.”

The most intuitive metric around training is how long it takes to train an AI model to what is called convergence. That means reaching a specified level of accuracy. It is an apples-to-apples comparison, Salvator said, and it takes into account constantly changing workloads.
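In code terms, the measurement boils down to something like the following sketch; the stub functions are hypothetical stand-ins for a real training harness, not MLPerf's actual rules engine.

```python
# MLPerf-style time-to-train: run training until the model reaches a
# target quality, then report elapsed wall-clock time. The stubs below
# are hypothetical stand-ins, not MLPerf's actual harness.
import time

def train_one_epoch(state: dict) -> None:
    state["accuracy"] += 0.05  # stand-in: pretend each epoch improves the model

def evaluate(state: dict) -> float:
    return state["accuracy"]

def time_to_train(target_accuracy: float) -> float:
    state = {"accuracy": 0.0}
    start = time.time()
    while evaluate(state) < target_accuracy:  # the convergence check
        train_one_epoch(state)
    return time.time() - start  # seconds to convergence

print(time_to_train(0.9))
```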

This year, there is a new Llama 3.1 405B workload, which replaces the GPT-3 175B workload that was previously in the benchmark. In the benchmarks, Salvator noted, Nvidia set a number of records. The Nvidia GB200 NVL72 AI factories are fresh from the manufacturing plants. Going from one generation of chips (Hopper) to the next (Blackwell), Nvidia saw a 2.5 times improvement in image-generation results.

“We are still fairly early in the Blackwell product life cycle, so we fully expect to get more performance out of the Blackwell architecture over time, as we continue to refine our software optimizations and as new, frankly heavier workloads come into the market,” Salvator said.

He noted that Nvidia was the only company to submit entries for every benchmark.

“The great results we are achieving are due to a combination of things. It is our fifth-generation NVLink and NVSwitch, delivering up to 2.66 times more performance, together with just general architectural goodness in Blackwell, along with our ongoing software optimizations that make that performance possible,” Salvator said.

He added: “Because of Nvidia's heritage, we have been known for the longest time as those GPU guys. We certainly make great GPUs, but we have moved from being just a chip company to being not only a systems company, with things like our DGX servers, but to building entire racks and data centers, which we now call AI factories.”
