Nvidia Corp. and Google LLC have won top spots in the MLPerf Training machine learning competition, the organization that hosts the benchmark detailed today.
MLPerf Training is run by the MLCommons Association, an industry group that develops open-source AI tools. Participants in the competition test how quickly they can train a series of neural networks to perform various computing tasks. The goal is to complete the training process as fast as possible and in accordance with certain technical criteria set forth by the MLCommons Association.
This year’s competition consisted of eight tests. Each test involved training a different neural network using open-source training datasets specified by the MLCommons Association. Nvidia achieved the fastest performance in four of the tests, while Google won the other four.
Nvidia performed AI training using its internally developed Selene supercomputer, which is based on the company's A100 data center graphics card. The supercomputer also incorporates Advanced Micro Devices Inc. processors. When running AI workloads, Selene can provide peak performance of nearly 2.8 exaflops, with 1 exaflop being the equivalent of 1 million trillion computing operations per second.
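The unit arithmetic behind those figures can be sketched in a few lines of Python. This is purely illustrative; the performance numbers are the ones quoted in this article, not independent measurements.

```python
# Illustrative sanity check of the exaflop figures quoted in this article.
EXA = 10**18  # 1 exaflop = 10^18 floating-point operations per second

# "1 million trillion" = 10^6 * 10^12 = 10^18, matching the definition above.
assert 10**6 * 10**12 == EXA

selene_peak = 2.8 * EXA  # Nvidia Selene: nearly 2.8 exaflops on AI workloads

print(f"Selene peak: {selene_peak:.2e} operations per second")
```

Later in the article, Google's TPU Pod cluster is described as providing up to 9 exaflops of aggregate performance, which can be expressed the same way.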
The four MLPerf Training tests in which Selene achieved the fastest performance spanned four AI use cases: image segmentation, speech recognition, recommendation systems and reinforcement learning. The reinforcement learning test involved training a neural network to play Go.
“In the two years since our first MLPerf submission with A100, our platform has delivered 6x more performance,” Shar Narasimhan, a senior group product marketing manager at Nvidia, wrote in a blog post today. “Since the advent of MLPerf, the Nvidia AI platform has delivered 23x more performance in 3.5 years on the benchmark — the result of full-stack innovation spanning GPUs, software and at-scale improvements.”
Google, in turn, achieved the fastest performance across four MLPerf Training tests that focused on image recognition, image classification, object detection and natural language processing. The natural language processing test involved training a neural network called BERT. Originally developed by Google engineers, BERT is one of the most widely used neural networks in its category and also helps power the company's search engine.
Google carried out AI training using a cluster of TPU Pods, internally developed hardware systems optimized for machine learning. The systems are based on the search giant's custom Cloud TPU v4 chip. According to Google, its TPU Pod cluster provides up to 9 exaflops of maximum aggregate performance.
“Each Cloud TPU v4 Pod consists of 4096 chips connected together via an ultra-fast interconnect network,” detailed Google principal engineer Naveen Kumar and Vikram Kasivajhula, the company’s director of product management for machine learning infrastructure. “The TPU v4 chip delivers 3x the peak FLOPs per watt relative to the v3 generation.”