Real-world AI benchmarking continues to evolve, prompting Intel's technical experts to keep pace with industry demands, steadily enhance product performance, and foster a more developer-friendly environment, thereby promoting the broader adoption of AI technologies.
Many may have heard of MLPerf, but not everyone fully understands this AI benchmarking tool. Although a clear definition of the term itself is hard to find, with the help of an AI assistant, we found a reliable explanation: MLPerf first appeared in May 2018 and was likened to the “SPEC for ML.” The AI assistant further explained: “‘MLPerf’ is a compound word formed from ‘ML’ (representing machine learning) and ‘Perf’ (representing performance).”
It continued: “Although there’s no official, detailed explanation of the naming process, the name itself is intuitive and was likely chosen because it directly reflects the purpose of the benchmark.” (This explanation also matches what people have come to expect from AI-generated answers, which are themselves built and continuously improved by researchers.)
Actual results support this point: just last week, Intel was once again the only vendor to have consistently submitted server CPU results to MLPerf. The submissions covered common AI tasks such as image detection and information analysis, running on Intel® Xeon® 6 processors.
01
Organizations and Processes Behind Accelerating AI Development
Ramesh Chukka, from the Software Division of Intel’s Data Center and AI Group, stated: “MLPerf is currently the top benchmark in the AI field.”
Chukka represents Intel on the MLCommons board, a consortium formed at the end of 2020 with the goal of expanding the original MLPerf effort to “advance the development of state-of-the-art AI and machine learning datasets, models, best practices, benchmarks, and metrics, and make them easier to use.”
According to Chukka, MLPerf refers collectively to all of these benchmark tests, which are “rapidly evolving like technology itself” and push the field forward through the “fast prototyping of new AI technologies.” Each benchmark measures how quickly a system can complete a specific AI task while meeting a defined quality standard.
These benchmarks fall into two main categories: training, which involves building AI models using data, and inference, which involves running those models like applications. Using large language models (LLMs) as an analogy: training is when an LLM learns from massive datasets, and inference is when it performs tasks each time you use it.
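For readers who think in code, here is a minimal PyTorch sketch of that distinction; the tiny model and random data are purely illustrative placeholders and bear no relation to any actual MLPerf workload.

```python
import torch
import torch.nn as nn

# Toy model and data, purely illustrative (not an MLPerf workload).
model = nn.Linear(16, 2)
inputs = torch.randn(8, 16)
labels = torch.randint(0, 2, (8,))

# Training: the model learns from data by minimizing a loss.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()
model.train()
for _ in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), labels)
    loss.backward()   # compute gradients
    optimizer.step()  # update weights

# Inference: the trained model is run like an application.
model.eval()
with torch.no_grad():
    predictions = model(inputs).argmax(dim=1)
```

MLPerf's training benchmarks time the first phase (how long until a model reaches a target quality), while its inference benchmarks measure the second (how fast a trained model serves results).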
MLCommons releases two sets of benchmark results annually, one for training and one for inference. Intel’s most recent training results were released in June of last year, while the latest inference results were just released this month.
From the inception of MLPerf to the formation of MLCommons, Intel’s AI experts have been actively involved, contributing benchmark results. Intel participates in two ways: helping shape and advance the overall project, and compiling and submitting benchmark results using Intel processors, accelerators, and solutions.
02
Problems Addressed by MLPerf Benchmarks
AI models are complex programs, and now a growing variety of computers can run them. MLPerf benchmarks not only allow for better comparisons between different types of machines but also encourage researchers and companies to further explore cutting-edge technologies.
Each benchmark aims to closely mimic a real-world application scenario, and results are divided into two categories. The “closed” category strictly fixes the AI model and software stack to allow precise hardware comparisons: the same program must achieve the same result, such as a target accuracy on a natural language processing task, across different systems.
The “open” category includes room for innovation, allowing each system to push performance boundaries as much as possible while achieving the same goal.
It is worth noting that MLPerf is fully open source and everything about it is shared: benchmark results must be reproducible, with no hidden information. This openness lets manufacturers make comparisons that go well beyond raw speed; for example, vendors can also compare performance per watt or cost efficiency.
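To make that concrete, here is a small sketch of how such derived metrics could be computed from published results; all numbers below are hypothetical placeholders, not actual MLPerf data.

```python
# Hypothetical figures for illustration only; real MLPerf results are
# published per benchmark by MLCommons.
throughput_qps = 1200.0    # queries per second from a benchmark run
avg_power_watts = 400.0    # measured average system power
system_cost_usd = 15000.0  # purchase price of the system under test

# Derived metrics vendors can compare beyond raw speed:
perf_per_watt = throughput_qps / avg_power_watts    # queries/sec per watt
perf_per_dollar = throughput_qps / system_cost_usd  # queries/sec per dollar

print(f"{perf_per_watt:.2f} qps/W, {perf_per_dollar:.4f} qps/$")
```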
03
MLPerf’s Operation and Evolution
As Chukka mentioned, MLPerf is widely recognized in the industry partly because it continues to evolve and add new benchmarks. This evolution is driven by open discussions and debates within the MLCommons community, which includes participation from major corporations, startups, and academia.
First, new benchmarks are proposed and debated. Approved benchmarks then require a public dataset for training—this dataset may already exist or need to be created. Next, participants voluntarily form teams to build the benchmarks, define or gather data, and set timelines for release.
Finally, any company wishing to publish results must submit them before the deadline. If the deadline is missed, they must wait for the next cycle to begin again.
04
Faster, More Efficient AI Shapes the Future of the World
At a macro level, Intel benefits significantly whenever more people turn to semiconductor technology to tackle hard problems. But Intel's involvement in the MLPerf benchmarks carries a deeper significance.
Intel has consistently contributed to open-source AI frameworks like PyTorch and its extensions. When Intel engineers work to optimize code for better MLPerf performance, users deploying related AI applications on Intel chips automatically benefit from these technological improvements without needing to take any extra steps.
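As one illustration of how those optimizations reach users, the sketch below uses Intel® Extension for PyTorch, one of the extensions Intel maintains; the toy model and exact API details are assumptions for illustration and may vary by version.

```python
import torch
import torch.nn as nn

# Assumes Intel Extension for PyTorch is installed:
#   pip install intel-extension-for-pytorch
import intel_extension_for_pytorch as ipex

# Toy model standing in for a real workload (illustrative only).
model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10)).eval()

# A single call applies Intel-specific operator and memory-layout
# optimizations; the rest of the PyTorch code stays unchanged.
model = ipex.optimize(model, dtype=torch.bfloat16)

# Inference as usual, with bfloat16 autocast on CPU.
with torch.no_grad(), torch.autocast("cpu", dtype=torch.bfloat16):
    output = model(torch.randn(4, 128))
```

Optimizations that Intel upstreams into PyTorch itself require even less effort: existing code simply runs faster on newer releases.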
Chukka stated, “For new benchmarks, we’re always exploring viable optimizations and actively preparing for future submissions.”
To achieve better test results, Chukka’s team unites efforts across the company and has delivered impressive performance improvements across multiple test rounds. For example, in the 2024 benchmark results, inference performance for recommendation systems improved by 80%, and this month’s results showed a 22% performance increase in the GPT-J benchmark.
Thus, each time Intel releases a new round of MLPerf results, it often signifies that overall AI systems have become faster and more efficient. Even today’s popular large models can respond more swiftly and intelligently to meet evolving user demands.
Note: Performance varies depending on use, configuration, and other factors.