A new CM workflow for MLPerf helps to benchmark commodity hardware for AI performance, power and cost efficiency
Illustration by Chloé Tessier

While MLPerf benchmarks are traditionally used to demonstrate the latest and most powerful AI hardware, we see growing interest in benchmarking and comparing a wide variety of commodity servers, laptops, embedded boards and cloud instances in terms of performance, power and cost efficiency.

During the MLPerf inference v4.0 round, cKnowledge.org and cTuning.org developed a new version of a modular workflow that uses the open-source CM automation framework to run MLPerf benchmarks across commodity models, software and hardware, from the cloud to the edge.

The CM workflow for MLPerf is composed of technology-agnostic automation recipes (CM scripts) with a unified and human-friendly command line, a simple GUI, and extensible Python, C++ and network implementation templates. This modular structure makes it easier to plug in optimized implementations of the MLPerf inference benchmarks from different vendors, test, customize and run them on different systems, and submit performance and power results in a unified and automated way.
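
To give a flavor of the unified command line, the minimal sketch below launches an MLPerf inference benchmark through CM from Python. The script tags and flags (`run-mlperf,inference`, `--model`, `--implementation`, `--device`, `--scenario`, `--quiet`) are assumptions based on the CM-MLPerf documentation around the v4.0 round and may differ in newer releases.

```python
import subprocess

# Illustrative only: launch the MLPerf inference reference benchmark for
# ResNet50 via the unified CM command line (requires the "cm" tool on PATH).
# The script tags and flags below are assumptions based on the CM-MLPerf
# documentation around v4.0 and may have changed in later releases.
cmd = [
    "cm", "run", "script",
    "--tags=run-mlperf,inference",
    "--model=resnet50",
    "--implementation=reference",  # other vendor implementations can be plugged in here
    "--device=cpu",
    "--scenario=Offline",
    "--quiet",
]

# CM resolves the required dependencies (model, data set, framework) before
# running the benchmark, so the first run can take a while.
subprocess.run(cmd, check=True)
```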

Thanks to a very fruitful collaboration with MLCommons members within the MLCommons Task Force on Automation and Reproducibility over the past three months, we managed to add, for the first time, all edge and datacenter MLPerf models to the CM workflow. We also provided preliminary support for different MLPerf inference implementations from Nvidia, Intel, Qualcomm, Neural Magic, MLCommons and cTuning with PyTorch, TensorFlow, ONNX, QAIC, TFLite and TensorRT.

We successfully validated the new version of the CM workflow for MLPerf in the MLPerf inference v4.0 submission round by automating ~90% of all performance and power results via cTuning and achieving several top performance and latency results on commodity hardware, as discussed later in this article.

The CM workflow also helped us submit the first open result for the new Llama 2 MLPerf benchmark on a commodity Nvidia-based server, using a smaller Llama 2 model with 7B parameters (downloaded from the Hugging Face hub via a CM automation recipe) instead of the official 70B MLPerf model.
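
For context, the CM recipe that fetches the smaller model essentially wraps a Hugging Face hub download. A stand-alone equivalent using the `huggingface_hub` library is sketched below; the repository id `meta-llama/Llama-2-7b-chat-hf` is an assumption for the 7B variant, and the repository is gated, so an accepted license and access token are required.

```python
from huggingface_hub import snapshot_download

# Illustrative only: the CM automation recipe essentially wraps a Hugging Face
# hub download such as this one. The repository id below is an assumption for
# the 7B chat variant; the repo is gated, so you must accept the Llama 2
# license on huggingface.co and authenticate with a valid access token
# (e.g. via the HF_TOKEN environment variable).
model_path = snapshot_download(
    repo_id="meta-llama/Llama-2-7b-chat-hf",
    local_dir="models/llama2-7b",  # where the checkpoint files are placed
)

print("Model downloaded to:", model_path)
```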

We are also very proud to have benchmarked Qualcomm Cloud AI 100 systems in the cloud for the first time using MLCommons CM, and we thank Qualcomm for their support. We also thank our colleagues from Intel, Nvidia and Google for their feedback and suggestions.

We also demonstrate for the first time that our portable and technology-agnostic CM workflow for MLPerf can automatically benchmark network MLPerf implementations (using our reference Python code, available to everyone on the MLCommons GitHub) and even commodity laptops running Windows 11, such as the Lenovo P14s Gen 3.

Below we present a few highlights of the cTuning MLPerf inference v4.0 results, which you can explore on the MLCommons website:

The cTuning submission using the Nvidia RTX 4090 is in second place for performance per accelerator in the edge category, behind the QAIC AI 100 Ultra:

We achieved the best power-efficiency result in the edge category for ResNet50 on the Thundercomm RB6 with the QAIC AI 100 DM.2e:

Among the CPU-only submissions, the cTuning submission achieved the best performance per core for bert-99 in the datacenter category using the Intel implementation:

Comparison of 2xRTX 4090 vs 2xL40S:

Best latency in Edge category: RTX 4090 vs L4

We also tested the improved MLCommons C++ implementation of the MLPerf image classification benchmark with the ResNet50 FP32 model, which achieved the highest performance on a commodity server with two Nvidia RTX 4090 GPUs:

The next steps

Collective Mind (MLCommons CM) is an open-source collection of reusable, extensible and technology-agnostic automation recipes with a human-friendly interface and minimal dependencies, designed to make it easier to compose, benchmark and optimize AI, ML and other applications and systems across diverse and continuously changing models, data sets, software and hardware (cloud/edge).
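
Getting started requires only a Python environment. A minimal bootstrap sketch is shown below, assuming the `cmind` package on PyPI, the `mlcommons@ck` repository alias and the `detect,os` script tags from the project documentation; check the CM GitHub for the current instructions.

```python
import subprocess
import sys

# Illustrative bootstrap of MLCommons CM (Collective Mind).
# Assumptions: the Python package is published on PyPI as "cmind" and the
# automation recipes live in the mlcommons@ck repository
# (https://github.com/mlcommons/ck).
subprocess.run([sys.executable, "-m", "pip", "install", "cmind"], check=True)

# Pull the repository containing the CM automation recipes (CM scripts).
subprocess.run(["cm", "pull", "repo", "mlcommons@ck"], check=True)

# Smoke test: run a simple technology-agnostic recipe that detects the host OS.
subprocess.run(["cm", "run", "script", "--tags=detect,os"], check=True)
```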

We have released the new version of MLCommons CM along with our MLPerf inference v4.0 results, and we are looking for volunteers and collaborators to continue testing and improving the CM workflow for MLPerf across diverse implementations, models, data sets, software and hardware from different vendors. Please stay tuned by following the CM GitHub (https://github.com/mlcommons/ck), checking the MLCommons Task Force on Automation and Reproducibility and joining the public Discord server.

We also plan to continue improving the CM GUI for MLPerf, the CK playground, and the extensible Python, C++ and network implementations to assist new submitters and make it easier to add new models and hardware to MLPerf.

Finally, we are preparing a new project to automatically co-design high-performance and cost-effective AI systems using MLPerf and CM based on user requirements and constraints (cost, accuracy, throughput, latency, power consumption, size, reliability, etc.). Please get in touch with Grigori Fursin for more details.

Acknowledgments

We thank all our great colleagues for their feedback and discussions that help us improve the MLCommons Collective Mind framework to benefit everyone!

