Nutanix Shines in the New MLPerf Storage v1.0 Benchmark: Scalability, Performance, and the Flexibility of a Software-Defined Solution
Kaushik Ghosh, Product Management
Nutanix is excited to be part of the MLPerf Storage v1.0 benchmark suite, a new industry standard designed to measure storage performance for machine learning (ML) workloads. The test results highlight the Nutanix Unified Storage (NUS) solution’s ability to deliver exceptional scalability and performance, positioning it as a leader in AI/ML storage.
The MLPerf Storage v1.0 benchmark focuses on a critical question for AI/ML infrastructure: How many GPU accelerators can a storage solution support for various ML workloads? Simulating NVIDIA H100 and A100 GPU accelerators, the benchmark evaluates the maximum aggregate throughput that the storage can sustain. Because each simulated accelerator must be kept at more than 90% utilization, higher aggregate throughput translates directly into more supported accelerators.
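The relationship between aggregate throughput and supported accelerators can be sketched in a few lines of Python. This is an illustrative simplification, not the benchmark's actual scoring logic; the per-accelerator demand figure below is a hypothetical number chosen for the example.

```python
def max_supported_accelerators(aggregate_mib_s: float,
                               per_accel_mib_s: float,
                               min_utilization: float = 0.90) -> int:
    """Rough sketch: how many simulated accelerators a storage system can
    keep at or above the utilization floor, given the aggregate throughput
    it sustains and each accelerator's full data-ingest demand (MiB/s)."""
    # Each accelerator must receive at least min_utilization of its demand,
    # so the sustained throughput is divided by that reduced per-unit rate.
    return int(aggregate_mib_s // (per_accel_mib_s * min_utilization))

# Hypothetical inputs for illustration (not MLPerf-published figures):
# 190,000 MiB/s sustained, 200 MiB/s demanded per accelerator.
print(max_supported_accelerators(190_000, 200))  # -> 1055
```

The key point the sketch captures is that the accelerator count scales linearly with sustained storage throughput, which is why the benchmark reports both numbers together.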
Breaking Down the Results
Here are some standout results from the benchmark tests, showcasing how Nutanix performed across different environments:
ResNet50 Image Classification in the Cloud
For this test, we provisioned a 32-node Nutanix Unified Storage (NUS) cluster running file services on AWS. The simulated accelerator clients ran at approximately 93% utilization, delivering:
- 191,044 MiB/s aggregate throughput driven by 1056 H100 accelerators
- 192,900 MiB/s aggregate throughput driven by 2100 A100 accelerators
3D-Unet Medical Image Segmentation On-Premises
Using a 7-node NUS cluster running file services on Nutanix NX servers, the simulated accelerator clients, running between 93% and 97% utilization, achieved:
- 56,067 MiB/s aggregate throughput driven by 20 H100 accelerators
- 54,786 MiB/s aggregate throughput driven by 40 A100 accelerators
3D-Unet Medical Image Segmentation in the Cloud
With a 32-node NUS cluster running file services on AWS, and with the simulated accelerator clients at between 91% and 95% utilization, Nutanix delivered:
- 262,992 MiB/s aggregate throughput driven by 100 H100 accelerators
- 273,234 MiB/s aggregate throughput driven by 195 A100 accelerators
The table below summarizes the key performance results.
| Workload | Where | NUS Storage | Result |
|---|---|---|---|
| ResNet50 Image Classification | Cloud | 32-node NUS cluster (AWS EC2 with EBS) | 191,044 MiB/s, 1056 H100 accelerators; 192,900 MiB/s, 2100 A100 accelerators |
| 3D-Unet Medical Image Segmentation | On-Premises | 7-node NUS cluster (Nutanix NX servers) | 56,067 MiB/s, 20 H100 accelerators; 54,786 MiB/s, 40 A100 accelerators |
| 3D-Unet Medical Image Segmentation | Cloud | 32-node NUS cluster (AWS EC2 with EBS) | 262,992 MiB/s, 100 H100 accelerators; 273,234 MiB/s, 195 A100 accelerators |
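One useful way to read these results is to derive the per-accelerator throughput each workload demands. The short Python sketch below uses only the published numbers above and shows why 3D-Unet drives far higher aggregate throughput with far fewer accelerators than ResNet50: each 3D-Unet accelerator consumes more than an order of magnitude more bandwidth.

```python
# Published MLPerf Storage v1.0 results: (aggregate MiB/s, accelerator count).
results = {
    "ResNet50 cloud, H100":  (191_044, 1056),
    "ResNet50 cloud, A100":  (192_900, 2100),
    "3D-Unet on-prem, H100": (56_067, 20),
    "3D-Unet on-prem, A100": (54_786, 40),
    "3D-Unet cloud, H100":   (262_992, 100),
    "3D-Unet cloud, A100":   (273_234, 195),
}

# Per-accelerator throughput implied by each result.
for name, (mib_s, accels) in results.items():
    print(f"{name}: {mib_s / accels:,.0f} MiB/s per accelerator")
```

For example, each H100 running 3D-Unet in the cloud draws roughly 2,630 MiB/s, versus roughly 181 MiB/s per H100 for ResNet50, so a fixed storage throughput budget supports very different accelerator counts depending on the workload.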
Key Takeaways: Performance and Flexibility
These tests demonstrate the linear scalability and flexibility of the Nutanix Unified Storage platform. Whether it’s an image classification or medical image segmentation workload, the benchmark results highlight the Nutanix platform's ability to easily handle demanding ML workloads. What’s more, Nutanix can scale linearly from a small cluster supporting dozens of GPU accelerators to large deployments handling thousands of accelerators, while requiring no custom hardware or special client software, controlling both cost and complexity.
Notably, Nutanix is one of the few vendors to provide results for both on-premises and cloud-based workloads. This flexibility allows customers to run workloads where they need them: on-premises, where the data resides, or in the cloud, where they can rapidly scale to thousands of accelerators. Regardless of the location, Nutanix delivers the same linear scalability and performance.
What This Means for Nutanix Customers
For enterprises embarking on their AI/ML journey, these results offer more than just technical insights—they demonstrate that Nutanix can efficiently power workloads while providing unmatched scalability, deployment flexibility, and operational simplicity.
- Unmatched Scalability: Nutanix's ability to support thousands of accelerators and deliver high throughput across environments shows its capacity to grow with your AI/ML needs. Start with a few nodes and scale up as your demands increase.
- Flexibility in Deployment: The Nutanix software-defined architecture allows customers to deploy on standard infrastructure on-premises or in the public cloud. With support for a wide range of servers from Cisco, Dell, HPE, Lenovo, SuperMicro, and cloud providers like AWS and Azure, customers can select the best model without compromising on performance.
- Simplified Management: Nutanix Unified Storage offers more than just raw performance—it provides operational simplicity and cost efficiency. With a software-defined architecture, a flexible licensing model, no special software clients or custom hardware, and advanced data services like ransomware protection and global data management, Nutanix reduces the complexity, costs, and risks involved in managing large-scale AI/ML infrastructure.
Addressing the Nuances of the Benchmark
While the MLPerf Storage v1.0 benchmark provides valuable insight into storage performance for AI/ML workloads, it’s essential to consider additional real-world factors. The benchmark focuses only on storage scalability and performance: the aggregate throughput and the number of GPU accelerator clients that can be sustained for various ML workloads. Enterprises must also evaluate other parameters, such as cost, power consumption, rack space, standard NFS versus special software clients, generic versus custom hardware, and whether the solution needs to be deployed on-premises, in the cloud, or both.
Nutanix Unified Storage excels across all these dimensions, offering a unified platform that reduces total cost of ownership (TCO) and operational complexity, without compromising the performance required to drive AI/ML innovation.
Final Thoughts
Nutanix's performance in the MLPerf Storage v1.0 benchmark underscores its ability to deliver scalability, performance, flexibility and simplicity for AI/ML workloads. As organizations ramp up their AI initiatives, Nutanix stands ready to support them—whether they’re just starting or scaling to support thousands of GPU accelerators running demanding ML workloads. With Nutanix, you can build confidently, knowing your infrastructure will scale seamlessly and easily to meet all your AI/ML needs. To learn more, visit us at AI-ready Infrastructure Solutions for Enterprises.
©2024 Nutanix, Inc. All rights reserved. Nutanix, the Nutanix logo and all Nutanix product and service names mentioned herein are registered trademarks or trademarks of Nutanix, Inc. in the United States and other countries. All other brand names mentioned herein are for identification purposes only and may be the trademarks of their respective holder(s). Our decision to link to or reference an external site should not be considered an endorsement of any content on such a site. Certain information contained in this content may relate to, or be based on, studies, publications, surveys and other data obtained from third-party sources and our own internal estimates and research. While we believe these third-party studies, publications, surveys and other data are reliable as of the date of publication, they have not been independently verified unless specifically stated, and we make no representation as to the adequacy, fairness, accuracy, or completeness of any information obtained from a third-party. Customer statements on results, benefits, savings or other outcomes depend on a variety of factors including their use case, individual requirements, and operating environments, and should not be construed to be a promise or obligation to deliver specific outcomes.