Kaushik Ghosh, Product Management
Nutanix is excited to be part of the MLPerf Storage v1.0 benchmark suite, a new industry standard designed to measure storage performance for machine learning (ML) workloads. The test results highlight the Nutanix Unified Storage (NUS) solution’s ability to deliver exceptional scalability and performance, positioning it as a leader in AI/ML storage.
The MLPerf Storage v1.0 benchmark focuses on a critical question for AI/ML infrastructure: how many GPU accelerators can a storage solution keep fed for a given ML workload? The benchmark simulates NVIDIA H100 and A100 GPU accelerators and measures the maximum aggregate throughput the storage can sustain while keeping each simulated accelerator above 90% utilization. The higher the sustained aggregate throughput, the more accelerators the storage can support.
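The relationship the benchmark measures can be sketched in a few lines. Note that the per-accelerator throughput figure below is an assumed, illustrative number for this sketch, not an official MLPerf requirement:

```python
def accelerators_supported(aggregate_mib_s: float, per_accel_mib_s: float) -> int:
    """Number of accelerators a storage system can feed, given the aggregate
    read throughput it sustains and the throughput one simulated accelerator
    needs to stay above the benchmark's utilization floor."""
    return int(aggregate_mib_s // per_accel_mib_s)

# Illustrative: if one simulated accelerator consumes ~180 MiB/s (assumed),
# a system sustaining 190,000 MiB/s can feed 1055 of them.
print(accelerators_supported(190_000, 180))
```

The per-accelerator demand varies widely by workload and accelerator type, which is why the benchmark reports results separately for each combination.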
Here are some standout results from the benchmark tests, showcasing how Nutanix performed across different environments:
ResNet50 Image Classification in the Cloud
For this test, we provisioned a 32-node NUS cluster with Nutanix Unified Storage (file services) running on AWS. The simulated accelerator clients ran at approximately 93% utilization, delivering:

- 191,044 MiB/s of aggregate throughput, supporting 1056 H100 accelerators
- 192,900 MiB/s of aggregate throughput, supporting 2100 A100 accelerators
3D-Unet Medical Image Segmentation On-Premises
Using a 7-node NUS cluster with Nutanix Unified Storage (file services) on Nutanix NX servers, the simulated accelerator clients, running between 93% and 97% utilization, achieved:

- 56,067 MiB/s of aggregate throughput, supporting 20 H100 accelerators
- 54,786 MiB/s of aggregate throughput, supporting 40 A100 accelerators
3D-Unet Medical Image Segmentation in the Cloud
With a 32-node NUS cluster with Nutanix Unified Storage (file services) running on AWS, and with the simulated accelerator clients between 91% and 95% utilization, Nutanix delivered:

- 262,992 MiB/s of aggregate throughput, supporting 100 H100 accelerators
- 273,234 MiB/s of aggregate throughput, supporting 195 A100 accelerators
The table below summarizes the key performance results.
| Workload | Where | NUS Storage | Result |
|---|---|---|---|
| ResNet50 Image Classification | Cloud | 32-node NUS cluster (AWS EC2 with EBS) | 191,044 MiB/s, 1056 H100 accelerators<br>192,900 MiB/s, 2100 A100 accelerators |
| 3D-Unet Medical Image Segmentation | On-Premises | 7-node NUS cluster (Nutanix NX servers) | 56,067 MiB/s, 20 H100 accelerators<br>54,786 MiB/s, 40 A100 accelerators |
| 3D-Unet Medical Image Segmentation | Cloud | 32-node NUS cluster (AWS EC2 with EBS) | 262,992 MiB/s, 100 H100 accelerators<br>273,234 MiB/s, 195 A100 accelerators |
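A quick back-of-the-envelope check on the reported numbers: dividing each aggregate throughput by its accelerator count gives the implied per-accelerator throughput, which shows why 3D-Unet supports far fewer accelerators than ResNet50 at similar aggregate bandwidth. This script is an illustrative sketch, not part of the benchmark:

```python
# (aggregate MiB/s, accelerator count) taken from the results table above.
results = {
    ("ResNet50", "Cloud", "H100"): (191_044, 1056),
    ("ResNet50", "Cloud", "A100"): (192_900, 2100),
    ("3D-Unet", "On-Premises", "H100"): (56_067, 20),
    ("3D-Unet", "On-Premises", "A100"): (54_786, 40),
    ("3D-Unet", "Cloud", "H100"): (262_992, 100),
    ("3D-Unet", "Cloud", "A100"): (273_234, 195),
}

for (workload, where, accel), (mib_s, count) in results.items():
    per_accel = mib_s / count  # MiB/s one simulated accelerator consumes
    print(f"{workload:8s} {where:11s} {accel}: {per_accel:8.1f} MiB/s per accelerator")
```

ResNet50 works out to roughly 181 MiB/s per simulated H100, while 3D-Unet needs roughly 2,600 to 2,800 MiB/s, an order of magnitude more per accelerator.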
These tests demonstrate the linear scalability and flexibility of the Nutanix Unified Storage platform. Whether the workload is image classification or medical image segmentation, the benchmark results show that the Nutanix platform handles demanding ML workloads with ease. What's more, Nutanix scales linearly from a small cluster supporting dozens of GPU accelerators to large deployments feeding thousands of accelerators, with no custom hardware or special client software required, which keeps both cost and complexity in check.
Notably, Nutanix is one of the few vendors to publish results for both on-premises and cloud-based workloads. This flexibility lets customers run workloads where they need them: on-premises, where the data resides, or in the cloud, where they can rapidly scale to thousands of accelerators. Regardless of location, Nutanix delivers the same linear scalability and performance.
For enterprises embarking on their AI/ML journey, these results offer more than just technical insights—they demonstrate that Nutanix can efficiently power workloads while providing unmatched scalability, deployment flexibility, and operational simplicity.
While the MLPerf Storage v1.0 benchmark provides valuable insight into storage performance for AI/ML workloads, it is essential to consider additional real-world factors. The benchmark measures only storage scalability and performance: the aggregate throughput and the number of GPU accelerator clients that can be sustained for various ML workloads. Enterprises must also weigh other parameters, such as cost, power consumption, rack space, standard NFS versus special software clients, generic versus custom hardware, and whether the solution must run on-premises, in the cloud, or both.
Nutanix Unified Storage excels across all these dimensions, offering a unified platform that reduces total cost of ownership (TCO) and operational complexity, without compromising the performance required to drive AI/ML innovation.
Nutanix's performance in the MLPerf Storage v1.0 benchmark underscores its ability to deliver scalability, performance, flexibility, and simplicity for AI/ML workloads. As organizations ramp up their AI initiatives, Nutanix stands ready to support them, whether they're just starting out or scaling to thousands of GPU accelerators running demanding ML workloads. With Nutanix, you can build confidently, knowing your infrastructure will scale seamlessly to meet your AI/ML needs. To learn more, visit us at AI-ready Infrastructure Solutions for Enterprises.
©2024 Nutanix, Inc. All rights reserved. Nutanix, the Nutanix logo and all Nutanix product and service names mentioned herein are registered trademarks or trademarks of Nutanix, Inc. in the United States and other countries. All other brand names mentioned herein are for identification purposes only and may be the trademarks of their respective holder(s). Our decision to link to or reference an external site should not be considered an endorsement of any content on such a site. Certain information contained in this content may relate to, or be based on, studies, publications, surveys and other data obtained from third-party sources and our own internal estimates and research. While we believe these third-party studies, publications, surveys and other data are reliable as of the date of publication, they have not been independently verified unless specifically stated, and we make no representation as to the adequacy, fairness, accuracy, or completeness of any information obtained from a third-party. Customer statements on results, benefits, savings or other outcomes depend on a variety of factors including their use case, individual requirements, and operating environments, and should not be construed to be a promise or obligation to deliver specific outcomes.