Podcast

Measuring the Prime Ingredient in Enterprise AI

In this Tech Barometer podcast, MLCommons Co-founder David Kanter talks about creating the MLPerf benchmark to help enterprises understand AI workload performance of various data storage technologies.

April 22, 2025

MLPerf Storage was initiated in 2024 by MLCommons to address the significant storage bottlenecks encountered during large-scale AI training. The collaborative engineering consortium wanted to understand the performance requirements that AI training places on storage systems, so it created a benchmarking process to determine how efficiently various storage types, such as direct-attached, network-attached, object, block and file storage, support AI workloads.

“Storage is kind of like Baskin-Robbins, except there might be more than 31 flavors,” said David Kanter, co-founder and board member of MLCommons and Head of MLPerf, in an interview with The Forecast.

In this Tech Barometer podcast, Kanter explains that object storage, edge storage, direct-attached storage, network-attached storage and many more varieties have distinct performance characteristics. When training models process trillions of data points, subtle differences between storage options can have significant effects on performance.

“The data, and where it lives, is the necessary ingredient to keep all the compute humming,” Kanter said. He explains that the MLPerf Storage benchmark measures how well storage systems can feed data to compute systems, which is crucial for keeping modern AI processors busy.

Kanter talks about the quest to optimize computing resources, the evolution of storage needs, the integration of supercomputing techniques into mainstream AI applications and the broader implications of AI in scientific computing.

RELATED Importance of AI Data Storage Performance
How MLPerf Storage benchmark helps AI and ML developers compare performance of different data storage technologies.

November 22, 2024

Nutanix contributed to the MLPerf Storage project, and its Nutanix Unified Storage significantly outperformed submissions from other participating vendors.

According to the latest Enterprise Cloud Index (ECI), a survey of 1,500 IT and business decision-makers worldwide, nearly 85% of respondents already had a GenAI deployment strategy in place and nearly all cited difficulties scaling the compute-intensive technology from development to production. This is where MLCommons, an open engineering consortium, helps as an arbiter of AI performance. 

RELATED Study Shows Big Uptake of Enterprise AI and Cloud Native Technologies
As generative AI workloads and cloud native technologies proliferate, global decision-makers surveyed for the 2025 Enterprise Cloud Index cite infrastructure, security and talent issues as top deployment and scalability barriers.

February 12, 2025

AI and ML technologies are new and evolving quickly, so MLPerf performance benchmarks provide standardized measurements that can help enterprise AI systems managers choose the right technologies, best practices and strategies to meet specific needs. AI performance benchmarks can incentivize competition and drive innovation in AI technologies. MLCommons also develops benchmarks for measuring AI safety and reliability.

Transcript:

David Kanter: Data is the prime ingredient in modern machine learning. I think there was an Economist headline saying data is the new oil. Of course, where does that oil live? That oil lives on storage. Storage is kind of like Baskin-Robbins, except it's possible there might be more than 31 flavors.

Jason Lopez: David Kanter is the co-founder of MLCommons. This is the Tech Barometer podcast. I'm Jason Lopez. On this podcast: storage and the critical importance it plays in the age of AI. In this episode, David Kanter explains MLPerf Storage, a benchmark suite from MLCommons. This benchmark is designed to measure machine learning workloads in the context of storage systems, which hold the data used to train models. He says that to unlock the full potential of AI, storage systems need to be tailored to the process of machine learning, which is critical to AI training. And this is how the idea for the MLPerf Storage benchmark came about. Data fuels AI's ability to learn and make decisions. The more data you have, the more powerful your models and the greater the breakthroughs. This is at the heart of the AI scaling laws developed by Greg Diamos, a co-founder of MLCommons. His work on AI scaling laws, neural network optimization and GPU acceleration has had a big influence on the field of AI development.

RELATED Get a Grip on Data Storage in Quest for Enterprise AI
In this video interview with The Forecast, Simon Robinson, principal analyst at Enterprise Strategy Group, discusses the complexities of managing data in cloud and hybrid multicloud environments, a challenge that is growing more acute with the rise of enterprise AI applications and data.

April 2, 2025

David Kanter: The lesson of the scaling laws is that if you get enough data and you get a big enough model and enough compute to combine those together, that's when you really get these qualitatively different outcomes, whether it's a self-driving car or the ability to recognize an image better than a human. Data is really the top priority here.

Jason Lopez: But it's not just about having data.

David Kanter: It's how you process the data, how you manipulate the data. And there's a ton of work that really shows that data is a first-class citizen. And of course, data needs a place to live, and that's storage. You're going to take batches of data and feed it into your compute system. And then you're going to compute a forward pass and see how good the model is at predicting on that batch of data, compute the errors, and then adjust the model so it gets better, and then over time, the model will hopefully and often converge to an answer.
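
What Kanter describes is the standard training loop. Here is a minimal sketch in Python, using PyTorch with a toy model and random stand-in data (none of this is MLPerf code):

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-ins: in a real job these batches would stream in from storage.
dataset = TensorDataset(torch.randn(1024, 32), torch.randn(1024, 1))
loader = DataLoader(dataset, batch_size=64, shuffle=True)

model = nn.Linear(32, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

for epoch in range(10):
    for inputs, targets in loader:            # feed a batch into the compute system
        predictions = model(inputs)           # forward pass on that batch
        loss = loss_fn(predictions, targets)  # compute the errors
        optimizer.zero_grad()
        loss.backward()                       # work out how to adjust the model
        optimizer.step()                      # adjust it so it gets better
```

Every pass through that inner loop is another batch pulled off storage, which is why the data path matters as much as the math.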

Jason Lopez: After enough iterations, the model becomes good at recognizing whatever you're training it on. It has converged to an optimal solution, but it can take a lot of data, which means a lot of storage. And to take it one step further, the faster and more efficient the data storage, the better for training, inference and tuning models. In MLPerf's first storage evaluations, the Nutanix Unified Storage Platform was a benchmark leader, and Kanter commented on these benchmark tools.

David Kanter: We built up a great set of infrastructure using some tooling from Argonne National Lab to help us measure sort of that data loading for AI. And critically, we can do it without having to have accelerators. You can actually run our benchmark and ask the question, hey, what would it take to feed 4,000 accelerators without having to shell out for 4,000 accelerators?
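
The production benchmark builds on Argonne's DLIO tooling; the sketch below only illustrates the trick Kanter mentions. It emulates an accelerator's compute time with a sleep, so the storage path can be measured with no GPU present. The file name, batch size and compute time are all invented for the example:

```python
import time

BATCH_BYTES = 64 * 100 * 1024   # 64 samples at ~100 KB each (invented workload)
COMPUTE_TIME = 0.05             # seconds a real accelerator would spend per batch

busy = stalled = 0.0
with open("training_data.bin", "rb") as f:   # hypothetical dataset file
    while True:
        start = time.perf_counter()
        batch = f.read(BATCH_BYTES)          # exercise the storage system
        if not batch:
            break
        load_time = time.perf_counter() - start
        # Assume the loader prefetches while the accelerator computes, so only
        # the load time in excess of the compute time counts as a stall.
        stalled += max(0.0, load_time - COMPUTE_TIME)
        busy += COMPUTE_TIME
        time.sleep(COMPUTE_TIME)             # emulated compute, no GPU required

if busy:
    print(f"emulated accelerator utilization: {busy / (busy + stalled):.1%}")
```

Run many such emulated clients in parallel and you can ask Kanter's question, what would it take to feed 4,000 accelerators, without paying for a single one.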

Jason Lopez: That's how artificial intelligence is changing the way we process and analyze information. Making a model to see how something works is nothing new, but AI is raising the bar dramatically. It does this by requiring massive amounts of data to train the system.

David Kanter: I think the really critical thing is the size and type of data. We've got three different workloads in the benchmark: 3D images, 2D images and a scientific workload. So if we're doing image recognition for, say, smaller images, you know, each image might be 100 kilobytes or so. When you want to feed your compute system, you're going to be pulling in a batch of images, so your storage system has to be able to keep up. If you have really big images, then each image fetch is going to be a lot of data all in one fell swoop. But if you have smaller data, like let's talk about large language models, right, you're going to be working on a lot of text. That's much smaller data, and it's going to have actually a pretty different impact on the storage system.

Jason Lopez: Which leads to this insight. Image size is an issue, but it's not just about images.

David Kanter: It's actually about how many data samples, you know, whether it's a 3D volume, whether it's a 2D image, whether it's a sentence, you know, whatever it is, how many of these samples are you getting? And critically, how many accelerators can you keep busy with a given storage system? If you've got a system that has, you know, 65 of this generation of accelerators, you know, five nodes of Nutanix may be right for you. Or, you know, maybe you're looking to build a cluster that's a bit more expandable, and maybe you should get 10. But that's ultimately what the benchmark is telling us.
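
The sizing Kanter sketches is back-of-the-envelope arithmetic: each accelerator demands samples per second times bytes per sample, and the storage system's delivered throughput caps how many of them stay busy. A quick illustration, with every number invented rather than taken from an MLPerf result:

```python
SAMPLE_BYTES = 100 * 1024             # ~100 KB per 2D image
SAMPLES_PER_SEC_PER_ACCEL = 500       # rate one accelerator consumes samples
STORAGE_BYTES_PER_SEC = 5 * 1024**3   # 5 GiB/s delivered by the storage system

per_accel_demand = SAMPLE_BYTES * SAMPLES_PER_SEC_PER_ACCEL
supported = STORAGE_BYTES_PER_SEC // per_accel_demand

print(f"each accelerator needs {per_accel_demand / 1024**2:.1f} MiB/s")  # ~48.8
print(f"this storage system can keep ~{supported} accelerators busy")    # ~104
```

Swap in bigger 3D volumes or tiny text samples and the same arithmetic yields very different demands on the storage system, which is Kanter's point about workload variety.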

Jason Lopez: What the benchmark reveals helps users understand how their system choices work together. That's the big overview. But when you zoom in, the benchmarks evaluate a range of things, such as the safety of generative AI chatbot systems, measured in the AILuminate benchmark, or the performance of large language models and other AI workloads on PCs in the MLPerf Client benchmark.

David Kanter: An initiative that we've started recently at ML Commons is trying to turn our expertise in AI measurement to how can we make sure that these AI systems are going to be responsible and doing the right thing for us as we intend.

Jason Lopez: One intention, which gets a lot of headlines around AI, is safety. But another intention is building systems which don't break the bank and don't gobble up inordinate amounts of power. The owners of data centers benefit from the MLPerf storage benchmark with insights into how to manage costs or how to build data centers for optimum performance. Today, companies are rethinking how they build data centers.

David Kanter: As we're shifting into the AI era, a lot of these systems use so much power that we may have to double the footprint of data centers. Obviously, that's a big deal. And so we're seeing people looking for new sources of energy and thinking more about where data centers are with energy in mind.

Jason Lopez: Older data centers often lack the electric capacity and cooling infrastructure needed to support hardware like GPUs and to train large-scale models, which use far more power than traditional computing. The next generation of data centers needs to be powerful and efficient.

David Kanter: Some of those older generation of data centers that we built in the 90s and 2000s just don't work for AI. I think part of what we're seeing is we need new data centers that can do AI. And that's, you know, stressing the whole system. But the great news is they're actually way more efficient than what we had before. And the systems we build are more efficient. Every day we're discovering new things.

Jason Lopez: Kanter says you can't optimize what you don't measure, in this case, energy use. Training a single large AI model can produce as much carbon as five cars over their lifetimes. Tools like the MLPerf benchmarks measure AI performance per watt, helping developers choose more efficient hardware and software.

David Kanter: From a sustainability standpoint, we focused on measuring energy usage because you can measure it. We want to measure the power consumption of inference and training systems. I'm thrilled that we delivered inference power measurement a couple of years ago, and then training, which is a bit more complicated because they're bigger systems. We were able to get the first power measurements for AI training systems late last year. And we've got hopefully more coming soon with MLPerf Training. The goal of my organization in many ways is how can we measure things in the AI world and help to make AI better for everyone. And where better means faster, more capable, more energy efficient and safer.
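
The per-watt metric itself is simple arithmetic once both quantities are measured: a watt is a joule per second, so throughput divided by power gives work per unit of energy. A toy illustration with invented figures:

```python
samples_per_second = 12_000   # measured training throughput (invented)
average_power_watts = 4_000   # measured wall power of the system (invented)

# samples/sec divided by joules/sec leaves samples per joule of energy.
print(f"{samples_per_second / average_power_watts:.1f} samples per joule")
```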

Jason Lopez: David Kanter is the co-founder of MLCommons, the organization that developed the benchmark suite MLPerf Storage. We've previously interviewed some of the key players in this organization for Tech Barometer: Debo Dutta, chief AI officer at Nutanix; Greg Diamos, one of the key developers behind generative AI apps and a discoverer of scaling laws; and Alex Karargyris, the co-founder and co-chair of the MLCommons Medical Working Group. Look for these podcasts and print stories at our Forecast news page, theforecastbynutanix.com. That's all one word, theforecastbynutanix.com. Tech Barometer is a production of The Forecast. I'm Jason Lopez. Thanks for listening.

Jason Lopez is executive producer of Tech Barometer, the podcast outlet for The Forecast. He’s the founder of Connected Social Media. Previously, he was executive producer at PodTech and a reporter at NPR.

Ken Kaplan contributed to this podcast. 

© 2025 Nutanix, Inc. All rights reserved. For additional information and important legal disclaimers, please go here.
