StorageMojo has been writing about latency and flash arrays for years (see The SSD write cliff in real life), with a focus on data from the TPC-C and SPC-1 benchmarks. The folks at Violin Memory asked me to create a Video White Paper to discuss the problem in a bite-size chunk.
Latency is the long pole
Work by Steven Swanson and Adrian M. Caulfield at the University of California San Diego found that with a 4 KB disk access, the standard Linux software stack accounted for just 0.3% of the latency and 0.4% of the energy consumption. With flash, however, the same software stack accounted for 70% of the latency and 87.7% of the energy consumed.
Clearly, the software stack issue belongs to no single company. But array vendors can help by reducing the latency inside their products.
That’s why open and documented benchmarks are important. It is too bad that the erstwhile industry leader, EMC, doesn’t offer either benchmark, unlike other major vendors.
The StorageMojo take
The Violin engineering team has done an admirable job of reducing their array’s latency, as measured in TPC-C benchmarks. Not merely the average latency – which almost any flash array can keep under 1ms – but maximum latency as well, at 5ms or less.
Compare that to an early 2015 filing by a major storage company for its flagship flash storage array. The Executive Summary shows that the array’s average latency was under 1 ms.
Impressive and reassuring. However, in the Response Time Frequency Distribution Data we see what the average response times don’t tell: millions of I/Os took over 5 ms and hundreds took over 30 ms – and perhaps much longer, since the SPC-1 report groups them all in one “over 30 ms” bucket.
The basic insight of Statistical Process Control, that reduced component variability improves system quality, applies to computer systems as well. Reduced maximum latency and sustained IOPS are key metrics for improving system performance and availability.
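To see how an average can hide a long tail, here is a minimal Python sketch. The sample data is invented for illustration and does not come from any SPC-1 or TPC-C report; the function name latency_summary and the 5 ms and 30 ms thresholds are my own choices, picked to echo the buckets discussed above.

```python
import math
import statistics


def latency_summary(latencies_ms, thresholds_ms=(5.0, 30.0)):
    """Summarize per-I/O response times given in milliseconds."""
    ordered = sorted(latencies_ms)
    n = len(ordered)

    def percentile(p):
        # Nearest-rank percentile: the smallest sample such that at least
        # p percent of all samples are less than or equal to it.
        rank = max(1, math.ceil(n * p / 100))
        return ordered[rank - 1]

    return {
        "count": n,
        "mean_ms": statistics.fmean(ordered),
        "p99_ms": percentile(99),
        "max_ms": ordered[-1],
        # How many I/Os exceeded each threshold.
        "over_threshold": {
            t: sum(1 for x in ordered if x > t) for t in thresholds_ms
        },
    }


# Hypothetical workload: 9,990 fast I/Os at 0.5 ms plus 10 stragglers at 40 ms.
sample = [0.5] * 9990 + [40.0] * 10
summary = latency_summary(sample)

for key, value in summary.items():
    print(key, value)
```

In this made-up sample the mean stays well under 1 ms and even the 99th percentile looks healthy, yet ten I/Os took 40 ms. Only the maximum and the over-threshold counts expose them, which is why the frequency distribution matters more than the executive summary.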
Courteous comments welcome, of course. Violin paid StorageMojo to produce the video; however, the opinions expressed are my own.
Near the end, there, you probably meant “1 millisecond”, “5 milliseconds”, “30 milliseconds”, but wrote “seconds” every time. Unless the Major Storage Company is using some really slow technology 🙂
Bryan, sad to say, but I meant seconds. Update: Boy, is my face red. Mr. Pendleton is correct and I was wrong! I’ve corrected the post to say ms instead of seconds. I think I needed the break I took over the last few weeks . . . . End update. These long-tail I/Os on some systems are unbelievably long.
You are a programmer. How do you handle such long delays in your code?
Robin
Helpful video. Do you think VMEM can grow its customer account and revenue? Its recent quarter didn’t show much progress on the customer growth side.
Thanks