EMC’s missing petabytes: the cost of short stroking

by Robin Harris on Tuesday, 10 February, 2015

A couple of weeks ago StorageMojo learned that although a VMAX 20k can support up to 2,400 3TB drives, it can only address ≈2PB. Where did the remaining 5 petabytes go?

Some theories were advanced in the comments, and I spoke to other people about the mystery. No one would speak on the record, but here’s the gist of the received wisdom.

Different strokes for different folks
Short stroking is the short and best answer. Short stroking uses only the outermost tracks – the fastest, densest, and most capacious tracks – to punch up drive performance.

By reducing head seek time and maximizing data transfer rates, a short-stroked drive gets more IOPS and faster transfers. Wonderful!

But at what cost? The 20k’s numbers serve as a first approximation.

Assuming a max’d out VMAX 20k, but using 80 SSDs, that leaves 2,320 3TB 3.5″ drives, for a raw capacity of 6,960TB. Assuming 8-drive RAID 6 LUNs (6 data drives plus 2 parity), we get a dual-parity protected capacity of 5,220TB. Taking EMC’s spec of an open system RAID 6 capacity of 2,067TB and dividing that by 5,220TB gives us 39.6% capacity efficiency, which would use roughly the outer 0.8″ of a 3.5″ platter. That would certainly improve IOPS and transfer rate.
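
For readers who want to check the arithmetic above, here is a minimal Python sketch. The function name and parameters are illustrative only; the inputs are the figures quoted in the paragraph, and the 6+2 layout is the assumed 8-drive RAID 6 group.

```python
def raid6_efficiency(total_slots, ssd_count, drive_tb, raid_group, usable_spec_tb):
    """Return (raw TB, RAID 6 protected TB, usable/protected ratio)."""
    hdd_count = total_slots - ssd_count            # slots left for hard drives
    raw_tb = hdd_count * drive_tb                  # raw HDD capacity
    # RAID 6 gives up two drives per group to parity.
    protected_tb = raw_tb * (raid_group - 2) / raid_group
    return raw_tb, protected_tb, usable_spec_tb / protected_tb


# Figures from the post: 2,400 slots, 80 SSDs, 3TB drives,
# 8-drive RAID 6 groups, EMC's 2,067TB open systems spec.
raw_tb, protected_tb, efficiency = raid6_efficiency(2400, 80, 3, 8, 2067)
print(f"raw: {raw_tb}TB, protected: {protected_tb:.0f}TB, "
      f"efficiency: {efficiency:.1%}")
```

Changing the RAID group size or the SSD count shows how sensitive the efficiency figure is to those assumptions.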

Research indicates that the 2,320 disks are roughly half of the total BOM cost. Thus, if you pay $1.4m (not including software) for a fully loaded 20k, $700k goes for the raw 7PB. Since you only get 2PB usable, you are paying ≈$350k – depending on your discount, of course – per short stroked PB of capacity.
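
The cost figure can be sketched the same way. This is a rough calculation under the post's own assumptions ($1.4m system price, disks at half the BOM, ≈2PB usable); the names are illustrative, and your discount will move every number.

```python
def cost_per_pb(disk_cost, capacity_pb):
    """Dollars of disk spend per petabyte of the given capacity."""
    return disk_cost / capacity_pb


system_price = 1_400_000      # fully loaded 20k, not including software
disk_share = 0.5              # disks assumed to be half of total BOM cost
disk_cost = system_price * disk_share

raw_pb = 6.96                 # 2,320 drives x 3TB
usable_pb = 2                 # the post's rounded figure; EMC's spec is 2,067TB

per_raw_pb = cost_per_pb(disk_cost, raw_pb)
per_usable_pb = cost_per_pb(disk_cost, usable_pb)
print(f"per raw PB: ${per_raw_pb:,.0f}, per usable PB: ${per_usable_pb:,.0f}")
```

On these assumptions the short stroked capacity costs roughly three and a half times as much per petabyte as the raw capacity does.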

The StorageMojo take
We already knew legacy arrays were expensive. What’s really interesting is that even with short stroking, 15k disks would be hard-pressed to do more than 600 IOPS each, or, generously, about 1.4m IOPS across all 2,320 drives. EMC promises “millions of IOPS” from the 20k, so even with short stroking, it’s likely that much of the system’s total performance comes from its caching and SSDs rather than the costly short stroked disks.
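
A quick sketch of the disk IOPS ceiling above. The 2,000,000 target is purely an assumed stand-in for “millions of IOPS”, not an EMC figure, and the names are illustrative.

```python
def disk_iops_ceiling(drive_count, iops_per_drive):
    """Upper bound on aggregate IOPS if every drive hits its per-drive limit."""
    return drive_count * iops_per_drive


ceiling = disk_iops_ceiling(2320, 600)   # 2,320 drives at a generous 600 IOPS each

# ASSUMPTION: the smallest number that could be called "millions of IOPS".
ASSUMED_TARGET_IOPS = 2_000_000
shortfall = ASSUMED_TARGET_IOPS - ceiling

print(f"disk ceiling: {ceiling:,} IOPS")
print(f"left for cache and SSDs to supply: {shortfall:,} IOPS")
```

Even at the most modest reading of the claim, several hundred thousand IOPS would have to come from somewhere other than the spinning disks.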

Before you buy your next VMAX or other legacy architecture disk array, take a hard look at the cost of short stroked disks. You can do much better, with less complexity, with an all-flash solution at an equal or lower cost. Not to mention the lower OpEx from reduced floor space, power, cooling and maintenance.

Courteous comments welcome, of course. EMC’ers and others are welcome to offer their perspectives on this analysis. Update: Note that the missing petabytes come with using 7200RPM drives, not 15k drives. End update.

{ 6 comments… read them below or add one }

John_M February 11, 2015 at 12:52 am

This is about marketing. The EMC VMAX 20k has a hard addressable limit of 2PB, but they can claim to support many more disks than their competitors, and of course bigger is always better in the world of marketing. Never mind the fact that you’ll run out of both engine horsepower and addressable memory way before that limit.

Andy February 11, 2015 at 10:27 am

Should those be PB’s not TB’s in the para beginning ‘Research indicates…’ ? Otherwise, these TB’s are mighty expensive!

Robin Harris February 11, 2015 at 12:39 pm

Oops! Andy, thanks for the catch!

Robin

Rob February 12, 2015 at 9:20 am

I would say some sort of performance consideration. But they have tiering, so why wouldn’t the cooler data migrate to the inner tracks? Your hottest blocks are in the SSD tier. It could be something as simple as the added depth of the B-trees needed to track ALL the tracks, which would either introduce a performance issue to trundle through OR hit an architectural limit. Yes, pure speculation, but I would have no idea why cooler tracks couldn’t be aged off to the less performing tracks on the drives and make use of them. Similar to what John_M mentions above.

matt February 18, 2015 at 1:56 pm

3.5″ disks lose ~1″ to the spindle, bearings and media clamp area. The proper way to calculate how much data is storeable in the various stripes is as follows (where we divide the linear sweep distance of the head into thirds) and diameter of platters is ~3.25 with 1″ spindle:

Outer 1/3 = GB * 15/29
Mid = GB * 10/29
Inner = GB * 4/29

So for a 2TB drive 2000*15/29=1034GB. So the first 1TB will land in the outer third of the drive. The next 690GB will occupy the mid-tracks, and the last 275GB will occupy the innermost region.

Short-stroking (using outer 1/3) gains you a 70-85% increase over full-stroke IOPs. For a 15K drive 167 IOPs becomes 300. For a 7.5K drive 74 becomes 140. Using 1/2 of the platter: 15K 167->250, 7.5K 74->114.

Now that 4TB drives have a reasonably proven track record I prefer to buy them and then just use up to 1/2 of the space in my LUNs. Unless of course I don’t mind them being really slow.

Mike February 20, 2015 at 9:01 am

“Different strokes for different folks,” is a great summation. I generally see folks buying large drives for the purpose of dense capacity–when people need increased performance there are many better* ways to achieve it than spending money on extra bits, racks, amps, BTU, and floor tiles that aren’t going to be used efficiently.

If a customer expresses interest in short stroking drives on an enterprise array, it’s usually easy to change their mind with a simple, “what exactly is it that you’re trying to achieve here?”

If they continue to show interest in short-stroking, I pull out a white paper that EMC published in 2008 (DMX-4 Enterprise Flash with Microsoft Exchange) and show them that even EMC thinks it’s a bad idea.

*(I asterisked “better” out of appreciation for the folks who have plenty of reason to have a differing opinion.)
