Disk Vendors Dig 10 Petabyte Hole; Throw Selves In

by Robin Harris on Monday, 5 March, 2007

Optics? We don’t need no stinkin’ optics.
Computerworld reported Friday that drive vendors have very little to say about their MTBF specs. (BTW, link points to a more easily read version, one that also links to StorageMojo, which is how I found out about it.)

Drive vendors decline interviews
Computerworld did get emails from two of the major vendors, Seagate and Hitachi Global Storage Technologies, which acquired the IBM disk business a couple of years ago.

Quoting Computerworld:

“The conditions that surround true drive failures are complicated and require a detailed failure analysis to determine what the failure mechanisms were,” said a spokesperson for Seagate Technology in Scotts Valley, Calif., in an e-mail. “It is important to not only understand the kind of drive being used, but the system or environment in which it was placed and its workload.”

Hitachi offered a similar rationale:

“Regarding various reliability rate questions, it’s difficult to provide generalities,” said a spokesperson for Hitachi Global Storage Technologies in San Jose, in an e-mail. “We work with each of our customers on an individual basis within their specific environments, and the resulting data is confidential.”

Horsepucky!
Both statements are transparently silly on their faces, although Seagate rates a barely passing D to Hitachi’s failing grade. Let’s deconstruct these statements:

“. . . difficult to provide generalities.” An MTBF or AFR spec is a generality. You provide those.

“. . . various reliability rate questions. . .” There are two important ones: what is the MTBF/AFR of your drives that you actually observe from your warranty returns, after adjusting for No Trouble Found rates; and how do failure rates correlate with age? Can you get me those?

“. . . the resulting data is confidential.” Strip out the customer data, aggregate across customers, and I’m sure your customers would have no objection.

“. . . true drive failures. . .” The studies found that even after adjusting for reported No Trouble Found rates, drive AFRs are significantly higher than spec. From the comments I’ve seen, most customers are willing to accept that the drive isn’t always at fault. But that still leaves a big gap.

“. . . detailed failure analysis to determine what the failure mechanisms were . . .”
Nope. We’re interested in failure rates, not causes. Do you have those observed AFRs anywhere? Just send them to me and I’ll publish them, free of charge. In big caps.

“. . . the system or environment in which it was placed and its workload. . . .”
For Google, we know they use class A data center space, three drives per server, mounted horizontally, running sequential writes and shorter reads. HPC is not all that different except for more drives per enclosure. Oh, and hey, shouldn’t slower drives with lower IOPS have LESS wear than their faster brethren? Just a thought.
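For readers who want to put numbers on the MTBF and AFR terms used above, here is a minimal sketch of the arithmetic. It assumes a constant failure rate (the usual simplification behind an MTBF spec) and drives powered on all year. The function names, the MTBF values and the fleet figures are illustrative assumptions of mine, not vendor data or numbers from the studies.

```python
import math

HOURS_PER_YEAR = 8760  # 24 * 365; assumes drives are powered on all year


def mtbf_to_afr(mtbf_hours: float) -> float:
    """Annualized failure rate implied by an MTBF spec.

    Assumes a constant failure rate (exponential lifetime model).
    """
    return 1.0 - math.exp(-HOURS_PER_YEAR / mtbf_hours)


def observed_afr(returns: int, ntf_rate: float, drive_years: float) -> float:
    """Observed AFR from warranty returns, discounting No Trouble Found units.

    returns     -- drives returned under warranty (hypothetical input)
    ntf_rate    -- fraction of returns where no fault was found
    drive_years -- total drive-years in service over the same period
    """
    confirmed_failures = returns * (1.0 - ntf_rate)
    return confirmed_failures / drive_years


if __name__ == "__main__":
    # Illustrative MTBF figures, not quoted from any data sheet.
    for mtbf in (25_000, 250_000, 1_000_000):
        print(f"MTBF {mtbf:>9,} h -> implied AFR {mtbf_to_afr(mtbf):.2%}")

    # Hypothetical fleet: 300 returns, 40% No Trouble Found, 10,000 drive-years.
    print(f"Observed AFR: {observed_afr(300, 0.40, 10_000):.2%}")
```

The gap between the two functions' outputs is the number I am asking the vendors for: what the spec implies versus what the returns show once the No Trouble Found units are taken out.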

The StorageMojo take
C’mon guys, it is time to go into crisis management mode before this spirals out of control and permanently damages customer confidence in both you and your array vendor customers. And that means a lot more transparency.

I have the highest respect for the modern disk drive. It is a marvel of high technology and volume manufacturing. As I’ve noted more than once, the disk drive industry is the unsung hero of information technology. Drive vendors should be very proud of what they have achieved.

Yet, as I wrote in Truth: The Ultimate Marketing Tool

. . . every marketer should ask: “what information could my customer use that would help them make the best decision, not for me, but for them?” As a vendor you usually have access to more information and resources than most of your customers. Use those resources to help your customer make better decisions, even if that decision is to not buy right now, and you’ll build a better relationship and a better brand. And save the perfume for those last-ditch efforts.

Please, don’t go down in flames over this. You are better than you now look.

Update: David Morgenstern over at eWeek has a different take here. I made some minor wording changes as well.

Comments welcome, as always. I try to respond to comments, but network issues are hampering me this week.


Rex March 12, 2007 at 9:47 am

Look at this from a different perspective.

These papers have created more FUD in the storage market – disk drives and arrays are not as reliable as we were led to believe. How will storage customers respond?

We could switch to a different brand of hard drive – except that we are not given the choice (try telling Apple what brand of hard drive to put in the XServe RAID); and we can’t trust any vendor’s MTBF numbers anyway (is Hitachi better than Seagate?).

So we need to go to RAID 6, or Google-like triple-redundant storage, and cross our fingers. Who wins in that situation? Surprise – the storage vendors who get to ship a lot more hard drives!

Seems to me the hard drive vendors don’t have much incentive to respond.

Robin Harris March 12, 2007 at 7:54 pm

Rex,

Excellent point. Given the reality of drive failure, redundancy is the only answer. I’m reluctantly persuaded that drive capacities have reached a point where triple redundancy is required for true high availability.

Twenty-five years ago drive MTBFs were on the order of 25,000 hours – a tenth of today’s consumer drives. Data volumes were much smaller, true, and yet sysadmins managed to protect data. Vendors have backed themselves into a self-imposed corner with their clumsy attempts to spin the facts. Customers can handle the truth. It is up to vendors to tell it.

Robin
