You can go about your business
Chuck Hollis of EMC took up the challenge of responding to the Open Letter about the big differences between drive life specs and what research on 200,000 drives found (see Everything You Know About Disks Is Wrong and Google’s Disk Failure Experience).

Notable in the EMC response: They didn’t deny that the research is correct.

The Force is strong with this one
Chuck’s response is both hilarious and remarkably wide of the mark. In a parody of blog style, Chuck does a “gee-whiz, what’s all the fuss about” schtick that, if it didn’t reflect on EMC’s marketing competence, would be merely goofy. Like this bit:

Maybe you saw the interesting white paper from a team at Google.

They tracked a population of disk drives over a period of five years, and concluded “hey, the data doesn’t really match up to what we might have thought”.

Fair enough.

And then the blogging started. Responses to responses. Vendor posturing.

Many of us took a look at this and thought “sheesh, what’s the big deal?”

Fair enough. Here’s the big deal.
First, there were two studies: one from Google, one from Carnegie Mellon University. Serious techies. Key findings from the two papers (and, for Chuck’s benefit, a Venn diagram of their topics doesn’t overlap 100%) were:

  • Disk drives have a field failure rate 2-4 times the vendors’ spec.
  • Reliability of cheap “consumer” drives and expensive “enterprise” drives is about the same, despite differences of several hundred thousand hours in their MTBFs.
  • Drive failures are highly correlated, violating a chief assumption behind the data protection of RAID systems.
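The first two findings are easier to weigh with the arithmetic behind a spec sheet in view. What follows is a minimal sketch, assuming the common constant-failure-rate (exponential) model that turns an MTBF into an annualized failure rate (AFR). The MTBF values and the function names `nominal_afr` and `expected_failures` are hypothetical illustrations, not figures or methods from either study or any vendor.

```python
import math

HOURS_PER_YEAR = 8760  # 365 days x 24 hours, i.e. 24x7 operation


def nominal_afr(mtbf_hours: float) -> float:
    """AFR implied by a spec-sheet MTBF, assuming a constant
    (exponential) failure rate over one year of continuous use."""
    return 1.0 - math.exp(-HOURS_PER_YEAR / mtbf_hours)


def expected_failures(drive_count: int, afr: float) -> float:
    """Expected number of drive failures per year in a population."""
    return drive_count * afr


if __name__ == "__main__":
    # Hypothetical spec MTBFs for a "consumer" and an "enterprise" drive.
    for label, mtbf in (("consumer", 600_000), ("enterprise", 1_200_000)):
        spec = nominal_afr(mtbf)
        # Multiplying the AFR by 2-4 is a rough approximation that
        # only holds while the rates stay small, as they do here.
        low, high = 2 * spec, 4 * spec
        print(f"{label}: MTBF {mtbf:,} h -> spec AFR {spec:.2%}, "
              f"field at 2-4x spec: {low:.2%} to {high:.2%}")
        print(f"  per 10,000 drives/yr: "
              f"{expected_failures(10_000, spec):.0f} (spec) vs "
              f"{expected_failures(10_000, low):.0f}-"
              f"{expected_failures(10_000, high):.0f} (field)")
```

With these hypothetical numbers, a 1,200,000-hour drive failing at 2-4 times its spec rate does no better than the nominal rate of a 600,000-hour drive, which is why a large MTBF gap on paper can vanish in the field.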

Or to put it in the context of Chuck’s response:
“hey, the data [100,000 drives] doesn’t really match up to what we might have thought [based on the vendors’ specs]”.
Chuck’s response: Hey, stuff happens. Who knew? Who cares? Yawn.

But sir, nobody worries about upsetting a droid.
I asked the array companies to respond because I thought that, with the millions of drives they buy each year and their field service experience, they could offer unique insight into the validity of the two studies. I even offered a marketing line that I thought EMC would find attractive:

These academic studies may reflect the conditions seen in these point-off-the-enterprise-curve installations, but thanks to our superior supply-chain management, manufacturing, test, burn-in and skilled field service we’ve never observed these effects. Here to give an in-depth review of our service experience is our director of field service engineering. Thank you for giving us the opportunity to highlight our operational superiority.

That’s ’cause droids don’t pull people’s arms out of their sockets when they lose.
Note, of course, that this means denying that the company has seen these effects. And that’s the rub, isn’t it? Because if you have seen these effects and you haven’t communicated them to customers, at least through Sales Engineers, then you are at least a tiny bit complicit in the fictions the drive vendors are peddling. If you haven’t seen these effects, then why wouldn’t you just step up and say, “hogwash!”

I used to bullseye womp rats in my T-16 back home.
When I wrote to Chuck, asking for a response, I said “This is an opportunity for EMC to take a leadership role in sorting this out.” EMC’s response: “we’ll pass.”

You will never find a more wretched hive of scum and villainy. We must be cautious.
Chuck goes on to suggest that one of the more “strident” blogs may have:

. . . had their pattern recognition circuitry turned up a bit too high. Either that, or they thought that by being controversial, they could increase their presence in the community. . . .

Do I think there is a conspiracy among vendors to mislead the public?

Don’t be ridiculous.

You guys are giving us way too much credit here.

The StorageMojo take
There are two sides to every story, but only one set of facts. I had hoped that EMC would offer some – any – in its response.

I’ve spent most of my working life in large companies and I respect what they can accomplish. I also have a well-honed appreciation of their many failings, including how group dynamics can trump even the best intentions, such as these from EMC’s website:

We pride ourselves on doing what’s right and on putting our customers’ best interests first. We lead change and change to lead. We are devoted to advancing our people, customers, industry, and community. We say what we mean and do what we say.

Chuck, granted, you didn’t have much time to respond. But you and I both know that there are people inside EMC who know the answers to the questions these studies have raised. So here’s a suggestion: go and get the data and then respond. I’m sure that many EMC customers would appreciate the effort.

Update: Chuck kindly wrote to assure me that he was NOT responding to the Open Letter:

I was responding to the many, many bloggers who’ve commented on the topic, and not you personally or specifically.

Thanks for clarifying that, Chuck. I stand corrected.

Update II: I try to reply to comments, such as the excellent ones this post received, and as I did so I realized that my confusion about whether Chuck was responding to me was understandable, since in an email to me he said:

Hi Robin,

Had a chance to review all of the posts, the original white paper, etc. and I’ve responded from a personal perspective here: [url]

Make of it what you will.

Comments welcome, from one and all, in agreement or not. Moderation turned on because moderation is a virtue, except in the defense of liberty.