Finally, some drive model failure numbers

by Robin Harris on Thursday, 5 April, 2007

I wish I could report that one of the big guys stepped up . . .
Instead, Jon Bach of Puget Custom Computers in suburban Seattle offered his company’s data in his blog post titled Why RAID is (usually) a Terrible Idea. It is a good post and well worth reading in its entirety.

I liked it so much that I blogged about it on my ZDNet blog yesterday. Here I’m focusing on the drive numbers.

What are these numbers?
PCC sells hundreds of desktop systems per month and they track all failures and trouble tickets. Jon’s numbers include ALL drive failures, including those caused by mishandling, like when a WD Raptor got dropped on the warehouse floor.

Jon writes: “Here is the data I have for our hard drive sales in the last year, where we have sold at least 200 units:”

Hard Drive Model                          # of Units   Failure %
Seagate Barracuda 7200.9 250GB SATAII            280       3.21%
Seagate SATA Barracuda 80GB                      271       2.58%
Western Digital SATA Raptor 74GB                 592       2.03%
Seagate Barracuda 7200.10 320GB SATAII           202       1.98%
Seagate Barracuda 7200.9 160GB SATAII            265       1.89%
Seagate Barracuda 7200.9 80GB SATAII             403       1.74%
Western Digital ATA100 80.0GB WD800JB            290       1.72%
Western Digital SATA Raptor 150GB                278       1.44%
Total # of drives                               2581       2.05%
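For readers who want to check the total row, here is a minimal Python sketch. It assumes each published percentage is simply failures divided by units, rounded to two decimals, so the whole-drive failure counts it prints are inferred, not reported by PCC. The names DRIVES and implied_failures are made up for this illustration.

```python
# (model, units sold, reported first-year failure %), copied from the table.
DRIVES = [
    ("Seagate Barracuda 7200.9 250GB SATAII", 280, 3.21),
    ("Seagate SATA Barracuda 80GB", 271, 2.58),
    ("Western Digital SATA Raptor 74GB", 592, 2.03),
    ("Seagate Barracuda 7200.10 320GB SATAII", 202, 1.98),
    ("Seagate Barracuda 7200.9 160GB SATAII", 265, 1.89),
    ("Seagate Barracuda 7200.9 80GB SATAII", 403, 1.74),
    ("Western Digital ATA100 80.0GB WD800JB", 290, 1.72),
    ("Western Digital SATA Raptor 150GB", 278, 1.44),
]


def implied_failures(units, pct):
    """Recover a whole-drive failure count from a rounded percentage.

    Assumption: pct was computed as 100 * failures / units and then
    rounded, so rounding the product back gives the original count.
    """
    return round(units * pct / 100)


total_units = sum(units for _, units, _ in DRIVES)
total_failures = sum(implied_failures(units, pct) for _, units, pct in DRIVES)

# Pooled rate: all failures over all drives, which weights each model
# by how many units were sold (not a plain average of the percentages).
pooled_pct = 100 * total_failures / total_units

for model, units, pct in DRIVES:
    print(f"{model:<40} {implied_failures(units, pct):>3} of {units}")
print(f"Pooled: {total_failures} of {total_units} = {pooled_pct:.2f}%")
```

Under that assumption the inferred counts add up to 53 failures out of 2581 drives, which matches the 2.05% in the total row.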

These are all first-year numbers, and I think they show how reliable disk drives are as a group. Make no mistake: disk drives are probably the greatest IT bargain out there. Drive companies have done a great job making massive storage affordable.

I added the total
Maybe one of my statistically smarter readers can do more with these numbers. As I look at the numbers, though, I see a mix of desktop and server drives with no particular pattern – a result that agrees with Bianca Schroeder’s paper from FAST ’07. Any other conclusions readers can reach?

Let us all know in the comments.
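As one starting point, here is a rough Python sketch that puts a 95% Wilson score interval around the highest and lowest failure rates in the table. It rests on assumptions that are not in Jon's data: the failure counts (9 of 280 and 4 of 278) are inferred by rounding units times percentage, each drive is treated as an independent trial, and handling damage is counted like any other failure. The function name wilson_interval is just for this illustration.

```python
import math


def wilson_interval(failures, units, z=1.96):
    """Wilson score interval for a binomial proportion.

    z=1.96 corresponds to roughly 95% confidence. Returns (low, high)
    as fractions, not percentages.
    """
    p = failures / units
    denom = 1 + z * z / units
    center = (p + z * z / (2 * units)) / denom
    half = z * math.sqrt(p * (1 - p) / units + z * z / (4 * units * units)) / denom
    return center - half, center + half


# Inferred counts (assumption): round(units * reported_pct / 100).
worst = ("Seagate Barracuda 7200.9 250GB SATAII", 9, 280)   # 3.21% reported
best = ("Western Digital SATA Raptor 150GB", 4, 278)        # 1.44% reported

worst_low, worst_high = wilson_interval(worst[1], worst[2])
best_low, best_high = wilson_interval(best[1], best[2])

for name, (low, high) in ((worst[0], (worst_low, worst_high)),
                          (best[0], (best_low, best_high))):
    print(f"{name}: {100 * low:.1f}% to {100 * high:.1f}%")

# If the two intervals overlap, samples of this size cannot clearly
# separate the "worst" model from the "best" one.
intervals_overlap = worst_low < best_high
print("Intervals overlap:", intervals_overlap)
```

With only a few hundred units per model, the two intervals overlap, which fits the "no particular pattern" reading. It is a sanity check on sample size, not a proper reliability analysis.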

The StorageMojo take
It isn’t clear to me why folks who have the data about drive model reliability don’t want to publish it. Maybe they don’t want the hassle of customers requesting specific drives. Maybe all the drive and array makers do back-room deals where they take volumes of not-as-good drives for knock-down prices and shovel them off to less-favored customers. Who knows?

Perhaps StorageMojo readers who have businesses like Jon’s or who work in corporate IT with access to failure data could pass it on to me. I’ll total them up and publish them. If a vendor doesn’t like the numbers then they can send me their own.

From a statistical perspective that’s a little rough, but we have to start somewhere.

Comments, as always, welcome. Moderation turned on to keep spam at bay.

{ 6 comments… read them below or add one }

Darren Embry April 13, 2007 at 10:58 am

Google, for example, might not have wanted to publish brand reliability data in their report because their competitors would like to be able to use it.

aDEPT April 16, 2007 at 9:19 pm

I’m not sure why there is this big fuss about drive failures in the context of the commercial storage industry.

If I was to buy a drive for my home system, I don’t want the drive to fail, but in a work environment (with RAID) I don’t just purchase the drive but a support contract that makes a drive failure the vendor’s problem.

The real failure rates would be useful in calculating the real risk of data loss. I have a discussion going with one customer on how many spare SATA drives they need to provide a given level of protection for their email archive. But once the chance of data loss is below an acceptable threshold, that is as far as it goes.

If EMC/HDS/IBM/NetApp decide to use a drive with a high failure rate, then there is no additional cost to me: an engineer shows up with a drive and swaps it. If vendor X uses a crummy drive and has to swap 5 drives a year, and vendor Y only does 2, then it is vendor X who gets hit on the bottom line. I have already chosen a vendor based on cost, features and how generous the company’s salesperson was.

Jose Pinto May 6, 2007 at 3:44 am

Hi Mr. Robin Harris.
First of all, thank you very much for your information. Information about hard disks is very hard to find, and information about their quality and reliability is harder still. I will translate your information for Brazilian readers (with credit, of course), because here in Brazil there is nothing about hard disks. As I am a hard disk hardware repair worker and the only one who writes about disks, I think I need to pass on what you wrote because, as you say, we need to start somewhere.
Thank you very much. All your articles are very good; I feel like I am in a hard disk library.
Best regards
Jose Pinto

kace May 28, 2007 at 9:26 pm

I know a little about statistics. Hard data is great. But, this data is somewhat limited and people should be careful about drawing sweeping conclusions based on it. The biggest issue is that for reliability estimates you really want to know how long until you saw the failure. Granted, you said these are all within the first year. But, the typical assumption is that each model has a mean time to failure and you’d want to estimate that based on the observed failure times, in addition to the observed failure rate.

Also, including “handling errors” in the data is a mistake because it doesn’t help to answer the question I think you’re interested in: how long before an in-service drive fails? (For that matter, statisticians often choose to disregard “infant mortality” failures, i.e., very early failures which may indicate a manufacturing defect rather than a wear-and-tear, reliability-based failure. I think that technique might have been useful here, because how many of these failures may have been based on shipping damage or installation damage (tech. adds/removes components to new machine and damages drive in the process)? )

Anyway, that’s what statisticians do: complain about the data. 🙂 Thanks for the interesting post.

Tim December 4, 2007 at 3:12 am

I guess the only thing I can see with the numbers is: what were the drives doing? If the SATA drives were doing high-speed streaming data (yes, I have a customer doing this), then I am impressed with how low the numbers are. On the other hand, if your FC disks are doing non-critical work (tier 2 or 3 storage; yes, same customer), then that too is good to know.

Nonetheless, the info is great, thanks!

xila April 3, 2008 at 12:57 pm

The information I found on hard disk failure is not enough. It covers the common failures I usually come across, like the noise a hard drive makes when it is about to die. What I would like to know is this: if a hard drive fails and takes your data down with it, what are the chances of getting the data back? And how would one troubleshoot that problem? Please help, I’m in a crisis.
