StorageMojo




Robin Harris    


Finally, some drive model failure numbers

April 5th, 2007 by Robin Harris in Backup, Enterprise

I wish I could report that one of the big guys stepped up . . .
Instead, Jon Bach of Puget Custom Computers in suburban Seattle offered his company’s data in a blog post titled “Why RAID is (usually) a Terrible Idea.” It is a good post and well worth reading in its entirety.

I liked it so much that I blogged about it on my ZDNet blog yesterday. Here I’m focusing on the drive numbers.

What are these numbers?
PCC sells hundreds of desktop systems per month and they track all failures and trouble tickets. Jon’s numbers include ALL drive failures, including those caused by mishandling, like when a WD Raptor got dropped on the warehouse floor.

Here is the data I have for our hard drive sales in the last year, where we have sold at least 200 units:

Hard Drive Model                       | # of Units | Failure %
Seagate Barracuda 7200.9 250GB SATAII  | 280        | 3.21%
Seagate SATA Barracuda 80GB            | 271        | 2.58%
Western Digital SATA Raptor 74GB       | 592        | 2.03%
Seagate Barracuda 7200.10 320GB SATAII | 202        | 1.98%
Seagate Barracuda 7200.9 160GB SATAII  | 265        | 1.89%
Seagate Barracuda 7200.9 80GB SATAII   | 403        | 1.74%
Western Digital ATA100 80.0GB WD800JB  | 290        | 1.72%
Western Digital SATA Raptor 150GB      | 278        | 1.44%
Total # of drives                      | 2581       | 2.05%

These are all first year numbers. And I think they show how reliable disk drives as a group are. Make no mistake: disk drives are probably the greatest IT bargain out there. Drive companies have done a great job making massive storage affordable.

I added the total
Maybe one of my statistically smarter readers can do more with these numbers. As I look at the numbers, though, I see a mix of desktop and server drives with no particular pattern - a result that agrees with Bianca Schroeder’s paper from FAST ’07. Any other conclusions readers can reach?

Let us all know in the comments.
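
As one rough starting point for that kind of analysis, here is a minimal Python sketch. It is not part of Jon's post or of the original write-up. It assumes each published percentage is a whole number of failures divided by units sold, so it recovers the failure counts by rounding, recomputes the pooled total, and attaches a 95% Wilson score interval to each model. The names DRIVES, wilson_interval and summarize are made up for illustration.

```python
import math

# (model, units sold, reported first-year failure %), copied from the table above.
DRIVES = [
    ("Seagate Barracuda 7200.9 250GB SATAII", 280, 3.21),
    ("Seagate SATA Barracuda 80GB", 271, 2.58),
    ("Western Digital SATA Raptor 74GB", 592, 2.03),
    ("Seagate Barracuda 7200.10 320GB SATAII", 202, 1.98),
    ("Seagate Barracuda 7200.9 160GB SATAII", 265, 1.89),
    ("Seagate Barracuda 7200.9 80GB SATAII", 403, 1.74),
    ("Western Digital ATA100 80.0GB WD800JB", 290, 1.72),
    ("Western Digital SATA Raptor 150GB", 278, 1.44),
]


def wilson_interval(failures, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    p = failures / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


def summarize(drives):
    """Return (model, units, failures, low, high) for each drive model."""
    rows = []
    for model, units, pct in drives:
        # Assumption: the published percentage is failures / units, rounded.
        failures = round(units * pct / 100)
        low, high = wilson_interval(failures, units)
        rows.append((model, units, failures, low, high))
    return rows


rows = summarize(DRIVES)
total_units = sum(r[1] for r in rows)
total_failures = sum(r[2] for r in rows)
pooled_pct = 100 * total_failures / total_units

for model, units, failures, low, high in rows:
    print(f"{model:40s} {failures:3d}/{units:<4d} "
          f"95% CI {100 * low:.2f}% to {100 * high:.2f}%")
print(f"Pooled: {total_failures}/{total_units} = {pooled_pct:.2f}%")
```

With only a few hundred units per model, the intervals come out wide enough that the best and worst models overlap, which fits the "no particular pattern" reading. The sketch cannot separate handling damage from in-service failures, since the published data does not.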

The StorageMojo take
It isn’t clear to me why folks who have the data about drive model reliability don’t want to publish it. Maybe they don’t want the hassle of customers requesting specific drives. Maybe all the drive and array makers do back room deals where they take volumes of not-as-good drives for knock-down prices and shovel them off to less-favored customers. Who knows?

Perhaps StorageMojo readers who have businesses like Jon’s or who work in corporate IT with access to failure data could pass it on to me. I’ll total them up and publish them. If a vendor doesn’t like the numbers then they can send me their own.

From a statistical perspective that’s a little rough, but we have to start somewhere.

Comments, as always, welcome. Moderation turned on to keep spam at bay.

6 Responses to ' Finally, some drive model failure numbers '


  1. Darren Embry said,

    on April 13th, 2007 at 10:58 am

    Google, for example, might not have wanted to publish brand reliability data in their report because their competitors would like to be able to use it.

  2. aDEPT said,

    on April 16th, 2007 at 9:19 pm

    I’m not sure why there is this big fuss about drive failures in the context of the commercial storage industry.

    If I was to buy a drive for my home system, I don’t want the drive to fail, but in a work environment (with RAID) I don’t just purchase the drive but a support contract that makes a drive failure the vendor’s problem.

    The real failure rates would be useful in calculating the real risk of data loss. I have had a discussion with one customer about how many spare SATA drives they need to provide a given level of protection for their email archive. But once the chance of data loss is below an acceptable threshold, then that is as far as it goes.

    If EMC/HDS/IBM/NetApp decide to use a drive with a high failure rate, then there is no additional cost to me: an engineer shows up with a drive and swaps it. If vendor X uses a crummy drive and has to swap 5 drives a year, and vendor Y only does 2, then it is vendor X who gets hit on the bottom line. I have already chosen a vendor based on cost, features and how generous the company’s sales person was.

  3. Jose Pinto said,

    on May 6th, 2007 at 3:44 am

    Hi Mr. Robin Harris.
    First of all, thank you very much for your information. Information about hard disks is very hard to find, and information about their quality and reliability is harder still. I will translate your information for Brazilian readers (of course I will give credit); here in Brazil there is none about hard disks. As I am a hard disk hardware repair worker and the only one who writes about disks, I think I need to write what you wrote because, as you say, we need to start somewhere.
    Thank you very much. All your articles are very good; I feel as if I am in a hard disk library.
    Best regards
    Jose Pinto

  4. kace said,

    on May 28th, 2007 at 9:26 pm

    I know a little about statistics. Hard data is great. But, this data is somewhat limited and people should be careful about drawing sweeping conclusions based on it. The biggest issue is that for reliability estimates you really want to know how long until you saw the failure. Granted, you said these are all within the first year. But, the typical assumption is that each model has a mean time to failure and you’d want to estimate that based on the observed failure times, in addition to the observed failure rate.

    Also, including “handling errors” in the data is a mistake because it doesn’t help to answer the question I think you’re interested in: how long before an in-service drive fails? (For that matter, statisticians often choose to disregard “infant mortality” failures, i.e., very early failures which may indicate a manufacturing defect rather than a wear-and-tear, reliability-based failure. I think that technique might have been useful here, because how many of these failures may have been based on shipping damage or installation damage (tech. adds/removes components to new machine and damages drive in the process)? )

    Anyway, that’s what statisticians do: complain about the data. :) Thanks for the interesting post.

  5. Tim said,

    on December 4th, 2007 at 3:12 am

    I guess the only thing I can see with the numbers is: what were the drives doing? If the SATA drives were doing high-speed streaming data (yes, I have a customer doing this), then I am impressed with how low the numbers are. On the other hand, if your FC disks are doing non-critical work (tier 2 or 3 storage; yes, same customer), then that too is good to know.

    Nonetheless, the info is great, thanks!

  6. xila said,

    on April 3rd, 2008 at 12:57 pm

    The information I found on hard disk failure is not enough, because it covers the common failures I already come across, like the noise a hard drive makes when it is getting its death sentence. What I would like to know is: if a hard drive fails and goes down with your data, what are the chances of getting the data back? And how would one troubleshoot that problem? Please help, I’m in a crisis.
