<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>
<channel>
	<title>Comments on: Google&#8217;s Disk Failure Experience</title>
	<atom:link href="http://storagemojo.com/2007/02/19/googles-disk-failure-experience/feed/" rel="self" type="application/rss+xml" />
	<link>http://storagemojo.com/2007/02/19/googles-disk-failure-experience/</link>
	<description>Data storage info &#38; analysis</description>
	<pubDate>Mon, 12 May 2008 05:39:54 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.5.1</generator>
		<item>
		<title>By: Joe Kraska</title>
		<link>http://storagemojo.com/2007/02/19/googles-disk-failure-experience/#comment-112287</link>
		<dc:creator>Joe Kraska</dc:creator>
		<pubDate>Sun, 02 Sep 2007 20:44:31 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=378#comment-112287</guid>
		<description>...we are seeing failure rates that are much lower than drive vendor specs — and definitely much lower than what Google reported...

Aloke, your findings are actually perfectly consistent with the Google Report. You have to look very carefully at what the term "age" means. Google's statistics cover drives that are always and without exception "on", so a drive's "age" in the Google Report and its runtime in years are exactly the same thing. If one were to suppose that the correct age of a drive were its runtime in years only, and not its chronological age, the MAID findings would be perfectly consistent with Google's data.

Joe Kraska
BAE Systems
San Diego CA
USA</description>
		<content:encoded><![CDATA[<p>&#8230;we are seeing failure rates that are much lower than drive vendor specs — and definitely much lower than what Google reported&#8230;</p>
<p>Aloke, your findings are actually perfectly consistent with the Google Report. You have to look very carefully at what the term &#8220;age&#8221; means. Google&#8217;s statistics cover drives that are always and without exception &#8220;on&#8221;, so a drive&#8217;s &#8220;age&#8221; in the Google Report and its runtime in years are exactly the same thing. If one were to suppose that the correct age of a drive were its runtime in years only, and not its chronological age, the MAID findings would be perfectly consistent with Google&#8217;s data.</p>
<p>Joe Kraska<br />
BAE Systems<br />
San Diego CA<br />
USA</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Stephen Foskett, Pack Rat :: Specialized Hard Drives: Worth the Effort?</title>
		<link>http://storagemojo.com/2007/02/19/googles-disk-failure-experience/#comment-103303</link>
		<dc:creator>Stephen Foskett, Pack Rat :: Specialized Hard Drives: Worth the Effort?</dc:creator>
		<pubDate>Fri, 03 Aug 2007 14:06:06 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=378#comment-103303</guid>
		<description>[...] Lately, there has been a lot of buzz in the enterprise storage arena about whether so-called &#8220;enterprise drives&#8221; are really any better than plain-Jane hard drives in Enterprise applications. This came to a head with the controversial findings of Google and CMU, but it&#8217;s been simmering under the covers everywhere from TiVo communities to gamers. I&#8217;ve normally been loath to focus on a product so mundane as a hard disk unit in this blog, but I find that their functionality ripples up to the highest levels of strategic buying. [...]</description>
		<content:encoded><![CDATA[<p>[...] Lately, there has been a lot of buzz in the enterprise storage arena about whether so-called &#8220;enterprise drives&#8221; are really any better than plain-Jane hard drives in Enterprise applications. This came to a head with the controversial findings of Google and CMU, but it&#8217;s been simmering under the covers everywhere from TiVo communities to gamers. I&#8217;ve normally been loath to focus on a product so mundane as a hard disk unit in this blog, but I find that their functionality ripples up to the highest levels of strategic buying. [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Lucid Information Systems</title>
		<link>http://storagemojo.com/2007/02/19/googles-disk-failure-experience/#comment-98646</link>
		<dc:creator>Lucid Information Systems</dc:creator>
		<pubDate>Fri, 20 Jul 2007 23:46:29 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=378#comment-98646</guid>
		<description>Seagate produces hard drives which are designed for use in digital surveillance systems. They have low power requirements, quick spin-up times, ultra low power save modes and lower heat dissipation than most other drives. 

Does anyone know anything further about these drives? The reason I ask is that we have quite a few of these drives currently in production. They cost a little bit more than other models. We imagine that these improvements increase the longevity of these drives. We would be happy to report back in 5 years.

More information on these drives is available from the &lt;a href="http://www.seagate.com/www/en-us/products/consumer_electronics/sv35_series/SV35_7200.2/" rel="nofollow"&gt;Seagate product url&lt;/a&gt;.

If you are concerned about your most important digital assets after reading this article, please contact us.

All the best from the Lucid Team.</description>
		<content:encoded><![CDATA[<p>Seagate produces hard drives which are designed for use in digital surveillance systems. They have low power requirements, quick spin-up times, ultra low power save modes and lower heat dissipation than most other drives. </p>
<p>Does anyone know anything further about these drives? The reason I ask is that we have quite a few of these drives currently in production. They cost a little bit more than other models. We imagine that these improvements increase the longevity of these drives. We would be happy to report back in 5 years.</p>
<p>More information on these drives is available from the <a href="http://www.seagate.com/www/en-us/products/consumer_electronics/sv35_series/SV35_7200.2/" rel="nofollow">Seagate product url</a>.</p>
<p>If you are concerned about your most important digital assets after reading this article, please contact us.</p>
<p>All the best from the Lucid Team.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: walt</title>
		<link>http://storagemojo.com/2007/02/19/googles-disk-failure-experience/#comment-92160</link>
		<dc:creator>walt</dc:creator>
		<pubDate>Fri, 06 Jul 2007 15:51:30 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=378#comment-92160</guid>
		<description>A hard drive is a magnetic device. After 3 years the earth's magnetic field "can" have an effect. I know a tech who does a total backup of a drive, then reformats it and restores the backup, every 2 years. You should never put a large audio speaker too close to a TV OR a hard drive. I suspect bad/faulty shielding of other internal equipment could also be a factor.</description>
		<content:encoded><![CDATA[<p>A hard drive is a magnetic device. After 3 years the earth&#8217;s magnetic field &#8220;can&#8221; have an effect. I know a tech who does a total backup of a drive, then reformats it and restores the backup, every 2 years. You should never put a large audio speaker too close to a TV OR a hard drive. I suspect bad/faulty shielding of other internal equipment could also be a factor.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Fred Schaff</title>
		<link>http://storagemojo.com/2007/02/19/googles-disk-failure-experience/#comment-50539</link>
		<dc:creator>Fred Schaff</dc:creator>
		<pubDate>Wed, 11 Apr 2007 11:37:37 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=378#comment-50539</guid>
		<description>300,000/(24x365) = 34 years with my calculator.

   Where can I get one of those long-lived hard drives ??

     30,000 hours, maybe ??</description>
		<content:encoded><![CDATA[<p>300,000/(24&#215;365) = 34 years with my calculator.</p>
<p>   Where can I get one of those long-lived hard drives ??</p>
<p>     30,000 hours, maybe ??</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Nevin House</title>
		<link>http://storagemojo.com/2007/02/19/googles-disk-failure-experience/#comment-50325</link>
		<dc:creator>Nevin House</dc:creator>
		<pubDate>Wed, 11 Apr 2007 01:10:21 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=378#comment-50325</guid>
		<description>If the drive is a lemon, it will fail sooner.  Otherwise, you might get lucky with your new drive.</description>
		<content:encoded><![CDATA[<p>If the drive is a lemon, it will fail sooner.  Otherwise, you might get lucky with your new drive.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: STORM &#124; Home Blog &#187; Google Teaches Us Five Things About Hard Drive Death</title>
		<link>http://storagemojo.com/2007/02/19/googles-disk-failure-experience/#comment-48834</link>
		<dc:creator>STORM &#124; Home Blog &#187; Google Teaches Us Five Things About Hard Drive Death</dc:creator>
		<pubDate>Sun, 08 Apr 2007 04:24:35 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=378#comment-48834</guid>
		<description>[...] Failure Trends Study(pdf) [via StorageMojo] [...]</description>
		<content:encoded><![CDATA[<p>[...] Failure Trends Study(pdf) [via StorageMojo] [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: se</title>
		<link>http://storagemojo.com/2007/02/19/googles-disk-failure-experience/#comment-30961</link>
		<dc:creator>se</dc:creator>
		<pubDate>Tue, 27 Feb 2007 14:54:18 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=378#comment-30961</guid>
		<description>Just a note... in our educational data center we had a short period where we turned off the servers whenever there was a holiday of 3 or more weeks. We stopped doing that because most drive problems began just days after the holidays. Restarting the server farm often resulted in several crashed disks. (In the Conner-dominated years, a hit with a screwdriver sometimes made drives available again, but only long enough to make a quick copy of the data.)

Nowadays the servers keep running during those weeks, and apart from the power used there are no negative side effects.</description>
		<content:encoded><![CDATA[<p>Just a note&#8230; in our educational data center we had a short period where we turned off the servers whenever there was a holiday of 3 or more weeks. We stopped doing that because most drive problems began just days after the holidays. Restarting the server farm often resulted in several crashed disks. (In the Conner-dominated years, a hit with a screwdriver sometimes made drives available again, but only long enough to make a quick copy of the data.)</p>
<p>Nowadays the servers keep running during those weeks, and apart from the power used there are no negative side effects.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Prof.  John</title>
		<link>http://storagemojo.com/2007/02/19/googles-disk-failure-experience/#comment-30393</link>
		<dc:creator>Prof.  John</dc:creator>
		<pubDate>Sun, 25 Feb 2007 06:26:29 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=378#comment-30393</guid>
		<description>First of all, don't be overwhelmed by "best" papers.  At least one CMU paper has won a FAST best paper award in the last few years, and you can guess why from the PC members.  Another thing: it is not uncommon to see flawed conclusions in FAST's best papers, e.g., the FAST 2004 best paper from NetApp on RAID-6 (more important now) is fundamentally wrong on technical points, because the PCs simply didn't (or didn't want to) find some obviously qualified reviewers.  This is a club system (and I have to disclose that I am a member of it, so no complaints from me ...)

That being said, this year's papers from CMU and Google do have indisputable value. They show with real system data that disks are not so reliable after all. One of course wouldn't be so naive as to believe his/her disk can last hundreds of years in real systems.  (As a side note, you can find some statistical data on CD and DVD lifetime in a post at Storage Advisor's blog, another good storage site.)

A more important question an IT manager faces is how much reliability is enough and how to achieve it in the system set-up, e.g. the Google guys chose triplication. Disks evolve; in a few years, hopefully, we will see reasonably priced holographic HDs with much bigger capacity and longer lifetime, but as many people here have already said, storage systems fail not only because of disks but because of many other factors, so the architecture issue will always be there. One way to go may be more smartly clustered systems, a favorite theme here. Again, triplication is not a good solution in the long term, even for Google, simply due to the management, space and power costs.  RAID-5 can easily be extended to a clustered system. How about RAID-n ( n &#62; 5 )? The truth is that so far, technically, we (both academia and industry) don't have a sound solution. Existing RAID-6 or RAID-7 needs too much CPU and thus causes high I/O latency and low throughput. This is part of the reason array vendors are not pushing RAID-6 products on you. Can anyone here provide real RAID-6 experiences?

Back to data loss: a piece of data can be lost even though the host disk as a whole still functions. This is the so-called sector error. The Google paper mentions it briefly as "scan error". It is much more common than whole-disk failure. The effect, though, is the same if your data happens to be stored on the failed sector. In this sense, the CMU and Google papers are just a beginning. If they are good, they should continue to explore sector error statistics, which will certainly be much harder. This year's ACM SIGMETRICS 2007 will have a paper from NetApp on this, and hopefully this group of people has done a better job than the FAST 2004 group. (The paper's title is: An Analysis of Latent Sector Errors in Disk Drives, see http://www.cs.cmu.edu/~sigm07/ )

Oh, well,  I will stop here to see if anyone from EMC or NetApp or other start-ups can provide their views on systems ...</description>
		<content:encoded><![CDATA[<p>First of all, don&#8217;t be overwhelmed by &#8220;best&#8221; papers.  At least one CMU paper has won a FAST best paper award in the last few years, and you can guess why from the PC members.  Another thing: it is not uncommon to see flawed conclusions in FAST&#8217;s best papers, e.g., the FAST 2004 best paper from NetApp on RAID-6 (more important now) is fundamentally wrong on technical points, because the PCs simply didn&#8217;t (or didn&#8217;t want to) find some obviously qualified reviewers.  This is a club system (and I have to disclose that I am a member of it, so no complaints from me &#8230;)</p>
<p>That being said, this year&#8217;s papers from CMU and Google do have indisputable value. They show with real system data that disks are not so reliable after all. One of course wouldn&#8217;t be so naive as to believe his/her disk can last hundreds of years in real systems.  (As a side note, you can find some statistical data on CD and DVD lifetime in a post at Storage Advisor&#8217;s blog, another good storage site.)</p>
<p>A more important question an IT manager faces is how much reliability is enough and how to achieve it in the system set-up, e.g. the Google guys chose triplication. Disks evolve; in a few years, hopefully, we will see reasonably priced holographic HDs with much bigger capacity and longer lifetime, but as many people here have already said, storage systems fail not only because of disks but because of many other factors, so the architecture issue will always be there. One way to go may be more smartly clustered systems, a favorite theme here. Again, triplication is not a good solution in the long term, even for Google, simply due to the management, space and power costs.  RAID-5 can easily be extended to a clustered system. How about RAID-n ( n &gt; 5 )? The truth is that so far, technically, we (both academia and industry) don&#8217;t have a sound solution. Existing RAID-6 or RAID-7 needs too much CPU and thus causes high I/O latency and low throughput. This is part of the reason array vendors are not pushing RAID-6 products on you. Can anyone here provide real RAID-6 experiences?</p>
<p>Back to data loss: a piece of data can be lost even though the host disk as a whole still functions. This is the so-called sector error. The Google paper mentions it briefly as &#8220;scan error&#8221;. It is much more common than whole-disk failure. The effect, though, is the same if your data happens to be stored on the failed sector. In this sense, the CMU and Google papers are just a beginning. If they are good, they should continue to explore sector error statistics, which will certainly be much harder. This year&#8217;s ACM SIGMETRICS 2007 will have a paper from NetApp on this, and hopefully this group of people has done a better job than the FAST 2004 group. (The paper&#8217;s title is: An Analysis of Latent Sector Errors in Disk Drives, see <a href="http://www.cs.cmu.edu/~sigm07/" rel="nofollow">http://www.cs.cmu.edu/~sigm07/</a> )</p>
<p>Oh, well,  I will stop here to see if anyone from EMC or NetApp or other start-ups can provide their views on systems &#8230;</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Al</title>
		<link>http://storagemojo.com/2007/02/19/googles-disk-failure-experience/#comment-29276</link>
		<dc:creator>Al</dc:creator>
		<pubDate>Fri, 23 Feb 2007 04:10:06 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=378#comment-29276</guid>
		<description>Drive brands are not as correlated to failure rates as drive model numbers. Major storage vendors may have favorite vendors, but they qualify by disk model. A disk model upgrade generally involves a major firmware change in addition to hardware and manufacturing process changes, and these are subject to the same kinds of problems as any major OS+hardware upgrade. It is not unheard of (though rare) for one model from a reputable disk manufacturer to have ten times as many failures in the field as another similar one.</description>
		<content:encoded><![CDATA[<p>Drive brands are not as correlated to failure rates as drive model numbers. Major storage vendors may have favorite vendors, but they qualify by disk model. A disk model upgrade generally involves a major firmware change in addition to hardware and manufacturing process changes, and these are subject to the same kinds of problems as any major OS+hardware upgrade. It is not unheard of (though rare) for one model from a reputable disk manufacturer to have ten times as many failures in the field as another similar one.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Aloke Guha</title>
		<link>http://storagemojo.com/2007/02/19/googles-disk-failure-experience/#comment-28837</link>
		<dc:creator>Aloke Guha</dc:creator>
		<pubDate>Thu, 22 Feb 2007 05:25:44 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=378#comment-28837</guid>
		<description>This paper totally misses the correlation between workload and failure rates of SATA/PATA drives, since it does not adequately define or, more importantly, cannot control workload.  For nearly 3 years, we have been studying the failure rates of drives in our deployed MAID storage used for non-transactional persistent data applications (backup, archive, etc.) where the drives are completely powered off at least 75% of the time. Thus far (results to be published) we are seeing failure rates that are much lower than drive vendor specs, and definitely much lower than what Google reported.  Properly designed MAID systems not only save energy but also increase drive life!</description>
		<content:encoded><![CDATA[<p>This paper totally misses the correlation between workload and failure rates of SATA/PATA drives, since it does not adequately define or, more importantly, cannot control workload.  For nearly 3 years, we have been studying the failure rates of drives in our deployed MAID storage used for non-transactional persistent data applications (backup, archive, etc.) where the drives are completely powered off at least 75% of the time. Thus far (results to be published) we are seeing failure rates that are much lower than drive vendor specs, and definitely much lower than what Google reported.  Properly designed MAID systems not only save energy but also increase drive life!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Rob</title>
		<link>http://storagemojo.com/2007/02/19/googles-disk-failure-experience/#comment-28672</link>
		<dc:creator>Rob</dc:creator>
		<pubDate>Wed, 21 Feb 2007 22:26:16 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=378#comment-28672</guid>
		<description>Some slashdotter did a little analysis of their own, and noted that google say "more than 100,000" drives were analysed. They managed to come up with a number of 830,000 for this survey, which makes this data even more impressive.

http://slashdot.org/comments.pl?sid=222978&#38;cid=18058500</description>
		<content:encoded><![CDATA[<p>Some slashdotter did a little analysis of their own, and noted that google say &#8220;more than 100,000&#8221; drives were analysed. They managed to come up with a number of 830,000 for this survey, which makes this data even more impressive.</p>
<p><a href="http://slashdot.org/comments.pl?sid=222978&amp;cid=18058500" rel="nofollow">http://slashdot.org/comments.pl?sid=222978&amp;cid=18058500</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ehud</title>
		<link>http://storagemojo.com/2007/02/19/googles-disk-failure-experience/#comment-28635</link>
		<dc:creator>Ehud</dc:creator>
		<pubDate>Wed, 21 Feb 2007 20:43:38 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=378#comment-28635</guid>
		<description>StorageReview.com has a reliability database that includes manufacturers and models. But no raw data is available, and I'm not sure how they arrive at their figures.</description>
		<content:encoded><![CDATA[<p>StorageReview.com has a reliability database that includes manufacturers and models. But no raw data is available, and I&#8217;m not sure how they arrive at their figures.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: DaveC</title>
		<link>http://storagemojo.com/2007/02/19/googles-disk-failure-experience/#comment-28592</link>
		<dc:creator>DaveC</dc:creator>
		<pubDate>Wed, 21 Feb 2007 19:00:38 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=378#comment-28592</guid>
		<description>Does anyone have data re. brand vs. failure rate? That statistic that Google chooses to guard? Surely the data recovery folks would have something to say about brands...</description>
		<content:encoded><![CDATA[<p>Does anyone have data re. brand vs. failure rate? That statistic that Google chooses to guard? Surely the data recovery folks would have something to say about brands&#8230;</p>
]]></content:encoded>
	</item>
</channel>
</rss>

