<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Everything You Know About Disks Is Wrong</title>
	<atom:link href="http://storagemojo.com/2007/02/20/everything-you-know-about-disks-is-wrong/feed/" rel="self" type="application/rss+xml" />
	<link>http://storagemojo.com/2007/02/20/everything-you-know-about-disks-is-wrong/</link>
	<description>Data storage info &#38; analysis</description>
	<lastBuildDate>Sun, 01 Aug 2010 02:16:15 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
	<item>
		<title>By: Tim</title>
		<link>http://storagemojo.com/2007/02/20/everything-you-know-about-disks-is-wrong/comment-page-3/#comment-208996</link>
		<dc:creator>Tim</dc:creator>
		<pubDate>Mon, 12 Apr 2010 13:08:13 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=383#comment-208996</guid>
		<description>Further to Tracy Valleau

The industry is moving towards using AFR (Annual Failure Rate).  The reason is that MTBF is really confusing, and AFR gives the consumer a better idea of what the number means.  An AFR of 0.87% is equivalent to an MTBF of 1,000,000 hours.  The equation is AFR = 1-exp(-8760/MTBF), with MTBF in hours and 8760 the number of hours in a year.

Both of these measures are POPULATION statistics.  One would expect from a large population that a small fraction might be faulty or break earlier than expected.  Most people can intuitively understand that about 1% of disks might fail in a single year, or that there is a 1% chance of a disk failing in a year.  They also do not link this failure rate with the disk&#039;s lifetime.  As such, AFR is a much more sensible metric for this type of information, and an AFR of 0.87% says exactly the same thing as an MTBF of 1,000,000 hours.

This statistic also in no way defines how long a disk will last.  That is the useful life value (say 30,000 POH (power on hours)).  This will be linked to the warranty period, wear-out etc.

On a slightly different note....  The paper did not measure disk failures, rather, &quot;disk replacements&quot;.  There is a difference between the two, namely mis-diagnosis.  This may also help explain why she got an autocorrelation.  If I replace a disk that was wrongly diagnosed as faulty, I still leave the root cause of the problem in place, and am likely to repeat the same mistake a week or so later.... hence the autocorrelation result.

My hypothesis is that the autocorrelation seen is caused by mis-diagnosis.  Unfortunately I do not have the data to prove/disprove that hypothesis.</description>
		<content:encoded><![CDATA[<p>Further to Tracy Valleau</p>
<p>The industry is moving towards using AFR (Annual Failure Rate).  The reason is that MTBF is really confusing, and AFR gives the consumer a better idea of what the number means.  An AFR of 0.87% is equivalent to an MTBF of 1,000,000 hours.  The equation is AFR = 1-exp(-8760/MTBF), with MTBF in hours and 8760 the number of hours in a year.</p>
<p>Both of these measures are POPULATION statistics.  One would expect from a large population that a small fraction might be faulty or break earlier than expected.  Most people can intuitively understand that about 1% of disks might fail in a single year, or that there is a 1% chance of a disk failing in a year.  They also do not link this failure rate with the disk&#8217;s lifetime.  As such, AFR is a much more sensible metric for this type of information, and an AFR of 0.87% says exactly the same thing as an MTBF of 1,000,000 hours.</p>
<p>This statistic also in no way defines how long a disk will last.  That is the useful life value (say 30,000 POH (power on hours)).  This will be linked to the warranty period, wear-out etc.</p>
<p>On a slightly different note&#8230;.  The paper did not measure disk failures, rather, &#8220;disk replacements&#8221;.  There is a difference between the two, namely mis-diagnosis.  This may also help explain why she got an autocorrelation.  If I replace a disk that was wrongly diagnosed as faulty, I still leave the root cause of the problem in place, and am likely to repeat the same mistake a week or so later&#8230;. hence the autocorrelation result.</p>
<p>My hypothesis is that the autocorrelation seen is caused by mis-diagnosis.  Unfortunately I do not have the data to prove/disprove that hypothesis.</p>
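<p>The AFR equation quoted above can be checked with a few lines of Python. This is only a sketch: the function names are made up for illustration, and the formula assumes a constant failure rate over the year (an exponential lifetime model), which is an assumption of the equation itself rather than something measured.</p>

```python
import math

HOURS_PER_YEAR = 8760  # 365 days x 24 hours, the constant used in the equation above


def afr_from_mtbf(mtbf_hours):
    """Annual failure rate (as a fraction) for an MTBF given in hours."""
    return 1.0 - math.exp(-HOURS_PER_YEAR / mtbf_hours)


def mtbf_from_afr(afr):
    """Inverse of afr_from_mtbf: MTBF in hours for an AFR given as a fraction."""
    return -HOURS_PER_YEAR / math.log(1.0 - afr)


# An MTBF of 1,000,000 hours comes out at roughly 0.87% per year.
print(f"{afr_from_mtbf(1_000_000):.2%}")
```

<p>For small failure rates the result is very close to 8760/MTBF, which is why the two figures are often treated as interchangeable.</p>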
]]></content:encoded>
	</item>
	<item>
		<title>By: &#187; RAID limitations &#8211; an interesting read</title>
		<link>http://storagemojo.com/2007/02/20/everything-you-know-about-disks-is-wrong/comment-page-3/#comment-205687</link>
		<dc:creator>&#187; RAID limitations &#8211; an interesting read</dc:creator>
		<pubDate>Fri, 02 Oct 2009 17:38:59 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=383#comment-205687</guid>
		<description>[...] http://storagemojo.com/2007/02/20/everything-you-know-about-disks-is-wrong/ [...]</description>
		<content:encoded><![CDATA[<p>[...] <a href="http://storagemojo.com/2007/02/20/everything-you-know-about-disks-is-wrong/" rel="nofollow">http://storagemojo.com/2007/02/20/everything-you-know-about-disks-is-wrong/</a> [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Should There Be A Tape Backup Drive in Your Future? ~ Revelations From An Unwashed Brain</title>
		<link>http://storagemojo.com/2007/02/20/everything-you-know-about-disks-is-wrong/comment-page-3/#comment-201680</link>
		<dc:creator>Should There Be A Tape Backup Drive in Your Future? ~ Revelations From An Unwashed Brain</dc:creator>
		<pubDate>Tue, 19 May 2009 21:02:18 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=383#comment-201680</guid>
		<description>[...] his own website, Mr. Harris attempts to give the reader a quick education on the problems of drives, and what you think you might know is probably [...]</description>
		<content:encoded><![CDATA[<p>[...] his own website, Mr. Harris attempts to give the reader a quick education on the problems of drives, and what you think you might know is probably [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Tracy Valleau</title>
		<link>http://storagemojo.com/2007/02/20/everything-you-know-about-disks-is-wrong/comment-page-2/#comment-199321</link>
		<dc:creator>Tracy Valleau</dc:creator>
		<pubDate>Fri, 13 Feb 2009 05:00:40 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=383#comment-199321</guid>
		<description>I often get asked about MTBF (Mean Time Between Failure) and it&#039;s amazing how many &quot;industry people&quot; don&#039;t understand it.

And for those who have already figured out that their 1.5M MTBF drives don&#039;t last 150 years, but are not sure what that MTBF thing is... here&#039;s a quickie:

Why your hard drive doesn&#039;t last 150 years.

(There are about 8,760 hours in a year, but to make this example simple, let&#039;s call it 10,000.)

Here&#039;s how MTBF works: it&#039;s an aggregate of many units based on expected life of a single unit.

Let&#039;s say you have a hard drive that is warranted to last 3 years, or 30,000 hours.

You put it in a server, and behold, it lasts 3 years. You take it out and put in a new one, and that also lasts 3 years. So you replace it with a new one, and that too.... well, you get it.

Let&#039;s say you keep doing that and finally, on the 50th unit, only two years into its life, it breaks.

You now have 3 years or 30,000 hours per unit, times 50 units = 1,500,000.

And that&#039;s your MTBF.

So anyone who says &quot;Wow! MTBF of 1.5 million hours! That means this thing will last (1.5M / 10000) 150 years!&quot; -clearly- doesn&#039;t know what they&#039;re talking about.

(MTBF is more complex than my example, including &quot;infant mortality&quot; and &quot;wear out&quot; phases; &quot;theoretical&quot; vs &quot;operational&quot; MTBF and so on, but the gist of what&#039;s here is correct.)

Cordially,

Tracy Valleau

&quot;Don&#039;t believe everything you think.&quot;</description>
		<content:encoded><![CDATA[<p>I often get asked about MTBF (Mean Time Between Failure) and it&#8217;s amazing how many &#8220;industry people&#8221; don&#8217;t understand it.</p>
<p>And for those who have already figured out that their 1.5M MTBF drives don&#8217;t last 150 years, but are not sure what that MTBF thing is&#8230; here&#8217;s a quickie:</p>
<p>Why your hard drive doesn&#8217;t last 150 years.</p>
<p>(There are about 8,760 hours in a year, but to make this example simple, let&#8217;s call it 10,000.)</p>
<p>Here&#8217;s how MTBF works: it&#8217;s an aggregate of many units based on expected life of a single unit.</p>
<p>Let&#8217;s say you have a hard drive that is warranted to last 3 years, or 30,000 hours.</p>
<p>You put it in a server, and behold, it lasts 3 years. You take it out and put in a new one, and that also lasts 3 years. So you replace it with a new one, and that too&#8230;. well, you get it.</p>
<p>Let&#8217;s say you keep doing that and finally, on the 50th unit, only two years into its life, it breaks.</p>
<p>You now have 3 years or 30,000 hours per unit, times 50 units = 1,500,000.</p>
<p>And that&#8217;s your MTBF.</p>
<p>So anyone who says &#8220;Wow! MTBF of 1.5 million hours! That means this thing will last (1.5M / 10000) 150 years!&#8221; -clearly- doesn&#8217;t know what they&#8217;re talking about.</p>
<p>(MTBF is more complex than my example, including &#8220;infant mortality&#8221; and &#8220;wear out&#8221; phases; &#8220;theoretical&#8221; vs &#8220;operational&#8221; MTBF and so on, but the gist of what&#8217;s here is correct.)</p>
<p>Cordially,</p>
<p>Tracy Valleau</p>
<p>&#8220;Don&#8217;t believe everything you think.&#8221;</p>
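<p>The arithmetic in the example above can be sketched in Python as total operating hours divided by the number of failures. The function name is hypothetical, and the sketch assumes that a drive swapped out on schedule after 3 years does not count as a failure, which is what the example implies.</p>

```python
def observed_mtbf(total_operating_hours, failures):
    """Observed MTBF: accumulated operating hours divided by failures seen."""
    if failures == 0:
        raise ValueError("MTBF is undefined until at least one failure is observed")
    return total_operating_hours / failures


HOURS_PER_YEAR = 10_000  # the rounded figure used in the example above

# The example's rounding: 50 units at 3 years each, one failure.
rounded = observed_mtbf(50 * 3 * HOURS_PER_YEAR, failures=1)

# Counted strictly: 49 full 3-year lives plus 2 years on the 50th unit.
strict = observed_mtbf(49 * 3 * HOURS_PER_YEAR + 2 * HOURS_PER_YEAR, failures=1)

print(rounded, strict)
```

<p>Either way the result is about 1.5 million hours, even though no single drive ran for more than 30,000.</p>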
]]></content:encoded>
	</item>
	<item>
		<title>By: Kmann</title>
		<link>http://storagemojo.com/2007/02/20/everything-you-know-about-disks-is-wrong/comment-page-2/#comment-197240</link>
		<dc:creator>Kmann</dc:creator>
		<pubDate>Fri, 22 Aug 2008 18:01:30 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=383#comment-197240</guid>
		<description>The Bianca Schroeder paper is excellent, but I saw something very interesting in the paper that seems to have gone unnoticed here,

Table 2. -- &quot;Node outages that were attributed to hardware problems broken down by the responsible hardware component.&quot;

Component (HPC1)        
CPU              44%
Memory           29%
Hard drive       16%
PCI motherboard   9%
Power supply      2%

Fully 82% of the failures were related to &quot;solid state&quot; components.

This in spite of the fact that the system population included 3,406 disks and 784 servers. DRAM was almost twice as likely to cause a failure and the CPUs were three times more likely to cause an outage. Moreover, 784 motherboards produced 9% of failures while 3,400 disks produced only 16%.

And this is a very high-end system, presumably &quot;top-shelf&quot; DRAM, CPU and motherboard components.

Also, from the text:

&quot;...we have analyzed failure data covering any type of node outage, including those caused by hardware, software, network problems, environmental problems, or operator mistakes. The data was collected over a period of 9 years on more than 20 HPC clusters and contains detailed root cause information. We found that, for most HPC systems in this data, more than 50% of all outages are attributed to hardware problems... Consistent with the data in Table 2, the two most common hardware components to cause a node outage are memory and CPU.&quot;

So much for the myth of &quot;solid state&quot; reliability.

For some perspective, while CPU makers stopped publishing MTBF many years ago, and DRAM manufacturers have to my knowledge never published them, most motherboard manufacturers do publish -- typically in the 100,000 hour range. So...if 784 motherboards produced 9% of failures, and 3,400 disks only produced 16%, then it seems that perhaps the numbers published by the disk drive makers are, in relative terms, not so wildly off the mark. It would appear (from a system/sub-system perspective) that disks are relatively much more reliable than the &quot;solid state&quot; components. 

I wonder how people would react if they actually knew the MTBF numbers on stuff like DRAM and CPUs? Perhaps we should all remember that silicon DOES &quot;wear out&quot; (in a manner of speaking).

All this makes me wonder why everyone assumes that Flash SSD is going to be so much more reliable than other silicon. Are we to believe the ridiculous MTBF claims of the SSD makers (Intel sez 2,000,000 hrs), given the numbers on DRAM?

It will be interesting to see the results on the first large-scale deployments of flash-SSD. Unfortunately, the &quot;free ride&quot; for SSD will probably continue for five or more years before folks begin to realize that solid-state is not necessarily more reliable than mechanical disks...and very frequently (in the case of DRAM and CPUs) less reliable!</description>
		<content:encoded><![CDATA[<p>The Bianca Schroeder paper is excellent, but I saw something very interesting in the paper that seems to have gone unnoticed here,</p>
<p>Table 2. &#8212; &#8220;Node outages that were attributed to hardware problems broken down by the responsible hardware component.&#8221;</p>
<p>Component (HPC1)<br />
CPU              44%<br />
Memory           29%<br />
Hard drive       16%<br />
PCI motherboard   9%<br />
Power supply      2%</p>
<p>Fully 82% of the failures were related to &#8220;solid state&#8221; components.</p>
<p>This in spite of the fact that the system population included 3,406 disks and 784 servers. DRAM was almost twice as likely to cause a failure and the CPUs were three times more likely to cause an outage. Moreover, 784 motherboards produced 9% of failures while 3,400 disks produced only 16%.</p>
<p>And this is a very high-end system, presumably &#8220;top-shelf&#8221; DRAM, CPU and motherboard components.</p>
<p>Also, from the text:</p>
<p>&#8220;&#8230;we have analyzed failure data covering any type of node outage, including those caused by hardware, software, network problems, environmental problems, or operator mistakes. The data was collected over a period of 9 years on more than 20 HPC clusters and contains detailed root cause information. We found that, for most HPC systems in this data, more than 50% of all outages are attributed to hardware problems&#8230; Consistent with the data in Table 2, the two most common hardware components to cause a node outage are memory and CPU.&#8221;</p>
<p>So much for the myth of &#8220;solid state&#8221; reliability.</p>
<p>For some perspective, while CPU makers stopped publishing MTBF many years ago, and DRAM manufacturers have to my knowledge never published them, most motherboard manufacturers do publish &#8212; typically in the 100,000 hour range. So&#8230;if 784 motherboards produced 9% of failures, and 3,400 disks only produced 16%, then it seems that perhaps the numbers published by the disk drive makers are, in relative terms, not so wildly off the mark. It would appear (from a system/sub-system perspective) that disks are relatively much more reliable than the &#8220;solid state&#8221; components. </p>
<p>I wonder how people would react if they actually knew the MTBF numbers on stuff like DRAM and CPUs? Perhaps we should all remember that silicon DOES &#8220;wear out&#8221; (in a manner of speaking).</p>
<p>All this makes me wonder why everyone assumes that Flash SSD is going to be so much more reliable than other silicon. Are we to believe the ridiculous MTBF claims of the SSD makers (Intel sez 2,000,000 hrs), given the numbers on DRAM?</p>
<p>It will be interesting to see the results on the first large-scale deployments of flash-SSD. Unfortunately, the &#8220;free ride&#8221; for SSD will probably continue for five or more years before folks begin to realize that solid-state is not necessarily more reliable than mechanical disks&#8230;and very frequently (in the case of DRAM and CPUs) less reliable!</p>
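<p>The motherboard versus disk comparison above can be made per unit with a short Python sketch. It uses only the figures quoted in this comment (16% of hardware outages from 3,406 disks, 9% from 784 motherboards) and assumes one motherboard per server; the function name is made up for illustration. CPUs and memory are left out because the comment gives no unit counts for them.</p>

```python
def per_unit_share(outage_share, unit_count):
    """Share of hardware outages attributable to one unit of a component."""
    return outage_share / unit_count


disk = per_unit_share(0.16, 3406)        # 16% of outages, 3,406 disks
motherboard = per_unit_share(0.09, 784)  # 9% of outages, 784 motherboards (assumed one per server)

# How many times more likely one motherboard is to be blamed than one disk.
ratio = motherboard / disk
print(f"{ratio:.1f}x")
```

<p>On these numbers a single motherboard is blamed for an outage roughly 2.4 times as often as a single disk.</p>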
]]></content:encoded>
	</item>
	<item>
		<title>By: Jered Floyd</title>
		<link>http://storagemojo.com/2007/02/20/everything-you-know-about-disks-is-wrong/comment-page-2/#comment-197222</link>
		<dc:creator>Jered Floyd</dc:creator>
		<pubDate>Wed, 20 Aug 2008 21:08:37 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=383#comment-197222</guid>
		<description>Robin,

A bit of a late comment here, but I think what&#039;s even more interesting than bogus MTBFs for drives is the interesting difference in bit error rate for SCSI/FC vs. SATA drives.  I just wrote an article on this, &lt;a href=&quot;http://permabit.wordpress.com/2008/08/20/are-fibre-channel-and-scsi-drives-more-reliable/&quot; rel=&quot;nofollow&quot;&gt;Are Fibre Channel and SCSI Drives More Reliable?&lt;/a&gt;  It turns out that they are, at least for RAID, and not for the reason you might suspect!  I think there&#039;s a false market segmentation going on here...

Jered Floyd
CTO, Permabit Technology Corp.</description>
		<content:encoded><![CDATA[<p>Robin,</p>
<p>A bit of a late comment here, but I think what&#8217;s even more interesting than bogus MTBFs for drives is the interesting difference in bit error rate for SCSI/FC vs. SATA drives.  I just wrote an article on this, <a href="http://permabit.wordpress.com/2008/08/20/are-fibre-channel-and-scsi-drives-more-reliable/" rel="nofollow">Are Fibre Channel and SCSI Drives More Reliable?</a>  It turns out that they are, at least for RAID, and not for the reason you might suspect!  I think there&#8217;s a false market segmentation going on here&#8230;</p>
<p>Jered Floyd<br />
CTO, Permabit Technology Corp.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: wgh</title>
		<link>http://storagemojo.com/2007/02/20/everything-you-know-about-disks-is-wrong/comment-page-2/#comment-109191</link>
		<dc:creator>wgh</dc:creator>
		<pubDate>Fri, 24 Aug 2007 04:47:37 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=383#comment-109191</guid>
		<description>Joe Claborn said (on February 21st, 2007 at 6:41 am):  Is this right? A MTBF of ‘only’ 300,000 hours translates in 34 years. Our disk drives seem to last about 3 years. Why the difference? 
---
I&#039;ve skimmed the above thread but didn&#039;t see anyone note that MTBF (and to a degree MTTF) should be divided by the number of drives in your environment to estimate how often you&#039;ll see a single drive within the environment fail. Yes, as you&#039;ve mentioned, the MTBF numbers suggest 34 years to fail for one drive, but if you have 10 drives in your environment you can expect one of them to fail in about 3.4 years. Just as when you have 10 men working construction there&#039;s 10 times the probability of one of them getting sick on any given day. When working in a &quot;big iron&quot; shop with thousands of RAID devices, this is (usually) taken into account. Those who say to triplicate the data instead of using RAID appear to me not to be faced with needing up-to-date, accurate data available in one location, with no time available (due to SLAs) to restore or even to fail over to a separate set of drives. Many in mainframe environments have come to rely heavily on no downtime to restore or fail over to other drives, unless the situation is very dire (of a disaster type). If one were to &quot;simply&quot; have three copies, as someone suggested above, then which one do you update? All three? Doing so and waiting for validation of completion of I/O would typically cause response times on heavily I/O-burdened systems to degrade beyond acceptability. Not waiting on validation opens a window to potential corruption of any copies that were not being synchronously updated (synchronous updates are expensive). Thus RAID. Yes, drives will fail and drives will be replaced. But a well laid out RAID array will still give the needed response times during failures, even at peak transaction time... again, I said if they&#039;re &quot;well laid out&quot;.  And yes, if the data is mission critical, such RAID arrays should be copied to another location... for the event of a disaster (including, at a minimum, lightning).</description>
		<content:encoded><![CDATA[<p>Joe Claborn said (on February 21st, 2007 at 6:41 am):  Is this right? A MTBF of ‘only’ 300,000 hours translates in 34 years. Our disk drives seem to last about 3 years. Why the difference?<br />
&#8212;<br />
I&#8217;ve skimmed the above thread but didn&#8217;t see anyone note that MTBF (and to a degree MTTF) should be divided by the number of drives in your environment to estimate how often you&#8217;ll see a single drive within the environment fail. Yes, as you&#8217;ve mentioned, the MTBF numbers suggest 34 years to fail for one drive, but if you have 10 drives in your environment you can expect one of them to fail in about 3.4 years. Just as when you have 10 men working construction there&#8217;s 10 times the probability of one of them getting sick on any given day. When working in a &#8220;big iron&#8221; shop with thousands of RAID devices, this is (usually) taken into account. Those who say to triplicate the data instead of using RAID appear to me not to be faced with needing up-to-date, accurate data available in one location, with no time available (due to SLAs) to restore or even to fail over to a separate set of drives. Many in mainframe environments have come to rely heavily on no downtime to restore or fail over to other drives, unless the situation is very dire (of a disaster type). If one were to &#8220;simply&#8221; have three copies, as someone suggested above, then which one do you update? All three? Doing so and waiting for validation of completion of I/O would typically cause response times on heavily I/O-burdened systems to degrade beyond acceptability. Not waiting on validation opens a window to potential corruption of any copies that were not being synchronously updated (synchronous updates are expensive). Thus RAID. Yes, drives will fail and drives will be replaced. But a well laid out RAID array will still give the needed response times during failures, even at peak transaction time&#8230; again, I said if they&#8217;re &#8220;well laid out&#8221;.  And yes, if the data is mission critical, such RAID arrays should be copied to another location&#8230; for the event of a disaster (including, at a minimum, lightning).</p>
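<p>The divide-by-the-number-of-drives rule above can be sketched in Python. The function name is hypothetical, and the sketch assumes drives fail independently at a constant rate, in which case the expected time to the first failure among N drives is MTBF/N.</p>

```python
HOURS_PER_YEAR = 8760  # 365 days x 24 hours


def years_to_first_failure(mtbf_hours, drives):
    """Expected years until the first failure in a fleet of identical drives.

    Assumes independent drives with a constant failure rate.
    """
    return mtbf_hours / drives / HOURS_PER_YEAR


# The 300,000-hour MTBF from the quoted question, for 1, 10 and 1,000 drives.
for n in (1, 10, 1000):
    print(n, round(years_to_first_failure(300_000, n), 2))
```

<p>One drive gives about 34 years and ten drives about 3.4 years, matching the figures above; a thousand drives brings the expected wait down to roughly 12.5 days.</p>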
]]></content:encoded>
	</item>
	<item>
		<title>By: Stephen Foskett, Pack Rat</title>
		<link>http://storagemojo.com/2007/02/20/everything-you-know-about-disks-is-wrong/comment-page-2/#comment-103304</link>
		<dc:creator>Stephen Foskett, Pack Rat</dc:creator>
		<pubDate>Fri, 03 Aug 2007 14:06:23 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=383#comment-103304</guid>
		<description>&lt;strong&gt;Specialized Hard Drives: Worth the Effort?...&lt;/strong&gt;

Lately, there has been a lot of buzz in the enterprise storage arena about whether so-called &#8220;enterprise drives&#8221; are really any better than plain-Jane hard drives in Enterprise applications.  This came to a head with the controversial findi...</description>
		<content:encoded><![CDATA[<p><strong>Specialized Hard Drives: Worth the Effort?&#8230;</strong></p>
<p>Lately, there has been a lot of buzz in the enterprise storage arena about whether so-called &#8220;enterprise drives&#8221; are really any better than plain-Jane hard drives in Enterprise applications.  This came to a head with the controversial findi&#8230;</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: A Dutch Library</title>
		<link>http://storagemojo.com/2007/02/20/everything-you-know-about-disks-is-wrong/comment-page-2/#comment-71522</link>
		<dc:creator>A Dutch Library</dc:creator>
		<pubDate>Fri, 01 Jun 2007 10:53:46 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=383#comment-71522</guid>
		<description>Well, it&#039;s a bit of a late reply seeing the date that this discussion started, yet I thought it couldn&#039;t harm to add my own advice. We&#039;re all interested in making our data persistent, which is quite a challenge due to media deterioration and rapid media obsolescence. The topic interested me and I&#039;m currently doing my graduation research on it for a library that is interested in digital preservation. There are many difficulties with digital preservation, of which this particular one is just a minor (almost easy) part. I will save you the whole reasoning behind my conclusion since it&#039;s not yet finished (and there are probably limits to the text size that you can post :)) but the conclusion might be helpful to some of you:

A few assumptions:
-The target storage system needs to be able to contain 10 TB worth of data
-The storage system needs to be scalable
-The storage system needs optimal data security vs. costs. (of course data triplication is nice, but most of us, libraries included, don&#039;t have that much money)
-The storage system needs to be web-accessible
-The storage system needs to be disaster-proof

If you are searching for something that should fit these needs as well, this is probably your best solution:

Two separate servers stored at separate locations (cheapest way of avoiding data loss through disasters).  Configure the first server for RAID5EE (hot spare integration) and the second for RAID60 (SAN). Use 500GB enterprise drives for your first server and 500GB nearline drives for the SAN. Make the first server back up daily to the SAN. Perform nightly disk checks so you can determine when new spare drives should be ordered. And last, but not least, make sure you have the money to buy a whole new server environment within 7 years.

That isn&#039;t anywhere near cheap, but it&#039;s the most cost-effective way to get an almost 100% guarantee of preserving your data. This configuration doesn&#039;t necessarily have to be optimal for the next generation of hardware you will buy.

Perhaps no one is helped by this, but I&#039;ll be happy if it helps just someone. Just some (nearly off-topic) side points: for cheap home RAIDs, check the Intel Matrix RAID solution. For future archiving, pay attention to holographic storage development. I&#039;ll save you the other random findings of my study :)</description>
		<content:encoded><![CDATA[<p>Well, it&#8217;s a bit of a late reply seeing the date that this discussion started, yet I thought it couldn&#8217;t harm to add my own advice. We&#8217;re all interested in making our data persistent, which is quite a challenge due to media deterioration and rapid media obsolescence. The topic interested me and I&#8217;m currently doing my graduation research on it for a library that is interested in digital preservation. There are many difficulties with digital preservation, of which this particular one is just a minor (almost easy) part. I will save you the whole reasoning behind my conclusion since it&#8217;s not yet finished (and there are probably limits to the text size that you can post <img src='http://storagemojo.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> ) but the conclusion might be helpful to some of you:</p>
<p>A few assumptions:<br />
-The target storage system needs to be able to contain 10 TB worth of data<br />
-The storage system needs to be scalable<br />
-The storage system needs optimal data security vs. costs. (of course data triplication is nice, but most of us, libraries included, don&#8217;t have that much money)<br />
-The storage system needs to be web-accessible<br />
-The storage system needs to be disaster-proof</p>
<p>If you are searching for something that should fit these needs as well, this is probably your best solution:</p>
<p>Two separate servers stored at separate locations (cheapest way of avoiding data loss through disasters).  Configure the first server for RAID5EE (hot spare integration) and the second for RAID60 (SAN). Use 500GB enterprise drives for your first server and 500GB nearline drives for the SAN. Make the first server back up daily to the SAN. Perform nightly disk checks so you can determine when new spare drives should be ordered. And last, but not least, make sure you have the money to buy a whole new server environment within 7 years.</p>
<p>That isn&#8217;t anywhere near cheap, but it&#8217;s the most cost-effective way to get an almost 100% guarantee of preserving your data. This configuration doesn&#8217;t necessarily have to be optimal for the next generation of hardware you will buy.</p>
<p>Perhaps no one is helped by this, but I&#8217;ll be happy if it helps just someone. Just some (nearly off-topic) side points: for cheap home RAIDs, check the Intel Matrix RAID solution. For future archiving, pay attention to holographic storage development. I&#8217;ll save you the other random findings of my study <img src='http://storagemojo.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Web Development Stuff &#187; Blog Archive &#187; StorageMojo » Everything You Know About Disks Is Wrong - TheV247.com</title>
		<link>http://storagemojo.com/2007/02/20/everything-you-know-about-disks-is-wrong/comment-page-2/#comment-67668</link>
		<dc:creator>Web Development Stuff &#187; Blog Archive &#187; StorageMojo » Everything You Know About Disks Is Wrong - TheV247.com</dc:creator>
		<pubDate>Mon, 21 May 2007 18:48:33 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=383#comment-67668</guid>
		<description>[...] StorageMojo » Everything You Know About Disks Is Wrong Everything You Know About Disks Is Wrong February 20th, 2007 by Robin Harris in Enterprise, Clusters [...]</description>
		<content:encoded><![CDATA[<p>[...] StorageMojo » Everything You Know About Disks Is Wrong Everything You Know About Disks Is Wrong February 20th, 2007 by Robin Harris in Enterprise, Clusters [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ted Fay</title>
		<link>http://storagemojo.com/2007/02/20/everything-you-know-about-disks-is-wrong/comment-page-2/#comment-67493</link>
		<dc:creator>Ted Fay</dc:creator>
		<pubDate>Mon, 21 May 2007 04:56:09 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=383#comment-67493</guid>
		<description>Anonymous,
Regarding your comment &quot;Are you saying we should go back to the ST-506 for reliability?&quot;

Of course not.  Radically different technologies, as you know.

Packing twice the blocks on the same physical spindle as another drive built with the SAME TECHNOLOGY will and does result in twice the number of bad blocks for the same physical damage to, or imperfection in, the platter.

There is no free lunch, and you do indeed get what you pay for.  It doesn&#039;t show up in this study, because this study doesn&#039;t take into account the primary advantage of enterprise disks, which is twice the physical media allocated to each block using the same platter technology as their consumer-grade cousins.

Even if FC, SAS and SATA all do indeed have similar rates of failure for their mechanisms, which I wouldn&#039;t doubt, if you&#039;re willing to pay for RAID redundancy, why not media redundancy for the blocks on your platter?

Apart from the advantages on the controller board of FC or SAS, what you&#039;re paying for is twice the safety of the data contained on those blocks.  If you don&#039;t care about what lives on those blocks, I guarantee you someone will when they go missing. :)

Just my two cents.

-ted</description>
		<content:encoded><![CDATA[<p>Anonymous,<br />
Regarding your comment &#8220;Are you saying we should go back to the ST-506 for reliability?&#8221;</p>
<p>Of course not.  Radically different technologies, as you know.</p>
<p>Packing twice the blocks on the same physical spindle as another drive built with the SAME TECHNOLOGY will and does result in twice the number of bad blocks for the same physical damage to, or imperfection in, the platter. </p>
<p>There is no free lunch, and you do indeed get what you pay for.  It doesn&#8217;t show up in this study, because this study doesn&#8217;t take into account the primary advantage of enterprise disks, which is twice the physical media allocated to each block using the same platter technology as their consumer-grade cousins.  </p>
<p>Even if FC, SAS and SATA all do indeed have similar rates of failure for their mechanisms, which I wouldn&#8217;t doubt, if you&#8217;re willing to pay for RAID redundancy, why not media redundancy for the blocks on your platter?  </p>
<p>Apart from the advantages on the controller board of FC or SAS, what you&#8217;re paying for is twice the safety of the data contained on those blocks.  If you don&#8217;t care about what lives on those blocks, I guarantee you someone will when they go missing. <img src='http://storagemojo.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>Just my two cents.</p>
<p>-ted</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ted Fay</title>
		<link>http://storagemojo.com/2007/02/20/everything-you-know-about-disks-is-wrong/comment-page-2/#comment-67490</link>
		<dc:creator>Ted Fay</dc:creator>
		<pubDate>Mon, 21 May 2007 04:41:05 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=383#comment-67490</guid>
		<description>Bob, 

Of course I&#039;m talking about data corruption due to bad blocks, and the fact that only drive-wide hardware failures were taken into account in this study is the basis of my point.

Robin tried to dismiss my point as being architectural and not real world, yet my whole point is that this study misses some critical aspects of real world experience, which is that when you go to fetch data, and you can&#039;t get it because the blocks are bad, or you can&#039;t rebuild a portion of the data after a failure because the blocks are bad, then whoever needed that data is going to consider it to be a failure, regardless of whether the RAID controller labels the disk as failed or not.

Data corruption = failure.  Anyone who tells you different is trying to sell you something.

-ted</description>
		<content:encoded><![CDATA[<p>Bob, </p>
<p>Of course I&#8217;m talking about data corruption due to bad blocks, and the fact that only drive-wide hardware failures were taken into account in this study is the basis of my point.</p>
<p>Robin tried to dismiss my point as being architectural and not real world, yet my whole point is that this study misses some critical aspects of real world experience, which is that when you go to fetch data, and you can&#8217;t get it because the blocks are bad, or you can&#8217;t rebuild a portion of the data after a failure because the blocks are bad, then whoever needed that data is going to consider it to be a failure, regardless of whether the RAID controller labels the disk as failed or not.</p>
<p>Data corruption = failure.  Anyone who tells you different is trying to sell you something.</p>
<p>-ted</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: From a cost perspective..</title>
		<link>http://storagemojo.com/2007/02/20/everything-you-know-about-disks-is-wrong/comment-page-2/#comment-54855</link>
		<dc:creator>From a cost perspective..</dc:creator>
		<pubDate>Mon, 23 Apr 2007 13:11:48 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=383#comment-54855</guid>
		<description>For those taking all this information/comments/thoughts into consideration for real world applications, some cost data to consider...

On a current &quot;big iron&quot; application, we made the change from SATA drives to Fibre drives before implementation this past year.  Storage costs increased exactly 100% for the same amount of storage, not 400 to 600% as has been suggested.  So, if you&#039;re thinking of doubling or &quot;tripling&quot; up on SATA, look at the costs also.

Facility costs on &quot;big iron&quot; projects are huge.  The costs to double, or triple, up the space to stand up SATA and the added costs for cooling these drives over a period of years can be staggering.

Now if you&#039;re just looking at a simple &quot;one for one&quot; replacement of Fibre with SATA, with the same size of storage in the end, then it&#039;s worth looking into because storage costs could be reduced by half.

As an example, our costs could be reduced from $4 million to $2 million.  I&#039;ll be taking a look, and will have to make a complicated business decision.</description>
		<content:encoded><![CDATA[<p>For those taking all this information/comments/thoughts into consideration for real world applications, some cost data to consider&#8230;</p>
<p>On a current &#8220;big iron&#8221; application, we made the change from SATA drives to Fibre drives before implementation this past year.  Storage costs increased exactly 100% for the same amount of storage, not 400 to 600% as has been suggested.  So, if you&#8217;re thinking of doubling or &#8220;tripling&#8221; up on SATA, look at the costs also.</p>
<p>Facility costs on &#8220;big iron&#8221; projects are huge.  The costs to double, or triple, up the space to stand up SATA and the added costs for cooling these drives over a period of years can be staggering.</p>
<p>Now if you&#8217;re just looking at a simple &#8220;one for one&#8221; replacement of Fibre with SATA, with the same size of storage in the end, then it&#8217;s worth looking into because storage costs could be reduced by half.</p>
<p>As an example, our costs could be reduced from $4 million to $2 million.  I&#8217;ll be taking a look, and will have to make a complicated business decision.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: clockwinder</title>
		<link>http://storagemojo.com/2007/02/20/everything-you-know-about-disks-is-wrong/comment-page-2/#comment-43224</link>
		<dc:creator>clockwinder</dc:creator>
		<pubDate>Tue, 27 Mar 2007 16:40:20 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/?p=383#comment-43224</guid>
		<description>Permanent data storage?? The hard part is getting rid of stuff you no longer need. I have lived with failures of 9-track tape, DAT tape, Winchester technology drives, CD platters, 80-column punch cards, and punched paper tape.  Information Week a number of years ago published a survey on longevity of storage media (not quite the same thing as disk drive longevity).  Worst was cheap mag tape.  Then hard disk.  Then high-quality CD (guessed at reliable for 50-75 years).  Most reliable was acid-free paper, good for probably 500 years or more.  In this case, we have actual examples!
Gigabytes per page?  It depends... don&#039;t throw the books away yet, folks!</description>
		<content:encoded><![CDATA[<p>Permanent data storage?? The hard part is getting rid of stuff you no longer need. I have lived with failures of 9-track tape, DAT tape, Winchester technology drives, CD platters, 80-column punch cards, and punched paper tape.  Information Week a number of years ago published a survey on longevity of storage media (not quite the same thing as disk drive longevity).  Worst was cheap mag tape.  Then hard disk.  Then high-quality CD (guessed at reliable for 50-75 years).  Most reliable was acid-free paper, good for probably 500 years or more.  In this case, we have actual examples!<br />
Gigabytes per page?  It depends&#8230; don&#8217;t throw the books away yet, folks!</p>
]]></content:encoded>
	</item>
</channel>
</rss>
