<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: CERN&#8217;s data corruption research</title>
	<atom:link href="http://storagemojo.com/2007/09/19/cerns-data-corruption-research/feed/" rel="self" type="application/rss+xml" />
	<link>http://storagemojo.com/2007/09/19/cerns-data-corruption-research/</link>
	<description>Data storage info &#38; analysis</description>
	<lastBuildDate>Fri, 19 Mar 2010 09:23:11 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: NDMP The Protocol, It&#8217;s All About The Storage&#8230; &#171; Bob Porras &#8211; Blog</title>
		<link>http://storagemojo.com/2007/09/19/cerns-data-corruption-research/comment-page-1/#comment-206235</link>
		<dc:creator>NDMP The Protocol, It&#8217;s All About The Storage&#8230; &#171; Bob Porras &#8211; Blog</dc:creator>
		<pubDate>Mon, 26 Oct 2009 20:53:21 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/2007/09/19/cerns-data-corruption-research/#comment-206235</guid>
		<description>[...] are the OpenSolaris projects ADM and MMS which are bringing storage archive management to the ZFS file system.&#160; Couple all these activities together make one extremely busy storage community [...]</description>
		<content:encoded><![CDATA[<p>[...] are the OpenSolaris projects ADM and MMS which are bringing storage archive management to the ZFS file system.&nbsp; Couple all these activities together make one extremely busy storage community [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: MatrixStore &#187; Blog Archive &#187; Top Ten Differences between Disk-based Archive &#38; Disk-based Storage</title>
		<link>http://storagemojo.com/2007/09/19/cerns-data-corruption-research/comment-page-1/#comment-166769</link>
		<dc:creator>MatrixStore &#187; Blog Archive &#187; Top Ten Differences between Disk-based Archive &#38; Disk-based Storage</dc:creator>
		<pubDate>Sun, 27 Jan 2008 15:04:54 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/2007/09/19/cerns-data-corruption-research/#comment-166769</guid>
		<description>[...] e.g.: http://storagemojo.com/2007/09/19/cerns-data-corruption-research/ [...]</description>
		<content:encoded><![CDATA[<p>[...] e.g.: <a href="http://storagemojo.com/2007/09/19/cerns-data-corruption-research/" rel="nofollow">http://storagemojo.com/2007/09/19/cerns-data-corruption-research/</a> [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Email checksums &#171; FastMail.FM Weblog</title>
		<link>http://storagemojo.com/2007/09/19/cerns-data-corruption-research/comment-page-1/#comment-156304</link>
		<dc:creator>Email checksums &#171; FastMail.FM Weblog</dc:creator>
		<pubDate>Mon, 17 Dec 2007 02:45:55 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/2007/09/19/cerns-data-corruption-research/#comment-156304</guid>
		<description>[...] Most people don&#8217;t think corruption is an issue, but recent research by CERN has shown that with today&#8217;s large hard drives, this is a potentially serious problem, with an estimated corruption rate of 3 files in every TB of data. In most cases, corruption of data is a silent problem that people don&#8217;t realise has happened until they need the data. [...]</description>
		<content:encoded><![CDATA[<p>[...] Most people don&#8217;t think corruption is an issue, but recent research by CERN has shown that with today&#8217;s large hard drives, this is a potentially serious problem, with an estimated corruption rate of 3 files in every TB of data. In most cases, corruption of data is a silent problem that people don&#8217;t realise has happened until they need the data. [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Emmanuel Florac</title>
		<link>http://storagemojo.com/2007/09/19/cerns-data-corruption-research/comment-page-1/#comment-142512</link>
		<dc:creator>Emmanuel Florac</dc:creator>
		<pubDate>Wed, 07 Nov 2007 13:01:22 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/2007/09/19/cerns-data-corruption-research/#comment-142512</guid>
		<description>DataDirect Networks S2A9550 uses a form of RAID-4, and validates checksums on the fly while reading and writing. It uses some sort of CRC checksumming, and validates the entire data path from the controller to the disk, too; the CRC allows it to use 1 or 2 parity channels.
I don&#039;t know how it actually manages parity errors at the drive level, though.</description>
		<content:encoded><![CDATA[<p>DataDirect Networks S2A9550 uses a form of RAID-4, and validates checksums on the fly while reading and writing. It uses some sort of CRC checksumming, and validates the entire data path from the controller to the disk, too; the CRC allows it to use 1 or 2 parity channels.<br />
I don&#8217;t know how it actually manages parity errors at the drive level, though.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Bill Todd</title>
		<link>http://storagemojo.com/2007/09/19/cerns-data-corruption-research/comment-page-1/#comment-130163</link>
		<dc:creator>Bill Todd</dc:creator>
		<pubDate>Sat, 06 Oct 2007 14:38:23 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/2007/09/19/cerns-data-corruption-research/#comment-130163</guid>
		<description>The only RAIDs that commonly validate parity on every read are RAID-3 (and IIRC RAID-2) configurations - and not even all of them do.  The common RAIDs (1 and 5) virtually never validate the entire stripe on every read, because people who desire that behavior should be using RAID-3 instead (choosing RAID-1, -4, or -5 indicates their preference for improved performance; if they want RAID-3-style validation, RAID-3 will perform better due to its spindle synchronization).  Furthermore, even when RAID-3 validates the entire stripe, if it discovers a problem it can&#039;t establish whether the problem is with some portion of the data or with the recorded parity - all it can do is propagate the error up the chain.

Read-after-write validation is occasionally offered as an integrity option but is seldom used - again, for performance reasons.  And it can&#039;t correct data trashed by a wild write:  it can only ensure that the correct location is also written.

The virtues you claim for ZFS are surpassed in WAFL, which collects updates in its NVRAM and then coalesces them to write back to disk as a single request - and it doesn&#039;t have to do so as often, since the NVRAM stabilizes the data without requiring as frequent &#039;syncs&#039; to disk.  Your closing comments above are so vague (and apparently confused) that I&#039;ve really got to wonder whether you know what you&#039;re talking about in this area - but I&#039;m always willing to be educated if I&#039;m missing something, so if you&#039;d like to be more specific feel free to continue.

- bill</description>
		<content:encoded><![CDATA[<p>The only RAIDs that commonly validate parity on every read are RAID-3 (and IIRC RAID-2) configurations &#8211; and not even all of them do.  The common RAIDs (1 and 5) virtually never validate the entire stripe on every read, because people who desire that behavior should be using RAID-3 instead (choosing RAID-1, -4, or -5 indicates their preference for improved performance; if they want RAID-3-style validation, RAID-3 will perform better due to its spindle synchronization).  Furthermore, even when RAID-3 validates the entire stripe, if it discovers a problem it can&#8217;t establish whether the problem is with some portion of the data or with the recorded parity &#8211; all it can do is propagate the error up the chain.</p>
<p>Read-after-write validation is occasionally offered as an integrity option but is seldom used &#8211; again, for performance reasons.  And it can&#8217;t correct data trashed by a wild write:  it can only ensure that the correct location is also written.</p>
<p>The virtues you claim for ZFS are surpassed in WAFL, which collects updates in its NVRAM and then coalesces them to write back to disk as a single request &#8211; and it doesn&#8217;t have to do so as often, since the NVRAM stabilizes the data without requiring as frequent &#8217;syncs&#8217; to disk.  Your closing comments above are so vague (and apparently confused) that I&#8217;ve really got to wonder whether you know what you&#8217;re talking about in this area &#8211; but I&#8217;m always willing to be educated if I&#8217;m missing something, so if you&#8217;d like to be more specific feel free to continue.</p>
<p>- bill</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: xfer_rdy</title>
		<link>http://storagemojo.com/2007/09/19/cerns-data-corruption-research/comment-page-1/#comment-129941</link>
		<dc:creator>xfer_rdy</dc:creator>
		<pubDate>Sat, 06 Oct 2007 04:08:56 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/2007/09/19/cerns-data-corruption-research/#comment-129941</guid>
		<description>Well Bill, to be honest with you, I was really hoping you didn&#039;t mean the disk drive.  You are correct in the fact that they can write in the wrong spot occasionally.  I&#039;ve been there; too many times in the last few years I&#039;ve personally experienced that feature. But a well-designed RAID system &quot;should&quot; catch that type of issue. Unfortunately, because of performance degradation, some RAID systems do not validate the parity on each read or read-after-write operation. They only validate read data (off the disk) if they have a failed or removed drive in the RAID set.  You can ask, &quot;what happens if all the drives fail the same exact way?&quot;  Then you have a design flaw, and all the error detection on earth may not help.

ZFS is my current favorite not due to write anywhere, which I feel is a very risky behavior for a file system, but because of the low transaction overhead.  It&#039;s one of the things they have gotten right. I personally think file systems should behave very predictably: the less variance, the better the chance for reliability.  One major advantage is write allocation coalescing and metadata written anywhere on the disk, but I think metadata writes should be under a little closer control for workload, capacity, and performance planning, consistent workarounds for undocumented features, and repeatable operation.  I tend to treat application servers more like big dedicated embedded systems, especially for synchronous clustering and remote data synchronization. Treating a server and its systems like a general purpose computer, well, then you may as well put your apps on a desktop.  I&#039;m not saying that WAFL is junk; it just doesn&#039;t scale as linearly as I like. I have a preference for very linear scaling servers. It&#039;s a personal preference - no, I don&#039;t have a real favorite for linear scaling either (I get asked that often after being on a soapbox).</description>
		<content:encoded><![CDATA[<p>Well Bill, to be honest with you, I was really hoping you didn&#8217;t mean the disk drive.  You are correct in the fact that they can write in the wrong spot occasionally.  I&#8217;ve been there; too many times in the last few years I&#8217;ve personally experienced that feature. But a well-designed RAID system &#8220;should&#8221; catch that type of issue. Unfortunately, because of performance degradation, some RAID systems do not validate the parity on each read or read-after-write operation. They only validate read data (off the disk) if they have a failed or removed drive in the RAID set.  You can ask, &#8220;what happens if all the drives fail the same exact way?&#8221;  Then you have a design flaw, and all the error detection on earth may not help.</p>
<p>ZFS is my current favorite not due to write anywhere, which I feel is a very risky behavior for a file system, but because of the low transaction overhead.  It&#8217;s one of the things they have gotten right. I personally think file systems should behave very predictably: the less variance, the better the chance for reliability.  One major advantage is write allocation coalescing and metadata written anywhere on the disk, but I think metadata writes should be under a little closer control for workload, capacity, and performance planning, consistent workarounds for undocumented features, and repeatable operation.  I tend to treat application servers more like big dedicated embedded systems, especially for synchronous clustering and remote data synchronization. Treating a server and its systems like a general purpose computer, well, then you may as well put your apps on a desktop.  I&#8217;m not saying that WAFL is junk; it just doesn&#8217;t scale as linearly as I like. I have a preference for very linear scaling servers. It&#8217;s a personal preference &#8211; no, I don&#8217;t have a real favorite for linear scaling either (I get asked that often after being on a soapbox).</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Bill Todd</title>
		<link>http://storagemojo.com/2007/09/19/cerns-data-corruption-research/comment-page-1/#comment-129847</link>
		<dc:creator>Bill Todd</dc:creator>
		<pubDate>Fri, 05 Oct 2007 23:44:57 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/2007/09/19/cerns-data-corruption-research/#comment-129847</guid>
		<description>Methinks you don&#039;t understand what lost and wild writes are:  they&#039;re not the result of the file system failing to write or writing in the wrong place, they&#039;re the result of the *disk* failing to write or writing in the wrong place (and failing to report any error as a result).

These are indeed data integrity deficiencies in the block interface.  A lost write cannot be detected by mechanisms such as DIF, nor can the fact that a wild write did not update the block it was supposed to - but at least when the block that a wild write updated in error is subsequently read the DIF mechanism should catch that.

I&#039;m also curious about precisely what worries you about WAFL - especially since ZFS (which you seem to prefer) also uses a very similar &#039;write-anywhere&#039; approach.

- bill</description>
		<content:encoded><![CDATA[<p>Methinks you don&#8217;t understand what lost and wild writes are:  they&#8217;re not the result of the file system failing to write or writing in the wrong place, they&#8217;re the result of the *disk* failing to write or writing in the wrong place (and failing to report any error as a result).</p>
<p>These are indeed data integrity deficiencies in the block interface.  A lost write cannot be detected by mechanisms such as DIF, nor can the fact that a wild write did not update the block it was supposed to &#8211; but at least when the block that a wild write updated in error is subsequently read the DIF mechanism should catch that.</p>
<p>I&#8217;m also curious about precisely what worries you about WAFL &#8211; especially since ZFS (which you seem to prefer) also uses a very similar &#8216;write-anywhere&#8217; approach.</p>
<p>- bill</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: xfer_rdy</title>
		<link>http://storagemojo.com/2007/09/19/cerns-data-corruption-research/comment-page-1/#comment-129786</link>
		<dc:creator>xfer_rdy</dc:creator>
		<pubDate>Fri, 05 Oct 2007 20:57:37 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/2007/09/19/cerns-data-corruption-research/#comment-129786</guid>
		<description>Hi Robin, glad you liked the reference. Sorry it took so long to get back on the wire, the investor in a new project backed out at the last minute, so I got caught scrambling to either find a new investor or another consulting contract. I still need to pay the mortgage...

Back onto T10 DIF.... The DIF is a block data integrity scheme targeting the data transferred between the CPU/system memory and the storage medium.  DIF wasn&#039;t intended to enforce or detect data integrity outside of the block interface. It also works well for volume-to-volume block copies. It&#039;s not intended to detect missing blocks, i.e., when the file system forgot to request or write data and the app reads stale data from the buffer. File-based data integrity checks need to occur at another layer in the system. In this case, one size does not fit all.

ZFS is my new favorite technology of the year.  More so than WAFL, no insult to the WAFL advocates. Having worked with different remote data synchronization technologies since the mid &#039;80s, any &quot;anywhere&quot; file system sends up red flags and flashbacks of long torch(turous) nights.  When the sun is shining, everyone is your friend, but once the storm hits... with marginal interconnects between data stores, live systems that cannot go down, hard SLAs and long resync periods, life quickly becomes interesting.  It&#039;s better than nothing at all.</description>
		<content:encoded><![CDATA[<p>Hi Robin, glad you liked the reference. Sorry it took so long to get back on the wire, the investor in a new project backed out at the last minute, so I got caught scrambling to either find a new investor or another consulting contract. I still need to pay the mortgage&#8230;</p>
<p>Back onto T10 DIF&#8230;. The DIF is a block data integrity scheme targeting the data transferred between the CPU/system memory and the storage medium.  DIF wasn&#8217;t intended to enforce or detect data integrity outside of the block interface. It also works well for volume-to-volume block copies. It&#8217;s not intended to detect missing blocks, i.e., when the file system forgot to request or write data and the app reads stale data from the buffer. File-based data integrity checks need to occur at another layer in the system. In this case, one size does not fit all.</p>
<p>ZFS is my new favorite technology of the year.  More so than WAFL, no insult to the WAFL advocates. Having worked with different remote data synchronization technologies since the mid &#8217;80s, any &#8220;anywhere&#8221; file system sends up red flags and flashbacks of long torch(turous) nights.  When the sun is shining, everyone is your friend, but once the storm hits&#8230; with marginal interconnects between data stores, live systems that cannot go down, hard SLAs and long resync periods, life quickly becomes interesting.  It&#8217;s better than nothing at all.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Bill Todd</title>
		<link>http://storagemojo.com/2007/09/19/cerns-data-corruption-research/comment-page-1/#comment-128375</link>
		<dc:creator>Bill Todd</dc:creator>
		<pubDate>Tue, 02 Oct 2007 10:23:19 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/2007/09/19/cerns-data-corruption-research/#comment-128375</guid>
		<description>(A third submission that disappeared without a trace:)

As I&#039;ve observed before, calling ZFS &quot;the only game in town&quot; in this respect really isn&#039;t fair to WAFL.  WAFL provides the same level of end-to-end validation that ZFS does:  the main difference is that by virtue of running only on a file server rather than as a local file system WAFL can&#039;t provide end-to-end protection all the way to client RAM - but it does provide end-to-end protection from server RAM to disk and back again, and the normal network checksums provide protection from client RAM to server RAM and back again, so the validation is scarcely weaker than local ZFS protection.

- bill</description>
		<content:encoded><![CDATA[<p>(A third submission that disappeared without a trace:)</p>
<p>As I&#8217;ve observed before, calling ZFS &#8220;the only game in town&#8221; in this respect really isn&#8217;t fair to WAFL.  WAFL provides the same level of end-to-end validation that ZFS does:  the main difference is that by virtue of running only on a file server rather than as a local file system WAFL can&#8217;t provide end-to-end protection all the way to client RAM &#8211; but it does provide end-to-end protection from server RAM to disk and back again, and the normal network checksums provide protection from client RAM to server RAM and back again, so the validation is scarcely weaker than local ZFS protection.</p>
<p>- bill</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Robin Harris</title>
		<link>http://storagemojo.com/2007/09/19/cerns-data-corruption-research/comment-page-1/#comment-126273</link>
		<dc:creator>Robin Harris</dc:creator>
		<pubDate>Wed, 26 Sep 2007 01:02:48 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/2007/09/19/cerns-data-corruption-research/#comment-126273</guid>
		<description>All,

The 10^7 number underscores the urgency of end-to-end data protection. So far, ZFS seems to be the only game in town.

The specific causes of the data corruption at CERN don&#039;t seem that interesting given that they seem to be spread out over the entire I/O system. The data chain is so long and subject to so many kinds of error that errors are pretty much baked into the system. If you weren&#039;t CERN, would you even figure out the WD/3ware problem?

This is on top of the fact that RAID 5 no longer protects our large drives from UREs. The whole data protection model is up for a serious re-think.

xfer, thanks for the T10-DIF reference. Hadn&#039;t heard of that, even though I agree with Bill&#039;s comment. Half a loaf is better than none.

Rob, good catch on CERN using the PDF. Honestly, though, I save a lot of interesting web pages as PDFs myself and use an iPhoto-like PDF viewer to cruise through them.

Robin</description>
		<content:encoded><![CDATA[<p>All,</p>
<p>The 10^7 number underscores the urgency of end-to-end data protection. So far, ZFS seems to be the only game in town.</p>
<p>The specific causes of the data corruption at CERN don&#8217;t seem that interesting given that they seem to be spread out over the entire I/O system. The data chain is so long and subject to so many kinds of error that errors are pretty much baked into the system. If you weren&#8217;t CERN, would you even figure out the WD/3ware problem?</p>
<p>This is on top of the fact that RAID 5 no longer protects our large drives from UREs. The whole data protection model is up for a serious re-think.</p>
<p>xfer, thanks for the T10-DIF reference. Hadn&#8217;t heard of that, even though I agree with Bill&#8217;s comment. Half a loaf is better than none.</p>
<p>Rob, good catch on CERN using the PDF. Honestly, though, I save a lot of interesting web pages as PDFs myself and use an iPhoto-like PDF viewer to cruise through them.</p>
<p>Robin</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Bill Todd</title>
		<link>http://storagemojo.com/2007/09/19/cerns-data-corruption-research/comment-page-1/#comment-125696</link>
		<dc:creator>Bill Todd</dc:creator>
		<pubDate>Mon, 24 Sep 2007 13:00:43 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/2007/09/19/cerns-data-corruption-research/#comment-125696</guid>
		<description>The usual problem with approaches like the T10-DIF is that they still don&#039;t catch &#039;wild&#039; or &#039;lost&#039; writes, since they write the validation information along with the data that it validates.  Only a mechanism which writes some reasonable form of &#039;checksum&#039; separately from the data that it protects (and then checks it on every subsequent read) can handle these kinds of errors (which admittedly don&#039;t fall into the same category of errors characterized by BERs).

Both ZFS and its older progenitor WAFL do provide this form of separate checksumming (truly end-to-end in ZFS&#039;s case, only end-to-end-within-server in WAFL&#039;s case).  It&#039;s interesting that the CERN study reportedly stated that this form of protection doubles the disk write overhead, since reasonable implementations can have far lower average impact than that.

- bill</description>
		<content:encoded><![CDATA[<p>The usual problem with approaches like the T10-DIF is that they still don&#8217;t catch &#8216;wild&#8217; or &#8216;lost&#8217; writes, since they write the validation information along with the data that it validates.  Only a mechanism which writes some reasonable form of &#8216;checksum&#8217; separately from the data that it protects (and then checks it on every subsequent read) can handle these kinds of errors (which admittedly don&#8217;t fall into the same category of errors characterized by BERs).</p>
<p>Both ZFS and its older progenitor WAFL do provide this form of separate checksumming (truly end-to-end in ZFS&#8217;s case, only end-to-end-within-server in WAFL&#8217;s case).  It&#8217;s interesting that the CERN study reportedly stated that this form of protection doubles the disk write overhead, since reasonable implementations can have far lower average impact than that.</p>
<p>- bill</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: jealous</title>
		<link>http://storagemojo.com/2007/09/19/cerns-data-corruption-research/comment-page-1/#comment-125345</link>
		<dc:creator>jealous</dc:creator>
		<pubDate>Sun, 23 Sep 2007 15:52:56 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/2007/09/19/cerns-data-corruption-research/#comment-125345</guid>
		<description>How do they get their corruption levels so low? 
-jealous</description>
		<content:encoded><![CDATA[<p>How do they get their corruption levels so low?<br />
-jealous</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: xfer_rdy</title>
		<link>http://storagemojo.com/2007/09/19/cerns-data-corruption-research/comment-page-1/#comment-124486</link>
		<dc:creator>xfer_rdy</dc:creator>
		<pubDate>Fri, 21 Sep 2007 04:36:44 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/2007/09/19/cerns-data-corruption-research/#comment-124486</guid>
		<description>Robin,

You&#039;re right, the errors experienced at CERN are not unusual and many times are not repeatable.  I have a 20TB array and another 10TB array, and I see bad data juju once a month.

The 3 * 10^7 BER is not unexpected. I&#039;m glad to see someone has finally blogged and brought the issue out of the closet. It&#039;s also good to see an end user has finally come out and disclosed the data errors they&#039;re actually experiencing.

There is a lot of work done in telco and satcom to model and correct the different types of errors now experienced in the data storage arena.

Sometimes in the open systems world, unaddressed critical issues like data integrity are treated like the &quot;crazy aunt locked in the basement&quot; - the one no one wants to talk about.  In the past, the open systems computer industry has typically responded to this particular issue by passing the buck... Drive mfgs say &quot;it&#039;s the RAID controller&#039;s problem&quot;, the RAID controller mfgs push it off to the host computer systems, who in turn push the problem off to the OS, where it either gets pushed back down the food chain or passed up to the applications to deal with. No one in the industry really wants to take ownership of this issue: it&#039;s a complex problem, it requires participation from several different component vendors, and it requires significant changes to product lines. In other words.... it&#039;s expensive to fix.

Are there any heroes in this story of corruption and buck passing intrigue ? That is besides Robin and his blogs. 

In 2003, the T10 group took on an effort to standardize an &quot;end to end&quot; error detection scheme. They called it the T10 Data Integrity Field standard, commonly referred to as T10-DIF. It doesn&#039;t correct the data corruptions, but at least they can be detected, until the data capacities get larger.  T10-DIF uses an additional data field (DIF) that is generated by the host, accompanies the data from the host application to the disk, and is returned to the host during data read operations. If any discrepancies occur between the DIF and the data during read operations, the host should be able to detect them.  The spec is still a work in progress in some areas. Although not complete, T10-DIF still provides significant value to storage data integrity issues.

The good news about T10-DIF? - almost all 2Gb Fibre Channel and InfiniBand, all 4Gb Fibre Channel HBAs and some midrange disk arrays support it.  Also, there is a newly formed Data Integrity Initiative (Oracle/Emulex/LSI/Seagate) that will try to iron out the rest of the technical issues and try to ensure all their products will interoperate at some level.

The bad news about T10-DIF...  it is not yet supported by SAS or iSCSI, and not supported by any low cost disk arrays.  Also, there&#039;s no independent authority validating that T10-DIF operates properly across products and platforms, and there is no specification classifying different levels of DIF support (a marketing jackbox).  Most of these issues are temporary and should be worked out in the next 24 months.


We should see T10-DIF rolling out in most midrange arrays in the next 12 to 24 months.  When is it going to reach the lower end of the market? When the three 800-pound gorillas demand it.  Is it going to help the 1 TB drive in your PC? Sorry, you&#039;ll have to wait another 10 years.</description>
		<content:encoded><![CDATA[<p>Robin,</p>
<p>You&#8217;re right, the errors experienced at CERN are not unusual and many times are not repeatable.  I have a 20TB array and another 10TB array, and I see bad data juju once a month.</p>
<p>The 3 * 10^7 BER is not unexpected. I&#8217;m glad to see someone has finally blogged and brought the issue out of the closet. It&#8217;s also good to see an end user has finally come out and disclosed the data errors they&#8217;re actually experiencing.</p>
<p>There is a lot of work done in telco and satcom to model and correct the different types of errors now experienced in the data storage arena.</p>
<p>Sometimes in the open systems world, unaddressed critical issues like data integrity are treated like the &#8220;crazy aunt locked in the basement&#8221; &#8211; the one no one wants to talk about.  In the past, the open systems computer industry has typically responded to this particular issue by passing the buck&#8230; Drive mfgs say &#8220;it&#8217;s the RAID controller&#8217;s problem&#8221;, the RAID controller mfgs push it off to the host computer systems, who in turn push the problem off to the OS, where it either gets pushed back down the food chain or passed up to the applications to deal with. No one in the industry really wants to take ownership of this issue: it&#8217;s a complex problem, it requires participation from several different component vendors, and it requires significant changes to product lines. In other words&#8230;. it&#8217;s expensive to fix.</p>
<p>Are there any heroes in this story of corruption and buck passing intrigue ? That is besides Robin and his blogs. </p>
<p>In 2003, the T10 group took on an effort to standardize an &#8220;end to end&#8221; error detection scheme. They called it the T10 Data Integrity Field standard, commonly referred to as T10-DIF. It doesn&#8217;t correct the data corruptions, but at least they can be detected, until the data capacities get larger.  T10-DIF uses an additional data field (DIF) that is generated by the host, accompanies the data from the host application to the disk, and is returned to the host during data read operations. If any discrepancies occur between the DIF and the data during read operations, the host should be able to detect them.  The spec is still a work in progress in some areas. Although not complete, T10-DIF still provides significant value to storage data integrity issues.</p>
<p>The good news about T10-DIF? &#8211; almost all 2Gb Fibre Channel and InfiniBand, all 4Gb Fibre Channel HBAs and some midrange disk arrays support it.  Also, there is a newly formed Data Integrity Initiative (Oracle/Emulex/LSI/Seagate) that will try to iron out the rest of the technical issues and try to ensure all their products will interoperate at some level.</p>
<p>The bad news about T10-DIF&#8230;  it is not yet supported by SAS or iSCSI, and not supported by any low cost disk arrays.  Also, there&#8217;s no independent authority validating that T10-DIF operates properly across products and platforms, and there is no specification classifying different levels of DIF support (a marketing jackbox).  Most of these issues are temporary and should be worked out in the next 24 months.</p>
<p>We should see T10-DIF rolling out in most midrange arrays in the next 12 to 24 months.  When is it going to reach the lower end of the market? When the three 800-pound gorillas demand it.  Is it going to help the 1 TB drive in your PC? Sorry, you&#8217;ll have to wait another 10 years.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Rob Mueller</title>
		<link>http://storagemojo.com/2007/09/19/cerns-data-corruption-research/comment-page-1/#comment-124133</link>
		<dc:creator>Rob Mueller</dc:creator>
		<pubDate>Thu, 20 Sep 2007 03:48:17 +0000</pubDate>
		<guid isPermaLink="false">http://storagemojo.com/2007/09/19/cerns-data-corruption-research/#comment-124133</guid>
		<description>One interesting question not answered is about transient errors. If you read a block from disk and get an error, do you still get an error if you read it again?

The way ZFS uses checksums on all blocks really helps with the integrity. In general, reading over all the ZFS documentation, they&#039;ve thought hard about integrity issues and seem to have covered a lot of bases others are missing.

One bit of irony with the link you have to CERN above. Where was the world wide web created? At CERN as an &quot;information management system&quot; exactly for distributing information like this (http://en.wikipedia.org/wiki/World_Wide_Web#History). So what format is the paper at CERN in? PDF of course. *sigh*</description>
		<content:encoded><![CDATA[<p>One interesting question not answered is about transient errors. If you read a block from disk and get an error, do you still get an error if you read it again?</p>
<p>The way ZFS uses checksums on all blocks really helps with the integrity. In general, reading over all the ZFS documentation, they&#8217;ve thought hard about integrity issues and seem to have covered a lot of bases others are missing.</p>
<p>One bit of irony with the link you have to CERN above. Where was the world wide web created? At CERN as an &#8220;information management system&#8221; exactly for distributing information like this (<a href="http://en.wikipedia.org/wiki/World_Wide_Web#History" rel="nofollow">http://en.wikipedia.org/wiki/World_Wide_Web#History</a>). So what format is the paper at CERN in? PDF of course. *sigh*</p>
]]></content:encoded>
	</item>
</channel>
</rss>
