Comments on: Why aren’t disk reads more reliable?

By: Johannes

Johannes — Fri, 16 Oct 2009 06:52:14 +0000

This has nothing to do with RAID-5. If disks were as unreliable as you claim, we would have major problems every day. A lot of people transfer 12 TB of data from disks without any RAID protection and don’t experience any read errors. Why?

The answer is that the URE rate should be applied per sector, since a URE is only reported after ECC and multiple read retries of a sector have failed. The error rate is then 1 in 512 bytes (sector size) * 10^14. That’s one per 51 petabytes of data read!
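The two readings of the 10^14 figure can be compared with a few lines of arithmetic. This is only a sketch of the numbers in the comment above; whether the spec should be read per bit or per sector is the commenter's argument, not something the code settles.

```python
# Illustrative arithmetic only; 10^14 is the spec figure cited in the comment.
URE_SPEC = 10**14       # one unrecoverable read error per 10^14 units read
SECTOR_BYTES = 512      # sector size used in the comment

# Reading the spec as "per bit read": bytes read per expected error.
bytes_per_ure_bit_reading = URE_SPEC // 8

# Reading the spec as "per sector read", as the comment argues.
bytes_per_ure_sector_reading = URE_SPEC * SECTOR_BYTES

TB = 10**12
PB = 10**15
print(f"per-bit reading:    {bytes_per_ure_bit_reading / TB:.1f} TB per error")
print(f"per-sector reading: {bytes_per_ure_sector_reading / PB:.1f} PB per error")
```

The per-bit reading gives the roughly 12 TB figure, and the per-sector reading gives the 51 petabytes quoted above.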

I’ve written about this on my blog in Swedish, but Google can translate it quite well for you: http://translate.google.com/translate?hl=sv&sl=sv&tl=en&u=http%3A%2F%2Fwww.teknikhemmet.se%2Fblog.php%2F2009%2Fraid-5-fungerar-fortfarande%2F

By: Turning the Page on RAID | Stephen Foskett, Pack Rat

Turning the Page on RAID | Stephen Foskett, Pack Rat — Sun, 14 Sep 2008 07:03:40 +0000

[…] issues. As drives have become larger, the tiny chance of an unrecoverable media error compounds, becoming a certainty. Even dual-parity will not be able to guarantee data protection on the massive disks predicted for […]

By: Ryan

Ryan — Fri, 03 Aug 2007 20:38:49 +0000

It’s even worse than just rebuilds… A ‘normal’ RAID 5 doesn’t check parity on host read operations. On a large array, that means there is a good chance of being handed bad data without knowing it.

There is at least one vendor that does (for a clue on who, check who supplies many of the systems on the Top500 supercomputer list, including #1):
8+1 with parity consistency verification on all reads, as long as it is in 8+1 mode
8+2 where it will /correct/ bit errors as the correct data goes to the host, flag them, and write the correct bits back out to the drive.
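As a rough illustration of what checking parity on every host read means in an 8+1 layout, here is a minimal sketch using plain XOR parity. The function names are made up for the example, and this is not the vendor's implementation. With a single parity block the check can detect a bad stripe but cannot say which block is wrong.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def read_stripe_verified(data_blocks, parity_block):
    """Return the data blocks only if they are consistent with the parity."""
    if xor_blocks(data_blocks) != parity_block:
        raise IOError("parity mismatch: stripe holds bad data")
    return data_blocks

data = [bytes([i] * 4) for i in range(1, 9)]   # 8 data blocks of 4 bytes each
parity = xor_blocks(data)                      # the "+1" parity block
read_stripe_verified(data, parity)             # a consistent stripe passes

corrupted = list(data)
corrupted[3] = b"\x04\x04\x05\x04"             # one flipped bit in block 3
try:
    read_stripe_verified(corrupted, parity)
    detected = False
except IOError:
    detected = True
```

A controller that skips this check on reads would hand `corrupted` to the host without complaint.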

They also do partial drive rebuilds for drives that ‘lag’ for a bit. Instead of having to rebuild the full capacity of a 1TB drive that was slow for just a moment, it lets you bring the drive back up and rebuild only what changed while the drive was ‘away’.

By: Bill Todd

Bill Todd — Wed, 25 Jul 2007 02:04:58 +0000

I suspect that disk manufacturers understand the trade-off between density and reliability pretty well, and that at most only in marginal areas do they weigh capacity too heavily over reliability.

E.g., if you halved the capacity, you’d have to attain sufficiently greater reliability to make the RAID-5 array more reliable than a RAID-10 array (because otherwise you could leave things as they are and the user could make that decision). And that’s just considering reliability by itself: the RAID-10 array that the higher densities make economically feasible also provides considerably better performance.

Secondly, if you’re using RAID-5 you’re already essentially saying that performance takes a back-seat to economy, so moving to RAID-6 is no big deal (at least as long as RAID-6 is sufficiently commoditized to avoid costing significantly more, which it certainly can be in software-RAID situations).

Thirdly, many common configurations (specifically, those requiring off-site replication) can prop up RAID-5 in the URE department: if you replicate disk-to-disk (actually, it can be even more flexible than this as soon as someone creates the right product) then having duplicate RAID-5 arrays at two sites (or even just a plain data copy at the backup site) drives exposure to UREs back down into the negligible category.

So RAID-5 will continue to occupy a useful niche in the pantheon of replication strategies, and to the degree that this niche narrows RAID-6 will be the beneficiary. And disk manufacturers will continue to increase capacities (or build smaller-form-factor disks that trade capacity for increased performance) rather than complicate their product lines by adding a significant dimension of ‘reliability’ to the existing dimensions of capacity and performance (since reliability can – and in fact to some degree always must – be better addressed at the system level).

By: Open Systems guy

Open Systems guy — Sun, 22 Jul 2007 18:28:38 +0000

“My point in that post is that as SATA disk drive capacity continues to increase, and the unrecoverable read error (URE) rate remains constant, the time will come – 2009? – when every RAID 5 disk failure will be likely to encounter a URE during rebuild.”

As disks become more dense, it will indeed become harder to manage because all the other non-capacity specifications (throughput, IOs per second) stay about the same. Many people end up buying smaller, faster disks for the most important data. Even if a couple of 1TB drives in a RAID would do the job, most businesses that are serious about their storage will buy smaller, faster drives to reduce the chance of a URE.
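How capacity drives the chance of hitting a URE during a rebuild can be sketched as follows. The assumptions are mine: errors are independent, the rate is 1 in 10^14 bits read, and the array is a hypothetical 7-drive RAID 5 in which a rebuild reads all six surviving drives in full.

```python
import math

def p_ure_during_read(bytes_read, ure_per_bit=1e-14):
    """Probability of at least one URE while reading bytes_read bytes,
    assuming independent errors at ure_per_bit per bit (an assumption)."""
    bits = bytes_read * 8
    # 1 - (1 - p)^bits, computed with log1p/expm1 to avoid rounding loss
    return -math.expm1(bits * math.log1p(-ure_per_bit))

TB = 10**12
SURVIVING_DRIVES = 6    # hypothetical 7-drive RAID 5 with one drive failed
for drive_tb in (0.5, 1, 2):
    p = p_ure_during_read(SURVIVING_DRIVES * drive_tb * TB)
    print(f"{drive_tb} TB drives: {p:.0%} chance of a URE during rebuild")
```

Under these assumptions the probability rises quickly with drive size, which is the effect the quoted post describes.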

The next leap in drive technology will probably not be storage density, it will probably be access speed (sequentially and randomly speaking). Flash shows promise for this, as does holographic technology.

Holographics are interesting because the only theoretical limit would be the sensitivity of the laser receptors and the speed of the servos that have to move them.

By: Robert Pearson

Robert Pearson — Sun, 22 Jul 2007 10:22:31 +0000

RE: “Backups are only obsolete if one of the unique locations you mentioned is offsite. This is often necessary for business continuity reasons.”

Thanks for the feedback, David.
I have really enjoyed your site since I discovered it here on StorageMojo.

In an effort to be brief, the real point I was referring to has to do with this.
I have been working on the iSCSI “Speed Limit of the Information Universe” numbers. A corollary to throughput is the “green” cost. Is it “greener” to write slower? How slow is too slow?

There might be a point in the not too distant future when the “green” cost of creating, replicating and de-storing Information is the dominant cost?
In an Energy World gone mad…

If I have an online copy for Local Disasters then a 4th online copy that is geographically dispersed may not be of much value relative to the “green” cost.
A removable copy, stored at a geographically dispersed site that is considered highly safe from Disasters, may have a lower “green” cost, i.e. Flash or DVD versus tape, and be more effective by being totally flexible.
Have Complete Backups, Will Travel!

By: the storage anarchist

the storage anarchist — Fri, 20 Jul 2007 20:00:33 +0000

A 6+2 RAID 6 group requires the same number of spares as two 3+1 groups, so you really won’t have to buy more storage to get the same usable capacity – at least, not with most storage arrays that support 8 or more disk drives. Symmetrix supports R5 3+1 & 7+1, or R6 6+2 and 14+2; other systems are perhaps even more flexible. And in massively cached high-end arrays, the response times for R5 and R6 are virtually indistinguishable except under the heaviest paint-peeling workloads (which few systems operate under for any significant length of time).
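Both layouts use 8 drives and give 6 drives' worth of usable capacity, so the difference is in which double failures they survive. A small enumeration makes that concrete. It assumes the textbook definitions (RAID 6 tolerates any two failures, each RAID 5 group tolerates one), and the drive numbering is arbitrary.

```python
from itertools import combinations

DRIVES = range(8)
RAID5_GROUPS = [set(range(0, 4)), set(range(4, 8))]   # two 3+1 groups

def raid6_survives(failed):
    """One 6+2 RAID 6 group tolerates any two failed drives."""
    return len(failed) <= 2

def two_raid5_survive(failed):
    """Each 3+1 RAID 5 group tolerates at most one failed drive."""
    return all(len(group & set(failed)) <= 1 for group in RAID5_GROUPS)

pairs = list(combinations(DRIVES, 2))                  # every double failure
raid6_ok = sum(raid6_survives(p) for p in pairs)
raid5_ok = sum(two_raid5_survive(p) for p in pairs)
print(f"6+2 RAID 6 survives {raid6_ok} of {len(pairs)} double failures")
print(f"two 3+1 RAID 5 groups survive {raid5_ok} of {len(pairs)}")
```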

So if your assertions come true, the answer is simple: RAID 6 everywhere.

By: Joerg M.

Joerg M. — Fri, 20 Jul 2007 19:49:32 +0000

ZFS helps with this problem in an additional way: selective resilvering in case of a drive failure. You don’t have to do a complete resync of the hard drive when it’s not completely filled. It only resyncs the used parts of the hard drive.
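The idea can be sketched with a toy mirror rebuild that copies only allocated blocks. This is an illustration of the principle, not how ZFS actually implements resilvering; the `resilver` function and the 15% fill level are made up for the example.

```python
def resilver(surviving_copy, allocated, selective=True):
    """Rebuild a replacement drive from a surviving mirror copy.

    Returns (new_drive, blocks_copied). With selective=True only the
    blocks listed in `allocated` are copied; free space is skipped.
    """
    new_drive = [None] * len(surviving_copy)
    copied = 0
    for i, block in enumerate(surviving_copy):
        if selective and i not in allocated:
            continue                      # unused block: nothing to resync
        new_drive[i] = block
        copied += 1
    return new_drive, copied

disk = [f"block-{i}" for i in range(1000)]
used = set(range(150))                    # hypothetical: drive is 15% full
_, full = resilver(disk, used, selective=False)
_, partial = resilver(disk, used, selective=True)
print(f"full resync copies {full} blocks, selective copies {partial}")
```

Fewer blocks read during the rebuild also means fewer chances to hit a URE.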

@Steven: The disk scans don’t help you. The data on the rotating rust may be correct, but something on the way from the rust to the SATA plug can corrupt the data. Most of the time this is the source of unrecoverable errors.

By: Wes Felter

Wes Felter — Fri, 20 Jul 2007 19:16:04 +0000

Reliability should increase when vendors move to 4KB sectors which use more efficient ECC.

By: Steven

Steven — Fri, 20 Jul 2007 14:53:46 +0000

Actually, “most” enterprise-class disk vendors are consistently doing media scans of both the live volume blocks and the parity. This allows areas that have the traditional URE to be corrected long before a disk failure. I do see your point though. I think that as SATA disks continue to increase in size, even media scans will take too long to complete in a timely fashion. Controller manufacturers like LSI have this as a tunable within the array management software. I think the places where you are going to run into problems are the grow-it-at-home NAS solutions that are based on commodity hardware platforms and have no ability to do this level of proactive protection.

By: David Magda

David Magda — Fri, 20 Jul 2007 14:47:18 +0000

Robert,

Backups are only obsolete if one of the unique locations you mentioned is offsite. This is often necessary for business continuity reasons.

There’s also archiving, which while technically different from backups, often uses the same infrastructure (e.g., Legato, NetBackup, tapes, etc.).

By: David Magda

David Magda — Fri, 20 Jul 2007 14:37:09 +0000

The (personal) take of someone at Sun:

My stance on this topic is a little bit different. In my personal opinion, filesystems and RAID technology without strong checksums will be impractical. You don’t exactly need RAID6 when you have different means to ensure data integrity.

Of course RAID-Z doesn’t save you from a dual drive loss, but it can ensure that a URE can be recovered from during rebuild.
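Why a strong checksum changes the picture can be shown with a small sketch: a per-block checksum identifies which block is bad, and single parity can then rebuild it. This is a toy model using SHA-256 and XOR parity, with function names invented for the example; it is not RAID-Z's actual on-disk format or algorithm.

```python
import hashlib
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def checksum(block):
    return hashlib.sha256(block).digest()

def read_with_repair(blocks, parity, checksums):
    """Return the stripe, rebuilding one bad block from parity if needed."""
    bad = [i for i, b in enumerate(blocks) if checksum(b) != checksums[i]]
    if not bad:
        return list(blocks)
    if len(bad) > 1:
        raise IOError("more than one bad block: single parity cannot repair")
    i = bad[0]
    others = [b for j, b in enumerate(blocks) if j != i]
    rebuilt = xor_blocks(others + [parity])   # XOR of the rest recovers block i
    if checksum(rebuilt) != checksums[i]:
        raise IOError("reconstructed block fails its checksum")
    repaired = list(blocks)
    repaired[i] = rebuilt
    return repaired

original = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(original)
sums = [checksum(b) for b in original]
damaged = [b"AAAA", b"BxBB", b"CCCC"]         # silent corruption in block 1
healed = read_with_repair(damaged, parity, sums)
```

Plain parity alone would only report a mismatch here; the checksum is what pins the error to one block.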

By: PJ

PJ — Fri, 20 Jul 2007 13:47:09 +0000

>Endlessly pushing capacity as the only metric only guarantees an ever faster treadmill.

As you pointed out, however, this treadmill is about to come to an end; so to an extent pushing capacity was/is fine, in the category of “let’s solve problem A (capacity) before we move on to solving problems we don’t have yet (reliability).” The trick is, as you point out, to switch to other metrics once capacity is considered a ‘solved problem’.

By: Robert Pearson

Robert Pearson — Fri, 20 Jul 2007 06:57:09 +0000

In the 1997-1998 time frame, another support guy and I tried to convince Management that we should invest in Storage profiling software from the Storage vendor. The purpose would be so we could monitor the “State of Health” of all disk drives in all Storage from that vendor.
The goal was to identify failing or “candidates for failing” disk drives before they failed and forced a rebuild. We ran all RAID5.

We proposed that doing this would give us data, over time, that would allow us to replace disk drives before they became a problem based on firmware and operating information. The example we used was 30% or 1/3 of the drives every 36 months. To be really safe we recommended that we buy drives in Mass Quantities and replace 1/3 of them every 12 months. This would vary depending on the drive specs. Each generation is different.

We both found other jobs and left after we were told our “Careers?” were over.

The Cost/Benefit Analysis I did showed the Benefits ROI was about 10 times the TCO to do this. The biggest problems were scheduling the down-time and the threat this “appeared” to present to the Backup Group.

There were other factors that made this scenario good for that environment. I only recommend it for environments where the ROI/TCO ratio is determined to be high enough.

Backups are becoming more obsolete.
The only scenario that makes sense, to me, for eCommerce or eBusiness is to make each write to three unique locations. I actually believe a fourth write to SSD, Flash or some solid state “removable media” is going to become necessary. Particularly if you do the “Pace Layering” analysis of your Managed Units of Information and integrate that with the “Long Tail” ROI/TCO.
You might be very surprised what you learn?

Even RAIDVD would be good, if it were fast enough. The removable is only for Disaster Recovery. Most shops will need online Recovery for everyday Local Disasters.