I’ve been ranting about data loss on Storage Bits. Data loss makes me irate because I see regular folks who know nothing about computers struggling with the fallout and it is so unnecessary.

The stimulus was a fine PhD thesis, IRON File Systems (pdf), by Vijayan Prabhakaran, now of Microsoft Labs. It explores how commodity file systems corrupt data: he injected errors into ext3, ReiserFS, JFS, XFS and NTFS and then recorded how each one responded.

Dr. Prabhakaran built an error-injection framework that let him control what kinds of errors the file system would see, so he could document how each FS handled them. The parameters he could set include:

  • Failure type: read or write? If read: latent sector fault or block corruption? Does the machine crash before or after certain block failures?
  • Block type: directory block or superblock? Specific inode or block numbers could be specified as well.
  • Transient or permanent fault?
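To make those knobs concrete, here is a minimal sketch of the idea in Python. It is not Dr. Prabhakaran's framework: `FaultyDisk`, `Fault` and every field name are hypothetical, and the "disk" is just a list of in-memory blocks. It only shows how failure type, block number, and transient versus permanent can be expressed as a rule that a fake disk consults on each I/O.

```python
import errno
from dataclasses import dataclass
from typing import Optional


@dataclass
class Fault:
    op: str                          # "read" or "write"
    block: int                       # block number to target
    kind: str                        # "error" = I/O failure, "corrupt" = silently bad data
    remaining: Optional[int] = None  # None = permanent; N = transient, fires N times


class FaultyDisk:
    """Toy in-memory disk that misbehaves according to injected faults."""

    def __init__(self, num_blocks, block_size=512):
        self.block_size = block_size
        self.blocks = [bytes(block_size) for _ in range(num_blocks)]
        self.faults = []

    def inject(self, fault):
        self.faults.append(fault)

    def _match(self, op, block):
        # Return the first live fault for this operation and block, if any.
        for f in self.faults:
            if f.op == op and f.block == block and f.remaining != 0:
                if f.remaining is not None:
                    f.remaining -= 1  # a transient fault uses up one firing
                return f
        return None

    def read(self, block):
        fault = self._match("read", block)
        if fault and fault.kind == "error":
            raise OSError(errno.EIO, "injected read error")
        data = self.blocks[block]
        if fault and fault.kind == "corrupt":
            data = bytes(b ^ 0xFF for b in data)  # flip every bit, report success
        return data

    def write(self, block, data):
        # In this sketch any write fault is reported as a failed write.
        if self._match("write", block):
            raise OSError(errno.EIO, "injected write error")
        self.blocks[block] = bytes(data)


disk = FaultyDisk(num_blocks=8)
disk.write(3, b"x" * 512)
disk.inject(Fault(op="read", block=3, kind="error", remaining=1))  # transient

try:
    disk.read(3)
    first = "ok"
except OSError:
    first = "failed"

second = disk.read(3)  # the transient fault has expired, so this read succeeds
```

A file system that treats the first failed read as a dead disk never gets to the second, successful one.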

Sure enough, he found a lot of bugs in the file systems, though he couldn’t get as deep into NTFS as the others because it is proprietary.

From our analysis results, we find that the technology used by high-end systems (e.g., checksumming, disk scrubbing, and so on) has not filtered down to the realm of commodity file systems. Across all platforms, we find ad hoc failure handling and a great deal of illogical inconsistency in failure policy, often due to the diffusion of failure handling code through the kernel; such inconsistency leads to substantially different detection and recovery strategies under similar fault scenarios, resulting in unpredictable and often undesirable fault-handling strategies.


We also discover that most systems implement portions of their failure policy incorrectly; the presence of bugs in the implementations demonstrates the difficulty and complexity of correctly handling certain classes of disk failure. We observe little tolerance to transient failures; most file systems assume a single temporarily-inaccessible block indicates a fatal whole-disk failure. We show that none of the file systems can recover from partial disk failures, due to a lack of in-disk redundancy.

This is what the EMC Centera is running on. Feeling better?

As hardware gets more reliable, software is a bigger problem
Software is always buggy, and with Moore’s Law, we have more software at more levels of the storage stack. File systems need to be the enforcers of data integrity in the storage stack since only file systems know where every block is and what every block is supposed to have in it.
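A toy illustration of what "enforcer" could mean in practice: the layer that knows what a block should contain records a checksum when it writes and verifies it on every read. This is a sketch under my own assumptions, not any real file system's design; `ChecksummedStore` and its methods are made-up names, and a Python dict stands in for the disk.

```python
import zlib


class ChecksummedStore:
    """Toy block store that verifies a CRC32 on every read."""

    def __init__(self):
        self.data = {}       # block number -> bytes (stands in for the disk)
        self.checksums = {}  # block number -> CRC32, kept apart from the data

    def write(self, block, payload):
        self.data[block] = bytes(payload)
        self.checksums[block] = zlib.crc32(payload)

    def read(self, block):
        payload = self.data[block]
        if zlib.crc32(payload) != self.checksums[block]:
            # Refuse to hand back data that no longer matches what was written.
            raise IOError(f"checksum mismatch on block {block}")
        return payload


store = ChecksummedStore()
store.write(7, b"family photos")

# Simulate the disk silently changing the block behind the file system's back.
store.data[7] = b"family phot0s"

try:
    store.read(7)
    detected = False
except IOError:
    detected = True
```

Keeping the checksum apart from the data matters: a block that has gone bad cannot vouch for itself. Detection is only half the job, though. Recovering the block takes a redundant copy somewhere.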

The marketing problem
From my small-town perch, working with computer naifs, I know that most folks have absolutely no idea if a problem is caused by a lame file system or not. So how do you make people care?

I don’t think you can. People don’t care about whether their car has a timing belt or a timing chain until they realize two things: first, it costs money to replace a belt; and second, timing chains don’t require replacement. Most folks will never put the two together.

All the vendor can do is add up all the features, like timing chains, electronic ignitions and platinum-tipped spark plugs and offer “no tune-ups for 100,000 miles.” People understand that, especially if you remember when a tune-up every 3,000 miles was common.

Sell the benefit, not the technology.

The StorageMojo take
One of the things I love about my other blog is that it exposes me to something closer to consumer thinking. There are folks who understand some things about the technology – such as “clean power is good” – but don’t get, say, why a file system should be concerned with disk drive problems. It is partly education and partly cognitive.

But I think I also see something else: an emotional need for storage confidence; an unwillingness to confront the idea that storage systems fail. At one level I get it. Paranoia is time-consuming and not very productive.

But unlike CPUs and networks, storage is all about persistence. For all its faults the industry cares deeply about that. How do we tap into the consumer’s concern for persistence in a way that spurs action rather than denial? I’m hoping Apple is coming up with some good ideas as they prepare to roll out Time Machine and ZFS.

Comments welcome, as always. I didn’t try to evaluate Vijayan’s architectural solution as that is beyond my competence. Somebody want to take a look at it and give us the pros and cons?