A post last month in ACM’s Queue raised a disturbing point around block-level deduplication in flash SSDs: it could hose your file system.
De-dup is a Good Thing, right?
Researchers found that at least 1 Sandforce SSD controller – the SF-1200 – does block-level deduplication by default. Many file systems write critical metadata to multiple blocks in case one copy gets corrupted. But what if, unbeknownst to you, your SSD de-duplicates those blocks, leaving your file system with only 1 physical copy?
Yup, corruption of 1 block could wipe out your entire file system. And since all the “copies” point to the same corrupted block, there’s no way to recover.
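To make the failure mode concrete, here is a minimal sketch of a content-addressed block store. It is a toy model, not a description of how the SF-1200 or DuraWrite actually works (Sandforce has not disclosed that). The class `DedupStore`, its methods, and the block addresses are all made up for illustration.

```python
import hashlib


class DedupStore:
    """Toy content-addressed block store (hypothetical, not any vendor's design)."""

    def __init__(self):
        self.physical = {}  # fingerprint -> bytes actually stored on "flash"
        self.logical = {}   # logical block address -> fingerprint

    def write(self, lba, data):
        fp = hashlib.sha256(data).hexdigest()
        self.physical.setdefault(fp, data)  # identical content is stored once
        self.logical[lba] = fp

    def read(self, lba):
        return self.physical[self.logical[lba]]

    def corrupt_physical(self, lba):
        """Simulate an unrecoverable error in the physical block behind lba."""
        fp = self.logical[lba]
        self.physical[fp] = b"\x00" * len(self.physical[fp])


store = DedupStore()
superblock = b"fs-metadata-v1" * 8

# The file system believes it is writing three independent copies.
for lba in (0, 8192, 16384):
    store.write(lba, superblock)

physical_copies = len(store.physical)  # how many copies really exist
store.corrupt_physical(0)              # damage the block behind one "copy"
surviving = [lba for lba in (0, 8192, 16384) if store.read(lba) == superblock]
print(physical_copies, surviving)
```

In this model the three logical writes land on a single physical block, so damaging it leaves no intact copy behind any of the three addresses.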
Most Unix superblock-based FSs and ZFS could be pooched by loss of a single block. NTFS also mirrors critical metafile info and could be vulnerable as well.
To be fair, AFAIK no one has reported this failure in the wild, so it is conjecture today. That said, it may have happened to people who didn’t realize what went wrong.
But in the world of storage, if something can happen it will, usually at the worst possible time. Have you seen a total data loss on an otherwise functioning SSD?
The StorageMojo take
I’ve made calls to a number of vendors to get their responses, including Sandforce, Intel, Texas Memory Systems and OCZ. With any luck we’ll soon have a 1st pass on who does what to your data.
Don’t panic: not all SSD controllers do this. Texas Memory Systems controllers don’t, partly because they don’t use MLC flash and partly because minimizing capacity use and maximizing data availability are conflicting goals, and they chose availability over capacity.
Also note that the SF-1200 is offered as a consumer grade controller. Not clear what Sandforce does with the rest of their line, but their site does repeatedly reference their “DuraWrite” technology which appears to include block-level dedup.
Just last week StorageMojo recommended faster adoption of SSDs in the enterprise – and still does. But this once again underlines the need for mirroring. The sooner we find these issues, the sooner they’ll be fixed.
Watch the comments for vendor info, and I’ll update this post with more info if and when it develops.
Update: Here is the Sandforce response:
In the recent article by David Rosenthal he mentions a conversation with Kirk McKusik and the ZFS team at Sun Microsystems (Oracle). That conversation explains why it is critical that meta data not be lost or corrupted. He goes on to say that “If the stored metadata gets corrupted, the corruption will apply to all copies, so recovery is impossible.”
SandForce employs a feature called DuraWrite which enables flash memory to last longer through innovative patent pending techniques. Although SandForce has not disclosed the specific operation of DuraWrite and its 100% lossless write reduction techniques, the concept of deduplication, compression, and data differencing is certainly related. Through all the years of development and OEM testing with our SSD manufacturers and top tier storage users, there has not been a single reported failure of the DuraWrite engine. There is no more likelihood of DuraWrite loosing data than if it was not present.
We completely agree that any loss of metadata is likely to corrupt access to the underlying data. That is why SandForce created RAISE (Redundant Array of Independent Silicon Elements) and includes it on every SSD that uses a SandForce SSD Processor. All storage devices include ECC protection to minimize the potential that a bit can be lost and corrupt data. Not only do SandForce SSD Processors employ ECC protection enabling an UBER (Uncorrectable Bit Error Rate) of greater than 10^-17, if the ECC engine is unable to correct the bit error RAISE will step in to correct a complete failure of an entire sector, page, or block.
This combination of ECC and RAISE protection provides a resulting UBER of 10^-29 virtually eliminates the probabilities of data corruption. This combined protection is much higher than any other currently shipping SSD or HDD solution we know about. The fact that ZFS stores up to three copies of the metadata and optionally can replicate user data is not an issue. All data stored on a SandForce Driven SSD is viewed critical and protected with the highest level of certainty.
Readers: how does that sound to you?
Update 2: Oddly enough, the Sandforce web site specifies the SF-1200 controller at:
ECC Recovery: Up to 24 bytes correctable per 512-byte sector
Unrecoverable Read Errors: Less than 1 sector per 10^16 bits read
which is about where many enterprise disk drives are spec’d – and quite a bit less impressive than 10^-29. Hmm-m.
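For a sense of scale, here is the arithmetic on the two figures: the web site's 1 unrecoverable sector per 10^16 bits read versus the 10^-29 UBER claimed in the response. Treat it as a rough sketch. It assumes the two numbers can be compared directly as errors per bit read, which glosses over the sector-versus-bit distinction, and it uses a decimal terabyte.

```python
from fractions import Fraction

# Web site spec: fewer than 1 unrecoverable sector per 10^16 bits read.
spec_rate = Fraction(1, 10**16)
# Vendor response: combined ECC + RAISE UBER of 10^-29.
claimed_uber = Fraction(1, 10**29)

# How many times larger the spec'd rate is than the claimed one.
gap = spec_rate / claimed_uber

# Data read per expected unrecoverable sector at the spec limit,
# in decimal terabytes (8 * 10^12 bits each).
bits_per_tb = 8 * 10**12
tb_per_error_spec = Fraction(10**16, bits_per_tb)

print(gap == 10**13, tb_per_error_spec)
```

That works out to a gap of 13 orders of magnitude, with the spec sheet limit corresponding to roughly one unrecoverable sector per 1,250 TB read.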
End update 2.
Update 3: Spoke to James Myers of Intel. He said that no current Intel SSD uses any form of compression, including dedup. He also cautioned against making too much of the risk: after all, you’d have to have an unrecoverable read error AND it would have to be that 1 critical block. Perhaps, he suggested, file systems that do use multiple copies of critical FS metadata could slightly alter the copies to eliminate the possibility of deduplication.
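Here is one way that "slightly alter the copies" idea might look, as a sketch only. No file system named in this post is known to do this. The 4-byte copy index, the 4096-byte block size, and the helper names `make_copy` and `read_copy` are assumptions for illustration.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size


def make_copy(metadata: bytes, copy_index: int) -> bytes:
    """Build one on-disk copy: 4-byte copy index, then metadata, zero-padded."""
    body = copy_index.to_bytes(4, "big") + metadata
    if len(body) > BLOCK_SIZE:
        raise ValueError("metadata too large for one block")
    return body.ljust(BLOCK_SIZE, b"\x00")


def read_copy(block: bytes, length: int) -> bytes:
    """Strip the copy index and padding to get the metadata back."""
    return block[4:4 + length]


metadata = b"superblock: root=42, free=1000"
copies = [make_copy(metadata, i) for i in range(3)]

# Each copy now has a different whole-block fingerprint...
fingerprints = {hashlib.sha256(c).hexdigest() for c in copies}
# ...but the same metadata can still be recovered from any of them.
recovered = [read_copy(c, len(metadata)) for c in copies]
print(len(fingerprints), all(r == metadata for r in recovered))
```

Note that this only defeats a match on whole-block fingerprints. A controller that compresses or stores differences between similar blocks would not necessarily be fooled by a 4-byte change.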
End update 3.
Courteous comments welcome, of course. TMS has been advertising on StorageMojo for a couple of years.