Can flash SSDs be trusted?

by Robin Harris on Monday, 20 June, 2011

IT pros are always skeptical about new technology. Is it surprising that flash SSDs are getting the gimlet eye?

The big worry seems to be endurance. Nobody wants to buy an expensive SSD and have it fail after a year on the job.

But IT infrastructures are designed to manage endurance failures. LTO tape, for example, is specified for a few hundred head passes. Yet tape is the paragon of data persistence.

Hard drive failure rates aren’t low enough for any of us to consider storing important data on one without backup. So why are IT pros so skittish about flash SSDs?

Experience
Or rather, lack of experience. Flash SSDs are evolving rapidly, with new generations arriving every 12 to 18 months.

It takes time for experience with new models to percolate. In the meantime, bad experiences with earlier generation drives continue to circulate.

Vendor secrecy about failure rates and modes doesn’t help. Until the Bianca Schroeder/Google/CMU disk drive studies were released 4 years ago, we had no independent large-scale reliability data.

I hope it won’t take 20 years before we get that information on SSDs. How about it, vendors?

Reliability
SSDs may turn out to be more reliable than hard drives, but I won’t believe it until I see independent data. The lack of moving parts is a plus, but about half the failures in disk drives come from the electronics, not the spinning bits. SSDs have most of the same electronics.

SSD equivalent of a head disk assembly
Plane failures are a major trouble spot. Each die consists of two planes. These planes are prone to sudden failure, wiping out half the data on a die.

Most chip carriers contain multiple stacked dies, so a plane failure will remove anywhere from a quarter to an eighth of the chip’s total storage. Most flash controllers lay out the data in ways similar to a RAID array to guard against data loss.
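To make the arithmetic concrete, here is a minimal Python sketch. It assumes two planes per die, as described above, and uses a simple XOR parity stripe as a stand-in for whatever RAID-like layout a given controller actually uses. The function names and the one-parity-chunk layout are illustrative assumptions, not any vendor's design.

```python
from fractions import Fraction
from functools import reduce

PLANES_PER_DIE = 2  # per the article: each die consists of two planes


def fraction_lost(dies_per_package: int) -> Fraction:
    """Share of a package's capacity lost when a single plane fails."""
    return Fraction(1, dies_per_package * PLANES_PER_DIE)


def xor_parity(chunks: list[bytes]) -> bytes:
    """XOR equal-length chunks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


def rebuild(surviving: list[bytes], parity: bytes) -> bytes:
    """Recover the one missing chunk from the survivors plus parity."""
    return xor_parity(surviving + [parity])


# Hypothetical stripe: one chunk per plane, plus one parity chunk.
stripe = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_parity(stripe)
lost = stripe[1]  # pretend the plane holding this chunk died
recovered = rebuild([stripe[0], stripe[2]], parity)
```

With two stacked dies a single plane failure costs a quarter of the package, and with four dies an eighth. The parity chunk is what lets the controller hand back the lost data anyway, at the price of some capacity.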

What to look for
Since Maxtor’s well-deserved demise we’ve had reasonable parity between disk drives and disk drive vendors. But that is not the case with the still maturing flash drive market.

Storage Newsletter recently published a list of 85 SSD vendors, most of whom none of us have heard of. Many are focused on the embedded systems market, but the list is also long because the SSD market’s barriers to entry are low: buy a controller chip; buy flash on the spot market; gen up a PC board and voilà, you are in the SSD market.

But flash that ends up on the spot market at rock-bottom prices is often marginal. The big buyers, like Apple, get first dibs on the best.

SSDs made with spot-market flash and a no-name – USB thumb drive? – controller will have a lot more problems. Which is to say that in today’s SSD market brand names count.

One other thing to look for is a guarantee of total write capacity. Another is a statement on the amount of over-provisioning the drive has.
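A rough way to put those two numbers to work is sketched below in Python. The drive figures are hypothetical, the function names are mine, and the over-provisioning formula is the common spare-over-usable definition; a vendor's data sheet may count it differently.

```python
def over_provisioning_pct(raw_gb: float, usable_gb: float) -> float:
    """Spare flash as a percentage of user-visible capacity."""
    return (raw_gb - usable_gb) / usable_gb * 100


def years_until_write_limit(rated_tb_written: float, gb_written_per_day: float) -> float:
    """Years until the guaranteed total-write figure is used up at a steady daily write rate."""
    days = rated_tb_written * 1000 / gb_written_per_day
    return days / 365


# Hypothetical drive: 128 GB of raw flash exposed as 100 GB,
# guaranteed for 73 TB of total writes, in a workload writing 40 GB a day.
spare = over_provisioning_pct(128, 100)
lifetime = years_until_write_limit(73, 40)
```

If the lifetime figure comes out shorter than the warranty period for your workload, the write guarantee is the number that matters, not the calendar.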

Even better: a five-year guarantee such as the one Seagate popularized with disks and Intel just started offering on one of its SSD lines.

The StorageMojo take
I have been as skeptical as anyone on SSDs – read some of my earliest posts – but the time for skepticism has passed. Of course, perform careful evals on any new IT product. But the best flash SSDs are ready for the enterprise today.

And here’s an even more radical conclusion: the best consumer SSDs are ready for the enterprise as well. Using any SATA drives in your enterprise?

The key: how is the SSD architected into the system? If it is a storage tier, the data has to be protected, just like a RAID array. If it is a cache you have more flexibility – as long as the data is also on disk.

Yes, it’s more difficult to separate the wheat from the chaff in the SSD market today. But quality products are available.

Courteous comments welcome, of course.
I started thinking about this as a result of the research project I did a few months ago. Leading-edge storage managers with workloads that would benefit enormously from flash SSDs weren’t seriously evaluating them. Big surprise. What do you think?

{ 11 comments… read them below or add one }

Sanguy June 21, 2011 at 3:30 pm

Also, due to the lower cost of entry to become an SSD vendor, as Robin points out, the quality bar has fallen quite a bit – even Maxtor had to have some level of quality control when building mechanical drives, but many of these SSD vendors are just slapping a pretty sticker on a reference-design product from Sandforce or another controller vendor.

With many of the SSD vendors just using an ‘off the shelf’ controller ‘cookie cutter product’ the desire to differentiate their products with special features often turns to the firmware as it’s the cheapest and easiest way to do so.

This can manifest itself in many ways – custom firmware development which can be good and bad. In the case of large companies with good quality processes (Intel’s 510 series using a Marvell controller with custom firmware) it can be a non-issue, but unfortunately there are many more examples of vendors releasing buggy crap as they don’t have the resources and processes to properly regression test which clearly is a risk to the data put on such a device. I for one don’t want my Debbie does Dallas collection trusted to firmware that’s been built, tested, and released all on the same day.

The other common, and equally concerning, pattern is releasing alpha/beta products/firmware as ‘production’ to get a jump on competitors selling the identical product. OCZ has recently done this with the Vertex-3 drive – and on a quick trip to their forums one quickly sees the suffering of the users due to this action. While OCZ decided to rush product out without proper test cycles to get the early sales jump, some of their competitors openly stated they would not ship due to quality issues. One needs to carefully evaluate these sorts of vendor antics.

Obviously the companies concerned with product quality are at a disadvantage as they appear to not move as quickly, but at the same time hopefully someone recognizes them for the stability of their products.

Jacob Marley June 21, 2011 at 9:17 pm
Rocky June 22, 2011 at 7:44 am

Independent hard drive studies showed much worse silent bit corruption than widely believed.

Are SSDs magically immune to silent bit corruption?

Buggy hard drive or controller firmware (or specific combinations) caused many of the problems, and that was with 1/10 the suppliers we have in the SSD market now.

How long must we wait until Facebook, Google, Amazon, or CERN gain hundreds of thousands of SSD-years of experience and write a paper titled “Bit Rot Deja Vu”?

David June 22, 2011 at 2:54 pm

I’m sure we already used similar things like PCI-E boards stuffed with DRAM in our Fujitsu servers. . . max out at 12GB or so, if I recall correctly.

For DB deployments, I might consider it; for web server or storage, no.

SSD makes sense in notebooks and some desktops, but we’re not too concerned about heat and noise and battery life in the data center so much as cost per GB (or TB, I suppose) aka data density.

And, don’t forget about TRIM implementation.

unknownvariable June 23, 2011 at 8:23 am

Robin,

When you say SSDs are ready for the enterprise, are you referring to the MLC consumer drives (say Intel 510), or the SLC drives Dell sells in its servers/arrays for $1500 ea (as an example)?

Robin Harris June 23, 2011 at 8:54 am

unkv, that question is analogous to asking “are consumer SATA drives ready for the enterprise?” The answer to the latter question is clearly yes – though some diehards would still disagree – but the more interesting question is: “where are consumer SATA drives appropriate for the enterprise?” Bulk storage, sure. But what about RAID? In some cases, yes, but others, no. OK, then how about the RAID-ready SATA drives?

Looking at the technology and the use cases, there is no doubt that high-quality consumer SSDs can be usefully employed in the enterprise, much as today’s SATA drives are. Not everywhere – enterprise SSDs have some important features – but in many cases.

white label online backup June 23, 2011 at 11:52 am

The trouble is that SSD’s have not been out for long enough, and with a stable enough implementation, for a good analysis of what to expect. There are wild variations within SSD’s in terms of controllers, chip technology, and the like, all of which will affect the longevity.

The majority of controllers and drivers are also not optimized for SSDs, which can also affect longevity and performance (e.g. TRIM isn’t present in many situations even several years after SSD’s became available!)

In September 2008 IDC wrote a study (sponsored by Toshiba) that spoke to longevity, and the story wasn’t wonderful for an enterprise server deployment, but things have improved since then.

Here’s the key: use the tool in the right way and you’re fine. RAID up SSDs for redundancy for your DB, and have a budget to replace them as needed, and you’ll get better performance, and that’s it.

Horses for courses.

unknownvariable June 23, 2011 at 12:07 pm

Hi

While analogous.. it is like asking the question 10-15yrs ago when the answer might not be so obvious. :)

Thanks for the reply.

Jean June 30, 2011 at 5:22 am
Greg Reiter September 16, 2011 at 8:03 pm

Okay, this is sort of a follow-up to my SSD dilemma earlier in the year when I was comparing my OCZ 256 Vertex II SSD versus my Seagate Momentus XT Hybrid drive. After running the OCZ as a system drive in my MBP and using the OWC Data Doubler in the optical bay inside the laptop, I came to a conclusion: the OCZ drive started my applications quickly and, of course, gave fast boot times. I had all of my virtual instruments on my Momentus XT. I quickly discovered that by switching the two drives around I had much better overall performance. The Momentus XT provided very similar boot times and opening of applications. The virtual instruments worked much better on the SSD. Since then I ditched my Vertex II for a Crucial M4 512GB drive, which is great at 6Gb/s!

Now here’s the key: Wear Leveling! The OCZ Vertex II did not have this, as most SATA II SSD’s do not. The latest generation seem to have some flavor of this wear leveling. Time will tell, but just the fact that they figured this out is very hopeful for future functionality.

Okay, to bring my comment closer to this subject: I recently bought up a few 2TB drives at 5900 RPM to host all of my massively bulky video and audio data. After research, the word is that the slower the RPM, the lower the failure rate. Then I have a bunch of 1TB 7200 RPM hard drives for all of my video and audio project work. And I run my two Crucial M4 6Gb/s 512s for my virtual instruments. It seems to be the magical combination at this moment in time. However, the OCZ PCIe SSD storage is showing reads as fast as 740MB/s! Zounds! These will eventually find a place in my Mac Pro for certain!

I think the classic hard drive’s days are numbered.
The dollar per GB is beginning to make sense, but has a little way to go (hehehe).

All SSD systems will be the future, and the reliability rate will improve as it seems that this technology will only get better and better in time.
I think the key to long SSD life will simply be reasonable cooling to keep the chips from cooking.
Hard drives are noisy (most of them), spinning, whirring, heavy things that will soon be something we used to talk about.

I can’t wait for 2tb SSDs and larger! My current 2tb drives will be opened up and turned into art work.

-my two cents! :)

sei August 20, 2012 at 8:25 am

Greg: The problem with SSDs isn’t cooling, but the fact that flash inherently loses charge over time, and that rewrites degrade the cells. They are good for speed, but not so good for archival use.

At the basic flash level the trend so far is actually reduced durability with newer generations. At the higher level this is compensated for with more error correction.

Prices aside, a major reason current SSDs aren’t going to replace HDDs is that the data is much more volatile. The more rewrites the flash has undergone, the shorter the data is retained. My understanding is that the current JEDEC standard for SSDs requires, for flash cells at their end of rated rewritable life, just 1 year of retention for consumer drives. I don’t know what it is for new cells.

Maybe once prices approach HDD levels the trend will shift toward better endurance and retention, but it hasn’t happened so far.
