Architecting & integrating flash into enterprise storage

by Robin Harris | Thursday, May 16, 2013 | Architecture, Disk, Enterprise, SSD/Flash/NVRAM | 5 comments

Have you ever noticed that it is difficult to get good information about how flash works? The vendors know but they’ve never been terribly forthcoming.

For example, how does flash wear out? When most things break you lose their contents. But once flash stops working your data is still there. Huh?

And the fact that flash is a wearing medium spooks many people. How should we think about flash? Can we live with a wearing medium?

Or write amplification? How does that work? What can be done to reduce it?

That’s why it was a pleasure to sit down with Rob Ober of LSI. Rob is an LSI Fellow and system architect with deep technical knowledge of flash and how it interacts with systems and applications.

Rob holds dozens of patents and is articulate and open. Plus he’s a very nice guy.

I distilled down what I learned and some of Rob’s key points into a StorageMojo video white paper that LSI commissioned. If you are curious about flash, how it works, how it fails and how it can be turned into an enterprise class storage medium, you’ll find the video informative.

At least I did my level best to make it so, including video from Wilson Canyon, one of my favorite local hikes. Here’s the video:

The StorageMojo take
As a thought experiment I sometimes wonder about how storage would be different if IBM had invented flash back in 1956 instead of the RAMAC disk drive. What if reads were fast and free while writes were expensive?

That’s essentially the problem we’re trying to solve today. Except today we have an installed base of a couple billion disk drives and decades of driver, OS and application development all predicated on disk performance.

We’re still in the early days of flash integration, even though forward-leaning architects have been working on it for 6 years or more. Thanks to flash – and cloud – storage has never been more vibrant or exciting.

Courteous comments welcome, of course. Feel free to ask about anything in the video that wasn’t clear or didn’t go deep enough. Your questions help me understand what you find valuable.

5 Comments

James B. on Friday, 17 May, 2013 at 9:02 pm

Hi Robin.

I thought the video was watchable and helpful at explaining SSD write issues.

A suggestion would be to go a little deeper. I wish the video explained the difference between SLC and MLC flash w.r.t. write issues and consumer/enterprise positioning.

Thanks, James.

rockmelon on Friday, 24 May, 2013 at 12:38 am

Hi Robin,

you warned us that it was a commissioned piece, so we got what we deserved. A bit of a pitch for the Sandforce technology, I thought.

As the previous commenter mentioned, there was no mention of SLC vs MLC NAND, and therein lies an amazing development – the switch from SLC to MLC in Enterprise NAND. Sure, Sandforce/LSI compression helps with write amplification, but all the top tier vendors are now using MLC for enterprise deployment. You can still buy SLC, but I believe a big chunk of the NAND is now configurable for how many voltage comparators you would like to use – i.e. can be configured with one comparator for 1-bit/cell or 2 for 2-bits/cell.

Intel warrant their DC S3700 series for 10 drive writes per day for 5 years.

That’s 8TB of writes per day to Intel’s top capacity 800GB DC S3700. Folks need to analyze the IO profiles of their enterprise apps. They don’t approach anything like this. There is really no need to factor in how compressible your data is. The big names all excel. It’s all about the amount of NAND dedicated to ECC, from which they recover as well as you would hope from ECC errors in DRAM.
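To make that arithmetic concrete, here is a minimal sketch in Python. The 800GB capacity, 10 drive writes per day and 5-year term are the figures quoted above; the function names and the 0.5TB/day application workload are purely illustrative assumptions, not measurements from any real system.

```python
# Endurance arithmetic for a drive rated in drive writes per day (DWPD).
# Decimal units (1 TB = 1000 GB), as drive vendors quote capacity.

def daily_write_budget_tb(capacity_gb, dwpd):
    """TB that may be written per day under the warranty rating."""
    return capacity_gb * dwpd / 1000

def lifetime_write_budget_pb(capacity_gb, dwpd, years):
    """Total PB that may be written over the warranty term."""
    return daily_write_budget_tb(capacity_gb, dwpd) * 365 * years / 1000

def budget_fraction_used(app_writes_tb_per_day, capacity_gb, dwpd):
    """Share of the daily write budget that an application consumes."""
    return app_writes_tb_per_day / daily_write_budget_tb(capacity_gb, dwpd)

# Figures from the comment: 800GB drive, 10 DWPD, 5 years.
daily_tb = daily_write_budget_tb(800, 10)
lifetime_pb = lifetime_write_budget_pb(800, 10, 5)

# Hypothetical workload (an assumption for illustration only).
used = budget_fraction_used(0.5, 800, 10)

print(f"Daily budget: {daily_tb} TB, lifetime budget: {lifetime_pb} PB")
print(f"A 0.5 TB/day workload uses {used:.1%} of the daily budget")
```

Swapping a measured daily write volume into the last call shows how far a given application sits from the rated limit.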

Intel, Fusion IO, Sandisk, HGST – they are all up there with enterprise grade MLC without compression (unless some have started recently).

I’m amazed the industry has not turned on a dime towards non-tiered MLC NAND. It’s not a major cost barrier any more and folks like Pure Storage have seen it and are going for it. It makes sense, regardless of whose enterprise NAND you are using. I know a couple of the folks there, that’s all, they think like me and I have no interest to declare, apart from wishing them well.

The (non-IBM plug-compatible) industry has been quick to change transports – SMD, IPI, SCSI, FC, SAS – but is still stuck on mechanical HDDs, sometimes augmented with a flash caching tier. Weird. What I would really like is for enterprise HDD vendors to bring back the multiple parallel head technology we last saw with IPI-2. The HDD value proposition becomes bandwidth, but right now HDDs lag SSDs on performance while still winning on price/performance. Dual or multiple parallel head technology would improve their price/performance position.

Most SSDs are designed to go read-only when they hit their write limit, FYI.

I only caught the tail end of the System 360 tape based era, but I see the analogy. Everyone is still fixated on HDD based storage, augmented by tiering with flash for the “forward thinkers”. But the economics show flash to be way ahead of HDDs, which is why almost all recent TPC-C results have switched to flash. HDDs are becoming a nearline/archival medium.

HDDs are the new tape.

Cheers,

Rocky

Robin Harris on Wednesday, 29 May, 2013 at 11:36 am

“you warned us that it was a commissioned piece, so we got what we deserved. A bit of a pitch for the Sandforce technology, I thought.”

My fault: I was impressed to hear that LSI had sold 5 million MegaRAID cards. So I thought how they integrated flash into those cards would be of broad interest. Plus, Mr. Ober, an accomplished architect, was far more forthcoming than most flash folks about how flash actually works.

I’d like to hear from other readers: how was the video? My goal is to make interesting videos that inform – like a good white paper – while illustrating a particular issue. Since vendors hire me, like they do people who write white papers, the videos have a focus on a particular vendor’s issues.

Robin

KD Mann on Friday, 14 June, 2013 at 6:25 am

Great video Robin.

FYI, there is objective proof for LSI’s claims in the form of the venerable TPC-C and TPC-E benchmarks.

IBM dominates these IOPS driven benchmarks because they figured out the LSI-Sandforce equation even before LSI did.

TPC is particularly relevant to customers because it forces vendors to (a) disclose fully, (b) submit to audit, and (c) report the entire system BOM and pricing so that the cost/performance can also be calculated.

http://www.tpc.org/4052

IBM’s LSI-Sandforce result is fully two years old now and nothing has come close in terms of transactions-per-second-per-dollar.

I think the LSI Sandforce acquisition is perhaps the most under-appreciated deal in storage in the last several years.

rockmelon on Tuesday, 18 June, 2013 at 11:03 pm

Hi Robin,

LSI may have sold 5M MegaRAID cards, but that of course would include many millions of plain SCSI/SAS HBAs such as the IBM M1015 with no flash.

The way the flash is integrated in a Nytro card is that it’s just a regular SAS controller ASIC with, instead of disk connectors, 4 special form factor Sandisk SSDs mounted on the card.

The other way LSI use flash is with their MegaRAID disk controllers, which allow one (or more?) of the disks to be SSDs nominated as a tier for the HDDs.

I’m not knocking LSI – their products are good. But they are not way out in front of other enterprise SSD vendors. It’s just that the other vendors only provide specifics of internals under NDA.

Rocky