Jim Gray’s comment that disk is the new tape is truer today than it was 8 years ago. We’ve been adding caches, striping disks, modifying applications and performing other unnatural acts to both reduce and accommodate random reads and writes to disk.
Flash changes the calculus of 20 years of storage engineering. Flash gives us abundant random reads – something hard drives are poor at – and reasonable random writes to whatever hot data we choose.
In a feverish burst of design and investment we’ve tried flash everywhere in the storage stack: disks; PCI cards; motherboards; controllers; built-in tiering; and appliances. These products have been focused on enterprise datacenters or very targeted applications where the cost of flash was justifiable.
But clarity is emerging. It isn’t so much where you put the flash as what you ask the flash to do. There are three requirements:
- Valuable data. Flash is an order of magnitude more costly than disk.
- Often accessed. If not, leave it on disk.
- Enables new functionality and/or lowers cost. If it doesn’t, why bother?
The buyer’s burden
These requirements frame a basic point: optimizing for flash requires a systems level approach. Adding flash can make current architectures go faster, but that isn’t the big win.
Buyers looking for an economic edge must make a cognitive leap: the old ways are no longer best. Flash enables efficiencies and capabilities in smaller systems that only costly enterprise gear had a few years ago.
Tiering
Tiered flash solutions are the most common approach today. Tiering software has improved in recent years, making the movement of data between flash and disk safe, fast and granular.
We’re at least starting to see interest from the midsize enterprise – for example, the EqualLogic hybrid SAS/SSD array in VDI deployments.
Metadata and cache
The best fit for flash today is metadata and caching. These best meet the requirements for value, access and functionality.
Once metadata is freed from disk constraints we can combine it with caching to build high-performance systems on commodity hardware. The win for innovators is to design new metadata structures and caching algorithms for flash.
They can design (write) data layouts that best exploit the physics of disk and flash. Nimble Storage’s CASL architecture, which combines a large flash cache with full-stripe writes, is one example.
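To make the full-stripe idea concrete, here is a minimal sketch – my own illustration, not Nimble’s code – of a write buffer that stages incoming random writes (in NVRAM or flash) and flushes them to disk only as complete stripes, so the disks see large sequential writes instead of small random ones. The `raid.write_stripe()` call, block and stripe sizes, and the class itself are all assumptions for illustration.

```python
# Illustrative sketch only: coalesce random block writes into full-stripe
# sequential writes. The raid object and its write_stripe() method are
# hypothetical stand-ins for a real RAID/disk layer.
STRIPE_SIZE = 4 * 1024 * 1024   # assumed stripe width in bytes
BLOCK_SIZE  = 4096              # assumed block size; divides STRIPE_SIZE evenly
BLOCKS_PER_STRIPE = STRIPE_SIZE // BLOCK_SIZE

class FullStripeWriter:
    def __init__(self, raid):
        self.raid = raid          # hypothetical: raid.write_stripe(stripe_no, data)
        self.pending = []         # (logical_block, data) staged in NVRAM/flash
        self.block_map = {}       # logical block -> (stripe_no, slot) metadata
        self.next_stripe = 0

    def write(self, logical_block, data):
        """Random writes just land in the staging buffer."""
        assert len(data) == BLOCK_SIZE
        self.pending.append((logical_block, data))
        if len(self.pending) == BLOCKS_PER_STRIPE:
            self._flush_stripe()

    def _flush_stripe(self):
        """One large sequential write; no read-modify-write of parity."""
        stripe = b"".join(data for _, data in self.pending)
        self.raid.write_stripe(self.next_stripe, stripe)
        # Record where each block now lives -- this map is exactly the kind
        # of metadata that belongs in flash.
        for slot, (block, _) in enumerate(self.pending):
            self.block_map[block] = (self.next_stripe, slot)
        self.pending = []
        self.next_stripe += 1
```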
Flash is also an important enabler for low-cost de-duplication because it’s cheaper to keep block metadata – fingerprints or hash codes – in flash than it is in RAM. Some vendors are encouraging the use of de-duplicated storage for midrange primary storage, enabled by flash indexes or caches that make it feasible to reconstruct files on-the-fly.
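As a rough illustration of why the index matters, here is a toy block de-duplication sketch – not any vendor’s design. The `fingerprints` dict stands in for the flash-resident index and the block size is an arbitrary assumption.

```python
# Toy inline de-duplication: the fingerprints dict stands in for a
# flash-resident index of block hashes; blocks stands in for disk.
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size

class DedupStore:
    def __init__(self):
        self.fingerprints = {}   # SHA-256 digest -> physical block id
        self.blocks = []         # physical block store (disk in a real system)

    def write_block(self, data):
        fp = hashlib.sha256(data).digest()
        if fp in self.fingerprints:        # duplicate: keep only a reference
            return self.fingerprints[fp]
        self.blocks.append(data)           # new block: store it and index it
        block_id = len(self.blocks) - 1
        self.fingerprints[fp] = block_id
        return block_id

    def read_block(self, block_id):
        return self.blocks[block_id]

# A file becomes a list of block ids; reconstructing it "on the fly" is just
# a series of index lookups -- cheap if the index lives in flash rather than RAM.
store = DedupStore()
chunks = [b"A" * BLOCK_SIZE, b"B" * BLOCK_SIZE, b"A" * BLOCK_SIZE]
file_blocks = [store.write_block(c) for c in chunks]     # only 2 blocks stored
rebuilt = b"".join(store.read_block(b) for b in file_blocks)
assert rebuilt == b"".join(chunks)
```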
The StorageMojo take
Shaking off the effects of 50 years of disk-based limitations isn’t easy. Our disk-based orthodoxy is ingrained in architectures and our thinking.
But buyers face a difficult job: evaluating architectures and algorithms to choose products for evaluation. A shortcut: look for architectures that collapse existing storage stovepipes to reduce cost, total data stored and operational complexity. The three are related and offer the big wins.
In the last 10 years raw disk capacity cost has dropped to less than a 10th of what it was, but the cost of traditional storage systems hasn’t. The culprits: operating costs; storage network infrastructure costs; and capacity requirements that have risen faster than management productivity.
The flood of data continues to rise, but cost and complexity don’t have to rise with it. We can – and are – doing better.
Courteous comments welcome, of course. I’ve been working with Nimble Storage lately and like what they’ve done.
“Often accessed. If not, leave it on disk.”… isn’t _latency_ another pertinent parameter? Aren’t some rarely needed data more useful if they can be read quickly?
Nat,
In theory, yes. Got any real-world examples?
Robin
Wasn’t YouTube said to store data on tape, but cache the front of each file to reduce latency?
On-the-fly data compression can help with data storage on disk.
Strangely enough it has never been popular there, unlike on tape where it’s the default. This data compression technology was available on mainframe tier 1 disk via the StorageTek 9200 back in the mid ’90s. Each ESCON channel had a compression chip, and data was compressed as it entered the box, just as tape drives have done for years. Back then it was also the world’s first thin provisioning box, leveraging this extra capacity.
Today’s compression chips are so fast they can even encrypt data as fast as an SSD can write or read. This way you can protect data and add more GB to the same physical disk using a cheap chip on each SAS disk controller.
Compression and dedup will be the next trend on individual disks, not only at the storage array level as it is today.
Once again, old mainframe technology adapted to today’s problems…
One day the open systems guys will learn that the mainframe saw the same problems and solved them… 25 years ago… ;-)
I tend to find (within the bounds of my experience) that Pareto’s Principle applies to lots of data – the 80% of data we don’t need _right_now_ (and usually leave untouched for months, perhaps years at a time) – so your comment “If not, leave it on disk.” made me chuckle. Screw disk, leave it on *tape*. One thing the disk-centric vendors will never do is *pay my electricity bill for me*.
Applying Pareto twice (the bottom 80% of the bottom 80%) identifies around 64% of my data that a good HSM file system can move down to tape.
Flash has no problem keeping LTO buffers stuffed, so writing out to tape is tremendously reliable and fast, and I can leave tiny files and stubs for larger files on disk in the middle (between solid-state and tape – about 16–32% of total storage, YMMV) for latency-sensitive reads from archive. It staggers me that people are paying over the odds for sub-millisecond access times to the majority of data they don’t need _right_now_.
If disk is the new tape, it’s only in as much as “disk is dead”. Solid-state is king, and as long as energy prices keep going up, Tape is risen (and innovating! Shock! Horror!)
Richard,
Pareto applies, until you suddenly need that outlier in discovery!
And what then, if it is in an unsearchable archive offsite?
Oops, that is Autonomy’s scaremonger pitch…
I’m not saying i work in a litigious environment, either, but it’s amazing how you can throw away profitability because someone said something which at first seems throwaway but later becomes significant. Maybe it’s just a human clue, how they think, and you deal better with them later.
The single answer i have to all of this is:
Keep your people.
Memory works better than any search or archive. Memory gets discussed. Search appliances get disconnected because of utility bills.
yours,
– john
Any examples of how costly it could be in flash?
@John: Memory works better, John, but it cannot compare to an archive – from an archive you can pull out all the examples in the future.
Regards
Nancy
Hi Nancy,
yes, you are right. I forget where i put my file on how right you are 🙂
I was tilting at cultural memory, and the idea that oral histories, with all the retelling involved, are very good ways to understand information structure and our perception of it. I like to think we’ve got a deliberately self-deprecating, almost nasty sense of humor going on here. It puts some off, i can tell you. But it’s pure fun to take an interview and say, “Okay, I’m now going to take the **** out of us for half an hour; after that, you tell me how we could have avoided shooting ourselves in the foot.” I don’t really think we’re unusual in having many good tales of utter stupidity to relate! This is doubly good, because if someone is sniffing us out, the stories get back to our competitors that we are a bunch of lame no-hopers 🙂
. . .
Since this was originally about flash storage, adding flash boot drives just allowed us to skip half an upgrade cycle. It’s a bit of accounting trickery, but it played very well since Intel just got seriously aggressive on price/perf and, critically, power, which was not happening in our original window. I can now go back to humming and ahhing and pick and choose where to put kit and money. One example of this is the Intel 510 series drives. We deal in a lot of media, and that kind of sequential read used to mean 10 or 15K RPM drives, tenned up in a sixpack, a fat LSI card, watching out at even humble office floor PDU levels, and let’s not start on the noise pollution as these things warm up and stress the aircon. Note to London property devs out there: your per-foot ratings for power supply, even at the absolute top end, keep us in much humbler, owned, digs. Not saying i want a fancy office, but i know that at some point it’s the done thing to have somewhere just a bit smart. Really, seriously, check the specs if you disbelieve me; it’s bring a candle and maybe a push-bike dyno, if you use genuine desktop processing.
So, our big wins are purchase scheduling and power consumption. Because we’re private, we have a super aggressive amortization schedule too.
What i’m saying here is that outside of very intensive jobs – where all that Rob is saying about systems thinking applies: basically tearing up the DB calls one by one, and thinking lower down at the FS level, something that’s always on the agenda if we can – i think we already get a triple hit of goodness.
So, it’s good, but i totally agree this is what i would call a demand sell, not a cold call – the user has to understand their case. That might not bode too well, because longer term, flash storage does need tons of detail work. It has to compete with even 7K RPM SATA when at scale.
very best from me,
– j
@Richard,
Generally speaking, the idea behind HSM is sound and very much valid in a flash-enabled world.
Think of flash as another HSM layer between RAM and Disk.
Flash can mask disk latency and/or reduce disk power requirements by using fewer/slower/higher capacity disks.
As the cost of disk continues to drop, the savings of an HSM tape library only make sense for really, really large data sets.
Flash just pushes those dollar goalposts further down the field.
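To put rough numbers on “really, really large”, here is a back-of-the-envelope break-even sketch; every price below is an illustrative assumption, not a quote.

```python
# Back-of-the-envelope only: all prices are assumed illustrative numbers.
# A tape library carries a big fixed cost but much cheaper media, so it
# only wins past some break-even capacity.
DISK_PER_TB   = 100.0     # assumed $/TB for nearline disk (incl. enclosure share)
TAPE_PER_TB   = 15.0      # assumed $/TB for tape media
LIBRARY_FIXED = 25_000.0  # assumed fixed cost: library, drives, HSM software

def cheaper_medium(capacity_tb):
    disk_cost = DISK_PER_TB * capacity_tb
    tape_cost = LIBRARY_FIXED + TAPE_PER_TB * capacity_tb
    return "tape" if tape_cost < disk_cost else "disk"

break_even_tb = LIBRARY_FIXED / (DISK_PER_TB - TAPE_PER_TB)
print(f"Break-even around {break_even_tb:.0f} TB")   # ~294 TB with these numbers
print(cheaper_medium(50), cheaper_medium(1000))      # disk, tape
```

Cheaper disk – or a flash layer that lets you get away with slower, cheaper disk – raises that break-even point, which is the goalpost moving down the field.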
@Jacob,
Quite so. Of course it’s all “buffering” to some extent – taken to its logical extreme, RAM is a buffer between L2 cache & disk, and L1 between CPU registers and L2. Perhaps at the far end Tape is a buffer before print and etched granite!
(I’m sure the guy who carved the Rosetta Stone would boast to us “It’s still readable, beat that!” 😉
Indeed I do tend to deal with large datasets (one of our clients archives 2TB/day), so tape is definitely necessary; it keeps the accountants very happy.
Flash is not all alike, and there’s the SLC vs MLC debate, which largely seems to be coming down to “MLC is necessary for storage as SLC is too expensive, and we will work around the 10X worse endurance”. The idea, presumably, is to ride on the coat tails of consumer flash.
But, just as this is happening, consumer applications are starting to move to TLC, which takes another 10X hit in endurance relative to MLC (and 100X relative to SLC). So, the questions are:
- Will storage be a large enough consumer of flash to merit its own flash device optimization? Catch-22 situation here: to become big it has to become cost-effective while being good enough…
- Collectively we seem to think the limitations of MLC can be worked around for server storage applications. Does that optimism extend to TLC, if we’re still planning to ride the cost curve of consumer flash?
There’s a danger/chance that flash remains a niche if there are no clean answers to the cost, availability and suitability of the basic medium…
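To put rough numbers on those 10X steps, a quick endurance calculation; the P/E cycle counts are chosen only to match the 10X/100X ratios above, and the capacity, daily-write and write-amplification figures are pure assumptions.

```python
# Illustrative only: cycle counts chosen to match the 10X/100X ratios above;
# capacity, daily writes and write amplification are assumptions.
CAPACITY_GB     = 400
DAILY_WRITES_GB = 2000    # assumed data written to the device per day
WRITE_AMP       = 2.0     # assumed write amplification

def lifetime_years(pe_cycles):
    total_writable_gb = CAPACITY_GB * pe_cycles
    return total_writable_gb / (DAILY_WRITES_GB * WRITE_AMP) / 365

for name, cycles in [("SLC", 100_000), ("MLC", 10_000), ("TLC", 1_000)]:
    print(f"{name}: ~{lifetime_years(cycles):.1f} years")
# With these assumptions: SLC ~27.4 years, MLC ~2.7, TLC ~0.3 -- the same
# workload an MLC device survives for years wears out TLC in months.
```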
And btw, how do you differentiate between Tiering and Caching?
This is a question that comes up often, and people seem to have different views…
@Jay Shenoy,
The answer to your question is that flash can be used for either or both: tiering and caching.
Jay: The difference, at a high level, is in what happens to the data on access.
Tiering would see the file move back to “primary” storage, whereas a caching system would allow the file to be read from a lower level, with a part kept on “primary” storage to lower latency.
The line can be blurred, though, as some tiering systems implement some form of caching, or at least allow it to be implemented. The move still happens, but the latency is covered by caching.
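For what it’s worth, a toy sketch of the distinction (no particular vendor’s behavior implied): a cache keeps a copy in flash while the authoritative block stays on disk; a tier moves the block so it lives in one place at a time.

```python
# Toy model only -- real systems blur this line, as noted above.
class ReadCache:
    """Cache: flash holds a *copy*; disk remains authoritative."""
    def __init__(self, disk):
        self.disk, self.flash = disk, {}
    def read(self, block):
        if block not in self.flash:
            self.flash[block] = self.disk[block]   # copy up; disk copy remains
        return self.flash[block]
    def evict(self, block):
        self.flash.pop(block, None)                # nothing to write back

class Tier:
    """Tier: a block is *moved*; it lives in exactly one place."""
    def __init__(self, disk):
        self.disk, self.flash = disk, {}
    def promote(self, block):
        self.flash[block] = self.disk.pop(block)   # move up; disk no longer has it
    def demote(self, block):
        self.disk[block] = self.flash.pop(block)   # move back down
    def read(self, block):
        return self.flash[block] if block in self.flash else self.disk[block]
```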