Jim Gray’s comment that disk is the new tape is truer today than it was 8 years ago. We’ve been adding caches, striping disks, modifying applications and performing other unnatural acts to both reduce and accommodate random reads and writes to disk.

Flash changes the calculus of 20 years of storage engineering. Flash gives us abundant random reads – something hard drives are poor at – and reasonable random writes to whatever hot data we choose.

In a feverish burst of design and investment we’ve tried flash everywhere in the storage stack: disks; PCI cards; motherboards; controllers; built-in tiering; and appliances. These products have been focused on enterprise datacenters or very targeted applications where the cost of flash was justifiable.

But clarity is emerging. It isn’t so much where you put the flash as what you ask the flash to do. There are three requirements:

  • Valuable data. Flash is an order of magnitude more costly than disk.
  • Often accessed. If not, leave it on disk.
  • Enables new functionality and/or lowers cost. If it doesn’t, why bother?

The buyer’s burden
These requirements frame a basic point: optimizing for flash requires a systems-level approach. Adding flash can make current architectures go faster, but that isn’t the big win.

Buyers looking for an economic edge must make a cognitive leap: the old ways are no longer best. Flash enables efficiencies and capabilities in smaller systems that only costly enterprise gear had a few years ago.

Tiering
Tiered flash solutions are the most common approach today. Tiering software has improved in recent years, making the movement of data between flash and disk safe, fast and granular.
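
Below is a minimal sketch of what that data movement can look like: track per-block heat over an epoch, promote the hottest blocks to flash and demote the rest. The capacity, threshold and epoch-based rebalance are illustrative assumptions, not any vendor’s actual algorithm.

```python
# A minimal sketch of access-frequency tiering between a small flash tier
# and a large disk tier. Sizes, thresholds and the promotion policy are
# assumptions for illustration only.
from collections import Counter

class TieringPolicy:
    def __init__(self, flash_capacity_blocks, promote_threshold=3):
        self.flash_capacity = flash_capacity_blocks
        self.promote_threshold = promote_threshold
        self.access_counts = Counter()   # per-block heat, reset each epoch
        self.flash_resident = set()      # blocks currently on flash

    def record_access(self, block_id):
        """Count an access; promotion decisions happen at epoch end."""
        self.access_counts[block_id] += 1

    def rebalance(self):
        """Promote the hottest blocks to flash, demote everything else."""
        hot = [b for b, n in self.access_counts.most_common(self.flash_capacity)
               if n >= self.promote_threshold]
        promoted = set(hot) - self.flash_resident
        demoted = self.flash_resident - set(hot)
        self.flash_resident = set(hot)
        self.access_counts.clear()       # start a fresh measurement epoch
        return promoted, demoted         # the data mover would act on these

# Example: block 7 is hot and earns a slot on flash; block 9 stays on disk.
policy = TieringPolicy(flash_capacity_blocks=2)
for _ in range(5):
    policy.record_access(7)
policy.record_access(9)
promoted, demoted = policy.rebalance()
print(promoted, demoted)   # {7} set()
```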

We’ve at least started to see interest in the midsize enterprise, such as the EqualLogic hybrid SAS/SSD array in VDI deployments.

Metadata and cache
The best fit for flash today is metadata and caching. These best meet the requirements for value, access and functionality.

Once metadata is freed from disk constraints we can combine it with caching to build high-performance systems on commodity hardware. The win for innovators is to design new metadata structures and caching algorithms for flash.
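
To make that concrete, here’s a toy read/write path in which both the block index (the metadata) and the read cache live on flash, so a hit touches no disk at all and a miss costs exactly one disk read. The device classes and mapping scheme are simplified stand-ins, not any shipping product’s design.

```python
# A minimal sketch of a volume whose metadata and read cache live on flash,
# leaving disk to serve only cold data. The Device class is a toy stand-in.

class Device:
    """Toy key-value device standing in for flash or disk."""
    def __init__(self, name):
        self.name = name
        self.store = {}
    def get(self, key):
        return self.store.get(key)
    def put(self, key, value):
        self.store[key] = value

class FlashAcceleratedVolume:
    def __init__(self):
        self.flash_index = Device("flash-metadata")   # logical -> physical map
        self.flash_cache = Device("flash-cache")      # hot data blocks
        self.disk = Device("disk")                    # cold data blocks

    def write(self, logical_block, data):
        # Data lands on disk; the logical-to-physical mapping lives on flash,
        # so later lookups never pay a disk seek just to find the block.
        physical = f"p{logical_block}"
        self.disk.put(physical, data)
        self.flash_index.put(logical_block, physical)

    def read(self, logical_block):
        cached = self.flash_cache.get(logical_block)
        if cached is not None:
            return cached                              # flash hit: no disk I/O
        physical = self.flash_index.get(logical_block) # metadata read from flash
        data = self.disk.get(physical)                 # one disk read on a miss
        self.flash_cache.put(logical_block, data)      # warm the cache
        return data

vol = FlashAcceleratedVolume()
vol.write(42, b"hello")
assert vol.read(42) == b"hello"   # miss: index on flash, data from disk
assert vol.read(42) == b"hello"   # hit: served entirely from flash
```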

They can design write data layouts that take best advantage of the physics of both disk and flash. Nimble Storage’s CASL architecture, which combines a large flash cache with full-stripe writes, is one example.
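
Here’s a rough sketch of the full-stripe-write idea: absorb incoming random writes into a buffer (NVRAM or flash in a real system) and send them to disk only as complete sequential stripes – the workload spinning disks handle best. The stripe geometry and flush policy below are assumptions for illustration, not the actual CASL implementation.

```python
# A minimal sketch of full-stripe writes: buffer random writes until a
# complete stripe is ready, then write it to disk sequentially.
# Stripe geometry and flush policy are assumed for illustration.

STRIPE_BLOCKS = 8          # data blocks per stripe (assumed geometry)

class StripeWriter:
    def __init__(self):
        self.buffer = []            # pending (logical_block, data) pairs
        self.stripes_written = []

    def write(self, logical_block, data):
        """Absorb a random write; disks only ever see full sequential stripes."""
        self.buffer.append((logical_block, data))
        if len(self.buffer) >= STRIPE_BLOCKS:
            self._flush_full_stripe()

    def _flush_full_stripe(self):
        stripe, self.buffer = self.buffer[:STRIPE_BLOCKS], self.buffer[STRIPE_BLOCKS:]
        # One large sequential write replaces STRIPE_BLOCKS random writes.
        self.stripes_written.append(stripe)

# Example: 16 scattered writes become 2 sequential full-stripe writes.
w = StripeWriter()
for i in range(16):
    w.write(logical_block=i * 37 % 101, data=f"block-{i}".encode())
print(len(w.stripes_written))   # 2
```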

Flash is also an important enabler for low-cost de-duplication because it’s cheaper to keep block metadata – fingerprints or hash codes – in flash than it is in RAM. Some vendors are encouraging the use of de-duplicated storage for midrange primary storage, enabled by flash indexes or caches that make it feasible to reconstruct files on-the-fly.
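
A toy version of that approach: fingerprint each incoming block, look the fingerprint up in an index that would live on flash, and store only blocks the system hasn’t seen before. The dictionary standing in for the flash index and the 4 KB block size are assumptions for illustration, not a particular vendor’s design.

```python
# A minimal sketch of inline de-duplication with the fingerprint index held
# on flash instead of in RAM. The dict is a stand-in for that flash index.
import hashlib

BLOCK_SIZE = 4096   # assumed block size

class DedupStore:
    def __init__(self):
        self.flash_fingerprint_index = {}   # fingerprint -> physical location
        self.disk_blocks = []               # unique blocks actually stored

    def write_block(self, data):
        """Store a block and return its location; duplicates cost no space."""
        fingerprint = hashlib.sha256(data).hexdigest()
        location = self.flash_fingerprint_index.get(fingerprint)
        if location is None:                # new data: write it and index it
            location = len(self.disk_blocks)
            self.disk_blocks.append(data)
            self.flash_fingerprint_index[fingerprint] = location
        return location                     # duplicate: just return the pointer

    def read_block(self, location):
        return self.disk_blocks[location]

# Example: 100 identical 4 KB blocks consume the space of one.
store = DedupStore()
locations = [store.write_block(b"\x00" * BLOCK_SIZE) for _ in range(100)]
print(len(store.disk_blocks), len(set(locations)))   # 1 1
```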

The StorageMojo take
Shaking off the effects of 50 years of disk-based limitations isn’t easy. That orthodoxy is ingrained in our architectures and our thinking.

But buyers face a difficult job: assessing architectures and algorithms to choose which products to evaluate. A shortcut: look for architectures that collapse existing storage stovepipes to reduce cost, total data stored and operational complexity. The three are related and offer the big wins.

In the last 10 years raw disk capacity cost has dropped to less than a 10th of what it was, but the cost of traditional storage systems hasn’t. The culprits: operating costs; storage network infrastructure costs; and capacity requirements that have risen faster than management productivity.

The flood of data continues to rise, but cost and complexity don’t have to rise with it. We can do better – and we are.

Courteous comments welcome, of course. I’ve been working with Nimble Storage lately and like what they’ve done.