Hot data, smart cache

by Robin Harris | Monday, October 5, 2009 | Architecture, Clusters, Enterprise | 15 comments

Okay, we’ve figured out how to produce protected storage for $100 a terabyte. It has wide fan out so the bandwidth is modest. It uses large SATA disks so it isn’t great from an IOPS perspective either.

But it works.

What would it take to turn it into something that the average enterprise could use? Would a high-performance, scalable, high bandwidth, high capacity intelligent cache that automatically moved cool data off to the low cost backing store do the trick?

Several companies are betting it will.

The players
Gear6 has been around for a few years with a commodity-based clustered cache appliance that sits in front of existing filers.

F5 Networks also offers front end “intelligent file virtualization” with their ARX device.

Now a couple of new players are going public:

Avere Systems
Avere Systems is announcing their FXT cluster, an appliance made of storage bricks that each include RAM, SSD or flash, and 15K disks. The FXT cluster supports NFS and CIFS.

Tiering within the FXT cluster is automatic by access pattern, frequency and type of data. The data is tiered on-the-fly, with a hot file striped across multiple FXT servers, while cool data is pushed out the back end to file storage.
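
To make the idea concrete, here is a minimal sketch in Python of tiering by access frequency. It is not Avere's algorithm, which the announcement doesn't spell out. The `TieredCache` class, its method names, and the simple hit-count demotion rule are all assumptions made for illustration, and the sketch ignores striping, access type, and write-back timing.

```python
from collections import Counter


class TieredCache:
    """Toy two-tier store: a small hot tier in front of a large backing store.

    Hypothetical illustration only; names and policy are assumptions.
    """

    def __init__(self, hot_capacity):
        self.hot_capacity = hot_capacity  # max number of files kept in the hot tier
        self.hot = {}                     # file name -> data (stands in for the cache cluster)
        self.backing = {}                 # file name -> data (stands in for the back-end filer)
        self.hits = Counter()             # access count per file

    def write(self, name, data):
        """Writes land in the hot tier first."""
        self.hits[name] += 1
        self.hot[name] = data
        self._demote_if_full()

    def read(self, name):
        """Serve from the hot tier if possible, else promote from the backing store."""
        self.hits[name] += 1
        if name in self.hot:
            return self.hot[name]
        data = self.backing.pop(name)     # raises KeyError if the file was never written
        self.hot[name] = data
        self._demote_if_full()
        return data

    def _demote_if_full(self):
        """Push the least-accessed files out the back end until the hot tier fits."""
        while len(self.hot) > self.hot_capacity:
            coolest = min(self.hot, key=lambda n: self.hits[n])
            self.backing[coolest] = self.hot.pop(coolest)
```

A real product would presumably weigh recency and I/O type as well as raw counts; the point here is only that hot files stay up front while the coolest ones drift to the cheap tier without anyone managing it by hand.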

The FXT boxes support both GigE and 10GigE. In their testing the Avere team and its beta sites have found that for every 50 I/Os to the FXT cluster there is one I/O to the backend filers.
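
The 50-to-1 figure is easier to appreciate as arithmetic. The short Python sketch below works it through; the function names and the 10,000 ops per second front-end rate are made up for illustration, and only the 50:1 ratio comes from Avere's claim.

```python
def backend_load(front_end_ops_per_sec, front_to_back_ratio=50):
    """Back-end I/O rate implied by a front-end rate and a front:back I/O ratio."""
    return front_end_ops_per_sec / front_to_back_ratio


def offload_fraction(front_to_back_ratio=50):
    """Fraction of I/Os absorbed by the cache and never sent to the back end."""
    return 1 - 1 / front_to_back_ratio


# Hypothetical front-end load of 10,000 ops/s at the claimed 50:1 ratio.
print(backend_load(10_000))   # I/Os per second reaching the back-end filers
print(offload_fraction())     # share of I/Os handled entirely by the cache
```

At 50:1 the filers see 2% of the I/O, which is why slow, cheap SATA behind the cache starts to look plausible.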

Avere is announcing this week with two 2U rack-mount nodes. Claimed performance is 23K ops per second on a single node using the SPEC SFS 08 benchmark, with bandwidth of 1 GB per second on reads and 325 MB per second on writes per node. They say they have achieved linear performance scaling to 25 nodes in their 1.0 release.

StorSpeed
StorSpeed says it is delivering the world’s first application-aware caching solution. Like Avere, StorSpeed is offering a clustered front-end cache, but with extremely high performance: 1 million IOPS in a 3 node cluster; and 10GigE wirespeed bandwidth.

They use deep packet inspection to understand and manage traffic and capacity tiering. Expect more data on their web site when they announce later this week.

The StorageMojo take
These large scale, high performance caches are a logical extension of the disk controller model to network storage. Most data is rarely accessed but is too valuable to take offline, thus the rationale for tiered storage.

Where tiered storage fails in practice is the intelligence required to put data in the right place: people just aren’t scalable enough to manage it. These caches bring extra intelligence to the problem of automated data movement without forcing wholesale rip-and-replace of existing infrastructure.

Enterprises can save many millions of dollars by keeping the mass of cool-to-cold data on cheap storage while keeping the hot working set on a smart cache. This could be the dawn of a new tier of storage.

Courteous comments welcome, of course. I did some work for Gear6 a couple of years ago but have no other business relationships with these firms. Rats!

15 Comments

Andrey Kuzmin on Monday, 5 October, 2009 at 12:40 pm

What has always concerned me is why tiered storage, being that obvious an idea and hardly rocket science technically (for those who own the whole storage stack), never took off. My best guess is economics: there had been no economic incentive for efficient data placement on both customer (mixed bag, economic and technical) and vendor (purely economical) side. Maybe pay-per-use environments like S3 could finally bring in an incentive to revive this area.
Anonymous on Monday, 5 October, 2009 at 1:48 pm

StorSpeed and Avere Systems links are broken.
Robin Harris on Monday, 5 October, 2009 at 7:16 pm

Anon – Fixed. Thanks,

Robin
Ron on Monday, 5 October, 2009 at 11:36 pm

It’s all great & I know u mainly look at price per TB here
but what about the complexity this solution is getting into ?
More rack space, more power consumption, more skills, more
mgmt, more time spent on supporting, more vendors for a
single storage solution…

Not sure at all if at the end run you will save money (not to
speak of hassle) in the long run with such a solution.
Joe Kraska on Tuesday, 6 October, 2009 at 7:41 am

Robin, the last time I spoke with Gear6 about their storage accelerators, they were thinking of taking them off the market. Sort of undermines buyer confidence, eh?

Joe.
nate on Tuesday, 6 October, 2009 at 8:54 am

I don’t see mention of SSD on Avere’s product, they mention solid state but then point to DRAM as that solid state.

Also take a look at grid iron http://www.gridironsystems.com/ …

As for tiered storage never really taking off, I think it’s because there was not much difference between the tiers, talking only ~3-4x performance boost from SATA to 15k FC. And with SSD, modern enterprise arrays simply cannot handle them with remotely the level of density that they can 15k drives.

My company’s last storage refresh we went to a 100% SATA solution, with big caches in the front end, certainly not as fast as these specialized products but it’s good to know we’re headed in the right direction 🙂
Joe Kraska on Wednesday, 7 October, 2009 at 1:28 pm

My company’s last storage refresh we went to a 100% SATA solution, with big caches in the front end, …
——–
Yah. I’m seeing a trend of various kinds of flash SSD devices as accelerators in enterprise storage in various vendor briefings regarding futures. Once you have a huge SSD cache, why use 15K drives at all? Just SATA (or SAS front-ended 7200RPM drives).

Joe.
KD Mann on Friday, 9 October, 2009 at 8:31 pm

I think the reason that automated tiered storage (going back to HP’s “AutoRAID” and earlier) has never worked is that all of the economic arguments in favor of an intermediate tier neglect to include the technical costs of adding/managing another “layer” of stuff, and the system performance impact of moving stuff back and forth. Meanwhile all the technical analyses fail to see the realities of what it takes to deliver a return-on-investment.

MS UK research group assumed “perfect” intelligence and yet found the cache tier is a money-loser…

http://research.microsoft.com/en-us/um/people/antr/ms/ssd.pdf

25 or so years of research at places like HP Labs has never resulted in a commercially successful product. When the geniuses at Carnegie Mellon’s renowned Parallel Data Lab spent 4yrs building and testing automated “tiered learning architectures”, they started from the same premise — “of course, this is obvious”. They were surprised that they couldn’t make them work, even after trying a slew of sophisticated AI and genetic algorithms.

http://www.pdl.cmu.edu/PDL-FTP/SelfStar/CMU-PDL-04-109_abs.shtml

Finally, upon examining the “architectural economics” of a discrete caching tier, it turns out that you get tremendously more bang for the buck either by (a) increasing system DRAM or (b) using more spindles, or (c) both, so it’s almost impossible for companies to make money in the access gap between DRAM and disk.

http://www.idema.org/_smartsite/modules/local/data_file/show_file.php?cmd=download&data_file_id=2007
Robin Harris on Friday, 9 October, 2009 at 11:20 pm

KD,

Agreed. I laughed when AutoRAID came out – it was obvious that there was no economic benefit. Haven’t run the numbers but it feels like $100/TB disk might put us at the tipping point.

If not $100, then maybe $50/TB. It is coming – soon – and at some point it works.

Robin
Andrey Kuzmin on Saturday, 10 October, 2009 at 11:25 am

KD/Robin,
"adding/managing another “layer” of stuff" is only meaningful when there’s an extra layer and it needs management. Nobody manages L1/2/3 caches in a (presumably Intel) CPU inside your laptop and no upper/lower layer knows it exists, but just imagine a nightmare of your cpu being stripped off caches.

In my first comment I have intentionally said “he who owns the whole stack”. Once you own operating system/file system/volume/drive layer, you’ve got all the usage data one needs to make intelligent placement decisions and all technical capabilities to do it transparently(with zero extra management costs).

ZFS L2ARC made first step by introducing extra level of read cache, and I believe intelligent (meta)data placement is next on their agenda. And they don’t make money in the “access gap”, they make it on an intelligent system delivering more bang for less bucks :).
KD Mann on Sunday, 11 October, 2009 at 2:32 pm

@Andrey,

I was specifically referring to the “technical costs” in terms of adding the SSD hardware and software logic for making automated decisions about what goes in which tier. This was meant in the context of looking at the potential ROI, or more specifically, at what point does a two-tier storage architecture (i.e. SSD<->Cheap HDD) actually have the potential to reduce storage costs compared to a single HDD-only tier.

>>”Once you own operating system/file system/volume/drive layer, you’ve got all the usage data one needs to make intelligent placement decisions…”

Sure you do, but the MSUK research team (my link above) used after-the-fact analysis of application traces to decide in advance exactly what data could benefit from placement on SSD vs. HDD. In other words, they presume the intelligence to make 100% perfectly accurate placement decisions is just “there” irrespective of how or where it’s done.

>>…and all technical capabilities to do it transparently(with zero extra management costs).

They also presume that the incremental cost of this intelligence is zero.

So…even with a hypothetical perfect intelligence that hypothetically comes for free, there was still no ROI to be found in switching from single HDD tier to two-tiered SSD-HDD architecture.

Without an underlying business case (ROI) for two-tiered architecture, how can there be a business case for the intelligence to manage it?
KD Mann on Sunday, 11 October, 2009 at 2:43 pm

Robin,

I think the “hot data” end is the problem, not the “cold data” end. $100/TB is plenty cheap, what we need is something like $0.25/IOP (SPC-1C/E) and $5/GB from an “enterprise class” SSD cache tier before it’ll fly.

With E-class SSD at $2/SPC-1 IOP and ~$50-$150/GByte today, we’re a long way off…
Andrey Kuzmin on Monday, 12 October, 2009 at 4:39 am

@KD Mann
> at what point does a two-tier storage architecture (i.e. SSD<->Cheap HDD) actually have the potential
> to reduce storage costs compared to a single HDD-only tier
Um :). I always thought we currently run 2-tier architectures (DRAM+HDD), and DRAM costs (and limited available capacity due to chipset constraints) is a factor to be taken into account.

> MSUK research team (my link above) used after-the-fact analysis of application traces
> to decide in advance exactly what data could benefit from placement on SSD vs. HDD
Sorry but it’s difficult to take seriously the analysis of an SSD with 351 write IOPS (your link, Table 4).
KD Mann on Thursday, 22 October, 2009 at 4:26 pm

Andrey Kuzmin wrote “…difficult to take seriously the analysis of an SSD with 351 write IOPS…”

I can understand the difficulty. Here’s another data point…Intel Research did an analysis of Flash vs. HDD for synchronous IO in transactional database applications (SIGMOD’09). The results show that the Intel X25 SSD was only twice as fast as a 10K HDD on random writes…so the 351 IOPS number that Microsoft measured is not that far off. The SSD was about 1/5th the speed of the HDD in small, sequential IOPS.

See Fig. 3(d) and 3(b)

http://www.pittsburgh.intel-research.net/~chensm/talks/Flashlogging-sigmod09.pdf

As you go through the paper, note that the “Ideal” performance scenario is represented by a single HDD with write cache enabled, and that the X25 SSD never manages to exceed about 70% of the HDD w/WCE performance (see Fig. 17, compare “Ideal” vs. SSD).

It’s no secret that NAND Flash Erase/Program (Write) cycles are 100x slower than reads, but most people mistakenly assume that the DRAM write caches on modern SSDs solve the problem. Unfortunately, the most performance-sensitive enterprise applications do a lot of synchronous I/O. When Flash SSD write caches fill up, the whole system slows down to the speed at which writes are completed to Flash. Another feature of synchronous I/O workloads is that they don’t generate deep queues at the device. Without deep IO queues, the multichannel parallelism of modern SSDs can’t be leveraged.
Brian on Wednesday, 4 November, 2009 at 3:09 pm

Does it make sense to be locked into a single vendor or single architecture to implement a tiered storage model like this? I work for a company called AutoVirt and we are hearing about this all the time. Primarily there seems to be too much change in hardware for companies to lock themselves into one caching model or technology for 3-5 years. I just wrote a blog posting about this and would love to hear others’ thoughts on the subject. http://www.autovirt.com/blogs/klavs-blog/entry/13.