Pure’s Matt Kixmoeller saw the Are SSD-based arrays a bad idea post and, unsurprisingly, responded. The SSD is Key to Economic Flash Arrays is a good post and I urge interested readers to check it out.
Pure has a stellar team with deep experience. Their views are worth considering.
As Matt notes:
This post caught our eye for an obvious reason: Pure Storage did start “fresh” to build an all-flash enterprise storage array, and we did decide to use the SSD form factor, after quite exhaustive looks at all the other options. Quite simply, we found that SSDs are the most efficient and economic building blocks from which to build a flash array. Let’s explore why.
After dismissing disk arrays that add flash drives – as I do – Matt focuses on (1) all flash appliances built from raw NAND and (2) flash arrays using flash SSDs.
SSDs are most efficient
Matt argues that SSD-based arrays have 3 key advantages:
- Economics. SSDs are a commodity product that raw flash arrays will have a hard time out-engineering.
- Flash controller complexity. Matt notes, correctly, that the flash controller is at the heart of the argument. Is it better to use a controller that goes into millions of SSDs or one purpose-built for a single vendor’s array? How will the single vendor be able to keep up?
- Serviceability. Pure’s use of SSDs enables them to offer a familiar hot-swap experience that higher-density designs may not. Furthermore, Pure’s data reduction features increase effective density to rival raw flash designs.
In conclusion, Matt makes a couple more points. First, that SSD form factors will become much more compact, such as the DIMM-like mini-SATA SSD Apple uses in the MacBook Air. Second, that the proof is in the pudding: Pure, he says, has “. . . delivered with break-through performance, at a cost below traditional spinning disk.”
The StorageMojo take
How does Matt’s response stack up to the criteria in the original post? Not that there’s anything magic about them, but . . . .
- Latency. No response, which doesn’t mean they’re worse.
- SSD bandwidth. No response, but to be fair, with enough SSDs you should be able to saturate 16Gb Fibre Channel.
- Reliability. No direct response. Instead a focus on serviceability. More on that below.
- Cost. Says Pure is cost-effective using their data reduction technology.
- Flexibility. This is the heart of Matt’s argument: due to the commodity volume of the flash controllers flash SSDs will evolve faster – in functionality and cost – than any proprietary solution could. Proprietary flash controllers, he says, will be boat anchors for flash array vendors and are likely to end up controlled by flash manufacturers.
Serviceability is an interesting response to the question of reliability. After all, the reason hot swap is important for some components but not others is that a) they fail often – individually or in aggregate – b) their failure compromises the product, or c) online expansion, upgrading or reconfiguration is desirable.
Power supplies are routinely hot swappable because they have the lowest MTBF of any major system component. Disks are hot swappable because they come in multiples that reduce their aggregate MTBF while their standardized design makes hot swap cheap. I/O cards are often hot swappable because they are critical and connectivity needs change.
SSDs should be hot swappable because their failure rates are at best about half that of disks. But DIMMs, another critical component (and an expensive one if you invest in high-capacity parts), aren’t, because they rarely fail.
While I’m not aware of any non-SSD enterprise array vendor whose arrays don’t include hot-swap components – I’d love to be educated – the real question is which matters more: a short mean time to repair (MTTR) or a long mean time between failures (MTBF)? Because that is what the serviceability argument comes down to.
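To make that question concrete, here’s a minimal back-of-the-envelope sketch in Python. The MTBF and MTTR figures are illustrative assumptions, not any vendor’s published numbers; the point is simply that a serviceability story is a claim about MTTR, reliability is a claim about MTBF, and availability depends on both.

```python
# Back-of-the-envelope availability math; all figures are illustrative
# assumptions, not vendor specs.

def availability(mtbf_hours, mttr_hours):
    """Steady-state availability = MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

def aggregate_mtbf(device_mtbf_hours, n_devices):
    """Rough aggregate MTBF when a failure of any one of N identical devices counts."""
    return device_mtbf_hours / n_devices

ssd_mtbf = 1_000_000                       # hours per device, illustrative
shelf_mtbf = aggregate_mtbf(ssd_mtbf, 24)  # a 24-drive shelf fails ~24x as often

# Hot swap buys a short MTTR; a non-serviceable design means a long one.
for mttr in (0.5, 4.0, 48.0):              # hours: hot swap, same-day part, scheduled window
    print(f"MTTR {mttr:>4} h -> availability {availability(shelf_mtbf, mttr):.6f}")
```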
I’d like to publish responses from vendors who feel strongly about this issue. Not in the comments, but as a blog post. Any takers?
Courteous comments welcome, of course. I was so impressed with the Pure Storage team that I signed a rare NDA with them last spring to get briefed, the first of two visits to their Castro Street HQ.
Good post as always, Robin. The thing I’m struggling with here on Pure Storage (and I like their technology, truly I do) is that most of our data profiles simply “aren’t there yet”.
What I mean by that is there still will be some cold spots, and in the case of Pure Storage, the cold spots would be on premium resources. But don’t get me wrong, the “problem goes away” figuratively speaking, so that’s good.
Anyway, this is the next big thing. I’m fixated on this space!
Robin: I’m curious why you’re so fast to dismiss ‘SSDs + HDDs in the same array’.
As for the rest of it: Over the past 20 years we’ve engineered and optimized storage arrays. True, that work is built around the 3.5″ drive form factor, but the transition to 2.5″ drives shows that it’s easy to change the form factor of the drive in a shelf. Leveraging some of those optimization wins seems like a no-brainer, rather than starting fresh from ground zero.
Finally: Rick: The ‘hot spot’ argument is a strong one. I’m building a ‘third tier’ of disk storage. It’s not going to be WORM, but it’s going to be close in profile: very, very rare writes, and low reads. If my top-end IO requirements were higher, I could make a strong case for the dynamic tiering which EMC and HDS offer… have ~1-3% of your total space in SSD, 10-20% in RAID 10 SAS, and the remainder in RAID 5 SATA. Move ‘pages’ around the array, rather than whole volumes, and your IO problems should go away (largely).
The other option is to leverage something like Pure for your super IO critical volumes, and a ‘classic’ array for volumes with less demanding IO requirements.
–Jason
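As a rough illustration of the page-level tiering Jason describes, here’s a minimal Python sketch that ranks pages by recent I/O and fills an SSD tier, a SAS RAID 10 tier, and a SATA RAID 5 tier hottest-first. The tier fractions and the placement policy are hypothetical; this is not how EMC’s or HDS’s tiering actually decides placement.

```python
# Hypothetical page-placement sketch: the hottest pages land on the smallest,
# fastest tier. Tier fractions and the ranking policy are illustrative only.

def place_pages(page_io_counts, tier_fractions=(0.02, 0.15, 0.83)):
    """page_io_counts: {page_id: recent I/O count}.
    Returns {page_id: tier_name}, filling tiers hottest-first;
    the last (cheapest) tier absorbs any overflow."""
    names = ("ssd", "sas_raid10", "sata_raid5")
    total = len(page_io_counts)
    budgets = [int(total * f) for f in tier_fractions]
    placement, tier = {}, 0
    for page in sorted(page_io_counts, key=page_io_counts.get, reverse=True):
        while tier < len(budgets) - 1 and budgets[tier] <= 0:
            tier += 1
        placement[page] = names[tier]
        budgets[tier] -= 1
    return placement

# Example: 100 pages with a skewed access distribution.
counts = {f"page{i}": 1000 // (i + 1) for i in range(100)}
hot = [p for p, t in place_pages(counts).items() if t == "ssd"]
print(hot)  # roughly the hottest 2% of pages
```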
> What I mean by that is there still will be some cold spots

Exactly. Not unrelated… tape. Tape is dead, right? Well, not exactly. Look at the price per gigabyte. And with LTO6, LTO7, etc., the price per gigabyte continues to go down. Aren’t there, now and going forward, room and reasons for cheaper tiers? Especially as more and more data is required to be kept, with petabyte numbers not uncommon. Economically (therefore CFO interest/bottom line) it doesn’t make sense NOT to use the most cost-effective tier for data.

Now having said all that, a purpose-built SSD array makes a lot of sense for a number of applications, cold spots and all: big-dollar, competitive-edge situations where faster turnaround means a win and the cost of the system is an insignificant factor. The 1%’ers!
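To put some arithmetic behind the “most cost-effective tier” argument, here is a minimal Python sketch that blends cost per gigabyte across tiers. Every price below is a placeholder assumption for illustration, not a quoted or measured figure.

```python
# Blended $/GB for a tiered configuration vs. a single-tier design.
# All prices below are placeholder assumptions, not vendor quotes.

def blended_cost_per_gb(tiers):
    """tiers: list of (fraction_of_capacity, dollars_per_gb); fractions sum to 1."""
    assert abs(sum(f for f, _ in tiers) - 1.0) < 1e-9
    return sum(f * price for f, price in tiers)

tiered = [
    (0.02, 15.00),  # flash/SSD tier for the hot spots
    (0.18,  1.50),  # SAS disk tier
    (0.70,  0.40),  # SATA disk tier
    (0.10,  0.05),  # tape tier for the coldest, keep-forever data
]
print(f"blended: ${blended_cost_per_gb(tiered):.2f}/GB "
      f"vs. all-flash at the same placeholder $15.00/GB")
```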
I have a few observations to make on this topic counter to some of the claims made by Mr. Kixmoeller and Pure. Thanks Robin for fostering this open discussion.
1. Performance: Pure’s published performance numbers versus TMS, Nimbus, and Violin seem to be the weakest. Pure is less than half the speed in IOPS and only advertises its latency as “under 1 ms” when other vendors are specific (usually 100-200 microseconds). This comes from the datasheets published on the vendor websites. There seem to be performance reasons to build systems around flash, not SSDs.
2. Efficiency: Using off-the-shelf SSDs may be convenient but it is not the ideal option from an efficiency perspective. Pure needs 8U for 22TB. TMS and Nimbus do that in just 2U. Also Pure’s power consumption is 1300W for 22TB. TMS, Nimbus, and Violin are all less than half of that for the same capacity according to the vendor websites.
3. Serviceability: You do not need “SSDs” to be serviceable. Nimbus, TMS, and Violin all offer hot-swappable flash modules.
4. Economics: Pure seems to be the most expensive flash system on a per-TB basis. According to Mr. Kixmoeller’s blog, they are $25/GB for MLC memory with dual controllers. TMS advertises $15/GB for dual-controller eMLC (which is more reliable). Nimbus is around $13/GB for dual-controller eMLC as well. Pure’s cost argument is based on aggressive data reduction, but dedupe is not a feature unique to Pure nor to SSDs/flash in general. So using SSDs does not seem to have translated into lower cost.
As far as the factors above are concerned, building a system around raw flash seems to have several significant advantages over repackaging SSDs.
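One way to normalize the cost and efficiency claims above is to work in effective, post-reduction terms. Here’s a minimal Python sketch using the raw figures cited in this discussion; the 3:1 data-reduction ratio and the 650W “less than half” power figure are assumptions, and real reduction ratios vary widely by workload.

```python
# Normalizing the raw $/GB, density and power figures cited above by an
# assumed data-reduction ratio. The 3:1 ratio and the 650 W upper bound
# ("less than half of 1300 W") are assumptions, not datasheet numbers.

systems = {
    # name:   ($ per raw GB, rack units, watts, raw TB, assumed reduction ratio)
    "Pure":   (25.0, 8, 1300, 22, 3.0),
    "TMS":    (15.0, 2,  650, 22, 1.0),
    "Nimbus": (13.0, 2,  650, 22, 1.0),
}

for name, (cost_gb, ru, watts, raw_tb, ratio) in systems.items():
    eff_tb = raw_tb * ratio
    print(f"{name:6s}  effective ${cost_gb / ratio:5.2f}/GB   "
          f"{eff_tb / ru:5.1f} TB/U   {watts / eff_tb:5.1f} W/TB")
```

On those assumptions Pure’s effective $/GB and W/TB look competitive while raw density still favors the flash-module designs; change the reduction ratio and the ranking changes with it.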
I have been researching the flash array market for many months in anticipation of making a purchase for a K-12 school district. A few comments; if I have any of this wrong, feel free to correct me.
1. While dedup/compression is not a feature unique to Pure Storage it must be taken into account when comparing solutions from different vendors. For example, a 10TB Pure Storage solution has usable capacity of 30TB if we assume a 3:1 data reduction when taking into account dedup/compression. If another vendor does not have dedup/compression as part of their solution, 30TB of their solution must be priced against 10TB of the Pure Storage solution.
2. The Nimbus E-class, which is the comparable solution to Pure Storage, consists of two 2U controllers and one 2U disk shelf, a total of 6U for 10TB of raw storage. The Nimbus published performance figures are with dedup turned off. Performance drops considerably when dedup is turned on, and as far as I know they do not have compression.
3. Most of the TMS solutions are 10TB in 2U, their new RamSan-820 which has not shipped yet will have 20TB in 2U.
4. The TMS product is one of the fastest, if not the fastest, flash solutions on the market. For many applications, including high performance computing, high speed transactional processing, etc., this speed is essential. There is a tradeoff, though, that all manufacturers make between speed and capacity. By putting in place dedup/compression, products like Pure and Nimbus will sacrifice some performance (300,000 IOPS versus 800,000 IOPS) and maybe a small amount of latency to achieve higher capacity. It all depends on what you will be using the array for. The Pure solution seems to be aimed at a different market segment, the virtualization space, than other vendors.
Of course there are many other factors to consider when purchasing storage:
1. Connectivity – what options are available and what fits your environment: Fibre Channel, InfiniBand, 10Gb Ethernet, etc.
2. Software stack – Fibre Channel LUNs, iSCSI LUNs, NFS, CIFS, snapshots, replication, etc.
3. Support – SLAs, hardware break/fix, phone support, where is the support/engineering group located, etc.
4. Will the company be around in 3 years? Venture Capital funded? Gobbled up by another company? Who is leading the company?
To Adrian’s point no. 4: perhaps Pure’s angle on economics is not their to-your-door price but their engineering and BOM costs? While I might not come out on top of all benchmarks, if I were to build something in this space I would take a similar approach: focus your smarts on the software/OS and management layer, and let your suppliers handle the problems further down the stack. The pure NAND play still puts them above the traditionalists, and their own tech gives them an exit via acquisition. If acquisition is their target, my guess is that is where the best bang for your VC buck is.
I think Dave’s point no. 4 is what rules my own decision making, unless I am in a very unique, one-off situation of needing something more exotic for a particular problem.
Robin: Your two recent posts about SSD-based arrays are generating good and insightful discussion. Arrays built on architectures designed for SSDs are a good idea. They are an even better idea when they include high availability (HA) and data protection benefits equal to or greater than their HDD counterparts. HA and data protection don’t grab as many SSD headlines as performance and costs, but they are vitally important to enterprise data center managers.
Kaminario published a blog post responding to your original post “Are SSD-based arrays a bad idea?” Hope you will check it out at http://www.theiostorm.com/array-vendors-get-out-of-ssds-way.
Your second post commented on a vendor’s blog that touted SSD-based arrays. There is a lot to agree with in that post, especially the claim about opening a running array and swapping out the DIMMs. Although you can call that hot swappable if you want, I agree with the other vendor that this is totally impractical and unusable. Kaminario has a true linear architecture with N+1 availability. You can have a full Data-node (up to 2.4TB each) fail and the system will not go down and will not lose data. You can then hot swap that full component without bringing down the system and without causing any data loss. The system reconfigures itself around the failure and the replacement automatically. No one is trying to pull a DIMM from a running system. Now how is that for MTTR?
On the issue of MTBF, Kaminario uses industry standard blade servers and PCIe cards. If the components are being used by thousands of customers worldwide, one could say that they have been thoroughly tested by the market and would most likely have the best MTBF. That is why we chose this path instead of the custom hardware route.
SSD-based arrays are a good solution to help businesses make better decisions faster as applications get larger and more complex. SSDs can also enable better customer experiences and help users gain competitive advantages. With SSD costs coming down, performance going up and data protection capabilities increasingly available, there is no doubt that 2012 is the Year of the SSD.
Robin: I’ll echo Jason’s question above — why so quick to dismiss incorporating SSDs into HDD-based arrays?
The idea that a ‘conventional’ RAID controller designed for HDD cannot possibly be optimal for SSD just somehow sounds right…and as such it has become ‘conventional wisdom’. The problem is, there is not a shred of evidence anywhere that it’s true.
In my own experience, (and in every published test I’ve seen) the fastest SSD array controller available today is an optimized HDD RAID controller, namely the LSI SAS2208 “RAID-on-Chip”. This chip is used in many industry leading RAID arrays from IBM to NetApp, and also on many PCIe RAID controllers.
Why is this HDD array controller so fast as a Flash array controller?
Consider for example what is probably the biggest difference between Flash and spinning disk, which is the asymmetry of read vs. write performance. Fundamentally, disks perform reads at the same speed as writes, while Flash is 10-1000x slower in writes than reads (depending on MANY variables). This is a huge “flash specific” problem that must be dealt with, and this underscores the conventional wisdom, but is it really unique to Flash?
Turns out it’s not. Consider: as soon as one creates a RAID-5 or RAID-6 array from spinning disks, the very same kind of read/write performance asymmetry emerges and must be dealt with.
Accordingly, hardware RAID controllers like the aforementioned LSI product have been engineered over the last 20+ years to perfect the art of dealing with “fast-read-slow-write” performance asymmetry in underlying arrays of spinning disks.
Now in this context it’s easy to see: the problem of speeding up intrinsically slower writes to match read speeds is essentially the same problem regardless of whether the underlying arrays are made up of parity groups of spinning disks or likewise slow-writing arrays of NAND Flash chips.
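The read/write asymmetry point can be made concrete with the classic small-block write-penalty arithmetic that RAID controllers are built to hide. A minimal Python sketch, assuming read-modify-write parity updates and illustrative per-drive IOPS figures:

```python
# Classic small-random-I/O write penalty: a RAID-5 write reads old data and
# old parity, then writes new data and new parity (penalty of 4 back-end I/Os).
# Per-drive IOPS figures are illustrative, not measured.

WRITE_PENALTY = {"raid10": 2, "raid5": 4, "raid6": 6}

def effective_iops(n_drives, drive_iops, read_fraction, raid_level):
    """Approximate front-end IOPS given back-end capability and write penalty."""
    backend = n_drives * drive_iops
    penalty = WRITE_PENALTY[raid_level]
    # Each front-end read costs 1 back-end I/O; each front-end write costs `penalty`.
    return backend / (read_fraction + (1 - read_fraction) * penalty)

group = effective_iops(8, 180, read_fraction=0.3, raid_level="raid5")
print(f"8-drive HDD RAID-5, 70% writes: ~{group:.0f} front-end IOPS")
# A write-heavy workload sees far fewer IOPS than a read-heavy one: the same
# "fast read, slow write" asymmetry a flash controller has to hide.
```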
This is just ONE example, but I think my point is clear.
I have heard many stories from startup vendors that rely on the conventional ‘wisdom’ — the premise that the RAID stack needs to be completely re-architected for Flash. It’s just not true.
Startup vendors: prove me wrong. There are many customers out here in the market who can benefit greatly by incorporating SSDs into existing SAN infrastructure. If you assert this is not true, the burden of proof is on you.
Let’s see some hard evidence, please.