Think: if NAND flash storage arrays were being developed today, what is the chance that we’d put the flash into little bricks and then plug a bunch of them into a backplane? So why do it now?
It is a truism of design that when a new technology is developed, we use it to build what we have today. It is only in later generations that we realize the new possibilities enabled by the technology. And those generations can be long, even in computers.
For all our talk about the rapid pace of computer innovation, the market for the tried-and-true is much larger than the one innovators fight over.
Why SSD-based arrays are a bad idea
To be clear, this discussion covers storage arrays built with standards-based (i.e. SATA, SAS, 2.5″ or similar) SSDs.
- Latency. Low compared to disks, but substantial compared to raw flash. SAS/SATA software and interface stacks were never optimized for low latency because disk latency was the dominant problem.
- Bandwidth. SAS/SATA links constrain SSD bandwidth; there are much wider options available, especially close to the CPU.
- Reliability. SSDs replace the head/media assembly in disk drives with NAND chips. The rest of the SSD has all the tender bits of a regular disk – bits that account for about half of all disk failures. Compare DIMM and disk replacement rates.
- Cost. SSDs cost 50%-100% more than the raw flash, even after using all the high-volume disk components. Mounting directly on PC boards, like DIMMs or PCIe cards, is much more cost effective.
- Flexibility. The good news with SSDs is that they take advantage of the huge tech infrastructure that supports disks. But that’s the bad news too, if an optimized clean-sheet architecture is the goal.
How big an issue is cost? DRAM on a DIMM is ≈98% of the DIMM’s cost, whereas the flash in an SSD is ≈50%-65% of its cost. And since flash costs are dropping faster than the other component costs, the flash share of an SSD’s cost will keep shrinking, making the packaging overhead loom even larger.
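For illustration, here is a back-of-envelope sketch in Python using the rough fractions above; the raw-flash price is purely hypothetical, and only the ratio between the two packaging overheads matters:

```python
# Back-of-envelope on flash packaging overhead.
# The $/GB figure is hypothetical; the fractions are the rough estimates
# quoted above (media ~98% of a DIMM's cost, ~50-65% of an SSD's cost).
raw_flash_per_gb = 1.00                       # assumed raw NAND cost, $/GB

dimm_cost_per_gb   = raw_flash_per_gb / 0.98  # DIMM-style packaging
ssd_cost_per_gb_lo = raw_flash_per_gb / 0.65  # SSD packaging, best case
ssd_cost_per_gb_hi = raw_flash_per_gb / 0.50  # SSD packaging, worst case

print("DIMM-style packaging: $%.2f/GB" % dimm_cost_per_gb)
print("SSD packaging:        $%.2f - $%.2f/GB"
      % (ssd_cost_per_gb_lo, ssd_cost_per_gb_hi))
# The SSD path carries roughly a 50%-100% premium over the raw flash,
# versus ~2% for the DIMM-style path.
```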
Given the high cost of flash media compared to disk, efficient media usage is a major issue. Will flash SSDs pass that test?
A less important but related metric: rackspace. SSDs are inefficient users of racks, taking perhaps 2x the space of non-SSD flash arrays per TB. Few customers will care, but the ones who do write big checks.
The StorageMojo take
The massive technological momentum behind SSD-based arrays makes them a popular option for both vendors and customers. After 20 years of RAID arrays, customers get the model. There’s a large raft of hardware and software support for disk drives that SSDs can use.
That cuts time-to-market and development cost. Given the performance advantages of SSDs over disks, it is an easy win for customers even if the architecture is sub-optimal.
The squeeze comes later: if non-SSD architectures have significant advantages, SSD-based arrays will lose market share and gross margin. Flash-based SSDs make sense for many applications where their cost is a small percentage of the total solution.
Building storage arrays from SSDs is opportunistic, not strategic. It isn’t the future for high-end storage, but less-demanding mid-markets may not care.
Courteous comments welcome, of course. I’m really interested in any holes in the logic of this analysis. Please weigh in.
Great article and I agree. It will be nice if EMC Project Thunder proves us right 🙂
Hi Robin,
Enjoy your blog. Just to clarify, are you making a distinction between SSDs and Flash memory with respect to form-factor and placement?
What customers like about RAID is the guarantee (i.e. promise) of uptime in the event of a failure.
Disks fail and we know that so we plan for that. Are SSDs so much more reliable than disks that we shouldn’t worry about uptime? I don’t think so. Even if they never fail, FLASH SSDs will wear out and require replacement.
A SAS or SATA package gives you some guarantee that when the SSD fails or wears out, you’ll be able to replace it with another SAS or SATA SSD. In all likelihood, the original model you purchased will be obsolete by then, but because it uses an industry-standard interface, you are covered.
Until card-based SSDs come up with an industry-standard interface, address the uptime need, and can promise that you’ll be able to replace failed (or worn-out) units 3-5 years from now, I think we’re stuck with SAS/SATA interfaces and RAID controllers.
Michelle,
I’m referring to NAND flash packaged into disk drive form factors with disk drive interfaces such as SAS and SATA when I talk about SSDs.
Robin
The problem is that PCIe-based SSDs carry significant cost in controller design, driver development and the like. One could make a valid argument that driver development makes the PCIe form factor more expensive than SATA/SAS, since in those cases the OS drivers are already handled.
What we really need is someone like Intel to add some type of DIMM-like slot right off the chipset, where we can plug in flash in a DIMM-like profile and cut all the overhead costs of the PCIe form factor. That interface becomes a de facto standard, with all the usual suspects building modules to plug in. No external controller or firmware needed: all handled by the motherboard chipset. When that happens, performance and price are both optimized; until then we’re just adding overhead.
John,
That’s an attractive theory, but array vendors are often strict about which drives you can use in their arrays. Most warranties require that replacement drives be from the array vendor. And those carry a hefty markup.
One can argue about how justified this policy is – my view is that if you want the best possible availability you should listen to the vendor – but swapping in an untested replacement drive isn’t quite as clean as you suggest.
Robin
Blog post I read by Nigel Poulton:
http://blog.nigelpoulton.com/ive-seen-the-future-of-ssd-arrays/
Robin, nice post setting up a meaty discussion! As you might expect, we at Pure Storage have a slightly different view…we built an all-flash array from scratch, and came to the conclusion that an industry-standard SSD was the right form factor to use to deliver performance without sacrificing availability, serviceability, and rapid innovation. I penned a longer response post you can check out on our blog: http://purefla.sh/zirsr5
Robin,
I agree, I probably oversimplified the obsolescence part of the equation. But really there is no uptime story for SSD other than RAID. Am I missing something?
John
What about Fusion-io?
Probably good to distinguish between two different markets for storage:
1 – Large, external arrays (the topic of this post)
2 – Internal storage, inside the chassis alongside or directly attached to the motherboard.
For #1 and #2, SSDs are quick-and-easy upgrades right now.
For the future, if you want to take maximum advantage of Flash or similar storage:
#1 will need a new external protocol-and-connector scheme defined, unless Thunderbolt becomes a lot more popular (unlikely).
#2 can use PCIe now (Fusion IO and several others). But PCIe slots are a precious commodity in most chassis. Also, no one has defined a standard protocol for talking to solid-state-storage-on-PCIe, so every vendor invents their own, and switching vendors is painful. PCIe is overkill if all you need to do is move bytes back and forth.
What #2 really needs is an internal version of #1, like SATA vs eSATA.
Given current industry dynamics, the “someone” who needs to implement this is Intel. Good thing Intel believes solid state storage is strategic now.
Let the wait-and-speculate game begin!
Robin asks an interesting question, and gets a fairly wide range of interesting responses, as usual.
We are biased, in that we build SSD arrays in addition to Flash arrays, in addition to spinning rust storage, …
I can give you feedback from the customer perspective, and from this I surmise that the market may be more complex than the original post suggests.
Customers “get” SSDs. They really do. They know that for some workloads, SSDs are much better than spinning disks, and they can make use of them now, without taking a large risk on a particular vendor. Especially not on a small startup which might not be here in a year or two. Not that I haven’t pointed out to many people that this is a problem for large companies as well: they may not be here in a year or two either, or their business may change enough to render specific products very expensive bricks.
Every decision has a cost associated with it, as well as an opportunity cost for the path(s) you didn’t take. There is real risk associated with each path, and that risk also has a value. The question that customers ask, in evaluating this sort of technology is, how much of the very expensive stuff I’ve bought before do I have to throw away to use teh shiny new thing? And does using teh shiny mean that I have to lock myself into this particular vendor?
This is why SSD arrays aren’t a bad thing, but actually quite the opposite. They lower adoption risk for customers. Vendor X creates a flash device in an SSD package, and a year later goes bust, or decides to start making potato chips instead. So now the customer who has bought into Vendor X’s wares can slowly, as failures happen in the parts, start migrating to Vendor Y’s wares, with no noticeable downtime and at low marginal cost. The risk cost of this pathway is extremely low.
Change this to a completely proprietary design, put flash directly on the motherboard, and yes, you will have a faster, and probably better product. But you have just concentrated your risk in such a way as to have a hard business dependency upon the health of one company.
This is the business argument, not the technological argument.
We’ve built PCIe arrays, SSD arrays, and hybrids. We know where the performance demons live, and have gotten pretty adept at tuning them.
Cf. http://scalability.org/?p=3391 and http://scalability.org/?p=3311 . What I can say is that most of the SSD performance IOPS benchmarks we’ve seen are … well … wishful thinking. The PCIe flash numbers are pretty close to what we measure, within 50% of the marketing numbers for real-world cases. SSDs? Not so much. Expect 2-8k IOPS per device for the current generation. Some devices use SandForce controllers which do page compression. Works great for compressible data, not so great for incompressible data. Most of the benchmarks showing staggering IOPS rates on the SSDs are using compressible, aligned data, and as often as not their “random” ops are actually sequential (at least within erase blocks, but that’s a different story).
Assume 2-8k IOPS per SSD for the current crop (8k random reads and writes). A far cry from the 100k+ of PCIe flash. A far cry from their marketing numbers. Occasionally you can hit 20-25k IOPS if you get everything right and have just the right workloads. One of the links above is what happens when you hit that type of workload.
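One way to keep that data-pattern effect out of your own testing is to feed the device incompressible data. Below is a minimal Python sketch of the idea; it is illustrative only (hypothetical file path, no O_DIRECT or alignment care), and a real benchmark would use a tool like fio with direct, aligned I/O:

```python
# Rough illustration: compressible vs. incompressible data patterns for
# random-write testing. On a compressing controller (SandForce-style page
# compression) the all-zeros pattern can look dramatically faster.
# NOT a rigorous benchmark -- the page cache and lack of O_DIRECT here
# blur the numbers; use fio or similar for real measurements.
import os, random, time

PATH = "/tmp/ssd_probe.bin"   # hypothetical path; put it on the SSD under test
BLOCK = 4096                  # 4 KiB writes
COUNT = 2048                  # 8 MiB of writes per pattern
SPAN = 256 * 1024 * 1024      # spread writes over a 256 MiB region

compressible = b"\x00" * BLOCK        # trivially compressible
incompressible = os.urandom(BLOCK)    # effectively incompressible

def random_writes(data):
    fd = os.open(PATH, os.O_RDWR | os.O_CREAT, 0o600)
    os.ftruncate(fd, SPAN)
    start = time.perf_counter()
    for _ in range(COUNT):
        off = random.randrange(SPAN // BLOCK) * BLOCK
        os.pwrite(fd, data, off)      # block-aligned, random-offset write
    os.fsync(fd)                      # push everything to the device
    elapsed = time.perf_counter() - start
    os.close(fd)
    return COUNT / elapsed            # crude ops/sec figure

print("compressible  : %d ops/s" % random_writes(compressible))
print("incompressible: %d ops/s" % random_writes(incompressible))
```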
Forty-eight of the beasties joined in various types of RAID can provide nice IOPS rates. The two links above are two different configurations we’ve been demoing and selling. And as I pointed out, customers “get” the technology, they understand the risks. And they know how to manage them. When an SSD dies (and yes, they do, sometimes at higher rates than spinning disk), replacing it is no different from what they’ve done in the past, and the unit can remain in production.
What customers are telling us, after all of this, is that while they really like the PCIe flash (we do too, very hard to beat for seeky loads), for use cases that require very high uptimes without massive replication of flash resources, you have to go the SSD route to ensure hot-swappability. And even if one SSD vendor goes away, another is there to take their place, so your risk profile for deploying this is low. No real vendor lock-in.
Again, we are biased.
The real danger to the exotic architectures is that, for many … many … use cases, SSDs in arrays are “good enough”. You can disparage this viewpoint, dismiss it, ignore it if you wish. Doesn’t render it any less correct. And if something is good enough, hits the other elements (low deployment/operational risk), and comes in at a lower or even equivalent price point … few customers would opt for the higher-risk pathway without a very important differentiating factor that provides them a benefit far outweighing that risk.
There are a small number of such customers out there. For many, good enough is, well, good enough.
FWIW: many years ago, I heard very similar arguments in Linux clusters versus large SMP machines, RISC processors versus Intel-like processors, SATA vs SCSI. George Santayana’s aphorism remains as true today as it was when the other wars were raging.
Lots of great points being made here. The one thing that is becoming obvious is that plugging an SSD into a hard drive wire (SATA, SAS or FC) introduces some serious bottlenecks. This is equivalent to having RAM DIMMs with a SAS front-end plugged into a drive slot. It makes no sense. The datapath of traditional disk based DAS/NAS/SAN systems is designed to run with millisecond latencies, whereas RAM and flash run at microsecond latencies. There’s a lot of passion around this topic here at NexGen, if you’re interested in reading more details about these bottlenecks, check out my longer response on our blog site.
These are exactly the fixed stars that I set my eyes on while riding the volume/efficiency curves of mainstream flash products. Always find it useful to keep the best case scenarios in mind.
An interesting data point. For a commodity SSD, reading a small random block of mapped data takes about 250us, while the same read from an unmapped (never-written) region takes about 70us. That gives a feel for how much of the latency is due to drivers and interfaces versus the organization of the flash inside the drive.
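For anyone who wants to reproduce that sort of measurement, here is a minimal sketch, assuming Linux, Python 3.7+, and a throwaway SSD at a hypothetical /dev/sdX. Run it against a freshly secure-erased (unmapped) drive and again after filling it to see the gap described above:

```python
# Minimal random-read latency probe using O_DIRECT to bypass the page cache.
# WARNING: reads the raw block device; use a scratch SSD only.
import mmap, os, random, statistics, time

DEV = "/dev/sdX"      # hypothetical device under test
BLOCK = 4096          # 4 KiB random reads
SAMPLES = 1000

fd = os.open(DEV, os.O_RDONLY | os.O_DIRECT)
size = os.lseek(fd, 0, os.SEEK_END)       # device size in bytes
buf = mmap.mmap(-1, BLOCK)                # page-aligned buffer, as O_DIRECT requires

latencies_us = []
for _ in range(SAMPLES):
    off = random.randrange(size // BLOCK) * BLOCK    # block-aligned offset
    t0 = time.perf_counter()
    os.preadv(fd, [buf], off)                        # uncached 4 KiB read
    latencies_us.append((time.perf_counter() - t0) * 1e6)

os.close(fd)
print("median read latency: %.0f us" % statistics.median(latencies_us))
```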
At Tintri, we use the most cost effective flash that is available with good reliability and performance and are not wedded to any particular form factor.
Sun/Oracle did exactly that. They developed a clean-sheet architecture to implement NAND flash in computers while addressing the issues of latency, bandwidth and cost. I am surprised Robin did not mention it 🙂
“Oracle’s Sun Flash Module is the world’s first enterprise-quality, open-standard Flash device designed and built to an industry-standard SO-DIMM form factor”: http://www.oracle.com/us/products/servers-storage/storage/disk-storage/043963.html
Does anyone here have thoughts they would like to share about SCSI Express (which is referenced in the post by Marcus)? Will it effectively bridge the gap between SSDs and internal flash, or do we need to replace SCSI entirely?
Are SSD based arrays a bad idea?
As with many things involved with moving, processing and storing binary digital data, it depends. Sure, there are some scenarios, as others have pointed out, where a drive-form-factor SSD is not the most applicable solution. Likewise, there are times where SSDs in a drive form factor in a storage array, storage system or appliance make perfect sense, just as there is a place for server-side PCIe target and cache cards.
Thus it depends; however, in general, no, SSDs in storage arrays, appliances or systems are not a bad idea for many environments.
Here is the first in a two-part series that sheds more light with additional perspectives, including when, where and why, along with what to use regarding SSDs. The question, after all, is not if, but rather when, where and with what.
Why SSD based arrays and storage appliances can be a good idea (Part I)
http://storageioblog.com/?p=2823
Cheers
gs
Here are the counter-arguments for why flash in a drive form factor is better than flash in a DIMM-card form factor.
1. The DIMM-card form factor is currently delivered in two forms:
a) On the motherboard inside a server (DAS), e.g. Fusion-io.
b) Monolithic appliances whose capacities extend to 100TB, e.g. Violin Memory.
[1a] has the following problems:
- Limited space on the motherboard.
- DAS makes delivery of storage applications like backup and disaster recovery much more complicated.
- Operational costs are higher even as capital costs are lower. The Google, Facebook, iCloud private-cloud model does not work for enterprises whose primary business is something other than IT and who cannot afford to hire thousands of engineers to support their IT infrastructure. Google, Facebook and iCloud are in the business of selling IT infrastructure (directly or indirectly). They start with baseline operations costs that are high, so they look for ways to skimp on CAPEX.
[1b] has the following problems:
- In the event of failure, you lose the entire appliance.
- Costs associated with impaired service are a lot higher when dealing with 100TB versus 1TB.
- Tier-1 environments require the system to operate 24/7/365. A simplistic appliance model focused only on reducing CAPEX through greater density in smaller form factors is very short-sighted thinking, since OPEX constitutes 80% of TCO (see the quick arithmetic at the end of this comment).
Drive-form-factor architectures for flash allow for smaller field-replaceable units. This improves the reliability, availability and serviceability experience.
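A quick back-of-envelope on the OPEX point above, with purely illustrative numbers:

```python
# If OPEX is ~80% of TCO (the figure claimed above), even halving CAPEX
# only trims total cost of ownership by about 10%.
tco = 100.0
opex, capex = 0.8 * tco, 0.2 * tco
new_tco = opex + capex * 0.5            # suppose a denser design halves CAPEX
print("TCO reduction: %.0f%%" % (100 * (tco - new_tco) / tco))   # -> 10%
```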
Robin, this is a great post.
In my view, however, the question of how flash is attached is not as germane as it might seem at first. Today the benefits of the SAS/SATA interface outweigh the overheads. It is likely that attachment technologies will evolve, and storage vendors can evolve their products with them.
The more important debate is whether arrays should use only flash or a combination of flash and hard disks. And, if using a combination of flash and disk, whether flash should be used as an endpoint of storage or as a cache.
I have both responded to your question on the form factor and touched on this bigger debate at my company’s web site: http://www.nimblestorage.com/blog/are-ssd-based-arrays-a-bad-idea/ .
Roy, you say
>[1b] has the following problems:
>– In the event of failure, you lose the entire appliance.
No, you don’t. That’s the whole reason for putting it in that form factor in the first place: to make it serviceable, so you don’t lose the ARRAY (it’s not an ‘appliance’, thank you very much); you just lose the module, and then you replace the module.
Lots of interesting points here, but ultimately it comes down to the implementation and the goal. There are three main types of solid-state storage devices: those that are pure solid state, those that are traditional arrays that can use solid state as a drive class, and hybrids designed to use some solid state to deliver a balanced architecture for a specific use case.
All have their use case depending on the customer’s frame of reference and application. There will be successes with all flavours. No doubt there will be evolution in the future as we get technologies like NVMe and SCSI Express, but the more important point in designing a storage system is to have a clear use case in mind and not design around technology for technology’s sake.
What we see at Starboard Storage is that in the mid-market, customers want to consolidate workloads for savings. They have both unstructured and structured data. They are seeing huge growth in unstructured workloads but also have traditional applications and virtual machines.
Solid state is only important in that context if it provides the right price/performance equation for those workloads. Too many people build silos for high-performance applications and large-scale VM workloads and use solid state because it is a pure performance play. Ultimately that is a technical exercise and not a market-driven solution. That is when SSD arrays are a bad idea.
Hi Robin,
I see two holes in the logic here.
Re: “… if NAND flash storage arrays were being developed today, what is the chance that we’d put the flash into little bricks and then plug a bunch of them into a backplane? So why do it now?”
The answer is twofold. First, because Flash wears out eventually and must be replaced, it needs to be easily replaceable. Every major server and storage vendor has included language in their warranty statements to the effect that Flash-based products are treated as a “consumable”.
Remembering that Flash endurance is getting worse as density increases, this becomes more of an imperative over time.
Secondly, and more fundamentally, NAND Flash is intrinsically a “Block Device” and in an architectural sense it must therefore be treated as ‘disk’, simply because there is no “update-in-place” capability as there is with DRAM. There is plenty of marketeering happening these days around the idea that (for example) putting Flash on a PCIe bus somehow gets around the need to do block level IO. It’s all nonsense.
A block device needs a block IO interface. This is true whether it’s disguised under a pile of “secret sauce” or not.
Regarding performance, you probably have access to the SNIA SSS Performance test results from vendors you work with. Compare any of them to the LSI/SandForce implementations of their SAS2208 ROC connected to an array of SandForce SSDs. If you do, you’ll find that none of these performance-based arguments hold water.
Flash is specifically *not* a block device; that’s why there’s often an FTL in front of it that pretends to be a block device.
Block devices are assumed to have update-in-place, whereas the TRIM command is just a hack that was bolted on 40 years later and still doesn’t solve this fundamental issue.
The term for a proper flash-handling interface is MTD. These kinds of devices are usable under Linux, for example with UBI and UBIFS.
SSD arrays are a great way to ease IO bottlenecks and are a good stopgap measure to solve performance related design issues.
As SSDs become larger in capacity and lower in price, they will ultimately replace mechanical HDDs.