Pure Storage, a well-funded ($55M) valley startup, came out of hiding last week with a startling claim: enterprise flash that is cheaper[1,2,3] than disk.
[1] Cheaper after compressing and deduping the data.
[2] Cheaper after using almost all the flash capacity, which you can’t do with disks because performance suffers.
[3] Cheaper compared to the most expensive disk-based enterprise storage you can buy.
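A minimal back-of-the-envelope sketch shows how those three caveats could flip the comparison. Every price, reduction ratio and utilization figure below is an assumption for illustration only, not a number published by Pure or anyone else:

```python
# Back-of-the-envelope effective cost per GB stored. All figures below
# are illustrative assumptions, not vendor-published numbers.

def effective_cost_per_gb(raw_price_per_gb, data_reduction, utilization):
    """$ per GB of data actually stored, after data reduction and
    accounting for how much of the raw capacity can safely be used."""
    return raw_price_per_gb / (data_reduction * utilization)

flash = effective_cost_per_gb(raw_price_per_gb=10.0,  # assumed raw flash price
                              data_reduction=5.0,     # 5:1 compression + dedup
                              utilization=0.95)       # flash can run nearly full
disk = effective_cost_per_gb(raw_price_per_gb=3.0,    # assumed enterprise disk price
                             data_reduction=1.0,      # vanilla disk, no reduction
                             utilization=0.50)        # short-stroked for performance

print(f"flash: ${flash:.2f}/GB   disk: ${disk:.2f}/GB")
# flash: $2.11/GB   disk: $6.00/GB
```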
Your mileage will vary
Four years ago an EMC VP predicted that flash would replace high-end disks in 2010. That didn’t happen. Why?
After 5 years of hype, enterprises are still leery of flash. Endurance, reliability, data integrity, security, integration – all unanswered questions. At least by the other IT guys in town, even if vendors think they’ve nailed it.
So people buy the known quantity: high-end drives.
The StorageMojo take
Kudos to Pure’s marketing for making a bold, attention-grabbing statement. Too often marketing falls back on the trite-and-true “faster, better, cheaper.”
But IT wants to solve old problems while not introducing new ones. The 10x performance boost would be enough, if IT believed.
Pure’s challenge – shared by other companies with similar products – is to convince IT not only that flash is ready for primetime, but that compression and dedup are too.
And once they do that, why not use them with disks, as well? As Nimble Storage has found, inline compression is now easily handled in software on multi-core chips.
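For a rough sense of why inline compression is now cheap to do in software, here is a tiny sketch using Python’s zlib at its fastest setting; the sample data and whatever throughput it reports on your machine are illustrative only, not what any array actually ships:

```python
# Minimal sketch of inline compression in software: compress ~1 MB with
# zlib's fastest level and report the ratio and rough single-core speed.
import time
import zlib

block = (b"customer_record," * 4096) * 16   # ~1 MB of compressible sample data

start = time.perf_counter()
compressed = zlib.compress(block, level=1)  # fast level, as an inline path would favor
elapsed = time.perf_counter() - start

ratio = len(block) / len(compressed)
mb_per_s = len(block) / (1024 * 1024) / elapsed
print(f"ratio {ratio:.0f}:1, ~{mb_per_s:.0f} MB/s on one core")
```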
With raw SATA drives down to 3¢/GB, storage vendors have ample opportunity to squeeze costs out. The flash/disk competition will be good for all of us.
Courteous comments welcome, of course. The storage high-end is more active than it’s been in 15 years. Good!
“Cheaper after using almost all the flash capacity, which you can’t do with disks because performance suffers.”
Since when can you use all of a flash device’s capacity and not have performance suffer? Maybe they mean performance will still be faster than disk, even though it will be slower than the same flash at 40-70% full, or something like that.
3PAR posted SPC-1 numbers more than 2 years ago for their F400 platform showing 95% capacity utilization (the data is mirrored, though the mirrored copies are used for parallel reads); you could get that to 99% if you opted not to allocate any space for disk failures. So you can get to high utilization on disks with the right architecture. Changing to RAID 5 (3+1) would drop the I/O performance by only about 8-10% while recouping a bunch of space. The nice thing is you can run mixed RAID levels on the same disks (and even change the RAID level on the fly).
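To put the RAID tradeoff in the comment above in rough numbers, here is a quick sketch; the drive count and size are arbitrary assumptions, not the actual 3PAR SPC-1 configuration:

```python
# Usable capacity for mirroring vs. RAID 5 (3+1) over the same spindles.
# Drive count and size are assumptions for illustration.

drives = 40
drive_tb = 0.45                      # 450 GB drives, in TB
raw_tb = drives * drive_tb

mirrored_usable = raw_tb * 1 / 2     # RAID 1/10: half of raw capacity
raid5_usable = raw_tb * 3 / 4        # RAID 5 (3+1): 3 data + 1 parity

print(f"raw capacity:        {raw_tb:5.1f} TB")
print(f"mirrored usable:     {mirrored_usable:5.1f} TB")
print(f"RAID 5 (3+1) usable: {raid5_usable:5.1f} TB "
      f"({raid5_usable / mirrored_usable - 1:.0%} more usable space)")
# raw 18.0 TB, mirrored 9.0 TB, RAID 5 (3+1) 13.5 TB (50% more usable space)
```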
Another flash company makes similar claims to Pure Storage; 2 people have asked me about them in recent days but I had never heard of them before –
Nimbus Data. They don’t include details on their cost savings either; their page says “check below for our full report” but it lacks any real detail – what array did they price, what RAID level, what disk size, etc.? They also claim to save you money by including all of the software with the system, which is great too – assuming the software is any good (just look at EqualLogic). Unlike hardware performance, software quality is often hit or miss. I mean, have you ever seen a review of a storage system, networking system or whatever that reviewed the software stack on top? It’s pretty rare.
And by software stack I mean more than just what features the thing offers. After I sat through a presentation on EqualLogic (a bit more than a year ago) I was surprised how bad their software stack was; sure, they hit many of the high-level “must have” type features, but once you dug deeper into what they could actually do, it fell apart.
also…
[4] Cheaper when HA = 2 controllers + 1 array for flash and 2c + 2a for disk. Love to learn the logic here. Ref: http://www.purestorage.com/blog/how-pure-storage-delivers-all-flash-storage-at-below-the-price-of-spinning-disk/
I think all this commentary misses a key point. Let’s say you’re successful at compressing/dedup’ing data to a 5X ratio. That means a 450GB SAS drive is now housing almost 2.5TB, and a 2TB SATA drive a full 10TB! Do you think you’re going to be able to on-/off-load all that data at any kind of acceptable rate (maybe better put: handle all the I/O that “normal” use would generate in such cases), particularly the “active data” case for the typical SAS scenario? In the case of flash, on the other hand, this actually starts to use (though still probably without stressing) the flash I/O capacity. I think Pure Storage makes an entirely valid case.
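The I/O-density worry above is easy to put in rough numbers. The capacities and IOPS figures below are ballpark assumptions, not measurements of any particular drive:

```python
# After 5:1 data reduction each physical drive holds 5x the logical data
# but gets no extra IOPS. Capacities and IOPS are ballpark assumptions.

reduction = 5.0

drives = {
    "450 GB 15k SAS": {"capacity_tb": 0.45, "iops": 180},
    "2 TB 7.2k SATA": {"capacity_tb": 2.00, "iops": 80},
    "250 GB flash SSD": {"capacity_tb": 0.25, "iops": 20000},
}

for name, d in drives.items():
    logical_tb = d["capacity_tb"] * reduction
    iops_per_logical_tb = d["iops"] / logical_tb
    print(f"{name:>16}: {logical_tb:5.2f} TB logical, "
          f"{iops_per_logical_tb:8.0f} IOPS per logical TB")
# SAS ends up around 80 IOPS per logical TB, SATA around 8, flash around 16,000.
```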
If Pure is $5/GB at 80% dedupe, doesn’t that make Nimbus Data $2/GB at 80% dedupe?
@george
Depends on how much better one dedups vs the other.
Not all dedup implementations get the same ratio from a given dataset.
Not all datasets dedup easily; some barely dedup at all, and a few don’t dedup, period.
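A toy experiment illustrates the point that dedup ratios are dataset-dependent. This is a deliberately naive fixed-block dedup (hash 4KB chunks, count the unique ones), not any vendor’s algorithm:

```python
# Toy fixed-block dedup: ratio = total chunks / unique chunks.
import hashlib
import os

def dedup_ratio(data, chunk=4096):
    chunks = [data[i:i + chunk] for i in range(0, len(data), chunk)]
    unique = {hashlib.sha256(c).digest() for c in chunks}
    return len(chunks) / len(unique)

base_blocks = [os.urandom(4096) for _ in range(8)]
clone_like = b"".join(base_blocks * 32)   # 8 distinct blocks repeated, like cloned VMs
random_like = os.urandom(4096 * 256)      # already-compressed/encrypted-style data

print(f"highly redundant dataset: {dedup_ratio(clone_like):.0f}:1")
print(f"random dataset:           {dedup_ratio(random_like):.1f}:1")
# highly redundant dataset: 32:1
# random dataset:           1.0:1
```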
I can see the appeal of all flash based arrays for applications where latency is the biggest bottleneck.
Otherwise, I see flash as…
+ another cache tier,
+ a faster journal/logging medium
+ or a combination of both.
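As a purely illustrative sketch of the cache-tier idea above, here is a toy flash read cache in front of disk; the latencies and the FIFO eviction are made-up simplifications, not how any shipping tiering product works:

```python
# Toy flash read cache in front of disk, with made-up latencies.
FLASH_LATENCY_MS = 0.2   # assumed flash read latency
DISK_LATENCY_MS = 8.0    # assumed disk read latency

class FlashCache:
    def __init__(self, capacity_blocks):
        self.capacity = capacity_blocks
        self.blocks = {}                             # block id -> data, FIFO eviction

    def read(self, block_id, read_from_disk):
        if block_id in self.blocks:                  # hit: served from flash
            return self.blocks[block_id], FLASH_LATENCY_MS
        data = read_from_disk(block_id)              # miss: go to disk, then cache it
        if len(self.blocks) >= self.capacity:
            self.blocks.pop(next(iter(self.blocks))) # evict the oldest entry
        self.blocks[block_id] = data
        return data, DISK_LATENCY_MS

cache = FlashCache(capacity_blocks=1000)
_, first = cache.read(42, lambda b: b"block data")   # 8.0 ms (miss)
_, second = cache.read(42, lambda b: b"block data")  # 0.2 ms (hit)
print(first, second)
```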
I think folks are missing the point. The claims made by Pure Storage are a bit unfair in comparing compressed/deduped Flash with vanilla disk.
But if you are currently using vanilla disk, it’s a valid contrast.
I punish storage for a living, so I am a bit of a compression/dedupe skeptic.
Pure Storage have some pretty seasoned engineering talent, like Coz, the volume manager architect/anarchist for VxVM, which was an integral part of the success of your billion dollar baby, the A5000 at Sun – and its predecessor, the SSA100.
Flash is all about latency, much less about IOPS. One thing the folks at Pure Storage have got right is the focus on latency, the likes of which no disk can deliver. Sub-millisecond latency for random reads is the game changer. Conventional disk arrays can do this for writes, but those fast writes compete with reads in the array backend, hurting read latency.
Pure Storage are taking a really good approach to IO performance by regarding > 1ms latency as an operational error and appearing to quote IOPS that can be achieved at <= 1ms latency.
Remember the bad old days when disk vendors quoted IOPS, which they don't seem to do any more? If you asked how those incredible IOPS were achieved, it turned out they were measured using a 100MB seek range, 100% reads, and whatever queue depth was required to deliver a 100ms response time.
The claims Pure Storage make about IOPS are actually very modest, which I expect is due to their sub-millisecond stance. Unless they have saturated their Westmere CPUs with compression/dedupe, I would expect they could deliver many more IOPS. But that would be at disk-style latency, and there is no point in operating Flash in this manner because you forfeit the big Flash differentiator, the 10X latency reduction.
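Little's law makes the contrast plain: response time is roughly outstanding IOs divided by throughput, so an IOPS figure quoted at 100ms just means an enormous queue depth. The sustained IOPS number below is an assumption, not a Pure Storage spec:

```python
# Latency implied by Little's law for a box that sustains a fixed IOPS rate.
# The sustained IOPS figure is an illustrative assumption.

def latency_ms(outstanding_ios, sustained_iops):
    return outstanding_ios / sustained_iops * 1000.0

sustained_iops = 100_000
for queue_depth in (32, 128, 1_024, 10_000):
    print(f"queue depth {queue_depth:>6}: "
          f"{latency_ms(queue_depth, sustained_iops):7.1f} ms")
# 32 -> 0.3 ms, 128 -> 1.3 ms, 1024 -> 10.2 ms, 10000 -> 100.0 ms
```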
No affiliation with Pure Storage apart from having worked with or met a couple of their engineers in previous lives.
P.S. Disk vendors should revisit an approach discontinued about 20 years ago. With disk capacities increasing at a rate higher than their sequential bandwidth, time to read/write the drive capacity is getting ridiculous. Those parallel head transfer drives like the CDC Sabre dual parallel head 911MB IPI-2 drives were the mutt's nuts.
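The time-to-read-the-drive point is easy to quantify. The capacities and sequential bandwidths below are rough assumed figures for typical drives, not spec-sheet values:

```python
# Hours to read an entire drive at its sequential bandwidth.
# Capacities (GB) and bandwidths (MB/s) are rough assumptions.

def hours_to_read(capacity_gb, mb_per_s):
    return capacity_gb * 1000 / mb_per_s / 3600

for name, gb, bw in [
    ("450 GB SAS @ ~150 MB/s", 450, 150),
    ("2 TB SATA @ ~120 MB/s", 2_000, 120),
    ("10 TB (hypothetical) @ ~200 MB/s", 10_000, 200),
]:
    print(f"{name:<34} {hours_to_read(gb, bw):5.1f} hours")
# roughly 0.8, 4.6 and 13.9 hours respectively
```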
I haven’t read the details of Pure’s solution, so forgive me if this is an ignorant spiel, but why would you want to use some vendor-specific/proprietary compression/dedupe technology?
Why not build a Backblaze pod and put a ZFS-based appliance in front of it? You get the benefit of a huge number of drives (SSD and/or spinning) with the benefits of ZFS’s open, on-the-fly compression and dedupe.
@Mxx,
COMPRESSION/DEDUP:
There are a number of open standards for compression.
There is no open standard for dedup, just each vendor’s implementation locked up by their related patents.
From a proprietary vs. open-standards standpoint, what makes ZFS (or just ZFS’s dedup) any “better” than any other vendor’s dedup?
PURE STORAGE SOLUTION:
rockmelon’s post above yours hit the nail on the head w.r.t. what Pure Storage’s solution offers.