25 years ago I was working on DEC’s earliest RAID array. When I look at today’s “high-end” arrays, it’s shocking how little architectural change the big iron arrays have embraced.
The industry is ripe for disruption, only part of which is coming from cloud vendors. 21st century problems demand 21st century architectures.
Here’s a list of the obsolete technologies still embraced by most large arrays.
RAID. The governing assumption behind RAID – non-correlated failures – turned out not to hold, and as drive sizes increased, both RAID data losses and rebuild times grew too high. Rebuild times have rendered traditional RAID 5 and 6 enterprise arrays functionally obsolete (see the rebuild-time sketch after this list).
Active/active controllers. CPU cycles used to be costly. Now they’re cheap. If performance and availability are paramount, triple controllers – active/active/active – or better are the way to go.
Low density drive enclosures. Why do we need to replace drives at a moment’s notice? RAID. Drive failures in a stripe – even with RAID 6 – threaten data integrity due to failure correlation and silent data corruption. Get rid of RAID in favor of modern redundancy techniques, and drive replacements can be handled by calm, awake people.
Fibre Channel drives. With the rise of much faster SSDs with dual SAS connectors there is little reason for Fibre Channel drives. Dump ’em.
Hot spares. Parking an expensive and wearing resource in an expensive slot – and not using it – was a good idea only compared to the alternatives. Getting rid of RAID means that all of your unused capacity can be used for fast replication, not just dedicated drives.
Backup. Enterprise storage should be engineered to make backup unnecessary. We know how to do it, it’s been done, data sizes are exploding and backup windows are imploding.
Custom hardware. Hardware, formerly a differentiator, is now a boat anchor: low volume, high cost, and little benefit. FPGAs and ASICs make sense for the bleeding edge, but if you aren’t using high-volume hardware for 98% of your kit, you’re last century.
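To make the rebuild-time point concrete, here’s a back-of-the-envelope sketch; the drive sizes and the 50 MB/s sustained rebuild rate are illustrative assumptions, not vendor figures:

```python
# Back-of-the-envelope rebuild time for a single failed drive.
# Drive sizes and the sustained rebuild rate are illustrative assumptions.

def rebuild_hours(capacity_tb: float, rebuild_mb_per_s: float) -> float:
    """Hours to rewrite an entire drive at a given sustained rate."""
    capacity_mb = capacity_tb * 1_000_000  # decimal TB -> MB, as drives are sold
    return capacity_mb / rebuild_mb_per_s / 3600

for tb in (4, 8, 10):
    # 50 MB/s assumes the array throttles the rebuild to protect host I/O
    print(f"{tb:>2} TB drive @ 50 MB/s sustained: {rebuild_hours(tb, 50):5.1f} hours")
# Output: roughly 22, 44 and 56 hours respectively.
```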
The StorageMojo take
Storage has always been conservative – almost as conservative as backup and archive. Your data is, after all, why you invest in infrastructure.
But the “new” technologies that have rendered old architectures obsolete are now 10 years old. Times have changed.
Object storage is powering the world’s largest storage systems – and they aren’t using RAID either. That makes high-density drive enclosures – 45-60 drives in 4U – feasible.
Chunk data across enough drives or use SSDs and you don’t need Fibre Channel drives. With advanced erasure codes and fast snapshots you can lose backup too – and the expense of the systems it requires. Archiving remains a different problem.
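To see why wide erasure coding changes the economics, here is a minimal sketch comparing protection overhead and tolerated failures for RAID 6 and a couple of illustrative k+m layouts; the layouts are assumptions for illustration, not any vendor’s defaults.

```python
# Protection overhead vs. tolerated concurrent drive failures for RAID 6
# and a few illustrative k+m erasure-coded layouts (assumed, not vendor defaults).

layouts = {
    "RAID 6 (10+2)":       (10, 2),
    "Erasure code (10+4)": (10, 4),
    "Erasure code (16+4)": (16, 4),
}

for name, (k, m) in layouts.items():
    overhead_pct = m / (k + m) * 100   # capacity spent on protection
    print(f"{name:21s} tolerates {m} drive losses at {overhead_pct:4.1f}% overhead")

# A 10+4 layout survives twice as many failures as RAID 6 for modest extra
# overhead, and because chunks are spread across a wide pool rather than a
# small RAID set, every drive helps rebuild a failed one.
```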
All this and more without custom hardware. Server-hosted storage works, as Google, Amazon and Azure have proven.
In five years these old architectures should be on the ash heap of history. But they won’t be, because too many buyers buy what makes them comfortable rather than what maximizes utility at the lowest cost.
But as those folks retire or get eased out, the market change will accelerate.
Courteous comments welcome, of course. What’s your favorite obsolete array technology?
I think you are correct: Erasure Coding > RAID; Hybrid SSD/HDD media > FCAL HDD; Multicontroller > Failover; and Deferred Maintenance > Hot Swap.
That said, the legacy hardware and software limit the advancement of the big players. That is why XIV and 3Par had more innovative architectures than DS8000 and EVA. But I would say things are evolving toward the above design points; give it time and all storage systems will evolve.
I think the bigger issue is HDDs will reach a point where the bottleneck of drive speed prevents any useful increase in storage density. Essentially, HDDs become like optical drives at that point: fine for a streaming write or read, but useless for any online data storage. Erasure Coding will not be enough to overcome drive rebuild times to allow online data storage. Hybrid designs, with SSD caching, will not be enough to accelerate 9-12 TB behind a 175 MB/s and 60 IOPS bottleneck per spindle. You might need 2 TB of SSD for each 9-12 TB SATA spindle to give acceptable performance. This is the endgame for SATA drives as online storage, and the only thing keeping SAS alive is that it went to the SFF format. My gut says SATA will hit this point in a few years, no later than 2020, and I give SAS about five years beyond that. Hybrid SAS/SSD will replace hybrid SATA/SSD, but hybrid SAS/SSD will be pressured by lower-cost all-SSD using TLC flash in the next few years.
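To put rough numbers on that access-density squeeze, a small sketch using the 175 MB/s and 60 IOPS per-spindle figures from the comment above (the capacities are illustrative):

```python
# Access density per spindle, using the comment's 175 MB/s and 60 random IOPS
# figures as given; the capacities are illustrative.

SEQ_MB_S, RAND_IOPS = 175, 60

for capacity_tb in (4, 9, 12):
    full_read_hours = capacity_tb * 1_000_000 / SEQ_MB_S / 3600
    iops_per_tb = RAND_IOPS / capacity_tb
    print(f"{capacity_tb:>2} TB spindle: ~{full_read_hours:4.1f} h to read end to end, "
          f"{iops_per_tb:4.1f} random IOPS per stored TB")

# The IOPS available per stored TB keeps shrinking as capacity grows --
# the "optical drive" endgame described above.
```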
Robin, that’s correct when looking at the legacy products.
However, XIV and 3PAR have been successful (not enough IMO) in educating the market on what a modern storage system should look like.
HP has done a great job with 3PAR, and I also hear the big blue guys talking more and more about XIV lately, so it seems like IBM is finally in agreement with you.
I guess because it still works well enough and overall reliability has continued to improve.
RAID+HOTSPARES: True to a point, yes, but it still works well enough that I don’t think it’s going away for a while.
ERASURE CODING: Yes, it’s coming to the fore, but no supplier suggests running databases or similar on it yet. (In the “cloud” maybe they do, but that’s for people who don’t care about latency.)
BACKUP: Are you just trying to be provocative? That’s not backup!
FC DRIVES: Agree, but SSDs still aren’t that cheap unless you really need the IOPS.
ACTIVE/ACTIVE/ACTIVE: Totally agree – don’t understand why more companies haven’t done something like 3PAR – even the up-and-coming flash startups have stuck with this active/active thing.
CUSTOM HARDWARE: Thought this had mostly already gone – your favourite company EMC now calls itself a software company.
Personally I think looking at the bigger picture – data management – is key. Put your archive data, some file/unstructured data and possibly email (the bulk of your data) on object storage, and keep RAID for your databases and other performance/latency-sensitive data for now.
A little on custom hardware: I think you are forgetting that if you need a fast, modern, future-ready storage system, in whatever form it eventually takes, you also need support from whoever you buy it from. Unless you are as big as Google or Facebook, you probably cannot do without it.
And supporting a software-only architecture on a wide variety of commodity hardware is not easy – look at how long it took Microsoft to get Windows stable on most systems. Apple (like many of the “old” storage vendors) designed their own hardware and software and therefore only had to make sure that their OS ran well on a few different systems. And that’s a lot easier for support.
As for active/active/active/… I totally agree. Backups are a different thing though. Here you need to educate people on which part of their data is so valuable that you actually need to back it up. Personally, I don’t think a storage box should “do” the backups, but it sure should “help” you with them, if some business application tells it to back up its data.
I would like to add one more thing to get rid of in a modern system: Replicated data. I see a lot of storage systems duplicate their data to some place else, whether it is on the other side of the campus or on the other side of the globe, and I ask myself: Isn’t there a better way to do this than to spend double the money on a second array?
Why do you keep trotting out this tired old canard?
“Rebuild times have rendered traditional RAID 5 and 6 enterprise arrays functionally obsolete.”
What is MTDL in a modest sized RAID6?
What is the actual cause (in most cases) of a RAID5 (for instance) rebuild failure? I’ll give you a hint… it isn’t because a magic genie is watching a clock and suddenly… “time’s up!”
I wish Bill Todd would drop by and give a proper teaching. You ever cross paths with Bill at DEC?
Rob, if the issues were only MTDL and rebuild failures, I’d agree with you. But rebuilds soak up a lot of controller cycles and disk IOPS. With the new 8TB and next year’s 10TB disks, rebuilds will extend to days while performance suffers all the while. If you’re cool with that, fine.
But a much better and proven way exists. More later.
Robin
Maybe I’m not being fair, but I’m latching on to your “functionally obsolete” statement. RAID6 certainly isn’t – MTDL of 100 years, last I checked. So if you’re not losing data and performance is good, it will be around. The array vendors for the most part have masked a lot of the R6 performance issues with tiering and large pools. I’m working an issue now where the pool has 55 disks spanning SSD, SAS and nearline SAS, tiered. The LUNs have pieces in each tier depending on how hot or cool the data is. On RAID rebuild – by default – the “medium” check box is highlighted; nervous nellies can check “low.” If a 10- or 12-member RAID6 set is doing 120 MB/sec or so across the set on rebuild, sure, it will take over a day – no problem, with little or no performance impact since LUNs hit many disks and tiers.
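For reference, a sketch of the textbook MTTDL approximation behind numbers like that; the MTTF and rebuild-time inputs are assumptions, and the model deliberately ignores unrecoverable read errors and correlated failures – which is exactly what the article objects to.

```python
# Textbook MTTDL approximation for a RAID 6 group, assuming independent
# drive failures and exponential lifetimes. Inputs are illustrative.

HOURS_PER_YEAR = 8760

def mttdl_raid6_years(n: int, mttf_h: float, mttr_h: float) -> float:
    """MTTDL ~ MTTF^3 / (N * (N-1) * (N-2) * MTTR^2), converted to years."""
    hours = mttf_h**3 / (n * (n - 1) * (n - 2) * mttr_h**2)
    return hours / HOURS_PER_YEAR

for rebuild_h in (12, 48, 120):   # quick rebuild vs. multi-day rebuild
    years = mttdl_raid6_years(n=12, mttf_h=1_200_000, mttr_h=rebuild_h)
    print(f"12-drive RAID 6, 1.2M-hour MTTF, {rebuild_h:3d} h rebuild: {years:.1e} years MTDL")

# MTDL falls with the square of rebuild time, yet the absolute numbers stay
# enormous -- because the model assumes failures are independent and every
# sector reads back cleanly, which real fleets and real UREs don't guarantee.
```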
Low density is not especially associated with RAID. We have Engenio and DDN arrays with between 60 and 84 disks in each 4RU enclosure.
As for reconstruction times, spreading RAID-6 volumes across a large disk pool a la Engenio’s DDP helps a fair bit – unless you want to run multiple streaming workloads from the pool, since the distribution of volumes across a large pool means that serving multiple streams will practically always require head movement. DDP also gets rid of hot spares, replacing them with spare capacity spread through the drive pool.
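A quick sketch of why spreading the rebuild across the pool helps; the drive counts, per-drive rebuild contribution, and hot-spare write rate are illustrative assumptions, not DDP specifics:

```python
# Why declustered sparing rebuilds faster: many surviving drives each
# contribute a little bandwidth instead of one spare absorbing the whole
# rewrite. All figures are illustrative, not DDP measurements.

FAILED_DRIVE_TB = 8
PER_DRIVE_MB_S = 20          # modest per-drive rebuild contribution

def declustered_rebuild_hours(participating_drives: int) -> float:
    aggregate_mb_s = participating_drives * PER_DRIVE_MB_S
    return FAILED_DRIVE_TB * 1_000_000 / aggregate_mb_s / 3600

print(f"Dedicated hot spare (one 50 MB/s write target): "
      f"{FAILED_DRIVE_TB * 1_000_000 / 50 / 3600:.1f} h")
for drives in (11, 59):      # small RAID set vs. wide declustered pool
    print(f"{drives} drives rebuilding in parallel: "
          f"{declustered_rebuild_hours(drives):.1f} h")
```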
Commodity platforms vs ASIC… hmm. The Wheel of Reincarnation turns ever ’round. Sandy Bridge made a huge difference to the I/O capabilities of a Xeon platform. Question is, if you take some ARM core IP, add a lot of PCIe lanes, a bit of dedicated parity mangling logic, then fab it, will you get better bang/buck than Xeon at the volumes you intend to ship? Or even just a better bargaining position with Intel, in getting them to build chips with the right balance of PCIe and cores for your embedded application?
Completely agree about FC disks though. Die, die, die.