XIV: eXtremely Inexplicable Value

by Robin Harris | Tuesday, August 26, 2008 | Architecture, Clusters, Enterprise | 15 comments

I’ve been trying to get my head wrapped around IBM’s new XIV product – and not having much luck. When the acquisition was announced Andy Monshaw, general manager, IBM system storage, said it would “. . . put IBM in the best position to address emerging storage opportunities like Web 2.0 applications, digital archives and digital media.”

Huh?

XIV is a block device. Most of the major players in digital media, NetApp, Isilon and the up-and-coming Omneon in particular, are file based. Yes, there are people in digital media who still buy block storage because of its perceived advantage in on-time frame delivery. More of them are retiring every year.

Data Direct Networks makes screaming fast high-bandwidth block storage that is pretty popular for 2K, 4K and even 8K digital video. But they also offer a filesystem to run on top of that storage if the customer wants.

IBM knows this – or someone does – since they resell DDN.

Does calling it grid make it better?
Let’s say, for the sake of argument, that there is something behind the XIV acquisition beyond sticking it to EMC by hiring the inventor of the Symmetrix. A scalable, cluster-based, high-performance and high availability block storage system is a Good Thing.

So why not market that? Clearly IBM has an issue with product feature deficits for mission critical enterprise applications. After all XIV has only been shipping for two years.

But those issues are fixable. IBM has fixed them many times before. No one fixes them better. Faster maybe, but not better.

So why cripple a product that looks to offer significant advantages over big iron monolithic storage by putting the word “grid” in front of it? After all Grid is dead.

Is the data center suddenly clamoring for grid? Somebody please tell me.

The StorageMojo take
I hope there is a cohesive strategy behind the XIV product. But so far I’m not able to even guess what it might be.

Maybe the decades of warfare between geeks and suits have so totally paralyzed the product marketing function that even the normal IBM facade can’t cover the cracks. It must be something.

XIV is a good idea and long overdue. Lots of good advertised features, including many that work I’m sure.

But there is much weirdness in the system that – if IBM wants a return on their rumored $350 million – needs to get fixed and soon.

Courteous comments welcome, of course. Whoever comes up with the most creative explanation wins a talking orange YottaYotta cube. Cats love it!

15 Comments

Nathan on Wednesday, 27 August, 2008 at 6:41 am

Classic misdirection: IBM has everyone watching them doing something stupid while behind the scenes they’re doing something subtly smart. What smart thing could they be doing, you ask? Simple: the same thing Nintendo did with the Wii.

Nintendo coming out with a new video game console isn’t really news. It’s what they do. It happens very predictably, and people outside the company knew for months what was coming. But then they named it the Wii, and the community let out a collective gasp. People couldn’t stop making fun of it. “Don’t they know what that *means*?” they would ask. Meanwhile, Nintendo sat back and smirked as people made “playing with my Wii” jokes, because the buzz they had created led to sales that over a year later are still making it difficult to find a Wii on store shelves.

“IBM’s XIV creates solid blocks product” doesn’t create news. But “IBM positions XIV as digital media solution” *does* create news, because it looks stupid. Thus, you talk about it, and oh by the way, you mention that it’s a solid blocks product, so why on earth would they advertise it for something traditionally dominated by file-based storage. IBM has now tricked storage analysts into creating buzz *for* them by saying something stupid that entices industry wags to correct them by complimenting them.

Or, on second thought, maybe I’m giving them *way* too much credit…
Wolfgang Voigt ( WoVo ) on Wednesday, 27 August, 2008 at 6:42 am

Now we can see the results of IBM’s overhaul of the XIV Nextra storage system: the name “Nextra” has been dropped, the Supermicro SC8xx/SC9xx servers have been replaced by IBM DS3200-like server enclosures with 12 HDD slots, and all the minor components, such as power supplies, fans, Ethernet switches and UPS modules, have been brought to IBM standards.
But the most interesting point is the use of the same HDD-equipped server enclosures for the data and interface modules. Interface modules have FC and additional external GigE adapters. Data modules contain only adapters for the internal GigE connections. This represents ongoing commoditization and is very similar to the Google File System (GFS).
The main question is the usability of commodity hardware components for building sophisticated storage systems. In my opinion there are three approaches to future storage architecture design: (1) Pure use of cheap commodity server components, putting all the intelligence in an Application Specific Operating System (ASOS) – this approach is used by IBM/XIV and Compellent. (2) Use of specially programmed FPGAs for critical path functions – this is the method used by BlueArc, DataDirect Networks and Atrato. (3) Development of custom ASICs for the critical functionality – this is the lonely way of 3PAR. Approach (3) seems to be the best at first glance, but what happened with 3PAR? They have been selling the S-Series storage systems since September 2002 with only minor modifications and are now 3 years behind their original schedule, which was to announce the T-Models in summer 2005. The reason for this is the ASIC development process, which is very time- and resource-consuming, bug-prone and capital-intensive. So I think it is much better to use commodity components and, when necessary, the FPGA approach for creating future storage architectures. IBM is therefore on the right track, but the XIV storage system needs some improvement cycles, for example true Geocluster support like YottaYotta (or Moshe Yanai’s new company Axxana ???), PCI Express within the interface and data modules, 10GigE within the frames (not only between the frames), tiered storage support (at present negated by XIV) and so on.
stewey on Wednesday, 27 August, 2008 at 6:56 am

Ok, let me take a crack at this. I’m a lonely storage engineer at a major enterprise company. I don’t sell storage, but I [have] managed, consumed, and engineered many storage products. Which, I’m happy to say, now includes XIV.

This product will cater to companies that require _large_ amounts of ‘block-based’ capacity in a way that’s SO much easier to manage, includes all the bells and whistles (and counting), and is such a great value it will be very hard to pass up. Especially if your CFO has seen how much it costs!!

Who is it targeted at? Good question. Obviously, they are trying to make inroads in the financial markets. Given current economic situations, at least one big one is likely to bite. Once they get that on their resume, things will look good.

Regardless of the claims out there, this thing is fast. But there’s a lot to learn with regard to how well that performance will scale: 50 hosts, 100 hosts, 200 hosts, etc. The good news is that for those companies looking for fast, cheap, big storage, this is right up their alley.

Ok, so enough of the selling. There are some flaws with the device that may make it hard to sell to those well-experienced big storage shops out there. The first is the RAID X technology. It’s scary!! It’s well mitigated by disk scrubbing, completely mirrored redundancy, and fast rebuild times. But still scary. Other issues are that it’s so new that few SRM / Monitoring tools support it. Even those made by IBM. That’s a tough one for large enterprise corps. Non-disruptive code upgrades are still questionable. Gen 1 didn’t have it and we’re waiting to learn what Gen 2 will have.

Well, I hope that’s an informative post. Not sure if it’s YottaYotta cube worthy :)
Richard on Thursday, 28 August, 2008 at 2:31 am

Robin,
It may be that the IBM XIV is too little, too late… unified switched 10G fabric (with native FCoE storage) in the New Generation Data Center architecture changes the game…i.e. it is much more efficient to run multiple file servers off virtualized servers, front-ending block level storage. In this scenario…multiple storage controllers are well defined and relatively simple…. some of the intelligence can now move up-stream to storage servers, set dynamically for the task on hand.

Having to triplicate disks with single point failure backend controllers to drive these disks … is not really a great innovation in today’s data center power & uptime climate. Also, we have the complexity of applying multiple backend ‘controller’ resources to move data over a 1Gbit switch to deliver fast rebuilds…. and ignoring disk drive write bandwidth. In addition, there is a ‘single-point failure’ backend … single controllers driving 15 disks. These controllers that can’t be hot-swapped … this is what you get for using ‘commodity’ server boards as controllers… so there is the need to protect the system with complete, multiple ‘spare’ storage enclosures … which, in turn, requires multiple disk rebuilds…. and more power.

All of this is not very well thought out…. I think it started with the inability to provide fast RAID 6 on commodity hardware and the rest of architectural ‘innovation’ followed from there.

I remember EMC and NetApp defending (for years) the benefits of RAID1 until they ‘discovered’ the need for RAID5 …. plus some very recent statements by EMC personnel that RAID 6 was not required … and now it is a large ‘benefit’.

It is interesting to note … how so many… so well educated people …are suddenly willing to change their tune. I suggest that we can expect the same from IBM… they are now probably retrofitting the system to a 10G fabric… trying to figure out how to make their ‘commodity’ backend controllers hot-swap… perhaps RAID 6 protection within the disk enclosure … this should be interesting.

Please don’t send the cube.
Wes Felter on Thursday, 28 August, 2008 at 12:14 pm

Speaking of commodity hot-swap controllers, I noticed that Intel displayed an x86 Storage Bridge Bay box at IDF. Looks and feels like a storage controller, but Intel did all the design work for you.
TimC on Thursday, 28 August, 2008 at 3:15 pm

@Richard:

What exactly are you talking about?

First, why would I put a fileserver in FRONT of block based storage? BlueArc and NetApp integrate it, and save you a lot of unnecessary hardware, expense, and management overhead.

EMC and NetApp defending RAID1? What? NetApp has *NEVER* used Raid1. EVER. They used RAID4 from the start, and moved to RAID-DP after that (And I believe were the first to use dual parity).

The REASON they moved to RAID-DP was the integration of larger and larger ATA drives which are more likely to fail, and have longer rebuild times, thus requiring *better* protection.
Richard on Friday, 29 August, 2008 at 6:42 am

TimC,
To eliminate specialized expensive hardware like BlueArc and NetApp.
General-purpose virtualized storage servers can easily run open source … NFS, pNFS, etc., and drive much simplified RAID-protected storage bricks, over unified switched fabric.

I think you will find that earlier NetApp appliances used general-purpose motherboards and software RAID1 …. later followed by software RAID4. RAID DP is a recent addition…. no, they were not first to demonstrate dual parity.
Nathan on Friday, 29 August, 2008 at 11:08 am

FYI, NetApp’s SyncMirror product is basically RAID 1 between RAID-DP arrays. I don’t think NetApp has ever advocated plain RAID 1 (I believe SyncMirror approximates what the consumer industry calls RAID-16).
Joe Kraska on Saturday, 30 August, 2008 at 9:05 am

I, too, have been puzzled by the positioning of the IBM/XIV offering. They are 2.5X as expensive as the NetApp in Tier 2 (and I mean that optimistically), and while their web pages talk a big performance story, concrete performance data seems to be unavailable.

The system does have some interesting properties. Virtual “RAID 10” as a distributed system is kind of cool, but what would be cooler is something that doesn’t eat the bytes and drive the “per usable GB” off the scale.

The product’s choice of “RAID 10” is defended by Moshe on the grounds that advice in industry is to “mirror everything”. This is fine for what it’s worth, but me, when I mirror my systems, I like the mirrors to not be on the same system, if you know what I’m saying. Ever had it rain in your server room? (I have!).

I confess my assessment of this product could be myopic: I am very very aware of Tier-2 systems, but tend to be less well informed regarding true first class Tier-1 systems. The target market for this product appears to be folks who want, but cannot afford, Tier-1 SANs.

Joe.
Martin G on Monday, 1 September, 2008 at 3:15 am

As a humble storage manager who manages an estate in the media world, I think that XIV as a block device actually has some potential. The mirror everything approach is exactly what the likes of Omneon do at their file-system level, to the extent that when challenged they will admit that their disk is as efficient as RAID-1 from a capacity utilisation point of view. In fact Omneon allow you to make multiple copies if required for performance, availability etc.

Parity-based RAID is beginning to break down: RAID-5 in the terabyte+ disk world is a non-starter; RAID-6 in the 2 terabyte+ disk world is a little scary, don’t you think? So, some kind of mirroring needs to happen! It all comes down to how you do this, how you minimise rebuild times and maximise performance.
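
One way to put a rough number on the parity-rebuild worry above: a RAID-5 rebuild must read every surviving disk in full, so the chance of hitting an unrecoverable read error (URE) grows with disk size. The sketch below is illustrative only. The error rate of 1 in 10^14 bits is an assumed figure of the kind quoted on SATA drive spec sheets, not something stated in this thread, and the function name and the 7+1 group size are made up for the example. It also treats bit errors as independent, which real drives need not obey.

```python
import math

def p_ure_during_rebuild(surviving_disks, disk_tb, ure_per_bit=1e-14):
    """Probability of at least one unrecoverable read error while
    reading every surviving disk in full (assumes independent bit errors)."""
    bits_read = surviving_disks * disk_tb * 1e12 * 8  # decimal TB -> bits
    # 1 - (1 - p)^n, computed via log1p/expm1 to stay accurate for tiny p
    return -math.expm1(bits_read * math.log1p(-ure_per_bit))

# Hypothetical 7+1 RAID-5 group, rebuilt after one disk failure
for size_tb in (0.25, 1.0, 2.0):
    print(f"{size_tb} TB disks: {p_ure_during_rebuild(7, size_tb):.0%}")
```

Under those assumptions the risk climbs steeply with disk size, which is the intuition behind both dual parity and the argument for mirroring with fast rebuilds.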

I actually think it is funny that Moshe (who never really got RAID-5 etc) actually pops up at a time when mirroring may come back to the fore. And the irony of EMC pointing fingers at a fully mirrored box considering the crap they used to spew about RAID-5 before they had a workable implementation is absolutely fantastic!!

The XIV array does have a number of weaknesses/issues but I don’t think the fully mirrored situation is one of them. Now non-disruptive code upgrades, the 1 GbE backplane, relatively limited capacity and the lack of async replication are a lot more of an issue.
TimC on Thursday, 4 September, 2008 at 8:37 am

@Richard

I’m sitting next to one of the first boxes they ever produced and it’s RAID4, so I still don’t know what you’re talking about.

Furthermore, if they were advocating RAID1 as you claim, it should be trivial for you to provide us a whitepaper, or document of some sort verifying those facts. I can readily find them for both RAID4 and RAID-DP.

Again, why would I put a second box in front of block storage and add more management overhead when it buys me nothing? Spitting out a bunch of industry coined phrases doesn’t make it a better solution. You must be in marketing jamming all that into one paragraph.

And finally, if NetApp weren’t the first major storage vendor to adopt RAID-DP, as you were so quick to brush off, who was? You seem to have left that tidbit out.
Joe Kraska on Thursday, 4 September, 2008 at 8:35 pm

On the broader topic of RAID levels required: it’s not a specific RAID level that’s required, but rather protection from data loss. I know this may seem a bit… lecturing?… but it’s really the time-phased protection from data loss that is required here, not a specific RAID level. Consider Isilon: they don’t do RAID in any classic sense of the word (although of course can do RAID-6 and “beyond” in effect), but the critical capability they offer for mission critical ops is RAID rebuild rates that are on the order of 20X beyond their competition. These rebuild rates of course reduce the chance of data loss by a factor of 20X in and of themselves, disregarding what protection mechanism they use otherwise.

I think the RAID-6 mantra is a bit overused. It’s protection from data loss that we need, and for most architectures that means minimizing exposure time to the last allowable failed component… not some specific RAID level.

Joe.
TimC on Tuesday, 9 September, 2008 at 11:36 am

@Joe

Maybe because people actually want performance for a dataset other than streaming media?

That “AWESOME” protection Isilon provides falls over and dies as soon as random small file workloads (aka: REAL WORLD workloads) are thrown at it.

Your 20x number is completely unfounded.
Richard on Wednesday, 10 September, 2008 at 10:22 pm

Tim,
This was way back, during their original product development, before their R4-based products and white papers. It’s possible that I am confusing their software implementation of Raid 4 with hardware-based controller solutions.

I remember that for a long time, they were publicly very negative about the general block-based SAN market until they were in a position to deliver the technology to support it.

Perhaps we should take a look at their record with Raid 5, which may have conditioned this position. Why did they go for so long with Raid 4 backends when it is unsuitable for IO intensive applications? Did they ever, during that time, acknowledge the benefits of Raid 5 ?

Raid DP is a diagonally striped, dual Raid 5 scheme … just simpler to compute than Raid 6. As I recall, similar algorithms were first researched and published by IBM. It would be interesting to find out if Raid DP was developed in-house or purchased from an outside source and then patented?

I may be wrong, but it seems that the first actual RAID 6 controller was implemented by a small company based in Taiwan, closely followed by a lot of publicity by Intel for their R5/6 hardware-assisted Xscale processors. Open source examples of Raid 6 have been around for a while.

No, I am not in marketing.
Joe Kraska on Thursday, 11 September, 2008 at 10:02 pm

I like to point to vendors as exemplars for their approaches in specific areas. Isilon’s ability to conduct raid rebuilds in a distributed fashion and thereby reduce the risk of duplicate failure by reducing time exposure to risk is exemplary. Other vendors could learn from this and such similar approaches.

I’ve had the usual-suspect vendors self-report 20MB/s or thereabouts RAID rebuild rates to me. Isilon reports, for a large cluster, that the RAID rebuild mechanism is effected in a distributed fashion by every head in the system. Rebuild rates are 400MB/s and beyond. That’s 20X conservatively. For a full sized cluster, actual advertised rates are far higher than 400MB/s. And I believe them.
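
The 20X figure is straight arithmetic on the two quoted rates, and the same rates translate into rebuild windows. A small sketch, assuming a hypothetical 1 TB drive (the drive size and the function name are illustrative, not vendor figures) and that the full capacity is rebuilt at the quoted sustained rate:

```python
def rebuild_hours(capacity_gb, rate_mb_per_s):
    """Hours to rebuild a drive of capacity_gb (decimal GB) at a sustained rate."""
    seconds = capacity_gb * 1000 / rate_mb_per_s
    return seconds / 3600

DRIVE_GB = 1000  # assumed drive size, for illustration only
slow = rebuild_hours(DRIVE_GB, 20)    # rate self-reported by the usual-suspect vendors
fast = rebuild_hours(DRIVE_GB, 400)   # distributed rebuild rate quoted for Isilon
print(f"{slow:.1f} h vs {fast:.1f} h, ratio {slow / fast:.0f}x")
```

If failures arrive at a roughly constant rate, the chance of a second failure during the rebuild scales with the length of that window, which is the reasoning behind the 20X claim.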

As for your random workload comment, I am sure you are right. Equally, I don’t particularly care. Storage is a tool, and for the proper use case, Isilon’s product is a good one.

Lastly: my remark was not “unfounded”. You may believe it is mistaken, but unfounded it is not.

Joe.
