Xiotech’s ISE: beast or gamine?

by Robin Harris | Sunday, April 13, 2008 | Architecture, Disk, Enterprise | 14 comments

What’s behind the hype?
Congrats to the Xiotech team on generating the most interest at SNW. Their demos were crowded with the curious. Their claims bordered on the implausible, but the credibility of the engineering team kept derision in the corners.

I talked to Ellen Lary, engineering VP, and Steve Sicola, CTO, as well as taping the very helpful Chad. Before going any further, let’s roll the 103 second – less if you skip the credits – tape:

How do they do it?
Darned if I know – they weren’t talking. Reading between the lines:

Systems thinking: each disk drive is more powerful than that 1980s workhorse, the VAX-11/780 supermini. Put that intelligence to work!
Clean code: Xiotech has had free run of Seagate’s best thinking – so they’ve gotten rid of the firmware hairballs inside disk drives to create a distributed architecture where components cooperate in a trusted environment instead of competing. Their disks won’t work with your Brand X controller.
Spare no expense: the Xiotech team is going for the gold with a top-of-the-line resource-intensive architecture. If you have to ask how much it costs you can’t afford it.

With 350 IOPS per 15k FC drive claimed – and Sicola said more was coming – this is a lot of bang. When we see some pricing we’ll know about the bucks.

The value proposition
Xiotech’s bet is this: all is forgiven if it kicks butt 7×24 for 5 years. Each ISE is a storage utility writ small. With these building blocks, they promise, you can build an infrastructure whose availability and performance – still the storage ne plus ultra – will beat anything from EMC, IBM or HP.

A worthy goal, indeed.

The StorageMojo take
Just when EMC is assuming that Maui’s new Über-layer will win them the undying cashflow of multinationals, Xiotech comes along and exposes EMC’s feet of clay.

That sucking sound you hear is EMC emptying the datacenter’s coffers to run 7×23.999. If Xiotech can win even 10% of EMC’s business, they’ll be a $1 billion company sooner than they dreamed. And their VCs will be high-fiving in Aspen this winter.

NetApp, IBM and HP should worry as well. It sounded like Xiotech was OEM’ing the ISE to others – if so it makes sense to add them to the product line.

The disk-in-a-box model needed a thorough rethink and kudos to Xiotech for doing it. But many promising – on paper – products have failed. Once Xiotech is shipping and there is independent testing – then we’ll know what they’ve really got.

Comments welcome, of course. The indefatigable Beth Pariseau homes in on the Atrato/Xiotech nexus.

14 Comments

John Spiers on Monday, 14 April, 2008 at 7:23 am

This new Xiotech product is the biggest bunch of marketing hype I’ve ever seen. Issues with this product are the following:
• They operate under the premise that the sealed box of drives will be able to repair itself for 5 years. The truth is that it’s substantially less than this.
• The self healing algorithms essentially insure that there is not a predictable rate of performance from the array – it can be literally all over the map. Imagine failing a disk surface – you just lost 1/4-1/2 of your performance from that disk, and in the case of a head crash, you practically guaranteed the likelihood of silent data corruption from magnetically charged particulates floating around in the drive, and T10-diff won’t save you.
• Box contains a bunch of disk spares, which means you’re paying for idle capacity and performance up front. Once you run out of spares the entire system is down, and after 5 years you are assured that this will happen, and you are stuck with replacement.
• I have data that proves that this box is still exposed to a double disk fault from BER events during reconstruction onto a hot spare, and there is no RAID 6. RAID 10 reduces this substantially, but doesn’t eliminate it and is yet another hit to usable capacity.
• Most of their improved reliability assumptions are based on having an ideal environment of vibration isolation, power and cooling. Most of today’s enterprise class storage systems have these same chassis designs. What blows these assumptions apart is the varying environment in the data center, which they can’t control.
• There are serious holes in their mathematical assumptions because they’re based on the way drive guys calculate reliability, and I know the formulas from my past life.
Bottom line: customers pay for an expensive box that can’t compete on a $/GB basis, has a 50% chance of complete replacement before 5 years, 50% chance of complete data loss during the 5 year period, a 100% chance of replacement in the 6th year, with unpredictable performance.
Robin Harris on Monday, 14 April, 2008 at 10:10 am

John, I can see several strategies that would enable Xiotech to back up their claims.
-Spare drives. 40 drives over 5 years should only have a handful of totally dead failures. The rest would be completely or partly “fixed” in situ.
-None of the performance problems you mention are any different from other arrays. Head crashes slow them all down. Totally failed drives – see spare drives above.
-Everybody pays for idle capacity – that’s what 30-50% utilization means. Yes, it is more of a problem for Xiotech customers – but it balances against lower downtime. Instead of buying disks you buy another box.
-I doubt these guys are going to get caught by the unrecoverable read error problem. There are multiple ways around it and as we learn more I’m confident we’ll find they chose one.
-Some arrays have similar chassis designs, but the huge majority of TBs go out in standard 3U boxes from Xyratex and the like.
-Since we haven’t seen pricing – and if you have please share – it is hard to say how much all this wonderfulness is going to cost.

It’s clear that Xiotech is aiming at the enterprise with this box. For many of those folks the promise of never going down will be hard to resist.

Robin
zax on Monday, 14 April, 2008 at 12:20 pm

Xiotech is making a play at a market dominated by huge players with a long history of running critical businesses on their back.

It appears to me that Xiotech is trying to add a new tier at the top. Looking at the growth estimates for storage, how does this fit in? I don’t see Facebook, Hulu or any of the other hot companies with gobs of unstructured data running to get 100% uptime storage for X times the cost of 99.99% uptime storage. EMC (I hate to admit it) is making the right play at the right time. Make storage cheaper and more scalable.

Kudos to Xiotech/Seagate for taking on this challenge, but I just don’t see financial institutions and other companies with ultra critical data running to ditch their old storage (IBM, HDS, EMC) for Xiotech.
Labrat on Monday, 14 April, 2008 at 1:47 pm

All of the information contained herein is available in Xiotech press releases, published publicly by ESG, SPC, Seagate or deductive reasoning.

Robin: Pricing can be found with the SPC results. The 146GB 20 drive model (economy?) costs $20820. The 73GB 40 drive model (performance?) costs $36500. Cost per drive seems to somewhat dominate the market, so the other models (the 4.8 TB ‘balanced’ model – guessing either 40 146GB-drives or 20 300GB-drives, the 16TB ‘capacity’ model – mathematically could be 20 1TB-drives) are probably in the same neighborhood price wise.

John: many of your claims struck me as quite… aggressive, to say the least. “Issues with…” your argument “…are the following:”

“• They operate under the premise that the sealed box of drives will be able to repair itself for 5 years. The truth is that it’s substantially less than this.”
– I admit I do not have the knowledge of the inner workings of disk drives to dispute your claims. However, one could still ask: Is there any evidence that the given premise is true? Look at the ESG Lab Validation on the ISE: There were two groups of ISEs running drives with an MTBF of 100k hours. 87 ISEs running 2.5″ drives had 0 failures/replacements in .63 operational years. 60 ISEs running 3.5″ drives had 0 failures/replacements in 1.07 operational years. That’s nothing like 5 years, and as time progresses the probability of needing replacements goes way, way up, particularly with the ISE. However, Cheetah 15k.6 drives, which are likely to be used (fibrechannel/Seagate, put 2 and 2 together), have an MTBF of 1.6M hours (perhaps Savvios are similar). This would make the testing roughly comparable to 0/87 systems requiring replacements within 10 years, as well as 0/60 systems requiring replacements within 17 years. Admittedly, that number seems absurd, but evidence points toward that rather than “a 50% chance of complete replacement before 5 years, 50% chance of complete data loss during the 5 year period, a 100% chance of replacement in the 6th year…” So far that statement looks unfounded and invalid. Also, logically, a service event is required when there is not enough sparing for a hard drive to fail without data loss. Requiring replacement and complete data loss are not equivalent.
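
The scaling in the paragraph above can be checked with a few lines of Python. This is only a sketch of the arithmetic: it assumes, as the comment implicitly does, that a failure-free test period scales linearly with the ratio of the two MTBF ratings (a constant failure rate). The function and constant names are illustrative and do not come from the ESG report.

```python
# MTBF figures quoted in the comment above, in hours.
TESTED_MTBF_HOURS = 100_000       # rating of the drives in the ESG test groups
CHEETAH_MTBF_HOURS = 1_600_000    # rating quoted for Cheetah 15k.6 drives


def equivalent_years(test_years, tested_mtbf=TESTED_MTBF_HOURS,
                     target_mtbf=CHEETAH_MTBF_HOURS):
    """Scale an observed failure-free period by the ratio of two MTBF ratings.

    Assumes a constant failure rate, so the scaling is simply linear.
    """
    return test_years * target_mtbf / tested_mtbf


if __name__ == "__main__":
    # (number of ISEs, operational years with zero replacements)
    for systems, years in [(87, 0.63), (60, 1.07)]:
        print(f"{systems} ISEs, {years} years observed -> "
              f"about {equivalent_years(years):.1f} equivalent years")
```

The ratio of the two ratings is 16, which is where the "10 years" and "17 years" figures come from. Whether a constant failure rate is a fair assumption for drives late in their service life is exactly what the thread is arguing about.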

• “…in the case of a head crash, you practically guaranteed the likelihood of silent data corruption from magnetically charged particulates floating around in the drive, and T10-diff won’t save you.”
This I know nothing about except what I can mentally picture; I would love to alleviate my ignorance. I can imagine it could make repairing the drive impossible. However, because types of failure aren’t specified, or differentiated in the ESG report, I don’t think your statement affects my previous argument.

“• The self healing algorithms essentially insure that there is not a predictable rate of performance from the array – it can be literally all over the map. Imagine failing a disk surface – you just lost 1/4-1/2 of your performance from that disk.” & “Box contains a bunch of disk spares, which means you’re paying for idle capacity and performance up front.”
Someone correct me if I am way out of line, but one would be “Paying for idle capacity and performance up front” which is then used to make up for “failing a disk surface”. One failed disk surface gets removed, and the sparing then takes care of it. Thus, I don’t understand your argument at all. And even if my statement there is entirely wrong, the self-healing algorithms affect performance and storage by no more than 20% as the drives deteriorate. I fail to see how this is “All over the map”. Also, even with the hotspare-waste, the system was #1 on SPC-1 and SPC-2 price performance ratings upon its release. If you were to include further performance loss from drive deterioration, it would be #4 on SPC-1 and #3 on SPC-2 ratings as of its release, which is still better than the #10 (of forty-something?) spot by 50% and 80% respectively.

“• Most of their improved reliability assumptions are based on having an ideal environment of vibration isolation, power and cooling. Most of today’s enterprise class storage systems have these same chassis designs. What blows these assumptions apart is the varying environment in the data center, which they can’t control.”
The improved reliability assumptions are supposedly based upon the ISE’s chassis design. That is supposedly how vibration and cooling are optimized. While the ISE cannot control the environment in the data center, bad conditions in a data center would devastate any drive array, and it is in the interests of customers to make sure conditions stay excellent.

“• There are serious holes in their mathematical assumptions because they’re based on the way drive guys calculate reliability, and I know the formulas from my past life.”
I, as well as others, would surely be fascinated to have some insight as to how it is that drive guys calculate reliability, and what the problems with that are. I’m a math guy, so it would be great to see them.

• “Bottom line: customers pay for an expensive box that can’t compete on a $/GB basis, has a 50% chance of complete replacement before 5 years, 50% chance of complete data loss during the 5 year period, a 100% chance of replacement in the 6th year, with unpredictable performance.”
“expensive box” – Xio was saying there would be no price premium. “…can’t compete on a $/GB basis”. Previous statement holds. 20-35k for 16TB looks just fine to me. (Paraphrase) ‘Half fail/replace rates through year 5 and total through year 6’. Previously discussed, and determined unfounded. “…with unpredictable performance”. Even if it’s true that the performance is unpredictable (by as much as 25%), it is still remarkably good, leaving your statement misleading at best.
Whether it can really hold up, only time will tell, but it’s a good story. And if/when a 16TB ISE ever needs replacement, migrating the data off of it could be an astounding nightmare, especially without downtime. But this is nothing new.
John Spiers on Thursday, 17 April, 2008 at 2:05 pm

Ok, I obviously didn’t make myself clear by throwing a lot of high level statements out there. I will elaborate:

I spent 15 years in the disk drive industry and I’m still not sold on this technology. In fact I’ve considered developing it myself, but as you dig into the details it’s hard to build a business case when it’s full of risky assumptions.

Their data is based on drive reliability models that are from the manufacturer, which I know from personal experience are full of sunny day assumptions, and are being proved wrong on a regular basis. See http://www.cs.cmu.edu/~bianca/fast/index.html

Granted, many of the “failed” drives that are returned from the field are diagnosed NDF. Data has proved that many of these drives fail erroneously because of their environment. For example, if the disk drive is mounted into a poorly designed carrier or chassis, the rotational vibration of the disk drive in the carrier or those adjacent to it can cause drives to pop errors and fail. When you take it out of the carrier and test it, it’s perfectly fine. Of course most disk drives have incorporated an accelerometer to measure this vibration and can adjust their servo system on-the-fly to compensate for it.

Xiotech’s analysis says if you mount it and cool it appropriately it eliminates these types of failures, thus improving overall reliability. This is very true, but like I said, most of the storage vendors have figured out proper disk drive mounting and cooling years ago, and the sophistication of today’s disk drive servo and read channel technology prevent many of these failures anyway.

What the disk drive guys don’t talk about are things like thermal asperities, erasure, media corrosion, firmware bugs, contamination, and the many things that escape the factory’s tests on a regular basis. All it takes is one contamination event, a bad batch of heads or media, or a firmware bug to wreck your reliability calculations. I could write a book about all the things that can go wrong with drives that are not considered for these types of designs. In fact, even the disk drive guys are still trying to figure out what causes many of their field failures that don’t show up during reliability testing.

I believe the box is still exposed to a double disk fault from BER events during reconstruction onto a hot spare, because it appears that there is no RAID 6 or equivalent protection against a 2nd drive failing before its data is successfully reconstructed onto a spare. Background scrubbing and using drives with a BER of 1 in 10^15 or better reduces this risk substantially, but doesn’t eliminate it.
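
To put rough numbers on the rebuild exposure described above, here is a minimal Python sketch. It assumes bit errors are independent, that a mirrored (RAID 10) rebuild reads one whole 146GB drive (the drive size mentioned in the pricing comment), and that a hypothetical nine-drive parity rebuild reads nine of them. None of these layouts is confirmed for the ISE, and the function name is illustrative.

```python
import math


def p_unrecoverable_read(bytes_read, ber):
    """Probability of at least one unrecoverable read error while reading
    `bytes_read` bytes, assuming independent errors at rate `ber` per bit.

    Computes 1 - (1 - ber)**bits in a numerically stable way.
    """
    bits = bytes_read * 8
    return -math.expm1(bits * math.log1p(-ber))


DRIVE_BYTES = 146e9  # 146GB drive, as in the SPC pricing mentioned earlier

if __name__ == "__main__":
    scenarios = [
        ("mirror rebuild (1 drive read)", 1),
        ("hypothetical parity rebuild (9 drives read)", 9),  # assumed layout
    ]
    for label, drives_read in scenarios:
        for ber in (1e-14, 1e-15):
            p = p_unrecoverable_read(drives_read * DRIVE_BYTES, ber)
            print(f"{label}, BER {ber:g}: {p:.2%}")
```

Under these assumptions, a mirror rebuild at a BER of 1 in 10^15 hits an unrecoverable error a little over 0.1% of the time, and the hypothetical nine-drive parity rebuild about 1% of the time. That is small but not zero, which matches the "reduces this risk substantially, but doesn't eliminate it" wording.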

Again, what blows their assumptions apart is the varying environment in the data center, which they can’t control. Having a drive replacement once a month in a standard array in one of these poor IT environments may be Ok, but if the environment takes out one of these closed boxes in 6 months it may not be acceptable.

What really scares me is their claim that they can fail a disk surface in a drive and use the remaining good surface(s). First of all, you just lost 1/6-1/2 of your performance of that disk drive, because all heads read and write in parallel. The more heads you have the faster the disk drive. If that drive is in a striped set, it degrades the performance of the entire set. What scares me most about this is the possibility of a head crash, which can result in magnetically charged media particles floating around the drive flipping bits everywhere. This practically guarantees silent data corruption, and T10-diff may not save you. If a disk drive surface fails in a standard array, the drive is dead, like it should be.

Seagate peddled this to me and everyone else in the industry. I can’t imagine Xiotech, a former Seagate owned company, being the only one “smart enough” to want this technology.

John
Richard on Monday, 21 April, 2008 at 2:03 am

I agree with John.

“If a disk drive surface fails in a standard array, the drive is dead, like it should be”.

The problem with their approach is that a faulty drive can not be easily replaced and they can not allocate global sparing (packaging issues) …. hence the marketing spin. They would be wise to add Raid 6… but this may be a patent or performance related problem.
Bill Todd on Wednesday, 23 April, 2008 at 3:52 am

Shucks – I was hoping that your video included Richie or Ellen, whom I haven’t seen in the last couple of years.

The marketing claims don’t just border on the implausible, they enthusiastically occupy its territory in force (which is, I guess their job, but I don’t have to tolerate it without comment – I’ll pick on their ‘ESG Brief’ first as an example).

Eliminate any impact from disk failures? I don’t think so, though I do believe the allegation that at least *some* glitches that would normally cause a disk to be yanked from a conventional system may be recoverable (of course, all it would require for the conventional system to do this as well is cooperation with Seagate at the same level that Xiotech has).

Eliminate performance impacts and increased data integrity exposure caused by (full or partial) disk failures? Nope: any time you have to rearrange (let alone rebuild) your data to return to full robustness after losing some of your physical storage it’s going to affect the performance of on-going operations and data integrity protection until the reshuffle/rebuild has completed – period.

Eliminate the risks associated with replacing failed disks? Well, yes – but other competitive ‘self-healing’ products can tolerate disk failure without replacement as well: they just retain the option of replacing the disk if you decide to (and I’d personally prefer to have that option available – e.g., so that I can wait until there are enough failed units to be worth replacing and then do so en masse during a period of low activity such that if I make a mistake I can correct it with minimal-to-zero system impact, though this requires either at least brief system down-time or a system that can recognize such special operations and hold off on initiating automatic response to a potential mistake until it can be corrected).

Reduce the need for ever-more-complex RAID implementations? Actually, the opposite: the self-healing facilities in a product like the ISE are considerably *more* complex than conventional RAID – all they do is isolate this complexity from the system manager (which is certainly a good thing in itself, but nothing that a conventional system couldn’t do equally well and arguably more flexibly by reducing hardware vendor lock-in).

And there’s certainly no reason that the ‘unit of failure’ in a conventional RAID need be ‘the entire disk’: that would be downright silly if the only problem was the rare unreadable sector that could be reconstructed from the available redundancy and then rewritten to a new, healthy location.

(As a specific aside, I’d be interested in knowing just how often a single disk surface goes bad without affecting the rest of the disk as, e.g., a head crash likely would: they do make a lot of noise about this, but don’t seem eager to quantify its importance.)

The absolutely inane (and blatantly misleading) quality of the ESG spin is best summed up in its sentence [quote] whereas the disk buying decision has traditionally been “you can pick any two from reliability, performance and capacity” it would be replaced with “it’s reliable, with performance and capacity scaling as needed” [end quote]. But then I guess most marketing drivel is not intended to stand up to any kind of close, technical scrutiny.

Moving on to the ‘ESG Lab’ paper and ‘lab validation’ (many of whose ‘observations’ are lifted word-for-word from the previous one, so I won’t bother trashing them again here), we find ‘expert third-party perspective’ (complete with funded-by-Xiotech objectivity) from a ‘lab’ that (at least in this case, and contrary to the implication that it attempts to convey both in its introductory blurb and later in the body of the ‘report’) appears to have conducted no ‘hands-on testing’ (or even any measurements) at all, but only accepted the data that Xiotech provided to them (plus the results of some industry benchmarks). I don’t mean to question Xiotech’s integrity here, by the way – only ESG’s attempt to pass off vendor-supplied information that it has accepted in an extremely uncritical manner as anything resembling ‘testing’ (or even ‘research’).

(The writers clearly don’t know what they’re talking about when they state that “Mean time between failures (MTBF) is a rating provided by manufacturers to help predict the useful service life of a hard disk drive”, but then this *is* marketing rather than technical material, after all.)

Now, it’s not too difficult to believe that Xiotech’s careful attention to the drive environment has resulted in improved drive reliability – even though there’s reason to suspect that actual (non-NTF) drive failures even in a conventional installation would have been lower than the manufacturer ratings, in part because the test period was only 15 months long rather than encompassing the entire 5-year nominal service life over which those MTBF ratings are calculated, so lacking comparative measurements with a conventional system one should not necessarily conclude that Xiotech’s drive reliability is dramatically better. Among other things, this makes the graph in ESG’s Figure 3 a complete travesty (and intellectually dishonest as all hell): it contrasts its projection of *tested* results for the Xiotech drives in their surrounding environment with *predicted* results for a conventional set of drives in some unknown environment assumed by the disk manufacturer, rather than setting up the two racks side-by-side and performing an actual comparison.

And the ‘zero NTFs’ result may simply be a consequence of returning questionable drives to service until they fail incontrovertibly: when put that way, it becomes clearer that this may be a mixed blessing, and the real question becomes how many of the failed drives *were* returned to service for a while, thus endangering data integrity perhaps somewhat more than it might have been endangered had they been replaced with new drives.

They touch on the question of ‘extended RAID rebuilds’ multiple times (they were also mentioned in the earlier ESG blurb, but I didn’t bother to mention it before), but IIRC Xiotech itself only claimed that it improved conventional RAID rebuild time by something like 43%. Not only is this considerably less than one might hope for, but it’s nothing (once again) that a more conventional system couldn’t achieve with similarly advanced RAID-like approaches (I’m assuming that they’re at least something like the parallel-rebuild mechanisms that I suspect Xiotech used in Magnitude and Richie and Ellen helped introduce in HP’s EVA series).

It does appear that the ISE is well-designed and aggressively priced, but there’s a bit of sleight-of-hand in stating that it “performed nearly as well as the Sun StorageTek 6140 in SPC-2 testing with 50% fewer drives”: first, the Sun system out-performed it by a bit over 18% (not dramatic, but worthy of note), and second, the Xiotech box had only 37.5% fewer drives (it’s true that only 80% of them were actively serving data for the test, but trying to dismiss their existence in this case while actively promoting it elsewhere as the reason you shouldn’t ever have to replace a drive is an attempt to have your cake and eat it too).
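
The two percentages in the paragraph above reconcile if one assumes a 20-drive ISE with 16 drives active (80%) against a 32-drive Sun configuration. The Sun drive count here is inferred from the 37.5% figure, not quoted from the SPC-2 report, so treat this Python sketch as a consistency check rather than a citation.

```python
def percent_fewer(count, baseline):
    """How many percent fewer `count` is than `baseline`."""
    return 100 * (1 - count / baseline)


ISE_TOTAL_DRIVES = 20     # assumed: 20-drive ISE model
ISE_ACTIVE_DRIVES = 16    # 80% of the drives actively serving data
SUN_DRIVES = 32           # inferred from the 37.5% figure, not from the SPC report

if __name__ == "__main__":
    print(f"All ISE drives counted: {percent_fewer(ISE_TOTAL_DRIVES, SUN_DRIVES):.1f}% fewer")
    print(f"Active ISE drives only: {percent_fewer(ISE_ACTIVE_DRIVES, SUN_DRIVES):.1f}% fewer")
```

Counting only the active drives yields the marketing figure of 50%; counting the spares as well yields 37.5%.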

Then the writers descend into utter gibberish in comparing IOPS and bandwidth to automotive horsepower and torque, plus manage to confuse the inherent bandwidth limitations of the dual 4 Gbps FC links with disk-bandwidth limits which so far exceed them that the number of disks involved has no relevance, despite their attempt to imply some.

And while I’m not acquainted with the internals of the ‘Jetstress’ benchmark, obtaining only 2904 IOPS from 16 (active) 15Krpm drives at an apparent average queue depth of at least 5 (given the 19 ms. response time) seems decidedly unimpressive: that’s only 181.5 IOPS/disk, which is almost exactly what you’d expect with a queue depth of 0. Even if a fair percentage are writes (where effectively only 8 disks’ worth of throughput would exist in a mirrored configuration), given the substantial queue depth that’s hardly all that much to brag about (and they conveniently don’t introduce any specific comparisons to evaluate). The same observations apply to the preceding test in Table 4 (8 KB random, 67% read).
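
The per-disk figure above is easy to reproduce, and it can be compared with a back-of-the-envelope estimate for a drive servicing one request at a time. In this Python sketch the 3.5 ms average seek time is an assumed typical value for a 15K rpm drive, not a number from the ESG paper, and the function names are illustrative.

```python
def iops_per_disk(total_iops, active_disks):
    """Average IOPS delivered by each active disk."""
    return total_iops / active_disks


def single_request_iops(rpm, avg_seek_ms):
    """Rough random IOPS for a drive with one request outstanding at a time:
    each I/O costs an average seek plus half a revolution of rotational latency.
    Transfer time and controller overhead are ignored.
    """
    rotational_latency_ms = 0.5 * 60_000 / rpm
    return 1000 / (avg_seek_ms + rotational_latency_ms)


if __name__ == "__main__":
    measured = iops_per_disk(2904, 16)            # Jetstress figures quoted above
    estimate = single_request_iops(15_000, 3.5)   # 3.5 ms seek is an assumption
    print(f"Measured: {measured:.1f} IOPS per disk")
    print(f"Unqueued estimate: {estimate:.1f} IOPS per disk")
```

With those assumptions the unqueued estimate lands within a fraction of an IOPS of the measured 181.5, which is the point being made: deeper queues normally let a drive reorder requests and do noticeably better than this.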

Perhaps in lauding the performance ‘for a system of this size’ they’re talking about compactness (which it indeed possesses) rather than hardware complement.

It’s really sad how much money and effort is spent producing (and then reading/analyzing) drivel like this – just as the same is true for most of the content on television. I suspect that there are interesting technologies hiding inside the new ISE, but you certainly won’t learn anything substantive about them from Xiotech’s marketing material. Perhaps when I have additional time available I’ll take a look at the patent applications.

– bill
Bill Todd on Thursday, 24 April, 2008 at 2:32 am

Well, I waded through the patent verbiage as long as I could stand it, and found nothing particularly special there (let alone anything that I would consider patentable with respect to general architecture, but we know how that works these days). AFAICT Xiotech’s entire ‘advantage’ in its ISE is in said ISE’s ability to provide reliable virtualized storage to a host sufficiently intelligent to cobble it together into something more globally useful, plus cooperate with other ISEs in migrating data (though my quick scan didn’t expose anything that would indicate that this could proceed without some host direction – e.g., any indication that the ISEs formed a sufficiently intelligent cooperative that they could shuffle things around on their own to make local spares available to more needy compatriots, etc.).

It was not clear how RAID-like mechanisms are implemented within an ISE, so they could be relatively novel in the same sense that I’ve assumed Magnitude and EVA were. If I believed in the value of this particular level of local intelligence I guess I’d be more excited, but for highly-scalable (approaching EB) systems I just don’t think it’s adequate (still requiring too much from the centralized host/coordinator).

I.e., rather than anything resembling a ‘game-changing’ innovation I’d characterize this as a modest (though non-negligible) additional step along a road that I think leads in a somewhat sub-optimal direction (save possibly for the low-level disk-diagnosis-and-repair facilities which could actually break some new ground – though of course not any ground that couldn’t be covered equally well by any suitably intelligent controller). But of course I could be wrong.

– bill
Bert on Saturday, 3 May, 2008 at 6:15 pm

Interesting, and way to go Xiotech. One thing I have learned in the business world is that Bigwigs from companies like EMC, LeftHand Networks DO NOT bother to spend the time slamming another’s technology unless it worries them. The simple fact that the competition on this BLOG as well as others is spending so much time on trying to counter this one is telling, and I bet a lot of you reading this have noticed that also.

What this is doing is making a lot of CTOs like myself dig a little deeper. I don’t have Xiotech in my datacenter now, however the pricing I’ve seen does not scare me at all. That coupled with the fact that LeftHand, EMC, and I have now seen Equallogic chiming in telling me not to look at it, well guess what? There is probably a reason they don’t want to let it get rolling. More than even the claims of self healing is the technology they are using called RAGS (I believe that is it) that intrigues me. For years my close contacts in the disk industry have talked to me about a time when code on the drives would be cleaned up to enhance disk throughput greatly. We came to expect code cleanup, and enhancements in the server processor industry, so why are some trying to bash enhancing the disk throughput? The disk industry is a strange one, probably the most competitive left in the IT environment. I say to all be smart and treat your storage vendor decision different, you can tell what is worth looking at by just how much bashing of it you see, it’s more like a gang mentality in this segment, I think that the Xiotech of this gang is dangerous and scaring some pretty large competitors who are trying to take this member down before he gets too strong. Seems to be scaring some fairly small members also (look at this blog).

I think that this array is worth a serious look even if it only did 1/2 of what they claim. The speed seems like it is going to be impressive, reduce some of my warranty costs, seems like it will scale well, should at least drastically reduce Hard Drive fails... Friends who do have Xiotech in their IT mix tell me it’s been some of the most reliable, simple interface. Honestly though I have not had much exposure to Xiotech, but the reaction from the big and little dogs in the pen with this one is telling me something interesting is happening, and the other dogs in this pen don’t want me to see it…. 🙂

This is getting fun.
John Spiers on Thursday, 8 May, 2008 at 12:05 pm

Bert,

One could certainly agree with you after reading this chain. Actually, I am worried, worried that customers are going to believe all the marketing B.S. and end up losing data.

There is a bunch of technology the drive guys could deploy that would make solutions like this more real. The dilemma the drive guys face is spending millions on device intelligence when none of their large OEM customers are willing to pay for it. At the end of the day drives are electro-mechanical devices that wear out and sometimes have unpredictable behavior, and their quality is dictated by the deviation from specification of the latest batch of parts or deviations in the manufacturing process.

The test process screens out or “adapts” to most of the defects, but the fault tolerance margins are all over the map. These new self-healing Xiotech arrays should work as advertised if the underlying statistical data and assumptions are correct. My argument is simply that they are not correct and I would love to get on a white board and go through the assumptions and math with them.

If they prove me wrong we would consider pursuing these designs as a LeftHand platform, as we are hardware agnostic and don’t make any of our own hardware today.
Joe S on Wednesday, 17 September, 2008 at 1:43 pm

“One thing I have learned in the business world is that Bigwigs from companies like EMC, LeftHand Networks DO NOT bother to spend the time slamming another’s technology unless it worries them.”

That’s ridiculous. Do you really think that when a competitor explains to you why NetApp’s WAFL is horrible for performance that they’re lying ar when they show you that NetApp uses a fraction of the capacity they sell you that they’re making it up? Vendors point out deficiencies in the competition so you will see the value in their solution, not because they are worried about it. You must have hostile relationships with your vendors if you don’t trust them to be advisers.

Xiotech’s “self healing” drive pacs should scare the crap out of customers. How often will you plug a hole in your cars tire before you get a new one? Once? twice? You sure aren’t taking a cross country trip with 3+ plugs in your tire. Think of the ISE as your car; the data is your family and your 24×7 shop is the cross country trip. Keep your self healing voodoo. My family is worth a new tire; my data is worth a new drive. The free 5-year warranty is worth every penny you spend for it.
Eric Schoenfeld on Friday, 24 October, 2008 at 8:57 pm

“Vendors point out deficiencies in the competition so you will see the value in their solution, not because they are worried about it. You must have hostile relationships with your vendors if you don’t trust them to be advisers.”

This has to be one of the most entertaining comments I’ve read in a while. It appears as if, when Joe S wants to buy a Toyota he visits a Chevy dealership to inquire about Toyota’s capabilities. Then, after the Chevy salesman tells him what great cars Toyota makes, he’ll run over and get one…
jims on Thursday, 12 February, 2009 at 8:05 am

I have used Xiotech technology since 2002. All I have to say is their technology rocks! So far, whatever claims they have made have come true. I have no reason to doubt this technology either. Hey, 5 years warranty? If all the other vendors are so confident about their products, why don’t they back them up with a solid warranty? Oh, I forgot! That’s how the other vendors make their money, not on the technology they sell you but on the maintenance! I am sure Mr. John Spiers did not use the maintenance cost to calculate the cost/MB. After all, isn’t it all about TCO? For an SMB company, Xiotech has the lowest TCO. Even if I wanted any other solution, I couldn’t afford it and its maintenance.
Alex Grigoriev on Wednesday, 25 February, 2009 at 3:05 pm

John Spiers,

“First of all, you just lost 1/6-1/2 of your performance of that disk drive, because all heads read and write in parallel.”

You must have been out of the storage field for quite a while. There is no way in hell you can keep different heads aligned at the same time on the same cylinder. Such misalignment was causing the infamous “thermal recalibrations” on (long gone) disks with a dedicated servo surface, with much coarser track density. On modern disks with sub-micron track pitch, a very small difference in arm temperature brings the heads out of alignment instantly.

Trackbacks/Pingbacks

Further thoughts on self-healing storage — Storage Soup - [...] The inimitable Robin Harris summarizes his thoughts on ISE, and gets an interesting comment from John Spiers of LeftHand Networks…
A peek at Xiotech’s ISE from SNW « Storage Effect - [...] Curious about the mysterious ISE from Xiotech? A picture paints a thousand words, and a video does it all…