SNW update – Xiotech’s ISE and the dilithium solution

by Robin Harris | Wednesday, April 9, 2008 | Architecture, Disk, Enterprise | 10 comments

It looks like Xiotech is going to cop the “Best Announcement at Spring SNW ’08” prize. See the nifty flash intro.

I spoke with Ellen Lary, Engineering VP, last night after working my way through their mobbed booth. Later today I have an appointment with Steve Sicola, Xiotech’s CTO. I’ll have a more complete report later. Here’s what I’ve gleaned so far.

Remember Atrato?
Interesting stuff:

Sealed unit starting at 1.5 TB. They had a 1 PB system on display in three 54 RU racks, i.e. taller than the racks you use.
5 year warranty and nifty blue LED light. Are we in a data center or a cocktail lounge?
Uses the draft T10 DIF (Data In Flight or Data Integrity Field, Data Integrity Feature – depending on where you read it – evidence that humans have a far greater problem with data integrity than computers do) standard to protect data within the array.
Uses Seagate’s own drive test software to attempt repairs on drives in place. Ellen said that about 70% of drives work normally after a power cycle.
If power cycling doesn’t work, the box can perform a complete reformat of the drive, starting with laying down tracks and proceeding on to what you and I consider “formatting”.
If a particular head is the problem, they can electrically disable that side of a platter while continuing to use the rest of the capacity of the drive.
It is cheaper to put in a couple of extra high-end drives than it is to make a service call. This won’t be true in China of course.

The best announcement that WASN’T made at Spring SNW
A company has figured out how to enable long distance synchronous replication. Here in America we like things big – including our idiots in Washington – and our disasters are no exception.

Hurricanes, earthquakes, volcanos, floods, blizzards, tornados and fires – and purblind ideologues – can lay waste to hundreds or thousands of square miles. So normal synchronous replication distances don’t cut it for gotta-have-it infrastructure.

The still-in-stealth-mode company’s Chief Engineer, Montgomery Scott, explained that by running dilithium crystals a little hot, a special hyperspace “tunnel” is created enabling . . . .

Just kidding. Their actual solution looked good in principle but the devil is in the details. I asked all the hard questions I could think of and they had answers for all of them, so it looks like they have something real.

Look for a fall announce.

The StorageMojo take
Those of you wondering if this year would be more of the same old, same old, fear not. The spirit and fact of invention is still strong in the ever-more-vital storage industry.

Comments welcome, of course. Would you use 1,000 mile synchronous replication if you could get it?

10 Comments

TimC on Wednesday, 9 April, 2008 at 8:18 pm

ISE:

As I’ve said in response to other people’s claims that this is disruptive… I see this as one big fail. You and I both know drive vendors GROSSLY understate their failure rates. Sun/STK tried this once with their blade storage crap. It turns out having to rebuild 5TB of data at a shot when you lose a *brick* completely sucks. And the odds of you losing another disk during that replacement/rebuild are infinitely greater.

So I have two hot spares, I lose one main disk, and one of the hot spares… and then I just cross my fingers and hope I don’t lose another? I’m sorry, it’s great they’ve decided in their lab environment that this works, I don’t buy it.
Chris Ribe on Wednesday, 9 April, 2008 at 8:49 pm

“Would you use 1,000 mile synchronous replication if you could get it? ”

Do I have to buy my own satellite? Can I still write /dev/random to SAN at 400MB/s?

Synchronous replication at 1.6 Mm isn’t difficult, it just makes bandwidth more expensive – more expensive than it is worth in almost all cases.

Alternatively, the proverbial station wagon full of tapes can handle Mm scale synchronous replication at reasonable bandwidth cheaply – if you can handle the 16 hr latencies.
InsaneGeek on Thursday, 10 April, 2008 at 4:45 pm

Xiotech ISE… maybe I’m reading things incorrectly but I’m still trying to find the real value that it brings to me. A number of arrays power-cycle drives these days to try and get a failed one back (among other tricks). Color me crazy but what is really so great about reusing a drive that has known failures? They are basically just refurbishing the drive in the array: reformat it, muck with the heads, etc. It’s still a refurbished known *failed* drive. Does this do something to a failed drive that makes it less likely to fail than a new drive? To me it seems a lot of hype with not a lot of benefits. Compared to having a single spare drive on a shelf with some hot spares in an array, what value does continuing to use a known failed part that has been “refurbished” bring in a mission critical environment? Maybe if it’s out in BFE where nobody can physically get to it, I guess. I suppose they can proactively try to fix a failing drive, but just as simply a lot of arrays proactively spare a drive prior to failing… Additionally, from a very quick glance (not deep at all), I see no mention of protection against a double failure (spec sheets, glossies, etc.), so I can only assume they believe that RAID 5 or similar will protect me. Your blog has had a number of posts as to why that may not protect me: high-quality Fibre Channel drives get UER too (admittedly nowhere near as often as ATA). As you fix and refix drives over 5 years, is that failure rate going to magically decrease?

The only real benefit I guess is that you can now buy an array and possibly carry no maintenance on the system… of course, what business that would purchase an array >$50k would actually do that with a critical part of the business (don’t you want software upgrades, etc.)? So in the end Xiotech’s support cost will go down since they don’t have to come onsite, and they’ll continue to sell support contracts to customers, making it a nice little bit of double-dipping profit, while the end user gets to use refurbished failed drives. And if I’m already doing my own parts by purchasing a couple of hard drives to sit on a shelf, what’s the point?

Having said all that, it’s a very interesting concept, with people at least thinking about trying to self-heal and shake traditional methods up a bit. Maybe there are other things that make it really earth-shattering cool that the bloggers have just forgotten to tell us about.
Christoph on Saturday, 12 April, 2008 at 8:23 am

Robin,
synchronous WAN replication is readily available even for lower-end storage devices. NetApp’s white paper about synchronous SnapMirror has lots of detailed information about the challenges (http://media.netapp.com/documents/tr_3326.pdf).
My favourite “difficult question” in this context is what happens if there is an interruption in the replication process (e.g. if the network is down for a couple of hours): how does a system recover from this problem, and is a complete re-sync necessary?
pmwut5 on Monday, 14 April, 2008 at 5:30 am

I would tend to agree with TimC. For too long storage vendors have sold the mystical black box of storage, placing a friendly arm around customers and telling us all is well. I would want a detailed list of all the drive failure codes and then a point-in-time record of all the drive errors that have occurred in a RAID group. Not the usual green icon telling me all is OK.

Replication over 1000 miles – depends on the application and the money. Recently switched on synchronous replication for an EMC frame over 200 miles with 60 servers on, of which 5 application groups complained of poor performance. The amount of data being sent and the speed of light in a vacuum are our only problems. Then again running dilithium crystals a little hot might be a solution one day.
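
The speed-of-light limit mentioned here can be put in rough numbers. The following is a minimal sketch, not a model of any vendor's product: it assumes a straight-line fiber path, a refractive index of about 1.468 for the fiber (an assumed typical value), and exactly one round trip per acknowledged write, with no switch, protocol, or array overhead. The function names are hypothetical.

```python
# Back-of-envelope propagation delay for synchronous replication.
# Assumptions (not from the post): straight-line fiber route, one round
# trip per acknowledged write, no equipment or protocol overhead.

C_VACUUM_KM_S = 299_792.458   # speed of light in a vacuum
KM_PER_MILE = 1.609344
FIBER_INDEX = 1.468           # assumed refractive index of optical fiber


def round_trip_ms(distance_miles, refractive_index=FIBER_INDEX):
    """Round-trip propagation delay in milliseconds over the given distance."""
    one_way_s = distance_miles * KM_PER_MILE * refractive_index / C_VACUUM_KM_S
    return 2 * one_way_s * 1000


def max_serial_writes_per_s(distance_miles, refractive_index=FIBER_INDEX):
    """Upper bound on strictly serialized synchronous writes per second."""
    return 1000 / round_trip_ms(distance_miles, refractive_index)


if __name__ == "__main__":
    for miles in (200, 1000):
        print(f"{miles:>5} miles: "
              f"{round_trip_ms(miles, 1.0):6.2f} ms RTT in a vacuum, "
              f"{round_trip_ms(miles):6.2f} ms RTT in fiber, "
              f"at most {max_serial_writes_per_s(miles):5.0f} serialized writes/s")
```

Under these assumptions the propagation delay alone is roughly 3 ms per write at 200 miles and roughly 16 ms at 1,000 miles, before any real-world routing or equipment delay is added. An application that issues dependent writes one at a time feels that delay on every write, which is consistent with some application groups complaining while others do not.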
John Spiers on Monday, 14 April, 2008 at 7:39 am

This new Xiotech product is the biggest bunch of marketing hype I’ve ever seen. Issues with this product are the following:

• They operate under the premise that the sealed box of drives will be able to repair itself for 5 years. The truth is that it’s substantially less than this.

• The self healing algorithms essentially ensure that there is not a predictable rate of performance from the array – it can be literally all over the map. Imagine failing a disk surface – you just lost 1/4-1/2 of your performance from that disk, and in the case of a head crash, you practically guaranteed the likelihood of silent data corruption from magnetically charged particulates floating around in the drive, and T10 DIF won’t save you.

• Box contains a bunch of disk spares, which means you’re paying for idle capacity and performance up front. Once you run out of spares the entire system is down, and after 5 years you are assured that this will happen, and you are stuck with replacement.

• I have data that proves that this box is still exposed to a double disk fault from BER events during reconstruction onto a hot spare, and there is no RAID 6. RAID 10 reduces this substantially, but doesn’t eliminate it and is yet another hit to usable capacity.

• Most of their improved reliability assumptions are based on having an ideal environment of vibration isolation, power and cooling. Most of today’s enterprise class storage systems have these same chassis designs. What blows these assumptions apart is the varying environment in the data center, which they can’t control.

• There are serious holes in their mathematical assumptions because they’re based on the way drive guys calculate reliability, and I know the formulas from my past life.

Bottom line: customers pay for an expensive box that can’t compete on a $/GB basis, has a 50% chance of complete replacement before 5 years, 50% chance of complete data loss during the 5 year period, a 100% chance of replacement in the 6th year, with unpredictable performance.
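
The rebuild exposure raised in the list above (a bit error hit while reconstructing onto a hot spare) can be illustrated with a standard back-of-envelope formula. This is only a sketch and is not the data the commenter refers to: it assumes independent bit errors at a fixed rate, and the drive size, group width, and error rates below are hypothetical values chosen for illustration, not ISE specifications.

```python
import math

# Chance of hitting at least one unrecoverable read error (URE) while
# reading the surviving drives during a single-parity rebuild.
# Assumption: bit errors are independent and occur at a fixed rate.


def p_ure_during_rebuild(bytes_read, bit_error_rate):
    """Probability of at least one URE while reading `bytes_read` bytes."""
    bits = bytes_read * 8
    # 1 - (1 - ber)**bits, computed with log1p/expm1 for numerical stability.
    return -math.expm1(bits * math.log1p(-bit_error_rate))


if __name__ == "__main__":
    # Hypothetical RAID 5 group: 4 surviving drives of 300 GB each.
    surviving_bytes = 4 * 300 * 10**9
    # Assumed spec-sheet style error rates, in errors per bit read.
    for label, ber in (("1 in 10^15", 1e-15), ("1 in 10^14", 1e-14)):
        p = p_ure_during_rebuild(surviving_bytes, ber)
        print(f"BER {label}: {p:.2%} chance of a URE during the rebuild")
```

With these assumed inputs the risk per rebuild is on the order of 1% at the lower error rate and several percent at the higher one. The point of the sketch is only that the exposure grows with the amount of data read and the number of rebuilds over the life of the box.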
Andrew on Thursday, 17 April, 2008 at 6:25 am

InsaneGeek
Yes, there is still software maintenance, but the idea is that they don’t have to expand their support infrastructure as they grow the company to keep up with hardware failures, and they are passing those savings along to the users in the form of a 5 year hardware warranty. Based on the cost of extended hardware support from some *cough* other vendors I’d say that’s pretty significant. Xiotech was already one of the best performance per dollar vendors with a very easy to manage product, and now they have significantly reduced costs and an architecture that’s more scalable. I’d be interested to see what kind of pricing pressure this puts on other vendors’ maintenance contracts.
StoragePunk on Thursday, 17 April, 2008 at 1:09 pm

I was at this show and I want to know why no one is talking about the ridiculous speeds that Xiotech is getting out of what they say are 20 drives in an ISE. Most of the discussion seems to be around the self-healing and warranties they are touting, but Xiotech also boasted that (depending on the drive type in your ISE) it can attain speeds that some whole systems currently deliver. They had a display up that showed 6 of these ISEs connected together delivering (via IOMeter on 3 servers) over 600,000 IOPS. The workload they used (since most were in disbelief I had them clarify) was 512 byte, 100% read, 100% sequential. Granted, this is downhill with the wind at your back, but that result was from (what they said) only 120 drives. That is bordering on solid state territory and is unheard of in the rotational world with that number of drives.
InsaneGeek on Friday, 18 April, 2008 at 12:26 pm

@StoragePunk

My guess as to why nobody cares about the IOPS numbers is this: we’ve all seen the magic number tricks before from the different vendors. In their literature Xiotech say they get 300 IOPS per disk. To get to 600,000 IOPS to/from disk at that rate they would need 2000 spindles; put the other way, 600,000 IOPS from 120 drives is 5000 IOPS per drive. To quote South Park: “Why would a Wookiee, an eight-foot tall Wookiee, want to live on Endor, with a bunch of two-foot tall Ewoks? That does not make sense!” Saying a single disk can support 5000 IOPS does not make sense either; it’s completely unrealistic, and is not indicative of its true performance. It’s pure smoke and mirrors, and I refuse to listen to any of their IOPS, SPEC, SPC, etc. numbers because they *all* play games with them. Algorithms can help but this is “it really does break the laws of physics and goes faster than the speed of light” type of marketing speak. I’d love for it to truly be 600k IOPS from 120 spindles, I truly would (heck, everybody would), but you aren’t going to exceed the speed of light.
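
The per-spindle arithmetic in this comment is easy to reproduce. The sketch below uses only figures quoted in this thread (600,000 IOPS, 120 drives, 300 IOPS per disk, and the 512-byte block size from the demo description); the function names are hypothetical, and the throughput line is an added back-of-envelope conversion, not a vendor number.

```python
# Reproduce the per-drive arithmetic using figures quoted in the thread.
CLAIMED_IOPS = 600_000        # demo result reported from the show floor
DRIVES = 120                  # 6 ISEs x 20 drives, as reported
QUOTED_IOPS_PER_DISK = 300    # per-disk figure attributed to Xiotech literature
BLOCK_BYTES = 512             # block size used in the demo


def iops_per_drive(total_iops, drives):
    """IOPS each drive would have to deliver to reach the total."""
    return total_iops / drives


def spindles_needed(total_iops, iops_per_disk):
    """Drives required to reach the total at a given per-disk rate."""
    return total_iops / iops_per_disk


def throughput_mb_s(total_iops, block_bytes):
    """Aggregate data rate in MB/s (10^6 bytes) implied by an IOPS figure."""
    return total_iops * block_bytes / 1_000_000


if __name__ == "__main__":
    total_mb_s = throughput_mb_s(CLAIMED_IOPS, BLOCK_BYTES)
    print(f"IOPS per drive:        {iops_per_drive(CLAIMED_IOPS, DRIVES):.0f}")
    print(f"Spindles at 300 IOPS:  {spindles_needed(CLAIMED_IOPS, QUOTED_IOPS_PER_DISK):.0f}")
    print(f"Aggregate throughput:  {total_mb_s:.1f} MB/s")
    print(f"Throughput per drive:  {total_mb_s / DRIVES:.2f} MB/s")
```

At 512 bytes per request the headline number works out to about 307 MB/s in aggregate, or about 2.6 MB/s per drive. That is a small data rate for a purely sequential read workload, so the demo figure by itself says little about how the box handles random I/O that forces the heads to seek.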

For example, I can get crazy speeds reading from /dev/zero and writing to /dev/null on a low-end server that match my high-end server, but does it really tell me anything about whether I can dump the 32x processor box down to a 2x processor? What would you do if a vendor tried to pitch a new server saying it’ll perform just as fast and putting up an example “dd if=/dev/zero of=/dev/null” to prove it? I’d be laughing so hard I couldn’t stand up anymore (or be so pissed I couldn’t see straight). That’s what these storage vendors are trying to “show us” with their magic IOPS. I’m tired of it and I’m done with BS made-up engineered numbers, and I simply tell vendors that to their face anymore when they pull out their sheet of mythical numbers. (sorry about the rant… it just really annoys me)
Bill Todd on Wednesday, 23 April, 2008 at 4:50 am

One of Xiotech’s ESG marketing blurbs seems to make it clear that those ridiculous IOPS rates are to/from cache, not to/from disk.

– bill
