StorageMojo




Robin Harris    


The value of guaranteed uptime

May 1st, 2008 by Robin Harris in Architecture, Enterprise, Future Tech

What, if any, is the value of multi-year storage uptime?

Xiotech and Atrato promise 5- and 3-year uninterrupted service, respectively, on their new arrays. Now it is time to ask, as some commenters have: so what?

After all, enterprise data centers are already well-equipped to deal with disk failures. RAID keeps the data available. 7×24 service replaces the failed drive with a new hot spare. Experienced storage admins paper over the cracks.

It isn’t like you’re going to fire all your storage admins just because arrays stop breaking.

Opex vs capex
The direct cost saving - no maintenance contract for x years - may or may not be reflected in the purchase price. From a buyer’s perspective there are 2 costs: the capital expense - capex - and the operating expense - opex. Opex is fully tax deductible in the year incurred, so it is easier to get approved.

Atrato and Xiotech need to think creatively about maintenance pricing.

Breaking into the glass house
Breaking into data centers with the promise of cost savings isn’t easy. The provable cost savings have to be 50% or better to get conservative data centers to change vendors. And it helps if there is a recession or the business is tanking. Motivation.

A case can be made that after adding up a standard array’s maintenance costs, random disruption costs and additional management it will be cheaper to go with the new product. The CFO will demand it.

But if you want to change the market, you have to change the way the market thinks.

Re-thinking the issue
Straight cost-displacement arguments aren’t going to have the legs both companies would like. They need a different model.

Enterprise IT is a manufacturing plant - not an engineering testbed. It confuses the engineers because it seems like a techie haven - but it isn’t.

It is all about shipping product, each and every day. Like a real factory.

SPC
Everyone accepts that statistical process control has changed the face of manufacturing. A core idea behind SPC - that reducing variability improves quality - is directly applicable to IT factories.

What Atrato and Xiotech do, ideally, is reduce IT ops variability. There is always a known level of performance. Availability is 100%.

Thus most of the usual dependencies are no longer dependencies. I/O slowdowns and timeouts should disappear. Drive rebuilds won’t impact performance. Admins won’t pull the wrong drive - which happens about 2% of the time - and bring down the array. And so on.

The StorageMojo take
Enterprises over-configure because they never know what is going to hit them - but they do know it will be at the worst possible time. Ideally they want to be ready to handle the biggest shopping day of the year - even after an array failure.

Workload variability isn’t going away. But wouldn’t it be nice if equipment performance and availability variability did?

That’s what Atrato and Xiotech are selling. I wish them luck communicating a value prop that strikes at the heart of what every other array vendor is selling.

Comments welcome, of course.

Holographic storage debuts next month

April 20th, 2008 by Robin Harris in Disk, Enterprise, Future Tech

After 8 years of hard slogging the folks at InPhase are ready to ship the world’s first holographic storage system.

As StorageMojo noted 2 years ago:

InPhase is claiming they will ship drives with removable holographic disks with 300GB capacity and 20Mbps transfer rate later this year.

I love holographic technology and wish InPhase the best, but I don’t believe they have a viable business with their technology - yet. The problem: 3.5″ disk drives will reach 750GB by the end of this year with much faster transfer rates. InPhase’s 20 Mbps is only 2.5 million bytes per second or only 9GB per hour. It will take over 30 hours just to fill one disk! I predict that hard drives will still be more convenient than, and fairly cost-competitive with, this promising new technology.

But keep at it guys. Lightning will strike if your investors are patient enough.

So what’s different now? They’re saying they will ship next month instead of “later.” The transfer rate is 20 MB/sec. And the media archive life is 50 years - higher density and longer life than tape.
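The difference between the old and new transfer-rate figures is a factor of 8 (megabits vs. megabytes), and it changes the fill time from more than a day to an afternoon. A minimal sketch of the arithmetic, assuming decimal units and a sustained rate with no protocol or verify overhead:

```python
# Back-of-envelope fill times for one 300 GB InPhase disc, using decimal units
# (1 GB = 10**9 bytes, 1 MB = 10**6 bytes). Protocol and verify overhead ignored.
CAPACITY_BYTES = 300 * 10**9

def hours_to_fill(bytes_per_second: float) -> float:
    """Hours needed to write CAPACITY_BYTES at a constant sustained rate."""
    return CAPACITY_BYTES / bytes_per_second / 3600

old_rate = 20 * 10**6 / 8   # 20 Mbps (megabits per second) = 2.5 MB/s
new_rate = 20 * 10**6       # 20 MB/s (megabytes per second)

print(f"20 Mbps: {old_rate * 3600 / 1e9:.0f} GB/hour, {hours_to_fill(old_rate):.1f} h per disc")
print(f"20 MB/s: {new_rate * 3600 / 1e9:.0f} GB/hour, {hours_to_fill(new_rate):.1f} h per disc")
```

Under these assumptions the old figure works out to about 33 hours per disc and the new one to a little over 4 hours.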

Limited availability until fall
I saw a unit - not sure it was functional - at NAB last week. Marketing VP Liz Murphy gave me the pitch, about 110 seconds of which you can watch here:


The yellow plastic on the drive is for display purposes. Note the nifty see-through media.

Target market
As befits a small company with an $18,000 holographic drive whose media costs $180 a disc in single quantities, InPhase has a sharp focus on people who need a 50 year archive life. Like film studios, whose film-based archives are bulky and subject to the vagaries of physical chemistry.

The media price is reasonable - compared to Blu-ray. NewEgg has TDK 25 GB Blu-ray media for $17. 12x that - to get 300 GB - is $204. Plus the clutter. The burners are cheaper though.
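Normalizing both media to cost per gigabyte makes the comparison explicit. A small sketch using only the prices quoted above (no burner cost, no volume discounts):

```python
# Media cost per GB, using only the single-quantity prices quoted in the text.
inphase_price, inphase_gb = 180.0, 300     # one holographic disc
bluray_price, bluray_gb = 17.0, 25         # one TDK Blu-ray disc (NewEgg)

discs_needed = -(-inphase_gb // bluray_gb)  # ceiling division: 12 discs for 300 GB
bluray_total = discs_needed * bluray_price  # 12 x $17 = $204

print(f"InPhase: ${inphase_price / inphase_gb:.2f}/GB")
print(f"Blu-ray: ${bluray_price / bluray_gb:.2f}/GB, ${bluray_total:.0f} for {discs_needed} discs")
```

That is $0.60/GB for the holographic disc against $0.68/GB for Blu-ray, before counting the clutter.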

Why did it take 8 years?
InPhase had to literally invent almost every piece of the system.

  • The optical media.
  • The manufacturing process for fabricating thick, optically-flat and high-dynamic range media.
  • The mathematics and circuitry needed to use digital camera CMOS chips for high-speed and high-accuracy image reconstruction.
  • A new method - polytopic multiplexing - for a 10x density increase.
  • Holographic mastering techniques for commercial reproduction.

For example, in order to use commercial, i.e. affordable, CMOS optical sensors to read the holograms, InPhase engineers had to do a deep dive (pdf) into optical information theory:

For holographic data storage it is advantageous to limit the spatial bandwidth of the object beam to only slightly higher than the Nyquist frequency of the data pattern. Typically an aperture in a Fourier plane is used to band limit the data beam (thereby also minimizing the size of the holograms in a Fourier-transform geometry). The data pattern may contain at most 1 cycle/2 data image pixels, so that the Nyquist frequency of the optical field of the object beam is minimally 1 sample/pixel. However, since the spectrum of the irradiance pattern is the auto-correlation of the spectrum of the optical field, the Nyquist frequency of the detectable signal is actually 2 linear samples/pixel minimum. Thus any method relying on less than 4 detector elements/data image pixel is operating in a sub-Nyquist regime where the Nyquist rate is defined with respect to the actual irradiance pattern impinging on the detector.
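The quoted argument reduces to a short one-dimensional derivation. Here $p$ is the data-pixel pitch, $E$ the object-beam field, $I = |E|^2$ the detected irradiance and $f_s$ the detector sampling rate; these symbols are introduced for this sketch and do not appear in the quote:

$$
\begin{aligned}
f_E &\le \frac{1}{2p} && \text{field: at most 1 cycle per 2 data pixels} \\
f_I &\le 2 f_E = \frac{1}{p} && I = |E|^2\text{, whose spectrum is the autocorrelation of the field spectrum} \\
f_s &\ge 2 f_I = \frac{2}{p} && \text{Nyquist: 2 detector samples per data pixel, per axis} \\
N_{\text{det}} &\ge 2 \times 2 = 4 && \text{detector elements per data pixel in two dimensions}
\end{aligned}
$$

Anything below 4 detector elements per data pixel is therefore sub-Nyquist for the irradiance, which is why reading holograms with affordable sensors took new signal-processing work.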

As Liz noted, you can’t hire experienced holographic storage engineers. InPhase has trained every one of them.

The StorageMojo take
Kudos to InPhase for a magnificent achievement. This is comparable to IBM’s original RAMAC disk effort back in 1957. They all deserve to get rich.

15 years ago a 3x CD reader cost a few hundred dollars. Perhaps in 15 years holographic burners will be $50 and the media less than $1.

Learn more about the technology at the InPhase Technologies web site.

Comments welcome, of course. See a more accessible version of this article on my ZDnet blog, Storage Bits.

Xiotech’s ISE: beast or gamine?

April 13th, 2008 by Robin Harris in Architecture, Disk, Enterprise

What’s behind the hype?
Congrats to the Xiotech team on generating the most interest at SNW. Their demos were crowded with the curious. Their claims bordered on the implausible, but the credibility of the engineering team kept derision in the corners.

I talked to Ellen Lary, engineering VP, and Steve Sicola, CTO, as well as taping the very helpful Chad. Before going any further, let’s roll the 103-second - less if you skip the credits - tape:

How do they do it?
Darned if I know - they weren’t talking. Reading between the lines:

  • Systems thinking: each disk drive is more powerful than that 1980s workhorse, the VAX-11/780 supermini. Put that intelligence to work!
  • Clean code: Xiotech has had free run of Seagate’s best thinking - so they’ve gotten rid of the firmware hairballs inside disk drives to create a distributed architecture where components cooperate in a trusted environment instead of competing. Their disks won’t work with your Brand X controller.
  • Spare no expense: the Xiotech team is going for the gold with a top-of-the-line resource-intensive architecture. If you have to ask how much it costs you can’t afford it.

With 350 IOPS per 15k FC drive claimed - and Sicola said more was coming - this is a lot of bang. When we see some pricing we’ll know about the bucks.
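To see why 350 IOPS is a lot, compare it with a naive single-disk model that charges each random I/O one average seek plus half a revolution. The sketch assumes a 3.5 ms average seek, a hypothetical round figure rather than a Xiotech or Seagate spec, and ignores transfer time. Real drives can beat this model by reordering queued commands, so treat the result as a rough baseline, not a measurement:

```python
# Rough random-I/O estimate for one disk: one average seek plus half a rotation
# per I/O. AVG_SEEK_MS is a hypothetical round number, not a vendor spec, and
# transfer time for small blocks is ignored.
RPM = 15_000
AVG_SEEK_MS = 3.5  # assumption

def naive_iops(rpm: int, avg_seek_ms: float) -> float:
    rotation_ms = 60_000 / rpm            # one full revolution: 4 ms at 15k RPM
    avg_rotational_ms = rotation_ms / 2   # on average, wait half a revolution
    return 1000 / (avg_seek_ms + avg_rotational_ms)

print(f"{naive_iops(RPM, AVG_SEEK_MS):.0f} IOPS")
```

With these assumptions the model gives roughly 180 IOPS, about half the claimed figure.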

The value proposition
Xiotech’s bet is this: all is forgiven if it kicks butt 7×24 for 5 years. Each ISE is a storage utility writ small. With these building blocks, they promise, you can build an infrastructure whose availability and performance - still the storage ne plus ultra - will beat anything from EMC, IBM or HP.

A worthy goal, indeed.

The StorageMojo take
Just when EMC is assuming that Maui’s new Über-layer will win them the undying cashflow of multinationals, Xiotech comes along and exposes EMC’s feet of clay.

That sucking sound you hear is EMC emptying the datacenter’s coffers to run 7×23.999. If Xiotech can win even 10% of EMC’s business, they’ll be a $1 billion company sooner than they dreamed. And their VCs will be high-fiving in Aspen this winter.

NetApp, IBM and HP should worry as well. It sounded like Xiotech was OEM’ing the ISE to others - if so it makes sense to add them to the product line.

The disk-in-a-box model needed a thorough rethink and kudos to Xiotech for doing it. But many promising - on paper - products have failed. Once Xiotech is shipping and there is independent testing - then we’ll know what they’ve really got.

Comments welcome, of course. The indefatigable Beth Pariseau homes in on the Atrato/Xiotech nexus.

SNW update - Xiotech’s ISE and the dilithium solution

April 9th, 2008 by Robin Harris in Architecture, Disk, Enterprise

It looks like Xiotech is going to cop the “Best Announcement at Spring SNW ’08” prize. See the nifty flash intro.

I did speak to Ellen Lary, Engineering VP last night after going through their mobbed booth. Later today I have an appointment with Steve Sicola, Xiotech’s CTO. I’ll have a more complete report later. Here’s what I’ve gleaned so far.

Remember Atrato?
Interesting stuff:

  • Sealed unit starting at 1.5 TB. They had a 1 PB system on display in three 54 RU racks - i.e. taller than the racks you use.
  • 5 year warranty and nifty blue LED light. Are we in a data center or a cocktail lounge?
  • Uses the draft T10 DIF (Data In Flight or Data Integrity Field, Data Integrity Feature - depending on where you read it - evidence that humans have a far greater problem with data integrity than computers do) standard to protect data within the array.
  • Uses Seagate’s own drive test software to attempt repairs on drives in place. Ellen said that about 70% of drives work normally after a power cycle.
  • If power cycling doesn’t work, the box can perform a complete reformat of the drive, starting with laying down tracks and proceeding on to what you and I consider “formatting”.
  • If a particular head is the problem, they can electrically disable that side of a platter while continuing to use the rest of the capacity of the drive.
  • It is cheaper to put in a couple of extra high-end drives than it is to make a service call. This won’t be true in China of course.
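For readers who haven’t met T10 DIF: each 512-byte sector carries 8 extra bytes - a 2-byte guard tag (a CRC-16 of the data), a 2-byte application tag, and a 4-byte reference tag, typically the low 32 bits of the LBA, so a block written to the wrong address is also caught. Below is a minimal Python sketch of generating and checking that field. It is an illustration of the format, not Xiotech’s implementation; `protect` and `verify` are hypothetical helper names:

```python
import struct

def crc16_t10dif(data: bytes) -> int:
    """CRC-16 with the T10-DIF polynomial 0x8BB7 (init 0, MSB-first, no final XOR)."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x8BB7) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc

def protect(sector: bytes, lba: int, app_tag: int = 0) -> bytes:
    """Append 8 bytes of protection info: guard CRC, application tag, reference tag."""
    if len(sector) != 512:
        raise ValueError("expected a 512-byte sector")
    pi = struct.pack(">HHI", crc16_t10dif(sector), app_tag, lba & 0xFFFFFFFF)
    return sector + pi

def verify(block: bytes, expected_lba: int) -> bool:
    """Check the guard CRC and that the block landed at the expected LBA."""
    sector, pi = block[:512], block[512:]
    guard, _app_tag, ref_tag = struct.unpack(">HHI", pi)
    return guard == crc16_t10dif(sector) and ref_tag == (expected_lba & 0xFFFFFFFF)

blk = protect(bytes(range(256)) * 2, lba=1234)
print(len(blk), verify(blk, 1234))                       # 520 True
print(verify(bytes([blk[0] ^ 0x01]) + blk[1:], 1234))    # False: one flipped bit
```

The point of carrying the tags with the data is that every hop inside the array can re-check them, instead of trusting that nothing between host and platter corrupted or misdirected the block.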

The best announcement that WASN’T made at Spring SNW
A company has figured out how to enable long distance synchronous replication. Here in America we like things big - including our idiots in Washington - and our disasters are no exception.

Hurricanes, earthquakes, volcanos, floods, blizzards, tornados and fires - and purblind ideologues - can lay waste to hundreds or thousands of square miles. So normal synchronous replication distances don’t cut it for gotta-have-it infrastructure.
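The reason distance is the hard part: a synchronous write can’t be acknowledged until the remote copy confirms it, so every write pays at least one round trip at the speed of light in fiber. A minimal sketch, assuming roughly 200,000 km/s in fiber and a straight route; the 60-mile metro distance is an arbitrary comparison point:

```python
# Minimum round-trip propagation delay for a synchronous write, assuming light in
# optical fiber travels at roughly 200,000 km/s (about 2/3 of c) over a straight
# route. Real fiber paths are longer, and switches, routers and the remote array
# add their own delay, so these are lower bounds.
FIBER_KM_PER_S = 200_000
KM_PER_MILE = 1.609344

def min_sync_write_ms(distance_miles: float) -> float:
    """Round trip: data goes out, the acknowledgement comes back."""
    round_trip_km = 2 * distance_miles * KM_PER_MILE
    return round_trip_km / FIBER_KM_PER_S * 1000

for miles in (60, 1000):
    print(f"{miles:>5} miles: >= {min_sync_write_ms(miles):.1f} ms per write")
```

At 1,000 miles that is about 16 ms per write, so an application that issues one write at a time and waits for each acknowledgement can’t exceed roughly 62 writes per second, however fast the arrays are.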

The still-in-stealth-mode company’s Chief Engineer, Montgomery Scott, explained that by running dilithium crystals a little hot, a special hyperspace “tunnel” is created enabling . . . .

Just kidding. Their actual solution looked good in principle but the devil is in the details. I asked all the hard questions I could think of and they had answers for all of them, so it looks like they have something real.

Look for a fall announce.

The StorageMojo take
Those of you wondering if this year would be more of the same old, same old, fear not. The spirit and fact of invention is still strong in the ever-more-vital storage industry.

Comments welcome, of course. Would you use 1,000 mile synchronous replication if you could get it?

Dear Uncle StorageMojo: Datacore vs EqualLogic

March 31st, 2008 by Robin Harris in Architecture, Enterprise

The 2nd installment of an occasional feature . . .
A reader writes:

I think your input would be valuable in helping me make a decision on storage for my company. I’ve done loads of research and I’m fairly certain I have good players narrowed down, but have reservations about both. . . .

Players:
-Datacore SANMelody H/A solution on HP hardware.
-Equallogic PS3800XV

The app
It is an up-to-the-minute commercial application supporting virtual machines. The VMs run proprietary messaging/transactional servers that spend 99% of their disk I/O time appending very small messages - ~300 bytes - to transaction logs.

Update: After the initial comments, the prefers-to-remain-anonymous reader (BTW, I did check him out and his company is for real) added this clarification:

  • Yes, there are DR and HA requirements.
  • Each VM has its own transaction logs that can grow to GBs in size. These transaction logs are not for archival purposes, rather to recover state in the event of an application restart.
  • Traffic: Traffic will come in bursts and peak at about 1500 IOPS across 10 separate hosts.
  • Reservations: Is Equallogic a “true” H/A solution considering it does not support synchronous replication between completely separate hardware? Are the competitors claims of Datacore’s “unprotected cache” well-founded? (Datacore insists in H/A mode that all cache is synchronously written and requires a commit from its H/A partner before committing to client.)
  • Storage size requirements are small, so I’ll pay for SAS performance over SATA terabytes.

End update.

Update II: The anonymous reader comes back with more crucial detail:

Let’s pretend the budget is around $60k-$70k. I know the two finalists can provide an acceptable degree of HA, DR, and iSCSI performance at that price. What products should one be looking at from HDS/EMC/NetApp? They were not considered initially for the perception of being unaffordable.

End update II.

Update III:

The plan is for an H/A setup in a class 1 datacenter with asynchronous replication over an existing DS3 (..but dark fiber is in the works) to a remote site.

All things considered, the question could be framed, “Whom/What should be demanded for trial?”

End update III.
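A quick sanity check on the reader’s numbers suggests raw bandwidth is not the constraint, for either the local array or the DS3 replication link. A minimal sketch that counts payload only, ignoring iSCSI/TCP/IP framing and any padding of each append to a full disk sector:

```python
# Peak log traffic from the reader's numbers vs. DS3 capacity. Payload only:
# iSCSI/TCP/IP framing and per-write padding to disk sector size are ignored.
PEAK_IOPS = 1500          # across 10 hosts, from the reader
WRITE_BYTES = 300         # typical transaction-log append, from the reader
DS3_BPS = 44_736_000      # DS3 line rate, 44.736 Mbit/s

payload_bps = PEAK_IOPS * WRITE_BYTES * 8
print(f"peak payload: {payload_bps / 1e6:.1f} Mbit/s "
      f"({payload_bps / DS3_BPS:.0%} of a DS3)")
```

At about 3.6 Mbit/s, or 8% of a DS3, the harder requirement is the latency of each small synchronous commit, which is where the H/A cache question the reader raises comes in.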

The StorageMojo take
It is interesting that this customer is NOT looking at the traditional OLTP storage vendors. This is a business-critical application - the company is handling Other People’s Money.

What are the questions the reader should be asking of vendors? How should the problem be framed? I surmise that price is an issue. Where else might the reader go?

I welcome comment from vendors, but please do us the courtesy of identifying yourself as such.

Comments welcome, of course.

NetApp’s new name: NetApp

March 11th, 2008 by Robin Harris in Enterprise

NetApp has formally changed its name from Network Appliance Inc to NetApp Inc. The name change is part of a larger effort to raise the company’s profile among what they call the Storage 5000 - the 5000 largest storage customers worldwide.

I’m in NYC today and tomorrow at the annual NetApp analyst meeting, flown out and put up at NetApp’s expense, wined and dined, charmed by their capable director of analyst relations. It works - a warm feeling about NetApp suffuses every atom of my being. Don’t worry, it won’t last.

The StorageMojo take
Great technology poorly marketed loses to mediocre technology well-marketed almost every time.

NetApp is very serious about taking more market share, just as EMC is ramping up a strategy to take more NAS market share as well. The market also-rans will take most of the initial damage, but both companies mean to take a piece of the other.

Storage consumers will benefit the most.

Comments welcome, of course.

Flash futures

March 11th, 2008 by Robin Harris in Enterprise, Future Tech, SSD/Flash Disk

How flash is really going to affect the storage industry is becoming clear. The short take: not as big a deal as flash vendors hoped. The longer take: There won’t be much of a mid-range flash market; instead we’ll see either costly fast flash or cheap slow flash.

There are lots of theories about how flash will alter the mass storage landscape. This is mine.

The flash write problem
The fundamental flash problem is slow writes. There are 3 elements to the slow write problem.

  • Flash has to be erased before it can be written. Every write is really 2 operations: an erase, then a write.
  • The writes are large. Typical block sizes are 128KB to 256KB. Writing a single page requires writing - after erasing it first - the entire block.
  • The write bandwidth to a single block is less than that of a slow disk. High-bandwidth writes require parallel paths to multiple blocks.
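The second point compounds the first. A minimal sketch of the worst case, assuming an illustrative 4 KB page (not a size given in the text) and a controller that updates a page by erasing and rewriting its whole block in place:

```python
# Naive in-place update: to change one page, read the whole erase block, erase it,
# and program every page back. The 4 KB page size is an assumption; the block
# sizes are the 128 KB and 256 KB quoted in the text.
PAGE_KB = 4  # assumption

def naive_write_amplification(block_kb: int, page_kb: int = PAGE_KB) -> int:
    """Pages physically programmed per page the host actually changed."""
    return block_kb // page_kb

for block_kb in (128, 256):
    print(f"{block_kb} KB block: {naive_write_amplification(block_kb)}x")
```

A 32x or 64x penalty on small random writes is why nobody ships flash that way; the workarounds listed next exist to avoid it.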

These problems can all be engineered around.

  • Garbage collection-like algorithms can be extended to maintain a supply of erased blocks
  • RAM backed by a small battery or capacitor can buffer writes for later re-writing to flash
  • Controller chips can be built in high volume with multiple data paths
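The first workaround is, in essence, a log-structured flash translation layer. Below is a toy Python model, not any vendor’s firmware: writes always go to a fresh page, a map tracks where each logical page lives, and when only one erased block remains, the block with the fewest live pages is garbage-collected into it. `ToyFTL` and its parameters are hypothetical; real FTLs also need wear leveling, bad-block handling and power-loss recovery, which is exactly the software the next paragraph worries about:

```python
import random
from collections import defaultdict

class ToyFTL:
    """Page-mapped FTL: out-of-place writes plus greedy garbage collection."""

    def __init__(self, num_blocks: int, pages_per_block: int):
        if num_blocks < 3:
            raise ValueError("need at least 3 blocks")
        self.ppb = pages_per_block
        # Keep spare capacity so a GC victim always has at least one dead page.
        self.logical_pages = (num_blocks - 2) * pages_per_block
        self.free = list(range(num_blocks))      # pool of erased blocks
        self.live = defaultdict(dict)            # block -> {page: logical page}
        self.map = {}                            # logical page -> (block, page)
        self.active, self.cursor = self.free.pop(), 0
        self.host_writes = self.flash_writes = self.erases = 0

    def write(self, lpn: int) -> None:
        if not 0 <= lpn < self.logical_pages:
            raise ValueError("logical page out of range")
        if self.cursor == self.ppb:
            self._open_block()
        self._program(lpn)
        self.host_writes += 1

    def _program(self, lpn: int) -> None:
        old = self.map.get(lpn)
        if old is not None:
            del self.live[old[0]][old[1]]        # old copy becomes garbage
        self.map[lpn] = (self.active, self.cursor)
        self.live[self.active][self.cursor] = lpn
        self.cursor += 1
        self.flash_writes += 1

    def _open_block(self) -> None:
        if len(self.free) > 1:
            self.active, self.cursor = self.free.pop(), 0
            return
        # Only the reserve block is left: copy the emptiest full block into it.
        victim = min(self.live, key=lambda b: len(self.live[b]))
        self.active, self.cursor = self.free.pop(), 0
        for lpn in list(self.live[victim].values()):
            self._program(lpn)                   # copy-forward: extra flash write
        del self.live[victim]
        self.erases += 1
        self.free.append(victim)                 # erased and ready for reuse

    @property
    def write_amplification(self) -> float:
        return self.flash_writes / max(self.host_writes, 1)

ftl = ToyFTL(num_blocks=16, pages_per_block=64)
rng = random.Random(42)
for _ in range(50_000):
    ftl.write(rng.randrange(ftl.logical_pages))
print(f"write amplification: {ftl.write_amplification:.2f}, erases: {ftl.erases}")
```

The design choice to note is the reserve block: keeping one erased block in hand means garbage collection always has somewhere to copy live pages, and the spare capacity is what keeps copy-forward overhead bounded.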

But at what cost? The first two require well-engineered software and some sort of CPU to run it. Since it is software it will have bugs. Can it be any more reliable than current drive firmware?

The dilemma
For enterprise use, flash-based SSDs need to be rock-solid, which implies a lot of careful and costly engineering. For consumer use, they need to be very high volume, which means low-cost.

It is a similar problem to RAID controllers: very low-end RAID controllers aren’t reliable enough for enterprise use. They also aren’t cheap enough - or easy enough - for consumers to buy in volume. RAID controllers have engineering problems similar to flash translation layers.

Making flash drives look like disks makes them easy to integrate, but if you really need performance it also makes them costly - like the $10k for the flash drive EMC is using in the Sym.

Flash in the disk controller?
As I’m writing this a NetApp exec says that flash will be disruptive because by placing flash in a disk controller they will reduce the need for the costly and highly profitable fibre channel disks. That could be correct. It sounds smarter than sticking flash on a disk.

The StorageMojo take
Despite the miracles of cost-reduction and integration the industry regularly performs, some things, like power provisioning, don’t get cheaper. High-quality software engineering doesn’t either. That is what high-performance flash drives require.

The high-performance consumer flash drive seems to be a mirage. I’d like to be proven wrong, but today’s notebook SSDs don’t offer superior application performance and cost 10x as much. Hardly a recipe for success.

Update: Intel is planning to offer “high-performance” flash drives with partner Micron. I saw an impressive demo - is there any other kind? - at the Storage Visions conference. But with the early marketing missteps of Samsung, it looks like the consumer flash drive may fall off the hype cycle into a deep ditch. Flash drive marketers: now is the time for precision marketing if you ever hope to establish a mass market. Consumers remember unkept promises. Until you are cheaper. End update.

Comments welcome, as always. Also check out BPLRU: A Buffer Management Scheme for Improving Random Writes in Flash Storage by two Samsung researchers, Hyojun Kim and Seongjun Ahn for a nice intro to flash issues.

Flash talking - and a wee DRAM - with Texas Memory Systems

March 7th, 2008 by Robin Harris in Enterprise, SSD/Flash Disk

I ran into Woody Hutsell, EVP at Texas Memory Systems, last week. He graciously agreed to a talk on camera about their experience with flash and DRAM-based solid state storage.

TMS sells both: a DRAM-based SSD with multiple FC and Infiniband ports; and a 2 TB flash box with 128 GB of DRAM cache. Woody offered some interesting insights. For example, workloads with a large number of writes - even if they are a small percentage of the total workload - may not be suitable for flash-based storage.
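One way to see Woody’s point: if a flash write costs many times what a read does, writes consume device time far out of proportion to their count. A minimal sketch, assuming a hypothetical 20:1 write-to-read service-time ratio (not a TMS figure) and ignoring parallelism and caching:

```python
# How a small write fraction can dominate device time. The 1:20 read:write
# service-time ratio below is a hypothetical illustration, not a TMS figure.
READ_COST = 1.0    # relative service time of a read (assumption)
WRITE_COST = 20.0  # relative service time of a write (assumption)

def write_share_of_time(write_fraction: float) -> float:
    """Fraction of busy time spent on writes for a given mix of operations."""
    w = write_fraction * WRITE_COST
    r = (1 - write_fraction) * READ_COST
    return w / (w + r)

for wf in (0.01, 0.05, 0.10):
    print(f"{wf:.0%} writes -> {write_share_of_time(wf):.0%} of device time")
```

Under that assumption, a workload that is only 5% writes already spends about half its device time writing.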

Here’s the video:

Blame me for the shaky camera work.

Disclosure: I taped and edited this gratis.

Comments welcome, as always. BTW, Google now accepts files up to 1 GB. Seagate and WD should be happy.

Isilon update

March 7th, 2008 by Robin Harris in Enterprise

A week after I wrote What’s with Isilon, the company announced revised results for several past quarters.

In a long and boring press release the company discussed the impact. The money quote:

The Company estimates that $7.0 million of the approximately $67.4 million of previously reported revenue from the fourth quarter of 2006 through the second quarter of 2007 is expected to be adjusted. Approximately $3.0 million of the adjusted revenue will be recorded in periods subsequent to the second quarter of 2007. Approximately $2.1 million of the adjustment is not expected to be recorded as revenue, and approximately $1.9 million of the adjustment will be reversed and recorded as revenue only to the extent that products are sold through to end-user customers and collection is reasonably assured and all other criteria for the recognition of revenue are met.
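The three pieces account for the whole $7.0 million, which is about a tenth of the revenue reported for those three quarters. A quick check using only the figures in the release:

```python
# Checking the figures in Isilon's release (all values in $ millions).
reported = 67.4
adjusted = 7.0
parts = {
    "recorded in later periods": 3.0,
    "not expected to be recorded": 2.1,
    "recognized only on sell-through": 1.9,
}

assert abs(sum(parts.values()) - adjusted) < 1e-9   # 3.0 + 2.1 + 1.9 = 7.0
print(f"adjusted share of reported revenue: {adjusted / reported:.1%}")
```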

It seems prior management got a wee bit aggressive with revenue recognition.

The StorageMojo take
Isilon’s value proposition is unchanged even though the stock price has been hammered. What we still need to see is what they’ve done in the last 9 months. It may not be safe to assume that all the accounting issues are closed.

The bigger issue is the underlying health of the company. If they are growing, winning new customers and keeping old ones, then the revenue games will not much matter. That is the next chapter in Isilon’s reporting. I’ll be very interested to read it.

Disclosure: I don’t own Isilon stock.

Comments welcome, as always.

EMC: shake, rattle and roll

March 6th, 2008 by Robin Harris in Enterprise, Future Tech

EMC looks to be single-handedly reinventing the industry. And creating a new one as well.

They just topped Fortune’s list of most admired computer peripherals companies, beating out NetApp.

They bought Pi Corporation, snagging Paul Maritz, longtime Microsoftie, and put him in charge of a new division focused on cloud computing.

And the soon-to-be-announced Hulk/Maui project will further roil the waters of a complacent industry. Unlike their usual secrecy about futures, EMC can’t shut up about Maui. That reflects a confidence that they’ve got something that can’t easily be replicated.

Note to erstwhile competitors: be afraid - be very afraid. The last time EMC was this fired up they rolled IBM out of their decades-old enterprise storage domination. And now they are a $12 billion company. There will be a lot of collateral damage.

Eat lunch or be lunch
18 months ago I was writing CEO Joe Tucci off, but I was wrong. He’s brought in a lot of new blood with a mandate to create change. EMC is hungry.

The StorageMojo take
EMC’s famously fractious product groups don’t like the emerging order. EMC isn’t going to be an array-centric company much longer. That is hard for the Sym and Clariion groups. Especially when their margin dollars are going to support new ventures that won’t all be successful.

EMC will continue selling arrays, though they won’t be core to the company’s message. They’ll be a stepping stone to more compelling services and products geared to global enterprises. Software that ties EMC arrays, virtual machines and the new cloud infrastructure products together will freeze out point products.

Those planning to continue with “faster, better, cheaper” competitive strategies will have to adjust. Starting now would be wise.

Comments welcome, as always.

Cleversafe’s dispersed storage network

I had a con call with Chris Gladwin and Russ Kennedy of Cleversafe a couple of weeks ago. They’ve come to market with a product line that seeks to deliver:

  • Massive scalability to meet growing digital content requirements
  • Unprecedented Security and Privacy for critical digital assets
  • Survivability against disasters, dishonesty and time
  • Extremely cost-effective infrastructure compared to traditional methods

That’s a quote from their pitch.

Cleversafe’s product line
Cleversafe, IIRC, started as a software company, but their announced products come in nice rack-mountable boxes. There are 3 of them:

  • CS Slicestor - Dispersed Storage server - $11.3k
  • CS Accesser - Dispersed Storage router - $12.3k
  • CS Manager - Dispersed Storage network manager - $12.3k

The Slicestor is a 1U storage server containing 4 disks. The Accesser slices up the data and distributes it - think slice router. The Manager works out of band to monitor and manage the storage network components.

I assume the pricing includes some room for volume discounts. There is an open-source version (c. 2006) of the software. The company intends to offer a software-only version as well.

Why hardware?
The Conventional Wisdom in VC circles is that tin-wrapped software ramps revenues faster - hey, you’re selling tin + bits - at the cost of lower margins and loss of focus.

Qualifying hardware is non-trivial; so you tend to stay on one platform longer than you should. At liquidity event time, software companies fetch higher multiples, so it may be a net loss. VCs live by the Golden Rule: he who has the gold makes the rules.

What it does
Cleversafe has an iSCSI or block storage interface. It takes the data, slices it into small pieces using Information Dispersal Algorithms and then ships the slices off to storage either locally or around the world.

In the latest version you can specify how many slices the system makes and how many slices are required to rebuild the data. If you have 11 data centers around the world, you can specify that, say, 6 are required to recreate the data.

You could lose access to 5 data centers and still recover. If the local controlling authority busts into 3 or 4 data centers, they get nothing. Pretty cool if you worry about corrupt government officials getting hold of your company secrets.
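The recovery arithmetic can be sketched with a simple k-of-n erasure code. The Python below is not Cleversafe’s Cauchy Reed-Solomon code; it swaps in a systematic Reed-Solomon-style scheme over the prime field GF(257), using polynomial interpolation, which shares the key property that any k of the n slices rebuild the data. `disperse` and `rebuild` are hypothetical names. Because the sketch is systematic, slices 1 through k hold the raw bytes, so it demonstrates recoverability only, not the “they get nothing” secrecy property:

```python
P = 257  # prime modulus; every byte value 0..255 is a field element

def _interpolate(points: list[tuple[int, int]], x: int) -> int:
    """Evaluate, at x, the unique polynomial (mod P) through the given points."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num = den = 1
        for j, (xj, _) in enumerate(points):
            if j != i:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P  # Fermat inverse
    return total

def disperse(data: bytes, k: int, n: int) -> dict[int, list[int]]:
    """Cut data into n slices (ids 1..n); any k of them rebuild it."""
    if not 0 < k <= n < P:
        raise ValueError("need 0 < k <= n < 257")
    padded = data + bytes(-len(data) % k)
    slices: dict[int, list[int]] = {x: [] for x in range(1, n + 1)}
    for off in range(0, len(padded), k):
        # k data bytes are the polynomial's values at x = 1..k (systematic form)
        pts = list(zip(range(1, k + 1), padded[off:off + k]))
        for x in range(1, n + 1):
            slices[x].append(_interpolate(pts, x))
    return slices

def rebuild(slices: dict[int, list[int]], k: int, length: int) -> bytes:
    """Recover the original bytes from any k surviving slices."""
    if len(slices) < k:
        raise ValueError(f"need {k} slices, have {len(slices)}")
    chosen = sorted(slices.items())[:k]
    out = bytearray()
    for col in range(len(chosen[0][1])):
        pts = [(x, vals[col]) for x, vals in chosen]
        out.extend(_interpolate(pts, x) for x in range(1, k + 1))
    return bytes(out[:length])

secret = b"Company secrets, dispersed across 11 data centers"
slices = disperse(secret, k=6, n=11)
survivors = {x: slices[x] for x in (2, 3, 5, 7, 9, 11)}   # 5 sites lost
print(rebuild(survivors, k=6, length=len(secret)) == secret)  # True
```

This mirrors the 11-site, 6-required example above: any 5 slices can disappear and the data still comes back.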

The company is planning on adding FTP, CIFS and NFS in the fullness of time.

How well it works
Cleversafe claims that given sufficient low-latency bandwidth the dispersed storage is as fast as a local disk. That’s a tall order, but for now I’ll take their word for it.

Who should buy it?
The company is aiming the Dispersed Storage Network at ISPs, to offer as a service, and at multinationals with round-the-clock operations and critical data.

How it works
Cleversafe uses Cauchy Reed Solomon erasure codes to slice and dice the data. These codes have several advantages:

  • More capacity efficient and failure tolerant than parity codes
  • Doesn’t require a license
  • Encoding and decoding are faster than other operations in the stack
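A rough way to see the capacity trade-off: a k-of-n code stores n/k bytes per user byte and survives n - k lost slices. A small sketch comparing the 6-of-11 example with single parity and with plain replication at the same failure tolerance (replication modeled as a 1-of-6 code):

```python
# Raw capacity per byte of user data, and how many simultaneous slice/copy
# losses each scheme survives.
def overhead(k: int, n: int) -> tuple[float, int]:
    """k-of-n code: store n/k bytes per user byte, survive n - k losses."""
    return n / k, n - k

schemes = {
    "single parity, 6+1": overhead(6, 7),
    "dispersal, 6 of 11": overhead(6, 11),
    "replication, 6 copies": overhead(1, 6),
}
for name, (ratio, losses) in schemes.items():
    print(f"{name:<22} {ratio:.2f}x raw capacity, survives {losses} losses")
```

Single parity is cheapest but survives only one loss; 6-of-11 dispersal survives five losses at about 1.83x raw capacity, where replication needs 6x for the same protection.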

If you’d like to play with Cauchy Reed Solomon, check out Dr. Jim Plank’s software page which includes

. . . Reed-Solomon coding, Cauchy Reed-Solomon coding, general bit-matrix coding, Reed-Solomon coding optimized for RAID-6, and Liberation coding. The documentation provides some tutorial material on matrix and bit-matrix based erasure coding.

I met the good doctor at FAST, where he was delighted to find that Cleversafe - also a FAST presenter - was using techniques he’d worked on a decade ago.

The StorageMojo take
I’m impressed with what Cleversafe has done. They will look even smarter after EMC’s Hulk/Maui announcement this spring. I suspect they’ll be bought by year’s end.

Kudos to the Cleversafe team.

Comments welcome, of course.

Why do storage systems fail?

February 24th, 2008 by Robin Harris in Architecture, Disk, Enterprise

It’s the disks, right?
We’ve heard much about disk failures - as recently as last week as well as last year’s reports from Google and CMU. But what about the rest of the system?

In a FAST ‘08 paper to be presented this week - Are Disks the Dominant Contributor for Storage Failures? A Comprehensive Study of Storage Subsystem Failure Characteristics - authors Weihang Jiang, Chongfeng Hu, Yuanyuan Zhou, and Arkady Kanevsky analyze logs from 39,000 systems over 44 months to get answers.

1.8 million disks in 155,000 shelves
NetApp provided data from a variety of systems, including near-line, low-end, mid-range and high-end arrays. The team analyzed the log reports to understand what components led to failures.

The 15-page paper offers some interesting findings:

  • Physical interconnect failures are a significant contributor to storage subsystem failures - anywhere from 27-68% of them.
  • Subsystems that use the same disk models show similar disk failure rates - but their subsystem failure rates vary significantly.
  • Enclosures have a strong impact on subsystem failures. Some enclosures work better with some drives than others.
  • Dual-redundant FC shelf interconnects reduce annual failure rates 30-40%.
  • Interconnect and protocol failure rates are much more bursty than disk failures. Some 48% of overall subsystem failures arrive at the same shelf within 10,000 seconds (~2.8 hours) of the previous failure.
  • As interconnect failures are so bursty, resilience mechanisms beyond RAID are required to achieve subsystem availability.
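The burstiness statistic is straightforward to compute from a failure log. A minimal sketch, assuming a log of (shelf id, timestamp in seconds) pairs; the record format, the `fraction_bursty` name and the sample data are hypothetical, and the paper’s exact method may differ:

```python
from collections import defaultdict

WINDOW_S = 10_000  # ~2.8 hours, the window used in the paper

def fraction_bursty(events: list[tuple[str, float]], window: float = WINDOW_S) -> float:
    """Share of failures that follow an earlier failure on the same shelf within `window` seconds."""
    by_shelf = defaultdict(list)
    for shelf, t in events:
        by_shelf[shelf].append(t)
    bursty = 0
    for times in by_shelf.values():
        times.sort()
        bursty += sum(1 for prev, cur in zip(times, times[1:]) if cur - prev <= window)
    return bursty / len(events) if events else 0.0

# Hypothetical log: shelf A has a burst of three failures, B and C fail once each.
log = [("A", 0), ("A", 1_200), ("A", 7_000), ("B", 50_000), ("C", 90_000)]
print(fraction_bursty(log))  # 0.4
```

Failures that cluster like this on one shelf can take out more members of a RAID group than the group tolerates before anyone can react, which is why the authors argue for resilience beyond RAID.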

What else?
They also found that enterprise drives had an AFR consistent with manufacturer specs - less than 1% AFR. This result derives from looking at the disks as the system does rather than as users see them.

The StorageMojo take
Interconnects, especially connectors, have long been fingered as a significant cause of equipment problems - and not just in storage. While the team seems to report that interconnects are a greater cause of subsystem failure than disks, there seems to be some room for disagreement about what the numbers are telling us.

For example, this result doesn’t fully explain the delta between what disk users have found and the “trouble not found” rates that manufacturers report. Even if you accept the common 50% TNF vendors report, drive failures are still higher than this research finds.

Perhaps we should conclude that NetApp’s engineering is higher quality than the general run of storage arrays. Or perhaps system log analysis is still a dark art whose results are more indicative than conclusive.

Comments welcome, as always. I’m at the FAST ‘08 conference this week in the San Jose Fairmont hotel.

Latent sector errors in disk drives

February 18th, 2008 by Robin Harris in Disk, Enterprise

Last year’s Google and CMU papers on disk failure rates (see Everything you know about disks is wrong and Google’s Disk Failure Experience) made the points that a) annual disk failure rates are significantly higher than manufacturers admit and b) that enterprise drives aren’t more reliable than consumer drives.

But in An Analysis of Latent Sector Errors in Disk Drives, Lakshmi N. Bairavasundaram, Garth R. Goodson, Shankar Pasupathy and Jiri Schindler analyzed the error logs on over 50,000 arrays covering 1.53 million enterprise and consumer disks. It looks like the largest such study ever published.

Lakshmi was with the U of Wisconsin-Madison while the latter 3 work at NetApp. They published at the Sigmetrics ‘07 conference last June.

A different kind of latency
Unreported or latent disk errors are real. That’s why vendors have stopped recommending RAID 5 on SATA drives.

Disks have a lot of errors, most of them transient. This study focused on Latent Sector Errors (LSE), defined as:

. . . when a particular disk sector cannot be read or written, or when there is an uncorrectable ECC error. Any data previously stored in the sector is lost.

They don’t say so explicitly, but these are surely NetApp arrays. They also comment on the effectiveness of media and data scrubbing, a feature of high-end arrays.

Results

  • Yes, there are “bad” disks: 0.2% of the drives had more than 1000 errors.
  • 3.45% of the entire population had LSE over the 32-month study period.
  • 8.5% of the consumer disks had LSE.
  • 1.9% of the enterprise disks had LSE.
  • In their first 12 months, 3.15% of consumer disks and 1.46% of enterprise disks developed at least one LSE.

Causation
The team found several factors that contribute to LSE.

  • Size matters. As disk size increases, so does the fraction of disks with LSE.
  • Age matters. LSE rates climbed with age. For some - but not all - consumer disk models, 20% of the disks had LSE after 24 months. Rates climbed faster for consumer drives than for enterprise drives.
  • Vendor matters. They also found that some vendors had much higher LSE than others. Due to the industry omerta they don’t rat out the offenders.
  • Errors matter. A drive that develops one error is much more likely to develop a second. The second error is likely to be close to the first error. Once a drive develops an error, both enterprise and consumer drives are equally likely to develop a 2nd error.
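One way to check the “second error lands near the first” pattern on your own logs is to measure, for each error after the first, the LBA distance to the nearest earlier error on the same disk. This is a minimal sketch, not the paper’s methodology: the function name, the sample LBAs and the 64-sector window are all hypothetical, and since drives remap sectors internally, LBA distance is only a proxy for physical distance.

```python
import bisect


def fraction_near_earlier(error_lbas: list[int], window: int) -> float:
    """Fraction of errors (after the first) within `window` sectors of an
    earlier-discovered error. `error_lbas` is in discovery order."""
    seen: list[int] = []  # LBAs of earlier errors, kept sorted
    near = 0
    for lba in error_lbas:
        if seen:
            i = bisect.bisect_left(seen, lba)
            # The nearest earlier error is the neighbor just below or at/above `lba`.
            gaps = []
            if i > 0:
                gaps.append(lba - seen[i - 1])
            if i < len(seen):
                gaps.append(seen[i] - lba)
            if min(gaps) <= window:
                near += 1
        bisect.insort(seen, lba)
    later = len(error_lbas) - 1
    return near / later if later > 0 else 0.0


# Hypothetical error log for one disk, in the order errors were found.
log = [1_000_000, 1_000_004, 1_000_010, 50_000_000, 1_000_002]
print(fraction_near_earlier(log, window=64))  # 3 of the 4 later errors are near an earlier one
```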

Annual sector error rates
This figure from the paper shows the variability in age-related error rates.


The caption states:

For each disk model that has been in the field for at least two years, the first bar represents Year 1 and the second represents Year 2. The NL and ES bars represent weighted averages for nearline and enterprise class drives respectively.

Consumer/SOHO users with large, cheap, old disks will see LSE. That’s another reason desktop RAID is a bad idea: not many consumers replace their drives every 24 months.

File system implications
File systems rely on disk-based data structures to keep track of your stuff. One of the key findings of the team is that disk errors tend to congregate near each other, like congressmen and lobbyists.

Therefore, file systems that replicate critical data across the disk are much less likely to lose your data than those, like ReiserFS, that place critical structures in one contiguous area. Related issue: since disks virtualize the block structure, how do FS designers know where their data structures actually go on disk?
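A toy model makes the replication argument concrete: put several copies of a metadata structure either back to back or spread evenly across the logical address space, then hit both layouts with the same contiguous burst of bad sectors. The helper names, disk size and burst are hypothetical, and this is not how ReiserFS or any particular file system lays out its metadata. As noted above, the drive decides where LBAs physically land, so spreading by LBA is only an approximation of physical spread.

```python
def replica_lbas(total_sectors: int, copies: int, length: int) -> list[int]:
    """Start LBAs for `copies` replicas of a `length`-sector structure,
    spread evenly across the logical address space."""
    if copies < 1 or length < 1 or copies * length > total_sectors:
        raise ValueError("replicas do not fit on the disk")
    stride = total_sectors // copies  # stride >= length, so replicas never overlap
    return [i * stride for i in range(copies)]


def surviving_copies(starts: list[int], length: int, bad_start: int, bad_len: int) -> int:
    """Count replicas that do not overlap a contiguous run of bad sectors."""
    bad_end = bad_start + bad_len  # exclusive
    return sum(1 for s in starts if s + length <= bad_start or s >= bad_end)


LENGTH = 16  # hypothetical size of the critical structure, in sectors
contiguous = [i * LENGTH for i in range(4)]  # all four copies back to back
spread = replica_lbas(1_000_000, 4, LENGTH)  # four copies spread across the disk

# The same hypothetical burst: 128 bad sectors starting at LBA 0.
print("contiguous layout survivors:", surviving_copies(contiguous, LENGTH, 0, 128))
print("spread layout survivors:", surviving_copies(spread, LENGTH, 0, 128))
```

One localized burst wipes out every back-to-back copy but only one of the spread copies, which is exactly the failure mode clustered LSE makes likely.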

Media and data scrubbing
What’s the difference?

Media scrubs use a SCSI Verify command to validate a disk sector’s integrity. This command performs an ECC check of the sector’s content from within the disk without transferring data to the storage layer. On failure, the command returns a latent sector error.

By contrast:

A data scrub is primarily used to detect data corruption. This scrub issues read operations for each disk sector, computes a checksum over its data, compares the checksum to the on-disk 8-byte checksum, and reconstructs the sector from other disks in the RAID group if the checksum comparison fails. Latent sector errors discovered by data scrubs appear as read errors.

In the analyzed drives, over 60% of LSE were found by scrubbing. Scrubbing is a high-end feature that works.
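The data scrub described above (read, checksum, compare, rebuild from the RAID group) can be sketched for a single-parity stripe. This is a minimal illustration under stated assumptions, not NetApp’s implementation: each block stands in for a sector, the 8-byte checksum uses BLAKE2b only because the paper doesn’t name an algorithm, and the function names are hypothetical. A media scrub happens inside the drive via SCSI Verify, so it has no meaningful host-side equivalent here.

```python
import hashlib
from functools import reduce


def checksum(block: bytes) -> bytes:
    # Hypothetical 8-byte checksum; the paper does not specify the algorithm.
    return hashlib.blake2b(block, digest_size=8).digest()


def xor_blocks(blocks: list[bytes]) -> bytes:
    # Byte-wise XOR across equal-length blocks (RAID 4/5-style parity).
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def data_scrub(stripe: list[bytes], parity: bytes, stored_sums: list[bytes]) -> list[int]:
    """Verify every data block in one stripe and rebuild any whose checksum fails.

    Repairs `stripe` in place from XOR parity and returns the rebuilt indices.
    Single parity can rebuild at most one bad block per stripe.
    """
    bad = [i for i, blk in enumerate(stripe) if checksum(blk) != stored_sums[i]]
    if len(bad) > 1:
        raise RuntimeError("more bad blocks than single parity can rebuild")
    for i in bad:
        others = [blk for j, blk in enumerate(stripe) if j != i]
        rebuilt = xor_blocks(others + [parity])
        if checksum(rebuilt) != stored_sums[i]:
            raise RuntimeError(f"block {i} could not be reconstructed")
        stripe[i] = rebuilt
    return bad


# Three 512-byte data blocks, their parity and stored checksums.
data = [bytes([v]) * 512 for v in (1, 2, 3)]
parity = xor_blocks(data)
sums = [checksum(b) for b in data]

stripe = list(data)
stripe[1] = bytes(512)  # simulate block 1 coming back corrupted
print(data_scrub(stripe, parity, sums))  # [1]: block 1 was rebuilt from parity
```

The sketch assumes the parity block itself is good; a real scrub would also have to verify parity.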

The StorageMojo take
The consistent rise in LSE as disk capacity increases suggests that there is a constant head/media issue. Since consumer drives are larger than enterprise drives, capacity explains part of their higher LSE rate.

The faster rise in LSE rates for aging consumer drives suggests that enterprise drives are higher quality. Or maybe their error correction is better.

Finally, drive vendors need to re-think their ECC strategies. As capacities increase so will LSE. Higher quality ECC comes at the cost of capacity. It is time to start paying that price.
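The capacity price of stronger ECC can be seen in the textbook Reed-Solomon trade-off: correcting up to t symbol errors costs 2t parity symbols per codeword. The sketch below uses the classic 255-byte codeword (for example RS(255, 223), which corrects 16 byte errors); real drives’ sector ECC layouts differ and aren’t described in the post, so treat the numbers as an illustration of the trend, not of any product. The function name is hypothetical.

```python
def rs_layout(n: int, t: int) -> tuple[int, float]:
    """Return (data symbols k, parity fraction) for an RS(n, k) code
    that corrects up to t symbol errors using 2t parity symbols."""
    parity = 2 * t
    if t < 0 or parity >= n:
        raise ValueError("need 0 <= 2t < n")
    return n - parity, parity / n


# Fixed 255-byte codeword (8-bit symbols): each extra correctable error costs 2 data bytes.
for t in (8, 16, 32):
    k, overhead = rs_layout(255, t)
    print(f"RS(255, {k}) corrects {t} byte errors; {overhead:.1%} of the codeword is parity")
```

Doubling the correction strength doubles the parity overhead, which is the capacity cost the paragraph above argues vendors should start paying.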

Comments welcome, of course. Download the article pdf here.

What was Ray Ozzie thinking?

February 2nd, 2008 by Robin Harris in Enterprise, Future Tech

I wrote a first pass on the Microsoft/Yahoo deal for ZDnet yesterday morning. Short version: are they nuts?

The silliest comment
Ray Ozzie was quoted saying:

“Our lives, our businesses, and even our society have been progressively transformed by the Web, and Yahoo! has played a pioneering role by building compelling, high-scale services and infrastructure,” said Ray Ozzie, chief software architect at Microsoft. “The combination of these two great teams would enable us to jointly deliver a broad range of new experiences to our customers that neither of us would have achieved on our own.”

I agree about the compelling services. Yahoo has a number of market-leading services, starting with mail.

High-scale infrastructure?
I don’t think so. Very conservatively, Yahoo’s infrastructure costs are 3x Google’s. Probably 8-10x.

By all accounts Mr. Ozzie is a brilliant fellow. So why the silly comment? A few possibilities come to mind:

  • PR flacks wrote the comment for him and he was too busy to review it.
  • MS investor relations wrote the comment to try to paper over the fact that there is no technology synergy in the acquisition, figuring that Wall St. analysts wouldn’t know the difference.
  • He actually believes it. They are so-o doomed!

Other than IBM, Microsoft Research probably has the most brilliant CompSci group in the industry - and that includes Google. They can’t solve problems?

What is the real problem?
BillG and Steve Ballmer were out of new ideas - or good ideas they could easily copy - after Windows 3.1 and Office. The illegal strangulation of Netscape has cost Msoft billions in penalties and still, 10 years later, IE is losing market share. Gee, maybe the browser wasn’t important after all!

It also looks like Microsoft avoids the kind of clean sheet design that gave Google its cost advantage. You must use Windows. You must use Dell. You must use CIFS. Who knows what self-sabotaging corporate injunctions are stifling Microsoft developers? Because they sure have the smarts. And the money.

The StorageMojo take
Microsoft has to stop chasing the latest Big New Thing - be it game consoles, music players, web portals or Internet advertising - and start focusing on new opportunities that they are uniquely positioned to exploit.

For example, how about migrating web-scale technology down to the enterprise? Storage companies are using Linux to create commercial storage clusters like Google’s. Why isn’t Microsoft building Boxwood-style cluster software to help enterprises lower their storage TCO? Take advantage of the Microsoft army of admins and resellers to move the concept and further entrench Windows.

And that’s far from the only opportunity.

Instead Ballmer et al. seem obsessed with fighting wars they’ve already lost against Apple, Google and Linux (see Farewell, Bill. Yo, Ballmer, now it’s your turn! on ZDnet). Even the richest and most powerful software company on earth has limits and should pick its fights.

Comments welcome, of course.

A stroll down memory lane: HP’s 2004 storage grid vision

January 24th, 2008 by Robin Harris in Architecture, Clusters, Enterprise

What could be more appropriate?
HP’s lackluster showing in The Info Pro’s survey - see yesterday’s post - reminded me that I’d written about HP shortly after I’d started blogging (see HP’s Storage Grid-lock: panic-stricken execs promise fix in four years).

Here’s a quote from the September 2004 post:

HP’s storage shortfall this past quarter has Carly pulling out the stops to save one of HP’s few high-margin non-printer businesses. Firing executives, check. Inspirational speech to resellers, check. Roll out ambitious product roadmap for delivery in 2008, check.

2008!?! Why not just put out a press release titled “HP execs panic over storage shortfall, have no clue how to fix business in less than four years”? Then, at least, they could start facing up to their real problems.

Issuing long-range roadmaps is always a move of pure desperation, so the HP storage business must be considerably weaker than they have let on.

So what were they going to deliver this year?
Get this wacky idea: a storage grid. As an HP whitepaper described it:

The HP StorageWorks Grid will be built from intelligent building blocks called smart cells. These elements will incorporate commodity hardware. In addition to the expected control and storage hardware, smart cells will incorporate a flexible operating environment that allows storage functions to be downloaded as needed—and allows smart cells to be repurposed if necessary.

While the 4-year timeline was laughable, the actual proposal was smart and forward-looking. The technology was there. But the paper didn’t address the business dynamics that would enable HP to bring the project to fruition. In other words, the marketing problem.

Clearly no one else did either.

The StorageMojo take
Today clustered storage is the obvious successor to big-iron arrays. When HP wrote the paper it wasn’t. If they’d followed through they’d be the thought leaders in the industry and well-positioned to take commercial leadership as well.

As I noted then,

But HP’s roots as a device company have always made the systems approach [. . .] difficult to execute. Even though HP bought a strong storage business with Compaq (which won it when they bought DEC) and the StorageWorks line, the feckless storage mavens in Palo Alto (best idea ever: “Let’s introduce EMC into all our top corporate accounts! It’s cheap, easy and with no engineering it’s all profit!”) have managed to run it into the ground.

HP should take a leaf from Sun’s storage strategy: move storage software from tin-wrapped boxes into the OS group where commodity servers and networks can provide cost-effective infrastructure.

HP had a great idea 4 years ago. Now it’s time to deliver.

Comments welcome, of course.


