StorageMojo




Robin Harris    


HGST getting ready to rumble

April 29th, 2008 by Robin Harris in Disk

I got quoted in Byte & Switch today about Hitachi Global Storage Technology and the new CIO. HGST has been a money pit for Hitachi since they bought the IBM disk operation.

The question is: are they ready to do something about it? The answer is yes.

An informant assures me that HGST has named Raj Das - late of SGI - the new SVP of Marketing.

How many psychiatrists does it take to change a lightbulb?
Raj and I worked together at Sun, where he was one of the few results-oriented, damn-the-torpedoes marketing guys. He’s high energy and creative.

Turning around Hitachi marketing is going to take everything he’s got. Disk companies are not only engineering dominated - the engineers are even more anti-marketing than most. Add in the culture clash of two proud companies and, well, it isn’t good.

The engineers need to understand one thing. Until the Hitachi GST brand means something positive to consumers - at Fry’s and at datacenters around the world - the company won’t be able to justify an extra nickel of margin. Without that, profitability will remain a mirage.

One, but the lightbulb has to really want to change
I know Raj and I know what he can do. Will the guys across the pond let him do it?

The StorageMojo take
Disk vendors mostly compete on price. HGST has an opportunity to change this by re-thinking the disk value proposition - and the communication of it. The industry is at several inflection points.

Here’s hoping HGST can seize at least one of them. More competition will be good for all of us.

Comments welcome, of course. You can see Raj on the SGI video from a month ago below.

Holographic storage debuts next month

April 20th, 2008 by Robin Harris in Disk, Enterprise, Future Tech

After 8 years of hard slogging the folks at InPhase are ready to ship the world’s first holographic storage system.

As StorageMojo noted 2 years ago:

InPhase is claiming they will ship drives with removable holographic disks with 300GB capacity and 20Mbps transfer rate later this year.

I love holographic technology and wish InPhase the best, but I don’t believe they have a viable business with their technology - yet. The problem: 3.5″ disk drives will reach 750GB by the end of this year with much faster transfer rates. InPhase’s 20 Mbps is only 2.5 million bytes per second or only 9GB per hour. It will take over 30 hours just to fill one disk! I predict that hard drives will still be more convenient and fairly cost-competitive than this promising new technology.

But keep at it guys. Lightning will strike if your investors are patient enough.

So what’s different now? They’re saying they will ship next month instead of “later.” The transfer rate is 20 MB/sec. And the media archive life is 50 years - higher density and longer life than tape.
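To put the old and new transfer rates side by side, here is a back-of-the-envelope sketch. It assumes decimal units (1 GB = 1,000 MB) and a sustained transfer at the quoted rate; the function and constant names are mine, not InPhase's.

```python
def hours_to_fill(capacity_gb, rate_mb_per_s):
    """Hours needed to write capacity_gb at a sustained rate (decimal units)."""
    seconds = capacity_gb * 1000 / rate_mb_per_s
    return seconds / 3600

OLD_RATE_MB_S = 20 / 8   # 20 megabits/sec from the 2006 claim = 2.5 MB/sec
NEW_RATE_MB_S = 20       # 20 megabytes/sec from the current claim

old_hours = hours_to_fill(300, OLD_RATE_MB_S)
new_hours = hours_to_fill(300, NEW_RATE_MB_S)

print(f"2006 claim: {old_hours:.1f} hours to fill a 300 GB disk")
print(f"2008 claim: {new_hours:.1f} hours to fill a 300 GB disk")
```

At the new rate a full disk is an afternoon's work instead of a day and a half, which is the 8x difference between bits and bytes.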

Limited availability until fall
I saw a unit - not sure it was functional - at NAB last week. Marketing VP Liz Murphy gave me the pitch, about 110 seconds of which you can watch here:


The yellow plastic on the drive is for display purposes. Note the nifty see-through media.

Target market
As befits a small company with an $18,000 holographic drive whose media costs $180 a copy in quantity 1, InPhase has a sharp focus on people who need a 50 year archive life. Like film studios, whose film-based archives are bulky and subject to the vagaries of physical chemistry.

The media price is reasonable - compared to Blu-ray. NewEgg has TDK 25 GB blu-ray media for $17. 12x that - to get 300 GB - is $204. Plus the clutter. The burners are cheaper though.
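The media comparison works out as follows. This is only a sketch using the prices quoted above; it ignores drive cost, volume discounts and archive life.

```python
def cost_per_gb(price_dollars, capacity_gb):
    """Media cost in dollars per GB."""
    return price_dollars / capacity_gb

holo_per_gb = cost_per_gb(180, 300)     # InPhase disk, quantity 1
bluray_per_gb = cost_per_gb(17, 25)     # TDK 25 GB Blu-ray disc

discs_needed = 300 // 25                # Blu-ray discs to match one holographic disk
bluray_total = discs_needed * 17        # dollars for the equivalent capacity

print(f"Holographic: ${holo_per_gb:.2f}/GB")
print(f"Blu-ray:     ${bluray_per_gb:.2f}/GB ({discs_needed} discs, ${bluray_total})")
```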

Why did it take 8 years?
InPhase had to literally invent almost every piece of the system.

  • The optical media.
  • The manufacturing process for fabricating thick, optically-flat and high-dynamic range media.
  • The mathematics and circuitry needed to use digital camera CMOS chips for high-speed and high-accuracy image reconstruction.
  • A new method - polytopic multiplexing - for a 10x density increase.
  • Holographic mastering techniques for commercial reproduction.

For example, in order to use commercial, i.e. affordable, CMOS optical sensors to read the holograms, InPhase engineers had to do a deep dive (pdf) into optical information theory:

For holographic data storage it is advantageous to limit the spatial bandwidth of the object beam to only slightly higher than the Nyquist frequency of the data pattern. Typically an aperture in a Fourier plane is used to band limit the data beam (thereby also minimizing the size of the holograms in a Fourier-transform geometry). The data pattern may contain at most 1 cycle/2 data image pixels, so that the Nyquist frequency of the optical field of the object beam is minimally 1 sample/pixel. However, since the spectrum of the irradiance pattern is the auto-correlation of the spectrum of the optical field, the Nyquist frequency of the detectable signal is actually 2 linear samples/pixel minimum. Thus any method relying on less than 4 detector elements/data image pixel is operating in a sub-Nyquist regime where the Nyquist rate is defined with respect to the actual irradiance pattern impinging on the detector.
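The key step in that passage is that a detector sees irradiance - the squared magnitude of the field - and squaring a signal doubles its bandwidth. Here is a toy one-dimensional check, with a single cosine standing in for the band-limited data pattern. It is my illustration of the general principle, not InPhase's method, and all the names are made up.

```python
import cmath
import math

def dft_magnitudes(samples):
    """Magnitudes of the non-negative frequency bins of a plain DFT."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(samples))) / n
        for k in range(n // 2 + 1)
    ]

def highest_frequency_bin(samples, threshold=1e-9):
    """Index of the highest bin holding non-negligible energy."""
    mags = dft_magnitudes(samples)
    return max(k for k, m in enumerate(mags) if m > threshold)

N = 64    # samples in the window
F0 = 5    # cycles per window in the optical field

field = [math.cos(2 * math.pi * F0 * t / N) for t in range(N)]
irradiance = [a * a for a in field]    # what the detector actually measures

field_bin = highest_frequency_bin(field)
irradiance_bin = highest_frequency_bin(irradiance)

# Twice the linear sampling rate in each of two dimensions
elements_per_pixel = (irradiance_bin // field_bin) ** 2

print(field_bin, irradiance_bin, elements_per_pixel)
```

The doubled bandwidth in each dimension is where the paper's "4 detector elements/data image pixel" comes from.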

As Liz noted, you can’t hire experienced holographic storage engineers. InPhase has trained every one of them.

The StorageMojo take
Kudos to InPhase for a magnificent achievement. This is comparable to IBM’s original RAMAC disk effort back in 1957. They all deserve to get rich.

15 years ago a 3x CD reader cost a few hundred dollars. Perhaps in 15 years holographic burners will be $50 and the media less than $1.

Learn more about the technology at the InPhase Technologies web site.

Comments welcome, of course. See a more accessible version of this article on my ZDnet blog, Storage Bits.

Xiotech’s ISE: beast or gamine?

April 13th, 2008 by Robin Harris in Architecture, Disk, Enterprise

What’s behind the hype?
Congrats to the Xiotech team on generating the most interest at SNW. Their demos were crowded with the curious. Their claims bordered on the implausible, but the credibility of the engineering team kept derision in the corners.

I talked to Ellen Lary, engineering VP, and Steve Sicola, CTO, as well as taping the very helpful Chad. Before going any further, let’s roll the 103 second - less if you skip the credits - tape:

How do they do it?
Darned if I know - they weren’t talking. Reading between the lines:

  • Systems thinking: each disk drive is more powerful than that 1980’s workhorse, the VAX-11/780 supermini. Put that intelligence to work!
  • Clean code: Xiotech has had free run of Seagate’s best thinking - so they’ve gotten rid of the firmware hairballs inside disk drives to create a distributed architecture where components cooperate in a trusted environment instead of competing. Their disks won’t work with your Brand X controller.
  • Spare no expense: the Xiotech team is going for the gold with a top-of-the-line resource-intensive architecture. If you have to ask how much it costs you can’t afford it.

With 350 IOPS per 15k FC drive claimed - and Sicola said more was coming - this is a lot of bang. When we see some pricing we’ll know about the bucks.

The value proposition
Xiotech’s bet is this: all is forgiven if it kicks butt 7×24 for 5 years. Each ISE is a storage utility writ small. With these building blocks, they promise, you can build an infrastructure whose availability and performance - still the storage ne plus ultra - will beat anything from EMC, IBM or HP.

A worthy goal, indeed.

The StorageMojo take
Just when EMC is assuming that Maui’s new Über-layer will win them the undying cashflow of multinationals, Xiotech comes along and exposes EMC’s feet of clay.

That sucking sound you hear is EMC emptying the datacenter’s coffers to run 7×23.999. If Xiotech can win even 10% of EMC’s business, they’ll be a $1 billion company sooner than they dreamed. And their VCs will be high-fiving in Aspen this winter.

NetApp, IBM and HP should worry as well. It sounded like Xiotech was OEM’ing the ISE to others - if so it makes sense to add them to the product line.

The disk-in-a-box model needed a thorough rethink and kudos to Xiotech for doing it. But many promising - on paper - products have failed. Once Xiotech is shipping and there is independent testing - then we’ll know what they’ve really got.

Comments welcome, of course. The indefatigable Beth Pariseau homes in on the Atrato/Xiotech nexus.

SNW update - Xiotech’s ISE and the dilithium solution

April 9th, 2008 by Robin Harris in Architecture, Disk, Enterprise

It looks like Xiotech is going to cop the “Best Announcement at Spring SNW ’08” prize. See the nifty flash intro.

I did speak to Ellen Lary, Engineering VP last night after going through their mobbed booth. Later today I have an appointment with Steve Sicola, Xiotech’s CTO. I’ll have a more complete report later. Here’s what I’ve gleaned so far.

Remember Atrato?
Interesting stuff:

  • Sealed unit starting at 1.5 TB. They had a 1 PB system on display in three 54 RU racks - i.e. bigger than the ones you use.
  • 5 year warranty and nifty blue LED light. Are we in a data center or a cocktail lounge?
  • Uses the draft T10 DIF (Data In Flight or Data Integrity Field, Data Integrity Feature - depending on where you read it - evidence that humans have a far greater problem with data integrity than computers do) standard to protect data within the array.
  • Uses Seagate’s own drive test software to attempt repairs on drives in place. Ellen said that about 70% of drives work normally after a power cycle.
  • If power cycling doesn’t work, the box can perform a complete reformat of the drive, starting with laying down tracks and proceeding on to what you and I consider “formatting”.
  • If a particular head is the problem, they can electrically disable that side of a platter while continuing to use the rest of the capacity of the drive.
  • It is cheaper to put in a couple of extra high-end drives than it is to make a service call. This won’t be true in China of course.
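Xiotech wasn't sharing internals, so the following is purely a hypothetical sketch of the repair ladder described in the bullets above: try the cheapest fix first and escalate. The fault labels, function names and the final fall-back-to-spare step are my assumptions, not Xiotech's or Seagate's code.

```python
from dataclasses import dataclass, field

@dataclass
class Drive:
    fault: str    # hypothetical labels: "transient", "format", "head", "dead"
    disabled_heads: list = field(default_factory=list)

def power_cycle(drive):
    # Clears transient faults only
    return drive.fault == "transient"

def full_reformat(drive):
    # Re-lays tracks, then does the formatting you and I know
    return drive.fault == "format"

def disable_bad_head(drive):
    # Gives up one platter surface, keeps the rest of the capacity
    if drive.fault == "head":
        drive.disabled_heads.append(0)    # pretend head 0 was the culprit
        return True
    return False

REPAIR_LADDER = [
    ("power cycle", power_cycle),
    ("full reformat", full_reformat),
    ("disable head", disable_bad_head),
]

def attempt_repair(drive):
    """Return the name of the first step that fixes the drive."""
    for name, step in REPAIR_LADDER:
        if step(drive):
            return name
    return "use spare"    # assumed: the extra drives in the sealed unit take over

for fault in ("transient", "format", "head", "dead"):
    print(fault, "->", attempt_repair(Drive(fault)))
```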

The best announcement that WASN’T made at Spring SNW
A company has figured out how to enable long distance synchronous replication. Here in America we like things big - including our idiots in Washington - and our disasters are no exception.

Hurricanes, earthquakes, volcanos, floods, blizzards, tornados and fires - and purblind ideologues - can lay waste to hundreds or thousands of square miles. So normal synchronous replication distances don’t cut it for gotta-have-it infrastructure.

The still-in-stealth-mode company’s Chief Engineer, Montgomery Scott, explained that by running dilithium crystals a little hot, a special hyperspace “tunnel” is created enabling . . . .

Just kidding. Their actual solution looked good in principle but the devil is in the details. I asked all the hard questions I could think of and they had answers for all of them, so it looks like they have something real.

Look for a fall announce.

The StorageMojo take
Those of you wondering if this year would be more of the same old, same old, fear not. The spirit and fact of invention is still strong in the ever-more-vital storage industry.

Comments welcome, of course. Would you use 1,000 mile synchronous replication if you could get it?
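Why is 1,000 miles hard? A rough physics sketch, assuming light in fiber travels at about 200,000 km/sec (roughly two-thirds of its vacuum speed) over a straight-line path with no switching or protocol overhead. Real links will be slower.

```python
FIBER_KM_PER_S = 200_000    # assumption: about 2/3 the speed of light in vacuum
KM_PER_MILE = 1.609344

def round_trip_ms(distance_miles):
    """Minimum round-trip time in milliseconds over an ideal fiber path."""
    km = distance_miles * KM_PER_MILE
    return 2 * km / FIBER_KM_PER_S * 1000

rtt_1000 = round_trip_ms(1000)
serialized_writes_per_s = 1000 / rtt_1000    # if each write waits for its ack

print(f"Round trip at 1,000 miles: {rtt_1000:.1f} ms")
print(f"Ceiling on back-to-back acknowledged writes: {serialized_writes_per_s:.0f}/sec")
```

Every synchronous write has to wait at least one round trip for its acknowledgment, so distance sets a hard floor on write latency.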

Atrato disk array goes public

March 28th, 2008 by Robin Harris in Architecture, Disk, Future Tech

6 weeks ago StorageMojo covered the leaving-stealth-mode non-announce of Atrato’s new storage box. I spoke to Dan McCormick, Atrato’s co-founder and CEO a few days ago for an update.

They’ll have more details at SNW. But here’s what I found interesting.

Density and capacity
The new Atrato box is 3U, not 5, and has about 200 2.5″ drives, for 50 TB raw. With the new 500 GB 2.5s coming out they’ll be able to do 100 TB.
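Working the density numbers, as a sketch: the 250 GB per-drive figure is my inference from "about 200" drives and 50 TB raw, not something Atrato stated.

```python
DRIVES = 200        # "about 200" per the description above
RACK_UNITS = 3

def raw_tb(drive_gb, drives=DRIVES):
    """Raw capacity in TB (decimal) for a box full of identical drives."""
    return drive_gb * drives / 1000

implied_drive_gb = 50 * 1000 / DRIVES    # inferred from the 50 TB raw figure

for drive_gb in (implied_drive_gb, 500):
    tb = raw_tb(drive_gb)
    print(f"{drive_gb:.0f} GB drives: {tb:.0f} TB raw, {tb / RACK_UNITS:.1f} TB per rack unit")
```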

That blows away the density of EMC’s soon-to-be-announced Hulk box. And with the declining delta between 3.5″ and 2.5″ drive capacities, the Atrato box should increase their capacity per rack unit lead.

Performance
In a refreshing change from normal industry practice Atrato quotes IOPS to disk, not cache. Thus their quoted 10,000 IOPS is a real-life number. Dan said that one user got up to 20,000 IOPS after tuning their app.

Apps with big files and large I/Os need disk I/Os, not cache I/Os. Most controllers turn off cache when they see large I/Os anyway. Quoting cache IOPS to their market would be a mistake.

Power
Atrato claims an 80% reduction in power per I/O. Two-thirds of that is due to the power efficiency of 2.5″ drives. The remaining third, though, is their own special sauce.

Virtual drive hospital
When a drive starts acting up - and with 200 drives that doesn’t take very long - their software “pulls” the drive and tests it. If the drive is failing they leave it alone, but Atrato has found that over half the problem drives can be put back into service.

The StorageMojo take
Still cool. An interesting metric will be uptake into space and power constrained enterprise data centers. If power really is an issue - and while I’m sure it is at some level, the priority is the question - I’d expect to see all the big NYC data centers testing these things within 90 days.

Comments welcome, of course. Dan also commented that StorageMojo’s original Atrato post was the best researched and most insightful of all the reportage they saw. Flattery works.

StorageMojo’s favorite FAST 08 paper

March 14th, 2008 by Robin Harris in Architecture, Backup, Disk

It didn’t win Best Paper honors at FAST 08 - IIRC it was An Analysis of Latent Sector Errors in Disk Drives (the link is to the StorageMojo review of that excellent paper last month) - but I really like the thinking behind Pergamum: Replacing Tape with Energy Efficient, Reliable, Disk-Based Archival Storage.

Written by Mark W. Storer, Kevin M. Greenan, Ethan L. Miller (UC Santa Cruz) and Kaladhar Voruganti (NetApp) the paper discusses a prototype that

. . . is a distributed network of intelligent, disk-based, storage appliances that stores data reliably and energy-efficiently. While existing MAID systems keep disks idle to save energy, Pergamum adds NVRAM at each node to store data signatures, metadata, and other small items, allowing deferred writes, metadata requests and inter-disk data verification to be performed while the disk is powered off.

They call the appliances tomes.

Tape: where data goes to die
One of tape’s big advantages is that it uses no power at rest. Any disk-based tape replacement will have to come as close as possible to that ideal.

The tomes use a single hard drive, an ARM-based processor board with NIC and NVRAM. Total power use - when powered up - about 11.5 watts, less than a 15k FC drive. With tighter code, a slower drive and more integration, I’d bet they could cut that in half.

The single disk drive means that tomes must be used in groups to enable distributed RAID techniques and exchange of algebraic signatures to ensure inter-disk recovery. The paper goes into those techniques in detail.

NVRAM

The purpose of the NVRAM is to provide low-power, persistent storage; operations such as metadata searches and signature requests do not require the unit’s drive to be spun up.

. . . the NVRAM primarily holds metadata such as algebraic signatures and index information, flash writes are relatively rare; flash writes coincide with disk writes.
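A toy model makes the NVRAM idea concrete. This is a sketch under my own assumptions, not the Pergamum code: the class and method names are invented, and SHA-256 stands in for the paper's algebraic signatures, which have properties a plain hash does not.

```python
import hashlib

class Tome:
    """Toy single-disk appliance: signatures live in NVRAM, data on the disk."""

    def __init__(self):
        self.nvram = {}       # block id -> signature, readable with the disk off
        self.disk = {}        # block id -> data, needs the disk spinning
        self.spinning = False
        self.spin_ups = 0

    def _spin_up(self):
        if not self.spinning:
            self.spinning = True
            self.spin_ups += 1

    def spin_down(self):
        self.spinning = False

    def write(self, block_id, data):
        # NVRAM is updated only alongside a disk write
        self._spin_up()
        self.disk[block_id] = data
        self.nvram[block_id] = hashlib.sha256(data).hexdigest()

    def signature(self, block_id):
        # Served from NVRAM: the disk stays powered down
        return self.nvram[block_id]

    def read(self, block_id):
        self._spin_up()
        return self.disk[block_id]

tome = Tome()
tome.write("blk-1", b"archived bits")
tome.spin_down()
tome.signature("blk-1")              # verification request, no spin-up
print(tome.spinning, tome.spin_ups)  # disk still off after the signature request
```

The point of the design is in the last two lines: other tomes can check a signature as often as they like without costing this one a spin-up.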

The Ethernet interconnect is important - by using cheap unmanaged switches for fan out, high aggregate bandwidth, exceeding that of current tape libraries, is easily and inexpensively achieved. The use of power-over-Ethernet would further reduce costs, especially if the system used 4200 RPM drives.

The StorageMojo take
Most of the disk vs tape discussions look at the disk device vs tape cartridge cost issue - and they aren’t that different even today. But the tape library market is a $4-5 billion market. A disk-based alternative to slow tape libraries could take a big chunk of that.

Further, this design could be integrated into a single disk controller board, creating a disk with a single Ethernet port and incredible packaging and manufacturing economies.

If Seagate were smart they’d jump on this. This is a major opportunity to drive another significant consumer of disk drive units - without encroaching on existing OEM customer businesses. That doesn’t happen very often.

Comments welcome, as always. Pergamum was an ancient Greek city known for its sizable library, second only to the library of Alexandria.

NetApp’s research offensive

February 26th, 2008 by Robin Harris in Architecture, Disk

After last year’s publication of the Google and CMU papers on the much-higher-than-expected annual failure rates of disk drives, StorageMojo challenged vendors to respond.

I said

The industry has an excellent opportunity to move to greater transparency with storage consumers. Sometimes relationships need a jolt to remind everyone just how much we rely upon each other. Storage is a vital industry with the responsibility to protect and access an ever increasing fraction of mankind’s data. Customers want the best tools for the job. It appears the industry hasn’t been providing them, at least for disk drives. I know some efforts are underway in IDEMA to improve the quality of the numbers. I’d get serious about ensuring that the revised processes actually benefit customers rather than soothing corporate egos. Otherwise this situation will arise again.

Further, the need to engage at a more personal level is a predictable outcome of the continuing consumerization of IT. This is an example of the new normal. Embrace it.

Working through the weekend, NetApp’s Val Bercovici did. IBM did so a little later. EMC said semi-nothing.

Two weeks later a not-very-bright EMC’er sent an EMC lawyer to shut StorageMojo up. Some people are so-o-o sensitive.

FAST forward
This week at FAST (File and Storage Technologies ‘08) a group of research papers respond to the Google and CMU work. In Parity Lost and Parity Regained, Are Disks the Dominant Contributor for Storage Failures?, An Analysis of Latent Sector Errors in Disk Drives and An Analysis of Data Corruption in the Storage Stack NetApp researchers working with academics including Bianca Schroeder - one of the authors of the CMU paper - and Andrea and Remzi Arpaci-Dusseau, of the University of Wisconsin, produced a series of papers examining the state of the art in data storage.

Often using NetApp’s AutoSupport database, the papers delve into knotty problems in array architecture and component behavior. With the advantage of large sample sizes the papers see further into statistically uncommon events.

For example An Analysis of Data Corruption in the Storage Stack looked at over 1.5 million disks on more than 40,000 systems over 41 months. Those numbers dwarf the combined samples of the Google and CMU teams.

Some surprising results
The cynical, myself among them, might be tempted to dismiss the work as an exercise in self-justification. The studies find disk scrubbing useful in eliminating silent data corruption, a result any half-awake SE will use to their advantage.

But in Parity Lost and Parity Regained - nice Milton reference! - they also found that disk scrubbing could spread an error - parity pollution - across multiple disks. In fact,

. . . the tendency of scrubs to pollute parity increases the chances of data loss when only one error occurs.

This is honest research, following the data wherever it goes. It is the difference between science and spin.

The StorageMojo take
NetApp’s research offensive is commendable. While IBM, HP and Microsoft maintain large research groups and publish regularly, they are many times NetApp’s size.

It is also smart marketing. NetApp’s research gives them a ready entree to corporate system architects and technical opinion leaders with a fresh and data-heavy perspective on IT risk management.

NetApp is to be congratulated for the work they’ve done. By participating in the conversation they advance the state of the art and their stature with customers. The former is good for the industry and both are good for NetApp.

Update: A commenter requested links to the papers. They aren’t all freely available online yet. Here are the two I found online. Download the pdf for Parity Lost and Parity Regained, An Analysis of Data Corruption in the Storage Stack.

Update 2: Prof. Peter Honeyman of CITI wrote in to let us know that the FAST papers are available here. Thanks Doc.

Comments welcome, of course.

Why do storage systems fail?

February 24th, 2008 by Robin Harris in Architecture, Disk, Enterprise

It’s the disks, right?
We’ve heard much about disk failures - as recently as last week as well as last year’s reports from Google and CMU. But what about the rest of the system?

In a FAST ‘08 paper to be presented this week - Are Disks the Dominant Contributor for Storage Failures? A Comprehensive Study of Storage Subsystem Failure Characteristics - authors Weihang Jiang, Chongfeng Hu, Yuanyuan Zhou, and Arkady Kanevsky analyze logs from 39,000 systems over 44 months to get answers.

1.8 million disks in 155,000 shelves
NetApp provided data from a variety of systems, including near-line, low-end, mid-range and high-end arrays. The team analyzed the log reports to understand what components led to failures.

The 15 page paper offers some interesting findings

  • Physical interconnect failures are a significant contributor - anywhere from 27-68% - of storage subsystem failures.
  • Subsystems that use the same disk models show similar disk failure rates - but the subsystem failure rates vary significantly.
  • Enclosures have a strong impact on subsystem failures. Some enclosures work better with some drives than others.
  • Dual-redundant FC shelf interconnects reduce annual failure rates 30-40%.
  • Interconnect and protocol failure rates are much more bursty than disk failures. Some 48% of overall subsystem failures arrive at the same shelf within 10,000 seconds (~ 3 hours) of the previous failure.
  • As interconnect failures are so bursty, resilience mechanisms beyond RAID are required to achieve subsystem availability.
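Here is one plausible way to compute a burstiness figure like that 48% from a failure log. The log format, names and sample data are my assumptions, and the paper's exact definition may differ.

```python
from collections import defaultdict

BURST_WINDOW_S = 10_000    # about 2.8 hours

def bursty_fraction(failures, window_s=BURST_WINDOW_S):
    """Fraction of failures arriving within window_s of the previous
    failure on the same shelf.

    failures: iterable of (shelf_id, timestamp_seconds) pairs.
    """
    by_shelf = defaultdict(list)
    for shelf, timestamp in failures:
        by_shelf[shelf].append(timestamp)

    total = 0
    bursty = 0
    for times in by_shelf.values():
        times.sort()
        total += len(times)
        bursty += sum(1 for prev, cur in zip(times, times[1:])
                      if cur - prev <= window_s)
    return bursty / total if total else 0.0

# Made-up sample log, for illustration only
sample = [
    ("shelf-a", 0), ("shelf-a", 500), ("shelf-a", 9_000),
    ("shelf-b", 100), ("shelf-b", 50_000),
    ("shelf-c", 7),
]
print(f"{bursty_fraction(sample):.0%} of failures follow another within the window")
```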

What else?
They also found that enterprise drives had an AFR consistent with manufacturer specs - less than 1% AFR. This result derives from looking at the disks as the system does rather than as users see them.

The StorageMojo take
Interconnects, especially connectors, have long been fingered as a significant cause of equipment problems - and not just in storage. While the team seems to report that interconnects are a greater cause of subsystem failure than disks, there seems to be some room for disagreement about what the numbers are telling us.

For example, this result doesn’t fully explain the delta between what disk users have found and the “trouble not found” rates that manufacturers report. Even if you accept the common 50% TNF vendors report, drive failures are still higher than this research finds.

Perhaps we should conclude that NetApp’s engineering is higher quality than the general run of storage arrays. Or perhaps system log analysis is still a dark art whose results are more indicative than conclusive.

Comments welcome, as always. I’m at the FAST ‘08 conference this week in the San Jose Fairmont hotel.

Apple’s Xserve RAID bites the dust

February 19th, 2008 by Robin Harris in Disk, SAN, FC

StorageMojo reported last June 19th a rumor that Apple’s Xserve RAID would bite the dust. And now, exactly 8 months later, they’ve pulled the plug.

I saw a wall of Xserves and Xserve RAIDs at NAB last year and they were, without a doubt, the prettiest server/storage combo in the world. Brushed stainless steel, blue LEDs and the symmetrical installation looked like Hollywood’s idea of a computer. (Although the server room in Live Free or Die Hard is even crazier.)

Replaced by the Promise Vtrak
Not as pretty but more functional. The Xserve RAID didn’t have dual-redundant active/active controllers with failover, so users had to rely on software mirroring. An OK solution, but not a great one.

Xserve RAID’s big advantage, other than great looks, was price. A quarter the price of other FC RAID kit.

But with the Promise Vtrak arrays, Apple can now quote $1.12 per GB in 26 TB chunks. Pretty good! On a 4 Gbit FC backbone, they can deliver 6 streams of 8-bit uncompressed HD video. Pretty fast!

The Promise kit is fully redundant with hot-swap components. Not the sort of thing that Apple should spend money engineering. And it looks like it is packaged in a nice Xyratex enclosure, the standard of the industry.

Update: One commenter assures us that Promise doesn’t use Xyratex enclosures. I guess there are just so many ways to stick 16 drives into a 3U 19″ rack.

There also seems to be some angst over the apparent outsourcing to Promise as opposed to the Apple label Xserve RAID. Make no mistake, Apple outsourced the Xserve RAID as well to someone who did Apple’s industrial design. With Promise they are just making that apparent, probably because they get a better deal. But you still buy it from the Apple store, not Promise.

As an aside, Steve Jobs has many fine qualities, but his appreciation for how storage can extend Apple’s business is on a par with Scott McNealy’s - i.e. clueless. So it goes. End update.

The StorageMojo take
This move strengthens Apple’s thrust into professional video production and film editing. Their software-only competitors should be sweating, since Apple keeps throwing more functionality into Final Cut Studio, like Color, for very competitive prices.

With the release of Final Cut Server, expected shortly, Apple will have a storage-intensive software infrastructure that should meet the needs of many TV, cable and production studios. With low-cost storage they only make the business case more persuasive.

Apple will be moving a lot more terabytes this year.

Comments welcome, of course.
Update 2: I’ll be adding the Object Matrix price list to Price Lists shortly. They’ve built a cluster storage solution for Apple’s Final Cut Server archives. If you are waiting impatiently for Final Cut Server to ship you’ll want to check them out. End update 2.

Latent sector errors in disk drives

February 18th, 2008 by Robin Harris in Disk, Enterprise

Last year’s Google and CMU papers on disk failure rates (see Everything you know about disks is wrong and Google’s Disk Failure Experience) made the points that a) annual disk failure rates are significantly higher than manufacturers admit and b) that enterprise drives aren’t more reliable than consumer drives.

But in An Analysis of Latent Sector Errors in Disk Drives Lakshmi N. Bairavasundaram, Garth R. Goodson, Shankar Pasupathy and Jiri Schindler analyzed the error logs on over 50,000 arrays covering 1.53 million enterprise and consumer disks. It looks like the largest such study ever published.

Lakshmi was with the U of Wisconsin-Madison while the latter 3 work at NetApp. They published at the Sigmetrics ‘07 conference last June.

A different kind of latency
Unreported or latent disk errors are real. That’s why vendors have stopped recommending RAID 5 on SATA drives.

Disks have a lot of errors, most of them transient. This study focused on Latent Sector Errors (LSE), defined as:

. . . when a particular disk sector cannot be read or written, or when there is an uncorrectable ECC error. Any data previously stored in the sector is lost.

They don’t say so explicitly, but these are surely NetApp arrays. They also comment on the effectiveness of media and disk scrubbing, a feature of high-end arrays.

Results

  • Yes, there are “bad” disks: 0.2% of the drives had more than 1000 errors.
  • 3.45% of the entire population had LSE over the 32 month study period.
  • 8.5% of the consumer disks had LSE
  • 1.9% of the enterprise disks had LSE
  • In their first 12 months 3.15% of consumer and 1.46% of enterprise disks develop at least one LSE
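Those three percentages let you back out the rough mix of drives in the study. This is my inference from the rounded numbers above, not a figure the authors report, so treat it as approximate.

```python
def implied_consumer_share(overall_pct, consumer_pct, enterprise_pct):
    """Solve overall = share * consumer + (1 - share) * enterprise for share."""
    return (overall_pct - enterprise_pct) / (consumer_pct - enterprise_pct)

share = implied_consumer_share(3.45, 8.5, 1.9)
relative_risk = 8.5 / 1.9    # consumer vs. enterprise LSE incidence

print(f"Implied consumer share of the population: {share:.0%}")
print(f"Consumer disks were {relative_risk:.1f}x as likely to show an LSE")
```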

Causation
The team found several factors that contribute to LSE.

  • Size matters. As disk size increases, so does the fraction of disks with LSE.
  • Age matters. LSE rates climbed with age. 20% of some - but not all - consumer disks had LSE after 24 months. Rates climbed faster for consumer drives than for enterprise drives.
  • Vendor matters. They also found that some vendors had much higher LSE than others. Due to the industry omerta they don’t rat out the offenders.
  • Errors matter. A drive that develops one error is much more likely to develop a second. The second error is likely to be close to the first error. Once a drive develops an error, both enterprise and consumer drives are equally likely to develop a 2nd error.

Annual sector error rates
This figure from the paper indicates the variability in age-related error rates


The caption states:

For each disk model that has been in the field for at least two years, the first bar represents Year 1 and the second represents Year 2. The NL and ES bars represent weighted averages for nearline and enterprise class drives respectively.

Consumer/SOHO users with large, cheap, old disks will see LSE. Another reason Desktop RAID is a bad idea. Not many consumers replace their drives every 24 months.

File system implications
File systems rely on disk-based data structures to keep track of your stuff. One of the key findings of the team is that disk errors tend to congregate near each other, like congressmen and lobbyists.

Therefore, file systems that replicate critical data across the disk are much less likely to lose your data than those that, like ReiserFS, place critical structures in one contiguous area. Related issue: since disks virtualize the block structure, how do FS designers know where their data structures actually go on disk?

Media and data scrubbing
What’s the difference?

Media scrubs use a SCSI Verify command to validate a disk sector’s integrity. This command performs an ECC check of the sector’s content from within the disk without transferring data to the storage layer. On failure, the command returns a latent sector error.

While

A data scrub is primarily used to detect data corruption. This scrub issues read operations for each disk sector, computes a checksum over its data, compares the checksum to the on-disk 8-byte checksum, and reconstructs the sector from other disks in the RAID group if the checksum comparison fails. Latent sector errors discovered by data scrubs appear as read errors.
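The data scrub loop is easy to sketch. What follows is a minimal illustration under loud assumptions: one stripe with single XOR parity, CRC-32 from Python's zlib standing in for the 8-byte on-disk checksum, and made-up names throughout. It is not NetApp's implementation.

```python
import zlib
from functools import reduce

def checksum(sector):
    # Stand-in for the on-disk checksum described in the paper
    return zlib.crc32(sector)

def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def data_scrub(data_sectors, stored_checksums, parity):
    """Verify each sector in a stripe and rebuild any that fail.

    Returns the indexes of repaired sectors. Single parity can only
    recover one bad sector per stripe.
    """
    repaired = []
    for i, sector in enumerate(data_sectors):
        if checksum(sector) != stored_checksums[i]:
            others = [s for j, s in enumerate(data_sectors) if j != i]
            data_sectors[i] = xor_blocks(others + [parity])
            repaired.append(i)
    return repaired

stripe = [b"alpha123", b"bravo456", b"charl789"]
sums = [checksum(s) for s in stripe]
parity = xor_blocks(stripe)

stripe[1] = b"bravo45X"    # simulate silent corruption on one disk
fixed = data_scrub(stripe, sums, parity)
print(fixed, stripe[1])
```

A media scrub, by contrast, never moves the data off the drive: the disk checks its own ECC and just reports pass or fail.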

In the analyzed drives over 60% of LSE were found by scrubbing. Scrubbing is a high-end feature that works.

The StorageMojo take
The consistency of LSE as disk capacity increased suggests that there is a constant head/media issue. Since consumer drives are larger than enterprise drives, part of the higher LSE rate is explicable.

The higher LSE rate increase for aging consumer drives suggests that enterprise drives are higher quality. Or maybe their error correction is better.

Finally, drive vendors need to re-think their ECC strategies. As capacities increase so will LSE. Higher quality ECC comes at the cost of capacity. It is time to start paying that price.

Comments welcome, of course. Download the article pdf here.

Protein quantum dot optical next gen storage

February 14th, 2008 by Robin Harris in Disk, Future Tech

The magnetic spots in disk storage are already smaller than semiconductor feature sizes, and patterned media and heat-assisted recording will give us 10 TB 2.5″ disks in the next decade. But then what? Optical protein-based quantum dots could be the answer.

Scientists at an Osaka University lab say in a recent paper:

. . . we have established a novel, rapid method for the fabrication of a “protein recording material”, which enables us to spatiotemporally regulate the recording, reading, and erasing of a fluorescent protein array as information by a photochemical technique. A photolinker that we synthesized here was used to control the protein array spatiotemporally.

The patterned surface was manufactured using two similar processes. One used quantum dot 605-streptavidin conjugates. Under a medium wave UVB laser, the conjugate fluoresces, distinguishing a 1 from a zero. They used a similar substance to build a positive version as well.

The team
Professors Koji Nakayama, Takashi Tachikawa, and Tetsuro Majima, who authored the paper, have published an incredible amount of work on nanotechnology, biochemistry and chemistry. It feels like they woke up one day and realized, “hey, we have fluorescent markers, proteins and substrates, let’s build a storage prototype!”

Here’s a picture I borrowed from their paper:

Mainstream technology
What I like about this technology - and this is simply a lab demo, nowhere near commercial introduction, and could be derailed by many problems - is that it could use much of today’s disk infrastructure. Servo, signal processing, steppers and glass disks, plus some of the planned future technology such as patterned media and HAMR lasers, are directly applicable.

The underlying technology is widely used, as the team notes:

Protein patterning on solid surfaces is a topic of significant importance in the fields of biosensors, diagnostic assays, cell adhesion technologies, and biochip microarrays.

The importance of utilizing existing technology, representing thousands of man-years of refinement and billions of dollars of investment, is key. Thousands of engineers know how to work with current technology, speeding adaptation of new techniques.

The Storage Bits take
Few appreciate how much the exponential increase in storage areal density has fostered computing advances. As Moore’s law has driven processing power, the advance of storage technology has - just barely - enabled massive data stores and rates to feed insatiable processors.

Optical protein storage should be much more stable than magnetic storage as well. Magnetic bits are subject to many kinds of degradation, while proteins can be very persistent, as the prions causing Mad Cow disease show.

Much work remains before protein storage sees the light of a commercial introduction. Its importance is that it gives us another tool to advance our ability to preserve and access the information that makes our culture and civilization possible. Professor Tetsuro Majima and his team deserve our gratitude for this breakthrough.

Comments welcome, as always. This is a highly technical chemistry paper so I just skimmed the surface. Get the pdf here.

EMC’s new flash drives

January 14th, 2008 by Robin Harris in Disk, Enterprise, SSD/Flash Disk

About time
I’m in Silicon Valley for a few days. So I’ll keep this brief.

EMC is pulling out the stops. First Hulk/Maui clusters and now putting flash SSDs in the Symm. They are positioning it as technology leadership, which it isn’t, but it is marketing leadership. I’m impressed.

SSDs have been around for decades. Symms have been around for over 15 years, so why now?

I suspect the rising chorus of customers complaining about 30% capacity utilization rates coupled with Wall Street’s economic woes - I wouldn’t want to be EMC’s Citibank account manager - helped them make the decision. Plus the rise of cluster block storage - XIV the latest case in point - means that if you want to own the high-performance array crown it is time to stake out the territory.

Plus the margins are great!
I haven’t seen any pricing yet, but knowing EMC’s general strategy I suspect they are charging their usual 6x markup over cost for the SSDs. Despite that it should be an easy business case for a CFO to approve.

But if you are going to spend big bucks on an SSD, is putting it inside a single storage array the right way to go? The wide-awake folks at Texas Memory Systems think not. They provided me with this table comparing their SSD to the STEC ZeusIOPS drive EMC is using.

Performance metric | STEC ZeusIOPS | TMS RamSan-500
Sustained random read IOPS | 52,000 | 100,000
Sustained random write IOPS | 17,000 | 20,000
Sustained sequential reads | 250MB/sec | 2,000MB/sec
Sustained sequential writes | 200MB/sec | 2,000MB/sec

Make an entire SAN go faster instead of a single array? Sounds good to me.

The StorageMojo take
Will SSDs finally get some data center love? EMC’s endorsement of SSDs should provide an opening for the long-suffering SSD companies to get more attention from the enterprise. If it’s good enough for EMC . . . .

Comments welcome, as always. Moderation may be slightly more intermittent than usual, but moderate I shall. When I’m not enjoying the convertible I rented.

Flash drives not worth the candle in notebooks

November 15th, 2007 by Robin Harris in Disk, SSD/Flash Disk

I wrote about my testing of notebook disk drive power usage on ZDnet yesterday (see How much does a flash disk increase battery life?). I pulled the 160 GB WD Scorpio out of my MacBook and ran it on wall power through a Kill-a-watt meter to better understand power usage. I learned - or relearned - a few things.

What surprised me most was the fact that as I measured power usage I saw that I/O, CPU and network usage were all intertwined. I’m surprised I was surprised since they are systems and the pieces work together.

I/O and CPU
I ran a defrag program that exercised the disk while driving CPU usage on a Core Duo to 90%. Since the CPU uses almost 3x the power of the disk it is the CPU and not the disk that is the power hog.

In fact, the biggest power hog is the base system: 13 watts sitting there doing nothing, LCD, Wi-Fi and Bluetooth all turned off, with no CPU load.

Hey, Taiwan, want to build long-battery-life notebooks? Figure out how to turn more pieces of the system off when not in use.

28 watts max
With the LCD turned on full, Wi-Fi and Bluetooth on, and the CPU at 90% the diskless notebook pulled 28 watts.

Turning on Wi-Fi was good for 2 watts, almost as much as a busy disk, partly due to the CPU load of running the TCP/IP stack.

Optical drives are power hogs too. Maybe not the drive itself, but the associated graphics or CPU processing.

Here are the numbers

The StorageMojo take
The significance of the intertwined nature of I/O, CPU and network usage is this. Flash drives sip power, but in a busy system it is all the other subsystems that chew up the battery.

The power saving advantages of a flash drive are best in a lightly loaded system with a long battery life, i.e. your cell phone, PDA or ultra-light notebook. In a 2-3 hour battery life 15-17 inch notebook the 2-3 watts a disk uses is almost noise level.
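A back-of-the-envelope sketch of that argument, in Python. The 28 watt busy and 13 watt idle diskless figures come from my measurements above and the 2.5 watt disk is the midpoint of the 2-3 watt range. The 55 Wh battery and the 0.5 watt flash drive are assumptions picked only for illustration, and the idle case generously charges the disk its full 2.5 watts.

```python
def runtime_hours(battery_wh, load_watts):
    """Hours of battery life at a constant load."""
    return battery_wh / load_watts


def flash_gain(battery_wh, diskless_watts, disk_watts, flash_watts):
    """Fractional battery-life gain from swapping the disk for flash."""
    with_disk = runtime_hours(battery_wh, diskless_watts + disk_watts)
    with_flash = runtime_hours(battery_wh, diskless_watts + flash_watts)
    return (with_flash - with_disk) / with_disk


BATTERY_WH = 55.0   # assumption: a typical notebook battery
DISK_WATTS = 2.5    # midpoint of the 2-3 watts a busy disk uses
FLASH_WATTS = 0.5   # assumption: flash drive power draw

busy = flash_gain(BATTERY_WH, 28.0, DISK_WATTS, FLASH_WATTS)  # measured max load
idle = flash_gain(BATTERY_WH, 13.0, DISK_WATTS, FLASH_WATTS)  # measured base system

print(f"busy system: {busy:.1%} longer battery life")
print(f"idle system: {idle:.1%} longer battery life")
```

Under these assumptions the same 2 watt saving buys roughly 7% more battery life on the busy notebook and roughly 15% on the idle one. The battery size cancels out of the ratio, so only the relative size of the disk's draw matters.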

I see some evidence that the flash drive makers are adjusting their marketing to these facts. That is all to the good. Given the price/capacity differential you want flash disk customers buying for the right reasons.

Comments welcome, of course.

Seagate ships infected drives

November 13th, 2007 by Robin Harris in Disk, Security & Public Policy

The China syndrome pt. II
According to Engadget some Maxtor-branded Seagate drives shipped with a handy little virus:

. . . drives produced by a company sub-contract manufacturer located in China were reportedly sent out with the Virus.Win32.AutoRun.ah program already loaded. Apparently, the molar virus is one that get its kicks by searching for passwords to online games (World of Warcraft included) and sending them back to a “server located in China,” and as if that wasn’t enough, it can also disable virus detection software and delete other molar viruses without breaking a sweat.

So many questions
So what would be different if Seagate was Chinese-owned (see The China syndrome)? I suppose it would be easier to build viruses into the firmware. Array vendors would be likely to see them, but would commodity-based cluster storage have any way to catch them?

What if the virus waited to engage until the drive had 7,000 hours of use? Even array vendors wouldn’t see that during integration.

The StorageMojo take
We can scare ourselves silly thinking about how the Chinese government could use disk drives to ferret out secrets. Ultimately though, any such data has to go through servers and networks to reach the outside world. Scanning outgoing data is the only way to protect against such espionage, be it human or virus based.

Where would that scanning take place? In a router? And where is code developed for routers? Some, at least, in China.

If the Chinese made a $30 billion investment in Seagate they’d have to weigh the short term advantage of surreptitious data gathering against the virtually 100% chance they’d get caught. The impact on their investment and their world image would be huge, especially in all the 3rd world countries that would have no idea how badly they’d been compromised.

Disk-based espionage seems highly unlikely. Router-based espionage seems much more likely.

Comments welcome, of course.

Hurtin’ Hitachi

October 1st, 2007 by Robin Harris in Disk

They bought IBM’s business, not the brand
Things aren’t good when an unsourced report claims Hitachi is selling its loss-making disk business and the stock rises 7%, the biggest one-day percentage gain in 4 years.

Hitachi denied it. Sounds like the bankers are trying to get something going. “More fees!”

Spend $2 Billion and what do you get?
1 day older and deeper in debt. And the disk industry’s premier research group. And a contract with IBM to buy a bunch of drives. So why aren’t they minting money?

Consumerization is the new black
Hitachi is your basic $88 Billion - more if the dollar keeps sliding - company with 384,444 employees. The org chart (pdf) makes the Pentagon look like a model of organizational clarity.

Where did the disk business go? I had it here somewhere.
The “Disk Array Systems Business” is on the chart. The “Storage Area Network Systems Solution Division” is on the chart. The disk drive business isn’t. Did they forget they own it?

So a rumor about the sale of a business that doesn’t even show up on the org chart drives your stock up 7%. Ouch.

Delusional marketing will do that to you
Hitachi makes good drives. They just don’t know how to tell people. Case in point: the Hard Drive by Hitachi marketing program.

The program centers around the usage of the Hard Drive by Hitachi icon (figure 1). The icon signifies that the hard drive integrated as a key component in your product(s) is manufactured and backed by Hitachi, a premier supplier of high quality hard disk drives. Participating companies can utilize Hitachi’s reputation for quality and reliability as well as technology leadership to strongly differentiate their products in today’s competitive marketplace.

Name 1 non-geek consumer who knows Hitachi makes quality drives. Give up? Me too.

Oh, and here’s the logo:

Not quite as zippy as “Intel Inside” is it? I can just see the wise graybeards back in Japan nodding “we’ll save money with a 2 color sticker. Intel sticker costly. We don’t need that. After all, we Hitachi.”

I’m sure this is about as much as the American marketing folks could get past the “Hitachi Corporate Cross-Functional Sales Prevention Strategic Planning and Execution (literally) Steering Committee.” Grim.

The StorageMojo take
Disk drives are modern marvels and Hitachi is one of the best vendors. But Hitachi isn’t a brand to conjure with in America or Europe. Matsushita, years ago, bought every American consumer electronics brand that had any cachet and has milked them ever since. Plus they built their own brands, like Panasonic, that now resonate.

What has Hitachi done? Hmmm? “Hey, I know, let’s be so boring that we hypnotize customers into buying our products!”

Hitachi Japan: if you want a successful American disk drive business you need to get out of your nice comfortable rut, buy somebody like Iomega that has a brand and some distribution, and get serious about wooing consumers. IBM won’t buy your drives forever. America is a consumer society - or at least it was until the ecstasy-fueled sub-prime party ended - and you need consumers to buy your drives. Red labels may work for Bass Ale, but you have a different problem: no one knows who you are.

Comments welcome. Warning: when I upgraded my anti-spam software I lost the ability to look at the spam queue. So if you send a comment and it doesn’t show up in a day it may have been eaten by Akismet. Please resend. Yes, I sent the developers a note about it, but apparently I am a special case. Maybe the next update will fix it.


