StorageMojo




Robin Harris    


NetApp’s research offensive

February 26th, 2008 by Robin Harris in Architecture, Disk

After last year’s publication of the Google and CMU papers on the much-higher-than-expected annual failure rates of disk drives, StorageMojo challenged vendors to respond.

I said

The industry has an excellent opportunity to move to greater transparency with storage consumers. Sometimes relationships need a jolt to remind everyone just how much we rely upon each other. Storage is a vital industry with the responsibility to protect and access an ever increasing fraction of mankind’s data. Customers want the best tools for the job. It appears the industry hasn’t been providing them, at least for disk drives. I know some efforts are underway in IDEMA to improve the quality of the numbers. I’d get serious about ensuring that the revised processes actually benefit customers rather than soothing corporate egos. Otherwise this situation will arise again.

Further, the need to engage at a more personal level is a predictable outcome of the continuing consumerization of IT. This is an example of the new normal. Embrace it.

Working through the weekend, NetApp’s Val Bercovici did. IBM did so a little later. EMC said semi-nothing.

Two weeks later a not-very-bright EMC’er sent an EMC lawyer to shut StorageMojo up. Some people are so-o-o sensitive.

FAST forward
This week at FAST (File and Storage Technologies ‘08) a group of research papers responds to the Google and CMU work. In four papers - Parity Lost and Parity Regained; Are Disks the Dominant Contributor for Storage Failures?; An Analysis of Latent Sector Errors in Disk Drives; and An Analysis of Data Corruption in the Storage Stack - NetApp researchers worked with academics, including Bianca Schroeder, one of the authors of the CMU paper, and Andrea and Remzi Arpaci-Dusseau of the University of Wisconsin, to examine the state of the art in data storage.

Often using NetApp’s AutoSupport database, the papers delve into knotty problems in array architecture and component behavior. With the advantage of large sample sizes the papers see further into statistically uncommon events.

For example An Analysis of Data Corruption in the Storage Stack looked at over 1.5 million disks on more than 40,000 systems over 41 months. Those numbers dwarf the combined samples of the Google and CMU teams.

Some surprising results
The cynical, myself among them, might be tempted to dismiss the work as an exercise in self-justification. The studies find disk scrubbing useful in eliminating silent data corruption, a result any half-awake SE will use to their advantage.

But in Parity Lost and Parity Regained - nice Milton reference! - they also found that disk scrubbing could spread an error - parity pollution - across multiple disks. In fact,

. . . the tendency of scrubs to pollute parity increases the chances of data loss when only one error occurs.

This is honest research, following the data wherever it goes. It is the difference between science and spin.
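To make parity pollution concrete, here is a minimal Python sketch. It assumes a toy stripe of three data blocks protected by simple XOR parity, and a naive scrub that "repairs" a mismatch by recomputing parity from whatever data is on disk. The block contents, the names and the naive scrub policy are all illustrative assumptions, not taken from the paper or from any NetApp implementation.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together (simple single-parity RAID)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Hypothetical stripe: three data blocks plus one parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# One silent corruption on block 1: the disk reports no error.
on_disk = list(data)
on_disk[1] = b"BxBB"

# Before any scrub, the good parity can still rebuild the original block 1.
recovered_before = xor_blocks([on_disk[0], on_disk[2], parity])

# A naive scrub notices that data and parity disagree and recomputes parity
# from the data on disk. The parity is now "polluted" by the bad block.
if xor_blocks(on_disk) != parity:
    parity = xor_blocks(on_disk)

# After the scrub, rebuilding block 1 only reproduces the corrupted contents.
recovered_after = xor_blocks([on_disk[0], on_disk[2], parity])

print(recovered_before == data[1])  # good parity could recover the block
print(recovered_after == data[1])   # polluted parity cannot
```

In this toy case a single error plus a well-meaning scrub is enough to make the original block unrecoverable, which is the effect the quote describes.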

The StorageMojo take
NetApp’s research offensive is commendable. While IBM, HP and Microsoft maintain large research groups and publish regularly, they are many times NetApp’s size.

It is also smart marketing. NetApp’s research gives them a ready entree to corporate system architects and technical opinion leaders with a fresh and data-heavy perspective on IT risk management.

NetApp is to be congratulated for the work they’ve done. By participating in the conversation they advance the state of the art and their stature with customers. The former is good for the industry and both are good for NetApp.

Update: A commenter requested links to the papers. They aren’t all freely available online yet. Here are the two I found online. Download the pdf for Parity Lost and Parity Regained, An Analysis of Data Corruption in the Storage Stack.

Update 2: Prof. Peter Honeyman of CITI wrote in to let us know that the FAST papers are available here. Thanks Doc.

Comments welcome, of course.

Why do storage systems fail?

February 24th, 2008 by Robin Harris in Architecture, Disk, Enterprise

It’s the disks, right?
We’ve heard much about disk failures - as recently as last week as well as last year’s reports from Google and CMU. But what about the rest of the system?

In a FAST ‘08 paper to be presented this week - Are Disks the Dominant Contributor for Storage Failures? A Comprehensive Study of Storage Subsystem Failure Characteristics - authors Weihang Jiang, Chongfeng Hu, Yuanyuan Zhou, and Arkady Kanevsky analyze logs from 39,000 systems over 44 months to get answers.

1.8 million disks in 155,000 shelves
NetApp provided data from a variety of systems, including near-line, low-end, mid-range and high-end arrays. The team analyzed the log reports to understand what components led to failures.

The 15 page paper offers some interesting findings

  • Physical interconnect failures are a significant contributor to storage subsystem failures - anywhere from 27% to 68% of them.
  • Subsystems that use the same disk models show similar disk failure rates - but their subsystem failure rates vary significantly.
  • Enclosures have a strong impact on subsystem failures. Some enclosures work better with some drives than others.
  • Dual-redundant FC shelf interconnects reduce annual failure rates 30-40%.
  • Interconnect and protocol failure rates are much more bursty than disk failures. Some 48% of overall subsystem failures arrive at the same shelf within 10,000 seconds (~ 3 hours) of the previous failure.
  • As interconnect failures are so bursty, resilience mechanisms beyond RAID are required to achieve subsystem availability.
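The burstiness figure above is the kind of number you can pull out of a failure log with a few lines of code. Here is a rough Python sketch, assuming each log entry boils down to a (shelf id, timestamp in seconds) pair. The function name, the sample events and the choice of denominator (all failures, including the first one on each shelf) are my assumptions, not the paper's method.

```python
from collections import defaultdict

def fraction_within_window(events, window_s=10_000):
    """Share of failures that follow the previous failure on the same shelf
    by no more than window_s seconds.

    events: iterable of (shelf_id, timestamp_seconds) pairs.
    """
    by_shelf = defaultdict(list)
    for shelf, ts in events:
        by_shelf[shelf].append(ts)

    total = 0
    close = 0
    for times in by_shelf.values():
        times.sort()
        total += len(times)
        # Compare each failure with the one immediately before it.
        close += sum(1 for prev, cur in zip(times, times[1:])
                     if cur - prev <= window_s)
    return close / total if total else 0.0

# Hypothetical log: two shelves, five failures.
sample = [("s1", 0), ("s1", 3_600), ("s1", 500_000), ("s2", 100), ("s2", 9_000)]
bursty_fraction = fraction_within_window(sample)
print(bursty_fraction)
```

Run against real shelf logs, a high value says failures cluster in time, which is why RAID alone (built for independent failures) doesn't cover them.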

What else?
They also found that enterprise drives had an AFR consistent with manufacturer specs - less than 1% AFR. This result derives from looking at the disks as the system does rather than as users see them.

The StorageMojo take
Interconnects, especially connectors, have long been fingered as a significant cause of equipment problems - and not just in storage. While the team seems to report that interconnects are a greater cause of subsystem failure than disks, there seems to be some room for disagreement about what the numbers are telling us.

For example, this result doesn’t fully explain the delta between what disk users have found and the “trouble not found” rates that manufacturers report. Even if you accept the common 50% TNF vendors report, drive failures are still higher than this research finds.

Perhaps we should conclude that NetApp’s engineering is higher quality than the general run of storage arrays. Or perhaps system log analysis is still a dark art whose results are more indicative than conclusive.

Comments welcome, as always. I’m at the FAST ‘08 conference this week in the San Jose Fairmont hotel.

StorageMojo update

February 21st, 2008 by Robin Harris in Off-Topic

Nothing major if you’re wondering.

IDG
You should be seeing higher quality advertising. IDG, the folks who publish everything that ends in “world” like PC World, Computerworld and Macworld have put together a network of mostly IDG content and selected independent publishers like StorageMojo to create a compelling Internet ad buy.

You should be seeing the first of the new format ads at the top of the page and to the right. I’ll be replacing the Google ads with the IDG ads to reduce clutter.

Soon there will be a larger ad in the center column. IDG tells me that advertisers love them, i.e. pay more. As a card-carrying capitalist I’m down with that.

The point is to generate significant revenue so I can continue to bring you lovely pictures of red rocks and maybe even more content.

IDG offers publishers a traditional CPM model based on impressions rather than clicks. Benefit: you don’t have to click Google ads just so I can eat.

New format?
The timing is less certain, but I’ve found a new format, or theme, for Wordpress that would be a nice update. It is a cleaner layout, more whitespace, better typography and offers a larger header picture. I don’t know when I’ll do the update, but it is on the agenda.

I hope you’ll like it, whenever it arrives.

Top 100 analyst blog
I discovered today that StorageMojo made Technobabble’s Top 100 analyst blog list. AFAIK there are only about 120 analyst blogs, but I’m heartened that StorageMojo came in #19 overall and ahead of any other storage analyst blog. Dave Hitz probably gets more hits, but as co-founder of NetApp he isn’t a storage analyst.

Warning: if StorageMojo falls off the next version of the list, don’t expect to read about it here!

Update: Someone, bless ‘em, linked to StorageMojo from - who knew? - Flickr! LOL!

Maybe I need to get with the social networking thing after all.

Comments welcome, of course.

What’s with Isilon?

February 21st, 2008 by Robin Harris in Clusters, NAS, IP, iSCSI

They haven’t reported financials for almost 3 quarters. Their stock is trading at about 20% of its peak. They fired their CEO and put founder Sujal Patel in his place. And NetApp was trying to strangle baby Isilon (see NetApp filers for $1/GB?) in its crib.

Are they goners?

I don’t think so.
I’ve been trying to read the tea leaves on Peter van Oppen’s decision to join the board earlier this month.

Peter led the tape library company ADIC, also based in the Seattle area, for 12 years until its sale to Quantum. ADIC out-innovated Quantum - saddled with a cranky and slow DLT development group - in libraries and software as well.

If you think the folks who buy storage arrays are conservative, you haven’t sold any tape libraries. It is a tough market and ADIC did well.

So why would van Oppen join a sinking ship?
That’s why I don’t think Isilon is sinking. An external audit team is reviewing Isilon’s accounting to ensure that any financial dirty laundry - say, hypothetically, channel stuffing - gets cleaned up. They’ve been at it for months and must be about done.

The StorageMojo take
Based on the Isilon press release and pure speculation, here’s what I think is going down:

  • Peter exercised some due diligence before accepting the directorship and isn’t terribly worried about the basic health of the company
  • After he gets up to speed on company operations, he assumes the CEO role by July
  • Sujal happily goes back to one of the best jobs in any company: CTO and Founder while the stock climbs in value

However it goes down, getting Peter on board is a real plus. Storage experience is thin in Seattle. Isilon has lots of smart people, but the storage market has many unique wrinkles that networking or software folks take a long time to learn.

Comments welcome, as always. Disclosure: I met Sujal 7 years ago and I’ve done some work for Isilon. I hope they do well.

Apple’s Xserve RAID bites the dust

February 19th, 2008 by Robin Harris in Disk, SAN, FC

StorageMojo reported last June 19th a rumor that Apple’s Xserve RAID would bite the dust. And now, exactly 8 months later, they’ve pulled the plug.

I saw a wall of Xserves and Xserve RAIDs at NAB last year and they were, without a doubt, the prettiest server/storage combo in the world. Brushed stainless steel, blue LEDs and the symmetrical installation looked like Hollywood’s idea of a computer. (Although the server room in Live Free or Die Hard is even crazier.)

Replaced by the Promise Vtrak
Not as pretty but more functional. The Xserve RAID didn’t have dual-redundant active/active controllers with failover, so users had to rely on software mirroring. An OK solution, but not a great one.

Xserve RAID’s big advantage, other than great looks, was price. A quarter the price of other FC RAID kit.

But with the Promise Vtrak arrays, Apple can now quote $1.12 per GB in 26 TB chunks. Pretty good! On a 4 Gbit FC backbone, they can deliver 6 streams of 8-bit uncompressed HD video. Pretty fast!

The Promise kit is fully redundant with hot-swap components. Not the sort of thing that Apple should spend money engineering. And it looks like it is packaged in a nice Xyratex enclosure, the standard of the industry.

Update: One commenter assures us that Promise doesn’t use Xyratex enclosures. I guess there are just so many ways to stick 16 drives into a 3U 19″ rack.

There also seems to be some angst over the apparent outsourcing to Promise as opposed to the Apple label Xserve RAID. Make no mistake, Apple outsourced the Xserve RAID as well to someone who did Apple’s industrial design. With Promise they are just making that apparent, probably because they get a better deal. But you still buy it from the Apple store, not Promise.

As an aside, Steve Jobs has many fine qualities, but his appreciation for how storage can extend Apple’s business is on a par with Scott McNealy’s - i.e. clueless. So it goes. End update.

The StorageMojo take
This move strengthens Apple’s thrust into professional video production and film editing. Their software-only competitors should be sweating, since Apple keeps throwing more functionality into Final Cut Studio, like Color, for very competitive prices.

With the release of Final Cut Server, expected shortly, Apple will have a storage-intensive software infrastructure that should meet the needs of many TV, cable and production studios. With low-cost storage they only make the business case more persuasive.

Apple will be moving a lot more terabytes this year.

Comments welcome, of course.
Update 2: I’ll be adding the Object Matrix price list to Price Lists shortly. They’ve built a cluster storage solution for Apple’s Final Cut Server archives. If you are waiting impatiently for Final Cut Server to ship you’ll want to check them out. End update 2.

Latent sector errors in disk drives

February 18th, 2008 by Robin Harris in Disk, Enterprise

Last year’s Google and CMU papers on disk failure rates (see Everything you know about disks is wrong and Google’s Disk Failure Experience) made the points that a) annual disk failure rates are significantly higher than manufacturers admit and b) that enterprise drives aren’t more reliable than consumer drives.

But in An Analysis of Latent Sector Errors in Disk Drives Lakshmi N. Bairavasundaram, Garth R. Goodson, Shankar Pasupathy and Jiri Schindler analyzed the error logs on over 50,000 arrays covering 1.53 million enterprise and consumer disks. It looks like the largest such study ever published.

Lakshmi was with the U of Wisconsin-Madison while the latter 3 work at NetApp. They published at the Sigmetrics ‘07 conference last June.

A different kind of latency
Unreported or latent disk errors are real. That’s why vendors have stopped recommending RAID 5 on SATA drives.

Disks have a lot of errors, most of them transient. This study focused on Latent Sector Errors (LSE), defined as:

. . . when a particular disk sector cannot be read or written, or when there is an uncorrectable ECC error. Any data previously stored in the sector is lost.

They don’t say so explicitly, but these are surely NetApp arrays. They also comment on the effectiveness of media and disk scrubbing, a feature of high-end arrays.

Results

  • Yes, there are “bad” disks: 0.2% of the drives had more than 1000 errors.
  • 3.45% of the entire population had LSE over the 32 month study period.
  • 8.5% of the consumer disks had LSE
  • 1.9% of the enterprise disks had LSE
  • In their first 12 months 3.15% of consumer and 1.46% of enterprise disks develop at least one LSE
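A quick sanity check on the numbers above: if 3.45% of all disks, 8.5% of consumer disks and 1.9% of enterprise disks had LSE, the overall rate is a weighted average and you can back out the implied mix. The Python sketch below is my own back-of-the-envelope arithmetic on the rounded percentages quoted here, so treat the result as approximate; the function name is made up for illustration.

```python
def implied_share(overall_pct, group_a_pct, group_b_pct):
    """Fraction of the population in group A implied by a weighted average.

    Solves overall = x * a + (1 - x) * b for x.
    """
    return (overall_pct - group_b_pct) / (group_a_pct - group_b_pct)

# Percentages quoted in the results list above.
consumer_share = implied_share(3.45, 8.5, 1.9)

# 1.53 million disks is the study's total population.
consumer_disks = consumer_share * 1_530_000

print(f"implied consumer share: {consumer_share:.1%}")
print(f"implied consumer disks: {consumer_disks:,.0f}")
```

The implied consumer share comes out a bit under a quarter of the population, so the overall rate is dominated by the enterprise drives.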

Causation
The team found several factors that contribute to LSE.

  • Size matters. As disk size increases, so does the fraction of disks with LSE.
  • Age matters. LSE rates climbed with age. 20% of some - but not all - consumer disks had LSE after 24 months. Rates climbed faster for consumer drives than for enterprise drives.
  • Vendor matters. They also found that some vendors had much higher LSE than others. Due to the industry omerta they don’t rat out the offenders.
  • Errors matter. A drive that develops one error is much more likely to develop a second. The second error is likely to be close to the first error. Once a drive develops an error, both enterprise and consumer drives are equally likely to develop a 2nd error.

Annual sector error rates
This figure from the paper indicates the variability in age-related error rates


The caption states:

For each disk model that has been in the field for at least two years, the first bar represents Year 1 and the second represents Year 2. The NL and ES bars represent weighted averages for nearline and enterprise class drives respectively.

Consumer/SOHO users with large, cheap, old disks will see LSE. Another reason Desktop RAID is a bad idea. Not many consumers replace their drives every 24 months.

File system implications
File systems rely on disk-based data structures to keep track of your stuff. One of the key findings of the team is that disk errors tend to congregate near each other, like congressmen and lobbyists.

Therefore, file systems that replicate critical data across the disk are much less likely to lose your data than those that, like ReiserFS, place critical structures in one contiguous area. Related issue: since disks virtualize the block structure, how do FS designers know where their data structures actually go on disk?
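Here is a toy Python illustration of why placement matters when errors cluster. It assumes the damage is one contiguous run of bad sectors and compares three copies of a critical structure packed together against three copies spread across the disk. The sector numbers and the function name are invented for the example, and it uses logical sector numbers, which (per the question above) may not match the physical layout.

```python
def surviving_copies(copy_sectors, bad_start, bad_len):
    """Return the copy locations that fall outside a contiguous bad run."""
    bad = range(bad_start, bad_start + bad_len)
    return [sector for sector in copy_sectors if sector not in bad]

# Hypothetical layouts for three copies of a critical structure.
contiguous = [1_000, 1_001, 1_002]      # packed into one area
spread = [1_000, 333_000, 666_000]      # replicated across the disk

# Hypothetical cluster: 50 adjacent bad sectors starting at sector 990.
cluster_start, cluster_len = 990, 50

left_contiguous = surviving_copies(contiguous, cluster_start, cluster_len)
left_spread = surviving_copies(spread, cluster_start, cluster_len)

print(left_contiguous)
print(left_spread)
```

The packed layout loses every copy to one cluster, while the spread layout keeps two.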

Media and data scrubbing
What’s the difference?

Media scrubs use a SCSI Verify command to validate a disk sector’s integrity. This command performs an ECC check of the sector’s content from within the disk without transferring data to the storage layer. On failure, the command returns a latent sector error.

While

A data scrub is primarily used to detect data corruption. This scrub issues read operations for each disk sector, computes a checksum over its data, compares the checksum to the on-disk 8-byte checksum, and reconstructs the sector from other disks in the RAID group if the checksum comparison fails. Latent sector errors discovered by data scrubs appear as read errors.

In the analyzed drives over 60% of LSE were found by scrubbing. Scrubbing is a high-end feature that works.
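The data scrub described in the quote can be sketched in a few lines of Python. This is a simplified illustration, not NetApp's implementation: it assumes a single-parity stripe with XOR parity, uses zlib.crc32 as a stand-in for the 8-byte on-disk checksum, and keeps everything in memory. All names and sample sectors are hypothetical.

```python
import zlib
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together (single-parity RAID)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def data_scrub(stripe, checksums, parity):
    """Read each sector, verify its checksum, rebuild it from the others on mismatch.

    stripe:    list of data sectors (bytes), modified in place when repaired
    checksums: stored checksum for each sector (CRC32 here, an assumption)
    parity:    XOR parity sector for the stripe
    Returns the indices of the sectors that were repaired.
    """
    repaired = []
    for i, sector in enumerate(stripe):
        if zlib.crc32(sector) != checksums[i]:
            others = [s for j, s in enumerate(stripe) if j != i]
            stripe[i] = xor_blocks(others + [parity])
            repaired.append(i)
    return repaired

# Hypothetical stripe with checksums and parity written while the data was good.
good = [b"alpha___", b"bravo___", b"charlie_"]
checksums = [zlib.crc32(s) for s in good]
parity = xor_blocks(good)

# Sector 2 is silently corrupted later.
stripe = list(good)
stripe[2] = b"chXrlie_"

repaired = data_scrub(stripe, checksums, parity)
print(repaired, stripe == good)
```

The per-sector checksum is what tells the scrub which copy is wrong; a single-parity sketch like this can only repair one bad sector per stripe.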

The StorageMojo take
The consistency of LSE as disk capacity increased suggests that there is a constant head/media issue. Since consumer drives are larger than enterprise drives, part of the higher LSE rate is explicable.

The higher LSE rate increase for aging consumer drives suggests that enterprise drives are higher quality. Or maybe their error correction is better.

Finally, drive vendors need to re-think their ECC strategies. As capacities increase so will LSE. Higher quality ECC comes at the cost of capacity. It is time to start paying that price.

Comments welcome, of course. Download the article pdf here.

Protein quantum dot optical next gen storage

February 14th, 2008 by Robin Harris in Disk, Future Tech

The magnetic spots in disk storage are already smaller than semiconductor feature sizes, and patterned media and heat-assisted recording will give us 10 TB 2.5″ disks in the next decade. But then what? Optical protein-based quantum dots could be the answer.

Scientists at an Osaka University lab say in a recent paper:

. . . we have established a novel, rapid method for the fabrication of a “protein recording material”, which enables us to spatiotemporally regulate the recording, reading, and erasing of a fluorescent protein array as information by a photochemical technique. A photolinker that we synthesized here was used to control the protein array spatiotemporally.

The patterned surface was manufactured using two similar processes. One used quantum dot 605-streptavidin conjugates. Under a medium wave UVB laser, the conjugate fluoresces, distinguishing a 1 from a zero. They used a similar substance to build a positive version as well.

The team
Professors Koji Nakayama, Takashi Tachikawa, and Tetsuro Majima, who authored the paper, have published an incredible amount of work on nanotechnology, biochemistry and chemistry. It feels like they woke up one day and realized, “hey, we have fluorescent markers, proteins and substrates, let’s build a storage prototype!”

Here’s a picture I borrowed from their paper:

Mainstream technology
What I like about this technology - and this is simply a lab demo, nowhere near commercial introduction, and could be derailed by many problems - is that it could use much of today’s disk infrastructure. Servo, signal processing, steppers, glass disks - and some of the planned future technology, such as patterned media and HAMR lasers - are directly applicable.

The underlying technology is widely used, as the team notes:

Protein patterning on solid surfaces is a topic of significant importance in the fields of biosensors, diagnostic assays, cell adhesion technologies, and biochip microarrays.

The importance of utilizing existing technology, representing thousands of man-years of refinement and billions of dollars of investment, is key. Thousands of engineers know how to work with current technology, speeding adaptation of new techniques.

The Storage Bits take
Few appreciate how much the exponential increase in storage areal density has fostered computing advances. As Moore’s law has driven processing power, the advance of storage technology has - just barely - enabled massive data stores and rates to feed insatiable processors.

Optical protein storage should be much more stable than magnetic storage as well. Magnetic bits are subject to many kinds of degradation, while proteins can be very persistent, as the prions causing Mad Cow disease show.

Much work remains before protein storage sees the light of a commercial introduction. Its importance is that it gives us another tool to advance our ability to preserve and access the information that makes our culture and civilization possible. Professor Tetsuro Majima and his team deserve our gratitude for this breakthrough.

Comments welcome, as always. This is a highly technical chemistry paper so I just skimmed the surface. Get the pdf here.

Atrato: High-performance, high-density storage

February 12th, 2008 by Robin Harris in Architecture, Future Tech

Got an interesting press release this morning about a Denver-area company, Atrato, announcing its existence and $18 million in funding. Their mission:

Based in Westminster, Colorado, Atrato Inc.’s (www.atrato-Inc.com) mission is to help companies in entertainment, the Web, IPTV, HPC and VOD open up infinite new worlds of content for customers by offering them high-speed, high volume data access. Atrato’s high-density storage system with integrated data acceleration does nothing less than change the economics of high-speed/high-volume I/O processing.

So what do they have?
They say very little about their technology in the release:

. . . breakthrough technology, a high-performance storage platform that is designed to eliminate the barriers to high-speed / high-volume data access, unlocking revenue and opportunities for a range of applications and industries.

The web site makes some more specific claims which are excerpted below:

Speeds to support any load level. Easily handles traffic spikes with the power of hundreds of servers energizing your site.

Content is protected at both the stream and hardware levels (in flight and at rest) to ensure the security and integrity of your content while the sealed array eliminates most physical security vulnerabilities.

The industry’s only three year maintenance-free, fail-in-place operation available today and has been granted hundreds of patent claims with numerous others applied for.

. . . up to 10,000 I/Os per second or 3000 streams in 5RU . . . .

[bolding added]

How do they do it?
They say little about the secret sauce, but after looking at their web site and some patent applications I’ll venture this much:

  • The core team is heavy on hardware guys. With the numbers they’re quoting this is an ASIC-enabled box - think BlueArc for I/O. Lots of internal parallelism, wide stripes and mirroring.
  • They’ve developed some innovative packaging technology for high-density disk - I’m guessing up to 400 2.5″ drives per enclosure. You can’t easily replace the drives, so they’ve made a virtue of necessity and “sealed” the enclosure.
  • Beyond the high-density packaging they’ve thought long and hard about how to ensure a 3 year operational life. Offset counter-rotating drive pairs to damp rotational and actuator vibration, high-flow cooling and ample hot-spare provisioning are key.

The StorageMojo take
Way cool! Hardware is cheap, labor and downtime expensive so their architecture works from a TCO perspective. Sticking these boxes in cable system head-ends will simplify content distribution and support at the same time.

The prices are likely to look high, but when you factor in the 3 year maintenance contract it should be persuasive. With 80,000 IOPS from a single 42U rack it may even find favor in more I/O intensive environments.
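For the curious, the rack figure follows from the web site's per-unit claim. A small Python check, assuming a rack filled with nothing but 5RU units (no space reserved for switches or power) and the quoted 10,000 I/Os per second per unit; the function name is mine.

```python
def rack_iops(iops_per_unit=10_000, unit_height_ru=5, rack_height_ru=42):
    """Whole units that fit in a rack and their combined IOPS."""
    units = rack_height_ru // unit_height_ru  # partial units don't count
    return units, units * iops_per_unit

units, total_iops = rack_iops()
print(units, total_iops)
```

Eight units fit in 42U with 2U to spare, which is where the 80,000 comes from.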

This is the kind of innovative packaging I would have expected from Xyratex. Congrats to the Atrato team for a thorough re-thinking of storage infrastructure.

Comments welcome, as always. Atrato team members?

StorageMojo at FAST 2008

February 10th, 2008 by Robin Harris in Future Tech

Join me in San Jose, CA, February 26–29, 2008, for the latest in File And Storage Technologies.

Top researchers from academe and industry - NetApp, IBM, Microsoft, Data Domain, HP, Panasas, Yahoo, Seagate and more - will present their latest research. [Doesn't EMC do any research?]

There will be thought-provoking presentations. I’ve already downloaded a number of papers and plan to report on some in the next couple of weeks.

If you’d like to meet, please drop me a line in the comments or send an email to robinATthisdomainname. Libation bearers are especially welcome.

The StorageMojo take
A lot of great research (see Everything you know about disks is wrong) has been presented at past FAST conferences and this year looks to be no exception. The work on data corruption looks very promising.

I’m also looking forward to shooting some video for the StorageMojo YouTube channel. Yes, it’s looking a little threadbare right now, as technical difficulties have slowed me down - FCS2’s underpinnings could be a lot more robust - so FAST should be a good place to get some new content.

Comments and invites welcome. Does anyone know if EMC has ever presented at FAST? Update: Several commenters quickly assured me that EMC

  • Does lots of research
  • Has presented at FAST
  • Prefers to present at ACM

Thanks for the links!

White House data loss

February 6th, 2008 by Robin Harris in Information Management, Security & Public Policy

What’s wrong with White House backup?
I published a review of David Gewirtz’s book Where Have All the Emails Gone? over on ZDnet.

A quick overview:

  • The White House may or may not have lost 5 million emails. They aren’t sure.
  • Gewirtz, an email expert, started investigating the White House email infrastructure and found:
    • The mail archiving process is unprofessional and unworkable.
    • The claimed loss of email in a Notes to Exchange migration is highly unlikely.
    • Over 100 million emails from the White House were sent through an insecure ISP in Chattanooga TN.
  • Existing law - the Hatch Act - mandates an external email system for partisan political activity, a ludicrous requirement in a 7×24 Washington.

The Hatch Act prescribes what partisan political activities are acceptable for federal employees. One of the prohibitions is the partisan use of government property. While a good idea in general, in the case of telecom the prohibition is senseless.

White House communications need to be secure. When we force White House employees to use multiple email, IM and computer systems it is inevitable that material received on the internal system will go out over the external system. A single secure system is easier to achieve.

This isn’t about George Bush
This is about maintaining records so the next administration can know how policy got developed and what commitments were made. I’ll let others worry about whether the loss of the emails was part of a deliberate attempt to cover up criminal activity.

Ironic, isn’t it?
American companies are spending billions for backup and archive software and hardware. But the White House, head of an executive branch with a $3 trillion budget, can’t manage its email backups despite a clear legal requirement to do so under the Presidential Records Act?

The StorageMojo take
Gewirtz recommends that a professional, non-partisan IT organization be detailed with the job of protecting and archiving all White House email communications. There are many groups with the ability and the motive to snoop White House email going out over the public Internet. That has to stop.

Making a single entity responsible, as the Secret Service is for Presidential safety, is the best way to ensure that vital public records are protected. It will also help remind White House officials that they are accountable to the people of the United States.

Comments welcome, as always. BTW, Congress also needs to clean up its data protection act. It is less urgent than the White House, but just as important.

Update: As luck would have it the New York Times reports another Bush attack on America’s right to know. After passing Congress unanimously he’s gutting the latest freedom-of-information law in the budget. A new high in bipartisanship! Less than a year to go!

Set phasers to “change”

February 5th, 2008 by Robin Harris in Future Tech, SSD/Flash Disk

Flash may be getting all the attention, but the boffins are working hard to ensure we have options to flash. We need those options because flash has some serious limitations, like random write performance and density, that we may not be able to overcome.

On the other hand it is easy to underestimate the power of sustained investment in R&D. Disk drives have successfully fended off numerous would-be usurpers thanks to their incredible areal density growth.

Now flash is facing a challenger
Intel and STMicroelectronics’ new phase-change memory threatens flash, but not any time soon. Unveiled at ISSCC, the thermal phase-change device can store 2 bits per cell, like MLC flash.

I asked Jim Handy, of Objective Analysis, a semi-conductor market research firm, for his take on the Intel-STM announcement. He responded:

The big question is “When does Phase Change stand a chance?” I doubt that it could be competitive today because a wafer with a new material is bound to be much more costly than a pure silicon wafer until volumes get up, and the volume won’t get up until the price is competitive with flash. The number of chips per wafer is about the same for PCM as for NOR flash, so there’s no cost advantage from die size.

Once flash reaches its scaling limit brick wall (which was expected to be 2006 at 65nm, then at 25nm in 2012, now looking like 10nm around 2016) then PCM will zip right past it. Trouble is - that brick wall has legs and keeps zipping ahead of us.

Until then PCM should have trouble competing on cost, and cost is everything in the semiconductor memory markets.

One advantage PCM has is that it has a fast write, so in many cases a PCM chip can replace a flash and a RAM. This means that the cost target is something higher than simply matching a NOR price.

The significance of an MLC PCM is that it puts PCM on the same footing as MLC NOR. Had that not happened, then there would have been a longer delay before PCM replaced NOR, since a PCM die size would have been twice as large as an MLC NOR of the same density on the same process.

As for PCM replacing NAND, wellllll…… that’ll take longer since NAND’s about 1/3 the cost of NOR. The same brick wall impacts both technologies, though, so it will happen!

Thanks, Jim. I hadn’t realized that PCM could replace a NOR and a RAM chip.

The StorageMojo take
Jim’s take is better informed than mine. My takeaway is that the growth of PCM will depend on how broad a market niche it can build for itself over time. That won’t be easy.

Comments welcome, of course.

Flash performance update

February 2nd, 2008 by Robin Harris in SSD/Flash Disk

Update on mobile flash performance
Mikko Pitkanen over at the mobile development blog Delay ToleraNt posted some more tests of the Nokia N800’s flash performance. He’s a doctoral candidate at the Helsinki Institute of Physics at CERN in Switzerland with a strong interest in storage.

The money quotes:

. . . the first observation is that we achieve write performance close to 1Mbit/s for small (less than 1 MB) files.

. . . the read performance is much better than for writing and is certainly enough to play movies. The write performance instead, is poor and would not allow the user to receive large files with the full bandwidth achievable by the device’s WLAN.

Mikko’s got a new Nokia N810 that he’s loving, so that will be it for the N800 data. Good data point. Mikko, if and when you get some N810 performance data please send it along. Thanks!
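
For readers who want a comparable data point from their own device, here is a minimal sketch of a small-file write test. It is not Mikko’s method, which isn’t described here. The function name, chunk size and file sizes are all assumptions chosen for illustration, and the temporary directory in the usage part may not live on the flash device you care about, so point the path at the flash mount instead.

```python
import os
import tempfile
import time


def write_throughput_mbit(path, size_bytes, chunk=64 * 1024):
    """Write size_bytes to path, fsync, and return throughput in Mbit/s.

    Hypothetical helper for illustration; not taken from the N800 tests.
    """
    payload = os.urandom(min(chunk, size_bytes) or 1)
    written = 0
    start = time.perf_counter()
    with open(path, "wb") as f:
        while written < size_bytes:
            n = min(len(payload), size_bytes - written)
            f.write(payload[:n])
            written += n
        f.flush()
        os.fsync(f.fileno())  # push data to the device, not just the page cache
    elapsed = max(time.perf_counter() - start, 1e-9)
    return (written * 8) / (elapsed * 1_000_000)


if __name__ == "__main__":
    # Assumed sizes: all at or below 1 MB, the range the quote talks about.
    with tempfile.TemporaryDirectory() as d:
        for size in (64 * 1024, 256 * 1024, 1024 * 1024):
            rate = write_throughput_mbit(os.path.join(d, "probe.bin"), size)
            print(f"{size // 1024:5d} KB: {rate:8.1f} Mbit/s")
```

The fsync call matters: without it a small file can land entirely in RAM cache and the number says nothing about the flash. A single run is noisy, so repeat each size a few times before drawing conclusions.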

More high-performance flash
Intel and Micron announced a very fast flash chip - 200 MB/s read and 100 MB/s write - but the press release included this big caveat:

“Micron looks forward to unlocking the possibilities with high speed NAND,” said Frankie Roohparvar, Micron vice president of NAND development. “We are working with an ecosystem of key enablers and partners to build and optimize corresponding system technologies that take advantage of its improved performance capabilities.”

Translating from marketing speak: “nobody has the technology, like the translation layer, to take advantage of this chip.”

The StorageMojo take
Realizing flash’s potential will be a multi-year, multi-company effort. No one has a clear idea of what the ultimate limits will be. In the meantime the disk folks will be working to limit the damage by raising reliability, density and shock resistance. Both technologies have a place. The fight is about boundaries. And all of us consumers benefit.

Comments welcome, as always.

What was Ray Ozzie thinking?

February 2nd, 2008 by Robin Harris in Enterprise, Future Tech

I wrote a first pass on the Microsoft/Yahoo deal for ZDnet yesterday morning. Short version: are they nuts?

The silliest comment
Ray Ozzie was quoted saying:

“Our lives, our businesses, and even our society have been progressively transformed by the Web, and Yahoo! has played a pioneering role by building compelling, high-scale services and infrastructure,” said Ray Ozzie, chief software architect at Microsoft. “The combination of these two great teams would enable us to jointly deliver a broad range of new experiences to our customers that neither of us would have achieved on our own.”

I agree about the compelling services. Yahoo has a number of market-leading services, starting with mail.

High-scale infrastructure?
I don’t think so. Very conservatively Yahoo’s infrastructure costs are 3x Google’s. Probably 8-10x.

By all accounts Mr. Ozzie is a brilliant fellow. So why the silly comment? A few possibilities come to mind:

  • PR flacks wrote the comment for him and he was too busy to review it.
  • MS investor relations wrote the comment to try to paper over the fact that there is no technology synergy in the acquisition, figuring that Wall St. analysts wouldn’t know the difference.
  • He actually believes it. They are so-o doomed!

Other than IBM, Microsoft Research probably has the most brilliant CompSci group in the industry - and that includes Google. They can’t solve problems?

What is the real problem?
BillG and Steve Ballmer were out of new ideas - or good ideas they could easily copy - after Windows 3.1 and Office. The illegal strangulation of Netscape has cost Msoft billions in penalties and still, 10 years after, IE is losing market share. Gee, maybe the browser wasn’t important after all!

It also looks like Microsoft avoids the kind of clean sheet design that gave Google its cost advantage. You must use Windows. You must use Dell. You must use CIFS. Who knows what self-sabotaging corporate injunctions are stifling Microsoft developers? Because they sure have the smarts. And the money.

The StorageMojo take
Microsoft has to stop chasing the latest Big New Thing - be it game consoles, music players, web portals or Internet advertising - and start focusing on new opportunities that they are uniquely positioned to exploit.

For example, how about migrating web-scale technology down to the enterprise? Storage companies are using Linux to create commercial storage clusters like Google’s. Why isn’t Microsoft building Boxwood-style cluster software to help enterprises lower their storage TCO? Take advantage of the Microsoft army of admins and resellers to move the concept and further entrench Windows.

And that’s far from the only opportunity.

Instead Ballmer et al. seem obsessed with fighting wars they’ve already lost against Apple, Google and Linux (see Farewell, Bill. Yo, Ballmer, now it’s your turn! on ZDnet). Even the richest and most powerful software company on earth has limits and should pick its fights.

Comments welcome, of course.


