StorageMojo




Robin Harris    


SOHO backup that works: why is it so hard?

March 19th, 2008 by Robin Harris in Backup, Information Management

Moving to a small town in northern Arizona from Silicon Valley has enriched my perspective on many things, including how the industry develops products. The consensus is that if we take datacenter technology and put in enough defaults it will be “simple” enough for consumers. Wrong.

Memo to developers: it is ALL consumer IT
The consumerization of IT usually means the adoption by IT of high volume consumer technologies. The PCI bus, Microsoft Windows, USB, x86, SATA disks and Wi-Fi all started in the consumer space and displaced more sophisticated and expensive IT.

But consumerization also means taking tech first developed for IT and making it easy enough for consumers. Ethernet LANs, symmetric multi-processing, external disk systems (well, really only Drobo) and what we used to call “office automation” software are now usable by non-geeks.

Pro vs amateur
Amateurs like GUIs. Pros like CLIs. Why do we have both on “enterprise” products? Because we are all amateurs - at something.

The third shift guys are all amateurs. They may want to be “professional” but they aren’t now.

Backup: the highest failure rate in IT?
Who knows how good the numbers are. A 40% enterprise backup failure rate is frequently bandied about. Whatever the “real” number is, it isn’t good enough.

If “professionals” with “industrial strength” backup hardware and software can only achieve a 60% completion rate - a failing grade anywhere - why does it surprise us that only a tiny percentage of small office/home office people back up regularly?

And further, why do we assume that SOHOs will never back up? “Americans will never wear seatbelts.” “People will never recycle.” “SOHOs will never back up.”

Yet the record is clear. If you take an education and ease of use approach, people will change their behavior. They will wear seatbelts. They will recycle. They will even learn to deal with PITA child seats. And they will back up.

But not if it is presented as a “junior” enterprise backup. Make it easy and affordable. Mostly easy. And people will do it.

A couple of backup products that work
On ZDnet I reviewed a Windows backup product that I could recommend to any small business here in the red rock-strewn desert, Backupkey. Plug it in, hit “enter” twice, and all your valuable Windows data gets copied.

Did this simple, useful product come from Boston? Silicon Valley? Redmond? Denver? Nope. Charleston, South Carolina? Bingo!

I suspect Backupkey got built there because the developer actually knows small business people. Knows their frustration and their intolerance for stuff that doesn’t work as they think it should.

Most Windows backup software is simply dumbed-down “real” backup. Backup sets. Incrementals. Images. Bootable. Whatever. But non-IT folks don’t know those words or concepts. Why can’t it just work?

On a Mac both Carbon Copy Cloner and SuperDuper work great and are almost easy enough for complete idiots to use. Partial idiots only, please. Apple’s Time Machine, which I finally set up last night on a new 500 GB USB/eSATA drive, is totally easy. Mindless bliss.

The StorageMojo take
I cringe every time I hear the big companies proclaim a new focus on the SMB market. Usually it is some shrunk-down enterprise product with incentives for the channel.

But what doesn’t change is the thinking behind the product. The assumptions about the consumer - “like us, only dumber” - and about the problem they are trying to solve rarely get the kind of re-think that went into Time Machine.

But the logic is inescapable: the more pervasive IT becomes, the more the technology must adapt to people. Backupkey does that for low-end Windows backup. Time Machine does that for Mac OS X. Who, and what, is next?

Comments welcome, as always.

White House data loss

February 6th, 2008 by Robin Harris in Information Management, Security & Public Policy

What’s wrong with White House backup?
I published a review of David Gewirtz’s book Where Have All the Emails Gone? over on ZDnet.

A quick overview:

  • The White House may or may not have lost 5 million emails. They aren’t sure.
  • Gewirtz, an email expert, started investigating the White House email infrastructure and found:
    • The mail archiving process is unprofessional and unworkable.
    • The claimed loss of email in a Notes to Exchange migration is highly unlikely.
    • Over 100 million emails from the White House were sent through an insecure ISP in Chattanooga TN.
  • Existing law - the Hatch Act - mandates an external email system for partisan political activity, a ludicrous requirement in a 7×24 Washington.

The Hatch Act prescribes what partisan political activities are acceptable for federal employees. One of the prohibitions is the partisan use of government property. While a good idea in general, in the case of telecom the prohibition is senseless.

White House communications need to be secure. When we force White House employees to use multiple email, IM and computer systems it is inevitable that material received on the internal system will go out over the external system. A single secure system is easier to achieve.

This isn’t about George Bush
This is about maintaining records so the next administration can know how policy got developed and what commitments were made. I’ll let others worry about whether the loss of the emails was part of a deliberate attempt to cover up criminal activity.

Ironic, isn’t it?
American companies are spending billions for backup and archive software and hardware. But the White House, head of an executive branch with a $3 trillion budget, can’t manage its email backups despite a clear legal requirement to do so under the Presidential Records Act?

The StorageMojo take
Gewirtz recommends that a professional, non-partisan IT organization be detailed with the job of protecting and archiving all White House email communications. There are many groups with the ability and the motive to snoop White House email going out over the public Internet. That has to stop.

Making a single entity responsible, as the Secret Service is for Presidential safety, is the best way to ensure that vital public records are protected. It will also help remind White House officials that they are accountable to the people of the United States.

Comments welcome, as always. BTW, Congress also needs to clean up its data protection act. It is less urgent than the White House, but just as important.

Update: As luck would have it the New York Times reports another Bush attack on America’s right to know. After passing Congress unanimously he’s gutting the latest freedom-of-information law in the budget. A new high in bipartisanship! Less than a year to go!

Disk-based archive vs disk-based storage

January 27th, 2008 by Robin Harris in Information Management, SAN, FC, Security & Public Policy

What’s the difference?
I came across a thoughtful essay on the “Top Ten Differences between Disk-based Archive & Disk-based Storage” in the MatrixStore blog. MatrixStore is a Mac cluster-based disk archive for Apple’s to-be-announced-RSN Final Cut Server.

MatrixStore is focused on one market segment - video content archiving - but their comments seem to be generally applicable. With 2008’s likely focus on the disk-based backup and archive market, it is worth starting the conversation now.

Key points
SANs aren’t designed for archiving.

Reason 1.

If you are archiving your data, it’s probably because you don’t want to lose it.

Raison d’etre for a disk based archive? To keep data - safe. For a SAN? Speed of delivery, QoS… You wouldn’t put 256 bit delivery checksums into a SAN; SANs cut corners on flushing to disk; SANs don’t build in search or audit-trails, or security; SANs can down completely because of single-points-of-failure in the hardware; a bad software update in a SAN and…. Don’t do it. With nursing care and attention they can run fine for years, but they are inherently tightly coupled, software version sensitive, high maintenance, error prone and hardware technology dependent… even if they are brilliant at fast storage and delivery of information…

A disk-based archive must be: loosely coupled and free from dependencies between hardware components on independent nodes (surely the greatest example of a loosely coupled solution is the world-wide-web; you have no fear on the www that a server going down, say, hosting an IBM site, is going to bring down another in Cupertino!); free from requiring constant latest updates to software/firmware; able to guarantee safe delivery and storage of data; and basically, able to safely, securely store and protect data for year upon year, without complications, manual intervention, spanners…

Archives must be engineered for easy adoption of new technology
In storage everything is cheaper next quarter. So why buy now?

Reason 2.

There’ll be bigger, better, cheaper, more efficient disks in 2009, and in 2010, and in 2011…

Will there be bigger, better, cheaper, more energy efficient storage devices coming out this year, and every year that follows? Yes, of course there will be.

In your SAN do you have to mirror between like-sized devices? What happens when one of those devices goes down in 2 years time? Do you end up throwing away the good device? In your SAN can you bolt on new technologies as they arrive; holographic disks that store 10TB a shot, or new fibre connectors?

In ZFS can you decommission a part of a storage pool, replacing it with new storage devices without significant bleeding edge techniques and without disrupting the rest? Ideally, it be great to bolt new technology into an archive, as and when they arrive, rolling out old technologies if they reach the point of diminishing returns; to be able to do that whilst always seeing a single archive storage cluster; and without a maintenance or data migration headache; or should I say; without risk. A disk based archive can achieve that, if selected carefully.

Vendor handcuffs
Long-term storage and proprietary products don’t mix. Along with upgradeability-in-place, this should be high on customer checklists.

Reason 3.

Vendor tie-in is more like Vendor hand-cuffs.

OK - this isn’t strictly about SAN vs Disk based archiving; but fact of the matter is that most SAN/any other disk-based storage solutions tie you in to a particular vendor, which is great when they are supplying the ‘best-in-class’ solution of the moment at time of purchase, but not quite so clever when you come to upgrade that solution a year down the line and they aren’t offering the best in class anymore.

The archive should be vendor independent otherwise, for many reasons, you’re just creating tomorrow’s headache with a solution from yesteryear.

Stability and security

Reason 5.

Viruses. Hackers.

Choice one:

“out of the box” configured with encryption, firewalled, data locked down, all access to data routed through PPK, all maintenance functionality requiring 256 bit passwords.

Choice two:

bolt on each of the above to your favourite SAN/filesystem. Wait five years as your conglomerate of software solutions evolve (along with the workforce) and cross fingers. A disk-based archive must be secure out-of-the-box.

There’s more, of course, and if you are interested please read the whole essay and respond here with your thoughts so every one can see and respond.

The StorageMojo take
EMC’s upcoming backup and archive cluster, code-named Hulk/Maui (HW/SW), will drive a lot of customers to think about this topic. Of course, EMC’s famously disciplined sales force will scrupulously limit Hulk/Maui sales to B&A applications for the first several months, no, weeks, no, days, no, hours after its release. Once the customer utters the magic word “Isilon” Hulk/Maui will suddenly be ready for enterprise use.

[I hope someone has mentioned this to the Maui engineers: forget about summer vacation.]

Disk-based backup and archive is a fast growing application with very different requirements from SANs, arrays and fast NAS boxes. Data migrations will be increasingly infeasible. Management has to be stoner-on-the-night-shift-proof. And the data can’t be held hostage by proprietary standards.

Companies do discontinue products or go bankrupt, after all.

Comments welcome, of course. Anything else?

Microsoft RIFs old file formats - mea culpa

January 9th, 2008 by Robin Harris in Enterprise, Information Management, Off-Topic, SOHO/SMB

Darn! It looks like I screwed up. I’m sorry. While Microsoft did disable a number of early Word and other file formats, it wasn’t as long a list as I thought.

Textual analysis
I take a text-heavy approach to the content on StorageMojo. I prefer to go to original source material, unpack the meaning and the context, and then give my take on it.

That usually works pretty well. But in this case it didn’t.

What happened?
I read a lot of technical documents. Most never get written about. But the Microsoft knowledge base article was an exception. Since Microsoft was the topic it also got a lot of attention from me and others.

There is a lot of emotion around Microsoft. They are a big, powerful, immensely profitable and sometimes clueless corporation whose desktop monopoly is a fact of life for computer users and IT professionals.

I try to stay with the facts as best I can determine them. In this case I got confused by the KB article. That other people made the same mistake is small comfort and no excuse (see a Microsoft take here).

Lessons learned
Other than resolving to analyze content from Microsoft more carefully, I’m not sure what else I would do differently. I didn’t question their motives for the change, only the way it was handled.

However, I do have some suggestions for Microsoft.

  • Reducing functionality on an already purchased product is a problem. You should notify users that you are limiting product functionality and give them the opportunity to decline the update. Even if it is for their own good.
  • Suggesting registry edits or esoteric admin tools to solve the problem is OK for the tech savvy. But what about my 85 year old neighbor Dorothy, whose computer is a lifeline to her great-grandchildren? Her late husband was an engineer, so she has files that go back quite a few years. Microsoft, you are both an enterprise and a consumer company. Own it.
  • Communication is worth spending money on. Tech writers tell me that Microsoft doesn’t pay very well and, as a result, it doesn’t get very good tech writing. Maybe MCSEs are used to the style, but it sure didn’t work for this reasonably tech-savvy consumer.

The StorageMojo take
Tech is complicated and sometimes people - like I just did - get it wrong. Listening to criticism and learning from mistakes is how we all get better, even Microsoft. I hope you’ll keep coming back to StorageMojo and I’ll keep doing my level best to make it worth your time.

Comments welcome, as always.

Microsoft RIFs old file formats

January 4th, 2008 by Robin Harris in Enterprise, Information Management, SOHO/SMB

“They trusted us with their data? Will the fools never learn?”
The Service Pack 3 update to Office 2003 blocks over a dozen old file formats, effectively rendering the data inaccessible. Unless you are adept at the registry editing Microsoft cautions you against.

And they don’t warn you that you won’t be able to access the old files. Whee!

Check out my ZDnet article for the gory details. It isn’t pretty.

Update: While the SP3 does block opening a number of old file formats, the formats in question are older: all Word pre-6.0; PowerPoint pre-97; Excel 4.0 charts; dBASE II .dbf; Lotus and Quattro files; Corel Draw .cdr. See my mea culpa. End update.

Clueless droids?
How does the world’s largest software company make this kind of wrong-on-so-many-levels decision? Is there ANY adult supervision in Redmond?

The decision bespeaks a corporate culture that is painfully clueless about its customers. Gee, why would anyone want to access 5 year old Word documents?

Medical products marketing
Redmond’s blindness echoes that of Detroit’s for the last 50 years. “Safety doesn’t sell.” “Bigger is better.” “Good enough quality is good enough.” “Americans will never buy Japanese cars.”

Microsoft clearly doesn’t get the fact that their products are an intimate part of consumers’ lives, much as medicines are. When 8 bottles of Tylenol capsules were poisoned with cyanide in 1982, Johnson & Johnson quickly recalled 31 million bottles and spent on the order of $100 million to restore consumer confidence in the Tylenol brand.

Would Microsoft spend a nickel to protect and reassure consumers? I give it a qualified “maybe.”

The StorageMojo take
In case anyone thought that archiving documents in proprietary formats was acceptable, this is your wake-up call. ASCII text and probably PDFs are OK. Everything else, including RTF - which Microsoft controls - is suspect.

With the growing focus on e-discovery, there should be a market for a high-speed “any format to .txt or .pdf” appliance. Producing unreadable softcopies won’t cut much ice in Federal courts.

Comments welcome, as always.

Magic in the OLPC

December 15th, 2007 by Robin Harris in Future Tech, Information Management

Most criticism of the One Laptop Per Child PC centers on the cost for what is a low-spec computer. As ASUS with its Eee machine is proving, a low-cost conventional laptop can be pretty powerful. But that misses the point. The OLPC is a fundamental rethinking of the computing experience.

[photo courtesy OLPC]

This child’s review of the OLPC is the first hint that suggests that Laptop.org may have gotten it right. As the 9 year old’s father writes:

So Rufus is using his laptop to write, paint, make music, explore the internet, and talk to children from other countries.

Because it looks rather like a simple plastic toy, I had thought it might suffer the same fate as the radio-controlled dinosaur or the roller-skates he got last Christmas - enjoyed for a day or two, then ignored.

Instead, it seems to provide enduring fascination.

I had returned from Nigeria not entirely convinced that the XO laptop was quite as wonderful an educational tool as its creators claimed. I felt that a lot of effort would be needed by hard-pressed teachers before it became more than just a distracting toy for the children to mess around with in class.

But Rufus has changed my mind.

With no help from his Dad, he has learned far more about computers than he knew a couple of weeks ago, and the XO appears to be a more creative tool than the games consoles which occupy rather too much of his time.

OLPC roots
Even though the OLPC is the only notebook whose industrial design chops rival those of Apple, its real innovation lies in software. Building on educational theorist Seymour Papert’s work - he invented the Logo language - the OLPC re-thinks the relationship between man and machine.

OLPC differences
The OLPC has activities instead of applications.

Activities are distinct from applications in their foci—collaboration and expression—and their implementation—journaling and iteration.

The collaboration comes in the form of built-in mesh networking that allows all local OLPCs to talk to each other.

By exploiting this connectivity, every activity has the potential to be a networked activity. We aspire that all activities take advantage of the mesh; any activity that is not mesh-aware should perhaps be rethought in light of connectivity. As an example, consider the web-browsing activity bundled with the laptop distribution. Normally one browses in isolation, perhaps on occasion sending a friend a favorite link. On the laptop, however, a link-sharing feature integrated into the browser activity transforms the solitary act of web-surfing into a group collaboration.

The connectivity seems to be powerful. From his home in England, young Rufus is conversing with other kids who send him messages in Spanish. How does that work?

Expression is the goal of the activities and collaboration. Rather than downloading music, the laptop is equipped to create music. The rethinking extends to the file system:

The objectification of the traditional file system speaks more directly to real-world metaphors: instead of a sound file, we have an actual sound; instead of a text file, a story. In order to support this concept, activity developers may define object types and associated icons to represent them.

Another aspect of the system’s UI is a focus on the Journal. This is more than written documentation of what a child has done.

The Journal combines entries explicitly created by the children with those that are implicitly created through participation in activities; developers must think carefully about how an activity integrates with the Journal more so than with a traditional file system that functions independently of an application. The activities, the objects, and the means of recording all tightly integrate to create a different kind of computer experience.

I’ll be interested to see how children who grow up with the OLPC think about computers. I fear we have a generation of children whose creativity has been permanently stunted by the desktop metaphor.

The StorageMojo take
Negroponte’s biggest mistake is that he did not market the OLPC in the industrialized world first. All the good intentions in the world won’t convince the 3rd world that something is good unless it has been embraced by the opinion leaders of the 1st world.

If I was Steve Jobs, I’d be taking a very close look at this machine to see what I could steal. Michael Dell could learn a few things too.

Comments welcome. OLPC has a beautiful web site.

Internet video’s performance/quality vise

December 5th, 2007 by Robin Harris in Future Tech, Information Management

Internet video is about where film was 100 years ago
I was talking to a company who will be announcing a video infrastructure solution when the CEO mentioned something he called the “video performance/quality vise.”

Here’s the problem: a video stream requires both capacity and bandwidth. Higher quality video requires more bits per second and more capacity. Bandwidth and capacity both cost money.

So as Internet video quality rises, the financial cost to provide the video rises too. An HD video stream is 4 Mbit/sec.
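To put rough numbers on the vise, here is a back-of-the-envelope sketch in Python. Only the 4 Mbit/sec figure comes from the post; the function names and the 10,000 concurrent viewers are made-up assumptions for illustration.

```python
# Back-of-the-envelope sizing for streamed video.
# Only the 4 Mbit/sec rate comes from the post; the rest is assumed.

def gigabytes_per_hour(mbit_per_sec: float) -> float:
    """Capacity needed to store one hour of video, in decimal GB."""
    bits = mbit_per_sec * 1_000_000 * 3600  # bits in one hour of streaming
    return bits / 8 / 1_000_000_000         # bits -> bytes -> GB


def aggregate_gbit_per_sec(mbit_per_sec: float, concurrent_viewers: int) -> float:
    """Outbound bandwidth needed if every viewer gets a separate stream."""
    return mbit_per_sec * concurrent_viewers / 1000


hd_rate = 4.0  # Mbit/sec, from the post
print(f"{gigabytes_per_hour(hd_rate):.1f} GB of storage per hour of video")
print(f"{aggregate_gbit_per_sec(hd_rate, 10_000):.0f} Gbit/sec for 10,000 viewers")
```

Both numbers scale linearly with the bit rate, so doubling quality doubles both the storage bill and the bandwidth bill. That is the vise.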

500,000 channels and somethin’ on
As cute as YouTube, et al. are, they suck. Movies are small, picture and audio quality awful, and viewing options limited - like films 100 years ago.

Bandwidth limitations are part of the problem, at least here in the US. But those are being addressed, however slowly.

What happens when Internet video becomes competitive with broadcast TV in quality? Popularity will soar. As TiVo has shown, people love choice. And the Internet will have the most choice.

The price/performance/popularity vise
Digital Fountain’s raptor codes will change the Internet landscape for video. High quality video will be much more popular, just as long-form movies took film to the next level.

Bandwidth costs are dropping fast to pennies a GB. So infrastructure costs - especially storage - are critical to Internet video’s commercial success. The more popular it gets, the more storage will be needed. It is a huge opportunity.

The StorageMojo take
Massive data storage is still a very young technology. The ultimate cultural impact will be more profound than film because of the many-to-many nature of the Internet and the low barriers to entry. Should be fun!

Comments welcome, please. I don’t think the firm wanted me to mention their name, so I haven’t. If we get that cleared up I’ll update the post. Or maybe wait a while to write about them.

Update: Joe, thanks for catching the 4Mbit mistake I made. I corrected it above.

Google thinks I’m a virus

December 4th, 2007 by Robin Harris in Information Management

Google is sorry
I do a lot of research on the web using Google. Starting early last week I started getting these Google error messages:

The search term was “gutenberg” as in Gutenberg Bible.

This is happening 5-10 times a day. I enter the captcha and I’m on my way. But it is irritating.

What is going on?
The downside of “free” is non-existent customer service. I’ve written to Google’s comment address asking about this and, of course, no response.

I have seen reports that other people are experiencing this problem, so it isn’t just me. I’m running Mac OS 10.5.1 and as near as I can tell I am virus free. I even checked for the codec Trojan and it isn’t there.

There is a Windows XP machine on the home network, which has the virus protection our local Windows guru recommends. It is a business system and doesn’t get out much anyway.

The StorageMojo take
My sense is that the boffins in Mt. View tweaked something last week that started this. What makes a human-generated query look like a virus? Or a DoS attack? I’m stumped.

Comments and/or solutions welcome. Any thoughts?

Update: Ms. Mojo ran the virus/spyware/whatever software on her Windows machine and it located 17 suspicious files. Haven’t gotten the message since. Since Ms. Mojo is all business it gives me a new appreciation for just how vulnerable XP really is. Thanks to all who wrote in with suggestions.

Mac ZFS debate

October 15th, 2007 by Robin Harris in Architecture, Future Tech, Information Management

I’ve been a fan of ZFS since I researched it over a year ago. I’ve also been happy with the progress ZFS is making on OS X.

So it was a bit of a surprise when I saw (thanks Wes) that MacJournals, a developers’ web site, was all sideways about it.

A good conversation
Fortunately a former Mac file system developer, Drew Thaler, responded with Don’t be a ZFS Hater.

Another respected Mac developer, Michael Tsai, also responded with a thoughtful post.

The StorageMojo take
I follow the ZFS discussion on OpenSolaris, so I understand that the ZFS implementation has a ways to go. From a marketing perspective, ZFS or something like it is required if consumers are going to use computers as media centers for purchased content. Seeing a couple of thousand dollars worth of music, TV, movies and videos go poof! is a sure way to get tossed out of America’s living rooms.

I believe Apple developers have the Mojo to make ZFS use transparent for Mac customers. They certainly have the help of the Sun team and it is in the interest of both companies to make this work. Plus, don’t forget Apple’s “touchless” file system upgrade patent.

But MacJournals correctly points out that UFS was once thought - though without the level of support ZFS has enjoyed to date - to be the successor to HFS+ and that a similar fate may befall ZFS. While that is certainly a possibility - never say never around Steve Jobs - there are good business and marketing reasons for going forward with ZFS, regardless of what techies think. Apple will go forward with ZFS and make it the standard OS X file system within 2 years.

Comments welcome, as always.
Update: I’ve started editing comments on this post to keep them on topic and away from personalities. I regret not doing so sooner. Nonetheless the discussion is informative and if file systems interest you, well worth perusing.

Sun adds Lustre to supercomputing

September 26th, 2007 by Robin Harris in Clusters, Information Management

What about Sun’s acquisition of Cluster File Systems, Inc.?
Yawn. CFSI was going out of business. Sun bought the assets, not the company.

Good for CFSI employees
They get a paycheck from a solvent company. They may even get some sensible marketing. Hey, it could happen.

What is Lustre?
Arguably the highest-end parallel file system. At the Seattle Conference on Scalability, founder Peter Braam spoke about current 25,000 node Lustre clusters and plans to 10x that number in the next 5 years.
Update: It appears the Lustre.org and the Lustreusers.org sites are suspended. Hm-m-m? Update II: They are back up.

Cool, huh?

So why aren’t they rich?
CFSI was a tech playpen, not a company. Like Formula 1 racing. Instead of Ferrari, CFSI had the national labs backing them. Great stuff, except nobody else has the problems the national labs have, so it limits the market.

Lustre will be facing some serious competition from pNFS once it gets baked into Linux and other operating systems. The fast-growing commercial HPC market will eat pNFS clusters up. Lustre isn’t part of that.

The StorageMojo take
Sun bought a hook into a customer base that, when budgets are good, can be very profitable. They also bought a technical team that is very knowledgeable about fabric interconnects, which in the shift to cluster storage and grids will be a very good thing for Sun.

Comments welcome, as always. OK, Lustre proponents, tell me where I’m wrong.

CERN’s data corruption research

September 19th, 2007 by Robin Harris in Disk, Information Management

I was surprised at how many ZDnet readers reacted with disbelief to my recent Storage Bits series on data corruption (see How data gets lost, 50 ways to lose your data and How Microsoft puts your data at risk), claiming it had never happened to them.

Then I thought about it
What does data corruption look like to users? Does a window pop up with big red letters blaring “DATA CORRUPTION!!!” Nope, we get these “File not found” and other notices that could be - but who knows? - related to data corruption. Something goes badly wrong and you have to reinstall an application or the OS. But really, how prevalent is data corruption?

CERN does some research
That’s why I was delighted to see some statistics at last, reported in a recent paper titled Data Integrity by Bernd Panzer-Steindel of the CERN IT group.

Petabytes of on-disk data analyzed
At CERN, the world’s largest particle physics lab, several researchers have analyzed the creation and propagation of silent data corruption. CERN’s huge collider - built beneath Switzerland and France - will generate 15 thousand terabytes of data next year.

The experiments at CERN - high energy “shots” that create many terabytes of data in a few seconds - then require months of careful statistical analysis to find traces of rare and short-lived particles. Errors in the data could invalidate the results, so CERN scientists and engineers did a systematic analysis to find silent data corruption events.

The program
The analysis looked at data corruption at 3 levels:

  • Disk errors. They wrote a special 2 GB file to more than 3,000 nodes every 2 hours and read it back, checking for errors, for 5 weeks. They found 500 errors on 100 nodes.
    • Single bit errors. 10% of disk errors.
    • Sector (512 bytes) sized errors. 10% of disk errors.
    • 64 KB regions. 80% of disk errors. This one turned out to be a bug in WD disk firmware interacting with 3Ware controller cards which CERN fixed by updating the firmware in 3,000 drives.
  • RAID errors. They ran the verify command on 492 RAID systems each week for 4 weeks. The disks are spec’d at a Bit Error Rate of 1 in 10^14 bits read/written. The good news is that the observed BER was only about a third of the spec’d rate. The bad news is that in reading/writing 2.4 petabytes of data there were some 300 errors.
  • Memory errors. Good news: only 3 double-bit errors in 3 months on 1300 nodes. Bad news: according to the spec there shouldn’t have been any. Only double bit errors can’t be corrected.

All of these errors will corrupt user data. When they checked 8.7 TB of user data for corruption - 33,700 files - they found 22 corrupted files, or 1 in every 1500 files.
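The disk test above is a write-then-verify probe: write known data, read it back, and compare. A minimal Python sketch of the idea follows. The pattern generator, function names and file name are hypothetical, not CERN's actual tool, and the probe here is a few KB rather than 2 GB.

```python
import os
import tempfile

SECTOR = 512  # bytes, the sector size quoted in the list above


def make_pattern(size: int, seed: int = 0) -> bytes:
    """Deterministic test data (a made-up pattern, not the one CERN used)."""
    return bytes((i * 31 + seed) % 256 for i in range(size))


def write_probe(path: str, data: bytes) -> None:
    """Write the probe file and ask the OS to push it to the device."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())


def find_corruption(expected: bytes, actual: bytes):
    """Return (flipped_bits, damaged_sector_numbers) for two equal-length buffers."""
    flipped = 0
    bad_sectors = set()
    for i, (e, a) in enumerate(zip(expected, actual)):
        if e != a:
            flipped += bin(e ^ a).count("1")  # count the differing bits
            bad_sectors.add(i // SECTOR)
    return flipped, sorted(bad_sectors)


data = make_pattern(4 * SECTOR)
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "probe.bin")
    write_probe(path, data)
    with open(path, "rb") as f:
        readback = f.read()
print(find_corruption(data, readback))

# Simulate a single-bit error in the second sector to show what a hit looks like.
damaged = bytearray(readback)
damaged[600] ^= 0x01
print(find_corruption(data, bytes(damaged)))
```

One caveat: a read issued right after the write is usually served from the operating system's cache, not the platters, so a real probe has to wait, bypass the cache, or read from another node before the comparison says anything about the disk.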

The bottom line
CERN found an overall byte error rate of 1 in 3 * 10^7, a rate considerably higher than component specs like 1 in 10^14 or 1 in 10^12 would suggest. This isn’t sinister.

It’s the BER of each link in the chain from CPU to disk and back again, plus the fact that some traffic, such as transferring a byte from the network to a disk, requires 6 memory r/w operations. That really pumps up the data volume and with it the likelihood of encountering an error.

The cost of accuracy
Accuracy isn’t free. The CERN paper concludes that taking measures to improve accuracy

. . . will lead to a doubling of the original required IO performance on the disk servers and . . . an increase of the available CPU capacity on the disk servers (50% ?!). This will of course have an influence on the costing and sizing of the CERN computing facility.

The Storage Bits take
My system has 1 TB of data on it, so if the CERN numbers hold true for me I have 3 corrupt files. Not a big deal for most people today. But if the industry doesn’t fix silent data corruption the problem will get worse. In “Rules of thumb in data engineering” the late Jim Gray posited that everything on disk today will be in main memory in 10 years.

If that empirical relationship holds, my PC in 2017 will have a 1 TB main memory and a 200 TB disk store. And about 500 corrupt files. At that point everyone will see data corruption and the vendors will have to do something.
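For anyone who wants to check my arithmetic, here is a small sketch that scales the CERN sample (22 corrupt files in 8.7 TB and 33,700 files) to other data sizes. It assumes corruption scales linearly with the amount of data stored, which is my simplification; the function names are just for illustration.

```python
# Figures from the CERN user-data check quoted above.
CERN_TB_CHECKED = 8.7
CERN_FILES_CHECKED = 33_700
CERN_CORRUPT_FILES = 22


def files_per_corrupt_file() -> float:
    """How many files were checked for each corrupt one found."""
    return CERN_FILES_CHECKED / CERN_CORRUPT_FILES


def expected_corrupt_files(data_tb: float) -> float:
    """Linear extrapolation: corrupt files expected in `data_tb` terabytes."""
    return data_tb * CERN_CORRUPT_FILES / CERN_TB_CHECKED


today = expected_corrupt_files(1)      # about 2.5, which I rounded up to 3
in_2017 = expected_corrupt_files(200)  # about 506, which I rounded to 500
```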

So why not start fixing the problem now?

Comments welcome, of course.
Update: Peter Kelemen, one of the CERN researchers, kindly wrote in and pointed out that it is the disks that are rated at 10^14, not the RAID card. There are no specs for the RAID cards. I’ve corrected it above.

Our lackluster commodity file systems

August 9th, 2007 by Robin Harris in Enterprise, Information Management

I’ve been ranting about data loss on Storage Bits. Data loss makes me irate because I see regular folks who know nothing about computers struggling with the fallout and it is so unnecessary.

The stimulus was a fine PhD thesis, IRON File Systems (pdf), by Vijayan Prabhakaran, now of Microsoft Labs. It explores how commodity file systems corrupt data: he injected errors into ext3, ReiserFS, JFS, XFS and NTFS and then recorded their responses.

Dr. Prabhakaran built an error-injection framework that enabled him to control what kind of errors the file system would see so he could document how the FS handled them. These errors include:

  • Failure type: read or write? If read: latent sector fault or block corruption? Does the machine crash before or after certain block failures?
  • Block type: directory block; super block? Specific inode or block numbers could be specified as well.
  • Transient or permanent fault?
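To make those knobs concrete, here is a toy sketch of a fault-injecting block device. It is emphatically not the framework from the thesis: the class names, the in-memory “disk” and the bit-inverting corruption are all assumptions of mine, chosen only to show how failure type, target block and transient-vs-permanent can be dialed in.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 512


class LatentSectorError(IOError):
    """Raised when a read or write hits an injected sector fault."""


@dataclass
class Fault:
    block: int               # which block number to target
    op: str                  # "read" or "write"
    kind: str                # "fail" (I/O error) or "corrupt" (silently wrong data)
    transient: bool = False  # a transient fault fires once, then clears


@dataclass
class FaultyDisk:
    n_blocks: int
    faults: list = field(default_factory=list)
    blocks: dict = field(default_factory=dict)

    def _take_fault(self, block: int, op: str):
        """Return the fault matching this access, dropping it if transient."""
        for f in self.faults:
            if f.block == block and f.op == op:
                if f.transient:
                    self.faults.remove(f)  # safe: we return immediately
                return f
        return None

    def write(self, block: int, data: bytes) -> None:
        if len(data) != BLOCK_SIZE or not 0 <= block < self.n_blocks:
            raise ValueError("bad block number or block size")
        fault = self._take_fault(block, "write")
        if fault and fault.kind == "fail":
            raise LatentSectorError(f"write failed at block {block}")
        if fault and fault.kind == "corrupt":
            data = bytes(b ^ 0xFF for b in data)  # store garbage silently
        self.blocks[block] = data

    def read(self, block: int) -> bytes:
        fault = self._take_fault(block, "read")
        data = self.blocks.get(block, bytes(BLOCK_SIZE))
        if fault and fault.kind == "fail":
            raise LatentSectorError(f"read failed at block {block}")
        if fault and fault.kind == "corrupt":
            return bytes(b ^ 0xFF for b in data)  # return garbage silently
        return data
```

A test harness would mount the code under test on top of something like FaultyDisk, inject one fault at a time, and record whether the caller retries, reports an error, or carries on with bad data.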

Sure enough, he found a lot of bugs in the file systems, even though, due to its proprietary nature, he couldn’t get as deep into NTFS as the others.

From our analysis results, we find that the technology used by high-end systems (e.g., checksumming, disk scrubbing, and so on) has not filtered down to the realm of commodity file systems. Across all platforms, we find ad hoc failure handling and a great deal of illogical inconsistency in failure policy, often due to the diffusion of failure handling code through the kernel; such inconsistency leads to substantially different detection and recovery strategies under similar fault scenarios, resulting in unpredictable and often undesirable fault-handling strategies.

And

We also discover that most systems implement portions of their failure policy incorrectly; the presence of bugs in the implementations demonstrates the difficulty and complexity of correctly handling certain classes of disk failure. We observe little tolerance to transient failures; most file systems assume a single temporarily-inaccessible block indicates a fatal whole-disk failure. We show that none of the file systems can recover from partial disk failures, due to a lack of in-disk redundancy.

This is what the EMC Centera is running on. Feeling better?

As hardware gets more reliable, software is a bigger problem
Software is always buggy, and with Moore’s Law, we have more software at more levels of the storage stack. File systems need to be the enforcers of data integrity in the storage stack since only file systems know where every block is and what every block is supposed to have in it.
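Here is a minimal sketch of what “enforcer” could mean in practice: keep a checksum apart from each block and verify it on every read. This is an illustration only, with a made-up in-memory store and CRC-32 standing in for whatever checksum a real file system would use; it is not how any of the file systems named above work.

```python
import zlib


class CorruptBlockError(IOError):
    """Raised when a block's contents no longer match its stored checksum."""


class ChecksummedStore:
    """Toy block store that verifies a CRC-32 on every read."""

    def __init__(self) -> None:
        self.blocks: dict[int, bytes] = {}
        self.checksums: dict[int, int] = {}  # kept apart from the data itself

    def write(self, block_no: int, data: bytes) -> None:
        self.blocks[block_no] = data
        self.checksums[block_no] = zlib.crc32(data)

    def read(self, block_no: int) -> bytes:
        data = self.blocks[block_no]
        if zlib.crc32(data) != self.checksums[block_no]:
            # Detected instead of silently handed to the application.
            raise CorruptBlockError(f"checksum mismatch in block {block_no}")
        return data
```

Detection is only half the job. Recovery needs a second copy or parity somewhere, which this sketch leaves out.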

The marketing problem
From my small-town perch, working with computer naifs, I know that most folks have absolutely no idea if a problem is caused by a lame file system or not. So how do you make people care?

I don’t think you can. People don’t care about whether their car has a timing belt or a timing chain until they realize 2 things: first, it costs money to replace a belt; and second, timing chains don’t require replacement. Most folks will never put the two together.

All the vendor can do is add up all the features, like timing chains, electronic ignitions and platinum-tipped spark plugs and offer “no tune-ups for 100,000 miles.” People understand that, especially if they remember when a tune-up every 3,000 miles was common.

Sell the benefit, not the technology.

The StorageMojo take
One of the things I love about my other blog is that it exposes me to something closer to consumer thinking. There are folks who understand some things about the technology - such as “clean power is good” - but don’t get, say, why a file system should be concerned with disk drive problems. That is partly education and partly cognitive.

But I think I also see something else: an emotional need for storage confidence; an unwillingness to confront the idea that storage systems fail. At one level I get it. Paranoia is time-consuming and not very productive.

But unlike CPUs and networks, storage is all about persistence. For all its faults the industry cares deeply about that. How do we tap into the consumer’s concern for persistence in a way that spurs action rather than denial? I’m hoping Apple is coming up with some good ideas as they prepare to roll out Time Machine and ZFS.

Comments welcome, as always. I didn’t try to evaluate Vijayan’s architectural solution as that is beyond my competence. Somebody want to take a look at it and give us the pros and cons?

All (almost) Seattle Conference on Scalability videos now online

July 10th, 2007 by Robin Harris in Architecture, Clusters, Future Tech, Information Management

An alert reader sent this in as a comment this morning. Thank you!

As of Jul 10, 1:00am PDT, 10 of the talks have been published (including the Lustre and Verisign ones). Searching for “seattle conference on scalability” on google video seems to return most, but not all of them. Weird. Anyway here is a complete list of links:

Building a Scalable Resource Mgmt System for Grid Computing (Khalid Ahmed, Platform Computing)

Lustre File System (Peter Braam, Cluster File Systems)

Abstractions for Handling Large Datasets (Jeff Dean, Google)

Scalable Test Selection Using Source Code Deltas (Ryan Gerard, Symantec Corporation)

Lessons In Building Scalable Systems (Reza Behforooz, Google)

Using MapReduce on Large Geographic Datasets (Barry Brumitt, Google)

YouTube Scalability (Cuong Do Cuong, YouTube)

Scaling Google for Every User (Marissa Mayer, Google)

SCTP’s Reliability and Fault Tolerance (Brad Penoff, Mike Tsai, Alan Wagner, UBC)

VeriSign’s Global DNS Infrastructure (Scott Courtney, Pat Quaid, VeriSign)

I know how I’ll be spending an hour today.
I’m going to watch the YouTube talk, which was on at the same time as Amazon.

Still waiting for the Amazon talk. Hope it arrives soon. Even if it doesn’t you can read about it below.

Update: Dan Creswell reminded me that Amazon has a paper coming out in the first half of August. So maybe the video is waiting on that. I hope to review the paper once it ships.

Seattle Conference on Scalability videos

July 5th, 2007 by Robin Harris in Architecture, Clusters, Future Tech, Information Management

The wily Googlers fooled me
I thought the videos were supposed to be on YouTube - the video service they bought for $1.6 billion a few months ago.

But NO!
They’re on Google Video. I just figured that out.

The good news: better quality on Google Video.

The bad news: I don’t see either the YouTube or the Amazon presentations up, so they probably won’t be. They were on at the same time and I chose the Amazon presentation. Who would have thought that a Google subsidiary wouldn’t give permission to publish their talk at a Google-sponsored conference? It isn’t on YouTube either.

Weird. Update: The redoubtable Dan Creswell, who also blogged about the Amazon talk, says that they are just a bit slow getting them up. Marissa Mayer’s afternoon keynote is now up. So let’s wait and see. Patience, grasshopper.

Anyone who attended the YouTube session want to trade notes?

Here are the links:
This links to Barry Brumitt’s entertaining and informative presentation on using MapReduce on large geographic data sets.

This is Jeff Dean’s excellent talk about abstractions for handling large data sets, but don’t let the title fool you, it covers a lot of ground on Google infrastructure.

And this is Reza Behforooz’s talk about integrating GoogleTalk with two large existing services.

There is a fourth talk by the founder of Platform Computing on Building a Scalable Resource Mgmt System for Grid Computing. I attended the first few minutes until my ADD kicked in. If you watch it, send me anything interesting you hear.

Comments welcome.

Seattle Conference on Scalability, Pt. I

June 26th, 2007 by Robin Harris in Architecture, Clusters, Future Tech, Information Management

I survived Seattle’s “summer” weather
And the Google-sponsored Seattle Conference on Scalability. It was like spending 10 hours trying to drink from a fire hose. Great stuff.

I took notes on four of the sessions I attended. I would have taken more, but since Apple hasn’t shipped a notebook with a ten hour battery life I had to stop to recharge. It’s been so long since I wrote anything by hand that I can’t even read my handwriting any more.

This is a highly idiosyncratic account of the conference: I’m just talking about what I found interesting. Fortunately Google video’d the event and will put it up on YouTube. When I get the URL I’ll update this post.

Jeff Dean, senior architect at Google
Jeff is the architect of virtually every large scale system at Google. He kicked off the event with a keynote on scalability at Google. As I suspected, Google is looking for new ideas on scaling another 100x over the next few years. That would mean clusters of 500,000 to over 800,000 nodes - or at least cores.

Jeff noted that BigTable, Google’s storage system that runs on top of GFS, has about 500 cells, the largest of which is up to 3000 terabytes of data.

The benefits of massive scale
Jeff talked about the impact of scale on machine translation, which is a major effort inside Google. The goal is to enable someone to ask a question in Urdu and get access to relevant documents no matter what language they are written in, through machine translation of the query into many languages with machine translation back into Urdu.

The translation model is probabilistic rather than dictionary-based, so the more examples the system has to work with the better the translation. The MT team has found that translation accuracy increases 0.5% with each doubling of the training content. That means a *lot* of storage.

And a lot of I/O: over a million lookups per second. A lot of that is cached and it is still a lot of data.
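The doubling rule is worth a quick calculation to see how fast the storage bill grows. This sketch assumes the 0.5% is an additive gain in percentage points per doubling, which is my reading of Jeff’s remark and not something he spelled out; the function names are made up.

```python
import math

GAIN_PER_DOUBLING = 0.5  # percentage points per doubling, as I understood the talk


def accuracy_gain(growth_factor: float) -> float:
    """Percentage points gained when training content grows by `growth_factor`."""
    return GAIN_PER_DOUBLING * math.log2(growth_factor)


def growth_needed(gain_points: float) -> float:
    """How many times more training content is needed for `gain_points` of gain."""
    return 2 ** (gain_points / GAIN_PER_DOUBLING)


gain_from_1024x = accuracy_gain(1024)  # ten doublings
data_for_5_points = growth_needed(5)   # ten doublings again, so 1024x the content
```

Under that assumption a 5 point improvement takes ten doublings, or about a thousand times the training content.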

Today’s Google rack
Jeff showed a picture of the current Google datacenter rack, which appeared to consist of 20 mobo’s, each with two dual-core Intel processors for a total of 80 cores per rack. There is a 4U gap in the middle of the rack, which I assume has the DC power distribution unit. It looked very neat and tidy, unlike the pictures of Google’s early racks.

MapReduce
I’ve meant to write about MapReduce, but I couldn’t quite get a handle on it. Jeff spent a fair amount of time describing the advantages of MapReduce, so now I have that handle.

MapReduce is essentially a programming model that abstracts the messy details of programming a large cluster. The Map piece extracts the data that one wants to work on into essentially a big spreadsheet or table, while the Reduce piece massages the data into the final form. With this tool a program of 50 lines can put thousands of compute nodes to work.
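Here is a toy, single-process sketch of the Map and Reduce pieces, using the usual word-count example. It is not Google’s API: the function names and the little run_mapreduce driver are my own, and the real system spreads the map and reduce calls over many machines and handles failures, which this leaves out entirely.

```python
from collections import defaultdict


def map_words(doc_id: str, text: str):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in text.lower().split():
        yield word, 1


def reduce_counts(word: str, counts: list):
    """Reduce: collapse all the values seen for one key into a final result."""
    return word, sum(counts)


def run_mapreduce(inputs, mapper, reducer) -> dict:
    """Run mapper over every input, group values by key, then reduce each group."""
    groups = defaultdict(list)
    for key, value in inputs:
        for k, v in mapper(key, value):
            groups[k].append(v)  # the "shuffle" step: gather values per key
    return dict(reducer(k, vs) for k, vs in sorted(groups.items()))


docs = [("a", "the disk the tape"), ("b", "the disk")]
word_counts = run_mapreduce(docs, map_words, reduce_counts)
```

The programmer writes only the two small functions. Everything in run_mapreduce is the part the framework hides, which is why a short program can drive a very large cluster.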

Google’s scalability challenges
Google is pretty happy with their tools, but it is American to want something better. And what they’d like is a single global namespace so that data can be accessed from anywhere. So the scalability number I offered at the beginning of this post may be way low. Instead of scaling a single cluster 100x, Google would actually like to scale and interconnect their entire cluster population - which I estimate is now over 4 million cores - 100x.

The StorageMojo take
Wow! More tomorrow as I continue the report on the conference.

Comments welcome, as always.


