StorageMojo




Robin Harris    


SOHO backup that works: why is it so hard?

March 19th, 2008 by Robin Harris in Backup, Information Management

Moving to a small town in northern Arizona from Silicon Valley has enriched my perspective on many things, including how the industry develops products. The consensus is that if we take datacenter technology and put in enough defaults it will be “simple” enough for consumers. Wrong.

Memo to developers: it is ALL consumer IT
The consumerization of IT usually means the adoption by IT of high-volume consumer technologies. The PCI bus, Microsoft Windows, USB, x86, SATA disks and Wi-Fi all started in the consumer space and displaced more sophisticated and expensive IT.

But consumerization also means taking tech first developed for IT and making it easy enough for consumers. Ethernet LANs, symmetric multi-processing, external disk systems (well, really only Drobo) and what we used to call “office automation” software are now usable by non-geeks.

Pro vs amateur
Amateurs like GUIs. Pros like CLIs. Why do we have both on “enterprise” products? Because we are all amateurs - at something.

The third-shift guys are all amateurs. They may want to be “professional” but they aren’t now.

Backup: the highest failure rate in IT?
Who knows how good the numbers are. A 40% enterprise backup failure rate is frequently bandied about. Whatever the “real” number is, it isn’t good enough.

If “professionals” with “industrial strength” backup hardware and software can only achieve a 60% completion rate - a failing grade anywhere - why does it surprise us that only a tiny percentage of small office/home office people back up regularly?

And further, why do we assume that SOHOs will never back up? “Americans will never wear seatbelts.” “People will never recycle.” “SOHOs will never back up.”

Yet the record is clear. If you take an education and ease of use approach, people will change their behavior. They will wear seatbelts. They will recycle. They will even learn to deal with PITA child seats. And they will back up.

But not if it is presented as a “junior” enterprise backup. Make it easy and affordable. Mostly easy. And people will do it.

A couple of backup products that work
On ZDnet I reviewed Backupkey, a Windows backup product I could recommend to any small business here in the red-rock-strewn desert. Plug it in, hit “enter” twice, and all your valuable Windows data gets copied.

Did this simple, useful product come from Boston? Silicon Valley? Redmond? Denver? Nope. Charleston, South Carolina? Bingo!

I suspect Backupkey got built there because the developer actually knows small business people. Knows their frustration and their intolerance for stuff that doesn’t work as they think it should.

Most Windows backup software is simply dumbed-down “real” backup. Backup sets. Incrementals. Images. Bootable. Whatever. But non-IT folks don’t know those words or concepts. Why can’t it just work?

On a Mac both Carbon Copy Cloner and SuperDuper work great and are almost easy enough for complete idiots to use. Partial idiots only, please. Apple’s Time Machine, which I finally set up last night on a new 500 GB USB/eSATA drive, is totally easy. Mindless bliss.

The StorageMojo take
I cringe every time I hear the big companies proclaim a new focus on the SMB market. Usually it is some shrunk-down enterprise product with incentives for the channel.

But what doesn’t change is the thinking behind the product. The assumptions about the consumer (“like us, only dumber”) and about the problem they are trying to solve rarely get the kind of rethink that went into Time Machine.

But the logic is inescapable: the more pervasive IT becomes, the more the technology must adapt to people. Backupkey does that for low-end Windows backup. Time Machine does that for Mac OS X. Who, and what, is next?

Comments welcome, as always.

StorageMojo’s favorite FAST 08 paper

March 14th, 2008 by Robin Harris in Architecture, Backup, Disk

It didn’t win Best Paper honors at FAST 08 (IIRC that went to An Analysis of Latent Sector Errors in Disk Drives; the link is to last month’s StorageMojo review of that excellent paper), but I really like the thinking behind Pergamum: Replacing Tape with Energy Efficient, Reliable, Disk-Based Archival Storage.

Written by Mark W. Storer, Kevin M. Greenan, Ethan L. Miller (UC Santa Cruz) and Kaladhar Voruganti (NetApp), the paper discusses a prototype that

. . . is a distributed network of intelligent, disk-based, storage appliances that stores data reliably and energy-efficiently. While existing MAID systems keep disks idle to save energy, Pergamum adds NVRAM at each node to store data signatures, metadata, and other small items, allowing deferred writes, metadata requests and inter-disk data verification to be performed while the disk is powered off.

They call the appliances tomes.

Tape: where data goes to die
One of tape’s big advantages is that it uses no power at rest. Any disk-based tape replacement will have to come as close to the same ideal.

Each tome uses a single hard drive and an ARM-based processor board with a NIC and NVRAM. Total power use when powered up is about 11.5 watts, less than a 15k RPM FC drive. With tighter code, a slower drive and more integration, I’d bet they could cut that in half.

The single disk drive means that tomes must be used in groups to enable distributed RAID techniques and exchange of algebraic signatures to ensure inter-disk recovery. The paper goes into those techniques in detail.
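What makes signature exchange useful is linearity. Here is a minimal Python sketch, assuming the textbook definition of an algebraic signature: a block’s bytes are treated as polynomial coefficients over GF(2^8) and evaluated at a fixed element alpha. The field polynomial 0x11D and alpha = 2 are my illustrative choices, not necessarily Pergamum’s, and a one-byte signature is far shorter than a real system would use. Because the signature of an XOR parity block equals the XOR of the data blocks’ signatures, tomes can check parity consistency by swapping a few bytes instead of whole blocks.

```python
# Sketch: algebraic signatures over GF(2^8). Illustrative parameters only,
# not Pergamum's actual code or field choices.

GF_POLY = 0x11D  # assumed primitive polynomial x^8 + x^4 + x^3 + x^2 + 1
ALPHA = 2        # assumed generator element


def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) with shift-and-XOR (carry-less) arithmetic."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:      # reduce when the product overflows 8 bits
            a ^= GF_POLY
    return result


def signature(block: bytes) -> int:
    """sig = XOR over i of block[i] * ALPHA**i (addition in GF(2^8) is XOR)."""
    sig = 0
    power = 1  # ALPHA**0
    for byte in block:
        sig ^= gf_mul(byte, power)
        power = gf_mul(power, ALPHA)
    return sig


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equal-length blocks, i.e. simple RAID-5-style parity."""
    out = bytearray(len(blocks[0]))
    for blk in blocks:
        for i, byte in enumerate(blk):
            out[i] ^= byte
    return bytes(out)


# Three equal-length data blocks, as if held on three different tomes.
d1 = b"tome one holds this data"
d2 = b"tome two holds that data"
d3 = b"tome six holds more data"
parity = xor_blocks(d1, d2, d3)

# Linearity: the parity tome's signature can be checked against the data
# tomes' signatures without moving any data blocks.
print(signature(parity) == signature(d1) ^ signature(d2) ^ signature(d3))  # True

# A single corrupted byte changes the signature.
corrupted = bytearray(d2)
corrupted[5] ^= 0x01
print(signature(bytes(corrupted)) != signature(d2))  # True
```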

NVRAM

The purpose of the NVRAM is to provide low-power, persistent storage; operations such as metadata searches and signature requests do not require the unit’s drive to be spun up.

. . . the NVRAM primarily holds metadata such as algebraic signatures and index information, flash writes are relatively rare; flash writes coincide with disk writes.
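A toy model shows why keeping metadata in NVRAM saves power. This is my own sketch, not the paper’s design: the `Tome` class and its methods are hypothetical, and SHA-256 stands in for the paper’s algebraic signatures. Writes spin the drive up and update the NVRAM index at the same time; signature and metadata queries are answered from NVRAM alone.

```python
import hashlib


class Tome:
    """Toy model of one tome: a drive that is usually off, plus NVRAM metadata."""

    def __init__(self) -> None:
        self.nvram_index: dict[str, str] = {}  # block id -> signature, kept in NVRAM
        self._disk: dict[str, bytes] = {}      # stands in for the (usually off) drive
        self.spin_ups = 0                      # times the drive had to power up

    def _spin_up(self) -> None:
        self.spin_ups += 1

    def store(self, block_id: str, data: bytes) -> None:
        # Writing data needs the disk; the signature goes to NVRAM in the same
        # operation, so flash writes coincide with disk writes.
        self._spin_up()
        self._disk[block_id] = data
        self.nvram_index[block_id] = hashlib.sha256(data).hexdigest()

    def signature(self, block_id: str) -> str:
        # Served entirely from NVRAM: the drive stays powered off.
        return self.nvram_index[block_id]

    def has_block(self, block_id: str) -> bool:
        # Metadata lookup, also answered without the drive.
        return block_id in self.nvram_index

    def read(self, block_id: str) -> bytes:
        # Only an actual data read needs the drive again.
        self._spin_up()
        return self._disk[block_id]


tome = Tome()
tome.store("blk-0", b"archived bytes")
for _ in range(1000):
    tome.signature("blk-0")  # e.g. a peer asking for verification data
print(tome.spin_ups)         # 1: only the original write touched the disk
```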

The Ethernet interconnect is important - by using cheap unmanaged switches for fan-out, high aggregate bandwidth, exceeding that of current tape libraries, is easily and inexpensively achieved. The use of power-over-Ethernet would further reduce costs, especially if the system used 4200 RPM drives.

The StorageMojo take
Most of the disk vs tape discussions look at the disk device vs tape cartridge cost issue - and they aren’t that different even today. But the tape library market is a $4-5 billion market. A disk-based alternative to slow tape libraries could take a big chunk of that.

Further, this design could be integrated into a single disk controller board, creating a disk with a single Ethernet port and incredible packaging and manufacturing economies.

If Seagate were smart they’d jump on this. This is a major opportunity to drive another significant consumer of disk drive units - without encroaching on existing OEM customer businesses. That doesn’t happen very often.

Comments welcome, as always. Pergamum was an ancient Greek city known for its sizable library, second only to the library of Alexandria.

FastMail fights data corruption

December 19th, 2007 by Robin Harris in Backup, Enterprise

Email is the largest personal database for most people. My gigabyte of email is easy to search and contains contacts, documents, notes and the record of many relationships. I back it up both locally and remotely.

But how do I know it isn’t corrupted?

FastMail’s email data protection
FastMail is, I think [guys, how about a “what is FastMail” paragraph?], an email hosting provider. In a recent blog post someone there explains how FastMail protects user data from corruption:

. . . we ensure that as soon as an email is delivered to a mailbox, a SHA-1 checksum of that email is generated and stored in the email index.

When the email is replicated, the email content and the checksum are sent separately. We then generate the checksum on the replicated email content and ensure that it matches the original checksum to see that the email was replicated correctly.

We also repeat this procedure when the email is backed up, ensuring that the backup of the email is correct.

We also run a regular check process that takes blocks of emails and recomputes their checksum to see it matches what is in the index. If there’s any issues, we’re alerted and can find which of the master, replica or backup email are correct and can correct the problem.
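The procedure quoted above is easy to sketch. This is a minimal illustration assuming an in-memory index and dict-based stores; `deliver`, `replicate` and `scrub` are hypothetical names, not FastMail’s code.

```python
import hashlib


def sha1_hex(message: bytes) -> str:
    return hashlib.sha1(message).hexdigest()


def deliver(index: dict, store: dict, msg_id: str, message: bytes) -> None:
    """On delivery, store the message and record its SHA-1 in the index."""
    store[msg_id] = message
    index[msg_id] = sha1_hex(message)


def replicate(index: dict, source: dict, target: dict, msg_id: str) -> None:
    """Copy the content, recompute the checksum on the copy, and compare it
    with the checksum held in the index (which travels separately)."""
    copy = bytes(source[msg_id])
    if sha1_hex(copy) != index[msg_id]:
        raise ValueError(f"replication of {msg_id} failed checksum")
    target[msg_id] = copy


def scrub(index: dict, stores: dict) -> list:
    """Periodic check: recompute every indexed message in every store
    (master, replica, backup) and list (store_name, msg_id) mismatches."""
    bad = []
    for name, store in stores.items():
        for msg_id, expected in index.items():
            data = store.get(msg_id)
            if data is None or sha1_hex(data) != expected:
                bad.append((name, msg_id))
    return bad


index, master, replica, backup = {}, {}, {}, {}
deliver(index, master, "m1", b"Subject: hi\r\n\r\nhello")
replicate(index, master, replica, "m1")
replicate(index, master, backup, "m1")
replica["m1"] = b"Subject: hi\r\n\r\nhellp"  # simulate silent corruption
print(scrub(index, {"master": master, "replica": replica, "backup": backup}))
# [('replica', 'm1')]
```

Because the index names the good checksum, the scrub tells you which copy is wrong, and the master or backup can be used to repair the replica.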

What do other email systems do?
To my untutored eye, this seems like comprehensive protection against data corruption.

Two questions:

  • Is it?
  • What do other email systems do?

Gmail presumably relies upon the triple redundancy of GFS to ensure data integrity. What do Exchange and Sendmail do? Are any of them demonstrably better?

The StorageMojo take
Email is looming ever-larger as a personal information management system. As volume and attachment size continue to grow, multi-gigabyte mailboxes will become the norm, if they aren’t already.

Are the data protection measures in email systems up to the task?

Update: Alert reader Nathan (more alert than me, anyway) found the FastMail “About” page. They are another hosted email provider. I updated the post to reflect that. Thanks, Nathan!

Comments invited, of course.

Aptare backup management & capacity planning

December 18th, 2007 by Robin Harris in Backup, Enterprise

I’m a judge for the Codie awards this year, so I’m getting to see some storage software that I might not otherwise. Today, Rick Clark, CEO of Aptare, demo’d their Backup Manager and Capacity Manager.

I was impressed.

Oh no, has Robin gone soft?
Maybe. But Aptare has 3 important features:

  • Agentless architecture. They go direct to HDS and EMC arrays to get the info they need. More arrays coming.
  • Deep reporting. Application databases, LUNs, array, allotted and consumed.
  • Flexible GUI. Drag and drop the data you need to create a custom dashboard, like a web 2.0 mashup.

A custom Aptare GUI

Managed from a browser
Capacity and backup management use Aptare’s StorageConsole Platform. The company plans further modules. Next up, a replication manager.

The StorageMojo take
This is the first backup manager and capacity manager I’ve seen that actually feels easy enough for non-storage geeks to use. That is important because as capacity continues to explode, some storage management tasks need to get pushed out to application owners.

If you are in the market for either backup or capacity management or aren’t fully satisfied with your existing tools, you owe it to yourself to get the Aptare demo.

Comments welcome, as always.

EMC’s Maui and everybody else

December 12th, 2007 by Robin Harris in Backup, Clusters, Enterprise, Future Tech

For some reason I volunteered to write something about vendors after the Wikibon con call today. That follows.

Vendors: responding to EMC’s cluster storage initiative

Context:
EMC’s support of cluster storage for archiving and backup will legitimize the technology. Vendors with competitive products have a window of opportunity to position themselves as a superior alternative. Make no mistake: EMC plans to own this market and will commit significant resources to the effort.
EMC’s market entry will be hobbled by several problems that competitors can exploit.

  • Immature software: limitations, bugs and the eval cycle that implies
  • Maintaining a bright line positioning between Hulk/Maui and Symms
  • 60% gross margin requirement

EMC will be NDA’ing strategic customers starting mid-January to build major sales to reference at announcement. Smart customers will be calling other vendors, including the smaller, innovative ones, for perspective. Luck favors the prepared.

Strategy:
IBM, Hitachi, HP, NetApp
IBM Global Services should be open to reselling/integrating suitable substitutes. There are efforts within IBM’s storage group to create a scalable, commodity storage infrastructure, but the chasm between IBM’s brilliant technologists and IBM marketing makes success problematic.

Hitachi doesn’t seem to be doing anything in this area. They will be looking at an acquisition and will take their time.

HP’s Polyserve acquisition may convince them that they have the cluster thing under control, but Polyserve isn’t competitive with EMC’s initiative. HP has a deep well of technology expertise from the DEC cluster products. Expect a cluster acquisition in 2008.

NetApp is vulnerable. ONTAP GX has missed the cluster market and their controller-based architecture has all the cost disadvantages of traditional arrays without the flexibility of clustering. Putting ONTAP 7G on commodity hardware bricks with software “mortar” - as Google does with GFS - would preserve their significant advantages with WAFL at a lower $/GB.

New competitors
Now is the time to get serious about what your product really does and what its appeal is to customers. Focus is critical to building a defensible position that can be used to win F500 business in areas where EMC is less competitive. There is also an opportunity to shift the terms of the customer debate. This market is still fluid and customers don’t have a clear mental map of the terrain. Smart, focused marketing can take advantage of that.

Action Item:
Small/new vendors: if you want to be acquired, now is the time to be shopping yourself to the big guys. If you want to build a big business, get your marketing focused on verticals and business justification.

Big vendors: start shopping now. EMC wants your scalp so you’ll want to be well-armed.

All: there is a lot more to know about Hulk/Maui. A focused competitive analysis effort will pay dividends.

Update: The audio is available here. If you are wondering if I mentioned your company, I probably didn’t.

Comments welcome, of course.

The Hulk goes Hawaiian

November 15th, 2007 by Robin Harris in Backup, Clusters, Enterprise

At a press event this week, Joe Tucci let slip, on purpose, that EMC will be coming out with a cluster storage system for backup and archive.

Hulk is the code name for the hardware. Maui is the software. Expect to see large green guys in grass skirts at the fall SNW. The subtext: think Big Green for EMC storage clusters. Hey, that sales force doesn’t come cheap!

Kudos to EMC
Storage clusters are the coming thing for the 85-90% of all data that is unstructured. EMC would like to sell Symms for that data, but they’ve twigged to the fact that that won’t happen. Smart.

Surf’s up!
StorageMojo’s most regular readers are the bright folks in EMC’s competitive analysis group. Some of them are fans, but it isn’t a good idea to admit it. How much fun is it to read analysts repeating what you’ve paid them to say?

Pipeline
EMC’s challenge is to move to cluster storage while maintaining the margins that Wall Street loves. IBM is somnolent, HP complacent and Sun, well, Sun is a wild card.

The real wild card is the yet to be announced, VC-backed, cruise-missile into the heart of the whole bloated enterprise storage market. Someone who sees the advantage of turning a $30 billion array market into a $3 billion storage cluster market, as long as they get 80% of it.

The StorageMojo take
EMC has no choice, but my hat is off to Mark Lewis for getting Joe Tucci to recognize that fact. The desiccated corpses of once-great Boston minicomputer companies should be an object lesson. Surf or drown, guys.

Now if they can just lose Egan’s frat-boy aggression and chip-on-the-shoulder attitude, they may find the admiration they crave.

Comments welcome.

EMC buys leader in telekinetic security

September 25th, 2007 by Robin Harris in Backup, Clusters, Enterprise

Time to get serious, guys
TechCrunch and VNUnet are reporting that EMC is buying online backup provider Mozy for $76 million. Neither EMC nor Mozy has issued any confirmation, so who knows if it is real. But let’s assume it is.

Puzzled?
This is a good fit for EMC on several levels.

  • SMB branding. For reasons I still don’t get, Dell has happily handed EMC tens of millions of dollars worth of SMB branding that EMC could never have bought on its own. Mozy gives EMC a nice brand extension for servicing that market.
  • F1000 notebook backup. Mozy’s huge GE deal is just the first of many once EMC’s sales force gets its marching orders.
  • Margin enhancement. Mozy charges a flat $50/yr for unlimited personal backup. Storage prices fall every year. Margins grow automatically.
  • Grid storage. Mozy wasn’t using Symm’s and I doubt they’ll start now. EMC is buying expertise in the new storage paradigm.
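The margin-enhancement point is compounding arithmetic. Here is a sketch with purely hypothetical inputs, since I don’t know Mozy’s real per-user storage or costs: a subscriber storing 50 GB, a fully loaded storage cost of $0.40 per GB per year that falls 30% a year, and the flat $50 price. The optional data-growth parameter covers the catch: if each user’s data grows fast enough, the gain shrinks or reverses.

```python
def yearly_margins(price, gb_per_user, cost_per_gb, annual_decline, years,
                   annual_data_growth=0.0):
    """Gross margin per subscriber for each year at a flat price.

    Storage cost per GB falls by `annual_decline` each year; the data each
    subscriber stores grows by `annual_data_growth` each year.
    """
    margins = []
    for year in range(years):
        gb = gb_per_user * (1 + annual_data_growth) ** year
        cost = gb * cost_per_gb * (1 - annual_decline) ** year
        margins.append((price - cost) / price)
    return margins


# Hypothetical inputs, not Mozy's actual numbers.
for year, margin in enumerate(yearly_margins(50.0, 50, 0.40, 0.30, 4)):
    print(f"year {year}: {margin:.1%}")
# year 0: 60.0%, year 1: 72.0%, year 2: 80.4%, year 3: 86.3%
```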

The StorageMojo take
IBM, HP, NetApp: listen up. EMC is definitely moving into storage clusters as commercial products. If you guys don’t want to see a three-peat of the mid-90’s, when EMC rolled IBM big time, it is time to get serious.

EMC’s storage cluster strategy is hardly bulletproof. Yet playing catchup isn’t where you want to be when this party starts. If you liked EMC in the 90’s you’ll love them in the ‘teens!

Comments welcome, of course. BTW, Mozy’s beta for Mac is still buggy. The Windows client is much better. And yes, Mozy did - once - claim they could protect your data from “. . . potential telekinetic security breaches.”

When *haven’t* we had home storage?

July 20th, 2007 by Robin Harris in Backup, Future Tech, Off-Topic, SOHO/SMB

In a recent post, “A Terabyte in the home?”, Hitachi’s CTO, the redoubtable Hu Yoshida, writes:

I don’t believe there will be a market for home storage units. I believe internet service providers will provide the storage and data management for our personal data. They will provide it as a service which we will be able to access whenever and where ever we want. Instead of trusting my data to a low cost home storage unit, I believe an ISP will be able to store it more reliably and cost effectively on a large enterprise class storage system which they can leverage across many thousands of users.

This world view is so at odds with the reality I see that it is hard to know where to start. But I’ll try.

Home storage already has a long history
People have always stored images, and later, text, in their homes. From the cave paintings of Lascaux to the wood block prints of Hokusai, people have always enjoyed having images of personal meaning in their homes. Television brought moving images to the home for the first time and later VHS and now DVD allow people to create libraries of moving images.

With the rise of literacy the home library became possible. Among those who could afford it the library became not only a storage area but a shelter from the cares of the world. The 21st century analogue is the home theater.

With the rise of Blu-Ray, it won’t be hard for an average family to acquire 2-3 TB of favorite programming. Especially families with children. People have always collected content and I don’t think that fundamental urge is going to abate any time soon. Today’s content just happens to be in a digital format.

Bandwidth and storage aren’t as fungible as Hu assumes
Home bandwidth is too low to support the kind of easy access to large files the home user wants: home video, graphics, games, movies. More importantly, many people, perhaps most, are visual thinkers. They need to see things to recall them. Thus collecting content in the home serves two purposes: high bandwidth and stimulating memory.
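A back-of-the-envelope calculation shows the gap. The inputs are my assumptions for illustration: a 25 GB Blu-ray-sized file, a 6 Mbit/s home broadband downlink, and a local drive that sustains 50 MB/s. Protocol overhead is ignored, which flatters the network.

```python
def transfer_hours(size_bytes: float, bits_per_second: float) -> float:
    """Time to move size_bytes over a link of the given raw bit rate."""
    return size_bytes * 8 / bits_per_second / 3600


GB = 10**9
movie = 25 * GB  # assumed Blu-ray-sized file

broadband = transfer_hours(movie, 6e6)        # assumed 6 Mbit/s downlink
local_disk = transfer_hours(movie, 50e6 * 8)  # assumed 50 MB/s local drive

print(f"over broadband: {broadband:.1f} h")            # 9.3 h
print(f"from local disk: {local_disk * 60:.1f} min")   # 8.3 min
```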

Now the album art images that iTunes displays are a pretty good substitute, especially if you are old enough to remember the LP version. Yet storing even the images locally has many advantages over placing them on the network.

No one is storing such content on a “large enterprise class storage system”
I guess Hu isn’t a regular StorageMojo reader or he’d know this already. Storage clusters and low(er) cost modular systems own the ISP storage business. No way are Tagmas or Symms ever going to compete for this business.

With all due respect, Hu needs a reality check on this part of the vision. I know some of the folks at EMC are ahead of him, and by extension Hitachi, on this point.

The StorageMojo take
I agree with Hu that all other things being equal, people would rather not have a storage array in their house. The point is they never will. Consumer-grade storage systems that work a lot better than today’s storage arrays will arrive, such as Drobo.

Those 1 TB disks will also be popular, combined with off-site backup for the truly paranoid, as people embrace the concept of Lots Of Copies Keep Stuff Safe. People like having stuff around where they can see and touch it. Home data storage is no different.

Comments welcome, as always. I discovered that I’ve been taking a break from blogging lately without planning to. I’ve discovered some new topics, so stay tuned.

Home RAID vs backup?

May 30th, 2007 by Robin Harris in Backup, SOHO/SMB

I got into it today on ZDnet with one of the other bloggers, George Ou, who published Why dumb-downed no-RAID storage is bad for consumers. As I believe that RAID is an idea whose time is coming to a close, I responded with Why home RAID won’t fly.

So far, ZDnet readers seem more persuaded by George
I’m in my trailer, sulking. How could they?

The exchange has sharpened my thinking, as George and some other folks came back with some good comments, and a couple of the more perceptive - obviously - folks came to my defense.

While I like the Drobo storage robot concept and Geoff Barrall personally, I’ll be very interested to see what kind of market they develop. Which is marketing-speak for “I’m dubious.”

Why?
The secular trend in computers is that technologies scale up from consumers - not scale down from the enterprise. But so what? The real question is why.

Because consumer stuff is cheap and enterprise stuff is expensive. Because one is high volume and the other low volume. Because volume enables low-cost experimentation and improvement. Because building cheap stuff usually forces people to focus on what really works for customers who won’t open a manual.

Home RAID? I don’t think so
Why not? Let me count the ways:

  1. Complexity: RAID fails ugly. Pick the wrong drive to pull or copy and your protected data is no more. And because an array holds more drives, a RAID system sees drive failures much more often than a single disk does.
  2. Completeness: while RAID solves some problems, it isn’t a substitute for a backup. Getting customers to understand that is hard. Not all the ZDnet readers get it.
  3. Cost: HW RAID means a controller and a chassis, a lot of money before you buy the first disk. SW RAID is cheaper (with Intel’s ICH8 chip, almost free), but consumers still need to understand why they are buying a second drive and not getting more capacity.
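The “more failures” point in item 1 is simple probability. A sketch, assuming drives fail independently with the same annual probability p (an idealization; real drive failures are correlated), and using p = 2% purely as an illustrative figure: an array of n drives sees at least one failure with probability 1 - (1 - p)^n.

```python
def p_any_failure(p_drive: float, n_drives: int) -> float:
    """Probability that at least one of n independent drives fails in a year."""
    return 1 - (1 - p_drive) ** n_drives


for n in (1, 2, 4, 8):
    print(n, f"{p_any_failure(0.02, n):.1%}")
# 1 2.0%
# 2 4.0%
# 4 7.8%
# 8 14.9%
```

A two-drive mirror roughly doubles the chance that the owner has to deal with a failure, which is exactly when the “pick the wrong drive” risk comes into play.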

The vast fetid swamp of consumer ignorance
In my small town I often help people with computer problems. Often these are small business people who’ve been using computers for years. What I’ve found is that these people don’t have the faintest idea how their computer works or how the components work together. To most people computers are magick.

Case in point: a professional photographer lives across the street. Two Macs, scanner, several photo quality printers, a couple of fancy digital SLRs. One Mac does color correction. The other is her main machine. Photoshop and a bunch of other image processing software that she knows how to use. Pretty sharp lady.

And she doesn’t know the difference between disk and RAM. It is all “memory” to her. She never added RAM to the skimpy amount Apple provided, so her disks are thrashed all day. She’d let the disks fill up, not realizing that she needs at least 10% free space just for the OS to use. A few hundred megabytes sounds like a lot to her.

This is the person you are going to sell RAID to? She’s your target market, with hundreds of gigabytes of valuable digital assets to protect. How would you start the conversation?

She does understand the value and process of making copies, which she would still need to do even if she bought your RAID gizmo. So how do you explain your value-add?

The StorageMojo take
Home RAID for the masses is an uphill battle. Backup is the battle the industry can win. The issue is what kind. Across the net to Mozy, Carbonite or some more fully featured option? Local backup to a DAS hard drive or to a simple USB-attached NAS drive? Those “one-touch” Maxtor drives?

Comments welcome, of course. Leaving for Boston today. If you’re in the neighborhood this weekend, send me an email and we’ll do coffee. I’ll be staying at Copley Square. Moderation may be a little slower than usual, but moderate I will.

Finally, some drive model failure numbers

April 5th, 2007 by Robin Harris in Backup, Enterprise

I wish I could report that one of the big guys stepped up . . .
Instead, Jon Bach of Puget Custom Computers in suburban Seattle offered his company’s data on his blog post titled Why RAID is (usually) a Terrible Idea. It is a good post and well worth reading in its entirety.

I liked it so much that I blogged about it on my ZDnet blog yesterday. Here I’m focusing on the drive numbers.

What are these numbers?
PCC sells hundreds of desktop systems per month and they track all failures and trouble tickets. Jon’s numbers include ALL drive failures, including those caused by mishandling, like when a WD Raptor got dropped on the warehouse floor.

Here is the data I have for our hard drive sales in the last year, where we have sold at least 200 units:

Hard Drive Model | # of Units | Failure %
Seagate Barracuda 7200.9 250GB SATAII | 280 | 3.21%
Seagate SATA Barracuda 80GB | 271 | 2.58%
Western Digital SATA Raptor 74GB | 592 | 2.03%
Seagate Barracuda 7200.10 320GB SATAII | 202 | 1.98%
Seagate Barracuda 7200.9 160GB SATAII | 265 | 1.89%
Seagate Barracuda 7200.9 80GB SATAII | 403 | 1.74%
Western Digital ATA100 80.0GB WD800JB | 290 | 1.72%
Western Digital SATA Raptor 150GB | 278 | 1.44%
Total # of drives | 2581 | 2.05%

These are all first year numbers. And I think they show how reliable disk drives as a group are. Make no mistake: disk drives are probably the greatest IT bargain out there. Drive companies have done a great job making massive storage affordable.

I added the total
Maybe one of my statistically smarter readers can do more with these numbers. As I look at the numbers, though, I see a mix of desktop and server drives with no particular pattern - a result that agrees with Bianca Schroeder’s paper from FAST ’07. Can readers reach any other conclusions?

Let us all know in the comments.
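For anyone who wants to start, here is a sketch that reconstructs approximate failure counts from the table (units × failure %, rounded, so the counts are my inference rather than Jon’s raw data) and puts a 95% Wilson score interval around each model’s rate. With only 200-600 units per model the intervals are wide; even the best and worst models’ intervals overlap, which fits the “no particular pattern” reading.

```python
from math import sqrt

# (model, units sold, first-year failure %) from the table above
DRIVES = [
    ("Seagate Barracuda 7200.9 250GB SATAII", 280, 3.21),
    ("Seagate SATA Barracuda 80GB", 271, 2.58),
    ("Western Digital SATA Raptor 74GB", 592, 2.03),
    ("Seagate Barracuda 7200.10 320GB SATAII", 202, 1.98),
    ("Seagate Barracuda 7200.9 160GB SATAII", 265, 1.89),
    ("Seagate Barracuda 7200.9 80GB SATAII", 403, 1.74),
    ("Western Digital ATA100 80.0GB WD800JB", 290, 1.72),
    ("Western Digital SATA Raptor 150GB", 278, 1.44),
]


def wilson_interval(failures: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    p = failures / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


total_units = total_failures = 0
for model, units, pct in DRIVES:
    failures = round(units * pct / 100)  # inferred count, not raw data
    total_units += units
    total_failures += failures
    lo, hi = wilson_interval(failures, units)
    print(f"{model:42} {failures:3}/{units:<4} {lo:6.2%} - {hi:6.2%}")

print(f"Total: {total_failures}/{total_units} = {total_failures / total_units:.2%}")
# Total: 53/2581 = 2.05%
```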

The StorageMojo take
It isn’t clear to me why folks who have the data about drive model reliability don’t want to publish it. Maybe they don’t want the hassle of customers requesting specific drives. Maybe all the drive and array makers do back room deals where they take volumes of not-as-good drives for knock-down prices and shovel them off to less-favored customers. Who knows?

Perhaps StorageMojo readers who have businesses like Jon’s or who work in corporate IT with access to failure data could pass it on to me. I’ll total them up and publish them. If a vendor doesn’t like the numbers then they can send me their own.

From a statistical perspective that’s a little rough, but we have to start somewhere.

Comments, as always, welcome. Moderation turned on to keep spam at bay.

Data replication from 33 AD to 405 AD

March 30th, 2007 by Robin Harris in Backup, Off-Topic, Security & Public Policy

A distinguished scholar published a book last year about data replication in the Greek-speaking ancient world. He examined a group of texts and how the technology and context of the times affected data integrity.

He looked at (I think he had some help) over 5700 ancient source texts, all of them at least copies of copies of copies, to find textual variants. There are over 250,000 variants, or more than one for every word of the texts. Makes floppies look like graven stone.

Boy, do we have it good!
We may complain about migrating data from one Windows machine to another, but the ancients had it far worse. Data replication technology was a guy looking at a text and copying it. No printing presses, not even punchcards. Primitive in the extreme.

The UI really stunk!
The standard scribal technique was to write without lifting pen from parchment, papyrus, vellum or whatever. No gaps between the words. No punctuation. TheywouldjustwriteandwriteuntilwellIdonotknowwhentheywouldstop. And they wouldn’t have that period there. Needless to say, no paragraphs, headers or hypertext links.

No wonder people couldn’t read. With text like that who would want to?

Reading a Turing machine tape, except in Greek
People make mistakes. Bored people make mistakes. Poorly trained people make more mistakes. Usually the folks copying these texts were amateurs, making a copy for themselves or for friends, maybe at the end of a long day. The words all running together, many of the words looking alike. Some common error patterns emerged, such as:

  • Mistaking one letter or word for another
  • Eye-skips, where the copyist skipped a line
  • Dictation errors, where one person was reading to the copyist and a word was substituted for one it sounded like

Mistakes on purpose
People, being people, often have opinions about a text, and sometimes the copyist would change the text to, in their opinion, correct or improve the text. Much of the book is taken up with analyzing where and why these changes were introduced, using rules developed by scholars over several hundred years to attempt to reconstruct the original text.

AFAIK no other ancient text has received such rigorous scholarly treatment. I find the techniques fascinating, even if they result in less certainty, rather than more, about the original, long lost, text.

Modern day counterparts
Our ability to store massive amounts of data has a downside: we can store massive amounts of error as well. Credit reports have high error rates that can cost people real money. America’s infamous “no-fly” list has snagged Senator Ted Kennedy and the wife of another Senator. To err is human. To err and preserve it in computer files demonic.

Oh, and the text is:
The New Testament. The book on textual analysis is Bart D. Ehrman’s Misquoting Jesus, The Story Behind Who Changed the Bible and Why. Bart is chairman of the religious studies department at UNC. A fascinating book, aimed at laypeople, on New Testament textual analysis. I highly recommend it.

The StorageMojo take
I’m not making or asking for any comment on the religious implications of Bart’s textual analysis of the New Testament. What is valuable, IMHO, is the awareness that information gets altered in many ways for many reasons.

Even in the age of bit-perfect digital copies, we also have tools that allow us to edit, alter and even fake digital information. One of the highest purposes of education is to foster the ability to evaluate information independently of supposed authority, provenance or reliability. I don’t think that will ever change no matter what technological marvels we develop.

Comments welcome, of course. I haven’t been writing as frequently as I would like on StorageMojo due partly to travel and to other work, including my new blog on ZDnet. I plan to keep up with both, yet I expect it will take some time for me to figure out what, if any, the audience differences are between the two.

Creating an Historical Archive

March 19th, 2007 by Robin Harris in Backup, SOHO/SMB

Culinary history: old wedding cake in the freezer?
A longtime friend of mine is working with the Culinary Historians of New York on a project to gather and preserve the records of a Depression-era WPA project. According to the CHNY:

The mission of “America Eats”– part of the New Deal and abandoned at the outset of WWII– was to send writers and photographers nationwide to document community eating in America from church suppers and clambakes to barbecues and holiday meals. The diverse flavors chronicled in these documents have lain forgotten in scattered archives and are only now being brought to light.

As you'd imagine, this is a volunteer organization made up of foodies, not IT gurus. I'm no IT guru either, but, not knowing that, my friend asked me for help.

Easier to find than preserve
She wrote:

. . . we are trying to organize a search for these scattered and lost WPA documents inc. photographs that are buried in attics, historical societies, and some collections in the Library of Congress. We hope to "digitize" them to preserve in a central location for present and future food scholars to access.

So I asked myself, “Self, how would you build a historical archive?”
In response, I wrote:

CHNY has two problems: getting the materials digitized and then preserving the digitized copies for posterity.

Scanning, the easier problem, IMHO
Scanners can digitize textual and photographic materials quite handily. For text 300 dpi (dots per inch) is fine. Photographs should be scanned at a minimum of 600 dpi. Higher dpi is better; most scanners will do at least 1200 dpi and many will go up to 2400 dpi and beyond. Higher dpi results in larger files which may be harder to store, edit or share, yet if you don’t have the resolution to start with you can’t create it later.
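To put rough numbers on that tradeoff, here is a small Python sketch (my own illustration, not from any scanner vendor) that estimates the uncompressed size of a scan. It assumes 24-bit color, i.e. 3 bytes per pixel; JPEG or compressed TIFF files will be smaller, and 48-bit scans twice as large.

```python
def scan_size_bytes(width_in: float, height_in: float, dpi: int,
                    bytes_per_pixel: int = 3) -> int:
    """Uncompressed size of a scan: (width x dpi) * (height x dpi) pixels.

    Assumes 24-bit RGB (3 bytes per pixel) unless told otherwise.
    """
    pixels_wide = round(width_in * dpi)
    pixels_high = round(height_in * dpi)
    return pixels_wide * pixels_high * bytes_per_pixel


if __name__ == "__main__":
    # Example: a 4 x 6 inch snapshot at the resolutions discussed above.
    for dpi in (300, 600, 1200, 2400):
        mb = scan_size_bytes(4, 6, dpi) / 1_000_000
        print(f"{dpi:>5} dpi: {mb:7.1f} MB")
```

For a 4 x 6 inch print that works out to roughly 6.5 MB at 300 dpi, 26 MB at 600 dpi, 104 MB at 1200 dpi and 415 MB at 2400 dpi: every doubling of resolution quadruples the file size.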

Perfectly adequate text scanners start at $50, while very good photo scanners are available for $400. Photographs of particular interest can be commercially digitized in drum scanners for the very highest resolution and quality. Negatives and slides can be scanned by film scanners that range from $400 to $1200 depending on speed and quality.

Creating an archive of scanned documents
Preserving the digitized data is the more difficult problem. Over the decades file formats may change, data storage devices become obsolete - think 8-track tape - and media decays. There is only one strategy that I would trust and it goes by the acronym LOCKSS: Lots Of Copies Keeps Stuff Safe.

For CHNY I would save every file in at least three formats and distribute the copies on at least three media. For photos use JPEG, PDF and TIFF file formats. For text use ASCII text, PDF and PNG formats. For media, store complete collections on DVD and on server-attached hard drives, and back them up to tape using ZMANDA, a commercial version of the open-source AMANDA, whose tapes can be read without the application.
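The "lots of copies" part only pays off if someone periodically checks that the copies still agree. As a minimal Python sketch of that check, assuming each copy of the collection is a directory tree mounted on one machine (the layout and function names here are my own, hypothetical), the code below hashes every file with SHA-256 and flags any copy that is missing a file or disagrees with the majority. It is not the LOCKSS project's own software or polling protocol, just the majority-vote idea behind it.

```python
import hashlib
from collections import Counter
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MB chunks so large scans don't have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit_copies(copy_roots: list[Path]) -> dict[str, list[Path]]:
    """Compare every file across all copies of the collection.

    Returns {relative path: [copy roots that are missing the file or
    disagree with the majority]}. An empty dict means all copies agree.
    """
    # Every relative file path present in at least one copy.
    all_files: set[Path] = set()
    for root in copy_roots:
        all_files |= {p.relative_to(root) for p in root.rglob("*") if p.is_file()}

    problems: dict[str, list[Path]] = {}
    for rel in sorted(all_files):
        digests = {}
        for root in copy_roots:
            candidate = root / rel
            digests[root] = sha256_of(candidate) if candidate.is_file() else None
        counts = Counter(d for d in digests.values() if d is not None)
        top = counts.most_common(2)
        if len(top) > 1 and top[0][1] == top[1][1]:
            # No clear majority (e.g. two copies that disagree): flag them all.
            suspect = list(copy_roots)
        else:
            majority = top[0][0]
            suspect = [root for root, d in digests.items() if d != majority]
        if suspect:
            problems[rel.as_posix()] = suspect
    return problems
```

With only two copies you can detect a disagreement but not tell which copy is right, which is one more argument for keeping at least three.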

Ship the DVDs to people who will store the content on their web-servers and make new DVDs for people - DVDs you can burn yourself only have a life of 5-10 years. Also, print out complete copies of the data on archival quality equipment and media and donate them to a couple of archives at research libraries.

This may sound like overkill - it does to me, a little bit - and others may have different opinions as to the best file formats, but the basic LOCKSS strategy is your best bet. Once you’ve gone to the trouble of gathering the source material you never want to have to do that again. So preserve it with LOCKSS.

The StorageMojo question
That was all off the top of my head. I know some of you are smarter about this stuff than I am, so please, what would you do?

I suspect that many small and non-profit organizations have the same problem. If we put our heads together maybe we can come up with something that will help a lot of people.

Comments welcome, especially in this case. Moderation turned on to keep spam out of the comments.

Update: I meant to put in a reference to the actual LOCKSS site and didn’t. I thank the commenter for reminding me of that. So I put in a reference above.

Un-Intel-igent Email Retention

March 14th, 2007 by Robin Harris in Backup, Enterprise, Off-Topic, Security & Public Policy

Intel last week provided a window into just how screwed up even wealthy, forward-looking companies are around document retention for pending litigation. In the law biz these policies go under the general term of "litigation hold". Intel is in Federal court on an anti-trust suit filed by competitor AMD in June 2005.

The recent changes in the Federal Rules of Civil Procedure (FRCP) added explicit requirements for electronically stored information (ESI) on December 1st, but litigation hold policies have been around for ages (for more info see Sto’Mo’s 3 Minute Guide to Electronic Discovery and Today’s the Day: New FRCP Rules Now in Effect). Which makes Intel’s behavior even more peculiar.

Let 1,000 litigation hold policies bloom
Intel's email system automatically purges emails after a short time, which Intel says is about three months. It was only in October of 2005 that they started a weekly backup of the emails of executives whose actions might be relevant to the case. Until then they asked that employees voluntarily retain any emails that might be germane. Even after the backups started, an employee could receive and delete an email immediately to avoid having it backed up.
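The hole in that arrangement is easy to see with a little date arithmetic. Here is a hypothetical Python model, not Intel's actual system: backups run once a week starting at a given time, and an email is captured only if a backup runs while it is still in the mailbox.

```python
from datetime import datetime, timedelta


def captured_by_backup(received: datetime, deleted: datetime,
                       first_backup: datetime,
                       interval: timedelta = timedelta(weeks=1)) -> bool:
    """True if a scheduled backup runs after the email arrives and before it is deleted.

    Assumes backups run at first_backup, first_backup + interval, and so on.
    """
    if deleted <= received:
        return False
    if received <= first_backup:
        next_backup = first_backup
    else:
        # Ceiling division: whole intervals until the first backup at or after arrival.
        periods = -(-(received - first_backup) // interval)
        next_backup = first_backup + periods * interval
    return next_backup < deleted
```

An email that arrives on a Tuesday morning and is deleted an hour later is never captured by a Sunday-night backup; one left alone until the roughly three-month auto-purge is.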

In effect, Intel replaced a single corporate litigation hold policy with one for every employee. With potentially billions of dollars in damages at stake, may one be forgiven for thinking that one of the world's most successful high-tech companies might have done better?

Dumb, yes; malicious, maybe not
There is no evidence now that Intel sought to hide incriminating emails, which could bring disastrous “adverse inferences” from the judge if the case goes to a jury. Yet AMD’s legal team will be looking for evidence that Intel is hiding something, and if they find any Intel will have no one but itself to blame.

Intel will now invest in software that automatically preserves the emails of designated employees. One has to wonder why they waited until now.
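Such software mostly comes down to two rules: copy a designated custodian's mail into a hold archive as it arrives, before anyone can delete it, and exempt that mail from the automatic purge. Here is a minimal Python sketch of those rules, with hypothetical class and field names and a 90-day retention period assumed from Intel's "about three months". It is not any vendor's product.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

RETENTION = timedelta(days=90)  # assumption: Intel's "about three months"


@dataclass
class Email:
    received: datetime
    participants: set[str]  # sender plus every recipient


@dataclass
class HoldRegistry:
    custodians: set[str] = field(default_factory=set)  # people under litigation hold

    def must_journal(self, msg: Email) -> bool:
        """Copy to the hold archive on arrival, so a user's delete can't destroy it."""
        return bool(msg.participants & self.custodians)

    def may_purge(self, msg: Email, now: datetime) -> bool:
        """Auto-purge only mail that is past retention and touches no custodian."""
        if now - msg.received < RETENTION:
            return False
        return not self.must_journal(msg)
```

Keying the hold on people rather than on mailboxes means a custodian's mail is caught whether they sent it or received it, and nobody has to remember to do anything.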

The StorageMojo take
The resolution of Intel’s email retention liability may add to the evolving case law of ESI and electronic discovery. It certainly should serve as a warning to large companies that audited litigation hold policies are a necessity.

A sheepish "oops!" and a good-faith effort to recover lost documents may protect a company if no evidence of a cover-up is found today, but in a few years judges and corporate audit committees will not be so forgiving. Get your litigation hold policies in order now, or face real pain sooner rather than later.

Comments welcome, please. I’m spending much of the day on an airplane headed back to StorageMojo’s Fortress of Solitude in the Arizona mountains, so moderation will be a bit slow.

Lightscribe: High-Tech Sharpie Replacement

March 9th, 2007 by Robin Harris in Backup, SOHO/SMB

In January I wrote about installing a $35 OEM dual-layer, Lightscribe DVD burner in a $30 Firewire/USB case. Since then I’ve been playing with Lightscribe CDs and I must say, I like the technology. It isn’t perfect, but for the extra $5 it cost to get a Lightscribe burner, it is a worthwhile tool.

What is it?
Lightscribe uses a burner's laser to create monochrome images and text on the surface of a specially coated CD or DVD. The background is light and the scribed area is darker.

How does it work?
The coating darkens where the laser toasts it. You put the disk in upside down so the laser can reach it, and then flip it over to read or write the disk.

What is it good for?
Labeling disks with optional decorative flourishes. I burn music CDs from iTunes for car use, and the hastily scribbled “Favorite Rock” isn’t much information 3 months later. With Lightscribe it is easy to burn the playlist on the disk.

Lightscribe quality
Given a print engine of over 2.4 billion dpi, you’d expect pretty high resolution. And indeed the resolution is excellent.

Yet there are two problems with Lightscribe quality: the printing is monochrome; and the contrast is limited. So while the detail is there, it doesn't leap off the disk at you.

Burn time?
For some reason it takes 10-20 minutes to scribe a disk, which is a little odd when you consider that you can burn 700 MB of data faster than you can print a 100 KB bitmap. You can cut the time by scribing smaller areas of the disk, as you might for labeling backups.

Software?
There is free software for PCs, Macs and Linux available on the web, including an open-source labeler and a product from LaCie. For Linux and Mac OS I recommend the LaCie product because it has a reasonable number of designs to choose from and is fairly flexible. The software worked fine with my homebrew burner. Play with it though: it took me a while to figure out how to get a long playlist on disk using the LaCie software.

Kudos to LaCie for making the software freely available. They get a nice big LaCie logo on my desk when I use it, which must be worth something.

A quick scan suggests that most commercial disk labeling packages also support Lightscribe.

Playing catch the iTunes playlist
iTunes doesn’t make it easy to get an editable playlist. You can export a playlist, but it includes much junk, like file paths. The workaround I found: select the playlist; go to Print Setup; select an all text design; select Print; then Print Preview. Select the preview’s text, copy, and paste into the label creation software.
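For anyone comfortable with a little scripting, the exported file can be cleaned up directly instead. Here is a minimal Python sketch, assuming the export is tab-delimited text whose first row is a header with "Name" and "Artist" columns (check your own export; the column set may vary by iTunes version, and you may need to specify the file's text encoding when you read it). Everything else, including the file-path column, is dropped.

```python
import csv


def playlist_lines(exported_text: str) -> list[str]:
    """Turn a tab-delimited playlist export into 'Artist - Name' lines for a disc label.

    Assumes a header row with 'Name' and 'Artist' columns; all other
    columns (file paths, sizes, play counts) are ignored.
    """
    # Drop a leading byte-order mark and split on \r, \n or \r\n line endings.
    rows = exported_text.lstrip("\ufeff").splitlines()
    reader = csv.DictReader(rows, delimiter="\t", quoting=csv.QUOTE_NONE)
    lines = []
    for row in reader:
        name = (row.get("Name") or "").strip()
        artist = (row.get("Artist") or "").strip()
        if name:
            lines.append(f"{artist} - {name}" if artist else name)
    return lines
```

The output is one line per track, ready to paste into the label software.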

Hey, Apple, how about a cleaner way to get an editable playlist?

Lightscribe vs printable media
Color inkjets can produce very nice labels and the media is a little cheaper than Lightscribe. I did see a report of a printed disk that delaminated in an optical drive, destroying the drive, though that seems to be uncommon. Lightscribe disks are coated, so I wouldn't expect that to be a problem.

On the other hand, once you've bought the disk, you don't have to pay for ink or wrestle with carriers or what-have-you to get the disk printed. The total cost of ownership is probably similar once ink cost is factored in.

The StorageMojo take
I’m hell on my car CDs, so I’m not totally sold on paying the extra money for Lightscribe media, but maybe you are more careful than I am. I definitely like being able to see a playlist on the disk. Since I use a laser printer, inkjet printable media isn’t all that attractive.

The sweet spot for Lightscribe, IMHO, is low-volume back up media. You can label each disk with a fair amount of detail and file it. When restore time comes, you don’t have to guess what you’ve got. That is worth something in peace-of-mind.

Comments welcome, of course. How do you label your backup media? Other insights into labeling media, inkjet media printing or ???

Lost In Space: Man’s First Step On the Moon

February 2nd, 2007 by Robin Harris in Backup, Off-Topic

People of a certain age remember when Apollo 11 landed on the moon, and a large percentage of us were watching the live broadcast of man’s first step to the lunar surface. The picture was fuzzy and grainy, and the audio was none too good either. But you could see and hear something and it was a thrilling moment shared by hundreds of millions of people around the world.

Actually, the picture was a lot better than we ever knew
According to an article in the Washington Post, the slow-scan TV system had much better quality than we ever saw. The Post reports that the signal:

. . . was transmitted from the moon to ground sites in Australia and the Mojave Desert in California, where technicians reformatted the video for broadcast and transmitted long-distance over analog lines to Houston. A lot of video quality was lost during that process, turning clear, bright images into gray blobs and oddly moving shapes . . . .

The high-quality slow-scan TV pictures were preserved on tape, while commercial TV cameras captured the output of the tracking station SSTV monitors - a major loss of signal quality right there - and transmitted that feed to Houston and then on to us.

Backup hell
A few years ago it occurred to some of the folks who had been at the tracking stations that it would be cool to see that original high-quality SSTV picture. NASA had only one machine that could read the tapes, so time - and data formats - were marching on.

Houston, we have a problem
After several years of searching, they couldn't find the tapes. The original record of this historic event is likely lost forever. Conspiracy theorists who insist the lunar landing was faked may be elated, but I'm bummed.

The StorageMojo take
The preservation of electronically stored information is no simple task, as the billions of dollars spent each year on RAID, backup software and archiving software and services attest. NASA could have benefitted from LOCKSS, had it existed back in the summer of ‘69. LOCKSS stands for Lots Of Copies Keeps Stuff Safe. If there is a better strategy I haven’t heard of it.
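Why lots of copies works, and where it breaks down, shows up in simple probability. Below is a Python sketch with made-up, illustrative numbers: if each copy is lost independently with probability p, all n copies are lost with probability p to the nth power. An event that takes out every copy at once (one building, one format, one working tape reader) adds directly to the risk, no matter how many copies you keep.

```python
def p_all_lost(p_each: float, copies: int, p_common: float = 0.0) -> float:
    """Probability that no copy survives.

    p_each:   chance a single copy is lost, assumed independent of the others
    p_common: chance of one event that destroys every copy at once
    """
    return p_common + (1 - p_common) * p_each ** copies


if __name__ == "__main__":
    for n in range(1, 5):
        print(f"{n} copies: {p_all_lost(0.1, n):.4%} independent, "
              f"{p_all_lost(0.1, n, p_common=0.05):.4%} with common-mode risk")
```

With a 10% chance of losing any one copy, three independent copies cut the risk to 0.1%. Add a 5% chance of a common-mode failure and the risk is about 5.1%, which is why the copies should live on different media, in different formats and in different places.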

Comments welcome, of course. Moderation turned on because moderation is a virtue, except in the defense of liberty.


