From the category archives:

Information Management

Disk-based archive vs disk-based storage

by Robin Harris on Sunday, 27 January, 2008

What’s the difference?
I came across a thoughtful essay on the “Top Ten Differences between Disk-based Archive & Disk-based Storage” in the MatrixStore blog. MatrixStore is a Mac cluster-based disk archive for Apple’s to-be-announced-RSN Final Cut Server.

MatrixStore is focused on one market segment – video content archiving – but their comments seem to be generally applicable. With 2008’s likely focus on the disk-based backup and archive market, it is worth starting the conversation now.

Key points
SANs aren’t designed for archiving.

Reason 1.

If you are archiving your data, it’s probably because you don’t want to lose it.

Raison d’etre for a disk based archive? To keep data – safe. For a SAN? Speed of delivery, QoS… You wouldn’t put 256 bit delivery checksums into a SAN; SANs cut corners on flushing to disk; SANs don’t build in search or audit-trails, or security; SANs can go down completely because of single-points-of-failure in the hardware; a bad software update in a SAN and…. Don’t do it. With nursing care and attention they can run fine for years, but they are inherently tightly coupled, software version sensitive, high maintenance, error prone and hardware technology dependent… even if they are brilliant at fast storage and delivery of information…

A disk-based archive must be: loosely coupled and free from dependencies between hardware components on independent nodes (surely the greatest example of a loosely coupled solution is the world-wide-web; you have no fear on the www that a server going down, say, hosting an IBM site, is going to bring down another in Cupertino!); free from requiring constant latest updates to software/firmware; able to guarantee safe delivery and storage of data; and basically, able to safely, securely store and protect data for year upon year, without complications, manual intervention, spanners…
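To make the "delivery checksum" idea concrete, here is a minimal sketch of a verified write. It is not MatrixStore's mechanism, which the essay doesn't describe; it assumes SHA-256 as the 256-bit checksum, and the function names and file paths are made up for illustration.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_with_check(src: Path, archive_dir: Path) -> str:
    """Copy src into archive_dir and confirm the stored copy matches.

    Raises IOError when the digests differ, so a corrupted write is
    never silently accepted. Returns the digest for later audits.
    """
    expected = sha256_of(src)
    dest = archive_dir / src.name
    shutil.copyfile(src, dest)
    actual = sha256_of(dest)  # re-read the stored copy, not the source
    if actual != expected:
        raise IOError(f"checksum mismatch for {dest}")
    return actual


if __name__ == "__main__":
    # Hypothetical file and directory names, created in a scratch area.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        clip = root / "clip.mov"
        clip.write_bytes(b"not really video")
        store = root / "archive"
        store.mkdir()
        print(archive_with_check(clip, store))
```

A real archive would also keep the digest alongside the data and re-check it on every read and on a schedule; this only covers the moment of delivery.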

Archives must be engineered for easy adoption of new technology
In storage everything is cheaper next quarter. So why buy now?

Reason 2.

There’ll be bigger, better, cheaper, more efficient disks in 2009, and in 2010, and in 2011…

Will there be bigger, better, cheaper, more energy efficient storage devices coming out this year, and every year that follows? Yes, of course there will be.

In your SAN do you have to mirror between like-sized devices? What happens when one of those devices goes down in 2 years time? Do you end up throwing away the good device? In your SAN can you bolt on new technologies as they arrive; holographic disks that store 10TB a shot, or new fibre connectors?

In ZFS can you decommission a part of a storage pool, replacing it with new storage devices without significant bleeding edge techniques and without disrupting the rest? Ideally, it would be great to bolt new technologies into an archive, as and when they arrive, rolling out old technologies if they reach the point of diminishing returns; to be able to do that whilst always seeing a single archive storage cluster; and without a maintenance or data migration headache; or should I say; without risk. A disk based archive can achieve that, if selected carefully.

Vendor handcuffs
Long-term storage and proprietary products don’t mix. Along with upgradeability-in-place, this should be high on customer checklists.

Reason 3.

Vendor tie-in is more like Vendor hand-cuffs.

OK – this isn’t strictly about SAN vs Disk based archiving; but fact of the matter is that most SAN/any other disk-based storage solutions tie you in to a particular vendor, which is great when they are supplying the ‘best-in-class’ solution of the moment at time of purchase, but not quite so clever when you come to upgrade that solution a year down the line and they aren’t offering the best in class anymore.

The archive should be vendor independent otherwise, for many reasons, you’re just creating tomorrow’s headache with a solution from yesteryear.

Stability and security

Reason 5.

Viruses. Hackers.

Choice one:

“out of the box” configured with encryption, firewalled, data locked down, all access to data routed through PPK, all maintenance functionality requiring 256 bit passwords.

Choice two:

bolt on each of the above to your favourite SAN/filesystem. Wait five years as your conglomerate of software solutions evolve (along with the workforce) and cross fingers. A disk-based archive must be secure out-of-the-box.

There’s more, of course, and if you are interested please read the whole essay and respond here with your thoughts so every one can see and respond.

The StorageMojo take
EMC’s upcoming backup and archive cluster, code-named Hulk/Maui (HW/SW), will drive a lot of customers to think about this topic. Of course, EMC’s famously disciplined sales force will scrupulously limit Hulk/Maui sales to B&A applications for the first several months… weeks… days… hours after its release. Once the customer utters the magic word “Isilon” Hulk/Maui will suddenly be ready for enterprise use.

[I hope someone has mentioned this to the Maui engineers: forget about summer vacation.]

Disk-based backup and archive is a fast growing application with very different requirements from SANs, arrays and fast NAS boxes. Data migrations will be increasingly infeasible. Management has to be stoner-on-the-night-shift-proof. And the data can’t be held hostage by proprietary standards.

Companies do discontinue products or go bankrupt, after all.

Comments welcome, of course. Anything else?

{ 3 comments }

Microsoft RIFs old file formats – mea culpa

by Robin Harris on Wednesday, 9 January, 2008

Darn! It looks like I screwed up. I’m sorry. While Microsoft did disable a number of early Word and other file formats, it wasn’t as long a list as I thought.

Textual analysis
I take a text-heavy approach to the content on StorageMojo. I prefer to go to original source material, unpack the meaning and the context, and then give my take on it.

That usually works pretty well. But in this case it didn’t.

What happened?
I read a lot of technical documents. Most never get written about. But the Microsoft knowledge base article was an exception. Since Microsoft was the topic it also got a lot of attention from me and others.

There is a lot of emotion around Microsoft. They are a big, powerful, immensely profitable and sometimes clueless corporation whose desktop monopoly is a fact of life for computer users and IT professionals.

I try to stay with the facts as best I can determine them. In this case I got confused by the KB article. That other people made the same mistake is small comfort and no excuse (see a Microsoft take here).

Lessons learned
Other than resolving to analyze content from Microsoft more carefully, I’m not sure what else I would do differently. I didn’t question their motives for the change, only the way it was handled.

However, I do have some suggestions for Microsoft.

  • Reducing functionality on an already purchased product is a problem. You should notify users that you are limiting product functionality and give them the opportunity to decline the update. Even if it is for their own good.
  • Suggesting registry edits or esoteric admin tools to solve the problem is OK for the tech savvy. But what about my 85 year old neighbor Dorothy, whose computer is a lifeline to her great-grandchildren? Her late husband was an engineer, so she has files that go back quite a few years. Microsoft, you are both an enterprise and a consumer company. Own it.
  • Communication is worth spending money on. Tech writers tell me that Microsoft doesn’t pay very well and, as a result, it doesn’t get very good tech writing. Maybe MCSEs are used to the style, but it sure didn’t work for this reasonably tech-savvy consumer.

The StorageMojo take
Tech is complicated and sometimes people – like I just did – get it wrong. Listening to criticism and learning from mistakes is how we all get better, even Microsoft. I hope you’ll keep coming back to StorageMojo and I’ll keep doing my level best to make it worth your time.

Comments welcome, as always.

{ 2 comments }

Microsoft RIFs old file formats

by Robin Harris on Friday, 4 January, 2008

“They trusted us with their data? Will the fools never learn?”
The Service Pack 3 update to Office 2003 blocks over a dozen old file formats, effectively rendering the data inaccessible. Unless you are adept at the registry editing Microsoft cautions you against.

And they don’t warn you that you won’t be able to access the old files. Whee!

Check out my ZDnet article for the gory details. It isn’t pretty.

Update: While the SP3 does block opening a number of old file formats, the formats in question are older: all Word pre-6.0; PowerPoint pre-97; Excel 4.0 charts; dBASE II .dbf; Lotus and Quattro files; Corel Draw .cdr. See my mea culpa. End update.

Clueless droids?
How does the world’s largest software company make this kind of wrong-on-so-many-levels decision? Is there ANY adult supervision in Redmond?

The decision bespeaks a corporate culture that is painfully clueless about its customers. Gee, why would anyone want to access 5 year old Word documents?

Medical products marketing
Redmond’s blindness echoes that of Detroit’s for the last 50 years. “Safety doesn’t sell.” “Bigger is better.” “Good enough quality is good enough.” “Americans will never buy Japanese cars.”

Microsoft clearly doesn’t get the fact that their products are an intimate part of consumers’ lives, much as medicines are. When 8 bottles of Tylenol capsules were poisoned with cyanide in 1982, Johnson & Johnson quickly recalled 31 million bottles and spent on the order of $100 million to restore consumer confidence in the Tylenol brand.

Would Microsoft spend a nickel to protect and reassure consumers? I give it a qualified “maybe.”

The StorageMojo take
In case anyone thought that archiving documents in proprietary formats was acceptable, this is your wake-up call. ASCII text and probably PDFs are OK. Everything else, including RTF – which Microsoft controls – is suspect.

With the growing focus on e-discovery, there should be a market for a high-speed “any format to .txt or .pdf” appliance. Producing unreadable softcopies won’t cut much ice in Federal courts.
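Short of that appliance, a first step is simply finding out how much of an archive sits outside the formats called safe above. The sketch below walks a directory tree and lists every file that is not .txt or .pdf. It judges by file extension only, which is an assumption: it will not tell a Word 2.0 file from a Word 2003 file, and the function and variable names are illustrative.

```python
from pathlib import Path

# The only formats this post treats as safe for long-term archiving.
OPEN_EXTENSIONS = {".txt", ".pdf"}


def at_risk_files(root: Path) -> list[Path]:
    """Return every file under root whose extension is not in OPEN_EXTENSIONS.

    Extension matching is case-insensitive; files with no extension
    are reported too, since their format is unknown.
    """
    return sorted(
        p
        for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() not in OPEN_EXTENSIONS
    )


if __name__ == "__main__":
    # Scan the current directory as a stand-in for a real archive root.
    for path in at_risk_files(Path(".")):
        print(path)
```

The output is a to-do list for conversion, not proof that any given file has become unreadable.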

Comments welcome, as always.

{ 7 comments }

Magic in the OLPC

by Robin Harris on Saturday, 15 December, 2007

Most criticism of the One Laptop Per Child PC centers on the cost for what is a low-spec computer. As ASUS with its Eee machine is proving, a low-cost conventional laptop can be pretty powerful. But that misses the point. The OLPC is a fundamental rethinking of the computing experience.

[photo courtesy OLPC]

This child’s review of the OLPC is the first hint that suggests that Laptop.org may have gotten it right. As the 9 year old’s father writes:

So Rufus is using his laptop to write, paint, make music, explore the internet, and talk to children from other countries.

Because it looks rather like a simple plastic toy, I had thought it might suffer the same fate as the radio-controlled dinosaur or the roller-skates he got last Christmas – enjoyed for a day or two, then ignored.

Instead, it seems to provide enduring fascination.

I had returned from Nigeria not entirely convinced that the XO laptop was quite as wonderful an educational tool as its creators claimed. I felt that a lot of effort would be needed by hard-pressed teachers before it became more than just a distracting toy for the children to mess around with in class.

But Rufus has changed my mind.

With no help from his Dad, he has learned far more about computers than he knew a couple of weeks ago, and the XO appears to be a more creative tool than the games consoles which occupy rather too much of his time.

OLPC roots
Even though the OLPC is the only notebook whose industrial design chops rival those of Apple, its real innovation lies in software. Building on educational theorist Seymour Papert’s work – he invented the Logo language – the OLPC rethinks the relationship between man and machine.

OLPC differences
The OLPC has activities instead of applications.

Activities are distinct from applications in their foci—collaboration and expression—and their implementation—journaling and iteration.

The collaboration comes in the form of built-in mesh networking that allows all local OLPCs to talk to each other.

By exploiting this connectivity, every activity has the potential to be a networked activity. We aspire that all activities take advantage of the mesh; any activity that is not mesh-aware should perhaps be rethought in light of connectivity. As an example, consider the web-browsing activity bundled with the laptop distribution. Normally one browses in isolation, perhaps on occasion sending a friend a favorite link. On the laptop, however, a link-sharing feature integrated into the browser activity transforms the solitary act of web-surfing into a group collaboration.

The connectivity seems to be powerful. From his home in England, young Rufus is conversing with other kids who send him messages in Spanish. How does that work?

Expression is the goal of the activities and collaboration. Rather than downloading music, the laptop is equipped to create music. The rethinking extends to the file system:

The objectification of the traditional file system speaks more directly to real-world metaphors: instead of a sound file, we have an actual sound; instead of a text file, a story. In order to support this concept, activity developers may define object types and associated icons to represent them.

Another aspect of the system’s UI is a focus on the Journal. This is more than written documentation of what a child has done.

The Journal combines entries explicitly created by the children with those that are implicitly created through participation in activities; developers must think carefully about how an activity integrates with the Journal more so than with a traditional file system that functions independently of an application. The activities, the objects, and the means of recording all tightly integrate to create a different kind of computer experience.

I’ll be interested to see how children who grow up with the OLPC think about computers. I fear we have a generation of children whose creativity has been permanently stunted by the desktop metaphor.

The StorageMojo take
Negroponte’s biggest mistake is that he did not market the OLPC in the industrialized world first. All the good intentions in the world won’t convince the 3rd world that something is good unless it has been embraced by the opinion leaders of the 1st world.

If I were Steve Jobs, I’d be taking a very close look at this machine to see what I could steal. Michael Dell could learn a few things too.

Comments welcome. OLPC has a beautiful web site.

{ 1 comment }

Internet video’s performance/quality vise

by Robin Harris on Wednesday, 5 December, 2007

Internet video is about where film was 100 years ago
I was talking to a company that will be announcing a video infrastructure solution when the CEO mentioned something he called the “video performance/quality vise.”

Here’s the problem: a video stream requires both capacity and bandwidth. Higher quality video requires more bits per second and more capacity. Bandwidth and capacity both cost money.

So as Internet video quality rises, the financial cost to provide the video rises too. An HD video stream is 4 Mbit/sec.
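To put rough numbers on the vise, here is a back-of-the-envelope sketch. The 4 Mbit/sec HD rate is the figure above; the 10,000-hour catalog and the 1,000 simultaneous viewers are hypothetical, chosen only to show how capacity and bandwidth scale. It assumes a constant bit rate and decimal units.

```python
def stream_gigabytes(mbit_per_sec: float, seconds: float) -> float:
    """Storage in decimal GB for a constant-bit-rate stream."""
    bits = mbit_per_sec * 1_000_000 * seconds
    return bits / 8 / 1_000_000_000


def aggregate_gbit_per_sec(viewers: int, mbit_per_sec: float) -> float:
    """Outbound bandwidth in Gbit/s when every viewer gets a separate stream."""
    return viewers * mbit_per_sec / 1000


HD_MBIT_PER_SEC = 4      # from the post
CATALOG_HOURS = 10_000   # hypothetical catalog size
VIEWERS = 1_000          # hypothetical concurrent audience

one_hour_gb = stream_gigabytes(HD_MBIT_PER_SEC, 3600)
catalog_tb = one_hour_gb * CATALOG_HOURS / 1000

print(f"One hour of HD: {one_hour_gb:.1f} GB")                  # 1.8 GB
print(f"{CATALOG_HOURS} hours: {catalog_tb:.1f} TB")            # 18.0 TB
print(f"{VIEWERS} viewers: "
      f"{aggregate_gbit_per_sec(VIEWERS, HD_MBIT_PER_SEC):.1f} Gbit/s")  # 4.0 Gbit/s
```

Double the bit rate for better quality and both numbers double, which is the vise in one line.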

500,000 channels and somethin’ on
As cute as YouTube, et al. are, they suck. Movies are small, picture and audio quality awful, and viewing options limited – like films 100 years ago.

Bandwidth limitations are part of the problem, at least here in the US. But those are being addressed, however slowly.

What happens when Internet video becomes competitive with broadcast TV in quality? Popularity will soar. As TiVo has shown, people love choice. And the Internet will have the most choice.

The price/performance/popularity vise
Digital Fountain’s raptor codes will change the Internet landscape for video. High quality video will be much more popular, just as long-form movies took film to the next level.

Bandwidth costs are dropping fast to pennies a GB. So infrastructure costs – especially storage – are critical to Internet video’s commercial success. The more popular it gets, the more storage will be needed. It is a huge opportunity.

The StorageMojo take
Massive data storage is still a very young technology. The ultimate cultural impact will be more profound than film because of the many-to-many nature of the Internet and the low barriers to entry. Should be fun!

Comments welcome, please. I don’t think the firm wanted me to mention their name, so I haven’t. If we get that cleared up I’ll update the post. Or maybe wait a while to write about them.

Update: Joe, thanks for catching the 4Mbit mistake I made. I corrected it above.

{ 7 comments }

Google thinks I’m a virus

by Robin Harris on Tuesday, 4 December, 2007

Google is sorry
I do a lot of research on the web using Google. Early last week I started getting these Google error messages:

The search term was “gutenberg” as in Gutenberg Bible.

This is happening 5-10 times a day. I enter the captcha and I’m on my way. But it is irritating.

What is going on?
The downside of “free” is non-existent customer service. I’ve written to Google’s comment address asking about this and, of course, no response.

I have seen reports that other people are experiencing this problem, so it isn’t just me. I’m running Mac OS 10.5.1 and as near as I can tell I am virus free. I even checked for the codec Trojan and it isn’t there.

There is a Windows XP machine on the home network, which has the virus protection our local Windows guru recommends. It is a business system and doesn’t get out much anyway.

The StorageMojo take
My sense is that the boffins in Mt. View tweaked something last week that started this. What makes a human-generated query look like a virus? Or a DoS attack? I’m stumped.

Comments and/or solutions welcome. Any thoughts?

Update: Ms. Mojo ran the virus/spyware/whatever software on her Windows machine and it located 17 suspicious files. Haven’t gotten the message since. Since Ms. Mojo is all business it gives me a new appreciation for just how vulnerable XP really is. Thanks to all who wrote in with suggestions.

{ 14 comments }

Mac ZFS debate

by Robin Harris on Monday, 15 October, 2007

I’ve been a fan of ZFS since I researched it over a year ago. I’ve also been happy with the progress ZFS is making on OS X.

So it was a bit of a surprise when I saw (thanks Wes) that MacJournals, a developers’ web site, was all sideways about it.

A good conversation
Fortunately a former Mac file system developer, Drew Thaler, responded with Don’t be a ZFS Hater.

Another respected Mac developer, Michael Tsai, also responded with a thoughtful post.

The StorageMojo take
I follow the ZFS discussion on OpenSolaris, so I understand that the ZFS implementation has a ways to go. From a marketing perspective, ZFS or something like it is required if consumers are going to use computers as media centers for purchased content. Seeing a couple of thousand dollars worth of music, TV, movies and videos go poof! is a sure way to get tossed out of America’s living rooms.

I believe Apple developers have the Mojo to make ZFS use transparent for Mac customers. They certainly have the help of the Sun team and it is in the interest of both companies to make this work. Plus, don’t forget Apple’s “touchless” file system upgrade patent.

But MacJournals correctly points out that UFS was once thought – though without the level of support ZFS has enjoyed to date – to be the successor to HFS+ and that a similar fate may befall ZFS. While that is certainly a possibility – never say never around Steve Jobs – there are good business and marketing reasons for going forward with ZFS, regardless of what techies think. Apple will go forward with ZFS and make it the standard OS X file system within 2 years.

Comments welcome, as always.
Update: I’ve started editing comments on this post to keep them on topic and away from personalities. I regret not doing so sooner. Nonetheless the discussion is informative and if file systems interest you, well worth perusing.

{ 24 comments }

Sun adds Lustre to supercomputing

by Robin Harris on Wednesday, 26 September, 2007

What about Sun’s acquisition of Cluster File Systems, Inc.?
Yawn. CFSI was going out of business. Sun bought the assets, not the company.

Good for CFSI employees
They get a paycheck from a solvent company. They may even get some sensible marketing. Hey, it could happen.

What is Lustre?
Arguably the highest-end parallel file system. At the Seattle Conference on Scalability, founder Peter Braam spoke about current 25,000 node Lustre clusters and plans to 10x that number in the next 5 years.
Update: It appears the Lustre.org and the Lustreusers.org sites are suspended. Hm-m-m? Update II: They are back up.

Cool, huh?

So why aren’t they rich?
CFSI was a tech playpen, not a company. Like Formula 1 racing. Instead of Ferrari, CFSI had the national labs backing them. Great stuff, except nobody else has the problems the national labs have, so it limits the market.

Lustre will be facing some serious competition from pNFS once it gets baked into Linux and other operating systems. The fast-growing commercial HPC market will eat pNFS clusters up. Lustre isn’t part of that.

The StorageMojo take
Sun bought a hook into a customer base that, when budgets are good, can be very profitable. They also bought a technical team that is very knowledgeable about fabric interconnects, which in the shift to cluster storage and grids will be a very good thing for Sun.

Comments welcome, as always. OK, Lustre proponents, tell me where I’m wrong.

{ 16 comments }