What’s the difference?
I came across a thoughtful essay on the “Top Ten Differences between Disk-based Archive & Disk-based Storage” in the MatrixStore blog. MatrixStore is a Mac cluster-based disk archive for Apple’s to-be-announced-RSN Final Cut Server.

MatrixStore is focused on one market segment – video content archiving – but their comments seem to be generally applicable. With 2008’s likely focus on the disk-based backup and archive market, it is worth starting the conversation now.

Key points
SANs aren’t designed for archiving.

Reason 1.

If you are archiving your data, it’s probably because you don’t want to lose it.

Raison d’etre for a disk based archive? To keep data – safe. For a SAN? Speed of delivery, QoS… You wouldn’t put 256 bit delivery checksums into a SAN; SANs cut corners on flushing to disk; SANs don’t build in search or audit-trails, or security; SANs can down completely because of single-points-of-failure in the hardware; a bad software update in a SAN and…. Don’t do it. With nursing care and attention they can run fine for years, but they are inherently tightly coupled, software version sensitive, high maintenance, error prone and hardware technology dependent… even if they are brilliant at fast storage and delivery of information…

A disk-based archive must be: loosely coupled and free from dependencies between hardware components on independent nodes (surely the greatest example of a loosely coupled solution is the world-wide-web; you have no fear on the www that a server going down, say, hosting an IBM site, is going to bring down another in Cupertino!); free from requiring constant latest updates to software/firmware; able to guarantee safe delivery and storage of data; and basically, able to safely, securely store and protect data for year upon year, without complications, manual intervention, spanners…

Archives must be engineered for easy adoption of new technology
In storage everything is cheaper next quarter. So why buy now?

Reason 2.

There’ll be bigger, better, cheaper, more efficient disks in 2009, and in 2010, and in 2011…

Will there be bigger, better, cheaper, more energy efficient storage devices coming out this year, and every year that follows? Yes, of course there will be.

In your SAN do you have to mirror between like-sized devices? What happens when one of those devices goes down in 2 years time? Do you end up throwing away the good device? In your SAN can you bolt on new technologies as they arrive; holographic disks that store 10TB a shot, or new fibre connectors?

In ZFS can you decommission a part of a storage pool, replacing it with new storage devices without significant bleeding edge techniques and without disrupting the rest? Ideally, it be great to bolt new technology into an archive, as and when they arrive, rolling out old technologies if they reach the point of diminishing returns; to be able to do that whilst always seeing a single archive storage cluster; and without a maintenance or data migration headache; or should I say; without risk. A disk based archive can achieve that, if selected carefully.

Vendor handcuffs
Long-term storage and proprietary products don’t mix. Along with upgradeability-in-place, this should be high on customer checklists.

Reason 3.

Vendor tie-in is more like Vendor hand-cuffs.

OK – this isn’t strictly about SAN vs Disk based archiving; but fact of the matter is that most SAN/any other disk-based storage solutions tie you in to a particular vendor, which is great when they are supplying the ‘best-in-class’ solution of the moment at time of purchase, but not quite so clever when you come to upgrade that solution a year down the line and they aren’t offering the best in class anymore.

The archive should be vendor independent otherwise, for many reasons, you’re just creating tomorrow’s headache with a solution from yesteryear.

Stability and security

Reason 5.

Viruses. Hackers.

Choice one:

“out of the box” configured with encryption, firewalled, data locked down, all access to data routed through PPK, all maintenance functionality requiring 256 bit passwords.

Choice two:

bolt on each of the above to your favourite SAN/filesystem. Wait five years as your conglomerate of software solutions evolve (along with the workforce) and cross fingers. A disk-based archive must be secure out-of-the-box.

There’s more, of course, and if you are interested please read the whole essay and respond here with your thoughts so every one can see and respond.

The StorageMojo take
EMC’s upcoming backup and archive cluster, code-named Hulk/Maui (HW/SW), will drive a lot of customers to think about this topic. Of course, EMC’s famously disciplined sales force will scrupulously limit Hulk/Maui sales to B&A applications for the first several months weeks days hours after its release. Once the customer utters the magic word “Isilon” Hulk/Maui will suddenly be ready for enterprise use.

[I hope someone has mentioned this to the Maui engineers: forget about summer vacation.]

Disk-based backup and archive is a fast growing application with very different requirements from SANs, arrays and fast NAS boxes. Data migrations will be increasingly infeasible. Management has to be stoner-on-the-night-shift-proof. And the data can’t be held hostage by proprietary standards.

Companies do discontinue products or go bankrupt, after all.

Comments welcome, of course. Anything else?