What’s the difference?
I came across a thoughtful essay on the “Top Ten Differences between Disk-based Archive & Disk-based Storage” in the MatrixStore blog. MatrixStore is a Mac cluster-based disk archive for Apple’s to-be-announced-RSN Final Cut Server.
MatrixStore is focused on one market segment – video content archiving – but their comments seem to be generally applicable. With 2008’s likely focus on the disk-based backup and archive market, it is worth starting the conversation now.
Key points
SANs aren’t designed for archiving.
Reason 1.
If you are archiving your data, it’s probably because you don’t want to lose it.
Raison d’être for a disk-based archive? To keep data safe. For a SAN? Speed of delivery, QoS… You wouldn’t put 256-bit delivery checksums into a SAN; SANs cut corners on flushing to disk; SANs don’t build in search, audit trails or security; SANs can go down completely because of single points of failure in the hardware; one bad software update in a SAN and… Don’t do it. With nursing care and attention they can run fine for years, but they are inherently tightly coupled, software-version sensitive, high maintenance, error prone and hardware-technology dependent… even if they are brilliant at fast storage and delivery of information.
A disk-based archive must be: loosely coupled and free from dependencies between hardware components on independent nodes (surely the greatest example of a loosely coupled solution is the world-wide-web: you have no fear that a server going down, say one hosting an IBM site, will bring down another in Cupertino!); free from requiring constant updates to the latest software/firmware; able to guarantee safe delivery and storage of data; and basically, able to safely, securely store and protect data year upon year, without complications, manual intervention or spanners in the works.
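The “guarantee safe delivery and storage” point above can be sketched in a few lines of Python. This is a hypothetical illustration, not MatrixStore’s actual API: the function names are mine, and I’ve used SHA-256 as a stand-in for whatever checksum a real archive employs.

```python
import hashlib

def ingest(blob: bytes) -> str:
    """Store a blob and return its SHA-256 fingerprint (hypothetical archive API)."""
    digest = hashlib.sha256(blob).hexdigest()
    # ... the blob would be written to an archive node here ...
    return digest

def verify(blob: bytes, expected_digest: str) -> bool:
    """Recompute the checksum on retrieval; a mismatch signals silent corruption."""
    return hashlib.sha256(blob).hexdigest() == expected_digest

fingerprint = ingest(b"archived asset")
assert verify(b"archived asset", fingerprint)
```

The point is the contract, not the code: an archive checks what it hands back against what it was given, which is exactly the corner a delivery-optimised SAN cuts.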
Archives must be engineered for easy adoption of new technology
In storage everything is cheaper next quarter. So why buy now?
Reason 2.
There’ll be bigger, better, cheaper, more efficient disks in 2009, and in 2010, and in 2011…
Will there be bigger, better, cheaper, more energy efficient storage devices coming out this year, and every year that follows? Yes, of course there will be.
In your SAN, do you have to mirror between like-sized devices? What happens when one of those devices goes down in 2 years’ time? Do you end up throwing away the good device? In your SAN, can you bolt on new technologies as they arrive – holographic disks that store 10TB a shot, or new fibre connectors?
In ZFS, can you decommission part of a storage pool, replacing it with new storage devices, without significant bleeding-edge techniques and without disrupting the rest? Ideally, it would be great to bolt new technology into an archive as and when it arrives, rolling out old technologies when they reach the point of diminishing returns; to be able to do that whilst always seeing a single archive storage cluster; and without a maintenance or data migration headache; or should I say, without risk. A disk-based archive can achieve that, if selected carefully.
Vendor handcuffs
Long-term storage and proprietary products don’t mix. Along with upgradeability-in-place, this should be high on customer checklists.
Reason 3.
Vendor tie-in is more like vendor handcuffs.
OK – this isn’t strictly about SAN vs disk-based archiving, but the fact of the matter is that most SAN and other disk-based storage solutions tie you in to a particular vendor. That’s great when they are supplying the best-in-class solution at the time of purchase, but not quite so clever when you come to upgrade a year down the line and they aren’t offering best-in-class anymore.
The archive should be vendor independent; otherwise, for many reasons, you’re just creating tomorrow’s headache with a solution from yesteryear.
Stability and security
Reason 5.
Viruses. Hackers.
Choice one:
“Out of the box”: configured with encryption, firewalled, data locked down, all access to data routed through PPK, all maintenance functionality requiring 256-bit passwords.
Choice two:
Bolt each of the above onto your favourite SAN/filesystem. Wait five years as your conglomerate of software solutions evolves (along with the workforce) and cross your fingers. A disk-based archive must be secure out-of-the-box.
There’s more, of course; if you are interested, please read the whole essay and respond here with your thoughts so everyone can see and respond.
The StorageMojo take
EMC’s upcoming backup and archive cluster, code-named Hulk/Maui (HW/SW), will drive a lot of customers to think about this topic. Of course, EMC’s famously disciplined sales force will scrupulously limit Hulk/Maui sales to B&A applications for the first several months weeks days hours after its release. Once the customer utters the magic word “Isilon” Hulk/Maui will suddenly be ready for enterprise use.
[I hope someone has mentioned this to the Maui engineers: forget about summer vacation.]
Disk-based backup and archive is a fast growing application with very different requirements from SANs, arrays and fast NAS boxes. Data migrations will be increasingly infeasible. Management has to be stoner-on-the-night-shift-proof. And the data can’t be held hostage by proprietary standards.
Companies do discontinue products or go bankrupt, after all.
Comments welcome, of course. Anything else?
I agree with the points you highlight, and all 10 points in the original MatrixStore blog entry. Still, big chunks of the archiving market are not well served by any large vendor. MatrixStore’s own products and recommended hardware violate several of their reasons, including 2, 3 and 5.
We decided several years ago to host our online archives on NFS servers. We have at least 4 different vendors – depending on how you count the “who owns them this week” saga of the SnapServer family – and some major changes in technology. Most recently, we’re tossing our old NAS appliances for one Sun Thumper: still runs NFS, no changes to filesystem structure, more-or-less transparent swap-out for the clients. Too bad Sun can’t put a “stoner-on-the-night-shift-proof” user interface on Thumper-as-NFS-server. Maybe someday Sun will get a clue about storage, but I’m not holding my breath.
We use a vendor-neutral format for our disaster recovery tapes – TAR. So if the Thumper goes poof, and Sun doesn’t make them any more, we can mount the new NFS server and restore the file system without any proprietary hardware or software.
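Rex’s vendor-neutral TAR strategy can be sketched with Python’s standard tarfile module. The file name and contents here are made up for illustration; the point is that both writing and restoring need nothing proprietary, only a tar implementation.

```python
import io
import tarfile

# Write a vendor-neutral DR archive: plain POSIX pax/tar,
# readable by any tar tool on any vendor's hardware.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT) as tar:
    data = b"project file contents"
    info = tarfile.TarInfo(name="projects/readme.txt")
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))

# Restore on whatever replaces the Thumper: no proprietary software needed.
buf.seek(0)
with tarfile.open(fileobj=buf, mode="r") as tar:
    restored = tar.extractfile("projects/readme.txt").read()
assert restored == b"project file contents"
```

In practice the archive would go to tape rather than an in-memory buffer, but the format guarantee is the same either way.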
EMC and the other big players will target the new, legal-department-driven, CYA archiving market. MatrixStore and others are targeting the video asset market. Both have more money to throw at these problems than we can afford. Neither market will know that they are missing lots of important features, and both will be crying in their milk in a few years when major chunks of their “archives” are corrupted or inaccessible due to RAID 5 write holes and other tricks the industry doesn’t like to talk about.
MatrixStore claims to be a complete archive solution – as long as your disaster recovery plan includes setting up a distant mirror site (double the license fees to MatrixStore) at the end of a *very* fat and expensive network pipe. No doubt EMC et al will push similar solutions to increase their margins.
For disaster recovery (an important element of a full Archiving solution), I prefer the value-bandwidth-security proposition of the proverbial “station wagon full of tapes screaming down the Interstate”. (Feel free to substitute your favorite portable media and physical transport mechanism). Security? How many times has a virus or operator error wiped out all of your offline media?
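The arithmetic behind the station-wagon proposition is easy to check. The numbers below are my own hypothetical inputs (200 LTO-4 tapes at 800 GB native each, a 6-hour drive to the vault), not anything from the post:

```python
# Back-of-the-envelope "sneakernet" bandwidth, with assumed numbers.
tapes = 200
tape_capacity_gb = 800   # LTO-4 native capacity, uncompressed
transit_hours = 6        # drive time to the offsite vault

total_bits = tapes * tape_capacity_gb * 1e9 * 8
seconds = transit_hours * 3600
gbps = total_bits / seconds / 1e9
print(f"Effective bandwidth: {gbps:.1f} Gbit/s")  # roughly 59 Gbit/s
```

A single station wagon comfortably outruns most 2008-era WAN links on raw throughput; latency, of course, is six hours.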
Good to see agreement on how to make a disk archive, and as for the feeling of being ripped off by anything that smacks of proprietary, I totally have to agree. The commoditisation of disk-based archives both has to happen and is happening; the shame is that there are so many half-hitched, seat-of-the-pants solutions out there that have glaring gaps in their long-term feasibility.
Solutions, whether built by individuals (or communities) going their own route, or with some help from companies who have loftier goals than “lock-in”, should always start by letting you decide what hardware you want to use, both now and in the future (points 2 and 3!).
The situation today from major vendors is: “You must take our software, our hardware, our support package, our future for the disk-based archive.” Then you get shown a pretty little picture of shiny equipment; don’t ask questions such as “what happens next year?” ’cos you know the answer: “oh! We’ll support whatever comes out…” when you know darn well that they’ll only support what they want to, for as long as they want to, and at a price they want to.
Is there a place for any company to provide software that supports the ten points in the article, or will people always prefer to “build it themselves”, whatever effort that entails?
The answer/solution we came to is: bolt a piece of generic software onto the hardware, software that doesn’t force your hardware decisions, doesn’t stop you picking up new hardware solutions as they become available, and does provide full support for all ten points in the article, such as data delivery/security/checking/etc.
MatrixStore isn’t perfect – e.g., it only runs on Mac OS X – but it fully supports you picking your own hardware (a lot of non-Apple devices can be attached to Apple h/w), and a full build-it-yourself Linux version is close (a Linux version is available now, but has to be compiled by the company). For security it strips the OS (Linux/Mac OS X) down so bare that even ssh is switched off, along with a few other tricks to lock data down. So yes, as long as you want the software managing the communications (etc.) to the devices, you have to run the software, but there is no hardware tie-in, and you keep a lot of flexibility should you ever decide you don’t want to use the software for some reason (without ending up with hardware doorstops).
As for offsite backup, I’m not sure whether I agree or not: data pipes (especially IP-based ones) aren’t so expensive nowadays. Are there any decent cost comparisons out there? Anyway, the (UK) government is probably wishing it took encryption a LOT more seriously, whichever form offsiting data takes!
Rex said:
“I prefer the value-bandwidth-security proposition of the proverbial ‘station wagon full of tapes screaming down the Interstate’… Security? How many times has a virus or operator error wiped out all of your offline media?”
Would those be the same wagons that regularly lose copious amounts of data because the driver left the cab open when he stopped to spend a penny? You only have to google ‘tape data loss’ to see the amount of data going walkies through human error, especially in transit.
Not to mention a whole raft of new cases coming out of the closet. It’s becoming de rigueur for public organisations to lose data in the UK, and I do not see that trend slowing whilst we rely so heavily on ultra-portable storage media. When reasonable amounts of money or precious jewels are moved around, the carrier is usually cuffed to the briefcase, or two scary-looking Kevlar-suited chaps laden with CS gas canisters storm past you on the way to their armoured personnel carrier. When 65 million confidential records are sent outside an agency in the UK, some spotty teenager puts them in his back pocket, making a mental note to pop down the post office after his fag break.
Why is a secure pipe for data transmission not a solution to consider? In either case, encryption and strong policies on access are key.
On your other point I agree: it is all too easy for data to get wiped by an incompetent or disgruntled sysadmin. This is why a disk-based archive solution simply needs to protect that data from being wiped out over the network, even by an administrator with a penchant for sudo commands and all the passwords under the sun.
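The admin-proofing idea amounts to write-once-read-many (WORM) semantics at the archive interface. A toy sketch, entirely hypothetical (no real product exposes this class), of what it means for deletion and overwrite to simply not exist as operations:

```python
class WormStore:
    """Toy write-once-read-many store: objects can be added and read,
    but never overwritten or deleted, regardless of who asks."""

    def __init__(self):
        self._objects = {}

    def put(self, key: str, blob: bytes) -> None:
        if key in self._objects:
            raise PermissionError(f"{key!r} is immutable: write-once only")
        self._objects[key] = blob

    def get(self, key: str) -> bytes:
        return self._objects[key]

store = WormStore()
store.put("asset-2008-01", b"payroll records")
try:
    store.put("asset-2008-01", b"oops")  # even root gets refused
except PermissionError as err:
    print(err)
```

Note there is deliberately no delete method at all: an interface that never offers destruction is stronger than one that merely password-protects it.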