So here I am in a cloudy and rainy San Diego, visiting Storage Networking World. This is a show for big data center types. Typical opening question: “So how many data centers do you have?” But there is frequently some interesting stuff presented amidst the vendor-driven chaff that might have meaning for the SMB market.
Yesterday’s winner, with a 25x data compression factor, is Diligent Technologies (are all the good names taken?). They claim their technology enables data volume compression that is over 10x what ordinary data compression achieves: a real breakthrough. Common compression algorithms are lucky to get 2x compression.
So if you have 100 GB to back up, their product, Protectier (see name comment above), can turn it into 4 GB, something you could burn onto a DVD in a few minutes. All in all, a wonderful product for SMBs — but they aren’t selling it to SMBs (good marketers must be scarce too).
Having spent some time looking at compression algorithms in my misspent youth, I was very sceptical of the 25x reduction claim. I was gradually cornering the charming, but less technical than me, Melissa, when up walked Neville Yates, Diligent’s CTO, whose movie-star good looks and English accent give no clue to his manly technical chops, which are impressive.
The way Diligent achieves its exceptional compression ratio is by comparing all incoming data to the data that has already arrived. When it finds an incoming stream of bytes similar to an existing series of bytes, it compares the two and stores only the differences. The magic comes in a couple of areas, as near as I can make out given Neville’s natural reticence on the “how” of the technology.
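Diligent won’t say how they do it, but the general idea — store a similar chunk once, then keep only the differences for later chunks — can be sketched in a few lines of Python. This is purely illustrative, not Diligent’s actual method; it leans on the standard library’s `difflib.SequenceMatcher` to find the matching runs of bytes:

```python
import difflib

def delta_encode(reference: bytes, new: bytes) -> list:
    """Store only how `new` differs from a similar, already-stored chunk."""
    sm = difflib.SequenceMatcher(None, reference, new)
    delta = []
    for op, r1, r2, n1, n2 in sm.get_opcodes():
        if op == "equal":
            delta.append(("copy", r1, r2))        # just point into the reference
        else:
            delta.append(("insert", new[n1:n2]))  # store only the novel bytes
    return delta

def delta_decode(reference: bytes, delta: list) -> bytes:
    """Rebuild the original chunk from the reference plus the stored delta."""
    out = bytearray()
    for entry in delta:
        if entry[0] == "copy":
            out += reference[entry[1]:entry[2]]
        else:
            out += entry[1]
    return bytes(out)

# A new chunk that is a near-copy of stored data compresses to almost nothing:
ref = bytes(range(256)) * 4
new = ref[:500] + b"CHANGED" + ref[500:]
delta = delta_encode(ref, new)
assert delta_decode(ref, delta) == new
```

The payoff is that the “copy” entries cost a few bytes each no matter how much data they point at, which is where ratios far beyond ordinary compression could come from.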
First, one has to be smart about how long a series of bytes should be before trying to compress it, since if it’s too short there won’t be much, if any, compression. Second, the system needs a very fast and efficient method of knowing what it has already received, so it can tell when it is receiving something similar. And it all has to be optimized to run in-line at data-rate speeds on a standard server box, which runs the cool and reliable Linux OS.
The big plus of this technology, besides the compression ratio, is its reliability. Since it never assumes two files are the same just because their metadata is, the problem of not backing up something you mistakenly thought was already backed up (a problem with file-based de-duplication software) is eliminated. Further, since the software operates on byte streams, it can compress anything: email, databases, archives, MP3s, encrypted data, or whatever weird data format your favorite program uses.
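The failure mode of metadata-based de-duplication is easy to demonstrate. Here’s a contrived example (the file records are invented for illustration): two “files” whose name, size, and modification time are identical but whose bytes differ. A dedup scheme keyed on metadata would silently skip the second one; comparing actual content catches it.

```python
import hashlib

# Same name, size, and mtime -- but different bytes.
file_a = {"name": "report.doc", "size": 7, "mtime": 1143849600, "data": b"draft 1"}
file_b = {"name": "report.doc", "size": 7, "mtime": 1143849600, "data": b"draft 2"}

def meta(f):
    return (f["name"], f["size"], f["mtime"])

def content(f):
    return hashlib.sha256(f["data"]).hexdigest()

assert meta(file_a) == meta(file_b)        # metadata says "duplicate": skip it
assert content(file_a) != content(file_b)  # the bytes say otherwise
```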
So naturally I am a bit disappointed that this wonderful technology is targeted at large data centers, even though I understand Diligent’s thinking. A viral-marketing, disruptive-technology approach would be to release a consumer version that maybe offers just 10x compression, but proves to hundreds of thousands of people in a few months that the technology really works. Then the data center guys — the smart ones anyway — will be calling Diligent.