The economics of massive scale-out storage systems has thrown a harsh light on legacy enterprise storage. Expensive, inflexible, under-utilized data silos are not what data-intensive enterprises need or – increasingly – can afford.
That much is obvious to any CFO who can read Amazon Web Services’ pricing. But how to get from today’s storage islands to tomorrow’s storage ocean?
The CFO’s problem: the investment in the hardware, software and operational changes is all upfront. The payback takes years, and C-level execs are right to be skeptical.
An ideal solution would preserve their existing investment and let them add commodity storage as needed. Which is what Primary Data intends to do.
Primary Data’s idea is to provide a scale-out metadata service that gives centralized control and management, while staying out of the data path. A single enterprise name space with many data paths.
As co-founder and CTO David Flynn said Monday,
Separating the control channel from the data channel – pulling the metadata out from amongst the data – the metadata that describes the files, directories, access control and other things is today commingled with the data objects. Therefore the data object has no identity that is consistent as that data object gets stored on different systems. Move that object and it becomes a different object and you’re stuck managing multiple copies. . . .
Blocks, files and objects, oh my!
The metadata service essentially turns all files into objects – file data plus traditional file system metadata – whether they are written as blocks, files or objects. Look up the file you want on the metadata service, get its location, and access it directly.
Thus it doesn’t matter whether that file is stored on block, file or object storage. Once the name is resolved the data can be accessed through any protocol, because the object’s identity lives in the metadata service rather than in any one storage system.
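Primary Data hasn’t published its API, but a toy sketch shows the shape of the idea: a control plane that only resolves names to locations, and a data path that goes straight to the backing store. Every name and structure below is invented for illustration.

```python
# A toy model of the control path / data path split. None of these names are
# Primary Data's API -- they are invented to illustrate the flow.

from dataclasses import dataclass

@dataclass(frozen=True)
class Location:
    store: str      # e.g. "nfs://filer1/vol0", "s3://archive", "block://array2"
    object_id: str  # identity that travels with the object across stores

class MetadataService:
    """Control path: a single namespace mapping paths to data locations."""
    def __init__(self) -> None:
        self._namespace: dict[str, Location] = {}

    def record(self, path: str, loc: Location) -> None:
        self._namespace[path] = loc

    def resolve(self, path: str) -> Location:
        return self._namespace[path]   # lookup only; no file data passes through

# Stand-ins for real block, file and object back ends.
STORES: dict[str, dict[str, bytes]] = {
    "nfs://filer1/vol0": {"obj-001": b"quarterly report"},
    "s3://archive":      {"obj-002": b"old logs"},
}

def read_file(mds: MetadataService, path: str) -> bytes:
    """Data path: resolve once, then talk to the backing store directly."""
    loc = mds.resolve(path)
    return STORES[loc.store][loc.object_id]

mds = MetadataService()
mds.record("/finance/q3.xlsx", Location("nfs://filer1/vol0", "obj-001"))
print(read_file(mds, "/finance/q3.xlsx"))   # b'quarterly report'
```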
Centralizing control makes sense. Rarely used files – most of them – can be moved to cheap storage. I/O intensive data can be moved to SSDs. Critical data can be replicated multiple times and across geographies.
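What those placement policies will actually look like is Primary Data’s business; the sketch below, with made-up tiers and thresholds, just illustrates the kind of decision a central control plane can automate once it sees the whole namespace.

```python
# Purely illustrative: a toy placement policy. Tier names and thresholds are
# made up; the point is that one control plane can apply them namespace-wide.

from dataclasses import dataclass

@dataclass
class FileStats:
    days_since_access: int
    iops: int            # recent I/O rate
    critical: bool       # business-critical data

def choose_tiers(stats: FileStats) -> list[str]:
    """Return the tier(s) a file should live on under this toy policy."""
    if stats.critical:
        return ["ssd-local", "disk-remote-dc"]   # replicate across geographies
    if stats.iops > 1000:
        return ["ssd-local"]                     # I/O intensive: keep it on flash
    if stats.days_since_access > 90:
        return ["object-archive"]                # rarely used: cheap commodity storage
    return ["disk-local"]

print(choose_tiers(FileStats(days_since_access=200, iops=2, critical=False)))
# ['object-archive']
```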
Disaster recovery and backup are simplified because enterprise-wide snapshot policies ensure that data is protected and available. The snapshot layer is not in the data path, so it’s fast and painless.
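One way to see why an out-of-band snapshot layer is cheap: if the control plane holds only name-to-location mappings, a snapshot is a copy of the map, not of the data. The sketch below assumes the back ends retain the older objects (copy-on-write or versioning); it is not Primary Data’s implementation.

```python
# Hypothetical sketch, not Primary Data's code: a snapshot records the
# namespace map (pointers), so no file data is read or copied. This only
# protects data if the back ends keep the old objects around (e.g. COW).

import time

def take_snapshot(namespace: dict[str, str],
                  snapshots: dict[float, dict[str, str]]) -> float:
    """Copy the current path -> location map under a timestamp."""
    ts = time.time()
    snapshots[ts] = dict(namespace)
    return ts

namespace = {"/finance/q3.xlsx": "nfs://filer1/vol0/obj-001"}
snapshots: dict[float, dict[str, str]] = {}
ts = take_snapshot(namespace, snapshots)

namespace["/finance/q3.xlsx"] = "s3://archive/obj-002"   # file later migrates
print(snapshots[ts]["/finance/q3.xlsx"])                 # still the old location
```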
They’ve made the metadata service infrastructure as bulletproof as possible. The data director runs on a scale-out cluster architecture with SSDs for speed and bandwidth.
If the data director fails, the data is still there and accessible by individual servers. You lose the global namespace for a while, but it can be rebuilt by reading object metadata – just as with any object store – without losing access or data.
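A rough sketch of that rebuild, assuming each object carries its own self-describing metadata – the structures here are invented for the example:

```python
# Sketch of the rebuild path: each object carries its own metadata (as in any
# object store), so the global namespace can be reassembled by scanning the
# back ends. Structures are invented for the example.

from dataclasses import dataclass

@dataclass
class StoredObject:
    object_id: str
    path: str        # self-describing metadata kept with the object
    data: bytes

def rebuild_namespace(stores: dict[str, list[StoredObject]]) -> dict[str, tuple[str, str]]:
    """Walk every backing store and reassemble path -> (store, object_id)."""
    namespace: dict[str, tuple[str, str]] = {}
    for store_uri, objects in stores.items():
        for obj in objects:
            namespace[obj.path] = (store_uri, obj.object_id)
    return namespace

stores = {
    "nfs://filer1/vol0": [StoredObject("obj-001", "/finance/q3.xlsx", b"...")],
    "s3://archive":      [StoredObject("obj-002", "/logs/2013.tgz", b"...")],
}
print(rebuild_namespace(stores))
```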
The StorageMojo take
David Flynn and Rick White – the founders of Fusion-io and Primary Data – are back with another big idea. Primary Data is a much bigger idea than PCI flash cards, and the payback won’t be as quick.
By using existing storage assets, though, PD has removed one of the big obstacles to modernizing enterprise storage. Firms can start small, try out the technology, train some people, validate the payback and then extend its use.
The big challenge, which PD is well aware of, is to become a trusted enterprise vendor quickly. A targeted POC campaign with big-name customers is the most likely route to quick – 2-3 years – enterprise acceptance.
The RAID array has had a great 25-year run. But as Google, Amazon and others have shown, the future belongs to scale-out storage. Primary Data may have the key that unlocks it for the enterprise.
Courteous comments welcome, of course.
I don’t understand how you can separate the “control” plane from the “data” plane using existing protocols. Or does this presume that software will be installed on the host computer that does this mapping? Or is this just an API system which means that this is only interesting for new development?
For example, I’d like to hear how / if I can use this with my existing VMware or Oracle installations.
Lou, NFS 4.1 – parallel NFS or pNFS – does this too. It is different from most protocols but not new or untried.
Robin