One more data point and we’ll have a trend: NoSQL databases managing metadata. It’s obvious in retrospect: use a scalable big data tool to handle scale-out metadata. Maybe not a requirement today, but it surely will be with even bigger data tomorrow.
Metadata is a fraction of the size of the user data set, but it gets hammered much harder. As more metadata proves useful, the hammering will only get more insistent.
Nutanix, whose CTO and co-founder, Mohit Aron, was a developer of the Google File System, uses MapReduce. Nutanix achieves its scale through a distributed-metadata, masterless architecture – powered by MapReduce jobs that run in the background.
Druva, a backup company for mobile devices, also uses a NoSQL database to manage storage metadata. They say they’ve found that NoSQL scales more than an order of magnitude better than relational databases in similar applications.
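The core idea behind these masterless metadata stores is that no single node owns the metadata: keys are spread across the cluster, so the "hammering" is spread too. Here is a minimal, illustrative sketch – not any vendor's actual implementation – of consistent-hash sharding of file-metadata keys across nodes, the placement technique NoSQL stores like Cassandra use (node names and the vnode count are made up for the example):

```python
import hashlib
from bisect import bisect_right

class MetadataRing:
    """Toy consistent-hash ring: shards file-metadata keys across
    storage nodes so no single "master" holds all the metadata."""

    def __init__(self, nodes, vnodes=64):
        # Each node gets many virtual positions on the ring,
        # which evens out the load across nodes.
        points = []
        for node in nodes:
            for v in range(vnodes):
                h = int(hashlib.md5(f"{node}:{v}".encode()).hexdigest(), 16)
                points.append((h, node))
        points.sort()
        self._hashes = [h for h, _ in points]
        self._owners = [n for _, n in points]

    def node_for(self, key):
        # A key is owned by the first ring position at or after its hash,
        # wrapping around to the start of the ring if necessary.
        h = int(hashlib.md5(key.encode()).hexdigest(), 16)
        i = bisect_right(self._hashes, h) % len(self._hashes)
        return self._owners[i]

ring = MetadataRing(["node-a", "node-b", "node-c"])
print(ring.node_for("/vm/disk0/block/42"))   # always the same node for this key
```

The payoff over a central metadata server is that adding a node reassigns only a fraction of the keys, so the metadata tier scales out with the cluster instead of becoming the bottleneck.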
A company that shall remain nameless is porting Hadoop to its backend. Customers won’t be able to access Hadoop for their own work – it is strictly for the system’s internal use.
It is a proof of concept, so it isn’t a third data point, but they see the potential advantages. Call it data point 2½.
The StorageMojo take
Small advances are the building blocks of disruption. RAID made it possible to build available storage using cheap disks. Consumer adoption of PCs made disks even cheaper. Moore’s Law made RAID controllers cheaper and faster, or faster and more capable.
A virtuous circle of disruption.
The basic architecture of scale-out storage systems – purpose-built software on clustered commodity hardware – has been stable. But this is the beginning of scale-out storage 2.0: taking scale-out technology developed for users and incorporating it into the storage infrastructure itself.
These ideas are bubbling up among the latest startups and among the establishment players. At some point the old RAID architectures will be well and truly broken, left to compete in smaller and smaller niches until the revenue can’t justify further investment.
Of course, vendors have been building RAID controllers out of servers for years now, and those servers can run any software they want. But at some point the explicit and implicit assumptions of the old architecture crash into current realities – whether in cost, development time, feature completeness or management overhead – and then we move on.
Courteous comments welcome, of course. I learned about Nutanix at the last Tech Field Day (“The Independent IT Influencer Event”), which paid my travel expenses to Silicon Valley.