A commenter recently asked:
Archivas was focused on archive, do you expect the new solution to sustain performance for primary storage as well?
Which is a good question, if you know what “primary” means. Do we?
Tiers of a clown
10 years ago we all agreed on what 1st tier or primary storage was: block-based; RAID 5; enterprise FC or SCSI drives; SCSI, FC or ESCON host connects; optimized for transactional workloads; and large mirrored (with 1 notable exception) caches. When SANs took off we stuck FC switches in front of the boxes and called it good.
But something happened to that consensus: iSCSI; NFS; CIFS; SSD; memcached; Internet scale-out; InfiniBand; 10GigE; storage & processor virtualization; CDNs; web-serving; pNFS; and lower-cost outsourced high-scale infrastructure (i.e. cloud). And more – such as non-SQL data management – is coming.
Will the real primary storage please stand up?
Amazon runs a high-growth $25B/yr business on scale-out storage, servicing millions of customers, taking real money and shipping real goods, 7x24x365. Smells like enterprise spirit.
Is Amazon’s storage “primary” and, if so, what makes it primary?
Yes, it is primary storage. No, it isn’t the logo that makes it so.
Workload & service level
It’s tempting to consider workload, but what workload? IOPS? Bandwidth?
How about parallelism? Web service is highly parallel. ACID database updates less so.
And what about files vs blocks? Block storage doesn’t require as much processing on the array as file storage, since the host is handling the file system.
It is clear that most files aren’t often accessed. Does primary storage for files mean availability and reasonable performance? Or is there little difference between archive and primary for files?
NetApp is deduping primary storage. Others will follow, at least in their marketing messages, whether it makes sense or not. Skeptics ask “If it is deduped, is it really primary?”
The StorageMojo take
We do a disservice to customers if we talk about “primary” storage as a class of equipment. It isn’t.
Primary storage is whatever works as primary storage for your application. That covers everything from bare SATA drives Velcro’d to motherboards to a big cluster of DMXs. Both are in use in major enterprises for mission critical applications – and they both work.
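To make that concrete, here is a minimal Python sketch of the idea: describe what the application needs, then pick the cheapest option that meets it, ignoring the label on the box. Everything in it is an assumption for illustration only. The `Requirements` and `Option` types, the `pick_primary` helper, and every number are made up; they are not measurements of any real product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirements:
    iops: int            # sustained I/O operations per second the app needs
    latency_ms: float    # worst acceptable response time
    availability: float  # required fraction of uptime, e.g. 0.999


@dataclass(frozen=True)
class Option:
    name: str
    iops: int
    latency_ms: float
    availability: float
    cost_per_tb: float   # hypothetical dollars, illustration only


def works_for(opt: Option, req: Requirements) -> bool:
    """True if the option meets every stated requirement."""
    return (opt.iops >= req.iops
            and opt.latency_ms <= req.latency_ms
            and opt.availability >= req.availability)


def pick_primary(options, req):
    """Cheapest option that works for the application, or None."""
    candidates = [o for o in options if works_for(o, req)]
    return min(candidates, key=lambda o: o.cost_per_tb, default=None)


# Hypothetical figures, chosen only to show the selection logic.
OPTIONS = [
    Option("commodity SATA cluster", iops=20_000, latency_ms=15.0,
           availability=0.999, cost_per_tb=300.0),
    Option("big-iron FC array", iops=200_000, latency_ms=2.0,
           availability=0.99999, cost_per_tb=5000.0),
]

web_serving = Requirements(iops=10_000, latency_ms=20.0, availability=0.999)
oltp = Requirements(iops=100_000, latency_ms=5.0, availability=0.9999)

print(pick_primary(OPTIONS, web_serving).name)
print(pick_primary(OPTIONS, oltp).name)
```

With these made-up numbers both boxes can serve the web workload, so the cheaper one wins; only the big-iron array meets the transactional requirements. Each is “primary” for its own application.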
The 60-year secular trend to cooler data is the cause – an inverse of Moore’s Law. As the average number of accesses to a piece of data declines, technologies that meet the need at a lower cost become attractive, find a market, and grow. Niche products become mainstream – and perhaps “primary” – for their markets.
At the same time Moore’s Law is working its magic: creaky slow 10Mbit Ethernet becomes 10GigE. Board-level controllers become chips. Storage software migrates from firmware to a stack running on commodity processors. Yesterday’s “archive” storage is tomorrow’s “primary” storage for the right apps.
Even the term “enterprise” is losing its meaning. As firms begin the 10-year migration to private clouds for cooler data, commodity hardware – servers, unmanaged switches, SATA drives – will be knit together by cluster software that may even be open source. It is “enterprise” because an enterprise is using it.
This is why all the big iron vendors are migrating their software from embedded firmware to stacks running on commodity processors and operating systems. For the mainstream market the commodities are fast enough and the economics are compelling.
If it works for you, it’s primary.
Courteous comments welcome, of course. BTW, I’m getting a briefing from HDS on the old Archivas product, so maybe I’ll have more to say RSN.