What is “primary” storage?

by Robin Harris on Monday, 26 July, 2010

A commenter recently asked

Archivas was focused on archive, do you expect the new solution to sustain performance for primary storage as well?

Which is a good question, if you know what “primary” means. Do we?

Tiers of a clown
10 years ago we all agreed on 1st tier or primary storage: block-based; RAID 5; enterprise FC or SCSI drives; SCSI, FC or ESCON host connects; optimized for transactional workloads; and large mirrored caches (with 1 notable exception). When SANs took off we stuck FC switches in front of the boxes and called it good.

But something happened to that consensus: iSCSI; NFS; CIFS; SSD; memcached; Internet scale-out; InfiniBand; 10GigE; storage & processor virtualization; CDNs; web serving; pNFS; and lower-cost out-sourced high-scale infrastructure (i.e. cloud). And more – such as non-SQL data management – is coming.

Will the real primary storage please stand up?
Amazon runs a high-growth $25B/yr business on scale-out storage, servicing millions of customers, taking real money and shipping real goods, 7x24x365. Smells like enterprise spirit.

Is Amazon’s storage “primary” and, if so, what makes it primary?

Yes, it is primary storage. No, it isn’t the logo that makes it so.

Workload & service level
It’s tempting to consider workload, but what workload? IOPS? Bandwidth?

How about parallelism? Web service is highly parallel. ACID database updates less so.

And what about files vs blocks? Blocks don’t require as much processing as files, since the host handles the file system.

It is clear that most files aren’t often accessed. Does primary storage for files mean availability and reasonable performance? Or is there little difference between archive and primary for files?

NetApp is deduping primary storage. Others will follow, whether it makes sense or not, at least in messaging. Skeptics ask “If it is deduped, is it really primary?”
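For readers who haven’t looked under the hood, the core mechanic of dedup is simple: chunk the data, hash each chunk, and store each unique chunk only once, keeping an ordered list of hashes to reconstruct the original. A minimal Python sketch of fixed-block, content-addressed dedup – purely illustrative, not NetApp’s or any vendor’s implementation:

```python
import hashlib

def dedup_blocks(data: bytes, block_size: int = 4096):
    """Toy fixed-block dedup: store each unique block once,
    keyed by its SHA-256 digest. Illustrative only."""
    store = {}   # digest -> unique block contents
    recipe = []  # ordered digests needed to rebuild the data
    for i in range(0, len(data), block_size):
        block = data[i:i + block_size]
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)  # keep first copy only
        recipe.append(digest)
    return store, recipe

def rehydrate(store: dict, recipe: list) -> bytes:
    """Reassemble the original data from the block store."""
    return b"".join(store[digest] for digest in recipe)
```

The skeptics’ point falls out of the code: reads now involve a hash-index lookup per block, which is why dedup has historically lived in backup and archive tiers rather than latency-sensitive primary storage.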

The StorageMojo take
We do a disservice to customers if we talk about “primary” storage as a class of equipment. It isn’t.

Primary storage is whatever works as primary storage for your application: from bare SATA drives Velcro’d to motherboards to a big cluster of DMXs. Both are in use in major enterprises for mission-critical applications – and they both work.

The 60 year secular trend to cooler data is the cause – an inverse of Moore’s Law. As the average access rate of data declines, technologies that meet the need at a lower cost become attractive, find a market, and grow. Niche products become mainstream – and perhaps “primary” – for their markets.

At the same time Moore’s Law is working its magic: creaky slow 10Mbit Ethernet becomes 10GigE. Board level controllers become chips. Storage software migrates from firmware to a stack running on commodity processors. Yesterday’s “archive” storage is tomorrow’s “primary” storage for the right apps.

Even the term “enterprise” is losing its meaning. As firms begin the 10-year migration to private clouds for cooler data, commodity hardware – servers, unmanaged switches, SATA drives – will be knit together by cluster software that may even be open source. It is “enterprise” because an enterprise is using it.

This is why all the big iron vendors are migrating their software from embedded firmware to stacks running on commodity processors and operating systems. For the mainstream market the commodities are fast enough and the economics are compelling.

If it works for you, it’s primary.

Courteous comments welcome, of course. BTW, I’m getting a briefing from HDS on the old Archivas product, so maybe I’ll have more to say RSN.


Jeff Treuhaft July 26, 2010 at 2:46 pm

Could not agree more, Robin! For a growing majority of IT environments the features of a ‘Primary Class’ platform are much more critical than the device model number or benchmark score in a lab environment. Integrity protections, data security, snapshots, replication – these are the touchstones of “primary class” storage for most IT people. One of the challenges you raise is that as vendors broaden the ways they deliver devices that can be used as primary, the management workload can increase. Multiple stacks running on disparate hardware integrating many drive technologies sounds great, but it multiplies the day-to-day workload of an average IT administrator who now has to watch potentially 10s or even 100s of boxes for issues and to protect against data loss. This is why, with the right primary class features backed by a true SLA, most IT shops can greatly benefit today from the use of enterprise storage services like Zetta. All the primary class features and advanced data protections in an on-demand business model, without the complexity of trying to figure out how to erect and then maintain a private cloud of storage.

Anonymous July 26, 2010 at 7:50 pm

Yesterday’s “archive” storage is tomorrows “primary” storage for the right apps.

Shouldn’t that be: Yesterday’s “Primary” storage is tomorrow’s “Archive” storage?

ie: FC Disks are for archival storage of the MEMCache Database.

Robert Pearson July 27, 2010 at 12:49 am

One Operational Definition of “Primary Storage” is that this is the stored Information that produces 80% of your revenue for the company.
The second part of this definition is, “Which of your stored Information, if you should lose it completely, would put you out of business?”.
Easy to ask, difficult to answer. Involves many egos and turf wars.

The most important step to define the “Primary Storage” question is the attitudinal shift to the Information Centric view from the historic Technology Centric view. Boxen (Technology) are good for egos and turf wars.

I am always torn between making the “boxen” really smart or keeping them really dumb. The “big-brained” people feel the best paradigm is “dumb boxen” and very intelligent edges. That seems to be where we are headed.

The Information Centric Storage view makes the boxen (Technology) disappear by becoming seamless, transparent and invisible.

The Information Centric view is more interested in the ROI/TCO ratio of Managed Units of Information. TCO is easy to get, difficult to get correctly. ROI is difficult to get all the time. I can’t even get people to agree on the Information, which if you lose it, will put you out of business, and the Information that generates 80% of your revenue.

The ROI/TCO ratio is the key parameter in the “Well-being Index”.
The Well-being Index is one possible solution to having an important measure of the Information well-being. Call it anything you like.
An acceptable Well-being Index takes into account ROI/TCO, SLA, Context, Content, Findability, Information High Availability (IHA), Information Integrity (II), which Information, if lost, will put you out of business, which Information generates 80% of revenue and the User Experience (UX).

The “rich” Storage environment seems to be beyond the visible horizon. By “rich” it is meant that if ILM, data de-duplication, in-stream data compression, Content Typing and Records Management, etc. for archives is desired it will be an automatic Lower Metric in the Storage tools.

A much more important Storage consideration is the comment “Our goal for replication throughput is 1 terabyte per hour”.
1 terabyte per hour is expensive up front. It may be beyond your IT pocketbook. But what part of 1 terabyte can you afford? Do you know? Do you know how to determine this? Do you know what the benefits will be?
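The bandwidth arithmetic behind that goal is easy to check. A quick back-of-envelope calculation (decimal terabytes, sustained rate, ignoring protocol overhead and compression):

```python
# Sustained network bandwidth needed to replicate 1 TB per hour.
TB = 10**12              # decimal terabyte, in bytes
SECONDS_PER_HOUR = 3600

bytes_per_second = TB / SECONDS_PER_HOUR   # ~277.8 MB/s sustained
bits_per_second = bytes_per_second * 8     # ~2.22 Gbps sustained

print(f"{bytes_per_second / 1e6:.0f} MB/s, {bits_per_second / 1e9:.2f} Gbps")
# → 278 MB/s, 2.22 Gbps
```

So 1 TB/hour means holding roughly 2.2 Gbps around the clock – more than two bonded GigE links, or a meaningful slice of a 10GigE pipe – before any overhead, which is why the up-front cost bites.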

That would be a nice rich Storage tool to have. What level of “Speed Limit of the Information Universe” does my IT infrastructure need to support and deliver Findability, User Experience, Information High Availability, Information Integrity, Disaster Recovery, and Business Continuity?

Would it be nice to have the Information Integrity software deliver a “Well-Being Index (Status)” through the “Well-Being API”?

Would it be nice to have the “Well-Being Index” show you visually that ILM was invoked for “xxx”bytes of Information?
That data de-duplication operations were performed on “xxx”bytes of Information?
The Content that was managed by Information Integrity in the last “xxx”hours? And how that Content relates to the IT Portfolio Management, Lines of Business and the company Portfolio Management?

Would it be nice to have a Storage tool with an interface like Ms. Dewey? http://en.wikipedia.org/wiki/Ms._Dewey
You just click on the Ms. Dewey-Storage and tell it to show you whatever it is you want to know. Like the Well-Being Index and API, the current Speed Limit of the Information Universe, the SLA status, the current status of Findability, User Experience, Information High Availability, Information Integrity, Disaster Recovery, Business Continuity, etc.
The possibilities are endless…

Lee Johns July 27, 2010 at 6:28 am

I agree Robin. The interesting implications of the move from firmware to software for storage have not been fully explored yet. No matter if it is primary, secondary or archive, if it is an application running on industry standard server platforms there will be fundamental shifts in market dynamics.

Don MacVittie July 27, 2010 at 8:41 am

Great write-up. I could niggle at some of the detailed “whys”, but since I agree with the whole, that would be a waste of time.

I’ve done the “velcro and motherboard” thing to get a mail server back up quickly from a non-disk hardware failure. Since it “worked”, it stayed that way longer than you’d expect.

