Yes, Virginia, the storage industry will survive the crisis
Economists and business leaders generally agree that the current, as yet unofficial, recession will be the worst we have seen since the Great Depression. The credit bubble has popped and we are facing global de-leveraging that will take years to unwind.
De-leveraging is a fancy term for “a lot less money rolling around.” The computer industry started after the Great Depression, so these will be the worst times it has ever seen.
How bad will it get for storage?
Storage is a special case. Disk drives underlie everything we do and they show no sign of slowing their capacity increases and price drops.
Data growth rates are a little less certain – contracting businesses produce less data – but the economic advantages of online data continue to grow as cost per gigabyte drops. Even in the financial sector someone is going to have to unravel all of those credit derivative swaps and synthetic securities that the “rocket scientists” – heckuva job, guys! – developed.
Where will this impact IT operations? Right in the heart of the array business.
A little smarter, a lot cheaper
Assume 80% of all business data is unstructured. And suppose 80% of that data is stored on storage arrays that are optimized for transactional data.
If RAID arrays average $6/GB today and cluster storage averages $2/GB we can begin to estimate the potential impact. In a perfect world 64% – 80% of 80% – of all corporate data could be migrated from high cost storage arrays to much lower cost storage clusters.
If the storage array business is $21 billion a year today, that means there is roughly a total available market of $13 billion of IT spend that could go to storage clusters. If storage clusters are 1/3 the price of storage arrays, that suggests a total storage cluster business of $4 billion a year.
That ignores, of course, the traditional impact of sharply lower storage costs: a rapid increase in the amount of data stored. Online and easily searched data is much more valuable than data stored on paper or tape. A first-order guess is that in today’s market there is the potential for an $8 billion a year storage cluster IT spend.
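The arithmetic above can be sketched in a few lines. All of the inputs are the post’s own round-number assumptions, not market data, and the doubling for induced demand is just the post’s first-order guess:

```python
# Back-of-the-envelope TAM arithmetic from the post.
array_market = 21e9          # $/year, storage array business (post's figure)
unstructured = 0.80          # share of business data that is unstructured
on_arrays = 0.80             # share of that stored on transactional arrays
migratable = unstructured * on_arrays            # 64% of corporate data
addressable_spend = array_market * migratable    # ~$13.4B of array spend
cluster_price_ratio = 1 / 3                      # clusters at ~1/3 array $/GB
cluster_tam = addressable_spend * cluster_price_ratio  # ~$4.5B
with_induced_demand = cluster_tam * 2            # cheap storage => more data stored
print(f"migratable share: {migratable:.0%}")
print(f"addressable array spend: ${addressable_spend/1e9:.1f}B")
print(f"cluster TAM: ${cluster_tam/1e9:.1f}B; "
      f"with demand growth: ${with_induced_demand/1e9:.1f}B")
```

The post rounds these to $13B, $4B and $8B.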
That’s the theory, anyway. The reality is that most IT professionals will not give up the storage arrays they know and love without a fight. But the economic pressure will be unrelenting.
Winners and losers
This won’t be a rapid process. The early not-very-good storage arrays came out in 1990 and took 8 years before sales reached 50% of the capacity of enterprise storage. The economic advantages of cluster storage are greater and the pressure to contain costs much stronger today. It will be 6 years before half of all enterprise storage capacity sales are in storage clusters.
The winners will be those companies that embrace and extend the capability of storage clusters the soonest. Among large companies HP and EMC appear to have the lead. Among the small companies several will be purchased while others will continue to grow as independent entities.
The losers? IBM appears to have no discernible strategy. NetApp is bogged down in its efforts to integrate the GX global namespace with the contradictory requirements of its traditional Data OnTap code base.
Sun has good building blocks but will fail if they lead with Lustre. HDS will wait until the market is defined to start moving – but that may be too late. This is a software play in more ways than one.
Smaller companies in the array business have a steep learning curve with cluster storage. Expect most of them to fade over time. There will be opportunities for OEM suppliers to the mid-tier vendors.
The StorageMojo take
The age of the RAID array is coming to an end. Arrays won’t disappear any more than mainframes have. But they will become much less common. The array business will see single-digit sales drops and general long-term stagnation. The storage cluster business will show robust growth.
The race for storage cluster dominance is still young. There are many variables where newcomers and existing players can find or fumble important advantages. Can storage clusters be effectively productized? Or will integration requirements favor service-oriented companies? How will flash be best integrated into storage clusters? How will the SMB market be cracked?
The economic crisis does not create new trends. It accelerates existing ones. IT professionals should not underestimate the power and impact of the current crisis on once sacrosanct IT budgets.
IT likes to talk about “business partnership.” Now is the time for action. Show the CFO that you know how to do more with less and you’ll be a partner. Insistence on business as usual is the wide road to a pink slip.
Courteous comments welcome, of course. Disclosure: I’ve recently done some work for HP on their announced but not-quite-shipping Extreme Data Storage 9100. I was impressed.
Hi Robin, could you explain a little bit more what you mean by clustered storage?
marc > Clustered storage means solutions like the ones provided by Isilon (multiple commodity boxes, each with its own disks, unified by some sort of global filesystem so clients see only one array), or EMC’s Hulk project. This kind of functionality can also be provided in software – look up names like Lustre or GlusterFS.
[Disclosure: I work for a company which develops cluster storage software]
IMHO arrays are / were good at storing and delivering large amounts of data, which was impossible without them. Together with some rather straightforward methods to deliver reliability / performance (aka RAID), it all fits naturally.
I absolutely agree with Robin that the end of the arrays is coming nevertheless. One just has to open one’s eyes and see what is happening at the current storage technology frontier. Existing storage solutions/products somewhat lack the intelligence and courage to put the pieces together. SSDs & co. make up a whole different game when it comes to managing (sic!) the stored data, not just managing access to it.
The whole effort around management of storage is becoming / will become the most important factor. One example (out of many) regarding a hard disk’s performance:
a) if you are about to write 2 GB of data in a random way (at, let’s say, ~200 IOPS), this takes ~20,000 seconds, that is ~6 h
b) if you are about to write 80 GB of raw data in a sequential way (80 MB/s), it takes 1024 seconds (or 20 times __less__ for 40 times the amount of data).
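The commenter’s two figures check out under one assumption: the random writes are 512 bytes each, which is what makes the ~20,000-second number work (larger IO sizes would shrink it proportionally). A quick sketch:

```python
# Reproducing the commenter's disk-throughput arithmetic.
# The 512-byte random-IO size is an assumption, not stated in the comment.
GiB = 1024**3
io_size = 512                              # bytes per random write (assumed)
iops = 200                                 # random IOPS of one hard disk
random_secs = 2 * GiB / io_size / iops     # ~21,000 s, roughly 6 hours
seq_mb_per_s = 80
seq_secs = 80 * 1024 / seq_mb_per_s        # 1024 s for 80 GiB sequential
print(f"random: {random_secs:.0f} s (~{random_secs/3600:.1f} h)")
print(f"sequential: {seq_secs:.0f} s – "
      f"{random_secs/seq_secs:.0f}x less time for 40x the data")
```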
I am sure the storage experts (you) already know the numbers. For a long time I have been asking myself: why don’t the storage system makers apply sophisticated algorithms to this problem space and start managing (sic!) the data? (please read this question literally)
We have taken the road of better algorithms, and we are surprised by our results. The performance as well as the price of clustered storage solutions will change the game / the economics of storage. The game has to change, as the problems of storage management (time, cost, compliance) will dictate a move towards intelligence throughout the storage layer.
Amen.
Where do you see the expansion of the “low end” of the storage market?
It was not so long ago that some analysts were forecasting that while data growth in large customers will continue, a lot of smaller companies would enter the storage market looking for external or networked storage. You see a lot of shops implementing their first SAN to support VMware, for example.
While these are low-cost and low(er)-margin, some forecasts I saw suggested that more than half of the storage market growth ($$) would come from this segment.
How is clustered storage going to benefit a customer with <45 physical disks? or do you not see this as a valid trend?
Owen,
@ Marc,
Isilon is clustered tech but it starts in the same price bracket as primary storage arrays.
Lustre is now owned by Sun, I believe.
@Robin,
Hopefully the downturn in the economy will make organisations re-evaluate why they spend $6,000 for a TB when, as you say, 75-80% of the content should be kept on more appropriate, commodity-based clustered storage platforms.
Hello Owen,
I see the “small shops” market in particular as a great opportunity to make clustered storage go mainstream. As these are hit by the downturn (some say they will be hit harder than, e.g., large corporations sitting at the top of the economic food chain), the small shops will have a greater interest in spending less money for more storage capacity/performance. I guess the very same shops will be more willing to go new, “non-traditional” ways (e.g. deploying commodity / off-the-shelf components to store their data) than larger corporations with more money to spend.
Let’s imagine there were software that could turn every network of PCs into a storage cluster. Depending on the price, this software could be a real “game changer”. [Then, ultimately, the server market’s economics would become the storage market’s economics, btw. I am sure you know what this will lead to…]
How would this benefit a customer? Every customer could start freely combining and building his/her own storage cluster. Customers could probably let storage and computing capacities (e.g. hypervisors) merge onto one set of machines. IMHO, this would be a great benefit for even the smallest company.
Interesting point – what is clustered storage? It looks like the next big thing (as soon as you can prise those expensive enterprise arrays from the storage specialists in enterprise environments), but what the heck is it?
I take it to imply a few things:
– Commodity hardware (let’s face it, x86)
– Scale-out, rather than scale up (cpu is closely aligned to storage)
– Low(er) cost
– Global filesystem/unified namespace
– Commodity interconnect (1/10GbE or perhaps IB)
Examples would be Isilon, Hulk, XIV, GX, EqualLogic, etc.
I’d have thought that smaller companies looking to implement storage networks for VMware would be looking at iSCSI/NFS solutions rather than trying to implement their first FC SANs (just as vendors are starting to talk about the end of FC).
Chris
…waiting for you to write something about Sun Storage 8000 a.k.a. OpenStorage…
Ahem,…. I wanted to hear from Robin. I know the rest of us have slightly different ideas of what clustered storage is – I wanted to know what ROBIN was thinking when he wrote this post.
It’s Sun Storage 7000, and it’s not “clustered”, so apparently it’s too little too late 😀
Owen,
Good question! Back in the 80’s DEC sold low-end VAXclusters that used DSSI – a variation on SCSI IIRC – that was quite popular. It was limited to a maximum of ~28 disks. The most popular config was 3 nodes because you could lose a node and keep functioning at – usually – 100% and still have redundancy. With commodity server/disk boxes a tri-node cluster with dual Ethernet interconnects would offer SMB Peace Of Mind for a low cost. Note: this would be a compute/storage complex – running user apps & providing storage – not a pure storage-only cluster. As long as CPUs get faster, why not? And BTW, SMB is very much a channel play.
Marc,
I prefer a big tent definition of storage clusters: as long as you can add nodes in a global name space, I call it good. Isilon’s backend InfiniBand: fine. NetApp’s GX namespace front-end cluster: on the ragged edge but still a cluster. Panasas’ basically-commodity-but-with-extra-tricks hardware: groovy. Symmetric, asymmetric, in-band, out-of-band, shared-nothing, shared-everything: hey, they are all clusters.
It is when you get into economics and performance that the design choices affect customer decisions. My hazy crystal ball says commodity hardware and interconnects running an open source OS will be the predominant model in 5 years. Whether the storage cluster software will be open source or proprietary – I don’t know.
HTH,
Robin
If you are really paying $6 per GB for volume array storage (even high-end stuff, assuming you don’t go for the smallest disks in RAID-1), then whoever is negotiating the deal is not doing their job very well.
Low-end SATA-based array storage can be procured for something closer to $2.50 per GB usable (i.e. RAID-protected) with the right commercial deals. I’m being conservative – it’s possible to do better than that.
[Disclosure – I work for HP]
Robin,
Thanks for the ExDS comments.
We’re certainly seeing more customers look to discern between a performance / protection tier (AKA classic arrays) for mission / business critical applications and a capacity optimized tier (which is where you’d expect to see commodity based infrastructure) for almost everything else.
I think the hardware model is inevitably going to move towards a commodity (or industry standard) based approach for much of the unstructured content – the cloudier issue is what the software model will look like. For some of the mega-apps (Oracle et al.) it looks like more of the classic array functionality will be driven by the app itself (like Exadata), but that drives deeper silos (each of the apps gets managed individually).
Would be interested to understand how much enthusiasm there is from the end-user community on an open-source model.
Ian
As a low-end storage customer, I agree with Owen. Traditional arrays or NAS (Windows NAS, StoreVault, DS3000) are actually cheaper than clusters. Realistically, storage clustering requires complex (and thus expensive) software that cancels out the cost savings of commodity hardware. Maybe pNFS will improve the situation.
The recession is going to accelerate changes. Many of these converts will choose online storage to ease their pain. Still ends up in a cluster. http://tinyurl.com/6cqjpj
BlueArc has always believed in being able to do more with fewer devices.
As your definition requires, we have for years offered a GNS capability and the option to cluster multiple nodes (8 as of this writing) in a single namespace, but each box on its own can also support up to 4 petabytes in capacity with file systems as large as 256 terabytes. Clustering is absolutely a great option, but we believe customers should cluster fewer devices and reduce the amount of floorspace, power, cabling and management necessary to address their performance and scale needs.
Great to see comments from Marc of 3Par here as well. When Robin talks, the industry listens! 🙂
BlueArc is so ridiculously overpriced, and always will be due to the custom hardware, that it’s not even a realistic option for this discussion.
That’s like saying a DMX can front end a PB, or even better that a USPV can front-end 247PB. When it costs more than most companies make in a decade, it’s a moot point.
We’re talking about commodity hardware (read: off-the-shelf x86 servers) running cheap or free software to create a storage cluster. My personal bet is that Sun will be the first to do it if they continue down their current path. They’re halfway there already, and have the building blocks to flesh it out. Their biggest obstacle will be their shareholders’ stomach for such unconventional methods.
Sun’s new storage platform is interesting, and impresses in a netappish kind of way, but they have yet to commit to building a commodity storage cluster platform. I keep hoping they will.
You can get clustered iSCSI from Dell for near $1.50/GB if you buy in chunks of those new 48-disk systems. And that’s in unit sizes of “1”.
If pNFS is your friend, I suggest negotiating with Panasas. They can compete with NetApp’s prices if you negotiate for a larger procurement.
Isilon can be had for a good price with negotiation also. I just set up an Isilon cluster. It really is as easy as they say: just plug the systems into the IB backplane, click a button on the front LCD, and away you go. Cluster OS upgrades are easy too: upgrade one node and you upgrade them all.
–Joe.
Owen, Wes, DEC’s DSSI was a combined compute/storage cluster, not a general-purpose storage cluster. The SMB market – with a tilt towards the M – would go for this as an appliance. Not so different in concept, now that I think of it, from the highly successful AS400.
Steve, I may be behind on the pricing. My general observation for the last 5 years is that the disk capacity part of an array is only 5-10% of the cost. Low-end commodity server/storage boxes do better at ~25%. And that is using Dell’s low-end server pricing where they charge $500 for a 1 TB SATA drive that costs them $100.
I should look into commodity server/storage pricing and get updated.
Robin
What effect do you think cloud storage and the accelerating rate of multi-core processors in servers will have on the new economics of storage for the various NAS/SAN manufacturers?
Has the tipping point occurred?
Robin:
“I should look into commodity server/storage pricing and get updated.”
Check out Sun’s new storage platform.
As for commodity servers/storage. Here’s a corporate rate Dell:
Dell 2950iii 2U rackmount, 2xL5410 processor, 32GB RAM, 2 73GB 10K drives in RAID 1, dual-redundant power, remote management card, and standard 5×10 maintenance:
$3,752
PERC6/e external SAS adapters, $591 EA
Dell MD1000 SAS-attached JBOD chassis, 15 1TB drives, basic maintenance: $10,828.
Or with 15 450G 15K SAS drives: $14,268.
Dell partners will do better than the “standard corporate rate” in many circumstances, but the above quote will be pretty representative of Dell. Generally these prices will be better than any of the big American vendors’, although I wonder if Sun’s new 48-disk JBOD would produce a better $/GB ratio…
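Joe’s quote works out to roughly a dollar a gigabyte for a SAS-attached building block. A rough sketch (list prices from the comment; the RAID-6 usable-capacity figure is an assumption, not part of the quote):

```python
# $/GB from the Dell quote above.
head_node = 3752        # 2950iii 2U server
sas_hba = 591           # PERC6/e external SAS adapter
jbod_15x1tb = 10828     # MD1000 JBOD with 15x 1 TB SATA
total = head_node + sas_hba + jbod_15x1tb   # $15,171 for the building block
raw_gb = 15 * 1000
usable_gb = 13 * 1000   # assume RAID-6: two parity drives out of 15
print(f"${total/raw_gb:.2f}/GB raw, ${total/usable_gb:.2f}/GB usable (RAID-6)")
```

That lands comfortably under the ~$1.50/GB Joe cites for Dell’s clustered iSCSI, and far under the post’s $6/GB array figure.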
Joe.
Marc,
Your comments about storage clusters are quite real. I do have a concern that you have stated IBM has no vision in this arena, yet we acquired XIV for just this reason. XIV is truly a leader in this arena, and we at IBM expect to see really large growth in the storage market with this class of storage device.
Can you provide more input on why you believe IBM has no vision, in light of the XIV acquisition?
Lloyd
It seems that the end result of all this discussion is that “cluster storage” will soon take the place of “array storage”. In my opinion, this will only be the case if cluster storage can provide more than just cost efficiencies for primary storage. Unless cluster storage also provides the same reliability, scalability, and supportability as the larger monolithic arrays, it will be relegated to a lower “tier”.
In basic concept, I actually agree with Robin though. Lower-cost modular arrays can actually integrate seamlessly with intelligent software to provide the same reliability and operational benefits as larger arrays. I term this concept “Optimized Data Services”, where intelligence in the fabric layer provides all the beneficial data services – continuous protection, replication, virtualization, migration, deduplication, encryption, and thin provisioning – for mission-critical and non-critical applications based on SLA.
The benefit? Storage becomes a commodity, operations are simplified, backup becomes a service (and the actual backup process goes away), capital expenditures are greatly reduced, DR is just always on, RTO is zero, RPO is 15 minutes for both physical and virtual servers, utilization skyrockets, and since the solution is self-healing, everyone can go home and drink beer at night rather than worry about servers or storage going down.
Keep it simple.
I think you are spot on Christopher. “Cluster storage” will take the place of “array storage” in the same way as mini-computers took the place of mainframes, and Wintel servers then took the place of minis. That is, it *won’t* replace it, but it will probably take a large part of the market. There is likely to be a need for a very long time for highly available, highly performant storage and that is what more traditional array storage will continue to provide.
For me, the interesting question is how much of the cluster storage market will belong to cloud services rather than to storage systems owned by individual organisations. I suspect a large proportion will live offsite in the cloud.