StorageMojo





Robin Harris    


The top storage stories of 2008

The world of data storage is changing faster than at any time since the mid-90’s rise of hardware arrays and storage networks. Looking back, 2008 will be seen as a pivotal year. The big news, in rough ascending order:

FCoE
Though production-ready products are still in the future, the broad vendor embrace of Fibre Channel over Ethernet signaled the beginning of the end for the Fibre Channel physical layer. The storage companies who profited from a decade of Balkanizing the storage network market will have Cisco calling the shots.

Brocade, in particular, needs good strategy advice. Maybe one of these days they’ll get it.

Blu-ray tanks
Call me old-fashioned, but I have a soft spot for removable media. So I’m sorry to see Sony screw the pooch with Blu-ray’s big-studio-friendly licensing and hate-the-customer DRM. And sinking the PS3 as well.

The good news: you can put HD content on a standard DVD - just not as much; and there’s an upscaling DVD player - the Oppo Digital DV-983H - that upscales ordinary DVDs to near Blu-ray quality. Yes, even better than the upscaling on a Blu-ray player. One less reason to pay the Blu-ray tax.

2.5″ drives
Rumor has it that Seagate is designing its last generation of 3.5″ drives, which augurs the switch to SFF in desktop and enterprise systems. Three years ago 2.5″ drives had 1/5th the capacity of 3.5″ drives; today they have 1/3 the capacity, at a much smaller price differential.

At some point it will occur to Seagate’s top management that 1.8″ and 2.5″ drives are the disk industry’s best answer to flash. Now, if Seagate were in the I/O business, it would be a different story.

Zero-maintenance storage
Xiotech and Atrato introduced storage boxes that guarantee capacity, performance and uptime with no maintenance for 5 and 3 years respectively. These are storage game-changers.

That Seagate sold ISE to Xiotech after spending years developing it has to be one of their biggest blunders ever, several notches above buying Xiotech in the first place. The ISE is, in effect, a super disk that Seagate could have sold to all its enterprise disk customers.

Flash
2008 is the year that every major vendor - with the laudable exception of laser-focused WD - announced alliances and/or plans to enter the flash drive market. High-end SSDs will displace 15k high-end disks in the next 3 years.

But flash-in-disk-clothing is the near/medium-term solution. Fusion-io and Violin are on the winning architectural track. Flash belongs between the CPU and disk layers: that’s where we’ll get the most benefit for the added cost.

Hey, disk vendors: want to stick it to Intel, Micron and Samsung? Buy one of them. You are in the I/O business, not the disk business.

Commodity-based cluster storage
EMC’s Atmos, HP’s Extreme Data Storage 9100 and IBM’s XIV are commodity-based cluster storage. The important thing is that the storage mainstream has embraced storage clusters based on commodity hardware and mostly open-source software. That’s what Google did years ago and soon many companies will.

Yes, commodity hardware saves real money, as Bill Mottram of Data Mobility Group and I found out when we ran the numbers on HP’s 9100 vs Isilon, NetApp and Sun. We’ll see if Atmos is on the latest EMC price list when I do the updates later this month.

The StorageMojo take
2009 will be a great year for the hungry and flexible. The ongoing financial train wreck is trouble for Big Iron fans in the data center.

Fortunately, help is on the way. Look for my 2009 forecast before the end of 2009.

Courteous comments welcome, of course. Of the companies mentioned I’ve done work for HP and Fusion-io.

Shh! Disk drive at work.

January 2nd, 2009 by Robin Harris in Disk, Video

Funny and provocative video (thanks David!) from Sun demonstrating 2 things:

  • 15k drives are vibration sensitive - in this case to a shout a couple of inches away.
  • Sun’s Fishworks analysis suite enables realtime analysis of storage behavior.

Bad, bad, bad vibrations. . . .
Vibration issues are old news to the mechanical engineers who design enclosures. The Tsunami Harddisk Detector uses that fact for earthquake detection.

A commenter on Brendan Gregg’s Fishworks engineering blog claims that replacing fans with bad bearings improved array throughput. That can’t look good to civilians.

The StorageMojo take
With any luck at all we should see a spate of competing videos proclaiming that Brand X enclosures pass the “shout test.” Of course, they’ll probably have to use Fishworks for the demos, which is all to Sun’s good.

I left Sun a decade ago. This reminds me that many of Sun’s best promo ideas come from the techies, not the marketers.

Fishworks is the kind of great technology that has always been at the root of Sun’s appeal. Translating that appeal into sales is Sun’s marketing challenge.

Courteous comments welcome, of course. Fishworks includes Bryan Cantrill’s brilliant DTrace, which I’ve been a fan of for years. Update: The too-often-right-for-his-own-good Wes says the drives aren’t 15k drives. I couldn’t find an RPM citation so I struck the 15k out above. A citation would be appreciated, eagle-eyed readers. End update.

Cloud storage is a component

December 22nd, 2008 by Robin Harris in Architecture, Cloud computing & storage

The cloud storage hype has been bothering me for some time (see Are there economies of scale in storage?). Even more irritating than the “storage as a service” meme.

The problem with cloud storage is threefold:

  • The availability isn’t as good as a disk drive.
  • The performance is limited by network latency and bandwidth, i.e. terrible by traditional storage standards.
  • The economies of scale for the clusters behind cloud storage remain undefined - but the pay-as-you-go model has a distinct CapEx advantage.

In short, it is a new class of storage with a distinct set of benefits and problems. The problem for product designers: how to best conceptualize cloud storage.

Componentry
That’s where thinking of cloud storage as a component makes sense. It is a storage device with distinct levels of reliability, availability, performance and cost.

Flash, disk drives, DRAM, SRAM, arrays and tape are all storage components. They all get built into storage devices or even - in marketing jargon - “solutions.”

Why not a service?
Dismissing cloud storage as a lousy service - limited bandwidth, high-latency, subject to Internet squalls and provider goofs - isn’t enough. It isn’t a service because it requires a framework of enabling technology around it to be useful - which is why it is a component.

A car wash, haircut or a Google search is a service. You show up in your car, with your hair or a browser and a complete transaction occurs. A job completes.

A component is instrumental in job completion, but it doesn’t do the job. A disk drive is a component. It needs power and a front-end interface - USB, 1394, eSATA - for the simplest use case.

Service, component, who cares?
A key problem in research is asking the right question. A key problem in marketing is giving the right answer to the right question.

In this case, the right question is “how can cloud storage be used to create a compelling value proposition for our target customers?”

The StorageMojo take
The number of similar cloud services on the market suggests the wrong question is being answered. While there is a market for raw cloud storage - as Amazon’s S3 has shown - the real opportunity is incorporating it as a component in a solution to a business problem.

This is an example of where the common technical meaning of the term “service” - the provision of a discrete function within a systems environment - differs from the common marketing meaning. As a result marketing and engineering are talking at cross-purposes - again! - about a developing market.

As the most successful consumer storage products of the last decade - USB thumb drives and the iPod - show, embedding storage into an attractive package is key. Amazon may be the world’s largest supplier of OEM cloud storage, but the real money will be made by those who build it into a convenient solution.

Courteous comments welcome, of course.

Garth Gibson on supercomputer storage

December 16th, 2008 by Robin Harris in Backup, Clusters, Video

Garth Gibson is one of the authors of the original RAID paper (pdf), a CMU professor, founder of the Parallel Data Lab, founder and head of the Petascale Data Storage Institute and founder and CTO of Panasas, a maker of parallel clustered NAS systems. I caught up with him at the Seattle Scalability Conference in June and taped about 40 minutes of conversation.

After much procrastination I got the videos edited and up on YouTube a month ago. In 16.5 minutes Garth covers large scale file systems, massively parallel supercomputer failure modes and backup strategy. Panasas provides the back-end storage file server for 6 of LANL’s supercomputers, including the world’s fastest: a 1 petaflop sustained machine named Roadrunner.

Here is part I:

And here is part II:

The StorageMojo take
The problems that LANL is having today will be much more common in 10 years. Not that we’ll all be running huge informatics apps or 6 month simulations of nuclear weapon decay, but if they can figure out the software, many of us will have 64 core - or more - desktop systems equal to a respectable commercial HPC installation today.

Maybe we’ll be running personal informatics jobs, looking for nascent trends in, oh, I don’t know, fantasy sports or parallel NFS. Or hosting virtual 3D game worlds that keep evolving even when we’re away. Somebody will think of something cool to eat those cycles and terabytes.

Courteous comments welcome, of course. I did some work for Panasas last year. I also like their leadership on parallel NFS and object-based storage.

Many-cores hit the memory wall

December 8th, 2008 by Robin Harris in Architecture, Future Tech

Everyone in the data storage industry knows about the gap between I/Os per second of disk drives and processor I/O requirements. But there is a similar problem facing DRAM support of many-core chips.

William Wulf and Sally McKee named it “the memory wall” in their 1994 paper Hitting the Memory Wall: Implications of the Obvious (pdf). They described the problem this way:

We all know that the rate of improvement in microprocessor speed exceeds the rate of improvement in DRAM memory speed - each is improving exponentially, but the exponent for microprocessors is substantially larger than that for DRAMs. The difference between diverging exponentials also grows exponentially; so, although the disparity between processor and memory speed is already an issue, downstream someplace it will be a much bigger one.
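To put rough numbers on the quoted argument, here is a minimal sketch. The two annual improvement rates are illustrative assumptions, not figures from the Wulf and McKee paper; the point is only that the ratio of two exponentials is itself an exponential.

```python
# Illustrative only: both growth rates are assumed, not taken from the paper.
CPU_GROWTH = 1.5    # assumed: processor speed improves 50% per year
DRAM_GROWTH = 1.07  # assumed: DRAM speed improves 7% per year


def speed_gap(years: int) -> float:
    """Ratio of processor speed to DRAM speed after `years`, starting from parity."""
    return (CPU_GROWTH / DRAM_GROWTH) ** years


if __name__ == "__main__":
    for years in (0, 5, 10, 15):
        print(f"year {years:2d}: processor/DRAM speed ratio = {speed_gap(years):.1f}x")
```

Whatever rates you plug in, the gap multiplies by the same factor every year as long as the processor rate is the larger one, which is why "downstream someplace" always arrives.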

According to an article in IEEE Spectrum, that time is almost upon us. Sandia National Labs simulations predict that once there are more than 8 on-chip cores, conventional memory architectures will slow application performance.

Update 1: here’s the graph from Sandia. It took me quite a while to figure out what it was saying - thanks commenter! - so I didn’t publish in the original post. As I said on ZDnet this morning:

Performance roughly doubles from 2 cores to 4 (yay!), near flat to 8 (boo!) and then falls (hiss!).

Many-cores fall over the performance cliff.



End update 1.

James Peery of Sandia’s computation, computers, information and mathematics research group is quoted saying “after about 8 cores, there is no improvement. At 16 cores, it looks like 2.” The memory wall’s impact is greatest on so-called informatics applications, where massive amounts of data must be processed, such as sifting through data to determine if a nuclear proliferation failure has occurred.

John von Neumann emphasized the point in his First Draft of a Report on the EDVAC (pdf):

This result deserves to be noted. It shows in a most striking way where the real difficulty, the main bottleneck, of an automatic very high speed computing device lies: At the memory.

Gee, bandwidth is important. I thought it was all IOPS.

Help is on the way
The Spectrum article notes that Sandia is investigating stacked memory architectures, popular in cell phones for space reasons, to get more memory bandwidth. Professor McKee has also worked on the Impulse project to build a smarter memory controller for

. . . critical commercial and military applications such as database management, data mining, image processing, sparse matrix operations, simulations, and streams-oriented multimedia applications.

Update 2: Turns out Rambus has a 1 TB/sec initiative underway. Goals include:

  • 1 TB/s memory bandwidth to a single system on a chip
  • Suitable for low-cost, high-volume manufacturing
  • Works for gaming, graphics and multi-core apps

The Terabyte Bandwidth Initiative is an initiative, not a product announcement. They’ll be rolling out some of the technologies in 2010 with next-gen memory specs. Courtesy of Rambus is this slide describing some of their issues:

Alas, it doesn’t look like Intel’s late-to-the-party on-board memory controller and QuickPath Interconnect in Nehalem will help us get ahead of the problem. And with multi-core, multi-CPU system designs, how do you keep the system from looking like a NUMA architecture?
End update 2.

The StorageMojo take
Given Intel’s need to create a market for many-core chips, expect significant investment in this engineering problem. It isn’t clear to what extent this affects consumer apps, so solutions that piggyback on existing consumer technologies - like stacked memory from cell phones - will be the economic way to slide this into consumer products like high-end game machines.

Expect more turbulence at the peak of the storage pyramid, which will further encourage the rethinking of storage architectures. That is a good thing for everyone in the industry.

Courteous comments welcome, of course. If anyone wants to make the case that von Neumann was wrong, I’m all ears.

Flash and the new storage pyramid

December 4th, 2008 by Robin Harris in Architecture, Enterprise, Future Tech, SSD/Flash Disk

I got a note from David Flynn, co-founder and CTO of Fusion-io (disclosure: I’ve done work for them) in response to The new storage pyramid. He makes several points about the nature of the array model that I wish I’d made.

Well worth the read.

David Flynn’s note:
Great analysis, Robin.

And, great comments.

My $.02 ….

I think it’s not just about the proprietary nature, the somewhat better performance and features, and the high markups that differentiates “storage arrays” from “clustered storage”.

It’s actually more to do with the vertically integrated nature of the business model of the companies in the array building business. This leads to proprietary architectures, higher margins and, true, somewhat better performance and features.

Let me explain through an analogy…

We used to get graphics workstations from SGI, Apollo, and other vertically integrated vendors, who sold everything end-to-end, down to the monitors and their own proprietary OS’s. These guys commanded HUGE margins - partly to reward their risky investment in solving a worthy, complex problem.

Similarly, the military (and the few others who could afford a million dollar price-tag) used to get flight simulators from Evans & Sutherland, who were also vertically integrated and insanely expensive. You even had niche vendors like Intergraph doing 3D graphics information systems who could justify their own proprietary architectures.

At least for a while.

They were all doing 3D graphics in one form or another. And, now, they are all GONE - thanks to the emergence of a component, the 3D graphics card.

With enough capability to be applicable across all of these different verticals, the 3D graphics accelerator has now shattered the benefit of running a vertically integrated business.

Today, there are myriads of “integrators” who make graphics workstations, flight simulators, GIS systems, etc. at very low margin by comparison. And, they do it by pulling together off-the-shelf components - all commoditized down to the software that provides even the high-value features.

They might have been inferior to the proprietary solutions at first, but not anymore.

Now, what happens when you introduce to the storage industry a component - NAND flash - that commoditizes and trivializes the linchpin reason for expensive proprietary disk arrays, namely the caching tier?

Once anyone can easily get the performance across any use case (OLTP, OLAP, Data Warehousing, BI, VOD, content caching, etc. etc.) you no longer need vertical specific, highly tuned, proprietary solutions from vertically integrated companies.

Every capability that doesn’t migrate into the component itself becomes nothing but commoditized software to be layered on top by any number of interchangeable integrators. Things like replication, disaster recovery, backup, dedup, and so on just become commoditized software that can run anywhere.

This is a classic Adam Smithian market evolution. What used to be a single, vertically integrated provider becomes a layered market where some people build the components, others integrate them (with some bit of value add), and you go to having many players competing on many levels.

And prices go down.

But, thankfully, (for those of us in the business of creating this componentized building-block) volume, productivity, and efficiencies all go up.

So, actually everyone wins. Including society as a whole.

Well, almost everyone wins. Everyone, that is, except for the proprietary array vendors who get caught by the innovator’s dilemma and a business model that used to be the correct one, but no longer is.

This generally makes them the slowest to simplify their proprietary infrastructures around the commoditized component - to help justify their investment into their heroic proprietary solutions.

In an effort to protect their margins, they endeavor to make things seem as complicated as possible. They do this, say, by preferring that NAND be forced to pretend to be an HDD and be put into HDD drive bays behind HDD protocols, where it has little ability to simplify things or get much additional performance.

They are the last to come out and say it can be simplified. Instead they’ll tell you you must have features X, Y, Z. And, see, those aren’t as good as with our proven architecture.

Let’s take high availability as an example. They aren’t going to tell you about a “shared nothing” strategy - where two separate RDBMS servers, each with terabytes of direct-attached NAND inside, use off-the-shelf log-shipping for asynchronous replication (or query replication to do it synchronously) to get fault tolerance.

No, they aren’t going to tell you that it’s actually simpler, more cost effective, and, here’s the real kicker… more fault tolerant to share nothing, than to use shared storage - no matter how fault tolerant they claim their monolithic storage array is, it’s still shared.

I’m not saying this market transformation is going to happen by tomorrow. But, given the geometric growth of the performance gap between processors and storage, and the geometric decline in cost of NAND flash - leading to a “Moore’s Law Squared” effect in the benefit to cost ratio - it is going to happen faster than people would think. Even considering the “stodgy” nature of storage folks who are in the business of obsessively caring for precious bits.

It doesn’t hurt that in this global recession companies are looking for ways to reduce costs while still needing to grow throughput. So, there’s more of a willingness to look at different, innovative ways to skin the cat.

I agree with you Robin. It will be a fait accompli by 2015.

David Flynn
CTO, Fusion-io

The StorageMojo take
Technology diffusion is a complex mashup of secular trends, technology development, individual creativity and happenstance. But the current direction of the high-end storage market points to the greatest change we’ve seen since the early 90’s and the advent of arrays.

The “Moore’s Law Squared” effect is particularly intriguing. Humans are terrible at estimating the impact of power functions, so this one is likely to be even more surprising than we dream.
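A minimal sketch of why the compounding surprises people. Both periods below are assumptions chosen for illustration, not numbers from David’s note: the processor/storage performance gap is assumed to double every two years, and NAND cost per gigabyte is assumed to halve every two years.

```python
# Illustrative only: both periods are assumptions, not figures from the note.
GAP_DOUBLING_YEARS = 2.0   # assumed: processor/storage performance gap doubles every 2 years
COST_HALVING_YEARS = 2.0   # assumed: NAND cost per GB halves every 2 years


def single_curve(years: float) -> float:
    """One Moore's-Law-style curve on its own (1.0 at year 0)."""
    return 2.0 ** (years / GAP_DOUBLING_YEARS)


def benefit_to_cost(years: float) -> float:
    """Relative benefit-to-cost ratio of flash after `years` (1.0 at year 0)."""
    benefit = 2.0 ** (years / GAP_DOUBLING_YEARS)   # value of closing a growing gap
    cost = 0.5 ** (years / COST_HALVING_YEARS)      # falling price of the fix
    return benefit / cost


if __name__ == "__main__":
    for years in (0, 2, 4, 6):
        print(f"year {years}: single curve {single_curve(years):.0f}x, "
              f"benefit/cost {benefit_to_cost(years):.0f}x")
```

Under these assumed rates the ratio is exactly the square of the single curve: after six years one curve alone gives 8x, while benefit-to-cost reaches 64x.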

Courteous comments welcome, of course.

The new storage pyramid

December 2nd, 2008 by Robin Harris in Clusters, Enterprise, Future Tech

OK, it is still a pyramid
Predictions of the storage array’s death struck some commenters as premature. Commenters raised a host of issues:

  • Cost. Low-end storage arrays are cheaper than clusters.
  • Complexity. The complexity of clustered hardware - all those cables and boxes - increases management costs.
  • Functionality. “Unless the cluster storage also provides the same reliability, scalability, and supportability as the larger monolithic arrays. . . ” it won’t supplant traditional arrays.
  • Cost pt. II: Lower-cost modular arrays, combined with a software layer that knits them into a seamless whole, could provide a full-service storage infrastructure complementing today’s virtual servers.

History repeats itself
The issues are similar to the mainframe vs everybody arguments of the last 40 years. Within living memory mainframes from IBM and the 7 dwarfs - Burroughs, Sperry Rand, NCR, RCA, Honeywell, CDC and GE - went through the same process monolithic storage arrays will.

Mainframes faced the same negatives: costly; complex management; inflexible; limited applications; and optimized for batch computing in an interactive world. Proponents argued the positives: reliability; scalability; efficiency; security; and control.

Reinventing the wheel - without end
Mainframes were expensive because they a) were low-volume products and b) had high (60%+) gross margins. Each mainframe architecture had its own processors, peripheral interconnects, networks, OS, application software and sales and support groups.

Every mainframe company had to solve all the problems every other mainframe company did - at enormous cost.

Mainframes today
Mainframes are far from dead, but they are very different today. There are fewer vendors; they use commodity processors, networks and interconnects; run open source software such as Linux; and adjusted for inflation they are much cheaper.

That is the future of big monolithic arrays.

Monolithic arrays tomorrow
That we still have as many large arrays and vendors as we do is due to the fact that the vendors have already gone far down the mainframe path. Commodity server motherboards, Linux, SATA drives and Xyratex enclosures are all common in high-end arrays, helping cut costs.

But at a fast approaching point, cutting costs isn’t enough. Vendors have to give customers good reasons to keep buying the big iron. The traditional mantra of availability, performance, scalability and supportability won’t hold customers forever.

Why?

Moore’s Law keeps moving the tiers
The industry has been pushing tiered storage in multiple guises for decades: HSM; ILM; and now, cloud storage. But customers embrace tiers out of necessity, not love.

The powerful visual picture of the layer-cake storage pyramid is deceptive. The x and y axes are cost and capacity, but they are only proxies for the application requirements of the layer above.

Array vendors want to believe that there will always be an “array layer” in the storage pyramid. But why should there be?

As Moore’s Law keeps moving commodity server performance up, the performance envelope of commodity-based storage systems will enlarge. With the commoditization of 10GigE, flash, 6 and 12 Gbit SAS and a 10x increase in areal density, the bandwidth to exploit higher CPU performance will push today’s “archive” cluster storage into monolithic array territory. At a lower price, too.

The software that ties commodity hardware together will improve, weakening the availability argument. If performance is bandwidth driven, pNFS will close the deal for clusters. Scalability goes to clusters today and will only improve with time. Supportability isn’t owned by hardware companies - plenty of software-only companies have cracked the code.

Here’s the future storage pyramid

The storage pyramid in 2015


Won’t arrays disappear?
No, but they will change. For example, they’ll look a lot more like cluster storage under the sheetmetal and GUI. Flash will be an integral part of the architecture - and not as a disk drive. There will be less add-on software, because more will be built in.

Arrays will continue to support legacy interconnects, such as FC and FCoE - remember, this is the future we’re talking about - and legacy OS’s that commodity-based storage won’t. Storage is a conservative part of IT and arrays won’t disappear.

The StorageMojo take
I was at DEC when the company was growing fat selling VAXen. Many predicted that PCs would be the death of the minicomputer companies, but it took 8 years to hit DEC.

There is life after arrays. Minicomputers still exist - and are selling more than ever - but the business model is totally different. The loss of 30 gross margin points forced the issue.

Storage requirements will keep growing. But the days of 60%+ gross margins are drawing to a close. Survivors will follow classic military strategy: concentration of force; short supply chains; and clear objectives.

Courteous comments welcome, of course.

Stupid storage failures

November 25th, 2008 by Robin Harris in Architecture, Disk, SSD/Flash Disk

Valiant but doomed
The ZFS discussion thread had an interesting comment from Sun’s Jeff Bonwick, architect of ZFS, on storage device failure modes. How do you know a disk or a tape has failed?

You don’t. You wait, while the milliseconds stretch into seconds and maybe even minutes. Jeff states the problem - and Sun’s solution - this way:

. . . we’re trying to provide increasingly optimal behavior given a collection of devices whose failure modes are largely ill-defined. (Is the disk dead or just slow? Gone or just temporarily disconnected? Does this burst of bad sectors indicate catastrophic failure, or just localized media errors?) . . . there’s a lot of work underway to model the physical topology of the hardware, gather telemetry from the devices, the enclosures, the environmental sensors etc, so that we can generate an accurate FMA [Fault Management Architecture] fault diagnosis and then tell ZFS to take appropriate action.

With all due respect to Jeff, that solution seems iffy: how will you ever keep up with all the devices and firmware levels needed to make that work?

A community of prima donnas
There are lots of messy failure modes in computer systems. The literature around the Byzantine Generals Problem (Wikipedia - for a rigorous treatment download The Byzantine Generals Problem by L. Lamport et al.) tackles the problem of the malicious server in a community of network servers. That is a hard problem.

Knowing whether a storage device is alive, dead or only sleeping shouldn’t be so hard. They have powerful 32-bit processors - more powerful than a VAX 780 - and lots of statistics on what the drive is doing.

It seems like a disk could give a modulated heartbeat signal to drivers - “ready” “reboot” “caught in retry hell” “dead” - to decrease uncertainty.
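As a rough sketch of what the host side of such a heartbeat might look like: no such standard exists, so the `DriveStatus` values, the `ACTIONS` policy table and the `next_action` helper below are all hypothetical names invented for illustration. Only the four states come from the suggestion above.

```python
from enum import Enum


class DriveStatus(Enum):
    """Hypothetical heartbeat states, named after the four suggested above."""
    READY = "ready"
    REBOOT = "reboot"
    RETRY = "caught in retry hell"
    DEAD = "dead"


# Hypothetical host policy: what a driver might do for each reported state.
ACTIONS = {
    DriveStatus.READY: "issue I/O normally",
    DriveStatus.REBOOT: "queue I/O and wait for the drive to come back",
    DriveStatus.RETRY: "redirect reads to a redundant copy",
    DriveStatus.DEAD: "fail the drive and start a rebuild",
}


def next_action(heartbeat: str) -> str:
    """Map a raw heartbeat string to a host action.

    Unrecognized values fall back to today's behavior: wait and time out.
    """
    try:
        status = DriveStatus(heartbeat)
    except ValueError:
        return "unknown status: fall back to timeout-based detection"
    return ACTIONS[status]


if __name__ == "__main__":
    for signal in ("ready", "caught in retry hell", "dead", "garbled"):
        print(f"{signal!r} -> {next_action(signal)}")
```

The design point is the fallback branch: a host that understands the heartbeat can act in milliseconds, while one that doesn’t is no worse off than it is now.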

The StorageMojo take
Drive vendors may think that non-standards for drive condition reporting are a form of lock-in, but that misses the bigger picture: the quality and timeliness of condition reports - even with a standard format - would be a competitive differentiator.

At the margin it would help slow the move to commodity-based cluster storage by enabling array vendors to improve their error handling and perceived reliability. It would also help disks versus flash SSDs, whose perceived reliability is partly due to the gap between user-judged drive “failures” and vendor “no trouble found” test results.

Storage systems all know how to deal with disk failures - they have to. So drive vendors, how about getting together to help make knowing a drive’s status a lot easier? Hey, IDEMA, make yourself useful!

Courteous comments welcome, of course.

Economic crisis and the storage industry

November 19th, 2008 by Robin Harris in Clusters, Enterprise, Future Tech

Yes, Virginia, the storage industry will survive the crisis
Economists and business leaders generally agree that the current, as yet unofficial, recession will be the worst we have seen since the Great Depression. The credit bubble has popped and we are facing global de-leveraging that will take years to unwind.

De-leveraging is a fancy term for “a lot less money rolling around.” The computer industry started after the Great Depression, so these will be the worst times it has ever seen.

How bad will it get for storage?
Storage is a special case. Disk drives underlie everything we do and they show no sign of slowing their capacity increases and price drops.

Data growth rates are a little less certain - contracting businesses produce less data - but the economic advantages of online data continue to grow as cost per gigabyte drops. Even in the financial sector someone is going to have to unravel all of those credit derivative swaps and synthetic securities that the “rocket scientists” - heckuva job, guys! - developed.

Where will this impact IT operations? Right in the heart of the array business.

A little smarter, a lot cheaper
Assume 80% of all business data is unstructured. And suppose 80% of that data is stored on storage arrays that are optimized for transactional data.

If RAID arrays average $6/GB today and cluster storage averages $2/GB we can begin to estimate the potential impact. In a perfect world 64% - 80% of 80% - of all corporate data could be migrated from high cost storage arrays to much lower cost storage clusters.

If the storage array business is $21 billion a year today, that means there is a total available market of roughly $13 billion of IT spend that could go to storage clusters. If storage clusters are 1/3 the price of storage arrays, that suggests a total storage cluster business of roughly $4 billion a year.

That ignores, of course, the traditional impact of sharply lower storage costs: a rapid increase in the amount of data stored. Online and easily searched data is much more valuable than data stored on paper or tape. A first-order guess is that in today’s market there is the potential for an $8 billion a year storage cluster IT spend.
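The back-of-the-envelope arithmetic above can be written out as a short sketch. All inputs are the assumptions already stated in this section; the function and constant names are mine.

```python
# Inputs are the assumptions stated in the post; the names are illustrative.
ARRAY_MARKET_BN = 21.0       # storage array business, $ billion per year
UNSTRUCTURED_SHARE = 0.80    # share of business data that is unstructured
ON_ARRAYS_SHARE = 0.80       # share of that data sitting on transactional arrays
ARRAY_PRICE_PER_GB = 6.0     # average RAID array price, $/GB
CLUSTER_PRICE_PER_GB = 2.0   # average cluster storage price, $/GB


def cluster_market_estimate() -> tuple[float, float, float]:
    """Return (migratable share, addressable array spend $bn, cluster spend $bn)."""
    migratable = UNSTRUCTURED_SHARE * ON_ARRAYS_SHARE         # 80% of 80%
    addressable_bn = ARRAY_MARKET_BN * migratable              # array spend that could move
    price_ratio = CLUSTER_PRICE_PER_GB / ARRAY_PRICE_PER_GB    # clusters at 1/3 the price
    cluster_bn = addressable_bn * price_ratio                  # same data at cluster prices
    return migratable, addressable_bn, cluster_bn


if __name__ == "__main__":
    share, addressable, cluster = cluster_market_estimate()
    print(f"migratable share:     {share:.0%}")
    print(f"addressable spend:    ${addressable:.2f}B")
    print(f"cluster-priced spend: ${cluster:.2f}B")
```

The unrounded figures are $13.44 billion and $4.48 billion. Read against the rounded $4 billion, the $8 billion guess amounts to assuming that stored capacity roughly doubles once prices fall.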

That’s the theory, anyway. The reality is that most IT professionals will not give up the storage arrays they know and love without a fight. But the economic pressure will be unrelenting.

Winners and losers
This won’t be a rapid process. The early not-very-good storage arrays came out in 1990 and took 8 years before sales reached 50% of the capacity of enterprise storage. The economic advantages of cluster storage are greater and the pressure to contain costs much stronger today. It will be 6 years before half of all enterprise storage capacity sales are in storage clusters.

The winners will be those companies that embrace and extend the capability of storage clusters the soonest. Among large companies HP and EMC appear to have the lead. Among the small companies several will be purchased while others will continue to grow as independent entities.

The losers? IBM appears to have no discernible strategy. NetApp is bogged down in its efforts to integrate the GX global namespace with the contradictory requirements of its traditional Data OnTap code base.

Sun has good building blocks but will fail if they lead with Lustre. HDS will wait until the market is defined to start moving - but that may be too late. This is a software play in more ways than one.

Smaller companies in the array business have a steep learning curve with cluster storage. Expect most of them to fade over time. There will be opportunities for OEM suppliers to the mid-tier vendors.

The StorageMojo take
The age of the RAID array is coming to an end. They won’t disappear any more than mainframes have. But they will become much less common. The array business will see single-digit sales drops and general long-term stagnation. The storage cluster business will show robust growth.

The race for storage cluster dominance is still young. There are many variables where newcomers and existing players can find or fumble important advantages. Can storage clusters be effectively productized? Or will integration requirements favor service-oriented companies? How will flash be best integrated into storage clusters? How will the SMB market be cracked?

The economic crisis does not create new trends. It accelerates existing ones. IT professionals should not underestimate the power and impact of the current crisis on once sacrosanct IT budgets.

IT likes to talk about “business partnership.” Now is the time for action. Show the CFO that you know how to do more with less and you’ll be a partner. Insistence on business as usual is the wide road to a pink slip.

Courteous comments welcome, of course. Disclosure: I’ve recently done some work for HP on their announced but not-quite-shipping Extreme Data Storage 9100. I was impressed.

Atmos: EMC rolls the dice

November 17th, 2008 by Robin Harris in Off-Topic

EMC’s Atmos, the product formerly known as Hulk/Maui, has gotten the full EMC marketing machine treatment. With a twist: EMC is rolling the dice on an unproven concept.

If it’s eat lunch or be lunch, EMC prefers to dine. I like it.

The pig
I covered Atmos’ academic antecedents - OceanStore and Antiquity - in an earlier post. After looking at the announcement material it is clear that Atmos offers far less than the Berkeley folks envisioned.

They may want to get there, but they aren’t there yet. That’s why we have v1 software.

Squinting past the hype
There are some oddities in the announcement.

  • No customer endorsement. Normal EMC announcements always have joyful customers endorsing the product. For a product that has been shipping since June - according to some EMC bloggers - the absence of one is unusual.
  • “Powerful object metadata and policy-based information management capabilities . . . .” Atmos is not a file system - file systems exist on the client - so the lack of OceanStore’s introspective data management feature is ugly.
  • How do you access it? Most attention has focused on REST and SOAP. It does support CIFS, NFS and IFS (Installable File System - haven’t seen that in a while). The latter are more important.
  • Centera vs Atmos. EMC is at great pains to claim that Atmos doesn’t compete with Centera. Obviously it does, since it would be trivial to add Centera’s features to a cheaper storage infrastructure.
  • EMC tossed out the IBRIX cluster file system in favor of something they gen’d up fairly quickly. A CFS is non-trivial so one must wonder how stable and feature-rich the local storage pools are.

The perfume
All the touting of the policy-based management doesn’t answer the need for introspective object management. In OceanStore, the storage system doesn’t know about relationships between objects - it isn’t a file system - so introspection is important for the system to react intelligently to change.

Let’s say that a webpage with a video on it has links to other videos and multi-megabyte downloads. The policy system in Atmos relies on the user to specify the content’s policy. But if the videos and downloads are specified with different policies, the availability of each component on the page will vary when it catches fire on the web.

An introspective system would note that these objects are associated and move/replicate them together. Introspection isn’t easy, but in a billion object system, humans just get in the way.
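To make the idea concrete, here is a minimal sketch of one way a system could notice associated objects: count how often objects are fetched in the same session and group the ones that co-occur often. This is my illustration, not OceanStore's or Atmos's actual mechanism; the function names, the session format and the threshold are all assumptions.

```python
from collections import Counter
from itertools import combinations


def co_access_counts(sessions):
    """Count how many sessions touched each pair of objects together."""
    counts = Counter()
    for accessed in sessions:
        for a, b in combinations(sorted(set(accessed)), 2):
            counts[(a, b)] += 1
    return counts


def placement_groups(sessions, threshold):
    """Group objects whose co-access count reaches the threshold (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for accessed in sessions:
        for obj in accessed:
            find(obj)  # register every object, even ones never paired
    for (a, b), n in co_access_counts(sessions).items():
        if n >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for obj in list(parent):
        groups.setdefault(find(obj), set()).add(obj)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


# Hypothetical access log: each inner list is one visitor's session.
sessions = [
    ["page.html", "a.mp4", "b.zip"],
    ["page.html", "a.mp4", "b.zip"],
    ["page.html", "a.mp4"],
    ["other.html"],
]
groups = placement_groups(sessions, threshold=2)
print(groups)
```

Objects that land in the same group would be moved or replicated as a unit, regardless of which policy a human attached to each one.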

The StorageMojo take
None of the big storage companies is doing more to shake up the industry than EMC. Atmos is bold, whatever you think about its chances.

The important point is that EMC is embracing, however gingerly, commodity storage for enterprise customers. They aren’t the first with sub-$2/GB bulk storage, but CIOs listen to them.

Atmos batters EMC’s core value prop with a beta+ product for a not-sure-it-exists nascent market. Atmos is EMC’s boldest move since the original Symm. It may also turn out to be its most successful. Or not.

Atmos seems to have an unusual dispensation from profitability in the interests of giving the technology and the market time to mature. This speaks to a seriousness of purpose that competitors would be wise to note.

From an architecture perspective it isn’t clear whether the overhead of an Atmos is worth the cost. Perhaps a simpler content delivery network structure would deliver 95% of the benefit of Atmos at half the cost.

Right now the product is far from fully baked. EMC will no doubt learn valuable lessons about what the global 5000 and ISPs need from Internet-era storage. Competitors who wait too long will be looking at a steep learning curve.

Google’s Jeffrey Dean is actively looking for an integration strategy to knit together their global collection of data centers into a single namespace. While they have special requirements their reluctance to embrace an OceanStore-like architecture suggests that global cloud storage hasn’t reached a technical consensus.

Make no mistake: Atmos is huge. Whether it wins or someone else does is beside the point. The battle for massive-scale commercial storage has been joined.

Courteous comments welcome, of course.

The computer science behind EMC’s cloud storage

November 12th, 2008 by Robin Harris in Architecture, Clusters, Enterprise, Future Tech

EMC has announced Hulk/Maui, now known as Atmos. I’m flying to Boston today and don’t have access to EMC’s announcement documents.

But I have something better: the papers that provide the theoretical underpinning for Atmos. They provide an in-depth background that isn’t often available for new products.

These papers have too many interesting details to summarize them all. Here are some points that strike my fancy. YMMV.

If you want to understand Atmos these papers are essential. Details of EMC’s implementation will differ of course, but the underlying architectural trade-offs and management issues remain.

A 10 trillion file store
In 2000 a UC Berkeley paper OceanStore: An Architecture for Global-Scale Persistent Storage, authored by John Kubiatowicz, David Bindel, Yan Chen, Steven Czerwinski, Patrick Eaton, Dennis Geels, Ramakrishna Gummadi, Sean Rhea, Hakim Weatherspoon, Westley Weimer, Chris Wells, and Ben Zhao, laid out the architecture of what is now Atmos. EMC provided funding for the research and Patrick Eaton went to work for EMC a couple of years ago.

The abstract says:

OceanStore is a utility infrastructure designed to span the globe and provide continuous access to persistent information. Since this infrastructure is comprised of untrusted servers, data is protected through redundancy and cryptographic techniques. To improve performance, data is allowed to be cached anywhere, anytime. Additionally, monitoring of usage patterns allows adaptation to regional outages and denial of service attacks; monitoring also enhances performance through proactive movement of data.

The design center: 1 billion users; each storing 10,000 files. 10 trillion files. Utility storage indeed!

A cluster of clusters
OceanStore is a software layer that creates a global storage cluster. While the paper simply refers to servers, the servers can be clusters as well.

EMC’s engineers chose to use a 3rd party cluster product - IBRIX I think - for the local data stores so they could focus on the layer that glues the sites together. Each local store can itself be a petabyte or more.

Update: several commenters assure us that IBRIX is not the local cluster file system. EMC is using some open source software in Atmos. End update.

Untrusted infrastructure
A key goal of the paper and its prototype was to assume untrusted infrastructure - a phrase that fairly sums up today’s Internet. Only clients are trusted with cleartext - all stored content is encrypted - but most servers are assumed to be working correctly and to help maintain file consistency.

Nomadic data
A global storage system has a unique requirement for locality. But it also needs to be able to store data anywhere, anytime to maintain persistence in the face of outages and catastrophes. Thus data has to be separated from its physical location.

Files are encrypted at the source and stored as persistent objects, each named by a globally unique identifier (GUID). OceanStore has no knowledge of a file’s objects, so it relies on introspection, a mechanism that notes correlations among objects.

Thus the system moves highly correlated objects together, reducing the latency problems that a non-introspective object store faces in a global infrastructure.

Ciphertext
The paper notes that restricting OceanStore to ciphertext limits what can be done with the data. But there is more flexibility than you might suppose.

The operations compare-version, compare-size, compare-block, and search are all possible. In addition there are several feasible update operations, such as replace-block, insert-block, delete-block and append.
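Here is a toy sketch of how a server that never sees the key can still do compare-block, replace-block and append. It assumes the client encrypts block by block with a position-dependent pad, so the server only ever compares and swaps opaque blocks. The pad construction, class and function names are my assumptions for illustration; this is not the paper's cipher and it is not secure.

```python
import hashlib

BLOCK = 16  # toy block size in bytes


def _pad(key: bytes, index: int) -> bytes:
    # Position-dependent pad derived from the client's key. TOY ONLY: a real
    # system would use a vetted cipher, and reusing a pad on replace leaks data.
    return hashlib.sha256(key + index.to_bytes(8, "big")).digest()[:BLOCK]


def seal_block(key: bytes, index: int, block: bytes) -> bytes:
    """Encrypt (or decrypt, since XOR is symmetric) one block at a position."""
    return bytes(b ^ p for b, p in zip(block, _pad(key, index)))


class CiphertextStore:
    """Server side: holds ciphertext blocks and never sees the key."""

    def __init__(self, blocks):
        self.blocks = list(blocks)

    def compare_size(self, expected_blocks: int) -> bool:
        return len(self.blocks) == expected_blocks

    def compare_block(self, index: int, digest: bytes) -> bool:
        # The client sends a hash of the ciphertext it expects to find.
        return hashlib.sha256(self.blocks[index]).digest() == digest

    def replace_block(self, index: int, cipher_block: bytes) -> None:
        self.blocks[index] = cipher_block

    def append(self, cipher_block: bytes) -> None:
        self.blocks.append(cipher_block)


# Client side: only the client holds the key (hypothetical values).
key = b"client-only-secret"
plain = [b"A" * BLOCK, b"B" * BLOCK]
store = CiphertextStore(seal_block(key, i, p) for i, p in enumerate(plain))

# Conditional update: replace block 1 only if it still holds what we expect.
expected = hashlib.sha256(seal_block(key, 1, b"B" * BLOCK)).digest()
if store.compare_size(2) and store.compare_block(1, expected):
    store.replace_block(1, seal_block(key, 1, b"C" * BLOCK))
store.append(seal_block(key, 2, b"DDDD"))

recovered = b"".join(seal_block(key, i, c) for i, c in enumerate(store.blocks))
print(recovered)
```

The point is the division of labor: predicates and updates work on opaque blocks, so the server can check and apply them without ever handling cleartext.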

Applications
Multi-petabyte data stores for scientific, security or commercial applications are obvious applications. But telcos and ISPs are most interested in mobile apps.

The authors call out email as an apt OceanStore application.

OceanStore alleviates the need for clients to implement their own locking and security mechanisms, while enabling powerful features such as nomadic email collections and disconnected operation. Introspection permits a user’s email to migrate closer to his client, reducing the round trip time to fetch messages from a remote server. OceanStore enables disconnected operation through its optimistic concurrency model—users can operate on locally cached email even when disconnected from the network; modifications are automatically disseminated upon reconnection.

APIs
OceanStore offered its own API. But the authors also developed facades for the base API that emulated a Unix file system, a transactional database and a World Wide Web gateway.

Replication
OceanStore used erasure codes, not unlike the mechanism Cleversafe uses for its distributed data store system. Replica management is a major task for a global system and the paper goes into some detail on their solutions.
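For anyone new to erasure codes, the simplest possible example is a single XOR parity fragment: split the data into k fragments, add one parity fragment, and any one lost fragment can be rebuilt from the rest. This sketch is a stand-in I chose for brevity. Production codes tolerate many simultaneous losses, and nothing here is taken from the paper or from Cleversafe.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int):
    """Split data into k equal fragments plus one XOR parity fragment."""
    frag_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(frag_len * k, b"\0")
    frags = [padded[i * frag_len:(i + 1) * frag_len] for i in range(k)]
    return frags + [reduce(xor_bytes, frags)]


def recover(fragments, original_len: int) -> bytes:
    """Rebuild the data when at most one fragment (marked None) is missing."""
    missing = [i for i, f in enumerate(fragments) if f is None]
    if len(missing) > 1:
        raise ValueError("single parity can only repair one lost fragment")
    frags = list(fragments)
    if missing:
        present = [f for f in frags if f is not None]
        # XOR of all surviving fragments equals the missing one.
        frags[missing[0]] = reduce(xor_bytes, present)
    return b"".join(frags[:-1])[:original_len]


data = b"global-scale persistent storage"
stored = encode(data, k=4)   # 5 fragments, each could live on a different server
stored[2] = None             # one server goes away
restored = recover(stored, len(data))
print(restored)
```

The storage overhead here is 1/k rather than the 100% or more that full replication costs, which is why erasure codes are attractive at global scale.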

The 2nd paper
A 2nd paper, Antiquity: Exploiting a Secure Log for Wide-Area Distributed Storage (available at the same link above) published last year, expands on the OceanStore work.

. . . the secure log interface implemented by Antiquity is a result of breaking OceanStore into layers. In particular, a component of OceanStore was a primary replica implemented as a Byzantine Agreement process. This primary replica serialized and cryptographically signed all updates. Given this total order of all updates, the question was how to durably store and maintain the order? . . . The secure log structure assists the storage system in durably maintaining the order over time. The append-only interface allows a client to consistently add more data to the storage system over time. Finally, when data is read from the storage system at a later time, the interface and protocols ensure that data will be returned and that returned data is the same as stored.

Finally, self-verifying structures such as a secure log lend themselves well to distributed repair techniques. The integrity of a replica can be checked locally or in a distributed fashion. In particular, we implemented a quorum repair protocol where the storage server replicas used the self-verifying structure. The structure and protocol provided proof of the contents of the latest replicated state and ensured that the state was copied to a new configuration.
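The self-verifying, append-only idea in those excerpts can be sketched as a hash chain: each entry commits to the one before it and carries an authentication tag, so any replica can check order and integrity locally. This is my simplification, not Antiquity's protocol. In particular I use an HMAC with a shared key where the paper describes cryptographic signatures, and the class and field names are assumptions.

```python
import hashlib
import hmac

GENESIS = b"\0" * 32  # stand-in "previous hash" for the first entry


def _entry_hash(payload: bytes, prev: bytes, tag: bytes) -> bytes:
    return hashlib.sha256(prev + payload + tag).digest()


class SecureLog:
    """Append-only log where every entry commits to its predecessor."""

    def __init__(self, key: bytes):
        self._key = key
        self.entries = []  # list of (payload, prev_hash, tag)

    def head(self) -> bytes:
        if not self.entries:
            return GENESIS
        return _entry_hash(*self.entries[-1])

    def append(self, payload: bytes) -> None:
        prev = self.head()
        # ASSUMPTION: HMAC stands in for the writer's public-key signature.
        tag = hmac.new(self._key, prev + payload, hashlib.sha256).digest()
        self.entries.append((payload, prev, tag))


def verify(entries, key: bytes) -> bool:
    """Check the chain order and every tag; usable on any replica's copy."""
    expected_prev = GENESIS
    for payload, prev, tag in entries:
        if prev != expected_prev:
            return False
        good = hmac.new(key, prev + payload, hashlib.sha256).digest()
        if not hmac.compare_digest(tag, good):
            return False
        expected_prev = _entry_hash(payload, prev, tag)
    return True


key = b"writer-key"
log = SecureLog(key)
log.append(b"update 1")
log.append(b"update 2")

intact = verify(log.entries, key)
tampered = [(b"update X",) + log.entries[0][1:], log.entries[1]]
reordered = list(reversed(log.entries))
print(intact, verify(tampered, key), verify(reordered, key))
```

Because the check needs nothing but the entries and the verification key, a repair process can compare replicas and copy the longest valid log to a new server.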

The StorageMojo take
Bravo! EMC is taking cutting edge computer science and turning it into a product. I’ll comment on the specifics of Atmos later.

New storage paradigms are rare. To have so many academic papers on the underlying technology is rarer still.

EMC would never provide this much information themselves - it would slow down the sales cycle. But these papers - and the couple of dozen others on the OceanStore site - provide implementors with a wealth of technical background.

Comments welcome, of course. Anybody want to comment on what these papers mean for the patentability of Atmos?

How bad do the ads suck?

November 10th, 2008 by Robin Harris in Off-Topic

I’ve been working with IDG to monetize StorageMojo through ad sales without much success. The latest iteration of the process you may have noticed: the ad that covers the page until you click “close.” They pay OK, but they aren’t the difference between hamburger and steak.

Clicking “close,” BTW, is something you are welcome to do as soon as you like. Please don’t suffer through the ads on my account.

I think I was told that the ad would only appear like once a week per viewer, but I don’t know if that is correct or true.

Anyway, I invite StorageMojo readers to comment. What are the right limits for ads on StorageMojo?

Are the “roadblock” ads - I think that is what these coverall ads are called - too much? How much advertising is OK?

The StorageMojo take
I make no apologies for being a capitalist tool. But I don’t want to drive off readers either. So let me know what you think.

If anyone has a line on a low-overhead ad network that pays reasonably well for a high-quality audience, I’d love to hear about it.

Courteous comments welcome, of course. Especially on this topic. Wes, thanks for the tickler and yes, I think I know where you are.

Flash-talking with Fusion-io

November 7th, 2008 by Robin Harris in Off-Topic

Fusion-io commissioned me to create a video with David Flynn, Fusion-io co-founder and CTO, talking about their architecture and the benefits of high bandwidth NAND flash. Even though I’ve been researching flash for a couple of years, some of David’s comments surprised me.

Flash doesn’t make a good disk
Anyone who cares to can track how my view of flash has evolved. From early enthusiasm, based on my happy experience with a flash-based HP Omnibook 300 - the original netbook - in the ’90s, to increasing skepticism.

The “aha” moment came at the Flash Memory Summit in August, when an industry panel agreed that

. . . NAND flash is best seen as an extension to DRAM and a layer between DRAM and disk - not as the guts of a disk drive replacement.

BTW, I started skeptical on Fusion-io and have become a convert. Go figure.

The learning continues
Fusion-io isn’t the only company offering flash storage in a non-disk format, but they do seem to be furthest along. I think their perspective is way more important than, say, Seagate’s. Here’s the video.

The StorageMojo take
Every time a new technology appears, our first impulse is to recreate the products of the old technology with it. Such is the case with flash.

We’ve run into the limits of the old disk/RAID/array/SAN paradigm. With storage clusters, flash and changing workloads we now face the exhilarating - and sometimes frightening - prospect of re-architecting our storage infrastructures.

Fusion-io won’t be the final word on flash, but they’ve made a great start. Not to mention a real head start.

Courteous comments welcome, of course.

Blu-ray is dead. Now what?

October 30th, 2008 by Robin Harris in Off-Topic

The window for Blu-ray success is rapidly closing (see Blu-ray is dead. Heckuva job, Sony!). Which means that 50 GB writable disks will never cost $0.35 a piece.

What is a storage hungry consumer to do?

Massive removable/transportable storage
Together cheap CD/DVD media, thumb drives and ever-growing file sizes killed floppies, Zip drives and all the other removable magnetic disk media. Removable optical media may be next.

Historically, successful PC removable media have stayed in a fairly narrow capacity band relative to hard drives, with the hard drive somewhere between 10x and 50x the capacity of the removable medium. If the average PC has a 500 GB hard drive then a removable medium between 10 GB and 50 GB is needed.

Dual-layer DVDs are just on the ragged edge of that number while dual layer Blu-ray could handle hard drives up to 2.5 TB. If Blu-ray will never achieve the ubiquity and low cost of DVDs what will fill the gap?
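The capacity band arithmetic is easy to check. The 10x to 50x band and the 500 GB and 50 GB figures are from the text; the 8.5 GB dual-layer DVD capacity is the nominal figure I am assuming, and the function names are just for illustration.

```python
# Hard drive capacity is historically 10x to 50x the removable medium.
BAND_LOW, BAND_HIGH = 10, 50


def media_range_gb(hdd_gb):
    """Smallest and largest removable medium that fits the band for a drive."""
    return hdd_gb / BAND_HIGH, hdd_gb / BAND_LOW


def max_hdd_gb(media_gb):
    """Largest hard drive a given medium can keep up with (50x the medium)."""
    return media_gb * BAND_HIGH


needed = media_range_gb(500)     # medium sizes that suit a 500 GB drive
dvd_limit = max_hdd_gb(8.5)      # ASSUMPTION: 8.5 GB nominal dual-layer DVD
bluray_limit = max_hdd_gb(50)    # dual-layer Blu-ray

print(needed, dvd_limit, bluray_limit)
```

By this rule a dual-layer DVD tops out at a 425 GB hard drive, just short of the 500 GB average, while dual-layer Blu-ray stretches to 2,500 GB, the 2.5 TB quoted above.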

Meet the candidates
Flash drives are a promising alternative. Large capacity thumb drives are available today for about $2/GB. In 2 years those drives will be $.50 a gigabyte or less.

That doesn’t really compare to Blu-ray writable media at $2 for 50 GB in 2 years, if that comes to pass. But flash drives do have advantages in size, weight, ruggedness, and the ubiquity of USB ports.
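To put numbers on that comparison, here is a quick calculation using only the prices quoted above. The helper names are mine, and the flash and Blu-ray prices two years out are projections, not data.

```python
def cost_per_gb(price_usd, capacity_gb):
    return price_usd / capacity_gb


def annual_decline(start_price, end_price, years):
    """Constant yearly price decline that takes start_price to end_price."""
    return 1 - (end_price / start_price) ** (1 / years)


flash_today = 2.00                        # $/GB, from the text
flash_in_2_years = 0.50                   # $/GB, projected in the text
bluray_in_2_years = cost_per_gb(2.00, 50) # $2 for a 50 GB disc, projected

decline = annual_decline(flash_today, flash_in_2_years, years=2)
premium = flash_in_2_years / bluray_in_2_years

print(f"flash falls {decline:.0%} a year; "
      f"Blu-ray would be ${bluray_in_2_years:.2f}/GB; "
      f"flash premium {premium:.1f}x")
```

So the flash projection implies prices halving every year, and even then flash would cost about 12.5 times as much per gigabyte as the hoped-for Blu-ray media.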

Cloud storage is another option. For example, Mediasilo offers a service that stores and password protects individual files so the owner can control distribution.

Network bandwidth is the key bottleneck. Countries that have much faster Internet access than the United States could put this to work with some fairly large files. Here in the USA, however, it won’t be practical for large files for years to come.

Disk drives may be the most promising option. Smaller drives with better shock specs - check out Toshiba’s new 250 GB 1.8 inch drive - could handle just about any transportable application.

If drive vendors would start specifying their drives as an archive medium - even if it’s just five years - they might be able to sew up the home and SOHO archive market. The iVDR initiative is a step in the right direction, but like Blu-ray they are trying to serve too many markets.

Mempile and Inphase, 2 different approaches to high-capacity optical storage, are unlikely successors. Both appear viable for niche markets, but massive investment is required to take them to consumer ubiquity. What app will take them there?

The StorageMojo take
Removable media are the fruit flies of the storage industry: fast breeding and short lives. The pace of innovation is a good thing but a consumer standard needs a 10 year life.

Letting Hollywood drive next-generation media formats doesn’t work. Downloading will supplant physical media for consumers.

There will be no optical replacement for the writable DVD that offers its ubiquity, low cost and interchangeability. Like removable consumer magnetic media, optical has reached a dead end because there is no consumer driver for 250 GB+ optical media.

The good news: consumers will want to archive. There is a huge opportunity for the company that can figure it out.

Courteous comments welcome, of course.

Cool kit at SNW

October 26th, 2008 by Robin Harris in Off-Topic

Every silver lining has a cloud
As Storage Networking Worlds go, this one was quiet. It is always good to get together with industry colleagues. And to be able to look executives in the eye when I ask them hard questions.

Everybody is antsy about the economy. With the IPO market shut down and the VCs pulling back, now is not the time to be an undercapitalized or overspending startup. Hoard cash and market wisely.

It is also not a good time to be flogging the Same Old, Same Old overpriced kit. With layoff rates rising fast and American underemployment at its highest level in 15 years, “change” is in the air.

IT pros should be asking themselves “am I part of the solution, or part of the problem?” Because the gimlet-eyed finance guys are looking at you and asking the same thing.

Good news
The good news: there are options today that weren’t available during the dot-bomb meltdown.

The bad news: you’ll have to do some homework and talk to some new vendors to figure out what these companies can do for you.

StorageMojo’s favorite companies at SNW

  • Hifn I have had a hard time getting a handle on this company - starting with pronouncing the name - but I’ve got it now: a storage network appliance that encrypts, compresses and, in a newly announced feature, de-duplicates. They don’t sell direct to end-users but maybe your integrator or SAN vendor resells them.
  • Xiotech CEO Casey Powell announced double-digit quarter-over-quarter growth figures due to the success of their Integrated Storage Element product. ISE is a game changer for the storage industry, but Xiotech needs to do a better job of articulating the benefits to non-IT execs.
  • Storewize offers an appliance that sits between clients and file servers. It compresses and deduplicates data before it gets to the filer. Not only does it stretch storage capacity but it also improves filer performance by reducing bandwidth requirements. Works at GigE wirespeed.
  • Permabit has been shipping their cluster-based Enterprise Archive for two quarters. Permabit’s CTO, Jered Floyd, has a blog with a great post on why - among other things - it is time to stop talking about content-addressable storage (CAS).

However, my favorite this SNW is Axxana (see Axxana fixes the speed of light). They are a game-changer for high-end disaster protection. Too bad you can’t buy them today.

The StorageMojo take
Compared to the dot-bomb fiasco, storage companies are in better shape today. 8 years ago over-funded startups were buying big Symms and E10000s by the truckload. When the music stopped a lot of new kit was selling on eBay for pennies on the dollar.

Endemic over-buying is not today’s problem. As the world-wide deleveraging continues the push to do more for less will accelerate.

Somebody is going to have to unravel the underlying value of all those toxic credit default swaps and securitized mortgages. That will take a lot of data storage.

The industry will take a hit, no doubt. But you won’t see the kind of dot-bomb u-turn that saw EMC sales drop several billion in one year.

The world’s financial industry, whose growth has exceeded worldwide GDP growth for the last 30 years, is in for a long spell of below average growth. Health care looks like a winner though.

Courteous comments welcome, of course.


