Cloud storage symposium impressions

by Robin Harris on Thursday, 22 January, 2009

Some quick impressions from the SNIA cloud storage symposium.

Not everyone believes in economies of scale
At least one presenter questioned whether there are economies of scale that justify the higher latency and lower bandwidth of cloud storage. I recently wondered about that myself.

But since then I’ve checked James Hamilton’s work on cold bulk storage. I’m now comfortable that there are significant economies of scale – at least for rarely accessed data – in a well-architected and managed large-scale data center.

There’s one more problem with economies of scale: some strategists and analysts are unclear on the concept. “Why can’t any enterprise do what Amazon or Google do?” they ask.

If 500 petabytes costs less per GB than 10 PB does, then the economic pressure to build 500 PB data centers is constant. If your company only needs 10 PB you will never be as cost-effective as the 500 PB data center.

Every class A data center has its own diesel generator set, but they get 99% of their electricity from the power company. Why? Because it’s cheaper.
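The economies-of-scale argument can be sketched numerically. The power-law cost curve and its exponent below are hypothetical illustrations, not measured data; the point is only that if $/GB falls with capacity, the bigger data center always wins on unit cost:

```python
# Illustrative sketch of economies of scale in storage, using a
# hypothetical power-law cost curve: cost_per_gb ~ capacity^-k.
# The base cost and exponent k (the "slope") are made-up numbers.

def cost_per_gb(capacity_pb, base_cost=0.25, k=0.2):
    """Hypothetical $/GB for a data center of the given capacity in PB."""
    return base_cost * capacity_pb ** -k

small = cost_per_gb(10)    # 10 PB enterprise data center
large = cost_per_gb(500)   # 500 PB cloud-scale data center
print(f"10 PB:  ${small:.3f}/GB")
print(f"500 PB: ${large:.3f}/GB")
print(f"Cloud-scale advantage: {(1 - large / small) * 100:.0f}%")
```

The steeper the exponent, the larger the gap, which is why the 10 PB shop can never match the 500 PB shop on cost alone.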

ZFS flagged
Several presenters mentioned Sun’s ZFS as a significant enabler of cloud storage, none more enthusiastically than Joyent’s Ben Rockwood, author of the excellent Cuddletech blog. Ben made a compelling point about OpenSolaris: given Solaris’ industrial strength and many cool features – like Dtrace and ZFS – why wouldn’t you use it instead of Linux?

Most surprising company
You know the cluster storage software company with hundreds of customers, the leader in healthcare image storage and archiving, whose resellers include HP and IBM, with tens of petabytes under management? Me neither.


Meet Bycast. They’ve been in business 10 years. Coolest feature: you can set it up so you don’t have to back up the data. Yes, people are doing that in production today.

The StorageMojo take
Ever since Google built a huge, low-cost storage infrastructure from commodity parts, the proprietary array business has been living on borrowed time. Optimized for structured transactional data, traditional cached RAID arrays will be around for many years to come, but expect a long decline.

The growth in file data, especially consumer digital content, has made data both cooler and more massive. As network bandwidth improves, remote storage becomes more attractive.

There are 3 key elements to the cloud puzzle:

  • Economies of scale. The steeper the slope the faster data will migrate to remote storage.
  • Network bandwidth. A faster network makes remote storage more compelling.
  • Component, product, solution? People don’t want to buy storage – they want to save and protect their important data.

Courteous comments welcome, of course. Get copies of the presentations here.


Rob January 22, 2009 at 6:46 pm

Don’t forget about latency!

http://www.communities.hp.com/online/blogs/datastorage/archive/2009/01/06/don-t-pull-on-superman-s-cape-these-storage-industry-experts-do-just-that.aspx

“Local disk subsystems these days typically deliver data response times of five milliseconds or less. What does the Internet yield (i.e., in the context of Atmos storage performance)? To research this we conducted a few simple tests. To sample typical Internet delays, we pinged 15 of the most popular sites as listed by Alexa, the Web Information Co., once a second for a period of one minute with the following results:
• Average latency: 72 ms
• Maximum latency: 142 ms
• Minimum latency: 25 ms
This should cast great performance concerns for using Atmos for any application that has transactional or interactive performance requirements.”

So yeah, applications that aren’t response-time sensitive, go right ahead.
I can overhear a conversation in a few years:
“It’s the 21st century, why is this freakin’ email so stinking slow!!! I’m going to tear my hair out!”
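A quick sketch of what those ping numbers mean for small, serialized I/O. The latency figures come from the quote above; the ops/sec model deliberately ignores bandwidth and parallelism, since synchronous round trips are where latency hurts most:

```python
# Back-of-envelope: serialized small-I/O rates implied by the quoted
# latencies. One synchronous round trip per operation is assumed.

latencies_ms = {"local disk": 5, "cloud min": 25, "cloud avg": 72, "cloud max": 142}

for name, ms in latencies_ms.items():
    iops = 1000 / ms  # synchronous round trips per second
    print(f"{name:10s}: {ms:4d} ms -> {iops:6.1f} serialized ops/sec")
```

At the average quoted latency, you get roughly 14 synchronous operations per second versus about 200 from local disk, a ~14x gap before bandwidth even enters the picture.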

But in fairness, I think the story is changing even as we speak. Private clouds, yeah that’s the ticket!

http://chucksblog.emc.com/chucks_blog/2009/01/the-emergence-of-private-clouds.html

Certainly wouldn’t want your cloud “out there” where the response times are IO punishing. I think physics is going to keep IO close to the servers for business apps. Maybe archived medical images are a remote or outsourced cloud fit.

David Magda January 22, 2009 at 7:36 pm

Ben did a presentation called “Storage in the Cloud” at the first OpenSolaris Storage Summit in September 2008:

http://blogs.sun.com/video/entry/open_storage_summit_ben_rockwood

Not sure how similar they are, but thought you might be interested.

Storagezilla January 22, 2009 at 8:07 pm

Why not ZFS? Good question. I can’t get it out of the box with Linux (well, not without using FUSE), but the most compelling reason is that the Linux community has thrown its weight behind Btrfs. Where they lead, a lot of commercial entities will follow. Yes, it’s not finished, and ZFS has had more people hammering on it over a longer period of time, but the ZFS team now has to see its headlights in the rear-view mirror and notice that Oracle is doing the driving.

People will hang on to ext3/4 until Btrfs gets to where it needs to be, and at the speed it’s moving that’ll be sooner rather than later.

Jeff Mancuso January 23, 2009 at 7:24 am

The big hurdle right now is easily and efficiently getting your data to and from the cloud. S3 achieves reasonable success, but not everyone wants to write a bunch of custom code to move their data over HTTP. Other players have their own custom solution.

Economies of scale will play the biggest role in the long run, but I look first to Robin’s post on Cloud Storage as a component [http://storagemojo.com/2008/12/22/cloud-storage-is-a-component/]. We’re trying to solve part of this problem with our software, called ExpanDrive [www.expandrive.com – mac, windows, soon linux]. It mounts SFTP as an aggressively optimized filesystem. It’s fast, it feels good, but most importantly it is EASY.

Without filesystem access to cloud storage, the developer overhead of getting data to and from the cloud in a well understood manner will be the biggest limiting factor – in my opinion.

Nick Brown January 23, 2009 at 12:37 pm

Just to point out, the link to James Hamilton’s article on bulk storage is:-
http://perspectives.mvdirona.com/2008/12/22/TheCostOfBulkColdStorage.aspx

The link you gave is missing the ‘a’ in ‘aspx’.
Cheers.

Wes Felter January 23, 2009 at 1:47 pm

Rob, clearly the answer is to move your servers into the same cloud that holds your data. 🙂

I agree that many people will need filesystem semantics. Amazon has made some progress in this direction with EBS; it’s only a matter of time before we have iSCSI and NFS working well in the clouds.

I wonder how many Atmos racks you have to buy to get Hamilton-like economics, though I doubt EMC will tell us.

Jered Floyd January 23, 2009 at 1:57 pm

I think Cloud is definitely going to happen, but I’m really concerned about the maturity of the systems today. These people are trying to innovate on both product and service at the same time, and that’s a real challenge. With mature products, outsourced hosting can provide better uptime than you can manage yourself, at least cheaply, but it’s not clear that’s the case yet with services like S3 — there have been extended outages. It’s horrible to feel helpless in that situation, and you don’t get compensation for lost business.

We really need standard interfaces, I humbly suggest XAM, to allow this market to mature. Also, the types of data that can be stored in Cloud are limited. For services hosted in data centers, it’s great because there’s lots of connectivity. For applications at a business, going out to the cloud over their commodity internet connection will be a pain. A T-1, still what many folks have, is just outrageously slow for storage. This is the biggest challenge to online backup and recovery today, for example.
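The T-1 point is easy to put in numbers. A minimal sketch, using the standard T-1 line rate of 1.544 Mbit/s and example data sizes (ignoring protocol overhead, so real transfers would be slower still):

```python
# Rough transfer-time math for a T-1 line (1.544 Mbit/s), to put
# "outrageously slow for storage" in numbers. Sizes are example values.

T1_BPS = 1.544e6  # T-1 line rate in bits per second

def transfer_days(gigabytes, line_bps=T1_BPS):
    """Days to push the given number of GB over the line, ignoring overhead."""
    bits = gigabytes * 1e9 * 8
    return bits / line_bps / 86400

for gb in (10, 100, 1000):
    print(f"{gb:5d} GB over T-1: {transfer_days(gb):7.1f} days")
```

Even a modest 100 GB backup set takes about six days of saturated T-1 bandwidth, which is why seeding and recovery over such links are impractical.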

–Jered

Brainy January 24, 2009 at 11:22 am

Regarding Btrfs, it is true that the Linux community has thrown its weight behind it.

The problem is: It’s a little too late.

According to the development team, btrfs should be ready for production in 2012. Timetables slip, and as we have seen with ZFS, the first appearance in Solaris wasn’t production ready for most environments.

In the meantime ZFS has taken huge steps forward. There have been many enhancements, especially with SSD, that btrfs has still to implement.

Not to forget that ZFS support is soon ready in Mac OS X, FreeBSD and NetBSD, whereas btrfs is limited to Linux due to the GPL license. So no easy migration is possible (e.g. zfs send/receive, or just plugging disks in).

Let’s say Btrfs is ready in 2012. By then ZFS will already be proven on many petabytes of data (e.g., Lustre on ZFS coming out next year). Btrfs will just not have that trust level on day one, while ZFS will have new features by then (de-duplication, automatic data migration, etc.).

To be honest, I want my data to be safe today, not in 2012, so Linux is not an option for me.

xfer_rdy January 26, 2009 at 9:43 am

This is what Larry Ellison said about cloud computing: “The interesting thing about cloud computing is that we’ve redefined cloud computing to include everything that we already do. The computer industry is the only industry that is more fashion-driven than women’s fashion. Maybe I’m an idiot, but I have no idea what anyone is talking about. What is it? It’s complete gibberish. It’s insane. When is this idiocy going to stop?”

Richard Stallman (the GNU guy) says: “It’s stupidity. It’s worse than stupidity. It’s a marketing hype campaign.” They feel it’s everything we are already doing.

Thinking about it, really, what is this cloud “stuff”? Listening to the Ben Rockwood video, I’m left with the impression it’s a rebranding of “outsourcing” IT functions. (That will work until a company can’t get to its data for three or four days.) Or is “the cloud” just a revitalization of Web 2.0/federated applications, which every CIO has spent millions on without seeing a return? Or is “the cloud” the new “virtualization,” an ambiguous term for overbloated and overcomplicated failover? Virtualization was a pig in lipstick with “needs.”

If we did that to virtualization (failover), what will happen to cloud computing and cloud storage? When we look at the sky, will we be forever reminded of careers in ruin, mass unemployment and a customer base that will never trust again?

I think both Ellison and Stallman are “dead nuts” on. What clinched it for me is hearing that a backing store, like ZFS, is a primary enabler of cloud storage or, in the grander scheme, cloud computing. Is anyone really listening, or has a Timothy Leary devotee been spiking the Kool-Aid at the conferences? Are we so desperate that we will blindly chase what we are told is the next shiny object? Mind you, we are being “told” it’s shiny; we can’t even tell if it is for ourselves. I think we are looking for an excuse to ignore our common sense.

The other day I came across a company that is selling a storage appliance with some version-control software on it. They are calling it cloud storage. The rebranding from “networked” to “cloud” is out of control. There will be a backlash because of this practice.

The “cloud” mutual admiration societies are missing the point. And for storage, SNIA is so far off the mark that the rest of the cloud computing industry doesn’t even know it exists. And as the rest of the cloud world races toward the sea, trying to standardize based on companies’ products, SNIA with XAM (the new HBA API) has fallen down a deep well.

Cloud computing and cloud storage are not technologies, platforms or solutions. They are use models: how businesses and consumers expect applications and the internet to work for them, how they will use them, whether they will anticipate their needs and give them capabilities they never before considered. Sort of the “do what I mean, not what I say” interface. (Remember virtualization?)

A metaphor for users: people staring at clouds in the sky on a summer day and seeing elephants, dogs, fish, cars, cooked turkeys and cranberry sauce, their Aunt Martha. For everyone in the computing industry, we’re looking up and only seeing dollars.

Does anyone believe this can be pulled off in a reasonable amount of time when there is such disparity between the users and the providers?

Do you think the storage industry should sell another rebranded, unreliable, “needy pig in lipstick” that didn’t work before, when people and companies are drowning financially? Are we becoming no better than the banks and credit card companies?

Han Solo January 26, 2009 at 1:17 pm

>why wouldn’t you use it instead of Linux?

Because Linux can do the same cool things with Btrfs, and without SLOW-laris.

http://storagezilla.typepad.com/storagezilla/2008/09/rise-of-the-btrfs-startups.html

Pete Steege January 29, 2009 at 7:45 am

How are storage cloud companies handling the “first backup” issue? Multiple terabytes or petabytes that need to be migrated to the cloud initially?

The incremental part of the process is a no-brainer.

David Slik January 29, 2009 at 7:33 pm

Pete is absolutely right in pointing out that the initial ingest of bulk data into cloud storage is a significant issue, especially when bandwidth between the customer and the cloud provider is limited.

We’ve found that this is a very important part of enabling customers to take advantage of off-site storage, and I’ve written about our experiences and thoughts on my blog.

http://intotheinfrastructure.blogspot.com/2009/01/jump-starting-off-site-storage.html
