Clouds over Berkeley: the RADLab reviews cloud computing pt. 1

by Robin Harris on Wednesday, 18 February, 2009

Cloud computing: it’s here; it’s real; and it’s cheap

UC Berkeley’s Reliable Adaptive Distributed Systems Laboratory has published a paper entitled Above the Clouds: A Berkeley View of Cloud Computing (pdf). It is a spirited and thoughtful response to “bah, humbug” critics as diverse as Larry Ellison and Richard Stallman – and some attendees at the SNIA cloud storage symposium.

The demand
The RADLab team (Michael Armbrust, Armando Fox, Rean Griffith, Anthony D. Joseph, Randy Katz, Andy Konwinski, Gunho Lee, David Patterson, Ariel Rabkin, Ion Stoica, and Matei Zaharia) identifies three demand drivers for cloud computing:

  1. The illusion of infinite computing resources available on demand.
  2. The elimination of an up-front commitment.
  3. The ability to pay for use of computing resources on a short-term basis as needed.

These are all eminently desirable. I remember when IBM offered 7-year leases on mainframes – and people took them!

The supply
Economies of scale drive the supply side. The authors refer to a table of data provided by James Hamilton, a former Microsoft data center architect and current AWS architect. I took the liberty of changing some labels: EDC is enterprise data center; IDC is Internet data center.

Cost advantage of Internet data centers over enterprise data centers

There is another fundamental physical advantage: it is cheaper to ship photons than electrons. Not only does DWDM fiber offer vast capacity, but the value of the cargo – data – is higher.

Next up: obstacles and opportunities for cloud computing.

The StorageMojo take
Historically, every 20 years a fundamental shift in computing worldview upsets the industry applecart: batch vs realtime in the early 60’s; PC vs timeshare in the early 80’s; and now commodity scale-out clusters vs bespoke data center infrastructures. It takes 20 years to articulate the possibilities of these changes and to create the next generation technology that leads to the next shift.

Only after a generation has grown up in the new paradigm can the next leap occur. Latterly, only after a hundred million Internet-connected PCs were installed could an ad-based search engine be a viable business proposition.

Courteous comments welcome, of course. I’ll be commenting later on cloud storage. I’m not as pessimistic as some readers.

{ 6 comments… read them below or add one }

David Slik February 19, 2009 at 2:42 am

Based on the last pricing analysis that I did, my feeling is that the cost listed for IDC storage is high. And private cloud EDC storage (such as that provided by Bycast) is far, far lower than the typical all-in price listed here for enterprise storage, and can even be less than the listed IDC storage costs.

On the other hand, the network costs seem way too low. In many places, even in dense metro environments, customers are paying orders of magnitude higher rates for their Internet bandwidth.

I’m looking forward to your next blog post about cloud storage.

Chris February 19, 2009 at 6:32 am

Personally, I am very excited about cloud computing, but I work with companies who still do batch computing on mainframes. I’m still trying to figure out how to take advantage of cloud computing with clients who want to know the physical address of every location where their data is stored, so I’m looking at private cloud computing products. They just don’t seem to offer as much of a competitive advantage – at least at our scale.

TimC February 19, 2009 at 8:22 am

The way you’re quoting your “IDC” numbers leads me to believe you’re talking about something like “theplanet”. That’s great that it’s way, way cheaper, but at the end of the day when corporate Exchange for a large enterprise is down for 12 hours because the provider’s staff will “get to it when they get to it” (which is exactly why they can do 1 admin for 1,000 servers), the honeymoon will end.

Taylor February 19, 2009 at 12:18 pm

I don’t know where they get $95/Mbps/mo as a rate for a medium-sized datacenter. I haven’t seen numbers anywhere near that high in the US/Europe. Then again, I’ve never priced datacenter bandwidth in someplace like Portland, ME or Lincoln, NE or what have you. I wonder if they’re basically saying the “EDC” would have to be located in the city nearest the home office.

On the flipside, those cheaper IDC BW rates don’t make much of a difference if they are shipping all the data to/from the IDC from the home office in Butte, MT. They either need a fat pipe on that end still, at $$$ rates, or else they’re not transferring enough data for that cost to be the deciding factor, I would think.

Steve Jones February 20, 2009 at 3:51 am

It would be interesting to see some more up-to-date cost figures, as these date from 2006, and to know what is included in those storage service costs. From what I know of our internal enterprise standard storage arrays, we are currently at about double the stated IDC cost. Of course many of our costs are historical – kit bought over several years, some of it still in depreciation, but even that which isn’t attracts expensive maintenance costs. In our case a lot of what would previously have been placed on enterprise storage arrays is now being held on commodity arrays which have a wholly different cost base. However, a big cost to most large enterprises is moving data off legacy systems – that’s often a much more expensive exercise than the cost of the old storage. Modern infrastructure can help, but there’s the rub – you have to get the old data and apps off there first.

However, one thing is very clear. For many purposes data has to be close to processing power for bandwidth and, even more importantly, latency issues. I think pure cloud storage has only limited appeal for many enterprises. Performance, scalability and availability matter hugely. Try running a large call centre with predictive dialing integrated into CRM systems to see what I mean.

It’s going to take time to work out how this shakes down. I have an idea that it might appeal more to small and mid-range customers first. For large enterprises with highly integrated infrastructure and systems, with all that entails, it’s going to initially be a bureau service for a few discrete applications – probably invoked by a department with a bit of budget that can’t get the central IT system to deliver on time. With large enterprises struggling with configuration management, software licensing, asset management, support and so on for what are often thousands of disparate applications, this could add a further complication. It will be rather like when CIOs found that small but critical parts of their enterprise’s IT systems now relied on an ancient server sitting under a desk in an office somewhere, depending on the support of a contractor who was lost in a cost-cutting exercise two years earlier.

On-demand computing has its hidden costs. Control in IT enterprises matters – it’s not just the costs of the hardware or even systems support that matter. Enterprise IT is often a fight between the need for flexibility and speed on one side and control and management on the other. Complex, hybrid arrangements across different infrastructure provision, with inconsistent monitoring and management at different layers, plague many enterprises.

Blissex February 22, 2009 at 3:32 am

I have read the Berkeley study and it seems to me to be quite naive as to business realities… The cloud clearly has some uses (more on this later) but it has some really big issues for ordinary businesses:

* The price advantage of a large site is claimed (claimed!) to be 3-5x over a small site. But it would be insane for most businesses to rely solely on one cloud provider, and most likely there would be a requirement for local (not cloud) backup of all data as well. Add the cost of a much bigger internet connection, and it is not clear that there is a cost advantage at all for data-intensive businesses.

* Cloud providers offer “best effort” pricing, which is somewhat different from what onsite computing resources provide in typical enterprises. I would be interested to see what kind of pricing they would offer for a specific SLA with consequential damages, for example.

* The insanity of relying on a single cloud provider arises because having a single copy of the data in the hands of a third party is way too large a risk; and never mind that if you stop paying or have any dispute with the cloud provider you lose access to your data, which the cloud provider might even delete, even if nonpayment is temporary. Even just making a backup of the data might be too expensive (unless done continuously and incrementally).

* Moving business data processing to the cloud, even assuming a strong SLA from the cloud provider, trades local data center availability risk for internet connection availability risk. The cloud may be 99.999% available guaranteed (again, I’d like to see the pricing for that), but is the full internet connection to it equally guaranteed?

But for one thing, just imagine the complications in liquidation or bankruptcy of having all enterprise data with a service that won’t make it available unless they get paid first and ahead of all other creditors, and they want to get paid for all the time the data has been on their servers.

What are the businesses that will take advantage of clouds? Well, those where lots of processing of small amounts of data is common, those where the business is intrinsically network oriented and data volumes are small, those that cannot get enough reliability locally, those where initial capital is scarce.

So for example offshoring businesses in 3rd world countries offering web based services are a perfect fit: they get access to 1st world class infrastructure, without any initial capital outlay, and most of their data needs are small.
