In our last episode we reviewed the paper’s supply and demand drivers. Now we look at the investigators’ top 10 obstacles and opportunities for cloud computing.

Adoption, growth and business obstacles
The paper identifies 10 obstacles and their associated opportunities. The first 3 are adoption obstacles, the next 5 are growth obstacles and the last 2 are policy and business obstacles. Each obstacle is listed first, followed by the opportunities to address it.

  1. Availability of Service. Use Multiple Cloud Providers to provide Business Continuity; Use Elasticity to Defend Against DDoS attacks
  2. Data Lock-In. Standardize APIs; Make compatible software available to enable Surge Computing
  3. Data Confidentiality and Auditability. Deploy Encryption, VLANs, and Firewalls; Accommodate National Laws via Geographical Data Storage
  4. Data Transfer Bottlenecks. FedExing Disks; Data Backup/Archival; Lower WAN Router Costs; Higher Bandwidth LAN Switches
  5. Performance Unpredictability. Improved Virtual Machine Support; Flash Memory; Gang Scheduling VMs for HPC apps
  6. Scalable Storage. Invent Scalable Store
  7. Bugs in Large-Scale Distributed Systems. Invent Debugger that relies on Distributed VMs
  8. Scaling Quickly. Invent Auto-Scaler that relies on Machine Learning; Snapshots to encourage Cloud Computing Conservationism
  9. Reputation Fate Sharing. Offer reputation-guarding services like those for email
  10. Software Licensing. Pay-for-use licenses; Bulk use sales

Service availability and performance unpredictability are the deal killers. If there aren’t acceptable answers – or acceptable applications – cloud computing is DOA.

Availability
Here’s a table from the paper:

[Table: Recent cloud services outages]

The authors argue that users expect Google Search levels of availability and that the obvious answer is using multiple cloud providers. They also offer an interesting argument on the economics of Distributed Denial of Service (DDoS) attacks:

Criminals threaten to cut off the incomes of SaaS providers by making their service unavailable, extorting $10,000 to $50,000 payments to prevent the launch of a DDoS attack. Such attacks typically use large “botnets” that rent bots on the black market for $0.03 per bot (simulated bogus user) per week. . . . Suppose an EC2 instance can handle 500 bots, and an attack is launched that generates an extra 1 GB/second of bogus network bandwidth and 500,000 bots. At $0.03 per bot, such an attack would cost the attacker $15,000 invested up front. At AWS’s current prices, the attack would cost the victim an extra $360 per hour in network bandwidth and an extra $100 per hour (1,000 instances) of computation. The attack would therefore have to last 32 hours in order to cost the potential victim more than it would the blackmailer. . . . As with elasticity, Cloud Computing shifts the attack target from the SaaS provider to the Utility Computing provider, who . . . [is] likely to have already DDoS protection as a core competency.

Depending on the level of Internet criminality going forward, that last point will prove decisive for some customers.
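
The arithmetic behind that 32-hour break-even is easy to check. Here’s a minimal sketch in Python using only the numbers quoted above (the paper’s circa-2009 illustration, not current AWS pricing):

  # Break-even arithmetic from the quoted DDoS extortion example.
  # All figures come from the paper's illustration (circa-2009 prices).
  bot_cost_per_week = 0.03        # $ per bot per week on the black market
  num_bots = 500_000              # bots in the hypothetical attack
  bandwidth_cost_per_hour = 360   # extra $ per hour of bogus network traffic for the victim
  compute_cost_per_hour = 100     # extra $ per hour for 1,000 added instances
                                  # (500,000 bots / 500 bots per instance)

  attacker_cost = num_bots * bot_cost_per_week                            # $15,000 up front
  victim_cost_per_hour = bandwidth_cost_per_hour + compute_cost_per_hour  # $460 per hour

  breakeven_hours = attacker_cost / victim_cost_per_hour
  print(f"Attacker outlay:   ${attacker_cost:,.0f}")
  print(f"Victim cost:       ${victim_cost_per_hour}/hour")
  print(f"Break-even length: {breakeven_hours:.1f} hours")  # ~32.6, the paper's ~32 hours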

Unpredictable performance
As always, I/O is the issue:

Our experience is that multiple Virtual Machines can share CPUs and main memory surprisingly well in Cloud Computing, but that I/O sharing is more problematic. Figure 3(a) shows the average memory bandwidth for 75 EC2 instances running the STREAM memory benchmark [32]. The mean bandwidth is 1355 MBytes per second, with a standard deviation of just 52 MBytes/sec, less than 4% of the mean. Figure 3(b) shows the average disk bandwidth for 75 EC2 instances each writing 1 GB files to local disk. The mean disk write bandwidth is nearly 55 MBytes per second with a standard deviation of a little over 9 MBytes/sec, more than 16% of the mean. This demonstrates the problem of I/O interference between virtual machines.

[Figure 3: Virtual machine I/O problem]

The opportunity, then, is to improve VM I/O handling. They note that IBM solved this problem back in the 80’s, so it is doable. The key is getting people to pay for the fix.
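
To put the two quoted variability figures on the same footing, here’s a small Python sketch that recomputes the coefficient of variation from the paper’s reported means and standard deviations (no new measurements, just the numbers in the quote above):

  # Coefficient of variation (std dev / mean) for the Figure 3 numbers
  # reported in the quote: memory vs. local-disk bandwidth on 75 EC2 instances.

  def coefficient_of_variation(mean, std_dev):
      """Standard deviation as a fraction of the mean."""
      return std_dev / mean

  # STREAM memory bandwidth (MB/s): mean 1355, std dev 52
  mem_cv = coefficient_of_variation(1355, 52)

  # Disk write bandwidth, 1 GB files to local disk (MB/s): mean ~55, std dev ~9
  disk_cv = coefficient_of_variation(55, 9)

  print(f"Memory bandwidth variation: {mem_cv:.1%}")   # ~3.8%, "less than 4%"
  print(f"Disk write variation:       {disk_cv:.1%}")  # ~16.4%, "more than 16%"

Roughly a 4x difference in relative spread, which is the paper’s point: VMs share CPU and memory cleanly, but local disk I/O still interferes across virtual machines.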

What about the other 8?
They aren’t deal killers. Yes, it will take a while to figure out data confidentiality and auditability, but there is nothing intrinsic that says it can’t be done.

The StorageMojo take
Is cloud computing viable? Of course. AWS claims 400,000 users – call it 80,000 active under the 80/20 rule – so commercial viability is a given.

Some think viability is an issue, especially for storage, reasoning from the Storage Networks and Enron debacles of the early millennium. All that proved is that buying enterprise kit means enterprise costs: you can’t undersell the enterprise data center if you’re buying the same vendors’ gear.

But Google and Amazon have shown that commodity-based, multi-thousand node scale-out clusters are capable of enterprise class availability and performance – at costs that “name brand” servers and storage can’t match. How much more proof do people need?

The issue is cognitive: the implicit assumption that cloud computing must compete on enterprise metrics. The funny thing is that most enterprise systems and storage don’t need enterprise availability.

In the early 90’s Novell PC networks averaged around 70% availability – pathetic even in those days. Yet they spread like wildfire outside the glass house over IT opposition.

The lesson: if the reward is big enough, the line of business (LOB) will accept performance and availability far below what they expect from IT. If they can get useful work done for 1/5th the cost they’ll accept some flakiness. Uptime is a means to an end – not an end in itself.

In the near term existing apps aren’t going to move to the cloud. The growth will come from new apps that aren’t feasible with today’s enterprise cost structures. Longer term – 10+ years – we’ll be surprised, just as the mainframe guys were in the 90’s.

The current economic crisis – the impoverishment of the industrialized world – means brutal cost pressures on IT for the next 5 years or more. Successful IT pros will help the LOB use cloud computing, even if it doesn’t provide “enterprise” availability and response times.

Courteous comments welcome, of course. One gem from the paper is a wake-up call for Cisco:

One estimate is that two-thirds of the cost of WAN bandwidth is the cost of the high-end routers, whereas only one-third is the fiber cost. Researchers are exploring simpler routers built from commodity components with centralized control as a low-cost alternative to the high-end distributed routers. If such technology were deployed by WAN providers, we could see WAN costs dropping more quickly than they have historically.