Cisco’s UCS limited scale

by Robin Harris on Friday, 30 April, 2010

Should a Unified Computing System be able to scale to meet growing application needs? Of course, but the network must scale with it.

The problem: Cisco’s enterprise architecture doesn’t scale. And that isn’t only the StorageMojo take. Gartner VP Joe Skorupa doesn’t think so either. Here is what he said:

“The promise that a single converged data center network would require fewer switches and ports doesn’t stand up to scrutiny,” Mr. Skorupa said. “This is because as networks grow beyond the capacity of a single switch, ports must be dedicated to interconnecting switches. In large mesh networks, entire switches do nothing but connect switches to one another. As a result, a single converged network actually uses more ports than a separate local area network (LAN) and storage area network (SAN).”

This is not news: uplinks have always been expensive and difficult to size. Essentially you end up with the network equivalent of a NUMA (non-uniform memory access) architecture, which has never been popular for good reason.
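To make the port arithmetic concrete, here is a minimal sketch of how interconnect links eat into usable ports as a full mesh grows. The function name `full_mesh_ports` and the `links_per_pair` parameter are hypothetical, the 40-port figure is borrowed from the 6100 mentioned in this post, and a full mesh with a fixed number of links per switch pair is a simplifying assumption, not a description of any particular Cisco design.

```python
def full_mesh_ports(num_switches: int, ports_per_switch: int,
                    links_per_pair: int = 1) -> dict:
    """Count interconnect and host-facing ports in a full mesh of switches.

    Assumes every switch pair is joined by `links_per_pair` links and
    each link uses one port on each of the two switches.
    """
    if num_switches < 1 or ports_per_switch < 1 or links_per_pair < 1:
        raise ValueError("all arguments must be positive")
    interconnect_per_switch = (num_switches - 1) * links_per_pair
    if interconnect_per_switch > ports_per_switch:
        raise ValueError("too few ports per switch to build this mesh")
    total = num_switches * ports_per_switch
    interconnect = num_switches * interconnect_per_switch
    return {"total": total, "interconnect": interconnect,
            "host": total - interconnect}


if __name__ == "__main__":
    # Illustrative mesh sizes only; 40 ports per switch is an assumption.
    for n in (1, 4, 10, 20):
        r = full_mesh_ports(n, 40)
        share = 100 * r["interconnect"] / r["total"]
        print(f"{n:2d} switches: {r['host']:3d} host ports, "
              f"{r['interconnect']:3d} interconnect ports ({share:.0f}% of total)")
```

Under these assumptions the interconnect share grows linearly with the number of switches, which is the effect Mr. Skorupa describes. Real networks usually use tiered designs rather than a full mesh, so treat the numbers as an upper-bound illustration.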

Capacity cheap, network bandwidth expensive
Amazon and Google don’t use fancy switches close to the servers because a) they can’t afford to, and b) they don’t attach storage to the switches. This is why DAS – direct-attach storage – is popular in scale-out architectures.

Not only is DAS latency lower, the bandwidth is higher and much cheaper. A fully expanded 40 port 6100 switch – the one that supports UCS – is well over $1k per 10GigE port.

You can buy a low-end server for that with as much DAS bandwidth and GigE as well.

Lower latency and higher bandwidth are important for server performance as well. DAS may be tougher to manage, but it pays continuing dividends.

The StorageMojo take
The mystery of UCS – and other converged stacks – deepens. As Mr. Skorupa’s comments imply, the UCS architecture doesn’t scale well.

Cisco marketing needs to stop drinking its own bathwater and take a deep, clear look at what it really has. Positioning UCS as something it cannot be is not a long-term strategy for success.

Yes, Cisco and presumably the other stacks offer evolutionary enhancements over existing blade-based confederations. And compared to IBM’s z-series mainframes they seem to be more scalable. (z-series fans are welcome to weigh in with their views.)

A software layer that makes local DAS management as easy as centralized SAN management – and it should be easier – will be a big win for the enterprise. And for the company that proves they can do it.

Courteous comments welcome, of course.


nate Friday, 30 April, 2010 at 9:23 am

Exactly, that’s just one of the shortcomings of the design. But that doesn’t stop the Cisco kids out there from eating out of Cisco’s hand like it’s the only hand in the universe.

Sad really.

But there are really (at least) two distinct designs out there. Scale-out of course works well for the Googles and Amazons of the world, but the app has to be geared towards it, and today most apps are not. So they cannot leverage an infrastructure that relies heavily on distributed DAS storage.

Wrote a bit on it here (Why I hate the cloud): http://www.techopsguys.com/2010/02/09/why-i-hate-the-cloud/

So there is and will continue to be a market for non-scale-out architectures for some time to come (I’d think at least a decade; scale-out is still in its earliest stages). The development skills required to properly handle scale-out are, I think, beyond the reach of most people at this point, and it will take many years for scale-out designs to support regular workloads and provide high availability at the same time. Developers are still grappling with how to scale to multiple cores within a single system; they probably don’t want to hear about having to scale across multiple systems in a seamless manner.

For my own needs I am pursuing HP c-Class blades and 3PAR storage for virtualization infrastructure and SGI CloudRack for scale-out computing. Two very different solutions for two very different problems.

Omar Sultan Friday, 30 April, 2010 at 11:42 am

Robin:

I am always happy to engage in a productive conversation, but, to be honest, I struggled with this post a bit.

With regards to Mr Skorupa’s comments, as we discussed in Boston, a whole cottage industry has sprung up around taking his comments out of context. To paraphrase, his perspective is that much of the value of a converged network is currently sitting in the access layer, while the economics of wholesale conversion to a unified fabric (host to target) is murkier and less compelling. Especially for enterprises that have happily functioning FC SANs in place. This is a perspective that we also happen to hold.

I am not sure I see how Skorupa’s comments tie back to UCS scalability. Northbound, you can run 4Gb or 8Gb FC out of a UCS, so I am not sure how that is any different than any other blade system out there.

Are you arguing that DAS is better than FCoE or iSCSI (“other converged stacks”)? I guess it would depend upon the application, but I don’t see what this has to do with UCS.

Are you arguing against blade architectures (“existing blade-based confederations”)? Again, that is not particularly UCS-specific, and we happen to offer UCS rack servers (with anywhere from 4 to 16 drive bays).

Would love to understand what we can be doing better, but I need a bit more detail here.

Regards,

Omar Sultan
Cisco

Ryan Friday, 30 April, 2010 at 11:46 am

NUMA is very popular… The AMD Opteron and recent Intel Xeons have a NUMA memory architecture.

Jacob Marley Friday, 30 April, 2010 at 7:07 pm

Isn’t a software layer to manage DAS across servers effectively NUMA for central storage?

Scale-out computing isn’t going to happen anywhere except at large computing utilities, i.e. Google/Amazon/Yahoo.

10GE is still expensive but one 10GE port on a server provides a lot more IO than four (aggregated or multipathed) 1GE ports. The new x86 server chips from Intel/AMD combined with the latest 10GE NICs can finally pump close to 10GE speeds so (like the transition from FastE to 1GE) over the next few years 10GE costs will (hopefully) come down.

Right now we deploy VMware ESX hosts with four 1GE ports, where two are dedicated to iSCSI and two to data traffic.

The next refresh will probably be single 10GE for both iSCSI and data traffic with priority queues (built into newer NICs) and switch side QoS to provide fairness.

Paul Saturday, 1 May, 2010 at 1:46 pm

Besides making “local DAS management as easy as centralized SAN management,” there’s a related business opportunity (as nate hints above) in making enterprise apps scale out gracefully into DAS-dense clouds.

Damir Lukic Sunday, 2 May, 2010 at 4:27 pm

Every multi-CPU system is based on NUMA architecture. NUMA is simply a way of providing each processor with its own memory space, so that it doesn’t starve.

The only thing to consider is how UCS handles memory. But it does it in a typical way: each blade has its own set of CPUs and memory modules. OK, UCS has Cisco’s unique memory multiplexer chip, so it can handle much more memory than blades from other vendors. But the memory subsystem is not shared between blades, which could possibly be solved using RapidIO or InfiniBand (as is done in EMC Symmetrix V-Max). At least I believe so. :)

About DAS: yes, DAS offers more bandwidth and throughput, but it is not nearly as scalable as an external storage array. Also, there is no way of multipathing a DAS subsystem, so high availability is a problem. That said, there is a way to use DAS subsystems as a type of external storage array: simply use a virtualization layer. Products like DataCore and LeftHand, and even VM6 VMex in an MS Hyper-V environment, can provide the storage virtualization layer, making the DAS devices from all servers appear as a single storage device.

Steve Chambers Monday, 3 May, 2010 at 6:41 am

Robin, I just don’t get your article at all, just like you don’t get UCS.

Joe Skorupa doesn’t get it, either, so you’re in good company.

http://viewyonder.com/2010/03/25/chicken-little-in-the-unified-data-center-starring-joe-skorupa-of-gartner/

Steve

andrew Thursday, 6 May, 2010 at 3:52 am

Yep, I don’t get this article either. DAS, NUMA, uplinks, huh?

Christopher Reed Monday, 17 May, 2010 at 1:21 pm

I read this article twice just to be sure, but are you arguing against centralized storage with the DAS comments? Also, Cisco UCS is not properly described as blade servers. It is modular computing at its foundation. I can have 12 x 8Gb FC ports to a SAN out of one 6140 at line rate, or 24 in a redundant configuration. How many SAN disks would it take to push that? From a network perspective each blade has 20Gb of connectivity. Is there a server that can truly push that bandwidth? Could we dive a bit deeper?
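For a rough sense of the disk-count question, here is a back-of-the-envelope sketch. It treats each Fibre Channel port as a nominal 8 Gbit/s with no encoding or protocol overhead, and the 100 MB/s sustained throughput per disk is purely an illustrative assumption, not a measured figure; the function name `disks_to_saturate` is hypothetical.

```python
import math


def disks_to_saturate(num_ports: int, gbit_per_port: float,
                      mb_per_sec_per_disk: float) -> int:
    """Estimate how many disks are needed to fill a set of FC uplinks.

    Uses nominal line rates (1 Gbit/s = 1000 Mbit/s = 125 MB/s) and
    ignores encoding overhead, caching, and RAID effects.
    """
    if num_ports < 1 or gbit_per_port <= 0 or mb_per_sec_per_disk <= 0:
        raise ValueError("all arguments must be positive")
    total_mb_per_sec = num_ports * gbit_per_port * 1000 / 8
    return math.ceil(total_mb_per_sec / mb_per_sec_per_disk)


if __name__ == "__main__":
    # 12 ports (one interconnect) and 24 ports (redundant pair), as in the comment.
    for ports in (12, 24):
        n = disks_to_saturate(ports, gbit_per_port=8, mb_per_sec_per_disk=100)
        print(f"{ports} x 8 Gbit/s FC ports: about {n} disks at an assumed 100 MB/s each")
```

With a lower per-disk figure, as random I/O workloads would typically produce, the disk count rises accordingly.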

Steve Kaplan Monday, 17 May, 2010 at 4:43 pm

Robin,

While I quite like most of what you write, this article is a notable exception. At a meta level, the challenge is that you fail to consider UCS in the context of its intended purpose, which is an optimized hosting platform for virtual infrastructure. Under this type of architecture, discussions about DAS and $1,000 pizza box servers become irrelevant.

Before tackling the theme of the article, which is that the UCS doesn’t scale, I wanted to address the undertone, which is that the UCS 6100 “switch” is expensive. Cisco doesn’t, of course, make a 6100 switch. The UCS 6100 is a Fabric Interconnect that manages the UCS blades and aggregates uplink traffic. Quibbling over its cost is pointless. What is important is how well the UCS enables the overall vast savings and other benefits of a virtualized data center relative to alternative solutions. The unified fabric, unified management, extended memory technology, stateless computing and VN-Link capabilities put UCS in a class by itself for enabling optimal performance, provisioning and management of a virtual infrastructure. Additionally, the reduced switch, adapter and management module requirements, along with superior power usage efficiencies, typically make the UCS the least expensive option as well.

You say that, “You can buy a low-end server for that $1K/10G port”. That $1K port supports up to 4 blades, so the cost isn’t nearly as high as portrayed. While it is true that you can fill up your data center with 1RU pizza box servers, how will you manage them? How will you power and cool them? How are you going to justify the periodic upgrade costs, cabling, SFPs, maintenance contracts, rack space, network switch ports, SAN switch ports, and PDU, UPS and generator slices required? How will you back up all of the DAS disks?

As for Cisco’s enterprise architecture not scaling: compared to what? 1 RU pizza box servers running GigE? Stating that a converged network uses more ports than separate LAN and SAN networks due to interconnects is a bit laughable. Any data center network of a significant size requires switch interconnects. These may use a relatively small number of uplink ports or, as intimated by Mr. Skorupa, they may require entire switches. The point is that this requirement exists regardless of whether the environment is traditional or converged. It is simply a matter of best practices in reducing single points of failure. What is also overlooked is that a separate SAN brings an equally complex set of SAN switching, which may also require a large number of switch interconnects. This will, of course, result in a fairly significant amount of cable bloat, not to mention potential cooling issues, rack space issues and management overhead. Generally speaking, full mesh LANs are a bad idea. The only reason to build one is that your core switch can’t handle the traffic load. This may be the primary driver behind the Nexus 7000 with its high backplane capacity.

Maintaining that DAS offers better scale-out than SAN is irrelevant in a virtual infrastructure, since we are always going to utilize shared storage for capabilities such as HA, FT, vMotion, etc. Even in an increasingly anachronistic physical data center, however, I would argue that more frequent disk failure, recovering from disk failure, reboot requirements for upgrades and lack of I/O performance, in addition to the mentioned management challenges, make SAN storage a superior option. Furthermore, new servers often utilize either SCSI (320 MB/s) or SAS (3 Gbit/s or, increasingly, 6 Gbit/s at higher cost) drives. The cost for this storage is not much less than the typical Fibre Channel back-end component of enterprise storage arrays.

Attempting to build a case for distributed architecture by referencing Amazon and Google is misleading. Large Web portals like these are the exception, not the rule, and utilize purpose-built systems designed from the ground up to function in a distributed farm, more comparable to high-performance computing clusters. Most business applications do not work this way, and probably won’t for a very long time, if ever.

Anoni-mouse Thursday, 20 May, 2010 at 4:46 pm

It would seem based on these and the previous post’s (most though not all) responses that the Cisco marketing hounds hath been unleashed?? Disconcerting there Cisco!

Robin Harris Thursday, 20 May, 2010 at 9:14 pm

Wait ’til they see what else I have coming!

Robin

Steve Kaplan Friday, 21 May, 2010 at 10:39 am

Anoni-mouse,

In response to your comment about “Cisco marketing hounds”, I work for a solutions integrator and Cisco partner, but never heard a thing from Cisco about this article. In fact, I was alerted to it when it was included as a link in an email sent out by an HP rep titled, “IT Critics Declare HP Dominance Over Cisco”. The StorageMojo piece, despite all of the critiques by me and others, is still the best of the 5 links included in the email.

Robin, you say you have more coming. I hope you will specifically address the comments to your post.

Bob Tuesday, 15 June, 2010 at 12:23 pm

I think before you scale your opinion you need to a) more fully understand the UCS product, b) talk to the numerous customers who are using UCS today and see if this is indeed the problem you state, and c) stop taking Gartner as gospel; most people feel that you have to pay for a favorable opinion from Gartner. Stop letting the tail wag the dog.

Sal Collora Thursday, 5 August, 2010 at 10:44 pm

DISCLOSURE: I work for Cisco and sell UCS every day.

Robin, have you used Cisco UCS? Have you spoken to customers who have chosen it over competitive offerings? Do you really understand the power of shared storage, especially in VMware environments? Sure, in massively scaled-out environments like Amazon and Google, DAS makes sense. But there are only a handful of those environments. The overwhelming majority of real businesses have more modest requirements, and the performance of NetApp and EMC as companies proves that people like shared storage, and the size of the blade market confirms people want to use these systems.

I actually install these systems and teach people how to use them. The fact that people are blogging about UCS and signing up as references shows the power of the system. I’ve been here over 12 years and this is not just a marketing thing.
