Should a Unified Computing System be able to scale to meet growing application needs? Of course, but the network must scale with it.
The problem: Cisco’s enterprise architecture doesn’t scale. And that isn’t only the StorageMojo take: Gartner VP Joe Skorupa has reached the same conclusion. Here is what he said:
“The promise that a single converged data center network would require fewer switches and ports doesn’t stand up to scrutiny,” Mr. Skorupa said. “This is because as networks grow beyond the capacity of a single switch, ports must be dedicated to interconnecting switches. In large mesh networks, entire switches do nothing but connect switches to one another. As a result, a single converged network actually uses more ports than a separate local area network (LAN) and storage area network (SAN).”
This is not new news: uplinks have always been expensive and difficult to size. Essentially you end up with a NUMA – non-uniform memory access – architecture, which has never been popular for good reason.
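To see Skorupa’s point in miniature, here is a toy sketch; the switch size and oversubscription ratio are assumed example values, not anyone’s published figures:

```python
# Toy model of Skorupa's point: once a network outgrows a single switch,
# a share of every switch's ports must be reserved for switch-to-switch
# links, so the ports you buy exceed the ports available to servers.
# Switch size and oversubscription ratio are assumed example values.

PORTS_PER_SWITCH = 48
OVERSUBSCRIPTION = 3   # assumed: 3 server-facing ports per uplink port

def server_facing_ports(switches: int) -> int:
    """Ports left for servers after reserving uplinks at the given ratio."""
    total = switches * PORTS_PER_SWITCH
    if switches == 1:
        return total                      # single switch: no interconnect needed
    uplinks = total // (OVERSUBSCRIPTION + 1)
    return total - uplinks

for n in (1, 2, 8):
    print(f"{n} switch(es): {server_facing_ports(n)} server ports "
          f"out of {n * PORTS_PER_SWITCH} purchased")
```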
Capacity cheap, network bandwidth expensive
Amazon and Google don’t use fancy switches close to the servers because a) they can’t afford to, and b) they don’t attach storage to the switches. This is why DAS – direct-attach storage – is popular in scale-out architectures.
Not only is DAS latency lower, the bandwidth is higher and much cheaper. A fully expanded 40-port 6100 switch – the one that supports UCS – comes to well over $1k per 10GigE port.
For that money you can buy a low-end server with as much bandwidth from DAS, plus GigE as well.
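To make the per-port arithmetic concrete, a minimal sketch; the dollar figures and drive throughput are illustrative assumptions, not quotes:

```python
# Back-of-the-envelope comparison of fabric port cost vs. a DAS server.
# All prices and throughput figures are illustrative assumptions, not quotes.

fabric_list_price = 48_000      # assumed: fully expanded 40-port 6100 with licenses and optics
fabric_ports = 40

server_price = 1_200            # assumed: low-end 1U server with a handful of drives
das_drives = 4
mb_s_per_drive = 100            # assumed sustained throughput per SATA/SAS drive

print(f"Cost per 10GigE port: ${fabric_list_price / fabric_ports:,.0f}")
print(f"Server DAS bandwidth: {das_drives * mb_s_per_drive} MB/s for ${server_price:,}")
```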
Lower latency and higher bandwidth are important for server performance as well. DAS may be tougher to manage, but it pays continuing dividends.
The StorageMojo take
The mystery of UCS – and other converged stacks – deepens. As Mr. Skorupa’s comments imply, the UCS architecture doesn’t scale well.
Cisco marketing needs to stop drinking their own bathwater and take a deep clear look at what they really have. Positioning UCS as something it cannot be is not a long term strategy for success.
Yes, Cisco and presumably the other stacks offer evolutionary enhancements over existing blade-based confederations. And compared to IBM’s z-series mainframes they seem to be more scalable. (z-series fans are welcome to weigh in with their views.)
A software layer that makes local DAS management as easy as centralized SAN management – and it should be easier – will be a big win for the enterprise. And for the company that proves they can do it.
Courteous comments welcome, of course.
Exactly, that’s just one of the shortcomings of the design. But that doesn’t stop the Cisco kids out there from eating out of Cisco’s hand like it’s the only hand in the universe.
Sad really.
But there are really (at least) two distinct designs out there. Scale-out of course works well for the Googles, Amazons, etc. of the world, but the app really has to be geared towards it, and today most apps are not. So they cannot leverage an infrastructure that relies heavily on distributed DAS storage.
I wrote a bit on it here (Why I hate the cloud): http://www.techopsguys.com/2010/02/09/why-i-hate-the-cloud/
So there is, and will continue to be, a market for non-scale-out architectures for some time to come (I’d think at least a decade; scale-out is still in its earliest stages). The development skills required to properly handle scale-out are beyond the reach of most people at this point, and it will take many years for scale-out designs to support regular workloads and provide high availability at the same time. Developers are still grappling with how to scale to multiple cores within a single system; they probably don’t want to hear about having to scale across multiple systems in a seamless manner.
For my own needs I am pursuing HP c Class blades and 3PAR storage for virtualization infrastructure and SGI Cloudrack for scale out computing. Two very different solutions for two very different problems.
Robin:
I am always happy to engage in a productive conversation but, to be honest, I struggled with this post a bit.
With regards to Mr Skorupa’s comments, as we discussed in Boston, a whole cottage industry has sprung up around taking his comments out of context. To paraphrase, his perspective is that much of the value of a converged network is currently sitting in the access layer, while the economics of wholesale conversion to a unified fabric (host to target) is murkier and less compelling. Especially for enterprises that have happily functioning FC SANs in place. This is a perspective that we also happen to hold.
I am not sure I see how Skorupa’s comments tie back to UCS scalability. Northbound, you can run 4Gb or 8Gb FC out of a UCS, so I am not sure how that is any different than any other blade system out there.
Are you arguing that DAS is better than FCoE or iSCSI (“other converged stacks”)–I guess it would depend upon the application–but don’t see what this has to do with UCS?
Are you arguing against blade architectures (“existing blade-based confederations”)? Again, that is not particularly UCS specific, and we happen to offer UCS rack servers (with anywhere from 4 to 16 drive bays).
Would love to understand what we can be doing better, but I need a bit more detail here.
Regards,
Omar Sultan
Cisco
NUMA is very popular… The AMD Opteron and recent Intel Xeons have a NUMA memory architecture.
Isn’t a software layer to manage DAS across servers, NUMA for central storage?
Scale-out computing isn’t going to happen anywhere except at large computing utilities, i.e. Google/Amazon/Yahoo.
10GE is still expensive but one 10GE port on a server provides a lot more IO than four (aggregated or multipathed) 1GE ports. The new x86 server chips from Intel/AMD combined with the latest 10GE NICs can finally pump close to 10GE speeds so (like the transition from FastE to 1GE) over the next few years 10GE costs will (hopefully) come down.
Right now we deploy VMware ESX hosts with four 1GE ports, where two are dedicated to iSCSI and two to data traffic.
The next refresh will probably be single 10GE for both iSCSI and data traffic with priority queues (built into newer NICs) and switch side QoS to provide fairness.
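A quick sanity check on that consolidation, as a minimal sketch; the link speeds are nominal line rates and the 40/60 QoS split is an assumed example, not a recommendation:

```python
# Compare dedicated 1GE ports to a single converged 10GE link with
# minimum-bandwidth guarantees. The 40/60 split is an assumed example.

legacy_iscsi_gbps = 2 * 1        # two 1GE ports dedicated to iSCSI
legacy_data_gbps = 2 * 1         # two 1GE ports dedicated to data

link_gbps = 10
iscsi_min_share, data_min_share = 0.4, 0.6   # assumed QoS guarantees

print(f"Legacy ceilings: iSCSI {legacy_iscsi_gbps} Gb/s, data {legacy_data_gbps} Gb/s")
print(f"Converged floors: iSCSI {link_gbps * iscsi_min_share:.0f} Gb/s, "
      f"data {link_gbps * data_min_share:.0f} Gb/s (either can burst to {link_gbps} Gb/s)")
```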
Besides making “local DAS management as easy as centralized SAN management,” there’s a related business opportunity (as nate hints above) in making enterprise apps scale out gracefully into DAS-dense clouds.
Every multi-CPU system is based on a NUMA architecture. NUMA is simply a way of providing each processor with its own memory space, so that it doesn’t starve.
The only thing to consider is how UCS handles memory. But it does it in a typical way: each blade has its own set of CPUs and memory modules. True, UCS has Cisco’s unique memory multiplexer chip, so it can handle much more memory than blades from other vendors. But the memory subsystem is not shared between blades, which could possibly be solved using RapidIO or InfiniBand (as in the EMC Symmetrix V-Max). At least I believe so. 🙂
About DAS: yes, DAS offers more bandwidth and throughput, but it is not nearly as scalable as an external storage array. On the other hand, there is no way of multipathing a DAS subsystem, so high availability is a problem. There is, however, a way to use DAS subsystems as a type of external storage array: a storage virtualization layer. Products like DataCore, LeftHand, and even VM6 VMex in an MS Hyper-V environment can provide the virtualization layer that makes the DAS devices from all servers appear as a single storage device.
Robin, I just don’t get your article at all, just like you don’t get UCS.
Joe Skorupa doesn’t get it, either, so you’re in good company.
http://viewyonder.com/2010/03/25/chicken-little-in-the-unified-data-center-starring-joe-skorupa-of-gartner/
Steve
Yep, I don’t get this article either – das, numa, uplinks huh?
I read this article twice just to be sure, but are you arguing against centralized storage with the DAS comments? Also, Cisco UCS is not properly described as blade servers; it is modular computing at its foundation. I can have 12 x 8Gb FC ports to a SAN out of one 6140 at line rate, or 24 in a redundant configuration. How many SAN disks would it take to push that? From a network perspective each blade has 20Gig of connectivity. Is there a server that can truly push that bandwidth? Could we dive a bit deeper?
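To put a rough number on the “how many disks” question, a minimal sketch; the 800 MB/s effective payload per 8Gb FC port and 150 MB/s per spindle are assumed ballpark figures, and real numbers depend heavily on the workload:

```python
# Estimate how many spindles it takes to saturate 12 x 8Gb FC uplinks.
# Per-port payload and per-spindle throughput are assumed ballpark figures.

fc_ports = 12
mb_s_per_fc_port = 800      # assumed effective payload of an 8Gb FC port
mb_s_per_spindle = 150      # assumed sustained sequential rate of a 15K spindle

aggregate_mb_s = fc_ports * mb_s_per_fc_port
print(f"Aggregate FC uplink: ~{aggregate_mb_s:,} MB/s")
print(f"Spindles to saturate it (sequential I/O): ~{aggregate_mb_s // mb_s_per_spindle}")
```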
Robin,
While I quite like most of what you write, this article is a notable exception. At a meta level, the challenge is that you fail to consider UCS in the context of its intended purpose, which is an optimized hosting platform for virtual infrastructure. Under this type of architecture, discussions about DAS and $1,000 pizza-box servers become largely irrelevant.
Before tackling the theme of the article, which is that the UCS doesn’t scale, I wanted to address the undertone, which is that the UCS 6100 “switch” is expensive. Cisco doesn’t, of course, make a 6100 switch. The UCS 6100 is a Fabric Interconnect that manages the UCS blades and aggregates uplink traffic. Quibbling over its cost is pointless. What is important is how well the UCS enables the overall vast savings and other benefits of a virtualized data center in respect to alternative solutions. The unified fabric, unified management, extended memory technology, stateless computing and VN-Link capabilities put UCS in a class by itself for enabling optimal performance, provisioning and management of a virtual infrastructure. Additionally, the reduced switch, adapter and management module requirements, along with superior power usage efficiencies, typically make the UCS the least expensive option as well.
You say that, “You can buy a low-end server for that $1K/10G port”. That $1K port supports up to 4 blades, so the cost isn’t nearly as high as portrayed. While it is true that you can fill up your data center with 1RU pizza box servers, how will you manage them? How will you power and cool them? How are you going to justify the periodic upgrade costs, cabling, SFPs, maintenance contracts, rack space, network switch ports, SAN switch ports, and PDU, UPS and generator slices required? How will you back up all of the DAS disks?
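As a side note, a minimal sketch of that per-blade share, with the per-server extras for a pizza-box build as placeholder assumptions, not quotes:

```python
# Spread the cost of one uplink port across the blades it serves, and list
# a few per-server items a 1RU pizza-box build would carry instead.
# All dollar figures are illustrative assumptions.

port_cost = 1_000
blades_per_port = 4                      # as stated above

pizza_box_extras = {
    "LAN switch port": 150,              # assumed
    "FC HBA + SAN switch port": 700,     # assumed
    "cabling and optics": 100,           # assumed
}

print(f"Fabric cost per blade: ${port_cost / blades_per_port:,.0f}")
print(f"Per-pizza-box network/SAN extras: ${sum(pizza_box_extras.values()):,}")
```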
In terms of Cisco’s enterprise architecture not scaling, compared to what? 1 RU pizza box servers running GigE? Stating that a converged network uses more ports than separate LAN and SAN networks due to interconnects is a bit laughable. Any data center network of a significant size requires switch interconnects. These may use a relatively small number of uplink ports or, as intimated by Mr. Skorupa, they may require entire switches. The point is that this requirement exists regardless of whether the environment is traditional or converged. It is simply a matter of best practices in reducing single points of failure. What is also overlooked is that there will be an equally complex set of SAN switching which also may require a large number of switch interconnects. This will, of course, result in a fairly significant amount of cable bloat, not to mention potential cooling issues, rack space issues and management overhead. Generally speaking, full mesh LANs are a bad idea. The only reason to do this is because your core switch can’t handle the traffic load. This may be the primary driver behind the Nexus 7000 with its high backplane capacity.
Maintaining that DAS offers better scale out than SAN is irrelevant in a virtual infrastructure since we are always going to utilize shared storage for capabilities such as HA, FT, vMotion, etc. Even in an increasingly anachronistic physical data center, however, I would argue that more frequent disk failure, recovering from disk failure, reboot requirements for upgrades and lack of I/O performance, in addition to the mentioned management challenges, make SAN storage a superior option. Furthermore, new servers typically use either SCSI (320 MB/s) or SAS (3 Gbit/s, or increasingly 6 Gbit/s at higher cost) for internal storage. The cost of this storage is not much less than the typical Fibre Channel back-end component of enterprise storage arrays.
Attempting to build a case for distributed architecture by referencing Amazon and Google is misleading. Large Web portals like these are the exception, not the rule, and utilize purpose-built systems designed from the ground up to function in a distributed farm – more comparable to high-performance computing clusters. Most business applications do not work this way, and probably won’t for a very long time, if ever.
It would seem, based on (most though not all of) the responses to this and the previous post, that the Cisco marketing hounds have been unleashed?? Disconcerting there, Cisco!
Wait ’til they see what else I have coming!
Robin
Anoni-mouse,
In response to your comment about “Cisco marketing hounds”: I work for a solutions integrator and Cisco partner, but I never heard a thing from Cisco about this article. In fact, I was alerted to it when it was included as a link in an email sent out by an HP rep titled “IT Critics Declare HP Dominance Over Cisco”. The StorageMojo piece, despite all of the critiques by me and others, is still the best of the 5 links included in the email.
Robin, you say you have more coming. I hope you will specifically address the comments to your post.
I think before you scale your opinion you need to a) more fully understand the UCS product, b) talk to the numerous customers who are using UCS today and see if this is indeed the problem you state, and c) stop taking Gartner as gospel; most people feel that you have to pay for a favorable opinion from Gartner. Stop letting the tail wag the dog.
DISCLOSURE: I work for Cisco and sell UCS every day.
Robin, have you used Cisco UCS? Have you spoken to customers who have chosen it over competitive offerings? Do you really understand the power of shared storage, especially in VMware environments? Sure, in massively scaled-out environments like Amazon and Google, DAS makes sense. But there are only a handful of those environments. The overwhelming majority of real businesses have more modest requirements. The performance of NetApp and EMC as companies proves that people like shared storage, and the size of the blade market confirms people want to use these systems.
I actually install these systems and teach people how to use them. The fact that people are blogging about UCS and signing up as references shows the power of the system. I’ve been here over 12 years and this is not just a marketing thing.
I just don’t really get what you are trying to say here. I’d suggest understanding how SAN or NAS storage works in traditional computing, and studying in a bit more depth how UCS overcomes those challenges compared to traditional open x86 systems from other vendors. My view on scalability: if UCS does not scale as promised, it will just be another blade system like what brands H, I and D are doing. This platform will turn many medium or small data centers back into server rooms again.
HP’s Datacenters are completely Cisco free.
So what about Cisco’s datacenters ….?
Marc;
Yes, Cisco’s main data center in Texas (DC1) is HP free, with a rapid migration of other sites from HP to Cisco.
John Manville, VP of Cisco’s data centers, has also demonstrated (in detail) how their TCO went from $3,600/server/quarter on HP to $1,600/server/quarter with UCS.
Looks like a lot of people on this thread need to do some homework and verify facts before they speak.
As for scale, I don’t know the final count in Cisco’s DC1, but last number I heard was over 6,000 servers.
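Taking the figures quoted above at face value (reported claims, not audited numbers), the implied savings look something like this:

```python
# Apply the quoted TCO figures ($3,600 vs $1,600 per server per quarter)
# to the ~6,000 server estimate mentioned above. Treat the inputs as
# reported claims, not verified data.

servers = 6_000
tco_before_per_quarter = 3_600
tco_after_per_quarter = 1_600

annual_savings = servers * (tco_before_per_quarter - tco_after_per_quarter) * 4
print(f"Implied annual savings: ${annual_savings:,}")   # roughly $48M per year
```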
UCS doesn’t scale???
Sounds more like the HP F.U.D. machine is in full swing vs. the Cisco marketing machine.
And no, I’m not affiliated with Cisco in any way. I just follow data center designs, including those of John Manville (Cisco) and James Hamilton (Amazon).
This article doesn’t make sense to me. Are you being paid to bash Cisco UCS? DAS over central storage and $1k rack servers only work in very small environments, yet this is in an article titled “Cisco UCS limited scale”? The whole $1k for a 10Gb port figure is completely random and out of context. What relation does a 10Gb fabric extender port have to a $1k 1RU server (haven’t these been replaced by VMs at this point anyway)? I have 8 Intel X5680s and 384GB of RAM per $1k 10Gb port on my 6120, which allows me to run well over 100 VMs with resources equivalent to the $1k 1RU server and still have room for HA capacity. If an enterprise were to deploy $1k servers for everything, they would go broke paying for data center power, cooling and floor space.
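For what it’s worth, here is a minimal sketch of that consolidation math; the per-VM sizing and overcommit ratio are assumptions for illustration, not measurements:

```python
# Rough VM-density estimate for the configuration described above.
# The X5680 is a 6-core part; per-VM sizing and the vCPU overcommit
# ratio are assumed for illustration.

sockets = 8
cores_per_socket = 6
ram_gb = 384

vm_vcpus = 2          # assumed average VM
vm_ram_gb = 3         # assumed average VM
vcpu_overcommit = 5.0 # assumed consolidation ratio for light workloads

by_cpu = sockets * cores_per_socket * vcpu_overcommit / vm_vcpus
by_ram = ram_gb / vm_ram_gb
print(f"VMs by CPU: ~{by_cpu:.0f}, by RAM: ~{by_ram:.0f}, "
      f"practical limit: ~{min(by_cpu, by_ram):.0f}")
```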
I read the post three times and I still can’t connect to what he is talking about.
For those that dig the scale-out, no-SAN architectures but understand the complexities they invite, there’s now at least one option for the enterprise that deals with both compute and data in the same architecture. Not pimping the product, but they may have hit on something… http://www.nutanix.com
Abstracting physical resources into logical units, with a sort of global controller to orchestrate available processing, memory, power, and storage, seems to be the desired design. The Cisco Fabric Interconnect / Cisco UCS-B system does not provide this paravirtualization.
A plug & play / stackable / pay-as-you-grow offering has the lowest barrier to entry. (Cisco M-Series?)