I’ve had comments, both written and verbal, about how hard it is to buy an X4500. As I noted in Computerworld, that isn’t the surprise. I like the X4500 because it breaks the current model of available, high-performance storage built on the 18-year-old idea of a RAID controller.
Jonathan Schwartz, Sun’s CEO, published a post on his blog on September 18, 2006, where he talks about the increasing difficulty of explaining to analysts how Sun divvies up its revenues. He uses the X4500 as his example.
. . . we recently introduced a new product, code named “Thumper” . . . . It’s a 2-way, general purpose server, with 24 terabytes (yes, Tb [sic]) of storage, running Solaris and ZFS. It has very interesting performance (2 Gigabytes per second sustained i/o, for the geeks in the crowd), and pricing (well under $50,000 – less than $2 per Gig). Its service profile is what makes it most interesting, however: because it runs Solaris/ZFS, as the drives fail (and all disk drives eventually do), the system requires no maintenance. Instead, it either slows down, or shrinks (customers can choose) – but the integrity of customer data is never at risk. It’s a reliable system built from inherently unreliable parts, a fundamental design principle of the internet.
. . . Performance and efficiency are tremendous, in part because there’s no network latency – because there’s no network. (Just ask Joyent about their experiences.)
Now here’s the challenging part.
. . . it’s part server, part application platform, and part storage product. Customers pay only one price, but in the pursuit of transparency, how should we categorize the revenue? – as server, storage or software product? It obviously contains all three. For now, we’re calling it storage – which underrepresents our server and software business.
Mr. Schwartz isn’t the only one who is confused
Actually it appears the X4500 is falling between the cracks in Sun’s organization because it is an innovative, cost-effective product. The revenue goes to the storage group, the product manager is from the server group, and the expertise about what makes it really cool is in the software group. In this podcast interview the product manager offers only two uses for the X4500: video surveillance and supercomputing.
A modest suggestion
Sun’s problem is bigger than confused X4500 marketing. The economic advantages of large-scale, cluster-based internet data centers are beginning to sweep over a very conservative IT marketplace. These infrastructures have a very different calculus regarding management, software, network and storage tradeoffs than enterprise IT. It will take a team of smart marketing and engineering types to build relationships with the top customers and understand where Sun – or any other vendor – could add value and win business.
The move to RAD (reliable, adaptive, distributed) computing argues for a dedicated product group to be set up to investigate, document, propose, develop and market products and services. This is bigger than the X4500. This is the future of computing and the next big growth engine for software, server, network and storage companies.
Robin,
are you not worried about single points of failure and the inability to replace disks online?
Richard, the drives are hot-swap SATA drives, according to the x4500 spec. Is there a physical access issue I don’t know about? ZFS also supports hot spares, so out of the 48 drives you could set aside several hot spares. Or, if a drive fails, it gets (through software) pulled out of the pool and its data redistributed to remaining drives. Leave the dead drive unless you need the capacity.
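To put rough numbers on the spares tradeoff, here’s a quick sketch. The 500 GB drive size follows from the quoted 24 TB across 48 drives, but the group layouts are illustrative assumptions on my part, not a Sun-recommended configuration:

```python
# Illustrative capacity arithmetic for a 48-drive X4500 pool (assumed 500 GB drives).
DRIVES, DRIVE_GB = 48, 500

def usable_gb(spares: int, group_size: int, parity_per_group: int) -> int:
    """Usable space if the non-spare drives are split into RAID-Z-style groups,
    each giving up `parity_per_group` drives' worth of space to parity."""
    groups = (DRIVES - spares) // group_size
    return groups * (group_size - parity_per_group) * DRIVE_GB

for spares, group, parity in [(0, 8, 1), (2, 23, 2), (4, 11, 1)]:
    print(f"{spares} spares, {group}-drive groups -> "
          f"{usable_gb(spares, group, parity):,} GB usable")
# 0 spares -> 21,000 GB; 2 spares -> 21,000 GB; 4 spares -> 20,000 GB:
# setting aside a few hot spares costs surprisingly little capacity.
```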
As to how x4500’s should be used:
-inside a Google File System-type environment, where GFS assumes failures and distributes data across multiple nodes for redundancy – no problem. In fact, it occurs to me that Google may be the target market for the x4500. Instead of one server per terabyte, which is roughly where they are now, Google could have one fast server for every 24-48 TB, depending on disk size (see the back-of-the-envelope sketch after this list).
-as a single server with local storage using ZFS – not much problem. The server motherboard is a single point of failure, but once burned in it will run for years. Replace the motherboard and your data is still all there. With ZFS, data integrity is better than on equivalent non-ZFS servers. So that case works.
-in a cluster. ZFS isn’t a cluster file system, so I don’t think the x4500 would work well there.
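Here is the back-of-the-envelope sketch of what that consolidation means in server counts. The 10 PB total and the per-server capacities are illustrative assumptions of mine, not Google’s or Sun’s figures:

```python
# Hypothetical arithmetic: servers needed for a given amount of raw storage
# at roughly one terabyte per server vs. one X4500 holding 24-48 TB.
def servers_needed(total_tb: int, tb_per_server: int) -> int:
    return -(-total_tb // tb_per_server)  # ceiling division

total_tb = 10_000  # assume a 10 PB deployment for illustration
for tb_each in (1, 24, 48):
    print(f"{tb_each:>2} TB/server -> {servers_needed(total_tb, tb_each):>6} servers")
# 1 TB/server -> 10,000 servers; 24 TB -> 417; 48 TB -> 209
```

The point is just the ratio: packing 24-48 TB behind one fast server cuts the server count by one to two orders of magnitude for the same raw capacity.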
What do you think?
Robin
Robin,
Yes, the physical access to disks is a worry… the chassis needs to be withdrawn fully from the rack. This should work… but some potential customers may be worried about vibration, cabling, etc. ZFS will delay the need to replace disks… but this is no different from a typical RAID scenario… other ZFS-related advantages aside.
The second issue is that of a single ‘controller’… this is not a problem for Google FS… or anyone willing to replicate, but I am not sure how Google’s ‘commodity’-based statistics work with a granularity of 24 TB per chassis… this could be interesting.
A passive backplane is fine, provided it can be replaced easily (I doubt it)… and if ZFS can cope with re-ordered disk slots… which I am sure it can.
All that aside… this is a SATA back end… and single points of failure are difficult to eliminate without over-design, and perhaps the further reduction in reliability that comes with the existing schemes for dual-porting SATA disks.
It is a bold packaging move …. and I wish them the very best.
Richard
I made a post about this at:
http://www.drunkendata.com/?p=607
Good stuff.
I got busy for a while and you got really hot. Pushed all my buttons.
Especially SOA.
You would think the SNIA would be square in the middle of the testing area.
I guess they think they are with the SNIA SMI-S.
First off, ZFS is clusterable, as are the X4500s.
There are several articles on OpenSolaris showing that they finally have iSCSI targets working properly alongside the iSCSI initiator…
ZFS itself goes beyond traditional RAID, traditional file systems and GFS in many ways by taking the file system down to the block level and, through copy-on-write, effectively treating all data as snapshots. The Sun white papers are very detailed about this.
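As a toy illustration of that copy-on-write idea (a sketch of the concept only, not ZFS’s actual on-disk layout):

```python
# Toy copy-on-write store: writes never overwrite old blocks, so a "snapshot"
# is just a frozen copy of the block map and is nearly free to take.
class ToyCowStore:
    def __init__(self):
        self.blocks = {}      # block id -> data
        self.live = {}        # logical block number -> block id
        self.snapshots = {}   # snapshot name -> frozen block map
        self.next_id = 0

    def write(self, lbn, data):
        self.blocks[self.next_id] = data   # allocate a fresh block...
        self.live[lbn] = self.next_id      # ...and repoint the live map at it
        self.next_id += 1

    def snapshot(self, name):
        self.snapshots[name] = dict(self.live)  # copy the map, not the data

    def read(self, lbn, snapshot=None):
        table = self.snapshots[snapshot] if snapshot else self.live
        return self.blocks[table[lbn]]

s = ToyCowStore()
s.write(0, "v1")
s.snapshot("before")
s.write(0, "v2")
print(s.read(0), s.read(0, "before"))  # v2 v1 - the snapshot still sees the old block
```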
From what I’ve read and then tested, one X4500 has the same throughput as a NetApp FAS6000. Considering that you have fewer disks per backplane and a much higher achievable storage density with the X4500, you get multi-petabyte scalability in the footprint NetApp needs for nearly a petabyte.
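Rough rack-density arithmetic behind that footprint point, assuming a 4U chassis, a 42U rack with 2U left for a switch, and illustrative drive sizes:

```python
# Hypothetical rack-density arithmetic: raw terabytes per rack of X4500s.
RACK_U, CHASSIS_U, DRIVES_PER_CHASSIS = 42, 4, 48
chassis_per_rack = (RACK_U - 2) // CHASSIS_U   # leave 2U for a switch -> 10 chassis

for drive_tb in (0.5, 1.0):
    tb_per_rack = chassis_per_rack * DRIVES_PER_CHASSIS * drive_tb
    print(f"{drive_tb} TB drives -> {tb_per_rack:.0f} TB raw per rack")
# 0.5 TB drives -> 240 TB/rack; 1.0 TB drives -> 480 TB/rack,
# so a raw petabyte fits in a handful of racks.
```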
The X4500s are redundant in every way: dual power supplies, hot-swap drives, etc.
Swapping drives isn’t as hard as you make it out, as long as your rack has sliding rails: you slide the server forward on its rails, pop the tool-less lid, and swap the drive(s)… it’s that simple.
Now that you have all that storage and leftover teraflops of computing power taking up half the footprint in your data center, what are you going to do with the leftover space?
That’s the biggest / only problem I see with the x4500
Which really isn’t much of a problem at all…
Hi Robin
If you are still responding to requests, could you please assist with my query?
I have been advised that a Sun rack cabinet on casters, fitted with 2 x X4500s, cannot be shipped by truck from one office to another (a distance of about 150 miles). Apparently the cabinet may collapse??
Is that a fact and, if so, does it mean that the servers have to be removed and shipped as separate units?
Thanks and regards
Glenn, I’d check with the local Sun service provider for the recommended procedure. I’m not surprised that shipping a loaded rack on casters is a problem – the difference between static loads and the stresses of shipping is not something the mechanical engineers look at.
If it were me, I’d remove and box each x4500 separately for shipping. They are heavy. I’d pull all the drives and put them in a drive shipping box to maximize their chances of surviving the trip as well.
Robin