StorageMojo




Robin Harris    


Will FCoE save storage networks?

March 23rd, 2008 by Robin Harris in Architecture, SAN, FC

Back in ‘96, when I was flogging FC networks for Sun under NDA, the most common objection was “I don’t want another layer to manage.” Despite that, FC became successful in big enterprise IT shops. But the objection is still valid and, along with price, a major factor in the low uptake of FC in smaller shops.

Is FCoE (Fibre Channel over Ethernet) the answer?

FC vendors are - reluctantly - hoping it is
The future of pure FC looks pretty bleak in the long term. 10 GigE is coming down the cost curve just as earlier generations of Ethernet did. The volume Force is with them.

As 10 GigE gets cheaper its total available market gets larger. It may not be optimal, but for many shops “good enough” is good enough.

FC partisans aren’t quitting. 8 Gbit has just started shipping, 16 Gbit is on the drawing boards and there are noises about future generations beyond that.

FCoE follows in the footsteps of VTLs
When 1 Gbit FC started rolling out in ‘97, it was 10x-20x the speed of the then-hot 100 Mbit Ethernet in either its full or half duplex flavors. And today - 8 Gbit FC is slower than 10 GigE. It is cheaper, but for how long?
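The 10x-20x range is simple arithmetic. Here is a minimal Python sketch, assuming nominal rates (1,000 Mbit/s for 1 Gbit FC, 100 Mbit/s for Fast Ethernet, and so on) and ignoring line-encoding overhead. One plausible reading is that 10x compares like with like, while 20x counts full-duplex FC against half-duplex Ethernet.

```python
# Nominal link rates in Mbit/s (assumption: line-encoding overhead ignored).
FAST_ETHERNET = 100
FC_1G = 1_000
FC_8G = 8_000
TEN_GIGE = 10_000


def speedup(fast_mbit, slow_mbit, fast_full_duplex=False, slow_full_duplex=False):
    """Ratio of aggregate throughput; a full-duplex link counts both directions."""
    fast = fast_mbit * (2 if fast_full_duplex else 1)
    slow = slow_mbit * (2 if slow_full_duplex else 1)
    return fast / slow


print(speedup(FC_1G, FAST_ETHERNET))                         # 10.0: like for like
print(speedup(FC_1G, FAST_ETHERNET, fast_full_duplex=True))  # 20.0: full vs half duplex
print(speedup(FC_8G, TEN_GIGE))                              # 0.8: 8 Gbit FC trails 10 GigE
```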

An Emulex VP explained at a recent conference that enterprise shops have well-developed processes for managing FC SANs. FCoE enables shops to continue using those processes minus the fibre. The problem: FCoE won’t be ready for volume deployment until 2010 - if you believe the current schedules.

Any technical problems could easily drop FCoE into 2011, leaving Emulex, Qlogic and Brocade with a 3+ year chasm to cross. The Emulex VP tried to sound enthusiastic about FCoE but wasn’t succeeding. Maybe his teeth hurt.

The StorageMojo take
Enterprise data center inertia is a powerful market driver. Witness the success of VTLs. It’s understandable: they have work to do. Can’t be overhauling the engines in mid-flight.

But Wall Street isn’t as understanding as StorageMojo. FC is topping out, so where is the growth going to come from for FC companies? Especially when new iSCSI, Infiniband and pNFS products are coming to market in the near term.

The current economic malaise will force companies to get tough on data center requirements. The “good enough” standard will be the only standard for apps that aren’t absolutely core to business success.

Comments welcome, of course.

P4P: smart, fast and easy P2P

March 16th, 2008 by Robin Harris in Architecture, Future Tech, Off-Topic, SAN, FC

The P4P working group demo’d their work Friday at the Distributed Computing Industry Association show in New York. Not only did they show 2-3x faster downloads, but they also cut the average number of inter-metro hops - the expensive kind - from over 5 to less than 1. Cool.

The P4PWG idea is that if P2P is both cheaper for ISPs and faster for users we will all have a happier Internet. Folks from the Yale CompSci department - Haiyong Xie, Y. Richard Yang and Avi Silberschatz - along with Verizon and Pando Networks, cooperated on the demo.

The P4PWG includes AT&T, Verizon, Pando, BitTorrent, Cisco and LimeWire among others. The cable companies are there as observers. The P4P work is an open standard with the hope that all ISPs and P2P networks will endorse it.

How does it work?
The tech papers aren’t available yet on the web, but this is what I’ve pieced together from an afternoon’s websurfing. Update: Wide-awake reader Paul found this P4P Overview on Ars Technica. Thanks Paul! End update.

P2P is network-oblivious. When you start downloading, the streams might come from anywhere, regardless of network cost. The problem is that big routers are costly, smaller routers are much cheaper, and undersea fiber is expensive as well.

What P4P does is inject some knowledge into the P2P network so peering decisions are made more intelligently. It looks like a network version of locality of reference.
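A minimal sketch of that idea in Python: rank candidate peers so that same-metro peers come first, then those with the fewest expensive inter-metro hops. The `Peer` fields are hypothetical; a stock P2P client doesn't have this information, which is exactly what P4P is meant to supply.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peer:
    address: str
    metro: str              # metro area of the peer (hypothetical, supplied by the network)
    inter_metro_hops: int   # expensive hops between us and the peer (hypothetical)


def rank_peers(peers, my_metro, limit=3):
    """Prefer local peers, then the fewest inter-metro hops; sorting is stable."""
    ranked = sorted(peers, key=lambda p: (p.metro != my_metro, p.inter_metro_hops))
    return ranked[:limit]


candidates = [
    Peer("10.0.0.1", "LAX", 6),
    Peer("10.0.0.2", "NYC", 0),
    Peer("10.0.0.3", "CHI", 2),
    Peer("10.0.0.4", "NYC", 0),
]
print([p.address for p in rank_peers(candidates, "NYC")])
# ['10.0.0.2', '10.0.0.4', '10.0.0.3']
```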

Implementation
There are at least 2 ways to deliver network awareness to peers. Here’s one of them.

A peer-tracker (pTracker) and an Internet tracker (iTracker) are added to the P2P network. A peer requests peering information from the pTracker, which has knowledge of local (metro area) and recent non-local resources. The pTracker sends back an edited server list and the peer goes its merry way.

If the resources aren’t local and the pTracker doesn’t know the network topology, it queries the iTracker, which returns high-level peering suggestions. If locality of reference works as well in cyberspace as it does with other data, the pTracker won’t be querying the iTracker very often.
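The pTracker/iTracker division of labor can be sketched in a few lines of Python. The class and method names are hypothetical stand-ins, not the P4PWG interfaces: the pTracker answers from its local index, falls back to the iTracker for non-local content, and caches those answers so repeat requests don't go back to the iTracker.

```python
class ITracker:
    """Hypothetical stand-in for an ISP-run iTracker returning coarse peering suggestions."""

    def __init__(self, suggestions):
        self.suggestions = suggestions  # content id -> suggested peers
        self.queries = 0                # how often the pTracker had to ask

    def suggest(self, content_id):
        self.queries += 1
        return self.suggestions.get(content_id, [])


class PTracker:
    """Hypothetical stand-in for the P2P network's pTracker."""

    def __init__(self, local_index, itracker):
        self.local_index = local_index  # content id -> peers in this metro area
        self.recent = {}                # cached answers for non-local content
        self.itracker = itracker

    def peers_for(self, content_id):
        if content_id in self.local_index:
            return self.local_index[content_id]
        if content_id not in self.recent:
            self.recent[content_id] = self.itracker.suggest(content_id)
        return self.recent[content_id]


itracker = ITracker({"movie": ["peer-far-1"]})
ptracker = PTracker({"song": ["peer-local-1"]}, itracker)
ptracker.peers_for("song")    # answered locally
ptracker.peers_for("movie")   # first miss goes to the iTracker
ptracker.peers_for("movie")   # served from the cache
print(itracker.queries)       # 1
```

The better locality of reference works, the fewer iTracker queries show up in that counter.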

It is expected that the pTracker will be maintained by the P2P network, while the iTracker could be maintained by the ISP, network or a trusted 3rd party. This should help preserve P2P user privacy, although the *Tracker names certainly won’t reduce user paranoia.

Guys, how about something less Big Brotherish? PeerServer and RouteServer? Just a thought.

The StorageMojo take
As file sizes continue their secular trend upward, the need for P2P will continue to grow. By aligning ISP, telco and user needs for faster and more efficient P2P, the P4PWG has pulled off a win/win/win situation.

A less obvious benefit of this work is on VoIP networks, which are also P2P. It doesn’t take much to degrade VoIP quality. To the extent that it enables improvement in P2P network node selection, the P4P project will benefit the rapidly growing population of VoIP users as well.

Kudos to the P4PWG and especially the Yale team.

Comments welcome, of course. Images courtesy of the P4PWG.

Apple’s Xserve RAID bites the dust

February 19th, 2008 by Robin Harris in Disk, SAN, FC

Last June 19th StorageMojo reported a rumor that Apple’s Xserve RAID would bite the dust. Now, exactly 8 months later, Apple has pulled the plug.

I saw a wall of Xserves and Xserve RAIDs at NAB last year and they were, without a doubt, the prettiest server/storage combo in the world. Brushed stainless steel, blue LEDs and the symmetrical installation looked like Hollywood’s idea of a computer. (Although the server room in Live Free or Die Hard is even crazier.)

Replaced by the Promise Vtrak
Not as pretty but more functional. The Xserve RAID didn’t have dual-redundant active/active controllers with failover, so users had to rely on software mirroring. An OK solution, but not a great one.

Xserve RAID’s big advantage, other than great looks, was price. A quarter the price of other FC RAID kit.

But with the Promise Vtrak arrays, Apple can now quote $1.12 per GB in 26 TB chunks. Pretty good! On a 4 Gbit FC backbone, they can deliver 6 streams of 8-bit uncompressed HD video. Pretty fast!
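For a rough sense of what those numbers mean, here's a Python sketch. It assumes decimal units (1 TB = 1,000 GB) and, for the video, 1920x1080 frames in 8-bit 4:2:2 (2 bytes per pixel on average) at 29.97 fps. The post doesn't say which HD format Apple measured, so treat the stream figure as illustrative.

```python
def chunk_cost_dollars(capacity_tb, price_cents_per_gb):
    """Price of one chunk, assuming 1 TB = 1,000 GB; integer cents avoid rounding noise."""
    return capacity_tb * 1_000 * price_cents_per_gb / 100


def stream_mb_per_s(width, height, bytes_per_pixel, fps):
    """Bandwidth of one uncompressed video stream in MB/s (1 MB = 10**6 bytes)."""
    return width * height * bytes_per_pixel * fps / 1e6


NTSC_FPS = 30000 / 1001  # 29.97 frames per second (assumed frame rate)

print(chunk_cost_dollars(26, 112))                     # 29120.0 dollars per 26 TB chunk
per_stream = stream_mb_per_s(1920, 1080, 2, NTSC_FPS)  # about 124.3 MB/s
print(round(per_stream, 1), round(6 * per_stream))     # one stream, six streams
```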

The Promise kit is fully redundant with hot-swap components. Not the sort of thing that Apple should spend money engineering. And it looks like it is packaged in a nice Xyratex enclosure, the standard of the industry.

Update: One commenter assures us that Promise doesn’t use Xyratex enclosures. I guess there are just so many ways to stick 16 drives into a 3U 19″ rack.

There also seems to be some angst over the apparent outsourcing to Promise as opposed to the Apple-label Xserve RAID. Make no mistake, Apple outsourced the Xserve RAID as well, to someone who did Apple’s industrial design. With Promise they are just making that apparent, probably because they get a better deal. But you still buy it from the Apple store, not Promise.

As an aside, Steve Jobs has many fine qualities, but his appreciation for how storage can extend Apple’s business is on a par with Scott McNealy’s - i.e. clueless. So it goes. End update.

The StorageMojo take
This move strengthens Apple’s thrust into professional video production and film editing. Their software-only competitors should be sweating, since Apple keeps throwing more functionality into Final Cut Studio, like Color, for very competitive prices.

With the release of Final Cut Server, expected shortly, Apple will have a storage-intensive software infrastructure that should meet the needs of many TV, cable and production studios. Low-cost storage only makes the business case more persuasive.

Apple will be moving a lot more terabytes this year.

Comments welcome, of course.
Update 2: I’ll be adding the Object Matrix price list to Price Lists shortly. They’ve built a cluster storage solution for Apple’s Final Cut Server archives. If you are waiting impatiently for Final Cut Server to ship you’ll want to check them out. End update 2.

Disk-based archive vs disk-based storage

January 27th, 2008 by Robin Harris in Information Management, SAN, FC, Security & Public Policy

What’s the difference?
I came across a thoughtful essay on the “Top Ten Differences between Disk-based Archive & Disk-based Storage” in the MatrixStore blog. MatrixStore is a Mac cluster-based disk archive for Apple’s to-be-announced-RSN Final Cut Server.

MatrixStore is focused on one market segment - video content archiving - but their comments seem to be generally applicable. With 2008’s likely focus on the disk-based backup and archive market, it is worth starting the conversation now.

Key points
SANs aren’t designed for archiving.

Reason 1.

If you are archiving your data, it’s probably because you don’t want to lose it.

Raison d’etre for a disk based archive? To keep data - safe. For a SAN? Speed of delivery, QoS… You wouldn’t put 256 bit delivery checksums into a SAN; SANs cut corners on flushing to disk; SANs don’t build in search or audit-trails, or security; SANs can go down completely because of single-points-of-failure in the hardware; a bad software update in a SAN and…. Don’t do it. With nursing care and attention they can run fine for years, but they are inherently tightly coupled, software version sensitive, high maintenance, error prone and hardware technology dependent… even if they are brilliant at fast storage and delivery of information…

A disk-based archive must be: loosely coupled and free from dependencies between hardware components on independent nodes (surely the greatest example of a loosely coupled solution is the world-wide-web; you have no fear on the www that a server going down, say, hosting an IBM site, is going to bring down another in Cupertino!); free from requiring constant latest updates to software/firmware; able to guarantee safe delivery and storage of data; and basically, able to safely, securely store and protect data for year upon year, without complications, manual intervention, spanners…
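The essay's “256 bit delivery checksums” presumably means something like a SHA-256 digest recorded at ingest and checked on every read. A minimal Python sketch, assuming SHA-256 and using a plain dict as a stand-in for the archive:

```python
import hashlib


def ingest(archive, key, data):
    """Store the object together with the SHA-256 digest computed at ingest time."""
    archive[key] = (data, hashlib.sha256(data).hexdigest())


def read_verified(archive, key):
    """Return the object only if it still matches the digest recorded at ingest."""
    data, digest = archive[key]
    if hashlib.sha256(data).hexdigest() != digest:
        raise IOError(f"fixity check failed for {key!r}")
    return data


archive = {}
ingest(archive, "clip-0001", b"raw video frames")
print(read_verified(archive, "clip-0001"))  # b'raw video frames'
```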

Archives must be engineered for easy adoption of new technology
In storage everything is cheaper next quarter. So why buy now?

Reason 2.

There’ll be bigger, better, cheaper, more efficient disks in 2009, and in 2010, and in 2011…

Will there be bigger, better, cheaper, more energy efficient storage devices coming out this year, and every year that follows? Yes, of course there will be.

In your SAN do you have to mirror between like-sized devices? What happens when one of those devices goes down in 2 years time? Do you end up throwing away the good device? In your SAN can you bolt on new technologies as they arrive; holographic disks that store 10TB a shot, or new fibre connectors?

In ZFS can you decommission a part of a storage pool, replacing it with new storage devices without significant bleeding edge techniques and without disrupting the rest? Ideally, it would be great to bolt new technology into an archive, as and when it arrives, rolling out old technologies if they reach the point of diminishing returns; to be able to do that whilst always seeing a single archive storage cluster; and without a maintenance or data migration headache, or should I say, without risk. A disk based archive can achieve that, if selected carefully.

Vendor handcuffs
Long-term storage and proprietary products don’t mix. Along with upgradeability-in-place, this should be high on customer checklists.

Reason 3.

Vendor tie-in is more like Vendor hand-cuffs.

OK - this isn’t strictly about SAN vs Disk based archiving; but fact of the matter is that most SAN/any other disk-based storage solutions tie you in to a particular vendor, which is great when they are supplying the ‘best-in-class’ solution of the moment at time of purchase, but not quite so clever when you come to upgrade that solution a year down the line and they aren’t offering the best in class anymore.

The archive should be vendor independent otherwise, for many reasons, you’re just creating tomorrow’s headache with a solution from yesteryear.

Stability and security

Reason 5.

Viruses. Hackers.

Choice one:

“out of the box” configured with encryption, firewalled, data locked down, all access to data routed through PPK, all maintenance functionality requiring 256 bit passwords.

Choice two:

bolt on each of the above to your favourite SAN/filesystem. Wait five years as your conglomerate of software solutions evolve (along with the workforce) and cross fingers. A disk-based archive must be secure out-of-the-box.

There’s more, of course, and if you are interested please read the whole essay and respond here with your thoughts so everyone can see and respond.

The StorageMojo take
EMC’s upcoming backup and archive cluster, code-named Hulk/Maui (HW/SW), will drive a lot of customers to think about this topic. Of course, EMC’s famously disciplined sales force will scrupulously limit Hulk/Maui sales to B&A applications for the first several months, er, weeks, er, days, er, hours after its release. Once the customer utters the magic word “Isilon” Hulk/Maui will suddenly be ready for enterprise use.

[I hope someone has mentioned this to the Maui engineers: forget about summer vacation.]

Disk-based backup and archive is a fast growing application with very different requirements from SANs, arrays and fast NAS boxes. Data migrations will be increasingly infeasible. Management has to be stoner-on-the-night-shift-proof. And the data can’t be held hostage by proprietary standards.

Companies do discontinue products or go bankrupt, after all.

Comments welcome, of course. Anything else?

Brocade’s ex-CEO nailed for fraud

August 7th, 2007 by Robin Harris in SAN, FC, Security & Public Policy

Found guilty on all 10 counts
Greg Reyes, former CEO of Brocade, was convicted today in a San Francisco courtroom on all 10 counts of criminal securities fraud for backdating stock options and lying about it. He faces 20 years in prison.

Mr. Reyes made some $380 million off Brocade during the dot com boom. What investors didn’t know is that if he had followed the proper accounting rules, Brocade’s $67 million FY2000 profit would have been a $950 million loss, at least on paper. Options are a non-cash expense, but so are a lot of other things that show up on income statements.
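The size of the gap is easy to back out. Treating the whole swing from reported profit to restated loss as unrecorded option expense (a simplification, since a real restatement can include other adjustments), a one-line Python sketch in $ millions:

```python
def implied_expense(reported_profit, restated_result):
    """Expense needed to turn the reported result into the restated one ($ millions)."""
    return reported_profit - restated_result


print(implied_expense(67, -950))  # 1017, roughly $1 billion of expense
```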

None of the backdated options went to Mr. Reyes
His defense claimed that he was a sales guy and didn’t understand accounting well enough to know that backdating options and falsifying board minutes was a no-no. But the “everyone was doing what I didn’t understand” defense failed to persuade the jury.

As I noted last July

I’m no lawyer, but given that Mr. Reyes sold $380 million of Brocade stock while investors believed the company was profitable, maybe the hope of “enrichment” clouded the man’s judgment.

Evidently a similar thought occurred to the jury.

The StorageMojo take
This has no impact on the Brocade of today, other than that their culture is a direct descendant of the company that Mr. Reyes built. Like EMC, Brocade was a sales-focused culture with a “whatever it takes” mentality. They achieved fast growth for a time but are floundering because they handed their future over to storage OEMs who couldn’t care less if Brocade lives or dies. Their strategy is in worse disarray than EMC’s while their core fibre channel business is starting to decline.

I hope they can turn it around, but I’m more than dubious. Most of the world doesn’t need fibre channel and there are better places to buy Ethernet and Infiniband.

Comments welcome, as always. If Brocade is stronger than I know, please elucidate.

Long-haul Infiniband

July 25th, 2007 by Robin Harris in Architecture, Clusters, Future Tech, SAN, FC

I’ve liked Infiniband ever since I learned about it at YottaYotta in 2000. The switches are fast and cheap, the latency very low and the bandwidth - 6 GB/sec full-duplex at 12x - stunning. (Cisco has an excellent technical overview introduction here.)
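Where does 6 GB/sec come from? A Python sketch, assuming SDR InfiniBand's 2.5 Gbit/s per-lane signalling rate and 8b/10b encoding (8 data bits per 10 bits on the wire). The same arithmetic yields the 8 Gbit/s of payload that Obsidian quotes for 4x SDR.

```python
SDR_LANE_GBAUD = 2.5  # per-lane signalling rate for SDR InfiniBand


def payload_gbit(lanes, lane_gbaud=SDR_LANE_GBAUD):
    """Data bandwidth per direction in Gbit/s after 8b/10b encoding."""
    return lanes * lane_gbaud * 8 / 10


def full_duplex_gbyte(lanes):
    """Both directions combined, in GB/s."""
    return 2 * payload_gbit(lanes) / 8


print(payload_gbit(12), full_duplex_gbyte(12))  # 24.0 6.0
print(payload_gbit(4))                          # 8.0
```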

One thing it didn’t do, though, was handle distance. Even fiber-based IB was limited to a few hundred meters. A great computer room interconnect, but not so good for the disaster-tolerant configurations that YottaYotta’s cluster-based RAID controller was hoping to address.

YY made do with gigE links, and managed some impressive demonstrations of terabyte long-distance data transfers. Just the thing for a long weekend at the lake.

Of course, there is a downside
Infiniband was designed to be more a fixed resource like Fibre Channel than an easy-come, easy-go WAN like Ethernet. Five years ago the management was less than optimal. Some 3rd-party tools were available from Voltaire - hey, guess who’s going public! - but most folks ended up writing their own management. But if you want an “always on” network this isn’t a big problem.

Putting all one’s eggs in one basket was something that always concerned me. A single data center, no matter how well-built, is asking for trouble. I mocked this up to dramatize the issue:

[Image: eggs]

Ideally, Infiniband would at least offer metro area networking for redundancy. I don’t think you can buy it yet, but long-haul I-band may be coming.

Enter Obsidian Research
Meanwhile, up in northern Alberta, one of YY’s former whizzes, David Southwell, formed Obsidian Research, dedicated to taking I-band long-haul. The company says

Longbow XR allows arbitrarily distant InfiniBand fabrics to communicate at full bandwidth through 10Gbits/s Wide Area Networks. The WAN connection is managed out of band, and except for flight time induced latency is transparent to the InfiniBand hardware, stacks, operating systems and applications.

XR achieves flow control by shaping WAN traffic and managing buffer credits to ensure extremely high efficiency bulk data transfers — including RDMAs — making the system a highly effective transport mechanism for very large data sets between geographically separated InfiniBand equipment.
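Why buffer credits matter over distance: to keep a link full, the sender needs enough data in flight (and the receiver enough buffer) to cover the round trip, the bandwidth-delay product. This Python sketch assumes the common rule of thumb of about 5 microseconds per km of fibre; it illustrates the general principle, not how Longbow XR actually sizes its buffers. It also shows the flight-time latency that no amount of engineering removes.

```python
FIBER_US_PER_KM = 5.0  # rule-of-thumb one-way propagation delay in optical fibre


def round_trip_ms(distance_km):
    """Propagation-only round-trip time; ignores switching and queueing delay."""
    return 2 * distance_km * FIBER_US_PER_KM / 1000


def bytes_in_flight(rate_gbit, distance_km):
    """Bandwidth-delay product: bytes that must be unacknowledged to keep the link busy."""
    rtt_s = round_trip_ms(distance_km) / 1000
    return rate_gbit * 1e9 / 8 * rtt_s


# distance in km, round trip in ms, MB in flight at 10 Gbit/s
for km in (100, 1000, 4000):
    print(km, round_trip_ms(km), round(bytes_in_flight(10, km) / 1e6, 1))
```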

In switch mode, Longbow XR looks like a 2-port switch to the InfiniBand subnet manager. A point-to-point WAN link presents as a pair of serially connected 2-port InfiniBand switches spanning the conventional InfiniBand fabrics at each site. A single subnet spans the Wide Area Network connection, unifying what were separate subnets at each site.

Longbow XR also provides an InfiniBand router mode — improving global system manageability, scalability and robustness. In this mode, each site remains a separate subnet, with independent subnet managers, easing possible security and performance concerns associated with remote subnet management. 4x SDR InfiniBand provides just 8Gbits/s of data payload bandwidth; two totally independent Gigabit Ethernet links are also encapsulated across the WAN link to make full use of the extra bandwidth.

Longbow XR communicates over IPv6 Packet Over SONET (POS), ATM, and 10Gb Ethernet, as well as dark fiber applications.

Southwell is one of the smartest hardware engineers I’ve ever worked with. If he says he can do this, I’m willing to believe he can, given enough time. And if he’ll stop “improving” it and just ship.

The StorageMojo take
I-band has knocked about the industry for some time, a solution looking for that special problem that would provide volume and profits. With the growth of clusters - compute and storage - I believe it has found its niche. Long-haul I-band doesn’t solve distance latency problems, but it sure can move boatloads of data. As Google and others reach for 100x scaling, long-haul I-band could be a helpful tool.

Comments welcome, of course. What is the state of Infiniband today?

3Leaf’s virtual I/O system

April 29th, 2007 by Robin Harris in Enterprise, SAN, FC

First there was data center consolidation, then server, then storage and now . . .
I/O consolidation.

Stealth no more
3Leaf is exiting stealth mode, officially, tomorrow, but they put up their new website over the weekend, so they’re fair game. The company was incorporated in June 2004 with a $0.5 million seed round from Storm Ventures, raised a $12 million A round in April 2005, started its first betas last May, closed a $20 million B round in September, and is just about ready to start selling product.

Two part harmony
They have a two part product strategy:

  • First stage product provides I/O consolidation
  • Second stage product provides compute and memory consolidation

They do for I/O what VMware does for servers
Simple cost displacement model: 3Leaf claims they’ll give you the same IO for half the price, so the ROI is a matter of weeks. How does that work?

Most servers aren’t working very hard, which means their I/O isn’t either. The idea is to reduce the number of direct server connections to costly enterprise class FC or ethernet switches by consolidating the IO of many servers to a single IO gateway. 3Leaf’s box offers 7 PCI express (and one HT) slots to stick standard HBAs into, while the box connects to the servers through either ethernet or Infiniband. Think backplane extender.
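A back-of-the-envelope Python sketch of why consolidation pays: size a shared gateway by the servers' actual average load rather than giving each server its own switch port. All the inputs are hypothetical: 40 servers averaging 20 MB/s each, roughly 400 MB/s per 4 Gbit FC uplink, and each uplink kept at or below 50% load for headroom.

```python
import math


def uplinks_needed(server_avg_mb_s, uplink_mb_s, max_load=0.5):
    """Uplinks a shared I/O gateway needs if no uplink runs above max_load of capacity."""
    total = sum(server_avg_mb_s)
    return math.ceil(total / (uplink_mb_s * max_load))


servers = [20] * 40                      # hypothetical average load per server, MB/s
direct_ports = len(servers)              # one switch port per server without consolidation
shared_ports = uplinks_needed(servers, 400)
print(direct_ports, shared_ports)        # 40 4
```

Peaks and redundancy would push the shared count up in practice, so the real ratio would likely be less dramatic than this.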

Also think diskless, stateless servers. According to their website, the virtual I/O server:

. . . replaces each compute node’s storage and network I/O with a single, high speed, redundant, and fault tolerant fabric, converting each compute node into a diskless and stateless commodity server with centrally managed bandwidth. Thus, each compute node costs far less to put into service, allows for better control of network bandwidth, and is more flexible, robust, and reliable.

The 3Leaf V8000 associates the I/O state (connectivity, security, quality) with the OS, not the physical server hardware. 3Leaf supports Windows and Linux, with a kernel driver for Linux and a Windows equivalent, and also works with VMware virtual servers. The standard configuration is a dual-redundant two-box system.

The next step: Torrenza
I hadn’t heard of Torrenza before. It’s an AMD initiative that puts an empty chip socket on their mobos so third parties, like 3Leaf, can add stuff. From the AMD website:

The Torrenza Innovation Socket enables OEMs who develop their own processors to take full advantage of the x86 operating environment. This new approach enables OEMs to consolidate server offerings for multiple processors to potentially a single platform, reducing datacenter disruption and deployment costs for customers.

Leading server OEMs that develop silicon, including Cray, IBM, and Sun, have endorsed Torrenza as an open innovation initiative.

3Leaf is working on a chip for that socket, which will enable them to place those stateless, diskless servers in a virtual server warehouse. As an application needs more stuff, the 3Leaf mojo will dole it out in real time. Sounds cool to me.

The StorageMojo take
3Leaf’s CEO Bob Quinn is clear that this is for enterprises, not scale-out internet data centers. Also for data center service providers, for example Savvis, a 3Leaf beta customer. Which tugs on some interesting threads:

  • Enterprise computing growth is slower than Moore’s law - so consolidation is the new normal for the enterprise.
  • 3Leaf is yet another shot across the bow of costly FC and ethernet infrastructure. As Bob later wrote to me: “3Leaf will increase SAN utilization through economic connectivity for commodity X86 servers.” True enough. Which also says that fewer ports will be sold in 3Leaf equipped data centers.
  • Is somebody going to wake up and say, “Why the hell don’t we build an OS that can handle lots of concurrent applications?” It just feels like virtualization is fixing problems created by hinky software and 30 year old paradigms. Anyone for a clean-sheet re-think of application environments?

I’m keeping an open mind about Torrenza - Intel seems to have a parallel idea as well - and how it might play out. On its face, a great idea. Yet it could also be seen as turning a commodity into a non-commodity. Which is obviously the intention, and could be its undoing.

If smart people develop cool stuff and market it well, this could offer value-add that people flock to. On the other hand, if everybody treats it as an opportunity to shake more money out of the enterprise, they might be disappointed.

Sure, the decline in enterprise server cost frees up money for other toys. But CFOs aren’t ignorant and it won’t be long before they are trying to squeeze savings from the shrinking IT infrastructure. 3Leaf is well-positioned to catch that wave.

Update: Bob went through the article and offered some additions and clarifications that improve the information value of the post. I’ve added some but not all of them. Anything that is still unclear or stupid is my fault. I did check their website for whether 3Leaf has a space or not and got that wrong all by myself.

Comments welcome, as always. Seems like there are several topics here worthy of thoughtful dialogue.

Mo’ better ZFS performance stats

April 24th, 2007 by Robin Harris in Enterprise, Future Tech, SAN, FC

I don’t plan this stuff, it just happens
Guess this is turning into ZFS performance week at StorageMojo. Now we get to see how HW RAID 10 compares to software ZFS RAID 10.

This comes in a fine report from Robert Milkowski’s milek’s blog. It is part of a continuing series.

Comparing hardware and software RAID
This time Robert compares ZFS to a mid-range EMC Clariion CX3-40, both running on a 16 GB Sun x4100 M2 server with two dual-core Opterons. The CX3-40, which EMC positions as “High performance and capacity for high-bandwidth applications, heavy OLTP or e-mail workloads. . . ,” is connected by dual 4Gb FC links to the 4100 and sports 40 4Gb FC 15k 73 gig drives.

That’s a rig with some scoot.

He also throws in a Sun x4500 (Thumper) as a local disk comparison.

What’s the layout?
Robert configured four test stands on the hardware.

  • Hardware RAID: four 10 drive RAID 10 LUNs, two on each controller, with “pci-max-read-request=2048 set in qlc.conf”. Did I mention the tests are running on Solaris 10? ZFS used the four LUNs in its storage pool.
  • Software RAID: the Clariion presents 40 individual disks, 20 on each controller.
  • Software RAID/Q: same as above, with “The CX3-40 is set up with 4 10 drive RAID 10 LUNs, two on each controller, with pci-max-read-request=2048 set in qlc.conf,” which must mean something.
  • x4500 SW RAID: a ZFS RAID 10 pool set up across 48 7200 rpm, 500 GB drives.

Note the random write performance? 44 slower spindles beat 40 fast ones. Hm-m-m.
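Why is that surprising? A naive spindle-count model, sketched in Python, says the opposite should happen. The per-disk figures are rule-of-thumb assumptions (about 180 random IOPS for a 15k drive, about 80 for a 7200 rpm drive), and the model ignores caching and how the file system lays out writes, which is presumably where the real answer lies.

```python
def raid10_random_write_iops(spindles, iops_per_disk):
    """Naive model: each logical write hits both disks of a mirror pair."""
    return spindles * iops_per_disk / 2


clariion = raid10_random_write_iops(40, 180)  # 15k rpm FC drives (assumed IOPS)
x4500 = raid10_random_write_iops(44, 80)      # 7200 rpm drives (assumed IOPS)
print(clariion, x4500)                        # 3600.0 1760.0
```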

If you care about ZFS you should read the whole thing
But here’s some eye-candy to whet your appetite:

ZFS_Graph

An IT product that isn’t fully baked day one? How can that be?
Honest fellow that he is, Robert can’t help noting that while ZFS performance is excellent, there are some feature deficits that the ZFS team is hard at work on:

The real issue with ZFS right now is its hot spare support and disk failure recovery. Right now it’s barely working and it’s nothing like you are accustomed to in arrays. It’s being worked on right now by ZFS team so I expect it to quickly improve. But right now if you are afraid of disk failures and you can’t afford any downtimes due to disk failure you should go with HW RAID and possibly with ZFS as a file system.

The StorageMojo take
Back when RAID was young, dedicated hardware made sense because CPUs were too pathetic for words. Nor had anyone done a clean sheet design like ZFS. What Robert demonstrates is that a software RAID implementation is capable of delivering the performance of hardware RAID today. That is very good news for anyone on a budget.

RAID arrays may escape becoming commodities. They may simply become irrelevant.

Update: Robert commented a correction which I incorporated above and which just strengthens the point.

Comments welcome, of course.

Hot new 10Gb switch will shake up storage networks

April 17th, 2007 by Robin Harris in Clusters, Enterprise, SAN, FC

This morning Woven Systems announced their new 10 Gbit Ethernet switch. I named Woven “coolest hardware” at last year’s Datacenter Ventures conference. Harry Quackenboss, their CEO, promised they’d have the switch working in six months. Well, here it is a mere seven months later, and they’ve done it. My hat’s off to the engineering team.

Now let’s get into Woven’s Mojo.

I’d rather switch than fight
The switch is unique in several respects:

  • 10 Gigabit ethernet only
  • Up to 144 non-blocking ports on a single switch
  • Up to 4,000 non-blocking ports in a fabric of Woven switches
  • Built from commodity parts - with one vital exception
  • Low-cost
  • The killer feature: active congestion management
  • Uses standard ethernet protocols

What is it going to kill?
It shouldn’t be a surprise that fibre channel has some features that storage systems find really useful. After all, FC was developed as a storage interconnect. So it has bandwidth, flow control, low latency and rapid failover.

Gigabit ethernet falls short in all these areas: limited bandwidth; lost packets in congested networks; high IP latency; and failover that is too slow for storage drivers to manage.

It looks like Woven has solved 3 of the 4
Woven’s secret sauce is built into an ASIC that sits in front of the commodity 24 port ethernet chip (picture helpfully provided by Woven).

[Image: Woven switch blade]

The vScale Packet Processor - I don’t know what the “v” stands for - inserts low-overhead probe packets into the data stream. The vPP at the other end of the stream, be it in the same switch or one across a fabric, bounces them back, so the originating vPP has a real-time view of path latency, updated in milliseconds or less. It works across a fabric of up to 4,000 ports, maintaining QoS even as the fabric grows.
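The probe mechanism amounts to keeping a running latency estimate per path. Woven's ASIC internals aren't described here, so this Python sketch is hypothetical: it smooths probe round-trip samples with an exponentially weighted moving average, borrowing the 1/8 gain that TCP uses for its smoothed RTT.

```python
class PathLatency:
    """Smoothed round-trip estimate per path, updated from probe echoes (hypothetical)."""

    def __init__(self, gain=0.125):
        self.gain = gain   # weight given to each new sample
        self.rtt_us = {}   # path -> smoothed RTT in microseconds

    def on_probe_echo(self, path, sent_us, returned_us):
        sample = returned_us - sent_us
        old = self.rtt_us.get(path)
        if old is None:
            self.rtt_us[path] = sample
        else:
            self.rtt_us[path] = (1 - self.gain) * old + self.gain * sample
        return self.rtt_us[path]


latency = PathLatency()
latency.on_probe_echo("A", sent_us=0, returned_us=100)             # first sample: 100
print(latency.on_probe_echo("A", sent_us=1000, returned_us=1180))  # 110.0
```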

That’s pretty cool, but the coolest thing is this:
When path latency is too high, the vPP has two tools it uses to manage the latency.

  • It can change to a less congested data path in less than 10ms
  • It can pause the HBA using a standard ethernet protocol
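The decision logic might look something like this Python sketch. The order of preference (stay, then reroute, then pause) and the threshold are assumptions, since the post doesn't say how the vPP chooses. “Pause” stands in for a standard Ethernet PAUSE frame (IEEE 802.3x), which is presumably the protocol meant but isn't named.

```python
def manage_congestion(current_path, path_rtt_us, threshold_us):
    """Return (action, path): keep the path, move to a quieter one, or pause the sender."""
    if path_rtt_us[current_path] <= threshold_us:
        return ("stay", current_path)
    best = min(path_rtt_us, key=path_rtt_us.get)
    if path_rtt_us[best] <= threshold_us:
        return ("reroute", best)    # a less congested path exists
    return ("pause", current_path)  # every path is congested: throttle the HBA


rtts = {"A": 300, "B": 40, "C": 250}
print(manage_congestion("A", rtts, threshold_us=100))  # ('reroute', 'B')
```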

I know what you are thinking:
Wow, path failover in 10ms - drivers won’t even notice.
And
Pausing HBAs when congestion strikes is flow control for ethernet - a process FC handles with buffer credits.
All done using standard ethernet protocols, albeit creatively.
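For contrast, FC's buffer-to-buffer credits prevent overruns up front instead of reacting to congestion. A simplified Python sketch of the sender side: each frame consumes a credit, the sender stops at zero, and each R_RDY from the receiver returns one. Credit negotiation at login and error recovery are left out.

```python
class CreditSender:
    """Sender side of buffer-to-buffer credit flow control (simplified)."""

    def __init__(self, credits):
        self.credits = credits  # receive buffers advertised by the far end
        self.sent = []

    def try_send(self, frame):
        if self.credits == 0:
            return False        # no free buffer at the receiver: hold the frame
        self.credits -= 1
        self.sent.append(frame)
        return True

    def on_r_rdy(self):
        self.credits += 1       # receiver freed a buffer and signalled R_RDY


sender = CreditSender(credits=2)
print(sender.try_send("f1"), sender.try_send("f2"), sender.try_send("f3"))  # True True False
sender.on_r_rdy()
print(sender.try_send("f3"))  # True
```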

That bell you hear is tolling for Fibre Channel, which is about to meet its toughest competitor yet. Which may be why the FC over ethernet proposals are gathering steam in the T11 committee. Adding FC’s low latency protocol to a very fast and reliable 10 Gb switch adds real value and helps protect existing FC investment. Could be a nice win for all involved.

The StorageMojo take
I’m sure all the usual Internet Data Center suspects are lined up to beta Woven’s switch. Linking several hundred thousand servers via ethernet requires a lot of bandwidth, and 10GigE delivers. For the massive storage clusters it is an even bigger win: lost packets are still a pain even if the cluster can survive them.

If everything works as advertised, FC’s decline may be faster than forecast, at least among the large enterprise base that can use a switch of this size. Woven’s switch will be a shot in the arm for big clusters and the people who build them.

Update: I’d inadvertently left out the fact that you can cross-couple the switches to create a 4,000 port fabric so I’ve added it.

Update II: Harry, Woven’s CEO, helpfully added some budget pricing for all you folks with new fiscal years starting mid-year - like the Cisco tear-down guys - and I couldn’t just leave it buried in the comments.

Pricing will be finalized when general availability is announced (planned for Q3 2007), but a 144 10GE port configuration will be about $1500/10GE port, with fully-redundant fans, power supplies, and management cards.

Compare that to Cisco’s current $23k/port pricing and Riverstone’s very aggressive $10k/port pricing for full-speed 10 Gb, and the term “disruptive technology” just leaps to mind.
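To put the per-port numbers in configuration terms, here is the arithmetic for a 144-port build. This assumes Cisco's and Riverstone's per-port prices hold flat at 144 ports, which the quotes above don't actually say:

```python
PORTS = 144  # Woven's quoted configuration

# Per-port prices quoted in this post (budget/list prices, not street prices).
PRICE_PER_PORT = {
    "Woven": 1_500,
    "Riverstone": 10_000,
    "Cisco": 23_000,
}


def config_cost(ports: int, price_per_port: int) -> int:
    """Total cost of a switch configuration at a flat per-port price."""
    return ports * price_per_port


if __name__ == "__main__":
    woven = PRICE_PER_PORT["Woven"]
    for vendor, price in PRICE_PER_PORT.items():
        total = config_cost(PORTS, price)
        print(f"{vendor:10s} ${total:>10,}  ({price / woven:.1f}x Woven per port)")
```

That works out to roughly $216,000 for Woven versus about $1.44 million at Riverstone's price and $3.3 million at Cisco's.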

Comments welcome, of course. I spent six hours at NAB today and drove over 1,000 km, so moderation may be a bit sluggish. Me too.

Dear Uncle StorageMojo

February 16th, 2007 by Robin Harris in Clusters, Enterprise, SAN, FC

There’s always a first time
Sometimes it is great and sometimes, not so great.

Advice to the SAN-lorn
I’m asking StorageMojo.com readers to help this gentleman with his first SAN. I’ll kick off with my take after his letter. He didn’t ask to be anonymous, but out of respect for his heartfelt plea for help, I won’t identify him.

Should there be anyone who has cause why this couple should not be united in marriage, they must speak now or forever hold their peace.

I’m buying a SAN. It’s to make our poor overloaded database server go a bit faster. We’re also going to store our growing archive of 30 million jpgs on it.

After spending what seems like the last 6 months reading white papers, I’ve decided to go for five shiny new LeftHand Networks HP DL320s nodes. 3 nodes have 12 × 15K SAS drives for the database; 2 nodes have 12 × 750GB drives for the storage.

We’re a small company, and this is by far our biggest purchase ever - we can’t afford to get it wrong!

Oh how I wish there was a bit more off-the-record chat about this stuff… It seems every word I’ve read in the past months has been written by a vendor. I’ve never read about a problem with anything. Do all SANs work perfectly all the time, or are there often problems? Has anyone ever been disappointed with their SAN purchase?

Anyway - my reasons for choosing the Lefthand stuff are:

  • (They claim) random IO performance scales linearly as you add nodes (e.g. 3 nodes = ~5,000 IO/sec, 6 nodes = ~10,000 IO/sec, 9 nodes = ~15,000 IO/sec)
  • Unlimited capacity scaling by adding nodes
  • Snapshots, remote copy, easy management, thin provisioning etc.
  • Low initial costs (~$25,000 per node) - When I say “low”, I guess I mean “almost impossibly high, but lower than the big players”.
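Taking the vendor's claims at face value, the reader's configuration pencils out as below. This is a sketch: it assumes the claimed linear scaling holds exactly, uses the letter's ~$25,000 per node, and reports raw capacity before any RAID or replication overhead:

```python
IOPS_PER_3_NODES = 5_000   # LeftHand's claim, as quoted in the letter
PRICE_PER_NODE = 25_000    # approximate, from the letter


def projected_iops(nodes: int) -> float:
    # Assumes the vendor's claim of perfectly linear scaling holds.
    return IOPS_PER_3_NODES * nodes / 3


def raw_capacity_tb(nodes: int, drives_per_node: int = 12, drive_gb: int = 750) -> float:
    # Raw capacity only; RAID and replication overhead are not subtracted.
    return nodes * drives_per_node * drive_gb / 1000


if __name__ == "__main__":
    db_nodes, archive_nodes = 3, 2
    print(f"Database tier: ~{projected_iops(db_nodes):,.0f} IOPS")
    print(f"Archive tier:  {raw_capacity_tb(archive_nodes):.0f} TB raw")
    print(f"Total outlay:  ${(db_nodes + archive_nodes) * PRICE_PER_NODE:,}")
    print(f"DB cost/IOPS:  ${db_nodes * PRICE_PER_NODE / projected_iops(db_nodes):.2f}")
```

So roughly $125,000 buys ~5,000 claimed IOPS (about $15 per IOPS) and 18 TB raw for the jpegs. Those are the numbers to hold the vendor to in a pilot.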

Please uncle StorageMojo, am I doing the right thing? Maybe if you posted this on your most excellent blog, your readers might have some advice?

Readers, hear this man’s plea! Please respond in the comments.

The StorageMojo take
Dear USM, you appear to be suffering from anticipatory buyer’s remorse. I commend you. Far better to suffer it now than after your check has cleared. That said, all I’ve heard about LeftHand is that their stuff works well. It does seem a little pricey, though.

I can assure you that many people have been disappointed in their SAN purchases. How to ensure you aren’t the next one is the question. You are thinking about the future and the growth of your application, both good things. Here are some questions.

You didn’t detail how you decided that storage was the problem. I’ll assume you’ve looked at a faster server, more RAM and tuning the database. You don’t mention much about your workload. Thirty million jpegs sounds like a photo-sharing application. Depending on how big the average jpeg is and how visitors use the system, you might be more bandwidth limited than IOPS bound. Do you understand how much load GigE will handle? How much server overhead will be generated by the iSCSI protocol? Are you buying TOE-based HBAs?
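One way to check whether GigE or the disks will be the bottleneck is a quick back-of-the-envelope calculation. In this sketch the 90% link efficiency (what's left after Ethernet/IP/TCP/iSCSI overhead) and the "one IO per file" simplification are assumptions; measure your own network and workload before trusting either:

```python
def link_file_rate(avg_file_kb: float, link_gbps: float = 1.0,
                   efficiency: float = 0.9) -> float:
    """Files per second a link can move if bandwidth were the only limit.

    efficiency is an assumed fraction of line rate left after protocol overhead.
    """
    usable_bytes_per_sec = link_gbps * 1e9 / 8 * efficiency
    return usable_bytes_per_sec / (avg_file_kb * 1000)


def bottleneck(avg_file_kb: float, array_iops: float) -> str:
    # Simplification: assume each file read costs roughly one IO at the array.
    return "bandwidth" if link_file_rate(avg_file_kb) < array_iops else "IOPS"
```

With a hypothetical 100 KB average jpeg, one GigE link moves only about 1,125 files per second, well short of the ~5,000 IOPS claimed for three nodes, so the network would saturate first. With tiny 4 KB objects the disks run out first.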

StorageMojo readers, please comment. I’ll be interested to hear what you have to say myself.

PS: the subject line of his email was “Dear Uncle StorageMojo” - which got me started on the whole “advice to the lovelorn” theme.

Whipping Out the Checkbook for Isilon

January 22nd, 2007 by Robin Harris in Clusters, NAS, IP, iSCSI, SAN, FC

With the IPO completed, interest in Isilon among StorageMojo readers has been growing. So I thought I’d take a gander at their pricing and see how it stacks up. I’m fast-tracking this project - doing the writing and analysis concurrently - so when you get to the end you’ll learn what I didn’t know at the start.

Using the handy StorageMojo Isilon Price List I put together an Isilon system using their top-of-the-line 6000 series nodes. Like most storage vendors, Isilon doesn’t actually provide the information required to configure their systems. I can see why IBM doesn’t, but a new vendor like Isilon should, since most of their customers are fairly knowledgeable and control freaks as well.

Just like Legos, only not as colorful
Building an entry-level Isilon cluster is pretty easy, given that I don’t know any better. You need three:

IQ6000i - IQ 6000i InfiniBand platform nodes @ $21,411 = $64,233

and then you need one:

IQSwitch - Flextronics 24-port Infiniband switch @ $7,609

and some cables, let’s say 12 (I’m a little hazy on InfiniBand cables - well, right NOW I’m a little hazy on everything, due to the post-prandial libations - but as I recall they are 2.5Gb each, so if you want, say, 10Gb, you need four per node), but who knows, maybe these are 4x cables and Isilon is just being coy. [Update - a couple of alert readers assure me that the IB cables are 4x, so one cable per node will do, and I've corrected the following calculations.]

5 Meter InfiniBand cables, 3 @ $239 = $717

and, of course, what would hardware be without the noble leavening of software? A moldering hunk of inert metal, you say? So let’s add the “OneFS File System”, described in words that would do Hopkinton proud:

OneFS® is Isilon’s patent-pending operating system software that provides the intelligence behind all Isilon® clustered storage systems. It combines the three layers of traditional storage architectures - file system, volume manager and RAID - into one unified software layer, creating a single intelligent file system that spans all nodes within a cluster. OneFS combines mission-critical reliability and high availability with state-of-the-art data protection to help storage administrators worry less and do more.

Call me crazy, but doesn’t that sound a bit like ZFS? Naturally, despite my scepticism about architecture-based evaluation, I’d like to know how OneFS actually handles large numbers of small files, since it was built to handle large media files - the founders are from RealNetworks.

Of course the patent abstract (#7,146,524) is a little less breathless:

The intelligent distributed file system enables the storing of file data among a plurality of smart storage units which are accessed as a single file system. The intelligent distributed file system utilizes a metadata data structure to track and manage detailed information about each file, including, for example, the device and block locations of the file’s data blocks, to permit different levels of replication and/or redundancy within a single file system, to facilitate the change of redundancy parameters, to provide high-level protection for metadata, to replicate and move data in real-time, and to permit the creation of virtual hot spares among the smart storage units without the need to idle any single smart storage unit in the intelligent distributed file system.

So for three nodes we’d need three copies of OneFS:

OneFS 6000 platform software license for Isilon IQ 6000/6000i product (non-transferable) @ $16,376 = $49,128

So for a mere $49,128 + $717 + $7,609 + $64,233 = $121,687 you’ll have 18 TB of cluster storage. Just under $6,800 a TB!
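Here is the same bill of materials as a tiny cost model, so you can see how it scales. It is deliberately naive: it assumes every node adds the same 6 TB (18 TB / 3), needs one license and one cable, and that everything hangs off a single 24-port switch. Volume discounts, rack space and support contracts are ignored:

```python
import math

NODE_PRICE = 21_411      # IQ 6000i node
LICENSE_PRICE = 16_376   # OneFS license, one per node
CABLE_PRICE = 239        # 5 m InfiniBand cable, one 4x cable per node
SWITCH_PRICE = 7_609     # 24-port InfiniBand switch
TB_PER_NODE = 18 / 3     # 18 TB for three nodes, per the post
SWITCH_PORTS = 24


def cost_for_nodes(nodes: int) -> int:
    """List-price cost of a cluster on one switch (naive linear model)."""
    if not 3 <= nodes <= SWITCH_PORTS:
        raise ValueError("model covers 3 to 24 nodes on one switch")
    return nodes * (NODE_PRICE + LICENSE_PRICE + CABLE_PRICE) + SWITCH_PRICE


def nodes_for_tb(tb: float) -> int:
    # Minimum cluster size is three nodes.
    return max(3, math.ceil(tb / TB_PER_NODE))
```

Three nodes come to $121,687, about $6,760 a TB. Reaching 100 TB takes 17 nodes and $654,051, which is still over $6,400 a TB. A petabyte would need 167 nodes, far past one 24-port switch and outside this simple model.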

The StorageMojo take
Isilon folks, feel free to comment with any corrections. Yet somehow, this doesn’t feel like the answer to 1 PB, or even 100 TB, of storage. Why? Well, compare it to Sun’s X4500 (Thumper), which costs about a quarter as much. Granted, the X4500 is not clustered, nor as easily managed. Yet for really massive data stores, it just seems like the price should be closer to disk costs.

Comments welcome, as always. Moderation turned on to feed my megalomania. Or to keep spammers at bay.

Enterprise IT: The Elephant’s Graveyard

January 15th, 2007 by Robin Harris in Enterprise, SAN, FC

The elephant’s graveyard is a metaphor for a place where old ideas go to die. For IT it is the flipside of the consumerization of IT.

The seductive glass house
As Sun’s troubles over the last few years have emphasized, enterprise IT is a Venus flytrap market: once you get into it, it can be very difficult to get out. What makes EIT so seductive?

  • High margins
  • Big budgets
  • Low turnover

These conditions are the reverse of the consumer market, where margins are typically half of EIT’s, budgets are small and turnover of new technologies can be quite rapid.

Let’s look at Sun Microsystems
Sun started out as a commodity workstation company. Motorola 68000 processors, Seagate disks, Berkeley Unix, somebody’s ethernet: Sun did little more than design the circuit boards and packaging that everything plugged into. By not investing in manufacturing plants, semiconductor fabs and engineering and testing proprietary products, Sun achieved a sales per employee that was the envy of the industry in the mostly proprietary ’80s.

By the early ’90s, though, the PC was starting to nip at Sun’s heels. PCs were much less powerful but much cheaper, and their higher volume made them much more attractive to budget-conscious buyers and to market-conscious software developers. Sun began a long climb upmarket.

The upmarket is very pleasant - as long as it lasts
So workstation sales are getting harder, and meanwhile your very best customers are asking for larger and more powerful machines, network servers, to run the software they already know. These larger machines are more profitable and allow your salesforce to concentrate on fewer accounts to generate more revenue and profit. Engineering and marketing find it easy to justify fun new technology since a 10% goodness increase on a $500,000 machine is worth $50,000, while on a $1,000 machine it is worth $100 only if the customer is knowledgeable enough to notice. Which they aren’t, thank you very much.

Life is good. Until the same forces that chased you out of your original market start nipping at your heels again. You can maybe move a little more upmarket, as Sun did with the E10000 series in the late ’90s, but eventually you are standing on the top of the mountain. And there is nothing left to climb. Now you have to climb down, a much less pleasant journey than the trip up.

IBM, EMC, NetApp have all been there
IBM faced the same problem in the early ’90s as the company cratered under John Akers. High margins are extremely addictive and IBM had the shakes, bad. IBM was propped up by the high margins on its mainframes and storage, but mainframe demand was slowing and Amdahl, EMC and other plug-compatible vendors were growing. Lou Gerstner came in, saw that IBM had created a price umbrella under which its competition prospered, and cut prices big time. The umbrella snapped shut and the PCMs started dying off. IBM’s storage group didn’t respond to the RAID revolution fast enough (see Daddy, tell me again how little EMC beat giant IBM . . .) and suffered a major market share loss.

EMC rode the big iron gravy train through the ’90s, until the dot-bomb implosion. Not only did high-end demand shrink, but hundreds of millions of dollars of like-new storage came on the market at liquidation prices. EMC’s growth reversed abruptly. Since then EMC has sought to become a software company AND has moved into the low end of the storage market through its Dell channel. I spoke to an EMC engineering manager who’d worked on their first $5,000 product, and he told me that while customers were enthusiastic about the new price point, they were wondering when the $2500 version was coming out. Now that you can buy a 1 TB NAS box for $600, I’m sure the pressure is only growing.

NetApp, which has been growing fast in the enterprise, last year unveiled its $5k entry-price StoreVault product line. I doubt they are selling many entry-level systems, so NetApp is still missing the low end of the market, which I would guess is the fastest growing part. But at least they are in there pitching.

The StorageMojo take
As I look around the storage landscape today, I see many companies focused on the enterprise, solving problems that only enterprises have. This isn’t a bad thing, and Lord knows enterprise IT can use all the help it can get. Yet I don’t expect those companies to be the NetApps or Veritases of tomorrow. The folks who solve Internet Data Center storage problems will do well, as will the folks who grow up out of the consumer media/SOHO storage businesses. Both markets build companies that thrive with low margins and high volume. They’ll be nipping at the big guys’ heels very soon now.

Comments welcome, as always. Moderation turned on to make spammers work a little harder.

Brocade Buys Into IP SANs

January 11th, 2007 by Robin Harris in Enterprise, NAS, IP, iSCSI, SAN, FC

Brocade announced it is buying Silverback Systems, maker of a low-cost IP accelerator chip.

Silverback has been getting some traction, as I learned when I had dinner with a couple of Silverback worthies at SNW.

My pity turned to respect
When I agreed to meet with them, the thought balloon over my head was “these poor guys - another expensive TCP accelerator that is going nowhere”. But they assured me they had a low-cost accelerator: a TCP/IP offload engine, or TOE. OK, that is a cool thing, especially with the advent of 10G ethernet.

Fibre Channel won’t disappear, but . . .
Ethernet-based SANs, IMHO, are the larger market, thanks to lower cost and performance that overshoots what most users need. Plus lower management costs. It looked to me as if Silverback might have the silver bullet to make ethernet SANs the standard.

So y-n-ell would Brocade want them?
Brocade is the #1 FC switch vendor. TOEs go on host-bus adapters. Why would Brocade want to go into the cut-throat HBA business?

Baseless StorageMojo theorizing
Several scenarios might explain it. Sharpen Occam’s razor.

  • Brocade believes SAN market going 10 Gig E. Doesn’t see itself beating Cisco in E’net switches, decides to go beat up on Qlogic and take their HBA business.
  • Brocade tired of Qlogic trying to commoditize the FC switch business. Moving into HBAs is a shot across Q’s bow. Brocade has all the OEM relationships Qlogic does, so why not?
  • Brocade playing bigger game: believes that end-to-end ethernet SANs are the next big win; Silverback gives them the endpoints; next move, a big honking ethernet switch, add storage features to same, kneecap Cisco in the ethernet SAN business.

Or maybe they just assumed they’d think of something
I like the third option best. Not sure the “add features, create lock-in” model will work again, but I suppose it depends on the features. Any ideas, readers?

Comments welcome of course. Moderation turned on to minimize on-line Cialis prices.

Power, cooling & IOPS: Will power kill the 3.5″ drive?

November 6th, 2006 by Robin Harris in Enterprise, Future Tech, SAN, FC

Are enterprise drives worth the power?
Data center power consumption is getting a lot of press play lately. The issue: increased density means that a data center rack that used to need 2 kW may now need 6 kW. And for every dollar spent to power equipment, another 40 cents is spent on cooling that equipment. It is a real problem.
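The 40-cent rule of thumb turns into a simple multiplier. A sketch, assuming cooling energy is billed at the same rate as equipment energy, so "40 cents per dollar" means 0.4 kW of cooling per kW of IT load:

```python
COOLING_OVERHEAD = 0.40  # 40 cents of cooling per dollar of equipment power


def facility_kw(it_load_kw: float, overhead: float = COOLING_OVERHEAD) -> float:
    """Power for the equipment plus the power spent removing its heat."""
    return it_load_kw * (1 + overhead)
```

By that measure the old 2 kW rack really drew about 2.8 kW, and the new 6 kW rack draws about 8.4 kW.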

Storage: power hog?
Most of the published press concern seems to focus on pizza boxes and blade servers, not big iron arrays, probably because Google raised the issue years ago. Yet EMC is responding with a couple of technical marketing papers (see “Power Efficiency and Storage Arrays” and “EMC Symmetrix DMX-3 Electrical Power Estimation and Configuration Planning”). HDS doesn’t seem to have any papers addressing the issue.

Feets, do your stuff
Nice tap dance by EMC in the latter paper. The simple fact is that all 3.5″ drives used in big iron arrays are power hogs. Fibre Channel alone adds about 2 watts per drive, according to Seagate’s Cheetah data sheet, which isn’t too surprising when you consider that FC drives have two interfaces. Even 7200 RPM SATA desktop drives use about 12 watts, versus 16 watts for FC 15k drives. And it looks like SATA interfaces are slightly less power-efficient than PATA - on the order of 5% - while SAS drives are even worse: about 40% higher than SCSI. Oops!

So what is a “green” array vendor to do?
Seagate thinks it has the answer in their new Savvio 2.5″ enterprise drives. At 8 watts they use about 40% less power than equivalent 3.5″ drives. Of course, they are 70% smaller, so vendors will be able to cram even more drives into a smaller box, increasing power use per rack.

Maybe it’s time to get back to RAID
Striping was the original performance hack for disks. To my mind, the storage power problem isn’t going to get much better until vendors start using disks that were designed from the ground up for low power consumption: notebook drives. These average about 3 watts each and are designed to be shut down frequently. While each drive is pretty slow, stripe across a half dozen or so and the performance is better than you could buy 15 years ago. Didn’t people have databases and OLTP 15 years ago?
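Using the per-drive figures quoted in this post, here is what 100 drives of each type cost in power once cooling is added. Note this compares equal drive counts, not equal capacity or equal IOPS; the drive labels are mine and the 40% cooling overhead comes from the top of this post:

```python
WATTS_PER_DRIVE = {           # figures quoted in this post
    "3.5in FC 15K": 16,
    "3.5in SATA 7200": 12,
    "2.5in Savvio": 8,
    "2.5in notebook": 3,
}
COOLING_OVERHEAD = 0.40


def shelf_watts(drive_type: str, drives: int) -> float:
    """Drive power plus the cooling needed to remove that heat."""
    return WATTS_PER_DRIVE[drive_type] * drives * (1 + COOLING_OVERHEAD)


if __name__ == "__main__":
    for kind in WATTS_PER_DRIVE:
        print(f"{kind:16s} {shelf_watts(kind, 100):6.0f} W for 100 drives")
```

That is roughly 2,240 W for the FC drives against 420 W for notebook drives, so you could stripe across five notebook drives for every 15k drive and still come out ahead on power.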

Or for a really radical solution. . .
Flash-based SSDs are even more frugal - at least in power - using a fraction of even the best notebook drive’s power. Perhaps the day will come when a data center will be cooled by a couple of room-size air conditioners and Samsung will be selling a couple of hundred million enterprise flash drives a year.

Comments welcome of course. And for all you Americans out there, don’t forget to vote tomorrow, Tuesday, November 7th.

A Big SSD Vendor Begs To Differ

October 23rd, 2006 by Robin Harris in Future Tech, SAN, FC, SOHO/SMB

The good people at Texas Memory Systems read StorageMojo.com and at my invitation offered this response to my post An SSD For The Rest Of Us. I think they did a pretty good job of making the case for the, IMHO, historically under-appreciated RAM SSD. Naturally I have a couple of comments at the end.

Dear Storage Mojo:

We are excited to see solid state disks get some coverage in your blog even if we might have different ideas about the DDR RAM solid state disk market.

First, we like flash memory too. We like that it has high density and is non-volatile. It provides a great storage media for USB drives, cameras and even iPods. We would even want one in a notebook so that it will survive the fall a hard drive cannot, and the low power usage really adds value when you are running on batteries. The fact that all of these industries have adopted flash has led to a big drop in its price. The success of flash has cost someone some business… hard disk companies making microdrives. Interestingly, we have yet to feel any competitive pain from flash drive manufacturers.

It is worth pointing out that companies have been promoting flash memory systems for commercial business applications for years. Their prices have always been lower, their densities higher and they have always been non-volatile. In spite of these factors, none of these companies have traction in the enterprise applications area that we sell into.

Why is that? Flash disks use flash memory and flash memory has some inherent limiting characteristics:

  1. its write speed is poor (comparable to disks)
  2. in spite of the best efforts of the industry, it is still easy to wear out the write cycles, and finally
  3. the read speed is still a lot slower than our read speed

As an addendum to this note, we have included a description of the hard work a flash drive has to do just to get a write done.

Compare the performance specifications of the Samsung flash drive to the RamSan system:

RamSan peak IOPS: 400,000 (read or write, random or sequential)*

Samsung peak IOPS: 2,200 (Webserver IOmeter pattern) (reads only). Note that for the Database pattern it is actually lower than most single hard disks (~85 IOPS).

A flash disk in a notebook is hardly likely to wear out the write cycles from operating system boots and the occasional save of a document, but that same disk in an enterprise storage system will wear out well before the ROI is realized. If we could write 400,000 IOPS to a flash drive (which we can’t, because they are way too slow), it would take us seconds to wear out the drive.
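How fast flash wears out depends heavily on wear leveling, so it's worth doing the arithmetic both ways. Every input here is a hypothetical illustration, not a spec of the Samsung drive or any other: a 32 GB device, 4 KB writes, 100,000 program/erase cycles. The model also ignores write amplification:

```python
def seconds_to_wear_out(capacity_gb: float, write_kb: float, iops: float,
                        pe_cycles: int, wear_leveling: bool) -> float:
    """Back-of-the-envelope flash endurance estimate (all inputs hypothetical).

    Without wear leveling, one hot block absorbs every write and dies after
    pe_cycles writes. With ideal wear leveling, writes are spread evenly, so
    every write-sized unit on the device must be written pe_cycles times.
    """
    if wear_leveling:
        units = capacity_gb * 1e6 / write_kb   # write-sized units on the device
        total_writes = units * pe_cycles
    else:
        total_writes = pe_cycles
    return total_writes / iops
```

At 400,000 writes per second, a single hot block is gone in a quarter of a second, which matches TMS's "seconds". With ideal wear leveling across the whole device, the same load takes about 2 million seconds, a little over three weeks. Either way, it is not an enterprise lifetime.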

Flash systems will inevitably gain traction with the lower end of the solid state disk market but as the last few years have shown we are not really stealing market share from each other as much as we are both growing our respective markets. The flash drive companies are growing and the intensity of that market is reflected in the pace of acquisitions. Independently, Texas Memory Systems has grown its DDR RAM solid state disk market over 50% this year. The future of our combined markets has never been stronger.

Regards,

Texas Memory Systems

The StorageMojo take
TMS makes a number of good points. The most powerful, IMHO, is that their RAM SSD is a big honking fast machine. Stick 8 FC ports on one and you might actually come close to those 400,000 IOPS. There really is no substitute for a big RAM SSD if that is what you need. TMS is focused on the high-end and flash is not a threat there - today.
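A quick sanity check on the 8-port remark: spread 400,000 IOPS across 8 ports and see what each port must carry. The 4 KB transfer size is an assumption, and 400 MB/s is the nominal per-direction rate of 4 Gb Fibre Channel:

```python
FC_4G_MB_PER_SEC = 400   # nominal 4 Gb FC throughput per direction


def per_port_load(total_iops: float, ports: int, io_kb: float):
    """Return (IOPS, MB/s) each port must carry, in decimal units."""
    iops = total_iops / ports
    mb_per_sec = iops * io_kb * 1000 / 1e6
    return iops, mb_per_sec
```

Each port would need 50,000 IOPS, or about 200 MB/s at 4 KB, which fits within a 4 Gb link. At 8 KB transfers it would already hit the 400 MB/s ceiling, so "come close" is about right.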

On the other hand I detect a bit of whistling past the graveyard bravado in the TMS reply. When a product comes along on the steep downward pricing curve that flash has - much faster than disk or even RAM - that offers some of your significant advantages - non-volatility, good performance (for some apps) relative to disks - at a fraction of the cost, you have to think hard about the possibility of substitute architectures emerging.

Conjecturally, such an architecture might replace the implicit centralized schema of the TMS FC products with a distributed, shared-nothing plan. Local attach is both faster and cheaper, so if a web server farm of pizza boxes were equipped with flash SSDs, one might see a sizable fraction of TMS performance at a much lower cost without the FC management headaches. Other, less obvious, options no doubt will be invented.
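The shared-nothing math is simple addition, as long as the workload partitions cleanly. A sketch, assuming each web server's local flash SSD serves only its own requests and delivers the 2,200 read IOPS quoted in the TMS letter. That figure is read-only, so this says nothing about write-heavy work:

```python
import math


def servers_to_match(target_iops: float, iops_per_server: float) -> int:
    """Servers needed if aggregate IOPS is just the sum of independent local SSDs."""
    return math.ceil(target_iops / iops_per_server)
```

Matching the RamSan's 400,000 IOPS on reads would take 182 servers, and a farm of 50 pizza boxes would deliver 110,000 read IOPS, already more than a quarter of the RamSan figure.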

Once people see a much cheaper alternative to RAM SSDs, the juices start flowing and inventions occur. If I were TMS, I’d ask a couple of my better engineers to work part time on creative flash-based SSD architectures. Somebody will; it might as well be you.
