StorageMojo




Robin Harris    


Cool Data, Cold Cache

February 13th, 2007 by Robin Harris in Clusters, Future Tech, NAS, IP, iSCSI

Geeky computer guy that I am, I have my machine instrumented with programs (Mac users: MenuMeters) that tell me all kinds of useless information. Network usage, memory usage, CPU load and, of course, disk activity.

Mostly all this stuff just tells me that the machine hasn’t crashed. But sometimes it tells me something surprising.

Cache out, laid-off, says he’s got a bad cough, wants to get it paid off - look out kid
Like my virtual memory page usage: pageins; pageouts; page faults; copy-on-writes; and cache hits and misses.

Get this: 5,292,427 cache lookups and only 32,860 cache hits - a measly 0.6% hit rate. Why bother?
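For anyone who wants to check the arithmetic, here is a minimal sketch using only the two counters quoted above. The variable names are mine, not anything MenuMeters reports.

```python
# Figures quoted in the post; variable names are illustrative.
lookups = 5_292_427
hits = 32_860

hit_rate = hits / lookups          # fraction of lookups served from cache
misses = lookups - hits            # lookups that had to go elsewhere

print(f"hit rate: {hit_rate:.2%}")
print(f"misses:   {misses:,}")
```

The ratio works out to a little over six tenths of one percent, which matches the 0.6% figure.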

What is “virtual memory” anyway?
If you know the answer, skip ahead.

Back 30 years ago, when RAM cost over $1,000 per MB, people were particular about how much they bought, even on big machines. Virtual memory extends physical RAM with disk capacity. Typically, least-used memory pages are swapped out to disk. If a document’s memory pages are sitting on disk, they get swapped into physical RAM once you start editing it again.

Data dynamics have changed
“Locality of reference” is the behavior that gives caches - and virtual memory - their power. Locality of reference is the empirical observation that once a piece of data is accessed, it tends to be accessed again several times, maybe even hundreds of times. So it makes great sense to keep that piece of data close to the action until demand for it falls off.

That’s the theory. Yet if data accesses are near-random, you’ll see what I see: almost no cache hits. Which means the overhead of cache management is buying nada.
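A toy simulation makes the point concrete. This is a sketch under stated assumptions, not a model of any real system cache: it uses a plain LRU policy, and the data-set size, cache size, hot-set size and 95% locality figure are all hypothetical numbers picked for illustration.

```python
import random
from collections import OrderedDict

def lru_hit_rate(accesses, capacity):
    """Replay a sequence of page numbers through an LRU cache; return the hit rate."""
    cache = OrderedDict()
    hits = 0
    total = 0
    for page in accesses:
        total += 1
        if page in cache:
            hits += 1
            cache.move_to_end(page)        # mark as most recently used
        else:
            cache[page] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used page
    return hits / total if total else 0.0

rng = random.Random(0)   # fixed seed so runs are repeatable
PAGES = 1_000_000        # hypothetical data set size, in pages
CAPACITY = 1_000         # hypothetical cache size, in pages
N = 100_000              # number of simulated accesses

# Near-random access across the whole data set.
random_trace = [rng.randrange(PAGES) for _ in range(N)]

# High locality: 95% of accesses land in a 500-page hot set.
local_trace = [
    rng.randrange(500) if rng.random() < 0.95 else rng.randrange(PAGES)
    for _ in range(N)
]

random_rate = lru_hit_rate(random_trace, CAPACITY)
local_rate = lru_hit_rate(local_trace, CAPACITY)
print(f"near-random: {random_rate:.2%}   high locality: {local_rate:.2%}")
```

With these made-up parameters the near-random trace hits well under 1% of the time, while the high-locality trace hits on the large majority of accesses. Same cache, same overhead, very different payoff.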

“Locality” doesn’t matter if you don’t “reference”
Data is cooling. Vast amounts of data are being stored as storage prices decline, and the number of data accesses per megabyte is steadily dropping. And that’s a good thing since disk accesses per megabyte are dropping too.

What I hadn’t thought about, and I haven’t seen discussed anywhere else, is the impact this change must have on system architecture. Much effort has gone into making cache mechanisms, including second and third level caches, virtual memory, system caches and disk caches, fast and efficient. Yet, if you use your system the way I do mine, much of this effort and overhead is wasted.

Expensive array assets are becoming less valuable
Many applications, such as databases, do exhibit high levels of locality of reference, and they probably always will. But for unstructured data, how valuable is it to spend good money on costly caches and the associated engineering for a resource that may return very little value?

The StorageMojo take
As scale-out storage architectures continue to evolve, engineers will need to look at the workloads they are designing for to determine the most cost-effective means of supporting them. The cost-adding “cache everywhere” architectures - disk, network, system, and more - may actually hurt performance while adding cost and complexity. It is another nail in the coffin of the traditional disk array.

It’s something worth thinking about the next time you lay down cold hard cache cash.

Comments welcome, as always. Comments moderated, because moderation is a virtue, except in the defense of liberty.

I updated this article by shortening it and adding a gratuitous Subterranean Homesick Blues reference. My apologies to Mr. Dylan.

Isilon’s Cluster Technology. Pt. 2.0

January 30th, 2007 by Robin Harris in Clusters, Enterprise, NAS, IP, iSCSI

Metadata data structures
The basic insight of Isilon’s cluster is that they manage files on a pool of blocks. What we know as RAID levels exist on a per file basis, not per array. Unlike Google’s GFS, which only does file replication, Isilon does file replication through block replication and also offers file protection through parity protection. We see this in parts of the patent’s sample metadata structure:

Field                 Description
Mode                  Kind of file: regular, directory, etc.
Owner                 SSU account
Timestamp             Last modification time
Size                  Size of metadata file
Parity count          # of parity devices used
Mirror count          # of mirrors
VHS count             # of virtual hot spares
Version               Version of metadata structure
Type                  Type of data location table
Data location table   Address of or actual table
Reference count       # of metadata structures referencing this one
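
To make the per-file idea concrete, here is a rough sketch of the fields listed above as a Python record. It is my illustration, not Isilon’s code or the patent’s layout: the class name, field types, default values and the "direct" location type are all assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Tuple

class FileMode(Enum):
    REGULAR = "regular"
    DIRECTORY = "directory"

@dataclass
class FileMetadata:
    """Per-file metadata record following the fields listed above (types are assumed)."""
    mode: FileMode
    owner: str                      # SSU account
    timestamp: float                # last modification time
    size: int                       # size of the metadata file
    parity_count: int = 0           # number of parity devices used
    mirror_count: int = 0           # number of mirrors
    vhs_count: int = 0              # number of virtual hot spares
    version: int = 1                # version of the metadata structure
    location_type: str = "direct"   # type of data location table (hypothetical value)
    data_locations: List[Tuple[int, int]] = field(default_factory=list)  # (device, block)
    reference_count: int = 0        # metadata structures referencing this one

# Two files in the same pool, each with its own protection settings.
video = FileMetadata(FileMode.REGULAR, "alice", 0.0, 512, parity_count=1)
ledger = FileMetadata(FileMode.REGULAR, "bob", 0.0, 512, mirror_count=2)
print(video.parity_count, ledger.mirror_count)
```

The point of the sketch is that protection lives in the file’s record, so one file can be parity-protected while its neighbor is mirrored.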

What most arrays perform globally - hot spares, RAID levels, mirroring, recovery data - may be done on a per file basis with Isilon. Furthermore, there is a lot of flexibility in the data location structure - direct addressing and multiple levels of indirect addressing - to give the system multiple opportunities to optimize accessing blocks that may be widely scattered, especially in high performance or failure modes.

The data location flexibility is probably best seen in the ease of adding additional storage to an Isilon cluster - add an SSU and the pool of blocks becomes larger and the existing SSUs can start moving data based on their own needs.

Finally, the flexibility in the metadata structure file size and version number indicates that Isilon may add new fields as they see fit, building in new functionality with software upgrades.

Processes, processes, ad infinitum
At this point the patent goes into examples of all the processes that might be used to perform needed functions, such as data lookups and virtual hot spare provisioning. I’m sure Isilon engineers are always looking at ways to improve these essential activities, so the patent descriptions are of limited value.

The StorageMojo take
I generally consider architecture-based arguments dubious (see Architectural Appeal). Yet I also believe that storage has some secular trends (see Architecting the Internet Data Center) that one ignores at one’s peril.

There is a lot to like in the Isilon architecture, starting with their fundamental abandonment of the volume or LUN construct in favor of the storage pool. They realize that customers want to manage files, not disks. From that basic insight Isilon has put together a flexible product that is easy, by all reports, to manage and expand. I love the file-based virtual RAID capability, for one. Also, their price-neutral adoption of Infiniband is smart from both business and technical perspectives.

My doubts about how this will play out come from studying Google and Amazon. Isilon’s architecture buys its flexibility with a variety of resources, some cheap and some dear. CPU cycles are cheap and getting cheaper, so all the computation required for parity RAID and other functions isn’t a big concern. Yet as a system scales, even inexpensive components whose cost is a small percentage of the system start to become noticeable in absolute dollars. At some point I would expect a system that doesn’t do all the computation the SSUs do to have a price advantage.

Network overhead is a bigger concern, as having data spread across multiple SSUs means there has to be a fair amount of coordination, data fetching, cache invalidation and so on. Isilon engineers are well aware of these issues, which is why they support Infiniband and before that, I believe, dual ethernets on each SSU and jumbo frames.

The biggest issue, IMHO, is the cost of the disk I/Os. Breaking a file across multiple SSUs means multiple I/Os to write and, more importantly, access a single file. Isilon concentrates on large (>1 MB) files to minimize this problem, yet this overhead must cost something. Bottom line: I suspect that Isilon has scaling problems, either in I/Os or economics due to their architecture. At what capacity these issues become apparent is beyond my ability to estimate. Readers?

That said, there is no reason that they can’t be very successful in the rather large space they have to play in. Unstructured data - files - are 85% of the data out there. There are lots of companies that would rather not manage a clumsy, LUN-based infrastructure for unstructured data.

I hope to look at Isilon’s business model through their IPO filings. They’ve had a successful launch, so it must look ok.

Comments welcome, as always. Moderation turned on to keep spam at bay.

Whipping Out the Checkbook for Isilon

January 22nd, 2007 by Robin Harris in Clusters, NAS, IP, iSCSI, SAN, FC

With the IPO completed, interest in Isilon among StorageMojo readers has been growing. So I thought I’d take a gander at their pricing and see how it stacks up. I’m fast tracking this project - doing the writing and analysis concurrently - so when you get to the end you’ll learn what I didn’t know at the start.

Using the handy StorageMojo Isilon Price List I put together an Isilon system using their top-of-the-line 6000 series nodes. Like most storage vendors, Isilon doesn’t actually provide the information required to configure their systems. I can see why IBM doesn’t, but a new vendor like Isilon should, since most of their customers are fairly knowledgeable and control freaks as well.

Just like Legos, only not as colorful
Building an entry-level Isilon cluster is pretty easy, given that I don’t know any better. You need three:

IQ6000i - IQ 6000i InfiniBand platform nodes @ $21,411 = $64,233

and then you need one:

IQSwitch - Flextronics 24-port Infiniband switch @ $7,609

and some cables, let’s say 12. (I’m a little hazy on InfiniBand cables - well, right NOW I’m a little hazy on everything, due to the post-prandial libations - but as I recall they are 2.5Gb each, so if you want, say, 10Gb, you need four per node. But who knows, maybe these are 4x cables and Isilon is just being coy.) [Update - a couple of alert readers assure me that the IB cables are 4x, so I've corrected the following calculations.]

5 Meter InfiniBand cable @ $239 x 3 = $717

and, of course, what would hardware be without the noble leavening of software? A moldering hunk of inert metal, you say? So let’s add the “OneFS File System”, that, in words that would do Hopkinton proud:

OneFS® is Isilon’s patent-pending operating system software that provides the intelligence behind all Isilon® clustered storage systems. It combines the three layers of traditional storage architectures - file system, volume manager and RAID - into one unified software layer, creating a single intelligent file system that spans all nodes within a cluster. OneFS combines mission-critical reliability and high availability with state-of-the-art data protection to help storage administrators worry less and do more.

Call me crazy but doesn’t that sound a bit like ZFS? Naturally, despite my scepticism about architecture-based evaluation, I’d like to know how OneFS actually handles large numbers of small files, since it was built to handle large media files - the founders are from RealNetworks.

Of course the patent abstract (#7,146,524) is a little less breathless:

The intelligent distributed file system enables the storing of file data among a plurality of smart storage units which are accessed as a single file system. The intelligent distributed file system utilizes a metadata data structure to track and manage detailed information about each file, including, for example, the device and block locations of the file’s data blocks, to permit different levels of replication and/or redundancy within a single file system, to facilitate the change of redundancy parameters, to provide high-level protection for metadata, to replicate and move data in real-time, and to permit the creation of virtual hot spares among the smart storage units without the need to idle any single smart storage unit in the intelligent distributed file system.

So for three nodes we’d need three copies of OneFS:

OneFS 6000 platform software license for Isilon IQ 6000/6000i product (non-transferable) @ $16,376 = $49,128

So for a mere $49,128 + $717 + $7,609 + $64,233 = $121,687 you’ll have 18 TB of cluster storage. Just under $6,800 a TB!
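
Here is the same tally as a short script, in case you want to swap in your own quantities. Prices are the ones quoted above. The cable quantity of three is my inference from the $717 line total divided by the $239 unit price, and the item labels are shortened by me.

```python
# (description, unit price in dollars, quantity) - prices as quoted in the post.
line_items = [
    ("IQ 6000i InfiniBand node", 21_411, 3),
    ("IQSwitch 24-port InfiniBand switch", 7_609, 1),
    ("5 m InfiniBand cable", 239, 3),   # quantity inferred from $717 / $239
    ("OneFS 6000 license", 16_376, 3),
]
CAPACITY_TB = 18   # cluster capacity stated in the post

total = sum(price * qty for _, price, qty in line_items)
per_tb = total / CAPACITY_TB

for name, price, qty in line_items:
    print(f"{name:<38} {qty} x ${price:>7,} = ${price * qty:>8,}")
print(f"total: ${total:,}")
print(f"per TB: ${per_tb:,.0f}")
```

The total comes to $121,687, or roughly $6,760 per terabyte.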

The StorageMojo take
Isilon folks, feel free to comment to make any corrections. Yet somehow, this doesn’t feel like the answer to 1 PB, or even 100 TB storage. Why? Well, compare it to Sun’s X4500 (Thumper), which is about a quarter of that price. Granted, Thumper is not clustered, nor as easily managed. Yet it just seems that for really massive data stores the price should be closer to disk costs.

Comments welcome, as always. Moderation turned on to feed my megalomania. Or to keep spammers at bay.

Brocade Buys Into IP SANs

January 11th, 2007 by Robin Harris in Enterprise, NAS, IP, iSCSI, SAN, FC

Brocade announced it is buying Silverback Systems, makers of a low-cost IP accelerator chip.

Silverback has been getting some traction, as I learned when I had dinner with a couple of Silverback worthies at SNW.

My pity turned to respect
When I agreed to meet with them, the thought balloon over my head was “these poor guys - another expensive TCP accelerator that is going nowhere”. But they assured me they had a low-cost accelerator - TCP/IP offload engine or TOE. OK, that is a cool thing, especially with the advent of 10G ethernet.

Fibre Channel won’t disappear, but . . .
Ethernet-based SANs, IMHO, are the larger market based on lower cost and performance that overshoots what most users need. Plus lower management costs. It looked to me that Silverback might have the silver bullet to make ethernet SANs the standard.

So y-n-ell would Brocade want them?
Brocade is the #1 FC switch vendor. TOEs go on host-bus adapters. Why would Brocade want to go into the cut-throat HBA business?

Baseless StorageMojo theorizing
Several scenarios might explain it. Sharpen Occam’s razor.

  • Brocade believes SAN market going 10 Gig E. Doesn’t see itself beating Cisco in E’net switches, decides to go beat up on Qlogic and take their HBA business.
  • Brocade tired of Qlogic trying to commoditize the FC switch business. Moving into HBAs is a shot across Q’s bow. Brocade has all the OEM relationships Qlogic does, so why not?
  • Brocade playing bigger game: believes that end-to-end ethernet SANs are the next big win; Silverback gives them the endpoints; next move, a big honking ethernet switch, add storage features to same, kneecap Cisco in the ethernet SAN business.

Or maybe they just assumed they’d think of something
I like the third option best. Not sure the “add features, create lock-in” model will work again, but I suppose it depends on the features. Any ideas, readers?

Comments welcome of course. Moderation turned on to minimize on-line Cialis prices.

Rackable: the Dell of Next-Gen Storage?

November 14th, 2006 by Robin Harris in Enterprise, Future Tech, NAS, IP, iSCSI

“There is not a lot of added value in commodity ‘storage bricks’”

Commented one feisty StorageMojo.com reader last week. I didn’t agree, but I didn’t have a ready answer, either. But now I do: Rackable Systems.

You may have heard of Rackable for their innovations in packaging systems: efficient DC power; half-depth servers mounted back-to-back with a central “chimney” for cooling; remote management; and a rapidly growing storage business.

What you didn’t know
RACK is profitable. They’ve been growing like a weed, doubling in size each of the last three years and are on track to do it again this year. They have sales of $1.9 million per employee, which is likely close to a Silicon Valley record. All this on gross margins in the low 20’s, just a few points higher than Dell.

It isn’t all hardware either
One of the fastest growing parts of their business is storage and their Terrascale clusters. They claim:

The Terrascale architecture is free of any serialized function that would limit performance scalability. . . . Terrascale software uses a lightweight, linearly scalable, on-demand cache coherency algorithm that guarantees that servers access the correct representation of any data block at any point in time.

I’ve dug into their white paper - which is better than most - called The Terrascale™ Storage Cluster: A New Paradigm for Parallel I/O to Resilient Network Storage. It isn’t clear how they do all their magic from the paper. Nonetheless they are insistent in claiming that the system truly scales to hundreds of nodes. Here’s a precis of what I was able to glean.

Terrascale offers a:

  • Global name space
  • Global lock management service
  • Local cache coherence mechanism

The global name space means all the servers in the cluster see all the same files. The lock management ensures that data is written only when safe. The local cache coherence means that all servers know immediately when data is written, thanks to a write-through cache.

iSCSI to the rescue
Using open source software, Terrascale adds an iSCSI target kernel module that, in concert with a client iSCSI initiator, creates this highly parallel infrastructure. Above the Terrascale layers are standard Linux storage tools such as lvm, while below is the standard TCP/IP stack.

RAID upon RAID
The global name space means that all storage is part of a pool. RACK offers low-cost RAID 5 storage and then uses those RAID 5 units as virtual disks to create a second layer of RAID across them for greater speed and availability. They claim their storage scales to the limit of aggregate network or storage bandwidth. Need more of either? Buy more and plug it in.
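
Layering RAID on RAID costs capacity at each level, and a back-of-envelope sketch shows how much. Everything here is an assumption for illustration: the white paper summary above doesn’t say which RAID level the second layer uses, so the sketch compares striping (RAID 0) with a second RAID 5, and the brick count, disk count and disk size are made up.

```python
def raid5_usable(units, unit_tb):
    """RAID 5 gives up one unit's worth of capacity to parity."""
    if units < 3:
        raise ValueError("RAID 5 needs at least three units")
    return (units - 1) * unit_tb

def layered_usable(bricks, disks_per_brick, disk_tb, outer="raid5"):
    """Usable capacity when RAID 5 bricks are used as virtual disks in an outer layer."""
    brick_tb = raid5_usable(disks_per_brick, disk_tb)   # inner layer, per brick
    if outer == "raid0":
        return bricks * brick_tb                        # striping: no extra loss
    if outer == "raid5":
        return raid5_usable(bricks, brick_tb)           # one whole brick goes to parity
    raise ValueError(f"unsupported outer level: {outer}")

# Hypothetical configuration: 4 bricks of 8 x 0.5 TB disks = 16 TB raw.
raw_tb = 4 * 8 * 0.5
striped_tb = layered_usable(4, 8, 0.5, outer="raid0")
double_parity_tb = layered_usable(4, 8, 0.5, outer="raid5")
print(raw_tb, striped_tb, double_parity_tb)
```

In this made-up configuration, 16 TB raw becomes 14 TB if the outer layer only stripes, and 10.5 TB if the outer layer is also RAID 5. The second option survives the loss of an entire brick, which is where the availability claim would come from.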

The StorageMojo.com take
RACK is currently focused on the high performance computing market where their pack ‘em dense, stack ‘em high and sell ‘em cheap model is a hit. Yet it won’t take them too much longer before they will have to look to commercial markets for high growth. And there they have an excellent opportunity to make waves for conservative storage vendors - unless EMC or IBM buys them first - with their low margins and aggressive price/performance. A company to watch and, if you’re in the market for more Mojo, a company to look at buying from.

Comments welcome, as always. Moderation turned on to keep the comment spam under control.

Big Network Storage Cache From Gear6

October 17th, 2006 by Robin Harris in Enterprise, NAS, IP, iSCSI

Gear6 goes semi-public today with their new network storage cache, saying what the product does, but not releasing configuration, pricing and other key details until later. So some of my questions, and probably yours, remain unanswered. Here’s the Cliff Notes version.

Network storage cache cluster
It is a network cache appliance for, initially, NFS filers. Plug it into the same ethernet switch the filers are on, perform some voodoo at the Gear6 management interface to assign filers to be cached, and you now have a really large - how much joy can you afford? - and scalable cache sitting between the servers and the filers.

  • Cache scales to single-digit terabytes
  • Packaged in pizza boxes loaded with RAM - I’d guess 32GB per
  • The boxes are clustered, internally redundant and new ones add on non-disruptively
  • No server software
  • Supports millions of IOPS at microsecond response times
  • NFS only today, iSCSI and FC at some unspecified future date
  • Their goal: Enterprise class, redundant, high-performance network cache

This isn’t a RAM disk, because you don’t need another disk to manage. Gear6 says they have significant IP in the tools that automate the populating of the cache.

Why bother?
Gear6 says their beta customers work across many industries and applications with a few common characteristics:

  • Large data sets
  • Random data access
  • Large-scale concurrency
  • Bursty traffic

If this sounds like you, check out Gear6.

Adding capability to the network
Three months ago I did a once-over-lightly competitive analysis of Gear6 and concluded they were building a

. . . honking-fast Linux-based parallel MIMD FC-SAN non-blocking I/O NAS appliance designed to handle hundreds of thousands of concurrent I/O’s from large numbers of servers, clustered or not. . . . highly scalable, you’ll be able to add processors and network interconnects just as you would in any cluster or grid. [No] custom hardware, preferring to use very smart software, such as the CxC parallel computing language, on commodity clustered servers and interconnects.

So it’s Gig E instead of FC, and millions of IOPS, but overall I’d give myself a B+ or even an A- for that prediction.

The StorageMojo.com take
Blowing functionality across the network is a long-term secular trend. By focusing on building an NFS SAN cache appliance I believe Gear6 has picked a niche ripe for change. Depending on their pricing and management mojo it may make sense for F1000 customers to start buying really cheap filers with minimal cache and let the Gear6 cache cluster do the heavy lifting.

Comments welcome, of course. Moderation is turned on to control comment spam, but no registration required.

Coolest New Companies At Datacenter Ventures

September 22nd, 2006 by Robin Harris in Backup, Enterprise, Future Tech, NAS, IP, iSCSI

I finally finished this post so I’ve superseded the first version with this one and given it a new title. I hope this doesn’t violate the mores of blogdom.
In flipping through my notes I accidentally ignored Appistry so I’ve added them to the honorable mention list in this update.

Dateline: Silicon Valley
Boy, where to start? Hardware? Software? VC wetware?

There were some 60-odd presenting companies, ranging from tiny startups looking to raise their first VC dime to companies that have already raised tens of millions, have customers, and aren’t looking for money. The latter had smiling VCs pushing business cards in their faces. The guys who wanted money were mostly greeted by silent, stony-faced wraiths. Tough crowd. There were usually eight presentations in parallel so I didn’t get to see every preso or even most. What follows are my impressions of what I did see.

Rumors
Might as well get the juicy stuff out of the way first. Biggest rumor is that Sun’s newly installed network storage chief, David Yen, is looking to jump ship. I wouldn’t blame him - what did he do to deserve that mess? He’s a seasoned senior manager who knows how to get product out the door so I’m sure any number of firms would be very happy to have him. I’d imagine that David is thinking “why wait a few years and leave under a cloud?”

Lots of speculation about Cisco’s Nuova investment. Conventional Wisdom holds that they are looking to bust out of the network market with something that will take on the hyper-scale cluster market and the big server incumbents. If they want to keep growing faster than the network market Cisco needs to do something creative.

Cool Ideas
A couple of people commented that network security is evolving from attempting to lock everything down inside the data center to using the WAN gateway as the security choke point. Seems about right.

Dave Donatelli of EMC, who heads the array business, got through his entire presentation without once mentioning ILM. Good going, Dave! EMC finally may be accepting reality on ILM.

Michael Workman of Pillar noted that the rapidly growing markets in China and India can’t afford big-iron storage. Another nail in that coffin. But is Pillar’s stuff really that much more cost-effective?

Richard Villars of IDC noted that unstructured data is growing much faster than structured data, and advanced the idea of the storage depot, big slow storage for long-term file storage. Not an archive, just infrequently searched for data.

Greg McAdoo of Sequoia predicted that the next 5-10 years of storage will be driven by consumers rather than enterprises. And he noted something you may have noticed: the tape backup model is irrevocably broken because access times are just too long.

Somebody made the provocative suggestion that VMware style server virtualization is a dead-end, because the bigger data center problem is not slicing one server into many but making many servers look like one. That feels right to me. What do you think?

Cool Stuff: Honorable Mentions
Cleversafe is sounding cooler, and I already thought it was pretty cool. The open source strategy, the encouragement of third-party developers, the security, the scalability, sounds like a pretty compelling value proposition. They cut through a lot of enterprise worry points: vendor lock in, encryption problems, cost. If you haven’t checked them out, do so now.

Qlayer is focused on what they call “commercial data centers”. Hosting companies, portals, large e-businesses. They mention “Google-like” on their not-very-clear website, but are more modest than Google in one respect: they virtualize a rack of commodity pizzabox servers instead of a whole datacenter. Qlayer is setting up their headquarters in the US, keeping R&D in Europe, a smart move. Their CEO has already founded and sold two companies, so I trust he’ll keep his eye on the ball, producing useful stuff for the real world.

Njini is a company I’ve mentioned before. I didn’t catch the pitch - too many good companies got scheduled together - but I met with CEO David Jones Friday morning to hear the story directly. I’d compared Njini, in concept, to what I thought Abrevity was doing - but after seeing Abrevity’s preso at DV I am more confused about them than ever. Njini’s engine puts a wrapper around files, effectively extending their metadata, to automate unstructured data management. They are relaunching their website in a couple of weeks and I look forward to learning more.

Cassatt and BEA founder Bill Coleman presented their product, Collage. Bill described Collage as an infrastructure for service level automation. For example, the last day of the quarter you want incoming orders given the highest priority. You give that policy to Collage and it ensures that all the relevant systems have the resources they need to get the job done. There are more than a couple of details I’m curious about, such as how does Collage know what systems order processing relies on, that could make or break this in production. Yet Bill gives a great pitch, and when he allowed that the board of Cassatt had just approved opening a round for $10-$20 million, the VCs started whipping out their business cards. I have several Cassatt white papers and I hope to delve deeper into their approach Real Soon Now.

PointofData presented their Active Information Platform. This software enables companies to search all their structured and unstructured data quickly. The software produces small indices (~10% the size of common index algorithms) that include not only the content but also the content’s modification history. Since the indices are small, the searches are fast, and, they claim, more complete than searching the various data silos themselves. They used the example of Enron email where they said they found 30% more query hits than the costly legal software the lawyers employed in discovery.

Appistry’s CEO gave an informative talk that, I think, gave away a little more than planned about the “how” of Appistry’s technology. Appistry provides software that enables “Application Solution Fabrics” that

. . . enables customers to dramatically reduce the cost, time and complexity of running large-scale, time-critical applications. In doing so, Appistry empowers customers to easily field strategic applications fine-tuned to meet the highly differentiated needs of their enterprises.

So my question was, how do you take an app written for a single CPU and make it scale across a cluster. “Cluster-awareness” is usually non-trivial. What they do, apparently, is insert some code at appropriate breakpoints that speaks to their fabric layer. The fabric then enables multiple instances of each section of the code for parallel processing. Their fabric layer must keep track of each job going through the cluster.

It sounds a little scary to add code to enterprise apps, but Kevin Haar, the CEO, told of going into a major shipping company on a Tuesday morning, clusterizing their crucial routing app that day, and showing it to the executive team on Wednesday morning. So it is work, but not a lot. Obviously you have to own the source code, so to me it seems best for custom legacy apps that are likely to be at the core of a company’s competency.

Coolest Hardware & Software @ Datacenter Ventures
So with all the cool stuff mentioned above, what impressed me the most?

Coolest Hardware: Woven Systems
Woven Systems is developing a low latency, high port count and low cost 10 Gigabit Ethernet switch. Combining a latency almost as good as Infiniband with Ethernet’s universal support and high volume components, Woven is working to ship the universal backplane for hyper-scale cluster apps. Create a mesh with these switches and you’ve got a powerful and fast fabric for building utility-class infrastructures. Combine with Coraid’s AoE architecture and you’ve got a screaming fast and really low-cost hyper-scale storage fabric. It won’t blow FC out of the water day one, but it will certainly cause a lot of people to think twice about new FC installs.

So won’t Cisco just mop up the floor with them? Good question. My take is that Cisco, like EMC in storage, is heavily invested in their “value-add” to basic network switches. They could build the product, but would their commission sales force actually sell it? As a startup, Woven is nuisance level. But if Cisco offered the same product they could cannibalize a lot of current high margin business. Targeting such niches is how little companies get a toehold in a competitive market.

Coolest Software: Zmanda
An open-source backup product as coolest software? Have I lost my mind? Hey, who said I had it in the first place? Seriously, Zmanda is in the right place at the right time. Backup is the most widely used storage software. So why are people voluntarily locking up their backups in proprietary, overpriced, cash-cow software that forces them to buy back their data every few years with maintenance fees and minimal “upgrades”?

Zmanda is based on Amanda, the 15 year old open source project devoted to data protection. Zmanda sells subscriptions to support services, not the software, so the pricing is really competitive compared to Netbackup or other proprietary backup products. The Zmanda server runs on Linux with clients for Mac OS X, Windows, Solaris, Linux and a bunch of Unix systems. Very cool feature: you don’t even need the application to read a Zmanda backup tape or disk. Now that is investment protection that protects your data investment, not just the vendor’s revenue stream.

Need to do more with less? Zmanda is a great place to start. Selling to the SMB market? Zmanda’s pricing will make you a hero to your customers.

The StorageMojo.com Take
While the VCs seemed pretty downbeat, I saw a lot to like at DV this year. The key takeaway: a bunch of people working on making computing more scalable and cheaper than ever using the lessons from internet data centers. From past experience we know that will open up exciting new applications and build some big companies. Put your sunglasses on because the future is so bright.

Comments welcome including disagreement, amplification or interpretation.

Bits & Pieces: Network Storage Of Tomorrow

August 22nd, 2006 by Robin Harris in Backup, Enterprise, Future Tech, NAS, IP, iSCSI

Cleversafe, Again
The New York Times has a readable article about Cleversafe. StorageMojo.com commented on Cleversafe in June and July (see Cleversafe: Yet Another Online Storage Startup and Coolest Remote Data Services).

The money quote:

The Cleversafe design could lead to a communal Internet storage system that Mr. Patterson called “hippie storage.” The idea is similar to SETI@Home, the shared computing system that allows PC users to contribute idle time on their machines to create a distributed supercomputer.

It sounds a lot more like BitTorrent when described like that. The difference, that the stored bits aren’t readable by themselves so the data is secure, is the key to creating a private resource from a public network.
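
To show what “not readable by themselves” can mean, here is a toy sketch. It is emphatically not Cleversafe’s algorithm, which the article doesn’t detail: this is plain XOR splitting, where every piece is needed to rebuild the data and a real dispersal scheme would tolerate missing pieces. Function names and the sample data are mine.

```python
import secrets

def split(data: bytes, pieces: int) -> list:
    """Split data into `pieces` shares; any subset short of all of them is random noise."""
    if pieces < 2:
        raise ValueError("need at least two pieces")
    # All but the last share are pure random pads.
    shares = [secrets.token_bytes(len(data)) for _ in range(pieces - 1)]
    # The last share is the data XORed with every pad.
    last = bytearray(data)
    for pad in shares:
        for i, byte in enumerate(pad):
            last[i] ^= byte
    shares.append(bytes(last))
    return shares

def combine(shares: list) -> bytes:
    """XOR all shares together to recover the original data."""
    out = bytearray(len(shares[0]))
    for share in shares:
        for i, byte in enumerate(share):
            out[i] ^= byte
    return bytes(out)

secret = b"quarterly numbers"
parts = split(secret, 4)        # hand one part to each of four storage sites
recovered = combine(parts)
print(recovered == secret)
```

Each part could sit on a stranger’s disk without revealing anything, which is the property that turns a public network into a private resource.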

The Network Is The Storage
The article also gets into the issue of the impact that broadband internet is having on storage: internet-enabled distributed file systems; web storage services; efficient secure storage. The problem with all these schemes is that network bandwidth is so costly compared to storage capacity. Amazon’s S3, for example, charges you almost 3x as much to upload and retrieve a gigabyte as it does to store it for a month (S3 charges $0.15/GB per month for storage and $0.20/GB for bandwidth).
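To check the "almost 3x" figure, here is a minimal sketch of the arithmetic using only the two prices quoted above. It assumes the $0.20/GB bandwidth charge applies once on upload and once on retrieval; the function names are mine, not Amazon's.

```python
# Prices as quoted in the post (August 2006); not current S3 pricing.
STORAGE_PER_GB_MONTH = 0.15   # dollars per GB per month
TRANSFER_PER_GB = 0.20        # dollars per GB moved (assumed: same rate each direction)


def round_trip_cost(gigabytes: float) -> float:
    """Cost to upload the data once and retrieve it once."""
    return 2 * TRANSFER_PER_GB * gigabytes


def storage_cost(gigabytes: float, months: float = 1.0) -> float:
    """Cost to keep the data at rest for the given number of months."""
    return STORAGE_PER_GB_MONTH * gigabytes * months


ratio = round_trip_cost(1) / storage_cost(1)
print(f"round trip: ${round_trip_cost(1):.2f}, one month stored: ${storage_cost(1):.2f}")
print(f"ratio: {ratio:.2f}x")
```

Under those assumptions one round trip costs $0.40 against $0.15 for a month at rest, or about 2.67x. The longer the data sits untouched, the smaller the bandwidth share of the bill becomes.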

Gilder’s Fever Dream Remains Just That
George Gilder, the late ’90’s prophet of the Telecosm, foresaw a world where network bandwidth would be both plentiful and cheap. A vision not so different from the atomic energy visionaries of the 1950’s who spoke confidently of energy too cheap to meter. Alas, both were wrong. Networks are expensive compared to local access and always will be. While increases in network bandwidth and speed allow networks to do more every year, their growth rate is far exceeded by the growth of stored data. The network tail does not wag the storage dog - no matter how long the tail is.

Actually, Storage Is The Network
Storage and networks have long been recognized as partial substitutes for each other: caching substitutes storage for bandwidth and access time, whether it is an L2 cache on a CPU or Akamai’s content delivery network storing multiple copies across the web. We use networks to connect pools of storage and skim off the most valuable content. Using broadband networks for massive storage is one of those intriguing theoretical what-ifs that will remain forever just beyond our grasp.

Cleversafe May Have Accidentally Designed Something Great
And not a safe backup infrastructure, either. They may have designed the next generation of storage array. Not RAID anything, nothing encrypted, yet safer and more reliable than any existing array. Data parceled out across hundreds of disks, so no hotspots; lots of spindles for I/O, no single disk drive, or even several, containing reconstructible data; perhaps riding on a cheap, fast network storage protocol like AoE.
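The property that no single disk, or even several, holds readable data can be illustrated with a toy XOR split. To be clear, this is not Cleversafe's dispersal algorithm, which I have not seen; it is a much simpler stand-in that needs every share to rebuild the data and so offers no redundancy. The names `split` and `combine` are hypothetical.

```python
import secrets
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def split(data: bytes, n: int) -> list[bytes]:
    """Split data into n shares; all n are needed to rebuild it."""
    if n < 2:
        raise ValueError("need at least two shares")
    # n - 1 shares are pure random noise of the same length as the data.
    shares = [secrets.token_bytes(len(data)) for _ in range(n - 1)]
    # The final share is the data XORed with every random share.
    shares.append(reduce(xor_bytes, shares, data))
    return shares


def combine(shares: list[bytes]) -> bytes:
    """XOR all shares together to recover the original data."""
    return reduce(xor_bytes, shares)


original = b"quarterly payroll"
shares = split(original, 5)
assert combine(shares) == original
```

Any subset of fewer than all five shares is indistinguishable from random bytes, which is the security half of the idea. A real dispersal scheme would add the other half: rebuilding from only some of the slices when disks fail.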

Sure, hitching up Cleversafe with a backup data compression appliance on the front end would answer some of those network bandwidth issues. But the real win could be in the data center, where a secure, high-performance infrastructure could be built out of standard components.

Cleversafe has an open-source component. Why not?

As always, comments welcome.

An Open-Source SAN

August 17th, 2006 by Robin Harris in NAS, IP, iSCSI, SAN, FC, SOHO/SMB

Update: Over at TechRepublic, Scott Lowe offers another view of AoE here. If I were an SMB VAR, I’d be checking AoE out.

It Is About Time
Here’s a potential game-changer - especially for the SMB market. It is low-cost SAN functionality based on local Ethernet. From a company named Coraid. Available for Windows, Linux, Solaris, FreeBSD and Mac OS X.

Wait a minute. Isn’t that iSCSI? It is a block device after all. Nope. Different. IMHO, better. There are some Don’t Gets, and a lot of Don’t Needs.

Putting Local - And Storage - Back In LAN
Coraid’s innovation is the open ATA over Ethernet (AoE) protocol. The big Don’t Get is that the protocol isn’t routable - it is strictly local - no IP involved. So the Don’t Needs include no TCP/IP overhead, no TCP/IP offload engines, no CPU-cycle sucking and latency-inducing TCP/IP stacks. AoE sits right on the data link layer - layer two - of the OSI network model, so with a switched LAN - is there any other kind these days? - you get very low latency and full network bandwidth across a low-cost, industry standard LAN.
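To show how little sits between the wire and the disk command, here is a rough sketch that packs an Ethernet header followed directly by the common AoE header. The field layout and constants reflect my reading of the published AoE specification and should be treated as assumptions to verify against it; the helper names and the source MAC address are made up for illustration. It builds headers only, not a complete frame you could send.

```python
import struct

AOE_ETHERTYPE = 0x88A2      # EtherType registered for ATA over Ethernet
AOE_VERSION = 1
CMD_QUERY_CONFIG = 1        # assumed from the spec: 0 = ATA command, 1 = query config


def mac(text: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' to six raw bytes."""
    return bytes.fromhex(text.replace(":", ""))


def aoe_header(dst: str, src: str, major: int, minor: int,
               command: int, tag: int) -> bytes:
    """Build the Ethernet header plus the common AoE header."""
    ver_flags = (AOE_VERSION << 4) | 0x0   # version in high nibble, no flags set
    error = 0
    return struct.pack(
        "!6s6sHBBHBBI",                    # network byte order, no padding
        mac(dst), mac(src), AOE_ETHERTYPE,
        ver_flags, error, major, minor, command, tag,
    )


# Broadcast query to every shelf (major) and slot (minor) on the segment.
frame = aoe_header("ff:ff:ff:ff:ff:ff", "02:00:00:00:00:01",
                   major=0xFFFF, minor=0xFF,
                   command=CMD_QUERY_CONFIG, tag=1)
print(len(frame), frame.hex())
```

That is 24 bytes of header in total: 14 for Ethernet and 10 for AoE. An iSCSI PDU on the same wire carries the same Ethernet header plus at least a 20-byte IPv4 header, a 20-byte TCP header and a 48-byte iSCSI basic header segment, and it needs a TCP state machine on both ends.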

The other big Don’t Get: expensive and finicky Fibre Channel HBAs, switches and storage, along with the extra bandwidth FC offers. Like FC, AoE appears to make very effective use of available bandwidth - maxing it out with storage traffic. You’ll want a dedicated storage network to run AoE across.

Practice Makes Perfect
Even though it is cleared for use with Oracle, it probably isn’t a solution, today, for habitually late adopters. You’ll need to think through your security and system management processes to ensure that data doesn’t get munged by an inattentive sysadmin. A dedicated AoE SAN is a start, and VLAN techniques can help partition off potential damage-doers. The key: it just looks like a disk, and anything goofy you can do to a disk you can do over AoE.

Write Once, Read Never?
So far it appears that Coraid is the only company building AoE hardware. It doesn’t appear they are trying to keep anyone else from doing it; it just hasn’t happened yet. That might be a worry for some folks. So in a smart move, Coraid has a Linux tool called srcat, a tool for recovering data from the raw disks on a Coraid JBOD or array. So if the company goes belly up, or a controller breaks and no replacements are available, you can still pull the drives and use srcat to pull the data off. Neat.

StorageMojo.com Take
Congrats to Coraid for a creative way to bring the benefits of network economics to storage networks, just as some of us thought FC would 10 years ago. By creating an open platform and protocol, they’ve started the open-source equivalent of a SAN. If you require - or would like to be able to afford - a lot of storage capacity, you should certainly check these guys out.

Update: Over at TechRepublic, Scott Lowe offers some more info on AoE. The (literal) money quote:

AoE is cheap! An array capable of supporting up to 11.25 TB from Coraid starts at less than $4,000 without disks. Today’s price for a 750-GB disk at NewEgg.com is $400 and the unit supports 15 disks. So, for less than 10 grand, you can get 11.25 TB of shared block storage. If you do the math, that runs at about $888/TB or $0.87/GB. Not bad!
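For anyone who wants to redo the math in the quote, here is a small sketch using only the figures Lowe gives. The chassis price is taken at the $4,000 ceiling, and the per-GB figure only matches the quote if a terabyte is counted as 1,024 GB, which is my inference rather than something the quote states.

```python
CHASSIS_PRICE = 4000      # dollars, upper bound from the quote ("less than $4,000")
DISK_PRICE = 400          # dollars per 750-GB disk
DISK_CAPACITY_TB = 0.75
DISK_SLOTS = 15

capacity_tb = DISK_SLOTS * DISK_CAPACITY_TB
total_price = CHASSIS_PRICE + DISK_SLOTS * DISK_PRICE
per_tb = total_price / capacity_tb
per_gb = per_tb / 1024    # assumption: the quoted $0.87/GB implies 1,024 GB per TB

print(f"{capacity_tb} TB for ${total_price:,}")
print(f"${per_tb:,.2f}/TB, ${per_gb:.2f}/GB")
```

The numbers hold up: 11.25 TB for $10,000 works out to just under $889/TB and about $0.87/GB. Dividing by 1,000 GB per TB instead would give roughly $0.89/GB.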

Brocade Buys McData: Yawn.

August 11th, 2006 by Robin Harris in Enterprise, NAS, IP, iSCSI, SAN, FC

From the Too-Little, Too-Late Department
Eyes glazed over at the news that Brocade is buying McData. The Wall Street Journal reported (subscription required), that Brocade CEO Mike Klayko told analysts that “customers are frustrated by equipment that doesn’t work together well.”

Well, duh. Network equipment? Double duh.

The Wages Of Sin
Fibre Channel has never fulfilled its early promise. Partly because it isn’t quite a network - it’s a channel - and mostly because everyone imported storage business tactics. The chief tactic: minimal interoperability with other storage vendors to ensure lock-in.

The problem is that applied to a network, vendor lock-in means you don’t get the advantages of network economics. In a nutshell, the value of the network increases as it grows while the cost of connecting drops. That is why all networks get linked: the interconnection cost is cheap compared to what has already been spent, while the benefits are huge. Have you heard of Cisco?

Virtuous Cycle Of Network Economics
A single telephone is worthless. Two connected telephones are more valuable. A billion connected telephones are invaluable. And due to learning curve effects the cost of that billionth telephone is much lower than the first.
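One simple way to put numbers on the telephone example is to count how many distinct pairs of phones can call each other. Counting pairs is a common simplification of network value that I am borrowing here, not a claim about how value is actually measured; `possible_links` is a hypothetical helper.

```python
def possible_links(phones: int) -> int:
    """Number of distinct pairs of phones that can call each other."""
    return phones * (phones - 1) // 2


for n in (1, 2, 1_000, 1_000_000_000):
    print(f"{n:>13,} phones -> {possible_links(n):,} possible links")
```

One phone has zero possible links and two phones have one, while the count for larger networks grows roughly with the square of the number of phones. That is the shape of the curve that vendor lock-in keeps Fibre Channel from climbing.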

Fibre Channel Inflection Point
Consolidation usually occurs in maturing industries, as it has in disk drives, for one or more of several reasons, such as increasing capital intensity (semiconductors), economies of scale (automobiles), or acquiring customers (soft drinks). In this case though it is happening because Fibre Channel is beginning a long decline.

Customers have seen an anemic ROI for their billions in FC investment. Without network economics, FC cannot compete with Ethernet over the long term. And now the long term has arrived.

Can This Technology Be Saved?
Not likely. Ultimately, it is the folks that connect to the network who must decide that compatibility is in their interest. Remember IBM’s very silly anti-Ethernet Token Ring network? IBM pushed it hard and lots of their most trusting customers bought it, only to face a painful migration a few years later. That is how you turn trusting customers into suspicious customers.

Storage vendors do not believe in interoperability, do not support it, and have no interest in encouraging mixed vendor FC infrastructures. So design and management is unnecessarily painful and expensive.

On the ethernet/IP/iSCSI side of the house however, compatibility with the network is the only option. Network and semiconductor economics are implacable. In ten years, Fibre Channel will be one of those legacy technologies used only where niche economics or customer sentiment dictate.

New Clustered NAS Head From Crosswalk

August 2nd, 2006 by Robin Harris in Enterprise, Future Tech, NAS, IP, iSCSI

Spent an hour talking to Mark Stratton, a VP of startup Crosswalk, about their new product, the iGrid 5100 Intelligent Storage Grid series. So what is an iGrid and why should you care? Initially, Crosswalk is aiming at the HPC market, yet longer term they have designs on the higher-end NAS market. They may have the industry’s first scalable NAS solution, good news for every large user of NAS.

What iGrid Does
Crosswalk’s new system provides a NAS (NFS, CIFS) front end to standard storage arrays. The front end is a cluster, so it scales horizontally, while also providing cluster levels of availability, redundancy and performance.

Each of the cluster nodes presents the entire capacity of the backing storage to the servers, so it doesn’t matter which node a server deals with. The nodes create file systems to present to the servers, virtualizing the backing storage. Crosswalk’s software handles the back end volume management, provides snapshots and interfaces to NetVault’s VDL software for backup. Since it is a cluster a single node can back up the entire storage pool.

There is no Crosswalk server-side software.

iGrid Architecture
When I spoke to Mark I was mostly interested in the architecture of the product. I spent several years at YottaYotta, which also built a clustered RAID controller with a network backplane, so I have some experience in this area.

Each cluster node is a 3U box with up to 8 front side gigabit ethernet ports and up to 8 backside 2 Gb FC ports. Each node is a quad-processor with 16 GB of cache, expandable to 32 GB. There are also multiple gigabit ethernet ports for inter-node communication. Some might question the use of gig-E as the cluster interconnect, but in my experience passing metadata, addresses and lock management just isn’t that taxing. There is no custom silicon.

They currently support an eight-node cluster with plans to increase that number to 256 over time. They also plan to offer block services, with iSCSI support.
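To get a feel for what those per-node figures add up to, here is a back-of-the-envelope sketch that sums nominal port rates and cache across a cluster. It assumes every node is fully populated with the maximum port counts and the base 16 GB of cache; these are raw line rates, not throughput claims, and the `Node` class is my own construct rather than anything from Crosswalk.

```python
from dataclasses import dataclass


@dataclass
class Node:
    gbe_ports: int = 8         # front-side gigabit Ethernet ports (maximum)
    fc_ports: int = 8          # back-side Fibre Channel ports (maximum)
    fc_gbps: int = 2           # nominal rate per FC port
    cache_gb: int = 16         # base cache; expandable to 32


def cluster_totals(nodes: int, node: Node = Node()) -> dict[str, int]:
    """Sum nominal port rates and cache across identical nodes."""
    return {
        "front_gbps": nodes * node.gbe_ports * 1,   # 1 Gb/s per GbE port
        "back_gbps": nodes * node.fc_ports * node.fc_gbps,
        "cache_gb": nodes * node.cache_gb,
    }


for count in (8, 256):
    print(count, cluster_totals(count))
```

On paper an eight-node cluster offers 64 Gb/s of front-side Ethernet, 128 Gb/s of back-side FC and 128 GB of cache. The back end has twice the nominal bandwidth of the front, which fits the claim that the nodes need nothing fancier than mid-range arrays behind them.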

The nodes expect nothing more than mid-range active-active Fibre Channel RAID arrays on the back end. Thanks to the cache-heavy nodes, Crosswalk believes they can get great performance from RAID 5.

iGrid Management
The iGrid admin sees all the backside LUNs and collects them into volumes upon which Crosswalk’s software places filesystems. New file system creation is quick - as fast as you can click through the GUI. There are some preference setting options to simplify setting up new file systems and adding arrays to the storage pool. There is more than I’m relaying here, but these are the basics.

iGrid Futures
In addition to scaling up to 256 nodes and block services, Crosswalk has some other intriguing possibilities. Since their product is software based, they could use much lower cost servers to drive their entry pricing down, while expansion to 256 nodes would preserve their growth path. They could also lose the back end FC, in favor of iSCSI or SSA, further driving out cost and increasing flexibility. Since their internode communication is over standard gig-E, they could also geographically disperse their nodes over a MAN and tap the high-availability market. They might also port a (future) cluster-aware ZFS and allow customers to lose the expensive and latency-inducing RAID controller layer altogether.

The StorageMojo take
Crosswalk, founded by Jack McDonnell, who had good success with McData, with CTO Raju Bopardikar, formerly of ill-fated Cereva, certainly has the bones for success. They’ve done a number of important things right: no host software; no custom silicon; commodity hardware; partnering where possible; horizontal scaling. This puts them ahead of getting-long-in-the-tooth startup BlueArc.

The High Performance Computing (HPC) focus is questionable. My experience is that folks who start with HPC stay there, because each HPC customer has so many interesting requirements that engineers love to solve and that will never make a dime for the company. Performance-driven customers ask for all kinds of enhancements that most commercial customers will never notice. So I wish them luck expanding past that market.

Another concern: the Denver location. STK culture - mainframe, big iron, slow to adapt - looms so large in storage circles there that there really haven’t been many successful storage startups. Jack overcame that at McData, although you might recall that McData sold mainframe ESCON directors to IBM for years before getting into, and being largely outmaneuvered in, the Fibre Channel market. Does Crosswalk really want to go after the big NetApp and EMC NAS boxes?

Crosswalk has the potential to upset the current NAS players. Yet I think they’ll need a stronger cost argument in addition to cool technology. Fortunately their architecture gives them lots of options. I wish them luck.

Open Source FreeNAS Project

July 21st, 2006 by Robin Harris in Enterprise, NAS, IP, iSCSI

The Storage Forum provides a brief review of the open source FreeNAS project. FreeNAS is a small (less than 16MB) operating system based on FreeBSD 6 that provides free Network-Attached Storage services: CIFS, FTP and NFS.

The money quote:

Testing I did in my lab with FreeNAS on a high end Xeon with SATA disks simply screamed. There’s just no other way to describe it. I ran my throughput tests five times before I was sure I hadn’t made some fundamental mistake with the math. I saw $75k performance from a $5k box running a beta NAS software platform.

[bolding added]

Help Keep Vendors Working For Your Business
Open source software is exerting enormous pressure on commercial vendors to keep innovating technically and commercially. Look at the progress Microsoft has made with Windows Server, scaling it up to compete with Apache and Linux, and compare that to the glacial pace of Internet Explorer until Firefox came along. Vendors are businesses - if they don’t have a business reason to invest they won’t.

Which is why open source storage is so important. Whether it is FreeNAS or ZFS, open source can either drive cost out of commercial products or force them to improve. There is a business case that any company with an IT budget over $1 million a year should set aside 2% for investing in and supporting the open source alternative to their most important platform. This 2% For Open investment will pay for itself in lower product costs, improved functionality and a stronger bargaining position.

Lend A Hand
Those of you with technical skills might want to help. And those of you who supply low-end NAS to clients might want to look at using it - and helping.

Sun’s X4500 Architecture

July 19th, 2006 by Robin Harris in Enterprise, NAS, IP, iSCSI

The biggest knock against the X4500 I’ve heard is that it is too expensive. From a storage perspective it is actually absurdly cheap compared to the 5-10x charged by the name-brand storage vendors - and for lower data integrity than the X4500/ZFS system offers.

Considered as a server it is a different story. After all, the most popular servers are in the $1k-$5k range, so why not just glue the disks on with some cheap PCI-X adapters and be done with it?

Several reasons, as this brief post at c0t0d0s0.org notes. He points out some of the trade-offs that low-cost servers make in order to meet their price points, like bottlenecked architectures that offer connectivity at the expense of performance.

Sun’s server group should get some benchmarks out pronto of the X4500’s iSCSI and NFS performance, along with some TPC numbers. That is risky, since the storage group might hijack it out from under the server group. Yet the storage folks have their hands full with staff turnover - always a problem, but much worse right now - and the struggling STK integration effort. People need to see what this machine is capable of, sooner rather than later.

Start-up Watch: Gear6 and Njini

July 18th, 2006 by Robin Harris in Enterprise, Future Tech, NAS, IP, iSCSI

A couple of startup funding announcements caught my eye yesterday, and when I looked a little further I found them both intriguing - but maybe not for the reasons investors would like.

Gear6
is focused on the Server-Storage Performance Gap, which hasn’t been top-of-mind with too many folks I know. Gear6 is being coy about their product - announcement due in October - but after some snooping around I’m pretty confident about the outlines.

Their solution is a honking-fast Linux-based parallel MIMD FC-SAN non-blocking I/O NAS appliance designed to handle hundreds of thousands of concurrent I/O’s from large numbers of servers, clustered or not. Designed to be highly scalable, you’ll be able to add processors and network interconnects just as you would in any cluster or grid. They don’t seem to have any custom hardware, preferring to use very smart software, such as the CxC parallel computing language, on commodity clustered servers and interconnects.

So it could be very cool. Commodity hardware, smart software, really scalable. The first Google-like infrastructure product for the enterprise?

On the other hand, they could definitely use some help putting together a more compelling story. The Server-Storage Performance Gap began about two weeks after the disk drive was invented. We’ve survived it this long because data is getting cooler, RAM is getting cheaper and software is getting smarter.

And, of course, there’s the problem of actually making it work: scalable, high-performance cluster apps are very hard. My heart is with the Gear6 folks and I hope they do well; my head needs more data.

Njini: Tag, You’re It
Njini seems to be a meta-data extension product not terribly different than what Abrevity has been doing for a couple of years. The theory is that by adding meta-data to a file at or near creation time, you can do all sorts of money and time saving things, like reducing the number of stored copies or managing the IT investment in the data over time. Njini has several apps that use the extended meta-data for management.

It’s a good theory. While tagging might be seen as an alternative to search, I think they are complementary. I just don’t know what will stop extended meta-data from being built into filesystems, as it has been with Mac OS X HFS+. That could take years, so Njini has some runway. In the long run, tagging is a feature.

In the short run it could be a good business. Good luck, Njini.

Ray Ozzie’s Question: Are You Ready?

June 16th, 2006 by Robin Harris in Enterprise, Future Tech, NAS, IP, iSCSI

What about the data center investments of Yahoo, Google and Microsoft? Microsoft has committed to spend over $2.5B on capital and acquisitions - much of it to compete with Google. Yahoo will also likely spend about $1B just on computers and equipment. Google will spend over $1.5B this year on plant and equipment - more than their past three years spending combined! Call it $4 billion+ in greenfield IT spend from just these three companies. In one year.

Big numbers. Maybe a million new servers. Hundreds of petabytes of storage.

Enormous economies of scale.

Finally Someone Asks The Obvious Question:
Ray Ozzie, Microsoft’s CTO, dropped the other shoe at a recent Microsoft conference when he asked listeners to consider

“How might IT organizations ultimately take advantage of these data center investments? Might they be used to augment your infrastructure to take advantage of our scale?” he asked. He also asked . . . how the infrastructure investments could help IT departments decrease the complexity and increase the agility of their own organizations.

Here’s another question Ray could have asked:

“Storage vendors - since you are such a big piece of IT spending - how will you respond to this threat to your business model?”

Remember, Microsoft, Google and Yahoo aren’t spending a nickel of that $5B on EMC, Hitachi, IBM or Sun storage (NetApp has Yahoo, but for how long?). You’ve already lost that business.

Now, What About the F1000 business?
Willful obliviousness will be in style this fall in Hopkinton, Santa Clara, and San Jose. At your next CIO confab, ask who, given the security issues of accessing and storing critical corporate data on third party systems, will be moving apps off the big iron to the storage cloud in the sky. Nod sagely at their horrified reaction. Tell yourself and everyone else just how wonderful business will be.

There are several scenarios on how this may play out. But of this much I am certain. Security issues will either get solved or get managed. Business units, not CIOs, drive corporate IT spending. Infrastructure with huge cost advantages is fertile ground for new killer apps. Millions of smart people are working late into the night trying to do cool things. Billions in advertising dollars are moving to the web.

Money, technology, competition, innovation - this very yeasty mix will rise, and a big element will be their cost advantage over traditional IT. Right now some soon-to-be dropout or a couple of PhD candidates could be putting the finishing touches on the Next Big Thing, something that will also finish the storage business as we know it.

Vendors, if you have any strategic thinkers worthy of the name, they better get cracking.


