Inside Skinny On Isilon

by Robin Harris on Wednesday, 21 February, 2007

Already did the outside skinny
I talked to Sujal Patel, founder and CTO of Isilon, last week to learn more about Isilon. As I’d mentioned in my posts on the company, I was surprised that there wasn’t more technical information about the products on their website. On the other hand, I believe that may be a wise decision, since my experience is that if you talk too much about the “how”, prospects tend to forget about the “what”. And the “what” is what sells.

But I like the “how”!
So Sujal agreed to talk a bit about the how and to respond to some of my concerns about the Isilon architecture. I also got some business info. I’m supplementing Sujal’s comments with a bit of information from SEC filings and RBC Capital Markets research.

Business first
Isilon today has over 300 customers, 88 added just last quarter. They’ve been recruiting resellers heavily, which makes sense to me since with such an easy-to-manage product they don’t need the costly SE hand holding that most SANs require.

Rain city veterans
Sujal started Isilon in Seattle about the same time as YottaYotta, where I ran marketing for several years. 2000 was a tumultuous year: Gilderesque predictions of bandwidth nirvana; rampant easy money delirium; overheated visions of massive internet investment. That both companies are still around is a minor miracle. Sujal said that for Isilon, remarkably, the goals have stayed the same. The product was intended for media servers and other large-file, mostly sequential workloads. They kept at it and the business developed. They didn’t allow themselves to get distracted by the uproar of the dot-bomb crash, stayed focused, and the rest is history.

One measure of their success: 50% of their business is from repeat customers. A couple of very large customers – Kodak and Comcast – account for less than 20% of their sales, and as Isilon brings on new customers their importance is declining.

Down is the new up
Their move downmarket with the new IQ200, a 3-node cluster listing for less than $40K, was, Sujal reports, driven by the channel focus. Not only does that not surprise me, but I’d guess there will be an under-$20K IQ100 late this year. Three-node VAXclusters were the most popular in the 1980s due to the functional redundancy. Lose one node and you still have two-thirds of your processing power and some redundancy. I don’t think that dynamic has changed.

Isilon also has very good gross margins – over 50%. Not as good as the products I’ve marketed, but hey, they’re young. I love to see vendors making good margins while providing good value: they’ll be around. Buy with confidence.

Now for the technology
Isilon’s technology is more complicated than the Google File System, which is my favorite model of a bare bones, low-cost, scale-out cluster storage system. Isilon’s system is fully distributed, which is more elegant than Google’s master/server architecture, yet adds overhead. Isilon supports RAID5-like functionality, Google nothing but file replication.

With Isilon you can choose n+1, n+2, n+3, n+4 redundancy on a per-file basis at a cost of 20% increments of capacity usage. If a node fails, the system uses its Virtual Spares to maintain the requested redundancy.
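The capacity cost of n+k protection can be sketched with a little arithmetic. This is a rough model, not Isilon’s actual layout: it assumes each protected stripe holds n data units plus k parity units of equal size, and the function name is made up for illustration. The post does not say what stripe width the 20% figure assumes.

```python
from fractions import Fraction


def parity_overhead(data_units: int, parity_units: int) -> Fraction:
    """Share of raw capacity spent on parity in an n+k stripe (assumed model)."""
    if data_units < 1 or parity_units < 0:
        raise ValueError("need at least one data unit and no negative parity")
    return Fraction(parity_units, data_units + parity_units)


# Hypothetical stripe with 4 data units and 1 to 4 parity units.
for k in range(1, 5):
    share = parity_overhead(4, k)
    print(f"4+{k}: {float(share):.0%} of raw capacity is parity")
```

In this model a 4+1 stripe spends 20% of raw capacity on parity. Wider stripes dilute the cost: 8+2 is also 20% while tolerating two failures.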

Linear, let’s get linear
A fully distributed system needs to do a lot of message passing to keep everyone coordinated. These messages are small, but their latency is a problem. That is why their no-cost adoption of Infiniband is smart: the under 100 ns latency is a real performance boost. They handle subnet management in their software, so customers don’t need to be Infiniband gurus to get it working.

Sujal says that messaging traffic grows linearly as you add nodes. There seem to be a couple of reasons for that. First, even in a large cluster there are typically no more than 16 nodes involved in any one I/O. Second, Isilon uses three meta-data “authorities” for each file. So while the data may be spread from here to eternity, the metadata isn’t, reducing the coordination traffic required to handle updates and cache invalidation.
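A toy model shows why bounded participation keeps messaging linear. It is a back-of-envelope sketch, not Isilon’s protocol: it assumes one message per participating node and per metadata authority for every I/O, and a fixed I/O rate per node. The function names and the rate are invented for illustration, and an all-to-all scheme is included only for contrast.

```python
MAX_NODES_PER_IO = 16      # from the post: at most 16 nodes take part in one I/O
METADATA_AUTHORITIES = 3   # from the post: three metadata authorities per file


def bounded_messages(nodes: int, ios_per_node: int = 100) -> int:
    """Total messages when each I/O contacts a bounded set of nodes (assumed model)."""
    per_io = min(nodes, MAX_NODES_PER_IO) + min(nodes, METADATA_AUTHORITIES)
    return nodes * ios_per_node * per_io


def all_to_all_messages(nodes: int, ios_per_node: int = 100) -> int:
    """Total messages if every I/O had to notify every other node (contrast case)."""
    return nodes * ios_per_node * (nodes - 1)


for n in (24, 48, 96):
    print(n, bounded_messages(n), all_to_all_messages(n))
```

Once the cluster is larger than 16 nodes, doubling the node count doubles the bounded total, while the all-to-all total roughly quadruples.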

Another strategy for reducing latency in a fully distributed system: NVRAM. Isilon uses battery-backed NVRAM cards to eliminate disk writing latency. Each node can safely acknowledge a write in microseconds instead of waiting milliseconds for a disk I/O to complete. Sweet.

96 nodes to the tune of “96 Tears”
Isilon currently has a 96 node limitation, which I found interesting because VAXclusters also topped out at 96 nodes. Made me wonder if there was something mystical about the number 96. Sujal says no, it is a testing limitation. They started with 12 nodes and have gradually raised the number to 15, 32, 35, 88, and now 96 nodes. It takes a lot of space and energy to set up to test that many nodes, even if you figure out the financial implications – would you want to sell 96 nodes as used equipment?

The StorageMojo take
Isilon has cool technology and a lead in a new market segment. They’re focused, smart, funded and growing fast. Are they invulnerable? Not even close. But the big iron vendors need to think seriously about the meaning of disruptive technology.

Isilon might be an object lesson.

Comments welcome, as always. Moderation enabled because moderation is a virtue, except in the defense of liberty.


{ 4 comments }

Javier Thursday, 22 February, 2007 at 4:02 am

Hi,

I’ve been an Isilon cluster user for almost a year, and we have been very happy with the product. We only have a small 4 node 3000i system (12 TB raw, 9 usable) and we work with big media files (an hour of video is 25GB). We get around 1.2Gbps write speed and almost the same on read speed. Of course, all of it is sequential and mostly over FTP or Samba.

Javier

Richard Thursday, 22 February, 2007 at 5:47 am

The info is still very ‘skinny’, but at least it confirms that “the product is intended for media servers and other large-file, mostly sequential workloads”.

At … http://biz.yahoo.com/prnews/070207/sfw090.html?.v=83 …they don’t seem to be too shy in stating that they can deliver “10 gigabytes per second of performance in a single file system and single volume”.

Anything is possible under “reads”, given enough nodes with correct placement of data …. there is no traffic across their coherency switch.

It may be “wise” and easy for them to substantiate this in simple terms….how many nodes… is it purely ‘read’ level performance, etc .

“Sujal says that messaging traffic grows linearly as you add nodes”.

This may be true for meta-data only, residing in just three places … but what happens to data traffic…. especially under partial and ‘unaligned’ RAID 5 writes … or during reconstructs under R5 reads..?

You report ..
“Their move downmarket, with the new IQ200, a 3-node cluster listing for less than $40K ”. I really question how much ‘downmarket’ this is…. including how they arrive at the stated 50% margin.

Three x 1U chassis, motherboard, single power supply with Ethernet switching. Hardware cost much like 3 x 1U single processor servers …. plus 12 SATA disks at $300 each . .. . and no ‘free’ IB gear is involved here.

This is a long way from a ‘commodity’ cluster …. end of comment.

Blake Thursday, 22 February, 2007 at 2:08 pm

Same here. I’ve been looking at their product for over a year, but finally bought some about 3 months ago. My 20 node cluster (14 accelerators and 6 1920′s) maxed out at 2.2 GB / sec (that’s gigabytes). I saved that picture to show other “clustered” storage vendors what they are up against.

Richard Thursday, 22 February, 2007 at 8:47 pm

Blake,
This is a good start to some simple numbers. …thank you.

It looks like you are getting 30-50% of the available bandwidth, i.e. 30MB per disk.
I am assuming that this is over 20 Ethernet host channels, i.e. 55 MB per 1 Gbit channel.

1. Is this under ‘read’ or a mixture of read/writes? What is your write speed?
2. Are you running RAID 5 and / or mirrors …if so how many mirrors?
3. What is the ‘useable’ capacity across your six x 1920 enclosures ?

A comment regarding the layout of data and the number of read/write ‘streams’ would help.
