Unexpectedly, this has turned into Isilon Week here at StorageMojo. I think everyone is excited by Isilon’s successful IPO, the first, I hope, of many for other storage startups.

I’ve already commented on Isilon’s surprisingly uninformative website. They have cool technology, so you’d think they’d want to talk about it. Maybe not talking about technology is how one has a successful IPO these days.

But wait! There’s more!
However, Isilon had a patent granted December 5, 2006, to inventors Sujal M. Patel (company founder), Paul A. Mikesell, Darren P. Schack and Aaron J. Passey. And wonder of wonders: the patent is surprisingly readable! If you’ve read many patents, you know most of them read like architecture papers rendered into insurance-company legalese. The Isilon patent isn’t. It is still 15 pages of fine print, broken up by the USPTO’s really weird online publishing protocol, with the modules that make up the system dissected into an out-of-order presentation. But compared to most patents it is a paragon of clarity, even though a really dumb error crept in – see later in this post.

Some friends showed up; the 3-2-1 Margaritas started flowing. See you tomorrow!

OK, so it’s the day after tomorrow – and the search for storage nirvana continues . . . .

One caveat: I’m using the patent rather than a technical paper on the actual product to explicate Isilon’s architecture. Patents are typically written to embody a lot more functionality than the first gen products whose IP they are protecting. So what I’m describing here may or may not be part of Isilon’s shipping products. That said, my gut tells me that while there may be features that haven’t been implemented, the patent is, in fact, illustrative of the Isilon architecture. Isilon guys are welcome to chime in and correct any misperceptions. I see Isilon folks visiting regularly, so don’t be shy. Sujal?

Further, while the patent nominally covers what they call a “virtual hot spare”, it seems to describe most of their system.

The Isilon layer cake recipe:
The core of Isilon’s offering is supposed to be the Intelligent File System (IFS). Using a standard NAS protocol, the user requests a file. That request goes to Isilon’s Linux-based server, where the kernel-space Virtual File System receives the request. The VFS maintains a buffer cache that stores metadata generated by the lower layers of the IFS. The VFS layer talks to the Local File System layer, which

. . . maintains the hierarchical naming system of the file system and sends directory and filename requests to the layer below, the Local File Store layer. The Local File Store layer handles metadata data structure lookup and management.

The Local File System layer speaks, in turn, to the Local File Store layer – don’t worry, the quiz will be open-book – which translates the logical data request to a specific block request. That request goes to the Storage Device layer, which hosts the disk driver.
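
To make the layer cake concrete, here’s a toy Python sketch of that read path. The layer names come from the patent; the classes, methods and the in-memory “disk” are my own invention, not Isilon’s code.

    class StorageDeviceLayer:
        """Hosts the disk driver -- here just a dict of block number -> bytes."""
        def __init__(self, blocks):
            self.blocks = blocks

        def read_block(self, block_no):
            return self.blocks[block_no]

    class LocalFileStoreLayer:
        """Translates a logical file request into specific block requests."""
        def __init__(self, device, block_map):
            self.device = device
            self.block_map = block_map          # filename -> list of block numbers

        def read(self, name):
            return b"".join(self.device.read_block(b) for b in self.block_map[name])

    class LocalFileSystemLayer:
        """Maintains the hierarchical namespace and passes lookups downward."""
        def __init__(self, store):
            self.store = store

        def lookup(self, path):
            name = path.strip("/").split("/")[-1]   # trivial name resolution
            return self.store.read(name)

    class VirtualFileSystemLayer:
        """Kernel-side entry point; the real VFS also caches metadata (omitted here)."""
        def __init__(self, lfs):
            self.lfs = lfs

        def open_and_read(self, path):
            return self.lfs.lookup(path)

    # Wire the layers together and service a NAS-style read request.
    device = StorageDeviceLayer({0: b"hello ", 1: b"world"})
    store = LocalFileStoreLayer(device, {"greeting.txt": [0, 1]})
    vfs = VirtualFileSystemLayer(LocalFileSystemLayer(store))
    print(vfs.open_and_read("/exports/greeting.txt"))    # b'hello world'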

That’s the description of the IFS. Notice anything missing?

Right!
Nothing coordinates the IFS across the cluster. That piece is handled, according to the patent’s tortured taxonomy, by the Smart Storage Units, which may implement their functionality in hardware, firmware or software. See what I mean about patent language? And this is a readable one!

Modular Smart Storage
The Smart Storage Unit (SSU) consists of a management module, a processing module, a cache, a stack and a storage device. The management module does about what you’d expect, monitoring and error logging and such.

The real work gets done in the processing module which consists of another set of modules:

  • Block allocation manager
  • Block cache module
  • Local block manager
  • Remote block manager
  • Block device module

Here’s a description of each:

  • Block allocation manager consists of three submodules
    • Block Request Translator Module receives incoming READ requests, performs name lookups, locates the appropriate devices, and pulls the data from the device. The module sends a data request to the local or remote block manager module depending on whether the block of data is stored locally or remotely in another smart storage unit. It can also respond to device failures by requesting parity data to rebuild lost data.
    • Forward Allocator Module (FAM) allocates device blocks for writes based upon redundancy, capacity and performance. It receives statistics from other SSUs and uses them to optimize the placement of new data. The statistics include measurements of CPU, network and disk utilization. It also receives latency information from remote block managers and may underutilize slow SSUs, if possible within the redundancy settings. Latency is logged and reported; reasons for slow performance might include bad network cards or a device being hammered by demand.

      A variety of strategies are used to allocate the data, such as striping it across multiple SSUs. The file system handles the striping, so disks of different sizes and performance can be mixed. The module looks up the root metadata data structure for disk device information and calculates the number of smart storage units across which the file data should be spread, using performance or other rules. The FAM may provide no redundancy at all, parity protection or mirroring, while also taking SSU capacity, performance and network or CPU utilization into account when allocating incoming data. (A rough sketch of this kind of allocation follows the list.)

    • The Failure Recovery Module (FRM) recovers data that is no longer available due to a device failure. The remote block manager detects failures and notifies the FRM, which locates data blocks that no longer meet redundancy requirements, recreates the data from parity information and asks the FAM to allocate space for it. Sysadmins can limit rebuild resource consumption. The FRM is where the virtual hot spare comes in: it’s a set of idle storage blocks distributed among the blocks already present on the SSUs. It sounds cool, yet it looks like all it does is reserve some blocks for rebuild purposes.
  • Block Cache Module manages caching, name lookups and metadata data structures. It caches data and metadata blocks using the Least Recently Used caching algorithm, though it may vary the caching protocol to respond to the system’s performance levels. (There’s a toy LRU sketch after the list, too.)
  • Local Block Manager manages the allocation, storage, and retrieval of data blocks stored – you guessed it! – locally.
  • Remote Block Manager Module manages inter-device communication, including block requests, block responses, and the detection of remote device failures. The module resides at the Local File System layer.
  • Block Device Module hosts the device driver for the particular piece of disk hardware used by the file system.
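
Since the Forward Allocator is where most of the interesting decisions happen, here’s a back-of-the-envelope Python sketch of that kind of allocation: stripe a file’s blocks across the least-busy SSUs and add one XOR parity block per stripe. The load numbers, stripe width and data structures are all my own assumptions, not Isilon’s allocator.

    from functools import reduce

    def xor_parity(blocks):
        """Bitwise XOR of equal-length blocks -- the parity used for rebuilds."""
        return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

    def allocate_stripe(data_blocks, ssu_loads, stripe_width=3):
        """Place stripe_width data blocks plus one parity block on the
        least-loaded SSUs. ssu_loads maps SSU name -> utilization (0.0-1.0),
        the kind of statistic the patent says the units exchange."""
        candidates = sorted(ssu_loads, key=ssu_loads.get)   # least busy first
        chosen = candidates[:stripe_width + 1]              # +1 slot for parity
        stripe = list(data_blocks[:stripe_width])
        stripe.append(xor_parity(stripe))
        return dict(zip(chosen, stripe))                    # SSU name -> block

    loads = {"ssu-a": 0.10, "ssu-b": 0.85, "ssu-c": 0.30, "ssu-d": 0.20}
    placement = allocate_stripe([b"AAAA", b"BBBB", b"CCCC"], loads)
    # ssu-a, ssu-d and ssu-c get the data; busy ssu-b only gets the parity block.

A real allocator would also honor mirroring settings, account for remaining capacity and rebalance as statistics change; the point is just that placement is a per-write policy decision rather than a fixed RAID geometry.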
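
And for completeness, the Block Cache Module’s LRU behavior boils down to something like this. A minimal sketch only; the capacity and interface are invented.

    from collections import OrderedDict

    class BlockCache:
        """Toy LRU cache for data and metadata blocks."""
        def __init__(self, capacity=4):
            self.capacity = capacity
            self._cache = OrderedDict()          # block id -> data, oldest first

        def get(self, block_id):
            if block_id not in self._cache:
                return None                      # miss: fetch from the SSU instead
            self._cache.move_to_end(block_id)    # mark as most recently used
            return self._cache[block_id]

        def put(self, block_id, data):
            self._cache[block_id] = data
            self._cache.move_to_end(block_id)
            if len(self._cache) > self.capacity:
                self._cache.popitem(last=False)  # evict the least recently used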

Continued tomorrow, Tuesday.
Oh, and that error? The patent says

This parity information is used to perform data recovery when a disk failure occurs. The lost data is recalculated from taking the bitwise XOR of the remaining disks’ data blocks and the parity information. In typical RAID systems, the data is unrecoverable until a replacement disk is inserted into the array to rebuild the lost data.

Of course, a typical RAID system keeps reading and writing data after a disk failure; otherwise it wouldn’t be much use. What I suppose they meant was that the redundancy doesn’t get recreated until a replacement disk is inserted. Even that isn’t quite right: a hot spare is often already allocated, so the rebuild starts automatically. An odd oversight.
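
For what it’s worth, the XOR arithmetic behind that recovery really is that simple. A few lines of Python, with toy values of my own, not the patent’s:

    # Single-parity recovery: parity = XOR of the data blocks, and any one
    # lost block = XOR of the surviving blocks and the parity.
    d1, d2, d3 = b"\x0f\x0f", b"\xf0\xf0", b"\x55\x55"
    parity = bytes(a ^ b ^ c for a, b, c in zip(d1, d2, d3))

    # Say the disk holding d2 dies; rebuild its blocks from the survivors.
    rebuilt = bytes(a ^ c ^ p for a, c, p in zip(d1, d3, parity))
    assert rebuilt == d2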

Update: further research has left me in doubt about Isilon’s host OS. I said Linux above, but a couple of references indicate it may be FreeBSD. I’ve invited the Isilon folks to comment, so maybe they’ll straighten this out.

Update II: Got a detailed comment from a reader who’s looked – briefly – under the covers of the Isilon box. Recommended!

That’s all for today. Comments welcome, of course.