Metadata structures
The basic insight behind Isilon’s cluster is that it manages files on a pool of blocks. What we know as RAID levels exist on a per-file basis, not per array. Unlike Google’s GFS, which relies on file replication alone, Isilon implements file replication through block replication and also offers parity-based file protection. We see this in parts of the patent’s sample metadata structure:
| Field | Description |
|-------|-------------|
| Mode | Kind of file: regular, directory, etc. |
| Owner | SSU account |
| Timestamp | Last modification time |
| Size | Size of metadata file |
| Parity count | Number of parity devices used |
| Mirror count | Number of mirrors |
| VHS count | Number of virtual hot spares |
| Version | Version of metadata structure |
| Type | Type of data location table |
| Data location table | Address of the table, or the table itself |
| Reference count | Number of metadata structures referencing this one |
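To make the per-file protection idea concrete, here is a minimal sketch of such a metadata record in Python. The field names follow the patent’s table above; the types, defaults and example values are my own guesses, not Isilon’s.

```python
from dataclasses import dataclass
from typing import List, Union

@dataclass
class FileMetadata:
    """One per-file metadata record, loosely following the patent's sample."""
    mode: str                    # kind of file: "regular", "directory", etc.
    owner: str                   # owning SSU account
    timestamp: float             # last modification time (epoch seconds)
    size: int                    # size of the metadata file, in bytes
    parity_count: int            # parity devices protecting this file
    mirror_count: int            # mirrors of this file
    vhs_count: int               # virtual hot spares reserved for it
    version: int                 # version of this metadata structure
    table_type: str              # "direct" or "n-level indirect"
    data_location_table: List[Union[int, str]]  # block addresses, or a table address
    reference_count: int = 1     # metadata structures referencing this one

# Two files in the same pool can carry different protection:
video = FileMetadata("regular", "ssu3", 0.0, 512, parity_count=2,
                     mirror_count=0, vhs_count=1, version=1,
                     table_type="direct", data_location_table=[17, 42, 99])
config = FileMetadata("regular", "ssu1", 0.0, 512, parity_count=0,
                      mirror_count=3, vhs_count=0, version=1,
                      table_type="direct", data_location_table=[7])
```

Because parity count and mirror count live in the file’s own metadata, protection level really is a per-file attribute rather than an array-wide setting.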
What most arrays perform globally – hot spares, RAID levels, mirroring, recovery data – may be done on a per-file basis with Isilon. Furthermore, there is a lot of flexibility in the data location structure – direct addressing and multiple levels of indirect addressing – giving the system multiple opportunities to optimize access to blocks that may be widely scattered, especially in high-performance or failure modes.
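Here is a hedged sketch of how a direct vs. multi-level indirect data location table might be walked; the names and fan-out are mine, not the patent’s.

```python
BLOCK_STORE = {}   # hypothetical cluster-wide pool: block address -> contents

def lookup(table, levels, index, fanout=1024):
    """Walk `levels` layers of indirection to find data block `index`.

    With levels == 0 the table holds data-block addresses directly;
    with levels == 1 each entry addresses a child table of `fanout`
    data addresses; and so on.
    """
    while levels > 0:
        span = fanout ** levels                    # data blocks covered per entry
        table = BLOCK_STORE[table[index // span]]  # descend one layer
        index %= span
        levels -= 1
    return table[index]                            # address of the data block itself

# Tiny demo with fan-out 2 and one level of indirection:
BLOCK_STORE["t0"], BLOCK_STORE["t1"] = ["d0", "d1"], ["d2", "d3"]
assert lookup(["t0", "t1"], levels=1, index=2, fanout=2) == "d2"
```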
The data location flexibility is probably best seen in the ease of adding storage to an Isilon cluster: add an SSU and the pool of blocks grows, and the existing SSUs can start moving data based on their own needs.
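A toy illustration of the “bigger pool” idea follows; the balancing policy is mine, and a real system would also weigh locality, protection constraints and load, not just block counts.

```python
def rebalance(ssus):
    """Even out block counts by moving blocks from the fullest SSU to the
    emptiest. `ssus` maps SSU name -> list of block ids. A toy policy only."""
    while True:
        fullest = max(ssus, key=lambda s: len(ssus[s]))
        emptiest = min(ssus, key=lambda s: len(ssus[s]))
        if len(ssus[fullest]) - len(ssus[emptiest]) <= 1:
            return
        ssus[emptiest].append(ssus[fullest].pop())

# Adding a fourth, empty SSU simply enlarges the pool; the others shed
# blocks to it over time.
pool = {"ssu1": list(range(30)), "ssu2": list(range(30, 60)),
        "ssu3": list(range(60, 90)), "ssu4": []}
rebalance(pool)
assert all(len(blocks) >= 22 for blocks in pool.values())
```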
Finally, the flexibility in the metadata structure’s size and version fields indicates that Isilon may add new fields as it sees fit, building in new functionality with software upgrades.
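For instance (my sketch, not anything from the patent), a reader keyed off the version field can skip fields it does not yet understand:

```python
FIELDS_BY_VERSION = {
    1: ("mode", "owner", "timestamp", "size"),
    2: ("mode", "owner", "timestamp", "size", "parity_count", "mirror_count"),
}

def read_metadata(record: dict, version: int) -> dict:
    """Keep only the fields this software release understands; anything
    newer rides along untouched until an upgrade starts using it."""
    known = FIELDS_BY_VERSION[min(version, max(FIELDS_BY_VERSION))]
    return {k: record[k] for k in known if k in record}
```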
Processes, processes, ad infinitum
At this point the patent goes into examples of all the processes that might be used to perform needed functions, such as data lookups and virtual hot spare provisioning. I’m sure Isilon engineers are always looking at ways to improve these essential activities, so the patent descriptions are of limited value.
The StorageMojo take
I generally consider architecture-based arguments dubious (see Architectural Appeal). Yet I also believe that storage has some secular trends (see Architecting the Internet Data Center) that one ignores at one’s peril.
There is a lot to like in the Isilon architecture, starting with their fundamental abandonment of the volume or LUN construct in favor of the storage pool. They realize that customers want to manage files, not disks. From that basic insight Isilon has put together a flexible product that is, by all reports, easy to manage and expand. I love the file-based virtual RAID capability, for one. Also, their price-neutral adoption of InfiniBand is smart from both business and technical perspectives.
Where I wonder how they will play out comes from studying Google and Amazon. Isilon’s architecture buys its flexibility with a variety of resources, some cheap and some dear. CPU cycles are cheap and getting cheaper, so all the computation required for parity RAID and other functions isn’t a big concern. As a system scales, though, even inexpensive components whose cost is a small percentage of the system start to become noticeable in absolute dollars. At some point I would expect a system that doesn’t do all the computation the SSUs do to have a price advantage.
Network overhead is a bigger concern: having data spread across multiple SSUs means a fair amount of coordination, data fetching, cache invalidation and so on. Isilon engineers are well aware of these issues, which is why they support InfiniBand and, before that, I believe, dual Ethernet ports and jumbo frames on each SSU.
The biggest issue, IMHO, is the cost of the disk I/Os. Breaking a file across multiple SSUs means multiple I/Os to write and, more importantly, to access a single file. Isilon concentrates on large (>1 MB) files to minimize this problem, yet the overhead must cost something. Bottom line: I suspect that Isilon has scaling problems, either in I/Os or economics, due to their architecture. At what capacity these issues become apparent is beyond my ability to estimate. Readers?
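Some back-of-the-envelope arithmetic (my numbers, not Isilon’s) on why striping multiplies I/O counts:

```python
import math

def write_ios(file_mb, chunk_mb, data_width, parity_chunks):
    """I/Os to write one file striped `data_width` chunks wide with
    `parity_chunks` parity chunks per stripe. One I/O per chunk;
    ignores metadata updates, caching and coalescing."""
    data_chunks = math.ceil(file_mb / chunk_mb)
    stripes = math.ceil(data_chunks / data_width)
    return data_chunks + stripes * parity_chunks

# A 10 MB file in 1 MB chunks, 8 data SSUs wide plus 1 parity:
# 10 data I/Os + 2 parity I/Os = 12, spread across the network.
print(write_ios(10, 1, 8, 1))   # -> 12
```

A monolithic array would satisfy the same write with fewer, larger I/Os; the question is at what scale the difference shows up in dollars or latency.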
That said, there is no reason that they can’t be very successful in the rather large space they have to play in. Unstructured data – files – is 85% of the data out there. There are lots of companies that would rather not manage a clumsy, LUN-based infrastructure for unstructured data.
I hope to look at Isilon’s business model through their IPO filings. They’ve had a successful launch, so it must look ok.
Comments welcome, as always. Moderation turned on to keep spam at bay.
Robin,
You say “There is a lot to like in the Isilon architecture, starting with their fundamental abandonment of the volume or LUN construct in favor of the storage pool”.
A simple NFS server running under Linux can be protected via an internal dual-parity (DP) RAID file system. This eliminates the need for external RAID… but it is slow, for various good technical reasons, part of which is the XOR generation.
It is a ‘low cost’ solution.
Isilon seems to be using ‘commodity motherboard’ hardware running a specialized file system… hence the opportunity to eliminate the multiple mapping layers associated with traditional RAID firmware. Not really a new architecture.
While this may deliver some marginal performance benefits, the protection is still based on computationally intensive RAID 5/6 algorithms… which probably now run on the central processor… which is already very busy supporting NFS, TCP/IP, etc. It remains to be seen how fast all this is under writes.
An InfiniBand switch is not a new concept. Ethernet is too slow in clustered architectures; I am surprised that they started with Ethernet.
Very little choice here… IB or 10 Gbit Ethernet… both expensive.
Let’s not forget that there is only *one* manufacturer of IB chips. IB host adapters and switches are not cheap… but they come with drivers, etc., making for an easier integration job.
Striping across multiple enclosures does not add to performance until the existing data is moved to the new enclosure. Re-striping is very time-consuming and may be prone to failure… no immediate scaling in performance for the existing users, plus the extra management… forever!
They reportedly use a large ‘chunk’ size… small writes will produce a ‘storm’ over the IB switch, i.e. RAID algorithms trying to cope with partial writes… this is why they need IB.
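That “storm” is the classic RAID 5 read-modify-write cycle: updating one chunk means reading the old data and old parity before writing both back. A minimal sketch, assuming simple XOR parity (my code, not Isilon’s):

```python
def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def partial_write(old_data: bytes, old_parity: bytes, new_data: bytes):
    """Update one chunk of a stripe: two reads (old data, old parity)
    and two writes (new data, new parity), four I/Os per small update,
    some of which must cross the interconnect to other SSUs."""
    new_parity = xor(old_parity, xor(old_data, new_data))
    return new_data, new_parity
```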
Performance is probably reasonable under sequential read-only traffic… but so what? It is very limited by the 1 Gbit Ethernet host connection… plus the overhead of the protocol stack. Extra host connections require additional enclosures… more cost and re-striping.
The next step here can only be 10 Gbit Ethernet… more cost.
It remains to be seen how ‘revolutionary’ all this is.
Richard,
You’re spot on. I question the scalability of Isilon because of the parity RAID overhead. Google optimizes for cheap CPU cycles and they don’t do any parity RAID. Who do you think is smarter?
YottaYotta used InfiniBand, which is a technology I like. I don’t know what the state of InfiniBand management software is these days, but I’ve heard it isn’t in the same league as Ethernet’s.
I do applaud Isilon for virtualizing the block pool, which enables minimizing storage management, the major cost of SANs today. They aren’t going after the 15% of structured data – they want the 85% of unstructured data that no one has time to manage. Revolutionary? No. Lucrative? Could be. Sounds like a plan to me.
Cheers,
Robin