Metadata structures
The basic insight behind Isilon’s cluster is that it manages files on a pool of blocks. What we know as RAID levels exist on a per-file basis, not per array. Unlike Google’s GFS, which relies on file replication alone, Isilon implements file replication through block replication and also offers parity-based file protection. We see this in parts of the patent’s sample metadata structure:
| Field | Description |
|-------|-------------|
| Mode | Kind of file: regular, directory, etc. |
| Owner | SSU account |
| Timestamp | Last modification time |
| Size | Size of metadata file |
| Parity count | Number of parity devices used |
| Mirror count | Number of mirrors |
| VHS count | Number of virtual hot spares |
| Version | Version of metadata structure |
| Type | Type of data location table |
| Data location table | Address of the table, or the table itself |
| Reference count | Number of metadata structures referencing this one |
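To make the per-file protection idea concrete, here is a minimal sketch of such a metadata record in Python. The field names follow the patent’s table above; the types, defaults and example values are my own guesses, not Isilon’s.

```python
from dataclasses import dataclass
from typing import List, Union

@dataclass
class FileMetadata:
    """One per-file metadata record, loosely following the patent's sample."""
    mode: str                    # kind of file: "regular", "directory", etc.
    owner: str                   # owning SSU account
    timestamp: float             # last modification time (epoch seconds)
    size: int                    # size of the metadata file, in bytes
    parity_count: int            # parity devices protecting this file
    mirror_count: int            # mirrors of this file
    vhs_count: int               # virtual hot spares reserved for it
    version: int                 # version of this metadata structure
    table_type: str              # "direct" or "n-level indirect"
    data_location_table: List[Union[int, str]]  # block addresses, or a table address
    reference_count: int = 1     # metadata structures referencing this one

# Two files in the same pool can carry different protection:
video = FileMetadata("regular", "ssu3", 0.0, 512, parity_count=2,
                     mirror_count=0, vhs_count=1, version=1,
                     table_type="direct", data_location_table=[17, 42, 99])
config = FileMetadata("regular", "ssu1", 0.0, 512, parity_count=0,
                      mirror_count=3, vhs_count=0, version=1,
                      table_type="direct", data_location_table=[7])
```

Because parity count and mirror count live in the file’s own metadata, protection level really is a per-file attribute rather than an array-wide setting.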
What most arrays perform globally – hot spares, RAID levels, mirroring, recovery data – may be done on a per-file basis with Isilon. Furthermore, there is a lot of flexibility in the data location structure – direct addressing and multiple levels of indirect addressing – giving the system multiple opportunities to optimize access to blocks that may be widely scattered, especially in high-performance or failure modes.
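Here is a hedged sketch of how a direct vs. multi-level indirect data location table might be walked; the names and fan-out are mine, not the patent’s.

```python
BLOCK_STORE = {}   # hypothetical cluster-wide pool: block address -> contents

def lookup(table, levels, index, fanout=1024):
    """Walk `levels` layers of indirection to find data block `index`.

    With levels == 0 the table holds data-block addresses directly;
    with levels == 1 each entry addresses a child table of `fanout`
    data addresses; and so on.
    """
    while levels > 0:
        span = fanout ** levels                    # data blocks covered per entry
        table = BLOCK_STORE[table[index // span]]  # descend one layer
        index %= span
        levels -= 1
    return table[index]                            # address of the data block itself

# Tiny demo with fan-out 2 and one level of indirection:
BLOCK_STORE["t0"], BLOCK_STORE["t1"] = ["d0", "d1"], ["d2", "d3"]
assert lookup(["t0", "t1"], levels=1, index=2, fanout=2) == "d2"
```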
The data location flexibility is probably best seen in the ease of adding storage to an Isilon cluster: add an SSU and the pool of blocks grows, and the existing SSUs can start moving data based on their own needs.
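A toy illustration of the “bigger pool” idea follows; the balancing policy is mine, and a real system would also weigh locality, protection constraints and load, not just block counts.

```python
def rebalance(ssus):
    """Even out block counts by moving blocks from the fullest SSU to the
    emptiest. `ssus` maps SSU name -> list of block ids. A toy policy only."""
    while True:
        fullest = max(ssus, key=lambda s: len(ssus[s]))
        emptiest = min(ssus, key=lambda s: len(ssus[s]))
        if len(ssus[fullest]) - len(ssus[emptiest]) <= 1:
            return
        ssus[emptiest].append(ssus[fullest].pop())

# Adding a fourth, empty SSU simply enlarges the pool; the others shed
# blocks to it over time.
pool = {"ssu1": list(range(30)), "ssu2": list(range(30, 60)),
        "ssu3": list(range(60, 90)), "ssu4": []}
rebalance(pool)
assert all(len(blocks) >= 22 for blocks in pool.values())
```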
Finally, the flexibility in the metadata structure’s size and version fields indicates that Isilon may add new fields as it sees fit, building in new functionality with software upgrades.
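For instance (my sketch, not anything from the patent), a reader keyed off the version field can skip fields it does not yet understand:

```python
FIELDS_BY_VERSION = {
    1: ("mode", "owner", "timestamp", "size"),
    2: ("mode", "owner", "timestamp", "size", "parity_count", "mirror_count"),
}

def read_metadata(record: dict, version: int) -> dict:
    """Keep only the fields this software release understands; anything
    newer rides along untouched until an upgrade starts using it."""
    known = FIELDS_BY_VERSION[min(version, max(FIELDS_BY_VERSION))]
    return {k: record[k] for k in known if k in record}
```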
Processes, processes, ad infinitum
At this point the patent goes into examples of all the processes that might be used to perform needed functions, such as data lookups and virtual hot spare provisioning. I’m sure Isilon engineers are always looking at ways to improve these essential activities, so the patent descriptions are of limited value.
The StorageMojo take
I generally consider architecture-based arguments dubious (see Architectural Appeal). Yet I also believe that storage has some secular trends (see Architecting the Internet Data Center) that one ignores at one’s peril.
There is a lot to like in the Isilon architecture, starting with their fundamental abandonment of the volume or LUN construct in favor of the storage pool. They realize that customers want to manage files, not disks. From that basic insight Isilon has put together a flexible product that is, by all reports, easy to manage and expand. I love the file-based virtual RAID capability, for one. Also, their price-neutral adoption of InfiniBand is smart from both business and technical perspectives.
Where I wonder how they will play out comes from studying Google and Amazon. Isilon’s architecture buys its flexibility with a variety of resources, some cheap and some dear. CPU cycles are cheap and getting cheaper, so all the computation required for parity RAID and other functions isn’t a big concern. As a system scales, though, even inexpensive components whose cost is a small percentage of the system start to become noticeable in absolute dollars. At some point I would expect a system that doesn’t do all the computation the SSUs do to have a price advantage.
Network overhead is a bigger concern: having data spread across multiple SSUs means a fair amount of coordination, data fetching, cache invalidation and so on. Isilon engineers are well aware of these issues, which is why they support InfiniBand and, before that, I believe, dual Ethernet ports and jumbo frames on each SSU.
The biggest issue, IMHO, is the cost of the disk I/Os. Breaking a file across multiple SSUs means multiple I/Os to write and, more importantly, to access a single file. Isilon concentrates on large (>1 MB) files to minimize this problem, yet the overhead must cost something. Bottom line: I suspect that Isilon has scaling problems, either in I/Os or economics, due to their architecture. At what capacity these issues become apparent is beyond my ability to estimate. Readers?
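Some back-of-the-envelope arithmetic (my numbers, not Isilon’s) on why striping multiplies I/O counts:

```python
import math

def write_ios(file_mb, chunk_mb, data_width, parity_chunks):
    """I/Os to write one file striped `data_width` chunks wide with
    `parity_chunks` parity chunks per stripe. One I/O per chunk;
    ignores metadata updates, caching and coalescing."""
    data_chunks = math.ceil(file_mb / chunk_mb)
    stripes = math.ceil(data_chunks / data_width)
    return data_chunks + stripes * parity_chunks

# A 10 MB file in 1 MB chunks, 8 data SSUs wide plus 1 parity:
# 10 data I/Os + 2 parity I/Os = 12, spread across the network.
print(write_ios(10, 1, 8, 1))   # -> 12
```

A monolithic array would satisfy the same write with fewer, larger I/Os; the question is at what scale the difference shows up in dollars or latency.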
That said, there is no reason that they can’t be very successful in the rather large space they have to play in. Unstructured data – files – is 85% of the data out there. There are lots of companies that would rather not manage a clumsy, LUN-based infrastructure for unstructured data.
I hope to look at Isilon’s business model through their IPO filings. They’ve had a successful launch, so it must look ok.
Comments welcome, as always. Moderation turned on to keep spam at bay.
Robin,
You say “There is a lot to like in the Isilon architecture, starting with their fundamental abandonment of the volume or LUN construct in favor of the storage pool”.
A simple NFS server running under Linux can be protected via an internal dual-parity (DP) RAID file system. This eliminates the need for external RAID… but it is slow, for various good technical reasons, part of which is the XOR generation.
It is a ‘low cost’ solution.
Isilon seems to be using ‘commodity motherboard’ hardware running a specialized file system… hence the opportunity to eliminate the multiple mapping layers associated with traditional RAID firmware. Not really a new architecture.
While this may deliver some marginal performance benefits, the protection is still based on computationally intensive RAID 5/6 algorithms… which probably now run on the central processor… which is already very busy supporting NFS, TCP/IP, etc. It remains to be seen how fast all this is under writes.
An InfiniBand switch is not a new concept. Ethernet is too slow in clustered architectures; I am surprised that they started with Ethernet.
Very little choice here… IB or 10 Gbit Ethernet… both expensive.
Let’s not forget that there is only *one* manufacturer of IB chips. IB host adapters and switches are not cheap… but they come with drivers, etc., making for an easier integration job.
Striping across multiple enclosures does not add to performance until the existing data is moved to the new enclosure. Re-striping is very time-consuming and may be prone to failure… no immediate scaling in performance for the existing users, plus the extra management… forever!
They reportedly use a large ‘chunk’ size… small writes will produce a ‘storm’ over the IB switch, i.e. RAID algorithms trying to cope with partial writes… this is why they need IB.
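That “storm” is the classic RAID 5 read-modify-write cycle: updating one chunk means reading the old data and old parity before writing both back. A minimal sketch, assuming simple XOR parity (my code, not Isilon’s):

```python
def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def partial_write(old_data: bytes, old_parity: bytes, new_data: bytes):
    """Update one chunk of a stripe: two reads (old data, old parity)
    and two writes (new data, new parity), four I/Os per small update,
    some of which must cross the interconnect to other SSUs."""
    new_parity = xor(old_parity, xor(old_data, new_data))
    return new_data, new_parity
```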
Performance is probably reasonable under sequential read-only traffic… but so what? It is very limited by the 1 Gbit Ethernet host connection… plus the overhead of the protocol stack. Extra host connections require additional enclosures… more cost and re-striping.
The next step here can only be 10 Gbit Ethernet… more cost.
It remains to be seen how ‘revolutionary’ all this is.
Richard,
You’re spot on. I question the scalability of Isilon because of the parity RAID overhead. Google optimizes for cheap CPU cycles and they don’t do any parity RAID. Who do you think is smarter?
YottaYotta used InfiniBand, which is a technology I like. I don’t know what the state of InfiniBand management software is these days, but I’ve heard it isn’t in the same league as Ethernet’s.
I do applaud Isilon for virtualizing the block pool, which enables minimizing storage management, the major cost of SANs today. They aren’t going after the 15% of structured data – they want the 85% of unstructured data that no one has time to manage. Revolutionary? No. Lucrative? Could be. Sounds like a plan to me.
Cheers,
Robin