Very Cool. Why?
Here are the highlights of some of the changes you’d see with ZFS on Leopard, the next version of the Mac OS.
No More Disk Warrior
Data corruption on PCs and Macs is a sad and stupid fact of life. Power failures, flaky RAM, poor grounding, (slowly) failing hard drives, driver glitches, phantom writes and more conspire to rot your data.
ZFS eliminates that. All blocks are checksummed and the checksum is stored in a parent block. ZFS always knows whether a block is correct or corrupt. Every block has a parent block (with one obvious exception that gets special treatment), so the entire data store is self-validating. You’ll never have to wonder if all your data is correct again. It is.
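As a rough illustration of why keeping the checksum in the parent matters, here is a minimal Python sketch. It is not ZFS code: the `Parent` class, the in-memory "blocks" and the choice of SHA-256 are all assumptions made for the example. The point is only that a block cannot vouch for itself; the parent holds the expected value, so a silently altered child is caught on read.

```python
import hashlib


def checksum(data):
    """SHA-256 stands in for whatever checksum the file system uses (assumption)."""
    return hashlib.sha256(data).hexdigest()


class Parent:
    """Hypothetical parent block that stores the checksums of its children."""

    def __init__(self, children):
        self.children = list(children)
        self.sums = [checksum(c) for c in self.children]

    def read(self, i):
        # Verify the child against the checksum kept in the parent.
        data = self.children[i]
        if checksum(data) != self.sums[i]:
            raise IOError("block %d failed checksum verification" % i)
        return data


parent = Parent([b"alpha", b"beta"])
first = parent.read(0)

parent.children[1] = b"bets"  # simulate silent corruption on disk
try:
    parent.read(1)
    detected = False
except IOError:
    detected = True
```

Repeat the same arrangement one level up (the parent's own checksum lives in its parent) and you get the self-validating tree described above.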
No RAID Cards or Controllers
ZFS implements very fast RAID that fixes the performance knock against software RAID. In ZFS all writes are the fastest kind: full stripe writes. And the RAID is running on the fastest processor in your system (your Mac), rather than some 3-5 year old microcontroller.
Just add drives to your system and you have a fast RAID system. With Serial Attached SCSI and SATA drives you’ll pay only for the drives (cheap and getting cheaper), cables and enclosures.
No More Volumes
Every time you add a disk to your Mac you see another disk icon on the desktop. If you want to RAID some disks you use Disk Utility (or something) to create the volume. Slow, error-prone, confusing.
ZFS eliminates the whole volume concept. Add a disk or five to your system and it joins your storage pool. More capacity. Not more management.
Backup Made Easy
ZFS does something called snapshot copy, which creates a copy of all your data at whatever point in time you want. Copy the snapshot up to a disk, tape or NAS box and you are backed up.
Create a snapshot on every write if you want, so if your database barfs you can go back to just before it choked.
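To see why snapshots can be cheap enough to take that often, here is a toy copy-on-write store in Python. It is a sketch under stated assumptions, not how ZFS is implemented: the `Pool` class, its method names and the dictionary-based "blocks" are all invented for the example. New data always goes to a new block, so a snapshot only has to remember which blocks were live at that moment.

```python
class Pool:
    """Toy copy-on-write store: blocks are never overwritten in place."""

    def __init__(self):
        self.blocks = {}     # block id -> immutable bytes
        self.live = {}       # file name -> current block id
        self.snapshots = {}  # snapshot name -> frozen copy of the name map
        self._next_id = 0

    def write(self, name, data):
        # Always allocate a new block; the old one stays for any snapshot.
        self.blocks[self._next_id] = bytes(data)
        self.live[name] = self._next_id
        self._next_id += 1

    def snapshot(self, snap):
        # Only the small name map is copied; no data blocks are duplicated.
        self.snapshots[snap] = dict(self.live)

    def read(self, name, snap=None):
        table = self.live if snap is None else self.snapshots[snap]
        return self.blocks[table[name]]

    def rollback(self, snap):
        # Point the live view back at the blocks the snapshot remembers.
        self.live = dict(self.snapshots[snap])


pool = Pool()
pool.write("db", b"good state")
pool.snapshot("before")
pool.write("db", b"corrupted state")
pool.rollback("before")
```

After the rollback, reading "db" returns the pre-corruption contents, which is the "go back to just before it choked" scenario.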
But That’s Not All!
For in-depth treatment of ZFS see here and here. Includes links to more technical info and benchmarks.
ZFS sounds like awesome technology. I sure hope Apple is integrating it into 10.5!!!
> You’ll never have to wonder if all your data is correct again. It is.
How come? All you can be sure of in that case is that you can detect an error by comparing the data you read with the corresponding checksum. But I still don’t get how this prevents rotten bits, for instance. It only helps detect them! (The same is true for md5sum.)
> No RAID Cards or Controllers
If you are going for a serious RAID, you should still have more than one controller to ensure that a slightly failing controller doesn’t get you. Think, for instance, of the tape world, where people write tapes with an unadjusted head and then cannot read the data on another drive. The same can be true, to a certain degree, for disks – you simply can’t trust the hardware underneath you, so you should have multiple channels and mirror over them (across at least two of them).
Damn, this is too good to be true. It’s now almost unreliable due to the buzzwordiness of it all.
I am spending countless evenings studying storage architectures and designs, and it turns out the answer might be an OS X upgrade away from being under my nose.
Your three posts give a great feature overview of ZFS and I love the simplicity of JBOD + ZFS = peace of mind for cheap.
But can you comment on what kinds of workloads this is planned for? When I see things like “we don’t write over new data,” I wonder if that’s a design that makes sense for heavy database workloads. Could you advise whether this appeals to Apple Geniuses who are tired of crying PowerBook users, or to search engine wannabes?
What about us Enterprise LAMP on a shoestring folks? In particular the heavy MySQL workload folks and Apache server folks.
P.S. The first link points to a WordPress login, not the page you intended.
A couple of quick answers to a couple of the questions above.
> How come? All you can be sure of in that case is that you can detect an error by comparing the data you read with the corresponding checksum. But I still don’t get how this prevents rotten bits, for instance. It only helps detect them! (The same is true for md5sum.)
Couple of points: Checksums don’t just say there is an error; paired with a redundant copy, they let ZFS correct most of them. If a block is really hosed, you rebuild it from the RAID copy.
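For the curious, the detect-then-rebuild path can be sketched in a few lines of Python. This is an illustration only, with assumptions called out: `self_healing_read` is a made-up helper, the "mirror" is just a list of byte strings, and SHA-256 stands in for the real checksum. The checksum picks out which copy is good, and the good copy is then used to overwrite the bad one.

```python
import hashlib


def checksum(data):
    """SHA-256 stands in for the file system's checksum (assumption)."""
    return hashlib.sha256(data).hexdigest()


def self_healing_read(copies, expected):
    """Return the first copy matching `expected` and repair the others.

    `copies` is a mutable list of bytes, one entry per mirror side
    (a hypothetical stand-in for real disks).
    """
    good = next((c for c in copies if checksum(c) == expected), None)
    if good is None:
        raise IOError("all copies failed verification; data is unrecoverable")
    for i, c in enumerate(copies):
        if checksum(c) != expected:
            copies[i] = good  # rewrite the bad copy from the good one
    return good


expected = checksum(b"payload")
mirror = [b"pa\x00load", b"payload"]  # first copy has rotted
data = self_healing_read(mirror, expected)
```

Without a second copy the checksum can still tell you the block is bad, but there is nothing to rebuild it from, which is the distinction the question above was getting at.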
No RAID controllers means No RAID controllers. You may want to have dual interconnects to your storage with failover for redundancy, but all you need is something to send the bits over the wire – no XOR, no RAID 5, no cache. So failure modes should be much simpler and less frequent, which the checksums will catch.
Workloads:
ZFS is designed for heavy-duty server workloads just like Solaris runs all the time. It has fancy I/O scheduling algorithms that do smart things, plus its code is compact. Check out http://blogs.sun.com/roller/page/bill?entry=zfs_vs_the_benchmark for more info on performance.
Oh and thanks for pointing out the broken link.
So the file system doesn’t see new disks as new volumes. Does that include all types of media, even removable media such as flash drives? For example, I work at two places and use a flash drive to move files from one to another (synching the two). I need to have a separate volume. Or what about the iPod, or memory card readers?
I now begin to see how Leopard’s backup/recovery feature is implemented. Hey, it may depend on ZFS. Maybe the delay of Leopard is because ZFS cannot be root until the end of the year.
Hans,
Good questions – wish I knew the answers. IIRC ZFS has the usual raft of options, so new disks can either be automagically added to the pool or the sysadmin can be queried. So I think it can be handled, and certainly would be for a consumer computer.
REM,
Apple folks were vociferous in maintaining that Time Machine did not rely upon ZFS, and I take them at their word. Having said that though, there is no doubt that Time Machine and other graphically oriented storage utilities become much simpler under ZFS. Also, I recently realized that ZFS has some important features for small, flash-based devices.
Robin