IMHO, both. In a storage industry where the hardware cost to protect data keeps rising, ZFS represents a software solution to the problem of wobbly disks and data corruption. Thus it is a threat to the hardened disk array model: very expensive engineering on the outside protecting the soft underbelly of ever-cheaper disks on the inside.
It’s the Software Version of the Initiation Rite in A Man Called Horse
Before I jump into the review of ZFS, let me share what I like best about it, from a slide in the modestly titled “ZFS, The Last Word In Filesystems” presentation:
ZFS Test Methodology
- A Product is only as good as its test suite [amen, brother!]
- ZFS designed to run in either user or kernel context
- Nightly “ztest” program does all of the following in parallel:
- Read, write, create and delete files and directories
- Create and destroy entire filesystems and storage pools
- Turn compression on and off (while FS is active)
- Change checksum algorithm (while FS is active)
- Add and remove devices (while pool is active)
- Change I/O caching and scheduling policies (while pool is active)
- Scribble random garbage on one side of live mirror to test self-healing data [a toy sketch of this idea appears after the list]
- Force violent crashes to simulate power loss, then verify pool integrity
- Probably more abuse in 20 seconds than you’d see in a lifetime
- ZFS has been subjected to over a million forced, violent crashes without losing data integrity or leaking a single block
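The “scribble garbage on one side of a live mirror” item is the essence of self-healing, and it is worth seeing in miniature. Below is a toy Python model of the idea — my own sketch, not ZFS or ztest code, and the names (ToyMirror, scribble) are made up: keep a checksum apart from each block, return whichever mirror copy matches it, and rewrite the bad copy from the good one.

```python
import hashlib, os, random

# Toy model (not ZFS code): a two-way "mirror" of blocks, each block
# verified against a checksum the reader keeps separately. If one copy
# is corrupt, the read "self-heals" by returning and rewriting the good copy.

BLOCK_SIZE = 4096

def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

class ToyMirror:
    def __init__(self, nblocks: int):
        self.sides = [[b"\0" * BLOCK_SIZE] * nblocks for _ in range(2)]
        self.sums = [None] * nblocks          # expected checksum per block

    def write(self, blkno: int, data: bytes):
        for side in self.sides:               # write both copies
            side[blkno] = data
        self.sums[blkno] = checksum(data)

    def scribble(self, blkno: int):
        """Simulate ztest: garbage one side of the live mirror."""
        self.sides[random.randrange(2)][blkno] = os.urandom(BLOCK_SIZE)

    def read(self, blkno: int) -> bytes:
        for i, side in enumerate(self.sides):
            data = side[blkno]
            if checksum(data) == self.sums[blkno]:
                # repair the other copy from the known-good one
                self.sides[1 - i][blkno] = data
                return data
        raise IOError("both copies corrupt")

m = ToyMirror(8)
m.write(3, b"important" + b"\0" * (BLOCK_SIZE - 9))
m.scribble(3)
assert m.read(3).startswith(b"important")     # still intact after corruption
```

Obviously the real ztest exercises the actual kernel code paths under concurrent load; this only illustrates why a separately held checksum lets a mirror repair itself.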
Great article – maybe I’m missing something fundamental, but when you say,
“Any checksum stored with the data it is supporting can only tell you that this data is uncorrupted. It could be the wrong data”
How is this different with ZFS? Is the checksum calculated at a different time? If the checksum is calculated on corrupt data, then you’d have the same problem of a valid checksum with invalid data.
Garbage in, garbage out seems to continue to apply either way. Can you give an example of a failure that ZFS would recover from that a traditional system would not?
Good question, Jonathan. The basic issue is that if the checksum is stored in the same block as the data, all it tells you is that the data is internally consistent. It may not be the data you asked for. Jeff Bonwick, the ZFS architect, cites phantom writes and misdirected reads (a corrupted block pointer?) as reasons the wrong block might be returned. Since ZFS keeps the checksum separate from the data, in the block pointer that references it, it knows when the wrong data has been returned.
Since the checksum is calculated while the data is in the host’s RAM, I suppose it could be corrupted there, but if that were a regular occurrence the host would crash pretty quickly.
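To make the distinction concrete, here is a minimal sketch of a misdirected read — a toy model of my own, not ZFS’s on-disk format, with made-up names: a checksum embedded in the block still verifies, because the wrong block is internally consistent, while a checksum held in the parent block pointer catches the mistake.

```python
import hashlib

# Toy model (not ZFS internals): compare a checksum stored *with* a block
# against a checksum stored in the *parent* pointer, when the drive
# misdirects a read and hands back the wrong (but uncorrupted) block.

def csum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

# Two blocks on "disk"; each also carries its own embedded checksum.
disk = {
    100: {"data": b"block we asked for"},
    200: {"data": b"some other block"},
}
for blk in disk.values():
    blk["self_csum"] = csum(blk["data"])

# Parent block pointer, kept apart from the data, remembers the child's checksum.
parent_ptr = {"child_addr": 100, "child_csum": csum(disk[100]["data"])}

def misdirected_read(addr: int) -> dict:
    """Simulate a misdirected read: the drive returns block 200 instead."""
    return disk[200]

returned = misdirected_read(parent_ptr["child_addr"])

# Embedded checksum: the wrong block still verifies against itself.
print(csum(returned["data"]) == returned["self_csum"])     # True  -> undetected

# Parent-pointer checksum: the wrong block fails verification.
print(csum(returned["data"]) == parent_ptr["child_csum"])  # False -> detected
```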
How often does this happen? I have no idea. Since this is a statistical universe, I’d assume that as data stores continue to grow the incidence will cross a cognitive threshold and boom! all the CIOs will get excited.
From a marketing perspective, no storage vendor has gone broke scaring the hell out of customers. Someone is going to pick this up and run with it. Sun? Not likely — they’ve got a lot of Storagetek to sell. More likely some hungry mid-tier vendor will start the ball rolling.
Hey, ZFS guys, any comment?
Are you referencing Jeff Bonwick’s post of Friday, December 09, 2005, “ZFS End-to-End Data Integrity”, in your reply?
If so, this issue is at the core of developing the definitive Search, Find, and Obtain (SFO) “killer app” (function) that is the cornerstone of Peter Morville’s Findability. [My words, not Peter’s.]
Findability is the key element in determining the User Experience (UX) defined by Peter in “Ambient Findability”. True, everything above the SFO is way above the file system level. However, a big stumbling block for developing a working SFO has been Information Integrity (II, my acronym). In my concept of End-to-End Information on Demand (E2EIoD), SFO is mission-critical and depends totally on Information Integrity, particularly with regard to returning useful Information in a manner and time frame that generates a pleasing User Experience (UX).
The “operational definition” of a “pleasing User Experience (UX)” is: “Did you get what you asked for? Was it fast enough to be of use?” Other requirements of this definition are:
1) Did you get “value add” Information that was helpful?
2) If the Information was not available did the reply contain meaningful information as to when it would be available?
3) If the “exact” Information was not available, was any meaningful “fuzzy search” or “sounds like” Information returned?
I have been working for years on ways to overcome the very problems addressed by the ZFS developers. EMC has been particularly obtuse when I raised these issues with their equipment. I raised these issues with other storage vendors as well, but I made the mistake of thinking EMC understood the problem. They talked like they did. Turns out it was EMC “show-biz” talk.
Sure. I’ve explained this to the best of my ability here. If you find this unsatisfactory or unclear in any way, please let me know so I can improve it.
Interesting article. But there is a question, since you compared ZFS with Google’s GFS: does ZFS support distribution over a cluster? Reading the docs, I find no indication of that. Effective and fault-tolerant clustering is a major feature of GFS.
No cluster support today. I’m told it is high on the list though, with implementation of a global namespace.
Dear sir:
I am very glad to write to you; please forgive me for disturbing you. I’m a student in China, interested in the Zettabyte File System, and I know you are an expert in this field. On the one hand, I know ZFS writes out of place, i.e. copy-on-write (COW), so the file system produces garbage. On the other hand, how ZFS collects that garbage is a problem I cannot solve. After repeated searches on the web, I have failed to find an answer, so I want to ask for your help: could you explain how ZFS collects the garbage? Thank you for reading. I am looking forward to your reply. Best wishes to you.