I signed up for the ZFS discussion group over on OpenSolaris.org and have been following a fascinating thread about a versioning file system.
The discussion has been hot for several days. There is intelligent conversation about data preservation: CDP, snapshots, source control systems and file versioning among others. This is how features often get hammered out by engineering. The good news is that these guys are trying to grok the user experience.
File versions: the prequel
You know how every time you edit a file in Windows, OS X or, I’m guessing, Linux, your saved file is now the only version of the file you have? It doesn’t have to be that way.
You could have the file create a new version of itself every time you edit it. So later, if you decide that fourth 321 Margarita wasn’t a good idea – it usually isn’t – you could go back to an earlier version, and begin again.
File versioning was standard on DEC’s VMS and TOPS-20 operating systems, and I really liked it. I could do radical things, secure in the knowledge that I could go home again.
ZFS file versioning
It doesn’t exist today, but the engineers are talking about it. Here’s part of my favorite post so far, to give you a taste of the debate between two of the folks. I’ve put their comments into different typefaces so you can tell them apart:
Given this, we’re back into the problem FV is supposed to solve. It is entirely possible for an editor to keep open a file for a long time, periodically writing out your changes without issuing a new open().
You describe this as a problem, but *I* see it as the exact thing that makes file versioning useful. It DOESN'T save random magically chosen moments; it saves exactly all the versions that *you*, the user, saved at some point of the editing session.
Word with auto-save turned off is a prime example. Given this, you’ve only created a new version when you first load the document, and all your intermediary changes are lost, since it only saves the document on close().
You're forgetting that the user, unless he's stupid, will save regularly during the editing session.
Thus, in order to get benefits from FV, your editor must issue periodic close() and open() commands on the same file, as you edit, all without your intervention. Exactly how many editors do this? I have no idea. So, the only way to enable FV is to require the user to periodically push the “Save” button. Which is how much more different than the current situation?
It is completely and utterly different from the current situation. In the current situation, when I type the "save" command *I am deleting a previous version*. That's dangerous, because people don't think of it as performing a destructive operation, and hence don't give it the care and consideration they give to an explicit "rm". And that's precisely what file versioning fixes; saving a file is no longer a destructive operation.
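To make the second poster's point concrete, here is a minimal Python sketch of a non-destructive save. It is not how ZFS does or would implement versioning; it only imitates the idea in user space. The function names `next_version` and `versioned_save` are made up for illustration, and the `name;N` naming is borrowed from the way VMS numbers file versions.

```python
from pathlib import Path


def next_version(path: Path) -> int:
    """Return one more than the highest version number already saved for path."""
    prefix = path.name + ";"
    versions = [
        int(candidate.name[len(prefix):])
        for candidate in path.parent.iterdir()
        if candidate.name.startswith(prefix)
        and candidate.name[len(prefix):].isdecimal()
    ]
    return max(versions, default=0) + 1


def versioned_save(path: Path, text: str) -> Path:
    """Hypothetical helper: each explicit save creates a new numbered version.

    Earlier versions are never overwritten, so saving is no longer a
    destructive operation. The plain path always holds the latest text.
    """
    version_path = path.with_name(f"{path.name};{next_version(path)}")
    version_path.write_text(text)  # the new, permanent version
    path.write_text(text)          # the "current" copy the user opens
    return version_path


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as folder:
        doc = Path(folder) / "recipe.txt"
        versioned_save(doc, "two margaritas")
        versioned_save(doc, "four margaritas")
        # Going home again: the first save is still there.
        print((Path(folder) / "recipe.txt;1").read_text())
```

Note that a version is created only when the user saves, which is exactly the behavior the two posters disagree about: nothing typed between saves is captured.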
The second guy has the better argument
Not being an engineer, I’m not sure how to enter the discussion, except to say I think versioning was a great idea 30 years ago and I still think it is a great idea today.
Robin,
Rule number one – engineers* never fail to entertain, and almost every engineer conversation includes at least one condescending remark about Joe User. In this case apparently only “intelligent” users save their work on a regular basis – a claim that is completely untrue.
Rule number two – never let engineers determine how to implement any functionality for end-users, unless, of course, they’re designing a widget for their own personal use.
File versioning of the type these two engineers discussed is old news. It may not have been implemented in the file system layer of Windows or OS X, but it has been implemented in just about every half-way decent information management application created over the past 15 years.
Thanks to globalization and hundreds of popular file formats, modern file versioning in the enterprise includes the concept of renditions (i.e. file versions in different languages and file formats). Think of modern file versioning as a three-dimensional array. The first axis represents the numerical versions of a document – traditional versioning if you will. The second axis represents the various file formats in which the document versions must be published and maintained. And the third axis represents the translated copies of the versions. Publishing companies, especially daily newspapers with print and online delivery and a multilingual audience, are an extreme example of multidimensional versioning in action. Add in real-time development across multiple authors spanning the globe, and you can imagine the complexity.
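The three-axis picture can be sketched as a small data structure. This is only an illustration in Python, not taken from any particular information management product; `RenditionKey`, `Document`, and the sample formats and languages are all hypothetical.

```python
from dataclasses import dataclass, field
from itertools import product


@dataclass(frozen=True)
class RenditionKey:
    """One cell of the three-dimensional array."""
    version: int       # axis 1: traditional numeric version
    file_format: str   # axis 2: published file format
    language: str      # axis 3: translation


@dataclass
class Document:
    title: str
    renditions: dict = field(default_factory=dict)  # RenditionKey -> location

    def add(self, version: int, file_format: str, language: str, location: str) -> None:
        """Record where one rendition of one version is stored."""
        self.renditions[RenditionKey(version, file_format, language)] = location

    def missing(self, version: int, formats: list, languages: list) -> list:
        """List the format/language combinations not yet produced for a version."""
        return [
            RenditionKey(version, fmt, lang)
            for fmt, lang in product(formats, languages)
            if RenditionKey(version, fmt, lang) not in self.renditions
        ]


if __name__ == "__main__":
    story = Document("front page")
    story.add(2, "html", "en", "web/front-v2-en.html")
    story.add(2, "pdf", "en", "print/front-v2-en.pdf")
    # Which renditions of version 2 still need producing?
    for key in story.missing(2, ["html", "pdf"], ["en", "fr"]):
        print(key)
```

The useful part is the `missing` query: once versions, formats, and languages are tracked together, the system can tell you which interdependent copies have fallen behind.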
Modern information systems must be able to understand, maintain, and protect these interdependent versions. In my four years of preaching advanced information management, I have yet to encounter a single storage vendor or file system developer whose thinking has advanced beyond decades-old traditional file versioning. If I want modern file versioning I don’t rely upon the OS or file system. I find myself a decent information management application... and wait for the infrastructure guys to catch up.
Kind regards,
Joe
*I’m an ex-engineer, and hindsight is 20/20
Joseph,
Thank you for raising my awareness of the problem and explaining it so succinctly.
I’m curious, what is the application you use for “modern file versioning”? I have a similar problem: I produce StorageMojo.com using several applications, and sometimes have a tough time knowing where the most current version is.
Search is a big help, yet not the whole answer. The hyperlink reflects the social nature of information which Google uses to good effect. Yet I would love to see graphic representation of thought through the change of link relationships through time. I’m not sure this really relates to your comment, except that you got me thinking.
Robin