A distinguished scholar published a book last year about data replication in the Greek-speaking ancient world. He examined a group of texts and how the technology and context of the times affected data integrity.

He looked at (I think he had some help) over 5,700 ancient source texts, all of them at least copies of copies of copies, to find textual variants. There are over 250,000 variants, or more than one for every word of the texts. Makes floppies look like graven stone.

Boy, do we have it good!
We may complain about migrating data from one Windows machine to another, but the ancients had it far worse. Data replication technology was a guy looking at a text and copying it. No printing presses, not even punchcards. Primitive in the extreme.

The UI really stunk!
The standard scribal technique was to write without lifting pen from parchment, papyrus, vellum or whatever. No gaps between the words. No punctuation. TheywouldjustwriteandwriteuntilwellIdonotknowwhentheywouldstop. And they wouldn’t have that period there. Needless to say, no paragraphs, headers or hypertext links.

No wonder people couldn’t read. With text like that who would want to?
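
To get a feel for why unspaced text is hard on a reader, here is a minimal Python sketch that lists every way a run of letters can be split into words. The function name, the tiny vocabulary and the sample string are all made up for illustration; real scribes worked in Greek with a rather larger dictionary.

```python
def segmentations(text, vocab):
    """Return every way to split `text` into words drawn from `vocab`."""
    if not text:
        return [[]]  # one way to split nothing: no words at all
    results = []
    for end in range(1, len(text) + 1):
        word = text[:end]
        if word in vocab:
            # Keep this word and try to split whatever is left over.
            for rest in segmentations(text[end:], vocab):
                results.append([word] + rest)
    return results


# Hypothetical vocabulary and input, chosen to show an ambiguity.
vocab = {"god", "is", "now", "here", "nowhere"}
readings = [" ".join(words) for words in segmentations("godisnowhere", vocab)]

for reading in readings:
    print(reading)
```

Same letters, two readings with opposite meanings. With no gaps and no punctuation, the reader (or the next copyist) has to choose.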

Reading a Turing machine tape, except in Greek
People make mistakes. Bored people make mistakes. Poorly trained people make more mistakes. Usually the folks copying these texts were amateurs, making a copy for themselves or for friends, maybe at the end of a long day. The words all running together, many of the words looking alike. Some common error patterns emerged, such as:

  • Mistaking one letter or word for another
  • Eye-skips, where the copyist skipped a line
  • Dictation errors, where one person was reading to the copyist and a word was substituted for one it sounded like
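
For the programmers in the audience, spotting variants between two copies is basically a diff. Below is a rough sketch using Python's standard difflib module. The helper name and both sample texts are my own invention, and I am modeling the eye-skip as a jump from one occurrence of a word to the next rather than a whole skipped line. It is nothing like the tooling real textual scholars use.

```python
import difflib


def find_variants(exemplar, copy):
    """Return (kind, exemplar_words, copy_words) wherever the two texts differ."""
    a = exemplar.split()
    b = copy.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    variants = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            variants.append((tag, a[i1:i2], b[j1:j2]))
    return variants


# Hypothetical exemplar and a copy where the copyist's eye jumped
# from the first "word" to the second one.
exemplar = "in the beginning was the word and the word was with god"
copy = "in the beginning was the word was with god"

variants = find_variants(exemplar, copy)
for kind, old, new in variants:
    print(kind, old, new)
```

A diff tells you where two copies disagree. It does not tell you which one is right, which is where the hard scholarly work begins.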

Mistakes on purpose
People, being people, often have opinions about a text, and sometimes the copyist would change the text to correct or improve it, at least in their own opinion. Much of the book is taken up with analyzing where and why these changes were introduced, using rules developed by scholars over several hundred years to attempt to reconstruct the original text.

AFAIK no other ancient text has received such rigorous scholarly treatment. I find the techniques fascinating, even if they result in less certainty, rather than more, about the original, long-lost text.

Modern-day counterparts
Our ability to store massive amounts of data has a downside: we can store massive amounts of error as well. Credit reports have high error rates that can cost people real money. America’s infamous “no-fly” list has snagged Senator Ted Kennedy and the wife of another Senator. To err is human. To err and preserve it in computer files demonic.

Oh, and the text is:
The New Testament. The book on textual analysis is Bart D. Ehrman’s Misquoting Jesus: The Story Behind Who Changed the Bible and Why. Bart is chairman of the religious studies department at UNC. A fascinating book, aimed at laypeople, on New Testament textual analysis. I highly recommend it.

The StorageMojo take
I’m not making or asking for any comment on the religious implications of Bart’s textual analysis of the New Testament. What is valuable, IMHO, is the awareness that information gets altered in many ways for many reasons.

Even in the age of bit-perfect digital copies, we also have tools that allow us to edit, alter and even fake digital information. One of the highest purposes of education is to foster the ability to evaluate information independently of supposed authority, provenance or reliability. I don’t think that will ever change no matter what technological marvels we develop.
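
One modern aid here is the cryptographic digest. Here is a minimal sketch using SHA-256 from Python's standard hashlib module to check whether a copy still matches a reference. The function names and sample bytes are assumptions for illustration only.

```python
import hashlib


def digest(data):
    """Return the SHA-256 digest of `data` as a hex string."""
    return hashlib.sha256(data).hexdigest()


def is_faithful_copy(reference, copy):
    """True when the copy hashes to the same digest as the reference."""
    return digest(reference) == digest(copy)


# Hypothetical reference text, one exact copy and one "improved" copy.
reference = b"To err is human."
good_copy = bytes(reference)
edited_copy = b"To err is humane."

print(is_faithful_copy(reference, good_copy))
print(is_faithful_copy(reference, edited_copy))
```

A matching digest only says the copy equals the reference you hashed. Whether that reference deserves trust in the first place is still a judgment call, which is the point of the paragraph above.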

Comments welcome, of course. I haven’t been writing as frequently as I would like on StorageMojo, due partly to travel and partly to other work, including my new blog on ZDNet. I plan to keep up with both, yet I expect it will take some time for me to figure out what audience differences, if any, there are between the two.