Given how hard it is to save data you want (see The Universe hates your data) to keep, losing data on the web should be easy. It isn’t, because it gets stored so many places in its travels.

But the power of the web means that silliness can now be stored and found with the speed of a Google search. You don’t want sexy love notes – or pictures – to a former flame posted after infatuation ends.

Or maybe you want to discuss relationship, health or work problems with a friend over email – and don’t want your musings to be later shared with others. Wouldn’t it be nice to know that such messages will become unreadable even if your friend is unreliable?

Researchers built a prototype service – Vanish – that seeks to:

. . . ensure that all copies of certain data become unreadable after a user-specified time, without any specific action on the part of a user, without needing to trust any single third party to perform the deletion, and even if an attacker obtains both a cached copy of that data and the user’s cryptographic keys and passwords.

That’s a tall order. Their 1st proof-of-concept failed. But they are continuing the fight.

In Vanish: Increasing Data Privacy with Self-Destructing Data Roxana Geambasu, Tadayoshi Kohno, Amit A. Levy and Henry M. Levy of the University of Washington computer science department present an architecture and a prototype to do just that.

Ironically, the project utilizes the same P2P infrastructures that preserves and distribute data: BitTorrent’s VUZE distributed hash table (DHT) client.

The basic idea is this: Vanish encrypts your data with a random key, destroys the key, and then sprinkles pieces of the key across random nodes of the DHT. You tell the system when to destroy the key and your data goes poof!

They developed a data structure called a Vanishing Data Object (VDO) that encapsulates user data and prevents the content from persisting. And the data becomes unreadable even if the attacker gets a pristine copy of the VDO from before its expiration and all the associated keys and passwords.

Here’s a timeline for that attack:

DHT overview

A DHT is a distributed, peer-to-peer (P2P) storage network. . . . DHTs like Vuze generally exhibit a put/get interface for reading and storing data, which is implemented internally by three operations: lookup, get, and store. The data itself consists of an (index, value) pair. Each node in the DHT manages a part of an astronomically large index name space (e.g., 2160 values for Vuze).

DHTs are available, scalable, broadly distributed and decentralized with rapid node churn. All these properties are ideal for an infrastructure that has to withstand a wide variety of attacks.

Vanish architecture

Data (D) is encrypted (E) with key (K) to deliver cyphertext (C). Then K is split into N shares – K1,…,KN – and distributed across the DHT using a random access key (L) and a secure pseudo-random number generator. The K split uses a redundant erasure code so that a user definable subset of N shares can reconstruct the key.

The erasure codes are needed because DHTs lose data due to node churn. It is a bug that is also a feature for secure destruction of data.

They built a Firefox plug-in for Gmail to create self-destructing emails and another – FireVanish – for making any text in a web input box self-destructing. They also built a file app, so you can make any file self-destructing. Handy for Word backup files that you don’t want to keep around.

The major change to the Vuze BitTorrent client was less than 50 lines of code to prevent lookup sniffing attacks. Those changes only affect the client, not the DHT.

The Vanish proto was cracked by a group of researchers at UT Austin, Princeton, and U of Michigan. They found that an eavesdropper could collect the key shards from the DHT and reassemble the “vanished” content.

Who is going to collect all the shard-like pieces on DHTs? Other than the NSA and other major intelligence services, probably no one. For extra security the data can be encrypted before VDO encapsulation.

The StorageMojo take
The Internet is paid for with our loss of privacy. Young people may think it no great loss, check back in 20 years and we’ll see what you think then.

It is slowly dawning on the public that their lives are an open book on the Internet. Expect a growing market for private communication and storage if ease-of-use and trust issues can be resolved.

You don’t have to be Tiger Woods to want to keep your private life private. I hope the Vanish team succeeds.

Courteous comments welcome, of course. Figures courtesy of the Vanish team.