The P4P working group demo’d their work Friday at the Distributed Computing Industry Association show in New York. Not only did they show 2-3x faster downloads, but they also cut the average number of inter-metro hops – the expensive kind – from over 5 to less than 1. Cool.
The P4PWG idea is that if P2P is both cheaper for ISPs and faster for users we will all have a happier Internet. Folks from the Yale CompSci department – Haiyong Xie, Y. Richard Yang and Avi Silberschatz – along with Verizon and Pando Networks, cooperated on the demo.
The P4PWG includes AT&T, Verizon, Pando, BitTorrent, Cisco and LimeWire among others. The cable companies are there as observers. The P4P work is an open standard with the hope that all ISPs and P2P networks will endorse it.
How does it work?
The tech papers aren’t available yet on the web, but this is what I’ve pieced together from an afternoon’s websurfing. Update: Wide-awake reader Paul found this P4P Overview on Ars Technica. Thanks Paul! End update.
P2P is network oblivious. When you start downloading, the streams might come from anywhere, regardless of network cost. That's a problem because big backbone routers and undersea fiber are expensive while smaller local routers are much cheaper, so traffic that crosses the backbone costs the ISP far more than traffic that stays local.
What P4P does is inject some network knowledge into the P2P system so peering decisions are made more intelligently. It looks like a network version of locality of reference.
Implementation
There are at least 2 ways to deliver network awareness to peers. Here’s one of them.
A peer-tracker (pTracker) and an Internet tracker (iTracker) are added to the P2P network. A peer requests peering information from the pTracker, which has knowledge of local (metro area) and recent non-local resources. The pTracker sends back an edited server list and the peer goes on its merry way.
If the resources aren’t local and the pTracker doesn’t know the network topology, it pings the iTracker, which returns high-level peering suggestions. If locality of reference works as well in cyberspace as it does with other data, the pTracker won’t be querying the iTracker very often.
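Here’s a minimal sketch of how that two-tracker exchange might look. The actual P4P message formats aren’t in anything I’ve read, so the class names, methods and the "metro" bookkeeping below are my own assumptions for illustration, not the working group’s design.

```python
class ITracker:
    """ISP-maintained tracker: knows the topology, returns high-level peering hints."""
    def __init__(self, metro_of_peer):
        self.metro_of_peer = metro_of_peer   # peer -> metro area (assumed format)

    def rank(self, requester, candidates):
        """Prefer peers in the requester's own metro (fewer inter-metro hops)."""
        home = self.metro_of_peer.get(requester)
        return sorted(candidates,
                      key=lambda p: 0 if self.metro_of_peer.get(p) == home else 1)


class PTracker:
    """P2P-network tracker: serves local peers first, asks the iTracker otherwise."""
    def __init__(self, itracker):
        self.itracker = itracker
        self.local_peers = {}    # metro -> peers known to be in that metro
        self.all_peers = set()

    def register(self, peer, metro=None):
        self.all_peers.add(peer)
        if metro is not None:
            self.local_peers.setdefault(metro, set()).add(peer)

    def peer_list(self, requester, metro, want=5):
        local = [p for p in self.local_peers.get(metro, ()) if p != requester]
        if len(local) >= want:
            return local[:want]      # enough local peers: no iTracker query needed
        others = [p for p in self.all_peers - set(local) if p != requester]
        return (local + self.itracker.rank(requester, others))[:want]


# Toy usage: a New York peer should be pointed at New York peers first.
it = ITracker({"a": "NYC", "b": "NYC", "c": "LAX", "d": "LAX"})
pt = PTracker(it)
for peer, metro in [("a", "NYC"), ("b", "NYC"), ("c", "LAX"), ("d", "LAX")]:
    pt.register(peer, metro)
print(pt.peer_list("a", "NYC", want=2))   # e.g. ['b', 'c'] -- the local peer comes first
```

The metro bookkeeping is just there to show where locality of reference kicks in: once a metro has enough known peers, the pTracker answers from its own cache and the iTracker drops out of the loop.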
It is expected that the pTracker will be maintained by the P2P network, while the iTracker could be maintained by the ISP, network or a trusted 3rd party. This should help preserve P2P user privacy, although the *Tracker names certainly won’t reduce user paranoia.
Guys, how about something less Big Brotherish? PeerServer and RouteServer? Just a thought.
The StorageMojo take
As file sizes continue their secular trend upward, the need for P2P will continue to grow. By aligning ISP, telco and user needs for faster and more efficient P2P, the P4PWG has pulled off a win/win/win situation.
A less obvious benefit of this work applies to VoIP networks, which are also P2P. It doesn’t take much to degrade VoIP quality. To the extent that it improves P2P node selection, the P4P project will benefit the rapidly growing population of VoIP users as well.
Kudos to the P4PWG and especially the Yale team.
Comments welcome, of course. Images courtesy of the P4PWG.
This is not too dissimilar from what I was doing for the second half of my tenure at EMC. On the one hand P4P is more fully distributed, on the other it lacks support for writes with full coherency and authentication, but those are all kind of beside the point. What matters is that the idea of setting up a dynamic tree/graph/whatever based on the actual network topology and then propagating data through that network instead of between peers at random really can produce dramatic efficiency gains. In one of my specs, which IIRC got reproduced almost verbatim in the associated patent, I refer to the two cardinal laws of data distribution:
* Never transmit data over the same link multiple times when once could have sufficed.
* Never transmit data over a link once when zero could have sufficed.
Most systems tend to violate one rule or the other, most often the first – and since infinity minus one is greater than one minus zero that can be far worse. Because of my work I was involved with some mostly-academic projects in or near the P2P space at the time. There was a distinct pattern of starting out with DHTs and such that completely ignore network topology, then trying to bolt on some limited form of topology awareness later as latency (and sometimes reliability but rarely bandwidth) turned out to be a problem after all. Few had the foresight to take topology into account right at the start.
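To put rough numbers on why topology-blind peering hurts, here’s a toy simulation (my made-up setup, nothing from the P4PWG demo): count how often data crosses the expensive inter-metro link when each peer downloads from a random already-served peer, versus from a same-metro peer whenever one exists.

```python
import random

# Toy model: 20 peers split across two metros; one NYC peer seeds the file.
peers = [("NYC", i) for i in range(10)] + [("LAX", i) for i in range(10)]
seed = ("NYC", 0)

def random_peering(order):
    """Each new peer downloads from a random already-served peer."""
    served, crossings = [seed], 0
    for peer in order:
        source = random.choice(served)
        crossings += (source[0] != peer[0])   # count inter-metro transfers
        served.append(peer)
    return crossings

def topology_aware(order):
    """Each new peer prefers an already-served peer in its own metro."""
    served, crossings = [seed], 0
    for peer in order:
        same_metro = [s for s in served if s[0] == peer[0]]
        source = same_metro[0] if same_metro else served[0]
        crossings += (source[0] != peer[0])
        served.append(peer)
    return crossings

order = [p for p in peers if p != seed]
random.shuffle(order)
print("random peering: ", random_peering(order))   # typically several crossings
print("topology-aware: ", topology_aware(order))   # exactly 1: LAX gets seeded once
```

That single crossing is the "once could have sufficed" case; every extra crossing in the random run is a violation of the first law.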
What I really wonder about in P4P is how – and how well – the nodes cooperate wrt placement of data copies. On the one hand, it can be good to have one piece of data replicated as many places as possible. On the other, it can be good to replicate as many pieces of data as possible. Total storage space being finite even in a P2P network, balancing these needs is a much-studied problem and I’m curious what approach the P4P team took.
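For what it’s worth, one common heuristic from the BitTorrent world (I have no idea whether P4P changes it) is "rarest first": each peer fetches whichever needed piece has the fewest copies among its neighbors, which spreads replicas across pieces before piling extra copies onto the popular ones. A toy version:

```python
from collections import Counter

def pick_piece(my_pieces, neighbor_piece_sets):
    """Return the piece I still need that is rarest among my neighbors."""
    counts = Counter()
    for pieces in neighbor_piece_sets:
        counts.update(pieces)
    needed = set(counts) - set(my_pieces)
    if not needed:
        return None
    return min(needed, key=lambda p: counts[p])

# Toy usage: piece 3 exists on only one neighbor, so fetch it first.
print(pick_piece({0}, [{0, 1, 2}, {1, 2}, {1, 3}]))   # -> 3
```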
Ars Technica links to one paper (tech enough to have equations, at least):
http://www.dcia.info/documents/P4P_Overview.pdf