190 likes | 295 Views
Peer-to-Peer Infrastructure and Applications. Andrew Herbert Microsoft Research, Cambridge +44 1223 479818 aherbert@microsoft.com. Microsoft and the Grid. Shared vision of the “virtual organization” But focused on e-Business rather than e-Science Grid investments
E N D
Peer-to-Peer Infrastructure and Applications Andrew Herbert Microsoft Research, Cambridge +44 1223 479818 aherbert@microsoft.com
Microsoft and the Grid • Shared vision of the “virtual organization” • But focused on e-Business rather than e-Science • Grid investments • Windows clusters for large scale computations • TerraServer projects for large data sets • Globus port to Windows • Globus implementation of OGSA • Wrap Grid services as Web services • Leverage web services as Grid infrastructure • E.g., Hailstorm user authentication
Microsoft Research • Web services enable wide area integration • How to extend this to enable efficient wide scale information sharing and collaboration? • Move to a model of peer-to-peer service implementation in contrast to today’s server-based model • Necessarily scalable and self-organizing • Necessarily simple developer framework • No conflict with WSDL, SOAP etc
Peer-to-Peer today • Music / video download • Napster, Morpheus, Gnutella • Distributed computing • SETI@Home • Research community • looking for general purpose frameworks • discovering useful applications
Peer-to-Peer applications • Publish/Subscribe Event Notification (SCRIBE) • Share load of supporting topics and disseminating messages from publishers to subscribers • Distributed document archive (PAST) • Share load of storing documents reliably • Web caching (SQUIRREL) • Share load of caching web pages • Dynamic directory (OVERLOOK) • Share load of storing directory entries for dynamic data
A P2P framework requires: • Content-based addressing • Hash content to key • Route message to computer hosting that key • Dynamic caching and proxying • Local computers stand in for remote ones • Faster access, reduced load on key holder • Replication and automatic failover • Store at K computers adjacent to key holder • Multicast cascade for group communication • Each computer needs a spanning tree of routes for reaching every other computer
Overlay Networks • Peer-to-peer requires richer routing semantics than IP • IP routes to destination computer, not content • URLs route to destination computer, not content • IP multicast isn’t widely deployed • Solution: Overlay networks • allow applications to participate in hop-by-hop routing decisions • Ideal overlay is efficient, self-organizing, scalable, and fault-tolerant
Pastry outline • Computers (Nodes) have unique Id • Typically 128 bits long • Primitive: Route (msg, key) • Deliver msg to currently alive node with Id closest numerically to key • Scalable, efficient • Per node routing table O(log(N)) entries • Route in O(log(N)) steps • Fault tolerant • Self-fixes routing tables when nodes added, deleted or fail
Pastry routing 0XXX 1XXX 2XXX 3XXX 0112 2321 START 0112 routes a message to key 2000. 2032 First hop fixes first digit (2) 2001 Second hop fixes second digit (20) END 2001 closest live node to 2000.
Pastry routing table Routing table: For each level, nearest peer for other domains Namespace leaf set: nearest Ids to “left” and “right” in name space Each entry gives IP address for host associated with Id
Pastry node addition • Want to add new node to the system • Invent a new random nodeId X • Go to a nearby or well-known node A • Route to “key” X via A (finds node Z with Id closest to X) • Obtain leaf set from Z and rebuild • Obtain routing table entries from each node along the route from A to Z and rebuild • Register with each member of A’s namespace leaf set so they adjust their leaf sets and rebuild • Find nearest leaf set node and use itsrouting table to improve locality
Scribe: A Pastry Application Publisher Publisher • Publish / subscribe is a popular model for “event driven” systems with volatile membership • Decouple event publishers from event subscribers • Publishers don’t know in advance who subscribers are • Subscribers don’t know in advance who publishers are • Challenge is how to multicast notifications from topics efficiently Topic of interest Subscriber Subscriber
Scribe: architecture • Topic hashed to a key • Construct a multicast tree based on the Pastry network • Have the (Pastry) node with the closest Id to the topic key be the root • This node replicates knowledge of the topic to its k nearest neighbours for resilience • Pass event notification down through the tree • Each parent forwards event to it’s children • Avoids over stressing network links close to the topic node
Scribe: Topic creation • Each topic is assigned a topicId • Root of the multicast tree= node with nodeId numerically closest • Create(topic): route through Pastry to the topicId Root T Create(T)
Scribe: subscribing 1111 1000 1111 1100 1101 1100 1101 1011 1001 1011 0100 1001 0100 0111 1000 0111
Publish(topic, event) Route through the Pastry network using the topicId as the destination Dissemination along the multicast tree starting from the root Scribe: event dissemination E 1100 1101 1011 1011 0100 0111
Summary • Peer-to-peer techniques are good for wide area information sharing and collaborative computation • Overlay networks enable peer-to-peer distributed computing • Pastry is an efficient, scalable, self-organizing peer-to-peer framework • Pastry makes it easy to build powerful peer-to-peer applications • For more see: http://research.microsoft.com/~antr/Pastry/