Implicit group messaging in peer-to-peer networks Daniel Cutting, 28th April 2006 Advanced Networks Research Group
Outline. • Motivation and problem • Implicit groups • Implicit group messaging (IGM) • P2P model • Evaluation
Motivation. • It’s now very easy to publish content on the Internet: blogs, podcasts, forums, iPhoto “photocasting”, … • More and more publishers of niche content • Social websites like Flickr, YouTube, MySpace, etc. are gateways for connecting publishers and consumers • Similar capability would also be desirable in P2P • Collaboration and sharing without central authority • No reliance on dedicated infrastructure • No upfront costs, requirements
Problem. • As more new niches are created, consumers need to search/filter more to find and collate varied content • How can we connect many publishers and consumers? • The publisher already knows the intended audience • Can often describe the audience in terms of interests • Does not know the names of individual audience members • So, address them as an implicit group
Implicit groups. • Explicit groups • Members named • Pre-defined by publisher or consumers need to join • Wolfgang, Julie • Implicit groups • Members described • Publisher defines “on the fly”, consumers don’t need to join • Soccer & Brazil
Implicit group messaging. • CAST messages from any source to any implicit group at any time in a P2P network • Each peer described by attributes (capabilities, interests, services, …), e.g. “Soccer”, “Brazil” • Implicit groups are specified as logical expressions of attributes, e.g. “(Soccer OR Football) AND Brazil” • System delivers messages from sources to all peers matching target expressions
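The matching rule above can be sketched in a few lines of Python. This is only an illustration: the nested-tuple expression format and the `matches` function are assumptions made for the example, not part of IGM, and the four peers are the ones used on the summary hashing slides.

```python
# Hypothetical expression format (an assumption for this sketch):
#   ("attr", name) | ("and", e1, e2, ...) | ("or", e1, e2, ...)
def matches(expr, attributes):
    """Return True if a peer with the given attribute set is in the implicit group."""
    op = expr[0]
    if op == "attr":
        return expr[1] in attributes
    if op == "and":
        return all(matches(e, attributes) for e in expr[1:])
    if op == "or":
        return any(matches(e, attributes) for e in expr[1:])
    raise ValueError(f"unknown operator: {op!r}")

# "(Soccer OR Football) AND Brazil"
target = ("and", ("or", ("attr", "Soccer"), ("attr", "Football")), ("attr", "Brazil"))

peers = {
    "Benoit": {"Argentina", "Soccer"},
    "Wolfgang": {"Soccer", "Brazil"},
    "Kim": {"Brazil"},
    "Julie": {"Soccer", "Argentina", "Brazil"},
}

# The implicit group is whoever happens to match; nobody has to join anything.
group = sorted(name for name, attrs in peers.items() if matches(target, attrs))
print(group)
```

With these peers the group works out to Julie and Wolfgang, the same two members named in the explicit-group example.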
P2P model. • A fully distributed, structured overlay network • Peers maintain a logical Cartesian surface (like CAN) • Each peer owns part of the surface and knows neighbours • Peers store data hashed to their part of the surface • Peers geometrically ROUTE to locations by passing from neighbour to neighbour • Quadtree-based surface addressing • Smoothly combine two major techniques for efficient CAST delivery to groups of any size
P2P model. • Attribute partitioning: “attribute → peer” index for small groups • Summary hashing: for reaching BIG groups • Hybrid CAST algorithm: reactive multicast algorithm combining the above
Quadtree-based addressing. • Surfaces can be any dimensionality d • An address is a string of digits of base 2^d (base 4 in 2D) • Map from an address to the surface using a quadtree decomposition • Quadrants called extents
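A minimal sketch of the address-to-extent mapping, assuming a unit surface and assuming that the bits of each digit select the lower or upper half along each axis in turn. The slides do not specify how digits are assigned to quadrants, so that ordering and the name `address_to_extent` are illustrative only.

```python
def address_to_extent(address, d=2):
    """Map a base-2**d address string to (origin, side) of its extent.

    Assumptions: the surface is the unit hypercube, and bit k of a digit
    (most significant first) picks the lower (0) or upper (1) half of axis k.
    int() limits the base to 36, so this works for d <= 5.
    """
    origin = [0.0] * d
    side = 1.0
    for ch in address:
        digit = int(ch, 2 ** d)      # one digit = one level of the quadtree
        side /= 2                    # each level halves the extent per axis
        for axis in range(d):
            bit = (digit >> (d - 1 - axis)) & 1
            origin[axis] += bit * side
    return tuple(origin), side

print(address_to_extent("122"))  # three digits -> an extent of side 1/8
```

Longer addresses name smaller extents, and any prefix of an address names an extent that contains it.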
Attribute partitioning. • A distributed index from each attribute to all peers • Indices are stored at rendezvous points (RPs) on the surface by hashing the attribute to an address
Attribute partitioning (registration). • Every peer registers at the RP of each of its attributes • Every registration includes the peer’s IP address and all of its attributes
Attribute partitioning (CASTing). • To CAST, select one term from target • Route CAST to its RP • RP finds all matches and unicasts to each
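Registration and CASTing through an RP can be sketched as below. Everything here is a simplification for illustration: the overlay is collapsed into one in-memory dictionary, SHA-1 is an assumed choice of hash, the class and function names are hypothetical, and the target is assumed to be a conjunction so that any one of its terms leads to an RP that knows every matching peer.

```python
import hashlib
from collections import defaultdict

def rendezvous_address(attribute, digits=8):
    """Hash an attribute to a base-4 (2D) address. SHA-1 is an assumption."""
    h = int.from_bytes(hashlib.sha1(attribute.encode()).digest(), "big")
    return "".join(str((h >> (2 * i)) & 3) for i in range(digits))

class Surface:
    """Stand-in for the overlay: maps an address to the registrations stored there."""
    def __init__(self):
        self.index = defaultdict(list)

    def register(self, peer, attributes):
        # A peer registers at the RP of every attribute it has,
        # and each registration carries its full attribute set.
        for attribute in attributes:
            rp = rendezvous_address(attribute)
            self.index[rp].append((peer, frozenset(attributes)))

    def cast(self, term, required):
        # Route to the RP of one term, then filter on the whole target there.
        registrations = self.index.get(rendezvous_address(term), [])
        return sorted({peer for peer, attrs in registrations if required <= attrs})

surface = Surface()
surface.register("Benoit", {"Argentina", "Soccer"})
surface.register("Wolfgang", {"Soccer", "Brazil"})
surface.register("Kim", {"Brazil"})
surface.register("Julie", {"Soccer", "Argentina", "Brazil"})

# CAST to "Soccer & Brazil" via the Soccer RP; in the real system the RP
# would now unicast the message to each returned peer.
recipients = surface.cast("Soccer", {"Soccer", "Brazil"})
print(recipients)
```

Choosing the Brazil RP instead gives the same recipients, since every registration carries the peer's full attribute set.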
Attribute partitioning. • Simple, works well for small groups and rare attributes • Fast: just one overlay route followed by unicasts • Fair: each peer responsible for similar number of attributes • BUT common attribute → lots of registrations at one RP • Heavy registration load on some unlucky peers • ALSO big groups → many identical unicasts required • Heavy link stress around RPs • SO, in these cases share the load with your peers!
Summary hashing. • Spreads registration and delivery load over many peers • In addition to attribute registrations, each peer stores a back-pointer and a summary of their attributes at one other location on the surface • Location of summary encodes its attributes • Given a target expression, any peer can calculate all possible locations of matching summaries (and thus find pointers to all group members) • Summaries distributed over surface; a few at each peer
Summary hashing (registration). • Each peer creates a Bloom filter from its attributes • {Soccer, Brazil} → 01101 = 01100 | 01001 • Treat the bits as an address • 01101(0) → 122 (2D) • Store summary at that address on the surface • Example peers: Benoit {Argentina, Soccer}, Wolfgang {Soccer, Brazil}, Kim {Brazil}, Julie {Soccer, Argentina, Brazil}
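The worked example on this slide can be reproduced with a short sketch. The two 5-bit patterns are taken directly from the slide; a real deployment would derive them from hash functions, which the slide does not specify. The function names are hypothetical.

```python
# Per-attribute bit patterns as given on the slide (illustrative, not real hashes).
PATTERNS = {"Soccer": "01100", "Brazil": "01001"}
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"

def summary_bits(attributes, patterns=PATTERNS):
    """OR the attribute patterns together into one Bloom filter bit string."""
    width = len(next(iter(patterns.values())))
    bits = 0
    for attribute in attributes:
        bits |= int(patterns[attribute], 2)
    return format(bits, f"0{width}b")

def bits_to_address(bits, d=2):
    """Pad with 0s to a multiple of d bits, then read each d-bit group as one digit."""
    padded = bits + "0" * (-len(bits) % d)
    return "".join(DIGITS[int(padded[i:i + d], 2)] for i in range(0, len(padded), d))

bits = summary_bits({"Soccer", "Brazil"})
address = bits_to_address(bits)
print(bits, address)
```

The padding step corresponds to the "(0)" in "01101(0)": five bits do not divide evenly into 2-bit digits, so one zero is appended before conversion.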
Summary hashing (CASTing). • Can find all summaries matching a CAST by calculating all possible extents where they must be stored • Convert CAST to Bloom filter, replace 0s with wildcards • Soccer & Brazil → {Soccer, Brazil} → 01100 | 01001 = 01101 → *11*1 • Any peer with both attributes must have (at least) the 2nd, 3rd and 5th bits set in their summary address • The wildcards may match 1s or 0s depending on what other attributes the peer has
Summary hashing (CASTing). • Find extents whose 2nd, 3rd and 5th bits are set • {Soccer, Brazil} → *11*1(*) = { 122, 123, 132, 133, 322, 323, 332, 333 }
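The enumeration of candidate extents can be sketched as follows, assuming the same 2-bits-per-digit packing used for registration. The function names are hypothetical.

```python
from itertools import product

DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"

def query_pattern(bits):
    """Turn a target Bloom filter into a pattern: set bits are required, 0s are wildcards."""
    return bits.replace("0", "*")

def matching_extents(pattern, d=2):
    """List every address whose bits are compatible with the wildcard pattern."""
    padded = pattern + "*" * (-len(pattern) % d)   # padding bits are unconstrained
    per_digit = []
    for i in range(0, len(padded), d):
        options = [""]
        for c in padded[i:i + d]:
            # A wildcard may be 0 or 1; a fixed bit has a single choice.
            options = [o + b for o in options for b in ("01" if c == "*" else c)]
        per_digit.append([DIGITS[int(o, 2)] for o in options])
    return ["".join(digits) for digits in product(*per_digit)]

pattern = query_pattern("01101")          # the Bloom filter for {Soccer, Brazil}
extents = matching_extents(pattern)
print(pattern, extents)
```

Each wildcard bit doubles the number of candidate extents, so targets with few attributes fan out over more of the surface than highly specific ones.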
Summary hashing (CASTing). • Start anywhere and intersect unvisited extents with target expression • Cluster remainder and forward towards each one until none remain • When summaries are found, unicast to peers • Called Directed Amortised Routing (DAR)
IGM on P2P summary. • Peers store their summary on the surface and register at the RP for each of their attributes • If an RP receives too many registrations for a common attribute, it simply drops them • To CAST, a source peer picks any term from target expression and tries a Partition CAST (through an RP) • If RP doesn’t know all matching members (because it’s a common attribute) or the group is too large to unicast to each one, it resorts to a DAR
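The fallback decision described above might look like the sketch below. The thresholds, the class and function names, and the choice to discard all registrations for an overloaded attribute (rather than only the excess) are assumptions for illustration; the slides only say that the RP drops registrations and resorts to a DAR.

```python
from dataclasses import dataclass, field

@dataclass
class RendezvousPoint:
    capacity: int                                   # hypothetical registration limit
    registrations: list = field(default_factory=list)
    overflowed: bool = False                        # True once the attribute proved too common

    def register(self, peer, attributes):
        if self.overflowed:
            return                                  # already known to be a common attribute
        if len(self.registrations) >= self.capacity:
            self.registrations.clear()              # assumption: drop them all
            self.overflowed = True
            return
        self.registrations.append((peer, frozenset(attributes)))

def choose_strategy(rp, required, max_unicasts):
    """Return ("PARTITION", members) or ("DAR", None) for a conjunctive target."""
    if rp.overflowed:
        return "DAR", None                          # RP does not know all members
    members = sorted(peer for peer, attrs in rp.registrations if required <= attrs)
    if len(members) > max_unicasts:
        return "DAR", None                          # group too large to unicast from one RP
    return "PARTITION", members

rare = RendezvousPoint(capacity=3)
rare.register("Wolfgang", {"Soccer", "Brazil"})
rare.register("Julie", {"Soccer", "Argentina", "Brazil"})

common = RendezvousPoint(capacity=1)
common.register("Wolfgang", {"Soccer", "Brazil"})
common.register("Julie", {"Soccer", "Argentina", "Brazil"})

print(choose_strategy(rare, {"Soccer", "Brazil"}, max_unicasts=10))
print(choose_strategy(common, {"Soccer", "Brazil"}, max_unicasts=10))
```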
Evaluation. • 2,000 peer OMNeT++/INET simulation of campus-scale physical networks, 10 attributes per peer (Zipf) • 8,000 random CASTs of various sizes (0 to ~900 members) • Comparison to a Centralised server model • Metrics • Delay penalty • Peer stress (traffic and storage)
Evaluation (delay penalty). • Ratio of Average Delay (RAD) and Ratio of Maximum Delay (RMD) compared to Centralised model • 80% of CASTs have average delay less than 6 times Centralised model • 95% have maximum delay less than 6 times Centralised
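One plausible reading of the two metrics, computed per CAST over the delays to each group member, is sketched below. The delay values are arbitrary placeholders chosen to make the arithmetic easy to follow; they are not results from the evaluation, and the function name is hypothetical.

```python
from statistics import mean

def delay_penalty(igm_delays, centralised_delays):
    """Return (RAD, RMD) for one CAST, given per-member delivery delays."""
    rad = mean(igm_delays) / mean(centralised_delays)   # Ratio of Average Delay
    rmd = max(igm_delays) / max(centralised_delays)     # Ratio of Maximum Delay
    return rad, rmd

# Placeholder delays in milliseconds for a three-member group (not measured data).
igm = [30, 60, 90]
centralised = [10, 20, 30]

rad, rmd = delay_penalty(igm, centralised)
print(rad, rmd)
```

A ratio of 1 would mean the P2P delivery is as fast as the centralised server; larger values are the penalty paid for removing it.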
Evaluation (peer stress). • Order of magnitude fewer maximum packets handled by any one peer than by the Centralised server • Higher average stress since more peers involved in delivering CASTs • Even spread of registrations over peers
Conclusion • Implicit groups are a useful way of addressing a group when you know what they have in common but not who they are • IGM is also applicable to other applications • Software updates to those who need them • Distributed search engines • P2P implicit group messaging is fast and efficient • Does not unfairly stress any peers or network links • Can deliver to arbitrary implicit groups with large size variation