110 likes | 232 Views
Semantic Overlay Networks for P2P Systems. Arturo Crespo and Hector Garcia-Molina. Motivation and Related Work. Partition the P2P network into several thematic networks Queries for a content will not reach nodes without such content
E N D
Semantic Overlay Networksfor P2P Systems Arturo Crespo and Hector Garcia-Molina
Motivation and Related Work • Partition the P2P network into several thematic networks • Queries for a content will not reach nodes without such content • Flooding in smaller networks with smaller TTL (or more results with same) • Edutella, Hypercup: peers with similar content connect to the same SuperPeer
Node partitioning • When does a node belong to SON A? • When it contains a piece of type A • When it contains more than x pieces of type A • Less nodes per SON=>more results sooner • Less SONs per node=>less connections • As in DBFS, coverage problem
SON Classification • Classification must provide: • Load-balance • Each category has similar number of nodes • Each node belongs to a small number of categories • Easy and accurate way to classify a document
Statistics • Classification from “All Music Guide” based on music style (26 styles without and 255 styles with substyles) • Traces from 1800 Napster nodes • Classifications on decade and tone provide worse results
Statistics of styles • A node belongs to a style if it has 1 document • 24% of the nodes belong to one style • 90% nodes belong to up to eight • 14 styles with 200 to 400 nodes • 2 styles with 1000 to 1200 nodes • 1 style with 1600 to 1800 nodes
Statistics of substyles • 18% of the nodes belong to one substyle • 90% nodes belong to up to 30 styles • 87% of substyles with less than 400 nodes
Statistics of document classification • Classification based on the “All Music Guide” database • Reasons of failure: • Filename format • All documents not songs • Misspellings • Database not complete • 25% of documents classified incorrectly • 4% of nodes classified incorrectly
Layered SONs • Styles and substyles • Minimum required percentage (or number) of documents of type A to join SON A • May not be able to join a substyle but may join a style • Nodes belonging to a substyle, do not join the parent style
Searching • A document is classified to belong to (sub)style A • Search all substyles of style A SONs • Search (sub)style A • Search a higher level SON • Until we get enough results • (How do we locate each SON?)
Results • Acyclic graphs, to measure effect of msgs to nodes without such content • To get half the documents that match a query: • Layered SONs: 461 msgs • Gnutella: 1731 msgs