1 / 21

Searching in Unstructured Networks

Searching in Unstructured Networks. Joining Theory with P-P2P. What is P2P?. Distributed system where all nodes play an equal role. In the “Internet world” homogeneity cannot be guaranteed (ie. Bandwidth, storage, processing power).

macy
Download Presentation

Searching in Unstructured Networks

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Searching in Unstructured Networks Joining Theory with P-P2P

  2. What is P2P? • Distributed system where all nodes play an equal role. • In the “Internet world” homogeneity cannot be guaranteed (ie. Bandwidth, storage, processing power). • Any such system requires an indexing, or searching scheme in order to function.

  3. P2P Architectures • Centralized (Napster) • Decentralized with Structure (all DHT architectures such as CAN, Chord, etc) • Decentralized and Unstructured (Gnutella, Freenet). I call these...

  4. Popular Peer to Peer (P-P2P) • No centralized directory. • No centralized control over object placement. • No centralized control over network topology. • Only most loose of guarantees and weakest assumptions can be made.

  5. Flooding (BFS) Reaches the most nodes (w/in depth D) Returns complete results Fastest brute-force search Does not scale Crawl (DFS) Successful searches terminate Minimal resources Poor response time: time increases exponential in D Current Search Strategies

  6. Outline • Current Solutions • A better approach by example • A good first step • Final thoughts

  7. Current Solutions An Heuristic Approach to Heuristics

  8. Directed BFS (DBFS) • Keep state on search results returned from neighbours. • Send query only to those neighbours with most successful “history.” • Subset of neighbours revert to standard BFS (flooding). • “Intelligent” selection of 1 neighbour produces similar success as BFS.

  9. Iterative Deepening • Global policy, P={a, b, c,…,W} • AI technique called search over state space. • Multiple BFSs with successively larger D. • Requires globally unique identifier • Reduces processing? • Increases bandwidth? • No analysis or intuition: where’s the beef?

  10. Hierarchical Approach • Nodes with higher bandwidth and processing are designated “super-nodes” • In a graph, these nodes would have higher weight (or size) • Connected via 2 types of edges: • large weight edges to other super-nodes • smaller weight edges to smaller nodes. • Founded on experience (eg. DNS system)

  11. Lead by Example: Replication in unstructured networks

  12. Current Replication Strategies • Owner Replication: objects replicated by nodes requesting object • Path Replication: object replicated at every node along path from object origin to object request.

  13. The Problem In an unstructured network, how many copies of each object should there be in order to minimize the cost of a search (assuming fixed storage)?

  14. First start with the simplest case • m objects, n nodes • Each object, i, replicated at ri random sites. • Object i requested with rate qi , • Probability of successful search in k probes:

  15. The simple case (con’t) • Average search size of i: • The average messaging overhead during a query can be represented by, A, the average search size over all objects:

  16. Replication Strategies • Assume average replicas per site • Uniform: • Proportional:

  17. Square-root Replication • Put as a minimization problem, what is the optimal allocation of replicas so that A is minimized [Kleinrock, 1976]? • This occurs when • So,

  18. Searching in Unstructured Networks: Revisited

  19. Random Walks • A query, at every node, is forwarded to a randomly chosen neighbour until the query succeeds. • A strong body of knowledge exists: • understand and quantify resource trade-offs? (ie. How many walkers? How many hops?) • gain insight into the merits of “tweaks” • helps us avoid guessing, and trial & error

  20. “Gossip” • “She tells 2 friends, and they tell 2 friends, and…” [VO5] • randomly select k neighbours to whom queries are forwarded • A young and relatively unexplored solution that (as yet) lacks significant analysis and understanding • perhaps this is the perfect opportunity

  21. Final Thoughts • DHT cannot succeed P-P2P applications without large breakthroughs in hashing (or ad-hoc solutions to search problems). • P-P2P inherently suffers from unscalable or unfinished searches. • An unstructured P2P network is still a graph; capitalise on prior work & knowledge.

More Related