Small World: decentralized search

Small World: decentralized search (slide credits: Leskovec, Adamic, Metaxas, and authors of corresponding papers). Small world in LJ. LiveJournal site, c. 2004: 1.3M bloggers, who can list Friends (other LJ bloggers), Location, Interests, …

Presentation Transcript


  1. Small World: decentralized search (slide credits: Leskovec, Adamic, Metaxas, and authors of corresponding papers)

  2. Small world in LJ • LiveJournal site, c. 2004: • 1.3M bloggers, who can list • Friends (other LJ bloggers) • Location • Interests, … • 500k LJ bloggers list home town and state that can be geomapped (to lat & long) • Only approximate (to within the city) • About 4M “friendship” links between these bloggers • mostly reciprocal links • 385k bloggers are in one connected component • In-degree/out-degree plots are heavy tailed

  3. In silico Small World Expt • Simulate Milgram’s experiment • Pick random start node u and target t • Repeat until the message is in t’s hometown: • If u is closer to t than any of u’s friends: • Give up (fail), or forward to a random person in u’s hometown • Else: • Pass the message to the friend of u closest to t, geographically (that friend becomes the new u)
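
The routing loop on this slide can be sketched in a few lines. This is a minimal illustration of the "give up" variant only, not the code used in the LiveJournal study. The names `greedy_geo_route`, `friends` and `location` are hypothetical, locations are plain (x, y) points rather than latitude and longitude, and two people share a hometown when their coordinates are equal.

```python
import math

def greedy_geo_route(friends, location, start, target):
    """Return the chain from start to target's hometown, or None if it gives up.

    friends:  dict person -> list of friends (assumed data layout)
    location: dict person -> (x, y) hometown coordinates (assumed planar)
    """
    def dist(a, b):
        (x1, y1), (x2, y2) = location[a], location[b]
        return math.hypot(x1 - x2, y1 - y2)

    path = [start]
    u = start
    while location[u] != location[target]:
        # Friend of u who lives closest to the target.
        best = min(friends[u], key=lambda v: dist(v, target), default=None)
        if best is None or dist(best, target) >= dist(u, target):
            return None  # u is closer than all of its friends: give up
        u = best
        path.append(u)
    return path

# Toy example: three towns on a line plus one isolated town.
location = {"a": (0, 0), "b": (1, 0), "c": (2, 0), "d": (5, 5)}
friends = {"a": ["b"], "b": ["a", "c"], "c": ["b"], "d": []}
print(greedy_geo_route(friends, location, "a", "c"))  # ['a', 'b', 'c']
print(greedy_geo_route(friends, location, "a", "d"))  # None (stuck at c)
```

Because every hop strictly reduces the distance to the target, the loop always terminates. The second variant on the slide would replace the `return None` with a hop to a random person in the current town.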

  4. Result • Similar to Milgram • 18% (blue) or 80% (red) finish rate (vs. Milgram: 30%) • Mean length 4 (blue) or 16 (red); here a chain only has to reach the target’s hometown (vs. Milgram: 6) • Can we explain using Kleinberg’s model?

  5. Looking at the geographical distribution of friendship links • Avg. user has ~2.5 uniformly random friends and ~5.5 geographically distributed friends • Mixture of power law (local connections) and uniformly distributed long-range links • Difference in East and West coast link probabilities • Problem: Kleinberg’s model predicts that short paths are not locally findable with $\Pr(u \to v) = \frac{1}{Z}\, d(u,v)^{-1.2}$, since on a 2-d grid only exponent 2 is navigable

  6. Improved model • Basic intuition: • need to account for population density • $\mathrm{rank}_u(v) = |\{w : d(u,w) < d(u,v)\}|$ • New probability • $\Pr(\text{random edge } (u,v)) \propto \mathrm{rank}_u(v)^{-r}$ • For a 2-d grid, using r = 1 is the same as using exponent 2 in Kleinberg’s model
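
To make the rank definition concrete, here is a small sketch that turns it into link probabilities for one person u. The function name `rank_link_probs` and the planar coordinates are assumptions made for illustration. The set in the rank definition is taken to include u itself, so the nearest neighbor gets rank 1, and anyone co-located with u is also given rank 1 to avoid dividing by zero.

```python
import math

def rank_link_probs(u, location, r=1.0):
    """Pr(u -> v) proportional to rank_u(v)^(-r), for every v other than u."""
    def dist(a, b):
        (x1, y1), (x2, y2) = location[a], location[b]
        return math.hypot(x1 - x2, y1 - y2)

    weights = {}
    for v in location:
        if v == u:
            continue
        # rank_u(v): number of people (u included) strictly closer to u than v.
        rank = sum(1 for w in location if dist(u, w) < dist(u, v))
        rank = max(rank, 1)  # assumption: co-located people get rank 1
        weights[v] = rank ** (-r)
    z = sum(weights.values())  # normalizing constant
    return {v: wgt / z for v, wgt in weights.items()}

# The third-closest person gets the same probability whether they live
# 3 or 10 units away: only the number of closer people matters.
near = rank_link_probs("u", {"u": (0, 0), "a": (1, 0), "b": (2, 0), "c": (3, 0)})
far = rank_link_probs("u", {"u": (0, 0), "a": (1, 0), "b": (2, 0), "c": (10, 0)})
print(near["c"], far["c"])
```

This is the sense in which the rank model accounts for population density: distance enters only through how many people live closer.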

  7. Fitting data • Under rank model, optimum exponent = 1 • Observed exponent ~1.2

  8. Group structure based models • Nodes belong to multiple foci; the probability of an edge depends on the size q of the smallest common focus • $\Pr(\text{edge}) \propto q^{-r}$ • Theorem: If r = 1 and outdegree is polylogarithmic, can search in O(log n) • Individuals classified into hierarchies (figure: hierarchy with b = 3) • $h_{ij}$ = height of lowest common ancestor of i and j • Theorem: If a = 1 and outdegree is polylogarithmic, can search in O(log n) [Kleinberg, 2001]

  9. Why do they look this way? • Why would networks have navigability as a property? • Caveat: not clear how universal it is • Is there an evolutionary model? • Sandberg & Clarke [2007]: • Start with a grid + uniform random edges • Choose an (s,t) pair uniformly, start routing locally from s • With some probability, each node on the path rewires its long-range link to point directly to t • Simulation shows that it reaches r=1 for a 1d-grid

  10. Decentralized search in p2p systems

  11. Client-server vs. p2p Systems • In p2p: • Files can be located anywhere on the network • Nodes join and leave • Central repository is not present • BitTorrent, Kazaa, Gnutella • How can we quickly locate where a file is?

  12. Chord • K files are assigned to N peers • Need to hash for uniform load • How can we design a hash table when nodes (i.e. hash buckets) can join and leave? • Do not want to rehash all keys: O(K) work • Great idea: consistent hashing • Karger, Lehman, Leighton, Panigrahy, Levine, Lewin ’97

  13. Consistent Hashing • Hash both nodes and files into m-bit IDs and arrange them in a circle • Each file is stored at the first node following it on the circle • If any node joins or leaves, the expected number of files that change location = O(K/N)
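
A minimal sketch of the ring described on this slide, assuming SHA-1 truncated to m = 32 bits as the hash and a sorted list as the circle. The class name `Ring` and the node and file names are hypothetical, and ID collisions between nodes are ignored for brevity.

```python
import bisect
import hashlib

M = 32  # number of ID bits (assumed value)

def ring_id(name, m=M):
    """Hash a node or file name to an m-bit position on the circle."""
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** m)

class Ring:
    def __init__(self, nodes=()):
        self._ids = []    # sorted node IDs
        self._names = {}  # node ID -> node name
        for node in nodes:
            self.add(node)

    def add(self, node):
        i = ring_id(node)
        bisect.insort(self._ids, i)
        self._names[i] = node

    def remove(self, node):
        i = ring_id(node)
        self._ids.remove(i)
        del self._names[i]

    def owner(self, key):
        """First node at or after the key's position, wrapping around."""
        idx = bisect.bisect_left(self._ids, ring_id(key)) % len(self._ids)
        return self._names[self._ids[idx]]

ring = Ring(f"node-{i}" for i in range(10))
files = [f"file-{i}" for i in range(1000)]
before = {f: ring.owner(f) for f in files}
ring.add("node-new")
moved = [f for f in files if ring.owner(f) != before[f]]
print(len(moved), "of", len(files), "files moved")
```

Only files that fall on the arc just before the new node change owner, and they all move to the new node; every other file stays where it was, which is the point of not rehashing all K keys.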

  14. Finding a file • Each node keeps a list of s neighbors following it • If neighbors are all active, just forward the lookup until we reach potential location • Expected steps O(N)

  15. Faster Search • Each node n keeps a list of fingers • For each i, the corresponding finger is the minimum node in $[n + 2^{i-1}, n + 2^{i}]$ • Similar to “long range links” in small world models

  16. Faster Search • Use the fingers to make as much progress as possible without overshooting • In each step, we at least halve the remaining distance • Expected steps = O(log N) • Technical issues remain around how to update fingers and successors when nodes leave, join, etc.
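
The finger-based lookup can be simulated on a static ring. The sketch below builds every finger table from a global sorted list of node IDs, which a real deployment would not have, and it ignores joins and leaves. Finger i of node n is taken to be the first node at or after n + 2^(i-1) modulo 2^m; the helper names are hypothetical.

```python
import bisect

def successor(ids, x, m):
    """First node ID at or after x on the ring (ids must be sorted)."""
    i = bisect.bisect_left(ids, x % (2 ** m))
    return ids[i % len(ids)]

def build_fingers(ids, m):
    """fingers[n][i-1] = first node at or after n + 2^(i-1), for i = 1..m."""
    return {n: [successor(ids, n + 2 ** (i - 1), m) for i in range(1, m + 1)]
            for n in ids}

def in_open(x, a, b):
    """True if x lies strictly between a and b going clockwise."""
    if a < b:
        return a < x < b
    return x > a or x < b  # the interval wraps past zero

def lookup(start, key, fingers):
    """Return (node responsible for key, number of hops taken)."""
    n, hops = start, 0
    while True:
        succ = fingers[n][0]
        if key == succ or in_open(key, n, succ):
            return succ, hops
        # Jump to the farthest finger that does not overshoot the key.
        n = next(f for f in reversed(fingers[n]) if in_open(f, n, key))
        hops += 1

ids = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]  # 6-bit ring with 10 nodes
fingers = build_fingers(ids, 6)
print(lookup(8, 54, fingers))  # (56, 2): the route is 8 -> 42 -> 51, then 56
```

Each hop moves to the finger closest to the key without passing it, so the hop count stays within the number of ID bits m.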

  17. Power Laws

  18. Power laws • x-axis: degree • y-axis: what fraction of nodes have this degree • Degree distribution is very different from what is expected for random graphs • Quite a few nodes with very high degree, lots of nodes with small degree

  19. Log log plot • Log-log axes: both the x and y axes are on a log scale • x-axis: degree • y-axis: fraction of nodes with this degree • Shows fitted best line $\log y = A - c \log x$, i.e. $y = B x^{-c}$, so $y \propto x^{-c}$
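
The fitted line on a log-log plot is an ordinary least-squares fit in log space. The sketch below shows that calculation with the standard library only; `fit_power_law` is a hypothetical helper, and the data is synthetic, generated from an exact power law so that the fit can be checked.

```python
import math

def fit_power_law(fraction_by_degree):
    """Fit log(y) = A - c*log(x) by least squares; return (c, B) with B = exp(A)."""
    pts = [(math.log(x), math.log(y))
           for x, y in fraction_by_degree.items() if x > 0 and y > 0]
    n = len(pts)
    mean_x = sum(px for px, _ in pts) / n
    mean_y = sum(py for _, py in pts) / n
    sxx = sum((px - mean_x) ** 2 for px, _ in pts)
    sxy = sum((px - mean_x) * (py - mean_y) for px, py in pts)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -slope, math.exp(intercept)

# Synthetic degree distribution y = 0.5 * x^(-2.1) for degrees 1..50.
data = {x: 0.5 * x ** -2.1 for x in range(1, 51)}
c, B = fit_power_law(data)
print(round(c, 6), round(B, 6))  # 2.1 0.5
```

Degrees with zero observed nodes have to be skipped because their logarithm is undefined, which is why the helper filters on `y > 0`.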

  20. Structure of the web: power laws (Broder et al.)

  21. Exponential vs. power law [slide courtesy Leskovec]

  22. Other power laws • Pareto 1897 – wealth distribution • Lotka 1926 – scientific output • Yule 1920s – species per genus • Zipf 1940s – word frequency • Simon 1950s – city population

  23. Why are they surprising? • Because they do not follow the intuition of the Central Limit Theorem • For instance, one possible way to model degree is that each node decides to link to each other node with a fixed probability • $X = X_1 + X_2 + \dots + X_n$, where the $X_i$ are i.i.d. • This would give a normal distribution in the limit • $\Pr(X = k)$ is approximately normal
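
The contrast can be written out explicitly. Assuming each $X_i$ is an indicator that equals 1 with the fixed link probability, called $p$ here (a symbol not on the slide), the degree $X$ is binomial and the normal approximation gives:

```latex
\begin{align*}
X &= \sum_{i=1}^{n} X_i, \qquad \mathbb{E}[X] = np, \qquad \operatorname{Var}(X) = np(1-p), \\
\Pr(X = k) &= \binom{n}{k} p^{k} (1-p)^{n-k}
  \;\approx\; \frac{1}{\sqrt{2\pi\, np(1-p)}}
  \exp\!\left( -\frac{(k - np)^{2}}{2\, np(1-p)} \right).
\end{align*}
```

Under this model the probability of a degree far above the mean $np$ falls off exponentially in $(k - np)^2$, whereas a power law $\Pr(X = k) \propto k^{-c}$ falls off only polynomially, which is why very high-degree nodes are so much more common in the observed data.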
