Small World: decentralized search (slide credits: Leskovec, Adamic, Metaxas, and authors of corresponding papers)
Small world in LJ • LiveJournal site, c. 2004: • 1.3M bloggers, who can list • Friends (other LJ bloggers) • Location • Interests, … • 500k LJ bloggers list home town and state that can be geomapped (to lat & long) • Only approximate (to within the city) • About 4M “friendship” links between these bloggers • mostly reciprocal links • 385k bloggers are in one connected component • In-degree/out-degree plots are heavy tailed
In silico Small World Experiment • Simulate Milgram’s experiment • Pick a random start node u and target t (u then denotes the current holder of the message) • Repeat until the message reaches t’s hometown: • If u is closer to t than any of u’s friends: • Give up (fail), or forward to a random person in u’s hometown • Else: • Pass the message to the friend of u geographically closest to t
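The routing rule above can be sketched as follows. This is a minimal illustration, not the code used in the original study. It assumes hometowns are (x, y) points compared with Euclidean distance (`math.dist`). The names `geo_greedy`, `friends`, `loc` and the `max_steps` safeguard are all hypothetical. The `give_up` flag switches between the two variants on the slide.

```python
import math
import random

def geo_greedy(friends, loc, s, t, give_up=True, rng=None, max_steps=10_000):
    """Greedy geographic routing from s toward t's hometown.

    friends: dict person -> list of friends; loc: dict person -> (x, y) hometown.
    Returns the path (list of people) on success, or None if the chain fails.
    """
    rng = rng or random.Random(0)
    target = loc[t]
    by_town = {}                      # hometown -> residents, for the "forward randomly" variant
    for person, town in loc.items():
        by_town.setdefault(town, []).append(person)

    u, path = s, [s]
    for _ in range(max_steps):
        if loc[u] == target:          # message reached t's hometown: success
            return path
        here = math.dist(loc[u], target)
        closer = [v for v in friends.get(u, []) if math.dist(loc[v], target) < here]
        if closer:                    # pass to the friend geographically closest to t
            u = min(closer, key=lambda v: math.dist(loc[v], target))
        elif give_up:                 # no friend is closer than u: the chain dies
            return None
        else:                         # variant: hand off to a random resident of u's hometown
            u = rng.choice(by_town[loc[u]])
        path.append(u)
    return None                       # step budget exhausted (a safeguard of this sketch)

loc = {0: (0, 0), 1: (1, 0), 2: (2, 0), 3: (3, 0), 4: (0, 0)}
friends = {0: [1, 4], 1: [2], 2: [3]}
print(geo_greedy(friends, loc, 0, 3))   # [0, 1, 2, 3]
```

In the toy data, person 4 shares person 0's hometown. Forwarding to 4 would make no geographic progress, so the greedy rule skips 4 and forwards to 1.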
Result • Similar to Milgram • 18% (blue) or 80% (red) of chains finish (vs. Milgram: 30%) • Mean path length 4 (blue) or 16 (red); here chains only need to reach t’s hometown (vs. Milgram: 6) • Can we explain this using Kleinberg’s model?
Looking at the geographical distribution of friendship links • The average user has ~2.5 uniformly random friends and ~5.5 geographically distributed friends • Mixture of power-law (local) connections and uniformly distributed long-range links • East- and West-coast link probabilities differ • Problem: Kleinberg’s paper predicts that short paths are not locally findable when $\Pr(u \to v) = \frac{1}{Z}\, d(u,v)^{-1.2}$, with Z a normalizing constant (on a 2-d grid, decentralized search is efficient only with exponent 2)
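To make the exponent concrete, the sketch below computes Kleinberg-style long-range link probabilities $\Pr(u \to v) = d(u,v)^{-\alpha}/Z$ on an n × n grid. It assumes lattice (Manhattan) distance. The helper names are hypothetical. The comparison shows that α = 1.2 puts relatively more weight on distant contacts than the navigable α = 2.

```python
import random

def lattice_dist(a, b):
    """Manhattan (lattice) distance between grid points a and b."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def long_range_probs(u, n, alpha):
    """Pr(u -> v) = d(u, v)**(-alpha) / Z over all other nodes v of an n x n grid."""
    weights = {(i, j): lattice_dist(u, (i, j)) ** -alpha
               for i in range(n) for j in range(n) if (i, j) != u}
    z = sum(weights.values())        # the normalizing constant Z
    return {v: w / z for v, w in weights.items()}

def sample_long_range(u, n, alpha, rng):
    """Draw one long-range contact for u according to long_range_probs."""
    probs = long_range_probs(u, n, alpha)
    nodes = list(probs)
    return rng.choices(nodes, weights=[probs[v] for v in nodes], k=1)[0]

u = (10, 10)
for alpha in (1.2, 2.0):
    p = long_range_probs(u, 21, alpha)
    # ratio of a far node (distance 20) to a neighbour (distance 1) equals 20**(-alpha)
    print(alpha, p[(0, 0)] / p[(10, 11)])
```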
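Improved model • Basic intuition: • need to account for population density • $\mathrm{rank}_u(v) = |\{w : d(u,w) < d(u,v)\}|$ • New probability: • $\Pr(\text{random } (u,v) \text{ edge}) \propto \mathrm{rank}_u(v)^{-r}$ • For a 2-d grid, using r = 1 is the same as using exponent 2 in Kleinberg’s model: the number of nodes within distance d of u grows like $d^2$, so $\mathrm{rank}_u(v)^{-1} \approx d(u,v)^{-2}$ up to a constant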
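The rank model can be sketched directly from its definition. The sketch assumes Euclidean distance, and the names `ranks` and `rank_probs` are hypothetical. The set in the definition counts everyone, including u. People living in u's own town therefore get rank 0, and this sketch floors their rank to 1 to avoid $0^{-r}$. That floor is an assumption of the sketch, not something stated on the slides.

```python
import math

def ranks(u, loc):
    """rank_u(v) = |{w : d(u, w) < d(u, v)}| for every v != u (w ranges over everyone, u included)."""
    d = {w: math.dist(loc[u], loc[w]) for w in loc}
    return {v: sum(1 for w in loc if d[w] < d[v]) for v in loc if v != u}

def rank_probs(u, loc, r=1.0):
    """Pr(u -> v) proportional to rank_u(v)**(-r)."""
    # Assumption of this sketch: co-located people get rank 0, floored to 1 to avoid 0**(-r).
    weights = {v: max(k, 1) ** -r for v, k in ranks(u, loc).items()}
    z = sum(weights.values())
    return {v: w / z for v, w in weights.items()}

loc = {0: (0, 0), 1: (1, 0), 2: (2, 0), 3: (10, 0)}
print(ranks(0, loc))        # {1: 1, 2: 2, 3: 3}
print(rank_probs(0, loc))   # proportional to 1, 1/2, 1/3
```

Person 3 is ten times farther from person 0 than person 1 is, but only three ranks away, because nobody lives in between. This is how rank, unlike raw distance, adapts to population density.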
Fitting data • Under the rank model, the optimal (navigable) exponent is r = 1 • Observed exponent ≈ 1.2
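Group structure based models • Nodes belong to multiple foci (groups); the probability of an edge depends on the size q of the smallest focus containing both endpoints • $\Pr(\text{edge}) \propto q^{-r}$ • Theorem: if r = 1 and out-degree is polylogarithmic, search takes O(log n) steps • Hierarchical model (figure: tree with branching factor b = 3): individuals are classified into a hierarchy, as the leaves of a tree • $h_{ij}$ = height of the lowest common ancestor of i and j; $\Pr(i \to j) \propto b^{-\alpha h_{ij}}$ • Theorem: if α = 1 and out-degree is polylogarithmic, search takes O(log n) steps [Kleinberg, 2001]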
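A minimal sketch of the hierarchical model follows. Leaves are numbered 0, 1, 2, … from left to right in a complete b-ary tree, so a leaf's ancestor one level up is `i // b`. Link probabilities are assumed to follow Kleinberg's form $\Pr(i \to j) \propto b^{-\alpha h_{ij}}$. The function names are hypothetical. This is also a special case of the group model: if the groups are the subtrees, the smallest group containing i and j has $q = b^{h_{ij}}$ leaves, so $q^{-1} = b^{-h_{ij}}$.

```python
def lca_height(i, j, b):
    """Height of the lowest common ancestor of leaves i and j in a complete b-ary tree."""
    h = 0
    while i != j:            # climb one level at a time until both leaves share an ancestor
        i, j, h = i // b, j // b, h + 1
    return h

def hierarchy_probs(i, n_leaves, b, alpha=1.0):
    """Pr(i -> j) proportional to b**(-alpha * h_ij), over all leaves j != i."""
    weights = {j: b ** (-alpha * lca_height(i, j, b)) for j in range(n_leaves) if j != i}
    z = sum(weights.values())
    return {j: w / z for j, w in weights.items()}

print(lca_height(0, 1, 3), lca_height(0, 3, 3))   # 1 2
p = hierarchy_probs(0, 9, 3)    # b = 3, two levels: leaf 0's siblings are far more likely
```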
Why do they look this way? • Why would networks have navigability as a property? • Caveat: it is not clear how universal navigability is • Is there an evolutionary model? • Sandberg & Clarke [2007]: • Start with a grid + uniform random long-range edges • Choose an (s,t) pair uniformly at random and route greedily (locally) from s • With some probability, each node on the path rewires its long-range link to point directly to t • Simulations show that the link distribution reaches exponent r = 1 on a 1-d grid, the navigable exponent in one dimension
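The rewiring process can be sketched on a ring (a 1-d grid with wrap-around). Each node knows its two ring neighbours and holds one long-range link. This is a hypothetical simplification, not the authors' simulation code. The ring, the single long-range link per node, and the parameters `n`, `rounds`, and `p` are all assumptions.

```python
import random

def ring_dist(a, b, n):
    """Distance between positions a and b on a ring of n nodes (the 1-d grid, wrapped)."""
    d = abs(a - b) % n
    return min(d, n - d)

def greedy_path(s, t, long_link, n):
    """Greedy routing: each node knows its two ring neighbours and its one long-range link."""
    path, u = [s], s
    while u != t:
        # a ring neighbour is always one step closer, so the distance strictly decreases
        u = min(((u - 1) % n, (u + 1) % n, long_link[u]), key=lambda v: ring_dist(v, t, n))
        path.append(u)
    return path

def rewire(n=200, rounds=5_000, p=0.1, seed=0):
    """Hypothetical sketch of the rewiring process: returns each node's long-range target."""
    rng = random.Random(seed)
    # start with uniformly random long-range links (never a self-link)
    long_link = [(u + rng.randrange(1, n)) % n for u in range(n)]
    for _ in range(rounds):
        s, t = rng.sample(range(n), 2)            # uniformly random (s, t) pair
        for u in greedy_path(s, t, long_link, n)[:-1]:
            if rng.random() < p:                  # with probability p, point u's link at t
                long_link[u] = t
    return long_link

links = rewire()
lengths = sorted(ring_dist(u, v, 200) for u, v in enumerate(links))
```

Plotting a histogram of `lengths` on log-log axes is one way to compare the outcome with the $1/d$ shape reported on the slide. The sketch itself does not establish that convergence.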
Client-server vs. p2p Systems • In p2p: • Files can be located anywhere on the network • Nodes join and leave • No central repository is present • Examples: BitTorrent, Kazaa, Gnutella • How can we quickly locate where a file is?
Chord • K files are assigned to N peers • Keys are hashed to spread the load uniformly • How can we design a hash table when nodes (i.e., hash buckets) can join and leave? • We do not want to rehash all keys (O(K) work) whenever N changes • Great idea: consistent hashing • Karger, Lehman, Leighton, Panigrahy, Levine, Lewin ’97
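The cost of naive rehashing can be seen with the common baseline that places a key in bucket `h(key) mod N`. That baseline is an illustrative assumption, not a scheme named on the slide. When N goes from 10 to 11, a key stays put only if `h mod 10 == h mod 11`. For a uniform hash this happens with probability 10/110 = 1/11, so roughly 10/11 of all keys move. SHA-1 and the names below are hypothetical choices for the sketch.

```python
import hashlib

def h(key):
    """Deterministic hash of a string key to a large integer (SHA-1 used only for spreading)."""
    return int.from_bytes(hashlib.sha1(key.encode()).digest(), "big")

def fraction_moved_mod(keys, n_before, n_after):
    """Fraction of keys whose bucket h(key) mod N changes when N goes from n_before to n_after."""
    moved = sum(1 for k in keys if h(k) % n_before != h(k) % n_after)
    return moved / len(keys)

keys = [f"file-{i}" for i in range(10_000)]
print(fraction_moved_mod(keys, 10, 11))   # close to 10/11 in expectation
```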
Consistent Hashing • Hash both nodes and files into m-bit IDs and arrange them on a circle • Each file is stored at the first node that follows it on the circle (its successor) • When a node joins or leaves, the expected number of files that change location is O(K/N)
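A minimal consistent-hashing ring can be built on a sorted list of node identifiers. A file is assigned to the first node identifier at or after its own, wrapping around the circle. This sketch assumes truncated SHA-1 identifiers with a hypothetical m = 32 (`M`), ignores identifier collisions, and uses hypothetical names throughout.

```python
import bisect
import hashlib

M = 32  # hypothetical choice of m: identifiers live in [0, 2**M)

def ident(name):
    """Map a node or file name to an m-bit identifier (SHA-1 reduced mod 2**M)."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big") % (2 ** M)

class Ring:
    def __init__(self, nodes=()):
        self.ids = []        # sorted node identifiers around the circle
        self.owner = {}      # identifier -> node name
        for node in nodes:
            self.add(node)

    def add(self, node):
        nid = ident(node)
        bisect.insort(self.ids, nid)
        self.owner[nid] = node

    def remove(self, node):
        nid = ident(node)
        self.ids.remove(nid)
        del self.owner[nid]

    def locate(self, key):
        """A file lives at the first node whose identifier is >= the file's, wrapping around."""
        i = bisect.bisect_left(self.ids, ident(key))
        return self.owner[self.ids[i % len(self.ids)]]

ring = Ring(f"node-{i}" for i in range(10))
keys = [f"file-{i}" for i in range(10_000)]
before = {k: ring.locate(k) for k in keys}
ring.add("node-10")
after = {k: ring.locate(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

Only the keys on the arc just before the new node change owner, and they all move to `node-10`. With 11 nodes, that arc is about 1/11 of the circle in expectation, in contrast to the ~10/11 of keys that move under `h(key) mod N`.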
Finding a file • Each node keeps a list of the s nodes that follow it on the circle (its successors) • If these neighbors are all active, just forward the lookup from successor to successor until it reaches the node responsible for the file • Expected number of steps: O(N)
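The successor-only lookup can be sketched on a sorted list of node identifiers, where `ids[i + 1]` plays the role of node `ids[i]`'s successor pointer. Node `ids[i]` is responsible for keys in the circular interval (predecessor, `ids[i]`]. This sketch follows only the first successor; the list of s successors, which matters when nodes fail, is not modeled. The names are hypothetical.

```python
def in_half_open(x, a, b):
    """True if x lies in the circular interval (a, b]; a == b means the whole circle."""
    if a < b:
        return a < x <= b
    return x > a or x <= b

def linear_lookup(ids, start, key_id):
    """Walk successor pointers from ids[start]; returns (responsible node id, hops)."""
    i, hops = start, 0
    # ids[i] is responsible for keys in (predecessor, ids[i]]; ids[-1] wraps for i == 0
    while not in_half_open(key_id, ids[i - 1], ids[i]):
        i = (i + 1) % len(ids)
        hops += 1
    return ids[i], hops

ids = [10, 20, 30, 40]
print(linear_lookup(ids, 0, 35))   # (40, 3)
```

In the worst case the walk visits every node, which is where the O(N) bound comes from.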
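Faster Search • Each node n keeps a list of fingers (identifiers taken mod $2^m$) • For each i = 1, …, m, the i-th finger is the first node at or after $n + 2^{i-1}$, i.e., the first node in $[n + 2^{i-1}, n + 2^{i})$ when that interval contains one • Similar to the “long-range links” in small-world models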
Faster Search • Use the fingers to make as much progress as possible without overshooting the target • Each step at least halves the remaining distance in identifier space • Expected number of steps: O(log N) • Technical issues: how to update fingers and successors when nodes join, leave, etc.
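The finger-based search can be sketched on a static ring. Fingers follow the definition above: the i-th finger of n is the first node at or after $n + 2^{i-1}$. Each step forwards to the farthest finger that does not overshoot the key, which is the "closest preceding finger" rule. Joins, leaves, stabilization, and successor lists are not modeled. The names and the choice m = 16 are hypothetical.

```python
import bisect
import random

def in_half_open(x, a, b):
    """x in circular (a, b]; a == b covers the whole circle."""
    return a < x <= b if a < b else (x > a or x <= b)

def in_open(x, a, b):
    """x in circular (a, b); a == b covers everything except a."""
    return a < x < b if a < b else (x > a or x < b)

def successor(ids, x, m):
    """First node identifier at or after x (mod 2**m); ids must be sorted."""
    i = bisect.bisect_left(ids, x % 2 ** m)
    return ids[i % len(ids)]

def build_fingers(ids, m):
    """fingers[n][i-1] = successor(n + 2**(i-1)) for i = 1..m; fingers[n][0] is n's successor."""
    return {n: [successor(ids, n + 2 ** (i - 1), m) for i in range(1, m + 1)] for n in ids}

def chord_lookup(fingers, start, key):
    """Route from node `start` toward the node responsible for `key`; returns (owner, hops)."""
    n, hops = start, 0
    while not in_half_open(key, n, fingers[n][0]):
        # closest preceding finger: the farthest finger that does not overshoot the key
        n = next(f for f in reversed(fingers[n]) if in_open(f, n, key))
        hops += 1
    return fingers[n][0], hops

m = 16                                          # hypothetical identifier size
rng = random.Random(0)
ids = sorted(rng.sample(range(2 ** m), 64))     # 64 hypothetical node identifiers
fingers = build_fingers(ids, m)
owner, hops = chord_lookup(fingers, ids[0], 12345)
print(owner == successor(ids, 12345, m), hops)  # first value is True
```

Each forwarding step at least halves the distance to the key's predecessor, so a lookup never needs more than m hops.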
Power laws • x-axis: degree • y-axis: fraction of nodes that have this degree • The degree distribution is very different from what is expected for random graphs • A small but non-negligible number of nodes have very high degree, while many nodes have small degree
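The quantity on the y-axis can be computed from an undirected edge list as sketched below. The function name is hypothetical. Nodes with degree 0 never appear in an edge list, so they are not counted here.

```python
from collections import Counter

def degree_distribution(edges):
    """Fraction of nodes with each degree, for an undirected edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    n = len(deg)
    counts = Counter(deg.values())          # degree -> number of nodes with that degree
    return {k: counts[k] / n for k in sorted(counts)}

star = [(0, 1), (0, 2), (0, 3), (0, 4)]
print(degree_distribution(star))   # {1: 0.8, 4: 0.2}
```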
Log log plot • Log-log axes: both the x and y axes are on a log scale • x-axis: degree • y-axis: fraction of nodes with this degree • Shows the fitted best line $\log y = A - c \log x$, i.e., $y = B x^{-c}$ with $B = e^{A}$ (natural logs), so $y \propto x^{-c}$
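The fitted line can be obtained by ordinary least squares on $(\log x, \log y)$, as sketched below. The helper name is hypothetical. Points with y = 0 (degrees nobody has) must be dropped first. Least squares on a log-log histogram is a quick visual estimate; it is known to be biased on noisy tails, and maximum-likelihood estimators are often preferred for serious fits.

```python
import math

def fit_power_law(xs, ys):
    """Least-squares line through (log x, log y); returns (B, c) for y ~ B * x**(-c)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]          # requires every y > 0
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(lx, ly))
             / sum((a - mx) ** 2 for a in lx))
    intercept = my - slope * mx             # this is A in log y = A - c log x
    return math.exp(intercept), -slope

degrees = list(range(1, 101))
fractions = [0.3 * k ** -2.1 for k in degrees]   # synthetic, noise-free power law
B, c = fit_power_law(degrees, fractions)
print(round(B, 3), round(c, 3))   # 0.3 2.1
```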
Structure of the web: power laws (Broder et al.)
Exponential vs. power law [slide courtesy Leskovec]
Other power laws • Pareto 1897 – wealth distribution • Lotka 1926 – scientific output • Yule 1920s – species per genus • Zipf 1940s – word frequency • Simon 1950s – city population
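Why are they surprising? • Because they do not follow the intuition of the Central Limit Theorem • For instance, one possible way to model degree is that each node links to each potential partner independently with a fixed probability p • Then the degree is $X = X_1 + X_2 + \dots + X_n$, where the $X_i$ are i.i.d. Bernoulli(p) • In the limit this gives a normal distribution: $\Pr(X = k)$ is approximately normal, so its tail decays much faster than any power law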
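The contrast can be made numeric by comparing the binomial degree distribution of the fixed-probability model with a discrete power law truncated at `kmax`. The parameters n = 1000, p = 0.01, γ = 2.5 are illustrative assumptions, not values fitted to any data, and the function names are hypothetical. Near the mean, the binomial gives more probability. Far out in the tail, the binomial probability is astronomically smaller than the power law's.

```python
import math

def binomial_pmf(k, n, p):
    """Pr(X = k) for X = X_1 + ... + X_n with X_i i.i.d. Bernoulli(p)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def power_law_pmf(k, gamma, kmax):
    """Pr(X = k) proportional to k**(-gamma) on 1..kmax (a truncated discrete power law)."""
    z = sum(j ** -gamma for j in range(1, kmax + 1))
    return k ** -gamma / z

n, p, gamma = 1000, 0.01, 2.5      # illustrative parameters, not fitted to any data
for k in (10, 50, 100):
    print(k, binomial_pmf(k, n, p), power_law_pmf(k, gamma, n))
```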