Ch. 20: The Small-World Phenomenon • Easley & Kleinberg: Networks, Crowds and Markets • Arthur Schram
Recall: 6 Degrees of Separation • Stanley Milgram (1967): • Letter: from randomly chosen ‘starters’ to a fixed target • (starters were given the target’s name, address, occupation, etc.) • Each participant could only pass the letter on to someone they knew on a first-name basis • 33% of the letters arrived • Median: 6 steps
Goal • Two remarkable facts: • Short paths exist • Short paths are found (this need not be the case) • Goals of this chapter: develop models for both of these principles
Models of Short Paths • Recall: short paths are compatible with intuition (100 friends example) • How realistic is this? (It assumes that your friends are not friends with each other.)
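The arithmetic behind the 100 friends example can be sketched in a few lines. This is only an illustration: the function name `reachable_upper_bound` is hypothetical, and the count is an upper bound that assumes no two people ever share a friend.

```python
def reachable_upper_bound(friends_per_person, steps):
    """Upper bound on the number of people exactly `steps` hops away,
    assuming (unrealistically) that no friendships overlap."""
    return friends_per_person ** steps

# With 100 friends each, the bound grows by a factor of 100 per step.
for steps in range(1, 6):
    print(steps, reachable_upper_bound(100, steps))
```

Any overlap between friend circles lowers these numbers, which is exactly the objection raised by triadic closure.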
Models of Short Paths • But social networks are full of triangles (mutual friends): • Triadic closure limits the number of people you can reach via short paths.
Watts-Strogatz Model • We are looking for a model that combines many triangles with short paths. • Duncan Watts & Steve Strogatz: combine homophily (i.e., interconnected triangles) with weak ties • Assume everyone lives on a 2D grid (distance is measured in horizontal or vertical grid steps) • Create the network: • homophily: link to all nodes within radius r • weak ties: link to k other nodes chosen uniformly at random
Watts-Strogatz Model • The resulting network has many triangles • Take a starting node v and first use only weak ties. • (no triadic closure amongst these) • Very many nodes are reached in few steps • NB: even 1 weak tie for every k nodes does the trick. • There is also a one-dimensional version
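A minimal sketch of the grid construction, assuming an n-by-n grid, grid-step (Manhattan) distance, and undirected weak ties. The names `build_grid_network` and `bfs_distances` are hypothetical helpers, not part of any library.

```python
import random
from collections import deque

def build_grid_network(n, r=1, k=1, seed=0):
    """n x n grid: local links within r grid steps, plus k uniform random weak ties per node."""
    rng = random.Random(seed)
    nodes = [(x, y) for x in range(n) for y in range(n)]
    adj = {v: set() for v in nodes}
    for (x, y) in nodes:
        # Homophily: link to every node within r horizontal/vertical grid steps.
        for dx in range(-r, r + 1):
            reach = r - abs(dx)
            for dy in range(-reach, reach + 1):
                w = (x + dx, y + dy)
                if w != (x, y) and w in adj:
                    adj[(x, y)].add(w)
                    adj[w].add((x, y))
        # Weak ties: k contacts chosen uniformly at random (self-picks are skipped).
        for _ in range(k):
            w = rng.choice(nodes)
            if w != (x, y):
                adj[(x, y)].add(w)
                adj[w].add((x, y))
    return adj

def bfs_distances(adj, source):
    """Shortest path length from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

# Compare the longest shortest path from a corner with and without weak ties.
for k in (0, 1):
    adj = build_grid_network(20, r=1, k=k, seed=1)
    print(k, max(bfs_distances(adj, (0, 0)).values()))
```

Without weak ties the farthest node is a full grid diagonal away; adding random links can only shorten paths, and in practice shortens them considerably.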
Models of Finding Short Paths • Question addressed: can one construct a random network where decentralized routing finds ‘short’ paths? • Assume that nodes only know their own outgoing edges • Node s knows it must send a message to t and must decide to whom to pass it first. • Evaluate different procedures based on delivery time (averaged over random s, t, and weak ties) • The Watts-Strogatz model has long delivery times. • (intuition: weak ties are too random, not based on similarity among nodes)
Models of Finding Short Paths • Consider the observed paths in the Milgram experiment: • the weak tie used depends strongly on where the node using it lies • weak ties need to span various ranges • this needs to be captured in the model
Generalizing the Watts-Strogatz Model • Add a scale to the long-range weak ties: • each node still has • k weak ties • edges to each node within r steps • but now: • the random weak tie between v and w is generated with probability proportional to $d(v,w)^{-q}$ • (original model: q = 0) • If q is ‘too small’: long-range links are “too random” • If q is ‘too large’: long-range links are “not random enough” (they do not allow you to make large enough steps)
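The distance-dependent weak ties can be sketched as weighted sampling. This assumes grid-step distance on a finite grid and sampling with replacement; `grid_distance`, `weak_tie_weights`, and `sample_weak_ties` are hypothetical names chosen for this illustration.

```python
import random

def grid_distance(v, w):
    """Number of horizontal plus vertical grid steps between v and w."""
    return abs(v[0] - w[0]) + abs(v[1] - w[1])

def weak_tie_weights(v, nodes, q):
    """Unnormalized weight d(v,w)^(-q) for every candidate contact w != v."""
    return {w: grid_distance(v, w) ** -q for w in nodes if w != v}

def sample_weak_ties(v, nodes, q, k=1, rng=random):
    """Draw k weak ties for v with probability proportional to d(v,w)^(-q)."""
    weights = weak_tie_weights(v, nodes, q)
    candidates = list(weights)
    return rng.choices(candidates, weights=[weights[w] for w in candidates], k=k)

nodes = [(x, y) for x in range(21) for y in range(21)]
center = (10, 10)
for q in (0, 2, 6):
    ties = sample_weak_ties(center, nodes, q, k=5, rng=random.Random(0))
    print(q, [grid_distance(center, w) for w in ties])
```

With q = 0 every node is equally likely, which reproduces the original model; as q grows the sampled contacts cluster ever closer to v.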
Generalizing the Watts-Strogatz Model • What is the optimal q? • Simulation results: • It seems: q = 2, so • generate a weak tie with probability proportional to $d(v,w)^{-2}$ • Why? Think of grouping the nodes in $\mathbb{R}^2$ around v by distance: [0,d), [d,2d), [2d,4d), etc. • Consider the number of nodes at distance in [d,2d): • the radius grows like $d$ • the area grows like $d^2$ • the number of nodes is proportional to $d^2$
Why $d(v,w)^{-2}$? • The number of nodes at distance in [d,2d) is proportional to $d^2$ • If the probability that v links to any one node in this area is proportional to $d(v,w)^{-2}$, • then the probability of linking to some node in the area at distance d is independent of d • long-range weak ties are formed that spread uniformly over all scales of distance • people can forward the message in a way that systematically reduces the distance to the target • N.B. the exponent 2 is directly related to the dimensionality of the grid
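The scale-invariance argument can be checked numerically. The sketch below assumes an infinite 2D lattice with grid-step distance, where exactly 4j nodes lie at distance j from v; `shell_mass` is a hypothetical name for the total unnormalized link weight falling in the range [d,2d).

```python
def shell_mass(d, q):
    """Sum of d(v,w)^(-q) over all lattice nodes w at distance in [d, 2d) from v."""
    # On the infinite 2D lattice, 4*j nodes sit at grid-step distance j >= 1.
    return sum(4 * j * j ** -q for j in range(d, 2 * d))

# For q = 2 the mass per distance scale levels off; for q = 0 it grows, for q = 3 it shrinks.
for q in (0, 2, 3):
    print(q, [round(shell_mass(d, q), 3) for d in (1, 4, 16, 64, 256)])
```

Only q = 2 gives each doubling of distance roughly the same total weight, which is the sense in which weak ties spread uniformly over all scales.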
One-dimensional Case • Random weak links with probability proportional to $d(v,w)^{-1}$ • A search strategy that works is myopic search: the node v holding the message passes it on to the contact that is as close to the destination as possible (alternative: send to someone with many links) • [Figure: one-dimensional network with edges labeled ‘local contact’ and ‘long-range contact’] • For example, the path from a to i: a → d → e → f → h → i • NB: this is not the shortest path (a → b → h → i)! • But: in expectation, paths are short!
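Myopic search on the one-dimensional model can be sketched as follows. The sketch assumes a ring of n nodes, two local contacts per node, and a single directed long-range contact per node; the function names are hypothetical.

```python
import random

def ring_distance(a, b, n):
    """Number of steps between a and b along a ring of n nodes."""
    diff = abs(a - b) % n
    return min(diff, n - diff)

def build_ring(n, q=1.0, seed=0):
    """Each node gets its two ring neighbours plus one long-range contact,
    drawn with probability proportional to d(v,w)^(-q)."""
    rng = random.Random(seed)
    contacts = {}
    for v in range(n):
        others = [w for w in range(n) if w != v]
        weights = [ring_distance(v, w, n) ** -q for w in others]
        long_range = rng.choices(others, weights=weights, k=1)[0]
        contacts[v] = [(v - 1) % n, (v + 1) % n, long_range]
    return contacts

def myopic_search(contacts, s, t, n):
    """Forward the message to whichever contact is closest to t until it arrives."""
    path = [s]
    v = s
    while v != t:
        v = min(contacts[v], key=lambda w: ring_distance(w, t, n))
        path.append(v)
    return path

n = 1000
contacts = build_ring(n, q=1.0, seed=42)
path = myopic_search(contacts, 0, n // 2, n)
print(len(path) - 1, "steps to cover ring distance", n // 2)
```

Because a local contact always brings the message one step closer, the search is guaranteed to terminate; the long-range contacts are what make it fast.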
Short Paths • Problem at hand: • generate a random network with such weak ties • choose random starting and target nodes • the number of steps needed by myopic search is X • Show that $E(X)$ is ‘small’ • strategy: track the number of steps needed to halve the distance • Define: the message is in phase j if its distance from the target is between $2^j$ and $2^{j+1}$ • The number of phases is at most $\log_2 n$ (from $2^j = n$) • $E(X) = E(X_1) + \dots + E(X_{\log_2 n})$
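The phase bookkeeping can be illustrated with a tiny helper. The boundary convention here ($2^j \le d < 2^{j+1}$) is an assumption, since the slide does not say which endpoint is included, and `phase_index` is a hypothetical name.

```python
def phase_index(distance):
    """Phase j such that 2**j <= distance < 2**(j + 1)."""
    if distance < 1:
        raise ValueError("the message has already arrived")
    return distance.bit_length() - 1

# Each halving of the distance moves the message down at least one phase.
for d in (1, 5, 8, 100, 512):
    print(d, phase_index(d))
```

On a ring of n = 1024 nodes the largest possible distance is 512, so phases 0 through 9 occur, which is $\log_2 n = 10$ phases in total.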
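Normalizing Probabilities • Recall: random weak links are formed with probability proportional to $d(v,w)^{-1}$; the constant of proportionality is determined by requiring that the probabilities sum to 1. • Define $Z \equiv \sum_{u \neq v} d(v,u)^{-1}$; then $P(v \text{ links to } w) = \frac{1}{Z}\, d(v,w)^{-1}$ • Simple calculus shows $Z \le 2\log_2 n$, so $P(v \text{ links to } w) \ge \frac{d(v,w)^{-1}}{2\log_2 n}$ • Next question: how much time is spent in phase j?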
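The normalizing constant and the resulting lower bound can be checked numerically. This sketch assumes a ring of n nodes with ring distance; `normalizing_constant` and `link_probability` are hypothetical names.

```python
import math

def ring_distance(a, b, n):
    """Number of steps between a and b along a ring of n nodes."""
    diff = abs(a - b) % n
    return min(diff, n - diff)

def normalizing_constant(n, v=0):
    """Z = sum over all u != v of 1 / d(v,u)."""
    return sum(1 / ring_distance(v, u, n) for u in range(n) if u != v)

def link_probability(v, w, n):
    """P(v links to w) = (1/Z) * d(v,w)^(-1)."""
    return (1 / ring_distance(v, w, n)) / normalizing_constant(n, v)

for n in (16, 256, 4096):
    print(n, round(normalizing_constant(n), 3), 2 * math.log2(n))
```

The printed pairs compare Z with the bound $2\log_2 n$ for a few ring sizes.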
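How much Time in Phase j? • Assume the message is at node v in phase j, at distance d from the target t • The phase ends when the distance drops below $2^j$ • One way this happens: v’s weak tie lies within distance d/2 of t • The probability that one of the nodes within d/2 of t is v’s weak tie is $\ge \frac{1}{3\log_2 n}$ (there are about d such nodes, each at distance at most 3d/2 from v)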
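How much Time in Phase j? • $\Pr(\text{weak tie within } d/2 \text{ of } t) \ge \frac{1}{3\log_2 n}$ • $\Pr(X_j \ge i) \le \left(1 - \frac{1}{3\log_2 n}\right)^{i-1}$ • Therefore: $E(X_j) = \Pr(X_j \ge 1) + \Pr(X_j \ge 2) + \Pr(X_j \ge 3) + \dots \le 1 + \left(1 - \frac{1}{3\log_2 n}\right) + \left(1 - \frac{1}{3\log_2 n}\right)^2 + \left(1 - \frac{1}{3\log_2 n}\right)^3 + \dots = 3\log_2 n$ (geometric sum) • So: $E(X) = E(X_1) + \dots + E(X_{\log_2 n}) \le 3(\log_2 n)^2$.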
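The geometric sum and the resulting overall bound can be verified numerically. This is a sketch of the arithmetic only, with hypothetical function names; it takes the per-step success probability to be exactly $1/(3\log_2 n)$, which is the worst case allowed by the bound.

```python
import math

def phase_time_bound(n):
    """Bound on the expected number of steps spent in one phase: 3 * log2(n)."""
    return 3 * math.log2(n)

def geometric_partial_sum(p, terms):
    """1 + (1 - p) + (1 - p)**2 + ... with the given number of terms."""
    return sum((1 - p) ** i for i in range(terms))

def total_delivery_bound(n):
    """log2(n) phases, each bounded by 3 * log2(n) expected steps."""
    return math.log2(n) * phase_time_bound(n)

n = 1024
p = 1 / (3 * math.log2(n))
print(geometric_partial_sum(p, 5000), phase_time_bound(n), total_delivery_bound(n))
```

The partial sum converges to 1/p, which equals the per-phase bound, and multiplying by the number of phases gives the bound on the expected delivery time.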
Myopic Search Finds Short Paths • So: $E(X) \le 3(\log_2 n)^2$. • Myopic search constructs a path whose length is exponentially smaller than the number of nodes • (easy to extend to two dimensions)
Other q • We conclude: myopic search is efficient for q = 1 • But: is q = 1 the most efficient? • If the probability of a tie depends only weakly on distance (q small): one can create an area around t that is hard to penetrate. • If q is large: typically only small steps are made, so it takes a long time to cross a distance
Empirical Analysis • Challenge: the distribution of nodes is typically not uniform • Possible solution: determine link probabilities not by distance but by rank • For v, rank(w) = the number of nodes closer to v than w • Rank-based friendship: prob(link v-w) is proportional to $\mathrm{rank}(w)^{-p}$. • For uniformly distributed nodes: linking with probability proportional to $d^{-2}$ roughly corresponds to linking with probability proportional to $\mathrm{rank}(w)^{-1}$. • More generally: p = 1 yields efficient decentralized search (Liben-Nowell et al. 2005) • (some empirical support that this is how people link)
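Rank-based friendship can be sketched as follows. The code assumes points in the plane with Euclidean distance and counts v itself among the closer nodes, so that the nearest node has rank 1 rather than 0 (an assumption made here to keep $\mathrm{rank}(w)^{-p}$ well defined). The function names are hypothetical.

```python
import math

def ranks_from(v, positions):
    """rank(w) = number of nodes (counting v itself) that are closer to v than w is."""
    dist = {w: math.dist(positions[v], positions[w]) for w in positions if w != v}
    return {w: 1 + sum(1 for u in dist if dist[u] < dist[w]) for w in dist}

def rank_link_probabilities(v, positions, p=1.0):
    """Link probabilities proportional to rank(w)^(-p), normalized to sum to 1."""
    ranks = ranks_from(v, positions)
    weights = {w: r ** -p for w, r in ranks.items()}
    total = sum(weights.values())
    return {w: weight / total for w, weight in weights.items()}

# A dense cluster near v and one remote node: rank ignores how far away "far" is.
positions = {"v": (0, 0), "a": (1, 0), "b": (3, 0), "c": (0, 10)}
print(ranks_from("v", positions))
print(rank_link_probabilities("v", positions, p=1.0))
```

Because only the ordering of distances matters, the same rule applies whether nodes are spread evenly or clustered in a few dense areas.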
Social Foci • Recall: a social focus is any type of community, occupation, etc., that serves to organize social life. • Social distance between v and w, dist(v,w): the size of the smallest social focus that includes both. • Make link v-w with probability proportional to $\mathrm{dist}(v,w)^{-p}$. • Again: p = 1 yields efficient decentralized search (Kleinberg 2001). • Bottom line: the model based on uniformly distributed nodes and geographical distance can easily be extended: efficient search is still possible.
One Last Thing • In practice: high-status targets are more easily reached than low-status targets. • Reason: the core-periphery structure of networks. • There is a need for richer models (w.r.t. link probabilities and search strategies) that can take this into account.