170 likes | 305 Views
CS 728 Lecture 4 It’s a Small World on the Web. Small World Networks. It is a ‘small world’ after all Billions of people on Earth, yet every pair separated by “six degrees” of acquaintance relationships
E N D
Small World Networks • It is a ‘small world’ after all • Billions of people on Earth, yet every pair separated by “six degrees” of acquaintance relationships • Notion popularized by experimental psychologist Stanley Milgram’s, different from his more infamous experiment • Mathematically • Sparse – linear number of edges • Diameter - small like logarithm (log N) • Clustering is high – neighbors are neighbors
Small World = Small Diameter + Clustering • Defined by two measures: • characteristic path length L = number of edges in shortest path between two vertices, averaged over all vertex pairs • clustering coefficient C: • take vertex v with k 1 neighbors • at most k(k-1)/2 edges among neighbors • C(v) = fraction of k(k-1)/2 edges present • C = average clustering coefficient • C >> C_random, L L_random
The small world of the Web • Empirical study of Web-graph reveals small-world property • Sparse graph • Average distance (d) in simulated web: d = 0.35 + 2.06 log (n) e.g. n = 109, d ~= 19 • Diameter properties inferred from sampling • Calculation of max. diameter computationally demanding for large values of n • Clustering unknown
Implications for Web • Logarithmic scaling of diameter makes future navigation of web manageable • 10-fold increase of web pages results in only 2 more additional ‘clicks’, but … • Users may not take shortest path, may use bookmarks or just get distracted on the way • Search engines play a crucial role, how can they use this SW link structure?
Small World in Real World of Hollywood: The Kevin Bacon Game Goal: Connect any actor to Kevin Bacon, by linking actors who have acted in the same movie. Oracle of Bacon website uses Internet Movie Database (IMDB.com) to find shortest link between any two actors. Created by students at Univ. of Virginia http://oracleofbacon.org/ Boxed version of the Kevin Bacon Game
The Hollywood Network Total # of actors in database: ~550,000 Most actors are within three links of each other! Average path length to Kevin Bacon: 2.79 Actor closest to “center”: Rod Steiger (2.53) Rank of Kevin, in closeness to center: 876th Center of Hollywood?
Math Citation Network:Erdős Number Number of links required to connect scholars to Erdős, via co-authorship of papers Erdős wrote 1500+ papers with 507 co-authors. Jerry Grossman’s (Oakland Univ.) website allows mathematicians to compute their Erdos numbers: http://www.oakland.edu/enp/ Connecting path lengths, among mathematicians only: • average is 4.65 • maximum is 13 Paul Erdős (1913-1996)
Arny Rosenberg Fred Annexstein Fan Chung My number is 3 Erdős - Erdős and Renyi showed that average path length between connected nodes in a random graph is logarithmic - But degree sequences in social networks like Web and Hollywood are not Poisson - Back to Power-laws
Classes of small-world networks • Single-scale: Connectivity distribution decays exponentially (e.g., Poisson and random graphs) • Scale-free: Power-law distribution of connectivity over entire range • Broad-scale: Power-law over “broad range” + abrupt cut-off
Bow-tie Structure of Web • A large scale study (Altavista crawls) reveals another interesting property of web – “symmetric asymmetry” • Study of 200 million nodes & 1.5 billion links • Small-world property not applicable to entire web • Some parts unreachable • Others have long paths • Power-law connectivity holds though • Page indegree ( = 2.1), outdegree ( = 2.72)
Bow-tie Components • Strongly Connected Component (SCC) • Core with small-world property • Upstream (IN) • Core can’t reach IN • Downstream (OUT) • OUT can’t reach core • Disconnected (Tendrils)
Component Properties • Each component is roughly same size • ~50 million nodes • Tendrils not connected to SCC • But reachable from IN and can reach OUT • Tubes: directed paths IN->Tendrils->OUT • Disconnected components • Maximal and average diameter is infinite
Empirical Numbers for Bow-tie • Maximal minimal (?) diameter • 28 for SCC, 500 for entire graph • Probability of a path between any 2 nodes • ~1 quarter (0.24) • Average length • 16 (directed path exists), 7 (undirected) • Shortest directed path between 2 nodes in SCC: 16-20 links on average
Next Time:Models for the Web Graph • Stochastic models that can explain or at least partially reproduce the properties of the web graph. Goals of model • power law distribution properties • maintain the small world property • bow-tie structure