
CS 728 Lecture 4: It’s a Small World on the Web




  1. CS 728 Lecture 4: It’s a Small World on the Web

  2. Small World Networks
     • It is a ‘small world’ after all
     • Billions of people on Earth, yet every pair is separated by “six degrees” of acquaintance relationships
     • Notion popularized by the experimental psychologist Stanley Milgram, in a study not to be confused with his more infamous obedience experiment
     • Mathematically:
       • Sparse: the number of edges is linear in the number of vertices
       • Small diameter: it grows like the logarithm of the number of vertices, log N
       • High clustering: the neighbors of a vertex tend to be neighbors of each other

  3. Small World = Small Diameter + Clustering
     • Defined by two measures:
       • Characteristic path length L: the number of edges in the shortest path between two vertices, averaged over all vertex pairs
       • Clustering coefficient C:
         • take a vertex v with $k > 1$ neighbors
         • there are at most $k(k-1)/2$ edges among those neighbors
         • $C(v)$ = fraction of those $k(k-1)/2$ possible edges that are present
         • C = average of $C(v)$ over all vertices
     • Small world: $C \gg C_{\text{random}}$, while $L \approx L_{\text{random}}$
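The two measures above can be computed directly. This is a minimal sketch, assuming an undirected graph stored as a dict mapping each vertex to the set of its neighbors; the function names are hypothetical, and the convention that a vertex with fewer than two neighbors has $C(v) = 0$ is an assumption (the slide only defines $C(v)$ for $k > 1$).

```python
from collections import deque
from itertools import combinations


def local_clustering(adj, v):
    """C(v): fraction of the k(k-1)/2 possible edges among v's neighbors that are present."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0  # assumed convention: C(v) = 0 when v has fewer than two neighbors
    present = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return present / (k * (k - 1) / 2)


def clustering_coefficient(adj):
    """C: average of C(v) over all vertices."""
    return sum(local_clustering(adj, v) for v in adj) / len(adj)


def bfs_distances(adj, source):
    """Shortest-path (edge-count) distances from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def characteristic_path_length(adj):
    """L: shortest-path length averaged over all pairs of connected vertices.

    Ordered pairs are counted; for an undirected graph this gives the same
    average as counting each unordered pair once.
    """
    total, pairs = 0, 0
    for s in adj:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                total += d
                pairs += 1
    return total / pairs


# A triangle 0-1-2 with a pendant vertex 3 attached to 2 (edges stored in both directions).
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(clustering_coefficient(graph), characteristic_path_length(graph))  # expected: C = 7/12, L = 4/3
```

Comparing these two numbers against the same measures on a random graph with the same number of vertices and edges gives the $C \gg C_{\text{random}}$, $L \approx L_{\text{random}}$ test.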

  4. The small world of the Web
     • An empirical study of the Web graph reveals the small-world property
     • Sparse graph
     • Average distance d in a simulated web: $d = 0.35 + 2.06 \log_{10} n$; e.g. for $n = 10^9$, $d \approx 19$
     • Diameter properties are inferred from sampling
       • Computing the maximum diameter exactly is computationally demanding for large values of n
     • Clustering unknown
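The empirical fit on this slide is easy to evaluate for different web sizes. A minimal sketch; the function name is hypothetical, and the formula is only the slide's fitted approximation, not an exact law.

```python
import math


def avg_web_distance(n):
    """Average distance d between pages in a web of n pages, using the slide's
    empirical fit d = 0.35 + 2.06 * log10(n)."""
    return 0.35 + 2.06 * math.log10(n)


for n in (10**6, 10**8, 10**9, 10**10):
    print(f"n = {n:.0e}: d = {avg_web_distance(n):.2f}")
```

For $n = 10^9$ this gives $d = 0.35 + 2.06 \cdot 9 = 18.89$, which the slide rounds to 19.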

  5. Implications for the Web
     • Logarithmic scaling of the diameter keeps navigation manageable as the Web grows
     • A 10-fold increase in the number of web pages adds only about 2 clicks, but …
     • Users may not take the shortest path: they may use bookmarks or simply get distracted along the way
     • Search engines play a crucial role: how can they exploit this small-world link structure?
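The “only about 2 more clicks” figure follows directly from the fitted average distance $d(n) = 0.35 + 2.06 \log_{10} n$ on the previous slide: a 10-fold increase in n adds one to $\log_{10} n$, so the constant cancels and only the slope survives.

$$
d(10n) - d(n) = \bigl(0.35 + 2.06 \log_{10}(10n)\bigr) - \bigl(0.35 + 2.06 \log_{10} n\bigr) = 2.06 \log_{10} 10 = 2.06 \approx 2
$$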

  6. Small World in the Real World of Hollywood: The Kevin Bacon Game
     • Goal: connect any actor to Kevin Bacon by linking actors who have acted in the same movie
     • The Oracle of Bacon website uses the Internet Movie Database (IMDB.com) to find the shortest link between any two actors
     • Created by students at the University of Virginia: http://oracleofbacon.org/
     • [Image: boxed version of the Kevin Bacon Game]
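Finding the shortest link between two actors is a breadth-first search over the actor-movie graph. The sketch below is not the Oracle of Bacon's actual implementation. It assumes a hypothetical input format (a dict from movie title to cast list) and uses made-up placeholder names rather than real IMDB data.

```python
from collections import deque


def bacon_path(movies, start, target="Kevin Bacon"):
    """Shortest chain of actors and shared movies from start to target, or None.

    movies maps a movie title to its list of actors (hypothetical input format).
    """
    # Index the bipartite actor-movie graph from the actor side.
    actor_movies = {}
    for title, cast in movies.items():
        for actor in cast:
            actor_movies.setdefault(actor, []).append(title)

    # Breadth-first search over actors; parent[a] = (previous actor, shared movie).
    parent = {start: None}
    queue = deque([start])
    while queue:
        actor = queue.popleft()
        if actor == target:
            break
        for title in actor_movies.get(actor, []):
            for co_star in movies[title]:
                if co_star not in parent:
                    parent[co_star] = (actor, title)
                    queue.append(co_star)

    if target not in parent:
        return None
    # Walk back from the target to rebuild the chain, then reverse it.
    chain = [target]
    while parent[chain[-1]] is not None:
        prev_actor, title = parent[chain[-1]]
        chain += [title, prev_actor]
    return chain[::-1]


movies = {"Movie 1": ["Kevin Bacon", "Actor A"], "Movie 2": ["Actor A", "Actor B"]}
chain = bacon_path(movies, "Actor B")
print(" -> ".join(chain), "| Bacon number:", (len(chain) - 1) // 2)
```

The Bacon number is the number of movie links in the chain, i.e. `(len(chain) - 1) // 2`; here the chain is Actor B -> Movie 2 -> Actor A -> Movie 1 -> Kevin Bacon, so the number is 2.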

  7. The Hollywood Network
     • Total # of actors in the database: ~550,000
     • Most actors are within three links of each other!
     • Average path length to Kevin Bacon: 2.79
     • Actor closest to the “center”: Rod Steiger (average path length 2.53)
     • Kevin Bacon’s rank in closeness to the center: 876th
     • Center of Hollywood?
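“Closest to the center” can be read as having the smallest average shortest-path length to all other reachable actors, which is how the sketch below ranks them. This reading is an assumption about how the ranking was computed; the co-star graph is a hypothetical toy (dict from actor to set of co-stars), and the function names are illustrative.

```python
from collections import deque


def average_distance(adj, source):
    """Mean shortest-path length from source to every other reachable actor."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    others = [d for node, d in dist.items() if node != source]
    return sum(others) / len(others) if others else float("inf")


def rank_by_centrality(adj):
    """Actors sorted from most to least central (smallest average distance first)."""
    return sorted(adj, key=lambda actor: average_distance(adj, actor))


# Toy co-star graph: five actors in a line a - b - c - d - e.
line = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c", "e"}, "e": {"d"}}
for actor in rank_by_centrality(line):
    print(actor, average_distance(line, actor))
```

On the full ~550,000-actor network this runs one breadth-first search per actor, so it is quadratic overall; that cost is the same reason the previous slides mention sampling for diameter estimates.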

  8. Math Collaboration Network: Erdős Number
     • Erdős number: the number of links required to connect a scholar to Erdős via co-authorship of papers
     • Erdős wrote 1500+ papers with 507 co-authors
     • Jerry Grossman’s (Oakland Univ.) website allows mathematicians to compute their Erdős numbers: http://www.oakland.edu/enp/
     • Connecting path lengths, among mathematicians only:
       • average is 4.65
       • maximum is 13
     • [Image: Paul Erdős (1913-1996)]
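Erdős numbers come from a breadth-first search on the co-authorship graph, where all authors of a paper are linked pairwise. A minimal sketch, assuming a hypothetical input format (one author list per paper) and placeholder author names; it also computes the average and maximum Erdős number, the two statistics quoted on the slide.

```python
from collections import deque
from itertools import combinations


def erdos_numbers(papers, center="Paul Erdős"):
    """Erdős number of every author reachable from center in the co-authorship graph.

    papers is an iterable of author lists, one per paper (hypothetical input format);
    all co-authors of a paper are linked pairwise.
    """
    coauthors = {}
    for authors in papers:
        for a in authors:
            coauthors.setdefault(a, set())
        for a, b in combinations(set(authors), 2):
            coauthors[a].add(b)
            coauthors[b].add(a)
    if center not in coauthors:
        return {}
    number = {center: 0}
    queue = deque([center])
    while queue:
        u = queue.popleft()
        for w in coauthors[u]:
            if w not in number:
                number[w] = number[u] + 1
                queue.append(w)
    return number


def summarize(numbers):
    """Average and maximum Erdős number, excluding Erdős himself (number 0)."""
    finite = [k for k in numbers.values() if k > 0]
    return sum(finite) / len(finite), max(finite)


papers = [["Paul Erdős", "A"], ["A", "B"], ["B", "C", "D"], ["E", "F"]]
nums = erdos_numbers(papers)
print(nums, summarize(nums))
```

Authors with no co-authorship path to Erdős (E and F here) get no number at all, which is why the slide's averages are taken only over connected mathematicians.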

  9. [Figure: co-authorship chain linking Arny Rosenberg, Fred Annexstein, Fan Chung and Erdős, captioned “My number is 3”]
     • Erdős and Rényi showed that the average path length between connected nodes in a random graph is logarithmic in the number of nodes
     • But degree sequences in social networks such as the Web and Hollywood are not Poisson
     • Back to power laws
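The Erdős-Rényi result can be checked empirically on a random graph G(n, p). This sketch generates G(n, p) with mean degree c, estimates the average path length from a sample of BFS sources, and compares it with the standard rough estimate $\ln n / \ln c$ (that estimate, the sampling shortcut, and the parameter values are assumptions added here, not taken from the slide).

```python
import math
import random
from collections import deque


def gnp_random_graph(n, p, seed=None):
    """Erdős-Rényi G(n, p): each of the n(n-1)/2 possible edges is present independently with probability p."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj


def sampled_average_path_length(adj, num_sources, seed=None):
    """Average BFS distance from randomly chosen sources to the nodes they can reach."""
    rng = random.Random(seed)
    total, pairs = 0, 0
    for s in rng.sample(list(adj), num_sources):
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


n, c = 1000, 8.0  # mean degree of G(n, p) is p * (n - 1) = c
g = gnp_random_graph(n, c / (n - 1), seed=1)
print("sampled L:", sampled_average_path_length(g, 50, seed=2))
print("ln n / ln c:", math.log(n) / math.log(c))
```

The degrees in this graph are binomial, close to Poisson with mean c, which is exactly the degree sequence the slide says real social networks do not have.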

  10. Classes of small-world networks
     • Single-scale: the connectivity distribution decays exponentially (e.g., Poisson and random graphs)
     • Scale-free: power-law distribution of connectivity over the entire range
     • Broad-scale: power law over a “broad range”, followed by an abrupt cut-off
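The three classes can be made concrete with illustrative forms for the degree distribution $P(k)$. These specific formulas are assumptions added for illustration: $c$ is the mean degree, $\gamma$ the power-law exponent, and $k_c$ a cut-off degree; an exponential cut-off is only one common choice of “abrupt” cut-off, and the Poisson tail falls off at least exponentially fast.

$$
\begin{aligned}
\text{single-scale (Poisson example):}\quad & P(k) = e^{-c}\,\frac{c^{k}}{k!} \\
\text{scale-free:}\quad & P(k) \propto k^{-\gamma} \\
\text{broad-scale:}\quad & P(k) \propto k^{-\gamma}\, e^{-k/k_c}
\end{aligned}
$$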

  11. Bow-tie Structure of the Web
     • A large-scale study (AltaVista crawls) reveals another interesting property of the Web: “symmetric asymmetry”
     • Study of 200 million nodes and 1.5 billion links
     • The small-world property does not apply to the entire Web:
       • some parts are unreachable
       • others have long paths
     • Power-law connectivity still holds:
       • page indegree ($\gamma = 2.1$), outdegree ($\gamma = 2.72$)
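The slide does not say how the exponents $\gamma$ were fitted. One common estimator, swapped in here as an assumption (it is not necessarily what the AltaVista study used), is the continuous maximum-likelihood estimate $\hat\gamma = 1 + n / \sum_i \ln(x_i / x_{\min})$ over the samples $x_i \ge x_{\min}$. Real degrees are integers, so this continuous form is only an approximation. The check below uses synthetic Pareto samples, not web data.

```python
import math
import random


def power_law_exponent_mle(samples, x_min):
    """Continuous MLE for gamma in p(x) proportional to x**(-gamma), for x >= x_min."""
    tail = [x for x in samples if x >= x_min]
    return 1.0 + len(tail) / sum(math.log(x / x_min) for x in tail)


# random.paretovariate(alpha) has density alpha / x**(alpha + 1) for x >= 1,
# so alpha = 1.1 gives a power law with gamma = 2.1 (the slide's indegree value).
rng = random.Random(0)
samples = [rng.paretovariate(1.1) for _ in range(100_000)]
print(round(power_law_exponent_mle(samples, 1.0), 3))
```

Choosing $x_{\min}$ matters in practice: fitting only the tail avoids bias from the low-degree region, where real degree distributions often bend away from a pure power law.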

  12. Bow-tie Components
     • Strongly Connected Component (SCC)
       • Core with the small-world property
     • Upstream (IN)
       • IN can reach the core, but the core can’t reach IN
     • Downstream (OUT)
       • The core can reach OUT, but OUT can’t reach the core
     • Disconnected from the core (Tendrils)
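Given one page known to lie in the core, the SCC, IN and OUT sets follow from one forward and one backward breadth-first search: the SCC is what the seed can reach and what can reach the seed; IN is the rest of the backward set; OUT is the rest of the forward set. A minimal sketch, assuming a directed graph stored as a dict from page to successor list and a hypothetical `core_seed` that is assumed to lie in the giant SCC.

```python
from collections import deque


def reachable(adj, start):
    """All nodes reachable from start by following directed edges (including start)."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for w in adj.get(u, ()):
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen


def bow_tie(adj, core_seed):
    """Split the nodes into SCC, IN, OUT and OTHER around the SCC containing core_seed.

    adj maps each node to its successors (directed links); core_seed is assumed
    to be a node of the giant strongly connected core.
    """
    nodes = set(adj).union(*adj.values())
    reverse = {v: [] for v in nodes}
    for u, succs in adj.items():
        for w in succs:
            reverse[w].append(u)
    forward = reachable(adj, core_seed)       # pages the core can reach
    backward = reachable(reverse, core_seed)  # pages that can reach the core
    scc = forward & backward
    return {
        "SCC": scc,
        "IN": backward - scc,   # reach the core, but the core can't reach them
        "OUT": forward - scc,   # reachable from the core, but can't reach it
        "OTHER": nodes - forward - backward,  # tendrils, tubes, disconnected pieces
    }


web = {"i": ["a"], "a": ["b"], "b": ["a", "o"], "o": [], "x": ["y"]}
print(bow_tie(web, "a"))
```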

  13. Component Properties
     • Each component is roughly the same size: ~50 million nodes
     • Tendrils are not connected to the SCC
       • but are reachable from IN, or can reach OUT
     • Tubes: directed paths IN -> Tendrils -> OUT that bypass the SCC
     • Disconnected components
     • Hence the maximal and average diameter is infinite: some pairs of pages have no connecting path
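Once SCC, IN and OUT are known, the remaining pages can be labeled by two more constrained searches that never step onto the bow-tie itself: forward from IN and backward from OUT. Pages found by both searches lie on tubes, pages found by one are tendrils, and the rest are disconnected. A minimal sketch, assuming the same dict-of-successors representation; the three input sets are assumed to come from an earlier classification, and the function names are hypothetical.

```python
from collections import deque


def _constrained_reach(adj, sources, blocked):
    """Nodes reachable from any source without stepping onto a blocked node."""
    seen = set()
    queue = deque(sources)
    while queue:
        u = queue.popleft()
        for w in adj.get(u, ()):
            if w not in blocked and w not in seen:
                seen.add(w)
                queue.append(w)
    return seen


def classify_rest(adj, scc, in_set, out_set):
    """Label every node outside SCC/IN/OUT as TENDRIL, TUBE or DISCONNECTED."""
    reverse = {}
    for u, succs in adj.items():
        for w in succs:
            reverse.setdefault(w, []).append(u)
    bow = scc | in_set | out_set
    from_in = _constrained_reach(adj, in_set, bow)       # reachable from IN, off the bow-tie
    to_out = _constrained_reach(reverse, out_set, bow)   # can reach OUT, off the bow-tie
    nodes = set(adj).union(*adj.values())
    labels = {}
    for v in nodes - bow:
        if v in from_in and v in to_out:
            labels[v] = "TUBE"
        elif v in from_in or v in to_out:
            labels[v] = "TENDRIL"
        else:
            labels[v] = "DISCONNECTED"
    return labels


web = {"i": ["a", "t", "u"], "a": ["b"], "b": ["a", "o"], "o": [], "u": ["o"], "x": ["y"]}
print(classify_rest(web, scc={"a", "b"}, in_set={"i"}, out_set={"o"}))
```

Blocking the SCC is what keeps tendrils separate from OUT: any page reachable through the core would already belong to OUT.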

  14. Empirical Numbers for the Bow-tie
     • Maximal minimal (?) diameter
       • 28 for the SCC, 500 for the entire graph
     • Probability of a path between any 2 nodes
       • ~1 quarter (0.24)
     • Average length
       • 16 (when a directed path exists), 7 (undirected)
     • Shortest directed path between 2 nodes in the SCC: 16-20 links on average
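The three kinds of statistics on this slide can be computed exactly on a small directed graph: the fraction of ordered pairs joined by a directed path, the average directed distance over those pairs, and the average distance when link direction is ignored. A minimal sketch with hypothetical function names; on a 200-million-node crawl these values would have to be estimated from sampled BFS sources rather than computed for every pair.

```python
from collections import deque


def bfs_distances(adj, source):
    """Edge-count distances from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj.get(u, ()):
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def path_statistics(adj):
    """Directed-graph statistics for a graph given as node -> list of successors.

    p_path: fraction of ordered pairs (s, t), s != t, joined by a directed path.
    avg_directed: mean directed distance over those pairs.
    avg_undirected: mean distance ignoring link direction, over connected pairs.
    """
    nodes = set(adj).union(*adj.values())
    undirected = {v: set() for v in nodes}
    for u, succs in adj.items():
        for w in succs:
            undirected[u].add(w)
            undirected[w].add(u)

    n = len(nodes)
    dir_pairs = dir_total = und_pairs = und_total = 0
    for s in nodes:
        d = bfs_distances(adj, s)
        dir_pairs += len(d) - 1  # exclude s itself (distance 0)
        dir_total += sum(d.values())
        du = bfs_distances(undirected, s)
        und_pairs += len(du) - 1
        und_total += sum(du.values())

    return {
        "p_path": dir_pairs / (n * (n - 1)),
        "avg_directed": dir_total / dir_pairs if dir_pairs else float("inf"),
        "avg_undirected": und_total / und_pairs if und_pairs else float("inf"),
    }


print(path_statistics({"a": ["b"], "c": ["b"]}))
```

In this toy graph only 2 of the 6 ordered pairs have a directed path (a to b, c to b), while ignoring direction connects every pair; the same effect is why the Web's undirected average length is much shorter than its directed one.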

  15. Next Time: Models for the Web Graph
     • Stochastic models that can explain, or at least partially reproduce, the properties of the web graph
     • Goals of a model:
       • power-law distribution properties
       • maintain the small-world property
       • bow-tie structure
