The Logarithmic Dimension Hypothesis

MITACS International Problem Solving Workshop, July 2012. The Logarithmic Dimension Hypothesis. Anthony Bonato, Ryerson University. Workshop team: David Gleich (Purdue), Dieter Mitsche (Ryerson), Stephen Young (UCSD), Myunghwan Kim (Stanford), Amanda Tian (York).

Presentation Transcript


  1. MITACS International Problem Solving Workshop, July 2012 • The Logarithmic Dimension Hypothesis • Anthony Bonato, Ryerson University • Log Dimension Hypothesis

  2. Workshop team • David Gleich (Purdue) • Dieter Mitsche (Ryerson) • Stephen Young (UCSD) • Myunghwan Kim (Stanford) • Amanda Tian (York)

  3. Friendship networks • networks of friends (some real, some virtual) form a large web of interconnected links

  4. 6 degrees of separation • (Stanley Milgram, 67): famous chain letter experiment

  5. 6 Degrees in Facebook? • 900 million users, > 70 billion friendship links • (Backstrom et al., 2012) • 4 degrees of separation in Facebook • when considering another person in the world, a friend of your friend knows a friend of their friend, on average • similar results for Twitter and other OSNs

  6. Complex Networks • web graph, social networks, biological networks, internet networks, …

  7. The web graph • nodes: web pages • edges: links • over 1 trillion nodes, with billions of nodes added each day

  8. On-line Social Networks (OSNs) • Facebook, Twitter, LinkedIn, Google+, …

  9. Key parameters • degree distribution • average distance • clustering coefficient
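A minimal sketch of how these three parameters are commonly computed, assuming the standard definitions: the number of nodes of each degree, the mean shortest-path length over all node pairs, and the mean of the local clustering coefficients. The adjacency-dict representation, the function names, and the toy graph are illustrative assumptions, and the average distance assumes a connected graph.

```python
from collections import Counter, deque
from itertools import combinations


def degree_distribution(adj):
    """Map each degree k to the number of nodes with degree k."""
    return dict(Counter(len(nbrs) for nbrs in adj.values()))


def average_distance(adj):
    """Mean shortest-path length over all node pairs (connected graph assumed)."""
    total, pairs = 0, 0
    for source in adj:
        # Breadth-first search from each node gives unweighted distances.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    # Ordered pairs are counted in both numerator and denominator.
    return total / pairs


def clustering_coefficient(adj):
    """Average over nodes of (edges among neighbours) / (possible neighbour pairs)."""
    total = 0.0
    for u, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue  # nodes of degree 0 or 1 contribute 0
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += links / (k * (k - 1) / 2)
    return total / len(adj)


# Toy graph (hypothetical): a triangle 0-1-2 with a pendant node 3 attached to 2.
toy_graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(degree_distribution(toy_graph))
print(average_distance(toy_graph))
print(clustering_coefficient(toy_graph))
```

Running a breadth-first search from every node is fine for small graphs; on networks of OSN scale one would typically sample source nodes instead.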

  10. Properties of Complex Networks • power law degree distribution (Broder et al, 01)

  11. Power laws in OSNs • (Mislove et al, 07)

  12. Small World Property • small world networks (Watts & Strogatz, 98) • low distances • diam(G) = O(log n) • L(G) = O(log log n) • higher clustering coefficient than random graph with same expected degree

  13. Sample data: Flickr, YouTube, LiveJournal, Orkut • (Mislove et al, 07): short average distances and high clustering coefficients

  14. Community structure • (Zachary, 72) • (Mason et al, 09) • (Fortunato, 10) • (Li, Peng, 11): small community property

  15. (Leskovec, Kleinberg, Faloutsos, 05): • densification power law: average degree is increasing with time • decreasing distances • (Kumar et al, 06): observed in Flickr, Yahoo! 360
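The densification power law is usually written as e(t) ∝ n(t)^a with 1 < a < 2, where n(t) and e(t) are the numbers of nodes and edges at time t; an exponent a > 1 is what makes the average degree 2e/n grow. A minimal sketch of estimating a by least squares on a log-log scale follows. The function name is hypothetical and the snapshot numbers are made up for illustration; they are not real OSN data.

```python
import math


def fit_densification_exponent(nodes, edges):
    """Least-squares slope of log(edges) against log(nodes)."""
    xs = [math.log(n) for n in nodes]
    ys = [math.log(e) for e in edges]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Made-up snapshots that follow e = 2 * n**1.3 exactly (not real data).
snapshot_nodes = [1_000, 2_000, 4_000, 8_000]
snapshot_edges = [2 * n ** 1.3 for n in snapshot_nodes]
exponent = fit_densification_exponent(snapshot_nodes, snapshot_edges)
print(exponent)  # slope of the log-log fit
```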

  16. Geometry of OSNs? • OSNs live in social space: proximity of nodes depends on common attributes (such as geography, gender, age, etc.) • IDEA: embed OSN in 2-, 3- or higher dimensional space

  17. Dimension of an OSN • dimension of OSN: minimum number of attributes needed to classify nodes • like game of “20 Questions”: each question narrows range of possibilities • what is a credible mathematical formula for the dimension of an OSN?
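As a rough illustration of the “20 Questions” analogy, assume every attribute is a yes/no question (a simplifying assumption made here, not a claim from the slides). Then k answers can distinguish at most 2^k users, so separating n users takes at least log_2 n questions. With the 900 million Facebook users quoted on slide 5 this gives 30:

```latex
2^{k} \ge n \iff k \ge \log_2 n,
\qquad
n = 9 \times 10^{8}:\quad 2^{29} < n \le 2^{30} \;\Longrightarrow\; k \ge 30 .
```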

  18. Geometric model for OSNs • we consider a geometric model of OSNs, where • nodes are in m-dimensional Euclidean space • the threshold value is variable: a function of the ranking of nodes

  19. Geometric Protean (GEO-P) Model (Bonato, Janssen, Prałat, 12) • parameters: α, β in (0,1), α+β < 1; positive integer m • nodes live in m-dimensional hypercube (torus metric) • each node is ranked 1, 2, …, n by some function r • 1 is best, n is worst • we use random initial ranking • at each time-step, one new node v is born, one node chosen at random dies (and ranking is updated) • each existing node u has a region of influence with volume $r^{-\alpha} n^{-\beta}$, where r is the rank of u • add edge uv if v is in the region of influence of u
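The birth/death process above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the authors' implementation: the region of influence of a node of rank r is taken to be a cube of volume r^(-α) n^(-β) centred at the node (a ball in the L-infinity torus metric), the newcomer receives a uniformly random rank, and all function names are hypothetical.

```python
import random


def torus_distance(x, y):
    """L-infinity distance between two points of the unit torus."""
    return max(min(abs(a - b), 1.0 - abs(a - b)) for a, b in zip(x, y))


def simulate_geo_p(n, m, alpha, beta, steps, seed=0):
    """Run `steps` birth/death rounds; return (positions, ranking, adjacency)."""
    if not (0 < alpha < 1 and 0 < beta < 1 and alpha + beta < 1):
        raise ValueError("need alpha, beta in (0, 1) with alpha + beta < 1")
    rng = random.Random(seed)
    pos = {v: tuple(rng.random() for _ in range(m)) for v in range(n)}
    adj = {v: set() for v in pos}
    ranking = list(pos)      # ranking[i] holds the node of rank i + 1
    rng.shuffle(ranking)     # random initial ranking
    next_id = n
    for _ in range(steps):
        # Death: a uniformly random node disappears together with its edges.
        dead = ranking.pop(rng.randrange(len(ranking)))
        for u in adj.pop(dead):
            adj[u].discard(dead)
        del pos[dead]
        # Birth: the new node v lands at a uniformly random point.
        v = next_id
        next_id += 1
        point = tuple(rng.random() for _ in range(m))
        new_nbrs = set()
        for i, u in enumerate(ranking):
            volume = (i + 1) ** (-alpha) * n ** (-beta)
            half_side = 0.5 * volume ** (1.0 / m)  # cube with that volume
            if torus_distance(pos[u], point) <= half_side:
                new_nbrs.add(u)
        pos[v] = point
        adj[v] = new_nbrs
        for u in new_nbrs:
            adj[u].add(v)
        # Assumption: the newcomer gets a uniformly random rank.
        ranking.insert(rng.randrange(len(ranking) + 1), v)
    return pos, ranking, adj


positions, ranking, adjacency = simulate_geo_p(n=200, m=2, alpha=0.5, beta=0.3, steps=400)
print(len(positions), sum(len(nb) for nb in adjacency.values()) // 2)
```

The number of nodes stays fixed at n after every round, and the graph only becomes representative once most of the initial, edgeless nodes have been replaced.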

  20. Notes on GEO-P model • the model uses both geometry and ranking • number of nodes is static: fixed at n • order of OSNs is at most the number of people (roughly…) • top ranked nodes have larger regions of influence

  21. Simulation with 5000 nodes

  22. Simulation with 5000 nodes • random geometric • GEO-P

  23. Properties of the GEO-P model (Bonato, Janssen, Prałat, 2012) • a.a.s. the GEO-P model generates graphs with the following properties: • power law degree distribution with exponent $b = 1 + 1/\alpha$ • average degree $d = (1+o(1))\, n^{1-\alpha-\beta} / 2^{1-\alpha}$ • densification • diameter $D = O\!\left(n^{\beta/((1-\alpha)m)} \log^{2\alpha/((1-\alpha)m)} n\right)$ • small world: constant order if $m = C \log n$ • clustering coefficient larger than in comparable random graph
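To see why m = C log n yields a diameter of constant order, substitute it into the diameter bound D = O(n^{β/((1-α)m)} log^{2α/((1-α)m)} n). This is a short check, taking log as the natural logarithm and C > 0 as a constant:

```latex
n^{\frac{\beta}{(1-\alpha)m}}
  = \exp\!\left(\frac{\beta \log n}{(1-\alpha)\, C \log n}\right)
  = e^{\beta / ((1-\alpha) C)},
\qquad
(\log n)^{\frac{2\alpha}{(1-\alpha)m}}
  = \exp\!\left(\frac{2\alpha \,\log\log n}{(1-\alpha)\, C \log n}\right)
  \longrightarrow 1 \quad (n \to \infty).
```

Both factors stay bounded as n grows, so D = O(1).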

  24. Spectral properties • the spectral gap λ of G is defined by the difference between the two largest eigenvalues of the adjacency matrix of G • for G(n,p) random graphs, λ tends to 0 as order grows • in the GEO-P model, λ is close to 1 • (Estrada, 06): bad spectral expansion in real OSN data

  25. Dimension of OSNs • given the order of the network n, power law exponent b, average degree d, and diameter D, we can calculate m • gives formula for dimension of OSN
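One way to see how m can be recovered from (n, b, d, D) is to invert the GEO-P property formulas. This is a heuristic sketch, not necessarily the exact formula on the slide: it drops the (1+o(1)) factor, the logarithmic factor in the diameter, and the constant hidden in the O-bound. Under those assumptions α = 1/(b-1), β = 1 - α - log(2^{1-α} d)/log n, and m ≈ β log n / ((1-α) log D). The function name is hypothetical.

```python
import math


def estimate_dimension(n, b, d, D):
    """Heuristic dimension estimate from order n, exponent b, degree d, diameter D."""
    if b <= 2 or D <= 1 or n <= 1:
        raise ValueError("need b > 2 (so that alpha < 1), D > 1 and n > 1")
    alpha = 1.0 / (b - 1.0)  # from b = 1 + 1/alpha
    # From d ~ n^(1 - alpha - beta) / 2^(1 - alpha), solved for beta.
    beta = 1.0 - alpha - math.log(2 ** (1 - alpha) * d) / math.log(n)
    # From D ~ n^(beta / ((1 - alpha) m)), solved for m.
    return beta * math.log(n) / ((1 - alpha) * math.log(D))


# Round trip on synthetic parameters (alpha = 0.5, beta = 0.25, m = 4); not real data.
n = 10 ** 6
b = 1 + 1 / 0.5
d = n ** (1 - 0.5 - 0.25) / 2 ** (1 - 0.5)
D = n ** (0.25 / ((1 - 0.5) * 4))
print(estimate_dimension(n, b, d, D))
```

Because the discarded constants and logarithmic factors can be large for real networks, the output is best read as an order-of-magnitude estimate.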

  26. 6 Dimensions of Separation

  27. Uncovering the hidden reality • reverse engineering approach • given network data (n, b, d, D), dimension of an OSN gives smallest number of attributes needed to identify users • that is, given the graph structure, we can (theoretically) recover the social space

  28. Logarithmic Dimension Hypothesis • Logarithmic Dimension Hypothesis (LDH): the dimension of an OSN is best fit by about log n, where n is the number of users of the OSN • theoretical evidence: GEO-P and MAG (Leskovec, Kim, 12) models • empirical evidence? • (Sweeney, 2001)

  29. Experimental design • supervised machine learning • Alternating Decision Trees (ADT) • approach of (Janssen et al, 12+) based on earlier work on protein interaction networks (PIN) by (Middendorf et al, 05) • classify OSN data vs simulated graphs from GEO-P model in various dimensions • develop a feature vector (graphlets, degree distribution percentiles, average distance, etc.) to classify the correct dimension • ADT will classify which dimension best fits the data • cross-validation and robustness testing
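A minimal sketch of what such a feature vector might contain, assuming a graph stored as an adjacency dict. It covers only degree percentiles and the two connected 3-node graphlets (open paths and triangles); the function name, the nearest-rank percentile rule, and the choice of features are illustrative assumptions rather than the workshop team's actual pipeline, and the classifier itself is not shown.

```python
import math
from itertools import combinations


def feature_vector(adj, percentiles=(25, 50, 75, 100)):
    """Return [degree percentiles..., open 3-node paths, triangles]."""
    degrees = sorted(len(nbrs) for nbrs in adj.values())
    n = len(degrees)
    features = []
    for p in percentiles:
        # Nearest-rank percentile of the degree sequence.
        k = max(1, math.ceil(p / 100 * n))
        features.append(degrees[k - 1])
    paths, triangle_corners = 0, 0
    for u, nbrs in adj.items():
        for a, b in combinations(nbrs, 2):
            if b in adj[a]:
                triangle_corners += 1  # each triangle is seen from all 3 corners
            else:
                paths += 1             # induced path a - u - b, centred at u
    features.extend([paths, triangle_corners // 3])
    return features


# Toy graph (hypothetical): a triangle 0-1-2 with a pendant node 3 attached to 2.
toy_graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(feature_vector(toy_graph))
```

In the design described above, vectors like this would be computed both for the OSN data and for GEO-P graphs simulated in each candidate dimension, with the simulated graphs serving as labelled training examples.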

  30. Example

  31. preprints, reprints, contact: search: “Anthony Bonato”
