MITACS International Problem Solving Workshop July 2012. The Logarithmic Dimension Hypothesis. Anthony Bonato Ryerson University. Workshop team. David Gleich (Purdue) Dieter Mitsche (Ryerson) Stephen Young (UCSD) Myunghwan Kim (Stanford) Amanda Tian (York). Friendship networks.
MITACS International Problem Solving Workshop July 2012 The Logarithmic Dimension Hypothesis Anthony Bonato Ryerson University Log Dimension Hypothesis
Workshop team
• David Gleich (Purdue)
• Dieter Mitsche (Ryerson)
• Stephen Young (UCSD)
• Myunghwan Kim (Stanford)
• Amanda Tian (York)
Friendship networks
• a network of friends (some real, some virtual) forms a large web of interconnected links
6 degrees of separation
• (Stanley Milgram, 67): famous chain letter experiment
6 Degrees in Facebook?
• 900 million users, > 70 billion friendship links
• (Backstrom et al., 2012)
• 4 degrees of separation in Facebook
• when considering another person in the world, a friend of your friend knows a friend of their friend, on average
• similar results for Twitter and other OSNs
Complex Networks
• web graph, social networks, biological networks, internet networks, …
The web graph
• nodes: web pages
• edges: links
• over 1 trillion nodes, with billions of nodes added each day
On-line Social Networks (OSNs)
• Facebook, Twitter, LinkedIn, Google+, …
Key parameters
• degree distribution
• average distance
• clustering coefficient
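The three parameters named on this slide can be computed directly from an edge list. The sketch below is a minimal pure-Python illustration, not code from the workshop: the function names and the four-node toy graph are made up for the example, the average distance is taken over connected pairs only, and the clustering coefficient is assumed to be the average of the local clustering coefficients, which is one common convention that the slide does not confirm.

```python
from collections import Counter, deque
from itertools import combinations


def build_graph(edges):
    """Build an undirected adjacency map {node: set(neighbours)} from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree_distribution(adj):
    """Return a Counter mapping each degree to the number of nodes with that degree."""
    return Counter(len(nbrs) for nbrs in adj.values())


def average_distance(adj):
    """Average shortest-path length over all connected ordered pairs (BFS from each node)."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


def clustering_coefficient(adj):
    """Average local clustering coefficient; nodes of degree < 2 contribute 0."""
    total = 0.0
    for u, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += 2 * links / (k * (k - 1))
    return total / len(adj)


# Toy graph (hypothetical): a triangle 0-1-2 with a pendant node 3 attached to 2.
toy = build_graph([(0, 1), (1, 2), (0, 2), (2, 3)])
print(degree_distribution(toy))
print(average_distance(toy))
print(clustering_coefficient(toy))
```

Running a BFS from every node is fine for small examples; for graphs on the scale of real OSNs the average distance is usually estimated from a sample of source nodes instead.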
Properties of Complex Networks
• power law degree distribution (Broder et al, 01)
Power laws in OSNs (Mislove et al, 07)
Small World Property
• small world networks (Watts & Strogatz, 98)
• low distances
• diameter: diam(G) = O(log n)
• average distance: L(G) = O(log log n)
• higher clustering coefficient than a random graph with the same expected degree
Sample data: Flickr, YouTube, LiveJournal, Orkut
• (Mislove et al, 07): short average distances and high clustering coefficients
Community structure
• (Zachary, 72)
• (Mason et al, 09)
• (Fortunato, 10)
• (Li, Peng, 11): small community property
(Leskovec, Kleinberg, Faloutsos, 05):
• densification power law: average degree is increasing with time
• decreasing distances
• (Kumar et al, 06): observed in Flickr, Yahoo! 360
Geometry of OSNs?
• OSNs live in social space: proximity of nodes depends on common attributes (such as geography, gender, age, etc.)
• IDEA: embed OSN in 2-, 3- or higher dimensional space
Dimension of an OSN
• dimension of OSN: minimum number of attributes needed to classify nodes
• like game of “20 Questions”: each question narrows range of possibilities
• what is a credible mathematical formula for the dimension of an OSN?
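The “20 Questions” analogy can be made concrete with a small counting argument. Assuming every attribute is a yes/no answer (an assumption for illustration; the slide does not restrict attributes to be binary), m attributes can tell apart at most 2^m nodes, so at least ceil(log2 n) of them are needed to classify n nodes. The helper name `min_binary_attributes` is hypothetical.

```python
def min_binary_attributes(n):
    """Smallest m with 2**m >= n, i.e. ceil(log2(n)), computed with integer arithmetic."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return (n - 1).bit_length()


# 20 yes/no questions are enough to separate 2**20 (about a million) items.
print(min_binary_attributes(2 ** 20))       # 20
# The Facebook user count quoted earlier in the talk.
print(min_binary_attributes(900_000_000))   # 30
```

This is only a lower bound under the binary assumption; it says nothing about which attributes a real network actually uses.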
Geometric model for OSNs
• we consider a geometric model of OSNs, where
• nodes are in m-dimensional Euclidean space
• the threshold value is variable: a function of the ranking of nodes
Geometric Protean (GEO-P) Model (Bonato, Janssen, Prałat, 12)
• parameters: α, β in (0,1), α+β < 1; positive integer m
• nodes live in m-dimensional hypercube (torus metric)
• each node is ranked 1, 2, …, n by some function r
• 1 is best, n is worst
• we use random initial ranking
• at each time-step, one new node v is born, one node chosen at random dies (and the ranking is updated)
• each existing node u has a region of influence whose volume depends on its rank
• add edge uv if v is in the region of influence of u
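The rules above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the authors' implementation: the volume of the region of influence did not survive on the slide and is assumed here to be r^(-α) n^(-β) for a node of rank r; regions are assumed to be balls in the L-infinity torus metric on the unit hypercube (so each region is a small cube); and the birth/death process is replaced by a single static snapshot in which every pair of nodes is checked once. The names `torus_linf_distance` and `geo_p_snapshot` are hypothetical.

```python
import random


def torus_linf_distance(p, q):
    """L-infinity distance on the unit torus [0,1)^m (assumed metric)."""
    return max(min(abs(a - b), 1.0 - abs(a - b)) for a, b in zip(p, q))


def geo_p_snapshot(n, m, alpha, beta, seed=0):
    """Static GEO-P-style snapshot: n random points, random ranks, rank-dependent regions.

    Assumption: a node of rank r has a region of influence of volume
    r**(-alpha) * n**(-beta); as an L-infinity ball this is a cube whose
    half-side (radius) is 0.5 * volume**(1/m).
    """
    rng = random.Random(seed)
    points = [tuple(rng.random() for _ in range(m)) for _ in range(n)]
    ranks = list(range(1, n + 1))
    rng.shuffle(ranks)  # random initial ranking; rank 1 is best
    radius = [0.5 * (r ** -alpha * n ** -beta) ** (1.0 / m) for r in ranks]

    edges = set()
    for u in range(n):
        for v in range(n):
            # add edge uv if v lies in the region of influence of u
            if u != v and torus_linf_distance(points[u], points[v]) <= radius[u]:
                edges.add((min(u, v), max(u, v)))
    return points, ranks, edges


points, ranks, edges = geo_p_snapshot(n=200, m=2, alpha=0.5, beta=0.3, seed=1)
degree = [0] * len(points)
for u, v in edges:
    degree[u] += 1
    degree[v] += 1
best = ranks.index(1)             # node holding the best rank
worst = ranks.index(len(ranks))   # node holding the worst rank
print(len(edges), degree[best], degree[worst])
```

The quadratic pair check keeps the sketch short; a faithful simulation would also replay the birth and death steps and update ranks over time.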
Notes on GEO-P model
• the model uses both geometry and ranking
• number of nodes is static: fixed at n
• the order of an OSN is at most the number of people (roughly…)
• top ranked nodes have larger regions of influence
Simulation with 5000 nodes
Simulation with 5000 nodes (panels: random geometric, GEO-P)
Properties of the GEO-P model (Bonato, Janssen, Prałat, 2012)
• a.a.s. (asymptotically almost surely) the GEO-P model generates graphs with the following properties:
• power law degree distribution with exponent $b = 1 + 1/\alpha$
• average degree $d = (1+o(1))\, n^{1-\alpha-\beta} / 2^{1-\alpha}$
• densification
• diameter $D = O\left(n^{\beta/((1-\alpha)m)} \log^{2\alpha/((1-\alpha)m)} n\right)$
• small world: diameter of constant order if $m = C \log n$
• clustering coefficient larger than in a comparable random graph
Spectral properties
• the spectral gap λ of G is defined by the difference between the two largest eigenvalues of the adjacency matrix of G
• for G(n,p) random graphs, λ tends to 0 as the order grows
• in the GEO-P model, λ is close to 1
• (Estrada, 06): bad spectral expansion in real OSN data
Dimension of OSNs
• given the order of the network n, power law exponent b, average degree d, and diameter D, we can calculate m
• this gives a formula for the dimension of an OSN
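One way to see how n, b, d and D pin down m is to invert the asymptotic expressions for the GEO-P model by hand. The sketch below is a rough heuristic, not the formula from the talk: it treats b = 1 + 1/α and d = n^(1-α-β) / 2^(1-α) as exact, replaces the diameter bound by D = n^(β/((1-α)m)), and drops both the o(1) term and the logarithmic factor. Solving for α, then β, then m gives m = β log n / ((1-α) log D). The function name `heuristic_dimension` is hypothetical.

```python
import math


def heuristic_dimension(n, b, d, D):
    """Heuristic estimate of m from (n, b, d, D) under simplified GEO-P asymptotics.

    Assumptions (not from the slides): equalities instead of asymptotics,
    and the logarithmic factor in the diameter bound is ignored.
    """
    if b <= 2 or n <= 1 or D <= 1 or d <= 0:
        raise ValueError("need b > 2, n > 1, D > 1 and d > 0")
    alpha = 1.0 / (b - 1.0)                       # from b = 1 + 1/alpha
    # from d = n**(1 - alpha - beta) / 2**(1 - alpha)
    beta = (1.0 - alpha) - math.log(2.0 ** (1.0 - alpha) * d) / math.log(n)
    # from D = n**(beta / ((1 - alpha) * m))
    return beta * math.log(n) / ((1.0 - alpha) * math.log(D))


# Round trip on synthetic parameters generated from the same simplified formulas.
n, alpha, beta, m = 10 ** 6, 0.5, 0.25, 4
b = 1 + 1 / alpha
d = n ** (1 - alpha - beta) / 2 ** (1 - alpha)
D = n ** (beta / ((1 - alpha) * m))
print(heuristic_dimension(n, b, d, D))
```

On these synthetic inputs the function recovers m = 4 up to floating-point rounding. On real network data the dropped logarithmic factor and hidden constants can matter, so the output should be read as an order-of-magnitude indication only.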
6 Dimensions of Separation
Uncovering the hidden reality
• reverse engineering approach
• given network data (n, b, d, D), dimension of an OSN gives smallest number of attributes needed to identify users
• that is, given the graph structure, we can (theoretically) recover the social space
Logarithmic Dimension Hypothesis
• Logarithmic Dimension Hypothesis (LDH): the dimension of an OSN is best fit by about log n, where n is the number of users in the OSN
• theoretical evidence: the GEO-P and MAG (Leskovec, Kim, 12) models
• empirical evidence?
• (Sweeney, 2001)
Experimental design
• supervised machine learning
• Alternating Decision Trees (ADT)
• approach of (Janssen et al, 12+), based on earlier work on protein interaction networks (PIN) by (Middendorf et al, 05)
• classify OSN data vs simulated graphs from the GEO-P model in various dimensions
• develop a feature vector (graphlets, degree distribution percentiles, average distance, etc.) to classify the correct dimension
• ADT will classify which dimension best fits the data
• cross-validation and robustness testing
Example
preprints, reprints, contact: search: “Anthony Bonato”