The Logarithmic Dimension Hypothesis

MITACS International Problem Solving Workshop July 2012 The Logarithmic Dimension Hypothesis Anthony Bonato Ryerson University Log Dimension Hypothesis

Workshop team
• David Gleich (Purdue)
• Dieter Mitsche (Ryerson)
• Stephen Young (UCSD)
• Myunghwan Kim (Stanford)
• Amanda Tian (York)

Friendship networks
• networks of friends (some real, some virtual) form a large web of interconnected links

6 degrees of separation
• (Stanley Milgram, 67): famous chain letter experiment

6 Degrees in Facebook?
• 900 million users, > 70 billion friendship links
• (Backstrom et al., 2012)
• 4 degrees of separation in Facebook
• when considering another person in the world, a friend of your friend knows a friend of their friend, on average
• similar results for Twitter and other OSNs

Complex Networks
• web graph, social networks, biological networks, internet networks, …

The web graph
• nodes: web pages
• edges: links
• over 1 trillion nodes, with billions of nodes added each day

On-line Social Networks (OSNs)
• Facebook, Twitter, LinkedIn, Google+, …

Key parameters
• degree distribution
• average distance
• clustering coefficient
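The formulas for these three parameters are missing from this transcript. As a minimal sketch, the following Python computes them under common textbook definitions, which are assumptions here: the degree distribution as the fraction of nodes of each degree, the average distance as the mean shortest-path length over connected ordered pairs, and the clustering coefficient as the average local clustering. The function names and the adjacency-dictionary representation are illustrative only.

```python
from collections import Counter, deque


def degree_distribution(adj):
    """Fraction of nodes having each degree (assumed definition)."""
    counts = Counter(len(nbrs) for nbrs in adj.values())
    n = len(adj)
    return {k: c / n for k, c in sorted(counts.items())}


def average_distance(adj):
    """Mean shortest-path length over ordered pairs that are connected."""
    total = 0
    pairs = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from source
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


def clustering_coefficient(adj):
    """Average over nodes of (links among neighbours) / (possible links)."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue  # nodes of degree 0 or 1 contribute 0
        nb = list(nbrs)
        links = sum(
            1
            for i in range(k)
            for j in range(i + 1, k)
            if nb[j] in adj[nb[i]]
        )
        total += 2 * links / (k * (k - 1))
    return total / len(adj)


# Toy graph: a triangle 0-1-2 with a pendant node 3 attached to node 2.
toy = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
```

Calling the three functions on `toy` gives a quick sanity check before running them on real network data, where the all-pairs breadth-first search would need sampling to stay tractable.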

Properties of Complex Networks
• power law degree distribution (Broder et al, 01)

Power laws in OSNs (Mislove et al, 07)

Small World Property
• small world networks (Watts & Strogatz, 98)
• low distances
• diam(G) = O(log n)
• L(G) = O(loglog n)
• higher clustering coefficient than random graph with same expected degree

Sample data: Flickr, YouTube, LiveJournal, Orkut
• (Mislove et al, 07): short average distances and high clustering coefficients

Community structure
• (Zachary, 72)
• (Mason et al, 09)
• (Fortunato, 10)
• (Li, Peng, 11): small community property

(Leskovec, Kleinberg, Faloutsos, 05):
• densification power law: average degree is increasing with time
• decreasing distances
• (Kumar et al, 06): observed in Flickr, Yahoo! 360

Geometry of OSNs?
• OSNs live in social space: proximity of nodes depends on common attributes (such as geography, gender, age, etc.)
• IDEA: embed OSN in 2-, 3- or higher dimensional space

Dimension of an OSN
• dimension of OSN: minimum number of attributes needed to classify nodes
• like game of “20 Questions”: each question narrows range of possibilities
• what is a credible mathematical formula for the dimension of an OSN?

Geometric model for OSNs
• we consider a geometric model of OSNs, where
• nodes are in m-dimensional Euclidean space
• the threshold value is variable: a function of the ranking of nodes

Geometric Protean (GEO-P) Model (Bonato, Janssen, Prałat, 12)
• parameters: α, β in (0,1), α+β < 1; positive integer m
• nodes live in the m-dimensional hypercube (torus metric)
• each node is ranked 1, 2, …, n by some function r
• 1 is best, n is worst
• we use random initial ranking
• at each time-step, one new node v is born and one node chosen at random dies (and the ranking is updated)
• each existing node u has a region of influence whose volume depends on its rank
• add edge uv if v is in the region of influence of u
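The process described on this slide can be sketched in a few lines of Python. Several details are assumptions, because the transcript does not give them: the volume of a node's region of influence is taken to be r^(-α) · n^(-β) for a node of rank r (this should be checked against the published model), the region is taken to be a ball in the torus metric derived from the L∞ norm, the starting graph is empty, and the death is processed just before the birth so that the node count stays at n. All function and variable names are hypothetical.

```python
import random


def torus_linf_distance(x, y):
    """L-infinity distance on the unit torus (assumed choice of norm)."""
    return max(min(abs(a - b), 1 - abs(a - b)) for a, b in zip(x, y))


def simulate_geo_p(n, m, alpha, beta, steps, seed=0):
    """Run `steps` birth/death steps; return (positions, ranking, adjacency)."""
    rng = random.Random(seed)
    pos = {v: tuple(rng.random() for _ in range(m)) for v in range(n)}
    order = list(range(n))          # order[i] is the node with rank i + 1
    rng.shuffle(order)              # random initial ranking
    adj = {v: set() for v in range(n)}   # assumption: start with no edges
    next_id = n

    for _ in range(steps):
        # Death: a uniformly random existing node is removed.
        dead = rng.choice(order)
        order.remove(dead)          # worse-ranked nodes move up one rank
        for w in adj.pop(dead):
            adj[w].discard(dead)
        del pos[dead]

        # Birth: new node at a uniform random point with a random rank.
        v = next_id
        next_id += 1
        pos[v] = tuple(rng.random() for _ in range(m))
        adj[v] = set()
        order.insert(rng.randrange(n), v)

        # Connect v to every existing node u whose region contains v.
        for i, u in enumerate(order):
            if u == v:
                continue
            rank = i + 1
            volume = rank ** (-alpha) * n ** (-beta)   # assumed formula
            radius = 0.5 * volume ** (1.0 / m)         # cube of that volume
            if torus_linf_distance(pos[u], pos[v]) <= radius:
                adj[u].add(v)
                adj[v].add(u)

    return pos, order, adj
```

Running enough steps for every initial node to have died (on the order of n log n) removes the influence of the empty starting graph; the list-based ranking keeps the sketch short at the cost of linear-time updates.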

Notes on GEO-P model
• model uses both geometry and ranking
• number of nodes is static: fixed at n
• order of OSNs at most number of people (roughly…)
• top ranked nodes have larger regions of influence

Simulation with 5000 nodes

Simulation with 5000 nodes: random geometric vs. GEO-P

Properties of the GEO-P model (Bonato, Janssen, Prałat, 2012)
• a.a.s. the GEO-P model generates graphs with the following properties:
• power law degree distribution with exponent $b = 1 + 1/\alpha$
• average degree $d = (1+o(1))\, n^{1-\alpha-\beta} / 2^{1-\alpha}$
• densification
• diameter $D = O\left(n^{\beta/((1-\alpha)m)} \log^{2\alpha/((1-\alpha)m)} n\right)$
• small world: constant order if $m = C \log n$
• clustering coefficient larger than in comparable random graph

Spectral properties
• the spectral gap λ of G is defined by the difference between the two largest eigenvalues of the adjacency matrix of G
• for G(n,p) random graphs, λ tends to 0 as order grows
• in the GEO-P model, λ is close to 1
• (Estrada, 06): bad spectral expansion in real OSN data

Dimension of OSNs
• given the order of the network n, power law exponent b, average degree d, and diameter D, we can calculate m
• gives formula for dimension of OSN:
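The formula itself is missing from this transcript. One way to see how m can be recovered is to invert the GEO-P properties listed earlier: b = 1 + 1/α gives α, the average degree then gives β, and the diameter gives m. The sketch below does this under loud assumptions: the o(1) term is dropped, the O(·) diameter bound is treated as an equality, and its logarithmic factor is ignored. Under those assumptions the result simplifies to m ≈ log(n / (2 d^((b-1)/(b-2)))) / log D. The function name is hypothetical, and the slide's own formula may differ in lower-order terms.

```python
import math


def geo_p_dimension(n, b, d, D):
    """Heuristic estimate of the GEO-P dimension m from (n, b, d, D).

    Assumes d = n^(1-alpha-beta) / 2^(1-alpha) and D = n^(beta/((1-alpha)m))
    hold exactly, which ignores o(1) terms and the log factor in D.
    """
    if n <= 1 or d <= 0 or D <= 1 or b <= 2:
        raise ValueError("need n > 1, d > 0, D > 1 and b > 2")
    alpha = 1.0 / (b - 1.0)                     # from b = 1 + 1/alpha
    # Solve the average-degree relation for beta.
    beta = 1.0 - alpha - math.log(2 ** (1.0 - alpha) * d) / math.log(n)
    # Solve the diameter relation for m.
    return beta * math.log(n) / ((1.0 - alpha) * math.log(D))


def geo_p_dimension_closed_form(n, b, d, D):
    """Same estimate, written as a single expression."""
    exponent = (b - 1.0) / (b - 2.0)
    return math.log(n / (2.0 * d ** exponent)) / math.log(D)
```

A round-trip check is a reasonable way to use this: generate n, b, d and D from chosen values of α, β and m using the same simplified relations, and confirm that the function returns the chosen m.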

6 Dimensions of Separation

Uncovering the hidden reality
• reverse engineering approach
• given network data (n, b, d, D), dimension of an OSN gives smallest number of attributes needed to identify users
• that is, given the graph structure, we can (theoretically) recover the social space

Logarithmic Dimension Hypothesis
• Logarithmic Dimension Hypothesis (LDH): the dimension of an OSN is best fit by about log n, where n is the number of users in the OSN
• theoretical evidence: GEO-P and MAG (Leskovec, Kim, 12) models
• empirical evidence?
• (Sweeney, 2001)

Experimental design
• supervised machine learning
• Alternating Decision Trees (ADT)
• approach of (Janssen et al, 12+) based on earlier work on protein interaction networks (PIN) by (Middendorf et al, 05)
• classify OSN data vs simulated graphs from GEO-P model in various dimensions
• develop a feature vector (graphlets, degree distribution percentiles, average distance, etc.) to classify the correct dimension
• ADT will classify which dimension best fits the data
• cross-validation and robustness testing
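To make the feature-vector step concrete, here is a small hypothetical sketch in Python. The slide does not specify the exact features, so the choices below are assumptions: nearest-rank degree percentiles and counts of the two connected 3-node graphlets (triangles and open paths). The classifier itself is not shown; the resulting vectors would be passed to whichever ADT implementation is used.

```python
import math


def degree_percentiles(adj, qs=(25, 50, 75, 100)):
    """Nearest-rank percentiles of the degree sequence (assumed feature)."""
    degrees = sorted(len(nbrs) for nbrs in adj.values())
    result = []
    for q in qs:
        index = max(0, math.ceil(q / 100 * len(degrees)) - 1)
        result.append(degrees[index])
    return result


def three_node_graphlets(adj):
    """Return (triangles, open 3-node paths) as induced subgraph counts."""
    triangle_corners = 0   # each triangle is seen once from each corner
    centred_pairs = 0      # pairs of neighbours around each centre node
    for nbrs in adj.values():
        k = len(nbrs)
        centred_pairs += k * (k - 1) // 2
        nb = list(nbrs)
        triangle_corners += sum(
            1
            for i in range(k)
            for j in range(i + 1, k)
            if nb[j] in adj[nb[i]]
        )
    triangles = triangle_corners // 3
    open_paths = centred_pairs - 3 * triangles
    return triangles, open_paths


def feature_vector(adj):
    """Concatenate the illustrative features into one list."""
    triangles, open_paths = three_node_graphlets(adj)
    return degree_percentiles(adj) + [triangles, open_paths]


# Toy graph: a triangle 0-1-2 with a pendant node 3 attached to node 2.
toy = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
```

In the design described above, one such vector would be computed for the OSN data and for each simulated GEO-P graph, with the simulation dimension as the class label; raw counts would likely need normalising by graph size before graphs of different orders are compared.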

Example

preprints, reprints, contact: search: “Anthony Bonato”
