Social networks from the perspective of Physics. János Kertész (1,2), Jukka-Pekka Onnela (2), Jari Saramäki (2), Jörkki Hyvönen (2), Kimmo Kaski (2), Jussi Kumpula (2), David Lazer (3), Gábor Szabó (3,4), Albert-László Barabási (3,4). Affiliations: (1) Budapest University of Technology and Economics, Hungary; (2) Helsinki University of Technology, Finland; (3) Harvard University; (4) University of Notre Dame, USA
Outline 0. Introduction • Constructing the social network • Basic statistics • Granovetter’s hypothesis • Thresholding (percolation) • Spreading • Modeling • Conclusions
Introduction Complex systems: More input needed than mere interactions Forget about interactions Networks: Scaffold of complexity Useful to concentrate on the carrying NW structure (nodes and links): Holistic approach with very general statements Spectacular recent development: Abundance of data due to IT + new concepts
Introduction WEIGHTED NW-S Step toward reductionism: Interactions have different strengths → weights on links. Weights: Fluxes (traffic or chemical reactions), correlation-based networks, etc. (Often no negative weights, $w_{ij} \geq 0$.) How to characterize weighted NW-s? E.g. STRENGTH of node i: $s_i = \sum_j w_{ij}$. Intensity, coherence of subgraphs; clustering, motifs etc. (see: Onnela et al., PRE 71, 065103(R) (2005))
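As a small illustration of the node strength $s_i = \sum_j w_{ij}$ mentioned on this slide, the following sketch sums link weights per node for an undirected weighted edge list. The function name `node_strengths` and the toy edge list are assumptions made for this example, not part of the original analysis.

```python
from collections import defaultdict


def node_strengths(weighted_edges):
    """Return s_i = sum_j w_ij for an undirected weighted edge list.

    weighted_edges: iterable of (i, j, w_ij) tuples, each link listed once.
    """
    strength = defaultdict(float)
    for i, j, w in weighted_edges:
        # An undirected link contributes its weight to both endpoints.
        strength[i] += w
        strength[j] += w
    return dict(strength)


# Toy example (hypothetical data): a weighted triangle.
edges = [("a", "b", 2.0), ("a", "c", 1.0), ("b", "c", 4.0)]
strengths = node_strengths(edges)
```

Here node "b" ends up with the largest strength because it participates in the two heaviest links.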
Introduction SOCIAL NW-S: Much has been taken from Sociology: betweenness, clustering, assortativity… Main method: Questionnaires (10 - 10 000) Weighted social nw-s: Strength of social relationships varies over a wide range: "I know him/her", "We are on first name basis", "We are friends", "We are good friends", "We are very good friends"… How to measure? Scale? Subjectivity?
Introduction Advantage of questionnaires: Ask whatever you are interested in. It enables complex studies, multi-factor analyses. Disadvantage: Difficulty in quantification and subjectivity. E.g., AddHealth: Quantification of tie strength by number of joint activities. Mutuality test fails very often (M. Gonzales et al., Physica A 379, 307-316 (2007)). Alternative approach: Use communication databases (email, phone etc.)
Constructing the Network • Use a network constructed from mobile phone calls as a proxy for a social network • In the network: Nodes: individuals; Links: voice calls • Link weights: • Number of calls • Total call duration (time & money)
Constructing the Network [Figure: calls between users X and Y, labelled 15 min, 5 min and 20 min] • Over 7 million private mobile phone subscriptions • Focus: voice calls within the home operator • Data aggregated from a period of 18 weeks • Require reciprocity (X→Y AND Y→X) for a link • Customers are anonymous (hash codes) • Data from a European mobile operator
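The construction rule above (keep a link only if calls went in both directions, then weight it by number of calls or total duration) can be sketched as follows. This is a minimal illustration under stated assumptions: the record layout `(caller, callee, minutes)`, the function name `build_reciprocal_network`, and the choice to sum both directions into one undirected weight are assumptions for the example, not a description of the actual data pipeline.

```python
from collections import defaultdict


def build_reciprocal_network(calls):
    """Aggregate directed call records into undirected, reciprocated links.

    calls: iterable of (caller, callee, minutes) records (hypothetical layout).
    Returns {(a, b): {"calls": n, "minutes": t}} with a < b, keeping a pair
    only if at least one call exists in each direction.
    """
    directed = defaultdict(lambda: [0, 0.0])  # (caller, callee) -> [count, minutes]
    for caller, callee, minutes in calls:
        if caller == callee:
            continue  # ignore self-calls
        record = directed[(caller, callee)]
        record[0] += 1
        record[1] += minutes

    links = {}
    for (a, b), (n_ab, t_ab) in directed.items():
        # Visit each unordered pair once and require the reverse direction.
        if a < b and (b, a) in directed:
            n_ba, t_ba = directed[(b, a)]
            links[(a, b)] = {"calls": n_ab + n_ba, "minutes": t_ab + t_ba}
    return links


# Toy example (hypothetical data): X and Y call each other, X calls Z one-way.
calls = [("X", "Y", 15.0), ("Y", "X", 5.0), ("X", "Z", 3.0)]
links = build_reciprocal_network(calls)
```

The one-way X→Z call is dropped, which is the effect of the reciprocity requirement.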
Basic Statistics: Visualisation • Largest connected component dominates • 3.9M / 4.6M nodes • 6.5M / 7.0M links • Use it for analysis!
Basic Statistics: Distributions • Vertex degree distribution • Link weight distribution • Fat tail • Dunbar number (monkeysphere): max ~150 connections
Granovetter’s Weak Ties Hypothesis • Granovetter* suggests analysis of social networks as a tool for linking micro and macro levels of sociological theory • Considers the macro level implications of tie (micro level) strengths: • “The strength of a tie is a (probably linear) combination of the amount of time, the emotional intensity, the intimacy (mutual confiding), and the reciprocal services which characterize the tie.” • Formulates a hypothesis: • The relative overlap of two individuals’ friendship networks varies directly with the strength of their tie to one another • Explores the impact of the hypothesis on, e.g., diffusion of information, stressing the cohesive power of weak ties • * M. Granovetter, The Strength of Weak Ties, The American Journal of Sociology 78, 1360-1380, 1973.
Granovetter’s Weak Ties Hypothesis • Hypothesis based on theoretical work and some direct evidence • Present network is suitable for testing the hypothesis: • (i) Call durations → time commitment → tie strength • (ii) Call durations → monetary commitment → tie strength • (iii) Largest weighted social network so far • (Problem: Other factors, such as emotional intensity or reciprocal services?) • What is the coupling between network topology and link weights? • Consider two connected nodes. We would like to characterize their relative neighborhood overlap, i.e. the proportion of common friends • This leads naturally to link neighborhood overlap
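Overlap • Definition: relative neighborhood overlap (topological): $O_{ij} = \frac{n_{ij}}{(k_i - 1) + (k_j - 1) - n_{ij}}$ • where the number of triangles around edge $(v_i, v_j)$ is $n_{ij}$, and $k_i$, $k_j$ are the degrees of $v_i$ and $v_j$ • Illustration of the concept: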
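A minimal sketch of the relative neighborhood overlap, assuming the form $O_{ij} = n_{ij} / ((k_i - 1) + (k_j - 1) - n_{ij})$, where $n_{ij}$ counts common neighbors of the two end nodes and $k_i$, $k_j$ are their degrees. The adjacency-set representation, the function name, and the convention of returning 0 when the denominator vanishes are assumptions made for this example.

```python
def neighborhood_overlap(adj, i, j):
    """Relative neighborhood overlap O_ij of the link (i, j).

    adj: dict mapping each node to the set of its neighbors (undirected).
    """
    n_ij = len(adj[i] & adj[j])  # common neighbors = triangles around the link
    denom = (len(adj[i]) - 1) + (len(adj[j]) - 1) - n_ij
    # Assumed convention: overlap is 0 when neither end has other neighbors.
    return n_ij / denom if denom > 0 else 0.0


# Toy example (hypothetical data): a and b share one friend (c) out of three
# distinct other neighbors (c, d, e).
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c", "e"},
    "c": {"a", "b"},
    "d": {"a"},
    "e": {"b"},
}
overlap_ab = neighborhood_overlap(adj, "a", "b")
overlap_ad = neighborhood_overlap(adj, "a", "d")
```

In this toy graph the link a-b has overlap 1/3, while the link a-d, which closes no triangle, has overlap 0.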
Empirical Verification • Let $\langle O \rangle_w$ denote $O_{ij}$ averaged over a bin of w-values • Use cumulative link weight distribution: $P_{\mathrm{cum}}(w') = \int_{w_{\min}}^{w'} P(w)\,dw$ • (the fraction of links with weights less than w') • Relative neighbourhood overlap increases as a function of link weight • Verifies Granovetter’s hypothesis (~95%) • (Exception: Top 5% of weights) • Blue curve: empirical network • Red curve: weight randomised network
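The binned average of the overlap against the cumulative weight distribution can be sketched as below: links are ranked by weight, so that rank divided by the number of links plays the role of the cumulative fraction, and the overlap is averaged within equal-count bins. The function name, the equal-count binning, and the toy data are assumptions for illustration only.

```python
def mean_overlap_by_weight_rank(links, n_bins):
    """Average overlap in equal-count bins of the weight ranking.

    links: list of (weight, overlap) pairs, one per link.
    Returns a list of n_bins mean overlaps, from weakest to strongest links
    (None for a bin that received no links).
    """
    ordered = sorted(links, key=lambda link: link[0])
    n = len(ordered)
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for rank, (_, overlap) in enumerate(ordered):
        # rank / n approximates the cumulative fraction of weaker links.
        b = min(rank * n_bins // n, n_bins - 1)
        sums[b] += overlap
        counts[b] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


# Toy example (hypothetical data): overlap grows with weight.
links = [(1, 0.0), (2, 0.1), (5, 0.2), (9, 0.4)]
means = mean_overlap_by_weight_rank(links, n_bins=2)
```

For the toy data, the weaker half of the links has a lower mean overlap than the stronger half, which is the qualitative trend the slide reports for about 95% of the links.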
Local Implications • Implication for strong links? Neighbourhood overlap is high → People form strongly connected communities • Implication for weak links? Neighbourhood overlap is low → Communities are connected by weak links
A Piece of the Network [Figure: sample of the network with weak links, strong links and a community marked]
Overlap • Global optimization for transport would put high weights on links with high betweenness centrality b (number of shortest paths passing through the link) • In contrast, $\langle O \rangle_b$ decreases with b
High Weight Links? • Weak links: Strength of both adjacent nodes (min & max) considerably higher than link weight • Strong links: Strength of both adjacent nodes (min & max) about as high as the link weight • Indication: High weight relationships clearly dominate on-air time of both, others negligible • Time ratio spent communicating with one other person converges to 1 at roughly $w \approx 10^4$ • Consequence: Less time to interact with others • Explains the onset of the decreasing trend for $\langle O \rangle_w$
Thresholding Analysis: Introduction • Children’s approach: Break to learn! • We do this systematically using thresholding analysis: • Order the links by weight • Delete the links, one by one, based on their order • Control parameter f is the fraction of removed links • We can continuously interpolate, in either direction, between the initial connected network (f=0) and the set of isolated nodes (f=1) • We use two different thresholding schemes • (i) Increasing thresholding (remove low wij/Oij links first) • (ii) Descending thresholding (remove high wij/Oij links first) • Question: How does the network respond to link removal? • How similar is the response to wij and Oij driven thresholding?
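The thresholding procedure described above can be sketched as follows: sort the links by a key (link weight or overlap), remove a fraction f of them from one end of the ordering, and measure the fraction of nodes left in the largest connected component. The function name, the `(a, b, key_value)` link layout, and the union-find implementation are assumptions made for this illustration.

```python
def largest_component_fraction(nodes, links, f, weakest_first=True):
    """Fraction of nodes in the largest component after removing a fraction f of links.

    nodes: list of node ids; links: list of (a, b, key_value) tuples, where
    key_value is the quantity used for ordering (e.g. weight or overlap).
    """
    ordered = sorted(links, key=lambda link: link[2], reverse=not weakest_first)
    kept = ordered[int(round(f * len(ordered))):]  # drop the first f * L links

    # Union-find over the surviving links.
    parent = {v: v for v in nodes}
    size = {v: 1 for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for a, b, _ in kept:
        ra, rb = find(a), find(b)
        if ra != rb:
            if size[ra] < size[rb]:
                ra, rb = rb, ra
            parent[rb] = ra
            size[ra] += size[rb]

    return max(size[find(v)] for v in nodes) / len(nodes)


# Toy example (hypothetical data): two strong triangles joined by one weak bridge.
nodes = [1, 2, 3, 4, 5, 6]
links = [(1, 2, 10), (2, 3, 10), (1, 3, 10),
         (4, 5, 10), (5, 6, 10), (4, 6, 10),
         (3, 4, 1)]
weak_first = largest_component_fraction(nodes, links, f=1 / 7, weakest_first=True)
strong_first = largest_component_fraction(nodes, links, f=1 / 7, weakest_first=False)
```

Removing the single weakest link splits the toy network in half, while removing one strong link inside a triangle leaves it connected.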
Thresholding • Initial connected network (f=0) • All links are intact, i.e. the network is in its initial stage
Thresholding • Increasing weight thresholded network (f=0.8) • 80% of the weakest links removed, strongest 20% remain
Thresholding • Decreasing weight thresholded network (f=0.8) • 80% of the strongest links removed, weakest 20% remain
Thresholding • We will study, as a function of the control parameter f, the following: • Order parameter (size of the largest component) • “Susceptibility” (average size of other components) • Average path lengths (in LCC) • Average clustering coefficient in the LCC
Thresholding: Size of Largest Component • $R_{LCC}$ is the fraction of nodes in the largest connected component • LCC is able to sustain its integrity for moderate values of f • Least affected by removal of high $O_{ij}$ links (in tight communities) • Most affected by removal of low $O_{ij}$ links (between communities) • Difference between removal of low and high $w_{ij}$ links is small, but LCC breaks earlier if weak links are removed (Granovetter) • Very few links are required for global connectivity [Figure: $R_{LCC}$ vs. f, curves "remove low first" and "remove high first"]
Thresholding: Size of Other Components • Collapse for different values of f, but what is its nature? • “Susceptibility” (average cluster size excl. LCC): $\tilde{S} = \sum_{s < s_{\max}} n_s s^2 / N$ • $n_s$ is the number of clusters with s nodes • Percolation theory: $\tilde{S} \to \infty$ as $f \to f_c$ • Finite signature of divergence: $f_c \approx 0.60$ (increasing overlap), $f_c \approx 0.82$ (increasing weight) • Demarcation between weak and strong links given by $f_c \approx 0.82$ • Qualitatively different role for weak and strong links [Figure: susceptibility vs. f, curves "remove low first" and "remove high first"]
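A short sketch of the "susceptibility" used above, assuming the form $\tilde{S} = \sum_{s} n_s s^2 / N$ with the largest component excluded from the sum. The function name and the convention of dropping exactly one copy of the largest component (when several tie for largest) are assumptions for this example.

```python
from collections import Counter


def susceptibility(component_sizes, n_nodes):
    """Sum of n_s * s^2 over all components except the largest, divided by N.

    component_sizes: list with the size of every connected component.
    n_nodes: total number of nodes N in the network.
    """
    others = sorted(component_sizes)[:-1]  # drop one copy of the largest
    n_s = Counter(others)                  # n_s: number of clusters with s nodes
    return sum(count * s * s for s, count in n_s.items()) / n_nodes


# Toy example (hypothetical data): one component of 5 nodes, two of 2, one of 1.
s_tilde = susceptibility([5, 2, 2, 1], n_nodes=10)
```

For the toy sizes the sum is 1·1² + 2·2² = 9, giving 0.9 after division by N = 10. In a thresholding run this quantity would be evaluated at each f and its peak used as the finite-size signature of $f_c$.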
Diffusion of information • Knowledge of information diffusion is based on unweighted networks • Use the present network to study diffusion on a weighted network: Does the local relationship between topology and tie strength have an effect? • Spreading simulation: infect one node with new information (1) Empirical: $p_{ij} \propto w_{ij}$ (2) Reference: $p_{ij} \propto \langle w \rangle$ • Spreading is significantly faster on the reference (average weight) network • Information gets trapped in communities in the real network [Figure: spreading curves, Reference vs. Empirical]
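The spreading experiment can be sketched as a simple susceptible-infected process in which an infected node passes the information to a neighbor with probability proportional to the link weight (empirical case) or to the average weight (reference case). The function names, the proportionality constant `x`, the synchronous update, and the toy graph are assumptions for this illustration, not the original simulation code.

```python
import random


def simulate_spreading(adj_weights, seed_node, x, max_steps, rng):
    """SI spreading with transmission probability min(1, x * w_ij) per step.

    adj_weights: dict node -> {neighbor: weight} (undirected, symmetric).
    Returns the number of informed nodes after each step, starting with 1.
    """
    infected = {seed_node}
    history = [1]
    for _ in range(max_steps):
        newly = set()
        for i in infected:
            for j, w in adj_weights[i].items():
                if j not in infected and rng.random() < min(1.0, x * w):
                    newly.add(j)
        infected |= newly  # synchronous update at the end of the step
        history.append(len(infected))
    return history


def with_average_weight(adj_weights):
    """Reference network: same topology, every link carries the mean weight."""
    total = sum(w for nbrs in adj_weights.values() for w in nbrs.values())
    count = sum(len(nbrs) for nbrs in adj_weights.values())
    mean_w = total / count
    return {i: {j: mean_w for j in nbrs} for i, nbrs in adj_weights.items()}


# Toy example (hypothetical data): a three-node chain with unequal weights.
chain = {"a": {"b": 1.0}, "b": {"a": 1.0, "c": 3.0}, "c": {"b": 3.0}}
reference = with_average_weight(chain)
empirical_run = simulate_spreading(chain, "a", x=0.2, max_steps=20, rng=random.Random(0))
reference_run = simulate_spreading(reference, "a", x=0.2, max_steps=20, rng=random.Random(0))
```

Comparing many such runs on the empirical and the reference weights is the kind of experiment the slide summarises; a single toy run like this one is not meant to reproduce the reported result.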
Diffusion of information • Where do individuals get their information? Majority of infections through (1) Empirical: ties of intermediate strength (2) Reference: (would be) weak ties • Both weak and strong ties have a diminishing role as information sources: The weakness of weak and strong ties [Figure: panels Empirical and Reference]
Diffusion of information • Start spreading 100 times (large red node) • Information flows differently due to the local organizational principle (1) Empirical: information flows along a strong tie backbone (2) Reference: information mainly flows along the shortest paths • Best search results: Reach out of your own community [Figure: panels Empirical and Reference]
Modeling • What is all this good for? • Understanding structure and mechanisms of the society • Improving spreading of news and opinions • (Developing marketing strategies and other tools of mass manipulation) • MODELING needed
Modeling • Needed: Weighted network model which reflects the observations with as little input as possible • Links created by random encounters and on an acquaintance basis • Weights generated by one-to-one activities (phone calls) • Take into account the different time scales: Encounter (call) frequency; Lifetime of relationships; Lifetime of nodes; treated together
Modeling • i meets j with prob. $\propto w_{ij}$, who meets k with prob. $\propto w_{jk}$. • If k is a common friend, $w_{ij}$, $w_{jk}$, $w_{ki}$ are increased by $\delta$ (a). • If k is not connected to i, $w_{ik} = w_0$ (= 1) is created with probability $p_\Delta$ (b). • With prob. $p_r$ new links with weight $w_0$ are created (c). • With prob. $p_d$ a node with all its links is deleted and a new one is born with no links.
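The microscopic rules on this slide can be sketched as one update sweep over all nodes. This is a minimal sketch under stated assumptions: the dictionary representation `w[i][j]`, the function names, the order of the steps within a sweep, and modelling "deleted and reborn" by clearing a node's links while reusing its label are choices made for this example. Only the rules listed on the slide are implemented; any further details of the published model are not included.

```python
import random


def weighted_choice(options, rng):
    """Pick a key of {key: weight} with probability proportional to its weight."""
    keys = list(options)
    return rng.choices(keys, weights=[options[k] for k in keys], k=1)[0]


def model_step(w, delta, p_delta, p_r, p_d, w0, rng):
    """One sweep of the model; w[i][j] == w[j][i] is the weight of link i-j."""
    nodes = list(w)
    for i in nodes:
        # Weighted local search: i picks friend j, j picks another friend k.
        if w[i]:
            j = weighted_choice(w[i], rng)
            others = {k: wt for k, wt in w[j].items() if k != i}
            if others:
                k = weighted_choice(others, rng)
                if k in w[i]:
                    # (a) k is a common friend: reinforce the whole triangle.
                    for a, b in ((i, j), (j, k), (k, i)):
                        w[a][b] += delta
                        w[b][a] += delta
                elif rng.random() < p_delta:
                    # (b) close the triangle with a new link of weight w0.
                    w[i][k] = w[k][i] = w0
        # (c) Unweighted global search: link to a random non-neighbor.
        if rng.random() < p_r:
            candidates = [v for v in nodes if v != i and v not in w[i]]
            if candidates:
                v = rng.choice(candidates)
                w[i][v] = w[v][i] = w0
    # Node deletion: remove all links; the label is reused for the newborn node.
    for i in nodes:
        if rng.random() < p_d:
            for j in list(w[i]):
                del w[j][i]
            w[i] = {}


# Toy example (hypothetical data): pure reinforcement on a triangle.
rng = random.Random(42)
triangle = {
    "a": {"b": 1.0, "c": 1.0},
    "b": {"a": 1.0, "c": 1.0},
    "c": {"a": 1.0, "b": 1.0},
}
model_step(triangle, delta=0.5, p_delta=0.0, p_r=0.0, p_d=0.0, w0=1.0, rng=rng)
```

In the toy triangle every local search closes on a common friend, so each of the three nodes reinforces all three links once per sweep.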
Microscopic rules in the model • Summary of the model: Weighted local search for new acquaintances; Reinforcement of existing (popular) links; Unweighted global search for new acquaintances; Node removal, exponential link & weight lifetimes: $\langle\tau_l\rangle = \langle\tau_w\rangle = (2 p_d)^{-1}$ • Model parameters: • $\delta$: Free weight reinforcement parameter • $p_d = 10^{-3}$: Sets the time scale of the model, $\langle\tau_N\rangle = 1/p_d$ (average node lifetime of 1000 time steps) • $p_r = 5 \times 10^{-4}$: Global connections; results are not sensitive to it (one random link per node during 1000 time steps) • $p_\Delta$: Adjusted in relation to $\delta$ to keep $\langle k \rangle$ constant (structure changes are due only to link re-organisation)
Social network model • Communities with dense & strong internal and sparse & weak external connections (cf. phone network) • Samples of $N = 10^5$ networks for variable weight increase $\delta$ • Tie strength: weak → intermediate → strong tie [Figure: panels "No communities", "Communities start nucleating", "Communities forming"]
Communities by inspection • Average number of links constant: $\langle L \rangle = N \langle k \rangle / 2$ ($\langle k \rangle \approx 10$) => All changes in structure are due to re-organisation of links • Increasing $\delta$ traps search in communities, further enhancing the trapping effect => Clear communities form • Triangles accumulate weight and act as nuclei for communities to emerge [Figure: panels $\delta = 0$, $\delta = 0.1$, $\delta = 0.5$, $\delta = 1$]
Communities by k-clique method • k-clique algorithm as definition for communities* • Focus on 4-cliques (smallest non-trivial cliques) • Relative largest community size $R_{k=4} \in [0,1]$ • Average community size $\langle n_s \rangle$ (excl. largest) • Observe clique percolation through the system for small $\delta$ • Increasing $\delta$ leads to condensation of communities • * G. Palla et al., “Uncovering the overlapping community structure...”, Nature 435, 814 (2005)
Global consequences [Figure: model network and phone network under ascending and descending link removal; horizontal axis: fraction of links, f, from 0 to 1] • Phase transition for ascending tie removal (weaker first)
Modeling • The model fulfills essential criteria of social nw-s: • Broad (but not scale-free) degree distribution • Assortative mixing (popular people attract each other) • High clustering: many triangles (by construction) • Community structure with strong links inside and weak ones between them
Discussion and Conclusion • Weak ties maintain the network’s structural integrity; Strong ties maintain local communities; Intermediate ties are mostly responsible for first-time infections • How can one efficiently search for information in a social network? “Go out of your community!” • Social networks seem better suited to local processing than global transmission of information • Are there simple rules or mechanisms that lead to observed properties? • Efficient modeling possible • Publications: J.-P. Onnela, et al., PNAS 104, 7332-7336 (2007); J.-P. Onnela, et al., New J. Phys. 9, 179 (2007); J.M. Kumpula, et al., PRL (to be published) • www.phy.bme.hu/~kertesz/