Temporal aspects of social behavior from mobile phone data. Dynamics of interactions in a large communication network. János Kertész Budapest University of Technology and Economics and Aalto University, Helsinki.
Temporal aspects of social behavior from mobile phone data. Dynamics of interactions in a large communication network
János Kertész, Budapest University of Technology and Economics and Aalto University, Helsinki
In collaboration with: Márton Karsai, Mikko Kivelä, Raj Pan, Jari Saramäki, Kimmo Kaski, Albert-László Barabási
Support: EU FP7 FET-Open 238597, FiDiPro, OTKA
Transmission on a linear chain: VERY slow
May 29, 2003: callers to the Hungarian Central Office for Combating Catastrophes reported receiving SMS messages like: „Large nuclear accident in Paks. Stay home, close doors and windows, don’t eat lettuce.” In reality nothing happened. A police investigation revealed that a nurse had heard children in the kindergarten talking about such things (they had heard them from their parents as a rumor); she called her relatives, the „information” reached a journalist, who started an SMS campaign... Was the spreading fast or slow?
Outline
• Spreading phenomena in complex networks, small world
• Mobile phone call network: a proxy for the social network. Nodes, links and weights
• The Granovetterian structure of the society
• Event list and modeling spreading
• Different sources of correlations:
  - topology
  - weight-topology
  - daily (weekly) patterns
  - burstiness
  - link-link
• Differentiating between contributions to spreading
• Burst statistics
• Summary
Spreading phenomena in networks
• epidemics (bio- and computer)
• rumors, information, opinion
• innovations
• etc.
Nodes of a network can be:
• Susceptible
• Infected
• Recovered (immune)
Corresponding models: SI, SIR, SIS...
Important: speed of spreading (SLOW)
Spreading curve (SI): m(t) = Ninf / Ntot as a function of time, with early, intermediate and late stages
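A minimal sketch of how such a spreading curve can be produced, assuming a static graph given as an adjacency dictionary and a per-step transmission probability `beta` (both are illustrative choices, not taken from the slides). On a linear chain with `beta = 1` the infection front advances by exactly one node per step, which illustrates why transmission on a chain is slow.

```python
import random

def si_curve(adj, seed_node, beta, steps, rng):
    """Discrete-time SI process on a static graph.

    adj: dict mapping node -> list of neighbours (hypothetical input format)
    beta: per-step transmission probability along each infected-susceptible link
    Returns the list m(t) = N_inf / N_tot for t = 0..steps.
    """
    infected = {seed_node}
    n_tot = len(adj)
    curve = [len(infected) / n_tot]
    for _ in range(steps):
        newly = set()
        for u in infected:
            for v in adj[u]:
                if v not in infected and rng.random() < beta:
                    newly.add(v)
        infected |= newly  # infected nodes never recover in the SI model
        curve.append(len(infected) / n_tot)
    return curve

# Linear chain of 10 nodes, seed at one end, certain transmission.
chain = {i: [j for j in (i - 1, i + 1) if 0 <= j < 10] for i in range(10)}
curve = si_curve(chain, 0, 1.0, 9, random.Random(0))
```

Reaching all N nodes of the chain takes N - 1 steps even in this best case, whereas in a small-world network the number of steps can grow only logarithmically with N.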
Spreading in the society. Small world property: “Six Degrees of Separation”, Erdős number, WWW, collaboration network, Kevin Bacon game, etc. Not only social networks: Internet, genetic transcription, etc. In many networks the average distance between two arbitrary nodes is small (it grows at most logarithmically with system size). Distance: length of the shortest path between two nodes.
Small world: fast spreading?
• There are short, efficient paths. Are they used?
• Needed information:
  - Structure of the society: network at the societal level
  - Local transmission dynamics: detailed description of how information (rumors, opinions, etc.) is transmitted
Impossible to know, so use a proxy: the usage of mobile phones in the adult population is close to 100% and all interactions are recorded. Use the call network as a proxy of the network at the societal level.
Constructing social network from mobile phone data
• Data from a European mobile operator (20% market share)
• Over 7 million private mobile phone subscriptions
• Focus: voice calls within the home operator
• Data aggregated from a period of 18 weeks
• Require reciprocity (X→Y AND Y→X) for a link
• Customers are anonymous (hash codes)
• Weights: either call duration or number of calls
Figure: X calls Y for 15 min and Y calls X for 5 min; the aggregated X-Y link has weight 20 min.
J.-P. Onnela, et al. PNAS 104, 7332-7336 (2007); J.-P. Onnela, et al. New J. Phys. 9, 179 (2007)
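The construction rules above can be sketched as follows. The record format `(caller, callee, duration_seconds)` and the function name are assumptions made for illustration; the actual data layout is not described in the slides.

```python
from collections import defaultdict

def build_reciprocal_network(calls):
    """Aggregate directed call records into an undirected weighted network.

    calls: iterable of (caller, callee, duration_seconds) tuples (hypothetical format).
    A link is kept only if calls exist in both directions (reciprocity).
    Returns {(a, b): {"n_calls": ..., "duration": ...}} with a < b.
    """
    directed = defaultdict(lambda: [0, 0])  # (caller, callee) -> [n_calls, total duration]
    for caller, callee, duration in calls:
        if caller == callee:
            continue
        directed[(caller, callee)][0] += 1
        directed[(caller, callee)][1] += duration
    links = {}
    for (a, b), (n_ab, d_ab) in directed.items():
        # visit each reciprocal pair once, from the side with the smaller id
        if a < b and (b, a) in directed:
            n_ba, d_ba = directed[(b, a)]
            links[(a, b)] = {"n_calls": n_ab + n_ba, "duration": d_ab + d_ba}
    return links

# X calls Y for 15 min, Y calls X for 5 min, X calls Z once without reply.
calls = [("X", "Y", 900), ("Y", "X", 300), ("X", "Z", 60)]
links = build_reciprocal_network(calls)
```

Here the X-Y link survives with a total duration of 1200 s (20 min), while the unreciprocated X-Z call produces no link. Either field can serve as the link weight.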
Huge network: proxy for network at societal level Small world
The strength of weak ties (M. Granovetter, 1973)
Hypothesis about the small scale (micro-) structure of the society:
1. “The strength of a tie is a (probably linear) combination of the amount of time, the emotional intensity, the intimacy (mutual confiding), and the reciprocal services which characterize the tie.”
2. “The stronger the tie between A and B, the larger the proportion of individuals S to whom both are tied.”
Consequences on large (macro-) scale:
• Society consists of strongly wired communities linked by weak ties. The latter hold the society together.
Granovetter, Mark S. (May 1973), "The Strength of Weak Ties", American Journal of Sociology 78 (6): 1360–1380
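Overlap
• Definition: relative neighborhood overlap (topological) Oij = nij / ((ki − 1) + (kj − 1) − nij)
• where nij is the number of triangles around edge (vi, vj), i.e. the number of common neighbors, and ki, kj are the degrees of the two end nodes
• Illustration of the concept: (figure)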
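A short sketch of the overlap computation, assuming the definition O_ij = n_ij / ((k_i − 1) + (k_j − 1) − n_ij) used in the cited Onnela et al. papers and an adjacency structure stored as a dictionary of neighbour sets (the storage format is an illustrative choice).

```python
def overlap(adj, i, j):
    """Relative neighbourhood overlap of the edge (i, j).

    adj: dict mapping node -> set of neighbours.
    n_ij counts common neighbours; the denominator counts all neighbours
    of i or j other than i and j themselves.
    """
    n_ij = len(adj[i] & adj[j])
    denom = (len(adj[i]) - 1) + (len(adj[j]) - 1) - n_ij
    return n_ij / denom if denom > 0 else 0.0

# Triangle a-b-c with a pendant node d attached to a.
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
o_ab = overlap(adj, "a", "b")
o_ad = overlap(adj, "a", "d")
o_bc = overlap(adj, "b", "c")
```

The edge b-c lies entirely inside the triangle, the edge a-b shares one of its two outside neighbours, and the edge a-d shares none.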
Empirical Verification
• Let <O>w denote Oij averaged over a bin of w-values
• Use the cumulative link weight distribution (the fraction of links with weights less than w’)
• Relative neighbourhood overlap increases as a function of link weight
• Verifies Granovetter’s hypothesis (~95%)
• Exception: top 5% of weights
Figure: blue curve, empirical network; red curve, weight-randomised network
High Weight Links?
• Weak links: strength of both adjacent nodes (min & max) considerably higher than the link weight
• Strong links: strength of both adjacent nodes (min & max) about as high as the link weight
• Indication: high-weight relationships clearly dominate the on-air time of both nodes, others are negligible
• The fraction of time spent communicating with one other person converges to 1 at roughly w ≈ 10^4
• Consequence: less time to interact with others
• This explains the onset of the decreasing trend of <O>w
Node strength: si = Σj wij (figure: a link of weight wij between nodes of strengths si and sj)
The study revealed the structure of the network, the interplay between weights and communities, and the relations between local, mesoscopic and global structure. It is possible to ask unprecedented questions and even find the answers to them.
Spreading of information
Knowledge of information diffusion is based on unweighted networks. Use the present network to study diffusion on a weighted network: does the local relationship between topology and tie strength have an effect?
Spreading simulation: infect one node with new information
(1) Granovetterian: pij ∝ wij
(2) Reference: pij ∝ <w>
Spreading is significantly faster on the reference (average weight) network. Information gets trapped in communities in the real network.
Figure panels: Reference, Granovetterian
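The two simulation variants can be sketched as below, assuming transmission probabilities of the form p_ij = x·w_ij (Granovetterian) and p_ij = x·<w> (reference) with a free proportionality constant `x`. The toy network, the value of `x`, and all function names are illustrative assumptions.

```python
import random

def transmission_probs(weights, x, granovetterian=True):
    """weights: dict {(i, j): w_ij}. Returns per-step transmission probabilities,
    p_ij = x * w_ij (Granovetterian) or p_ij = x * <w> (reference), capped at 1."""
    mean_w = sum(weights.values()) / len(weights)
    return {link: min(1.0, x * (w if granovetterian else mean_w))
            for link, w in weights.items()}

def spread(probs, seed_node, steps, rng):
    """Discrete-time SI spreading; returns the infected fraction after each step."""
    nodes = {n for link in probs for n in link}
    infected = {seed_node}
    curve = [1 / len(nodes)]
    for _ in range(steps):
        newly = set()
        for (i, j), p in probs.items():
            # transmission is only possible across an infected-susceptible link
            if (i in infected) != (j in infected) and rng.random() < p:
                newly.add(j if i in infected else i)
        infected |= newly
        curve.append(len(infected) / len(nodes))
    return curve

# Two strongly wired triangles joined by a single weak tie (2, 3).
weights = {(0, 1): 10, (0, 2): 10, (1, 2): 10,
           (3, 4): 10, (3, 5): 10, (4, 5): 10,
           (2, 3): 1}
p_real = transmission_probs(weights, 0.05, granovetterian=True)
p_ref = transmission_probs(weights, 0.05, granovetterian=False)
curve_real = spread(p_real, 0, 50, random.Random(1))
```

Both variants have the same total transmission probability, but in the Granovetterian case the weak bridge (2, 3) is crossed much less often, which is the mechanism that traps information inside a community.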
Small but slow world
We have data about:
- who called whom (voice, SMS, MMS)
- when
- how long they talked
(+ metadata: gender, age, postal code, mostly used tower, …)
306 million mobile call records of 4.9 million individuals during 4 months with 1 s resolution
M. Karsai et al. http://arxiv.org/abs/1006.2125
Movie. Nodes: subscribers. Links: present if mutual calls existed in the aggregate data. Events: voice calls, SMS.
A more accurate study of spreading is possible: infect a node (with information, gossip, etc.) at time t0=0. Transmission takes place whenever an infected node has a call with an uninfected one (SI model). Watch m(t)=<I(t)/N>, the ratio of infected nodes, averaged over initiators. The time sequence is made periodic. Is this fast or slow? What to compare with? The problem of null models.
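A minimal sketch of SI spreading on a time-ordered event list with periodic repetition of the sequence. The event format `(t, a, b)`, the `max_cycles` cut-off, and the function name are assumptions for illustration; averaging over initiators would repeat this for many seed nodes.

```python
def si_on_events(events, seed_node, n_nodes, period, max_cycles=10):
    """Deterministic SI spreading on a call sequence.

    events: list of (t, a, b) sorted by time, with 0 <= t < period.
    Every call between an infected and an uninfected node transmits.
    The sequence is repeated periodically until everyone is infected
    or max_cycles is reached.
    Returns [(time, infected fraction)] at each new infection.
    """
    infected = {seed_node}
    curve = [(0, 1 / n_nodes)]
    for cycle in range(max_cycles):
        for t, a, b in events:
            if (a in infected) != (b in infected):
                infected.update((a, b))
                curve.append((cycle * period + t, len(infected) / n_nodes))
        if len(infected) == n_nodes:
            break
    return curve

# The B-C call happens before A infects B, so C is only reached one period later.
events = [(1, "B", "C"), (2, "A", "B")]
curve = si_on_events(events, "A", 3, period=3)
```

The example shows why event timing matters: the static network is a short chain A-B-C, yet the unfavourable ordering of the calls delays the infection of C until the second pass through the sequence.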
Correlations influence spreading speed
• Topology (community structure)
• Weight-topology (Granovetter-structure)
• Bursty dynamics
• Daily pattern
• Link-link dynamic correlations
Movie: spreading
Bursty dynamics: inhomogeneous activity patterns. Figure panels: Poissonian vs. bursty. A.-L. Barabási, Nature 435, 207 (2005)
Bursty call patterns for individual users. Figure panels: average user, busy user. Note the different scales.
Daily pattern of call density. Weekly pattern too (here disregarded)
Scaled inter-event time distribution, binned according to weights (here: number of calls). Calls are non-Poissonian. Inset: time-shuffled data.
Dynamic link-link correlations: triggered calls, cascades, etc. How to identify the effect of the different correlations on the spreading? Introduce different null models by appropriate shuffling of the data.
Time shuffling: destroys burstiness (and link-link correlations) but keeps weights and the daily pattern
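One way to realise this null model is to permute the time stamps among all events, as in the sketch below (event format `(t, a, b)` and function name are assumptions). Each link keeps its number of events and the overall set of time stamps is unchanged, so weights and the daily pattern survive while the timing within each link is randomised.

```python
import random

def time_shuffle(events, rng):
    """Randomly reassign the existing time stamps to the events.

    events: list of (t, a, b). Returns a new time-sorted event list.
    """
    times = [t for t, _, _ in events]
    rng.shuffle(times)
    shuffled = [(t, a, b) for t, (_, a, b) in zip(times, events)]
    return sorted(shuffled, key=lambda e: e[0])

events = [(1, "A", "B"), (2, "A", "B"), (3, "A", "B"),
          (400, "B", "C"), (900, "C", "D"), (905, "C", "D")]
shuffled = time_shuffle(events, random.Random(0))
```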
Link sequence shuffling: select random pairs of link sequences and exchange them. Destroys topology-weight and link-link correlations, keeps burstiness.
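A sketch of this shuffling, assuming events of the form `(t, a, b)` and a chosen number of pair exchanges `n_swaps` (both illustrative). Whole time sequences move between links, so each sequence keeps its internal bursty structure while its weight is decoupled from the position of the link in the network.

```python
import random
from collections import defaultdict

def link_sequence_shuffle(events, n_swaps, rng):
    """Exchange complete event-time sequences between randomly chosen link pairs.

    events: list of (t, a, b); needs at least two distinct links.
    Returns a new time-sorted event list on the same set of links.
    """
    seqs = defaultdict(list)
    for t, a, b in events:
        seqs[tuple(sorted((a, b)))].append(t)
    links = list(seqs)
    for _ in range(n_swaps):
        l1, l2 = rng.sample(links, 2)
        seqs[l1], seqs[l2] = seqs[l2], seqs[l1]
    shuffled = [(t, a, b) for (a, b), ts in seqs.items() for t in ts]
    return sorted(shuffled, key=lambda e: e[0])

events = [(1, "A", "B"), (2, "A", "B"), (3, "A", "B"),
          (10, "B", "C"), (50, "C", "D"), (55, "C", "D")]
shuffled = link_sequence_shuffle(events, 10, random.Random(1))
```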
Equal weight link sequence shuffling: destroys link-link correlations, keeps weight-topology correlations and bursty dynamics
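A possible implementation, assuming the weight of a link is its number of events and events have the form `(t, a, b)`. Instead of repeated pair exchanges, this sketch applies a random permutation within each class of equal-weight links, which is a swapped-in but equivalent way of randomising.

```python
import random
from collections import defaultdict

def equal_weight_link_sequence_shuffle(events, rng):
    """Permute event-time sequences only among links with the same number of events.

    events: list of (t, a, b). Every link keeps its weight, so weight-topology
    correlations are preserved; only the timing relative to other links changes.
    """
    seqs = defaultdict(list)
    for t, a, b in events:
        seqs[tuple(sorted((a, b)))].append(t)
    by_weight = defaultdict(list)
    for link, ts in seqs.items():
        by_weight[len(ts)].append(link)
    shuffled = []
    for links in by_weight.values():
        sequences = [seqs[link] for link in links]
        rng.shuffle(sequences)
        for (a, b), ts in zip(links, sequences):
            shuffled.extend((t, a, b) for t in ts)
    return sorted(shuffled, key=lambda e: e[0])

events = [(1, "A", "B"), (2, "A", "B"), (10, "B", "C"),
          (50, "C", "D"), (55, "C", "D")]
shuffled = equal_weight_link_sequence_shuffle(events, random.Random(2))
```

In the example the links A-B and C-D both have two events and may exchange sequences, while B-C is alone in its weight class and keeps its single event.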
The role of the daily pattern. Model calculation: take the empirical topology with weights and compare homogeneous and inhomogeneous Poissonian processes. Little effect. The slowing down is mainly due to the Granovetterian structure and the bursty character of human activity.
A closer look at burstiness. Define a bursty period (BP): in a series of signals, a BP(Δt) is a sequence of signals with an empty period of length Δt both at the beginning and at the end of the sequence. The empty period is measured from the end of the talk.
Bursty dynamics: a closer look. What is a burst? Define it relative to a window Δt: a bursty period (BP) is a sequence of events separated from the rest by empty periods of length at least Δt.
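The definition can be turned into a short routine that splits an event sequence into bursty periods and returns the number of events E in each. This sketch treats events as points in time and ignores call durations, which is a simplifying assumption; the function name is illustrative.

```python
def bursty_periods(times, dt):
    """Split sorted event times into bursty periods for a window dt.

    Consecutive events belong to the same period if their gap is at most dt.
    Returns the list of period sizes E (number of events per period).
    """
    if not times:
        return []
    sizes = [1]
    for prev, cur in zip(times, times[1:]):
        if cur - prev <= dt:
            sizes[-1] += 1   # gap short enough: same bursty period continues
        else:
            sizes.append(1)  # empty period longer than dt: a new period starts
    return sizes

sizes = bursty_periods([0, 5, 8, 100, 103, 500], dt=10)
```

For the example sequence the result is three periods of sizes 3, 2 and 1; a histogram of such sizes over all users gives the empirical P(E = n).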
Statistics of bursts. Figure: frequency of the length of BP versus length of BP, for different values of Δt. Δt=10 is too small.
Modeling: 1. Independent events: whatever the distribution of inter-event times is, we get P(E = n) ~ exp(-An) for the number of events E in a BP, in contrast to the observed power law.
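The exponential form follows from a one-line argument, sketched here with q denoting the probability that an inter-event time τ does not exceed Δt (the symbol q is introduced only for this derivation). A bursty period with n events requires n − 1 consecutive short gaps followed by one long gap, and for independent events these probabilities multiply:

```latex
q = \int_0^{\Delta t} P(\tau)\, d\tau ,
\qquad
P(E = n) = q^{\,n-1}(1 - q) = \frac{1-q}{q}\, e^{-A n},
\qquad
A = \ln\frac{1}{q} .
```

The inter-event time distribution P(τ) enters only through the single number q, which is why any independent process gives a geometric, i.e. exponentially decaying, P(E = n).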
Queuing model (Barabási, 2005). We have a to-do list, which contains the tasks in a hierarchical order. The highest priority task is always executed. Tasks arrive at random and get a hierarchy parameter at random. Qualitatively good (power-law waiting times and number of events in BP) but wrong exponents.
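A minimal sketch of such a priority queue, assuming the fixed-length-list variant in which, with probability `p`, the highest priority task is executed and otherwise a random one, and the executed task is replaced by a new task with a uniformly random priority. The parameter names and values are illustrative assumptions.

```python
import random

def priority_queue_waiting_times(list_length, steps, p, rng):
    """Simulate a fixed-length to-do list with random task priorities.

    At each step one task is executed: the highest priority one with
    probability p, otherwise a randomly chosen one. The executed task is
    replaced by a new task with a random priority.
    Returns the waiting time (in steps) of every executed task.
    """
    tasks = [(rng.random(), 0) for _ in range(list_length)]  # (priority, arrival step)
    waits = []
    for step in range(1, steps + 1):
        if rng.random() < p:
            k = max(range(list_length), key=lambda i: tasks[i][0])
        else:
            k = rng.randrange(list_length)
        waits.append(step - tasks[k][1])
        tasks[k] = (rng.random(), step)
    return waits

waits = priority_queue_waiting_times(2, 1000, 0.9, random.Random(0))
```

Most tasks are executed almost immediately while low priority tasks can wait very long, which is the qualitative origin of the heavy-tailed waiting times mentioned above.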
Summary
• Mobile phone call network used as a proxy for the human network at the societal level
• Structure of the society follows Granovetter’s picture (up to 95%)
• Micro and macro structures are related
• Several different types of correlations
• Spreading slowed down mainly by weight-topology correlations and burstiness of human activities
• Bursts are highly correlated events (not explainable by circadian patterns)
• Strong short-time correlations, no model: a new explanation is needed
J.-P. Onnela, et al. PNAS 104, 7332-7336 (2007)
J.-P. Onnela, et al. New J. Phys. 9, 179 (2007)
J. Kumpula et al. PRL 99, 228701 (2007)
M. Karsai et al. arXiv:1006.2125