Diffusion, Persistence and Mobility: An Overview

Omar Malik

Overview • Diffusion and Persistence: the diffusion equation; persistence as a measure of diffusion; persistence of different networks; what can it be used for? • Mobility: human mobility is predictable and cyclical; mobility patterns and (the lack of) anonymity; mobility patterns from scarce data

Diffusion and Persistence

Diffusion • The net statistical movement of particles. • The spread of the smell of perfume throughout a room. • The spread of news on social media. • The spread of a highly contagious disease through a population.

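Diffusion • Described by a differential equation, the diffusion equation:

$$\frac{\partial \phi}{\partial t} = D \nabla^2 \phi$$

• $\phi$ is the diffusive field. • $D$ is the diffusion constant.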

Diffusion • Must be discretized to describe the process on networks. • The discretized equation gives the field at the i-th location at the (n+1)-th time step. • The diffusion constant enters through the coefficient F.

Diffusion • F bundles the rate of diffusion and the length of the discrete time step. • If F is too large, the discretized update becomes unstable and does not converge. • For networks, the maximum value that F can take is related to the largest degree in the network.
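A minimal sketch of the discretized update on a network, in plain Python. The slides do not preserve the exact equation, so the form used here, $\phi_i^{n+1} = \phi_i^n + F \sum_{j \in N(i)} (\phi_j^n - \phi_i^n)$, is an assumption (the usual explicit finite-difference scheme carried over to a graph). The bound $F \le 1/k_{max}$ is likewise an assumed sufficient condition, consistent with the remark that the limit on F depends on the largest degree. The names `diffusion_step` and `max_stable_F` are illustrative.

```python
def diffusion_step(phi, neighbors, F):
    """One explicit update: phi_i <- phi_i + F * sum over neighbors j of (phi_j - phi_i).

    phi is a list of field values; neighbors[i] lists the neighbors of node i.
    """
    return [
        phi_i + F * sum(phi[j] - phi_i for j in neighbors[i])
        for i, phi_i in enumerate(phi)
    ]


def max_stable_F(neighbors):
    """Assumed sufficient stability bound: F <= 1 / (largest degree)."""
    k_max = max(len(nbrs) for nbrs in neighbors)
    return 1.0 / k_max


# Example: a ring of 6 nodes with all of the field initially on node 0.
n = 6
ring = [[(i - 1) % n, (i + 1) % n] for i in range(n)]
phi = [1.0] + [0.0] * (n - 1)

F = 0.5 * max_stable_F(ring)  # stay safely below the assumed bound
for _ in range(200):
    phi = diffusion_step(phi, ring, F)

print(phi)       # the field relaxes towards the uniform value 1/6
print(sum(phi))  # the total field is conserved on an undirected network
```

Choosing F well above the bound (for example F = 1.0 on this ring) makes the values oscillate with growing amplitude, which is the failure to converge noted above.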

Persistence • Persistence refers to whether or not a node has changed state up to a given time. • When studying diffusive processes, persistence becomes a measure of the progress of diffusion. For example: • The number of nodes that have not switched from state 0 to 1. • The number of uninfected individuals in an epidemic. • The fraction of the room that the smell of perfume has not yet reached.
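As a hedged illustration of the definition above, the sketch below computes a persistence curve from a recorded history of node states: the fraction of nodes that have never left their initial state up to each time step. The function name `persistence_curve` and the toy history are hypothetical.

```python
def persistence_curve(history):
    """Fraction of nodes that have never left their initial state, per time step.

    history[t][i] is the state of node i at time step t.
    """
    initial = history[0]
    unchanged = [True] * len(initial)
    curve = []
    for states in history:
        for i, state in enumerate(states):
            if state != initial[i]:
                unchanged[i] = False  # once a node changes, it never counts again
        curve.append(sum(unchanged) / len(initial))
    return curve


# Toy history for 4 nodes over 4 time steps (states are 0 or 1).
history = [
    [0, 0, 0, 0],
    [1, 0, 0, 0],
    [0, 1, 0, 0],  # node 0 returns to 0 but is no longer persistent
    [0, 1, 1, 0],
]
curve = persistence_curve(history)
print(curve)  # [1.0, 0.75, 0.5, 0.25]
```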

Regular Networks 1-D Lattices

Regular Networks 2-D Lattices • The occupation probability is the fraction of the full network that is occupied.

Random Networks ER Networks

Persistence

Persistence • Network topology determines the persistence behavior. • Networks with regular structure show power-law scaling. • This scaling is absent in random graphs. • Networks become more persistent as the network diameter and the average shortest path length increase.
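The two topological quantities named above can be computed with a breadth-first search from every node. This is a small sketch for unweighted, connected networks stored as adjacency lists. The helper names are illustrative, and the ring and complete graph are just two extreme examples, not networks from the talk.

```python
from collections import deque


def bfs_distances(neighbors, source):
    """Hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in neighbors[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def path_statistics(neighbors):
    """Return (diameter, average shortest path length); assumes a connected graph."""
    n = len(neighbors)
    total, longest = 0, 0
    for s in range(n):
        dist = bfs_distances(neighbors, s)
        total += sum(dist.values())
        longest = max(longest, max(dist.values()))
    return longest, total / (n * (n - 1))


n = 12
ring = [[(i - 1) % n, (i + 1) % n] for i in range(n)]
complete = [[j for j in range(n) if j != i] for i in range(n)]

print(path_statistics(ring))      # long paths: diameter 6
print(path_statistics(complete))  # every pair adjacent: diameter 1
```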

Persistence A Real World Example • GitHub is a network of developers. • Changing trends may be modelled as diffusion over this network. • For example, users in the network who start switching from Python 2 to Python 3. • Our group is working on a project that involves predicting large-scale patterns of activity on GitHub.


Mobility

Mobility • Human mobility: random or predictable? • Older models tended to treat human mobility as a stochastic process. • However, Barabási and colleagues showed that while randomness exists, most people spend more than 60% of their time in just two locations. • These two locations are usually work and home.

Mobility • Entropy measures how uncertain we are about a user’s whereabouts: the lower the entropy, the more information we have and the more predictable the user.

Mobility • Actual entropy $S$: accounts for the order in which locations were visited as well as the time spent at each location. • Temporal-uncorrelated entropy $S^{unc}$: accounts for the frequency with which different locations were visited, but without regard to order. • Random entropy $S^{rand}$: accounts only for the number of locations visited.
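A short sketch of the two simpler entropy measures, following the definitions in Song et al. (2010): the frequency-based entropy $-\sum_j p_j \log_2 p_j$, where $p_j$ is the fraction of visits to location $j$, and the count-based entropy $\log_2 N$, where $N$ is the number of distinct locations. The order-aware entropy needs a compression-based estimator and is not covered here. The function names and the sample visit sequence are made up for illustration.

```python
import math
from collections import Counter


def random_entropy(visits):
    """log2 of the number of distinct locations visited."""
    return math.log2(len(set(visits)))


def uncorrelated_entropy(visits):
    """Shannon entropy of the visit frequencies, ignoring the order of visits."""
    counts = Counter(visits)
    total = len(visits)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical sequence of visited locations for one user.
visits = ["home", "work", "home", "work", "home", "gym", "home", "work"]

s_rand = random_entropy(visits)
s_unc = uncorrelated_entropy(visits)
print(s_rand, s_unc)  # the frequency-based entropy never exceeds the count-based one
```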

Mobility • $\Pi^{max}$: the maximum possible predictability for a given user’s movements (depends on $S$). • $\Pi^{unc}$ and $\Pi^{rand}$: the corresponding predictability bounds obtained from the other measures of entropy.
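In Song et al. (2010) the predictability bound comes from Fano's inequality: for a user with entropy $S$ who visits $N$ distinct locations, the maximum predictability $\Pi$ solves $S = H(\Pi) + (1 - \Pi)\log_2(N - 1)$, where $H$ is the binary entropy function. The sketch below solves this numerically by bisection; the function names and the tolerance are illustrative choices rather than anything from the talk.

```python
import math


def binary_entropy(p):
    """H(p) = -p log2 p - (1 - p) log2 (1 - p), with H(0) = H(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def max_predictability(S, N, tol=1e-12):
    """Solve S = H(p) + (1 - p) * log2(N - 1) for p in [1/N, 1] by bisection."""
    if N < 2:
        return 1.0  # a single location is always predictable

    def fano(p):
        return binary_entropy(p) + (1 - p) * math.log2(N - 1)

    lo, hi = 1.0 / N, 1.0  # fano is decreasing on this interval
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if fano(mid) > S:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Lower entropy gives a higher bound on predictability.
for S in (0.5, 1.0, 2.0):
    print(S, max_predictability(S, N=50))
```

At the extremes the bound behaves as expected: zero entropy gives a predictability of 1, and the largest possible entropy, $\log_2 N$, gives $1/N$.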

Mobility • Assuming that the user’s behavior is completely random gives us a predictability of less than 1%. • Incorporating order and temporal information pushes the upper bound on predictability to, on average, 93%.

Mobility • Surprisingly, people who travel farther are only slightly less predictable than relatively sedentary individuals. • This suggests that even users who travel a lot have fairly regular patterns of movement.

Mobility • The probability of finding a user at one of their top n most visited locations. • The top two most visited locations give us ~60% predictability.
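The top-n share can be estimated directly from a list of observed locations, as in this minimal sketch. The function name `top_n_share` and the sample data are hypothetical, and each observation is assumed to carry equal weight.

```python
from collections import Counter


def top_n_share(visits, n):
    """Fraction of observations that fall in the user's n most visited locations."""
    counts = Counter(visits)
    top = counts.most_common(n)
    return sum(count for _, count in top) / len(visits)


# Hypothetical observations for one user.
visits = ["home", "work", "home", "work", "home", "gym", "home", "work"]

print(top_n_share(visits, 1))  # 0.5
print(top_n_share(visits, 2))  # 0.875
```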

Anonymity • How distinctive are our movement patterns? • How many snapshots of a person’s location at a particular time are needed to uniquely identify them?

Anonymity • Just four randomly chosen points are enough to uniquely identify ~95% of individual users (De Montjoye et al., 2013).
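One way to estimate this kind of uniqueness from data is sketched below: repeatedly pick a user, draw p points at random from that user's trace, and check whether any other trace also contains all of them. This is a simplified illustration under assumed data structures (each trace is a set of `(location, hour)` tuples), not the procedure from the cited paper. The name `unicity` and the toy traces are hypothetical.

```python
import random


def unicity(traces, p, trials=200, seed=0):
    """Estimate the fraction of p-point samples that match exactly one user.

    traces maps a user id to a set of (location, hour) points.
    Assumes at least one trace has p or more points.
    """
    rng = random.Random(seed)
    users = [u for u, trace in traces.items() if len(trace) >= p]
    unique = 0
    for _ in range(trials):
        user = rng.choice(users)
        sample = set(rng.sample(sorted(traces[user]), p))
        # Every user whose trace contains all sampled points is a candidate.
        matches = [u for u, trace in traces.items() if sample <= trace]
        if matches == [user]:
            unique += 1
    return unique / trials


# Toy data: disjoint traces are always unique, identical traces never are.
disjoint = {"a": {("A", 1), ("B", 2)}, "b": {("C", 1), ("D", 2)}}
identical = {"a": {("A", 1), ("B", 2)}, "b": {("A", 1), ("B", 2)}}

print(unicity(disjoint, p=2))   # 1.0
print(unicity(identical, p=2))  # 0.0
```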

Anonymity • Even with loss of resolution (in both the space and time axes) the pattern remains highly predictive.

Sparse Self-submitted Data • So far the data we have been talking about is gathered automatically. • If the data is user-submitted (and therefore exceptional rather than typical), does it exhibit more randomness? • This is the question behind a current project, informally called the Gowalla project, which works with millions of user-submitted check-ins.

Sparse Self-submitted Data • By using clustering techniques we can still determine behavioral patterns. • On average, users spend ~57% of their time in their two most frequented location clusters. • It might be possible to uniquely identify users from their check-in data even if it is anonymised.
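The slides do not say which clustering technique is used, so the sketch below substitutes the simplest possible stand-in: snapping each check-in to a fixed latitude/longitude grid cell and treating each cell as a cluster. The cell size, the function name `grid_cluster`, and the coordinates are all assumptions made for illustration.

```python
import math
from collections import Counter


def grid_cluster(lat, lon, cell_deg=0.01):
    """Assign a check-in to a grid cell (a stand-in for a real clustering method)."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


# Hypothetical check-ins as (latitude, longitude) pairs.
checkins = [
    (40.7128, -74.0060),
    (40.7130, -74.0062),
    (40.7580, -73.9855),
    (40.7582, -73.9857),
    (40.7129, -74.0061),
    (40.6892, -74.0445),
]

clusters = Counter(grid_cluster(lat, lon) for lat, lon in checkins)
top_two = sum(count for _, count in clusters.most_common(2))
top_two_share = top_two / len(checkins)
print(len(clusters), top_two_share)  # 3 clusters; 5 of 6 check-ins in the top two
```

A density-based method would avoid splitting a single venue across a cell boundary, which a fixed grid can do.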

Sparse Self-submitted Data • Even in self-submitted data the top two locations represent ~60% of the check-ins.

Questions

References • Recktenwald. “Finite-Difference Approximations to the Heat Equation.” 2011, http://www.nada.kth.se/~jjalap/numme/FDheat.pdf. • Derrida, Bernard, et al. “Persistent Spins in the Linear Diffusion Approximation of Phase Ordering and Zeros of Stationary Gaussian Processes.” Physical Review Letters, vol. 77, no. 14, 1996, pp. 2871–2874, doi:10.1103/physrevlett.77.2871. • Song, C., et al. “Limits of Predictability in Human Mobility.” Science, vol. 327, no. 5968, 2010, pp. 1018–1021, doi:10.1126/science.1177170. • Montjoye, Yves-Alexandre De, et al. “Unique in the Crowd: The Privacy Bounds of Human Mobility.” Scientific Reports, vol. 3, no. 1, 2013, doi:10.1038/srep01376.
