CS 124/LINGUIST 180 From Languages to Information

Christopher Manning Lecture 19: (Social) Networks: Fat Tails, Small Worlds, and Weak Ties Slides from Dan Jurafsky, who mainly got them from Lada Adamic; some also from James Moody and Bing Liu

Networks • We’ve looked a lot at information in language • We’ve seen that you can also get information from the patterns of pages across the Web … PageRank • But there are also patterns of people who interact • Maybe it’s also useful to study and understand that? • Many goals: • Sociology • … • Better ad targeting Slide from Chris Manning

Social network analysis • Social network analysis is the study of entities (people in an organization), and their interactions and relationships. • The interactions and relationships can be represented with a network or graph, • each vertex (or node) represents an actor and • each link represents a relationship. May be directed or not. CS583, Bing Liu, UIC Nov 5, 2009

Centrality • Important or prominent actors are those that are involved with other actors extensively. • A person with extensive contacts or communications with many other people in the organization is considered more important • The contacts can also be called links or ties. • A central actor is one involved in many ties. • Degree centrality: The number of direct connections a node has • What really matters is where those connections lead to and how they connect the otherwise unconnected. CS583, Bing Liu, UIC

Prestige • Prestige is a more refined measure of prominence of an actor than centrality. • Distinguish: ties sent (out-links) and ties received (in-links). • A prestigious actor is one who is the object of extensive ties as a recipient. • To compute the prestige: we use only in-links. • PageRank is based on prestige CS583, Bing Liu, UIC

Social Network Analysis 2. Betweenness Centrality: A node with high betweenness has great influence over what flows in the network, indicating important links and single points of failure. 3. Closeness Centrality: A measure of nodes that are close to everyone else. The pattern of direct and indirect ties allows such nodes to reach any other node in the network more quickly than anyone else; they have the shortest paths to all others. From Amit Sharma, UT Austin
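The degree and closeness measures described above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the toy network and the function names are made up, the graph is assumed undirected and connected, and closeness is taken as (n - 1) divided by the sum of shortest-path distances (one common normalization). Betweenness is left out to keep the sketch short.

```python
from collections import deque


def degree_centrality(adj):
    """Degree centrality: the number of direct connections of each node."""
    return {v: len(nbrs) for v, nbrs in adj.items()}


def bfs_distances(adj, source):
    """Shortest-path distance (in hops) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def closeness_centrality(adj):
    """(n - 1) / (sum of distances to all other nodes); assumes a connected graph."""
    n = len(adj)
    result = {}
    for v in adj:
        total = sum(bfs_distances(adj, v).values())
        result[v] = (n - 1) / total if total > 0 else 0.0
    return result


# Hypothetical toy network: a path a - b - c - d with an extra tie b - e.
toy = {
    "a": {"b"},
    "b": {"a", "c", "e"},
    "c": {"b", "d"},
    "d": {"c"},
    "e": {"b"},
}

deg = degree_centrality(toy)
clo = closeness_centrality(toy)
print(deg["b"], clo["b"])  # b has 3 ties and distances 1, 1, 1, 2 to the others
```

In this toy network node b comes out on top under both measures; in larger networks the two rankings can differ.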

Outline • Sketch of some real networks • Fat Tails • Small Worlds • Weak Ties

High school dating Peter S. Bearman, James Moody and Katherine Stovel Chains of affection: The structure of adolescent romantic and sexual networks American Journal of Sociology 110 44-91 (2004) Image drawn by Mark Newman Slide from Drago Radev

The Harlem Shake: Anatomy of a Viral Meme by Gilad Lotan (SocialFlow)

Slide from Chris Manning

Degree of nodes • Many nodes on the internet have low degree • One or two connections • A few (hubs) have very high degree • The number P(k) of nodes with degree k follows a power law: $P(k) \sim k^{-\alpha}$ • where $\alpha$ for the internet is about 2.1

What is a heavy-tailed distribution? • Unlike a normal distribution, power-law distributions have no “scale” [I’ll explain that later…] • Right skew • normal distribution (not heavy tailed) • Zipf’s or power-law distribution (heavy tailed) • High ratio of max to min • human heights vs. • city sizes Slide from Lada Adamic

Normal (Gaussian) distribution of human heights • average value close to most typical • distribution close to symmetric around average value Slide from Lada Adamic

Power-law distribution • linear scale • log-log scale • high skew (asymmetry) • straight line on a log-log plot Slide from Lada Adamic

Power laws are seemingly everywhere (note: these are cumulative distributions) [Figure panels: scientific papers 1981-1997; AOL users visiting sites ‘97; Moby Dick; bestsellers 1895-1965; AT&T customers on 1 day; California 1910-1992] Slide from Lada Adamic

Yet more power laws [Figure panels: wars (1816-1980); Moon; solar flares; richest individuals 2003; US family names 1990; US cities 2003] Slide from Lada Adamic

Power law distribution • Straight line on a log-log plot: $\ln p(x) = c - a \ln x$ • Exponentiate both sides to get that p(x), the probability of observing an item of size ‘x’, is given by $p(x) = C x^{-a}$ • $C = e^{c}$ is the normalization constant (probabilities over all x must sum to 1) • $a$ is the power law exponent Slide from Lada Adamic

What does it mean to be scale free? • A power law looks the same no matter what scale we look at it on (2 to 50 or 200 to 5000) • Only true of a power-law distribution! • $p(bx) = g(b)\,p(x)$: the shape of the distribution is unchanged except for a multiplicative constant • $p(bx) = (bx)^{-a} = b^{-a} x^{-a}$ [Figure: log(p(x)) against log(x), with the rescaling x → b*x marked] Slide from Lada Adamic
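A quick numerical check of the scale-free property: for a power law the ratio p(bx)/p(x) is the same at every x, while for other distributions it is not. The sketch below is illustrative only; the exponent 2.1, the rescaling factor b = 10, and the exponential distribution used as a contrast are assumptions chosen for the example.

```python
import math


def power_law(x, a=2.1, c=1.0):
    """Unnormalized power law p(x) = c * x^(-a)."""
    return c * x ** (-a)


def exponential(x, lam=0.1):
    """Exponential density, used here only as a non-power-law contrast."""
    return lam * math.exp(-lam * x)


def rescale_ratios(p, b, xs):
    """Ratio p(b*x) / p(x) at each x; constant in x means scale free."""
    return [p(b * x) / p(x) for x in xs]


xs = [1, 2, 5, 10, 50]
b = 10

pl_ratios = rescale_ratios(power_law, b, xs)
exp_ratios = rescale_ratios(exponential, b, xs)

print(pl_ratios)   # every entry equals b**(-2.1), up to rounding
print(exp_ratios)  # entries shrink rapidly as x grows
```

The power-law ratios all equal g(b) = b^(-a), which is the multiplicative constant from the slide; the exponential ratios depend on x, so that distribution has a characteristic scale.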

Many real world networks are power law Slide from Lada Adamic

Power laws and “fat tails”

“Fat tails” in the news

Hey, not everything is a power law • number of sightings of 591 bird species in the North American Bird survey in 2003. cumulative distribution • another example: • size of wildfires (in acres) Slide from Lada Adamic

Not every network is power law distributed • email address books • power grid • Roget’s thesaurus • company directors… Slide from Lada Adamic

Zipf’s law is a power-law • George Kingsley Zipf • How frequent is the 3rd or 8th or 100th most common word? • Intuition: small number of very frequent words (“the”, “of”) • lots and lots of rare words (“expressive”, “Jurafsky”) • Zipf's law: the frequency of the r'th most frequent word is inversely proportional to its rank: $y \sim r^{-b}$, with b close to unity.
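To make the rank-frequency relation concrete, the sketch below builds a rank-frequency table from a token list and estimates b as the negated least-squares slope of log(frequency) against log(rank). The function names and the synthetic counts are invented for illustration, and a log-log regression is only a rough estimator of a power-law exponent, so treat the output as a sanity check rather than a proper fit.

```python
import math
from collections import Counter


def rank_frequency(tokens):
    """Return [(rank, frequency), ...] with rank 1 for the most frequent token."""
    counts = Counter(tokens)
    freqs = sorted(counts.values(), reverse=True)
    return list(enumerate(freqs, start=1))


def fit_zipf_exponent(rank_freq):
    """Estimate b in frequency ~ rank^(-b) by least squares in log-log space."""
    xs = [math.log(r) for r, _ in rank_freq]
    ys = [math.log(f) for _, f in rank_freq]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx  # the slope is -b


# Tiny token list, just to show the table format.
table = rank_frequency("the cat and the dog and the bird".split())
print(table)

# Synthetic (made-up) counts that follow frequency = 1200 / rank exactly.
synthetic = [(r, 1200 / r) for r in range(1, 51)]
b_hat = fit_zipf_exponent(synthetic)
print(b_hat)  # close to 1 for data constructed this way
```

On real text the estimate depends on how many ranks are included, since the very top and the long tail often deviate from a straight line.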

Pareto’s law and power-laws • The Italian economist Vilfredo Pareto was interested in the distribution of income. • Pareto’s law is expressed in terms of the cumulative distribution (the probability that a person earns x or more): $P[X > x] \sim x^{-k}$ • Here we recognize k as just a − 1, where a is the power-law exponent Slide from Lada Adamic
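The step from the power-law exponent a to the Pareto exponent k = a − 1 follows from integrating the density. A short derivation, assuming a continuous density $p(x) = C x^{-a}$ with $a > 1$ (the continuity assumption is ours, made to keep the algebra simple):

```latex
\begin{align*}
P[X > x] &= \int_{x}^{\infty} C\, t^{-a}\, dt
          = C \left[ \frac{t^{-(a-1)}}{-(a-1)} \right]_{x}^{\infty} \\
         &= \frac{C}{a-1}\, x^{-(a-1)}
          \;\sim\; x^{-k}, \qquad k = a - 1 .
\end{align*}
```

So a cumulative distribution that is a straight line with slope −k on a log-log plot corresponds to a density with slope −(k + 1).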

80/20 rule • The fraction W of the wealth in the hands of the richest P of the population is given by $W = P^{(a-2)/(a-1)}$ • Example: US wealth: a = 2.1 • richest 20% of the population holds 86% of the wealth Slide from Lada Adamic
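The US example can be checked directly by plugging a = 2.1 and P = 0.2 into the formula, where the exponent of P is (a − 2)/(a − 1). The helper name `wealth_share` is made up for this sketch, and the guard a > 2 reflects the assumption that total wealth (the mean of the distribution) must be finite for the formula to make sense.

```python
def wealth_share(p, a):
    """Fraction of wealth held by the richest fraction p, W = p^((a-2)/(a-1)).

    Assumes a power-law exponent a > 2 and 0 < p <= 1.
    """
    if not 0 < p <= 1:
        raise ValueError("p must be a fraction in (0, 1]")
    if a <= 2:
        raise ValueError("the formula assumes a > 2")
    return p ** ((a - 2) / (a - 1))


share = wealth_share(0.20, 2.1)
print(round(share, 2))  # 0.86, matching the slide's US example
```

As a moves closer to 2 the exponent (a − 2)/(a − 1) approaches 0, so an ever smaller top fraction holds nearly all of the wealth.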

Generative processes for power-laws • Many different processes can lead to power laws • There is no one unique mechanism that explains it all Slide from Lada Adamic

One mechanism for generating a power-law distribution: Preferential attachment • First considered by [Price 65] as a model for citation networks • each new paper is generated with m citations (mean) • new papers cite previous papers with probability proportional to their in-degree (citations) • what about papers without any citations? • each paper is considered to have a “default” citation • probability of citing a paper with degree k, proportional to k+1 • Power law with exponent α = 2+1/m Slide from Lada Adamic
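A small simulation shows how the "k + 1" rule produces a few heavily cited papers and many rarely cited ones. This is a simplified sketch, not Price's original formulation: it assumes every new paper cites exactly m distinct earlier papers (rather than m on average), it seeds the process with m uncited papers, and the function name `price_model` is hypothetical.

```python
import random


def price_model(n_papers, m, seed=0):
    """Simulate in-degrees under preferential attachment with a default citation.

    Each paper appears in `urn` (in_degree + 1) times, so a uniform draw from
    the urn picks a paper with probability proportional to k + 1.
    """
    rng = random.Random(seed)
    in_degree = [0] * n_papers
    urn = list(range(m))  # seed papers: zero citations, one default entry each
    for new in range(m, n_papers):
        cited = set()
        while len(cited) < m:  # draw until m distinct earlier papers are chosen
            cited.add(rng.choice(urn))
        for old in cited:
            in_degree[old] += 1
            urn.append(old)  # one more entry per citation received
        urn.append(new)      # the new paper's default entry
    return in_degree


m = 3
degrees = price_model(2000, m)
mean_citations = sum(degrees) / len(degrees)
print(mean_citations, max(degrees))
```

The mean number of citations stays close to m while the maximum is far larger, which is the fat-tail signature; estimating the exponent 2 + 1/m reliably would need a much larger simulation and a careful fit.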

Small worlds Slide from Lada Adamic

Small world experiments [Figure: map marking MA and NE] Stanley Milgram’s experiment (1967): Given a target individual and a particular property, pass the message to a person you correspond with who is “closest” to the target. Slide from Lada Adamic

Milgram’s small world experiment • Target person worked in Boston as a stockbroker. • 296 senders from Boston and Omaha. • 20% of senders reached target. • average chain length = 6.5. • “Six degrees of separation” Slide from Lada Adamic

Duncan Watts: Networks, Dynamics and the Small-World Phenomenon Asks why we see the small world pattern and what implications it has for the dynamical properties of social systems. His contribution is to show that globally significant changes can result from locally insignificant network change. Slide from James Moody

Duncan Watts: Networks, Dynamics and the Small-World Phenomenon Watts says there are 4 conditions that make the small world phenomenon interesting: 1) The network is large - O(Billions) 2) The network is sparse - people are connected to a small fraction of the total network 3) The network is decentralized -- no single star (or small number of stars) 4) The network is highly clustered -- most friendship circles are overlapping Slide from James Moody

Duncan Watts: Networks, Dynamics and the Small-World Phenomenon Formally, we can characterize a graph through 2 statistics. • 1) The characteristic path length, L (loosely, the diameter) • The average length of the shortest paths connecting any two actors. • 2) The clustering coefficient, C • Is the average local density. That is, $C_v$ = ego-network density of vertex v, and $C = \frac{1}{n}\sum_v C_v$ • A small world graph is any graph with a relatively small L and a relatively large C. Slide from James Moody

Local clustering coefficient (Watts & Strogatz 1998) • For a vertex i • The fraction of pairs of neighbors of the node that are themselves connected • Let $n_i$ be the number of neighbors of vertex i • $C_i = \frac{\text{number of connections between } i\text{'s neighbors}}{\text{maximum number of possible connections between } i\text{'s neighbors}}$ • $C_i^{\text{directed}} = \frac{\#\,\text{directed connections between } i\text{'s neighbors}}{n_i (n_i - 1)}$ • $C_i^{\text{undirected}} = \frac{\#\,\text{undirected connections between } i\text{'s neighbors}}{n_i (n_i - 1)/2}$ Slide from Lada Adamic

Local clustering coefficient (Watts & Strogatz 1998) • Average over all n vertices • Example: $n_i = 4$, max number of connections: 4*3/2 = 6, 3 connections present, so $C_i = 3/6 = 0.5$ [Figure: vertex i and its neighbors; legend: link present, link absent] Slide from Lada Adamic
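The worked example above (4 neighbors, 3 of the 6 possible links present) can be reproduced with a short function for the undirected case. The graph below is a made-up instance consistent with those numbers, since the slide's figure is not available; nodes with fewer than two neighbors are given a coefficient of 0 by assumption.

```python
from itertools import combinations


def local_clustering(adj, i):
    """Fraction of pairs of i's neighbors that are themselves connected (undirected)."""
    nbrs = adj[i]
    n_i = len(nbrs)
    if n_i < 2:
        return 0.0  # convention assumed here: no pairs, so coefficient 0
    links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return links / (n_i * (n_i - 1) / 2)


def average_clustering(adj):
    """Clustering coefficient C: the average of C_i over all n vertices."""
    return sum(local_clustering(adj, v) for v in adj) / len(adj)


# Hypothetical graph: i has neighbors a, b, c, d; links a-b, b-c, c-d are present.
g = {
    "i": {"a", "b", "c", "d"},
    "a": {"i", "b"},
    "b": {"i", "a", "c"},
    "c": {"i", "b", "d"},
    "d": {"i", "c"},
}

c_i = local_clustering(g, "i")
c_avg = average_clustering(g)
print(c_i)  # 0.5, i.e. 3 of 6 possible links among i's neighbors
```

For the directed version the denominator would be n_i * (n_i - 1) and each direction of a link would be counted separately.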

Why does this work? • Key is the fraction of shortcuts in the network • In a highly clustered, ordered network, a single random connection will create a shortcut that lowers L dramatically • Watts demonstrates that small world graphs occur in graphs with a small number of shortcuts Slide from Lada Adamic

Watts and Strogatz model [WS98] • Start with a ring, where every node is connected to the next z nodes (a regular lattice) • With probability p, rewire every edge (or, add a shortcut) to a uniformly chosen destination. [Figure: order (p = 0), small world (0 < p < 1), randomness (p = 1)] Slide from Lada Adamic

Clustering and Path Length • Random graphs have a low clustering coefficient but a low diameter • Regular graphs have a high clustering coefficient but also a high diameter Slide from Lada Adamic
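The trade-off between clustering and path length can be explored with a small simulation of the ring-rewiring idea. This is a sketch under stated assumptions rather than the exact procedure of [WS98]: each lattice edge is rewired at its far end only, duplicate edges and self-loops are avoided, path length is averaged over reachable pairs, and the sizes (200 nodes, z = 3) and all function names are chosen for illustration.

```python
import random
from collections import deque
from itertools import combinations


def ring_lattice(n, z):
    """Ring where every node is linked to the next z nodes (degree 2z)."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for step in range(1, z + 1):
            w = (v + step) % n
            adj[v].add(w)
            adj[w].add(v)
    return adj


def watts_strogatz(n, z, p, seed=0):
    """Rewire each lattice edge (v, v+step) with probability p to a random new endpoint."""
    rng = random.Random(seed)
    adj = ring_lattice(n, z)
    for v in range(n):
        for step in range(1, z + 1):
            w = (v + step) % n
            if w in adj[v] and rng.random() < p:
                choices = [u for u in range(n) if u != v and u not in adj[v]]
                if not choices:
                    continue
                new = rng.choice(choices)
                adj[v].discard(w)
                adj[w].discard(v)
                adj[v].add(new)
                adj[new].add(v)
    return adj


def average_path_length(adj):
    """Mean shortest-path length over all reachable ordered pairs (BFS from each node)."""
    total = pairs = 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


def average_clustering(adj):
    """Average local clustering coefficient; nodes with < 2 neighbors count as 0."""
    total = 0.0
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k >= 2:
            links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
            total += links / (k * (k - 1) / 2)
    return total / len(adj)


lattice = watts_strogatz(200, 3, 0.0)
small_world = watts_strogatz(200, 3, 0.05)
random_like = watts_strogatz(200, 3, 1.0)

for name, g in [("p=0", lattice), ("p=0.05", small_world), ("p=1", random_like)]:
    print(name, round(average_clustering(g), 3), round(average_path_length(g), 2))
```

With these settings the lattice should show high clustering and long paths, the fully rewired graph low clustering and short paths, and the lightly rewired graph the small-world combination of still-high clustering with much shorter paths.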

Weak links • Mark Granovetter (1960s) studied how people find jobs. He found out that most job referrals were through personal contacts • But more by acquaintances and not close friends. • Accepted by the American Journal of Sociology after 4 years of unsuccessful attempts elsewhere. • One of the most cited papers in sociology. Slide from Drago Radev

Mark Granovetter: The strength of weak ties • Strength of ties • amount of time spent together • emotional intensity • intimacy (mutual confiding) • reciprocal services • Many strong ties are transitive • we meet our friends through other friends • if we spend a lot of time with our strong ties, they will tend to overlap • balance theory – if two of my friends do not like each other – we will all be unhappy. Triad closure is the happy solution = Homophily Slide from James Moody

Strength of weak ties • Weak ties can occur between cohesive groups • old college friend • former colleague from work • Weak ties will tend to have low transitivity Slide from James Moody

Strength of weak ties • Evidence from small world experiments • Small world experiment at Columbia: acquaintanceship ties more effective than family, close friends Slide from James Moody

Strength of weak ties – how to get a job • Granovetter: How often did you see the contact that helped you find the job prior to the job search? • 16.7% often (at least twice a week) • 55.6% occasionally (more than once a year but less than twice a week) • 27.8% rarely (once a year or less) • Weak ties will tend to have different information than we and our close contacts do • Long paths rare • 39.1% info came directly from employer • 45.3% one intermediary • 3.1% > 2 (more frequent with younger, inexperienced job seekers) • Compatible with Watts/Strogatz small world model: short average shortest paths thanks to ‘shortcuts’ that are non-transitive Slide from James Moody

Finding paths • Watts model shows how these short paths can exist • small world networks • But how do people find the paths? • People seem to be successful by making greedy local decisions • The existence of findable short paths depends on further elucidating the structure of the network Slide from Lada Adamic

Spatial search • Kleinberg, ‘The Small World Phenomenon: An Algorithmic Perspective’, Proc. 32nd ACM Symposium on Theory of Computing, 2000. (Nature 2000) • “The geographic movement of the [message] from Nebraska to Massachusetts is striking. There is a progressive closing in on the target area as each new person is added to the chain” S. Milgram, ‘The small world problem’, Psychology Today 1, 61, 1967 • Nodes are placed on a lattice and connect to nearest neighbors • Additional links are placed with $p_{uv} \sim d(u,v)^{-r}$ Slide from Lada Adamic

Increasing r favors near nodes • r=0: $d(u,v)^{-r} = 1$, uniform distribution; a link to each other node is equally likely • r=1: inverse of distance; if a node is twice as far away, it is 1/2 as likely • r=2: inverse squared; if a node is twice as far away, it is 1/4 as likely Slide from Lada Adamic

Kleinberg’s SW network is greedy routable iff r=2 • Greedy routing algorithm: using local information only, find a short path from s to t • When u is the current node, choose as the next node v the one closest to t (by lattice distance) such that (u,v) is a local or random edge. [Figure: nodes s, u, v, t] Slide from Lada Adamic

Kleinberg’s SW network is greedy routable iff r=2 • A greedy routing algorithm, using local information only, finds a short path from s to t • The number of hops is the ‘delivery time’ • This greedy routing achieves an expected ‘delivery time’ of $O(\log^2 n)$, i.e. the s-t paths have expected length $O(\log^2 n)$. [Figure: nodes s, u, v, t] Slide from Lada Adamic
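Greedy routing on such a lattice can be sketched as follows. The details are assumptions made for illustration: a square n-by-n grid without wrap-around, exactly one long-range contact per node drawn with probability proportional to d(u,v)^(-r), and hypothetical function names. Because each greedy step strictly reduces the lattice distance to t, no node is visited twice, so drawing a node's long-range contact on arrival is equivalent to fixing all contacts in advance.

```python
import random


def lattice_distance(u, v):
    """Manhattan distance between two grid points."""
    return abs(u[0] - v[0]) + abs(u[1] - v[1])


def local_neighbors(u, n):
    """The up to four nearest lattice neighbors of u inside the n x n grid."""
    x, y = u
    candidates = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
    return [(a, b) for a, b in candidates if 0 <= a < n and 0 <= b < n]


def long_range_contact(u, n, r, rng):
    """Draw one long-range contact v with probability proportional to d(u,v)^(-r)."""
    others = [(x, y) for x in range(n) for y in range(n) if (x, y) != u]
    weights = [lattice_distance(u, v) ** (-r) for v in others]
    return rng.choices(others, weights=weights, k=1)[0]


def greedy_route(s, t, n, r, rng):
    """Number of hops when each node forwards to its contact closest to t."""
    u, hops = s, 0
    while u != t:
        options = local_neighbors(u, n) + [long_range_contact(u, n, r, rng)]
        u = min(options, key=lambda v: lattice_distance(v, t))
        hops += 1
    return hops


rng = random.Random(0)
n = 20
s, t = (0, 0), (n - 1, n - 1)

for r in (0, 2, 4):
    trials = [greedy_route(s, t, n, r, rng) for _ in range(50)]
    print(r, sum(trials) / len(trials))  # mean delivery time for this r
```

A 20-by-20 grid is far too small to exhibit the asymptotic $O(\log^2 n)$ behavior at r = 2, so the printed averages are only a way to experiment with the algorithm, not evidence for the theorem.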
