Internet Economics

Internet Economics, Class 10: It’s a small world

Outline • Reminder from last week • Milgram’s experiment • Small-world phenomena • A random model of social networks • Some statistics, if we have time

Outline • Last week: we modeled a social network as a graph and added some natural assumptions and definitions. Motivation: searching for information in the network. • Today: a more specific model of social networks, with an emphasis on geographic location. Motivation: the spread of information in social networks (very close to search).

Modeling Social Networks • What is a social network? A graph. • Nodes: the participants. • Edges: relations such as “friendship” or “know each other”. An edge means “A and B are friends”. [Figure: example network with nodes Sami, Charlie, Shimon, Tony, Moni, Seffi, Al, David]

Triadic Closure • “If A and B have a friend in common, there is an increased likelihood that they will become friends in the future.” • This creates a “triangle”. [Figure: triangle on nodes A, B, C]
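The following is a minimal Python sketch of triadic closure. It assumes the network is stored as a dictionary that maps each person to the set of their friends. That representation is an illustrative choice, not one fixed by the slides. The sketch lists the pairs that have a friend in common but are not yet friends, which are the pairs triadic closure predicts are more likely to become friends.

```python
from itertools import combinations

def closure_candidates(adj):
    """Return pairs (a, b) that share at least one friend but are not friends yet.

    adj: dict mapping each person to the set of their friends (undirected).
    """
    candidates = set()
    for person, friends in adj.items():
        # Any two friends of `person` have `person` as a common friend.
        for a, b in combinations(sorted(friends), 2):
            if b not in adj[a]:
                candidates.add((a, b))
    return candidates

# Hypothetical example: B is friends with both A and C, but A and C are not friends.
adj = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
print(closure_candidates(adj))  # {('A', 'C')}
```

When the edge between A and C is added, the triangle is closed and no candidates remain.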

Strong/weak ties • Remember the job-search example. • We need to distinguish between strengths of friendship. In our model there are two types of friends: • Strong ties: “friends”. • Weak ties: “acquaintances”. [Figure: network on nodes A to H; solid lines are strong ties, dashed lines are weak ties]


Graphs: connectivity • How is information transmitted in social networks? • A new job, a new electronic gadget, rumors, etc. • Well, first the network needs to be connected… [Figure: graph on nodes A to H]

Graphs: connectivity [Figure: graph on nodes A to H] • This graph is connected: there is a path between every pair of nodes.

Graphs: connected components [Figure: graph on nodes A to H] • This graph is not connected. • It has three connected components.
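Connected components can be found with breadth-first search (BFS). The sketch below shows one standard way to do it and uses the same dictionary-of-sets representation as before. The example graph is hypothetical and is not the graph drawn on the slide.

```python
from collections import deque

def build_graph(nodes, edges):
    """Build an undirected adjacency dict from a node list and an edge list."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def connected_components(adj):
    """Return a list of connected components, each as a set of nodes."""
    seen = set()
    components = []
    for start in adj:
        if start in seen:
            continue
        # BFS from an unvisited node collects exactly that node's component.
        component = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    component.add(v)
                    queue.append(v)
        components.append(component)
    return components

# Hypothetical graph on A..H with three components.
adj = build_graph("ABCDEFGH",
                  [("A", "B"), ("B", "C"), ("D", "E"), ("E", "F"), ("G", "H")])
print(len(connected_components(adj)))  # 3
```

A graph is connected exactly when this function returns a single component.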

Connectivity in social networks • Are social networks connected? • Probably not… • Even one isolated person is enough to disconnect the network • (e.g., someone on “an isolated tropical island”). • But real social networks do show high connectivity: • they usually have a giant component, • and usually only one… • Examples:

Giant component (1) • Nodes: high school students (male and female) • Edges: “were romantically involved within the past 18 months” (think about sexually transmitted diseases, which were actually the focus of this research)

Giant component (2) • Collaboration among researchers at 9 institutes (the biological research center Structural Genomics of Pathogenic Protozoa) • 3 connected components, one of which is giant.

Distances in social networks • Distance between two nodes in a social network: the minimum number of edges on a path connecting them. • Example: distances from H: distance 1 (“friends”), distance 2 (“friends of friends”), distance 3 (“friends of friends of friends”). [Figure: graph on nodes A to H, grouped by distance from H]
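Distances from a single node can be computed by BFS. BFS visits friends first, then friends of friends, and so on. This is a minimal sketch that groups nodes into these layers. The edge list is a hypothetical example, not the slide's figure.

```python
from collections import deque

def distances_from(adj, source):
    """BFS: minimum number of edges from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def layers(dist):
    """Group nodes by distance: 1 = friends, 2 = friends of friends, ..."""
    out = {}
    for node, d in dist.items():
        out.setdefault(d, set()).add(node)
    return out

# Hypothetical undirected graph (not the one in the figure).
edges = [("H", "A"), ("H", "D"), ("A", "B"), ("D", "E"), ("B", "C")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)
print(layers(distances_from(adj, "H")))
```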

Distances • We saw that most of the nodes in social networks lie in a giant connected component. • But what about the distance between two given nodes? Can it be large? • Answer: in practice, no. • The “small world phenomenon”: not only do you have paths of friends connecting you to a large fraction of the world’s population, but these paths are surprisingly short.

Small world • What led to this observation?

Experiment (Milgram, 1960s) • 300 randomly chosen “starters” were asked to participate. • Each tried to forward a letter to a target person (given the target’s name and address). • Restriction: forward the letter only to a person you know on a first-name basis.

Experiment (Milgram, 1960s) • Results: 64 letters reached the target. • Median path length: 6 (!)

Six Degrees of Separation • A play with this title by John Guare appeared in 1990. • About 20 years after Milgram’s experiment. • Adapted into a movie in 1993. • “I read somewhere that everybody on this planet is separated by only six other people. Six degrees of separation between us and everyone else on this planet.”

A more recent experiment • Social network: users of Microsoft Instant Messenger. • Edge: the two users communicated at least once during the last month. • 240 million active user accounts. • Results: • A giant component containing almost all the nodes. • Average distance between nodes: 6.6 • Median distance between nodes: 7

A more recent experiment • Plot created by sampling 1000 users

Experiment conclusion • The instant-messenger experiment confirmed Milgram’s observations for a gigantic social network • (well, for technology-oriented participants). • More examples: • Kevin Bacon: • The average Bacon number of actors on IMDB is 2.9. • Finding an actor with a Bacon number > 5 is very hard. • 8 is the maximum known. • Erdős number: • Most mathematicians have an Erdős number of at most 5.

Distances • So it turns out that distances in social networks are short. • The more interesting part of Milgram’s experiment: how do people find the short paths? • People decide which friend to forward the message to without observing the whole network. • Short paths make search easy, but only when flooding (forwarding to all friends) is allowed. (In the experiment, each agent forwarded the letter to a single friend.)

Possible reason: exponential expansion • Is this surprising? • Let’s count: • I have 100 friends. • Each friend has 100 friends. • Each friend of a friend has 100 friends. • Overall, in only 3 hops I can reach $100^3$ = one million people! • What’s wrong with this argument? • Triadic closure! Many of my friends are friends of each other, so the same people are counted many times. • This makes the small world more surprising: the network is clustered, with no obvious short paths. [Figure: triangle on nodes A, B, C]
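The counting argument can be written as a few lines of arithmetic. This sketch compares the naive count with a clustered variant. The parameter `new_friends` is a hypothetical knob for how many *new* people each person contributes. Triadic closure pushes it below the full 100 friends.

```python
def naive_reach(friends, hops):
    """People exactly `hops` steps away if no friend-of-friend is ever repeated."""
    return friends ** hops

def clustered_reach(friends, new_friends, hops):
    """Same count when every person beyond the first hop contributes only
    `new_friends` people not reached before (hypothetical parameter)."""
    if hops == 0:
        return 1
    return friends * new_friends ** (hops - 1)

print(naive_reach(100, 3))          # 1000000
print(clustered_reach(100, 10, 3))  # 10000
```

Even a modest amount of overlap reduces the three-hop count by two orders of magnitude. That is why the small world is not explained by this count alone.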

A probabilistic model of networks • The following model (Watts & Strogatz, 1998) is based on two properties: • Homophily • Weak ties

Probabilistic model of networks • Agents placed on a “grid” • Two parameters: • Structural: r, strong ties with all neighbors within radius r (homophily) • Random: k, each agent has k randomly selected friends (weak ties) [Figure: grid with a radius-r neighborhood] • Search using weak ties will hardly involve triadic closure (it will be close to the exponential expansion model).

Probabilistic model of networks • A “grid” • It can be shown that short paths exist even with very little randomness (k = 1, that is, each agent has a single random weak tie).
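The claim that a little randomness already shortens paths can be checked numerically on a small grid. This is a minimal simulation sketch under several assumptions:

- The network is an n × n grid.
- Strong ties go to all nodes within lattice (Manhattan) distance r.
- Each agent initiates k weak ties to uniformly random partners.

The grid size, the seed and the function names are illustrative choices. A small grid only hints at the asymptotic result.

```python
import random
from collections import deque

def grid_network(n, r, k, seed=0):
    """Illustrative n x n grid model.

    Strong ties: every node within lattice (Manhattan) distance r.
    Weak ties: each node initiates k ties to uniformly random partners
    (ties are undirected, so a node may end up with more than k).
    """
    rng = random.Random(seed)
    nodes = [(i, j) for i in range(n) for j in range(n)]
    adj = {u: set() for u in nodes}
    for (i, j) in nodes:
        for di in range(-r, r + 1):
            for dj in range(-r, r + 1):
                v = (i + di, j + dj)
                if 0 < abs(di) + abs(dj) <= r and v in adj:
                    adj[(i, j)].add(v)
    for u in nodes:
        for _ in range(k):
            v = rng.choice(nodes)
            if v != u:  # skip self-loops
                adj[u].add(v)
                adj[v].add(u)
    return adj

def average_distance_from(adj, source):
    """Mean BFS distance from `source` to every other reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return sum(dist.values()) / (len(dist) - 1)

for k in (0, 1):
    adj = grid_network(n=20, r=1, k=k, seed=0)
    print(f"k={k}: average distance from a corner = "
          f"{average_distance_from(adj, (0, 0)):.2f}")
```

Without weak ties, the average distance from a corner equals the average lattice distance, about 19 on a 20 × 20 grid. With a single weak tie per agent it drops considerably.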

The grid model • Can the grid model we have just seen explain Milgram’s small world phenomenon? • Problem: the choice of weak ties seems to be “too random”.

The grid model (Taken from Milgram’s original paper)

Searching over the grid • An agent has a message to deliver to a target person. The agent: • can only forward the message to his friends; • knows the location of the target on the grid; • knows only his own friends: neighbors (strong ties) and random edges (weak ties). • Important: the agent does not know the random edges of others! • A reasonable strategy: forward the message to the friend who is closest to the target. • Problem: even when short paths exist, delivery with this strategy might take a long time.
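The greedy strategy can be sketched as decentralized routing. Each holder looks only at its own friends and forwards to the one closest to the target. This sketch makes some assumptions:

- The grid uses r = 1 (the four lattice neighbors).
- Each agent initiates k uniform random weak ties.
- Closeness is Manhattan distance.

The function names are hypothetical. A lattice neighbor is always one step closer, so every hop makes progress and the search terminates.

```python
import random

def manhattan(a, b):
    """Lattice (Manhattan) distance between two grid points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def grid_with_random_ties(n, k, seed=0):
    """n x n grid: strong ties to the 4 lattice neighbors (r = 1), plus
    k uniformly random weak ties initiated by each node (undirected)."""
    rng = random.Random(seed)
    nodes = [(i, j) for i in range(n) for j in range(n)]
    adj = {u: set() for u in nodes}
    for (i, j) in nodes:
        for v in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
            if v in adj:
                adj[(i, j)].add(v)
    for u in nodes:
        for _ in range(k):
            v = rng.choice(nodes)
            if v != u:
                adj[u].add(v)
                adj[v].add(u)
    return adj

def greedy_route(adj, source, target):
    """Forward to the friend closest to the target, using only local knowledge."""
    path = [source]
    current = source
    while current != target:
        current = min(adj[current], key=lambda v: manhattan(v, target))
        path.append(current)
    return path

adj = grid_with_random_ties(n=30, k=1, seed=1)
path = greedy_route(adj, (0, 0), (29, 29))
print(len(path) - 1, "steps; lattice distance is", manhattan((0, 0), (29, 29)))
```

Greedy routing never takes more steps than the lattice distance. The slide's point is that with uniformly random weak ties it typically does not get dramatically shorter either.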

A modified model: inverse square • Two parameters: • Structural: r, strong ties with all neighbors within radius r (homophily) • Random: k, each agent has k randomly selected friends (weak ties) [Figure: grid with a radius-r neighborhood] • Now the chance of a random edge decays with geographic distance: if d is the distance between A and B, they have a weak tie with probability proportional to $1/d^2$.
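The inverse-square rule can be sampled by weighting every candidate partner by $d^{-q}$. Here q = 0 gives uniform ties and q = 2 gives the inverse-square model. This sketch assumes Manhattan distance on a 41 × 41 grid and uses Python's `random.choices` with weights. It draws many independent samples from the center only to show how the distribution shifts, not to build a full network.

```python
import random

def manhattan(a, b):
    """Lattice (Manhattan) distance between two grid points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def sample_weak_ties(u, nodes, q, rng, count):
    """Draw `count` weak-tie partners for u (with replacement), each chosen
    with probability proportional to d(u, v) ** (-q)."""
    others = [v for v in nodes if v != u]
    weights = [manhattan(u, v) ** -q for v in others]
    return rng.choices(others, weights=weights, k=count)

def share_within(u, ties, radius):
    """Fraction of the sampled ties that land within `radius` of u."""
    return sum(manhattan(u, v) <= radius for v in ties) / len(ties)

nodes = [(i, j) for i in range(41) for j in range(41)]
center = (20, 20)
rng = random.Random(0)
shares = {}
for q in (0, 2):
    ties = sample_weak_ties(center, nodes, q, rng, count=2000)
    shares[q] = share_within(center, ties, radius=4)
    print(f"q={q}: share of weak ties within distance 4 = {shares[q]:.2f}")
```

With uniform ties only a few percent land nearby. Under the inverse-square rule roughly half do, while the rest are still spread over all distances.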

Intuition: inverse squares [Figure: zones at distances d, 2d, 4d, 8d]

Intuition: inverse squares [Figure: zones at distances d, 2d, 4d, 8d] • Farther zones contain more people. • With uniformly random weak ties: many ties go to the far zones, and far fewer to closer areas.

Intuition: inverse squares • Think about how many people you know: • on your street • in your neighborhood • in your city • in Israel • in the world • Agents can forward a message to each tier, which makes search more efficient. • With inverse squares: • How many people are at a distance between $x$ and $2x$? About $x^2$ (up to a constant factor). • The probability of a link to each person there is about $1/x^2$, so the number of ties in each tier is $x^2 \cdot 1/x^2$ = constant. • The same number of friends in each tier. [Figure: zones at distances d, 2d, 4d, 8d]
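The "same number of friends in each tier" calculation can be checked directly. This sketch assumes an infinite grid with Manhattan distance, where exactly $4d$ nodes sit at distance $d \ge 1$. It sums the unnormalized link weights $d^{-q}$ over the tier $[x, 2x)$. With q = 2 the total stays roughly constant, close to $4 \ln 2 \approx 2.77$. With q = 0 it grows like $x^2$.

```python
def tier_weight(x, q):
    """Unnormalized expected number of weak ties at lattice distance in [x, 2x).

    Assumes an infinite grid, where exactly 4*d nodes sit at Manhattan
    distance d >= 1, and a tie to a node at distance d has weight d ** (-q).
    """
    return sum(4 * d * d ** -q for d in range(x, 2 * x))

for x in (1, 2, 4, 8, 16, 32, 64):
    print(x, round(tier_weight(x, 2), 3), tier_weight(x, 0))
```

For q = 0 the sum has the closed form $2x(3x-1)$, which is quadratic in $x$. This matches the "about $x^2$ people" estimate on the slide up to a constant.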


The grid model • Next: does the grid model fit real data? • Let’s consider LiveJournal, a blogging website with about 500,000 users. • For each user we have a US ZIP code and a list of friends. • Problem: non-uniform population density. [Figure: LiveJournal user population]

Distance by rank • Say w is at rank d from v if exactly d-1 nodes are geographically closer to v than w is. • Then the “rank” of w (with respect to v) is d.

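Distance by rank • “Distance by rank” is a generalization of the distance we saw before. • With a uniform population: about $d^2$ people live within distance $d$ of v, so if w is at distance d, then $\text{rank}(w) \approx d^2$ (up to a constant factor) and $1/d^2 \approx 1/\text{rank}(w)$. • Therefore, in the rank model, w will be a friend of v with probability proportional to $1/\text{rank}(w)$.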
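Rank can be computed by counting how many other nodes are strictly closer to v than w. This minimal sketch uses Euclidean coordinates and Python's `math.dist`. It assumes that v itself is not counted, so the nearest neighbor has rank 1, and that ties in distance are not counted as closer. The node names and positions are hypothetical. The link weights are unnormalized and proportional to $1/\text{rank}$.

```python
import math

def rank(v, w, positions):
    """1 + number of nodes (other than v and w) strictly closer to v than w is.

    positions: dict mapping node -> (x, y) coordinates.
    """
    d_vw = math.dist(positions[v], positions[w])
    closer = sum(
        1 for u, p in positions.items()
        if u not in (v, w) and math.dist(positions[v], p) < d_vw
    )
    return closer + 1

def link_weights(v, positions):
    """Unnormalized friendship weights proportional to 1 / rank (illustrative)."""
    return {w: 1 / rank(v, w, positions) for w in positions if w != v}

# Hypothetical positions: distances from v are 1, 2, 5 and 10.
positions = {"v": (0, 0), "a": (1, 0), "b": (0, 2), "c": (3, 4), "d": (10, 0)}
print({w: rank("v", w, positions) for w in "abcd"})  # {'a': 1, 'b': 2, 'c': 3, 'd': 4}
```

Because rank depends only on how many people are closer, a person 10 km away in a dense city can have a much higher rank than a person 10 km away in a sparse rural area. This is how the rank model handles non-uniform population density.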

Distance by rank • It can be shown (Liben-Nowell et al.) that if each node links to each other node with probability inversely proportional to the number of closer nodes, then the network can be searched efficiently. • What about the LiveJournal network?

Distance by rank [Figure: probability of friendship vs. geographic rank] • In the data, the probability of friendship is between … and … • Pretty close to the theoretical prediction of $1/\text{rank}$.

Decentralized search: conclusion • We looked at a research process: • Start with a simple experiment. • Make interesting observations and conjectures. • Build a mathematical model (based on the experiment). • Make a prediction (based on the mathematical model). • Validate the prediction on real data. • In this case, a prediction from a highly simplified model still explains real data.

Some statistics for dessert…

Taxi in New York • Suppose: • 80% of the taxis in NY are black • 20% are yellow • An eyewitness to a hit-and-run accident reported that the fleeing taxi was yellow. • But we know that eyewitnesses report the true color 80% of the time… • What is the probability that the fleeing taxi is yellow, given a yellow report?

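Taxi in New York • We want to calculate: $\Pr[\text{true} = Y \mid \text{rep} = Y]$ • Bayes’ rule: $\Pr[\text{true} = Y \mid \text{rep} = Y] = \frac{\Pr[\text{rep} = Y \mid \text{true} = Y]\,\Pr[\text{true} = Y]}{\Pr[\text{rep} = Y]}$ • $\Pr[\text{true} = Y] = 0.2$ • $\Pr[\text{rep} = Y \mid \text{true} = Y] = 0.8$ • $\Pr[\text{rep} = Y] = 0.8 \cdot 0.2 + 0.2 \cdot 0.8 = 0.32$ (a yellow taxi reported correctly, plus a black taxi misreported as yellow) • So $\Pr[\text{true} = Y \mid \text{rep} = Y] = \frac{0.8 \cdot 0.2}{0.32} = 0.5$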
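The same computation can be done with exact fractions. This sketch implements Bayes' rule for a yes/no hypothesis. The helper name `posterior` is illustrative. The inputs are the slide's numbers: a prior of 0.2, a hit rate of 0.8 and a false-report rate of 0.2.

```python
from fractions import Fraction

def posterior(prior, p_report_if_true, p_report_if_false):
    """Bayes' rule: Pr[H | report] for a binary hypothesis H."""
    # Total probability of the report (the denominator in Bayes' rule).
    evidence = p_report_if_true * prior + p_report_if_false * (1 - prior)
    return p_report_if_true * prior / evidence

p = posterior(Fraction(1, 5), Fraction(4, 5), Fraction(1, 5))
print(p, float(p))  # 1/2 0.5
```

The report moves the belief from 20% up to 50%, but no further. The low prior of yellow taxis offsets the witness's 80% accuracy.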

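Bayes’ rule • Conditional probability: $\Pr[A \mid B] = \frac{\Pr[A \text{ and } B]}{\Pr[B]}$ [Figure: Venn diagram of events A and B, with the overlap “A and B”] • Bayes’ rule follows from $\Pr[A \text{ and } B] = \Pr[A \mid B]\,\Pr[B] = \Pr[B \mid A]\,\Pr[A]$, which gives $\Pr[A \mid B] = \frac{\Pr[B \mid A]\,\Pr[A]}{\Pr[B]}$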

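Bayes’ rule • Central in probabilistic decision making: it describes the way people update their beliefs. • In $\Pr[A \mid B] = \frac{\Pr[B \mid A]\,\Pr[A]}{\Pr[B]}$, $\Pr[A]$ is the prior probability (before observing B) and $\Pr[A \mid B]$ is the posterior probability (after observing B).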

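Balloon game • In our balloon game: 2 red balloons and 1 blue balloon: red bag; 1 red and 2 blue: blue bag. • The bag is red with probability 50% and blue with probability 50%: $\Pr[\text{bag} = \text{blue}] = 0.5$ • $\Pr[\text{bag} = \text{blue} \mid \text{obs} = \text{blue}] = \frac{\Pr[\text{obs} = \text{blue} \mid \text{bag} = \text{blue}]\,\Pr[\text{bag} = \text{blue}]}{\Pr[\text{obs} = \text{blue}]} = \frac{\frac{2}{3} \cdot \frac{1}{2}}{\frac{2}{3} \cdot \frac{1}{2} + \frac{1}{3} \cdot \frac{1}{2}} = \frac{2}{3}$ • Guessing “blue” after observing “blue” was correct!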
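The balloon posterior can also be found by counting equally likely outcomes instead of applying the formula. This sketch assumes the game picks a bag 50/50 and then one balloon uniformly from it. Under that assumption each of the six (bag, balloon) outcomes has probability 1/6. The names `BAGS` and `posterior_bag` are illustrative.

```python
from fractions import Fraction

# Bag contents as described on the slide.
BAGS = {"red": ["red", "red", "blue"], "blue": ["red", "blue", "blue"]}

def posterior_bag(observed):
    """Pr[bag | observed balloon color], by counting equally likely outcomes.

    Assumes each bag is chosen with probability 1/2 and one balloon is drawn
    uniformly, so all 6 (bag, balloon) pairs are equally likely.
    """
    outcomes = [(bag, balloon) for bag, balloons in BAGS.items() for balloon in balloons]
    matching = [bag for bag, balloon in outcomes if balloon == observed]
    return {bag: Fraction(matching.count(bag), len(matching)) for bag in BAGS}

print(posterior_bag("blue"))  # {'red': Fraction(1, 3), 'blue': Fraction(2, 3)}
```

Conditioning on "blue" keeps 3 of the 6 outcomes, and 2 of those 3 come from the blue bag. This agrees with the $2/3$ from Bayes' rule.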
