CSE 522 – Algorithmic and Economic Aspects of the Internet Instructors: Nicole Immorlica Mohammad Mahdian
Topics covered in the course • Structure and modeling of social networks: power law graphs; small world phenomenon; high clustering coefficient; probabilistic and game theoretic models • Algorithms for link analysis: crawling the web; HITS; PageRank; webspam; rank aggregation; spectral clustering • Economic aspects of the Internet: peering relations; alternative mechanisms for routing; P2P networks • Topics motivated by e-commerce: reputation mechanisms; recommendation systems; ad auctions
Logistics • Course web page: http://www.cs.washington.edu/education/courses/cse522/05au/ • Course work: • reading papers (1/week on avg) • possibly a few problem sets • How to contact us: {nickle,mahdian}@microsoft.com
Social Networks • A social network is a graph that represents relationships between independent entities. • Graph of friendships (or in the virtual world, networks like orkut) • Web of sexual contact • Graph of scientific collaborations • Cross-posts in newsgroups • Web graph (links between webpages) • Internet: Inter/Intra-domain graph
Scientific Collaboration Network • 400,000 nodes, authors in Mathematical Reviews database • An edge between two authors if they have a joint paper • Just 676,000 edges Picture from orgnet.com
Scientific Collaboration Network • Average degree 3.36 • A few high-degree nodes: • Paul Erdös, 509 • Frank Harary, 268 • Yuri Alekseevich Mitropolskii, 244 • Many low-degree nodes (100,000 of degree 1) Picture from orgnet.com
Scientific Collaboration Network • Short paths • Maximum Erdös number is 13 • Any two connected authors are joined by a path of length at most 23 • Average distance between two authors is 7.64 • e.g.: John Nash → Shapley → Fulkerson → Hoffman → Paul Erdös • Many triangles … Picture from orgnet.com
9/11 Terrorist Network Picture from orgnet.com
Newsgroup Cross-Post Graph • Nodes are newsgroups, essentially archived email lists • Edges are cross-posts, i.e. there is an edge between two newsgroups to which an identical email is posted alt.microsoft.sucks alt.linux.sucks
Internet Graphs • Inter-domain graphs • Nodes are autonomous systems or domains • Edges are inter-domain connections SPRINT AOL
Inter-domain graph Picture from caida.org
Internet Graphs • Intra-domain graphs • Nodes are routers • Edges are links between routers 199.45.130.13 199.45.143.14
Colored by AS number Picture from lumeta.com
World Wide Web • Nodes are webpages • Arcs (i.e., directed edges) are hyperlinks http://research.microsoft.com/~mahdian http://theory.csail.mit.edu
Web graph, Chicago Tribune Page Picture generated by Nicheworks
Why Study These Networks • Understand the creation of these networks • Understand viral epidemics • Help design crawling strategies for the web • Analyze behavior of algorithms (web/internet) • Predict evolution of the network and emergence of new phenomena
In this lecture • Common properties of social networks • Power law degree distribution • Small world phenomenon • High clustering coefficient • Structure of the web graph
Power Laws • Two quantities x and y are related by a power law if y is proportional to $x^{-c}$ for a constant c: $y = \alpha x^{-c}$ • If x and y are related by a power law, then the graph of log(y) versus log(x) is a straight line: $\log y = -c \log x + \log \alpha$ • The slope of the log-log plot is -c, where c is the power-law exponent
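The straight-line property can be checked numerically. The following is a minimal sketch, assuming exact synthetic data with a hypothetical proportionality constant of 3 and exponent c = 2; the helper name `fit_loglog` is illustrative and not part of the lecture.

```python
import math

def fit_loglog(xs, ys):
    """Least-squares line through the points (log x, log y).

    Returns (slope, intercept) of log y = slope * log x + intercept.
    """
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((a - mean_x) ** 2 for a in lx)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical data: y = 3 * x^(-2), sampled at a few points.
constant, c = 3.0, 2.0
xs = [1, 2, 4, 8, 16, 32]
ys = [constant * x ** (-c) for x in xs]

slope, intercept = fit_loglog(xs, ys)
# The slope estimates -c and exp(intercept) estimates the constant.
print(slope, math.exp(intercept))
```

On noisy real data the fitted slope is only an estimate, and a plain least-squares fit on the log-log plot is one of several possible estimators.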
Power Law Distributions • A random variable X has a power law distribution if Pr[X = k] is proportional to $k^{-c}$ for a constant c • The cumulative distribution, Pr[X > k], of a power law distribution is proportional to $k^{-c+1}$ (for c > 1); this form is called the Pareto law • Similar to a power law, the Zipf law relates the rank r of X to its size: the size of the r'th largest instance of X is proportional to $r^{-c'}$
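A short derivation sketch shows where the cumulative exponent comes from and how a Zipf exponent would relate to c. It assumes c > 1, approximates the sum by an integral, and treats the r'th largest of n samples as the value exceeded by about r of them; the symbols $x_r$ and n are introduced here for illustration only.

```latex
\Pr[X > k] \;\propto\; \sum_{j > k} j^{-c}
  \;\approx\; \int_{k}^{\infty} x^{-c}\,dx
  \;=\; \frac{k^{-c+1}}{c-1}
  \;\propto\; k^{-c+1}

% Zipf: the r'th largest of n samples, x_r, satisfies n \Pr[X > x_r] \approx r
n\,x_r^{-c+1} \;\propto\; r
  \quad\Longrightarrow\quad
  x_r \;\propto\; r^{-1/(c-1)},
  \qquad c' = \frac{1}{c-1}
```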
Example: City Populations • New York 7,322,564 • Los Angeles 3,485,398 • Chicago 2,783,726 • Houston 1,630,553 • Philadelphia 1,585,577 • San Diego 1,110,549 • Detroit 1,027,974 • Dallas 1,006,877 • Phoenix 983,403 • San Antonio 935,933
Example: City Populations • New York 7,322,564 • Los Angeles 3,485,398 • Chicago 2,783,726 • Seattle 516,259 • Spokane, WA 177,196 • Tacoma, WA 176,664 • Little Rock, AR 175,795 • Bakersfield, CA 174,820 • Fremont, CA 173,339 • Fort Wayne, IN 173,072 • Arlington, VA 170,936
Example: City Populations • Power law exponent: c = 0.74
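As a rough illustration of how such an exponent can be estimated, the sketch below fits a line to log(population) versus log(rank) for the ten largest cities listed above. It assumes the exponent on the slide refers to a rank-versus-size (Zipf-style) fit; the slides do not say which cities entered that fit, so a fit on only ten cities should not be expected to reproduce c = 0.74.

```python
import math

# Ten largest city populations as listed in the slides, in rank order.
populations = [7322564, 3485398, 2783726, 1630553, 1585577,
               1110549, 1027974, 1006877, 983403, 935933]

log_rank = [math.log(r) for r in range(1, len(populations) + 1)]
log_size = [math.log(p) for p in populations]

n = len(populations)
mean_r = sum(log_rank) / n
mean_s = sum(log_size) / n

# Least-squares slope of log(size) against log(rank).
sxx = sum((a - mean_r) ** 2 for a in log_rank)
sxy = sum((a - mean_r) * (b - mean_s) for a, b in zip(log_rank, log_size))
slope = sxy / sxx

# Size proportional to rank^(-c') means the slope estimates -c'.
zipf_exponent = -slope
print(round(zipf_exponent, 2))
```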
Power Laws in Networks • The degree distribution often satisfies a power law: the fraction $f_d$ of nodes of degree d is proportional to $d^{-c}$
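The fraction of nodes of each degree can be computed directly from an edge list. This is a minimal sketch on a hypothetical five-node toy graph; the function name `degree_fractions` is an illustration, and nodes that appear in no edge (degree 0) are not counted.

```python
from collections import Counter

def degree_fractions(edges):
    """Return {d: fraction of nodes with degree d} for an undirected edge list."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    n = len(degree)                    # nodes that appear in at least one edge
    counts = Counter(degree.values())  # how many nodes have each degree
    return {d: counts[d] / n for d in sorted(counts)}

# Hypothetical toy graph: a star centered at "a" plus one extra edge b-c.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("a", "e"), ("b", "c")]
fractions = degree_fractions(edges)
print(fractions)
```

Plotting log(fraction) against log(d) for a large graph and reading off the slope gives a rough estimate of the exponent c.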
Example: Collaboration Graph • Power law exp: c = 2.97 • With exponential decay factor, c = 2.46
Example: Cross-Post Graph • Power law exponent: c = 1.3
Example: Inter-Domain Internet • Power law exponent: 2.15 < c < 2.2
Example: Intra-Domain Internet • Power law exponent: c = 2.48
Example: Web Graph In-Degree • Power law exponent: c = 2.09
Example: Web Graph Out-Degree • Power law exponent: c = 2.72
Small World Phenomenon Six degrees of separation: “Everybody on this planet is separated by only six other people. Six degrees of separation between us and everyone else on this planet. The President of the United States, a gondolier in Venice, just fill in the names.”
Small World Phenomenon • Milgram’s famous experiment (1960s): • Choose a random person in Nebraska, Bob • Ask Bob to deliver a letter to a random person in Massachusetts, Lashawn • Tell Bob target’s name, address, and occupation • Instruct Bob to only send letter to people he knows on a first-name basis
Small World Phenomenon • Six Degrees of Separation • Bob, a farmer in Nebraska → David, mayor of Bob’s town → Bernard, David’s cousin, who went to college with → Maya, who grew up in Boston with → Lashawn
Small World Phenomenon in Graphs • The diameter of a graph is the maximum distance (number of edges on a shortest path) between any pair of nodes • The average distance of a graph is the distance averaged over all pairs of nodes • The average connected distance of a graph is the distance averaged over all pairs of connected nodes
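These quantities can be computed with breadth-first search from every node. The sketch below is a minimal illustration on a hypothetical six-node graph with two components; the function names are assumptions. Because unreachable pairs have no finite distance, it reports the largest finite distance and the average over connected pairs only.

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest-path distance (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def distance_stats(adj):
    """Return (largest finite distance, average distance over connected pairs)."""
    longest, total, pairs = 0, 0, 0
    for s in adj:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                longest = max(longest, d)
                total += d
                pairs += 1
    return longest, total / pairs

# Hypothetical graph: a path a-b-c-d and a separate edge e-f.
adj = {
    "a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"],
    "e": ["f"], "f": ["e"],
}
longest, avg_connected = distance_stats(adj)
print(longest, avg_connected)
```

Running one BFS per node costs on the order of n times the number of edges, so for graphs the size of the web, distances are typically estimated from a sample of start nodes rather than computed exactly.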
Small World Phenomenon in Graphs • A graph exhibits the small world phenomenon if it has low diameter or low average (connected) distance • Typically, the average distance of a small world graph is on the order of log n (where n is the number of nodes)
Examples • Collaboration graph • 401,000 nodes, 676,000 edges (average degree 3.37) • Diameter: 23, Average distance: 7.64 • Cross-post graph, giant component • 30,000 nodes, 800,000 edges (average degree 53.3) • Diameter: 13, Average distance: 3.8 • Web graph • 200 million nodes, 1.5 billion edges (average degree 15) • Average connected distance: 16 • Inter-domain Internet • 3500 nodes, 6500 edges (average degree 3.71) • 95% of pairs of nodes within distance 5
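The average degrees quoted above follow from the node and edge counts as 2m/n, counting each edge once at each endpoint. The sketch below recomputes them and prints log n next to each for comparison with the reported distances; the natural logarithm is an arbitrary choice of base here, since "on the order of log n" does not fix one.

```python
import math

# (nodes n, edges m) as quoted in the slides.
examples = {
    "collaboration": (401_000, 676_000),
    "cross-post": (30_000, 800_000),
    "web": (200_000_000, 1_500_000_000),
    "inter-domain": (3_500, 6_500),
}

# Each edge contributes to the degree of two nodes, so average degree = 2m / n.
average_degree = {name: 2 * m / n for name, (n, m) in examples.items()}

for name, (n, m) in examples.items():
    print(name, round(average_degree[name], 2), round(math.log(n), 1))
```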
High Clustering Coefficient • The clustering coefficient of a graph is the fraction of connected triples of nodes that form triangles • Intuitively, the clustering coefficient reflects the probability that your friends are themselves friends • We expect social networks to have a high clustering coefficient
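One way to make this definition concrete is to enumerate every connected triple (two neighbors of a common center node) and count how many are closed by an edge between the two neighbors. The sketch below does this on a hypothetical four-node graph; the function name is an assumption, and other definitions of the clustering coefficient (for example, averaging a per-node value) can give different numbers.

```python
from itertools import combinations

def clustering_coefficient(adj):
    """Fraction of connected triples u - center - w in which u and w are adjacent."""
    triples = 0
    closed = 0
    for center, neighbors in adj.items():
        for u, w in combinations(sorted(neighbors), 2):
            triples += 1
            if w in adj[u]:
                closed += 1
    return closed / triples if triples else 0.0

# Hypothetical graph: triangle a-b-c with a pendant node d attached to c.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
cc = clustering_coefficient(adj)
print(cc)
```

Each triangle is counted three times, once per center node, which is why this ratio matches the "three times the number of triangles over the number of connected triples" form of the definition.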
Examples • Collaboration graph • Clustering coefficient is 0.14 • Density of edges is 0.000008 • Cross-post graph • Clustering coefficient is 0.4492 • Density of edges is 0.0016
Assignment READ: A. Broder, R. Kumar, F. Maghoul, P. Raghavan, S. Rajagopalan, R. Stata, A. Tomkins, and J. Wiener, Graph structure in the web, WWW, 2000.
Graph Structure of the Web • Breadth-first search from randomly chosen start nodes • Follow both forward and backward links • Reveal directed and undirected graph structure • Over 90% of nodes reachable if links are treated as undirected • Directed graph reveals complex bow-tie structure
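The three kinds of search described above can be sketched as breadth-first search over the forward links, over the reversed links, and over both. This is a minimal illustration on a hypothetical five-page link graph; the page names and helper functions are assumptions and not taken from the assigned paper.

```python
from collections import deque

def reachable(adj, start):
    """Set of nodes reachable from start by following edges in adj."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen

def reverse(adj):
    """Graph with every arc flipped (follow links backward)."""
    rev = {u: [] for u in adj}
    for u, outs in adj.items():
        for v in outs:
            rev.setdefault(v, []).append(u)
    return rev

def undirected(adj):
    """Graph in which every link can be followed in both directions."""
    rev = reverse(adj)
    return {u: list(adj.get(u, ())) + rev[u] for u in rev}

# Hypothetical link graph: in1 -> core1 <-> core2 -> out1, plus an unlinked page.
links = {
    "in1": ["core1"],
    "core1": ["core2"],
    "core2": ["core1", "out1"],
    "out1": [],
    "island": [],
}

forward = reachable(links, "core1")
backward = reachable(reverse(links), "core1")
both = reachable(undirected(links), "core1")
print(sorted(forward), sorted(backward), sorted(both))
```

The intersection of the forward and backward sets is the strongly connected component containing the start node, which is one way the directed searches expose more structure than the undirected one.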
Bow-Tie Structure of Web Graph Picture from the Nature journal
Next Time Probabilistic models for social networks