Social Networks 101 Prof. Jason Hartline and Prof. Nicole Immorlica
Lecture Ten: The web and PageRank.
The internet vs the web
The internet: Nodes = machines, Edges = wires
The world wide web: Nodes = webpages, Edges = hyperlinks
The web is a directed graph
Example pages and the links on each:
Cows: Dairy, Meat
Dairy: Cheese, Milk
Meat: Cow, Lamb
Directed graphs
[Figure: nodes a and b, with an arrow from a to b.]
Edge (a,b) = edge from a to b.
Directed paths
[Figure: nodes v1, v2, v3, v4.] Example: path (v1, v2, v3, v4).
Definition: A directed path from v1 to vk is a sequence of nodes (v1, …, vk) such that for every consecutive pair v_i and v_{i+1}, there is an edge from v_i to v_{i+1}.
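As a small illustration of the definition, the sketch below stores a directed graph as a dictionary of out-links and checks whether a sequence of nodes is a directed path. The `links` dictionary and the helper name `is_directed_path` are assumptions made for this example; the edges simply follow the example path (v1, v2, v3, v4) and are not taken from the lecture figure.

```python
# Hypothetical four-node graph: each key maps to the set of nodes it links to.
links = {
    "v1": {"v2"},
    "v2": {"v3"},
    "v3": {"v4"},
    "v4": set(),
}

def is_directed_path(graph, nodes):
    """Return True if every consecutive pair (u, w) in nodes is an edge u -> w."""
    return all(w in graph.get(u, set()) for u, w in zip(nodes, nodes[1:]))

print(is_directed_path(links, ["v1", "v2", "v3", "v4"]))  # True
print(is_directed_path(links, ["v4", "v3"]))              # False: edges have a direction
```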
Strongly connected components
[Figures: one graph that is not strongly connected, one that is strongly connected.]
Definition: A strongly connected component is a maximal subset of nodes {v1, …, vk} such that for any pair vi and vj in the set, there is a directed path from vi to vj.
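One way to test the pairwise condition in the definition is to compute, for each node, the set of nodes it can reach. The sketch below does this with a simple depth-first search. The function names and the two three-node graphs are illustrative assumptions; the code checks reachability between all pairs only, and does not compute components or check maximality.

```python
def reachable(graph, start):
    """Return the set of nodes reachable from start along directed edges."""
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for w in graph.get(u, ()):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen

def is_strongly_connected(graph, nodes):
    """True if every node in `nodes` can reach every other node in `nodes`."""
    nodes = set(nodes)
    return all(nodes <= reachable(graph, u) for u in nodes)

cycle = {"a": ["b"], "b": ["c"], "c": ["a"]}  # a -> b -> c -> a
chain = {"a": ["b"], "b": ["c"], "c": []}     # a -> b -> c, no way back

print(is_strongly_connected(cycle, cycle))  # True
print(is_strongly_connected(chain, chain))  # False
```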
What does the web look like? Strongly connected component 56 million nodes
What does the web look like?
[Figure labels: Strongly connected component, In, Out, Tubes, Tendrils, Disconnected components.]
Searching the web
Q. How can Google answer your questions without understanding them?
A. It uses the hyperlink structure.
Basic ideas
• A link to a page is an endorsement of that page’s quality.
• Links from high-quality pages are better than links from low-quality pages.
First attempt
Initialize: Each page has equal rank (“tokens”).
Repeat: Each page divides its tokens equally among all out-going links.
Initialization: each of the five pages holds 1/5 of the tokens.
First round: the pages hold 4/15, 3/15, 1/15, 3/15, and 4/15.
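The first-attempt rule can be sketched in a few lines. The five-page graph below is hypothetical: the graph in the lecture figure is not recoverable from this transcript, so the edges were chosen only so that one round produces the same multiset of fractions as the "First round" slide. The function name `redistribute` is also an assumption, and the sketch assumes every page has at least one out-going link.

```python
from fractions import Fraction

# Hypothetical five-page web: each page maps to the pages it links to.
links = {
    "A": ["B", "C", "D"],
    "B": ["E"],
    "C": ["A"],
    "D": ["B"],
    "E": ["C"],
}

def redistribute(graph, rank):
    """One round: every page splits its tokens equally among its out-links."""
    new_rank = {page: Fraction(0) for page in graph}
    for page, outs in graph.items():
        share = rank[page] / len(outs)  # assumes len(outs) >= 1
        for target in outs:
            new_rank[target] += share
    return new_rank

# Initialization: equal rank for every page (1/5 each).
rank = {page: Fraction(1, len(links)) for page in links}

# First round: A = 3/15, B = 4/15, C = 4/15, D = 1/15, E = 3/15.
rank = redistribute(links, rank)
print(rank)
```

Exact fractions are used so the result can be compared with the slide directly; `Fraction` prints 3/15 in lowest terms as 1/5.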
What could go wrong? Some node eventually collects all tokens.
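A minimal sketch of this failure, assuming a hypothetical three-page graph in which page C links only to itself: every token that reaches C stays there, so after two rounds C owns everything.

```python
from fractions import Fraction

# Hypothetical graph: C links only to itself, so tokens that reach C never leave.
links = {"A": ["B"], "B": ["C"], "C": ["C"]}

rank = {page: Fraction(1, 3) for page in links}
for _ in range(2):
    new_rank = {page: Fraction(0) for page in links}
    for page, outs in links.items():
        for target in outs:
            # Each page splits its tokens equally among its out-going links.
            new_rank[target] += rank[page] / len(outs)
    rank = new_rank

print(rank)  # A and B hold 0, C holds 1
```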
PageRank
Initialize: Each page has equal rank (“tokens”).
Repeat: Each page divides
1. an s fraction of its tokens equally among all out-going links,
2. a (1-s) fraction equally among all nodes.
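Written as a formula, one round of this rule might look as follows. The notation is an assumption introduced here, not the lecture's: $r_t(v)$ is the share of tokens at page $v$ after round $t$, $n$ is the number of pages, $d_{\text{out}}(u)$ is the number of out-going links of page $u$, and $u \to v$ means $u$ links to $v$. The formula assumes the shares sum to 1 and that every page has at least one out-going link.

```latex
r_0(v) = \frac{1}{n},
\qquad
r_{t+1}(v) = \frac{1-s}{n} + s \sum_{u \,:\, u \to v} \frac{r_t(u)}{d_{\text{out}}(u)}
```

The first term collects the $(1-s)$ fraction that every page spreads over all $n$ nodes; the sum collects the $s$ fraction arriving along links into $v$.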
Important properties of PageRank
• It converges (the PageRank of a page is the number of tokens it owns in the limit).
• The initialization doesn’t matter.
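The update rule and both properties can be tried out with a short sketch. Everything named here is an assumption for illustration: the function `pagerank`, its parameters, the value s = 0.85 (the slides do not fix s), the fixed number of rounds, and the three-page graph. The sketch also assumes every page has at least one out-going link.

```python
def pagerank(graph, s=0.85, rounds=200, start=None):
    """Token-passing PageRank; assumes every page has at least one out-link."""
    n = len(graph)
    rank = dict(start) if start else {page: 1.0 / n for page in graph}
    for _ in range(rounds):
        # Part 2: a (1 - s) fraction of all tokens is spread equally over all nodes.
        base = (1 - s) * sum(rank.values()) / n
        new_rank = {page: base for page in graph}
        # Part 1: an s fraction of each page's tokens is split among its out-links.
        for page, outs in graph.items():
            share = s * rank[page] / len(outs)
            for target in outs:
                new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical graph in which C links only to itself.
links = {"A": ["B"], "B": ["C"], "C": ["C"]}

uniform = pagerank(links)                                       # equal start
skewed = pagerank(links, start={"A": 1.0, "B": 0.0, "C": 0.0})  # all tokens on A
print(uniform)
print(skewed)
```

On this graph both runs settle at roughly A = 0.05, B = 0.0925, C = 0.8575: C still ranks highest, but it no longer collects every token, and the two starting points give the same answer.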
Random walks and PageRank Randy browses the web randomly.
Start at an arbitrary node.
With prob. s, travel along a random out-going link.
With prob. (1-s), travel to a random node.
Repeat forever and ever.
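Randy's walk can be simulated directly. The sketch below is a rough illustration, not the lecture's code: the function name `random_surfer`, the value s = 0.85, the step count, the fixed seed, and the three-page graph are all assumptions, and "forever" is replaced by a finite number of steps. It assumes every page has at least one out-going link.

```python
import random

def random_surfer(graph, s=0.85, steps=100_000, seed=0):
    """Simulate the walk and return the fraction of steps spent on each page."""
    rng = random.Random(seed)
    pages = list(graph)
    visits = {page: 0 for page in pages}
    current = rng.choice(pages)  # start at an arbitrary node
    for _ in range(steps):
        if rng.random() < s:
            current = rng.choice(graph[current])  # follow a random out-going link
        else:
            current = rng.choice(pages)           # jump to a random node
        visits[current] += 1
    return {page: count / steps for page, count in visits.items()}

# Hypothetical graph in which C links only to itself.
links = {"A": ["B"], "B": ["C"], "C": ["C"]}
freq = random_surfer(links)
print(freq)
```

Changing `seed` changes where Randy starts and which links he picks, but the visit frequencies stay close to the same values for a long enough walk.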
Important properties
Randy’s walk:
1. Converges: the probability Randy is on any given page approaches a fixed number in the limit.
2. It doesn’t matter where he starts.
Randy’s walk = PageRank
The probability Randy is on a given page is proportional to that page’s PageRank.
Extensions
• Anchor text
• Click probabilities
• Link/click spam
Next time TBA