Link Building Martin Olsen Department of Computer Science Aarhus University
Outline • Motivation and Introduction • Contribution • Link Building • Communities in Networks • Hedonic Games • Simple Games
What is Search Engine Optimization (SEO)? • "... in 2012, companies will spend almost $9 billion on search engine optimization …" (The New York Times, January 2009) • Objective of SEO: a link to your page appears on page 1 of the search results
WWW as a Graph [figure]
PageRank. Random Surfer Perspective • The random surfer zaps with probability 0.15 • 1000 random surfers, initially 100 at each of the nodes 1 to 10 [figure: the 10-node example graph]
PageRank. Random Surfer Perspective • Distribution after one tick [figure: surfers per node; two worked values are 143 ≈ 85 + 85/2 + 15 and 355 = 4 · 85 + 15]
PageRank. Random Surfer Perspective • Stationary distribution after 50 ticks: node 1: 281, node 2: 280, node 3: 66, node 4: 254, node 6: 43, nodes 5 and 7 to 10: 15 each
PageRank. Random Surfer Perspective • The same distribution as probabilities: node 1: 0.281, node 2: 0.280, node 3: 0.066, node 4: 0.254, node 6: 0.043, nodes 5 and 7 to 10: 0.015 each
PageRank. Random Surfer Perspective • PageRank ranking: 1, 2, 4, 3, 6 • PageRank is an important ingredient of the ranking mechanism • Relevance counts as well!
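The ticks of the random surfer can be sketched in Python as follows. This is a minimal illustration, not the author's code: the 4-node graph `links` and the function names `tick` and `pagerank` are hypothetical (the edges of the 10-node example graph are not recoverable from the slides), and every node is assumed to have at least one outgoing link.

```python
def tick(dist, links, zap=0.15):
    """One tick: every surfer zaps to a uniformly random node with
    probability `zap`, otherwise follows a uniformly random outgoing link."""
    n = len(dist)
    zapped = zap * sum(dist.values()) / n      # surfers arriving by zapping
    new = {v: zapped for v in dist}
    for u, mass in dist.items():
        share = (1.0 - zap) * mass / len(links[u])
        for v in links[u]:
            new[v] += share
    return new


def pagerank(links, zap=0.15, ticks=50):
    """Start from the uniform distribution and apply `ticks` ticks."""
    n = len(links)
    dist = {v: 1.0 / n for v in links}
    for _ in range(ticks):
        dist = tick(dist, links, zap)
    return dist


# Hypothetical graph (an assumption, not the graph from the slides).
links = {1: [2], 2: [1], 3: [1, 2], 4: [1]}

# 100 surfers per node, as on the slides.
surfers = {v: 100.0 for v in links}
after_one = tick(surfers, links)
# Node 1 receives 85 (from 2) + 85/2 (from 3) + 85 (from 4) + 15 (zapping).

pr = pagerank(links)
ranking = sorted(pr, key=pr.get, reverse=True)
```

With 100 surfers per node, each node contributes 85 surfers spread over its outgoing links and 15 surfers arrive at every node by zapping, which is the arithmetic pattern shown for the distribution after one tick.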
Contribution/Link Building • The Computational Complexity of Link Building (COCOON ´08), Olsen • Maximizing PageRank with new Backlinks (submitted), Olsen • MILP for Link Building (in preparation), Olsen, Viglas
The Link Building Problem. Formal Definition • LINK BUILDING • Instance: G(V, E), t ∈ V, k ∈ Z+ • Solution: S ⊆ V \ {t} with |S| = k maximizing π_t (the PageRank of t) after adding S × {t} to E
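To make the definition concrete, here is a brute-force sketch that tries every candidate set S and keeps the one giving t the highest PageRank. It is only an illustration of the problem statement, not one of the algorithms from the talk: the graph `links`, the helper names and the power-iteration PageRank are assumptions, and the running time grows like n^k PageRank computations.

```python
from itertools import combinations


def pagerank(links, zap=0.15, ticks=100):
    """Power iteration; `links` maps each node to a non-empty list of targets."""
    n = len(links)
    pr = {v: 1.0 / n for v in links}
    for _ in range(ticks):
        new = {v: zap / n for v in links}
        for u, targets in links.items():
            share = (1.0 - zap) * pr[u] / len(targets)
            for v in targets:
                new[v] += share
        pr = new
    return pr


def link_building(links, t, k):
    """Exhaustively search all S (subsets of V minus {t}) with |S| = k and
    return the set maximizing the PageRank of t after adding S x {t}."""
    best_set, best_rank = None, -1.0
    candidates = [v for v in links if v != t]
    for S in combinations(candidates, k):
        new_links = {u: list(ts) for u, ts in links.items()}
        for s in S:
            if t not in new_links[s]:      # E is a set: no duplicate edges
                new_links[s].append(t)
        rank = pagerank(new_links)[t]
        if rank > best_rank:
            best_set, best_rank = set(S), rank
    return best_set, best_rank


# Hypothetical instance (an assumption, not taken from the slides).
links = {1: [2], 2: [3], 3: [1], 4: [2], 5: [4]}
best, rank = link_building(links, t=1, k=1)
```

The exhaustive search is only practical for very small k and n, which is in line with the hardness results presented later in the talk.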
Link Building is not Trivial [figure: 8-node example graphs annotated with PageRank values]
PageRank Topology • Theorem *) • The expected number of visits to p for a random surfer starting at u prior to the first zapping event [figure: nodes i, j and 1; increase in PageRank of node 1]
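The quantity named on this slide can be computed directly for a small graph. The sketch below is an illustration under stated assumptions: the graph `links` and the function name `expected_visits` are hypothetical, the start at u is counted as a visit, and every node has at least one outgoing link. It also checks the standard identity that, with uniform zapping, the PageRank of p equals 0.15/n times the sum over all start nodes u of the expected number of visits to p.

```python
def expected_visits(links, u, follow=0.85, steps=200):
    """Expected number of visits to each node for a surfer starting at u,
    counted until the first zapping event (the start counts as a visit)."""
    reach = {v: 0.0 for v in links}    # P(surfer is at v and has not zapped)
    reach[u] = 1.0
    visits = {v: 0.0 for v in links}
    for _ in range(steps):
        for v in links:
            visits[v] += reach[v]
        nxt = {v: 0.0 for v in links}
        for v, mass in reach.items():
            share = follow * mass / len(links[v])
            for w in links[v]:
                nxt[w] += share
        reach = nxt
    return visits


# Hypothetical graph (an assumption, not the graph from the slides).
links = {1: [2], 2: [1], 3: [1, 2], 4: [1]}
n = len(links)
z = {u: expected_visits(links, u) for u in links}

# PageRank recovered from the expected visit counts.
pr = {p: 0.15 / n * sum(z[u][p] for u in links) for p in links}
```

The total number of visits before zapping is 1/0.15 on average, independent of the start node, because the surfer survives each step with probability 0.85.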
k-REGULAR INDEPENDENT SET ≤FPT LINK BUILDING • Does the graph contain an independent set of size k? • Can we turn this question into a Link Building problem? [figure: nodes i and j]
k-REGULAR INDEPENDENT SET ≤FPT LINK BUILDING • Basic idea: make z_ij relatively big [figure: nodes i, j, x, y and 1; OPT!]
k-REGULAR INDEPENDENT SET ≤FPT LINK BUILDING • LINK BUILDING is W[1]-hard *): LINK BUILDING solvable in time f(k)·n^c ⇒ k-REGULAR INDEPENDENT SET solvable in time f(k)·n^c ⇒ W[1] = FPT • Another result: FPTAS for LINK BUILDING ⇒ NP = P • Basic idea: make z_ij relatively big [figure: nodes i, j, x, y and 1; OPT!]
Upper Bound: k = 1 fixed • The dashed link can be found in time corresponding to O(1) PageRank computations with a randomized scheme *). [figure: the 8-node example graph with PageRank values, shown with and without the dashed link]
Upper Bound: Mixed Integer Linear Programming Approach *) • Price for link from i • Compute the cheapest set of new incoming links that would make node 5 rank highest [figure: 8-node example graph with PageRank values]
A Quiz: Which of the two situations would be optimal for Martin?
Contribution/Communities in Networks • Communities in Large Networks: Identification and Ranking (WAW ´06), Olsen
Communities in Networks Dolphins in Doubtful Sound [Newman, Girvan ´04]:
What is a Community? • Informally: A community C is a set of nodes with relatively many links between them • Assumption/Observation: A CS site has relatively many CS links! • Formal definition based on assumption *): ∀v ∉ C, ∀u ∈ C: w_vC ≤ w_uC
A Greedy Approach for Detecting Members of a Community *) • Repeat until C is a community: • Find v ∉ C with maximum attention to C • C ← C ∪ {v} • Update attentions • Use two priority queues holding the elements in C and V \ C [figure: 1) old C, 2) new C]
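The greedy loop can be sketched as follows. This is a simplified illustration, not the implementation from the paper: it assumes that the attention w_vC of a node v to C is the fraction of v's outgoing links that point into C, it recomputes attentions from scratch instead of maintaining two priority queues, and the graph and function names are hypothetical.

```python
def attention(v, C, links):
    """Assumed reading of 'attention': fraction of v's outgoing links into C."""
    out = links[v]
    return sum(1 for w in out if w in C) / len(out) if out else 0.0


def is_community(C, links):
    """Check the definition: no non-member pays more attention to C
    than some member of C does."""
    inside = min(attention(u, C, links) for u in C)
    outside = [attention(v, C, links) for v in links if v not in C]
    return not outside or max(outside) <= inside


def greedy_community(seeds, links):
    """Grow C from the seed nodes until it satisfies the definition."""
    C = set(seeds)
    while not is_community(C, links):
        v = max((x for x in links if x not in C),
                key=lambda x: attention(x, C, links))
        C.add(v)
    return C


# Hypothetical graph: a tight triangle a, b, c and a looser group d, e, f.
links = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"],
    "d": ["a", "e"], "e": ["d", "f"], "f": ["e", "d"],
}
community = greedy_community({"a"}, links)
```

The loop always terminates, since in the worst case C grows to the whole node set, which trivially satisfies the definition. Starting from the single representative "a", the sketch stops as soon as the triangle is complete.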
An Experiment. A Danish CS Community • Crawl of the dk-domain with 180,468 sites in total • Representatives = 4 CS sites • CS-Community with 556 sites • Minimum attention among members of C: 15.8% • Maximum attention among non-members: 15.4% Ranking: • www.daimi.au.dk (CS U Aarhus) • www.diku.dk (CS U Copenhagen) • www.itu.dk (ITU Copenhagen) • www.cs.auc.dk (CS U Aalborg) • www.brics.dk (CS PhD School) • www.imm.dtu.dk (Informatics/Mathematical modeling DTU Copenhagen) … • www.imada.sdu.dk (CS/Mathematics U Southern Denmark)
Other Results • Computing non-trivial communities by the definition given is NP-hard • A simple model for the evolution of communities is presented. These communities are likely to obey the definition for large n if the out-degree of the nodes is (log n).
Contribution/Hedonic Games • Nash Stability in Additively Separable Hedonic Games Is NP-Hard (CiE ´07), Olsen • Extended version: Nash Stability in Additively Separable Hedonic Games and Community Structures (Theory of Computing Systems ´09), Olsen
An Additively Separable Hedonic Game Two buffaloes b1 and b2 that hate each other. They are only thirsty if they have a parasite on their back in which case they have to drink 9 l/h. Two gigantic parasites p1 and p2. They only want to sit on b1 and b2 respectively. Five waterholes w1, …,w5 with capacities 1, 2, 3, 4 and 8 l/h respectively.
An Additively Separable Hedonic Game • One Nash equilibrium for the game [figure] • PARTITION ≤ NE in ASHG; NE in ASHG is NPC *)
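The link to PARTITION can be illustrated on the numbers of the buffalo example. The sketch below assumes, as our reading of the slides rather than a statement from them, that a stable outcome needs the five waterholes split into two groups supplying exactly 9 l/h each. The function name is hypothetical and the search is exhaustive, so it only suits tiny instances.

```python
from itertools import combinations


def equal_splits(capacities):
    """Return all ways to split `capacities` into two groups of equal sum.
    The first item is pinned to the first group to avoid mirrored duplicates."""
    total = sum(capacities)
    if total % 2:
        return []
    half = total // 2
    idx = range(len(capacities))
    found = []
    for r in range(1, len(capacities)):
        for combo in combinations(idx, r):
            if 0 in combo and sum(capacities[i] for i in combo) == half:
                rest = [i for i in idx if i not in combo]
                found.append(([capacities[i] for i in combo],
                              [capacities[i] for i in rest]))
    return found


# Waterhole capacities w1, ..., w5 in l/h, taken from the example.
splits = equal_splits([1, 2, 3, 4, 8])
# splits == [([1, 8], [2, 3, 4])]: each group supplies 9 l/h.
```

For these capacities there is exactly one such split, {1, 8} and {2, 3, 4}.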
Community Structures in Networks Put a 1 on each connection between two dolphins. The community structure is a NE! NE community structure? NE’s are NP-hard to compute even with symmetric and positive payoffs*)
Contribution/Simple Games On the Complexity of Problems on Simple Games (submitted) Freixas, Molinero, Olsen, Serna
Open Problems/Future Work • In the thesis we show LINK BUILDING ∈ APX. Is there a PTAS for LINK BUILDING? • Surgical Link Building: • Isolate the community C • Model all pages in V \ C as one page • Use MILP • Use information on distribution of PageRank • Does the stuff presented really work? • Thank You!
Link Building. A Real World Example Dear X We are trying to get more links to our website to help improve its rating on the search engines. We were wondering if you could put a link to our site … on your webpage or blog. If you have a website or a Blog and put a link to our page on it then to say thank you for each month it is up, I will give you … Source: An e-mail to a colleague X
Link Building is not Trivial. 2nd Example [figure: node 1 with green and blue nodes] • Assumption: Obtaining a link from one green node is slightly better for node 1 compared to obtaining a link from one blue node. • Now node 1 can pick three incoming links for free. What should node 1 choose?
No FPTAS for LINK BUILDING if NP ≠ P *) [figure: nodes i, j, x, y and 1; OPT!]
Fixed Parameter Tractability: FPT and W[1] • FPT: solvable in time f(k)·n^c, e.g. k-VERTEX COVER • W[1]: k-INDEPENDENT SET and k-REGULAR INDEPENDENT SET are complete for W[1] • LINK BUILDING is W[1]-hard *)
Upper Bound: Mixed Integer Linear Programming Approach *) • The dashed links show the cheapest modification that will bring node 5 to the top of the ranking, computed using a MILP approach. • Alternatively we could go for the maximum improvement in the ranking for a given budget. [figure: 8-node example graphs with PageRank values and link prices]