The effect of New Links on Google Pagerank. By Hui Xie Apr , 07.
Computing PageRank
• Matrix representation: let $P$ be an $n \times n$ matrix and $p_{ij}$ the entry in the $i$-th row and $j$-th column.
• If page $i$ has $k > 0$ outgoing links: $p_{ij} = 1/k$ if page $i$ has a link to page $j$, and $p_{ij} = 0$ if there is no link from $i$ to $j$.
• If page $i$ has no outgoing links: $p_{ij} = 1/n$ for $j = 1, \dots, n$.
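The construction of $P$ can be sketched in a few lines of Python. This is only an illustration: the function name `hyperlink_matrix`, the dictionary-of-lists input format, and the 4-page example are assumptions, and pages are indexed from 0 here rather than from 1 as on the slides.

```python
def hyperlink_matrix(links, n):
    """Build the n x n hyperlink matrix P from out-link lists.

    links[i] is the list of pages that page i links to (0-based indices,
    an assumption of this sketch; the slides number pages from 1).
    """
    P = []
    for i in range(n):
        targets = set(links.get(i, []))
        k = len(targets)
        if k > 0:
            # p_ij = 1/k if i links to j, 0 otherwise
            row = [1.0 / k if j in targets else 0.0 for j in range(n)]
        else:
            # page without outgoing links: p_ij = 1/n for every j
            row = [1.0 / n] * n
        P.append(row)
    return P


# Hypothetical 4-page web; page 3 has no outgoing links.
links = {0: [1, 2], 1: [2], 2: [0], 3: []}
P = hyperlink_matrix(links, 4)
for row in P:
    print(row)
```

Every row of the result sums to 1, so $P$ is a stochastic matrix even when some pages have no outgoing links.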
Google matrix
• $G = cP + (1-c)\frac{1}{n} e e^T$, where $e = (1, \dots, 1)^T$
• $G$ is a stochastic matrix: $Ge = e$
• There exists a unique column vector $\pi$ such that $\pi^T G = \pi^T$ and $\pi^T e = 1$
• $\pi^T = \frac{1-c}{n} e^T (I - cP)^{-1}$
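One common way to approximate $\pi$ numerically is power iteration on $\pi^T G = \pi^T$. The sketch below is an illustration and not necessarily the method used in the talk; the function name `pagerank`, the tolerance, and the 3-page example matrix are assumptions.

```python
def pagerank(P, c=0.85, tol=1e-12, max_iter=1000):
    """Approximate pi with pi^T G = pi^T, pi^T e = 1, G = cP + (1-c)(1/n) e e^T."""
    n = len(P)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        # (pi^T G)_j = c * sum_i pi_i p_ij + (1 - c)/n, because sum_i pi_i = 1
        new = [c * sum(pi[i] * P[i][j] for i in range(n)) + (1.0 - c) / n
               for j in range(n)]
        if sum(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi


# Hypothetical 3-page web (0-based): 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0
P = [[0.0, 0.5, 0.5],
     [0.0, 0.0, 1.0],
     [1.0, 0.0, 0.0]]
pi = pagerank(P)
print([round(x, 4) for x in pi])
```

The matrix $G$ is never formed explicitly: the rank-one term $(1-c)\frac{1}{n} e e^T$ contributes the same constant $(1-c)/n$ to every entry of $\pi^T G$.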
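Discrete Time Markov Chains
• A sequence of random variables $\{X_n\}$ is called a Markov chain if it has the Markov property: $\Pr[X_{n+1} = j \mid X_0 = i_0, \dots, X_{n-1} = i_{n-1}, X_n = i] = \Pr[X_{n+1} = j \mid X_n = i]$
• States are usually labeled $\{(0,)\, 1, 2, \dots\}$
• The state space can be finite or infinite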
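Transition Probability
• Probability to jump from state $i$ to state $j$: $p_{ij} = \Pr[X_{n+1} = j \mid X_n = i]$
• Assume stationary: independent of the time $n$
• Transition probability matrix: $P = (p_{ij})$
• Two-state MC: $P = \begin{pmatrix} p_{00} & p_{01} \\ p_{10} & p_{11} \end{pmatrix}$, with each row summing to 1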
Side Topic: Markov Chains
• A discrete-time stochastic process is a sequence of random variables $\{X_0, X_1, \dots, X_n, \dots\}$, where $0, 1, \dots, n, \dots$ are discrete points in time.
• A Markov chain is a discrete-time stochastic process defined over a finite (or countably infinite) set of states $S$ in terms of a matrix $P$ of transition probabilities.
• Memorylessness property: for a Markov chain, $\Pr[X_{t+1} = j \mid X_0 = i_0, X_1 = i_1, \dots, X_t = i] = \Pr[X_{t+1} = j \mid X_t = i]$
Side Topic: Markov Chains
• Let $p_i(t)$ be the probability of being in state $i$ at time step $t$.
• Let $p(t) = [p_0(t), p_1(t), \dots]$ be the vector of probabilities at time $t$.
• For an initial probability distribution $p(0)$, the probabilities at time $n$ are $p(n) = p(0) P^n$
• A probability distribution $p$ is stationary if $p = pP$
• $\Pr[X_{m+n} = j \mid X_m = i] = \Pr[X_n = j \mid X_0 = i] = P^n(i, j)$
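The relation $p(n) = p(0) P^n$ and the notion of a stationary distribution can be checked on a small example. The two-state chain below, with transition probabilities 0.3 and 0.6 between the states, is a made-up illustration, and the helper names `step` and `distribution_at` are assumptions.

```python
def step(p, P):
    """One step of the chain: return the row vector p P."""
    n = len(P)
    return [sum(p[i] * P[i][j] for i in range(n)) for j in range(n)]


def distribution_at(p0, P, n_steps):
    """Return p(n) = p(0) P^n by repeated vector-matrix products."""
    p = list(p0)
    for _ in range(n_steps):
        p = step(p, P)
    return p


# Hypothetical two-state chain: leave state 0 w.p. 0.3, leave state 1 w.p. 0.6.
P = [[0.7, 0.3],
     [0.6, 0.4]]
p0 = [1.0, 0.0]  # start in state 0 with certainty
p50 = distribution_at(p0, P, 50)
print(p50)  # close to the stationary distribution [2/3, 1/3]
```

For this chain the stationary distribution solves $0.3\,p_0 = 0.6\,p_1$ with $p_0 + p_1 = 1$, which gives $p = [2/3, 1/3]$; applying one more step to `p50` leaves it essentially unchanged.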
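Absorbing Markov chain
Define a discrete-time absorbing Markov chain $\{X_t, t = 0, 1, \dots\}$ with the state space $\{0, 1, \dots, n\}$, where transitions between the states $1, \dots, n$ are governed by the matrix $cP$ and the state 0 is absorbing. The transition matrix is
$\begin{pmatrix} 1 & 0^T \\ (1-c)e & cP \end{pmatrix}$
so each of the states $1, \dots, n$ moves to the absorbing state with probability $1 - c$.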
Random walk interpretation
• The walk starts at a uniformly chosen web page
• At each step, if currently at page $p$:
• with probability $\alpha$, go to a uniformly chosen out-neighbor of $p$ ($\alpha$ plays the role of $c$ above)
• with probability $1 - \alpha$, stop
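Let $N_j$ be the total number of visits to state $j$ before absorption, including the visit at time $t = 0$ if $X_0 = j$. Formally, $N_j = \sum_{t=0}^{\infty} 1\{X_t = j\}$.
• Then $z_{ij} = [(I - cP)^{-1}]_{ij} = E(N_j \mid X_0 = i)$
• Let $q_{ij}$ be the probability of reaching the state $j$ before absorption if the initial state is $i$ (for $i = j$, the probability of returning to $j$). Then we have the following.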
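Theorem
Let $X$ denote a Markov chain with state space $E$. The distribution of the total number of visits to a state $j \in E$, given that the chain starts in state $i$, is
$\Pr[N_j = m \mid X_0 = j] = q_{jj}^{m-1}(1 - q_{jj})$ for $m \ge 1$,
and for $i \ne j$
$\Pr[N_j = m \mid X_0 = i] = \begin{cases} 1 - q_{ij} & \text{if } m = 0 \\ q_{ij}\, q_{jj}^{m-1}(1 - q_{jj}) & \text{if } m \ge 1 \end{cases}$
Corollary
For all $i, j \in E$ the relations $z_{ii} = (1 - q_{ii})^{-1}$ and $z_{ij} = q_{ij} z_{jj}$ ($i \ne j$) hold.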
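The corollary can be checked numerically on a small example. In the sketch below, $Z = (I - cP)^{-1}$ is approximated by the series $\sum_{t \ge 0} (cP)^t$, and $q_{ij}$ is obtained from the first-step equation $q_{ij} = c \sum_k p_{ik}\,(1 \text{ if } k = j \text{ else } q_{kj})$. That equation, the function names, and the 3-page matrix are assumptions of this illustration and do not appear on the slides.

```python
def visits_matrix(P, c=0.85, terms=400):
    """Approximate Z = (I - cP)^{-1} by the truncated series sum_{t>=0} (cP)^t."""
    n = len(P)
    Z = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # t = 0 term
    term = [row[:] for row in Z]
    for _ in range(terms):
        # term <- term * (cP), the next power of cP
        term = [[c * sum(term[i][k] * P[k][j] for k in range(n))
                 for j in range(n)] for i in range(n)]
        Z = [[Z[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return Z


def reach_probabilities(P, c=0.85, sweeps=400):
    """q[i][j]: probability of reaching j (returning, if i == j) before absorption.

    Solved by fixed-point iteration of the first-step equation, which is a
    contraction with factor c.
    """
    n = len(P)
    q = [[0.0] * n for _ in range(n)]
    for _ in range(sweeps):
        q = [[c * sum(P[i][k] * (1.0 if k == j else q[k][j]) for k in range(n))
              for j in range(n)] for i in range(n)]
    return q


# Hypothetical 3-page web (0-based): 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0
P = [[0.0, 0.5, 0.5],
     [0.0, 0.0, 1.0],
     [1.0, 0.0, 0.0]]
Z = visits_matrix(P)
q = reach_probabilities(P)
for i in range(3):
    for j in range(3):
        # corollary: z_jj = 1/(1 - q_jj), and z_ij = q_ij * z_jj for i != j
        expected = 1.0 / (1.0 - q[j][j]) if i == j else q[i][j] * Z[j][j]
        print(i, j, round(Z[i][j], 6), round(expected, 6))
```

The two printed columns should agree for every pair $(i, j)$. The truncated series is used only to keep the sketch dependency-free; a linear solver would do the same job.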
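Outgoing links from $i$ do not affect $q_{ji}$ for any $j \ne i$. Since $\pi_i = \frac{1-c}{n} \sum_j z_{ji} = \frac{1-c}{n} z_{ii} \bigl(1 + \sum_{j \ne i} q_{ji}\bigr)$, a page that changes its outgoing links can control its PageRank only up to multiplication by the factor $z_{ii} = 1/(1 - q_{ii})$.
For $0 \le q_{ii} \le c^2$ (a return to $i$ takes at least two steps when $i$ does not link to itself), $1 \le z_{ii} \le (1 - c^2)^{-1} \approx 3.6$ for $c = 0.85$.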
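Rank-one update of Google PageRank
• Page 1, with $k_0$ old links, has $k_1$ newly created links to pages $2, \dots, k_1 + 1$
• $k = k_0 + k_1$; let $p_1^T$ be the first row of the matrix $P$
• Updated hyperlink matrix: $\tilde{P} = P + e_1 v^T$, where $e_i$ is the $i$-th unit vector and $v^T = \frac{1}{k} \sum_{i=2}^{k_1+1} e_i^T - \frac{k_1}{k} p_1^T$, so that the new first row is $\frac{k_0}{k} p_1^T + \frac{1}{k} \sum_{i=2}^{k_1+1} e_i^T$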
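According to (9), the ranking of page 1 increases when
$\frac{c}{k_1} \sum_{i=2}^{k_1+1} z_{i1} > z_{11} - 1$.
Since $z_{11} = 1/(1 - q_{11})$ and $z_{i1} = q_{i1} z_{11}$ for $i > 1$, the above is equivalent to
$\frac{c}{k_1} \sum_{i=2}^{k_1+1} q_{i1} > q_{11}$.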
Hence, page 1 increases its ranking when it refers to pages that are characterized by a high value of $q_{i1}$. These must be pages that refer to page 1 or at least belong to the same Web community. Here, by a Web community we mean a set of Web pages that a surfer can reach from one another in a relatively small number of steps.
The PageRank of page $j \ne 1$ increases if
$\frac{c}{k_1} \sum_{i=2}^{k_1+1} z_{ij} > z_{1j}$.
If several new links are added, the PageRank of page $j$ might actually decrease even if this page receives one of the new links. This situation occurs when most of the newly created links point to “irrelevant” pages.
For instance, let $j = 2$ and assume that there is no hyperlink path from pages $3, \dots, k_1 + 1$ to page 2. Then $z_{i2}$ is close to zero for $i = 3, \dots, k_1 + 1$, and the PageRank of page 2 will increase only if $(c/k_1) z_{22} > z_{12}$, which is not necessarily true, especially if $z_{12}$ and $k_1$ are considerably large.
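The effect described above can be reproduced on a toy graph. The sketch below compares the PageRank before and after page 1 adds links with the criterion $\frac{c}{k_1} \sum_i z_{ij} > z_{1j}$, summed over the new link targets $i$, which generalises the condition $(c/k_1) z_{22} > z_{12}$. For page 1 itself the right-hand side is taken as $z_{11} - 1$; that form is derived here from a Sherman-Morrison expansion of the rank-one update and should be read as an assumption of this sketch. The 5-page graph and the function names are hypothetical, and pages are indexed from 0, so index 0 is "page 1" of the slides.

```python
def pagerank(P, c=0.85, iters=500):
    """Power iteration for pi^T = pi^T G with G = cP + (1-c)(1/n) e e^T."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [c * sum(pi[i] * P[i][j] for i in range(n)) + (1.0 - c) / n
              for j in range(n)]
    return pi


def visits_matrix(P, c=0.85, terms=500):
    """Approximate Z = (I - cP)^{-1} by the truncated series sum_t (cP)^t."""
    n = len(P)
    Z = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in Z]
    for _ in range(terms):
        term = [[c * sum(term[i][k] * P[k][j] for k in range(n))
                 for j in range(n)] for i in range(n)]
        Z = [[Z[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return Z


c = 0.85
# Hypothetical web: a cycle 0 -> 4 -> 1 -> 0 and a separate pair 2 <-> 3.
P = [[0.0, 0.0, 0.0, 0.0, 1.0],
     [1.0, 0.0, 0.0, 0.0, 0.0],
     [0.0, 0.0, 0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 0.0, 0.0]]

# Page 0 keeps its k0 = 1 old link and adds k1 = 2 new links, to pages 1 and 2.
new_targets = [1, 2]
k0, k1 = 1, len(new_targets)
k = k0 + k1
P_new = [row[:] for row in P]
# new first row: (k0/k) * old row + 1/k on each new target
P_new[0] = [(k0 / k) * P[0][j] + (1.0 / k if j in new_targets else 0.0)
            for j in range(len(P))]

Z = visits_matrix(P, c)  # Z is computed from the ORIGINAL matrix P
pi_old, pi_new = pagerank(P, c), pagerank(P_new, c)

results = []
for j in range(len(P)):
    lhs = (c / k1) * sum(Z[i][j] for i in new_targets)
    rhs = Z[0][j] - (1.0 if j == 0 else 0.0)
    results.append((j, lhs > rhs, pi_new[j] > pi_old[j]))
    print(j, "predicted increase:", lhs > rhs, "actual increase:", pi_new[j] > pi_old[j])
```

In this example page 1 (index) receives a new link from page 0 and still loses PageRank: page 0 already reached it through page 4, and the second new link sends a third of page 0's outgoing weight to the unrelated pair 2, 3.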
Asymptotic analysis
• Let $\tau_j$ be the stopping time of the first visit to the state $j$
• Let $m_{ij} = E(\tau_j \mid X_0 = i)$ be the average time needed to reach $j$ starting from $i$ (the mean first passage time)
Optimal Linking Strategy
• Consider a page $i = 1, \dots, n$ and assume that $i$ has links to pages $i_1, \dots, i_k$ distinct from $i$. Further, let $m_{ij}(c)$ be the mean first passage time from page $i$ to page $j$ for the Google transition matrix $G$ with parameter $c$.
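Outgoing links from $i$ do not affect $m_{ji}(c)$ for any $j \ne i$. Since $\pi_i = 1/m_{ii}(c)$ with $m_{ii}(c) = 1 + \frac{c}{k} \sum_{l=1}^{k} m_{i_l i}(c) + \frac{1-c}{n} \sum_{j \ne i} m_{ji}(c)$, by linking from $i$ to $j$ one can only alter $k$ and the first sum; this means that the owner of page $i$ has very little control over its PageRank. The best that he can do is to link only to one page $j^*$ such that $m_{j^* i}(c) = \min_j m_{ji}(c)$. Note that (surprisingly) the PageRank of $j^*$ plays no role here.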
Theorem. The optimal linking strategy for a Web page is to have only one outgoing link pointing to a Web page with a shortest mean first passage time back to the original page.
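The theorem can be illustrated on a small example: compute the mean first passage times $m_{ji}(c)$ back to a page, pick the target with the smallest one, and compare with a brute-force search over all single-link choices. The first-step equation $m_{ij} = 1 + \sum_{k \ne j} g_{ik} m_{kj}$ used below is standard Markov chain theory rather than something quoted from the slides; the 4-page graph and all function names are assumptions, and pages are indexed from 0.

```python
def pagerank(P, c=0.85, iters=500):
    """Power iteration for pi^T = pi^T G with G = cP + (1-c)(1/n) e e^T."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [c * sum(pi[i] * P[i][j] for i in range(n)) + (1.0 - c) / n
              for j in range(n)]
    return pi


def google_matrix(P, c=0.85):
    n = len(P)
    return [[c * P[i][j] + (1.0 - c) / n for j in range(n)] for i in range(n)]


def passage_times_to(G, target, tol=1e-12, max_iter=100000):
    """m[i]: mean number of steps to first reach `target` from i
    (the mean return time when i == target)."""
    n = len(G)
    m = [0.0] * n
    for _ in range(max_iter):
        # first-step analysis: m_i = 1 + sum_{k != target} g_ik * m_k
        new = [1.0 + sum(G[i][k] * m[k] for k in range(n) if k != target)
               for i in range(n)]
        if max(abs(a - b) for a, b in zip(new, m)) < tol:
            return new
        m = new
    return m


c, n, page = 0.85, 4, 0
# Hypothetical web: 1 -> 0, 2 -> 1, 3 -> 2; page 0 chooses its single outgoing link.
base = [[0.0, 0.0, 0.0, 0.0],  # row 0 is filled in by with_link
        [1.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0]]


def with_link(target):
    """Hyperlink matrix in which page 0 links only to `target`."""
    P = [row[:] for row in base]
    P[page][target] = 1.0
    return P


# m_j0 for j != 0 does not depend on row 0, so any choice of link works here.
m = passage_times_to(google_matrix(with_link(1), c), page)
best_by_passage = min((j for j in range(n) if j != page), key=lambda j: m[j])

# Brute force: PageRank of page 0 for each possible single link.
ranks = {j: pagerank(with_link(j), c)[page] for j in range(n) if j != page}
best_by_rank = max(ranks, key=ranks.get)
print(best_by_passage, best_by_rank)
```

Both selections pick the same page, the one that links straight back to page 0. The value `1 / m[page]` also matches the PageRank of page 0 under that choice, consistent with $\pi_i = 1/m_{ii}(c)$.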
Conclusions
• Our main conclusion is that a Web page cannot significantly manipulate its PageRank by changing its outgoing links.
• Furthermore, keeping a logical hyperlink structure and linking to a relevant Web community is the most sensible and rewarding policy.