The effect of New Links on Google Pagerank

The effect of New Links on Google Pagerank By Hui Xie Apr , 07

Computing PageRank Matrix representation Let P be an nn matrix and pij be the entry at the i-th row and j-th column. If page i has k>0 outgoing links pij = 1/k if page i has a link to page j pij = 0 if there is no link from i to j If page I has no outgoing links pij = 1/n j=1,…,n

Google matrix • G=cP+(1-c)(1/n)eeT e=(1,…,1)T • G is stochastic matrix Ge=e • There exists a unique column vector π such that πT G= πT, πT e=1 • πT =(1-c)/n eT(I-cP)-1

Discrete Time Markov Chains • A sequence of random variables {Xn} is called a Markov chain if it has the Markov property: • States are usually labeled {(0,)1,2,…} • State space can be finite or infinite

Transition Probability • Probability to jump from state i to state j • Assume stationary: independent of time • Transition probability matrix: P = (pij) • Two state MC:

Side Topic: Markov Chains • A discrete time stochastic process is a sequence of random variables {X0, X1, …, Xn, …} where the 0, 1, …, n, … are discrete points in time. • A Markov chain is a discrete-time stochastic process defined over a finite (or countably infinite) set of states S in terms of a matrix P of transition probabilities. • Memorylessness property: for a Markov chain • Pr[Xt+1 = j | X0 = i0, X1 = i1, …, Xt = i] = Pr[Xt+1 = j | Xt = i]

Side Topic: Markov Chains • Let pi(t) be the probability of being in state i at time step t. • Let p(t) = [p0(t), p1(t), … ] be the vector of probabilities at time t. • For an initial probability distribution p(0), the probabilities at time n are • p(n) = p(0) Pn • A probability distribution p is stationary if p = p P • P(Xm+n =j|Xm = i) = P(Xn =j|X0 = i) = Pn(i,j)

absorbing Markov chain Define a discrete-time absorbing markov chain {Xt ,t=0,1,…}with the state space {0,1,…,n} Where transitions between the states 1,…, n are conducted by the matrix cP, and the state 0 is absorbing. The transition matrix is

Random walk interpretation • Walk starts at a uniformly chosen web page • At each step, if currently at page p • W/p α, go to a uniformly chosen outneighbor of p • W/p 1 - α, stop

Let Njbe the total number of visits to state j before absorption including the visit at time t = 0 if X0 is j . Formally, • Then zij=(I-cP)-1ij=E(Nj|X0=I) • Let qij be the probability of reaching the state j before absorption if the initial state is i. Then we have

Theorem Let X denote a Markov chain with state space E. The total number of visits to a state j∈E under the condition that the chain starts in state i is given by P(Nj=m|X0=j)=qjjm-1(1-qjj) and for i!=j P(Nj=m|X0=i)= 1-qij if m=0 qij qjjm-1(1-qjj) if m>=1 Corollary For all i,j ∈E the relations zij=(1-qii)-1 and zij=qijzjj hold

Outgoing links from i do not affect qji for any j!=I So by changing the outgoing links, a page can control its PageRank up to multiplication by a factor zii=1/(1-qii) For 0<=qii<=c2 , 1<=zii<=(1-c2)-1≈3.6 for c=0.85

Rank one update of google pagerank • Page 1 with k0 old links has k1 newly created links to page 2 to k1+1 • k=k0+k1 , p1T be the first row of matrix P • Updated hyperlink matrix

According to (9) the ranking of page 1 increases when For z11=1/(1-q11), zi2=qi1z11, i>1 The above is equivalent to

Hence, the page 1 increases its ranking when it refers to pages that are characterized by a high value of qi1. These must be the pages that refer to page 1 or at least belong to the same Web community. Here by a Web community we mean a set of Web pages that a surfer can reach from one to another in a relatively small number of steps.

the PageRank of page j increases if

the PageRank of page j increases if if several new links are added then the PageRank of page j might actually decrease even if this page receives one of the new links. Such situation occurs when most of newly created links point to “irrelevant” pages.

For instance, let j = 2 and assume that there is no hyperlink path from pages 3,…,k+1 to page 2.Then zijis close to zero for i = 3,…, k + 1, and the PageRank of page 2 will increase only if (c/k1)z22> z12, which is not necessarily true, especially if z12 and k1 are considerably large.

Asymptotic analysis • Let be the stopping time of the first visit to the state j • Mij=E( |X0=i) be the average time needed to reach j starting from i(mean first passage time)

Optimal Linking Strategy • Consider a page i = 1,…,n and assume that i has links to pages i1,…,ikdistinct from i. Further, let mij(c) be the mean first passage time from page i to page j for the Google transition matrix Gwith parameter c.

outgoing links from i do not affect mji(c) for any j!= i. Thus, by linking from i to j , one can only alter k, this means that the owner of the page I has very little control over its pagerank. The best that he can do is to link only to one page j* such that Note that (surprisingly) the PageRank of j* plays no role here.

Theorem. The optimal linking strategy for a Web page is to have only one outgoing link pointing to a Web page with a shortest mean first passage time back to the original page.

Conclusions • Our main conclusion is that a Web page cannot significantly manipulate its PageRank by changing its outgoing links. • Furthermore, keeping a logical hyperlink structure and linking to a relevant Web community is the most sensible and rewarding policy.

The effect of New Links on Google Pagerank

The effect of New Links on Google Pagerank

Presentation Transcript

PageRank

Local Approximation of PageRank and Reverse PageRank

PageRank

Description of PageRank

Experiments with MATLAB Google PageRank

The Library Links Feature of Google Scholar

PageRank

The Google Pagerank Algorithm and How It Works

PageRank

The effect of new health legislation on farms

PageRank

The Effect of…on…..

Google Pagerank: how Google orders your webpages

Effect of links on DHT routing algorithms

New Links

Google PageRank Check

Google PageRank

Google PageRank Check

Google Reduced Spammy Links

PageRank

SUPPORT LINKS TO YOUR BUSINESS PROFILE ON GOOGLE