Brief introduction to HITS Algorithm
Hypertext-Induced Topic Search • Algorithm developed by Kleinberg in 1998. • Attempts to computationally determine hubs and authorities on a particular topic by analyzing a relevant subgraph of the web. • Based on a mutually reinforcing relationship: • Hubs point to lots of authorities. • Authorities are pointed to by lots of hubs.
Key Terms • Hub • Authority • In-degree • Out-degree
Hubs & Authorities • Good hub: a page that points to many good authorities. • Good authority: a page pointed to by many good hubs. • Out-degree of p: number of nodes that p links to. • In-degree of p: number of nodes that link to p. • Given a keyword query, assign a hub value and an authority value to each page. • Pages with high authority values are returned as the results of the query.
Computing Hubs and Authorities • Computes hubs and authorities for a particular topic specified by a normal query. • First determines a set of relevant pages for the query, called the base set S. • Then analyzes the link structure of the web subgraph defined by S to find the authority and hub pages in this set.
Constructing a Base Subgraph • For a specific query Q, let the set of documents returned by a standard search engine be called the root set R. • Initialize S to R. • Add to S all pages pointed to by any page in R. • Add to S all pages that point to any page in R.
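The expansion from R to S can be sketched in a few lines of Python. This is only an illustration: the function name `build_base_set`, the `out_links` dictionary, and the page names are hypothetical, and a real system would obtain the link data from a crawler or a search engine index.

```python
def build_base_set(root_set, out_links):
    """Expand a root set R into the base set S.

    out_links is a hypothetical dict mapping each page to the pages it links to.
    """
    base = set(root_set)                      # Initialize S to R
    for page in root_set:                     # add pages pointed to by a page in R
        base.update(out_links.get(page, ()))
    for page, targets in out_links.items():   # add pages that point to a page in R
        if any(target in root_set for target in targets):
            base.add(page)
    return base


# Hypothetical toy link data, not taken from the slides.
links = {"a": ["b", "c"], "b": ["d"], "e": ["a"], "f": ["g"]}
base_set = build_base_set({"a"}, links)
print(sorted(base_set))  # ['a', 'b', 'c', 'e']
```

Pages "d", "f" and "g" stay outside S because they are neither linked from "a" nor link to it.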
Adjacency Matrix • Graphs can be represented by an adjacency matrix. • An adjacency matrix is a square matrix used to represent a finite graph. • Each element of the matrix indicates whether the corresponding pair of vertices is adjacent in the graph. • Adjacent pairs are marked with “1” and non-adjacent pairs with “0”.
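As a small sketch of this representation, the code below builds an adjacency matrix from an edge list. The helper `adjacency_matrix`, the node names, and the edges are assumptions made for illustration; they are not claimed to be the graph drawn in the slides. Row i, column j is 1 when node i links to node j.

```python
def adjacency_matrix(nodes, edges):
    """Return a square 0/1 matrix: entry [i][j] is 1 if nodes[i] links to nodes[j]."""
    index = {name: i for i, name in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for src, dst in edges:
        matrix[index[src]][index[dst]] = 1
    return matrix


# Hypothetical directed graph on four nodes.
NODES = ["N1", "N2", "N3", "N4"]
EDGES = [("N1", "N2"), ("N1", "N3"), ("N1", "N4"),
         ("N2", "N3"), ("N2", "N4"),
         ("N3", "N1"), ("N3", "N4")]

A = adjacency_matrix(NODES, EDGES)
for row in A:
    print(row)
```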
Graph from Adjacency Matrix • [Figure: an adjacency matrix and the corresponding directed graph on nodes N1, N2, N3, N4]
Hub and Authority • Out-degree and in-degree method: rank hubs by out-degree and authorities by in-degree. • Hub (rank) = N1, {N2, N3} tie, N4 • Authority (rank) = N4, N3, {N1, N2} tie
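The degree method amounts to row sums and column sums of the adjacency matrix. The sketch below uses a hypothetical 4-node matrix chosen so that its degrees agree with the rankings listed above; it is an assumption, not necessarily the matrix from the original figure.

```python
# Hypothetical adjacency matrix: A[i][j] == 1 means node i links to node j.
names = ["N1", "N2", "N3", "N4"]
A = [
    [0, 1, 1, 1],  # N1 -> N2, N3, N4
    [0, 0, 1, 1],  # N2 -> N3, N4
    [1, 0, 0, 1],  # N3 -> N1, N4
    [0, 0, 0, 0],  # N4 has no outgoing links
]

out_degree = [sum(row) for row in A]        # row sums
in_degree = [sum(col) for col in zip(*A)]   # column sums

# Sort by degree, highest first; nodes with equal degree are ties.
hub_rank = sorted(zip(names, out_degree), key=lambda pair: -pair[1])
authority_rank = sorted(zip(names, in_degree), key=lambda pair: -pair[1])

print(hub_rank)        # [('N1', 3), ('N2', 2), ('N3', 2), ('N4', 0)]
print(authority_rank)  # [('N4', 3), ('N3', 2), ('N1', 1), ('N2', 1)]
```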
Hub and Authority using Matrix • A = adjacency matrix • u = hub weight vector • v = authority weight vector • v = A^T * u • u = A * v • The initial ranks of the hubs are unknown, so every hub is assigned the same rank, conventionally 1. • u = (1, 1, 1, 1)^T
Hub and Authority using Matrix • v = A^T * u • [numeric value of v not recoverable] • Note that here we are using K = 1, where K is the number of iterations. • u = A * v • [numeric value of u not recoverable]
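One iteration (K = 1) of the two updates can be worked through in plain Python. The matrix below is a hypothetical example, so the numbers it produces are not claimed to be the values from the original slide. The helpers `transpose` and `mat_vec` are illustrative names.

```python
def transpose(matrix):
    """Return the transpose of a list-of-lists matrix."""
    return [list(col) for col in zip(*matrix)]


def mat_vec(matrix, vector):
    """Multiply a matrix by a column vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


# Hypothetical adjacency matrix for nodes N1..N4 (row i links to column j).
A = [
    [0, 1, 1, 1],
    [0, 0, 1, 1],
    [1, 0, 0, 1],
    [0, 0, 0, 0],
]

u0 = [1, 1, 1, 1]               # equal initial hub weights
v1 = mat_vec(transpose(A), u0)  # authority update: v = A^T * u
u1 = mat_vec(A, v1)             # hub update:       u = A * v

print(v1)  # [1, 1, 2, 3]
print(u1)  # [6, 5, 4, 0]
```

With equal starting weights, the first authority vector is simply the in-degree of each node, and each hub weight is the sum of the authority weights of the pages it links to.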
Hub and Authority using Matrix • Matrix method • Hub (rank) = N1, N2, N3, N4 • Authority (rank) = N4, N3, {N1, N2} tie
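Hub and Authority using Matrix • Let us calculate the ranks for hubs and authorities for K = 2. • First normalize the u and v obtained in the previous iteration: divide each element by the square root of the sum of squares of all elements of that vector. • u_norm = u / sqrt(sum of u_i^2) • v_norm = v / sqrt(sum of v_i^2) • Then recompute v = A^T * u and u = A * v [numeric values not recoverable]. • Note that there is no change in the hub and authority ranks.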
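A minimal sketch of the iteration with normalization, for an arbitrary number of iterations K, might look like the following. The function name `hits`, the choice to normalize after every update, and the example matrix are assumptions for illustration rather than the exact procedure of the slides.

```python
import math


def normalize(vector):
    """Divide each element by the square root of the sum of squares."""
    norm = math.sqrt(sum(x * x for x in vector))
    return list(vector) if norm == 0 else [x / norm for x in vector]


def hits(adjacency, k):
    """Run k iterations of v = A^T * u, u = A * v, normalizing after each update."""
    n = len(adjacency)
    u = [1.0] * n  # equal initial hub weights
    v = [0.0] * n
    for _ in range(k):
        # Authority of j: sum of the hub weights of pages i that link to j.
        v = normalize([sum(adjacency[i][j] * u[i] for i in range(n)) for j in range(n)])
        # Hub weight of i: sum of the authority weights of pages j that i links to.
        u = normalize([sum(adjacency[i][j] * v[j] for j in range(n)) for i in range(n)])
    return u, v


# Hypothetical adjacency matrix for nodes N1..N4 (row i links to column j).
A = [
    [0, 1, 1, 1],
    [0, 0, 1, 1],
    [1, 0, 0, 1],
    [0, 0, 0, 0],
]

hubs, authorities = hits(A, k=2)
print([round(x, 3) for x in hubs])
print([round(x, 3) for x in authorities])
```

Normalization only rescales the vectors, so it does not affect the ordering of pages within one iteration; it keeps the weights from growing without bound as K increases.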
PageRank vs. Authorities • PageRank (Google): • computed for all web pages stored in the database prior to the query • computes authorities only • trivial and fast to compute • HITS (CLEVER): • performed on the set of retrieved web pages for each query • computes authorities and hubs • easy to compute, but real-time execution is hard