Explore how Static and Dynamic Scoring in Web Page Grouping improves accuracy and relevance of information retrieval on the internet. Discover the approach for solving problems in PageRank and HITS algorithms by enhancing link structures and relevance to retrieval queries. Experiment and evaluate the effectiveness of the proposed technique.
Static and Dynamic Scoring by Web Page Grouping Hitoshi NAKAKUBO, Takashi SATO
Introduction • A huge amount of information exists on the WWW. • Extracting the information that internet users need is difficult. • Web Search Engine • It extracts information by a simple full-text search. • However, the accuracy of a simple full-text search is limited.
Related Works • Link Structure Analysis • PageRank Algorithm • This algorithm treats a link as a recommendation of the linked web page. • HITS Algorithm • This algorithm defines two scores, Authority and Hub. • This algorithm can extract web communities that hold similar information.
Problems of Related Works • PageRank Algorithm • Is a link really a recommendation? • Some web sites refuse links to any page other than a specific one. • HITS Algorithm • This algorithm has a known problem. • It cannot always extract appropriate web communities.
Our Approach for Problem Solving • PageRank Algorithm • Problem • The score is decided based on the adjacency relation. • Our approach • We treat the adjacency relations of the link structure as being resolved recursively. Enhancement of the link structure
Our Approach for Problem Solving • HITS Algorithm • Problem • Web pages unrelated to the retrieval query are taken into account. • Our approach • We consider the similarity between the area the algorithm is applied to and the retrieval query. Relation to the retrieval query
Proposal • Web Page Grouping • Forming groups of web pages with similar information • Static and Dynamic Scoring • Link structure analysis with Web Page Grouping • Ranking • The final rank is decided by score annexation (combining the scores).
Web Page Grouping • Purpose • Enhancing the adjacency relations in the link structure • Concept • Groups are formed from web pages with similar information. • Similar information: same authors / same contents • Two kinds of methods: directory structure / link structure
Directory Structure Method • The directory structure of a web site is defined as a tree structure. • Leaves that share the same branch are made into one group. [Figure: directory tree under the document root with pages A to E; the grouping algorithm turns it into web page groups]
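The directory structure method above can be sketched as follows. This is only an illustration, assuming that "leaves which have the same branch" means pages that share the same host and parent directory; the function name `group_by_directory` and the example URLs are hypothetical and do not come from the slides.

```python
from collections import defaultdict
from posixpath import dirname
from urllib.parse import urlsplit


def group_by_directory(urls):
    """Group pages that share the same host and parent directory.

    Assumption: the parent directory of the URL path stands in for the
    "branch" of the directory tree described on the slide.
    """
    groups = defaultdict(list)
    for url in urls:
        parts = urlsplit(url)
        # dirname("/a/b.html") is "/a"; an empty path falls back to "/".
        key = (parts.netloc, dirname(parts.path) or "/")
        groups[key].append(url)
    return dict(groups)


# Hypothetical pages on one site.
pages = [
    "http://example.com/index.html",
    "http://example.com/a/b.html",
    "http://example.com/a/c.html",
    "http://example.com/d/e.html",
]
groups = group_by_directory(pages)
```

Here the two pages under `/a` end up in one group, while the other two pages each form a group of their own.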
Static Scoring • Purpose • Determine the importance of web pages. • Mitigate the problem of the PageRank algorithm. • Concept • Target documents: all web pages • Target link structure: after Grouping
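One way to picture scoring on the link structure "after Grouping" is to collapse page-level links into group-level links and then run a PageRank-style iteration on the smaller graph. The sketch below rests on assumptions that the slides do not state: links inside a group are dropped, duplicate group links are merged, and a standard damped power iteration is used. The names `collapse_links` and `pagerank` and the sample graph are hypothetical.

```python
def collapse_links(links, group_of):
    """Map page-level links to group-level links.

    Assumption: links between pages of the same group are dropped and
    parallel links between two groups are merged into one.
    """
    grouped = set()
    for src, dst in links:
        gs, gd = group_of[src], group_of[dst]
        if gs != gd:
            grouped.add((gs, gd))
    return grouped


def pagerank(nodes, links, damping=0.85, iterations=50):
    """Plain damped power iteration; dangling mass is spread uniformly."""
    nodes = sorted(nodes)
    n = len(nodes)
    out = {node: [] for node in nodes}
    for src, dst in links:
        out[src].append(dst)
    score = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        dangling = sum(score[node] for node in nodes if not out[node])
        nxt = {node: (1.0 - damping) / n + damping * dangling / n
               for node in nodes}
        for node in nodes:
            for dst in out[node]:
                nxt[dst] += damping * score[node] / len(out[node])
        score = nxt
    return score


# Hypothetical pages A..E assigned to three groups.
group_of = {"A": "g1", "B": "g1", "C": "g2", "D": "g2", "E": "g3"}
page_links = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "E"), ("E", "A")]
group_links = collapse_links(page_links, group_of)
group_scores = pagerank(set(group_of.values()), group_links)
```

In this toy graph the five page-level links shrink to three group-level links, which matches the decrease in nodes and links that the slides report for the static case.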
Dynamic Scoring • Purpose • Determine the importance of web pages depending on the retrieval query. • Mitigate the problem of the HITS algorithm. • Concept • Target documents: full-text search result set • Target link structure: before Grouping (#1) / after Grouping (#2)
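The idea of restricting the analysis to the full-text search result set can be illustrated with a HITS-style iteration on the induced subgraph. This is a sketch under assumptions: the slides do not say how the Dynamic Score is derived from Authority and Hub values, so the code only computes both. The function name `hits` and the sample data are hypothetical.

```python
from math import sqrt


def _l2_normalize(values):
    """Scale a score dictionary to unit Euclidean length (no-op if all zero)."""
    norm = sqrt(sum(v * v for v in values.values()))
    if norm == 0.0:
        return dict(values)
    return {k: v / norm for k, v in values.items()}


def hits(result_set, links, iterations=50):
    """Authority and hub scores on the subgraph induced by the result set."""
    nodes = set(result_set)
    # Keep only links whose both ends are in the search result set.
    links = [(s, d) for s, d in links if s in nodes and d in nodes]
    auth = {n: 1.0 for n in nodes}
    hub = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        new_auth = {n: 0.0 for n in nodes}
        for s, d in links:
            new_auth[d] += hub[s]
        new_hub = {n: 0.0 for n in nodes}
        for s, d in links:
            new_hub[s] += new_auth[d]
        auth = _l2_normalize(new_auth)
        hub = _l2_normalize(new_hub)
    return auth, hub


# Hypothetical result set; page D is linked but was not retrieved.
result_set = {"A", "B", "C"}
all_links = [("A", "C"), ("B", "C"), ("C", "D")]
auth, hub = hits(result_set, all_links)
```

For variant #2, the same function could be applied to group-level links instead of page-level links, again as an assumption about how "after Grouping" is realized.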
Ranking • Purpose • Decide the final score and the rank. • Concept • Each score is normalized. • A power root is applied to each score. • Each score is multiplied by its weighting factor, and the products are summed. • The weighting factors are decided by experiment.
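The first two steps of the ranking concept can be sketched as below. The slides name neither the normalization nor the root that is used, so dividing by the maximum score and taking a square root are assumptions made purely for illustration; `normalize_and_root` is a hypothetical name.

```python
def normalize_and_root(scores, root=2.0):
    """Scale scores into [0, 1] by the maximum, then apply a power root.

    Assumptions: max-normalization and a square root by default; scores
    are non-negative.
    """
    top = max(scores.values(), default=0.0)
    if top <= 0.0:
        return {page: 0.0 for page in scores}
    return {page: (s / top) ** (1.0 / root) for page, s in scores.items()}


# Hypothetical raw scores for three pages.
raw = {"A": 4.0, "B": 1.0, "C": 0.0}
leveled = normalize_and_root(raw)
```

Taking a root compresses the gap between high and low scores, so one score with a very skewed distribution is less likely to dominate the weighted sum.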
Experiment • Purpose • Verify the effectiveness of the proposed technique. • Experiment items • Grouping evaluation • Score evaluation • Verification of the best weighting factor values
Environment • Full-text search system • Variable-length gram base index • Retrieval target • Test collection “NW100G-01” (NTCIR-4 Web) • Retrieval query • 77 queries (NTCIR-4 Web) • Evaluation method • Weighted Reciprocal Rank
Grouping Evaluation: Number of Web Pages in Each Group • The number of web pages is unevenly distributed across groups. • This affects the number of links in each group. The grouping technique needs to be reexamined.
Grouping Evaluation: Comparison of Grouping Results • Static: number of nodes decreases, number of links decreases • Dynamic: number of nodes decreases, number of links increases The processing results are as expected.
Score Evaluation: Comparison of Scoring Results • Grouping levels out the scores. The ability to extract relevant documents decreases.
Score Evaluation: Weighted Reciprocal Rank • Static Score alone with Grouping applied: relevant documents cannot be extracted.
Score Evaluation: Weighted Reciprocal Rank • Dynamic Score: which score is superior changes with the rank.
Static Score Evaluation: Relevant Document Extraction Ratio [Chart legend: None / Not Apply & Apply / Not Apply / Apply] Grouping … Not Apply: 61% / Apply: 13%
Dynamic Score Evaluation: Relevant Document Extraction Ratio [Chart legend: None / Not Apply & Apply / Not Apply / Apply] Grouping … Not Apply: 32% / Apply: 31%
Examination of Score Annexation • Each score on its own has little influence on the rank. • The scores with and without Grouping have opposite characteristics. The scores are annexed around a specific base score. • The scores with and without Grouping are both annexed.
Annexation Score Calculation Expression • Annex Score(p) = Wr · Search Score(p) + Static Score(p) + Dynamic Score(p) • Static Score(p) = Ws1 · Static Score w/o Grouping(p) + Ws2 · Static Score w/ Grouping(p) • Dynamic Score(p) = Wd1 · Dynamic Score #1(p) + Wd2 · Dynamic Score #2(p)
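The calculation expression can be written out directly for a single page p. The function and parameter names below are hypothetical; the arithmetic follows the three expressions on the slide, and the input values are made up for the example.

```python
def annex_score(search, static_plain, static_grouped,
                dynamic_1, dynamic_2, weights):
    """Annex Score(p) for one page.

    weights is the tuple (Wr, Ws1, Ws2, Wd1, Wd2) from the slide.
    """
    wr, ws1, ws2, wd1, wd2 = weights
    static = ws1 * static_plain + ws2 * static_grouped
    dynamic = wd1 * dynamic_1 + wd2 * dynamic_2
    return wr * search + static + dynamic


# Made-up scores for one page; the Dynamic Scores are switched off here.
example = annex_score(
    search=0.5,
    static_plain=0.2,
    static_grouped=0.4,
    dynamic_1=0.1,
    dynamic_2=0.3,
    weights=(2, 1, 1, 0, 0),
)
```

With these inputs the result is 2 · 0.5 + (0.2 + 0.4) + 0, which is about 1.6.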
Weighting Factor Best Value Verification • Notation: (Wr, Ws1, Ws2, Wd1, Wd2) [ Rank ] • Wr ∈ {1, 2}, Wx ∈ {0, 1, 2}, x ∈ {s1, s2, d1, d2} …
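The candidate weighting factors described by these ranges can be enumerated mechanically. The sketch below only lists the combinations; how each combination was evaluated in the experiment is not shown on the slide, so no evaluation step is included. `weight_grid` is a hypothetical name.

```python
from itertools import product


def weight_grid():
    """All (Wr, Ws1, Ws2, Wd1, Wd2) with Wr in {1, 2} and the rest in {0, 1, 2}."""
    return [
        (wr, ws1, ws2, wd1, wd2)
        for wr in (1, 2)
        for ws1, ws2, wd1, wd2 in product((0, 1, 2), repeat=4)
    ]


grid = weight_grid()
# Combinations that leave out both Dynamic Scores (Wd1 = Wd2 = 0).
without_dynamic = [w for w in grid if w[3] == 0 and w[4] == 0]
```

The grid has 2 · 3⁴ = 162 combinations, of which 18 exclude both Dynamic Scores.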
Weighting Factor Best Value Verification • Annexation score that does not contain Dynamic Scores …
Weighting Factor Best Value Verification • Annexation score including Dynamic Score #1 or #2 …
Weighting Factor Best Value Verification • Annexation score including both Dynamic Scores …
Weighted Reciprocal Rank vs. “Full-text Search + PageRank” • +6% • +180%
Consideration of Web Page Grouping • Method • The number of web pages is unevenly distributed across groups. • Effect • Static: number of nodes decreases, number of links decreases • Dynamic: number of nodes decreases, number of links increases Effectiveness is confirmed. The grouping technique needs to be reexamined.
Consideration of Static Scoring • Influence of Grouping • Change in which documents are scored • It scores documents different from those scored by existing techniques. • Leveling of scores • The influence on the ranking decreases. Documents that existing techniques cannot extract can be extracted. It cannot change the ranking greatly.
Consideration of Dynamic Scoring • Accuracy is very poor. • Many irrelevant documents are extracted. • Influence of the accuracy of Grouping • Feature of each score • #1: extracts the same documents as the existing technique; little influence • #2: extracts documents different from the existing technique; large influence The experiment needs to be repeated.
Consideration of Ranking • Evaluation result • The best-performing weighting factors do not annex the Dynamic Scores. • Influence of Grouping accuracy • Accuracy improves by about 6% compared with the existing technique. • Score annexation expression • It cannot be determined from this experiment. The accuracy improvement from the proposed technique is confirmed.
Conclusion • We proposed a ranking technique based on Grouping. • The effectiveness of each proposed technique was confirmed. • An accuracy improvement from the proposed technique was confirmed. • Future Work • Reexamination of Grouping • Investigation of the web page compositions for which each technique works effectively
Score Evaluation: Comparison of Scoring Results • The score distribution changes depending on whether Grouping is applied. Maximum: Not Apply > Apply Minimum: Not Apply < Apply