Measuring Closeness of Search Engines - Identification of Outliers - Visualization of Closeness. Wang Hua (王化), fourth-year student, Department of Information Science
Motivation • Too many search engines • More than 20 major general-purpose engines • More specific-purpose engines • Simple aggregation of rankings is popular. • We address the need to quantify and visualize the closeness between search engines.
Too Many Search Engines with Different Policies • Major search engines • Yahoo, AltaVista, Google, Lycos, etc. • Distinct ranking policies • Directory type • Robot type • PageRank type using hyperlinks
Outline of Methods • Ranking • List distance measure • Distance between search engines
Ranking • Partial List • Cases for WWW web sites • Top 100 list
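Footrule Distance among Ranking Lists • s, t: ranking lists • Σ_i |s(i) − t(i)|, where s(i) is the position of item i in list s • [a,b,c,d,e] vs. [a,d,e,c,b]: 0+3+1+2+2 = 8 (terms for items a, b, c, d, e in that order)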
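The footrule sum above can be sketched in a few lines of Python. This is a minimal illustration, assuming both lists rank exactly the same items with no ties; the function name `footrule_distance` is hypothetical and not taken from the slides.

```python
def footrule_distance(s, t):
    """Sum over items of |position in s - position in t|.

    Assumes s and t are full rankings of the same items (no ties).
    """
    if len(s) != len(set(s)) or len(s) != len(t) or set(s) != set(t):
        raise ValueError("both lists must rank the same distinct items")
    # Map each item to its 0-based position in t.
    pos_t = {item: rank for rank, item in enumerate(t)}
    return sum(abs(rank - pos_t[item]) for rank, item in enumerate(s))


# The example from the slide.
print(footrule_distance(list("abcde"), list("adecb")))  # 8
```

Using 0-based or 1-based positions gives the same result, since only position differences enter the sum.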
Kendall-tau Distance • Definition [Dwork, WWW10, 2001] • Counts the number of pairwise disagreements between two lists: |{ (i, j) : i < j, s(i) < s(j) but t(i) > t(j) }| • [a,b,c,d] vs. [a,d,c,b]: 6 pairs (a,b) (a,c) (a,d) (b,c) (b,d) (c,d) give 0+0+0+1+1+1 = 3
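The pairwise-disagreement count can be written directly from the definition. The sketch below is a straightforward quadratic-time version, assuming both lists are full rankings of the same items; the name `kendall_tau_distance` is illustrative only.

```python
from itertools import combinations


def kendall_tau_distance(s, t):
    """Count item pairs that appear in opposite relative order in s and t.

    Assumes s and t are full rankings of the same items (no ties).
    """
    pos_s = {item: rank for rank, item in enumerate(s)}
    pos_t = {item: rank for rank, item in enumerate(t)}
    return sum(
        1
        for x, y in combinations(s, 2)
        # A negative product means the pair is ordered differently in s and t.
        if (pos_s[x] - pos_s[y]) * (pos_t[x] - pos_t[y]) < 0
    )


# The example from the slide: (b,c), (b,d) and (c,d) disagree.
print(kendall_tau_distance(list("abcd"), list("adcb")))  # 3
```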
Characteristics of the Distance • Kendall-tau distance can be computed in O(n log n) time • Satisfies the triangle inequality, so it is a true distance (metric) on full rankings
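One standard way to reach the O(n log n) bound is to count inversions with merge sort. The slides do not say which algorithm was used, so the following is only a sketch under that assumption, and `kendall_tau_fast` is a hypothetical name.

```python
def kendall_tau_fast(s, t):
    """Kendall-tau distance via merge-sort inversion counting, O(n log n).

    Assumes s and t are full rankings of the same items (no ties).
    """
    pos_t = {item: rank for rank, item in enumerate(t)}
    # Rewrite s as the sequence of positions its items hold in t;
    # each inversion in this sequence is one pairwise disagreement.
    seq = [pos_t[item] for item in s]

    def sort_count(a):
        if len(a) <= 1:
            return a, 0
        mid = len(a) // 2
        left, inv_left = sort_count(a[:mid])
        right, inv_right = sort_count(a[mid:])
        merged, inv = [], inv_left + inv_right
        i = j = 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                merged.append(left[i])
                i += 1
            else:
                # right[j] jumps ahead of every remaining element of left.
                merged.append(right[j])
                j += 1
                inv += len(left) - i
        merged.extend(left[i:])
        merged.extend(right[j:])
        return merged, inv

    return sort_count(seq)[1]


print(kendall_tau_fast(list("abcd"), list("adcb")))  # 3
```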
Matrix of Distance • Keyword = “university”
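The matrix shown on this slide is not preserved in the transcript. As an illustration of how such a matrix might be assembled, the sketch below computes pairwise footrule distances for a few made-up rankings; the engine names, URLs, and the `distance_matrix` helper are all hypothetical and do not reproduce the "university" data.

```python
def footrule(s, t):
    """Footrule distance between two full rankings of the same items."""
    pos_t = {item: rank for rank, item in enumerate(t)}
    return sum(abs(rank - pos_t[item]) for rank, item in enumerate(s))


def distance_matrix(rankings, dist=footrule):
    """Return (engine names, symmetric matrix of pairwise list distances)."""
    names = sorted(rankings)
    matrix = [[dist(rankings[a], rankings[b]) for b in names] for a in names]
    return names, matrix


# Made-up result lists for one keyword (not data from the slides).
rankings = {
    "engine_A": ["u1", "u2", "u3", "u4"],
    "engine_B": ["u1", "u3", "u2", "u4"],
    "engine_C": ["u4", "u3", "u2", "u1"],
}

names, matrix = distance_matrix(rankings)
for name, row in zip(names, matrix):
    print(name, row)
```

Real top-100 lists from different engines rarely contain the same URLs, so a practical version would first need a rule for items missing from one list; that step is left out here.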
Visualization • Kernighan-Lin Algorithm • Kamada Spring Model • Comparison of the 2 methods
Kernighan-Lin Method • Brief explanation
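The transcript keeps no details of this explanation. As a rough reminder of the idea, here is a compact sketch of textbook Kernighan-Lin bisection: starting from two equal-sized groups, it repeatedly swaps pairs of nodes to reduce the total weight of edges crossing between the groups. It assumes `w` is a symmetric matrix of closeness weights (larger means more similar); how the slides turned list distances into weights is not stated, so that mapping is left out.

```python
def kernighan_lin(w, side_a, side_b, max_passes=10):
    """Refine a two-way partition to reduce the weight of cut edges.

    w is a symmetric weight matrix; side_a and side_b are equal-sized
    lists of node indices. Returns the refined (side_a, side_b).
    """
    a, b = list(side_a), list(side_b)
    for _ in range(max_passes):
        in_a = set(a)
        # d[v] = (weight to the other side) - (weight to v's own side).
        d = {}
        for v in a + b:
            same, other = (a, b) if v in in_a else (b, a)
            d[v] = sum(w[v][u] for u in other) - sum(w[v][u] for u in same if u != v)
        free_a, free_b = set(a), set(b)
        swaps, gains = [], []
        while free_a and free_b:
            # Pick the unlocked pair whose swap reduces the cut the most.
            g, x, y = max(
                ((d[x] + d[y] - 2 * w[x][y], x, y)
                 for x in sorted(free_a) for y in sorted(free_b)),
                key=lambda item: item[0],
            )
            swaps.append((x, y))
            gains.append(g)
            free_a.remove(x)
            free_b.remove(y)
            # Update d as if x and y had already changed sides.
            for u in free_a:
                d[u] += 2 * w[u][x] - 2 * w[u][y]
            for u in free_b:
                d[u] += 2 * w[u][y] - 2 * w[u][x]
        # Keep only the prefix of swaps with the best cumulative gain.
        best_k, best_total, total = 0, 0, 0
        for k, g in enumerate(gains, 1):
            total += g
            if total > best_total:
                best_k, best_total = k, total
        if best_k == 0:
            break  # no improving sequence of swaps remains
        for x, y in swaps[:best_k]:
            a[a.index(x)] = y
            b[b.index(y)] = x
    return a, b


# Toy weights: nodes 0 and 1 are close, and nodes 2 and 3 are close.
w = [
    [0, 5, 1, 1],
    [5, 0, 1, 1],
    [1, 1, 0, 5],
    [1, 1, 5, 0],
]
group_a, group_b = kernighan_lin(w, [0, 2], [1, 3])
print(sorted(group_a), sorted(group_b))
```

The two resulting groups can then be shown in two colors, which is presumably what the color-coded slides display.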
Kernighan-Lin by Color Coding • Keyword1 = “Totti” • Keyword2 = “Nakata”
Kernighan-Lin by Color Coding • Keyword1 = “Gucci” • Keyword2 = “Hermes”
Kamada Spring Model • Brief explanation
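The explanation itself is missing from the transcript. In a spring model of the Kamada-Kawai kind, every pair of nodes is joined by a spring whose natural length is the desired distance, and the layout minimizes the total spring energy. The sketch below is a simplified stand-in that uses plain gradient descent rather than the Newton-Raphson node updates of the original Kamada-Kawai method; the function name, step size, and iteration count are assumptions chosen for this toy example.

```python
import math


def spring_layout(dist, steps=5000, lr=0.5):
    """Place points in 2D so pairwise distances approximate dist.

    Minimizes sum over pairs of k_ij / 2 * (|p_i - p_j| - dist[i][j])**2
    with k_ij = 1 / dist[i][j]**2, by simple gradient descent.
    Assumes dist is symmetric with positive off-diagonal entries.
    """
    n = len(dist)
    # Deterministic start: points spread evenly on a unit circle.
    pos = [[math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n)]
           for i in range(n)]
    for _ in range(steps):
        for i in range(n):
            gx = gy = 0.0
            for j in range(n):
                if i == j:
                    continue
                dx = pos[i][0] - pos[j][0]
                dy = pos[i][1] - pos[j][1]
                cur = math.hypot(dx, dy) or 1e-9  # avoid division by zero
                k = 1.0 / dist[i][j] ** 2          # spring stiffness
                coef = k * (cur - dist[i][j]) / cur
                gx += coef * dx
                gy += coef * dy
            pos[i][0] -= lr * gx
            pos[i][1] -= lr * gy
    return pos


# Toy target distances (a 3-4-5 triangle), not data from the slides.
dist = [
    [0, 3, 4],
    [3, 0, 5],
    [4, 5, 0],
]
layout = spring_layout(dist)
for i in range(3):
    for j in range(i + 1, 3):
        print(i, j, round(math.dist(layout[i], layout[j]), 3))
```

With more than three search engines the target distances usually cannot all be met exactly in the plane, so the layout shows a best compromise rather than the exact matrix.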
Kamada Spring Model • Keyword1 = “Totti” • Keyword2 = “Nakata”
Results • Distances between search engines vary from pair to pair. • Different fields show different characteristics. • Some search engines, such as Sprinks, are far away from the others. • Excite and AOL are near each other in most cases.
Conclusion • Addressed the need to quantify and visualize the closeness between search engines. • Provide users with a GUI to see the closeness of search engines. • Help users select the proper search engines. • Help users see the features of each search engine in various fields.
Future Work • Use more search engines • Use both general-purpose and special-purpose search engines • Use hyperlinks to find the resemblance • Apply this idea to other fields