Explore methods to quantify and visualize closeness among various search engines to aid users in selecting the most suitable option. Analyze rankings and distances to compare search engine policies.
Measuring Closeness of Search Engines - Identification of Outliers - Visualization of Closeness WangHua (王化), Department of Information Science, fourth year
Motivation • Too many search engines • More than 20 major general-purpose engines • More specific-purpose engines • Simple aggregation of rankings is popular. • We address the need to quantify and visualize the closeness between search engines.
Too Many Search Engines with Different Policies • Major search engines • Yahoo, Altavista, Google, Lycos, etc. • Distinct ranking policies • Directory type • Robot type • PageRank type, using hyperlinks
Outline of Methods • Ranking • List distance measure • Distance between search engines
Ranking • Partial List • Cases for WWW web sites • Top 100 list
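Footrule Distance among Ranking Lists • s, t: ranking lists; s(i), t(i): position of item i in each list • Σ_i |s(i) - t(i)| • [a,b,c,d,e] vs. [a,d,e,c,b]: 0+3+1+2+2 = 8 (terms listed for a, b, c, d, e in that order)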
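The footrule sum above can be sketched in a few lines of Python. This is a minimal illustration that assumes both lists rank exactly the same items; the function name `footrule_distance` is ours, not from the slides.

```python
def footrule_distance(s, t):
    """Footrule distance: sum over items of |position in s - position in t|."""
    if len(s) != len(t) or set(s) != set(t):
        raise ValueError("both rankings must contain exactly the same items")
    # Map each item to its position in t for constant-time lookup.
    pos_t = {item: rank for rank, item in enumerate(t)}
    return sum(abs(rank - pos_t[item]) for rank, item in enumerate(s))


# The example from the slide: per-item displacements are 0, 3, 1, 2, 2.
print(footrule_distance(list("abcde"), list("adecb")))  # 8
```

Positions are 0-based here, which does not change the result because only position differences enter the sum.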
Kendall-tau Distance • Definition [Dwork, WWW10, 2001] • Counts the number of pairwise disagreements between two lists: | { (i, j) : i < j, s(i) < s(j) but t(i) > t(j) } | • [a,b,c,d] vs. [a,d,c,b]: the 6 pairs (a,b) (a,c) (a,d) (b,c) (b,d) (c,d) give 0+0+0+1+1+1 = 3
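A direct way to check the pair count is to enumerate every pair and test whether the two lists order it differently. The sketch below does exactly that; it assumes both lists contain the same items, and the name `kendall_tau_distance` is illustrative.

```python
from itertools import combinations


def kendall_tau_distance(s, t):
    """Number of item pairs that s and t place in opposite relative order."""
    if len(s) != len(t) or set(s) != set(t):
        raise ValueError("both rankings must contain exactly the same items")
    pos_s = {item: rank for rank, item in enumerate(s)}
    pos_t = {item: rank for rank, item in enumerate(t)}
    # A pair disagrees when the sign of the position difference flips.
    return sum(
        1
        for x, y in combinations(s, 2)
        if (pos_s[x] - pos_s[y]) * (pos_t[x] - pos_t[y]) < 0
    )


# The example from the slide: (b,c), (b,d) and (c,d) are reversed.
print(kendall_tau_distance(list("abcd"), list("adcb")))  # 3
```

This version inspects all n(n-1)/2 pairs, so it is quadratic in the list length; it is meant as a readable reference, not as the fastest option.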
Character of Distance • Kendall-tau can be computed in O(n log n) time • Satisfies the triangle inequality and the norm-distance properties
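One common way to reach the O(n log n) bound is to rewrite one list in terms of the other list's positions and count inversions with merge sort. The slides do not say which algorithm was used, so the following is a sketch of that standard technique under the assumption that both lists contain the same items; `kendall_tau_fast` is a hypothetical name.

```python
def kendall_tau_fast(s, t):
    """Kendall-tau distance via merge-sort inversion counting, O(n log n)."""
    pos_t = {item: rank for rank, item in enumerate(t)}
    # Positions in t, read in the order given by s; each inversion in this
    # sequence is one pair that the two lists order differently.
    seq = [pos_t[item] for item in s]

    def sort_count(a):
        if len(a) <= 1:
            return a, 0
        mid = len(a) // 2
        left, inv_left = sort_count(a[:mid])
        right, inv_right = sort_count(a[mid:])
        merged, inv = [], inv_left + inv_right
        i = j = 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                merged.append(left[i])
                i += 1
            else:
                # right[j] jumps ahead of every element still waiting in left.
                merged.append(right[j])
                j += 1
                inv += len(left) - i
        merged.extend(left[i:])
        merged.extend(right[j:])
        return merged, inv

    return sort_count(seq)[1]


print(kendall_tau_fast(list("abcd"), list("adcb")))  # 3
```

For top-100 lists the quadratic pair enumeration is already fast enough; the merge-sort variant matters mainly when many long lists are compared.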
Matrix of Distance • Keyword = “university”
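A distance matrix of this kind can be built by computing a list distance for every pair of engines. The sketch below uses the footrule distance and, as an assumption of ours, restricts each pair of result lists to the items they share, since top lists from different engines rarely contain the same pages. The engine names and URLs are placeholders, not data from the slides.

```python
from itertools import combinations


def footrule_on_shared(s, t):
    """Footrule distance after restricting both lists to their common items."""
    shared = set(s) & set(t)
    s_proj = [x for x in s if x in shared]
    t_proj = [x for x in t if x in shared]
    pos_t = {item: rank for rank, item in enumerate(t_proj)}
    return sum(abs(rank - pos_t[item]) for rank, item in enumerate(s_proj))


def distance_matrix(rankings):
    """Symmetric matrix (dict of dicts) of pairwise distances between engines."""
    names = sorted(rankings)
    matrix = {a: {b: 0 for b in names} for a in names}
    for a, b in combinations(names, 2):
        d = footrule_on_shared(rankings[a], rankings[b])
        matrix[a][b] = matrix[b][a] = d
    return names, matrix


# Placeholder result lists for one keyword (hypothetical, for illustration).
toy_rankings = {
    "engine_A": ["u1", "u2", "u3", "u4"],
    "engine_B": ["u2", "u1", "u3", "u5"],
    "engine_C": ["u4", "u3", "u2", "u1"],
}

names, matrix = distance_matrix(toy_rankings)
for a in names:
    print(a, [matrix[a][b] for b in names])
```

Restricting to shared items is only one way to handle partial lists; a pair of engines with little overlap will look artificially close under it, so the overlap size is worth reporting alongside the distance.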
Visualization • Kernighan-Lin Algorithm • Kamada Spring Model • Comparison of the 2 methods
Kernighan-Lin Method • Brief explanation
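The slides do not spell out how Kernighan-Lin was applied, so the following is only a sketch of the classic bisection heuristic: split the engines into two groups so that the total similarity crossing the split is small, then color each group. Turning distances into similarities as `max distance - distance` is our assumption, and all names and numbers are illustrative.

```python
def cut_gain(weights, side, a, b):
    """Reduction in cut weight obtained by swapping a and b across the cut."""
    def d_value(v):
        ext = sum(w for u, w in enumerate(weights[v]) if u != v and side[u] != side[v])
        internal = sum(w for u, w in enumerate(weights[v]) if u != v and side[u] == side[v])
        return ext - internal
    return d_value(a) + d_value(b) - 2 * weights[a][b]


def kernighan_lin_bisection(weights, max_passes=10):
    """Split nodes 0..n-1 into sides 0 and 1, heuristically minimizing cut weight."""
    n = len(weights)
    side = [0 if i < n // 2 else 1 for i in range(n)]  # arbitrary initial split
    for _ in range(max_passes):
        work, locked, swaps, gains = side[:], set(), [], []
        for _ in range(n // 2):
            candidates = [
                (cut_gain(weights, work, a, b), a, b)
                for a in range(n) if work[a] == 0 and a not in locked
                for b in range(n) if work[b] == 1 and b not in locked
            ]
            if not candidates:
                break
            gain, a, b = max(candidates)
            work[a], work[b] = 1, 0        # tentative swap
            locked.update((a, b))
            swaps.append((a, b))
            gains.append(gain)
        # Keep the prefix of tentative swaps with the best cumulative gain.
        best_k, best_total, total = 0, 0, 0
        for k, g in enumerate(gains, start=1):
            total += g
            if total > best_total:
                best_k, best_total = k, total
        if best_k == 0:
            break                          # no improving prefix: local optimum
        for a, b in swaps[:best_k]:
            side[a], side[b] = side[b], side[a]
    return side


# Hypothetical distances: engines 0 and 2 are close, as are engines 1 and 3.
dist = [[0, 5, 1, 5], [5, 0, 5, 1], [1, 5, 0, 5], [5, 1, 5, 0]]
max_d = max(max(row) for row in dist)
weights = [[0 if i == j else max_d - dist[i][j] for j in range(4)] for i in range(4)]
groups = kernighan_lin_bisection(weights)
print(groups)  # engines with the same label would share a color
```

The heuristic only guarantees a local optimum and always produces two groups of (nearly) equal size, which may or may not match how the engines actually cluster.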
Kernighan-Lin by Color Coding • Keyword1=“Totti” Keyword2=“Nakata”
Kernighan-Lin by Color Coding • Keyword1=“Gucci” Keyword2=“Hermes”
Kamada Spring Model • Brief explanation
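As a rough illustration of the spring idea: each pair of engines is joined by a spring whose natural length is their list distance, and the layout is the set of 2D positions that minimizes the total spring energy. The original Kamada-Kawai method moves one node at a time with Newton-Raphson steps; the sketch below swaps in plain gradient descent with step halving to stay short, and the distance values are placeholders.

```python
import math


def spring_energy(pos, dist):
    """Sum over pairs of 0.5 * k_ij * (|p_i - p_j| - d_ij)^2 with k_ij = 1 / d_ij^2."""
    energy = 0.0
    for i in range(len(pos)):
        for j in range(i + 1, len(pos)):
            r = math.hypot(pos[i][0] - pos[j][0], pos[i][1] - pos[j][1])
            energy += 0.5 * (r - dist[i][j]) ** 2 / dist[i][j] ** 2
    return energy


def spring_gradient(pos, dist):
    """Gradient of spring_energy with respect to every node position."""
    grad = [[0.0, 0.0] for _ in pos]
    for i in range(len(pos)):
        for j in range(len(pos)):
            if i == j:
                continue
            dx, dy = pos[i][0] - pos[j][0], pos[i][1] - pos[j][1]
            r = math.hypot(dx, dy) or 1e-9  # avoid division by zero
            coeff = (r - dist[i][j]) / (dist[i][j] ** 2 * r)
            grad[i][0] += coeff * dx
            grad[i][1] += coeff * dy
    return grad


def circle_positions(n):
    """Initial layout: nodes evenly spaced on the unit circle."""
    return [[math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n)] for i in range(n)]


def spring_layout(dist, iterations=200):
    """Lower the spring energy by gradient descent; requires dist[i][j] > 0 for i != j."""
    pos = circle_positions(len(dist))
    energy, step = spring_energy(pos, dist), 1.0
    for _ in range(iterations):
        grad = spring_gradient(pos, dist)
        trial = [[p[0] - step * g[0], p[1] - step * g[1]] for p, g in zip(pos, grad)]
        trial_energy = spring_energy(trial, dist)
        if trial_energy < energy:
            pos, energy = trial, trial_energy  # accept only improving moves
        else:
            step *= 0.5                        # otherwise retry with a smaller step
    return pos, energy


# Hypothetical distances between three engines.
toy_dist = [[0, 1, 2], [1, 0, 2], [2, 2, 0]]
layout, final_energy = spring_layout(toy_dist)
print(layout, final_energy)
```

Because only improving moves are accepted, the energy never increases, but the result is a local minimum and depends on the starting positions.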
Kamada Spring Model • Keyword1=“Totti” Keyword2=“Nakata”
Results • Distances between search engines vary from pair to pair. • Different fields show different characteristics. • Some search engines, such as Sprinks, are far from the others. • Excite and AOL are close to each other in most cases.
Conclusion • Addressed the need to quantify and visualize the closeness between search engines. • Provides users with a GUI to see the closeness of search engines. • Helps users select the proper search engines. • Helps users see the features of each search engine in various fields.
Future Work • Use more search engines • Use both general-purpose and special-purpose search engines • Use hyperlinks to find the resemblance • Apply this idea to other fields