Optimizing Web Search Using Social Annotations. By Worasit Choochaiwattana. Agenda. Optimizing Web Search Using Social Annotations Studies on Improving the Quality of Web Search Social Annotation Based Web Search Experimental Results Discussion and Conclusion.
Optimizing Web Search Using Social Annotations By Worasit Choochaiwattana
Agenda • Optimizing Web Search Using Social Annotations • Studies on Improving the Quality of Web Search • Social Annotation Based Web Search • Experimental Results • Discussion and Conclusion
Optimizing Web Search Using Social Annotations • Exploring the use of social annotations to improve web search • Social annotations can benefit web search • They are good summaries of the corresponding web pages • The count of annotations indicates the popularity of a web page • The study proposes SocialSimRank (SSR) and SocialPageRank (SPR)
Studies on Improving the Quality of Web Search • Two aspects • Ordering the web pages according to query-document similarity, e.g. anchor text generation, metadata extraction, link analysis, and search log mining • Ordering the web pages according to their quality, also known as query-independent ranking or static ranking, e.g. PageRank, HITS, and fRank • The retrieved results are ranked based on both page quality and query-page similarity
Studies on Improving the Quality of Web Search • Web users are creating annotations for web pages at an incredible speed • del.icio.us has more than 1 million registered users • Social annotations are useful information that can be used in various ways, e.g. folksonomy, the Semantic Web, and enterprise search
Social Annotation Based Web Search • Web page creators provide not only the web pages and anchor texts for similarity ranking, but also the link structure for static ranking • The interaction logs of search engine users also benefit web search by providing click-through data • Social annotation based web search focuses on how web page annotators can contribute to web search
Social Annotation Based Web Search • SocialSimRank(SSR) measures the similarity between the query and annotations based on their semantic relation. • SocialPageRank(SPR) measures the popularity of web pages from web page annotators’ point of view.
Similarity Ranking Between Query and Social Annotations • Term-Matching Based Similarity Ranking • Calculate the similarity based on the count of terms shared between the query and the annotations • Some pages' annotations are quite sparse, and the term-matching based approach suffers to some degree from the synonymy problem
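The term-matching idea above can be sketched in a few lines. This is a minimal illustration, assuming whitespace tokenization and case-insensitive matching; the function name `term_match_similarity` is hypothetical and not taken from the slides.

```python
def term_match_similarity(query, annotations):
    """Count the query terms that also appear among a page's annotations."""
    query_terms = set(query.lower().split())
    annotation_terms = {a.lower() for a in annotations}
    return len(query_terms & annotation_terms)


# Both query terms appear among the annotations.
print(term_match_similarity("java tutorial", ["Java", "programming", "tutorial"]))  # 2

# Synonymy problem: "automobile" does not match the annotation "car".
print(term_match_similarity("automobile reviews", ["car", "reviews"]))  # 1
```

The second call shows why sparse annotations and synonyms hurt this approach: a page tagged only with "car" gets no credit for the query term "automobile".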
Similarity Ranking Between Query and Social Annotations • Social Similarity Ranking • Observation • Similar (semantically-related) annotations are usually assigned to similar (semantically-related) web pages by users with common interests. In the social annotation environment, the similarity among annotations in various forms can further be identified by the common web pages they annotated
Similarity Ranking Between Query and Social Annotations • Social Similarity Ranking • Assume that there are N_A annotations, N_P web pages and N_U web users. • M_AP is the N_A × N_P association matrix between annotations and pages. • M_AP(a_x, p_y) denotes the number of users who assign annotation a_x to page p_y. • S_A is the N_A × N_A matrix whose element S_A(a_i, a_j) indicates the similarity score between annotations a_i and a_j. • S_P is the N_P × N_P matrix whose elements store the similarity between two web pages. • SocialSimRank (SSR) is an iterative algorithm that quantitatively evaluates the similarity between any two annotations.
Similarity Ranking Between Query and Social Annotations • Social Similarity Ranking • The time complexity of the SSR algorithm is O(N_A^2 N_P^2) • If the scale of social annotations keeps growing exponentially, the convergence of the algorithm may slow down. • The query-page similarity is then calculated from the SocialSimRank annotation similarities.
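The slides do not preserve the SSR update rule, so the following is only a SimRank-style sketch of how such an iteration might look. The min/max weighting of annotation counts, the damping constants `c_a` and `c_p`, the fixed iteration count, and the summation used for the query-page score are all assumptions, and every function name is hypothetical.

```python
def social_sim_rank(m_ap, c_a=0.7, c_p=0.7, iterations=12):
    """Return (s_a, s_p) for m_ap[a][p] = number of users tagging page p with a.

    Assumed update: the similarity of two annotations is the damped average
    similarity of the pages they annotate, weighted by min/max of the counts.
    """
    n_a, n_p = len(m_ap), len(m_ap[0])
    pages_of = [[p for p in range(n_p) if m_ap[a][p] > 0] for a in range(n_a)]
    annots_of = [[a for a in range(n_a) if m_ap[a][p] > 0] for p in range(n_p)]
    # Start from identity matrices: every item is only similar to itself.
    s_a = [[1.0 if i == j else 0.0 for j in range(n_a)] for i in range(n_a)]
    s_p = [[1.0 if i == j else 0.0 for j in range(n_p)] for i in range(n_p)]

    def ratio(x, y):
        return min(x, y) / max(x, y)

    for _ in range(iterations):
        new_a = [row[:] for row in s_a]
        for i in range(n_a):
            for j in range(n_a):
                if i == j or not pages_of[i] or not pages_of[j]:
                    continue
                total = sum(ratio(m_ap[i][m], m_ap[j][n]) * s_p[m][n]
                            for m in pages_of[i] for n in pages_of[j])
                new_a[i][j] = c_a * total / (len(pages_of[i]) * len(pages_of[j]))
        new_p = [row[:] for row in s_p]
        for i in range(n_p):
            for j in range(n_p):
                if i == j or not annots_of[i] or not annots_of[j]:
                    continue
                total = sum(ratio(m_ap[m][i], m_ap[n][j]) * new_a[m][n]
                            for m in annots_of[i] for n in annots_of[j])
                new_p[i][j] = c_p * total / (len(annots_of[i]) * len(annots_of[j]))
        s_a, s_p = new_a, new_p
    return s_a, s_p


def ssr_query_page_similarity(query_terms, page_annotations, s_a, index):
    """Assumed scoring: sum S_A over all (query term, page annotation) pairs."""
    return sum(s_a[index[q]][index[a]]
               for q in query_terms if q in index
               for a in page_annotations if a in index)


# Toy data: "ubuntu" and "linux" annotate the same two pages, "cooking" a third.
index = {"ubuntu": 0, "linux": 1, "cooking": 2}
m_ap = [[2, 1, 0],
        [1, 2, 0],
        [0, 0, 3]]
s_a, s_p = social_sim_rank(m_ap)
print(s_a[0][1] > s_a[0][2])  # "ubuntu" is closer to "linux" than to "cooking"
```

The four nested loops over annotation pairs and page pairs are where the O(N_A^2 N_P^2) cost per iteration comes from. With these similarities, a query for "ubuntu" can still score a page annotated only with "linux", which term matching alone would miss.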
Page Quality Estimation Using Social Annotations • Social Page Rank • Observation • High-quality web pages are usually popularly annotated. Popular web pages, up-to-date web users and hot social annotations have the following relations: • popular web pages are bookmarked by many up-to-date users and annotated with hot annotations; • up-to-date users like to bookmark popular pages and use hot annotations; • hot annotations are used to annotate popular web pages and are used by up-to-date users.
Page Quality Estimation Using Social Annotations • Social Page Rank • To quantitatively evaluate the page quality (popularity) indicated by social annotations • The intuition behind the algorithm is the mutual enhancement relation among popular web pages, up-to-date web users, and hot social annotations.
Page Quality Estimation Using Social Annotations • Social Page Rank • Assume that there are N_A annotations, N_P web pages and N_U web users. • M_PU is the N_P × N_U association matrix between pages and users • M_AP is the N_A × N_P association matrix between annotations and pages • M_UA is the N_U × N_A association matrix between users and annotations • Element M_PU(p_i, u_j) is assigned the count of annotations used by user u_j to annotate page p_i. • Elements of M_AP and M_UA are initialized similarly. • Let P_0 be the vector containing randomly initialized SocialPageRank scores.
Page Quality Estimation Using Social Annotations • Social Page Rank • The time complexity of the algorithm is O(N_U N_P + N_A N_P + N_U N_A)
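The mutual-enhancement idea can be sketched as a power-iteration-like loop that passes scores from pages to users to annotations and back again. The slides do not preserve the exact update steps, so the propagation order, the per-iteration normalization, and the uniform (rather than random) starting vector used here for reproducibility are assumptions; all function names are hypothetical.

```python
def mat_vec(matrix, vector):
    """Multiply a dense matrix (list of rows) by a vector."""
    return [sum(x * y for x, y in zip(row, vector)) for row in matrix]


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def normalize(vector):
    total = sum(vector)
    return [x / total for x in vector] if total else vector


def social_page_rank(m_pu, m_ap, m_ua, iterations=50, tol=1e-9):
    """m_pu: N_P x N_U, m_ap: N_A x N_P, m_ua: N_U x N_A association matrices."""
    n_p = len(m_pu)
    p = [1.0 / n_p] * n_p  # assumption: uniform start instead of random
    for _ in range(iterations):
        u = mat_vec(transpose(m_pu), p)        # pages -> users
        a = mat_vec(transpose(m_ua), u)        # users -> annotations
        p_mid = mat_vec(transpose(m_ap), a)    # annotations -> pages
        a_back = mat_vec(m_ap, p_mid)          # pages -> annotations
        u_back = mat_vec(m_ua, a_back)         # annotations -> users
        new_p = normalize(mat_vec(m_pu, u_back))  # users -> pages
        converged = max(abs(x - y) for x, y in zip(new_p, p)) < tol
        p = new_p
        if converged:
            break
    return p


# Toy data: 3 pages, 2 users, 2 annotations; page 0 is bookmarked most often.
m_pu = [[2, 2], [1, 0], [0, 1]]
m_ap = [[2, 1, 0], [2, 0, 1]]
m_ua = [[2, 1], [1, 2]]
scores = social_page_rank(m_pu, m_ap, m_ua)
print(scores.index(max(scores)))  # 0
```

Each iteration consists only of matrix-vector products with the three association matrices, which matches the stated per-iteration cost of O(N_U N_P + N_A N_P + N_U N_A).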
Dynamic Ranking with Social Information • Incorporate both the similarity features and the static features derived from social annotations into the ranking function by using RankSVM
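To illustrate how similarity and static features could be combined by a pairwise ranker, here is a small sketch that trains a linear scoring function with a hinge loss on preference pairs. It is a simplified stand-in for RankSVM, not the solver used in the study; the feature layout `[doc_similarity, ssr_score, spr_score]`, the toy values, and the hyperparameters are all assumptions.

```python
def train_pairwise_ranker(queries, epochs=200, lr=0.1, reg=0.01):
    """queries: list of per-query lists of (feature_vector, relevance_label).

    Learns weights w so that w . x_better exceeds w . x_worse by a margin,
    using plain subgradient descent on an L2-regularized hinge loss.
    """
    dim = len(queries[0][0][0])
    # Build feature differences for every (more relevant, less relevant) pair.
    pairs = []
    for docs in queries:
        for feats_i, rel_i in docs:
            for feats_j, rel_j in docs:
                if rel_i > rel_j:
                    pairs.append([a - b for a, b in zip(feats_i, feats_j)])
    w = [0.0] * dim
    for _ in range(epochs):
        for diff in pairs:
            margin = sum(wk * dk for wk, dk in zip(w, diff))
            for k in range(dim):
                hinge_grad = -diff[k] if margin < 1 else 0.0
                w[k] -= lr * (reg * w[k] + hinge_grad)
    return w


def score(w, feats):
    return sum(wk * fk for wk, fk in zip(w, feats))


# Hypothetical features per document: [doc_similarity, ssr_score, spr_score].
training = [
    [([0.5, 0.9, 0.8], 1), ([0.6, 0.1, 0.2], 0)],
    [([0.4, 0.7, 0.3], 1), ([0.4, 0.2, 0.1], 0)],
]
w = train_pairwise_ranker(training)
print(score(w, [0.5, 0.9, 0.8]) > score(w, [0.6, 0.1, 0.2]))  # True
```

The learned weights indicate how much each feature contributes to the final ordering, which is how the benefit of the social features over the document-similarity baseline can be compared.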
Experimental Results • Delicious Data • The data set was crawled from del.icio.us during May 2006 and consists of 1,736,268 web pages and 269,566 different annotations. • Compound annotations in various forms, e.g. java.programming or java/programming, were split into standard words with the help of WordNet before being used in the experiments.
Experimental Results • Evaluation of Annotation Similarities • The SocialSimRank algorithm converged in 12 iterations and was able to find semantically related annotations
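The delimiter-based part of this preprocessing can be sketched as follows. This covers only splitting on punctuation such as "." and "/"; the WordNet lookup mentioned above is not reproduced, and both the delimiter set and the function name are assumptions.

```python
import re


def split_compound_annotation(tag):
    """Split a compound tag on common delimiters and lowercase the parts."""
    return [part.lower() for part in re.split(r"[./_+-]", tag) if part]


print(split_compound_annotation("java.programming"))  # ['java', 'programming']
print(split_compound_annotation("java/programming"))  # ['java', 'programming']
```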
Experimental Results • Evaluation of SPR Results
Experimental Results • Dynamic Ranking with Social Annotation • Both a manual query set (MQ) and an automatic query set (AQ) are used. • 50 manual queries and their corresponding ground truths were obtained from a group of CS students. • 3000 automatic queries and their corresponding ground truths were obtained from the Open Directory Project
Experimental Results • Dynamic Ranking with Social Annotation • DocSimilarity, calculated with the BM25 formula, is taken as the baseline
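For reference, a BM25 baseline score can be sketched as below. The slides do not state which BM25 variant or parameters were used, so the non-negative IDF form and the values `k1=1.2` and `b=0.75` are assumptions, as is the function name.

```python
import math


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document in docs against the query terms."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    scores = []
    for doc in docs:
        total = 0.0
        for term in query_terms:
            doc_freq = sum(1 for d in docs if term in d)
            if doc_freq == 0:
                continue  # term appears nowhere in the collection
            idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
            tf = doc.count(term)
            length_norm = 1 - b + b * len(doc) / avg_len
            total += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(total)
    return scores


docs = [["java", "tutorial", "java"], ["cooking", "recipes"], ["java", "cooking"]]
scores = bm25_scores(["java"], docs)
print(scores[0] > scores[2] > scores[1])  # True
```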
Experimental Results • Dynamic Ranking with Social Annotation • Two popular retrieval metrics are used to evaluate the ranking algorithms • Mean Average Precision • NDCG at K
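Both metrics can be computed from a ranked list of relevance labels. The sketch below assumes binary labels for average precision (with every relevant document present in the list) and the common gain `2^rel - 1` with a `log2(rank + 1)` discount for NDCG; the slides do not specify the exact variants, and the function names are hypothetical.

```python
import math


def average_precision(ranked_relevance):
    """ranked_relevance: 0/1 labels in ranked order, best-ranked first."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def mean_average_precision(rankings):
    return sum(average_precision(r) for r in rankings) / len(rankings)


def dcg_at_k(gains, k):
    return sum((2 ** g - 1) / math.log2(rank + 1)
               for rank, g in enumerate(gains[:k], start=1))


def ndcg_at_k(gains, k):
    """gains: graded relevance labels in ranked order."""
    ideal = dcg_at_k(sorted(gains, reverse=True), k)
    return dcg_at_k(gains, k) / ideal if ideal else 0.0


print(mean_average_precision([[1, 0, 1], [0, 1]]))
print(ndcg_at_k([3, 2, 1], 3))  # 1.0, the ranking is already ideal
```

Mean Average Precision rewards placing all relevant pages early, while NDCG at K focuses on the top K positions and supports graded relevance.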
Experimental Results • Dynamic Ranking with Social Annotation • Dynamic Ranking Using Social Similarity
Experimental Results • Dynamic Ranking with Social Annotation • Dynamic Ranking Using Social Page Rank
Experimental Results • Dynamic Ranking with Social Annotation • Dynamic Ranking Using Both SSR and SPR
Discussion and Conclusion • Social annotations do benefit web search, but there are still several problems • Annotation Coverage • Submitted queries may not match any social annotation. • Many web pages may have no annotations • Annotation Ambiguity • SSR may find terms similar to the query terms while failing to disambiguate terms that have more than one meaning • Annotation Spamming • Malicious annotations have a good opportunity to harm search quality
Discussion and Conclusion • The main contributions can be summarized as follows: • A study of how to use social annotations to improve the quality of web search • The SocialSimRank algorithm to measure the association among various annotations • The SocialPageRank algorithm to measure a web page's static ranking based on social annotations