Social Search and Discovery Using a Unified Approach. Einat Amitay et al. IBM Research Lab in Haifa, Israel HT’09 18 March 2011 Presentation @ IDB Lab Seminar IDB Tagging Team, School of CSE, SNU Presented by Kangpyo Lee. A Variety of Web Search Types. Social Search
Social Search and Discovery Using a Unified Approach Einat Amitay et al. IBM Research Lab in Haifa, Israel HT’09 18 March 2011 Presentation @ IDB Lab Seminar IDB Tagging Team, School of CSE, SNU Presented by Kangpyo Lee
A Variety of Web Search Types Social Search Personalized Search Exploratory Search Unified Search Universal Search Multi-entity Search Vertical Search Faceted Search Multi-faceted Search
Outline • Introduction • Related Work • Implementation • Social Search within the Enterprise • User Study • Summary
Introduction • Recent Web 2.0 applications (e.g., web logs, collaborative bookmarking systems, and social networks) introduce new entities & relations in addition to regular web pages • Web 2.0 entities relate to each other in several ways • Documents may relate to other documents by referencing each other • A user may relate to a document as its author, as a tagger, or by being mentioned in the page’s content • A user may relate to other users through social relations • A tag relates to the bookmark it is associated with, and also to the tagger • These entities & relations may prove valuable in enhancing the search experience • By serving as potential search results • By influencing ranking algorithms
Introduction • We present and evaluate novel methods for leveraging social information to enhance search results and discover relations between Web 2.0 applications • Our approach leverages a unified representation of the entities and their relations • We then use this intricate heterogeneous collection to establish an all-encompassing social search solution
Introduction • Social search solution • Allows users to query for specific entities and retrieve results of all relevant types • The system returns, in addition to standard search results, users related to the query, as well as tags that are associated with relevant documents • These tags can be further used to categorize the search results and to better refine the searcher’s information need • We use the term social search engine to describe this multi-entity search system based on “social” data • Our social search system is the only one that provides a unified approach for searching and retrieving entities of all types
Introduction- Unified Approach • Our social data include records of users’ public activity with documents • such as bookmarking, tagging, rating, or commenting on other public Web 2.0 entities • Our system allows a search to start from any object type (e.g., document, person, or tag) and retrieves entities of all types • The system supports • Standard textual queries • Entity queries • Any combination of the two
Introduction- Unified Approach • The social search engine is based on the unified search approach • Unified search • A.k.a. heterogeneous interrelated entity search • An emerging paradigm within IR • The search space is expanded to represent heterogeneous information about objects that may relate to each other in several ways • Direct relations • Indirect relations • The system must be scalable, responsive, and reflect the rapid update patterns typical in Web 2.0 systems
Introduction- Unified Approach • We present a novel realization of the unified search paradigm based on multifaceted search • Each of the system’s entities is represented by a retrievable document • Direct relations between entities are represented by marking one of the elements as a “facet” of its counterpart • The strength of the relationship between the two objects is represented by the strength of the document-facet relationship • Example: if A and B are directly related, then A is one of B’s facets and B is one of A’s facets
Introduction- Unified Approach • An efficient mechanism for updating relations between objects as well as efficient search over the heterogeneous data • Only direct relations between objects need to be updated when new entities are added • Indirect relations are dynamically induced from the direct relations and computed on-the-fly during query execution time • Directly-related objects are retrieved and scored during run-time using the search engine’s regular scoring mechanisms • Indirectly-related entities are retrieved and scored using an implementation of faceted search
Outline • Introduction • Related Work • Implementation • Social Search within the Enterprise • User Study • Summary
Related Work • Social search • The set of annotations provided by the public can be used to enrich the page content • The # of annotations of a web page can be used as additional evidence of document quality for improved ranking of search results • Social data enables users to search for other people with whom they maintain relationships in the network • Social ranking • Ranking all entities retrieved by the social search engine • FolkRank and SocialPageRank • The cost of a PageRank-like computation depends heavily on the graph size, so it is expected to be very slow • Different entity types provide different retrieval values for the searcher, hence they should be ranked according to their own characteristics
Related Work- Multi-Entity Search • Multi-entity search • Extending basic search functionality by answering user queries with many types of entities • Usually based on analysis of the relationship between entities and documents relevant to the query • Searching over a multi-entity graph • Nodes are entities (terms, documents, persons, annotations) • Edges are the relations between the entities • SimFusion uses a Unified Relationship Matrix (URM) to represent the multi-entity graph
Related Work- Multi-Entity Search • Unified Relationship Matrix (URM) • Relations between two object types are represented via a relationship matrix Mij • The (k, l) entry of matrix Mij represents the strength of the relation between the object pairs (ok, ol) of types Oi and Oj respectively • The URM matrix U • Encapsulates all matrices to provide a unified representation of the unified search space • Provides relationship strength between any two directly related entities, along with a theoretically elegant way to calculate indirect relations through matrix multiplication
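The URM idea above can be sketched in a few lines of Python. This is a minimal illustration, not SimFusion's implementation: the sparse dict-of-dicts layout, the function names `build_urm` and `order_two`, the sample objects, and the assumption that relations are symmetric are all choices made for the example.

```python
from collections import defaultdict

def build_urm(relation_matrices):
    """Merge per-type-pair relation entries into one sparse URM.

    relation_matrices maps a (type_i, type_j) pair to a dict of
    {(object_k, object_l): strength}. All names are illustrative.
    """
    urm = defaultdict(dict)
    for entries in relation_matrices.values():
        for (ok, ol), strength in entries.items():
            urm[ok][ol] = strength
            urm[ol][ok] = strength  # assumption: relations are symmetric
    return urm

def order_two(urm, a, b):
    """Entry (a, b) of U squared: sum over intermediates o of U(a, o) * U(o, b)."""
    return sum(w * urm.get(o, {}).get(b, 0.0) for o, w in urm.get(a, {}).items())

# Hypothetical sample data: two users and one tag attached to the same page.
relations = {
    ("user", "page"): {("alice", "page1"): 1.0, ("bob", "page1"): 0.5},
    ("tag", "page"): {("search", "page1"): 2.0},
}
U = build_urm(relations)
alice_to_tag = order_two(U, "alice", "search")  # path alice -> page1 -> search
```

Here `alice` and `search` have no direct entry in the URM, yet they obtain a non-zero order-two strength through the page they share, which is the indirect relation that matrix multiplication exposes.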
Outline • Introduction • Related Work • Implementation • Social Search within the Enterprise • User Study • Summary
Implementation • Our solution to unified search represents each object in the system in two ways • (1) as a retrievable document • (2) as a facet (category) of all the objects to which it relates • A unified representation of a collaborative bookmarking system • Three object types – web pages, users, and tags • Each object type is associated with a corresponding document – a web page document, a user document, and a tag document • Three relationship types • A user-type facet between a user & the tagged web page • A tag-type facet between a tag & the associated web page • A user-type facet between a user & a tag used for bookmarking
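As a rough illustration of the bookmarking example, the sketch below stores every object as a document and records the three facet relationships listed above. The class `UnifiedIndex`, its method names, and the choice to use the number of bookmarking events as relationship strength are assumptions made for this example, not details taken from the system.

```python
from collections import defaultdict

class UnifiedIndex:
    """Toy index: every object is a document and may be a facet of others."""

    def __init__(self):
        self.doc_type = {}  # object id -> "page" | "user" | "tag"
        # document id -> facet id -> relationship strength
        self.facets = defaultdict(lambda: defaultdict(float))

    def _relate(self, doc, facet, weight=1.0):
        # Assumption: strength grows by one per bookmarking event.
        self.facets[doc][facet] += weight

    def add_bookmark(self, user, page, tags):
        """Record one bookmarking event; only direct relations are touched."""
        self.doc_type[user] = "user"
        self.doc_type[page] = "page"
        self._relate(page, user)        # user-type facet on the web page
        for tag in tags:
            self.doc_type[tag] = "tag"
            self._relate(page, tag)     # tag-type facet on the web page
            self._relate(tag, user)     # user-type facet on the tag

    def facets_of(self, doc, facet_type):
        """Return the facets of one type attached to a document."""
        return {f: w for f, w in self.facets[doc].items()
                if self.doc_type[f] == facet_type}

index = UnifiedIndex()
index.add_bookmark("alice", "page1", ["search", "ir"])
index.add_bookmark("bob", "page1", ["search"])
page_tags = index.facets_of("page1", "tag")
```

Adding a bookmark only writes direct document-facet entries, so an update never has to touch indirect relations.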
Implementation- Scoring Indirectly Related Objects • The strength of the indirect relation between objects o1 & o2 • U(o, o’) – the corresponding entry in the URM matrix • Equivalent to squaring the URM matrix • Provides the relationship strength of order two between any two objects • Eq. 1 can be generalized to score objects based on their indirect relations with any query • The score vector s0(q) provides the direct scores of all N objects in the system to the query • The score vector s1(q) provides the indirect scores of all objects
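Read together with the definitions on this slide, the two scoring steps can be written out as follows. This is a sketch inferred from the slide's wording (squaring the URM, then applying it to the direct score vector); the exact notation, including the superscript on U and the per-object form of s1, is an assumption.

```latex
% Eq. 1: order-two relation between o_1 and o_2, i.e. an entry of U^2
U^{2}(o_1, o_2) = \sum_{o} U(o_1, o)\, U(o, o_2)

% Eq. 2: indirect scores for a query q, obtained from the direct score vector
s_1(q) = U \cdot s_0(q)
\quad\Longleftrightarrow\quad
s_1(q, o) = \sum_{i=1}^{N} U(o, o_i)\, s_0(q, o_i)
```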
Implementation- Scoring Indirectly Related Objects • In addition, objects can be scored according to their relative popularity, or authority • FolkRank or SocialPageRank can be used • Inverse entity frequency (ief) score • N – the # of all objects in the system • No – the # of objects directly related to o • Penalizes objects that are related to many objects in general • The final score of object o for a query q
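The ief idea can be illustrated with a short Python sketch. The slide names only the quantities N and No, so the logarithmic form below (by analogy with inverse document frequency) and the multiplicative combination in the hypothetical helper `final_score` are both assumptions rather than the authors' stated formulas.

```python
import math

def ief(total_objects, related_objects):
    """Inverse entity frequency, assumed here to be log(N / N_o)."""
    if related_objects <= 0:
        raise ValueError("an object must be related to at least one object")
    return math.log(total_objects / related_objects)

def final_score(relevance, total_objects, related_objects):
    """Hypothetical combination: query relevance damped by the object's ief."""
    return relevance * ief(total_objects, related_objects)

# An object related to half the collection is penalized much more heavily
# than one related to only a handful of objects.
rare = final_score(1.0, 1000, 10)
common = final_score(1.0, 1000, 500)
```

Under this form, an object related to every object in the system receives an ief of zero, which matches the stated goal of penalizing objects that are related to many objects in general.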
Implementation- Multifaceted Search • Multifaceted search aims to combine the two main search approaches: • Direct search • Navigational search – offering navigational refinement on the results by categorizing the search results into predefined facets along with the counts of results per facet • Multifaceted search has become the prevailing user interaction mechanism in e-commerce sites • Now being extended to deal with semi-structured data, continuous dimensions, and folksonomies
Implementation- Multifaceted Search • The scores of directly related objects are equivalent to the scores as represented by s0(q) • The score of an indirectly related object, o, is computed by aggregating its relationship strength with all matching documents, multiplied by their direct score • w(o, oi) – the relationship strength between the document oi & its facet o • Equivalent to Eq. 2 since w(o, oi) = U(o, oi) • Indirectly related objects are represented by accumulating all facets of the same type
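The aggregation described on this slide can be sketched as a single pass over the matching documents. The function name `score_facets`, the dictionary layout, and the sample scores are illustrative assumptions; only the rule itself (sum of w(o, oi) times the direct score of oi, grouped by facet type) comes from the slide.

```python
from collections import defaultdict

def score_facets(direct_scores, facet_weights, facet_type_of):
    """Score indirectly related objects through the documents they are facets of.

    score(o) = sum over matching documents o_i of w(o, o_i) * s0(q, o_i),
    accumulated separately for each facet type.
    """
    by_type = defaultdict(lambda: defaultdict(float))
    for doc, s0 in direct_scores.items():
        for facet, w in facet_weights.get(doc, {}).items():
            by_type[facet_type_of[facet]][facet] += w * s0
    return {t: dict(scores) for t, scores in by_type.items()}

# Hypothetical query result: two matching pages with their direct scores.
direct_scores = {"page1": 0.8, "page2": 0.5}
facet_weights = {
    "page1": {"alice": 1.0, "search": 2.0},
    "page2": {"alice": 1.0, "bob": 1.0},
}
facet_type_of = {"alice": "user", "bob": "user", "search": "tag"}

related = score_facets(direct_scores, facet_weights, facet_type_of)
```

Because only the documents that match the query are visited, the indirect scores are computed at query time without materializing the squared matrix.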
Implementation- Efficiency Factors • Two issues regarding use of the URM matrix for social search • 1) the need for efficient computation of indirect relations • 2) efficient dynamic updates • The universal query (q = ‘*’), which retrieves all the objects indexed by the system as well as all objects related to them, has a query runtime of less than four seconds • Dynamic updates are handled by a mechanism that stores the changes in an external database
Outline • Introduction • Related Work • Implementation • Social Search within the Enterprise • User Study • Summary
Social Search within the Enterprise • [Screenshots: Textual Query, Entity Query]
Social Search within the Enterprise- Social Data & Social Search Application • Web 2.0 services of IBM • Dogear – a collaborative bookmarking service (373,821 bookmarks, 234,856 web pages) • BlogCentral – a central blog service (77,930 blog threads) • BluePages – the enterprise directory and employee profile application (15,779 IBMers) • About 700,000 unique entities • Cow Search – the social search application available to all users of IBM’s intranet
Outline • Introduction • Related Work • Implementation • Social Search within the Enterprise • User Study • Summary
User Study • Our goal was to measure the quality of both the returned document set and the related users and tags • The evaluation methodologies for documents are well known and have standard measures • There are no standard ways of measuring the quality of related users or tags • A user study was thus used • The retrieved documents were examined and marked with three relevance levels (0-not relevant, 1-marginally relevant, 2-highly relevant) • The quality of search results was measured by the normalized discounted cumulative gain (NDCG) measure • To evaluate the effectiveness of the related people, we emailed 612 random users and asked them to rate the results on a Likert scale of 1 to 5
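For reference, NDCG over the three relevance levels above can be computed as in the sketch below. Several NDCG variants exist; this one uses the raw grade as gain with a log2 rank discount, and builds the ideal ranking from the judged list itself. Both choices are assumptions, since the slide does not say which variant was used.

```python
import math

def dcg(grades):
    """Discounted cumulative gain: grade at rank r is divided by log2(r + 1)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(grades, start=1))

def ndcg(grades, k=None):
    """NDCG of a ranked list of relevance grades (0, 1, or 2), cut at k if given."""
    top = grades[:k] if k else grades
    # Assumption: the ideal ranking is the judged list sorted by grade.
    ideal = sorted(grades, reverse=True)[:len(top)]
    best = dcg(ideal)
    return dcg(top) / best if best > 0 else 0.0

perfect = ndcg([2, 1, 0])   # already in ideal order
swapped = ndcg([0, 2])      # the relevant document is ranked second
```

A list that is already sorted by relevance scores 1.0, and any misordering pulls the value below 1.0 in proportion to how far down the relevant documents sit.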
User Study- Results • Social data contribution to enterprise search • We measure the quality of search results using manual assessments of the top-k search results for the 50 chosen queries
User Study- Results • Related users • Related tags
Outline • Introduction • Related Work • Implementation • Social Search within the Enterprise • User Study • Summary
Summary • Social data is valuable • 1. The high precision of the top retrieved documents demonstrates that user feedback identifies high-quality content in the corpus • 2. User comments and tags are highly beneficial in general and augment the description of system entities, while providing additional evidence for object popularity • Future research • Exploiting personal social networks for search personalization • Document or tag recommendations • Quantifying the contribution of social objects to the effectiveness of the search system
Thank You! Any Questions or Comments?