Connectives-Based Semantic Information Retrieval Model
Contents
• Motivation
  • Limitations of keyword-based information retrieval
• Related Work
  • Overview of Semantic Search
  • Overview of Recommendation
• A Semantic Information Retrieval Model
  • Unified Graph Model for Semantic Information Retrieval
  • Modeling
    • Modeling of Connectives
    • Modeling of Relationships
  • Probabilistic Approach to Ranking
• Experiments
• Conclusion
Motivation
• The number and variety of items available on the Web have grown explosively
• Two types of technologies are widely used to overcome the information overload problem
[Figure: a user query or a user profile is given to an information retrieval system, which extracts useful information from massive data]
Motivation
• Most information retrieval (IR) systems are based on keywords because of their simplicity and efficiency [Xu et al., 2008]
  • Documents and users' needs (e.g., queries, profiles) are represented with keywords
  • Keywords act as connectives that connect documents to users' needs
• Exact Matching of Keywords
  • Because keyword-based IR systems rely on exact matching of keywords between documents and users' needs, they cannot return documents that are semantically relevant but do not contain those keywords
[Figure: example with query or profile = "OLAP" [Balmin et al., 2004]]
Limitations of Keyword-based IR [problem]
• Semantic Ambiguity of Keywords [Dragut et al., 2006]
  • Homonym ("apple" as fruit vs. "apple" as company)
  • Synonym ("movies" vs. "films")
• Example of Homonym
[Figure: an information seeker's query term "apple" matches Web documents (items) indexed with the term "apple", some with the concept "computer" and others with the concept "fruit"]
Limitations of Keyword-based IR [solution]
• Semantic Ambiguity of Keywords
  • Query Expansion Approaches
    • Co-occurrence [Wolfman et al., 1999]
    • Ontology, Thesaurus [Vogel et al., 2005][Gong et al., 2005][Shen et al., 2006]
• Query Expansion using Term Co-occurrence
• Query Expansion using Concept Keywords
[Figure: expansion separates contents related to "Apple" computer (irrelevant to the user) from contents related to "Apple" fruit (relevant to the user)]
Limitations of Keyword-based IR [problem]
• Sparse Annotation
  • In keyword-based IR systems, users' needs and items are represented as a "bag of keywords"
    • Vector-space model
  • Because items are sparsely annotated, it is hard to compute the degree of relevance exactly
  • Some items may not be provided to users even though they are semantically relevant to the given needs
[Figure: document-term matrix over terms t1..t8 for query q and documents d1..d5; cos(q, d1) = 0, so d1 cannot be retrieved although its semantic relevance is high]
Limitations of Keyword-based IR [solution]
• Sparse Annotation
[Figure: in the document-term matrix (terms t1..t8, query q, documents d1..d5), cos(q, d1) = 0, so d1 cannot be retrieved although its semantic relevance is high; in the document-concept matrix (concepts c1, c2), cos(q, d1) = 0.95]
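The contrast between the two matrices can be sketched in a few lines of Python. This is only an illustration: the vectors, the `TERM_CONCEPT` weights, and the helper names `cosine` and `to_concept_space` are made up for the example and do not reproduce the slide's data (so the result is not the slide's 0.95).

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def to_concept_space(term_vec, term_concept):
    """Project a term vector onto concepts: c_j = sum_k term_vec[k] * term_concept[k][j]."""
    n_concepts = len(term_concept[0])
    return [sum(t * row[j] for t, row in zip(term_vec, term_concept))
            for j in range(n_concepts)]

# Toy data (assumed): 8 terms t1..t8 and 2 concepts c1, c2.
q = [1, 1, 0, 0, 0, 0, 0, 0]    # the query uses t1 and t2
d1 = [0, 0, 1, 1, 0, 0, 0, 0]   # the document uses t3 and t4: no shared terms
# Hypothetical term-concept weights: t1..t4 belong to c1, t5..t8 belong to c2.
TERM_CONCEPT = [[1, 0]] * 4 + [[0, 1]] * 4

term_sim = cosine(q, d1)
concept_sim = cosine(to_concept_space(q, TERM_CONCEPT),
                     to_concept_space(d1, TERM_CONCEPT))
print(term_sim, concept_sim)
```

With no shared terms the term-space similarity is zero, while both vectors land on the same concept after projection, so the concept-space similarity is high.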
Semantic Information Retrieval
• Semantic Ambiguity of Keywords
  • Although documents contain the keywords derived from users' needs (queries, profiles), they may be irrelevant to those needs
• Sparse Annotation
  • Because documents are sparsely annotated, it is hard to compute the exact relevance between documents and users' needs
• Semantic Information Retrieval
  • Uses conceptual matching (semantic relevance) instead of keyword matching (literal relevance) between documents and users' needs
  • Concepts (not keywords) are used as connectives
Overview of Semantic Search
• Logic-based Approaches
  • Users' needs (i.e., queries) are expressed in specific ontology languages (e.g., RDQL, OWL-QL)
  • Logically inferred search results are provided to users
  • OWL-QL [Fikes et al., 2004]
  • ONTOWEB [Kim, 2005]
• Example: OWL-QL [Fikes et al., 2004]
[Figure not preserved in the source]
Overview of Semantic Search
• Link-based Approaches (Graph Traversal Approaches)
  • Searching for semantically relevant documents through the hyperlinks between Web documents
  • TAP [Guha et al., 2003]
  • Hybrid Spread Activation [Roch et al., 2004]
  • ObjectRank [Balmin et al., 2004]
• ObjectRank example
[Figure: initial results of "OLAP" in a graph of papers: P1 "Index Selection for OLAP", P2 "Range Queries in OLAP Data Cubes", P3 "Implementing Data Cubes Efficiently", P4 "Data Cube: A Relational…", P5 "Modeling Multidimensional Databases"]
Overview of Semantic Search
• Concept-based Approaches
  • Representing documents and users' needs with concepts derived from domain knowledge
  • Some studies regard a controlled vocabulary as concepts
[Figure: concepts (connectives) for the query "music", with typed entities: Company: Starbucks; City: Seattle; Music: When I Fall in Love; Sports Team: Seattle Sonics; Person: Howard Schultz]
Overview of Semantic Search
• Concept-based Approaches
  • Result Processing
    • Re-ranking documents according to each user's conceptual profile
    • Keyword-matching-based approach
    • OBIWAN [Gauch et al., 2004]
    • DySe [Rinaldi et al., 2009]
    • OntoSearch [Jiang et al., 2009]
  • Query Expansion
    • Converting queries and documents from a keyword space to a concept space
    • Conceptual-matching-based approach
    • Adaptive Vector Approach [Vallet et al., 2005][Castells et al., 2007]
    • Folksonomy Approach [Wu et al., 2006][Xu et al., 2008]
Overview of Semantic Search
• Adaptive Vector Approach [Castells et al., 2007]
[Figure: concepts of the knowledge base and the concept vectors of the query and the document]
Overview of Semantic Search
• Folksonomy Approach [Xu et al., 2008]
  • Tags are regarded as concepts
  • A user annotates a document with tags that represent his/her interests
  • A document has tags that represent its semantics
[Figure: a user annotates tags; a Web page has tags]
Overview of Semantic Search • Comparison of Three Approaches
Overview of Semantic Search
• Issues of Previous Concept-based Semantic Searches
  • Sparse Annotation
    • It is difficult to annotate the semantics of documents (or users' queries) completely with a few connectives
  • Lexical analysis based on exact matching of connectives (i.e., concepts and keywords)
    • Semantically relevant documents may not be provided
  • Example
    • Concept vector of the user: <1, 1, 0>
    • Concept vector of the document: <0, 0, 1>
[Figure: user u, the concepts (connectives) Hollywood, Romance, and Movie, and the documents Romeo & Juliet (1968) and Romeo & Juliet (1996)]
Overview of Semantic Search
• Issues of Previous Concept-based Semantic Searches
  • Authority
    • It is hard to determine the ranking when several documents have the same degree of relevance
    • Some search engines, such as Google, use the authority of documents
    • Documents that are frequently referenced by others have high authority
  • Example
[Figure: user u and the concepts (connectives) Hollywood, Romance, and Movie; Romeo & Juliet (1968) and Romeo & Juliet (1996) both have semantic relevance 0.5, with authority 0.2 and 0.8]
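One simple way to use authority when relevance is tied is a two-key sort, sketched below. The `rank` helper is hypothetical, and the pairing of the authority scores with the two titles is an assumption read off the order in which they appear on the slide.

```python
def rank(docs):
    """Sort by semantic relevance first, then by authority, both descending."""
    return sorted(docs, key=lambda d: (d["relevance"], d["authority"]), reverse=True)

# Scores from the slide's example; which title carries which authority is assumed.
docs = [
    {"title": "Romeo & Juliet (1968)", "relevance": 0.5, "authority": 0.2},
    {"title": "Romeo & Juliet (1996)", "relevance": 0.5, "authority": 0.8},
]

ranked = rank(docs)
for d in ranked:
    print(d["title"], d["relevance"], d["authority"])
```

Relevance alone leaves the two documents indistinguishable; the secondary key breaks the tie in favor of the document with the higher authority.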
Overview of Recommendation
• Content-based Filtering
  • Recommending documents similar to those a given user has preferred in the past
  • Similar to keyword search
• Collaborative Filtering
  • Identifying like-minded users whose preferences are similar to those of the given user
  • Recommending documents that the like-minded users have preferred
Overview of Recommendation
• Example of Collaborative Filtering
  • Identifying like-minded users whose preferences are similar to those of the given user
    • The preference of user1 is similar to that of userm
  • Recommending documents that the like-minded users have preferred
[Figure: user-document matrix; documents preferred by a like-minded user are recommended]
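The two steps above can be sketched as a small user-based collaborative filter. This is a minimal illustration, not the deck's method: the access data, the user `user2`, the document ids, and the `recommend` helper are invented for the example, and cosine similarity over binary access vectors is an assumed choice of similarity measure.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def recommend(access, active, k=1):
    """Recommend documents accessed by the k most similar users but not by `active`."""
    docs = sorted({d for ds in access.values() for d in ds})
    vec = {u: [1 if d in ds else 0 for d in docs] for u, ds in access.items()}
    # Step 1: identify like-minded users.
    neighbours = sorted((u for u in access if u != active),
                        key=lambda u: cosine(vec[active], vec[u]),
                        reverse=True)[:k]
    # Step 2: score their documents that the active user has not seen.
    scores = {}
    for u in neighbours:
        sim = cosine(vec[active], vec[u])
        for d in access[u] - access[active]:
            scores[d] = scores.get(d, 0.0) + sim
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical access log: user1 and userm overlap, user2 does not.
ACCESS = {
    "user1": {"d1", "d2"},
    "user2": {"d4"},
    "userm": {"d1", "d2", "d3"},
}

recs = recommend(ACCESS, "user1")
print(recs)
```

Here userm is the nearest neighbour of user1, so the one document userm accessed that user1 has not seen is recommended.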
Overview of Recommendation
• Limitations of Previous Recommendation Systems
  • Content-based Filtering
    • Ambiguity of keywords
    • Sparse annotation
  • Collaborative Filtering
    • Sparse annotation
    • Dimension Reduction [Billsus et al., 1998][Sarwar et al., 2000]
      • Removing insignificant users or documents causes loss of information
    • Hybrid Approaches of Content-based and Collaborative Filtering [Balabanovic et al., 1997][Pazzani et al., 1999]
      • Keyword-based connectives
    • Clustering of Users [Chee et al., 2002]
      • Poor recommendation quality
    • Tags [Zanardi et al., 2008][Kim et al., 2010]
      • Require explicit feedback (users' annoyance or hesitation)
Unified Graph Model for Semantic Information Retrieval
• A Unified Graph for Semantic Information Retrieval
  • Objects are interrelated in the real world
  • We assume that 4 types of objects are interrelated
    • Users, documents, terms, concepts (complete 4-partite graph)
  • The graph can be expanded to an n-partite graph depending on the application (or domain)
[Figure: unified graph with users, documents (d1, d2, d3), terms, and concepts (c1, c2); edge labels: accessing (users-documents), submitting (users-terms), preferring (users-concepts), containing (documents-terms), relating (documents-concepts, the Document-Concept Relationship), containing (concepts-terms)]
Unified Graph Model for Semantic Information Retrieval
• Derivatives in a Unified Graph
  • Keyword Search
    • Documents containing the keywords submitted by a user are regarded as search results
[Figure: unified graph with the path u1 (submitting) t1 (containing) d3 highlighted; terms are the connectives]
Unified Graph Model for Semantic Information Retrieval
• Derivatives in a Unified Graph
  • Conventional Collaborative Filtering
    • Identifying like-minded users whose preferences are similar to those of an active user
    • The preferences can be derived from the click-through log (or rating log)
[Figure: unified graph with the "accessing" links among u1, u2, d1, and d3 highlighted; documents are the connectives]
Unified Graph Model for Semantic Information Retrieval
• Derivatives in a Unified Graph
  • Concept-based Semantic Search
    • Representing a user's query and documents with their corresponding concepts
    • Documents containing the concepts derived from a user's query are regarded as search results
[Figure: unified graph with the path u1 (submitting) t1 (containing) c1 (relating) d3 highlighted; concepts are the connectives]
Unified Graph Model for Semantic Information Retrieval
• Derivatives in a Unified Graph
  • Semantic Collaborative Filtering
    • Identifying like-minded users by using the concepts derived from users' preferences
    • Even when users have accessed different documents, it is possible to compute the semantic relevance between them
[Figure: unified graph with the "preferring" links from u1 and u2 to c1 and the "accessing" link from u2 to d3 highlighted; concepts are the connectives]
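A small sketch of why concept connectives help here, under invented data: two users who share no documents have zero similarity in document space, but can still be similar once their preferences are mapped to concepts. The weights in `ACCESS` and `DOC_CONCEPTS` and the helper names are assumptions for illustration only.

```python
import math

def cosine_sparse(a, b):
    """Cosine similarity of two sparse vectors stored as {key: weight} dicts."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical data: u1 and u2 accessed different documents.
ACCESS = {"u1": {"d1": 1.0}, "u2": {"d3": 1.0}}
# Hypothetical document-concept relationship (assumed to be computed offline).
DOC_CONCEPTS = {"d1": {"c1": 1.0}, "d3": {"c1": 0.8, "c2": 0.2}}

def user_concepts(user):
    """Derive a user's concept preferences from the documents the user accessed."""
    prefs = {}
    for doc, weight in ACCESS[user].items():
        for concept, rel in DOC_CONCEPTS[doc].items():
            prefs[concept] = prefs.get(concept, 0.0) + weight * rel
    return prefs

doc_sim = cosine_sparse(ACCESS["u1"], ACCESS["u2"])
concept_sim = cosine_sparse(user_concepts("u1"), user_concepts("u2"))
print(doc_sim, round(concept_sim, 3))
```

In this toy case u2 would count as a like-minded user of u1 through c1, so d3 becomes a candidate recommendation for u1 even though conventional collaborative filtering finds no overlap.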
Unified Graph Model for Semantic Information Retrieval
• Semantic Information Retrieval in a Unified Graph
  • Ambiguity of Keywords
    • Exploiting concept connectives
  • Sparse Annotation
    • Exploiting lexical and non-lexical analysis through heterogeneous connectives
  • Authority
    • Exploiting collaborative filtering to derive implicit authority
    • Documents that like-minded users preferred have high authority
[Figure: unified graph (users, documents, terms, concepts; edges accessing, preferring, submitting, relating, containing) with the paths used by keyword search, collaborative filtering, concept-based semantic search, and semantic collaborative filtering]
Unified Graph Model for Semantic Information Retrieval
• Analysis of Unified Graph: Sparse Relationships
  • Links between Users & Documents (accessing; sparsity: 0.999)
    • A few users access many documents
    • Many users access a few documents
  • Links between Documents & Terms (containing; sparsity: 0.998)
    • A few documents contain many terms
    • Many documents contain a few terms
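The slides do not define how sparsity is measured. A common reading, assumed here, is the fraction of absent links among all possible pairs in the bipartite relationship; the sketch below uses that definition with made-up links.

```python
def sparsity(links, n_rows, n_cols):
    """Fraction of absent links among all n_rows * n_cols possible pairs (assumed definition)."""
    return 1.0 - len(set(links)) / (n_rows * n_cols)

# Hypothetical access log: 4 users, 5 documents, 4 observed links.
links = {("u1", "d1"), ("u1", "d2"), ("u2", "d2"), ("u3", "d4")}

s = sparsity(links, n_rows=4, n_cols=5)
print(s)
```

Under this definition a sparsity of 0.999 would mean that only about one in a thousand possible user-document pairs is actually linked.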
Unified Graph Model for Semantic Information Retrieval
• Analysis of Unified Graph: Dense Relationships
  • Links between Concepts (ODP) & Documents (relating; sparsity: 0.614)
    • Many concepts are related to many documents
  • Links between Concepts (Wikipedia) & Documents (relating; sparsity: 0.575)
    • Many concepts are related to many documents
Unified Graph Model for Semantic Information Retrieval
• Analysis of Unified Graph: Sparse Relationships
  • Links between Terms & Concepts (ODP) (containing; sparsity: 0.999)
    • A few terms are contained in many concepts
    • Many terms are contained in a few concepts
  • Links between Terms & Concepts (Wikipedia) (containing; sparsity: 0.998)
    • A few terms are contained in many concepts
    • Many terms are contained in a few concepts
Unified Graph Model for Semantic Information Retrieval
• Analysis of Unified Graph: Dense Relationships
  • Links between Users & Concepts (ODP) (preferring; sparsity: 0.418)
    • Many users prefer many concepts
  • Links between Users & Concepts (Wikipedia) (preferring; sparsity: 0.371)
    • Many users prefer many concepts
Unified Graph Model for Semantic Information Retrieval
• Types of Relationships
[Figure: unified graph of users, documents, terms, and concepts with each edge (accessing, submitting, preferring, relating, containing) marked as a dense or a sparse relationship]
Unified Graph Model for Semantic Information Retrieval
• Research Questions
  • What relationship exists between the performance of semantic IR and the density of the links between objects (i.e., nodes in a unified graph)?
  • Which combination of relationships (or connectives) contributes to improving the performance of semantic IR?
    • Do both dense and sparse relationships contribute to improving performance?
[Figure: plot of performance (e.g., precision) against density]
Unified Graph Model for Semantic Information Retrieval
• Combination of Relationships
[Figure: grid of users-documents-concepts-terms graph configurations for Search and Recommendation, grouped by the dense relationships they use: no dense relationship; 1 dense relationship (user-concept); 1 dense relationship (document-concept); 2 dense relationships (user-concept & document-concept)]
Unified Graph Model for Semantic Information Retrieval
• Comparison of Research Coverage
[Figure: the same grid of users-documents-concepts-terms graph configurations for Search and Recommendation, marking the coverage of previous approaches and the coverage of our approach]
Modeling for Connectives - Document
• Document
  • Each document is represented by a |V|-dimensional term vector
  • To remove the effect of document length, the term vector is normalized
  [Equation not preserved in the source], where w_{n,k} is the weight (tf-idf) of the k-th term in d_n and V is the set of index terms
[Figure: a document vector in a term space with axes t1, t2, t3]
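The slide's formula did not survive extraction. One form consistent with the surrounding definitions, assuming Euclidean (L2) length normalization, would be:

```latex
\vec{d}_n \;=\; \frac{1}{\sqrt{\sum_{k=1}^{|V|} w_{n,k}^{2}}}
\,\bigl\langle w_{n,1},\, w_{n,2},\, \dots,\, w_{n,|V|} \bigr\rangle
```

Here w_{n,k} and V are as defined on the slide; the choice of the L2 norm is an assumption, since the slide only says that the vector is normalized.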
Modeling for Connectives – User
• User
  • Explicit Approach
    • A user is represented by the keywords that the user explicitly provides to the IR system
  • Implicit Approach
    • By analyzing a user's access log, the user can be represented with keywords derived from that log
    • A user is defined as the average of the term vectors of the accessed documents
    • The derived term vector is normalized to remove the length effect
  [Equation not preserved in the source]
[Figure: user u accesses documents d1, d2, d3 in a term space with axes t1..t6]
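The implicit approach can be sketched as follows. The slide's formula is not available, so this is only one plausible reading: each accessed document vector is length-normalized, the vectors are averaged, and the average is normalized again. The L2 norm, the toy vectors, and the function names are assumptions.

```python
import math

def normalize(v):
    """Scale a vector to unit Euclidean length (returned unchanged if all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)

def average(vectors):
    """Component-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def user_vector(accessed_docs):
    """Normalized average of the (normalized) term vectors of accessed documents."""
    return normalize(average([normalize(d) for d in accessed_docs]))

# Hypothetical tf-idf vectors over three terms for two accessed documents.
d1 = [3.0, 0.0, 0.0]
d2 = [0.0, 4.0, 0.0]

u = user_vector([d1, d2])
print(u)
```

Normalizing each document first keeps a long document from dominating the user's profile; whether the original model does this before averaging is not stated on the slide.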
Modeling for Connectives – Concept
• Concept
  • Definition from the American Heritage Dictionary
    • A general idea derived or inferred from specific instances or occurrences
  • A concept is defined as the average of the term vectors derived from the objects (or attributes) that belong to the concept
    • If the objects are documents, concept modeling is similar to user modeling
  • The derived term vector is also normalized to remove the length effect
  [Equation not preserved in the source]
[Figure: objects or attributes belong to a concept in a term space with axes t1..t6]
Modeling for Relationships
• Relationships
  • Explicit Relationship
    • Relationships that explicitly exist between two types of objects
    • Example in Document-Term Relationships
    [Equation not preserved in the source], where w(d_i, t_k) denotes the weight of the k-th term in d_i
[Figure: explicit relationships in the unified graph: User-Document (user access log), User-Term (explicit approach), Document-Term, Concept-Term]
Modeling for Relationships
• Relationships
  • Implicit Relationship
    • Relationships that are inferred (or derived) from explicit relationships
[Figure: implicit relationships in the unified graph: User Modeling (implicit approach), Document-Concept Relationship, User-Concept Relationship]
Modeling for Relationships
• Relationships
  • Implicit Relationship
    • The relevance between two objects (o_i, o_j) is estimated with the conditional probability Pr(o_i | o_j)
    • The prior probabilities Pr(o_i), Pr(o_j), and Pr(e_r) are assumed to be constant for their random variables
    • The e_r are the connectives connecting o_i with o_j
    [Derivation not preserved in the source; its steps are annotated with: Bayes' theorem, the definition of conditional probability, the law of total probability, the assumption that o_i and o_j are conditionally independent given e_r, and Bayes' theorem]
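Since the derivation itself is missing, the following is a hedged reconstruction that uses the annotated ingredients and the stated assumptions. The slide lists Bayes' theorem twice while this sketch needs it only once, so the original probably took a slightly different route to a similar result.

```latex
\begin{align}
\Pr(o_i \mid o_j)
  &= \sum_{r} \Pr(o_i, e_r \mid o_j)
     && \text{(law of total probability)} \\
  &= \sum_{r} \Pr(o_i \mid e_r, o_j)\,\Pr(e_r \mid o_j)
     && \text{(definition of conditional probability)} \\
  &= \sum_{r} \Pr(o_i \mid e_r)\,\Pr(e_r \mid o_j)
     && \text{($o_i$, $o_j$ conditionally independent given $e_r$)} \\
  &= \sum_{r} \frac{\Pr(e_r \mid o_i)\,\Pr(o_i)}{\Pr(e_r)}\,\Pr(e_r \mid o_j)
     && \text{(Bayes' theorem)} \\
  &\propto \sum_{r} \Pr(e_r \mid o_i)\,\Pr(e_r \mid o_j)
     && \text{(constant priors } \Pr(o_i),\ \Pr(e_r)\text{)}
\end{align}
```

Under these assumptions the relevance between two objects reduces to a sum, over the connectives they share, of the product of how strongly each object is associated with each connective.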
Modeling for Relationships
• Relationships
  • Implicit Relationship
    • Between Users and Terms (User Modeling, implicit approach)
    • Between Documents and Concepts (Document-Concept Relationship)
    • Between Users and Concepts (User-Concept Relationship)
    [Equations not preserved in the source]
Probabilistic Approach to Ranking
• Search
  • Keyword Search
  • Semantic Search
    • The document-concept relationship is computed offline
  [Ranking equations not preserved in the source]
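The ranking formulas are missing from the source, so the sketch below is an assumption-laden illustration rather than the deck's model. It scores two objects by summing, over shared connectives, the product of their association weights, and precomputes a document-concept table to mimic the offline step. All data, concept names, and function names are hypothetical.

```python
def normalize(weights):
    """Scale non-negative weights so they sum to 1 (empty dict if all are zero)."""
    total = sum(weights.values())
    return {k: v / total for k, v in weights.items()} if total else {}

def relate(a, b):
    """Relevance of two objects: sum over shared connectives e of a[e] * b[e]."""
    return sum(p * b.get(e, 0.0) for e, p in a.items())

def rank(scores):
    """Objects with a positive score, best first."""
    return sorted((d for d, s in scores.items() if s > 0),
                  key=lambda d: scores[d], reverse=True)

# Hypothetical explicit relationships (document-term and concept-term).
DOC_TERMS = {
    "d1": normalize({"olap": 1, "index": 1}),
    "d2": normalize({"cube": 1, "aggregate": 1}),
}
CONCEPT_TERMS = {
    "data_warehousing": normalize({"olap": 1, "cube": 1, "aggregate": 1}),
    "cooking": normalize({"recipe": 1}),
}

# Offline computation: implicit document-concept relationship via shared terms.
DOC_CONCEPTS = {
    d: normalize({c: relate(terms, ct) for c, ct in CONCEPT_TERMS.items()})
    for d, terms in DOC_TERMS.items()
}

def keyword_search(query):
    """Terms are the connectives between the query and the documents."""
    q = normalize(query)
    return rank({d: relate(q, terms) for d, terms in DOC_TERMS.items()})

def semantic_search(query):
    """Concepts are the connectives: map the query to concepts, then to documents."""
    q = normalize(query)
    q_concepts = normalize({c: relate(q, ct) for c, ct in CONCEPT_TERMS.items()})
    return rank({d: relate(q_concepts, dc) for d, dc in DOC_CONCEPTS.items()})

keyword_hits = keyword_search({"olap": 1.0})
semantic_hits = semantic_search({"olap": 1.0})
print(keyword_hits, semantic_hits)
```

In this toy setup the keyword search returns only the document that literally contains "olap", while the semantic search also reaches the document about data cubes through the shared concept.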
Probabilistic Approach to Ranking
• Recommendation (Collaborative Filtering-based Approach)
  • Conventional Collaborative Filtering
  • Semantic Collaborative Filtering
    • The user-concept relationship is computed offline
  [Ranking equations not preserved in the source]
[Figure: two unified graphs (users, documents, concepts, terms) showing the links used by each approach]
[Figure: unified graph of users, documents, terms, and concepts linked through connectives]
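Contributions
• Proposing a Unified Model for Semantic Information Retrieval
  • A semantic information retrieval model that uses multi-typed connectives
  • A model that covers both semantic search and recommendation
  • Related studies are special cases of the proposed model that use only specific link information
• Providing a Guide to Ranking in Semantic Information Retrieval
  • Examining and proposing ranking models that consider the relationships among connectives within the proposed model
  • Examining the characteristics of semantic information retrieval models using various types of concept connectives
• Resolving Limitations of Previous Approaches
  • Overcoming the limitations of previous studies with the unified model
    • Ambiguity of Keywords
    • Sparse Annotation
    • Exact Matching of Concept-based Approaches
• Novelty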