AN ADAPTATION OF THE VECTOR-SPACE MODEL FOR ONTOLOGY-BASED INFORMATION RETRIEVAL

Authors: Pablo Castells, Miriam Fernández, and David Vallet. PRESENTED BY: AMALA RANGNEKAR

OVERVIEW • INTRODUCTION • EARLIER MODELS’ ISSUES • PROPOSED SYSTEM • SEMI-AUTOMATIC ANNOTATION • WEIGHTING ANNOTATIONS • ANNOTATION ISSUES • QUERY PROCESSING • RANKING ALGORITHM • ISSUES, COMBSUM SOLUTION • EXPERIMENTS • FINAL OBSERVATIONS • COMPARISON WITH CONVENTIONAL SYSTEM • STRENGTHS • CURRENT ISSUES

INTRODUCTION • Most search engines use keyword-based techniques to return documents in response to user queries. • This approach is essentially Boolean (‘yes/no’): a document either matches the keywords or it does not. • A more intelligent IR approach based on semantic search is needed, used in combination with the present method. • Any reasons/examples as to why?

E.G. US POPULATION (FIG. 1)

EARLIER MODELS’ ISSUES • No ‘weight’ is assigned to each term in the query. • The ‘RELEVANCE’ of a term is not proportional to its ‘FREQUENCY’. • The ‘RARITY’ of a term is not exploited. HOW WOULD THIS HELP? E.g., Arachnocentric (of spiders)
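The benefit of exploiting rarity can be shown with inverse document frequency (IDF), the usual way of weighting a term by how rare it is. The sketch below is an illustration only. The collection size and document counts are hypothetical, and `idf` is a helper made up for this example. It assumes the common form log(|D| / n_t) with the natural logarithm.

```python
import math


def idf(n_docs_with_term: int, n_docs_total: int) -> float:
    """Inverse document frequency log(|D| / n_t): the rarer the term, the larger the value."""
    return math.log(n_docs_total / n_docs_with_term)


# Hypothetical collection of 10,000 documents.
N_DOCS = 10_000
common_weight = idf(5_000, N_DOCS)  # a term found in half of the documents
rare_weight = idf(3, N_DOCS)        # e.g. "arachnocentric", found in only 3 documents

print(f"common: {common_weight:.2f}, rare: {rare_weight:.2f}")  # common: 0.69, rare: 8.11
```

A query containing "arachnocentric" is therefore driven mostly by that word. A Boolean match would treat it the same as any common word.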

PROPOSED SYSTEM • Existing ‘conceptual search’ techniques over heterogeneous KBs have drawbacks. Do you know KIM? (https://www.ontotext.com/sites/default/files/publications/KIM_SAP_ISWC168.pdf) • Ranking: our concern is to rank the documents annotated with the query answers, not the answers themselves.

PROPOSED SYSTEM • Domain-Concept: the base superclass (root) of the domain concepts. • Topic: a ‘property’ of a class, used for classification. • Document: the proxy for the information source to be searched.

FIG.2

SEMI-AUTOMATIC ANNOTATION • Each Domain-Concept instance stores a multi-valued property called ‘label’, holding the most usual text forms of the instance. • Whenever an occurrence of a label is found in a document, an annotation is created between the instance and the document. FIG. 3: Instance, Annotation, Document
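The label-matching step can be sketched as below. This is an assumption, not the authors' annotator. It uses a naive, case-insensitive, whole-phrase regex search over each instance's labels. The mini-KB (`LABELS`, the `kb:` URIs) and the `annotate` helper are hypothetical.

```python
import re
from collections import Counter

# Hypothetical mini-KB: instance URI -> values of its multi-valued 'label' property.
LABELS = {
    "kb:TableTennis": ["table tennis", "ping pong"],
    "kb:Picasso": ["Picasso"],
}


def annotate(doc_id, text, labels=LABELS):
    """Return {(instance, doc_id): number of label occurrences} for instances found in text.

    Naive matching: case-insensitive, whole-phrase regex search for each label.
    """
    counts = Counter()
    for instance, forms in labels.items():
        for form in forms:
            pattern = r"\b" + re.escape(form) + r"\b"
            counts[instance] += len(re.findall(pattern, text, flags=re.IGNORECASE))
    # One annotation per (instance, document) pair with at least one occurrence.
    return {(inst, doc_id): n for inst, n in counts.items() if n > 0}


text = "Picasso played ping pong. Table tennis was his hobby; the painter also swam."
print(annotate("d1", text))  # {('kb:TableTennis', 'd1'): 2, ('kb:Picasso', 'd1'): 1}
```

Note that "the painter" is not counted as a mention of Picasso, because no label matches it.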

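WEIGHTING ANNOTATIONS • A ‘weight’ is assigned to every annotation rather than to the document. • It reflects how relevant the instance is to the document. • The weight is computed by an adaptation of the TF-IDF algorithm. • Weight d_x for an instance x occurring in document d: d_x = (freq_{x,d} / max_y freq_{y,d}) · log(|D| / n_x)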

WEIGHTING ANNOTATIONS Adaptation of the TF-IDF algorithm • freq_{x,d}: number of occurrences in d of the keywords attached to x • max_y freq_{y,d}: frequency of the most repeated instance in d • n_x: number of documents annotated with x • D: the set of all documents in the search space
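The weighting scheme on this slide can be sketched in Python. The sketch assumes the TF-IDF form d_x = (freq_{x,d} / max_y freq_{y,d}) · log(|D| / n_x) with the natural logarithm; the base only rescales the weights. The function name and all statistics are hypothetical.

```python
import math


def annotation_weights(freqs_in_doc, n_docs_annotated, num_docs):
    """Weights d_x for all instances x annotated in one document d.

    freqs_in_doc:     {x: freq_{x,d}}, keyword occurrences of each instance in d
    n_docs_annotated: {x: n_x}, number of documents annotated with x
    num_docs:         |D|, size of the document search space
    """
    max_freq = max(freqs_in_doc.values())  # max_y freq_{y,d}
    return {
        x: (freq / max_freq) * math.log(num_docs / n_docs_annotated[x])
        for x, freq in freqs_in_doc.items()
    }


# Hypothetical statistics for one document in a collection of 1,000 documents.
weights = annotation_weights(
    freqs_in_doc={"kb:TableTennis": 2, "kb:Picasso": 1},
    n_docs_annotated={"kb:TableTennis": 100, "kb:Picasso": 5},
    num_docs=1_000,
)
print(weights)
```

Here Picasso appears only once in the document but is much rarer in the collection. It therefore ends up with a slightly higher weight than the more frequent but more common Table Tennis instance.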

ANNOTATION ISSUES • METONYMY (Table Tennis = Ping Pong) SOLUTION? • Extending labeling schemes UNRESOLVED ISSUES: • SYNECDOCHE (Picasso... The painter also...) • Counting imprecision

QUERY PROCESSING RDQL queries are used to express: • Ontology instances • Document properties • Classification values. Query variables can be weighted: • Manually • Automatically

QUERY PROCESSING FIG. 4

RANKING ALGORITHM Semantic similarity value between a query and a document. • O: the set of all classes and instances in the ontology • D: the set of all documents • q: extended query vector, with one component q_x per x ∈ O • V_q: the set of variables in the SELECT clause of q (Diagram labels: RANKING, RETRIEVAL, ANNOTATION)

RANKING ALGORITHM • w: variable weight vector, each weight in [0, 1] • T: the tuples in the query result set • D: the document search space • d_x: weight of the annotation of document d with instance x • q ∈ Q: an RDQL query • Similarity (cosine): sim(d, q) = (d · q) / (|d| · |q|)
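The ranking step can be sketched with sparse vectors stored as dictionaries. The slides do not say exactly how q_x is built from T and w. This sketch assumes that an instance bound to variable v in any result tuple receives w_v, and that it keeps the largest weight if it is bound to several variables. The helper names, URIs and documents are hypothetical. The score itself is the cosine similarity between each document's annotation-weight vector and q.

```python
import math


def query_vector(result_tuples, select_vars, var_weights):
    """Extended query vector q over ontology instances, as a sparse dict {x: q_x}.

    result_tuples: the result set T, one dict {variable: instance} per tuple
    select_vars:   V_q, the variables in the SELECT clause
    var_weights:   w, one weight in [0, 1] per variable
    ASSUMPTION: an instance bound to variable v gets weight w_v; if it is bound
    to several variables, the largest of their weights is kept.
    """
    q = {}
    for t in result_tuples:
        for v in select_vars:
            x = t[v]
            q[x] = max(q.get(x, 0.0), var_weights[v])
    return q


def cosine(d, q):
    """sim(d, q) = d . q / (|d| |q|) for sparse dict vectors; 0.0 if either is empty."""
    dot = sum(w * q[x] for x, w in d.items() if x in q)
    norm_d = math.sqrt(sum(w * w for w in d.values()))
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    if norm_d == 0.0 or norm_q == 0.0:
        return 0.0
    return dot / (norm_d * norm_q)


def rank(docs, q):
    """docs: {doc_id: {x: d_x}}. Returns [(doc_id, sim)] sorted by decreasing similarity."""
    scores = [(doc_id, cosine(d, q)) for doc_id, d in docs.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical query result set with one SELECT variable, weighted 1 as in the experiments.
T = [{"?bank": "kb:BankA"}, {"?bank": "kb:BankB"}]
q = query_vector(T, ["?bank"], {"?bank": 1.0})
docs = {
    "d1": {"kb:BankA": 2.0},
    "d2": {"kb:Other": 1.0},
    "d3": {"kb:BankA": 1.0, "kb:Other": 1.0},
}
print(rank(docs, q))
```

Document d1 ranks first because all of its annotation weight falls on a query answer. Document d3 is diluted by an unrelated instance, and d2 scores 0.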

ISSUES, COMBSUM SOLUTION • Score normalization is required. • An incomplete KB yields lower similarity values even for relevant documents. • The method needs to be combined with a keyword-based algorithm. Any suggestions for solutions? • CombSUM
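CombSUM fuses two rankings by adding each document's normalized scores. The sketch below assumes min-max normalization, which is one common choice; the slides do not specify the normalization used. The score dictionaries are hypothetical.

```python
def min_max_normalize(scores):
    """Rescale {doc: score} linearly to [0, 1] so runs on different scales can be added."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # Degenerate run: every document gets the same score.
        return {doc: (1.0 if hi > 0 else 0.0) for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def comb_sum(*runs):
    """CombSUM: sum each document's normalized scores over all runs.

    A document missing from a run contributes 0 for that run.
    Returns [(doc, fused_score)] sorted by decreasing score.
    """
    fused = {}
    for run in runs:
        for doc, s in min_max_normalize(run).items():
            fused[doc] = fused.get(doc, 0.0) + s
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Hypothetical scores from the ontology-based model and a keyword-based engine.
semantic = {"d1": 0.9, "d2": 0.1, "d3": 0.5}
keyword = {"d2": 10.0, "d3": 8.0, "d4": 4.0}
print(comb_sum(semantic, keyword))
```

In this example d3 is only middling in each run but comes out on top after fusion. A weighted linear combination λ·s_sem + (1 − λ)·s_kw generalizes CombSUM; with λ = 0.5 it produces the same ordering.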

EXPERIMENTS KIM domain ontology and KB. The complete KB includes: • 281 classes • 138 properties • 35,689 instances Automatic generation of the concept-keyword mapping: • 3 × 10^6 annotations • Average observed response time below 30 seconds • Weight of all query variables set to 1

QUERY A: News about banks that trade on NASDAQ, with fiscal net income > 2 million dollars Keyword-based: • Limited expressive power • Cannot express the query condition Semantic Search: • Handles the condition • Annotates relevant instances Ontology: • The KB is large, but not massive. • The KB does not contain all banks, hence precision is lower at 100% recall FIG. 5
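Why a keyword query cannot express Query A becomes clear once the condition is written over structured data. The real system poses it in RDQL against the KIM KB. The Python filter below only imitates that step over a few hypothetical KB records. The `Company` type, its fields and the URIs are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Company:
    uri: str
    is_bank: bool
    exchange: str
    net_income_usd: float


# Hypothetical KB facts (a real system would query the ontology KB, e.g. with RDQL).
KB = [
    Company("kb:BankA", True, "NASDAQ", 5_000_000),
    Company("kb:BankB", True, "NYSE", 9_000_000),
    Company("kb:BankC", True, "NASDAQ", 1_000_000),
    Company("kb:ShopD", False, "NASDAQ", 7_000_000),
]


def query_a(kb):
    """Instances satisfying Query A: banks on NASDAQ with net income above 2 million USD."""
    return [
        c.uri
        for c in kb
        if c.is_bank and c.exchange == "NASDAQ" and c.net_income_usd > 2_000_000
    ]


print(query_a(KB))  # ['kb:BankA']
```

The instances returned here become the result tuples that feed the query vector. Documents are then ranked by their annotations with these instances, not by whether they contain the words "NASDAQ" or "2 million".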

QUERY B: News about telecom companies Keyword-based: • KB contains few instances Semantic: • The keyword-based result is better, so the linear combination also scores better Ontology: • Low precision • KB incomplete FIG. 6

QUERY C: News about insurance companies in the USA. Ontology: • Performance is spoiled by incorrect annotations (Kaye is both a company name and a person’s name). Semantic: • Since the keyword-based result is better, the linear combination is also better. FIG. 7

FINAL OBSERVATIONS • Average comparison of the systems over 20 queries. • Result: situations where ontology-only search performs badly are compensated for, on average, by the combined search. FIG. 8

COMPARISON WITH CONVENTIONAL SYSTEM FIG.9

STRENGTHS • Better recall: querying for specific instances using class hierarchies and rules. • Better precision: using weights, reducing ambiguities (extended labels), and using structured semantic queries. • Combination of conditions on concepts. Better results: • As the number of clauses in the formal query increases • With a complete, high-quality KB

CURRENT ISSUES Further work is needed on the following: • Automatic annotation. • Advanced NLP to replace human supervision. • Score combination strategy. • Extending the model with user-interest profiles for personalized search. ANY MORE?

THANK YOU FOR LISTENING. ANY QUESTIONS?
