Similarity Measures for Query Expansion in TopX. Caroline Gherbaoui. Universität des Saarlandes, Naturwissenschaftlich-Technische Fak. I, Fachrichtung 6.2 - Informatik. Max-Planck-Institut für Informatik, AG 5 - Datenbanken und Informationssysteme, Prof. Dr. Gerhard Weikum.
Overview • background knowledge • similarity measures for query expansion • evaluation of the computed similarity values • changes in TopX • conclusion
Background • top-k query processing • returns the k most relevant results • query expansion • extends the source query with related terms • word sense disambiguation • identifies the intended meaning of a term • ontology • collection of terms with their meanings and semantic relations
Word Sense Disambiguation • query: "java, coffee" • candidate senses of "java": "island", "coffee", "programming language", …
Query Expansion • "COFFEE" is expanded with "drink, espresso"
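The expansion step on this slide can be sketched as follows. This is a minimal illustration, not the TopX implementation: the function name `expand_query`, the `threshold` parameter, and the similarity values in `RELATED` are all hypothetical and chosen only to reproduce the slide's example.

```python
def expand_query(terms, related, threshold=0.5):
    """Expand each query term with related terms whose similarity
    is at least `threshold`, most similar first."""
    expanded = {}
    for term in terms:
        candidates = related.get(term, {})
        expanded[term] = sorted(
            (t for t, sim in candidates.items() if sim >= threshold),
            key=lambda t: -candidates[t],
        )
    return expanded


# Hypothetical similarity values, for illustration only.
RELATED = {"coffee": {"drink": 0.8, "espresso": 0.7, "island": 0.1}}

print(expand_query(["coffee"], RELATED))
```

The threshold is what makes the choice of similarity measure matter: a measure that scores unrelated concepts too high would let terms such as "island" into the expanded query.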
TopX • top-k retrieval engine • text and XML data • word sense disambiguation • query expansion • ontology
TopX – WordNet Ontology • lexicon for the English language • hierarchical relations • each relation stored in one direction only • ~160,000 words • ~120,000 synsets • ~210,000 relations
TopX – YAGO Ontology • built from Wikipedia and WordNet • hierarchical and non-hierarchical relations • each relation stored in both directions • ~2,100,000 words • ~2,200,000 concepts • ~6,000,000 relations
Similarity Measures • Dice similarity • the measure already used in TopX • NAGA similarity • the measure applied to YAGO • Best WordNet similarity • the WordNet measure with the best results
Dice Similarity Measure • measures the intersection of two regions relative to their combined size
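The formula on this slide did not survive extraction. As a reference point, the standard Dice coefficient of two sets A and B is 2·|A ∩ B| / (|A| + |B|). The sketch below implements that textbook definition. What the two "regions" are in TopX (for example, the sets of documents containing each term) is an assumption here, and the document IDs are made up.

```python
def dice(a, b):
    """Standard Dice coefficient of two collections, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # convention chosen here for two empty sets
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical document IDs containing each term.
docs_java = {1, 2, 3, 4}
docs_coffee = {3, 4, 5}

print(dice(docs_java, docs_coffee))  # 2 * 2 / (4 + 3)
```

The value is 1 when both sets are identical and 0 when they are disjoint.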
NAGA Similarity Measure • combines the confidence of a relation with the informativeness of that relation
Best WordNet Similarity Measure • product of a transfer function of the path length and a transfer function of the concept depth
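The slide names the structure of this measure but its formula was lost. The sketch below shows one common way to build such a product. It assumes an exponential decay for the path length and a hyperbolic tangent for the concept depth. These functional forms and the parameter values `alpha` and `beta` are assumptions for illustration and may differ from what TopX uses.

```python
import math


def path_depth_similarity(path_length, depth, alpha=0.2, beta=0.6):
    """Product of two transfer functions (assumed forms):
    - path length: exp(-alpha * l), decreasing as concepts lie further apart
    - concept depth: tanh(beta * h), increasing as the shared ancestor
      lies deeper (is more specific) in the hierarchy
    """
    f_length = math.exp(-alpha * path_length)
    f_depth = math.tanh(beta * depth)
    return f_length * f_depth


# Shorter paths and deeper common ancestors give higher similarity.
print(path_depth_similarity(path_length=1, depth=5))
print(path_depth_similarity(path_length=3, depth=5))
print(path_depth_similarity(path_length=3, depth=1))
```

Both factors stay within [0, 1] for non-negative inputs, so the product does as well.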
Evaluation • Dice measure: applicable • also to the YAGO ontology • NAGA measure: applicable • if the forward direction is omitted • Best WordNet measure: not applicable • due to the density of YAGO
Changes for TopX • tuning of some procedures • Dijkstra algorithm • word sense disambiguation • query expansion • extension of the configuration file
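The slide does not say how the Dijkstra algorithm is used in TopX. One plausible use is to find, for a source concept, the best similarity to every reachable concept, where the similarity of a path is the product of its edge similarities. The sketch below assumes exactly that, with edge weights in (0, 1]. The graph, its weights, and the function name `best_path_similarities` are hypothetical.

```python
import heapq


def best_path_similarities(graph, source):
    """Dijkstra variant that maximizes the product of edge weights.

    `graph` maps node -> {neighbor: similarity}, with similarities in (0, 1],
    so extending a path can never increase its score.
    """
    best = {source: 1.0}
    heap = [(-1.0, source)]  # max-heap via negated similarity
    while heap:
        neg_sim, node = heapq.heappop(heap)
        sim = -neg_sim
        if sim < best.get(node, 0.0):
            continue  # stale entry, a better path was already found
        for neighbor, weight in graph.get(node, {}).items():
            candidate = sim * weight
            if candidate > best.get(neighbor, 0.0):
                best[neighbor] = candidate
                heapq.heappush(heap, (-candidate, neighbor))
    return best


# Hypothetical ontology fragment with made-up edge similarities.
GRAPH = {
    "coffee": {"drink": 0.8, "espresso": 0.9},
    "espresso": {"drink": 0.95},
    "drink": {"liquid": 0.5},
}

print(best_path_similarities(GRAPH, "coffee"))
```

In this example the indirect path from "coffee" to "drink" via "espresso" scores higher than the direct edge, which is why a shortest-path style search is needed rather than looking only at direct neighbors.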
Conclusion • larger knowledge base • more flexibility • increased complexity • an additional measure for similarity computation: NAGA similarity