Indexing and Retrieval Semantic Search Fatemeh Lashkari UNB University May 7th 2014
Outline • Indexing • Semantic Search • Semantic Search Architecture • Index process • Index Maintenance
Indexing • Inverted Index • Sort-based inversion • Single-pass in-memory inversion • HYB Index • Prefix search • Autocompletion search • Query expansion and faceted search • Fast error-tolerant search • Support for database-style "select" and "join"
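The first items on this slide can be illustrated with a minimal sketch. It builds a plain inverted index by sort-based inversion and then answers a prefix query over the sorted vocabulary. This is not the HYB index from the cited work; the function names `sort_based_inversion` and `prefix_search` and the three sample documents are assumptions made for illustration.

```python
from bisect import bisect_left
from itertools import groupby


def sort_based_inversion(docs):
    """Build an inverted index by sorting (term, doc_id) pairs."""
    # Collect one pair per distinct term occurrence in a document, then sort.
    pairs = sorted({(term, doc_id)
                    for doc_id, text in docs.items()
                    for term in text.lower().split()})
    # After sorting, all pairs of one term are adjacent and ordered by doc_id.
    return {term: [doc_id for _, doc_id in group]
            for term, group in groupby(pairs, key=lambda pair: pair[0])}


def prefix_search(index, prefix):
    """Return the sorted doc IDs that contain any term starting with prefix."""
    vocab = sorted(index)
    start = bisect_left(vocab, prefix)  # first term >= prefix
    hits = set()
    for term in vocab[start:]:
        if not term.startswith(prefix):
            break  # terms sharing a prefix are contiguous in sorted order
        hits.update(index[term])
    return sorted(hits)


# Hypothetical sample collection.
docs = {1: "astronauts walk on moon", 2: "moon landing", 3: "astronomy walk"}
index = sort_based_inversion(docs)
print(index["moon"])
print(prefix_search(index, "astro"))
```

A real sort-based inversion sorts runs on disk because the pairs do not fit in memory; the in-memory `sorted` call here only shows the principle.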
Outline • Indexing • Semantic Search • Semantic Search Architecture • Index process • Index Maintenance
Semantic Search http://broccoli.cs.uni-freiburg.de/demos/BroccoliFreebase/ Query: “astronauts walk on moon”
Outline • Indexing • Semantic Search • Semantic Search Architecture • Index process • Index Maintenance
Semantic Search Architecture [Diagram with the components: Ontology, Text Collection, Indexing, Query Process, Answers to the question]
Outline • Indexing • Semantic Search • Semantic Search Architecture • Index process • Parsing • Index Maintenance
Parsing • Preprocessing • Stemming • Lower-casing, e.g. General Motors → general motors • Removing some stop words, e.g. is, do, a, of, ... • Text annotation • Annotators • Machine learning approaches
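The preprocessing steps listed above could be sketched as follows. The stop-word set contains only the examples from the slide, and `toy_stem` is a deliberately crude suffix stripper standing in for a real stemmer (for example Porter's); both are assumptions for illustration, and the annotation step is not covered.

```python
# Only the stop words named on the slide; a real list would be longer.
STOP_WORDS = {"is", "do", "a", "of"}

# Toy rule set, NOT a real stemming algorithm.
SUFFIXES = ("ing", "es", "s")


def toy_stem(token):
    """Strip the first matching suffix if at least three characters remain."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[:-len(suffix)]
    return token


def preprocess(text):
    """Lower-case, split on whitespace, drop stop words, then stem."""
    tokens = text.lower().split()
    return [toy_stem(token) for token in tokens if token not in STOP_WORDS]


print(preprocess("General Motors is a maker of cars"))
```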
Outline • Indexing • Semantic Search • Semantic Search Architecture • Index process • Parsing • Index Structure • Index Maintenance
Index Structure • A fast and efficient index • does not need the whole vocabulary of the indexed collection in main memory • does not need to sort postings • does not need to merge postings • is cache-efficient
Outline • Indexing • Semantic Search • Semantic Search Architecture • Index Process • Parsing • Index Structure • Building Index • Index Maintenance
Building Index (Tasks to Decide) • How many indexes do we need? • Index for relations • Index for text • What is the structure of the vocabulary? • What is the structure of a posting? • What statistical information does a posting contain? e.g. <docId, position, score, entity> • apple: <6, 10, 0.3, class: fruit>, <4, 2, 0.9, class: company>
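One possible in-memory shape for the posting in the example above is sketched here. The `Posting` class, its field names, and the helper `best_posting` are assumptions for illustration; the slide only fixes the four components and the two sample postings for "apple".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Posting:
    doc_id: int     # document that contains the term
    position: int   # word position inside that document
    score: float    # relevance score of this occurrence
    entity: str     # semantic annotation, e.g. the entity class


# The two postings for "apple" from the slide.
index = {
    "apple": [
        Posting(6, 10, 0.3, "class: fruit"),
        Posting(4, 2, 0.9, "class: company"),
    ]
}


def best_posting(index, term):
    """Return the highest-scoring posting for term, or None if it is absent."""
    return max(index.get(term, []), key=lambda p: p.score, default=None)


print(best_posting(index, "apple"))
```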
Building Index (Tasks to Decide) • How to compute scores to improve the final result? • How to store the index? • Distribute the index • Process queries in parallel • Which compression methods can be used?
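As one common answer to the compression question, sorted document IDs can be stored as gaps and the gaps written with variable-byte codes. The slide does not name a method, so this choice and all function names below are assumptions; the sketch only shows the idea.

```python
from itertools import accumulate


def vbyte_encode(numbers):
    """Encode non-negative ints, 7 bits per byte, low-order bits first."""
    out = bytearray()
    for n in numbers:
        while n >= 128:
            out.append(n & 0x7F)  # continuation byte: high bit clear
            n >>= 7
        out.append(n | 0x80)      # final byte of a number: high bit set
    return bytes(out)


def vbyte_decode(data):
    """Inverse of vbyte_encode."""
    numbers, n, shift = [], 0, 0
    for byte in data:
        if byte & 0x80:
            numbers.append(n | ((byte & 0x7F) << shift))
            n, shift = 0, 0
        else:
            n |= byte << shift
            shift += 7
    return numbers


def compress_doc_ids(doc_ids):
    """Gap-encode a sorted list of doc IDs, then variable-byte encode the gaps."""
    gaps = [b - a for a, b in zip([0] + doc_ids, doc_ids)]
    return vbyte_encode(gaps)


def decompress_doc_ids(data):
    """Decode the gaps and turn them back into absolute doc IDs."""
    return list(accumulate(vbyte_decode(data)))


doc_ids = [4, 6, 300, 70000]  # hypothetical posting list
data = compress_doc_ids(doc_ids)
print(len(data), decompress_doc_ids(data))
```

Small gaps fit in a single byte, so dense posting lists shrink the most.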
Outline • Indexing • Semantic Search • Semantic Search Architecture • Index process • Index Maintenance
Index Maintenance • Strategies for maintaining the index: • Merge-based (remerge) • In-place • Hybrid index update operation • Geometric partitioning
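The merge-based (remerge) strategy can be sketched as merging a small index of newly arrived documents into the main index and writing out a fresh one. The sketch assumes posting lists are sorted lists of doc IDs and that the new documents do not overlap with the old ones; `remerge` and the sample data are hypothetical names and values.

```python
from heapq import merge


def remerge(main_index, delta_index):
    """Return a new index that combines both inputs; the inputs are unchanged."""
    merged = {}
    for term in main_index.keys() | delta_index.keys():
        # heapq.merge combines two already sorted lists in linear time.
        merged[term] = list(merge(main_index.get(term, []),
                                  delta_index.get(term, [])))
    return merged


main_index = {"moon": [1, 2], "walk": [1]}
delta_index = {"moon": [5], "apollo": [5]}  # postings of a new document 5
updated = remerge(main_index, delta_index)
print(updated["moon"])
```

The in-place, hybrid, and geometric partitioning strategies differ in how often and how much of the existing index is rewritten; they are not shown here.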