Unlock the books with IntelligentCAPTURE
Xavier Baumgartner
University of St. Gallen
Outline
• 1 Background of the Project:
  - Euregio Bodensee - Library Cooperation
  - Project AGI and VLB = Vorarlberger Landesbibliothek
  - IBH = Internationale Bodenseehochschule
• 2 Project Partners:
  - AGI: http://www.agi-imc.de/
  - Libraries
Outline (cont.)
• 3 Project Tools:
  - intelligentCAPTURE
  - IC CAI-Engine
  - intelligentSEARCH
• 4 Project Results:
  - Library catalogue: http://www.vorarlberg.at/vlb/
  - Portal: http://www.dandelon.com
1 Background: Euregio Bodensee
• Region extending for roughly 50 km around Lake Constance (Bodensee)
• Covers the southern German districts of Konstanz, Sigmaringen, Ravensburg, Lindau, Oberallgäu, and Bodenseekreis
• Austrian province of Vorarlberg
• Swiss cantons of St. Gallen, Schaffhausen, Appenzell-Innerrhoden, and Appenzell-Ausserrhoden
• Principality of Liechtenstein
1 Background: Euregio Bodensee - Library Cooperation
• http://www.ub.uni-konstanz.de/euregio/bodkat.htm
• http://www.ub.uni-konstanz.de/boddb/
1 Background: IBH = Internationale Bodensee-Hochschule (International Lake Constance University)
• Virtual University
• Network of 24 independent universities
• Aim: promote cooperation among member universities in fields of science, research and infrastructure
• Use synergies to mutual advantage
2 Project Partners: AGI - Information Management Consultants
• Focused on information and knowledge management
• Consulting
• Software development and long-term maintenance
• Uses advanced recognition technologies in:
  - Automatic indexing and text mining (CAI)
  - Machine translation (MT)
  - Optical character recognition (OCR)
  - Recognition of text structures in PDF documents
  - Voice recognition
2 Project Partners: AGI - Information Management Consultants
• Products:
  - Based on the IBM technical platform Lotus Notes & Domino
  - intelligentCAPTURE -> tool for document capturing and machine indexing
  - IC INDEX -> tool for developing topic maps, taxonomies, thesauri and classifications
  - intelligentSEARCH -> tool for information retrieval and visualization
2 Project Partners: Libraries
• University of Applied Sciences Dornbirn
• University of Applied Sciences Kempten
• University of Applied Sciences Liechtenstein
• Central Library Zurich for University Zurich
• University of Applied Sciences Konstanz
• University of St. Gallen
3 Project tools: intelligentCAPTURE
• Software intelligentCAPTURE installed locally and connected to scanner
• Workflow:
  - Identification of document via barcode
  - Scanning table of contents of books
  - Character recognition process (OCR)
  - Quick check of result of OCR
3 Project tools: intelligentCAPTURE
• Workflow (cont.):
  - Generation of PDF file
  - Compression of files
  - Automatic indexing (CAI engine)
  - Transfer of PDF file to file system
  - Export of indexing results and PDF files
    to local library system
    to local intelligentSEARCH database
    to central database, hosted by AGI
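
The workflow on the two slides above can be read as a fixed sequence of steps followed by a fan-out to three export targets. The following is a minimal Python sketch of that ordering only. The names `CaptureJob`, `WORKFLOW_STEPS`, `EXPORT_TARGETS` and `run_workflow` are hypothetical and are not part of intelligentCAPTURE; no real scanning, OCR or PDF generation is performed.

```python
from dataclasses import dataclass, field
from typing import List

# Step names are taken from the slides; the code structure is an assumption.
WORKFLOW_STEPS = [
    "identify document via barcode",
    "scan table of contents",
    "character recognition (OCR)",
    "quick check of OCR result",
    "generate PDF file",
    "compress files",
    "automatic indexing (CAI engine)",
    "transfer PDF file to file system",
]

EXPORT_TARGETS = [
    "local library system",
    "local intelligentSEARCH database",
    "central database hosted by AGI",
]


@dataclass
class CaptureJob:
    """Hypothetical record of one book passing through the workflow."""
    barcode: str
    steps_done: List[str] = field(default_factory=list)
    exported_to: List[str] = field(default_factory=list)


def run_workflow(barcode: str) -> CaptureJob:
    """Walk through the steps in order and record each one.

    A real installation would call the scanner, OCR and PDF software here;
    this sketch only records that each step happened.
    """
    job = CaptureJob(barcode=barcode)
    for step in WORKFLOW_STEPS:
        job.steps_done.append(step)
    # Indexing results and PDF files go to all three targets.
    for target in EXPORT_TARGETS:
        job.exported_to.append(target)
    return job


if __name__ == "__main__":
    job = run_workflow("EXAMPLE-BARCODE")
    for number, step in enumerate(job.steps_done, start=1):
        print(number, step)
    print("exported to:", ", ".join(job.exported_to))
```

The point of the sketch is the ordering: indexing runs on the OCR text only after the quick check, and the same result is exported locally and centrally.
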
3 Project tools: IC CAI-Engine
• Automatic indexing is much more specific and comprehensive than indexing of the title alone plus intellectual (manual) indexing with a controlled vocabulary
• Document analysis on the basis of linguistic methods and procedures from computational linguistics
• All words are reduced to their linguistic base form (morphemes)
• Uses large semantic nets (thesauri, topic maps etc.)
• Statistical rules for relevance ranking
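
To make the three ideas on this slide concrete (reduction to base forms, mapping through a semantic net, statistical ranking), here is a toy Python sketch. It is not the CAI engine: the lookup tables `BASE_FORMS`, `THESAURUS` and `STOPWORDS` are tiny invented examples, and plain term frequency stands in for whatever statistical rules the real engine applies.

```python
import re
from collections import Counter

# Hypothetical toy data; a real system would use full linguistic resources.
BASE_FORMS = {
    "libraries": "library",
    "books": "book",
    "catalogues": "catalogue",
    "indexing": "index",
    "indexed": "index",
}
# Maps a base form to a preferred descriptor (a stand-in for a thesaurus).
THESAURUS = {
    "catalogue": "library catalogue",
    "opac": "library catalogue",
}
STOPWORDS = {"the", "of", "and", "in", "a", "are", "for"}


def index_text(text, top_n=3):
    """Return the top_n (term, count) pairs for a text.

    Steps: tokenize, drop stopwords, reduce to base form,
    map to a thesaurus descriptor, rank by frequency.
    """
    tokens = re.findall(r"[a-zäöüß]+", text.lower())
    counts = Counter()
    for token in tokens:
        if token in STOPWORDS:
            continue
        base = BASE_FORMS.get(token, token)
        term = THESAURUS.get(base, base)
        counts[term] += 1
    return counts.most_common(top_n)


if __name__ == "__main__":
    sample = ("Libraries index books. The library catalogue and the OPAC "
              "list books in libraries.")
    print(index_text(sample))
```

In this sample, "catalogue" and "OPAC" are counted under the same descriptor, which is the effect a controlled vocabulary is meant to achieve.
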
3 Project tools: IC CAI-Engine
• Output of most important terms in groups:
  - geographical terms
  - personal/corporate terms
  - branches / areas of activity
  - descriptors: words from internal thesaurus
  - important words and phrases from text
• Libraries: use a broad generic thesaurus of approx. 300,000 German terms and a smaller English thesaurus
• Languages: German and English in use, French and Spanish available
[Diagram: Library 1, Library 2 and Library 3, each with iCAPT and ILS, each passing indexing data and PDF files to AGI]
3 Project tools: intelligentSEARCH
• Search engine with a simple (Google-like) interface and IBM GTR (Global Text Retrieval) as core engine
• Search terms input -> automatically expanded semantically
• Main features of GTR:
  - Operators: Boolean, adjacency, near, paragraph, sentence, right and left truncation, wildcard, fuzzy searching, sorting by relevance
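
Three of the operators listed above (truncation, wildcard, fuzzy searching) can be illustrated with the Python standard library. This sketch does not use GTR and does not reproduce its query syntax; the `*` and `?` pattern characters are those of Python's `fnmatch` module, and `VOCABULARY`, `truncation_search` and `fuzzy_search` are hypothetical names chosen for the example.

```python
import difflib
import fnmatch

# Hypothetical index vocabulary for the example.
VOCABULARY = [
    "library", "librarian", "catalogue",
    "cataloguing", "indexing", "thesaurus",
]


def truncation_search(pattern, vocabulary):
    """Match terms against a pattern with '*' (any run) and '?' (one char).

    'librar*' acts as right truncation, '*ing' as left truncation,
    and 'catalog?e' as a single-character wildcard.
    """
    return [term for term in vocabulary if fnmatch.fnmatchcase(term, pattern)]


def fuzzy_search(term, vocabulary, cutoff=0.8):
    """Return vocabulary terms that are similar to a possibly misspelled term."""
    return difflib.get_close_matches(term, vocabulary, n=5, cutoff=cutoff)


if __name__ == "__main__":
    print(truncation_search("librar*", VOCABULARY))
    print(truncation_search("*ing", VOCABULARY))
    print(truncation_search("catalog?e", VOCABULARY))
    print(fuzzy_search("thesarus", VOCABULARY))
```

Fuzzy searching is useful in this setting because the indexed text comes from OCR, where single-character recognition errors are common.
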
3 Project tools: intelligentSEARCH
• Features developed by AGI:
  - Highlighting
  - Interfaces to library system, bookseller, web via Google
  - Query expansion by semantic nets
  - Visualization and browsing of topic maps
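
Two of the features above, query expansion by semantic nets and highlighting, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not AGI's implementation: `SEMANTIC_NET` is an invented three-entry dictionary, and `expand_query` and `highlight` are hypothetical helper names.

```python
import re

# Hypothetical miniature semantic net: term -> related terms.
SEMANTIC_NET = {
    "library": ["libraries", "catalogue"],
    "book": ["books", "monograph"],
    "index": ["indexing", "thesaurus"],
}


def expand_query(terms, net):
    """Add related terms from the net, keeping order and avoiding duplicates."""
    expanded = []
    for term in terms:
        for candidate in [term] + net.get(term.lower(), []):
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


def highlight(text, terms, left="[", right="]"):
    """Wrap every whole-word, case-insensitive occurrence of a term in markers."""
    if not terms:
        return text
    # Longer terms first so that they win over their own prefixes.
    ordered = sorted(terms, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in ordered) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: left + m.group(0) + right, text)


if __name__ == "__main__":
    query = expand_query(["library"], SEMANTIC_NET)
    print(query)
    print(highlight("The Library catalogue lists books.", query))
```

Highlighting the expanded terms, not only the typed ones, shows the user why a document matched even when the original search word does not appear in it.
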
4 Project Results
• Library OPAC Vorarlberger Landesbibliothek: http://vlb-katalog.vorarlberg.at
• Portal: www.dandelon.com
4 Project Results: www.dandelon.com
• Portal with semantic search engine (intelligentSEARCH)
• Content: automatically indexed contents pages of books and other publications; PDF files of contents pages
• Search terms expanded semantically
• Relevance ranking
• Highlighting
4 Project Results: www.dandelon.com
• Links to libraries holding the book, to booksellers, to internet search engines
• View topic maps