Explore the new challenges facing open access repositories and the advancements in BASE search service at the Univ. of Glasgow 2006 event. Discover how BASE integrates FAST Data Search for efficient data retrieval, BASE interfaces written in PHP for user-friendly experience, and the use of web crawlers for content indexing. Learn about the complexities of OAI harvesting, document processing, and tuning repository search functions. Dive into BASE's document processing, tuning, administration, and debugging features with web crawlers and file traversal methods. Stay updated on the latest in scholarly document retrieval through BASE's refined search capabilities with integrated Google Scholar citations.
Open Scholarship 2006: New Challenges for Open Access Repositories, Univ. of Glasgow, 18-20 October 2006. Bielefeld Academic Search Engine: a Scientific Search Service for Institutional Repositories. Friedrich Summann, Bielefeld University Library
Overview: • BASE: concept and content • Overview BASE user-interface and further visions • BASE dataflow • OAI harvesting challenges • BASE interfaces • Demo
BASE: concept and content • BASE uses Fast Data Search • BASE runs on a Linux-based multi-node system • BASE contains intellectually selected resources, with a focus on OAI servers but also web-crawled content • BASE displays result lists as bibliographic data and full-text hits • The BASE frontend is written in PHP and uses the search API of Fast Data Search • BASE offers sorting, search refinement and search history http://www.base-search.net
BASE: concept and content [Diagram of system components. Labels: Connectors (Web Crawler, File Traverser); Document Processing (pipeline); Index Files; Filter; Query & Result Processing (pipeline); Search API; Search; Tuning, Administration and Debugging]
BASE: concept and content At present 3.8 million documents in 274 collections, 15 of them web-crawled data
Special view on IR server collections
• Collections are listed in a configuration file:
    [ftubirmingham]
    url = "http://eprints.bham.ac.uk/"
    desc_de = "The Univ. of Birmingham: Eprints Archive"
    desc_en = "The Univ. of Birmingham: Eprints Archive"
    descdd_de = "Birmingham Univ."
    descdd_en = "Birmingham Univ."
• Collections can be clustered for the user interface, e.g. “Institutional Repositories Europe” consists of [ftubarcelona], [ftubath], [ftubristol], [ftuhelsinki], …
• Parametric search is possible
• The frontend is ready for multi-view (independent views with their own configuration and layouts on the same backend)
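The collection entry above has the shape of an INI file, so it can be read with a standard INI parser. The following is a minimal sketch in Python, not the BASE frontend code (which is written in PHP); the function names `load_collections` and `label` are hypothetical, and it assumes that values are always wrapped in double quotes as in the slide.

```python
import configparser

# The collection entry as shown on the slide.
SAMPLE = '''
[ftubirmingham]
url = "http://eprints.bham.ac.uk/"
desc_de = "The Univ. of Birmingham: Eprints Archive"
desc_en = "The Univ. of Birmingham: Eprints Archive"
descdd_de = "Birmingham Univ."
descdd_en = "Birmingham Univ."
'''


def load_collections(text):
    """Parse the INI-style text into {collection_id: {key: value}}."""
    # Only "=" separates key and value, so the ":" inside URLs and
    # descriptions is kept as part of the value.
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.read_string(text)
    return {
        name: {key: value.strip('"') for key, value in parser[name].items()}
        for name in parser.sections()
    }


def label(collections, name, lang):
    """Return the description of a collection in the given language."""
    return collections[name][f"desc_{lang}"]


collections = load_collections(SAMPLE)
print(collections["ftubirmingham"]["url"])
print(label(collections, "ftubirmingham", "en"))
```

Keeping the language code in the key name (`desc_de`, `desc_en`) lets one configuration file serve both the German and the English interface.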
BASE: end-user interface (1) Displays search results as bibliographic data and full text hits
BASE: end-user interface (2) The result list (left-hand side): if the document contains metadata (e.g. title, author, abstract), the displayed description is highlighted
BASE: end-user interface (3) The result list (right-hand side) • Various options to sort the result set • Search refinement by author, keyword, document type, language etc. • Search history comprises up to 10 queries
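A search history capped at 10 queries can be modelled as a bounded queue. This is a small illustrative sketch in Python, not taken from the BASE frontend; the class name `SearchHistory` is hypothetical, and dropping the oldest query once the limit is reached is an assumption, since the slide only states the limit.

```python
from collections import deque


class SearchHistory:
    """Keeps the most recent queries, up to a fixed limit."""

    def __init__(self, limit=10):
        # A deque with maxlen discards the oldest entry automatically
        # when a new one is appended to a full history (assumed behaviour).
        self._queries = deque(maxlen=limit)

    def add(self, query):
        query = query.strip()
        if query:  # ignore empty searches
            self._queries.append(query)

    def recent(self):
        """Return the stored queries, newest first."""
        return list(reversed(self._queries))


history = SearchHistory()
for i in range(12):
    history.add(f"query {i}")
print(history.recent())
```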
BASE: end-user interface (4) Select an author ... Search Refinement ... only documents by this author are displayed
Google Scholar integration Check citations (citing articles) in Google Scholar ...
BASE dataflow: Database Records, Web Pages, OAI-Data → Harvesting → Pre-Processing → Processing → Internal Index (FAST) → User interface (PHP)
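OAI-compliant university repositories in BASE [World map with repository counts. Labelled regions: USA 82, Canada 14, South America 2, Africa 3, India 5, Australia 11, New Zealand 1. Further counts shown on the map without surviving labels: 4, 3, 18, 39, 3, 17, 2, 6, 55, 1, 12, 7, 1, 3, 12, 16, 2, 1]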
OAI harvesting challenges • Repositories do not respond or return error messages • Links to the document are missing or do not work • The XML file is not well-formed • Data contain only references without any full text • Access to the full text is often restricted • Field content varies
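Several of these problems can be detected automatically when a response arrives. The sketch below, in Python, checks one OAI-PMH response for three of them: XML that is not well-formed, an error returned by the repository, and records without a document link. It is an illustration and not the BASE harvester; the function name `check_response` and the sample data are hypothetical, and treating an `http(s)` value in `dc:identifier` as the document link is an assumption about how repositories expose their links.

```python
import xml.etree.ElementTree as ET

# Namespaces defined by OAI-PMH 2.0 and Dublin Core.
OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"


def check_response(xml_text):
    """Return a list of problem descriptions for one OAI-PMH response."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"XML is not well-formed: {exc}"]

    problems = []
    for error in root.iter(OAI + "error"):
        problems.append(f"repository returned error: {error.get('code')}")

    for record in root.iter(OAI + "record"):
        header = record.find(OAI + "header")
        if header is not None and header.get("status") == "deleted":
            continue  # deleted records carry no metadata by design
        oai_id = record.findtext(f"{OAI}header/{OAI}identifier",
                                 default="(no identifier)")
        # Assumption: a usable document link is an http(s) dc:identifier.
        links = [e.text for e in record.iter(DC + "identifier")
                 if e.text and e.text.startswith(("http://", "https://"))]
        if not links:
            problems.append(f"{oai_id}: no document link in dc:identifier")
    return problems


# Hypothetical sample: the second record is a reference without a link.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>With link</dc:title>
          <dc:identifier>http://example.org/doc/1.pdf</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header><identifier>oai:example.org:2</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Reference only</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

ERROR_SAMPLE = ('<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">'
                '<error code="badVerb">Illegal verb</error></OAI-PMH>')

print(check_response(SAMPLE))
print(check_response(ERROR_SAMPLE))
print(check_response("<OAI-PMH><record></OAI-PMH>"))
```

Restricted full text and broken links cannot be seen in the metadata alone; they would need a separate request against each link.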
Some rules from harvesting practice • Standard repository software is great, for OAI harvesting as well • Small collections, small problems • Getting the related full text is complicated • Libraries produce better metadata • Writing e-mails helps, sometimes • Data aggregation may produce problems
BASE interfaces • Search form • HTTP calls • Web Service
Local integration (via search form) E-Repository Integration
<form action="http://www.base-search.net/index.php" method="post" accept-charset="UTF-8">
  <input maxlength="512" name="q" type="text" size="50" />
  <input value="Search!" type="submit" />
  <input value="all" name="s" type="hidden" />
</form>
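The same request that the form submits can also be built by a program, which is what the "HTTP calls" interface amounts to. The following Python sketch constructs the POST request with the two fields from the form (`q` and `s`) but does not send it; the function name `build_search_request` is hypothetical, and whether the endpoint still accepts these parameters today is not something the slides confirm.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Form target taken from the slide.
BASE_SEARCH_URL = "http://www.base-search.net/index.php"


def build_search_request(query):
    """Build the POST request that the HTML search form would submit."""
    fields = {
        "q": query[:512],  # the form limits the query to 512 characters
        "s": "all",        # hidden field from the form
    }
    # Encode as UTF-8 form data, matching accept-charset="UTF-8".
    body = urlencode(fields, encoding="utf-8").encode("ascii")
    return Request(
        BASE_SEARCH_URL,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded; charset=UTF-8"},
    )


request = build_search_request("open access repositories")
print(request.get_method(), request.full_url)
print(request.data)
```

Passing the request to `urllib.request.urlopen` would send it; the sketch stops before that step so that it runs without network access.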
Prototype: Search Based on SOAP interface (EU project DRIVER)