Enhancing Open Access Repositories: Challenges and Innovations

This presentation, given at Open Scholarship 2006 (Univ. of Glasgow, 18-20 October 2006), introduces the Bielefeld Academic Search Engine (BASE), a scientific search service for institutional repositories. It covers how BASE builds on FAST Data Search, its PHP frontend, the mix of OAI-harvested and web-crawled content, and the practical difficulties of OAI harvesting. It also walks through the end-user interface (sorting, search refinement, search history, citation checks in Google Scholar) and the interfaces BASE offers for integration: search form, HTTP calls, and a web service.

Presentation Transcript


  1. Open Scholarship 2006: New Challenges for Open Access Repositories
     Univ. of Glasgow, 18-20 October 2006
     Bielefeld Academic Search Engine: a Scientific Search Service for Institutional Repositories
     Friedrich Summann, Bielefeld University Library

  2. Overview:
     • BASE: concept and content
     • Overview BASE user-interface and further visions
     • BASE dataflow
     • OAI harvesting challenges
     • BASE interfaces
     • Demo

  3. BASE: concept and content
     • BASE uses Fast Data Search
     • BASE runs on a Linux-based multi-node system
     • BASE contains intellectually selected resources, with a focus on OAI servers but also web-crawled content
     • BASE displays result lists as bibliographic data and full-text hits
     • The BASE frontend is written in PHP, using the search API from Fast Data Search
     • BASE offers sorting, search refinement and search history
     http://www.base-search.net

  4. BASE: concept and content
     [Architecture diagram. Labels: Search API; Connectors (Web Crawler, File Traverser); Document Processing; Index Files; Query & Result Processing; Filter; Pipeline (three times); Tuning, Administration and Debugging]

  5. BASE: concept and content
     At present 3.8 million documents in 274 collections, 15 of them web-crawled data

  6. BASE: concept and content

  7. Special view on IR server collections
     • Collections are listed in a configuration file:
         [ftubirmingham]
         url = "http://eprints.bham.ac.uk/"
         desc_de = "The Univ. of Birmingham: Eprints Archive"
         desc_en = "The Univ. of Birmingham: Eprints Archive"
         descdd_de = "Birmingham Univ."
         descdd_en = "Birmingham Univ."
     • Collections can be clustered for the user interface, e.g. “Institutional Repositories Europe” consists of [ftubarcelona], [ftubath], [ftubristol], [ftuhelsinki], …
     • Parametric search is possible
     • The frontend is ready for multi-view (independent views with their own configuration and layouts on the same backend)
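
The collection entry on slide 7 is INI-style, so it can be read with a standard INI parser. The sketch below uses Python's configparser purely for illustration (the BASE frontend itself is written in PHP). The helper names load_collections and cluster_labels, the cluster list, and the reading of descdd_* as a short display label are assumptions, not part of BASE.

```python
import configparser

# Section copied from the slide; the rest of the real file is not shown there.
SAMPLE = '''
[ftubirmingham]
url = "http://eprints.bham.ac.uk/"
desc_de = "The Univ. of Birmingham: Eprints Archive"
desc_en = "The Univ. of Birmingham: Eprints Archive"
descdd_de = "Birmingham Univ."
descdd_en = "Birmingham Univ."
'''


def load_collections(text):
    """Parse INI-style collection entries into a dict keyed by collection id."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    collections = {}
    for section in parser.sections():
        # Values on the slide are quoted, so strip the surrounding quotes.
        collections[section] = {
            key: value.strip('"') for key, value in parser[section].items()
        }
    return collections


def cluster_labels(collections, cluster, lang="en"):
    """Return the short labels (descdd_*) of the known collections in a cluster."""
    return [
        collections[cid]["descdd_" + lang]
        for cid in cluster
        if cid in collections
    ]


collections = load_collections(SAMPLE)
# Hypothetical cluster: only ftubirmingham is defined in SAMPLE above.
europe = ["ftubirmingham", "ftubath"]
print(cluster_labels(collections, europe))
```

Keeping clusters as plain lists of collection ids means a second view on the same backend only needs its own list, which matches the multi-view idea on the slide.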

  8. BASE: end-user interface (1) Displays search results as bibliographic data and full text hits

  9. BASE: end-user interface (2)
     The result list (left-hand side)
     If the document contains metadata (e.g. title, author, abstract), the displayed description is highlighted

  10. BASE: end-user interface (3)
      The result list (right-hand side)
      • Various options to sort the result set
      • Search refinement by author, keyword, document type, language etc.
      • Search history comprises up to 10 queries

  11. BASE: end-user interface (4)
      Search refinement: select an author ... only documents by this author are displayed
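
The refinement and history behaviour described on slides 10 and 11 can be illustrated with a small in-memory sketch. In BASE this work is done by the search backend, so the code below is only a model of the behaviour, not the actual implementation. The class and function names and the sample records are made up for the example.

```python
from collections import Counter, deque


class SearchSession:
    """Keeps only the most recent queries (the slide mentions up to 10)."""

    def __init__(self, limit=10):
        # A bounded deque silently drops the oldest query when full.
        self.history = deque(maxlen=limit)

    def record(self, query):
        self.history.append(query)


def author_facets(results):
    """Count hits per author, most frequent first, to offer as refinements."""
    counts = Counter(author for doc in results for author in doc["authors"])
    return counts.most_common()


def refine_by_author(results, author):
    """Keep only the documents written by the selected author."""
    return [doc for doc in results if author in doc["authors"]]


# Made-up result records, for illustration only.
RESULTS = [
    {"title": "Doc A", "authors": ["Smith, J."]},
    {"title": "Doc B", "authors": ["Smith, J.", "Lee, K."]},
    {"title": "Doc C", "authors": ["Lee, K."]},
    {"title": "Doc D", "authors": ["Smith, J."]},
]

session = SearchSession()
for number in range(12):
    session.record("query %d" % number)

print(author_facets(RESULTS))
print([doc["title"] for doc in refine_by_author(RESULTS, "Lee, K.")])
print(list(session.history))
```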

  12. Google Scholar integration Check citations (citing articles) in Google Scholar ...

  13. Vision: DDC Browsing

  14. BASE dataflow
      [Dataflow diagram. Sources: Database Records, Web Pages, OAI-Data. Stages: Harvesting, Pre-Processing, Processing, Internal Index (FAST), User interface (PHP)]

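  15. OAI-compliant university repositories in BASE
      [World map with repository counts. Labeled regions: USA 82, Canada 14, South America 2, Africa 3, India 5, Australia 11, New Zealand 1. Further counts appear without labels in the transcript: 4, 3, 18, 39, 3, 17, 2, 6, 55, 1, 12, 7, 1, 3, 12, 16, 2, 1]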

  16. OAI harvesting challenges
      • Repositories do not respond, or deliver error messages
      • Links to the document are not included or do not work
      • XML file is not well-formed
      • Data contain only references without any full text
      • Access to full text is often restricted
      • Field content varies
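
Several of the problems listed on slide 16 can be detected mechanically when a response arrives. The following is a minimal sketch, assuming OAI-PMH 2.0 responses with oai_dc metadata; it is not BASE's harvester. The function name check_oai_response, the status strings, and the sample documents (including the repository.example.org link) are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Namespaces defined by OAI-PMH 2.0 and Dublin Core.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
DC_NS = "{http://purl.org/dc/elements/1.1/}"


def check_oai_response(xml_text):
    """Classify a ListRecords response and flag records without a document link."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        # Challenge: the XML file is not well-formed.
        return {"status": "not-well-formed", "detail": str(exc), "records": []}

    error = root.find(OAI_NS + "error")
    if error is not None:
        # Challenge: the repository delivers an error message.
        return {"status": "oai-error", "detail": error.get("code"), "records": []}

    records = []
    for record in root.iter(OAI_NS + "record"):
        identifiers = [
            element.text.strip()
            for element in record.iter(DC_NS + "identifier")
            if element.text
        ]
        # Challenge: links to the document are not included.
        links = [i for i in identifiers if i.startswith(("http://", "https://"))]
        records.append({"links": links, "has_link": bool(links)})
    return {"status": "ok", "detail": None, "records": records}


GOOD = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><metadata>
      <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                 xmlns:dc="http://purl.org/dc/elements/1.1/">
        <dc:title>Example with link</dc:title>
        <dc:identifier>http://repository.example.org/doc/1</dc:identifier>
      </oai_dc:dc>
    </metadata></record>
    <record><metadata>
      <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                 xmlns:dc="http://purl.org/dc/elements/1.1/">
        <dc:title>Reference only</dc:title>
      </oai_dc:dc>
    </metadata></record>
  </ListRecords>
</OAI-PMH>"""

ERROR = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <error code="noRecordsMatch">No records</error>
</OAI-PMH>"""

BROKEN = "<OAI-PMH><ListRecords><record></ListRecords>"

for sample in (GOOD, ERROR, BROKEN):
    print(check_oai_response(sample)["status"])
```

Whether a link actually works, or leads to restricted full text, can only be found out by requesting it, so that check is left out of this sketch.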

  17. Some Rules from the Harvesting Practice
      • Standard repository software is great, for OAI harvesting as well
      • Small collections – small problems
      • Getting the related full text is complicated
      • Libraries produce better metadata
      • Writing e-mails helps - sometimes
      • Data aggregation may produce problems

  18. BASE interfaces
      • Search form
      • HTTP calls
      • Web Service

  19. Local integration (via search form)
      E-Repository Integration
        <form action="http://www.base-search.net/index.php" method="post" accept-charset="UTF-8">
          <input maxlength="512" name="q" type="text" size="50" />
          <input value="Search!" type="submit" />
          <input value="all" name="s" type="hidden" />
        </form>
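
The same request that the form on slide 19 submits can also be built in code, which is roughly what the "HTTP calls" option on slide 18 amounts to. The sketch below only constructs the POST request from the form's fields (q and s) and does not send it. The function name is hypothetical, and whether the endpoint still accepts these parameters today is not something the slides confirm.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Action URL and field names are taken from the form on the slide.
BASE_SEARCH_URL = "http://www.base-search.net/index.php"


def build_search_request(query, scope="all"):
    """Build (but do not send) the POST request the HTML form would submit."""
    # The form limits the query field to 512 characters (maxlength="512").
    body = urlencode({"q": query[:512], "s": scope}).encode("utf-8")
    return Request(
        BASE_SEARCH_URL,
        data=body,
        headers={
            "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8"
        },
        method="POST",
    )


request = build_search_request("open access repositories")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Sending the request would be a matter of passing it to urllib.request.urlopen; that step is omitted here so the example runs without network access.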

  20. Prototype: Search Based on SOAP interface (EU project DRIVER)

  21. Thank you!
