Beyond OAI-Services: Bielefeld Academic Search Engine (BASE)
Dirk Pieper, Friedrich Summann
Bielefeld University Library
Overview:
Part 1:
• Bielefeld UL: from meta search to search engines
• BASE: objectives, content, services
• Outlook and further information
Part 2:
• Backend, Frontend
• OAI dataflow, BASE dataflow
• OAI harvesting problems
• Further developments of BASE
Where we come from …
• One central on-site library, divided into groups of subject libraries
• 2 million books and other media items, the majority on open shelves
• Active registered users in 2004: 28,000
• 2,675 reader workplaces
• Budget for acquisitions in 2004: EUR 3,200,000, incl. special funds
• Journals: about 5,700 subscriptions
• Host of the International Bielefeld Conference series, a biennial conference that offers a major strategic discussion forum for library managers from all over Europe and beyond
From meta search to search engines (1)
Integrating heterogeneous information resources for users has always been a primary objective of UL Bielefeld. Milestones:
• 1993: Introduction of the document delivery system JASON
• 1995: Development of IBIS, the first German library project for a cooperative electronic information supply
• 1998: Introduction of JASON-Subito; online access to journals available in full-text versions (among others via consortial agreements with publishers)
• 1998-2001: Main coordinator of the Digital Library NRW (a major grant from the NRW State Ministry)
• 2000: Combination of Digital Library NRW services and the library's local website to offer integrated services in a corporate design
• 2002: Development of a web-based integrated learning and teaching environment (online learning) based on Blackboard, and of a university publications server (BieSOn) based on OPUS
• 2004: Launch of the Bielefeld Academic Search Engine (BASE) on the basis of FAST Data Search software
From meta search to search engines (2)
Integration at the level of the library's local system:
• OPAC: local holdings
• Institutional repository servers (OAI, with a focus on full-text dissertations)
• Journal Article Database (JADE, about 39 million articles) in combination with document delivery (JASON, Elsevier-ppv, Subito)
• Inside Serials, Elsevier, Springer, JSTOR …
• Meta search for several subject portals (Digital Library)
BASE: objectives, content, services (1)
First starting point: the reality of academic online information
[Figure: search across web pages, publishers' e-journals, library catalogues, institutional repository servers, subject databases, commercial providers, search engines, digital libraries, portals]
BASE: objectives, content, services (2)
Second starting point: experience with meta search (Digital Library) and user studies:
• Users want a search-engine look and feel
• Meta search environments are too slow compared to search engines like Google
• Little integration of full-text resources
• Little integration of the “visible web”
Main objectives of BASE:
• to overcome the fragmentation of academic information resources
• to use the search and retrieval standards provided by search engine technology
• to provide comfortable search interfaces and flexible result presentation
• to handle both highly structured and unstructured data
• to create large shared indices for a new kind of “meta” search
BASE: objectives, content, services (3)
[Figure: one search engine for academic online information, covering web pages, publishers' e-journals, library catalogues, institutional repository servers and subject databases]
BASE: objectives, content, services (5)
Services provided by UL Bielefeld within BASE:
• Identification and selection of high-quality content repositories
• Contact and negotiations with content providers (universities, libraries, commercial content providers)
• Aggregation, pre-processing and processing of internationally distributed and highly heterogeneous resources
• Data production (e.g. German Enlightenment, JADE, ...)
• Delivery of indexes in standardised formats (XML) for platform-independent re-use by other search engine providers
• Integration of BASE into meta search environments (e.g. SISIS-Elektra)
• Access to additional content in local OPAC environments
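The XML index delivery listed above can be pictured with a minimal sketch. It assumes a hypothetical record layout (an `id` plus a dict of multi-valued fields) and a made-up `<records>/<record>/<field>` schema; the actual BASE export format is not specified in the slides. Only Python's standard `xml.etree.ElementTree` is used, which also escapes characters such as `&` on output.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def records_to_xml(records: List[Dict]) -> str:
    """Serialise records to an XML string using a hypothetical generic schema."""
    root = ET.Element("records")
    for rec in records:
        rec_el = ET.SubElement(root, "record", id=rec["id"])
        # One <field name="..."> element per value, so repeated fields survive.
        for field_name, values in rec["fields"].items():
            for value in values:
                field_el = ET.SubElement(rec_el, "field", name=field_name)
                field_el.text = value
    return ET.tostring(root, encoding="unicode")


print(records_to_xml([
    {"id": "r1", "fields": {"title": ["Search engine technology"],
                            "creator": ["Summann, Friedrich", "Lossau, Norbert"]}},
]))
```

Keeping every value in a generic `field` element with a `name` attribute lets repeated fields such as several creators pass through without a schema change; a consumer only needs a standard XML parser to re-use the data.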
BASE: Outlook and further information (1)
The next steps:
• Leave the “demonstrator” status
• Increase the number of indexed OAI servers
• Integrate local library resources (OPAC and other databases)
• Integrate more commercial subject databases
• Increase full-text indexing
• Make more use of FAST features
BASE: Outlook and further information (2)
• DLF Spring Forum, New Orleans 2004: http://www.diglib.org/forums/Spring2004/
• Norbert Lossau: Search Engine Technology and Digital Libraries: Libraries Need to Discover the Academic Internet. D-Lib Magazine, June 2004 (Volume 10, Number 6)
• Friedrich Summann, Norbert Lossau: Search Engine Technology and Digital Libraries: Moving from Theory to Practice. D-Lib Magazine, September 2004 (Volume 10, Number 9)
• BASE: http://base.ub.uni-bielefeld.de
FAST-based architecture and intelligent modifications
[Figure: components: connectors (web crawler, file traverser) for general web content and full text, full-text collections, OAI sources (metadata + docs) and database content (bibliographic data); document processing pipeline; index files; search with filter; query & result processing pipeline; search API; tuning, administration and debugging]
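A document processing pipeline can be read as a chain of stages, each taking a document and returning a modified one. The sketch below assumes documents are plain dicts of strings; the stage names `normalize_whitespace` and `flag_fulltext` are hypothetical examples, not FAST's actual stages or API.

```python
from typing import Callable, Dict, List

Document = Dict[str, str]
Stage = Callable[[Document], Document]


def normalize_whitespace(doc: Document) -> Document:
    # Collapse runs of whitespace and trim every field value.
    return {key: " ".join(value.split()) for key, value in doc.items()}


def flag_fulltext(doc: Document) -> Document:
    # Hypothetical stage: mark documents whose link points at a PDF file.
    flagged = dict(doc)
    is_pdf = doc.get("link", "").lower().endswith(".pdf")
    flagged["has_fulltext"] = "yes" if is_pdf else "no"
    return flagged


def run_pipeline(doc: Document, stages: List[Stage]) -> Document:
    # Each stage receives the output of the previous one.
    for stage in stages:
        doc = stage(doc)
    return doc


print(run_pipeline(
    {"title": "  Talk   P. Bruzzone ", "link": "http://example.org/paper.PDF"},
    [normalize_whitespace, flag_fulltext],
))
```

Because stages share one signature, adding or reordering processing steps only changes the list passed to `run_pipeline`.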
Connectors: added functionalities
[Figure: connectors for general web content and full text, full-text collections, OAI sources (metadata + docs) and database content (bibliographic data)]
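Connectors give each kind of source a common interface, so the processing that follows does not depend on where a record came from. A minimal sketch using `typing.Protocol`; both connector classes are hypothetical in-memory stand-ins for a database connector and a file traverser, not FAST components.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List, Protocol


class Connector(Protocol):
    def fetch(self) -> Iterator[Dict[str, str]]:
        ...


@dataclass
class InMemoryDatabaseConnector:
    # Hypothetical stand-in for a database connector: rows are supplied directly.
    rows: List[Dict[str, str]]

    def fetch(self) -> Iterator[Dict[str, str]]:
        for row in self.rows:
            yield {"source_type": "database", **row}


@dataclass
class InMemoryFileConnector:
    # Hypothetical stand-in for a file traverser: maps paths to extracted text.
    files: Dict[str, str]

    def fetch(self) -> Iterator[Dict[str, str]]:
        for path in sorted(self.files):
            yield {"source_type": "fulltext", "path": path, "text": self.files[path]}


def collect(connectors: Iterable[Connector]) -> List[Dict[str, str]]:
    # Downstream code only relies on fetch(), not on the concrete source.
    records: List[Dict[str, str]] = []
    for connector in connectors:
        records.extend(connector.fetch())
    return records
```

A new source type (for example an OAI connector) would only need its own `fetch` method to plug into `collect`.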
OAI dataflow
[Figure: OAI data harvesting of articles (full text: PubMed, Euclid, arXiv, CiteSeer, Citebase, DOAJ articles), all resources (texts, images, video, references ...) and dissertations and monographs (full text), feeding the OPAC, the Article Database and the BASE internal index (FAST)]
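Harvesting from an OAI repository follows the OAI-PMH protocol: a `ListRecords` request with a `metadataPrefix` (and optionally `from` for incremental updates), then repeated requests carrying only the `resumptionToken` until the repository returns an empty token. The sketch below shows that loop with Python's standard library. The `fetch` function is injected so it can be swapped for a real HTTP call; it collects record identifiers only and does not treat OAI-PMH error responses (such as `noRecordsMatch`) specially.

```python
import urllib.parse
import xml.etree.ElementTree as ET
from typing import Callable, List, Optional, Tuple, Union

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url: str, token: Optional[str] = None,
                     metadata_prefix: str = "oai_dc",
                     from_date: Optional[str] = None) -> str:
    # resumptionToken is an exclusive argument: it is sent alone with the verb.
    if token:
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date:
            params["from"] = from_date  # incremental harvest, e.g. "2004-01-01"
    return base_url + "?" + urllib.parse.urlencode(params)


def parse_page(xml_text: Union[str, bytes]) -> Tuple[List[str], Optional[str]]:
    # Return the record identifiers on this page and the next token (None when done).
    root = ET.fromstring(xml_text)
    ids = [(el.text or "").strip()
           for el in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    token = (token_el.text or "").strip() if token_el is not None else ""
    return ids, token or None


def harvest(base_url: str, fetch: Callable[[str], Union[str, bytes]],
            from_date: Optional[str] = None) -> List[str]:
    ids, token = parse_page(fetch(list_records_url(base_url, from_date=from_date)))
    all_ids = list(ids)
    while token:
        ids, token = parse_page(fetch(list_records_url(base_url, token=token)))
        all_ids.extend(ids)
    return all_ids
```

Against a live repository, `fetch` could be `lambda url: urllib.request.urlopen(url).read()` (after `import urllib.request`); `ET.fromstring` accepts the returned bytes directly.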
BASE dataflow
Database records, OAI data, web pages → harvesting → pre-processing → processing → internal index (FAST) → user interface (PHP)
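The internal index at the end of this flow is, at its core, an inverted index: a map from each term to the documents containing it. A deliberately simplified sketch (lower-cased word tokens, Boolean AND queries, no ranking); the document ids are made up, and FAST's real index structures are not described in the slides.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    # Lower-cased word tokens; real systems add stemming, stop words, etc.
    return re.findall(r"\w+", text.lower())


def build_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map every term to the set of document ids that contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Boolean AND: ids of documents containing every query term."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


index = build_index({
    "oai-1": "Search engine technology and digital libraries",
    "web-7": "Digital libraries need to discover the academic internet",
})
print(search(index, "digital libraries"), search(index, "search engine"))
```

Records from databases, OAI sources and web pages all end up as entries in the same index, which is what lets one query cover them together.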
OAI university repositories in BASE
[Figure: number of indexed repositories per country, including USA 34, Canada 7, Australia 8]
OAI harvesting problems
• Non-responding repositories
• Records with references only, no full text
• Restricted access
• Invalid character set (not well-formed XML)
• Inconsistent field content
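Some of these problems can be detected automatically per record. The sketch below assumes each harvested record arrives as an XML string (or `None` when the repository did not respond) with Dublin Core elements in the standard `http://purl.org/dc/elements/1.1/` namespace. Treating a `.pdf` identifier as the full-text signal is a hypothetical heuristic, not BASE's rule; restricted access cannot be seen from the record alone.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional

DC_NS = "{http://purl.org/dc/elements/1.1/}"


def check_record(raw: Optional[str]) -> List[str]:
    """Return the problems detected for one harvested record (empty if none found)."""
    if raw is None:
        return ["non-responding repository"]
    try:
        root = ET.fromstring(raw)
    except ET.ParseError:
        # Invalid characters or unescaped markup make the record unusable as XML.
        return ["not well-formed"]
    problems: List[str] = []
    identifiers = [(el.text or "").strip() for el in root.iter(DC_NS + "identifier")]
    # Hypothetical heuristic: a .pdf identifier signals a reachable full text.
    if not any(ident.lower().endswith(".pdf") for ident in identifiers):
        problems.append("no obvious full-text link")
    return problems
```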
OAI Harvesting: Problems in Practice (Examples 1)
<source>http://elib.suub.uni-bremen.de/publications/ELibD905_diplom_allnoch.pdf</source>
<dc:creator>Barry Wellman,Jeffrey Boase,Kakuko Miyata</dc:creator>
<dc:subject>Barry Wellman,Jeffrey Boase,Kakuko Miyata The Mobile-izing ....</dc:subject>
<dc:title>Talk P. Bruzzone</dc:title>
<dc:creator>Bruzzone </dc:creator>
<dc:creator>Pierluigi</dc:creator>
<dc:date>2004-07-05</dc:date>
<dc:type>Review </dc:type>
<dc:identifier>http://www.rbej.com/content/2/1/52</dc:identifier>
Reproductive Biology and Endocrinology 2004, 2:52 doi:10.1186/1477-7827-2-52
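The comma-separated `dc:creator` value in this example packs three people into one field. One normalisation heuristic, assumed here rather than taken from BASE, splits on commas only when every part looks like a complete "Given Family" name, so inverted forms such as "Summann, Friedrich" stay intact; it also trims stray trailing spaces such as the one in `Bruzzone `.

```python
from typing import List


def split_creators(value: str) -> List[str]:
    """Split a dc:creator value that lists several people (heuristic)."""
    parts = [part.strip() for part in value.split(",")]
    # Only split if every part looks like a full "Given Family" name.
    if len(parts) > 1 and all(" " in part for part in parts):
        return parts
    # Otherwise keep the value as a single name, minus stray whitespace.
    return [value.strip()]


print(split_creators("Barry Wellman,Jeffrey Boase,Kakuko Miyata"))
print(split_creators("Summann, Friedrich"))
```

The heuristic still mis-splits an inverted name whose surname and given names both contain spaces (e.g. "van Gogh, Vincent Willem"), and it leaves the separate `Bruzzone` and `Pierluigi` elements alone, since merging them safely needs more context.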
OAI Harvesting: Problems in Practice (Examples 2)
<dc:identifier>http://www.forex.uni-bremen.de/cgi-bin/forex2/user/publish?search=sqn&sqn=00005223 </dc:identifier>
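If the ampersand in this URL appears unescaped in the raw OAI response, as the transcript suggests, the whole record is not well-formed XML and a strict parser rejects it. A minimal repair sketch, assuming it is acceptable to escape every `&` that does not already start an entity or character reference; the namespace declaration added around the example is also an assumption, so it parses on its own.

```python
import re
import xml.etree.ElementTree as ET

# Matches "&" unless it already begins a named, decimal or hex reference.
BARE_AMP = re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)")


def escape_bare_ampersands(xml_text: str) -> str:
    """Replace bare ampersands with &amp; so a strict XML parser accepts the text."""
    return BARE_AMP.sub("&amp;", xml_text)


raw = ('<dc:identifier xmlns:dc="http://purl.org/dc/elements/1.1/">'
       "http://www.forex.uni-bremen.de/cgi-bin/forex2/user/publish?search=sqn&sqn=00005223 "
       "</dc:identifier>")

element = ET.fromstring(escape_bare_ampersands(raw))
print(element.text.strip())
```

Repairing before parsing keeps the record instead of discarding it, at the cost of guessing the provider's intent; the trailing space in the identifier still has to be trimmed separately.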
Further Development (1): Frontend
• Combining the metadata record and the corresponding full text in the result display [Done]
• Search history [Done]
• Truncation [Done]
• Flexible templating (customised views)
• Improved search interface (based on the search API)
• Result refinement by data provider
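Truncation (e.g. `librar*`) can be resolved against a sorted term dictionary: binary-search to the first term with the given prefix, then read forward while terms still share it. A sketch with the standard `bisect` module; the term list is made up, only right truncation with a trailing `*` is handled, and this is not necessarily how BASE implements it.

```python
import bisect
from typing import List


def expand_truncation(sorted_terms: List[str], pattern: str) -> List[str]:
    """Return dictionary terms matching pattern; a trailing '*' means right truncation."""
    if not pattern.endswith("*"):
        # No truncation: exact lookup via binary search.
        i = bisect.bisect_left(sorted_terms, pattern)
        found = i < len(sorted_terms) and sorted_terms[i] == pattern
        return [pattern] if found else []
    prefix = pattern[:-1]
    matches: List[str] = []
    # All terms sharing the prefix are contiguous in sorted order, starting here.
    for i in range(bisect.bisect_left(sorted_terms, prefix), len(sorted_terms)):
        if not sorted_terms[i].startswith(prefix):
            break
        matches.append(sorted_terms[i])
    return matches


terms = sorted(["librarian", "libraries", "library", "license", "search"])
print(expand_truncation(terms, "librar*"))
```

The expanded terms can then be OR-ed together in the actual index lookup.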
Further Development (2): Backend
• Automation of harvesting and content pre-processing
• Federated search, linking with external indexes
• Search result improvement (ranking, boosting, linguistics)
• Performance optimisation
• Support of standard protocols (Z39.50, OAI, SOAP) as a target system
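Boosting, one of the ranking improvements listed, can be illustrated as per-field weights on term matches, so that a hit in the title counts more than one in the abstract. The scoring below (raw term counts times a field weight) is a hypothetical simplification, not FAST's ranking formula; the documents are made up.

```python
from collections import Counter
from typing import Dict, List


def score(doc: Dict[str, str], terms: List[str], boosts: Dict[str, float]) -> float:
    """Sum of query-term counts per field, each multiplied by that field's boost (default 1.0)."""
    total = 0.0
    for field_name, text in doc.items():
        counts = Counter(text.lower().split())
        total += boosts.get(field_name, 1.0) * sum(counts[t] for t in terms)
    return total


def rank(docs: Dict[str, Dict[str, str]], query: str, boosts: Dict[str, float]) -> List[str]:
    """Document ids ordered by descending boosted score."""
    terms = query.lower().split()
    return sorted(docs, key=lambda doc_id: score(docs[doc_id], terms, boosts), reverse=True)


docs = {
    "a": {"title": "search engines", "abstract": "libraries"},
    "b": {"title": "libraries", "abstract": "search search"},
}
print(rank(docs, "search", {"title": 3.0}), rank(docs, "search", {}))
```

With a title boost of 3.0 the single title match in document `a` outweighs the two abstract matches in `b`; without boosts the order flips.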
Further Visions
• Integrating XML queries
• Link topology analysis
• Citation analysis
• Automatic linguistic analysis of anchor texts
• Push services
• Personalized ranking
• Cross-language information retrieval
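Link topology analysis is listed only as a vision. One well-known technique in this family is PageRank, where a page's score is fed by the scores of the pages linking to it. The power-iteration sketch below is offered as an example of that technique, not as something BASE implements; the damping factor 0.85 and the fixed iteration count are conventional assumptions.

```python
from typing import Dict, List


def pagerank(links: Dict[str, List[str]], damping: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Power-iteration PageRank over an adjacency list {page: [linked pages]}."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    if not nodes:
        return {}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every page first receives the "random jump" share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # Split this page's rank evenly over its outgoing links.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page (no outgoing links): spread its rank over all pages.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


scores = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
print(sorted(scores, key=scores.get, reverse=True))
```

The same idea carries over to citation analysis by treating citations as links between documents.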