Draft: Prototypes for searching multiple databases. Presentation to CC:DA Taskforce on Metadata, 1/16/2000, San Antonio, Texas. Steve Miller & Mary Woodley, c. 2000
CHARGE 4 Recommend ways in which libraries may best incorporate the use of metadata schemes into the current library methods of resource description and resource discovery 1. Patron uses ONE search interface to access all information, whether it resides in a number of different metadata types and standards, databases, or OPAC(s)
CHARGE 4 2. Provide a seamless transition for the user to all information available, moving from the ILS model of a front-end search mechanism that accesses numerous resources to a search interface that can access all information available in any standard, format, location, or subject. (Example: the interface can search the local OPAC, the World Wide Web, metadata standards (EAD, TEI, GILS, Dublin Core), special collections, museum holdings, etc., and present results in a usable format to the patron through one search mechanism)
DEFINITIONS • Prototype: A virtually seamless access to information and relevant retrieval of information from the user's point of view. • Seamless: Multiple sources of information can be searched, resulting in a single list of search results. A prototype system may provide the user with access to multiple authority control lists, whether thesauri or LCSH or other lists, in order to assist in vocabulary usage and search definition, BEFORE the search is run.
DEFINITIONS • INTEROPERABILITY: the ability of two or more systems or components to exchange information and use the exchanged information without special effort on either system.
Different Kinds of Prototypes: • Searching multiple databases vs. collecting multiple metadata records into a single database or repository • Searching different metadata semantics vs. converting / mapping diverse elements into a single semantic standard searched by the interface • Subject-specific vs. universal search interfaces
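To make the "converting / mapping diverse elements" option concrete, here is a minimal sketch of a crosswalk from MARC tags to unqualified Dublin Core elements. The tag choices follow common practice but are a simplified assumption, and the sample record, `MARC_TO_DC`, and `marc_to_dublin_core` are hypothetical names invented for illustration, not part of any of the prototypes discussed.

```python
# Illustrative crosswalk: MARC tags -> unqualified Dublin Core elements.
# The tag choices are a simplified assumption, not a complete mapping.
MARC_TO_DC = {
    "100": "creator",
    "245": "title",
    "260": "publisher",
    "520": "description",
    "650": "subject",
}


def marc_to_dublin_core(marc_fields):
    """Map a list of (tag, value) pairs to a dict of Dublin Core elements.

    Tags without a mapping are dropped, which is where detail is lost
    when diverse records are converted to one simpler standard.
    """
    dc_record = {}
    for tag, value in marc_fields:
        element = MARC_TO_DC.get(tag)
        if element is not None:
            dc_record.setdefault(element, []).append(value)
    return dc_record


# Hypothetical sample record, for illustration only.
sample = [
    ("245", "Sample title"),
    ("100", "Doe, Jane"),
    ("650", "Metadata"),
    ("650", "Cataloging"),
    ("300", "1 online resource"),  # no mapping here, so it is dropped
]
record = marc_to_dublin_core(sample)
```

The trade-off is visible in the dropped `300` field: a single target standard is easy to search, but anything the target cannot express disappears from the converted record.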
Questions for Each Possible Prototype: • Can the user select which databases to search: one or all or a combination of them? • Can the user select specific thesauri, subject headings, name authorities, etc. to use as part of the search? • Can the search results be sorted into lists by type of metadata or by thesaurus/authority file, or are all search results merged together without differentiation?
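The first and third questions can be illustrated with a small sketch. It assumes three hypothetical in-memory sources (`SOURCES`) and a made-up `search` function; the point is only to show the difference between searching all sources or a chosen subset, and between one merged result list and results grouped by metadata type.

```python
from collections import defaultdict

# Hypothetical in-memory "databases": name -> list of (title, metadata type).
SOURCES = {
    "opac": [("Egyptian funerary art", "MARC")],
    "archive": [("Funerary papyri finding aid", "EAD")],
    "web": [("Funerary figurines online", "Dublin Core")],
}


def search(term, selected=None, group_by_type=False):
    """Search the selected sources (all of them if None) by substring."""
    names = selected if selected is not None else list(SOURCES)
    hits = [
        {"source": name, "title": title, "type": mtype}
        for name in names
        for title, mtype in SOURCES[name]
        if term.lower() in title.lower()
    ]
    if not group_by_type:
        return hits  # one merged, undifferentiated list
    grouped = defaultdict(list)
    for hit in hits:
        grouped[hit["type"]].append(hit)
    return dict(grouped)


merged = search("funerary")                           # all sources, merged
subset = search("funerary", selected=["opac", "web"])  # user-chosen sources
by_type = search("funerary", group_by_type=True)      # sorted by metadata type
```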
Three Partial Prototype Interfaces: • NESSTAR: Networked Social Science Tools and Resources • funded by DGXIII of the European Commission under the 4th Framework Telematics Applications Programme • AHDS: Arts and Humanities Data Service • Funded by the Joint Information Systems Committee of the UK’s Higher Education Funding Council • CORC: Cooperative Online Resource Catalog • OCLC project
NESSTAR = Networked Social Science Tools and Resources • a joint development project between • the Norwegian Social Science Data Services (NSD) • UK Data Archive • the Danish Data Archive (DDA) • Provides a common gateway to online social science data resources
Social Science Metadata • No single established standard. • Many local "dialects" of the most common standard. • Different data archives have adapted their metadata to fit different storage and retrieval systems. • = low level of standardization across archives. • DDI: Data Documentation Initiative established in 1995 to create a universally supported metadata standard for the social science community. • NESSTAR is using the XML-version of the DDI-standard as the fundamental structure of its metadata system. • Developed a set of metadata converters to ease translation of existing metadata.
Discovering Resources Across Archives • The resource discovery system of NESSTAR is metadata-driven. • The detailed structure of the DDI-DTD allows users to search for data with very high precision. • Researchers interested in particular subjects can move beyond keywords and abstracts (normally included in OPACs) and search directly on variable descriptions, question texts, etc. • Searches can also be conducted on concepts such as method of data collection (e.g. telephone interviews, face-to-face interviews or self-completion questionnaires) or sampling strategy (e.g. random, stratified, etc.).
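As a rough illustration of searching "directly on variable descriptions, question texts", the sketch below scans a small XML fragment for variables whose label or question text contains a term. The element names (`codeBook`, `dataDscr`, `var`, `labl`, `qstn`, `qstnLit`) are modeled on DDI codebook markup but heavily simplified, so treat them, the sample data, and the `find_variables` helper as assumptions rather than NESSTAR's actual implementation.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical fragment modeled on DDI codebook markup.
DDI_XML = """
<codeBook>
  <dataDscr>
    <var name="V1">
      <labl>Age of respondent</labl>
      <qstn><qstnLit>How old are you?</qstnLit></qstn>
    </var>
    <var name="V2">
      <labl>Voting intention</labl>
      <qstn><qstnLit>Which party would you vote for?</qstnLit></qstn>
    </var>
  </dataDscr>
</codeBook>
"""


def find_variables(xml_text, term):
    """Return names of variables whose label or question text contains term."""
    root = ET.fromstring(xml_text)
    term = term.lower()
    matches = []
    for var in root.iter("var"):
        label = var.findtext("labl", default="")
        question = var.findtext("qstn/qstnLit", default="")
        if term in label.lower() or term in question.lower():
            matches.append(var.get("name"))
    return matches
```

A call such as `find_variables(DDI_XML, "vote")` matches on the question text rather than on a title or abstract, which is the kind of precision the slide attributes to structured metadata.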
NESSTAR allows users to: • Locate multiple data sources across national boundaries • Browse detailed metadata about these data • Analyse and visualise data online • Download the appropriate subsets of data in one of a number of formats for local use
Three Search Screens • Simple free text search • Structured search on a selection of fields (like title, abstract, year etc.) • Advanced boolean search on all relevant fields of the DDI-DTD
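The three screens correspond to three query styles, which can be sketched over a toy record set. The records, field names, and function names below are hypothetical and chosen only to show how the styles differ; they do not reproduce NESSTAR's search engine.

```python
# Hypothetical records using a few of the fields named on the slide.
RECORDS = [
    {"title": "Election survey", "abstract": "Voting behaviour", "year": "1997"},
    {"title": "Health survey", "abstract": "Smoking and diet", "year": "1998"},
]


def _has(record, term):
    """True if the term occurs in any field of the record."""
    return any(term.lower() in value.lower() for value in record.values())


def simple_search(term):
    """Free text: match the term anywhere in the record."""
    return [r for r in RECORDS if _has(r, term)]


def structured_search(**criteria):
    """Structured: each named field must contain its term."""
    return [
        r for r in RECORDS
        if all(term.lower() in r.get(field, "").lower()
               for field, term in criteria.items())
    ]


def boolean_search(must=(), should=(), must_not=()):
    """Boolean: AND over must, OR over should, NOT over must_not."""
    return [
        r for r in RECORDS
        if all(_has(r, t) for t in must)
        and (not should or any(_has(r, t) for t in should))
        and not any(_has(r, t) for t in must_not)
    ]
```

For example, `structured_search(title="survey", year="1998")` narrows by field, while `boolean_search(must=["survey"], must_not=["smoking"])` combines terms across all fields.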
NESSTAR Project “Dream Machine” (Social scientists’ ideal data search & retrieval scenario) • All existing empirical data available on-line. • An integrated resource discovery gateway and search-system in order to identify and locate these resources. • Extensive amounts of metadata available (multimedia, hyperlinked and totally integrated with the data as such). • Ability to browse and visualise data on-line. • Ability to convert the data in one of a number of formats and copy, with the metadata, to a local machine. • “Active research agents” (knowbots) mining the net and informing the user when new data within their special field of interest are made available. • Efficient hyperlinks from the data sources to every scientific publication ever produced on the basis of a dataset. • Ditto e-mail/web addresses to all relevant researchers, departments etc. • Efficient feedback system to the body of metadata allowing the user to add to the collective memory of a dataset.
AHDS: Arts and Humanities Data Service • Funded by the Joint Information Systems Committee of the UK’s Higher Education Funding Council Five AHDS Databases: • Archaeology Data Service • History Data Service • Oxford Text Archive • Performing Arts Data Service • Visual Arts Data Service
The AHDS Resource Discovery Challenge • Integrate users’ online access to distributed and heterogeneous information resources. • Each collection presents information about its holdings differently • Several service providers have data exchange and interoperability agreements with third-parties. • The collections comprise a wide variety of resource types, including electronic texts, databases, digital images, geospatial information systems, and time-based film data. • Services have adopted very different resource description and cataloguing practices.
Searching the Databases Each collection can be accessed by one of two methods: • 1) Through each service provider’s native catalog • Different capabilities tailored to the information needs of different scholarly communities and to the resource description requirements of very different digital collections. • 2) Through the common AHDS Gateway • Presents different collections’ catalogs as a virtual uniform catalog and bases search and retrieval capabilities on an unqualified Dublin Core record.
The AHDS Gateway: • Provides a point of access to the electronic resources held by the five AHDS service providers • such as electronic texts, databases, images and video and audio clips, online where possible • Allows users to search across a variety of independent and very different online catalogues as if they were a single catalogue. • Provides access to high-quality information providers of data from the disciplines of archaeology, history, literature and language, the performing and visual arts • Resources selected for their quality and for their relevance to those interested in the Arts & Humanities. • Users also benefit from a more refined approach to searching, being able to search against specified fields (creator, subject, title, etc), which is not possible through most Search engines available on the Internet.
AHDS: “Challenges to be Confronted” • Digital Preservation and Archiving • Requires substantial infrastructural investment. • Controlled Vocabularies • No agreement likely among service providers about their use • OTA uses AACR; VADS uses AAT • Even greater variation in use of date and coverage elements. • How to assist users searching across catalogs with domain-specific controlled vocabularies? Major challenge for the future. • Z39.50 Interoperability • Relatively immature standard; few guidelines for its use yet developed. • Different Z39.50-aware applications may conform to standards yet remain incompatible with one another or interact in ways not meaningful or helpful to the user. • In wider and more impersonal networked environments, means will need to be developed to ensure interoperability.
AHDS: “Challenges to be Confronted” • User Registration, Authentication, and Resource Ordering • AHDS benefits from its circumscribed service environment. • Problems will arise as it integrates third-party systems into its Gateway where such services use independent registration, authentication, and resource ordering services. • A more automated approach will be required to support scholarly and heritage users who wish to locate, scrutinise, and acquire access to information objects of interest irrespective of their location, format, and management. • Users’ Resource Discovery Preferences • AHDS has so far operated with numerous assumptions about users’ resource discovery preferences in a distributed network environment. Those assumptions have shaped the development of the AHDS Gateway and associated systems. • How users actually exploit the Gateway, particularly in relation to their use of underlying Service Provider catalogues, will provide useful feedback for the systems’ further development, but also for applied research into resource discovery systems more generally.
CORC: Cooperative Online Resource Catalog • OCLC research project • Web-based prototype system • Offers both full USMARC cataloging and an enhanced cataloging mode for Dublin Core • Records can be imported into or exported from CORC using: • (1) MARC • (2) HTML • (3) RDF-compliant XML
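To give a sense of what a Dublin Core record exported as HTML can look like, here is a minimal sketch that renders a record as `<meta>` tags using the widely used `DC.element` naming convention. This is not CORC's actual export format; the `dc_to_html_meta` function and the sample record are hypothetical.

```python
from html import escape


def dc_to_html_meta(dc_record):
    """Render a dict of Dublin Core elements as HTML <meta> tags.

    Uses the common "DC.<element>" naming convention; this is an
    illustration, not the output of any particular cataloging system.
    """
    lines = []
    for element, values in dc_record.items():
        for value in values:
            lines.append(
                '<meta name="DC.{}" content="{}">'.format(
                    escape(element), escape(value, quote=True)
                )
            )
    return "\n".join(lines)


# Hypothetical record, for illustration only.
html_head = dc_to_html_meta(
    {"title": ["Sample & title"], "creator": ["Doe, Jane"]}
)
```

Escaping the values matters because titles routinely contain characters such as `&` or quotation marks that would otherwise break the surrounding HTML.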
Authority Control in CORC • OCLC is defining how the authority component will work. • CORC users will have access to a copy of the OCLC Authority File and the ability to create provisional authority records for use by other CORC participants. • The first version of the CORC authority search interface will not offer all cross-references available in the Authority File. • CORC currently supports automation-assisted authority control during resource record creation and editing for selected fields.
The Future of the CORC Project • OCLC will introduce CORC as a production service in July 2000. • CORC 1.0 is the first phase of OCLC's next generation of cataloging services. • Version 1.0 will focus on optimized metadata creation services for electronic resources and on providing an integrated view of those resources with other bibliographic records in WorldCat (the OCLC Online Union Catalog).
Projects by the Getty Trust: • aka: http://www.ahip.getty.edu/aka/ (retired) • Faces of LA: http://facesla.org (retired) • Arthur: http://www.ahip.getty.edu/arthur/ (retired) • Getty Research Institute auction catalog records: http://opac.pub.getty.edu
aka • Developed by the Getty Information Institute, whose mission was to create and support standards of description, tools, and guidelines for sharing cultural information • Designed to demonstrate the value of controlled vocabulary in searching electronic resources, including the Web • Public mode searched 4 databases with the option of using the Art and Architecture Thesaurus and the Union List of Artist Names; Staff mode searched over 26 databases: relational databases as well as texts in Web resources
aka System • System components: • Web harvester (public domain) • WAIS text indexing system (public domain) • Vocabulary Searching Interface (Getty developed using Sybase) • Supported Boolean searching • Ability to expand, limit query & modify results
THE aka INTERFACE (screenshot; callouts: Multiple Collections, Vocabulary Search Option, Text Searching Field) Slide courtesy of Marty Harris
aka IMPLEMENTATIONS • aka provides internal searching access to 26 GII research and partner databases and two general search resource databases. • The Art and Archaeology Technical Abstracts • Avery Index to Architectural Periodicals • Bibliography of the History of Art (BHA) • Getty Education Institute ArtsEdNet Web Site • Getty Museum Collections Management System • Getty Research Institute Integrated Catalog, IRIS • GII Index of Cultural Heritage Web Sites • GRI Photo Study Collection - Antiquities • GRI Photo Study Collection - Illuminated Manuscripts • GRI Photo Study Collection - Max Hutzel Collection • International Repertory of the Literature of Art (RILA) • Provenance Index Sale Catalogues • Provenance Index Sale Contents • The American Film Institute OnLine • The Autry Museum of Western Heritage • CSU Northridge: Special Collections and Archives • CSU Northridge: Special Collections and Archives • Fine Arts Museum of San Francisco Art Imagebase • Istituto Centrale per il Catalogo e la Documentazione I • Istituto Centrale per il Catalogo e la Documentazione II • L.A. Library: People, Places and Events in Southern California • L.A. Library: Historic Photographs of Southern California • Lycos Image Index • Museum of Tolerance: Photos of Children • UCLA Fowler Museum of Cultural History • USC Ethnic Studies Collection • Alta Vista Slide courtesy of Marty Harris
©J. Paul Getty Trust Slide courtesy of Murtha Baca
Retrieval using “a.k.a.” • User has concept in mind • E.g., search Scope Note for “Egyptian funerary” • User chooses databases to query • User may use Getty vocabulary to broaden or refine search • From AAT: ushabti, shabti, shawabti, ushabtis, ushabtiu • Implementation in search engine: "ushabti" OR "shabti" OR "shawabti" OR "ushabtis" OR "ushabtiu" © J. Paul Getty Trust Slide courtesy of Patricia Harpring
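The vocabulary expansion on this slide can be sketched in a few lines: a term is looked up in a list of variants and the variants are joined into a quoted OR query. The variant list comes from the slide; the `VARIANTS` table and `expand_query` function are hypothetical names, and this is an illustration of the idea rather than how the a.k.a. system built its queries.

```python
# Variant terms for one concept, as listed on the slide (from the AAT).
VARIANTS = {
    "ushabti": ["ushabti", "shabti", "shawabti", "ushabtis", "ushabtiu"],
}


def expand_query(term, vocabulary=VARIANTS):
    """Expand a term into a quoted OR query using its known variants.

    A term missing from the vocabulary is returned as a single quoted term.
    """
    terms = vocabulary.get(term.lower(), [term])
    return " OR ".join('"{}"'.format(t) for t in terms)


query = expand_query("ushabti")
```

Here `query` reproduces the OR string shown on the slide, so a user who knows only one spelling still retrieves records indexed under the others.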
aat Search results ©J. Paul Getty Trust Slide courtesy of Murtha Baca
aka Limitations • Currency • Labor intensive • Although vocabularies increased precision, effectiveness was reduced by the fact that not all databases used the same vocabularies, or used the vocabularies only in limited ways • Could not generate a search from the Vocabulary lists but had to type in the terms
THE aka “FACES of LA” INTERFACE (screenshot; callouts: Text Searching Field, Vocabulary Search Option, Multiple Collections) Slide courtesy of Marty Harris
Auction Catalog Record • Different cataloging standard, same machine-readable format (MARC)