OAI & NSDL Research at Grainger: Briefing to UIUC Library Faculty, 15 April 2003. Timothy W. Cole (t-cole3@uiuc.edu), William H. Mischo (w-mischo@uiuc.edu). http://dli.grainger.uiuc.edu/Publications/TWCole/LibFac2003/
Projects • Open Archives Initiative: Illinois OAI Metadata Harvesting (Mellon); IMLS Digital Collections & Content (IMLS); Grainger OAI Resources in Science & Engineering • National Science Digital Library: 2nd Generation Math Resources (NSF / DUE)
OAI Protocol for Metadata Harvesting • Harvesting approach to interoperability at metadata level • Divides world into Metadata Providers & Service Providers • Builds on HTTP, XML, & Dublin Core http://www.openarchives.org/
OAI is a tool • All about moving metadata (not data) around • A building block, usable by many communities – supports new models of scholarly communication • Can facilitate, in some cases enable, advanced digital library services & functions • Assumes widely distributed content, but centralized indexing (!) – requires critical mass • Providers build once, share many times Purpose of OAI is to foster interoperability
Harvesting vs. Federation • Competing approaches to interoperability • Federation is when services are run remotely on remote data (e.g. Broadcast Searching) • Harvesting is when data/metadata is transferred from the remote source to the destination where the services are located (e.g. Union Catalogs) • Federation requires more effort at each remote source but less at the central system; harvesting is the reverse OAI focuses on harvesting
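The contrast between the two approaches can be sketched in a few lines of Python. This is a toy illustration only: the source names, the records, and every function name are hypothetical, and an in-memory dictionary stands in for real remote services.

```python
# Two hypothetical remote sources, each holding a few metadata records.
REMOTE_SOURCES = {
    "source_a": ["Map of Illinois, 1850", "Letters of a prairie farmer"],
    "source_b": ["Illinois railroad photographs", "Chicago fire diary"],
}


def remote_search(source, term):
    """Stand-in for a search service that runs at a remote source."""
    return [r for r in REMOTE_SOURCES[source] if term.lower() in r.lower()]


def federated_search(term):
    """Federation: broadcast the query to every source at search time."""
    hits = []
    for source in REMOTE_SOURCES:
        hits.extend(remote_search(source, term))
    return hits


def harvest():
    """Harvesting: copy all metadata into one central index ahead of time."""
    index = []
    for records in REMOTE_SOURCES.values():
        index.extend(records)
    return index


def harvested_search(index, term):
    """The search runs locally on the harvested copy; no remote calls."""
    return [r for r in index if term.lower() in r.lower()]
```

In the federated case each source must run a search service; in the harvested case each source only exposes its records and the central system does the indexing and searching.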
Reliance on HTTP, XML, DC • OAI is a REpresentational State Transfer (REST) protocol – i.e., URL-based • Z39.50, Web services, SOAP are RPC-based • OAI requests are sent via the HTTP protocol using GET or POST • OAI responses are valid XML documents • XML allows validation, increases reliability of what’s harvested (in terms of structure) • DC is OAI’s Lowest Common Denominator • Communities encouraged to use additional schemas
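Because requests are URL-based, an OAI request is just a base URL plus a verb and its arguments. A minimal sketch, assuming a hypothetical repository at repository.example.org; the `metadataPrefix` argument and the `oai_dc` prefix come from the OAI protocol specification, and the helper name is illustrative.

```python
from urllib.parse import urlencode

# Hypothetical repository base URL, used only for illustration.
BASE_URL = "http://repository.example.org/oai"


def build_oai_request(base_url, verb, **arguments):
    """Return the HTTP GET URL for an OAI request: base URL, verb, arguments."""
    params = {"verb": verb}
    params.update(arguments)
    return base_url + "?" + urlencode(params)


print(build_oai_request(BASE_URL, "Identify"))
# http://repository.example.org/oai?verb=Identify
print(build_oai_request(BASE_URL, "ListRecords", metadataPrefix="oai_dc"))
# http://repository.example.org/oai?verb=ListRecords&metadataPrefix=oai_dc
```

The same key/value pairs could instead be sent as the body of an HTTP POST.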
How OAI Works • OAI “verbs”: Identify, ListMetadataFormats, ListSets, ListIdentifiers, ListRecords, GetRecord • [Diagram: the harvester at the Service Provider sends an HTTP request (an OAI verb) to the repository at the Metadata Provider, which returns an HTTP response (valid XML)]
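On the harvester side, handling a response amounts to parsing the returned XML. The sketch below reads record identifiers out of a ListIdentifiers response using Python's standard library. The sample response is hand-written for illustration (the repository name and identifiers are hypothetical), and its element names and namespace follow version 2.0 of the protocol, which is an assumption about the version in use.

```python
import xml.etree.ElementTree as ET

# Namespace of OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hand-written sample response; not taken from a real repository.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2003-04-15T12:00:00Z</responseDate>
  <request verb="ListIdentifiers" metadataPrefix="oai_dc">http://repository.example.org/oai</request>
  <ListIdentifiers>
    <header>
      <identifier>oai:repository.example.org:0001</identifier>
      <datestamp>2003-04-01</datestamp>
    </header>
    <header>
      <identifier>oai:repository.example.org:0002</identifier>
      <datestamp>2003-04-02</datestamp>
    </header>
  </ListIdentifiers>
</OAI-PMH>
""".strip()


def list_identifiers(response_xml):
    """Return the record identifiers found in a ListIdentifiers response."""
    root = ET.fromstring(response_xml)
    path = "./oai:ListIdentifiers/oai:header/oai:identifier"
    return [element.text for element in root.findall(path, OAI_NS)]


print(list_identifiers(SAMPLE_RESPONSE))
```

A real harvester would fetch the response over HTTP and would also need to handle error responses and partial result lists, which this sketch leaves out.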
Mellon-OAI Project • Create a web portal to scholarly information resources in cultural heritage harvested via OAI • Primary objectives: • Develop & make available OAI harvesting tools • Build harvesting and search services • Investigate viability and utility of searching OAI harvested resources • Explore issues of advanced search/indexing/display • Explore user needs & metadata usage patterns • Identify critical issues and best practices for using OAI with cultural heritage material
Mellon-OAI Achievements • Developed harvesting tools (Open Source) • Refined data provider tools (Open Source) • Investigated logistics of harvesting activities • Investigated metadata provider usage of DC, EAD • Created XSL stylesheets for metadata transformations (MARC to DC; EAD to DC) • Experimented w/ configurations to address scalability & performance issues • Usability testing with students in College of Education
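The project's transformations were written as XSL stylesheets; the Python sketch below only illustrates the idea of a crosswalk. The three-field mapping is a deliberately simplified assumption (real MARC to DC crosswalks cover many more fields and subfields), and the input record is hypothetical.

```python
# Simplified, assumed mapping from MARC field tags to Dublin Core elements.
MARC_TO_DC = {
    "100": "creator",  # main entry, personal name
    "245": "title",    # title statement
    "650": "subject",  # topical subject heading
}


def marc_to_dc(marc_fields):
    """Map (tag, value) pairs to a dict of DC element name -> list of values.

    Fields whose tags are not in the mapping are skipped.
    """
    dc = {}
    for tag, value in marc_fields:
        element = MARC_TO_DC.get(tag)
        if element is not None:
            dc.setdefault(element, []).append(value)
    return dc


# Hypothetical record, reduced to (tag, value) pairs for illustration.
record = [
    ("100", "Doe, Jane"),
    ("245", "A history of the prairie"),
    ("650", "Prairies"),
    ("650", "Illinois"),
    ("999", "local field, ignored"),
]
print(marc_to_dc(record))
```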
Metadata aggregation • 39 providers (OAI-compliant and surrogates) • Metadata describing resources of 580 institutions (CIMI, CDP) • 1.1 million original records • 2.6 million including item-level records derived from EAD finding aids
IMLS Digital Collections & Content • Build registry of all National Leadership Grant collections with digital content. • Assist & guide NLG projects in making item-level metadata sharable using OAI. • Build repository, search & discovery tools for integrated access to content of NLG collections • Research best practices for sharing metadata about diverse digital content & supporting interests of diverse user communities. Collaboration between UIUC Library, GSLIS, & IMLS
Project Sites • UIUC OAI Cultural Heritage Repository • Mellon-OAI Project Site • IMLS DCC Project Site
National Science Foundation NSDL Program • National Science, Technology, Engineering, Mathematics Digital Library. • http://www.nsdl.org/ • Coverage: K to Grey. • National system for distributed science education; characterized by a set of exemplary resource collections and services. • Highly competitive grants: 3 years, 339 proposals, 105 funded; three main categories: collections, services and targeted research.
2nd Generation Math Resources • Collaboration with UIUC Library, Wolfram Research Inc., & COE Dept of Theoretical and Applied Mechanics. • Project Objectives: • Adding interactive and graphical content to two feature-rich Wolfram sites. • Generating and extracting OAI-compliant metadata, establishing OAI Provider site, adding mathematics controlled vocabulary terms. • Developing courseware and problem libraries for TAM courses.
Providing Metadata to NSDL • Exposing metadata via OAI: the preferred method for bringing metadata into the NSDL repository (requires little manual intervention) • Sending metadata via FTP • Enabling metadata "scraping" • Creating and editing metadata directly in the NSDL metadata repository See also: NSDL Metadata Primer
Wolfram Functions Web Site • Metadata derived from the source HTML page: <dc:identifier>, <dc:description>, <dc:date>, <dc:rights>
Wolfram Functions Web Site • Metadata extracted from the source HTML head:
<html>
<head>
<title>Square root: Primary…</title>
<meta name='Description' content='Primary definition …' >
<meta name='Keywords' content='Sqrt, square root, …' >
<meta http-equiv='Content-Type' content='text/html; charset=iso-…'>
</head>
…
Extracted metadata, in the same order: title → <dc:title>; Description → <dc:description>; Keywords → <dc:subject>, <dc:subject>; Content-Type → <dc:format>
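One way to perform this kind of extraction is sketched below with Python's standard `html.parser` module. This is not the project's actual tooling. The sample page is a made-up stand-in rather than the real Wolfram page, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class HeadMetadataParser(HTMLParser):
    """Collect <title> text and selected <meta> tags as Dublin Core values."""

    def __init__(self):
        super().__init__()
        self.dc = {"title": [], "description": [], "subject": [], "format": []}
        self._in_title = False
        self._title_parts = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            name = (attrs.get("name") or "").lower()
            http_equiv = (attrs.get("http-equiv") or "").lower()
            content = attrs.get("content") or ""
            if name == "description":
                self.dc["description"].append(content)
            elif name == "keywords":
                # Each comma-separated keyword becomes its own dc:subject.
                keywords = [k.strip() for k in content.split(",")]
                self.dc["subject"].extend(k for k in keywords if k)
            elif http_equiv == "content-type":
                self.dc["format"].append(content)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self._title_parts.append(data)


def extract_dc(html_text):
    """Return a dict of DC element name -> list of values for one page."""
    parser = HeadMetadataParser()
    parser.feed(html_text)
    parser.close()
    title = "".join(parser._title_parts).strip()
    if title:
        parser.dc["title"].append(title)
    return parser.dc


# Made-up sample page, shaped like the head shown above.
SAMPLE_HTML = """<html>
<head>
<title>Example function: definition</title>
<meta name='Description' content='Definition of an example function' >
<meta name='Keywords' content='Example, example function' >
<meta http-equiv='Content-Type' content='text/html; charset=iso-8859-1'>
</head>
<body></body>
</html>"""

print(extract_dc(SAMPLE_HTML))
```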
Sample Metadata File for a Wolfram Functions Web Page
<oai_dc:dc … >
  <dc:title>Square root: Primary definition (formula …</dc:title>
  <dc:subject>Sqrt</dc:subject>
  <dc:subject>square root</dc:subject>
  …
  <dc:description>Primary definition (2 formulas)</dc:description>
  <dc:description><math … </math></dc:description>
  <dc:date>2001-10-29</dc:date>
  <dc:publisher>Wolfram Research, Inc.</dc:publisher>
  <dc:type>Text</dc:type>
  <dc:format>text/html; charset=iso-8859-1</dc:format>
  <dc:identifier>http://functions.wolfram.com…/Sqrt/02/0001/</dc:identifier>
  <dc:identifier>http://functions.wolfram…/01.01.02.0001.01</dc:identifier>
  <dc:language>en</dc:language>
  <dc:rights>© 2002 Wolfram Research, Inc.</dc:rights>
</oai_dc:dc>
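A record of this shape can be produced with Python's standard `xml.etree.ElementTree`, as sketched below. The two namespace URIs are the standard ones for `oai_dc` and Dublin Core; the function name is hypothetical, only fields that appear in full in the sample above are used, and the `xsi:schemaLocation` attribute that a complete record would normally carry is omitted.

```python
import xml.etree.ElementTree as ET

OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Ask ElementTree to emit the conventional prefixes instead of ns0, ns1.
ET.register_namespace("oai_dc", OAI_DC_NS)
ET.register_namespace("dc", DC_NS)


def build_oai_dc(fields):
    """Serialize (element name, value) pairs as one oai_dc record string."""
    root = ET.Element("{%s}dc" % OAI_DC_NS)
    for name, value in fields:
        child = ET.SubElement(root, "{%s}%s" % (DC_NS, name))
        child.text = value
    return ET.tostring(root, encoding="unicode")


# Only values that appear complete in the sample record are reused here.
fields = [
    ("subject", "Sqrt"),
    ("subject", "square root"),
    ("date", "2001-10-29"),
    ("publisher", "Wolfram Research, Inc."),
    ("type", "Text"),
    ("language", "en"),
]
print(build_oai_dc(fields))
```

Repeated elements such as `dc:subject` are passed as separate pairs, which is why the input is a list rather than a dictionary.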
The NSDL metadata repository • Core Integration Project – Cornell, Columbia, DLESE • The metadata repository is a resource for service providers. It holds information about every collection and item known to the NSDL. • [Diagram labels: Collections, Metadata repository, Services, Users] From “The NSDL Metadata Strategy,” a presentation by William Y. Arms and Diane I. Hillman. Available: http://nsdl.comm.nsdlib.org/allprojects01/metastrategy.ppt
Working Assumptions • The WWW is the primary medium (for now) • Content is a mix of “born digital” and analog • There is no lack of “great piles of ‘stuff’ ” • There is a need for “piles of great ‘stuff’ ” • The “unit” of content can and will shrink • Users will increasingly be creators, and vice versa • While much of the use will be “free”, there is a need to explore multiple models of sustainability • Experimental nature of distributed digital library building - “one library, many portals”
Related Links • http://mathworld.wolfram.com/ • http://functions.wolfram.com/ • OAI Resources in Science & Engineering