Findings from the Mellon Metadata Harvesting Initiative
Martin Halbert, Joanne Kaczmarek, and Kat Hagedorn
Monday 18-Aug-2003
ECDL 2003

Overview
• Highlights of the Mellon projects
• Findings regarding metadata harvesting
• Questions about the context of metadata and metadata harvesting
• Next steps, subsequent research projects
ECDL 2003 – Trondheim, Norway

Andrew W. Mellon Foundation
• Mellon is a major U.S. private philanthropic foundation that has been involved with the OAI-PMH from the beginning
• Sought to foster projects exploring how the OAI-PMH could be used by libraries and other organizations supporting research to make metadata concerning scholarly collections more visible to users
• Funded seven projects in 2001 with a total of US $1.5M

Seven Projects
• University of Illinois at Urbana-Champaign
• The University of Michigan (OAIster)
• Emory University (MetaArchive)
• SOLINET / ASERL (AmericanSouth)
• The Research Libraries Group (RLG)
• University of Virginia
• (Woodrow Wilson International Center for Scholars at the Smithsonian)

Highlights of Projects
• OAIster and UIUC Repository harvested millions of records and developed sophisticated search tools
• Emory and SOLINET MetaScholar projects harvested focused collections, enhanced existing OSS harvesting tools, and formed teams of scholars and librarians to study the process and context of metadata harvesting for research portals
• Other projects examined internal uses of OAI-PMH for cultural scholarship

Metadata Harvesting Findings: Slow Adoption of the OAI-PMH
• Most institutions with cultural materials collections had not implemented the protocol during the 2002-2003 period
• This is due to many reasons: lack of institutional priority, insufficient technical staff, little organizational understanding of the benefits of the protocol
• However, both Emory and Illinois found that centralized regional centers providing relatively modest OAI technical expertise to other libraries were very effective in fostering adoption of the protocol

Metadata Harvesting Findings: Problems with Institutional Metadata
• Wide variations in implementation of Unqualified Dublin Core (UDC) descriptive metadata elements
• Duplication of records between collaborating institutions, which are difficult to de-duplicate due to lack of unique inter-institutional identifiers
• Format incompatibilities/collisions, especially between Encoded Archival Description (EAD) and UDC record perspectives
• Inconsistent access restrictions to content lead to confusion among users

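To illustrate the de-duplication problem described on this slide: without a shared identifier, an aggregator has to compare descriptive fields instead. The following is a minimal sketch, not the method used by any of the Mellon projects. The choice of title, creator, and date as the comparison key, the record layout, and the example identifiers are all assumptions made for illustration.

```python
import re
import unicodedata
from collections import defaultdict


def normalize(text):
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())


def fingerprint(record):
    """Build a comparison key from title, creator, and date (an assumed choice of fields)."""
    return (
        normalize(record.get("title", "")),
        normalize(record.get("creator", "")),
        normalize(record.get("date", "")),
    )


def group_duplicates(records):
    """Return lists of record identifiers that share the same fingerprint."""
    groups = defaultdict(list)
    for record in records:
        groups[fingerprint(record)].append(record["identifier"])
    return [ids for ids in groups.values() if len(ids) > 1]


# Hypothetical records harvested from two institutions.
records = [
    {"identifier": "oai:a.example.org:1", "title": "Letters from Atlanta, 1864",
     "creator": "Smith, John", "date": "1864"},
    {"identifier": "oai:b.example.org:77", "title": "letters from atlanta 1864",
     "creator": "Smith, John.", "date": "1864"},
    {"identifier": "oai:b.example.org:78", "title": "Diary",
     "creator": "Smith, John", "date": "1865"},
]
duplicate_groups = group_duplicates(records)
```

Exact matching on a normalized key only catches near-identical descriptions. Records that describe the same item with different titles or creator forms would still slip through, which is one reason remediation after harvesting is hard.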
Metadata Harvesting Findings: Problems with Inst. Metadata (cont.)
• No controlled vocabulary in effect for any UDC field, nor would this make sense for most fields
• Although universal systems such as US Library of Congress Subject Headings (LCSH) exist, they are not granular enough for most repositories
• No uniform mechanism in place to express dates or locations (coverage), which can mean many things in UDC, and no authority control for creator field
• 96% of institutional repositories using Eprints software do not use standard controlled vocabularies

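As a small illustration of the date problem noted on this slide, a harvester that wants to sort or filter by date has to guess at free-text values. The sketch below simply pulls four-digit years out of a date string. The function names, the 1000-2099 year window, and the example values are assumptions, and the approach deliberately ignores months, days, and uncertainty markers such as "ca.".

```python
import re

# Four-digit years from 1000 to 2099 that are not part of a longer digit run.
YEAR_PATTERN = re.compile(r"(?<!\d)(1[0-9]{3}|20[0-9]{2})(?!\d)")


def extract_years(date_value):
    """Return every recognizable four-digit year in a free-text date value."""
    return [int(match) for match in YEAR_PATTERN.findall(date_value)]


def year_range(date_value):
    """Return (earliest, latest) year, or None when no year is recognizable."""
    years = extract_years(date_value)
    if not years:
        return None
    return (min(years), max(years))


# Hypothetical values of the kind found in an unqualified date field.
examples = ["1864-05-12", "1861-1865", "ca. 1920s", "May 3, 1999", "n.d."]
ranges = {value: year_range(value) for value in examples}
```

A value such as "n.d." yields None, so the record cannot be placed on a timeline at all. Heuristics of this kind recover part of the information but cannot replace agreement among data providers on how dates are expressed.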
Metadata Harvesting Findings: Need for Metadata Gardening
• The best way to make metadata effective cross-institutionally is to coordinate the entire life cycle of metadata production
• Uncoordinated harvesting is relatively easy to do, but the resulting metadata aggregation then suffers from all the problems previously described and needs remediation (which may be effectively impossible)

Metadata Harvesting Findings: Need for Metadata Gardening (cont.)
• Coordinated gardening of metadata is the long-standing solution to this problem
• Examples include virtually any community of information users that has come up with consistent standards for the metadata it shares
• The problem is that new information communities are still forming, having been enabled by the OAI-PMH
• Mature information communities are mature precisely because they have well-understood standards and practices for using and sharing information

Metadata Context
• Metadata without a context is useless, much like encrypted information without the key
• Metadata is considered useful precisely because it is created in particular contexts by particular communities
• OAI-PMH only prescribes UDC format
• UDC is some context, and is (probably?) better than nothing, but many groups inaccurately thought that it was enough context to build robust discovery systems around

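To make the point about limited context concrete, the sketch below parses a single record in the `oai_dc` format that OAI-PMH repositories are required to support. The XML sample is invented for illustration; only the three namespace URIs come from the OAI-PMH and Dublin Core specifications. The function name `parse_record` is likewise an assumption.

```python
import xml.etree.ElementTree as ET

NAMESPACES = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Invented sample record; a real harvest would receive this inside a ListRecords response.
SAMPLE_RECORD = """<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <header>
    <identifier>oai:example.org:item-1</identifier>
    <datestamp>2003-08-18</datestamp>
  </header>
  <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>Letter to a sister</dc:title>
      <dc:creator>Smith, John</dc:creator>
      <dc:date>1864</dc:date>
      <dc:coverage>Georgia</dc:coverage>
    </oai_dc:dc>
  </metadata>
</record>"""


def parse_record(xml_text):
    """Return (identifier, fields), where fields maps DC element names to value lists."""
    record = ET.fromstring(xml_text)
    identifier = record.findtext("oai:header/oai:identifier", namespaces=NAMESPACES)
    fields = {}
    container = record.find("oai:metadata/oai_dc:dc", NAMESPACES)
    if container is not None:
        for element in container:
            # Strip the "{namespace}" prefix that ElementTree puts on tag names.
            name = element.tag.split("}", 1)[-1]
            fields.setdefault(name, []).append((element.text or "").strip())
    return identifier, fields


identifier, fields = parse_record(SAMPLE_RECORD)
```

What the harvester ends up with is a flat set of element and value pairs. Nothing in the record says which collection or finding aid the letter belongs to, or what "Georgia" refers to, which is the missing context this slide describes.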
Metadata Context Findings: Recovering Context
• Different opinions among the projects over how to recover context for aggregated heterogeneous metadata
• OAIster made some efforts to normalize some UDC metadata fields after harvesting (UDC type field)
• Illinois developed a mechanism for displaying original EAD context of records disaggregated from finding aid series information
• Emory/SOLINET AmericanSouth has a team of nationally renowned scholars studying how online scholarship can contextualize metadata and vice versa

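Post-harvest normalization of the type field, as mentioned for OAIster above, can be pictured as a lookup from free-text values to a small controlled list. The sketch below is not OAIster's actual procedure. The synonym table is hypothetical, and the target terms are taken from the DCMI Type Vocabulary purely as an example of a controlled list.

```python
# Hypothetical mapping from free-text type values to DCMI Type Vocabulary terms.
TYPE_SYNONYMS = {
    "text": "Text",
    "article": "Text",
    "book": "Text",
    "manuscript": "Text",
    "image": "Image",
    "photograph": "StillImage",
    "photo": "StillImage",
    "video": "MovingImage",
    "film": "MovingImage",
    "audio": "Sound",
    "sound recording": "Sound",
    "dataset": "Dataset",
}


def normalize_type(raw_value, default="Unknown"):
    """Map a harvested type value to a controlled term, or return the default."""
    key = " ".join(raw_value.lower().split())
    if key in TYPE_SYNONYMS:
        return TYPE_SYNONYMS[key]
    # Retry without a trailing "s" to catch simple plurals such as "photographs".
    if key.endswith("s") and key[:-1] in TYPE_SYNONYMS:
        return TYPE_SYNONYMS[key[:-1]]
    return default


harvested = ["TEXT", "  Photographs ", "Sound  Recording", "map"]
normalized = [normalize_type(value) for value in harvested]
```

Values that fall outside the table, such as "map" here, stay unresolved, so a table like this needs ongoing curation as new repositories are harvested.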
Metadata Context Findings: Harvesters vs. other Discovery Systems
• How do we understand harvesters vs. online catalogs, Google, and commercial databases?
• How do we articulate the difference to users?
• What information should we aggregate and make searchable? Metadata and crawled web content? Very different information realms need to be bridged through new federated search mechanisms

Next Steps for Emory, Michigan, and Illinois
• All of these projects learned a great deal during the Mellon Metadata Harvesting Initiative that has informed their subsequent planning for new services
• All of these projects are in the process of being mainstreamed using various strategies
• All of these projects continue to grapple with metadata quality and context issues

Next Steps: Illinois
• Additional research is being undertaken on the integration of EAD and OAI
• Beginning a three-year collaboration with the research libraries of other Committee on Institutional Cooperation (CIC) institutions to study the potential of OAI-PMH to facilitate resource sharing
• NSF grant to develop digital libraries for scientific communities in connection with National Science Digital Library (NSDL)
• Institute for Museum and Library Services (IMLS) grant to develop an OAI-based registry of IMLS projects

Next Steps: Michigan
• Working on further techniques for metadata remediation
• De-duplication
• Normalization of more UDC fields
• Further tailoring of metadata for research purposes
• Exploring use of OAIster in connection with campus courseware initiatives

Next Steps: Emory
• Undertaking further modeling of scholarly portals based on metadata harvesting, with application to an international Irish Literature portal
• New grant from the Mellon Foundation to build on previous projects
• Experiments in semantic clustering of metadata using support vector machines
• Exploration of combining metadata harvesting and web crawling
• Developing frameworks for federating loosely-coupled digital library components

Appreciation
• Enormous thanks go to the Andrew W. Mellon Foundation for advancing the understanding of metadata harvesting applications through these projects
• Mellon continues to be a driving force in the United States and internationally for research into digital library experiments benefiting scholarly communication

Contacts
• Martin Halbert (mhalber@emory.edu) 404-727-2204
• Kat Hagedorn (khage@umich.edu)
• Joanne Kaczmarek (jkaczmar@uiuc.edu)