
Tools and Techniques for Creating, Maintaining, and Distributing Shareable Metadata




Presentation Transcript


  1. Tools and Techniques for Creating, Maintaining, and Distributing Shareable Metadata Jenn Riley Metadata Librarian Indiana University Digital Library Program

  2. What does this record describe?
     <dc:identifier>http://museum.university.edu/unique identifier</dc:identifier>
     <dc:publisher>State University Museum of Ichthyology, Fish Field Notes</dc:publisher>
     <dc:format>jpeg</dc:format>
     <dc:rights>These pages may be freely searched and displayed. Permission must be received for subsequent distribution in print or electronically. Please go to http://museum,univeristy,edu/ for more information.</dc:rights>
     <dc:type>image</dc:type>
     <dc:description>1926; 0070; 06; Little S. Br. Pere Marquette R.; THL26-68; 71300; 71301; 71302; 71303; 71304; 71305; 71306; 71307; 71308; 71309; 07; 1926/07/06; R12W; S09; Second collector Moody; T16N</dc:description>
     <dc:subject>Cottus bairdi; Esox lucius; Cottus cognatus; Etheostoma nigrum; Salmo trutta; Oncorhynchus mykiss; Catostomus commersoni; Pimephales notatus; Margariscus margarita; Rhinichthys atratulus; mottled sculpin; northern pike; slimy sculpin; johnny darter; brown trout; rainbow trout; white sucker; bluntnose minnow; pearl dace; blacknose dace; bairdi; lucius; cognatus; nigrum; trutta; mykiss; commersoni; notatus; margarita; atratulus; Cottus; Esox; Cottus; Etheostoma; Salmo; Oncorhynchus; Catostomus; Pimephales; Margariscus; Rhinichthys; 1926-07-06; ; Boleosoma; Salmo; Hyborhynchus; Semotilus; ; fario; gairdneri--irideus; atronasus--obtusus--meleagris</dc:subject>
     <dc:language>UND</dc:language>
     <dc:source>Michigan 1926 Langlois, v. 1 1926--1926; </dc:source>
     Record harvested via OAI PMH 2-27-2007

  3. ????? Collection Registries GEM Photograph from Indiana University Charles W. Cushman Collection

  4. Why we should care • Library/archive/museum data is useful • Even when objects aren’t digitized • It’s our mission to distribute information • We should be leaders in the networked information environment • We have good ideas, but others do too • We should therefore make it easier for our data to be used by others

  5. Shareable Metadata… • Is quality metadata • Promotes search interoperability - “the ability to perform a search over diverse sets of metadata records and obtain meaningful results” (Priscilla Caplan) • Is human understandable outside of its local context • Is useful outside of its local context • Preferably is machine processable

  6. Shareable Metadata as a View • Metadata is not monolithic • Metadata should be a view projected from a single information object • Create multiple views appropriate for groups of important sharing venues • Depends on: • Use • Audience

  7. The 6 Cs & Lots of Ss of Shareable Metadata • Content • Coherence • Context • Communication • Consistency • Conformance to Standards

  8. Content • How element values are structured affects whether the record is shareable • For your institution, the resource, and the defined audience, choose the appropriate: • Vocabularies • Content standards • Granularity of description • Version of the resource to describe • Elements to use • Don’t include empty elements in shared records

  9. Coherence • A shareable metadata record should make sense on its own, outside of the local institutional context and without access to the resource itself • Place values in appropriate elements • Repeat elements instead of “packing” multiple values into one field • Avoid local jargon, abbreviations and codes • Ensure mappings from local to shared metadata formats result in coherent records
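
The "repeat elements instead of packing" advice can be sketched in a few lines of Python. This is only an illustration, not a tool from the presentation: the `unpack_values` helper and the `<record>` wrapper element are hypothetical, the sample is a shortened version of the packed dc:subject from slide 2, and the semicolon separator is an assumption about how the values were packed.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
SUBJECT = f"{{{DC_NS}}}subject"


def unpack_values(record, tag, separator=";"):
    """Replace packed `tag` elements with one element per distinct value."""
    seen = set()
    rebuilt = []
    for child in list(record):
        if child.tag != tag:
            rebuilt.append(child)  # leave other elements untouched
            continue
        for value in (child.text or "").split(separator):
            value = value.strip()
            if value and value not in seen:  # skip blanks and repeats
                seen.add(value)
                part = ET.Element(tag)
                part.text = value
                rebuilt.append(part)
    # Swap the old children for the rebuilt list, keeping document order.
    for child in list(record):
        record.remove(child)
    record.extend(rebuilt)
    return record


# Shortened sample modeled on the packed dc:subject in slide 2.
record = ET.fromstring(
    f'<record xmlns:dc="{DC_NS}">'
    "<dc:type>image</dc:type>"
    "<dc:subject>Cottus bairdi; Esox lucius; ; Cottus; Esox; Cottus</dc:subject>"
    "</record>"
)
unpack_values(record, SUBJECT)
subjects = [element.text for element in record.findall(SUBJECT)]
print(subjects)
```

Splitting on a separator only works when the separator never appears inside a legitimate value, so a real mapping would need a per-field decision rather than one global rule.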

  10. Context • Appropriate context allows a user to understand a resource based on the metadata record alone • Shareable metadata records should: • Include information not used locally • Exclude information only used locally • Collection level records can help, but don’t rely on them

  11. Communication • Information supplementing your metadata records can be useful to an aggregator • Intended audiences • Record creation methods • Controlled vocabularies used • Content standards used • Accrual practices • Existence of analytical or supplementary materials • Provenance of materials • Can be within or external to a sharing protocol

  12. Consistency • Consistency allows aggregators to apply the same indexing or enhancement logic to an entire group of records • Can be affected by change in policy or personnel over time • Pay special attention to consistency of: • How metadata elements are used • How (and which) vocabularies are used for a particular element • Syntax encoding schemes
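
Consistent syntax encoding is easy to illustrate with dates. The record on slide 2 carries the same date as both 1926/07/06 and 1926-07-06, plus a harvest date written 2-27-2007. A minimal sketch follows, assuming the target encoding is ISO 8601 (YYYY-MM-DD) and that the third pattern means month-day-year; the `normalize_date` helper and the list of accepted patterns are illustrative assumptions, not something the slides prescribe.

```python
from datetime import datetime

# Patterns are tried in order. Treating "2-27-2007" as month-day-year is an
# assumption; a real project would document which local patterns occur.
KNOWN_FORMATS = ("%Y-%m-%d", "%Y/%m/%d", "%m-%d-%Y")


def normalize_date(value):
    """Return the date as YYYY-MM-DD, or None when no known pattern fits."""
    text = value.strip()
    for pattern in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, pattern).date().isoformat()
        except ValueError:
            continue
    return None  # leave unparseable values for a person to review


normalized = [normalize_date(v) for v in ("1926/07/06", "1926-07-06", "2-27-2007", "July 1926")]
print(normalized)
```

Returning None instead of guessing keeps the automated step from silently inventing dates.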

  13. Conformance to Standards • Technical conformance to all types of standards is essential. Without it, processing tools and routines simply break. • Sharing protocols (e.g. OAI-PMH) • Metadata structure standards • Controlled vocabularies and syntax encoding schemes • Content standards • Technical standards (e.g. XML, character encoding)
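
The most basic layer of technical conformance, XML well-formedness, can be checked with Python's standard library before records ever reach an aggregator. This is a sketch under that narrow assumption: the `well_formed` helper is hypothetical, and it does not validate against a schema or DTD, which needs separate tooling.

```python
import xml.etree.ElementTree as ET


def well_formed(xml_text):
    """Return (True, "") if the text parses as XML, else (False, reason)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as error:
        return False, str(error)
    return True, ""


good = well_formed("<record><type>image</type></record>")
bad = well_formed("<record><type>image</record>")  # <type> is never closed
print(good, bad)
```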

  14. Generic high-level workflow • Phases: Plan, Create, Transform, Share, Assess • Steps: • Who to share with? • Choose shared metadata formats • Choose standards for native metadata • Write metadata creation guidelines • Create metadata (thinking about shareability) • Perform conceptual mapping • Perform technical mapping • Validate transformed metadata • Implement sharing protocol • Test shared metadata with protocol conformance tools • Communicate with aggregators • See who is collecting your metadata • Review your metadata in aggregations
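
One way to picture the step from conceptual mapping to technical mapping: the conceptual mapping is a table of local fields and their shared-format targets, and the technical mapping is the code that applies it. The sketch below is hypothetical throughout. The local field names, the `CROSSWALK` table, and the choice of simple Dublin Core targets are invented for illustration.

```python
# Conceptual mapping: hypothetical local field names -> simple Dublin Core.
CROSSWALK = {
    "local_title": "title",
    "collector": "contributor",
    "collection_date": "date",
    "species": "subject",
}


def apply_crosswalk(local_record):
    """Technical mapping: build the shared view and list unmapped fields."""
    shared = {}
    unmapped = []
    for field, values in local_record.items():
        target = CROSSWALK.get(field)
        if target is None:
            unmapped.append(field)  # local-only information stays local
            continue
        kept = [value for value in values if value.strip()]
        if kept:  # never emit empty elements
            shared.setdefault(target, []).extend(kept)
    return shared, sorted(unmapped)


local = {
    "local_title": ["Field notes"],
    "species": ["Esox lucius", " "],
    "shelf_code": ["THL26-68"],
}
shared, unmapped = apply_crosswalk(local)
print(shared, unmapped)
```

Reporting the unmapped fields makes the "exclude information only used locally" decision visible instead of accidental.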

  15. No single “right” workflow exists for all situations • Our tools sometimes dictate parts of our workflow • Be careful not to let them do this too much - tools serve us, not vice-versa • Start workflow design from well-defined goals (not processes) • Fundamental principles to follow • Put the right information in front of the right person at the right time • Ensure shareability is a common theme underlying it all • Generate multiple views from a single master

  16. Choose the best tools for the job • Important every step of the way • Programming languages • Commercial or open-source software packages • Repository solutions • Metadata creation interfaces • Promotes both efficiency and quality • Define needed functionality, and negotiate (compromise) from there

  17. Thinking big picture • Must find a reasonable balance between the perfect solution for a single set of materials and fully streamlined processes that treat everything the same way • One approach - define categories of material and design reusable workflows for each

  18. Defining categories of material • By resource type • Text • Documentary images • Art images • Musical audio recordings • etc…. (including getting more specific) • By managing institution? • May provide barriers for our users - see Elings/Waibel: “Metadata for All” article in First Monday, 2007 • But institutional mission is a factor in determining the appropriate views of a resource to share

  19. Reusable parts of workflow • Decisions on metadata structure standards, content standards, controlled vocabularies, etc. • Metadata creation tools • Automated processing techniques • XSLT stylesheets and other data management code • SIP/AIP/DIP architecture • Delivery systems

  20. Generalization is worth the effort • You will have to go back and do it again at some point • Fixing typos, errors, etc. • Adding new content over time • Adding new metadata format or sharing mechanism • Migration to another system • Need both workflow tools and documentation to be accessible • Generalization will allow you to minimize the effort of redoing something and focus more on the new stuff

  21. Make the most of automation • Automate the repetitive tasks as much as feasible, but only where it makes sense • For example: • Create as much technical metadata as possible from the file itself • Derive basic structural metadata from filenaming conventions • Develop automated processes that are triggered when an XML file is placed in a “drop box” or submitted via a specialized tool • Develop easy-to-use tools to apply the same metadata to a defined group of records
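
Deriving basic structural metadata from filenaming conventions might look like the following sketch. The convention itself (item identifier, underscore, three-digit sequence, optional r/v for recto/verso, then the extension) is an invented example, as are the `FILENAME_PATTERN` and `structure_from_filename` names. A real project would encode whatever convention its own digitization guidelines define.

```python
import re

# Hypothetical convention: <item>_<3-digit sequence>[r|v].<tif|jpg>
FILENAME_PATTERN = re.compile(
    r"^(?P<item>[A-Za-z0-9]+)_(?P<sequence>\d{3})(?P<side>[rv])?\.(?P<ext>tif|jpg)$"
)


def structure_from_filename(name):
    """Return structural metadata for a conforming filename, else None."""
    match = FILENAME_PATTERN.match(name)
    if match is None:
        return None  # nonconforming names go to a person, not a guess
    return {
        "item": match.group("item"),
        "sequence": int(match.group("sequence")),
        "side": match.group("side"),
        "format": match.group("ext"),
    }


page = structure_from_filename("fieldnotes1926_007r.tif")
stray = structure_from_filename("scan-final-v2.tif")
print(page, stray)
```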

  22. Basic workflow at IU (1) • Metadata standards chosen • Metadata creation guidelines written and tools developed/adapted • Fedora content model developed or existing appropriate one identified • Metadata/markup created (and perhaps digitization performed) • Sometimes in phases by different people

  23. Basic workflow at IU (2) • Metadata transformed via XSLT (one per category of material, with some tweaking for each collection) into all desired formats, and loaded into Fedora • Metadata for sharing loaded into OAI-PMH data provider • Appropriate staff alerted for parallel metadata creation for OPAC (generally collection level) • Note several opportunities for greater efficiency
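
Once records sit in an OAI-PMH data provider, an aggregator retrieves them with plain HTTP requests. The sketch below only builds a ListRecords request URL. The `verb`, `metadataPrefix`, `set`, and `from` arguments come from the OAI-PMH protocol, while the base URL, the set name, and the `list_records_url` helper are hypothetical and say nothing about IU's actual provider.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None, from_date=None):
    """Build an OAI-PMH ListRecords request URL for a data provider."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec  # selective harvesting by set
    if from_date:
        params["from"] = from_date  # selective harvesting by datestamp
    return f"{base_url}?{urlencode(params)}"


# Hypothetical base URL and set name, for illustration only.
url = list_records_url(
    "https://repository.example.edu/oai",
    set_spec="fieldnotes",
    from_date="2007-02-27",
)
print(url)
```

Requesting your own records this way is a quick check on what a harvester will actually see.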

  24. One step at a time • Implementing shareable metadata practices likely will be done incrementally • We’re still learning how to best achieve effective shareability • Best practices grow and change over time • Must be positioned to respond quickly to new metadata standards and technologies as they evolve

  25. Shareable metadata isn’t just about OAI-PMH • Some other options: • Lightweight APIs (e.g., OpenLibrary) • Google SiteMaps • OpenURL • SRU • OAI-ORE • Linked data • Jim Michalko, RLG: library data sharing mechanisms are “high value and low participation” • Notice Z39.50 isn’t on this list.

  26. Promoting new uses • The academic institution-built metadata (and/or content) aggregation seems to have plateaued • See Ricky Erway RLG report “Seeking Sustainability” • We must provide a variety of options for accessing our data, to support a variety of uses • We shouldn’t necessarily stop collaboration and aggregation, but we should allow others to do this too, with our metadata (and maybe even our content)

  27. Terminologies services • Sharing our authority data is potentially even more useful than sharing our descriptive data • RLG/OCLC doing some work in this area • Moving terminologies to the “network level” • Some possible uses • Give me more information on this concept/person/etc. • What are this term’s broader, narrower, related terms? • What are all the synonyms for this term?

  28. Tools supporting the creation of shareable metadata • Our existing metadata creation tools are embarrassingly bad • Current technologies provide many opportunities for improvement • Good tools make it easy to do the right thing and hard to do the wrong thing • Can operate when metadata is first created or in a later review step • Here are some ideas…

  29. Directly in XML • Generally only a good idea for markup languages, rather than metadata structure standards • And often not even then • Some supplemental tools can help • Validation to Schema/DTD (of course) • “Preview” function • “Report card” function, e.g., with Schematron
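
The slide names Schematron as one way to build a "report card". The sketch below swaps in plain Python to show the idea, and is not a Schematron implementation. The `report_card` helper, the list of required elements, and the "two or more semicolons means packed" heuristic are all assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET

DC = "{http://purl.org/dc/elements/1.1/}"
REQUIRED = ("title", "identifier", "rights")  # assumed local policy


def report_card(record):
    """List shareability findings for one record; empty list means clean."""
    findings = []
    for name in REQUIRED:
        if record.find(DC + name) is None:
            findings.append(f"missing dc:{name}")
    for child in record:
        label = child.tag.replace(DC, "dc:")
        text = (child.text or "").strip()
        if not text:
            findings.append(f"empty {label}")
        elif text.count(";") >= 2:  # heuristic, not a rule from the slides
            findings.append(f"{label} may be packed ({text.count(';') + 1} values)")
    return findings


record = ET.fromstring(
    '<record xmlns:dc="http://purl.org/dc/elements/1.1/">'
    "<dc:identifier>http://museum.university.edu/unique identifier</dc:identifier>"
    "<dc:format></dc:format>"
    "<dc:subject>Cottus bairdi; Esox lucius; Salmo trutta</dc:subject>"
    "<dc:rights>Permission must be received for subsequent distribution.</dc:rights>"
    "</record>"
)
findings = report_card(record)
print(findings)
```

A report of this kind advises the metadata creator rather than blocking the record, which fits the "later review step" use as well as creation time.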

  30. Modularize • All metadata for a resource doesn’t have to be created at once • Transcription vs. authority work vs. subject analysis • Descriptive vs. technical vs. structural • Us vs. users! • Provide optimized views for each metadata creation function • Perhaps even different systems • But always provide metadata creators with a way to see how the metadata will be used

  31. Abandon the record-centric approach • Patterns (and outliers) emerge from data in the aggregate • Reporting capabilities • Sortable, deduplicated lists of values from a given field or set of fields • How many of this field per record • How many distinct values used in this field • Data overlap between fields
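
The reporting capabilities listed above need very little code once records are available in bulk. A minimal sketch, assuming each record has already been parsed into a dictionary that maps a field name to its list of values; the `field_report` helper and the three sample records are hypothetical.

```python
from collections import Counter


def field_report(records, field):
    """Summarize how one field is used across a whole set of records."""
    per_record = [len(record.get(field, [])) for record in records]
    values = Counter(v for record in records for v in record.get(field, []))
    return {
        "records": len(records),
        "records_without_field": per_record.count(0),
        "max_per_record": max(per_record, default=0),
        "distinct_values": len(values),
        "values": sorted(values.items()),  # sortable, deduplicated list
    }


records = [
    {"subject": ["Esox lucius", "northern pike"], "type": ["image"]},
    {"subject": ["Esox lucius"], "type": ["image"]},
    {"type": ["Image"]},
]
subject_report = field_report(records, "subject")
type_report = field_report(records, "type")
print(subject_report)
print(type_report["values"])
```

In this sample the type report surfaces "Image" alongside "image", the kind of outlier that is invisible when records are reviewed one at a time.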

  32. Useful features • Data type validation (while entering data in that field!) • Auto-complete • Record-level validation • Spell check • Integration of metadata creation guidelines into software tools

  33. Integration of controlled vocabularies • Should be seamless • Provide access to entire authority record rather than just the heading • For short vocabularies, provide a combo box • For longer vocabularies • Auto-complete • Ajax-y interactions with hierarchical and alphabetical views • Similar features could be used to perform maintenance of vocabularies
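
Auto-complete against a controlled vocabulary can be sketched as a prefix search over a sorted term list. The `Vocabulary` class below is hypothetical and holds only headings; a real integration would, as the slide says, give access to the entire authority record. The sample terms are taken from the record on slide 2.

```python
from bisect import bisect_left


class Vocabulary:
    """Case-insensitive prefix lookup over a sorted list of terms."""

    def __init__(self, terms):
        self._terms = sorted(terms, key=str.casefold)
        self._keys = [term.casefold() for term in self._terms]

    def complete(self, prefix, limit=10):
        """Return up to `limit` terms that start with `prefix`."""
        key = prefix.casefold()
        start = bisect_left(self._keys, key)  # first candidate position
        matches = []
        for term, term_key in zip(self._terms[start:], self._keys[start:]):
            if len(matches) >= limit or not term_key.startswith(key):
                break
            matches.append(term)
        return matches


species = Vocabulary(["Salmo trutta", "Cottus cognatus", "Esox lucius", "Cottus bairdi"])
suggestions = species.complete("cot")
print(suggestions)
```

Because the terms are sorted once up front, each lookup is a binary search plus a short scan, which stays responsive even for long vocabularies.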

  34. Working around system limitations • Many digital asset management systems don’t support a second shareable copy of records • Do your best to split the difference with system records • Use creative interface design for your local system • Use extra-protocol documentation for communicating with aggregators • Lobby your vendor!

  35. One person can’t do it all • Implementing shareable metadata requires a primary advocate to ensure shareability is a consideration at all steps of the workflow • Many people will need to be involved • Good practice requires collaboration

  36. Role of metadata specialists • Often are the shareable metadata advocate • Choose standards and sharing protocols • Write metadata creation guidelines • Be prepared to compromise!

  37. Role of technical staff • Evaluate feasibility of technical plans • Help with prioritization of options • Locate and evaluate existing code to minimize duplication of effort • Abstract specific processes for general use

  38. Other collaborators • Collection managers • User specialists • Project managers • Catalogers/metadata creators • Reference staff • Granting agencies

  39. Final thoughts about sharing • Shareable metadata represents a fundamental shift in thinking • Your metadata is no longer a destination, it is information that will serve as building blocks for other services • Your metadata must operate effectively in an increasingly decontextualized environment • Creating shareable metadata • Will require more work on your part • Will require our software to support (more) standards • Is no longer an option, it’s a requirement

  40. Yes, this is hard… • …and we’re just starting to learn how to do it effectively and efficiently • There’s plenty of room for leadership in this area.
