
Metadata interoperability for everyone – XML tools for catalogers

Metadata interoperability for everyone – XML tools for catalogers Terry Reese Digital Production Unit Head Oregon State University Finding our way Metadata Interoperability Crosswalk systems Common problems Metadata tools Scripting Solutions MarcEdit MarcEdit and MODS


Presentation Transcript


  1. Metadata interoperability for everyone – XML tools for catalogers Terry Reese Digital Production Unit Head Oregon State University

  2. Finding our way • Metadata Interoperability • Crosswalk systems • Common problems • Metadata tools • Scripting Solutions • MarcEdit • MarcEdit and MODS • Metadata transformations • MODS editing • Automatic MODS harvesting • Conclusion

  3. Why metadata interoperability?

  4. Why metadata interoperability? • Today, we have literally hundreds of different metadata schemas. In the library, we have a wide variety as well. • MARC (and all its flavors) • FGDC • Dublin Core • EAD • METS • MODS • Onyx • OAI • TEI • FRBR • GILS • etc…..

  5. If you describe it….. • Metadata schemas are created by communities to meet the special descriptive needs of those communities. • Of course, one danger is that competing standards within a group produce multiple incompatible schemas, or variations of a particular schema, within a single community.

  6. If you describe it….. <controlaccess> <subject source="lcsh" encodinganalog="650">College students--Iowa--Mount Vernon.</subject> <subject encodinganalog="650" source="lcsh">Student activities--Iowa--Mount Vernon.</subject> </controlaccess> <controlaccess> <subject source="lcsh"> <controlaccess encodinganalog="650a">College students</controlaccess> <controlaccess encodinganalog="650z">Iowa</controlaccess> <controlaccess encodinganalog="650z">Mount Vernon.</controlaccess> </subject> </controlaccess>
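The two encodings above carry the same heading in different shapes, which is exactly what a consumer of the data has to cope with. A minimal Python sketch of that normalization follows, assuming only the two variants shown on this slide; the function name `subject_headings` and the constants are hypothetical, not part of any EAD toolkit.

```python
import xml.etree.ElementTree as ET

# Variant 1: the whole heading is one string inside <subject>.
FLAT = """<controlaccess>
  <subject source="lcsh" encodinganalog="650">College students--Iowa--Mount Vernon.</subject>
</controlaccess>"""

# Variant 2: each subdivision is its own child element of <subject>.
NESTED = """<controlaccess>
  <subject source="lcsh">
    <controlaccess encodinganalog="650a">College students</controlaccess>
    <controlaccess encodinganalog="650z">Iowa</controlaccess>
    <controlaccess encodinganalog="650z">Mount Vernon.</controlaccess>
  </subject>
</controlaccess>"""


def subject_headings(xml_text):
    """Return each subject as a single 'a--z--z' string, whichever variant is used."""
    root = ET.fromstring(xml_text)
    headings = []
    for subject in root.iter("subject"):
        # Collect the text of child elements, if the heading was split up.
        parts = [c.text.strip() for c in subject if c.text and c.text.strip()]
        if parts:
            headings.append("--".join(parts))
        elif subject.text and subject.text.strip():
            headings.append(subject.text.strip())
    return headings


print(subject_headings(FLAT))
print(subject_headings(NESTED))
```

Both calls yield the same heading string, so downstream code can treat the two local practices alike. Every additional local variation would need its own branch, which is the cost the slide is pointing at.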

  7. If you describe it… Some specialized examples: • MARC (MAchine-Readable Cataloging) • http://oregonstate.edu/~reeset/presentations/ala/summer2005/marc.txt • EAD (Encoded Archival Description) • http://oregonstate.edu/~reeset/presentations/ala/summer2005/ead.xml • (MARC representation: http://oasis.orst.edu/record=b2324248) • Dublin Core • http://oregonstate.edu/~reeset/presentations/ala/summer2005/dc.xml • FGDC • http://oregonstate.edu/~reeset/presentations/ala/summer2005/fgdc.xml

  8. If you describe it… Why would communities develop shared metadata schemas? • Shared schemas provide a structured method for sharing data within a community. • Example: MARC…its development paved the way for the current cooperative cataloging model and tools like: • OCLC • RLIN • Z39.50 • But shared best practices?

  9. Why use crosswalks? Crosswalks: • Are developed by examining the similarities and differences between schemas. • Are one of the primary mechanisms that allow different systems to interoperate with each other. • Break down data transfer barriers, allowing different systems to share data.

  10. Why use crosswalks? • To combine metadata catalogs e.g. Union catalogs • To provide cross searchability between unlike datasets e.g. Federated search tools • To perform data/metadata maintenance e.g. Updating metadata formats – moving away from obsolete standards. • Repurposing one schema to another.

  11. Why use crosswalks? • Cost • Metadata creation costs can be prohibitive • Indiana University reported in 2003 that one third of its total digitization cost was attributable to metadata creation (see Digitization Costs & Funding in the bibliography). This was just the initial metadata creation cost and didn’t include estimates for ongoing metadata maintenance. • However, this isn’t just a digitization issue – it’s also an issue for traditional catalog workflows (books, serials, etc.): • Rough OSU cost approximations (including OCLC charges): • Books (copy cataloging): $3 /book • Books (original): $27 /book • Theses (subject/classification): $20 /thesis

  12. Crosswalking challenges • Schema granularity • One to many matches and many to one matches • Crosswalking from schemas with different granularity levels • Trying to map anything from unqualified Dublin Core. • Handling object relationships or hierarchies. • EAD=>MARC

  13. Crosswalking challenges • Dealing with spare parts • Since data crosswalking is rarely a one to one mapping, the process nearly always results in unmappable data.
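A small Python sketch may make the granularity and "spare parts" problems concrete. It assumes a deliberately simplified, illustrative MARC-to-unqualified-Dublin-Core table (`DC_MAP`); this is not an official crosswalk, and the record values are placeholders.

```python
# Illustrative, simplified mapping only: several MARC tags collapse into one
# Dublin Core element (many to one), and anything not listed has no target.
DC_MAP = {
    "100": "creator",
    "700": "creator",
    "245": "title",
    "260": "publisher",
    "650": "subject",
    "651": "subject",
}


def crosswalk_to_dc(record):
    """Map (tag, value) pairs to DC elements; return the mapped data and the leftovers."""
    dc = {}
    unmapped = []
    for tag, value in record:
        element = DC_MAP.get(tag)
        if element is None:
            unmapped.append((tag, value))  # the "spare parts"
        else:
            dc.setdefault(element, []).append(value)
    return dc, unmapped


record = [
    ("100", "Author, Example"),
    ("245", "Example title"),
    ("300", "30 p."),
    ("650", "College students"),
    ("700", "Editor, Example"),
]
dc, leftovers = crosswalk_to_dc(record)
print(dc)
print(leftovers)
```

The 100 and 700 fields both land in `creator`, so the distinction between main and added entry is lost, and the 300 field ends up in `leftovers` because the target has nowhere to put it. A real crosswalk has to decide whether to drop, log, or park such data.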

  14. Common Crosswalking System Designs • Type-broker model (Ockerbloom) • Facilitates crosswalking – allows users to query known systems • Provides analysis and facilitates unknown crosswalking systems: • Determines crosswalk path • Negotiates system nodes • Does negotiations without the need for a control data layer – but allows clients to specify a control data layer that must be utilized in the conversion process.

  15. Common Crosswalking System Designs • Dumb-down crosswalking model • Converting data to its lowest common denominator. • Example: OAI’s initial use of Dublin Core as a transfer format.
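As a rough illustration of dumbing down, the Python sketch below collapses qualified element names into their unqualified parents. The dotted `element.qualifier` naming and the function `dumb_down` are assumptions made for this example, not a format defined in the presentation.

```python
def dumb_down(qualified):
    """Collapse 'element.qualifier' keys into plain 'element' keys."""
    simple = {}
    for name, values in qualified.items():
        element = name.split(".", 1)[0]  # drop everything after the first dot
        simple.setdefault(element, []).extend(values)
    return simple


qualified_record = {
    "title": ["Example title"],
    "date.created": ["2005"],
    "date.issued": ["2006"],
}
print(dumb_down(qualified_record))
```

After the conversion both dates sit under `date` and nothing records which one was the creation date. The transfer format is easy for every partner to read, at the price of precision that cannot be recovered on the receiving side.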

  16. Metadata Tools • Perl-based: • MARC::Record, MARC::Charset, MARC::XML • http://marcpm.sourceforge.net/ • Non-Perl based: • MarcEdit – includes XML API and crosswalks for a number of common metadata schemas. • http://oregonstate.edu/~reeset/marcedit/html/ • LC’s MARC tools: http://www.loc.gov/marc/marctools.html

  17. MarcEdit • MarcEdit 5.0 • System Requirements: • Using .NET Framework: • Windows 98, ME, NT, 2000, XP, 2003 • .NET 1.1 Framework • MDAC 2.7 runtimes • Using Mono Framework (hopefully available after August 2005): • Windows 2000+, Linux and Mac OS X • Mono system requirements

  18. MarcEdit: crosswalking design • Utilizes a modified version of Ockerbloom’s type-broker system. • Unlike Ockerbloom’s system, which brokers transformations between known schemas, MarcEdit utilizes MARCXML as a control schema to facilitate translation.

  19. MarcEdit: crosswalking design • Ockerbloom model: the broker system would continue performing translations until the desired format was reached. Example: MODS, Dublin Core, MARCXML, MARC

  20. Broker System model crosswalks Type broker
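One way to picture the broker's "determine the crosswalk path" step is a breadth-first search over the crosswalks it knows about. The Python sketch below assumes the example on the previous slide is read as a chain (MODS to Dublin Core to MARCXML to MARC); the graph `KNOWN` and the function `find_path` are hypothetical and are not taken from Ockerbloom's implementation.

```python
from collections import deque

# Each key lists the formats it can be translated into directly (assumed chain).
KNOWN = {
    "MODS": ["Dublin Core"],
    "Dublin Core": ["MARCXML"],
    "MARCXML": ["MARC"],
    "MARC": [],
}


def find_path(graph, source, target):
    """Return the shortest chain of formats from source to target, or None."""
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


print(find_path(KNOWN, "MODS", "MARC"))
print(find_path(KNOWN, "MARC", "MODS"))
```

With this graph the first call needs three translations to get from MODS to MARC, and the second finds no route at all. The number of hops depends entirely on which crosswalks happen to be registered.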

  21. MarcEdit: crosswalking design • MarcEdit model: • So long as a schema has been mapped to MARCXML, any metadata combination can be utilized. This means that no more than two transformations will ever take place. Example: MODS => MARCXML => EAD

  22. MarcEdit: crosswalking design • MarcEdit Crosswalk model • Pro • Crosswalks need not be directly related to each other • Requires the crosswalker to have specific knowledge of only one schema • Con • Every schema must be mapped to MARCXML.
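The hub design can be sketched in a few lines of Python. Everything here is a toy stand-in: records are plain dictionaries, and the two converter functions and the `TO_HUB` / `FROM_HUB` registries are hypothetical names, not MarcEdit's actual API (MarcEdit's crosswalks are described above only as XML-based mappings to MARCXML).

```python
def mods_to_marcxml(record):
    # Toy converter: a real crosswalk would map many elements, not one.
    return {"format": "MARCXML", "245a": record["title"]}


def marcxml_to_ead(record):
    return {"format": "EAD", "unittitle": record["245a"]}


# Each schema registers only its mapping to or from the hub.
TO_HUB = {"MODS": mods_to_marcxml}
FROM_HUB = {"EAD": marcxml_to_ead}


def convert(record, source, target, hub="MARCXML"):
    """Convert via the hub; at most two transformations are ever applied."""
    steps = []
    if source != hub:
        record = TO_HUB[source](record)
        steps.append(f"{source} => {hub}")
    if target != hub:
        record = FROM_HUB[target](record)
        steps.append(f"{hub} => {target}")
    return record, steps


result, steps = convert({"title": "Example title"}, "MODS", "EAD")
print(result)
print(steps)
```

Adding a new schema means writing its mapping to and from the hub only; it then works with every schema already registered. The trade-off is the one listed under "Con": a schema with no MARCXML mapping cannot take part at all.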

  23. MarcEdit Crosswalking model

  24. MarcEdit: Crosswalks for everyone

  25. MarcEdit: Crosswalks for everyone • Example Crosswalks: • MODS => MARC • MODS => FGDC • MODS => Dublin Core • EAD => MODS • EAD=>HTML

  26. MarcEdit: Crosswalks for everyone • What’s MarcEdit doing? • Facilitates the crosswalk by: • Performing character translations (MARC8-UTF8) • Facilitating interaction between binary and XML formats.

  27. MarcEdit: Simplify Editing MODS records • New to MarcEdit 5.0 is the ability to edit MODS records in the MarcEditor as if they were in a regular MARC file. • Allows catalogers unfamiliar with MODS to work with MODS data in a familiar form. • Will automatically translate new fields into MODS equivalents. • Will only translate field data that has a MODS equivalent.

  28. MarcEdit: Simplify Editing MODS records • How it works: • MODS file is translated to MARCXML • MARCXML is translated to MarcEdit Mnemonic format. • Internally, the MarcEditor tracks format and changes. • On save, mnemonic file will be retranslated back into MODS with edited and added fields being translated to their appropriate MODS mappings.
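The second step above, MARCXML to mnemonic format, can be approximated with the Python sketch below. It assumes the common mnemonic layout (`=TAG`, two spaces, indicators, then `$`-prefixed subfields, with a blank indicator written as a backslash); it is not MarcEdit's own code, the sample record is a placeholder, and the leader is left out for brevity.

```python
import xml.etree.ElementTree as ET

NS = {"marc": "http://www.loc.gov/MARC21/slim"}

SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <controlfield tag="001">example001</controlfield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">Example title /</subfield>
    <subfield code="c">Example Author.</subfield>
  </datafield>
  <datafield tag="650" ind1=" " ind2="0">
    <subfield code="a">College students</subfield>
    <subfield code="z">Iowa</subfield>
  </datafield>
</record>"""


def to_mnemonic(marcxml):
    """Render one MARCXML record as a list of mnemonic-style lines."""
    record = ET.fromstring(marcxml)
    lines = []
    for cf in record.findall("marc:controlfield", NS):
        lines.append(f"={cf.get('tag')}  {cf.text or ''}")
    for df in record.findall("marc:datafield", NS):
        # Blank indicators are shown as a backslash in the mnemonic form.
        ind = (df.get("ind1", " ") + df.get("ind2", " ")).replace(" ", "\\")
        subs = "".join(
            f"${sf.get('code')}{sf.text or ''}"
            for sf in df.findall("marc:subfield", NS)
        )
        lines.append(f"={df.get('tag')}  {ind}{subs}")
    return lines


for line in to_mnemonic(SAMPLE):
    print(line)
```

A cataloger editing the resulting lines works with tags, indicators, and subfields only. The translation back to MODS on save is the part that needs the tracking described in the last two bullets.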

  29. MarcEdit: Making OAI Simple • New to MarcEdit 5.0 is a Metadata Harvester. • From within the MarcEditor, users can harvest DC, oai_marc or MODS records directly into MARC. • http://oregonstate.edu/~reeset/presentations/ala/summer2005/harvest.wmv
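For readers who want to see what a harvester does underneath, here is a minimal Python sketch of an OAI-PMH `ListRecords` exchange. It builds the request URL and reads titles and the resumption token from a response; the base URL `http://example.org/oai`, the sample response, and both function names are assumptions for illustration, and no network request is made. It does not reflect how MarcEdit's harvester is implemented.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request; a resumption token replaces the other arguments."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def titles_and_token(response_xml):
    """Return the dc:title values and the resumption token (None when absent or empty)."""
    root = ET.fromstring(response_xml)
    titles = [t.text for t in root.iterfind(".//dc:title", NS)]
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=NS)
    return titles, token.strip() or None


# Hand-written sample response, trimmed to the elements used here.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example title</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>token-123</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url("http://example.org/oai"))
print(titles_and_token(SAMPLE_RESPONSE))
```

A harvester repeats the request with each returned token until none comes back, then hands the collected records to a crosswalk such as Dublin Core to MARC.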

  30. Bibliography • Ockerbloom, John. Mediating among diverse data formats. School of Computer Science, Carnegie Mellon University. CMU-CS-98-102. January 1998. http://tom.library.upenn.edu/pubs/thesis/ • Digitization Costs & Funding. Digital Library Workshop. Oct. 2003. http://www.dlib.indiana.edu/workshops/alioct03/costs.ppt
