
Metadata for Digital Repositories





Presentation Transcript


  1. Metadata for Digital Repositories Mark Jordan Repository Redux University of Prince Edward Island September 19, 2007 This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 Canada License

  2. Schedule • 9:00 - 10:30 • Background; types of metadata; major standards; choosing metadata schemes • 10:45 - 12:00 • Metadata life cycle; strategies for creation and management; automated creation; supplementation strategies • 1:00 - 2:30 • SFU theses workflow case study; native vs. derived; crosswalks • 2:45 - 4:30 • Application Profiles; OAI; CARLCore AP case study

  3. What is Metadata? • Different meanings in different communities • Information about information • Can describe information at any level • Collection • Item • Item within item • Can be embedded within an object or separate from it

  4. Types of Metadata • Descriptive • Terms and conditions • Administrative data • Content ratings • Provenance • Linking or relationship data • Structural data Carl Lagoze, Clifford A. Lynch, and Ron Daniel, Jr. “The Warwick Framework: A Container Architecture for Aggregating Sets of Metadata”. 1996. http://hdl.handle.net/1813/7248

  5. Metadata and Cataloguing • Perception that cataloguing is old and metadata is new • Traditional cataloguing focuses on descriptions of analogue materials • Metadata focuses on management of networked resources • For locally created or managed networked resources (such as repositories), cataloguing is insufficient

  6. Metadata Schemes • Defines a collection of elements for supporting a specific function • Defines structures for element values • Defines formal aspects of the element set, such as name, definition, data type, etc. • Some schemes are expressed as XML schemas

  7. Containers vs. Rules of Description • Containers dictate structure • Rules of description dictate content • Common rules of description • AACR2 • RDA • RAD

  8. Vs. Glue Standards • OpenURL • Syntax for encoding bibliographic data in URLs • http://resolver.example.edu/cgi?genre=book&isbn=0836218310&title=The+Far+Side+Gallery+3 • COinS • OpenURLs embedded in HTML <span> tags • unAPI • Identifiers embedded in HTML <abbr> tags for autodiscovery and “copy and paste” • Microformats • For example, <a href="http://creativecommons.org/licenses/by/2.0/" rel="license">cc by 2.0</a>
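The OpenURL example above is simply a resolver address followed by URL-encoded key/value pairs. The following Python sketch, using only the standard library, builds and then parses that same URL; the helper names are hypothetical, and the resolver address is the placeholder from the slide.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Placeholder resolver address taken from the slide's example.
RESOLVER = "http://resolver.example.edu/cgi"


def build_openurl(base: str, metadata: dict[str, str]) -> str:
    """Encode bibliographic key/value pairs as an OpenURL-style query string."""
    # urlencode() quotes with quote_plus, so spaces become '+', as in the slide.
    return f"{base}?{urlencode(metadata)}"


def parse_openurl(url: str) -> dict[str, str]:
    """Recover the key/value pairs from an OpenURL (first value per key)."""
    query = urlsplit(url).query
    return {key: values[0] for key, values in parse_qs(query).items()}


url = build_openurl(
    RESOLVER,
    {"genre": "book", "isbn": "0836218310", "title": "The Far Side Gallery 3"},
)
print(url)
print(parse_openurl(url)["title"])
```

The keys follow the slide's example, which uses the older OpenURL 0.1 style; OpenURL 1.0 (ANSI/NISO Z39.88-2004) uses prefixed keys such as `rft.isbn`, which this sketch does not generate.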

  9. Selected Major Standards • Dublin Core • MODS • Collection Description • RDA • EAD • PREMIS • METS

  10. Dublin Core • Standard metadata set for describing resources • It is flexible • Qualified vs. unqualified • Can be expressed in HTML, XML, or using RDF • Dumbing down is a good thing

  11. Title Creator Subject Description Publisher Contributor Date Type Format Identifier Source Language Relation Coverage Rights Dublin Core Element Set
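To make the element set concrete, here is a minimal Python sketch that writes a simple (unqualified) Dublin Core record as XML with the standard library. The wrapper element `record` is a hypothetical placeholder (harvesting formats such as OAI-PMH's oai_dc define their own), and the values are borrowed from the Martial Epigrams example later in this deck. Every element is optional and repeatable, which the sketch reflects by taking a list of values per element.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# The fifteen elements of the Dublin Core Element Set.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def dc_record(fields: dict[str, list[str]]) -> ET.Element:
    """Build a simple DC record; unknown element names are rejected."""
    ET.register_namespace("dc", DC_NS)  # serialize with the usual dc: prefix
    root = ET.Element("record")  # hypothetical wrapper element
    for name, values in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name}")
        for value in values:  # repeatable: one XML element per value
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return root


record = dc_record({
    "title": ["Epigrams"],
    "creator": ["Martial"],
    "contributor": ["Ker, Walter C. A. (Walter Charles Alan), 1853-1929"],
    "type": ["Text"],
    "language": ["lat", "eng"],
})
print(ET.tostring(record, encoding="unicode"))
```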

  12. Dublin Core Qualifiers • Types • Element refinements • Encoding schemes • Examples • Description • Table of contents, abstract • Date • Created, valid, available, issued, modified • Subject • LCSH, MeSH, DDC, LCC, UDC
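Because each refinement narrows the meaning of an existing element, a qualified record can be "dumbed down" to simple Dublin Core by keeping the value and falling back to the element being refined. A minimal Python sketch of that rule, assuming records are held as hypothetical (term, encoding scheme, value) triples and covering only the refinements listed on this slide; the sample values are illustrative.

```python
from typing import Optional

# Element refinements listed on the slide, written as DCMI Terms property
# names and mapped to the simple Dublin Core element each one refines.
REFINES = {
    "tableOfContents": "description",
    "abstract": "description",
    "created": "date",
    "valid": "date",
    "available": "date",
    "issued": "date",
    "modified": "date",
}


def dumb_down(qualified: list[tuple[str, Optional[str], str]]) -> list[tuple[str, str]]:
    """Reduce (term, encoding scheme, value) triples to (element, value) pairs.

    A refinement is replaced by the element it refines; an encoding scheme
    such as LCSH or MeSH is dropped, but the value itself is kept.
    """
    return [(REFINES.get(term, term), value) for term, _scheme, value in qualified]


# Illustrative qualified record.
qualified = [
    ("title", None, "A Jewel of Honesty"),
    ("abstract", None, "clashing oppositions"),
    ("issued", "W3CDTF", "1987-01-01"),
    ("subject", "LCSH", "Honesty"),
]
for element, value in dumb_down(qualified):
    print(f"{element}: {value}")
```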

  13. Dublin Core Example

  14. MODS • A “bibliographic element set that may be used for a variety of purposes, and particularly for library applications.” • Richer than DC, simpler than MARC • Does not assume the use of any specific cataloging code • Elements: titleInfo, title, name, namePart, originInfo, etc.

  15. <?xml version="1.0" encoding="UTF-8"?> <mods:mods xmlns:mods="http://www.loc.gov/mods/v3"> <mods:titleInfo> <mods:title>A Jewel of Honesty</mods:title> </mods:titleInfo> <mods:genre>Article</mods:genre> <mods:abstract>clashing oppositions</mods:abstract> <mods:subject> <mods:geographic>N/A</mods:geographic> </mods:subject> <mods:subject authority="none"> <mods:topic>General interest</mods:topic> </mods:subject> <mods:relatedItem type="host"> <mods:titleInfo> <mods:title>Carnegie Newsletter</mods:title> <mods:title>Celebration a Spectacle of Hope</mods:title> </mods:titleInfo> <mods:name> <mods:namePart>Pra'N'Ava</mods:namePart> <mods:role> <mods:roleTerm authority="marcrelator" type="text">author</mods:roleTerm> </mods:role> </mods:name> <mods:name> <mods:namePart>N/A</mods:namePart> <mods:role> <mods:roleTerm authority="chodarr" type="text">recipient</mods:roleTerm> </mods:role> </mods:name> <mods:part> <mods:extent unit="pages"> <mods:start>9</mods:start> <mods:list>9,14</mods:list> </mods:extent> </mods:part> <mods:originInfo> <mods:dateIssued encoding="iso8601">19870101</mods:dateIssued> </mods:originInfo> </mods:relatedItem> </mods:mods>
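Every element in this record lives in the MODS namespace (http://www.loc.gov/mods/v3), so code that reads it has to qualify each path. A minimal sketch using Python's xml.etree.ElementTree on a trimmed copy of the record, separating the item's own title from that of its host publication (relatedItem type="host"); the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

NS = {"mods": "http://www.loc.gov/mods/v3"}

# Trimmed copy of the MODS record from slide 15.
RECORD = """<mods:mods xmlns:mods="http://www.loc.gov/mods/v3">
  <mods:titleInfo><mods:title>A Jewel of Honesty</mods:title></mods:titleInfo>
  <mods:genre>Article</mods:genre>
  <mods:relatedItem type="host">
    <mods:titleInfo><mods:title>Carnegie Newsletter</mods:title></mods:titleInfo>
    <mods:name>
      <mods:namePart>Pra'N'Ava</mods:namePart>
      <mods:role><mods:roleTerm authority="marcrelator" type="text">author</mods:roleTerm></mods:role>
    </mods:name>
    <mods:originInfo><mods:dateIssued encoding="iso8601">19870101</mods:dateIssued></mods:originInfo>
  </mods:relatedItem>
</mods:mods>"""

root = ET.fromstring(RECORD)

# A path relative to the root matches direct children only, so this finds
# the item's own title rather than the one inside relatedItem.
title = root.findtext("mods:titleInfo/mods:title", namespaces=NS)
host = root.find("mods:relatedItem[@type='host']", NS)
host_title = host.findtext("mods:titleInfo/mods:title", namespaces=NS)
authors = [
    name.findtext("mods:namePart", namespaces=NS)
    for name in host.findall("mods:name", NS)
    if name.findtext("mods:role/mods:roleTerm", namespaces=NS) == "author"
]
issued = host.findtext("mods:originInfo/mods:dateIssued", namespaces=NS)
print(title, "|", host_title, "|", authors, "|", issued)
```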

  16. <?xml version="1.0" encoding="UTF-8"?> <mods:mods xmlns:mods="http://www.loc.gov/mods/v3"> <mods:titleInfo> <mods:title>A Jewel of Honesty</mods:title> </mods:titleInfo> <mods:genre>Article</mods:genre> <mods:abstract>clashing oppositions</mods:abstract> <mods:subject> <mods:geographic>N/A</mods:geographic> </mods:subject> <mods:subject authority="none"> <mods:topic>General interest</mods:topic> </mods:subject> <mods:relatedItem type="host"> <mods:titleInfo> <mods:title>Carnegie Newsletter</mods:title> <mods:title>Celebration a Spectacle of Hope</mods:title> </mods:titleInfo> <mods:name> <mods:namePart>Pra'N'Ava</mods:namePart> <mods:role> <mods:roleTerm authority="marcrelator" type="text">author </mods:roleTerm> </mods:role> </mods:name>

  17. DCMI Collection Description • Formal description of aggregation or collection of items • Can apply to collections where item-level metadata is not available or appropriate, or to collections where it is • Sample elements: • accrualMethod, accrualPeriodicity • Developed as NISO Z39.91 Dublin Core Collection Description Application Profile, http://www.ukoln.ac.uk/metadata/dcmi/collection-application-profile/2004-02-01/

  18. RDA • Resource Description and Access, the successor to AACR2 • Diane Hillmann’s critique • Reliance on transcription and specified sources of information • Reliance on unstructured notes • Multiple versions in one record • Full review at http://dublincore.org/usage/meetings/2006/04/seattle/rda-review/RDA_for_who.htm

  19. EAD • XML schema for encoding archival finding aids • Contains elements for all aspects of archival description, from <repository> to <daoloc> • <archdesc> holds the description of the fonds or collection as a whole; series, subseries, and lower levels are nested within it as components (<c>, or <c01> through <c12>) inside <dsc>

  20. <did> <head>Summary Description of the Tom Stoppard Papers</head> <repository> <corpname>The University of Texas at Austin <subarea>Harry Ransom Humanities Research Center</subarea> </corpname> </repository> <origination> <persname source="lcnaf" encodinganalog="100">Stoppard, Tom</persname> </origination> <unittitle encodinganalog="245">Tom Stoppard Papers, </unittitle> <unitdate type="inclusive">1944-1995</unitdate> <physdesc encodinganalog="300"> <extent>68 boxes (28 linear feet)</extent> </physdesc> <unitid type="accession">R4635</unitid> <physloc audience="internal">14E:SW:6-8</physloc> <abstract>The papers of British playwright Tom Stoppard (b. 1937) encompass his entire career and consist of multiple drafts of his plays, from the well-known <title render="italic">Rosencrantz and Guildenstern Are Dead</title> to several that were never produced, correspondence, photographs, and posters, as well as materials from stage, screen, and radio productions from around the world.</abstract> </did>
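A sketch of pulling core summary fields out of this <did> with Python's ElementTree, using a trimmed copy of the fragment (which, as shown, declares no namespace). The summarize function is hypothetical; the last step collects the encodinganalog attributes, which in this example record the corresponding MARC fields (100, 245, 300).

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the <did> fragment from the previous slide.
DID = """<did>
  <origination><persname source="lcnaf" encodinganalog="100">Stoppard, Tom</persname></origination>
  <unittitle encodinganalog="245">Tom Stoppard Papers,</unittitle>
  <unitdate type="inclusive">1944-1995</unitdate>
  <physdesc encodinganalog="300"><extent>68 boxes (28 linear feet)</extent></physdesc>
  <unitid type="accession">R4635</unitid>
</did>"""

did = ET.fromstring(DID)


def summarize(did: ET.Element) -> dict[str, str]:
    """Collect a few core descriptive fields from an EAD <did>."""
    return {
        "creator": did.findtext("origination/persname", default=""),
        # Drop the trailing comma that precedes <unitdate> in the display form.
        "title": did.findtext("unittitle", default="").strip().rstrip(","),
        "dates": did.findtext("unitdate", default=""),
        "extent": did.findtext("physdesc/extent", default=""),
        "id": did.findtext("unitid", default=""),
    }


# Element name -> encodinganalog value, for every element that carries one.
marc_map = {el.tag: el.get("encodinganalog") for el in did.iter() if el.get("encodinganalog")}

print(summarize(did))
print(marc_map)
```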

  21. PREMIS • Data model • Digital objects • Intellectual entities • Agents • Events • Rights • Relationships • Data Dictionary contains examples and sections on compliance and implementation • Can be encoded in METS
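One way to picture the PREMIS data model is as linked records. The Python dataclasses below are a hypothetical illustration, not the semantic units of the PREMIS Data Dictionary: an Event ties Agents to the Objects it acted on, Rights attach to an Object, and an Intellectual Entity groups its digital representations. Identifiers and values are invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    identifier: str
    name: str


@dataclass
class Rights:
    basis: str        # e.g. "copyright" or "license"
    acts: list[str]   # actions permitted, e.g. "replicate", "disseminate"


@dataclass
class DigitalObject:
    identifier: str
    format_name: str
    rights: list[Rights] = field(default_factory=list)


@dataclass
class IntellectualEntity:
    identifier: str
    title: str
    representations: list[DigitalObject] = field(default_factory=list)


@dataclass
class Event:
    identifier: str
    event_type: str               # e.g. "capture", "ingestion", "fixity check"
    date_time: str
    objects: list[DigitalObject]  # relationship: objects the event acted on
    agents: list[Agent]           # relationship: who or what performed it


def history(obj: DigitalObject, events: list[Event]) -> list[str]:
    """Event types that involved obj, in the order given."""
    return [e.event_type for e in events
            if any(o.identifier == obj.identifier for o in e.objects)]


tiff = DigitalObject("obj-001", "TIFF", [Rights("copyright", ["replicate", "disseminate"])])
unit = Agent("agent-01", "Digitization unit")
book = IntellectualEntity("ie-001", "Epigrams", [tiff])
events = [
    Event("evt-1", "capture", "2007-09-19T09:00:00", [tiff], [unit]),
    Event("evt-2", "ingestion", "2007-09-19T10:00:00", [tiff], [unit]),
]
print(history(tiff, events))
```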

  22. PREMIS Elements

  23. METSRights • Endorsed by METS Board but useful outside of METS documents • XML Elements • RightsDeclaration • RightsHolder • Context • Permissions • Constraint

  24. METS • METS: Metadata Encoding & Transmission Standard • Encodes descriptive, administrative, and structural metadata in one XML file • Preferred data structure for digital library initiatives • Goals • Manage different types of metadata • Migrate resources between repositories

  25. METS Community • Maintenance agency is Library of Congress • Website • http://www.loc.gov/standards/mets/ • Implementation registry • Lists 33 projects at 24 institutions

  26. METS Components • METS header • Descriptive metadata section • Administrative metadata section • File section • Structural map section • Structural link section • Behavior section

  27. fileSec • Lists all files making up the resource • <FLocat> points to the location of each file • <file> elements link to pertinent administrative metadata in <amdSec> through the ADMID attribute, which holds the IDs of the relevant sections

  28. <mets:fileSec> <mets:fileGrp USE="archive image"> <mets:file ID="epi01m" MIMETYPE="image/tiff"> <mets:FLocat xlink:href="http://www.loc.gov/standards/mets/docgroup/full/01.tif" LOCTYPE="URL"/> </mets:file> <mets:file> … </mets:file> </mets:fileGrp> <mets:fileGrp USE="reference image"> <mets:file ID="epi01r" MIMETYPE="image/jpeg"> <mets:FLocat xlink:href="http://www.loc.gov/standards/mets/docgroup/jpg/01.jpg" LOCTYPE="URL"/> </mets:file> </mets:fileGrp> <mets:fileGrp USE="thumbnail image"> <mets:file ID="epi01t" MIMETYPE="image/gif"> <mets:FLocat xlink:href="http://www.loc.gov/standards/mets/docgroup/gif/01.gif" LOCTYPE="URL"/> </mets:file> </mets:fileGrp> </mets:fileSec>
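Reading a fileSec back is mostly a matter of namespace-aware lookups. A minimal Python sketch that groups files by their fileGrp USE value, run against the fileSec above with namespace declarations added and the elided <mets:file> left out; files_by_use is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

NS = {"mets": "http://www.loc.gov/METS/"}
HREF = "{http://www.w3.org/1999/xlink}href"  # xlink:href in Clark notation

FILESEC = """<mets:fileSec xmlns:mets="http://www.loc.gov/METS/"
    xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:fileGrp USE="archive image">
    <mets:file ID="epi01m" MIMETYPE="image/tiff">
      <mets:FLocat LOCTYPE="URL" xlink:href="http://www.loc.gov/standards/mets/docgroup/full/01.tif"/>
    </mets:file>
  </mets:fileGrp>
  <mets:fileGrp USE="reference image">
    <mets:file ID="epi01r" MIMETYPE="image/jpeg">
      <mets:FLocat LOCTYPE="URL" xlink:href="http://www.loc.gov/standards/mets/docgroup/jpg/01.jpg"/>
    </mets:file>
  </mets:fileGrp>
  <mets:fileGrp USE="thumbnail image">
    <mets:file ID="epi01t" MIMETYPE="image/gif">
      <mets:FLocat LOCTYPE="URL" xlink:href="http://www.loc.gov/standards/mets/docgroup/gif/01.gif"/>
    </mets:file>
  </mets:fileGrp>
</mets:fileSec>"""


def files_by_use(filesec: ET.Element) -> dict[str, list[tuple[str, str, str]]]:
    """Map each fileGrp USE value to (file ID, MIME type, location) tuples."""
    groups = {}
    for grp in filesec.findall("mets:fileGrp", NS):
        groups[grp.get("USE")] = [
            (f.get("ID"), f.get("MIMETYPE"), f.find("mets:FLocat", NS).get(HREF))
            for f in grp.findall("mets:file", NS)
        ]
    return groups


groups = files_by_use(ET.fromstring(FILESEC))
for use, files in groups.items():
    print(use, files)
```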

  29. structMap • The only required section • Defines the hierarchical structure of the resource • Can be physical or logical • Physical structMaps simply list files in order • Pages that make up a book • Logical structMaps order files within the intellectual structure of the resource • Chapters that make up a book

  30. <mets:structMap TYPE="physical"> <mets:div TYPE="book" LABEL="Martial Epigrams II"> <mets:div TYPE="page" LABEL="Blank page"> </mets:div> <mets:div TYPE="page" LABEL="Page ii: Blank page"> </mets:div> <mets:div TYPE="page" LABEL="Page iii: Title page"> </mets:div> <mets:div TYPE="page" LABEL="Page iv: Publication info"> </mets:div> <mets:div TYPE="page" LABEL="Page v: Table of contents"> </mets:div> <mets:div TYPE="page" LABEL="Page vi: Blank page"> </mets:div> <mets:div TYPE="page" LABEL="Page 1: Half title page"> </mets:div> <mets:div TYPE="page" LABEL="Page 2 (Latin)"> </mets:div> <mets:div TYPE="page" LABEL="Page 3 (English)"> </mets:div> … </mets:div> </mets:structMap>
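Since nested <div> elements carry the hierarchy, a structMap can be turned into an outline with a short recursive walk. A Python sketch over a trimmed copy of the physical structMap above (only four of its pages, with a METS namespace declaration added); the outline function is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections.abc import Iterator

METS = "{http://www.loc.gov/METS/}"

STRUCTMAP = """<mets:structMap xmlns:mets="http://www.loc.gov/METS/" TYPE="physical">
  <mets:div TYPE="book" LABEL="Martial Epigrams II">
    <mets:div TYPE="page" LABEL="Page iii: Title page"/>
    <mets:div TYPE="page" LABEL="Page 1: Half title page"/>
    <mets:div TYPE="page" LABEL="Page 2 (Latin)"/>
    <mets:div TYPE="page" LABEL="Page 3 (English)"/>
  </mets:div>
</mets:structMap>"""


def outline(div: ET.Element, depth: int = 0) -> Iterator[str]:
    """Yield one indented line per <div>, depth-first, in document order."""
    yield "  " * depth + f"{div.get('TYPE')}: {div.get('LABEL')}"
    for child in div.findall(f"{METS}div"):
        yield from outline(child, depth + 1)


root = ET.fromstring(STRUCTMAP)
lines = [line for top in root.findall(f"{METS}div") for line in outline(top)]
print("\n".join(lines))
```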

  31. dmdSec • Contains descriptive metadata • Descriptive metadata can be included or linked externally • Descriptive metadata can be in any scheme • Can accommodate XML (e.g., MODS) or binary (e.g., MARC) representations of descriptive metadata

  32. <mets:dmdSec ID="DMD1"> <mets:mdWrap MIMETYPE="text/xml" MDTYPE="MODS"> <mets:xmlData> <mods:mods version="3.1"> <mods:titleInfo> <mods:title>Epigrams</mods:title> </mods:titleInfo> <mods:name type="personal"> <mods:namePart>Martial</mods:namePart> </mods:name> <mods:name type="personal"> <mods:namePart>Ker, Walter C. A. (Walter Charles Alan), 1853-1929 </mods:namePart> </mods:name> <mods:typeOfResource>text</mods:typeOfResource> … </mods:mods> </mets:xmlData> </mets:mdWrap> </mets:dmdSec>
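Descriptive metadata can be embedded in <mdWrap> or referenced externally with <mdRef>. The Python sketch below reports which form each dmdSec uses; the first section is a trimmed copy of the example above, while the second, linked MARC record and its example.org URL are hypothetical. The title lookup assumes the wrapped metadata is MODS.

```python
import xml.etree.ElementTree as ET

NS = {"mets": "http://www.loc.gov/METS/", "mods": "http://www.loc.gov/mods/v3"}
HREF = "{http://www.w3.org/1999/xlink}href"

DOC = """<mets:mets xmlns:mets="http://www.loc.gov/METS/"
    xmlns:mods="http://www.loc.gov/mods/v3" xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:dmdSec ID="DMD1">
    <mets:mdWrap MIMETYPE="text/xml" MDTYPE="MODS">
      <mets:xmlData>
        <mods:mods version="3.1">
          <mods:titleInfo><mods:title>Epigrams</mods:title></mods:titleInfo>
        </mods:mods>
      </mets:xmlData>
    </mets:mdWrap>
  </mets:dmdSec>
  <mets:dmdSec ID="DMD2">
    <mets:mdRef LOCTYPE="URL" MDTYPE="MARC" xlink:href="http://example.org/marc/0001.mrc"/>
  </mets:dmdSec>
</mets:mets>"""


def describe(dmd: ET.Element) -> tuple:
    """Return (ID, MDTYPE, 'embedded' or 'linked', title or location)."""
    wrap = dmd.find("mets:mdWrap", NS)
    if wrap is not None:
        # Assumes MODS inside xmlData; other schemes would yield None here.
        title = wrap.findtext("mets:xmlData/mods:mods/mods:titleInfo/mods:title", namespaces=NS)
        return dmd.get("ID"), wrap.get("MDTYPE"), "embedded", title
    ref = dmd.find("mets:mdRef", NS)
    return dmd.get("ID"), ref.get("MDTYPE"), "linked", ref.get(HREF)


root = ET.fromstring(DOC)
summary = [describe(d) for d in root.findall("mets:dmdSec", NS)]
for row in summary:
    print(row)
```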

  33. amdSec • Contains info on digital resource, files in the resource, or original analogue source • Type of info • Technical • Intellectual property • Provenance

  34. <mets:techMD ID="AMD001"> <mets:mdWrap MIMETYPE="text/xml" MDTYPE="NISOIMG" LABEL="NISO Img.Data"> <mets:xmlData> <niso:MIMEtype>image/tiff</niso:MIMEtype> <niso:Compression>LZW</niso:Compression> <niso:PhotometricInterpretation> 8 </niso:PhotometricInterpretation> <niso:Orientation> 1 </niso:Orientation> <niso:ScanningAgency> NYU Press </niso:ScanningAgency> </mets:xmlData> </mets:mdWrap> </mets:techMD>

  35. METS Header • Contains info about the METS document • Sample <mets:metsHdr CREATEDATE="2006-05-09T15:00:00" LASTMODDATE="2006-05-09T21:00:00"> <mets:agent ROLE="CREATOR" TYPE="INDIVIDUAL"> <mets:name>Rick Beaubien</mets:name> </mets:agent> <mets:altRecordID TYPE="LCCN">20022838</mets:altRecordID> </mets:metsHdr>

  36. structLink • Adds hyperlinks between elements in a Structural Map • Sample <mets:structLink> <mets:smLink xlink:from="LINK7" xlink:to="page1145" xlink:title="projects"> </mets:smLink> <mets:smLink xlink:from="LINK13" xlink:to="page1145" xlink:title="projects"> </mets:smLink> <mets:smLink xlink:from="LINK36" xlink:to="page113" xlink:title="officers"> </mets:smLink> <mets:smLink xlink:from="LINK37" xlink:to="page120" xlink:title="calender"> </mets:smLink> </mets:structLink>
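Because each smLink records a directed link between two <div> IDs, the structLink section can be inverted to answer "what links to this page?" A Python sketch using the links from this sample (xlink:title values omitted, namespace declarations added); backlinks is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

NS = {"mets": "http://www.loc.gov/METS/"}
XLINK = "{http://www.w3.org/1999/xlink}"

STRUCTLINK = """<mets:structLink xmlns:mets="http://www.loc.gov/METS/"
    xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:smLink xlink:from="LINK7" xlink:to="page1145"/>
  <mets:smLink xlink:from="LINK13" xlink:to="page1145"/>
  <mets:smLink xlink:from="LINK36" xlink:to="page113"/>
  <mets:smLink xlink:from="LINK37" xlink:to="page120"/>
</mets:structLink>"""


def backlinks(structlink: ET.Element) -> dict[str, list[str]]:
    """Map each target <div> ID to the IDs that link to it, in document order."""
    incoming = defaultdict(list)
    for link in structlink.findall("mets:smLink", NS):
        incoming[link.get(XLINK + "to")].append(link.get(XLINK + "from"))
    return dict(incoming)


links_in = backlinks(ET.fromstring(STRUCTLINK))
print(links_in)
```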

  37. behaviorSec • Associates executable behaviors (i.e., computer code) with parts of a document/object • Sample <mets:behaviorSec> <mets:behavior ID="disp1" STRUCTID="top" BTYPE="display" LABEL="Display Behavior"> <mets:interfaceDef LABEL="EAD Display Definition" LOCTYPE="URL" xlink:href="http://texts.cdlib.org/dynaxml/profiles/display/oacDisplayDef.txt"/> <mets:mechanism LABEL="EAD Display Mechanism" LOCTYPE="URN" xlink:href="http://texts.cdlib.org/dynaxml/profiles/display/oacDisplayMech.xml"/> </mets:behavior> </mets:behaviorSec>

  38. Linking Between Sections • <file>, <stream>, and <div> can point to <dmdSec> (DMDID attribute) • <dmdSec>, <file>, <fileGrp>, and <stream> can point to <techMD>, <rightsMD>, <sourceMD>, and <digiprovMD> (ADMID attribute) • <fptr> and <area> can point to <file> (FILEID attribute) • <behavior> can point to <div> (STRUCTID attribute)
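These ID references are easy to break when sections are edited separately. A hypothetical Python check that collects every ID in a METS document and reports DMDID, ADMID, FILEID, or STRUCTID values that do not resolve; the skeletal document is invented for the example (it is not schema-valid) and contains one deliberately broken FILEID.

```python
import xml.etree.ElementTree as ET

# Attributes METS uses to point from one section to another.
REF_ATTRS = ("DMDID", "ADMID", "FILEID", "STRUCTID")

DOC = """<mets:mets xmlns:mets="http://www.loc.gov/METS/">
  <mets:dmdSec ID="DMD1"/>
  <mets:amdSec><mets:techMD ID="AMD001"/></mets:amdSec>
  <mets:fileSec>
    <mets:fileGrp><mets:file ID="epi01m" ADMID="AMD001"/></mets:fileGrp>
  </mets:fileSec>
  <mets:structMap>
    <mets:div ID="top" DMDID="DMD1">
      <mets:fptr FILEID="epi01m"/>
      <mets:fptr FILEID="epi02m"/>
    </mets:div>
  </mets:structMap>
</mets:mets>"""


def dangling_refs(root: ET.Element) -> list[tuple[str, str, str]]:
    """Return (element, attribute, missing ID) for every unresolved reference."""
    ids = {el.get("ID") for el in root.iter() if el.get("ID")}
    problems = []
    for el in root.iter():
        for attr in REF_ATTRS:
            # Some of these attributes hold several space-separated IDs.
            for ref in (el.get(attr) or "").split():
                if ref not in ids:
                    local_name = el.tag.split("}")[-1]
                    problems.append((local_name, attr, ref))
    return problems


problems = dangling_refs(ET.fromstring(DOC))
print(problems)
```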

  39. METS Profiles • METS is so flexible that it needs to be documented for each particular application or use • Components • URI • Date • Abstract • Extension schemas • Rules of description • Vocabularies • Structural rules for resources • Technical metadata

  40. What is the point of all this? • Management of digital resources requires many types of metadata • Managing all this metadata can be difficult • METS can do it all, but is complex

  41. Functional Requirements • What do you expect your metadata to do? • The nature of the resources you are putting in your digital collection • The nature of the intended audience(s) for your collection • The level of description • The size of your collection • Importance of interoperability • The resources your library has for creation and long-term maintenance of the metadata

  42. Nature of Resources • Is there full text? • Are they “simple” or “complex”? • Do you supply multiple versions of the same resource? • Are all resources available to all users?

  43. Nature of Users • Is your audience general or specialized? • How information/network literate are they? • How much information will they need to choose appropriate resources? • What other assumptions can you safely make about your users, and how do those assumptions impact your metadata planning activities?

  44. Level of Description • How much detail do you want to include in your metadata? • Related to resources available for creation of metadata, and balance of quantity vs. quality • Expensive (e.g., subject) vs. cheap (e.g., file size) descriptive elements

  45. Size of Collection • Small collections rely less on metadata than large collections do • Browsing, faceting, and differentiating functions are more important in large collections • In general, the bigger the collection, the more granular the values in your metadata need to be • E.g., subject vocabularies

  46. Importance of Interoperability • Metadata in local schemes is more difficult to share than metadata in standard schemes • Always assume your metadata will be used in contexts different from the original • Plan metadata with crosswalks in mind
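A crosswalk is, at its core, a table from paths in one scheme to elements in another. A Python sketch with a few illustrative rows, loosely modeled on mapping MODS to simple Dublin Core, applied to part of the MODS record shown earlier; the CROSSWALK table is hypothetical, and a production crosswalk would also handle name roles, attributes, and many more elements.

```python
import xml.etree.ElementTree as ET

NS = {"mods": "http://www.loc.gov/mods/v3"}

# Hypothetical crosswalk rows: (path relative to <mods:mods>, DC element).
CROSSWALK = [
    ("mods:titleInfo/mods:title", "title"),
    ("mods:name/mods:namePart", "creator"),
    ("mods:genre", "type"),
    ("mods:abstract", "description"),
    ("mods:subject/mods:topic", "subject"),
    ("mods:originInfo/mods:dateIssued", "date"),
]


def mods_to_dc(mods: ET.Element) -> list[tuple[str, str]]:
    """Apply the crosswalk rows in order, skipping empty values."""
    dc = []
    for path, element in CROSSWALK:
        for node in mods.findall(path, NS):
            if node.text and node.text.strip():
                dc.append((element, node.text.strip()))
    return dc


RECORD = """<mods:mods xmlns:mods="http://www.loc.gov/mods/v3">
  <mods:titleInfo><mods:title>A Jewel of Honesty</mods:title></mods:titleInfo>
  <mods:genre>Article</mods:genre>
  <mods:abstract>clashing oppositions</mods:abstract>
  <mods:subject authority="none"><mods:topic>General interest</mods:topic></mods:subject>
</mods:mods>"""

dc = mods_to_dc(ET.fromstring(RECORD))
for element, value in dc:
    print(f"dc:{element} = {value}")
```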

  47. Resources for Managing Metadata • How will metadata of various types be created and managed? • Does your institution have a DAM strategy? • Will preservation metadata (e.g., PREMIS) be managed?

  48. FRBR’s User Tasks • Functional requirements can be expressed in terms of the FRBR data model • Find entities which correspond to user’s search criteria • Identify an entity • Select an entity • Acquire or obtain access to the desired entity

  49. Analyzing Domains • Environmental • Object class • Object format Jane Greenberg, “Understanding Metadata and Metadata Schemas.” In Metadata: A Cataloguer’s Primer. Ed. Richard P. Smiraglia. New York: Haworth. 2005.

  50. Metadata Quality • Completeness • Accuracy • Provenance • Conformance to expectations • Logical consistency and coherence • Timeliness • Accessibility Thomas R. Bruce and Diane I. Hillmann, “The Continuum of Metadata Quality: Defining, Expressing, Exploiting.” In Metadata in Practice. Ed. Diane I. Hillmann and Elaine L. Westbrooks. Chicago: American Library Association, 2004.
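Of these criteria, completeness lends itself most readily to automatic measurement. A Python sketch that scores records against a hypothetical local list of expected elements; the element list, records, and scoring rule are all assumptions for illustration.

```python
# Hypothetical local expectation: elements every record should carry.
EXPECTED = ("title", "creator", "date", "type", "rights")


def _has_value(record: dict[str, list[str]], name: str) -> bool:
    """True if the element is present with at least one non-blank value."""
    return any(v.strip() for v in record.get(name, []))


def completeness(record: dict[str, list[str]]) -> float:
    """Fraction of expected elements that carry a value."""
    return sum(_has_value(record, name) for name in EXPECTED) / len(EXPECTED)


def missing(record: dict[str, list[str]]) -> list[str]:
    """Expected elements that are absent or blank."""
    return [name for name in EXPECTED if not _has_value(record, name)]


records = {
    "epigrams": {"title": ["Epigrams"], "creator": ["Martial"], "date": [""], "type": ["Text"]},
    "jewel": {"title": ["A Jewel of Honesty"], "type": ["Text"]},
}
for key, rec in records.items():
    print(key, completeness(rec), missing(rec))
```

A blank value counts as missing here, so the score reflects usable content rather than the mere presence of an element.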
