Metadata for Digital Repositories Mark Jordan Repository Redux University of Prince Edward Island September 19, 2007 This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 Canada License
Schedule • 9:00 - 10:30 • Background; types of metadata; major standards; choosing metadata schemes • 10:45 - 12:00 • Metadata life cycle; strategies for creation and management; automated creation; supplementation strategies • 1:00 - 2:30 • SFU theses workflow case study; native vs. derived; crosswalks • 2:45 - 4:30 • Application Profiles; OAI; CARLCore AP case study
What is Metadata? • Different meanings in different communities • Information about information • Can describe information at any level • Collection • Item • Item within item • Can be embedded within an object or separate from it
Types of Metadata • Descriptive • Terms and conditions • Administrative data • Content ratings • Provenance • Linking or relationship data • Structural data Carl Lagoze, Clifford A. Lynch, and Ron Daniel, Jr. “The Warwick Framework: A Container Architecture for Aggregating Sets of Metadata”. 1996. http://hdl.handle.net/1813/7248
Metadata and Cataloguing • Perception that cataloguing is old and metadata is new • Traditional cataloguing focuses on descriptions of analogue materials • Metadata focuses on management of networked resources • For locally created or managed networked resources (such as repositories), cataloguing is insufficient
Metadata Schemes • Defines a collection of elements for supporting a specific function • Defines structures for element values • Defines formal aspects of the element set, such as name, definition, data type, etc. • Some schemes are expressed as XML schemas
Containers vs. Rules of Description • Containers dictate structure • Rules of description dictate content • Common rules of description • AACR2 • RDA • RAD
Vs. Glue Standards • OpenURL • Syntax for encoding bibliographic data in URLs • http://resolver.example.edu/cgi?genre=book&isbn=0836218310&title=The+Far+Side+Gallery+3 • COinS • OpenURLs embedded in HTML <span> tags • unAPI • Identifiers embedded in HTML <abbr> tags for autodiscovery and “copy and paste” • Microformats • For example, <a href="http://creativecommons.org/licenses/by/2.0/" rel="license">cc by 2.0</a>
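The OpenURL on this slide is an ordinary query string, so it can be assembled from key/value pairs. A minimal Python sketch, assuming the placeholder resolver address shown above; the function name `build_openurl` is hypothetical and not part of any OpenURL library:

```python
from urllib.parse import urlencode

def build_openurl(base_url, **fields):
    """Append bibliographic key/value pairs to a resolver base URL."""
    # urlencode escapes the values and encodes spaces as '+'.
    return base_url + "?" + urlencode(fields)

url = build_openurl(
    "http://resolver.example.edu/cgi",  # placeholder resolver from the slide
    genre="book",
    isbn="0836218310",
    title="The Far Side Gallery 3",
)
print(url)
```

Keeping the citation as structured fields until the last moment means the same data could also be written into a COinS <span> without re-parsing the URL.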
Selected Major Standards • Dublin Core • MODS • Collection Description • RDA • EAD • PREMIS • METS
Dublin Core • Standard metadata set for describing resources • It is flexible • Qualified vs. unqualified • Can be expressed in HTML, XML, or RDF • Dumbing down is a good thing
Dublin Core Element Set • Title • Creator • Subject • Description • Publisher • Contributor • Date • Type • Format • Identifier • Source • Language • Relation • Coverage • Rights
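As an illustration of how the fifteen elements are used, the sketch below serializes a handful of them as XML with Python's standard library. The `record` wrapper element and the `dc_record` helper are assumptions made for this example; the sample values are taken from this presentation's title slide.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

def dc_record(fields):
    """Build a <record> whose children are simple Dublin Core elements."""
    record = ET.Element("record")  # hypothetical wrapper element
    for name, values in fields.items():
        for value in values:  # every DC element is optional and repeatable
            ET.SubElement(record, f"{{{DC_NS}}}{name}").text = value
    return record

record = dc_record({
    "title": ["Metadata for Digital Repositories"],
    "creator": ["Jordan, Mark"],
    "date": ["2007-09-19"],
    "rights": ["CC BY-NC-SA 2.5 Canada"],
})
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```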
Dublin Core Qualifiers • Types • Element refinements • Encoding schemes • Examples • Description: table of contents, abstract • Date: created, valid, available, issued, modified • Subject: LCSH, MeSH, DDC, LCC, UDC
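An element refinement narrows the meaning of an element without changing it, so a qualified record can be reduced to simple Dublin Core by replacing each refinement with the element it refines (the "dumb-down" principle). A minimal sketch, assuming only the refinements listed on this slide; the `dumb_down` function and the sample values are illustrative:

```python
# Refinement -> element it refines (only the refinements named on the slide).
REFINES = {
    "abstract": "description",
    "tableOfContents": "description",
    "created": "date",
    "valid": "date",
    "available": "date",
    "issued": "date",
    "modified": "date",
}

def dumb_down(qualified):
    """Collapse qualified DC (name, value) pairs into simple DC."""
    simple = {}
    for name, value in qualified:
        base = REFINES.get(name, name)  # unrefined names pass through
        simple.setdefault(base, []).append(value)
    return simple

simple = dumb_down([
    ("title", "A Jewel of Honesty"),
    ("abstract", "clashing oppositions"),
    ("issued", "1987-01-01"),
])
print(simple)
```

The reduced record loses precision (an abstract becomes a plain description) but stays correct, which is why a consumer that only understands the fifteen base elements can still use it.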
MODS • A “bibliographic element set that may be used for a variety of purposes, and particularly for library applications.” • Richer than DC, simpler than MARC • Does not assume the use of any specific cataloging code • Elements: titleInfo, title, name, namePart, originInfo, etc.
<?xml version="1.0" encoding="UTF-8"?>
<mods:mods xmlns:mods="http://www.loc.gov/mods/v3">
  <mods:titleInfo>
    <mods:title>A Jewel of Honesty</mods:title>
  </mods:titleInfo>
  <mods:genre>Article</mods:genre>
  <mods:abstract>clashing oppositions</mods:abstract>
  <mods:subject>
    <mods:geographic>N/A</mods:geographic>
  </mods:subject>
  <mods:subject authority="none">
    <mods:topic>General interest</mods:topic>
  </mods:subject>
  <mods:relatedItem type="host">
    <mods:titleInfo>
      <mods:title>Carnegie Newsletter</mods:title>
      <mods:title>Celebration a Spectacle of Hope</mods:title>
    </mods:titleInfo>
    <mods:name>
      <mods:namePart>Pra'N'Ava</mods:namePart>
      <mods:role>
        <mods:roleTerm authority="marcrelator" type="text">author</mods:roleTerm>
      </mods:role>
    </mods:name>
    <mods:name>
      <mods:namePart>N/A</mods:namePart>
      <mods:role>
        <mods:roleTerm authority="chodarr" type="text">recipient</mods:roleTerm>
      </mods:role>
    </mods:name>
    <mods:part>
      <mods:extent unit="pages">
        <mods:start>9</mods:start>
        <mods:list>9,14</mods:list>
      </mods:extent>
    </mods:part>
    <mods:originInfo>
      <mods:dateIssued encoding="iso8601">19870101</mods:dateIssued>
    </mods:originInfo>
  </mods:relatedItem>
</mods:mods>
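A record like this one can be read with any namespace-aware XML parser. The sketch below uses Python's standard library on an abbreviated copy of the sample; it is one possible way to pull out a few values, not a full MODS reader.

```python
import xml.etree.ElementTree as ET

NS = {"mods": "http://www.loc.gov/mods/v3"}

# Abbreviated copy of the sample record.
SAMPLE = """<mods:mods xmlns:mods="http://www.loc.gov/mods/v3">
  <mods:titleInfo><mods:title>A Jewel of Honesty</mods:title></mods:titleInfo>
  <mods:genre>Article</mods:genre>
  <mods:relatedItem type="host">
    <mods:titleInfo><mods:title>Carnegie Newsletter</mods:title></mods:titleInfo>
    <mods:originInfo>
      <mods:dateIssued encoding="iso8601">19870101</mods:dateIssued>
    </mods:originInfo>
  </mods:relatedItem>
</mods:mods>"""

root = ET.fromstring(SAMPLE)
# A direct-child path avoids picking up the host item's title.
title = root.findtext("mods:titleInfo/mods:title", namespaces=NS)
host_title = root.findtext(
    "mods:relatedItem[@type='host']/mods:titleInfo/mods:title", namespaces=NS)
issued = root.findtext(".//mods:dateIssued", namespaces=NS)
print(title, "|", host_title, "|", issued)
```

The article and its host newsletter both carry a `titleInfo`, so the path used for lookup has to distinguish the two levels.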
DCMI Collection Description • Formal description of aggregation or collection of items • Can apply to collections where item-level metadata is not available or appropriate, or to collections where it is • Sample elements: • accrualMethod, accrualPeriodicity • Developed as NISO Z39.91 Dublin Core Collection Description Application Profile, http://www.ukoln.ac.uk/metadata/dcmi/collection-application-profile/2004-02-01/
RDA • Resource Description and Access, the successor to AACR2 • Diane Hillmann’s critique • Reliance on transcription and specified sources of information • Reliance on unstructured notes • Multiple versions in one record • Full review at http://dublincore.org/usage/meetings/2006/04/seattle/rda-review/RDA_for_who.htm
EAD • XML schema for encoding archival finding aids • Contains elements for all aspects of archival description, from <repository> to <daoloc> • <archdesc> is the standard tag for describing hierarchies of fonds, series, subseries, etc.
<did>
  <head>Summary Description of the Tom Stoppard Papers</head>
  <repository>
    <corpname>The University of Texas at Austin <subarea>Harry Ransom Humanities Research Center</subarea></corpname>
  </repository>
  <origination>
    <persname source="lcnaf" encodinganalog="100">Stoppard, Tom</persname>
  </origination>
  <unittitle encodinganalog="245">Tom Stoppard Papers, </unittitle>
  <unitdate type="inclusive">1944-1995</unitdate>
  <physdesc encodinganalog="300">
    <extent>68 boxes (28 linear feet)</extent>
  </physdesc>
  <unitid type="accession">R4635</unitid>
  <physloc audience="internal">14E:SW:6-8</physloc>
  <abstract>The papers of British playwright Tom Stoppard (b. 1937) encompass his entire career and consist of multiple drafts of his plays, from the well-known <title render="italic">Rosencrantz and Guildenstern Are Dead</title> to several that were never produced, correspondence, photographs, and posters, as well as materials from stage, screen, and radio productions from around the world.</abstract>
</did>
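The `encodinganalog` attributes in this sample record which field in another scheme each EAD element corresponds to. The sketch below collects them into a lookup table, assuming (as is conventional, though the sample does not say so) that the numeric values are MARC field tags. The sample is abbreviated.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the <did> sample.
SAMPLE = """<did>
  <unittitle encodinganalog="245">Tom Stoppard Papers, </unittitle>
  <unitdate type="inclusive">1944-1995</unitdate>
  <physdesc encodinganalog="300">
    <extent>68 boxes (28 linear feet)</extent>
  </physdesc>
  <unitid type="accession">R4635</unitid>
</did>"""

did = ET.fromstring(SAMPLE)
# Map each encodinganalog value to the text of the element that carries it.
marc_fields = {
    el.get("encodinganalog"): "".join(el.itertext()).strip()
    for el in did.iter()
    if el.get("encodinganalog")
}
print(marc_fields)
```

Elements without an `encodinganalog`, such as `unitdate` here, are simply skipped, which shows where a crosswalk to the other scheme would need extra rules.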
PREMIS • Data model • Digital objects • Intellectual entities • Agents • Events • Rights • Relationships • Data Dictionary contains examples and sections on compliance and implementation • Can be encoded in METS
METSRights • Endorsed by METS Board but useful outside of METS documents • XML Elements • RightsDeclaration • RightsHolder • Context • Permissions • Constraint
METS • METS: Metadata Encoding & Transmission Standard • Encodes descriptive, administrative, and structural metadata in one XML file • Preferred data structure for digital library initiatives • Goals • Manage different types of metadata • Migrate resources between repositories
METS Community • Maintenance agency is Library of Congress • Website • http://www.loc.gov/standards/mets/ • Implementation registry • Lists 33 projects at 24 institutions
METS Components • METS header • Descriptive metadata section • Administrative metadata section • File section • Structural map section • Structural link section • Behavior section
fileSec • Lists all files making up the resource • <FLocat> points to files • <file> elements link to pertinent administrative metadata in <amdSec> using the ADMID attribute
<mets:fileSec>
  <mets:fileGrp USE="archive image">
    <mets:file ID="epi01m" MIMETYPE="image/tiff">
      <mets:FLocat xlink:href="http://www.loc.gov/standards/mets/docgroup/full/01.tif" LOCTYPE="URL"/>
    </mets:file>
    <mets:file> … </mets:file>
  </mets:fileGrp>
  <mets:fileGrp USE="reference image">
    <mets:file ID="epi01r" MIMETYPE="image/jpeg">
      <mets:FLocat xlink:href="http://www.loc.gov/standards/mets/docgroup/jpg/01.jpg" LOCTYPE="URL"/>
    </mets:file>
  </mets:fileGrp>
  <mets:fileGrp USE="thumbnail image">
    <mets:file ID="epi01t" MIMETYPE="image/gif">
      <mets:FLocat xlink:href="http://www.loc.gov/standards/mets/docgroup/gif/01.gif" LOCTYPE="URL"/>
    </mets:file>
  </mets:fileGrp>
</mets:fileSec>
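Because each file group states its purpose in the USE attribute, a consumer can pick the right rendition of a page without inspecting the files. A minimal sketch using Python's standard library on an abbreviated copy of the sample; the namespace declarations, which the slide omits, are added here so the fragment parses on its own.

```python
import xml.etree.ElementTree as ET

NS = {"mets": "http://www.loc.gov/METS/"}
HREF = "{http://www.w3.org/1999/xlink}href"  # namespaced attribute name

SAMPLE = """<mets:fileSec xmlns:mets="http://www.loc.gov/METS/"
                       xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:fileGrp USE="reference image">
    <mets:file ID="epi01r" MIMETYPE="image/jpeg">
      <mets:FLocat xlink:href="http://www.loc.gov/standards/mets/docgroup/jpg/01.jpg" LOCTYPE="URL"/>
    </mets:file>
  </mets:fileGrp>
  <mets:fileGrp USE="thumbnail image">
    <mets:file ID="epi01t" MIMETYPE="image/gif">
      <mets:FLocat xlink:href="http://www.loc.gov/standards/mets/docgroup/gif/01.gif" LOCTYPE="URL"/>
    </mets:file>
  </mets:fileGrp>
</mets:fileSec>"""

root = ET.fromstring(SAMPLE)
files_by_use = {}
for grp in root.findall("mets:fileGrp", NS):
    for f in grp.findall("mets:file", NS):
        loc = f.find("mets:FLocat", NS)
        # Record (file ID, location) under the group's stated purpose.
        files_by_use.setdefault(grp.get("USE"), []).append(
            (f.get("ID"), loc.get(HREF)))

for use, files in files_by_use.items():
    print(use, files)
```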
structMap • The only required section • Defines the hierarchical structure of the resource • Can be physical or logical • Physical structMaps simply list files in order • Pages that make up a book • Logical structMaps list files in order but in the context of the intellectual structure of the resource • Chapters that make up a book
<mets:structMap TYPE="physical">
  <mets:div TYPE="book" LABEL="Martial Epigrams II">
    <mets:div TYPE="page" LABEL="Blank page"> </mets:div>
    <mets:div TYPE="page" LABEL="Page ii: Blank page"> </mets:div>
    <mets:div TYPE="page" LABEL="Page iii: Title page"> </mets:div>
    <mets:div TYPE="page" LABEL="Page iv: Publication info"> </mets:div>
    <mets:div TYPE="page" LABEL="Page v: Table of contents"> </mets:div>
    <mets:div TYPE="page" LABEL="Page vi: Blank page"> </mets:div>
    <mets:div TYPE="page" LABEL="Page 1: Half title page"> </mets:div>
    <mets:div TYPE="page" LABEL="Page 2 (Latin)"> </mets:div>
    <mets:div TYPE="page" LABEL="Page 3 (English)"> </mets:div>
  </mets:div>
  …
  </mets:div>
</mets:structMap>
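Since a structMap is a tree of nested <div> elements, a short recursive walk is enough to turn it into a table of contents. A minimal sketch on an abbreviated, balanced copy of the sample; the `outline` function is illustrative only.

```python
import xml.etree.ElementTree as ET

NS = {"mets": "http://www.loc.gov/METS/"}

SAMPLE = """<mets:structMap xmlns:mets="http://www.loc.gov/METS/" TYPE="physical">
  <mets:div TYPE="book" LABEL="Martial Epigrams II">
    <mets:div TYPE="page" LABEL="Page iii: Title page"/>
    <mets:div TYPE="page" LABEL="Page 2 (Latin)"/>
    <mets:div TYPE="page" LABEL="Page 3 (English)"/>
  </mets:div>
</mets:structMap>"""

def outline(div, depth=0):
    """Yield (depth, TYPE, LABEL) for a div and all nested divs, in order."""
    yield depth, div.get("TYPE"), div.get("LABEL")
    for child in div.findall("mets:div", NS):
        yield from outline(child, depth + 1)

root = ET.fromstring(SAMPLE)
rows = list(outline(root.find("mets:div", NS)))
for depth, kind, label in rows:
    print("  " * depth + f"{kind}: {label}")
```

The same walk works for a logical structMap; only the TYPE values (chapters instead of pages) change.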
dmdSec • Contains descriptive metadata • Descriptive metadata can be included or linked externally • Descriptive metadata can be in any scheme • Can accommodate XML (e.g., MODS) or binary (e.g., MARC) representations of descriptive metadata
<mets:dmdSec ID="DMD1">
  <mets:mdWrap MIMETYPE="text/xml" MDTYPE="MODS">
    <mets:xmlData>
      <mods:mods version="3.1">
        <mods:titleInfo>
          <mods:title>Epigrams</mods:title>
        </mods:titleInfo>
        <mods:name type="personal">
          <mods:namePart>Martial</mods:namePart>
        </mods:name>
        <mods:name type="personal">
          <mods:namePart>Ker, Walter C. A. (Walter Charles Alan), 1853-1929</mods:namePart>
        </mods:name>
        <mods:typeOfResource>text</mods:typeOfResource>
        …
      </mods:mods>
    </mets:xmlData>
  </mets:mdWrap>
</mets:dmdSec>
amdSec • Contains info on digital resource, files in the resource, or original analogue source • Type of info • Technical • Intellectual property • Provenance
<mets:techMD ID="AMD001">
  <mets:mdWrap MIMETYPE="text/xml" MDTYPE="NISOIMG" LABEL="NISO Img.Data">
    <mets:xmlData>
      <niso:MIMEtype>image/tiff</niso:MIMEtype>
      <niso:Compression>LZW</niso:Compression>
      <niso:PhotometricInterpretation>8</niso:PhotometricInterpretation>
      <niso:Orientation>1</niso:Orientation>
      <niso:ScanningAgency>NYU Press</niso:ScanningAgency>
    </mets:xmlData>
  </mets:mdWrap>
</mets:techMD>
METS Header • Contains info about the METS document • Sample
<mets:metsHdr CREATEDATE="2006-05-09T15:00:00" LASTMODDATE="2006-05-09T21:00:00">
  <mets:agent ROLE="CREATOR" TYPE="INDIVIDUAL">
    <mets:name>Rick Beaubien</mets:name>
  </mets:agent>
  <mets:altRecordID TYPE="LCCN">20022838</mets:altRecordID>
</mets:metsHdr>
structLink • Adds hyperlinks between elements in a Structural Map • Sample
<mets:structLink>
  <mets:smLink xlink:from="LINK7" xlink:to="page1145" xlink:title="projects"> </mets:smLink>
  <mets:smLink xlink:from="LINK13" xlink:to="page1145" xlink:title="projects"> </mets:smLink>
  <mets:smLink xlink:from="LINK36" xlink:to="page113" xlink:title="officers"> </mets:smLink>
  <mets:smLink xlink:from="LINK37" xlink:to="page120" xlink:title="calendar"> </mets:smLink>
</mets:structLink>
behaviorSec • Associates executable behaviors (i.e., computer code) with parts of a document/object • Sample
<mets:behaviorSec>
  <mets:behavior ID="disp1" STRUCTID="top" BTYPE="display" LABEL="Display Behavior">
    <mets:interfaceDef LABEL="EAD Display Definition" LOCTYPE="URL" xlink:href="http://texts.cdlib.org/dynaxml/profiles/display/oacDisplayDef.txt"/>
    <mets:mechanism LABEL="EAD Display Mechanism" LOCTYPE="URN" xlink:href="http://texts.cdlib.org/dynaxml/profiles/display/oacDisplayMech.xml"/>
  </mets:behavior>
</mets:behaviorSec>
Linking Between Sections • DMDID: points to <dmdSec>; used on <file>, <stream>, <div> • ADMID: points to <techMD>, <rightsMD>, <sourceMD>, <digiprovMD>; used on <dmdSec>, <file>, <fileGrp>, <stream> • FILEID: points to <file>; used on <fptr>, <area> • STRUCTID: points to <div>; used on <behavior>
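METS ties its sections together with ID-reference attributes (DMDID, ADMID, FILEID, STRUCTID), each of which must match the ID of some element elsewhere in the document. The sketch below checks a small, made-up METS document for references that point nowhere; the `dangling_links` function is an illustration, not part of any METS toolkit, and a validating parser would report the same problem.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the file refers to AMD001, but no amdSec is present.
SAMPLE = """<mets:mets xmlns:mets="http://www.loc.gov/METS/">
  <mets:dmdSec ID="DMD1"/>
  <mets:fileSec>
    <mets:fileGrp>
      <mets:file ID="epi01m" ADMID="AMD001"/>
    </mets:fileGrp>
  </mets:fileSec>
  <mets:structMap>
    <mets:div TYPE="book" DMDID="DMD1">
      <mets:div TYPE="page"><mets:fptr FILEID="epi01m"/></mets:div>
    </mets:div>
  </mets:structMap>
</mets:mets>"""

LINK_ATTRS = ("DMDID", "ADMID", "FILEID", "STRUCTID")

def dangling_links(root):
    """Return (attribute, value) pairs that match no ID in the document."""
    ids = {el.get("ID") for el in root.iter() if el.get("ID")}
    missing = []
    for el in root.iter():
        for attr in LINK_ATTRS:
            # DMDID, ADMID and STRUCTID may hold several space-separated IDs.
            for ref in (el.get(attr) or "").split():
                if ref not in ids:
                    missing.append((attr, ref))
    return missing

missing = dangling_links(ET.fromstring(SAMPLE))
print(missing)
```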
METS Profiles • METS is so flexible, it needs to be documented for each particular application or use • Components • URI • Date • Abstract • Extension schemas • Rules of description • Vocabularies • Structural rules for resources • Technical metadata
What is the point of all this? • Management of digital resources requires many types of metadata • Managing all this metadata can be difficult • METS can do it all, but is complex
Functional Requirements • What do you expect your metadata to do? • The nature of the resources you are putting in your digital collection • The nature of the intended audience(s) for your collection • The level of description • The size of your collection • Importance of interoperability • The resources your library has for creation and long-term maintenance of the metadata
Nature of Resources • Is there full text? • Are they “simple” or “complex”? • Do you supply multiple versions of the same resource? • Are all resources available to all users?
Nature of Users • Is your audience general or specialized? • How information/network literate are they? • How much information will they need to choose appropriate resources? • What other assumptions can you safely make about your users, and how do those assumptions impact your metadata planning activities?
Level of Description • How much detail do you want to include in your metadata? • Related to resources available for creation of metadata, and balance of quantity vs. quality • Expensive (e.g., subject) vs. cheap (e.g., file size) descriptive elements
Size of Collection • Small collections rely less on metadata than large collections do • Browsing, faceting, and differentiating functions are more important in large collections • In general, the bigger the collection, the more granular the values in your metadata need to be • E.g., subject vocabularies
Importance of Interoperability • Metadata in local schemes is more difficult to share than metadata in standard schemes • Always assume your metadata will be used in contexts different from the original • Plan metadata with crosswalks in mind
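Planning with crosswalks in mind can be as simple as keeping an explicit table from local field names to a standard scheme. A minimal sketch, assuming a made-up local thesis scheme; every field name and sample value below is hypothetical:

```python
# Hypothetical local field names mapped to Dublin Core elements.
LOCAL_TO_DC = {
    "thesis_title": "title",
    "author": "creator",
    "supervisor": "contributor",
    "defence_date": "date",
    "keywords": "subject",
}

def crosswalk(record, mapping=LOCAL_TO_DC):
    """Return (dc_record, unmapped_fields) for one local record."""
    dc, unmapped = {}, []
    for field, value in record.items():
        target = mapping.get(field)
        if target is None:
            unmapped.append(field)  # report, rather than silently drop
        else:
            dc.setdefault(target, []).append(value)
    return dc, unmapped

dc, unmapped = crosswalk({
    "thesis_title": "An Example Thesis",  # invented sample values
    "author": "Doe, Jane",
    "supervisor": "Roe, Richard",
    "page_count": "120",
})
print(dc)
print(unmapped)
```

Listing the unmapped fields makes the cost of a local element visible: anything that ends up there will not travel with the record when it is shared.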
Resources for Managing Metadata • How will metadata of various types be created and managed? • Does your institution have a digital asset management (DAM) strategy? • Will preservation metadata (e.g., PREMIS) be managed?
FRBR’s User Tasks • Functional requirements can be expressed in terms of the FRBR data model • Find entities which correspond to user’s search criteria • Identify an entity • Select an entity • Acquire or obtain access to the desired entity
Analyzing Domains • Environmental • Object class • Object format Jane Greenberg, “Understanding Metadata and Metadata Schemas.” In Metadata: A Cataloguer’s Primer. Ed. Richard P. Smiraglia. New York: Haworth. 2005.
Metadata Quality • Completeness • Accuracy • Provenance • Conformance to expectations • Logical consistency and coherence • Timeliness • Accessibility Thomas R. Bruce and Diane I. Hillmann, “The Continuum of Metadata Quality: Defining, Expressing, Exploiting.” In Metadata in Practice. Ed. Diane I. Hillmann and Elaine L. Westbrooks. Chicago: American Library Association, 2004.
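Of these criteria, completeness is the easiest to measure automatically. The sketch below scores a record as the fraction of required elements that carry a value. The list of required elements is a hypothetical local policy, and this simple ratio is only one possible reading of completeness, not a measure defined by Bruce and Hillmann.

```python
REQUIRED = ("title", "creator", "date", "rights")  # hypothetical local policy

def completeness(record, required=REQUIRED):
    """Fraction of required elements that have a non-empty value."""
    filled = [name for name in required if str(record.get(name, "")).strip()]
    return len(filled) / len(required)

# "date" is present but empty and "rights" is absent, so 2 of 4 are filled.
score = completeness({"title": "Epigrams", "creator": "Martial", "date": ""})
print(score)  # 0.5
```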