The Purdue University Research Repository: HUBzero customization for dataset publication and digital preservation
• Amy Barton, MLS, Assistant Professor of Library Science, Metadata Specialist
• Carly Dearborn, MSIS, Digital Preservation and Electronic Records Archivist
• Neal Harmeyer, MLS, Digital Archivist
What is PURR? Technical and institutional infrastructure
The Purdue University Research Repository: a brief overview
• The Purdue University Research Repository (PURR) is a research collaboration and data management solution for Purdue researchers and their collaborators.
• Data management support
• A workspace where researchers collaborate on research and publish datasets online
• Access to published datasets, each with a unique Digital Object Identifier (DOI)
• Long-term preservation component
• https://purr.purdue.edu
The Purdue University Research Repository: a customized instance of HUBzero®
• PURR uses HUBzero as its foundation: https://hubzero.org
• Designed to facilitate virtual communities, online collaboration, research, and teaching
• Built on the open source LAMP (Linux, Apache, MySQL, and PHP) platform with the Joomla! Content Management System (CMS)
• PURR was specially customized for data stewardship, which includes a workflow for the curation, publication, dissemination, and preservation of datasets
• PURR's customizations of the HUB software will be added to the base HUBzero package in the next release
The Purdue University Research Repository: collaborative institutional infrastructure
• Collaborative effort
  • Purdue University Libraries
  • Information Technology at Purdue (ITaP)
  • Office of the Vice President for Research (OVPR)
• Governed by an Executive Committee, a Steering Group, and a Working Group
• PURR Libraries team
  • Project Director: 50%
  • Digital Data Repository Specialist: 100%
  • Two Software Developers: 100%
  • Metadata Specialist: 20%
  • Digital Archivist: 25%
  • Two Graduate Assistants: 50%
  • Graduate Assistant: 25%
Data Preservation ISO 16363 & OAIS
Digital Preservation in PURR: the data deluge
• Many federal funding agencies require long-term data management plans
• Trustworthy repositories, sound metadata creation and capture, open standards for file formats, and information literacy are vital to the longevity of digital resources
• The Working Group drafted the PURR Digital Preservation Policy using the Trustworthy Repositories Audit & Certification: Criteria and Checklist (TRAC) as a guiding document
• TRAC/ISO 16363 influenced documentation such as the mission statement, policies, job descriptions, and business plan
Digital Preservation in PURR: developing policies and strategies
• PURR’s preservation mandate and its organizational commitment
• PURR commits to preservation for a period of 10 years, after which the content is subject to the Libraries’ selection criteria and archival appraisal
• Preservation strategies: full preservation, bit-level preservation, and no preservation
• All objects receive bit-level maintenance, a permanent DOI identifier, PREMIS preservation metadata, onsite and offsite backups, regular virus checks, and regular rotation to new storage media
• PURR accepts all file formats but recommends formats that are more sustainable in the long term
Digital Preservation in PURR: OAIS & distributed digital preservation
• The Open Archival Information System (OAIS) Reference Model is a standard in digital preservation and an ISO standard (ISO 14721)
• Producers submit a content item for publication with appropriate Dublin Core metadata; this acts as the Submission Information Package (SIP)
• The Content Information (CI) is then bundled with Preservation Description Information (PDI) using the Library of Congress BagIt specification; this is the Archival Information Package (AIP)
• Unlike in most OAIS repositories, the Dissemination Information Package (DIP) is derived not from the AIP but from the SIP
• In February 2013, Purdue joined the MetaArchive Cooperative
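The BagIt packaging step mentioned above can be sketched in a few lines of Python. This is a minimal illustration only, not PURR's or HUBzero's actual packaging code: the function names `make_bag` and `verify_bag`, the flat payload layout, and the choice of SHA-256 for the manifest are all assumptions made for the example. What it does follow is the basic BagIt layout: a `bagit.txt` declaration, a `data/` payload directory, and a checksum manifest that supports later fixity checks.

```python
import hashlib
from pathlib import Path


def make_bag(bag_dir: Path, payload: dict[str, bytes]) -> Path:
    """Write payload files under data/ plus a BagIt declaration and manifest.

    Hypothetical helper: file names in `payload` are assumed to be flat
    (no subdirectories) to keep the sketch short.
    """
    data_dir = bag_dir / "data"
    data_dir.mkdir(parents=True, exist_ok=True)

    manifest_lines = []
    for name, content in sorted(payload.items()):
        (data_dir / name).write_bytes(content)
        digest = hashlib.sha256(content).hexdigest()
        # Manifest format: checksum, whitespace, path relative to the bag root.
        manifest_lines.append(f"{digest}  data/{name}")

    # The bag declaration required by the BagIt specification.
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 0.97\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    (bag_dir / "manifest-sha256.txt").write_text(
        "\n".join(manifest_lines) + "\n", encoding="utf-8"
    )
    return bag_dir


def verify_bag(bag_dir: Path) -> bool:
    """Recompute each payload checksum and compare it with the manifest."""
    manifest = (bag_dir / "manifest-sha256.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        expected, rel_path = line.split(maxsplit=1)
        actual = hashlib.sha256((bag_dir / rel_path).read_bytes()).hexdigest()
        if actual != expected:
            return False
    return True
```

Calling `make_bag(Path("aip"), {"survey.csv": b"..."})` and later `verify_bag(Path("aip"))` gives a simple bit-level integrity check of the kind a preservation workflow would run on a schedule.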
[Diagram: PURR architecture. Labels: SIP, AIP, DIP, Designated Community, purr.purdue.edu, HUBzero, Linux, Apache, MySQL, PHP, Joomla!, BagIt, Media, access, Backup (MetaArchive), LOCKSS.] Diagram designed by: Sriram Kiran Valavala
PURR Metadata: weaving of standards for preservation
Metadata & AIP creation tool: talking points
• Metadata overview
• Dataset publication process
• Archival Information Package generation
• Metadata generation
Metadata goals for PURR
• Dataset metadata: capture all pertinent information about the dataset file for long-term preservation
  • Descriptive metadata
  • Administrative metadata
  • Technical metadata
  • Structural metadata
  • Rights metadata
  • Preservation metadata
Metadata generation: metadata standards for PURR
• Metadata Encoding and Transmission Standard (METS)
  • Wrapper
• DCMI Metadata Terms (dcterms)
  • Descriptive metadata
• Metadata Object Description Schema (MODS)
  • Dataset ownership
  • Access condition
• Preservation Metadata: Implementation Strategies (PREMIS)
  • Preservation metadata
Metadata generation: why METS?
METS acts as a structured container in which other standard metadata schemas can be either referenced externally or embedded internally.
Structure:
• Descriptive Section <mets:dmdSec>
• Administrative Section <mets:amdSec>
  • Technical Section <mets:techMD>
  • Rights Section <mets:rightsMD>
  • Digital Provenance Section <mets:digiprovMD>
• File Section <mets:fileSec>
• Structural Map Section <mets:structMap>
[Diagram labels: DCTERMS, PREMIS, PREMIS & MODS, METS]
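The METS structure listed above can be sketched as an empty skeleton built with Python's standard library. This is an illustration of the section layout only, not the tool PURR uses: the function name `build_mets_skeleton` and the `ID` values are hypothetical, and the result is not schema-valid on its own (a real document would need populated metadata wrappers, file groups, and a structural map with at least one division).

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)


def mets(tag: str) -> str:
    """Return a namespace-qualified METS tag name for ElementTree."""
    return f"{{{METS_NS}}}{tag}"


def build_mets_skeleton() -> ET.Element:
    """Build the top-level METS sections in the order listed on the slide."""
    root = ET.Element(mets("mets"))

    # Descriptive metadata (for PURR, this would wrap the DC Terms record).
    ET.SubElement(root, mets("dmdSec"), ID="dmd-1")  # hypothetical ID

    # Administrative metadata holds the technical, rights, and
    # digital provenance subsections.
    amd = ET.SubElement(root, mets("amdSec"), ID="amd-1")  # hypothetical ID
    for sub in ("techMD", "rightsMD", "digiprovMD"):
        ET.SubElement(amd, mets(sub), ID=f"{sub}-1")

    # File inventory and structural map.
    ET.SubElement(root, mets("fileSec"))
    ET.SubElement(root, mets("structMap"))
    return root


if __name__ == "__main__":
    print(ET.tostring(build_mets_skeleton(), encoding="unicode"))
```

Keeping the technical, rights, and provenance subsections inside `amdSec` mirrors how METS groups all administrative metadata under one parent element.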
Metadata generation: why Qualified Dublin Core?
• Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)
• Our Digital Library Software Developer, Brandon Beatty, developed OAI-PMH functionality
• The code was submitted to the HUBzero development group and added to the core HUBzero code
• HUBzero now comes standard with OAI-PMH functionality
• A contribution for the greater good
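To make the OAI-PMH point concrete, here is a minimal sketch of how a harvester would form a `ListRecords` request. The base URL is a placeholder, not PURR's or HUBzero's actual endpoint, and the helper name `list_records_url` is hypothetical. The `verb`, `metadataPrefix`, and `from` parameters come from the OAI-PMH protocol itself, and `oai_dc` is the Dublin Core format every OAI-PMH repository is required to support.

```python
from typing import Optional
from urllib.parse import urlencode


def list_records_url(
    base_url: str,
    metadata_prefix: str = "oai_dc",
    from_date: Optional[str] = None,
) -> str:
    """Build an OAI-PMH ListRecords request URL.

    `base_url` is whatever endpoint the repository exposes; the one used
    in the example below is a placeholder.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        # Optional selective harvesting: only records changed since this date.
        params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    # Placeholder endpoint for illustration only.
    print(list_records_url("https://repository.example.org/oai", from_date="2013-01-01"))
```

A harvester would fetch this URL, parse the XML response, and follow any `resumptionToken` it contains to page through the remaining records.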
Metadata generation: why PREMIS?
PREMIS is a robust preservation standard that captures digital preservation activities applied to a digital object.
• Intellectual Entity
  • A coherent set of content that is reasonably described as a unit (dataset).
• Objects
  • A discrete unit of information in digital form.
• Events
  • An action that involves at least one object or agent known to the preservation repository.
• Rights
  • Assertions of one or more rights or permissions pertaining to an object and/or agent.
• Agents
  • A person, organization, or software program associated with preservation events in the life of an object.
Source: Data Dictionary for Preservation Metadata (http://www.oclc.org/content/dam/research/activities/pmwg/premis-final.pdf)
Metadata generation: PREMIS events for PURR
Source: Data Dictionary for Preservation Metadata (http://www.oclc.org/content/dam/research/activities/pmwg/premis-final.pdf)
Metadata generation: why MODS?
• Capture dataset ownership
  • <name>: The name of a person, organization, or event (conference, meeting, etc.) associated in some way with the resource.
  • <affiliation>: The name of an organization, institution, etc. with which the entity recorded in <name> was associated at the time that the resource was created.
  • <role>: Designates the relationship (role) of the entity recorded in <name> to the resource described in the record.
• <accessCondition>: Information about restrictions imposed on access to a resource.
  <mods:accessCondition type="restriction on access">publically accessible</mods:accessCondition>
  <mods:accessCondition type="restriction on access">embargoed until 2015-06-30</mods:accessCondition>
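The two access-condition examples above could be produced by a small helper like the one below. This is a sketch under stated assumptions, not PURR's implementation: the function name `access_condition` is hypothetical, and the text values simply copy the wording shown on the slide (including its spelling of "publically"). The MODS namespace URI is the standard version 3 namespace.

```python
import xml.etree.ElementTree as ET
from datetime import date
from typing import Optional

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)


def access_condition(embargo_until: Optional[date] = None) -> ET.Element:
    """Build a mods:accessCondition element for an open or embargoed dataset."""
    elem = ET.Element(
        f"{{{MODS_NS}}}accessCondition", {"type": "restriction on access"}
    )
    if embargo_until is None:
        # Wording (and spelling) copied from the slide's example value.
        elem.text = "publically accessible"
    else:
        elem.text = f"embargoed until {embargo_until.isoformat()}"
    return elem


if __name__ == "__main__":
    print(ET.tostring(access_condition(date(2015, 6, 30)), encoding="unicode"))
```

Recording the embargo end date in ISO 8601 form keeps the value machine-readable, so a scheduled job could lift the restriction automatically.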
Metadata generation • AIP creation
Metadata generation: AIP creation
Licenses:
• CC0 - Creative Commons
• Creative Commons Attribution Unported 3.0 License
• Creative Commons Attribution-NoDerivs Unported 3.0 License
• Creative Commons Attribution-NonCommercial-ShareAlike Unported 3.0 License
• Creative Commons Attribution-ShareAlike Unported 3.0 License
• Creative Commons Attribution-NonCommercial Unported 3.0 License
• Creative Commons Attribution-NonCommercial-NoDerivs Unported 3.0 License
Metadata Generation • AIP creation PURR Puuuurrrrrrrrrrr….
Metadata generation: PREMIS event captured
<mets:digiprovMD ID="METS-digiprovMD-premis-event-unpacking-20130312T112352-processId-17937-seq-1">
  <mets:mdWrap MDTYPE="PREMIS:EVENT">
    <mets:xmlData>
      <premis:event>
        <premis:eventIdentifier>
          <premis:eventIdentifierType>HUBzero</premis:eventIdentifierType>
          <premis:eventIdentifierValue>premis-event-unpacking-20130312T112352-processId-17937-seq-1</premis:eventIdentifierValue>
        </premis:eventIdentifier>
        <premis:eventType>unpacking</premis:eventType>
        <premis:eventDateTime>2013-03-12T11:23:52+00:00</premis:eventDateTime>
        <premis:eventDetail>tool: HUBzero</premis:eventDetail>
        <premis:eventOutcomeInformation>
          <premis:eventOutcome>unpackaged</premis:eventOutcome>
        </premis:eventOutcomeInformation>
        …
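A record like the one above could be generated programmatically along the following lines. This is a hedged sketch, not HUBzero's actual code: the function name `build_event` and its parameters are hypothetical, the identifier pattern is simply inferred from the sample on the slide, and the PREMIS 2.x namespace URI is an assumption, since the slide does not show namespace declarations.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Assumption: PREMIS version 2 namespace; the slide does not show which is used.
PREMIS_NS = "info:lc/xmlschema/premis-v2"
ET.register_namespace("premis", PREMIS_NS)


def premis(tag: str) -> str:
    """Return a namespace-qualified PREMIS tag name for ElementTree."""
    return f"{{{PREMIS_NS}}}{tag}"


def build_event(
    event_type: str,
    outcome: str,
    when: datetime,
    process_id: int,
    seq: int,
    tool: str = "HUBzero",
) -> ET.Element:
    """Build a premis:event element shaped like the sample on the slide."""
    stamp = when.strftime("%Y%m%dT%H%M%S")
    # Identifier pattern inferred from the slide's example value.
    event_id = f"premis-event-{event_type}-{stamp}-processId-{process_id}-seq-{seq}"

    event = ET.Element(premis("event"))
    ident = ET.SubElement(event, premis("eventIdentifier"))
    ET.SubElement(ident, premis("eventIdentifierType")).text = tool
    ET.SubElement(ident, premis("eventIdentifierValue")).text = event_id
    ET.SubElement(event, premis("eventType")).text = event_type
    ET.SubElement(event, premis("eventDateTime")).text = when.isoformat()
    ET.SubElement(event, premis("eventDetail")).text = f"tool: {tool}"
    info = ET.SubElement(event, premis("eventOutcomeInformation"))
    ET.SubElement(info, premis("eventOutcome")).text = outcome
    return event


if __name__ == "__main__":
    moment = datetime(2013, 3, 12, 11, 23, 52, tzinfo=timezone.utc)
    print(ET.tostring(build_event("unpacking", "unpackaged", moment, 17937, 1),
                      encoding="unicode"))
```

In a full AIP, each such event would be wrapped in its own `mets:digiprovMD` section, giving one provenance record per preservation action.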
Metadata generation: METS wrapper & DC Terms
<mets:dmdSec ID="METS-dmdSec-doi__10.5072__FK250925">
  <mets:mdWrap MDTYPE="DC">
    <mets:xmlData>
      <mets:dcterms>
        <dcterms:creator>Amy Barton</dcterms:creator>
        <dcterms:date>2013-01-07T16:40:43-05:00</dcterms:date>
        <dcterms:description>projectName: Metadata Project</dcterms:description>
        <dcterms:description>projectAlias: metadata</dcterms:description>
        <dcterms:description>publicationState: Draft under review</dcterms:description>
        <dcterms:description>publicationVersion: 1</dcterms:description>
        <dcterms:description>abstract: A metadata workshop was developed based on subject liaison librarians’ feedback in a Qualtrics survey.</dcterms:description>
        <dcterms:description>notes: The dataset contains survey data.</dcterms:description>
        <dcterms:description>synopsis: Subject Librarian survey and resulting metadata workshop.</dcterms:description>
        <dcterms:format>BagIt</dcterms:format>
        <dcterms:identifier>doi:10.5072/FK250925</dcterms:identifier>
        <dcterms:publisher>Purdue University Research Repository</dcterms:publisher>
        <dcterms:rights>CC0 - Creative Commons</dcterms:rights>
        <dcterms:subject>Instruction</dcterms:subject>
        <dcterms:subject>Metadata</dcterms:subject>
        <dcterms:subject>Survey data</dcterms:subject>
        <dcterms:subject>Library Science</dcterms:subject>
        <dcterms:title>Metadata Madness Workshop:</dcterms:title>
        <dcterms:type>dataset</dcterms:type>
      </mets:dcterms>
    </mets:xmlData>
  </mets:mdWrap>
</mets:dmdSec>
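In the record above, several repository fields are carried in repeated `dcterms:description` elements as "label: value" strings. A consumer of the metadata might split them back into named fields roughly as follows. The helper name `split_descriptions` is hypothetical, and the "label: value" convention is inferred from this one sample rather than from any documented HUBzero format.

```python
def split_descriptions(descriptions: list[str]) -> dict[str, str]:
    """Split 'label: value' description strings into a dictionary.

    Only the first colon separates label from value, so values may
    contain colons themselves. Strings without a colon are skipped.
    """
    fields: dict[str, str] = {}
    for text in descriptions:
        label, sep, value = text.partition(":")
        if sep:
            fields[label.strip()] = value.strip()
    return fields


if __name__ == "__main__":
    sample = [
        "projectName: Metadata Project",
        "projectAlias: metadata",
        "publicationState: Draft under review",
        "publicationVersion: 1",
    ]
    for label, value in split_descriptions(sample).items():
        print(f"{label} = {value}")
```

One caveat of this convention: a free-text description that happens to contain a colon but no label would be split incorrectly, so a stricter parser might check labels against a known list.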
Metadata generation: METS technical section
Metadata generation: METS rights
Metadata generation: MODS dataset ownership
Metadata generation: PREMIS agent
Metadata generation: PREMIS event
Metadata generation: METS files and structure map
PURR Puuuurrrrrrrrrrr….
Want to learn more? PURR contacts
• Visit https://purr.purdue.edu/
• Digital Data Repository Specialist: Courtney Matthews at matthew6@purdue.edu
Special thanks to:
• Neal Harmeyer, Digital Archivist
• Brandon Beatty, Digital Library Software Developer
• Courtney Matthews, Digital Data Repository Specialist
• Mark Fisher, Digital Library Software Developer
References
• Faniel, I. M., & Zimmerman, A. (2011). “Beyond the Data Deluge: A Research Agenda for Large-Scale Data Sharing and Reuse.” The International Journal of Digital Curation, 6(1): 59.
• Lee, C., & Tibbo, H. (2007). “Digital Curation and Trusted Repositories: Steps toward Success.” Journal of Digital Information. http://journals.tdl.org/jodi/index.php/jodi/article/view/229/183
• Klimeck, G., McLennan, M., Brophy, S. P., Adams, G. B., & Lundstrom, M. S. (2008). “nanoHUB.org: Advancing Education and Research in Nanotechnology.” Computing in Science and Engineering, 10(5): 17, 19, 21.
• Witt, M. (2012). “Curation Service Models: Purdue University Research Repository.” Libraries and Staff Presentations. Paper 3. http://docs.lib.purdue.edu/lib_fspress/3
• Witt, M. (2012). “Co-designing, Co-developing, and Co-implementing an Institutional Data Repository Service.” Journal of Library Administration, 52(2). DOI:10.1080/01930826.2012.655607