
Appraisal and Selection of Scientific Data for Long-Term Archive: A Case Study




  1. Appraisal and Selection of Scientific Data for the Long-Term Archive: A Case Study
     - Robert R. Downs, Senior Digital Archivist; SEDAC Archives Manager
     - Robert S. Chen, Director & Senior Research Scientist; SEDAC Manager; CODATA Secretary-General
     - W. Christopher Lenhardt, Associate Director, Information Services; Deputy Director, SEDAC
     - Center for International Earth Science Information Network (CIESIN), Socioeconomic Data and Applications Center (SEDAC), The Earth Institute at Columbia University
     - 16 May 2007, IASSIST 2007: Building Global Knowledge Communities with Open Data, McGill University, Montréal, Québec, Canada

  2. Need for Management and Long-Term Preservation of Scientific Digital Data
     - Scientific communities use digital data to advance knowledge
       - Scientific digital data are increasingly integral to scientific progress
       - Current software is used to create, render, analyze, and share data in digital form with others
       - Print cannot offer the analytical capabilities of software (e.g., GIS)
       - Data volumes are growing exponentially and are becoming increasingly complex
     - Scientific digital data are at risk if not properly preserved and curated
       - Data must be correct, complete, documented, and described for a range of potential future users
       - Information systems technology evolution is inevitable and is not always anticipated in a timely manner
       - Digital and optical media deterioration occurs even under ideal conditions
       - Need to support future requirements for discovery, access, and use

  3. Cost Assumptions for Long-Term Archiving of Scientific Digital Data
     - Over time, many more data sets will be archived and curated
     - Economies of scale will reduce costs of curation, especially for large, relatively homogeneous data sets
     - Nevertheless, each additional data set to be curated does increase the cost of operating the long-term archive at the margin
     - For now, the cost drivers are personnel costs; technology costs are stable or decreasing
     - Future budget limitations could put curated data at risk of loss, or at least lead to reductions in desirable levels of accessibility and support
     - Long-term archiving requires a commitment to incur continuous costs in the future and therefore requires careful analysis of what ought to be included, i.e., selection and appraisal

  4. Benefits of Appraisal and Selection of Scientific Data for Accession to the LTA
     - Appraisal and selection ensure that:
       - The long-term archive (LTA) will contain quality data resources that have been appraised as possessing enduring value
       - The high quality of scientific data resources can help justify archival costs during times of constrained budgets
       - Limited long-term archive resources can be focused on the most important scientific data rather than spread across all data
       - Plans for providing services for each data set can take into consideration potential future needs and use
       - Potential future value of data can be identified and documented during the appraisal process
       - Preparation of scientific data for appraisal and selection provides added value to data
       - Requirements and priorities for planning long-term archive infrastructure and services take into account differences in data and their potential use

  5. Ideal Qualities for Appraisal and Selection of Scientific Data for Long-Term Archiving
     - Documented appraisal process
     - Ongoing review and improvement of appraisal criteria and process
     - Defined categories and choices for decisions
     - Community-based selection criteria
     - Efficient process for appraisal and selection
     - Diversified stakeholder representation on selection committee

  6. The SEDAC LTA: A Case Study for Appraisal and Selection of Scientific Data
     - SEDAC: The Socioeconomic Data and Applications Center is operated for NASA by the Center for International Earth Science Information Network (CIESIN), a unit of the Earth Institute at Columbia University
     - SEDAC Active Archive: Publicly accessible online scientific data products and services relevant to the needs of the interdisciplinary community interested in human dimensions of the environment
     - SEDAC Long-Term Archive: An LTA established in collaboration with the Columbia University library system to archive and curate selected data from the SEDAC Active Archive
     - Mission: The SEDAC Long-Term Archive acquires, preserves, and maintains the content of selected high-quality data, data products, documentation, and services relevant to human dimensions of global change in a digital form to support the discovery, access, and use of archived resources by scientific, educational, and decision-making communities for at least the next 50 years.

  7. Older SEDAC Data Need a Long-Term Home
     - More than 180 citations of GPW versions 1 and 2
     - http://sedac.ciesin.columbia.edu/gpw/

  8. Selection and Appraisal of SEDAC Resources for Accession into the SEDAC LTA
     - Draft SEDAC Long-Term Archive Management and Operations Plan (September 23, 2005):
       - The SEDAC Long-Term Archive (LTA) Board receives nominations from the SEDAC Lead Project Scientist for SEDAC resources to be submitted to the LTA and evaluates each resource for potential accession.
       - The Board will consider the Selection Criteria for Submission of SEDAC Resources to the LTA to appraise each nominated resource for submission to the LTA and to identify the level of service and the retention schedule to be assigned to the resource.
       - Nominations not containing sufficient information for appraisal shall be returned to the Lead SEDAC Project Scientist for insufficient evidence.
       - Rejections shall be returned with an explanation of the deficiencies and the criteria that the nomination did not meet.

  9. SEDAC LTA Board Representation
     - LTA Board established with representation from SEDAC, the Earth Institute, and the Columbia University Libraries:
       - SEDAC Project Scientist
       - SEDAC Systems Engineer
       - SEDAC Archives Manager (serves as Chair)
       - Two representatives designated by Earth Institute
       - Two representatives designated by Columbia University Libraries
     - If SEDAC discontinues operations at Columbia University:
       - CIESIN will designate a replacement for one SEDAC position
       - Columbia University Library will appoint replacements for the other two positions, including the chair

  10. Decision Practice for Appraisal and Selection of Scientific Data for SEDAC LTA
     - SEDAC User Working Group (UWG) reviews and approves data for SEDAC Active Archive dissemination
       - SEDAC scientist identifies candidate scientific data and nominates data to the Lead SEDAC Project Scientist for dissemination
       - Data described and presented to UWG with recommendation
       - SEDAC UWG approves data for dissemination by SEDAC Active Archive
     - SEDAC Active Archive data recommended for LTA
       - SEDAC Active Archive data considered for transfer to LTA
       - Plan for data is described with rationale to justify recommendation
       - Nominated data recommended to UWG for transfer to the LTA
     - SEDAC LTA Board appraises and selects data for LTA
       - Data set recommended with proposed services to LTA Board
       - LTA Board appraises data and reviews service recommendations
       - LTA Board approves data for accession to the LTA
       - LTA Board approves preservation and service levels for data

  11. Decision Path for Submission of Scientific Data to the SEDAC LTA (diagram; steps listed from last to first)
     - SEDAC LTA Board approves plan for submission of data to LTA
     - SEDAC Lead Project Scientist recommends plan for submission of data to LTA
     - SEDAC User Working Group accepts plan to transfer data to LTA
     - SEDAC Lead Project Scientist recommends plan to transfer data to LTA
     - SEDAC User Working Group approves plan for dissemination
     - SEDAC Lead Project Scientist recommends plan for dissemination
     - SEDAC Scientists review and identify scientific data

  12. Development of Criteria for Selection of Data for the SEDAC LTA
     - Reviewed literature, conducted research on requirements for digital preservation of scientific data, and participated in workshops on scientific data stewardship and digital preservation
     - Reviewed existing policies and appraisal criteria:
       - Library collections development
       - Records management and appraisal criteria
       - Traditional archives, scientific data centers, digital archives
     - LTA Board reviewed and revised drafts
       - Broad perspectives from diverse experiences represented
     - Ongoing review by the LTA Board for current relevance and applicability to appraisal practice

  13. Summary of Current Selection Criteria for Accession to SEDAC Long-Term Archive
     - Scientific or Historical Value: citation, research, and educational use as published in refereed scientific publications or reports from a recognized committee of scientists
     - Potential Usability and Use: evidence of usability, usefulness, and sufficient usage by the community interested in human dimensions of the environment. Adequate evidence indicates that the potential for future use justifies the costs of long-term archiving
     - Uniqueness of Data (non-redundant stewardship): not being preserved in any form in another archive and at risk of loss if not accessioned into the Long-Term Archive
     - Relevance to LTA Mission: currently endorsed or approved by the community interested in human interactions in the environment. For the short term, relevance includes content germane to the SEDAC mission and SEDAC strategic plan
     - Documented for Accessibility: completeness and correctness of documentation to facilitate future discovery, access, and use
     - Technological Accessibility (feasibility): received in a format meeting the technical criteria for the Service Level designated for the resource
     - Legality and Confidentiality: unrestricted permissions for preservation and future dissemination. No information that is confidential or prohibited from dissemination
     - Non-Replicability: data replication not feasible, excessively costly, or prohibitive
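The eight criteria above, together with the return and rejection rules quoted from the draft plan on slide 8, can be sketched as a simple checklist. This is only an illustration: the `Nomination` record, the criterion keys, and the rule that every criterion must be met for approval are assumptions made for the example, since the slides do not say how the LTA Board weighs the criteria.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Criterion keys paraphrase the eight criteria listed on the slide.
CRITERIA: Tuple[str, ...] = (
    "scientific_or_historical_value",
    "potential_usability_and_use",
    "uniqueness",
    "relevance_to_lta_mission",
    "documented_for_accessibility",
    "technological_accessibility",
    "legality_and_confidentiality",
    "non_replicability",
)


@dataclass
class Nomination:
    """Hypothetical record of a nominated resource and the Board's findings."""
    title: str
    # Maps a criterion key to True (met) or False (not met). A missing key
    # means the nomination gave too little information to judge it.
    findings: Dict[str, bool] = field(default_factory=dict)


def appraise(nomination: Nomination) -> Tuple[str, List[str]]:
    """Return a decision and the criteria that explain it.

    Assumption: approval requires all eight criteria to be met.
    """
    unassessed = [c for c in CRITERIA if c not in nomination.findings]
    if unassessed:
        # Insufficient information: return the nomination to the nominator.
        return "returned", unassessed
    unmet = [c for c in CRITERIA if not nomination.findings[c]]
    if unmet:
        # Rejections carry the list of criteria that were not met.
        return "rejected", unmet
    return "approved", []
```

Returning the list of unassessed or unmet criteria alongside the decision mirrors the plan's requirement that rejections come with an explanation of the deficiencies.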

  14. Current Services Assigned for SEDAC LTA Data
     - Preservation Services
       - Preserve Content in Original Formats
       - Preserve and Maintain Content in Supported Formats
     - Dissemination Services
       - Restricted Dissemination
       - Public Dissemination

  15. Current Levels of Services for Preservation of Data in the SEDAC LTA
     - Preserve Content in Original Formats: Content is maintained in Original Formats on an accessible system for the specified retention period.
     - Preserve and Maintain Content in Supported Formats: Content received in Supported Formats is maintained on an accessible system and is migrated to current Supported Formats.
     - Supported Formats:
       - ASCII Text (txt, xml, html)
       - Comma Separated Values (csv)
       - Image Files (png, jpg, tif, gif)
       - Portable Document Format (pdf)
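As a rough illustration of the Supported Formats list, the sketch below flags files whose extension is not on it. Matching by file extension is an assumption made for brevity, and the function names are hypothetical; an archive would normally identify and validate formats by inspecting file content rather than trusting the name.

```python
from pathlib import PurePath
from typing import Iterable, List

# Extensions taken from the Supported Formats list on the slide.
SUPPORTED_EXTENSIONS = frozenset(
    {"txt", "xml", "html", "csv", "png", "jpg", "tif", "gif", "pdf"}
)


def is_supported_format(filename: str) -> bool:
    """Return True if the file's final extension is on the supported list."""
    extension = PurePath(filename).suffix.lower().lstrip(".")
    return extension in SUPPORTED_EXTENSIONS


def unsupported_files(filenames: Iterable[str]) -> List[str]:
    """List the files that could only be preserved in their original format."""
    return [name for name in filenames if not is_supported_format(name)]
```

A submission whose `unsupported_files` list is empty would be a candidate for the "Preserve and Maintain Content in Supported Formats" service; anything else would fall back to preservation in original formats.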

  16. Current Levels of Services for Dissemination of Data in the SEDAC LTA
     - Restricted Dissemination:
       - The resource and its Dissemination Information Package (DIP) are not accessible by the public.
       - The discovery metadata for the resource is included in the LTA restricted access catalog.
       - Access to the restricted resource is granted in compliance with the restrictions specified for the resource.
       - Limited user support is provided for a restricted resource that is authored by SEDAC in compliance with the restrictions specified for the resource.
       - The use of restricted resources and services is evaluated and reported.
     - Public Dissemination:
       - The Dissemination Information Package (DIP) for the resource is freely accessible by the public in digital form.
       - The discovery metadata for the resource is included in the LTA public access catalog.
       - Contact information for user support is provided on the LTA public access catalog.
       - Responses are provided for legitimate requests to correct publicly disseminated documentation or access capabilities.
       - Answers or referrals are provided for scientific and technical questions about publicly disseminated resources.
       - Changes are described for publicly disseminated resources or services.
       - The use of publicly disseminated resources and services is evaluated and reported.

  17. Preservation and Dissemination Services for SEDAC Data Approved for Accession to the LTA

  18. Candidate Data for SEDAC Long-Term Archive
     - Candidate data sets recommended for accession to the LTA by the SEDAC Project Scientist and currently being prepared for review by the LTA Board:
       - Gridded Population of the World (GPW) Version 2
       - Gridded Population of the World, Version 2: Ancillary Data

  19. Overview of Process for Transferring Selected Data to the SEDAC LTA
     - Selected data sets are transferred from the SEDAC Active Archive to the SEDAC Long-Term Archive (LTA).
     - The SEDAC LTA accessions, preserves, and disseminates each selected data set in accordance with the preservation and dissemination services approved for that data set.
     - The SEDAC Active Archive deaccessions each data set that has been accessioned into the SEDAC LTA once that data set has been disseminated by the LTA.

  20. SEDAC Data Repository Organization
     - SEDAC Digital Object Repository
       - SEDAC Active Archive Data and Information Products: Public Access to Data and Information; Restricted Access to Data and Information
       - SEDAC Long-Term Archive Data and Information Products: Public Access to Data and Information; Restricted Access to Data and Information
     - Active Archive is for near-term dissemination with high levels of service. Primary users are discipline-specific scientists.
     - Long-Term Archive is for the 50-100 year preservation time-frame with different expectations for levels of service.

  21. Conclusions: Appraisal and Selection Process Contributed to Adoption of an OAIS Compliant Digital Repository System
     - Consistent with Open Archival Information System (OAIS) Framework
     - Meeting Responsibilities for the Long-Term Preservation of Data for Future Access and Use Prompted Prototype Implementation of Fedora Open Source Digital Repository System
       - Submission Information Packages (SIPs) Prepared for Data Approved for Submission to the LTA
       - Unique Persistent Identifiers (PIDs) Generated for each Digital Object (LTA Data Set)
       - Digital Object Contains Content and Metadata Datastreams for an OAIS Compliant Archival Information Package (AIP)
       - Changes to Digital Objects Stored as New Versions
       - Ingest and Management of Various Data Types
       - Web-Based Dissemination of Content and Metadata (Dublin Core and FGDC CSDGM)
       - Search Supported by Resource Indexing
       - Objects Assigned Behavior Definition and Dissemination Methods
       - Ingest, Store, and Export in Extensible Markup Language (XML)
       - Collection and Object Relationship Management Using Resource Description Framework (RDF) Graphs
     - Prototype Assessment Revealed Need to Implement VITAL Product From VTLS
       - VITAL is based on Fedora and includes integrated features and support
       - Web-Based Discovery and Access for Public Browsing, and Simple, Advanced, and Full-Text Search Capabilities
       - Web-Based Administration and Content Manager Client for Staff
       - VITAL Batch Ingest Utility
       - VTLS Automated Loading and Electronic-submission Tool (VALET) Enables Workflow, Author Submission, Cataloging, and Review
       - Lightweight Directory Access Protocol (LDAP) Authentication Server
       - JSTOR/Harvard Object Validation Environment (JHOVE)
       - Handle system server assigns unique identifiers that are resolved to URLs
       - Search/Retrieve Web Service and Search/Retrieve via URL (SRW/SRU) and Z39.50 services
       - Generation of SHA-1 Fixity Signatures on Ingest for Integrity Validation
       - Synchronized Failover System for Contingent System and Data Recovery
     - Also Reviewing Requirements to Adopt PREMIS and GML
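The slide mentions SHA-1 fixity signatures generated on ingest for integrity validation. A minimal sketch of that idea, using only Python's standard `hashlib`, is shown below. It is not VITAL's or Fedora's implementation; the function names and chunk size are assumptions made for the example.

```python
import hashlib
import io
from typing import BinaryIO


def sha1_fixity(stream: BinaryIO, chunk_size: int = 65536) -> str:
    """Compute the SHA-1 digest of a binary stream, reading it in chunks."""
    digest = hashlib.sha1()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(stream: BinaryIO, recorded_digest: str) -> bool:
    """Recompute the digest and compare it with the value recorded at ingest."""
    return sha1_fixity(stream) == recorded_digest.lower()


# At ingest: record the signature alongside the archived object.
recorded = sha1_fixity(io.BytesIO(b"example content"))

# Later audit: unchanged content verifies, altered content does not.
assert verify_fixity(io.BytesIO(b"example content"), recorded)
assert not verify_fixity(io.BytesIO(b"example c0ntent"), recorded)
```

For files on disk, the same functions can be called on a handle opened with `open(path, "rb")`. SHA-1 detects accidental corruption, which is the use described on the slide; it is no longer considered safe against deliberate tampering.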

  22. SEDAC Long-Term Archive: http://sedac.ciesin.columbia.edu/lta/
