Indiana University. Data Publishing Service. Stacy Kowalczyk. April 9, 2010. Questions. Which phases of the data life cycle are managed by your repository? How do data management requirements differ across the data life cycle? What systems do you use to support the data life cycle?
Indiana University Data Publishing Service Stacy Kowalczyk April 9, 2010
Questions • Which phases of the data life cycle are managed by your repository? • How do data management requirements differ across the data life cycle? • What systems do you use to support the data life cycle? • Can you generalize the mechanisms used to migrate data between different phases of the data life cycle?
Data Publishing Service • A new service of the IUScholarWorks institutional repository and the Scholarly Data Services • Provides data management support and data access • Data will have a persistent URL so it can be linked to publications • The service will combine our DSpace repository with IU’s Scholarly Data system (formerly known as MDSS), a system that researchers already use • Allows discovery over the Web • Preservation: bit level
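Bit-level preservation is usually checked with fixity checksums. The following is a minimal sketch, not the service's actual implementation: the function names and the choice of SHA-256 are assumptions made for illustration only.

```python
import hashlib
from pathlib import Path


def file_checksum(path: Path, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Return the hex digest of a file, reading in chunks so large datasets
    do not have to fit in memory. (Hypothetical helper, not an MDSS or DSpace API.)"""
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(path: Path, expected: str, algorithm: str = "sha256") -> bool:
    """True when the stored bits still match the checksum recorded at deposit time."""
    return file_checksum(path, algorithm) == expected.lower()
```

A repository would record the checksum at deposit and rerun `verify_fixity` on a schedule; a mismatch signals that the stored copy has changed.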
Current Data Lifecycle Model Implementation (systems shown: Scholarly Data Service, IU ScholarWorks) • Data creation: research design; data management planning; data collection (surveying, experimentation, measuring, etc.); data checking and cleaning ↓ Data analysis: analysis; derived data creation; creation of data documentation ↓ End of research: research outputs; preparing data for preservation ↓ Preservation of data: storage of data; migration to suitable format/medium; metadata creation ↓ Distribution/publication of data ↓ Re-use of data: by same researcher; by other researchers • Source: http://www.data-archive.ac.uk/sharing/lifecycle.asp
Scholarly Data Service • Massive Data Storage System (MDSS) • Current system for research data storage • Installed in 1998 • Based on the IBM-developed High Performance Storage System (HPSS) software • Offers over 2.8 petabytes of disk- and tape-based storage • Distributed between the Indianapolis and Bloomington campuses
Diagram: Distributed between IUB and IUPUI • IUB subsystem: Bloomington users, IUB campus network, research network, HPSS movers, disk arrays, tape library, SAN • IUPUI subsystem: Indianapolis users, IUPUI campus network, research network, HPSS movers, disk arrays, tape library, SAN • HPSS core servers • TCP/IP wide area network
Data Publishing in IUScholarWorks • Discovery of and access to datasets and related publications through the IUScholarWorks Repository service • DSpace records that are searchable, indexed, harvested, and available at stable URLs • DSpace records that contain DSpace bitstreams for small datasets • DSpace records that link via stable URLs to large datasets in IU MDSS
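The split between small datasets stored as bitstreams and large datasets linked in MDSS can be sketched as a simple routing rule. This is an illustrative sketch only: the size cutoff, the `DatasetRecord` type, and `build_record` are hypothetical, since the slides do not say what counts as "small" and this is not the DSpace API.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical cutoff; the slides do not define a "small" dataset.
SMALL_DATASET_LIMIT_BYTES = 2 * 1024**3


@dataclass
class DatasetRecord:
    title: str
    storage: str          # "bitstream" (held in DSpace) or "mdss_link"
    url: Optional[str]    # stable URL for MDSS-hosted data, otherwise None


def build_record(title: str, size_bytes: int, mdss_url: Optional[str] = None) -> DatasetRecord:
    """Store small datasets as bitstreams; link large ones by stable URL."""
    if size_bytes <= SMALL_DATASET_LIMIT_BYTES:
        return DatasetRecord(title, "bitstream", None)
    if mdss_url is None:
        raise ValueError("large datasets need a stable MDSS URL")
    return DatasetRecord(title, "mdss_link", mdss_url)
```

Either way the reader starts from the same kind of record; only the location of the data differs.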
IUScholarWorks Data: Linking to MDSS and delivery via HTTP • Components shown: item record with URLs of datasets in MDSS; HTTP server; MDSS web server; hpssfs filesystem; IU MDSS
Data Publishing in IUScholarWorks • Facilitating the submission process for both the researcher and the collection manager • For submitters: via the DSpace Configurable Submission system • For data collection managers: via steps in the DSpace workflow system
IUScholarWorks Data: Item submission user interface (Phase 2, automated workflow) • DSpace Configurable Submission System, interactive steps: instructions and preparation; describe item (metadata forms); MDSS and dataset info form; file upload step; review step; finalize/accept license • Non-interactive processing steps against IU MDSS: initiate MDSS actions (move datasets, etc.); query MDSS technical metadata (checksum, etc.); update metadata
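The non-interactive processing steps can be pictured as a chain of functions applied to a submitted item. The sketch below is an assumption-laden illustration, not DSpace or MDSS code: every name in it is hypothetical, and `query_checksum` stands in for whatever client actually talks to MDSS.

```python
from typing import Callable, Dict, List

Item = Dict[str, str]
Step = Callable[[Item], Item]


def make_steps(query_checksum: Callable[[str], str]) -> List[Step]:
    """Build the three non-interactive steps named on the slide.
    query_checksum is a placeholder for a real MDSS query."""

    def initiate_mdss_actions(item: Item) -> Item:
        # e.g. request that the dataset be moved into its published location
        return {**item, "mdss_status": "move_requested"}

    def query_technical_metadata(item: Item) -> Item:
        # ask the storage system for technical metadata such as a checksum
        return {**item, "checksum": query_checksum(item["dataset_path"])}

    def update_metadata(item: Item) -> Item:
        # flag the record once the technical metadata has been attached
        return {**item, "metadata_updated": "true"}

    return [initiate_mdss_actions, query_technical_metadata, update_metadata]


def run_steps(item: Item, steps: List[Step]) -> Item:
    """Apply each step in order, returning a new item each time."""
    for step in steps:
        item = step(item)
    return item
```

Passing the MDSS query in as a parameter keeps the chain testable without access to the storage system.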
Planning for a More Curated Life Cycle Model http://libraries.mit.edu/guides/subjects/data-management/cycle.html
Active and Social Curation • Engage researchers during projects, not at the end • Use immediate benefits to drive automatic capture and "volunteering" of metadata • Reduce costs by re-engineering curation processes to leverage this rich metadata and volunteered effort
Data Curation Lifecycle Elements (diagram labels) • Active Curation; OAIS Repository Federation; Curation Boundary • Automated Curation: Workflow/Rule Engine; Metadata Management; operates on Metadata, Content Objects and Trigger Events • Data Acquisition, Analysis and Simulation; Scholarly Communication; Active Data Systems • Metadata standards: DDI3, METS, PREMIS, MODS, DC, SensorML, OGC, … • Ingest scripts: fixity, integrity, authentication, transformation • Ingest, AIPs; Trusted Digital Repository Federation (OAIS compliant); Appraisal and Selection; Preservation Actions; Dissemination Packages • Compound Objects - OAI-ORE; Wide-Area File System • Search, Browse, Annotation, Visualization Tools; Migration and Emulation Tools; Use, Reuse, Repurposing Tools • Access Mechanisms and E-Scholarship Services • Actors: Contributor, User