GSLIS Research Showcase, 9 April 2010. The Data Conservancy: Research on Data Curation and Repositories. Center for Informatics Research in Science & Scholarship. Carole Palmer, PI; Melissa Cragin, John MacMullen, Tiffany Chao; Allen Renear, Dave Dubin, Simone Sacchi.

GSLIS Research Showcase, 9 April 2010
The Data Conservancy: Research on Data Curation and Repositories
Center for Informatics Research in Science & Scholarship
Carole Palmer, PI
Melissa Cragin, John MacMullen, Tiffany Chao
Allen Renear, Dave Dubin, Simone Sacchi
Michael Welge & Loretta Auvil, NCSA
Led by:

What’s the problem?
• Scientists & scholars generate increasingly vast amounts of digital data.
• Digital data are extremely fragile, and there are few standards of good practice.
• Data are essential raw materials of science and scholarship.
• Data are valuable institutional, disciplinary, and national assets with tremendous potential for integration and reuse.
• Need for repositories of “curated” data.
Data curation is the active and ongoing management of data through its lifecycle of interest and usefulness to scholarship and science. Curation activities:
• enable data discovery and retrieval
• maintain data quality
• add value
• provide for re-use over time

The Data Conservancy asserts research libraries as a core part of the emerging distributed network of data collections and services
• “Data sets are the new special collections.” (Sayeed Choudhury, personal communication, 2007)
• “Data centers are the new library stacks.” (Winston Tabb, JHU Dean of Libraries)
Data collections and services are consistent with the research library mission.
• Will be like other collections requiring library support and expertise.
• Will need to serve a broad academic constituency.
Image credits: flickr.com/photos/001fj/2907653323/; Flickr users stancia, rh (Creative Commons)

Astronomy as an exemplar scientific community
• Achieved notable success in community data standards, practices, documentation, and associated services for research and learning.
• DC initial goal: ingest astronomy data into a preservation archive and connect the data to existing services used by astronomers.
  ** SDSS: 140 TB, 3 times the amount currently held on the JHU campus
• Demonstrate the utility of hosting data in an environment that sustainably supports existing scientific capabilities.
Extend to:
• life sciences
• earth sciences
• social sciences

To date, limited support for “small” science
• Data from Big Science is … easier to handle, understand and archive.
• Small Science is horribly heterogeneous and far more vast.
• In time will generate 2-3 times more data than Big Science.
(“Lost in a Sea of Science Data,” S. Carlson, The Chronicle of Higher Education, 23/06/2006.)
(Figure: small science data)

CIRSS contributions to DC and DataNet Partners
Data practices group (Palmer, Cragin, MacMullen, Chao)
• comparative analysis concentrating on small science
• taxonomies of data types, practices, & curation
• criteria for deposition, sharing, quality control
• long-term potentials of data
Data concepts group (Renear, Dubin, Sacchi)
• development of formal terminology and identity conditions for collections, data sets, versions, and data items
• rules that relate collection and data set metadata
• support development of a common collection registry scheme
NCSA SEASR group (Welge, Auvil)
• extend and advance the Software Environment for the Advancement of Scholarly Research (SEASR); begin with high-throughput biology