
Envisioning a New Distributed Organization and Cyberinfrastructure to Enable Science

Patricia Cruse, Director, Digital Preservation Program, California Digital Library. Coalition for Networked Information, Spring 2009 Task Force Meeting.


Presentation Transcript


  1. Envisioning a New Distributed Organization and Cyberinfrastructure to Enable Science. Patricia Cruse, Director, Digital Preservation Program, California Digital Library. Coalition for Networked Information, Spring 2009 Task Force Meeting

  2. Responding to the NSF solicitation: “…new methods, management structures and technologies to manage the diversity, size, and complexity of… data sets and data streams” (NSF 2007). Image: mound(s) built by termites

  3. Scientific challenges and data needs • Global change is a complex scientific and societal challenge • Community needs good data • Good data: build good science; make possible wise management; enable sound decisions • Good data need: solid technical infrastructure; sound organization; community engagement

  4. Outline of today’s talk: Complexities of global change; Challenges for cyberinfrastructure and data-intensive research; A solution: DataONE

  5. The challenges of global change Smith, Knapp, Collins. In press.

  6. Critical areas in the Earth’s system

  7. Human impacts on land-based ecosystems Ecosystems and Human Well-Being

  8. Human impacts on the world’s oceans Halpern et al. 2008 Science 319.

  9. The move to cross-disciplinary forms of collaboration • Unidisciplinary: researchers from a single discipline work together to address a common problem • Multidisciplinary: researchers from different disciplines work independently or sequentially, each from his or her own discipline-specific perspective, to address a common problem • Interdisciplinary: researchers from different disciplines work jointly to address a common problem and, although some integration of their diverse perspectives occurs, participants remain anchored in their own fields • Transdisciplinary: researchers from different disciplines work jointly to create a shared conceptual framework that integrates and moves beyond discipline-specific theories, concepts, and approaches, to address a common problem (Rosenfield, 1992)

  10. Building the knowledge pyramid. Figure labels: intensive science sites and experiments; extensive science sites; volunteer & education networks; remote sensing. Axes: increasing spatial coverage; increasing process knowledge. Adapted from CENR-OSTP

  11. Outline of today’s talk: Complexities of global change; Challenges for cyberinfrastructure and data-intensive research; DataONE: a solution

  12. Data challenge 1: dispersed sources (“finding the needle in the haystack”). Data are massively dispersed • Ecological field stations and research centers (100s) • Natural history museums and biocollection facilities (100s) • Agency data collections (100s to 1000s) • Individual scientists (1000s to 10,000s to 100,000s)

  13. Data challenge 2: diversity (“the flood of increasingly heterogeneous data”). Data are heterogeneous in syntax (format), schema (model), and semantics (meaning). Jones et al. 2007

  14. Data challenge 3: poor practice (“data entropy”). Figure: information content versus time, with labels: time of publication; specific details; general details; retirement or career change; accident; death (Michener et al. 1997)

  15. Data challenge 4: loss • Natural disaster • Facilities infrastructure failure • Storage failure • Server hardware/software failure • Application software failure • External dependencies (e.g. PKI failure) • Format obsolescence • Legal encumbrance • Human error • Malicious attack by human or automated agents • Loss of staffing competencies • Loss of institutional commitment • Loss of financial stability • Changes in user expectations and requirements. Source: S. Abrams, CDL

  16. Data challenge 4: more loss. Transient information or unfilled demand for storage. Figure labels: Information; Available Storage; Petabytes Worldwide. Source: John Gantz, IDC Corporation: The Expanding Digital Universe

  17. Cumulative impact: data longevity Koehler, W. (2004) Information Research 9(2): 174.

  18. Outline of today’s talk: Complexities of global change; Challenges for cyberinfrastructure and data-intensive research; DataONE: a solution • Building on existing cyberinfrastructure • Creating new cyberinfrastructure • Developing new communities of practice

  19. Key components. Vision: enable new science and knowledge creation through universal access to data about life on earth and the environment that sustains it by: • engaging the scientist in the data curation process • supporting the full data life cycle • encouraging data stewardship and sharing • promoting best practices • engaging citizens • developing domain-agnostic solutions

  20. Data types • Biological (genes to biomes) • Environmental • Atmospheric • Ecological • Hydrological • Oceanographic

  21. Existing biological data archives • ESA’s Ecological Archive • Distributed Active Archive Center • National Biological Information Infrastructure • Fire Research & Management Exchange System • Long Term Ecological Research Network • Knowledge Network for Biocomplexity

  22. Examples of data holdings: metadata interoperability across data holdings. EML = Ecological Metadata Language; BDP = Biological Data Profile; DIF = Directory Interchange Format; OGIS = OpenGIS; DwC = Darwin Core; DC = Dublin Core; ECHO = EOS ClearingHOuse; DCsubset = Dublin Core subset

  23. Providing one-stop shopping for data. Simple Pilot Catalog Interface (searches entire metadata record), 40,000 records: • NBII Metadata Clearinghouse (31,864) • Long Term Ecological Research (LTER) Network (6,897) • ORNL Distributed Active Archive Center for Biogeochemical Data (810) • Large Scale Biosphere-Atmosphere Experiment in Amazonia (LBA) (783) • Organization of Biological Field Stations (124) • Inter-American Institute for Global Change Research (IAI) (79) • MODIS and ASTER Products (LPDAAC) (38) • National Phenology Network (USANPN) (29)
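As a quick check on the figures in slide 23, the per-source counts can be totaled to see how the "40,000 records" headline relates to them. This is only an arithmetic sketch: the counts are copied from the slide, and the variable names are illustrative.

```python
# Record counts per source, copied from the pilot catalog slide.
holdings = {
    "NBII Metadata Clearinghouse": 31864,
    "Long Term Ecological Research (LTER) Network": 6897,
    "ORNL DAAC for Biogeochemical Data": 810,
    "Large Scale Biosphere-Atmosphere Experiment in Amazonia (LBA)": 783,
    "Organization of Biological Field Stations": 124,
    "Inter-American Institute for Global Change Research (IAI)": 79,
    "MODIS and ASTER Products (LPDAAC)": 38,
    "National Phenology Network (USANPN)": 29,
}

total = sum(holdings.values())
shares = {name: count / total for name, count in holdings.items()}
largest = max(holdings, key=holdings.get)

print(total)                                  # 40624
print(f"{largest}: {shares[largest]:.1%}")    # NBII Metadata Clearinghouse: 78.4%
```

The listed sources sum to 40,624, so the headline figure appears to be a rounded total, with the NBII clearinghouse contributing a little over three quarters of the records.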

  24. Existing cyberinfrastructure: enabling tools

  25. Building new global cyberinfrastructure

  26. New distributed framework: flexible, scalable, sustainable network • Coordinating Nodes: retain complete metadata catalog; subset of all data; perform basic indexing; provide network-wide services; ensure data availability (preservation); provide replication services • Member Nodes: diverse institutions; serve local community; provide resources for managing their data
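The division of labor in slide 26 can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not DataONE's actual software or API: the class names, method names, and the two-copy replication default are all hypothetical, and the coordinating node here keeps only metadata and replica locations (the slide also mentions it holding a subset of the data, which is omitted for brevity).

```python
from dataclasses import dataclass, field


@dataclass
class MemberNode:
    """Hypothetical member node: an institution storing data for its community."""
    name: str
    objects: dict = field(default_factory=dict)  # identifier -> raw data

    def store(self, identifier, data):
        self.objects[identifier] = data


@dataclass
class CoordinatingNode:
    """Hypothetical coordinating node: full metadata catalog plus replication."""
    members: list = field(default_factory=list)
    catalog: dict = field(default_factory=dict)    # identifier -> metadata dict
    locations: dict = field(default_factory=dict)  # identifier -> holder names

    def register(self, member):
        self.members.append(member)

    def ingest(self, origin, identifier, data, metadata, copies=2):
        """Store at the origin, catalog the metadata, replicate to other members."""
        origin.store(identifier, data)
        self.catalog[identifier] = metadata
        holders = [origin]
        for member in self.members:
            if len(holders) >= copies:
                break
            if member is not origin:
                member.store(identifier, data)
                holders.append(member)
        self.locations[identifier] = [m.name for m in holders]

    def search(self, term):
        """Basic indexing stand-in: case-insensitive match over metadata values."""
        term = term.lower()
        return sorted(
            identifier
            for identifier, metadata in self.catalog.items()
            if any(term in str(value).lower() for value in metadata.values())
        )


cn = CoordinatingNode()
station, museum, agency = (MemberNode(n) for n in ("field-station", "museum", "agency"))
for node in (station, museum, agency):
    cn.register(node)

cn.ingest(station, "example-id-1", b"temp,ph\n", {"title": "Lake chemistry"})
print(cn.locations["example-id-1"])  # ['field-station', 'museum']
print(cn.search("lake"))             # ['example-id-1']
```

The point of the sketch is the separation of roles: discovery goes through the one catalog, while the data itself lives on, and is replicated across, member nodes, so losing any single member does not lose the object.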

  27. Building new global cyberinfrastructure

  28. New partnerships • libraries & digital libraries • academic institutions • research networks • NSF- and government-funded synthesis & supercomputer centers/networks • government agencies • international organizations • data and metadata archives • professional societies • NGOs • commercial sector

  29. Building global communities of practice… creating long-lived cyberinfrastructure enterprises • Community engagement • Involve cultural memory organizations • Include science educators • Engage citizens and new generations of students in best practices • Build on existing programs • Transparent, participatory governance • Adoption/creation of innovative business

  30. Education and training • Career-long learning: best practice guides; exemplary data management plans; podcasts, webcasts; workshops and seminars; downloadable curricula. Cover images: Best Practice Guide, Using Metadata for e-research (5 in a series); Gold Star Data Management Plan, Here’s How; Best Practice Guide, How to Cite Your Data (6 in a series)

  31. Engaging citizens in science www.CitizenScience.org

  32. Individual participation Individuals can: • contribute data and metadata • participate in Working Groups • join the software development community • partake of on-line learning opportunities • develop curricula and training materials • become a DIUG member (DataONE International Users Group) Byron Kim, Berkeley, 2003

  33. Organizational participation. Libraries, research networks, agencies can: • become a DataONE Member Node • receive data-life-cycle software • access training materials • join in establishing standards • join the DataONE International Users Group • set future directions • join the software development community • contribute curricula and training materials

  34. Summing up… Complexities of global change and a challenging data environment. DataONE will: • Create technical infrastructure • Develop organizational infrastructure • Engage the community. Meet the data challenges: • Dispersed data: facilitating discovery • Data diversity: providing tools to manage data • Data practice: developing an informatics-literate community • Data loss: developing infrastructure

  35. We welcome your involvement! • DataONE Management Team: • William Michener, PI, University of New Mexico • Suzie Allard, University of Tennessee • Bob Cook, Oak Ridge National Laboratory DAAC • Patricia Cruse, California Digital Library • Mike Frame, USGS, National Biological Information Infrastructure • Viv Hutchinson, USGS • Matt Jones, University of California Santa Barbara • Steve Kelling, Cornell Lab of Ornithology • Kathleen Smith, UNC • DataONE Partners: • Kepler-CORE • SEEK/KNB Teams
