This proposal outlines the creation of a highly distributed network for earth observation data, linking existing systems and focusing on long-term sustainability in ecology, hydrology, taxonomy, genetics, and more. The network aims to provide powerful data access, support synthesis in earth observation sciences, and preserve data for long-term studies. Collaboration with various data sources and providers ensures a comprehensive range of data for researchers. The infrastructure will be designed to support the scientific lifecycle and emphasize free and open-source software.
A Proposal for a Distributed Earth Observation Data Network • Matthew B Jones • UC Santa Barbara • National Center for Ecological Analysis and Synthesis (NCEAS) • Presentation at TDWG 2008 • Fremantle, Australia
Critical Areas in the Earth System • Where local or regional changes may have strong effects on earth system interactions, feedbacks, or connections
Knowledge Network for Biocomplexity Data Distribution • Many existing but unlinked data networks and federations in ecology, hydrology, taxonomy, genetics, vegetation science, oceanography, atmospheric science, ...
DataNetONE • DataNetONE (Observation Network for Earth) • Michener, Cook, Frame, Hampton, Smith, Allen, Horsburgh, Jones, Sandusky, Scherle, Servilla, Vieglais, Wilson, Allard, Buneman, Butler, Cobb, Cruse, Deelman, DeRoure, Duke, Goble, Hobern, Honeyman, Hutchison, Kelling, Kranowitch, Kunze, Ludaescher, Normore, Pereira, Pouchard, Tenopir, Weltzin, Von Welch • Highly distributed network of earth observation data • Linking existing systems • Focus on long-term sustainability (30+ year time horizon) • Technical sustainability • Financial sustainability • Mostly focused on production infrastructure • Continual evaluation and incorporation of research findings
Cyberinfrastructure Objectives • Support synthesis in earth observation sciences • Preserve data for long-term studies • Powerful data access to distributed Member Nodes • Support full lifecycle of scientific process • Design goals • Distributed management at Member Nodes • Replication and caching for preservation and performance • Software must provide benefits for scientists today • Evolution of software and standards • Support and adapt existing community software efforts • Emphasize Free and Open Source Software
What are the data/sources/providers? • Biological (genome to ecosystem) • Environmental • Atmospheric • Ecological • Hydrological • Oceanographic • Taxonomic • Sources: • Scientists • Research networks • Environmental observatories • Citizen groups
What are the data/sources/providers? • Sources/Providers: • US and international Long Term Ecological Research programs • Biological specimens held by museums and herbaria • Observational data relating to invasive species, infectious diseases, wildlife and fisheries, and habitat • Natural resources and conservation data collected by US and international park systems • Global and continental land cover/land change and biogeochemical data
Overview of Components • Member Nodes • Earth observing institutions, projects, and networks • Provide resources for their own data and replicated data • Focused on serving their constituencies • Coordinating Nodes • Provide network-wide services to Member Nodes • Geographically replicated services • Investigator Toolkit • Tools for researchers to access DataNetONE • General Purpose and discipline-specific tools • Adapt existing tools where possible
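The division of labor between Member Nodes and Coordinating Nodes, together with the replication goal stated earlier, can be illustrated with a minimal Python sketch. Everything here is hypothetical and for illustration only: the class names, the `publish` and `verify` methods, the `replicas` parameter, and the use of SHA-256 checksums are assumptions, not part of the proposal or of any DataNetONE interface.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class MemberNode:
    """Hypothetical Member Node: holds its own data plus replicas from others."""
    name: str
    objects: dict = field(default_factory=dict)  # identifier -> bytes

    def store(self, identifier: str, content: bytes) -> None:
        self.objects[identifier] = content


@dataclass
class CoordinatingNode:
    """Hypothetical Coordinating Node: network-wide index of object locations."""
    members: list = field(default_factory=list)
    locations: dict = field(default_factory=dict)  # identifier -> [node names]
    checksums: dict = field(default_factory=dict)  # identifier -> SHA-256 hex

    def register(self, node: MemberNode) -> None:
        self.members.append(node)

    def publish(self, origin: MemberNode, identifier: str,
                content: bytes, replicas: int = 2) -> list:
        """Store at the origin node, then copy to up to `replicas` other nodes."""
        origin.store(identifier, content)
        holders = [origin]
        for node in self.members:
            if len(holders) - 1 >= replicas:
                break
            if node is not origin:
                node.store(identifier, content)
                holders.append(node)
        self.locations[identifier] = [n.name for n in holders]
        self.checksums[identifier] = hashlib.sha256(content).hexdigest()
        return self.locations[identifier]

    def verify(self, identifier: str) -> bool:
        """Check that every recorded copy still matches the stored checksum."""
        expected = self.checksums[identifier]
        by_name = {n.name: n for n in self.members}
        return all(
            hashlib.sha256(by_name[name].objects[identifier]).hexdigest() == expected
            for name in self.locations[identifier]
        )
```

In this sketch the Member Nodes stay responsible for storage, while the Coordinating Node only records where copies live and whether they are intact. A real network would also need geographic replication of the coordinating service itself, which is omitted here.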
Common Service Interface • DataNetONE Service Interface • Federated Identity and Authorization Services • Object Management Services • Discovery and Usage Services • Preservation Services • Network Services
What is the Investigator Toolkit? • Suite of software tools for researchers • Principal mode of interaction with the network • Design goals • Emphasize Free and Open Source Software, but support commercial software • General analysis frameworks (e.g., R, MATLAB) • Domain-specific tools (e.g., GARP, Phylocom) • Organized using scientific workflows • Communication via the Service Interface
Toolkit Functions • Supports the scientific lifecycle • Data management and preservation • Data query and access • Data analysis and visualization • Process management and preservation • Portal software
Longevity: organization & community • Broad, active community engagement • Library educators engaging new generations of students • Existing outreach and education • e.g., citizen science portals; workshops at NCEAS, NESCent, etc. • Strong organizational sustainability • 30 years providing access to ecological data, biodiversity data, etc. • More than 100 years of experience for participating libraries