Cyberinfrastructure Overview

Core Cyberinfrastructure Team. Matthew B. Jones, National Center for Ecological Analysis and Synthesis (NCEAS), University of California, Santa Barbara. DataONE Kick-off Meeting, October 20-22, 2009.


Presentation Transcript


  1. Cyberinfrastructure Overview • Core Cyberinfrastructure Team • Matthew B. Jones • National Center for Ecological Analysis and Synthesis (NCEAS) • University of California, Santa Barbara • DataONE Kick-off Meeting • October 20-22, 2009

  2. Cyberinfrastructure Objectives • Support synthesis in earth observation sciences • Support full lifecycle of scientific process • Data acquisition and management • Data preservation • Data discovery and access • Data integration • Data analysis and visualization • Process management and preservation • Evolve to accommodate technology change

  3. Design goals • Distributed management at Member Nodes • Replication and caching for preservation and performance • Software must provide benefits for scientists today • Evolution of software and standards • Support and adapt existing community software efforts • Emphasize Free and Open Source Software

  4. What data are in scope? • Biological • e.g., Gene, Organism, Population, Species, Community, Biome, Ecosystem • Environmental • e.g., Atmospheric, Chemical, Ecological, Hydrological, Oceanographic, Physical • Social • e.g., Land use, human population • Economic • e.g., trade, ecosystem services, resource extraction

  5. Who are the providers and consumers? • Providers • Academic and Agency Scientists • Research networks • Environmental observatories • Citizen groups • Students • Consumers • Academic and Agency Scientists • Research networks • Environmental observatories • Citizen groups • Students • Same people, different roles driving needs

  6. Metadata and data integration • Every community has • multiple metadata schemas • Biological Data Profile, Darwin Core, Dublin Core, Ecological Metadata Language, Open GIS schemas • multiple data formats • ASCII, NetCDF, HDF, GeoTiff, ... • Some communities have general and domain specific ontologies • Addressing this heterogeneity is critical • Integrated analysis of datasets requires • Syntax mapping • Semantics mapping • Sophisticated integration tools that do not exist
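The "syntax mapping" step named on this slide can be illustrated with a minimal sketch. It assumes a hypothetical crosswalk table that renames fields from two schemas into one shared vocabulary. The common field names (`creator`, `date`, `subject`) and the specific pairings are illustrative assumptions, not an official Dublin Core or Darwin Core crosswalk, and the sketch does nothing about semantics mapping.

```python
from typing import Dict

# Hypothetical crosswalks: source field name -> common field name.
# The pairings are illustrative assumptions, not an official mapping.
CROSSWALKS: Dict[str, Dict[str, str]] = {
    "dublin_core": {
        "creator": "creator",
        "date": "date",
        "subject": "subject",
    },
    "darwin_core": {
        "recordedBy": "creator",
        "eventDate": "date",
        "scientificName": "subject",
    },
}


def to_common(schema: str, record: Dict[str, str]) -> Dict[str, str]:
    """Rename the fields of one record into the common vocabulary.

    Fields with no entry in the crosswalk are dropped, which is one
    reason a purely syntactic mapping loses information.
    """
    mapping = CROSSWALKS[schema]
    return {mapping[key]: value for key, value in record.items() if key in mapping}


dc_record = {"creator": "J. Smith", "date": "2009-10-20", "format": "text/csv"}
dwc_record = {"recordedBy": "J. Smith", "eventDate": "2009-10-20"}

common_from_dc = to_common("dublin_core", dc_record)
common_from_dwc = to_common("darwin_core", dwc_record)
```

After renaming, the two records can be compared field by field, but whether `recordedBy` really means the same thing as `creator` is exactly the semantic question the slide says the renaming alone cannot answer.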

  7. Integrating with existing infrastructure KNB, ESDIS, and Waters Networks

  8. Overview of Components • Member Nodes • Earth observing institutions, projects, and networks • Provide resources for their own data and replicated data • Focused on serving their constituencies • Coordinating Nodes • Provide network-wide services to Member Nodes • Geographically replicated services • Investigator Toolkit • Tools for researchers to access DataNetONE • General Purpose and discipline-specific tools • Adapt existing tools where possible

  9. Node Design • Member nodes • Geographically Distributed Nodes • Authoritative repository for many datasets • Diversity tolerant (less tightly coordinated) • Freedom to try new tools, methods, and leapfrog forward • Partial replication • Coordinating nodes • Completely replicated • Complete metadata catalogue • Data Subset (initially a large fraction) • Tightly coordinated, stable service platform
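One way to picture the split between the two node types is the toy model below. It is a sketch under stated assumptions, not the DataONE design: the class names, method names, and node names (`mn_a`, `mn_b`) are hypothetical. Member nodes hold data objects, while the coordinating node holds only a catalogue recording which member nodes have a copy of each identifier.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class MemberNode:
    """Hypothetical member node: stores data objects by identifier."""
    name: str
    objects: Dict[str, bytes] = field(default_factory=dict)

    def store(self, identifier: str, data: bytes) -> None:
        self.objects[identifier] = data


@dataclass
class CoordinatingNode:
    """Hypothetical coordinating node: tracks where each object lives."""
    # identifier -> names of member nodes that hold a copy
    catalogue: Dict[str, Set[str]] = field(default_factory=dict)

    def register(self, node: MemberNode, identifier: str) -> None:
        self.catalogue.setdefault(identifier, set()).add(node.name)

    def replicate(self, identifier: str, source: MemberNode, target: MemberNode) -> None:
        # Copy the object to another member node and record the new replica.
        target.store(identifier, source.objects[identifier])
        self.register(target, identifier)

    def locate(self, identifier: str) -> List[str]:
        return sorted(self.catalogue.get(identifier, set()))


mn_a = MemberNode("mn_a")
mn_b = MemberNode("mn_b")
cn = CoordinatingNode()

mn_a.store("doi:example.1", b"site,temp\nA,12.5\n")
cn.register(mn_a, "doi:example.1")
cn.replicate("doi:example.1", source=mn_a, target=mn_b)
replica_holders = cn.locate("doi:example.1")
```

In this sketch each member node holds only the objects it has been given (partial replication), while the catalogue on the coordinating node covers every registered identifier.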

  10. DataONE Service Interface • Federated Identity and Authorization Services • Object Management Services • Discovery and Usage Services • Preservation Services • Network Services

  11. Service Interface for Interoperability • Create common access methods for different clients • Create a mechanism to map heterogeneous services • Provide an interface between nodes and service requests • Simplicity of construction • Lightweight • Ease of implementation • Implementations are opaque to service consumers
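The idea that implementations stay opaque to service consumers can be sketched with an abstract interface and two unrelated backends. Everything here is a hypothetical illustration: `ObjectService`, its two methods, and both backends are invented for the example and are not the DataONE Service Interface.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class ObjectService(ABC):
    """Hypothetical common access methods that every node exposes."""

    @abstractmethod
    def get(self, identifier: str) -> bytes:
        ...

    @abstractmethod
    def list_identifiers(self) -> List[str]:
        ...


class InMemoryNode(ObjectService):
    """Backend that keeps objects in a plain dictionary."""

    def __init__(self, objects: Dict[str, bytes]) -> None:
        self._objects = dict(objects)

    def get(self, identifier: str) -> bytes:
        return self._objects[identifier]

    def list_identifiers(self) -> List[str]:
        return sorted(self._objects)


class LegacyStoreNode(ObjectService):
    """Adapter for a legacy store whose keys look like 'legacy:<id>'."""

    PREFIX = "legacy:"

    def __init__(self, legacy_store: Dict[str, bytes]) -> None:
        self._store = legacy_store

    def get(self, identifier: str) -> bytes:
        return self._store[self.PREFIX + identifier]

    def list_identifiers(self) -> List[str]:
        return sorted(key[len(self.PREFIX):] for key in self._store)


def total_bytes(service: ObjectService) -> int:
    """Client code: depends only on the interface, not on the backend."""
    return sum(len(service.get(i)) for i in service.list_identifiers())


node_a = InMemoryNode({"obj1": b"abc", "obj2": b"de"})
node_b = LegacyStoreNode({"legacy:obj1": b"abc", "legacy:obj2": b"de"})
```

`total_bytes` runs unchanged against either node, which is the property the slide describes: heterogeneous services are mapped behind one lightweight interface.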

  12. DataNetONE Components

  13. What is the Investigator Toolkit? • Suite of software tools for researchers • Emphasize Free and Open Source, but support commercial • General analysis frameworks (e.g., R, MATLAB) • Domain-specific tools (e.g., GARP, Phylocom) • Organized using scientific workflows • Supports the scientific lifecycle • Data management and preservation • Data query and access • Data analysis and visualization • Process management and preservation • Communication via the Service Interface

  14. Toolkit Functions • Supports the scientific lifecycle • Data management and preservation • Data query and access • Data analysis and visualization • Process management and preservation • Portal software

  15. Who will build the Toolkit? • Many open source efforts already exist • Data management: MATT, uDig, Specify • Analysis and modeling: R, Octave • Workflow systems: Kepler, Taverna, Triana, Pegasus • Grid systems: Condor, Globus, BOINC • Data and workflow portals: VegBank, myExperiment • Commercial tools are important too • MATLAB, SAS, ArcGIS • DataONE: help communities build their own tools • Integrate, interoperate, stabilize • Create libraries for the DataONE Service Interface

  16. Data Management and Preservation • Data management functions • Data creation, input, editing, versioning • Metadata creation, editing, annotation • Local data storage, indexing, searching • Example applications • Morpho metadata editor • Mercury metadata editor • MATT metadata editor • ESRI ArcCatalog • Metacat Data Server -- lab group data management

  17. Data Analysis and Visualization • Need community-standard analysis frameworks • R, Octave, GRASS • S-PLUS, MATLAB, ArcGIS • Thousands of domain-specific analytical tools exist • GARP: Genetic Algorithm for Rule-set Production • BLAST search • ClustalW • Phylocom • Mesquite

  18. Workflow system capabilities • Workflow systems: • Enable communication • Support preservation of scientific processes • Enable component re-use • Allow integration across many software frameworks • Example workflow engines • Kepler, Taverna, Pegasus, Triana
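Two of the capabilities listed here, component re-use and preservation of the scientific process, can be sketched in a few lines of plain Python. This is not how Kepler, Taverna, Pegasus, or Triana are programmed. The `run_workflow` helper and the step functions are hypothetical and exist only to show named, reusable steps plus a record of what each step produced.

```python
from typing import Any, Callable, List, Tuple

# A step is a (name, function) pair so the run can be logged by name.
Step = Tuple[str, Callable[[Any], Any]]


def run_workflow(steps: List[Step], data: Any) -> Tuple[Any, List[str]]:
    """Apply each step in order and keep a log of every intermediate result."""
    log: List[str] = []
    for name, func in steps:
        data = func(data)
        log.append(f"{name} -> {data!r}")
    return data, log


def drop_missing(values: List[Any]) -> List[float]:
    """Reusable component: remove missing (None) observations."""
    return [v for v in values if v is not None]


def mean(values: List[float]) -> float:
    """Reusable component: arithmetic mean of the remaining values."""
    return sum(values) / len(values)


result, log = run_workflow(
    [("drop_missing", drop_missing), ("mean", mean)],
    [1.0, None, 3.0],
)
```

The log is a very small stand-in for process preservation: it records which steps ran, in what order, and what each one produced, so the analysis can be inspected or repeated later.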

  19. Community tools have been successful • Investigator Toolkit will build upon these successes • Adapt tools to work together with the Service Interface • Support Free and Open Source Software • The set of supported tools will grow over time

  20. DataONE discovery portals • Data discovery portal at Coordinating Nodes • Workflow discovery portal at Coordinating Nodes • Other portals as needed

  21. Outstanding issues • Data Discovery, Access, and Availability • Federated Identity, Authentication, and Access Control • Metadata and data standards • Evolution of specifications • Data Integration and Interoperability • Data and Metadata preservation, longevity, and migration • Versioning and identifiers • Scalability

  22. NIH (Not Invented Here) Syndrome • Lots of: • metadata catalogs and specifications • data standards • service definitions • architectures and protocols • Many communities of practice • GEOSS, KNB, CUAHSI, NBII, GBIF, TDWG, Ameriflux, EOS, OGC, W3C, LTER, NEON, OOI • and on and on and on... • DataONE cannot just be Community n+1 • Easy to get entrained in the details • Have to save people work • Have to engage groups early and earnestly

  23. I am here • [Diagram labels: NCEAS, GEOSS, LTER, SONet, GBIF, KNB, OGC, Kepler, TDWG, W3C, EOS, DataONE, ME] • Where are you?
