
knb.ecoinformatics seek.ecoinformatics

http://knb.ecoinformatics.org http://seek.ecoinformatics.org. Science Environment for Ecological Knowledge: EcoGrid Matthew B. Jones National Center for Ecological Analysis and Synthesis University of California Santa Barbara.


Presentation Transcript


  1. http://knb.ecoinformatics.org http://seek.ecoinformatics.org Science Environment for Ecological Knowledge: EcoGrid Matthew B. Jones National Center for Ecological Analysis and Synthesis University of California Santa Barbara

  2. Science Environment for Ecological Knowledge: Research Objectives
     • Access to ecological, environmental, and biodiversity data
     • Enable data sharing & re-use
     • Enhance data discovery at global scales
     • Scalable analysis and synthesis
     • Taxonomic, Spatial, Temporal, Conceptual integration of data
     • Address data heterogeneity issues
     • Enable communication and collaboration for analysis
     • Enable re-use of analytical components
     • Collaborators
       • NCEAS, UNM, SDSC, U Kansas
       • Vermont, Napier, ASU, UNC

  3. SEEK Components (Science Environment for Ecological Knowledge)
     • Kepler
       • Modeling scientific workflows
     • EcoGrid
       • Making diverse environmental data systems interoperate
     • Semantic Mediation System
       • "Smart" data discovery and integration
     • Knowledge Representation WG
     • Taxon WG
     • BEAM WG
     • Education, Outreach, Training

  4. Scientific Workflows
     • Model the way scientists work with their data now
     • Mentally coordinate export and import of data among software systems
     • Workflows emphasize data flow
     • Output generation includes creating appropriate metadata
     • The analysis workflow itself becomes metadata
     • The workflow describes the data lineage as it has been transformed
     • Derived data sets can be stored in EcoGrid with provenance
     Diagram callouts: "Query EcoGrid to find data"; "Archive output to EcoGrid with workflow metadata"

  5. Kepler: scientific workflows • Collaborative effort of SEEK, SciDAC/SDM, GEON, Ptolemy Project

  6. Kepler understands EML data

  7. Kepler: molecular biology example

  8. SEEK EcoGrid
     • Goal: allow diverse environmental data systems to interoperate
     • Hides complexity of underlying systems using lightweight interfaces
     • We have standardized data via EML, need standard APIs
     • Integrate diverse data networks from ecology, biodiversity, and environmental sciences
     • Data systems
       • Any system can implement these interfaces
       • Prototyping using: Metacat, SRB, DiGIR, Xanthoria, etc.
     • Supports multiple metadata standards
       • EML, Darwin Core as foci

  9. EcoGrid client interactions
     • Modes of interaction
       • Client-server
       • Fully distributed
       • Peer-to-peer
     • EcoGrid Registry
       • Node discovery
       • Service discovery
     • Aggregation services
       • Centralized access
       • Reliability
       • Data preservation

  10. EcoGrid Query Interfaces (diagram labels: Result, Query)
     • Provides a mechanism for search and retrieval of metadata and federated data
     • Supports third party interaction with search results – forwarding of result set identifiers to another service instance for retrieval
     • Different levels of compliance
     • Low barrier for participation
     • Bulk of data will be accessible through Type I

  11. Query Interfaces Implemented
     • Initial prototype to support query and retrieval from:
       • Storage Resource Broker (SRB)
       • Metacat
       • Distributed Generic Information Retrieval (DiGIR)
       • Xanthoria
     • Encourage additional experimentation with and feedback based on other system implementations

  12. EcoGrid Query Level I (diagram labels: Result, Query)
     • Basic, entry level exposure of data and metadata for EcoGrid and SEEK
     • Response contains data – intended for direct communications rather than 3rd party indirection
     Interface:
       ResultsetType query(SessionID, QueryType)
       byte[] get(SessionID, objectID)
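
To make the two Level I calls concrete, here is a minimal in-memory sketch. It is not the SEEK implementation: the class name `InMemoryLevelOneNode`, the concept/value form of the query, and the sample records are all assumptions made for illustration. The only things taken from the slide are the two operations, `query` and `get`, and the idea that the query response carries the records directly.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryLevelOneNode:
    """Toy node (hypothetical): metadata records plus the objects they describe."""

    # objectID -> metadata fields (layout assumed for this sketch)
    metadata: dict = field(default_factory=dict)
    # objectID -> raw bytes of the data object
    objects: dict = field(default_factory=dict)

    def query(self, session_id: str, concept: str, value: str) -> list:
        """Stand-in for query(SessionID, QueryType): the response holds the records."""
        results = []
        for object_id, fields in self.metadata.items():
            if fields.get(concept) == value:
                results.append({"objectID": object_id, **fields})
        return results

    def get(self, session_id: str, object_id: str) -> bytes:
        """Stand-in for get(SessionID, objectID): return the object as bytes."""
        return self.objects[object_id]


node = InMemoryLevelOneNode(
    metadata={"mvz1": {"Genus": "Peromyscus"}, "mvz2": {"Genus": "Mus"}},
    objects={"mvz1": b"lat,lon\n", "mvz2": b""},
)
hits = node.query("session-1", "Genus", "Peromyscus")
data = node.get("session-1", hits[0]["objectID"])
```

The client talks to the node directly in both calls, which matches the slide's point that Level I is meant for direct communication rather than third-party indirection.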

  13. Query Conditions (diagram label: Query)
     • Language independent representation of a query structure
     • Transformed into the appropriate native language of the data store
     Example:
       <AND>
         <condition operator="LIKE" concept="ScientificName">peromyscus%</condition>
         <condition operator="NOT EQUALS" concept="DecimalLatitude">NULL</condition>
       </AND>
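
As an illustration of "transformed into the appropriate native language of the data store", the sketch below turns the slide's condition tree into a parameterized SQL WHERE fragment. The operator table, the `to_sql` helper, and the treatment of `NULL` as SQL `IS NULL` / `IS NOT NULL` are assumptions for this example; the slides do not say how any particular backend performs the translation.

```python
import xml.etree.ElementTree as ET

# Assumed mapping from the slide's operator names to SQL operators.
SQL_OPERATORS = {"LIKE": "LIKE", "EQUALS": "=", "NOT EQUALS": "<>"}


def to_sql(node: ET.Element, params: list) -> str:
    """Recursively translate <AND>/<OR>/<condition> into a WHERE fragment."""
    if node.tag in ("AND", "OR"):
        parts = [to_sql(child, params) for child in node]
        return "(" + f" {node.tag} ".join(parts) + ")"
    if node.tag != "condition":
        raise ValueError(f"unsupported element: {node.tag}")
    concept = node.attrib["concept"]
    if not concept.isidentifier():
        # Concept names become column names, so reject anything unusual.
        raise ValueError(f"unsafe concept name: {concept}")
    operator = node.attrib["operator"]
    value = (node.text or "").strip()
    if value == "NULL":
        # SQL tests for NULL with IS / IS NOT rather than = / <>.
        if operator == "EQUALS":
            return f"{concept} IS NULL"
        if operator == "NOT EQUALS":
            return f"{concept} IS NOT NULL"
        raise ValueError(f"operator {operator} cannot be used with NULL")
    params.append(value)  # bound later as a query parameter
    return f"{concept} {SQL_OPERATORS[operator]} ?"


QUERY = """
<AND>
  <condition operator="LIKE" concept="ScientificName">peromyscus%</condition>
  <condition operator="NOT EQUALS" concept="DecimalLatitude">NULL</condition>
</AND>
"""

params = []
where = to_sql(ET.fromstring(QUERY.strip()), params)
# where  -> "(ScientificName LIKE ? AND DecimalLatitude IS NOT NULL)"
# params -> ["peromyscus%"]
```

Keeping values in `params` rather than splicing them into the string is a design choice of this sketch; a different data store (for example an XML database) would need its own translator over the same condition tree.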

  14. Specifying the Resultset (diagram label: Query)
     • Specify the list of concepts (fields) to be returned in the resultset
     • Simple paths used to identify elements or document subtrees
     • Effectively flattens the structure of the records, but allows generic representation
     Example:
       <returnfield>/ScientificName</returnfield>
       <returnfield>/Longitude</returnfield>
       <returnfield>/Latitude</returnfield>
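
The flattening described above can be sketched as follows. The nested record, its `Location` wrapper element, the coordinate values, and the `flatten` helper are hypothetical; they only illustrate how a list of simple paths can turn a structured record into a flat row.

```python
import xml.etree.ElementTree as ET

# Hypothetical nested record; real EcoGrid records may be shaped differently.
RECORD = """
<record>
  <ScientificName>Peromyscus leucopus</ScientificName>
  <Location>
    <Longitude>-119.8</Longitude>
    <Latitude>34.4</Latitude>
  </Location>
</record>
"""


def flatten(record: ET.Element, return_fields: list) -> dict:
    """Resolve each simple path against the record and collect a flat row."""
    row = {}
    for path in return_fields:
        # "/Location/Latitude" is taken relative to the record root.
        element = record.find(path.lstrip("/"))
        row[path] = element.text if element is not None else None
    return row


row = flatten(
    ET.fromstring(RECORD.strip()),
    ["/ScientificName", "/Location/Longitude", "/Location/Latitude", "/Elevation"],
)
# A path with no matching element ("/Elevation" here) maps to None.
```

Because every row is just a mapping from path to text, records from differently structured sources can share one generic resultset representation, at the cost of losing the original nesting.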

  15. Full Query Example (diagram label: Query)
       <egq:query queryId="query-digir.1.1" system="http://knb.ecoinformatics.org"
           xmlns:egq="ecogrid://ecoinformatics.org/ecogrid-query-1.0.0beta1"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:schemaLocation="ecogrid://ecoinformatics.org/ecogrid-query-1.0.0beta1 ../../src/xsd/query.xsd">
         <namespace prefix="darwin">http://digir.net/schema/conceptual/darwin/2003/1.0</namespace>
         <returnfield>/ScientificName</returnfield>
         <returnfield>/Longitude</returnfield>
         <returnfield>/Latitude</returnfield>
         <title>Peromyscus genus query</title>
         <condition operator="LIKE" concept="Genus">Peromyscus</condition>
       </egq:query>

  16. Query Result Set Structure (diagram label: Result)
       <rs:resultset resultsetId="foo.1.1" system="urn:not://sure/what/to/put/here"
           xmlns:rs="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:schemaLocation="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1 ../../src/xsd/resultset.xsd">
         <resultsetMetadata>
           <sendTime>2003-05-02T16:45:50-09:00</sendTime>
           <startRecord>1</startRecord>
           <endRecord>2</endRecord>
           <recordCount>2</recordCount>
           <namespace>http://digir.net/schema/conceptual/darwin/2003/1.0</namespace>
           <system id="1">http://speciesanalyst.net/digir/DiGIR.php?resource=MammalsDwC2</system>
         </resultsetMetadata>
         <record number="1" system="1" identifier="mvz1">
           <returnField name="ScientificName">PEROMYSCUS LEUCOPUS NOVEBORACENSIS</returnField>
           <returnField name="Longitude">100</returnField>
           <returnField name="Latitude">200</returnField>
         </record>
         …
       </rs:resultset>
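
A client consuming this structure might read it roughly as below. The document used here is a trimmed copy of the slide's example (the elided records and most metadata are simply left out), and the `parse_resultset` helper and the row layout it returns are assumptions of this sketch, not part of the EcoGrid schema.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the slide's resultset; only one record is included.
RESULTSET = """
<rs:resultset resultsetId="foo.1.1"
    xmlns:rs="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0beta1">
  <resultsetMetadata>
    <system id="1">http://speciesanalyst.net/digir/DiGIR.php?resource=MammalsDwC2</system>
  </resultsetMetadata>
  <record number="1" system="1" identifier="mvz1">
    <returnField name="ScientificName">PEROMYSCUS LEUCOPUS NOVEBORACENSIS</returnField>
    <returnField name="Longitude">100</returnField>
    <returnField name="Latitude">200</returnField>
  </record>
</rs:resultset>
"""


def parse_resultset(xml_text: str) -> list:
    """Turn each <record> into a flat dict, resolving its source system URL."""
    root = ET.fromstring(xml_text.strip())
    # Only the root element is namespace-qualified; its children are not.
    systems = {
        s.attrib["id"]: s.text for s in root.iterfind("resultsetMetadata/system")
    }
    rows = []
    for record in root.iterfind("record"):
        row = {
            "identifier": record.attrib["identifier"],
            "source": systems[record.attrib["system"]],
        }
        for return_field in record.iterfind("returnField"):
            row[return_field.attrib["name"]] = return_field.text
        rows.append(row)
    return rows


rows = parse_resultset(RESULTSET)
```

The `system` attribute on each record points back at a `<system id="…">` entry in the metadata, which is how a federated result can say which source each record came from.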

  17. EcoGrid Query Level II
     • More detailed handling of results
     • Uses RSIDs to identify resultsets: handles that can be passed to a third party
     Interface:
       RSID search(SessionID, query)
       Resultset retrieve(SessionID, RSID, start, numrecs)
       query decodeResultsetIdentifier(SessionID, RSID)
       statusinfo getResultStatus(SessionID)
       int transfer(SessionID, sourceURL, destURL, ObjectID)
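
The handle-passing idea behind Level II can be sketched with a toy in-memory node. Everything beyond the three method names is assumed: the class `InMemoryLevelTwoNode`, substring matching as the "query", random hex strings as RSIDs, and 1-based `start` (chosen to mirror `startRecord` in the resultset example). `getResultStatus` and `transfer` are omitted because the slide gives no detail on their behavior.

```python
import uuid


class InMemoryLevelTwoNode:
    """Toy node (hypothetical) that hands out resultset identifiers (RSIDs)."""

    def __init__(self, records: list):
        self._records = records
        self._resultsets = {}  # RSID -> (original query, matching records)

    def search(self, session_id: str, query: str) -> str:
        """Run the query, keep the matches server-side, return only a handle."""
        matches = [r for r in self._records if query.lower() in r.lower()]
        rsid = uuid.uuid4().hex
        self._resultsets[rsid] = (query, matches)
        return rsid

    def retrieve(self, session_id: str, rsid: str, start: int, numrecs: int) -> list:
        """Return one page of a stored resultset; start is assumed 1-based."""
        _, matches = self._resultsets[rsid]
        return matches[start - 1 : start - 1 + numrecs]

    def decode_resultset_identifier(self, session_id: str, rsid: str) -> str:
        """Recover the query that produced a resultset."""
        return self._resultsets[rsid][0]


node = InMemoryLevelTwoNode(
    ["Peromyscus leucopus", "Peromyscus maniculatus", "Mus musculus"]
)
rsid = node.search("session-alice", "peromyscus")
# A different party that was handed the RSID fetches the second record.
page = node.retrieve("session-bob", rsid, start=2, numrecs=1)
original_query = node.decode_resultset_identifier("session-bob", rsid)
```

Because `search` returns only the handle, the searching client never has to hold the data itself; it can forward the RSID and let another service pull the records.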

  18. EcoGrid Write
     • Used to push data back to sources (e.g. publishing EML documents)
     • Depends on the availability of an authentication and access control system
     Interface:
       put(sessionID, objectID, object, type)
       delete(sessionID, objectID)
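
The dependency on authentication and access control can be shown with a small sketch. The session table, the owner-only write policy, the class name `WritableNode`, and the sample identifier `eml.1.1` are assumptions for illustration; the slide names only `put` and `delete` and does not describe the actual access control model.

```python
class WritableNode:
    """Toy writable node (hypothetical) with an owner-only write policy."""

    def __init__(self, sessions: dict):
        self._sessions = sessions  # sessionID -> user; stand-in for authentication
        self._owners = {}          # objectID -> user who first stored it
        self._objects = {}         # objectID -> (type, bytes)

    def _user(self, session_id: str) -> str:
        if session_id not in self._sessions:
            raise PermissionError("unknown session")
        return self._sessions[session_id]

    def put(self, session_id: str, object_id: str, obj: bytes, type_: str) -> None:
        user = self._user(session_id)
        # The first writer becomes the owner; later writers must match.
        owner = self._owners.setdefault(object_id, user)
        if owner != user:
            raise PermissionError(f"{user} may not overwrite {object_id}")
        self._objects[object_id] = (type_, obj)

    def delete(self, session_id: str, object_id: str) -> None:
        user = self._user(session_id)
        if object_id not in self._objects:
            raise KeyError(object_id)
        if self._owners[object_id] != user:
            raise PermissionError(f"{user} may not delete {object_id}")
        del self._objects[object_id]
        del self._owners[object_id]

    def exists(self, object_id: str) -> bool:
        return object_id in self._objects


node = WritableNode({"s-alice": "alice", "s-bob": "bob"})
node.put("s-alice", "eml.1.1", b"<eml/>", "EML")

try:
    node.delete("s-bob", "eml.1.1")  # bob does not own the document
    denied = False
except PermissionError:
    denied = True

still_there = node.exists("eml.1.1")
node.delete("s-alice", "eml.1.1")    # the owner may delete it
```

Without the session lookup and ownership check, any client could overwrite or remove published documents, which is why the slide ties the write interface to an access control system.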

  19. Data Instance Query
     • New requirement to support direct query and retrieval with arbitrary data sets
     • Generally no common schemas between different instances
     • Could either
       • Push data instance to service that can query object (e.g. the SRB)
       • Implement interface at the data instance location
     • Simple JDBC / SQL interface?
     Interface:
       dbSchema getDataSchema(sessionID, objectID)
       dbResultset search(sessionID, objectID, SQL)
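
One way to picture the "simple JDBC / SQL interface" is a service that loads each data instance into its own SQL database, reports its schema, and runs queries against it. The sketch below uses Python's built-in `sqlite3` module in place of JDBC; the `DataInstanceService` class, its `load` helper, and the sample table are hypothetical and exist only to make the two proposed calls tangible.

```python
import sqlite3


class DataInstanceService:
    """Toy service (hypothetical): one in-memory SQLite database per data object."""

    def __init__(self):
        self._instances = {}  # objectID -> sqlite3.Connection

    def load(self, object_id: str, table: str, columns: list, rows: list) -> None:
        """Helper for this sketch: register a data instance as a table."""
        conn = sqlite3.connect(":memory:")
        conn.execute(f"CREATE TABLE {table} ({', '.join(columns)})")
        placeholders = ", ".join("?" for _ in columns)
        conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
        self._instances[object_id] = conn

    def get_data_schema(self, session_id: str, object_id: str) -> dict:
        """Stand-in for getDataSchema: map each table to its column names."""
        conn = self._instances[object_id]
        tables = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'"
        ).fetchall()
        schema = {}
        for (table,) in tables:
            columns = conn.execute(f"PRAGMA table_info({table})").fetchall()
            schema[table] = [column[1] for column in columns]  # index 1 is the name
        return schema

    def search(self, session_id: str, object_id: str, sql: str) -> list:
        """Stand-in for search: run the SQL against that one data instance."""
        return self._instances[object_id].execute(sql).fetchall()


service = DataInstanceService()
service.load(
    "plots.1.1",
    "plots",
    ["site TEXT", "abundance INTEGER"],
    [("A", 3), ("B", 12)],
)
schema = service.get_data_schema("session-1", "plots.1.1")
result = service.search(
    "session-1", "plots.1.1", "SELECT site, abundance FROM plots WHERE abundance > 5"
)
```

Since instances generally share no common schema, the client calls `get_data_schema` first and only then writes SQL for that specific object. A real service would also need to restrict the SQL it accepts (for example to read-only statements), which this sketch does not attempt.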

  20. Building the EcoGrid
     Networks shown in the diagram (number of sites in parentheses):
     • LTER Network (24); sites labeled: LUQ, AND, HBR, VCR, NTL
     • Natural History Collections (>> 100)
     • Organization of Biological Field Stations (180)
     • UC Natural Reserve System (36)
     • Partnership for Interdisciplinary Studies of Coastal Oceans (4)
     • Multi-agency Rocky Intertidal Network (60)
     Diagram legend: Metacat node, SRB node, VegBank node, DiGIR node, Xanthoria node, Legacy system

  21. Metadata-driven analysis cycle

  22. Acknowledgements This material is based upon work supported by: The National Science Foundation under Grant Numbers 9980154, 9904777, 0131178, 9905838, 0129792, and 0225676. The National Center for Ecological Analysis and Synthesis, a Center funded by NSF (Grant Number 0072909), the University of California, and the UC Santa Barbara campus. The Andrew W. Mellon Foundation. PBI Collaborators: NCEAS, University of New Mexico (Long Term Ecological Research Network Office), San Diego Supercomputer Center, University of Kansas (Center for Biodiversity Research)
