SEEK EcoGrid • Integrate diverse data networks from ecology, biodiversity, and environmental sciences • Metacat, DiGIR, SRB, Xanthoria, ... • EML is the core for data documentation • Open programming interface
Aims of EcoGrid • Which, Where, How, Who ???? • Share Data and Information • Relate Data from multiple projects/groups • Crosswalks across data structures • Develop Eco-related Finding Aids for Data • Global User: Authenticate and Authorize • Provide an infrastructure for “Archivable Collection-building” for SEEK scientists • Facilitate the A&M layer and the SMS layer
Challenges of EcoGrid • Data & User Diversity • 6000+ datasets & 1500+ scientists • themes, methods, units, structures • Small data sizes but high complexity (metadata) • Multiple Data Organizations • Biodiversity Surveys • Population data • GIS, Satellite Images, Weather Data, … • Ontologies & Taxonomies • Data Discovery: No single place to find data • Data Entropy – rapid decline of information on data • Autonomy with Centralized access • Leverage Computational Grid work
Existing services • Metacat – syntactic and semantic metadata querying/inserting/updating/deleting, user registration/authentication, data replication, data/metadata versioning; supports any XML-based metadata • Xanthoria – common-schema mediator (currently 8 sites); metadata query/insert/update/delete for any XML schema, mapped to the underlying metadatabase (SQL, native XML)
Existing Systems • DiGIR – querying arbitrary XML-describable resources (underlying data sources can be any type: RDB, XMLDB) • ClimDB – integrating diverse-format climate data (using wrapping at the data source). Access through the web; common schema identified beforehand; tabular description • HyperLTER – summary ontology stored as metadata for images; image extraction, geographic subsetting, band-level subsetting; integration with MODIS images, hyperspectral images, TM images, air photos, …
Existing Systems • VegBank – 3 databases: co-occurrence records, a concept-driven species taxonomic database, and community classification. Distributed VegBank, querying by plots. Query/insert/update/annotate across three diverse databases that are described using XML • SRB – access to distributed data; querying based on syntactic, semantic, and user-defined (arbitrary relational) metadata. Annotations for data. Operations on data. Extraction of metadata. Ingest, bulk ingest, delete, update of data/metadata
EcoGrid Services • Query • Search metadata and data, return result sets with ID • Read • Retrieve data objects by ID • Authentication • Verify user identity • Authorization • Record allowable interactions • Write • Write data objects by ID • Replication • Mirror objects for backup and efficiency • Computation • Execute models and simulations from AMS on various nodes
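The Query, Read, and Write services listed above can be pictured as one small programming interface. The following is only a rough Python sketch: the class names (`EcoGridNode`, `InMemoryNode`), the method signatures, and the substring-matching rule are all assumptions made for illustration, since the slides say the real interfaces are to be defined in WSDL and do not spell them out.

```python
from abc import ABC, abstractmethod


class EcoGridNode(ABC):
    """Hypothetical interface mirroring the Query, Read, and Write services."""

    @abstractmethod
    def query(self, term: str) -> list[str]:
        """Search metadata and return the IDs of matching objects."""

    @abstractmethod
    def read(self, object_id: str) -> bytes:
        """Retrieve a data object by ID."""

    @abstractmethod
    def write(self, object_id: str, data: bytes, metadata: str) -> None:
        """Store a data object under the given ID."""


class InMemoryNode(EcoGridNode):
    """Toy node that keeps objects in a dict; stands in for a real wrapper."""

    def __init__(self) -> None:
        # object ID -> (data, metadata text)
        self._objects: dict[str, tuple[bytes, str]] = {}

    def query(self, term: str) -> list[str]:
        # Assumption: a case-insensitive substring match on the metadata text.
        needle = term.lower()
        return sorted(
            oid for oid, (_, meta) in self._objects.items()
            if needle in meta.lower()
        )

    def read(self, object_id: str) -> bytes:
        return self._objects[object_id][0]

    def write(self, object_id: str, data: bytes, metadata: str) -> None:
        self._objects[object_id] = (data, metadata)


node = InMemoryNode()
node.write("bar.1.23", b"depth,ph\n10,6.5\n", "Soil data from West Valley, 1983")
ids = node.query("soil")
```

A wrapper for an existing system such as Metacat or DiGIR would, under this assumption, implement the same three methods by delegating to that system.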
EcoGrid Search Interactions • Features • Well-defined interfaces (e.g., WSDL) • Standardized messaging formats • Automated discovery of implementing services • Aggregation/Indexing across nodes for efficiency • Support heterogeneous data objects via metadata descriptions • Lightweight to implement for various systems like DiGIR and Metacat • [Diagram: a Registry, a Client, and several QueryService nodes. Labeled steps: 1. Register; 2. Find Query Nodes; 3. Search (recursive)]
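The three steps named in the search diagram (register, find query nodes, recursive search) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the EcoGrid design: the `Registry` and `QueryService` classes, their method names, the node names, and the idea that a node forwards a search to child nodes are all hypothetical.

```python
class QueryService:
    """Hypothetical query node that may forward searches to child nodes."""

    def __init__(self, name, records, children=()):
        self.name = name
        self.records = dict(records)    # object ID -> title
        self.children = list(children)

    def search(self, term, _seen=None):
        """Return matching IDs from this node and, recursively, its children."""
        seen = set() if _seen is None else _seen
        if self.name in seen:           # guard against cycles between nodes
            return []
        seen.add(self.name)
        hits = [oid for oid, title in self.records.items()
                if term.lower() in title.lower()]
        for child in self.children:
            hits.extend(child.search(term, seen))
        return hits


class Registry:
    """Hypothetical registry: nodes register, clients look them up."""

    def __init__(self):
        self._nodes = []

    def register(self, node):
        self._nodes.append(node)

    def find_query_nodes(self):
        return list(self._nodes)


# Step 1: nodes register themselves.
leaf = QueryService("digir-1", {"d.1": "Bird presence points"})
top = QueryService("metacat-1", {"m.1": "Soil data, West Valley"}, [leaf])
other = QueryService("srb-1", {"s.1": "Soil moisture layers"})

registry = Registry()
registry.register(top)
registry.register(other)

# Steps 2 and 3: the client finds query nodes, then searches each one.
results = []
for node in registry.find_query_nodes():
    results.extend(node.search("soil"))
```

The `_seen` set is one simple way to keep a recursive search from looping if nodes reference each other; the slides do not say how the real services would handle this.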
EcoGrid Index Interactions • [Diagram: a Registry, a Client, several QueryService nodes, and an IndexedQueryService. Labeled steps: 1. Register; 2. Find Query Nodes; 3. Search (recursive); 4. Read (recursive); 5. Find Index Nodes; 6. Search]
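One way to read the index interactions is that an indexing node reads records from query nodes ahead of time and then answers searches from its own local index. The Python sketch below illustrates that reading only; the class names, the `list_ids` helper, and the word-level inverted index are assumptions, and the slides do not describe how the IndexedQueryService is actually built.

```python
import re
from collections import defaultdict


class QueryService:
    """Hypothetical query node exposing its records by ID."""

    def __init__(self, records):
        self.records = dict(records)    # object ID -> metadata text

    def list_ids(self):
        return list(self.records)

    def read(self, object_id):
        return self.records[object_id]


class IndexedQueryService:
    """Hypothetical index node: harvests records once, then searches locally."""

    def __init__(self):
        self._index = defaultdict(set)  # word -> set of object IDs

    def harvest(self, nodes):
        # Read every record from every node and index its words.
        for node in nodes:
            for object_id in node.list_ids():
                text = node.read(object_id)
                for word in re.findall(r"\w+", text.lower()):
                    self._index[word].add(object_id)

    def search(self, word):
        # Answer from the local index without contacting the query nodes.
        return sorted(self._index.get(word.lower(), ()))


nodes = [
    QueryService({"m.1": "Soil data, West Valley"}),
    QueryService({"s.1": "Soil moisture layers", "d.1": "Bird presence points"}),
]
index = IndexedQueryService()
index.harvest(nodes)
soil_ids = index.search("Soil")
```

The trade-off in this sketch is the usual one for an aggregated index: searches avoid a fan-out to every node, but results are only as fresh as the last harvest.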
Authentication and Authorization • KNB uses simple LDAP system with referrals • Leverages existing DB (e.g., LTER personnel DB) • Not really scalable in terms of administration • Grid Security Infrastructure (GSI) • Certificate-based authentication • Proxy certificates allow transfer of rights • Decentralized administration (i.e., multiple CAs) • Can we easily transition to GSI?
Native range prediction workflow (slide from D. Pennington) • [Workflow diagram] • Inputs retrieved via EcoGrid Query: KNB, abundance data (a1); DiGIR, species presence & absence points (a2); SRB, environmental layers (b) • Processing steps shown: Layer Integration, Sample, Data Calculation, Map, Validation • Intermediate and output products: integrated layers (native range) (c); training sample and test sample (d); GARP rule set (e); native range prediction map (f); model quality parameter (g) • Other labels in the diagram: User, Archive, +A1, +A2, +A3
Implementation • Short-term • Define common WSDL services • Simple service registry • Wrappers for Metacat, DiGIR, SRB, Xanthoria, etc. • Medium-term • Use OGSI-compliant interfaces • (add methods to current WSDL) • Grid Registry service
Timing • April 4 • April 11 -- Design Diagrams • April 18 -- WSDL, Registry instance operational, query + read, RSIDS schema and examples • April 25 • May 2 • May 9 -- Wrapper implementations + test client(s) • May 16 (SEEK Technical WG meeting) • May 23 • May 30 -- Hard deadline for implementation of EcoGrid alpha 1
Query Messages
<egq:query queryId="test.1.1" system="test"
    xmlns:egq="ecogrid://ecoinformatics.org/ecogrid-query-1.0.0alpha1">
  <namespace prefix="eml" space="eml://ecoinformatics.org/eml-2.0.0"/>
  <title>Soils metadata query</title>
  <AND>
    <OR>
      <condition operator="LIKE" concept="eml:title">%soil%</condition>
      <condition operator="LIKE" concept="eml:title">%dirt%</condition>
    </OR>
    <OR>
      <condition operator="LIKE" concept="eml:surName">%Jones%</condition>
      <condition operator="LIKE" concept="eml:surName">%Vieglais%</condition>
    </OR>
  </AND>
</egq:query>
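To make the meaning of the query message above concrete, here is a small Python sketch that parses it and evaluates the AND/OR/condition tree against one record. It is an illustration, not EcoGrid code: the helper names (`like`, `matches`), the record layout (a dict from concept name to a list of values), and the choice to treat `LIKE` as case-insensitive with `%` as the only wildcard are all assumptions.

```python
import re
import xml.etree.ElementTree as ET

QUERY = """<egq:query queryId="test.1.1" system="test"
    xmlns:egq="ecogrid://ecoinformatics.org/ecogrid-query-1.0.0alpha1">
  <namespace prefix="eml" space="eml://ecoinformatics.org/eml-2.0.0"/>
  <title>Soils metadata query</title>
  <AND>
    <OR>
      <condition operator="LIKE" concept="eml:title">%soil%</condition>
      <condition operator="LIKE" concept="eml:title">%dirt%</condition>
    </OR>
    <OR>
      <condition operator="LIKE" concept="eml:surName">%Jones%</condition>
      <condition operator="LIKE" concept="eml:surName">%Vieglais%</condition>
    </OR>
  </AND>
</egq:query>"""


def like(pattern, value):
    """SQL-style LIKE: % matches any run of characters (assumed case-insensitive)."""
    regex = ".*".join(re.escape(part) for part in pattern.split("%"))
    return re.fullmatch(regex, value, flags=re.IGNORECASE | re.DOTALL) is not None


def matches(node, record):
    """Evaluate an AND / OR / condition element against a record.

    `record` maps a concept name to a list of values (hypothetical layout).
    """
    if node.tag == "AND":
        return all(matches(child, record) for child in node)
    if node.tag == "OR":
        return any(matches(child, record) for child in node)
    if node.tag == "condition":
        if node.get("operator") != "LIKE":
            raise ValueError("unsupported operator: %s" % node.get("operator"))
        values = record.get(node.get("concept"), [])
        return any(like(node.text or "", value) for value in values)
    raise ValueError("unexpected element: %s" % node.tag)


root = ET.fromstring(QUERY)
top = root.find("AND")      # child elements carry no namespace in this message

record = {
    "eml:title": ["Soil data from West Valley, 1983"],
    "eml:surName": ["Jones", "Smith"],
}
is_match = matches(top, record)
```

Read this way, the sample query selects records whose title contains "soil" or "dirt" and that have a creator surname containing "Jones" or "Vieglais".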
Result responses
<rs:resultset resultsetId="foo.1.1" system="http://knb.ecoinformatics.org/knb/"
    xmlns:rs='ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0alpha1'>
  <resultsetMetadata>
    <sendTime>2003-05-02T16:45:50-09:00</sendTime>
    <recordCount>86</recordCount>
  </resultsetMetadata>
  <records startRecord="1" endRecord="1"
      xmlns:eml='eml://ecoinformatics.org/eml-2.0.0'>
    <record number="1" identifier="bar.1.23">
      <eml:eml packageId="bar.1.23">
        <title>Soil data from West Valley, 1983</title>
        <creator>
          <individualName><surName>Jones</surName></individualName>
        </creator>
        <creator>
          <individualName><surName>Smith</surName></individualName>
        </creator>
        <keywordSet>
          <keyword>aves</keyword>
          <keyword>ornithology</keyword>
          <keyword>biodiversity</keyword>
        </keywordSet>
      </eml:eml>
    </record>
  </records>
</rs:resultset>
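A client receiving the result set above would typically pull out the record count and a short summary per record. The Python sketch below shows one way to do that with the standard library's `xml.etree.ElementTree`; the embedded message is abbreviated from the slide (the keyword set is left out), and the shape of the `summaries` list is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the result set shown on the slide (keywordSet omitted).
RESULTSET = """<rs:resultset resultsetId="foo.1.1"
    system="http://knb.ecoinformatics.org/knb/"
    xmlns:rs="ecogrid://ecoinformatics.org/ecogrid-resultset-1.0.0alpha1">
  <resultsetMetadata>
    <sendTime>2003-05-02T16:45:50-09:00</sendTime>
    <recordCount>86</recordCount>
  </resultsetMetadata>
  <records startRecord="1" endRecord="1"
      xmlns:eml="eml://ecoinformatics.org/eml-2.0.0">
    <record number="1" identifier="bar.1.23">
      <eml:eml packageId="bar.1.23">
        <title>Soil data from West Valley, 1983</title>
        <creator>
          <individualName><surName>Jones</surName></individualName>
        </creator>
        <creator>
          <individualName><surName>Smith</surName></individualName>
        </creator>
      </eml:eml>
    </record>
  </records>
</rs:resultset>"""

# Only the eml:eml wrapper is namespace-qualified; its children are not.
NS = {"eml": "eml://ecoinformatics.org/eml-2.0.0"}

root = ET.fromstring(RESULTSET)
record_count = int(root.findtext("resultsetMetadata/recordCount"))

summaries = []
for record in root.iterfind("records/record"):
    package = record.find("eml:eml", NS)
    summaries.append({
        "id": record.get("identifier"),
        "title": package.findtext("title"),
        "creators": [el.text for el in
                     package.iterfind("creator/individualName/surName")],
    })
```

Note that `recordCount` (86) is the total number of matches, while the `records` element carries only the slice from `startRecord` to `endRecord`, so a client would presumably page through the rest with further requests.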