This presentation explores data management infrastructure, service-oriented data grids, data integration capabilities, and data analysis infrastructure for GTL. It also discusses model management and knowledge representation technologies, computational facilities, and leveraging related efforts. The presentation concludes with a discussion on abstraction and elaboration mechanisms.
Data R&D Issues for GTL
Bertram Ludäscher
ludaesch@sdsc.edu
Data and Knowledge Systems
San Diego Supercomputer Center
University of California, San Diego
Data R&D Issues for GTL
• GTL data management infrastructure
  • Service-oriented Data Grids for
    • Seamless data sharing (volume, distribution, access restrictions, …)
    • Capabilities for data integration (mediators/warehouses), digital library functions, knowledge-based (“semantic”) extensions (e.g. ontologies), and archival capabilities
• Data analysis and knowledge-enabling infrastructure
  • Analytical Pipelines (“Scientific Workflows”)
    • Rapid design and prototyping, handling of complex data & task semantics, large volume, sci. workflow as a first-class product, validation, execution, monitoring, sharing, archiving
    • How to go from a scientist’s abstract (conceptual) workflow to a data grid execution plan?
  • New Model Management and Knowledge Representation Technologies:
    • Closing the gap between data management (DBMS’s, data grids), knowledge-based systems (desktop-oriented, rule-based systems), and analysis and modeling systems
    • Mapping between numerous formalisms at the syntactic, structural, and semantic level (terminological, process-semantics, …)
    • “Gluing” together models and formalisms across different levels: from genes to proteins to molecular machines to microbial communities… (compare: pnp transistors, boolean circuits, assembly language, high-level PLs, declarative QLs, …)
    • Abstraction & elaboration mechanisms
  • Data exploration and hypothesis generation tools (KNOW-ME, SKIDL, SEEK AMS, …)
• Computational facilities
  • Use of high-end networked facilities a la TeraGrid
• Opportunities (and challenges!) in leveraging related efforts:
  • NIH BIRN, …, NSF Cyberinfrastructure (ITRs GEON, GriPhyN, SCEC, SEEK, …), UK e-Science, …
  • Standardization (OGSA, KR/Semantic Web technologies, e.g., ontology languages (OWL), inference mechanisms, …), scientific workflow standards, …
  • Interoperable, open source tools
  • One size/standard fits all? Probably not: data-intensive vs computation-intensive vs “semantics-intensive” (capturing implicit domain knowledge, hidden assumptions, …)
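The question of how to get from an abstract workflow to an execution plan can be illustrated with a minimal sketch. It assumes the conceptual workflow is a dependency graph of steps and that each step has already been bound to some resource; the step names and resource labels below are hypothetical, and real data grid planning would also have to deal with data placement, access restrictions, and failures.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical abstract workflow: each step maps to the steps it depends on.
abstract_workflow = {
    "fetch_sequences": set(),
    "fetch_annotations": set(),
    "align": {"fetch_sequences"},
    "integrate": {"align", "fetch_annotations"},
    "archive_results": {"integrate"},
}

# Hypothetical binding of conceptual steps to grid resources.
resource_binding = {
    "fetch_sequences": "storage-node-a",
    "fetch_annotations": "storage-node-b",
    "align": "compute-cluster",
    "integrate": "mediator-service",
    "archive_results": "archive-service",
}


def execution_plan(workflow, binding):
    """Group steps into stages; steps in one stage do not depend on each other."""
    sorter = TopologicalSorter(workflow)
    sorter.prepare()  # raises graphlib.CycleError if the workflow has a cycle
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        stages.append([(step, binding[step]) for step in ready])
        sorter.done(*ready)
    return stages


plan = execution_plan(abstract_workflow, resource_binding)
for number, stage in enumerate(plan):
    print(number, stage)
```

Steps that land in the same stage could in principle run in parallel on their bound resources; the sketch only derives the ordering and does not execute anything.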
Up & Down: Abstraction & Elaboration Mechanisms
• How to punch through the technology barriers?
  • Data Grids
  • vs Digital Libraries
  • vs DBMS’s
  • vs Knowledge-Based Analysis & Modeling Systems
[Figure layers: Knowledge Mgmt, Information Mgmt, Data Management]
Biomedical Informatics Research Network (http://nbirn.net)
Getting Formal: Source Contextualization & Ontology Refinement in Logic
GeoSciences Network
Scientific Data Integration... Questions to Queries ...
• What are the distribution and U/Pb zircon ages of A-type plutons in VA?
• How about their 3-D geometry?
• How does it relate to host rock structures?
“Complex Multiple-Worlds” Mediation over:
• GeoPhysical (gravity contours)
• Geologic Map (Virginia)
• GeoChronologic (Concordia)
• Foliation Map (structure DB)
• GeoChemical
[Figure layers: domain knowledge; Information Integration; Knowledge Representation: ontologies, concept spaces; Database mediation; Data modeling; raw data]
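A toy sketch of the "questions to queries" idea: a mediator maps each source's local attribute names onto one global schema, merges records that describe the same rock body, and answers a question against the merged view. Everything here is an assumption for illustration: the source names, attribute names, mappings, and values are hypothetical placeholders, not GEON's schemas or real measurements.

```python
# Hypothetical source records with source-specific attribute names.
# Ages are placeholder values, not measurements.
SOURCES = {
    "geochronologic": [
        {"sample": "body-1", "age_ma": 100.0},
        {"sample": "body-2", "age_ma": 200.0},
    ],
    "geologic_map": [
        {"unit_id": "body-1", "rock": "A-type pluton", "state": "VA"},
        {"unit_id": "body-2", "rock": "gneiss", "state": "VA"},
    ],
}

# Hypothetical mappings from local attribute names to the global schema.
MAPPINGS = {
    "geochronologic": {"sample": "body_id", "age_ma": "age_ma"},
    "geologic_map": {"unit_id": "body_id", "rock": "rock_type", "state": "state"},
}


def to_global(source_name, record):
    """Rename a record's attributes from the local to the global schema."""
    mapping = MAPPINGS[source_name]
    return {mapping[key]: value for key, value in record.items()}


def mediated_view(sources):
    """Merge records from all sources that share the same global body_id."""
    merged = {}
    for source_name, records in sources.items():
        for record in records:
            row = to_global(source_name, record)
            merged.setdefault(row["body_id"], {}).update(row)
    return list(merged.values())


def ages_of(view, rock_type, state):
    """Answer: which bodies of this rock type in this state have an age?"""
    return {
        row["body_id"]: row["age_ma"]
        for row in view
        if row.get("rock_type") == rock_type
        and row.get("state") == state
        and "age_ma" in row
    }


view = mediated_view(SOURCES)
print(ages_of(view, "A-type pluton", "VA"))
```

This sketch joins on a shared identifier only; the mediation named on the slide would additionally rely on ontologies to relate concepts that do not match by name.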
Geologic Map Integration: Geo & IT/CS meet
[Figure: geologic map (Nevada) integrated via domain knowledge and knowledge representation (AGE ONTOLOGY), +/- a few hundred million years]
GEON Metamorphism Equation: Geoscientists + Computer Scientists, +/- Energy, Igneous Geoinformaticists
SEEK Project Overview
• Large collaborative NSF/ITR project: UNM, UCSB, UCSD (SDSC), UKansas,..
• “Analysis & Modeling System” to design, execute, reproduce/refine scientific workflows in the ecology and biodiversity domains.
• EcoGrid provides unified access to Distributed Data Stores, Parameter Ontologies, & Stored Analyses, and runtime capabilities via the Execution Environment
• Semantic Mediation System & Analysis and Modeling System use EcoGrid web services, enabling analytically driven data discovery and integration
• SEEK is the combination of EcoGrid data resources and information services, coupled with advanced semantic and modeling capabilities
[Figure labels: AM: Analysis and Modeling System (Execution Environment; SAS, MATLAB, FORTRAN, etc.; example of “AP0” Analytical Pipeline (AP) with steps TS1, TS2, ASx, ASy, ASz, ASr; WSDL; Parameters w/ Semantics; Data Binding; Library of Analysis Steps, Pipelines & Results). SMS: Semantic Mediation System (Semantic Mediation Engine; Logic Rules; Query Processing; Parameter Ontologies; ECO2, ECO2-CL, TaxOn). EcoGrid (raw data sets wrapped for integration w/ EML, etc.; KNB, SRB, Species, ...; Dar, MC, Wrp). Example data set: invasive species over time.]
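One way to read "Parameters w/ Semantics" is that each analysis step declares the concept it consumes and the concept it produces, so a pipeline can be checked against a parameter ontology before it runs. The sketch below assumes a simple is-a hierarchy; the concept names and the check itself are hypothetical and are not taken from SEEK. Only the step labels ASx, ASy, ASz come from the figure.

```python
from dataclasses import dataclass

# Hypothetical parameter ontology: child concept -> parent concept (is-a).
IS_A = {
    "InvasiveSpeciesCount": "SpeciesCount",
    "SpeciesCount": "Measurement",
    "Temperature": "Measurement",
}


def is_subconcept(concept, ancestor):
    """True if concept equals ancestor or reaches it by following is-a links."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = IS_A.get(concept)
    return False


@dataclass(frozen=True)
class AnalysisStep:
    name: str
    consumes: str  # concept required on input
    produces: str  # concept delivered on output


def check_pipeline(steps):
    """Return one message per adjacent pair whose semantic types do not fit."""
    errors = []
    for upstream, downstream in zip(steps, steps[1:]):
        if not is_subconcept(upstream.produces, downstream.consumes):
            errors.append(
                f"{upstream.name} -> {downstream.name}: "
                f"{upstream.produces} is not a {downstream.consumes}"
            )
    return errors


# Step labels follow the figure; their semantic types are made up.
as_x = AnalysisStep("ASx", consumes="Measurement", produces="InvasiveSpeciesCount")
as_y = AnalysisStep("ASy", consumes="SpeciesCount", produces="Measurement")
as_z = AnalysisStep("ASz", consumes="Temperature", produces="Measurement")

print(check_pipeline([as_x, as_y]))        # compatible pair
print(check_pipeline([as_x, as_y, as_z]))  # ASy -> ASz does not fit
```

The check accepts a more specific concept where a more general one is expected, which is the usual subsumption direction; whether SEEK's mediation applies the same rule is not stated in the slides.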