1 / 52

e-Science and Grid The VL-e approach

e-Science and Grid The VL-e approach. L.O. (Bob) Hertzberger Computer Architecture and Parallel Systems Group Department of Computer Science Universiteit van Amsterdam bob@science.uva.nl. Content. Developments in Grid Developments in e-Science Objectives Virtual Lab for e-Science

Download Presentation

e-Science and Grid The VL-e approach

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. e-Science and GridThe VL-e approach L.O. (Bob) Hertzberger Computer Architecture and Parallel Systems GroupDepartment of Computer ScienceUniversiteit van Amsterdam bob@science.uva.nl

  2. Content • Developments in Grid • Developments in e-Science • Objectives • Virtual Lab for e-Science • Research philosophy • Conclusions

  3. Background ICTpush developments • Processing power doubles every 18 month • Memory size doubles every 12 month • Network speed doubles every 9 month • Something has to be done to harness this development • Virtualization of ICT resources • Internet • Web • Grid

  4. Internal versus external bandwidth Mbit/s Computer busses networks

  5. Web& Grid & Web/Grid Services • Web services is a paradigm/way of using/accessing information • Web resources are mostly human-centered (understood by humans, computers can read but can’t understand) • Grid is about accessing & sharing computing resources by virtualization • Data & Information repositories • Experimental facilities • OGSA: Service oriented Grid standard based on Web services

  6. Web & Grid & Semantic Web/Grid • Semantic Web resources should also be understandable by computers • This way, many complex tasks formulated and assigned by humans can be automated and executed by agents working on the Semantic Web • But, both Grid and Web Services only focus on single services. • Semantic Web/Grid should be able to describe a single service as well as the relationship between services e.g. aggregate service using a number of services so making knowledge explicit

  7. Levels of Grid abstraction Knowledge Web/Grid Information Web/Grid Data Grid Computational Grid

  8. Background informationexperimental sciences • There is a tendency to look ever deeper in: • Matter e.g. Physics • Universe e.g. Astronomy • Life e.g. Life sciences • Therefore experiments become increasingly more complex • Instrumental consequences are increase in detector: • Resolution & sensitivity • Automation & robotization • Results for instance in life science in: • So called high throughput methods • Omics experimentation • genome ===> genomics

  9. DNA Genomics RNA Transcriptomics protein Proteomics metabolites Metabolomics New technologies in Life Sciences research Methodology/ Technology cell University of Amsterdam

  10. Paradigm shift in Life science • Past experiments where hypothesis driven • Evaluate hypothesis • Complement existing knowledge • Present experiments are data driven • Discover knowledge from large amounts of data • Apply statistical techniques

  11. Background informationexperimental sciences • Experiments become increasingly more complex • Driven by detector developments • Resolution increases • Automation & robotization increases • Results in an increase in amount and complexity of data

  12. The Application data crisis • Scientific experiments start to generate lots of data • medical imaging (fMRI): ~ 1 GByte per measurement (day) • Bio-informatics queries: 500 GByte per database • Satellite world imagery: ~ 5 TByte/year • Current particle physics: 1 PByte per year • LHC physics (2007): 10-30 PByte per year • Data is often very distributed

  13. Background informationexperimental sciences • Experiments become increasingly more complex • Driven by detector developments • Resolution increases • Automation & robotization increases • Results in an increase in amount and complexity of data • Something has to be done to harness this development • Virtualization of experimental resources: e-Science

  14. The what of e-Science • e-Science is the application domain “Science” of Grid & Web • More thanonly coping with data explosion • A multi-disciplinary activity combining human expertise & knowledge between: • A particular domain scientist • ICT scientist • e-Science demands a different approach to experimentation becausecomputer is integrated part of experiment • Consequence is a radical change in design for experimentation • e-Science should apply and integrate Web/Grid methods where and whenever possible

  15. GT1 GT2 OGSI Started far apart in apps & tech Have been converging WSRF WSDL 2, WSDM WSDL, WS-* HTTP Grid and Web ServicesConvergence Grid Web Definition of Web Service Resource Framework(WSRF) makes explicit distinction between “service” and stateful entities acting upon service i.e. the resources Means that Grid and Web communities can move forward on a common base!!! Ref: Foster

  16. e-Science Objectives • It should enhance the scientific process by: • Stimulating collaboration by sharing data & information • Result is re-use of data & information

  17. The data sharing potential for Cognition • Collaborative scientific research • Information sharing • Metadata modeling • Allows for experiment validation • Independent confirmation of results • Statistical methodologies • Access to large collections of data and metadata • Training • Train the next generation using peer reviewed publications and the associated data

  18. Electron tomography data pipeline Acquisition Alignment ReconstructionSegmentation Interpretation

  19. Cell Centred Database • NCIMR (San Diego) • Maryann Martone and Mark Ellisman

  20. e-Science Objectives • It should enhance the scientific process by: • Stimulating collaborationby sharing data & information • Improve re-use of data & information • Combing data and information from different modalities • Sensor data & information fusion

  21. e-Science Objectives • It should enhance the scientific process by: • Stimulating collaborationby sharing data & information • Improve re-use of data & information • Combing data and information from different modalities • Sensor data & information fusion • Realize the combination of real life & (model based) simulation experiments

  22. Grid resources Simulated Vascular Reconstructionin a Virtual Operating Theatre • An example of the combination of real life & • (model based) simulation experiments • patient specific vascular geometry (from CT) • Segmentation • blood flow simulation (Latice Bolzmann) • Pre-operative planning (interaction) • Suitable for parallelization throughfunctional decomposition Patient’s vascular geometry(CTA) Simulated “Fem-Fem” bypass

  23. e-Science Objectives • It should enhance the scientific process by: • Stimulating collaborationby sharing data & information • Improve re-use of data & information • Combing data and information from different modalities • Sensor data & information fusion • Realize the combination of real life & (model based) simulation experiments • Modeling of dynamic systems

  24. Bird behaviour in relation to weather and landscape RADAR Calibration and Data assimilation Predictions and on-line warnings Bird distributions Ensembles Dynamic bird behaviour MODELS

  25. e-Science Objectives • It should result in : • Computer aided support for rapid prototyping of ideas • Stimulate the creativity process • It should realize that by creating & applying: • New ICT methodologies and a computing infrastructure stimulating this • From this ICT point of view it should support the following application steps: • Design • Development & realization • Execution • Analysis & interpretation • We try to realize e-Science and their applications via the Virtual Lab for e-Science (VL-e) project

  26. Virtual Lab for e-Science research Philosophy • Multidisciplinary research & the development of related ICT infrastructure • Generic application support • Application cases are drivers for computer & computational science and engineering research

  27. Grid Services Harness multi-domain distributed resources VL-e project Data Insive Science/ LOFAR Bio- Informatics Medical Diagnosis & Imaging Data intensive sciences Bio- Diversity Food Informatics Dutch Telescience VL-e Application Oriented Services Management of comm. & computing

  28. Two sides of Bioinformaticsas an e-Science • The scientific responsibility to develop the underlying computational concepts and models to convert complex biological data into useful biological and chemical knowledge • Technological responsibility to manage and integrate huge amounts of heterogeneous data sources from high throughput experimentation

  29. DNA Genomics Data generation/validation Data integration/fusion Data usage/user interfacing RNA Transcriptomics protein Proteomics metabolites Metabolomics Integrative/System Biology Role of bioinformatics bioinformatics methodology cell

  30. Virtual Lab for e-Science research Philosophy • Multidisciplinary research and development of related ICT infrastructure • Generic application support • Application cases are drivers for computer & computational science and engineering research • Problem solving partly generic and partly specific • Re-use of components via generic solutions whenever possible

  31. Application pull Grid/ Web Services Harness multi-domain distributed resources Application Specific Part Application Specific Part Application Specific Part Potential Generic part Potential Generic part Potential Generic part Management of comm. & computing Virtual Laboratory Application Oriented Services Management of comm. & computing Management of comm. & computing

  32. Generic e-Science aspects • Virtual Reality Visualization & user interfaces • Modeling & Simulation • Interactive Problem Solving • Data & information management • Data modeling • dynamic work flow management • Content (knowledge) management • Semantic aspects • Meta data modeling • Ontologies • Wrapper technology • Design for Experimentation

  33. Virtual Lab for e-Science research Philosophy • Multidisciplinary research and development of related ICT infrastructure • Generic application support • Application cases are drivers for computer & computational science and engineering research • Problem solving partly generic and partly specific • Re-use of components via generic solutions whenever possible • Rationalization of experimental process • Reproducible & comparable

  34. parameters/settings, algorithms, intermediate results, … software packages, algorithms … Parameter settings, Calibrations, Protocols … raw data processed data presentation acquisition processing sensors,amplifiers imaging devices,, … conversion, filtering,analyses, simulation, … visualization, animationinteractive exploration, … Rationalization of the experiment and processes via protocols Metadata Issues for a reproducible scientific experiment experiment interpretation Much of this is lost when an experiment is completed.

  35. Step1: designing an experiment Step2: performing the experiment Step3: analyzing the experiment results success Scientific experiments & e-Science • For complex experiments: • contain complex processes • require interdisciplinary expertise • need large scale resource Grid & high level support

  36. Components in a VL-e experiment • Process-Flow Templates(PFT) • Derived from ontologies • Graphical representation of data elements and processing steps in an experimental procedure • Information to support context-sensitive assistance (semantics) • Study • Descriptions of experimental steps represented as aninstance of a PFT with references to experiment topologies Experiment Topology • Graphical representation of self-contained data • processing modules attached to each other in a workflow

  37. e-Science environment Step2: performing the experiment Step3: analyzing the experiment results success Step1: designing an experiment Scientific Workflow Management Systems Taverna, Kepler, and Triana: model processes for computing tasks. Process Flow Template Process Flow Template PFT in VLAMG PFT instances in VLAMG • In VL-e: model both computing tasks and human activity based processes, and model them from the perspective of an entire lifecycle. It tries to support • Collaboration in different stages • Information sharing • Reuse of experiment

  38. Definition of experiment protocols Workflow definitions Recreate complex experiments into process flows Workflow execution Maintain control over the experiment Data processing execution Topologies of data processing modules Interpretation Visualization of processed results to help intuition Ontology definitions can help in obtaining a well-structured definition of experiment, data and metadata.

  39. Virtual Lab for e-Science research Philosophy • Multidisciplinary research and development of related ICT infrastructure • Generic application support • Application cases are drivers for computer & computational science and engineering research • Problem solving partlygeneric and partly specific • Re-use of components via generic solutions whenever possible • Rationalization of experimental process • Reproducible & comparable • Two research experimentation environments • Proof of concept for application experimentation • Rapid prototyping for computer & computational science experimentation

  40. The VL-e infrastructure Application specific service Medical Application Telescience Bio Informatics Applications Application Potential Generic service & Virtual Lab. services Virtual Lab. rapid prototyping (interactive simulation) Test & Cert. VL-software Virtual Laboratory Additional Grid Services (OGSA services) Test & Cert. Grid Middleware Grid Middleware Grid & Network Services Network Service (lambda networking) Test & Cert. Compatibility Surfnet VL-e Experimental Environment VL-e Certification Environment VL-e Proof of Concept Environment

  41. Infrastructure for Applications • Applications are a driving force of the PoC • Experience shows applications value stability • Foster two-way interaction to make this happen

  42. e-Science environment Step2: performing the experiment Step3: analyzing the experiment results success Step1: designing an experiment Research activity to be developed in the Rapid Prototyping environment Stable developments to be used in VL-e in the Proof of Concept environment Research activity to be developed in the Rapid Prototyping environment Scientific Workflow Management Systems PFT PFT Taverna, Kepler, Triana and VLAM. SWMS

  43. VL-e PoC environment • Latest certified stable software environment of core grid and VL-e services • Core infrastructure built around clusters and storage at SARA and NIKHEF (‘production’ quality) • Controlled extension to other platforms and distributions • On the user end: install needed servers: user interface systems, storage elements for data disclosure, grid-secured DB access • Focus on stability and scalability

  44. Hosted services for VL-e • Key services and resources are offered centrally for all applications in VL-e • Mass data and number crunching on the large resources at SARA • Storage for data replication & distribution • Persistent ‘strategic’ storage on tape • Resource brokers, resource discovery, user group management

  45. Why such a complex scheme? • “software is part of the infrastructure” • stability of core software needed to develop the new scientific applications • enable distributed systems management (who runs what version when?) “the grid is one big error amplifier” “computers make mistakes like humans, only much, much faster”

  46. What did we learn • It is not enough to just transport current applications to Grid or e-Science infrastructures • To fully exploit the potential of e-Science infrastructures one has to learn what is possible • Therefore the full lifecycle of an experiment has to be taken into account • Workflow management is a first step • We add semantic information via Process Flow Templates • Application innovation such as for instance bio-banking should be the aim • Grids should be transparent for the end-user

More Related