Kepler Scientific Workflows: Current and Future Development
Ilkay ALTINTAS
Lab Director, Scientific Workflow Automation Technologies
San Diego Supercomputer Center, UCSD
Scientific Workflow Systems
• Combination of
  • data integration, analysis, and visualization steps
  • automated “scientific process”
• Mission of scientific workflow systems
  • Promote “scientific discovery” by providing tools and methods to generate scientific workflows
  • Create an extensible and customizable graphical user interface for scientists from different scientific domains
  • Support computational experiment creation, execution, sharing, reuse and provenance
  • Design frameworks which define efficient ways to connect to the existing data and integrate heterogeneous data from multiple resources
Kepler is a Scientific Workflow System
www.kepler-project.org
• … and a cross-project collaboration
• 3rd Beta release (Jan 8, 2007)
• Builds upon the open-source Ptolemy II framework
Ptolemy II: A laboratory for investigating design
KEPLER: A problem-solving environment for Scientific Workflow
KEPLER = “Ptolemy II + X” for Scientific Workflows
Kepler use cases represent many science domains!
• Ecology
  • SEEK: Ecological Niche Modeling and climate change
  • REAP: Modeling parasite invasions in grasslands using sensor networks
  • NEON: Ecological sensor networks
  • COMET: Environmental science
• Geosciences
  • GEON: LiDAR data processing, Geological data integration
  • NEESit: Earthquake engineering
• Molecular biology
  • SDM: Gene promoter identification and ScalaBLAST
  • ChIP-chip: Genome-scale research
  • CAMERA: Metagenomics
• Oceanography
  • REAP: SST data processing
  • LOOKING/OOI CI: ocean observing CI
  • ROADNet: real-time data modeling and analysis
  • Ocean Life project
• Phylogenetics
  • ATOL: Processing Phylodata
  • CiPRES: Phylogenetic tools
• Chemistry
  • Resurgence: Computational chemistry
  • DART/ARCHER: X-Ray crystallography
• Library science
  • DIGARCH: Digital preservation
  • UK Text Mining Center: Cheshire feature and archival
• Conservation biology
  • SanParks: Thresholds of Potential Concerns
• Physics
  • SDM: astrophysics TSI-1 and TSI-2
  • CPES: Plasma fusion simulation
  • ITER-EU: ITM fusion workflows
Kepler today is a research prototype and a production workflow tool!
Some of the current R&D
• Distributed execution of workflow parts (peer to peer)
• Efficient data transfer
• Provenance tracking of data and processes
• Tracking workflow evolution
• Streaming data analysis
• Easy-to-deploy batch interfaces
• Intuitive workflow design
• Customizable semantic typing
• Interoperability with other workflow and analytical environments (at exec level)
Production workflow examples
• GEON LiDAR workflow (GLW)
  • 116 registered, 106 active users
  • 2076 submitted jobs to date
• Center for Plasma Edge Simulation Code-Coupling Workflow (CPES-CCW)
  • 2000 actors, 5 levels of model hierarchy
  • Longest run duration: 3 hours
• PtII AirForce Lab Model
  • 12920 actors, 65331 attributes
  • Longest run duration: 10 minutes
• Longest-running real-time simple monitoring model in PtII: months at a time
All generated using the GUI and executed in batch mode… No coding and text manipulation
REAP: Realtime Environment for Analytical Processing
reap.ecoinformatics.org
• Management and Analysis of Observatory Data using Kepler Scientific Workflows
• The vision:
  • An integrated environment for analyzing data from observatories
• Funded 2006-2009
  • NSF CEO:P
  • Jones (PI), Altintas, Baru, Ludaescher, Schildhauer
• Partners:
  • NCEAS/UCSB (Lead), SDSC/UCSD, UCDavis, CENS/UCLA, OpenDAP, OSU
• Two scientific use cases:
  • Terrestrial ecology
  • Oceanography
REAP Views
• For scientists
  • capabilities for designing and executing complex analytical models over near real-time and archived data sources
• For data-grid engineers
  • monitoring and management capabilities of underlying sensor networks
• For outside users
  • access to observatory data and results of models, approachable to non-scientists
REAP: Terrestrial Ecology Use Case
Workflows to develop and test models exploring the impacts of abiotic factors (real-time light, temperature, and rainfall measurements) on the dynamics of plant host populations and their susceptibility to viral pathogens.
REAP: RBNB Streaming Data Actor
• Example data from the Terrestrial Use Case
• Hardware: a Campbell Scientific CR800 datalogger with eight attached sensors, operating on a workbench
REAP: Oceanographic Use Case
Facilitate the quantitative evaluation of SST data sets.
Kepler/C.O.R.E.
kepler-project.org
• SDCI NMI Improvement: Development of Kepler/CORE – A Comprehensive, Open, Reliable, and Extensible Scientific Workflow Infrastructure
• The vision:
  • Coordinate development of a comprehensive, open, reliable and extensible Kepler scientific workflow infrastructure
• Funded 2007-2010
  • NSF SDCI
  • Ludaescher (PI), Altintas, Bowers, Jones, McPhillips, Schildhauer
• Partners:
  • Genome Center/UCDavis (Lead), SDSC/UCSD, NCEAS/UCSB
Builds on community participation as a driving force for Kepler.
Kepler/C.O.R.E.
• Comprehensive
  • First-class support for technical features
• Open
  • Well-designed and clearly articulated mechanisms and interfaces provided to facilitate developing extensions
• Reliable
  • Both as a development platform and as a run-time environment for the user
• Extensible
  • Independently extensible by groups not directly collaborating with the team
Directors in Kepler
• Means to execute networks of components under multiple execution models
  • Dataflow (SDF, PN, DDF) vs. time-based (CT) vs. event-based (DE) vs. all combined
• Makes use of the separation of concerns principle
  • e.g., component execution, workflow execution and provenance tracking
• The manager acts like a “common execution environment”
  • governing different concerns related to execution of the network and services
Ptolemy and Kepler are unique in combining different execution models in heterogeneous models!
Execution models shown: Dataflow, Time Triggered, Synchronous/reactive model, Discrete Event, Wireless, Process Networks, Rendezvous, Publish and Subscribe, Continuous Time, Finite State Machines
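The separation between components and directors can be illustrated with a minimal Python sketch. This is not the Kepler or Ptolemy II API: the class names `Actor`, `DataflowDirector`, and `DiscreteEventDirector`, the two-actor workflow, and the fixed `delay` are all hypothetical, and the dataflow director here is a plain FIFO scheduler rather than a real SDF, PN, or DDF implementation. It only shows the idea that the same network of actors can be run under different execution models by swapping the director.

```python
from collections import deque
import heapq


class Actor:
    """A component: consumes one input token, produces one output token."""

    def __init__(self, name, func):
        self.name = name
        self.func = func
        self.downstream = []  # actors that receive this actor's output

    def connect(self, other):
        self.downstream.append(other)
        return other

    def fire(self, token):
        return self.func(token)


class DataflowDirector:
    """Hypothetical dataflow-style director: fires actors in FIFO token order."""

    def run(self, source, tokens):
        log = []
        queue = deque((source, t) for t in tokens)
        while queue:
            actor, token = queue.popleft()
            result = actor.fire(token)
            log.append((actor.name, result))
            queue.extend((nxt, result) for nxt in actor.downstream)
        return log


class DiscreteEventDirector:
    """Hypothetical event-based director: fires actors in timestamp order."""

    def __init__(self, delay=1.0):
        self.delay = delay  # assumed fixed latency added by every firing

    def run(self, source, timed_tokens):
        log = []
        # Each event is (time, sequence number, actor, token).
        # The unique sequence number breaks ties between equal timestamps.
        events = [(t, i, source, tok) for i, (t, tok) in enumerate(timed_tokens)]
        heapq.heapify(events)
        seq = len(events)
        while events:
            time, _, actor, token = heapq.heappop(events)
            result = actor.fire(token)
            log.append((time, actor.name, result))
            for nxt in actor.downstream:
                heapq.heappush(events, (time + self.delay, seq, nxt, result))
                seq += 1
        return log


def build_workflow():
    """Build the same two-actor network for either director."""
    scale = Actor("scale", lambda x: x * 2)
    offset = Actor("offset", lambda x: x + 1)
    scale.connect(offset)
    return scale


dataflow_log = DataflowDirector().run(build_workflow(), [1, 2])
event_log = DiscreteEventDirector(delay=1.0).run(
    build_workflow(), [(0.0, 1), (0.5, 2)]
)
```

In this sketch the actors never know which director is running them. Only the scheduling policy and the shape of the execution log change, which is the separation of concerns the slide describes.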
Credits
• Kepler community and colleagues
• On REAP and Kepler/CORE:
  • Shawn Bowers, Bertram Ludaescher, Timothy McPhillips, Genome Center, UCD
  • Matt Jones, Derik Barseghian, Mark Schildhauer, NCEAS, UCSB
  • Eric Seabloom, OSU
  • Peter Cornillion, OpenDAP
Questions…
Ilkay Altintas
altintas@sdsc.edu
+1 (858) 822-5453
http://www.sdsc.edu