60 likes | 157 Views
Applications and Requirements for Scientific Workflow. May 1 2006 NSF Geoffrey Fox Indiana University. Team Members. Geoffrey Fox, Indiana University (lead) Mark Ellisman, UCSD Constantinos Evangelinos, Massachusetts Institute of Technology Alexander Gray, Georgia Tech
E N D
Applications and Requirementsfor Scientific Workflow May 1 2006NSF Geoffrey Fox Indiana University
Team Members • Geoffrey Fox, Indiana University (lead) • Mark Ellisman, UCSD • Constantinos Evangelinos, Massachusetts Institute of Technology • Alexander Gray, Georgia Tech • Walt Scacchi, University of California, Irvine • Ashish Sharma, Ohio State University • Alex Szalay, John Hopkins University
The Application Drivers • Workflow is underlying support for new science model • Distributed interdisciplinary data deluged scientific methodology as an end (instrument, conjecture) to end (paper, Nobel prize) process is a transformative approach • Provide CS support for this scientific revolution • This emerging model for science • Spans all NSF directorates • Astronomy (multi-wavelength VO), Biology (Genomics/Proteomics), Chemistry (Drug Discovery), Environmental Science (multi-sensor monitors as in NEON), Engineering (NEES, multi-disciplinary design), Geoscience (Ocean/Weather/Earth(quake) data assimilation), Medicine (multi-modal/instrument imaging), Physics (LHC, Material design), Social science (Critical Infrastructure simulations for DHS) etc.
What has changed? • Exponential growth in Compute(18), Sensors(18?), Data storage(12), Network(8) (doubling time in months); performance variable in practice (e.g. last mile for networks) • Data deluge (ignored largely in grand challenges, HPCC 1990-2000) • Algorithms (simulation, data analysis) comparable additional improvements • Science is becoming intrinsically interdisciplinary • Distributed scientists and distributed shared data (not uniform in all fields) • Establishes distributed data deluged scientific methodology • We recommend computer science workflow research to enable transformative interdisciplinary science to fully realize this promise
Application Requirements I • Reproducibility core to scientific method and requires rich provenance, interoperable persistent repositories with linkage of open data and publication as well as distributed simulations, data analysis and new algorithms. • Distributed Science Methodology publishes all steps in a new electronic logbook capturing scientific process (data analysis) as a rich cloud of resources including emails, PPT, Wikis as well as databases, compiler options, build time/runtime configuration… • Need to separate wheat from chaff in implicit electronic record (logbook) keeping only that required to make process reproducible; need to be able to electronically reference steps in process; • Traditional workflow including BPEL/Kepler/Pegasus/Taverna only describes a part of this • Abstract model of logbook becomes a high level executablemeta-workflow • Multiple collaborativeheterogeneous interdisciplinary approaches to all aspects of the distributed science methodology inevitable; need research on integration of this diversity • Need to maximize innovation (diversity) preserving reproducibility
Application Requirements II • Interdisciplinary science requires that we federate ontologies and metadata standards coping with their inevitable inconsistencies and even absence • Support for curation, data validation and “scrubbing” in algorithms and provenance; • QoS; reputation and trust systems for data providers • Multiple “ibilities” (security, reliability, usability, scalability) • As we scale size and richness of data and algorithms, need a scalable methodology that hides complexity (compatible with number of scientists increasing slowly); must be simple and validatable • Automate efficient provisioning, deployment and provenance generation of complex simulations and data analysis; support deployment and interoperable specification of user’s abstract workflow; support interactive user • Support automated and innovative individual contributions to core “black boxes” (produced by “marine corps” for “common case”) and for general user’s actions such as choice and annotation