Supporting Science Through Workflows: Infrastructure, Architecture and Modeling
David Woollard
NASA Jet Propulsion Laboratory
University of Southern California
Agenda
• Motivation
• Classification of in silico Experimentation
• Research Problem
• Related Work
• Introduction to Workflow Systems
• Research Goals
• Methodology
• Refactoring existing software
• Domain Specific Software Architecture
• Evaluation
• Conclusions & Future Work
D.M. Woollard. Supporting Science Through Workflows.
Motivation
• The nature of scientific investigation has changed.
• Two major trend lines:
  • For many scientists, computer simulation has replaced in vivo and in vitro science.
  • Collaborations are growing (system-of-systems science).
• New discoveries in materials science, chemistry, physics, planetary science, and even the social sciences are being made via in silico experimentation.
in silico Experimentation
[Figure: the Discovery, Production, and Distribution phases of in silico experimentation, with the labels Theory, Development, Execution, and Practice; Discovery is annotated "Lone Researcher" [Kepner 03]]
• Discovery is a phase in which a scientist rapidly prototypes, tests hypotheses, and develops a methodology.
in silico Experimentation
• Production is the engineering effort of replicating an experiment on large volumes of data.
• We will focus on production systems in this talk.
in silico Experimentation
• Distribution is a phase in which data is dispersed to peers for review and further experimentation, including:
  • Papers
  • Federated data
  • Digital libraries
The Role of Technology
• In silico science, especially system-of-systems science, is facilitated by the Grid.
“The sharing that we are concerned with is not primarily file exchange but rather direct access to computers, software, data, and other resources, as is required by a range of collaborative problem-solving and resource-brokering strategies emerging in industry, science, and engineering.” (The Anatomy of the Grid, 2001)
Research Problem
• Scientists harness complex hardware and software systems to conduct scientific research in silico.
• Once algorithms and processes are established, production systems are created to produce large volumes of data.
• Designing a production system is a complex engineering task as well as a complex scientific task. Meeting these production requirements forces either scientists to engineer a production system themselves or software engineers to rewrite scientific code. Both options are inefficient and costly.
Introduction to Workflows
[Figure: layered view of Grid Systems, Workflows, and Production Systems, with an example workflow of tasks T0 through T4]
• Grid systems have traditionally focused on creating Virtual Organizations.
• In grids, workflows orchestrate processing tasks in production systems.
• Workflows are a processing model that incorporates actors, tasks, data, and rules.
• Workflow management systems execute a task on its data once the task's dependencies, as defined by the rules, are satisfied.
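The dependency-driven execution rule above can be sketched in a few lines. This is a minimal, hypothetical Python sketch. It is not the API of OODT or of any other workflow system named in this talk. The `Task` class, the `run_workflow` function, and the dependency edges among T0 through T4 are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class Task:
    """A hypothetical workflow task: a name, the tasks it depends on, and its code."""
    name: str
    deps: Tuple[str, ...]
    fn: Callable[..., Any]


def run_workflow(tasks: List[Task]) -> Tuple[Dict[str, Any], List[str]]:
    """Run each task once all of its dependencies have produced results.

    Returns the per-task results and the order in which the tasks ran.
    """
    pending = {t.name: t for t in tasks}
    results: Dict[str, Any] = {}
    order: List[str] = []
    while pending:
        # The rule: a task is ready when every dependency already has a result.
        ready = [t for t in pending.values() if all(d in results for d in t.deps)]
        if not ready:
            raise ValueError("unsatisfiable dependencies: " + ", ".join(sorted(pending)))
        for task in ready:
            # Pass dependency outputs to the task in the order the deps are listed.
            results[task.name] = task.fn(*(results[d] for d in task.deps))
            order.append(task.name)
            del pending[task.name]
    return results, order


# Hypothetical five-task workflow: T0 fans out to T1 and T2, which join at T3, then T4.
workflow = [
    Task("T0", (), lambda: [1, 2, 3, 4]),
    Task("T1", ("T0",), lambda xs: [2 * x for x in xs]),
    Task("T2", ("T0",), lambda xs: sum(xs)),
    Task("T3", ("T1", "T2"), lambda doubled, total: sum(doubled) / total),
    Task("T4", ("T3",), lambda ratio: f"ratio={ratio}"),
]

if __name__ == "__main__":
    results, order = run_workflow(workflow)
    print(order)          # ['T0', 'T1', 'T2', 'T3', 'T4']
    print(results["T4"])  # ratio=2.0
```

A cycle among tasks leaves no task ready, so the sketch raises `ValueError` rather than looping forever. Real workflow managers add scheduling, retries, and resource placement on top of this basic readiness rule.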
Workflow System Model
[Figure: workflow system model]
Workflows, Workflows Everywhere
• Karajan, Wings, ICENI, Askalon, Gridbus, VDS, Taverna, Unicore, Condor-G, GrADS, Grid Workflow, Pegasus, Kepler, Yawl, Triana, DAG-Man, GridAnt, GridFlow, SciFlow, OODT
Bottom-up Taxonomy
• Yu & Buyya presented a taxonomy of workflow systems [Yu & Buyya 05].
  • It is based on workflow properties such as model representation and scheduling policy.
  • It illustrates the divergence in the field.
• No existing taxonomy classifies workflow systems by their interface to task code.
Insights from an Architect
• Each production workflow task is a complex software application with two primary stakeholders: the scientist and the engineer.
• A software architecture is a system's blueprint: its elements, form, and rationale [Perry & Wolf, 92].
• An architecture provides appropriate views for each stakeholder. It encapsulates computation and communication in components and connectors, which are arranged in a topology.
• Reifying architectural elements in code is one way to bridge the gap between design and implementation. First-class connectors and explicit interfaces are examples of such reification.
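To make "first-class connectors and explicit interfaces" concrete, here is a small Python sketch that assumes a simple topic-based routing scheme. The `Component`, `Connector`, `Scale`, and `Total` names are hypothetical. This is not Prism-MW's API or the SWSA implementation; it only shows what it means for a connector to be an object in its own right.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class Component(ABC):
    """Explicit interface: a component receives data only through handle()."""

    @abstractmethod
    def handle(self, payload: Any) -> Any:
        ...


class Connector:
    """A first-class connector: a runtime object that owns communication.

    Components never hold references to each other. The connector routes
    payloads by topic, so interaction can be inspected or changed in one place.
    """

    def __init__(self) -> None:
        self._routes: Dict[str, List[Component]] = {}
        self.log: List[str] = []  # traffic record kept by the connector, not the components

    def attach(self, topic: str, component: Component) -> None:
        self._routes.setdefault(topic, []).append(component)

    def publish(self, topic: str, payload: Any) -> List[Any]:
        self.log.append(topic)
        return [c.handle(payload) for c in self._routes.get(topic, [])]


class Scale(Component):
    """Hypothetical science component: multiplies every sample by a factor."""

    def __init__(self, factor: float) -> None:
        self.factor = factor

    def handle(self, payload: Any) -> Any:
        return [self.factor * x for x in payload]


class Total(Component):
    """Hypothetical science component: sums the samples."""

    def handle(self, payload: Any) -> Any:
        return sum(payload)


if __name__ == "__main__":
    bus = Connector()
    bus.attach("samples", Scale(2))
    bus.attach("samples", Total())
    print(bus.publish("samples", [1, 2, 3]))  # [[2, 4, 6], 6]
```

Because the routing and the traffic log live in the connector, production concerns such as logging can be added without touching the science components.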
Research Goals
• Develop a Domain-Specific Software Architecture (DSSA) for tasks in scientific workflows.
• Develop a methodology for refactoring existing scientific code into this DSSA.
• Minimize overhead (computation time and memory footprint).
• Maximize reuse of scientific code.
Decomposing Software
[Figure: the three-step approach: Decomposition, Architecting, Deployment]
• Decomposition, the first step in the approach, is a process in which scientific modules are identified and control flow is determined.
• Scientific modules are like functions: they have internal scope, a single entry point, and a single exit point. In graph-theoretic terms, the control-flow graph over the module's basic blocks has exactly one source (the entry) and one sink (the exit).
• The proper level of decomposition depends on both scientific functionality and engineering requirements, so it should be “tunable.”
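The single-entry, single-exit property above can be checked mechanically on a module's control-flow graph. Below is a minimal Python sketch. It assumes the graph is given as a list of basic-block names plus directed edges, and that the entry block is not itself a loop target, as compilers typically arrange. The `is_scientific_module` function and the block names are hypothetical. The sketch also requires every block to lie on some entry-to-exit path; this is an extra assumption that keeps unreachable code from slipping through.

```python
from typing import Dict, Iterable, List, Set, Tuple


def _reachable(start: str, adj: Dict[str, List[str]]) -> Set[str]:
    """All blocks reachable from start by following adj (iterative DFS)."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def is_scientific_module(blocks: List[str], edges: Iterable[Tuple[str, str]]) -> bool:
    """True if the control-flow graph has one source, one sink, and no stray blocks."""
    succ: Dict[str, List[str]] = {b: [] for b in blocks}
    pred: Dict[str, List[str]] = {b: [] for b in blocks}
    for src, dst in edges:
        succ[src].append(dst)
        pred[dst].append(src)
    sources = [b for b in blocks if not pred[b]]
    sinks = [b for b in blocks if not succ[b]]
    if len(sources) != 1 or len(sinks) != 1:
        return False
    everything = set(blocks)
    # Every block must be reachable from the entry and must be able to reach the exit.
    return (_reachable(sources[0], succ) == everything
            and _reachable(sinks[0], pred) == everything)


# A loop with one entry and one exit qualifies.
loop_blocks = ["entry", "header", "body", "exit"]
loop_edges = [("entry", "header"), ("header", "body"), ("body", "header"), ("header", "exit")]

# An early return creates a second exit, so this region does not qualify as-is.
early_blocks = ["entry", "check", "ret_early", "compute", "ret"]
early_edges = [("entry", "check"), ("check", "ret_early"), ("check", "compute"), ("compute", "ret")]

if __name__ == "__main__":
    print(is_scientific_module(loop_blocks, loop_edges))    # True
    print(is_scientific_module(early_blocks, early_edges))  # False
```

A region that fails the check, like the early-return example, could be decomposed more finely or restructured so that both paths merge into a single exit block. This is one way the "tunable" level of decomposition shows up in practice.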
“Injecting” Architecture
• In the second step of the approach, these modules are “architected” into a workflow task, with connectors to services at the levels needed to satisfy production requirements.
• We use Prism-MW wrappers to encapsulate and componentize these decomposed modules. This gives each module a standard interface and utilities for event-based communication.
• We use the Exogenous Connector style [Lau et al.] to mimic the original control and data flow in the workflow task. We augment these connectors with a specialized version of the invoking connector.
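The following Python sketch illustrates the exogenous style under simplifying assumptions. Decomposed legacy functions are wrapped behind one standard `invoke` interface. All control flow (a sequence plus a conditional branch) lives in connector objects rather than in the modules. `ModuleWrapper`, `Sequencer`, `Selector`, and the example routines are hypothetical. The actual approach uses Prism-MW wrappers, whose API is not reproduced here.

```python
from typing import Any, Callable


class ModuleWrapper:
    """Hypothetical wrapper giving a decomposed legacy function a standard interface."""

    def __init__(self, name: str, fn: Callable[[Any], Any]) -> None:
        self.name = name
        self._fn = fn

    def invoke(self, data: Any) -> Any:
        return self._fn(data)


class Sequencer:
    """Exogenous connector: invokes its steps in order, piping each output onward."""

    def __init__(self, *steps: Any) -> None:
        self.steps = steps

    def invoke(self, data: Any) -> Any:
        for step in self.steps:
            data = step.invoke(data)
        return data


class Selector:
    """Exogenous connector: picks a branch; the wrapped modules never call each other."""

    def __init__(self, predicate: Callable[[Any], bool], if_true: Any, if_false: Any) -> None:
        self.predicate = predicate
        self.if_true = if_true
        self.if_false = if_false

    def invoke(self, data: Any) -> Any:
        branch = self.if_true if self.predicate(data) else self.if_false
        return branch.invoke(data)


# Hypothetical decomposed legacy routines, each single-entry and single-exit.
def clip_negative(xs):
    return [max(0.0, x) for x in xs]


def normalize(xs):
    peak = max(xs)
    return [x / peak for x in xs]


def total(xs):
    return sum(xs)


# The original program's control flow ("clip if any value is negative, then
# normalize, then sum") is rebuilt entirely out of connectors.
task = Sequencer(
    Selector(lambda xs: min(xs) < 0,
             ModuleWrapper("clip_negative", clip_negative),
             ModuleWrapper("identity", lambda xs: xs)),
    ModuleWrapper("normalize", normalize),
    ModuleWrapper("total", total),
)

if __name__ == "__main__":
    print(task.invoke([2, -1, 4, 4]))  # 2.5
    print(task.invoke([1, 2, 4]))      # 1.75
```

Because the modules contain no references to one another, an engineer can insert service connectors (logging, checkpointing, data staging) between steps without editing the scientific routines.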
Deploying to the Grid
• Deployment is the last step in our approach.
• We currently deploy the resulting workflow component into the OODT Science Data System environment, a grid workflow management system used at JPL.
• This choice is purely for developer convenience; the approach should be deployable to any target workflow management system.
SWSA Architecture
[Figure: Scientific Workflow Software Architecture (SWSA), a domain-specific software architecture for workflow tasks]
Preliminary Evaluation
• We chose a canonical scientific application (matrix multiplication) implemented in both Fortran and C.
• Six metrics were taken:
  • Execution time for:
    • Base application
    • Wrapper (no data exchanged)
    • Wrapper (data exchanged)
  • Memory footprint for:
    • Base application
    • Wrapper (no data exchanged)
    • Wrapper (data exchanged)
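The six measurements can be collected with a small harness. This Python sketch is a stand-in for the Fortran and C experiment, and it rests on several assumptions:

* `matmul` is a naive pure-Python multiply.
* The `Wrapper` class merely simulates a component boundary. It either makes a plain method call, or it copies the matrices in and the product out to mimic data exchange.
* It measures wall-clock time with `time.perf_counter` and peak Python-level allocation with `tracemalloc`.

It is not the author's benchmark, and its numbers will not match the reported results.

```python
import time
import tracemalloc
from typing import Callable, Dict, List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Naive O(n^3) matrix multiplication: the 'base application'."""
    cols = list(zip(*b))  # columns of b
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


class Wrapper:
    """Hypothetical component wrapper around the base application."""

    def __init__(self, copy_data: bool) -> None:
        self.copy_data = copy_data

    def handle(self, a: Matrix, b: Matrix) -> Matrix:
        if self.copy_data:
            # Simulate data exchange: inputs copied into a message, result copied out.
            a = [row[:] for row in a]
            b = [row[:] for row in b]
            return [row[:] for row in matmul(a, b)]
        return matmul(a, b)


def time_it(fn: Callable[[], object], repeats: int = 3) -> float:
    """Best-of-N wall-clock time in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def peak_memory(fn: Callable[[], object]) -> int:
    """Peak bytes allocated by Python while fn runs (tracemalloc's view only)."""
    tracemalloc.start()
    try:
        fn()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


def overhead_pct(base: float, wrapped: float) -> float:
    """Relative cost of the wrapped run over the base run, in percent."""
    return 100.0 * (wrapped - base) / base


def evaluate(n: int = 60) -> Dict[str, float]:
    """Collect the six metrics: time and peak memory for each of three variants."""
    a = [[float(i + j) for j in range(n)] for i in range(n)]
    b = [[float(i - j) for j in range(n)] for i in range(n)]
    variants = {
        "base": lambda: matmul(a, b),
        "wrapper_no_data": lambda: Wrapper(copy_data=False).handle(a, b),
        "wrapper_data": lambda: Wrapper(copy_data=True).handle(a, b),
    }
    metrics: Dict[str, float] = {}
    for name, fn in variants.items():
        # Time and memory are measured in separate runs so tracing does not skew timing.
        metrics["time_" + name] = time_it(fn)
        metrics["mem_" + name] = float(peak_memory(fn))
    return metrics


if __name__ == "__main__":
    m = evaluate()
    for variant in ("wrapper_no_data", "wrapper_data"):
        pct = overhead_pct(m["time_base"], m["time_" + variant])
        print(f"{variant}: {pct:.2f}% time overhead")
```

Timing each variant as the best of several runs and measuring memory in a separate pass are design choices of this sketch. They reduce noise, but a real evaluation of compiled Fortran or C code would use OS-level tools instead of `tracemalloc`.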
Preliminary Evaluation
[Figure: refactoring methodology example: molecular dynamics simulation]
• Performance results are very promising:
  • Time overhead: 1.85%
  • Code reuse: 96.77%
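The slide reports these two percentages without defining them. One natural reading is given below; it is an assumption, not the author's stated definitions. Time overhead is the relative slowdown of the refactored task compared with the original code. Code reuse is the share of the original scientific code carried over unchanged.

```latex
\[
  O_{\text{time}} = \frac{T_{\text{wrapped}} - T_{\text{base}}}{T_{\text{base}}} \times 100\%,
  \qquad
  R_{\text{code}} = \frac{L_{\text{reused}}}{L_{\text{original}}} \times 100\%
\]
```

In this notation, $T$ is execution time and $L$ counts lines of scientific code; both symbols are introduced here for illustration. Under this reading, a hypothetical wrapped run of 101.85 s against a 100 s baseline would give $O_{\text{time}} = 1.85\%$.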
Conclusions & Future Work
• Scientific Workflow Software Architecture (SWSA) improves upon existing workflow systems by providing:
  • A methodology for accessing services.
  • A separation of concerns between the scientific algorithms and the production features of code.
  • A clean separation of roles between the scientist and the engineer.
• SWSA also satisfies the “cult of performance.”
• Future Work
  • Extended evaluation on more advanced simulation codes.
  • Expansion of the architecture to support parallel codes.
Thank You
For more information, please see:
• D. Woollard, N. Medvidovic, Y. Gil, and C. Mattmann. “Scientific Software as Workflows: From Discovery to Distribution.” To appear in IEEE Software Special Issue on Developing Scientific Software, 2008.
• D. Woollard, D. Freeborn, E. Kay-Im, and S. LaVoie. “Case Studies in Science Data Systems: Meeting Software Challenges in Competitive Environments.” To appear in Proceedings of the 10th International Conference on Space Operations (SpaceOps-2008), AIAA Press, Heidelberg, Germany, May 2008.
• D. Woollard. “Supporting Scientific Workflows Through First-Class Connectors.” Qualifying Examination Report, University of Southern California, May 2007.
• D. Woollard, C. Mattmann, and N. Medvidovic. “Injecting Software Architectural Constraints into Legacy Scientific Applications.” USC Center for Software Engineering Technical Report USC-CSE-2007-701, January 2007.
Portions of this research were conducted at the Jet Propulsion Laboratory, managed by the California Institute of Technology under a contract with the National Aeronautics and Space Administration.