
CSCI 578 Course Project: Rearchitecting Scientific Code March 31, 2009



Presentation Transcript


  1. CSCI 578 Course Project: Rearchitecting Scientific Code March 31, 2009 David Woollard

  2. Agenda For Today • Introduction • Scientific Computing Domain • Workflows & Grid Computing • Research Challenge • Project Goals • Project Details • Extracting Kernels • Executable Creation • Workflow Specification • SWSA – A Domain Specific Software Architecture • Timeline CSCI 578 Course Project - March 31, 2009.

  3. Computational Sciences Today Real science is now being conducted via computational experimentation – or “in silico” computing

  4. “in silico” Computing • Discovery is a phase in which a scientist rapidly prototypes, tests hypotheses, and develops a methodology. [Diagram labels: Theory, Development, Execution, Practice; Discovery, Distribution, Production; Lone Researcher] [Kepner 03]

  5. “in silico” Computing • Production is the engineering of replicating an experiment on large volumes of data. We will focus on Production Systems in this research. [Diagram labels: Discovery, Distribution, Production]

  6. “in silico” Computing • Distribution is a phase in which data is dispersed to peers for review and further experimentation, including: • Papers • Federated Data • Digital Libraries [Diagram labels: Discovery, Distribution, Production]

  7. Validating Computational Science • Computational science, like all science, requires validation • Validation comes in two forms: • Scaling (in data and computation) • Independent replication • Both forms require significant computational resources • Grid is a promising resource

  8. Vision of the Grid Like the power grid, the computational Grid should scale to the demands of individual users.

  9. Workflow-Based Specification • Workflows orchestrate processes on the Grid • Workflows are a processing model that incorporates tasks, data, and rules. • Workflow management systems execute a task on the Grid, using its data, once the rules determine that the task’s dependencies are satisfied. [Diagram: workflow of Task 1 through Task 5]
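The execution rule described on this slide (run a task only once its dependencies are satisfied) can be illustrated with a minimal Java sketch. The class name WorkflowSketch and the dependency edges between the five tasks are assumptions made for illustration; the slide's diagram does not survive in this transcript, and real workflow management systems are considerably more elaborate.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Minimal sketch of a workflow: tasks plus the rule "a task may run once all
// of its dependencies have completed". All names and edges are hypothetical.
class WorkflowSketch {
    // Maps each task name to the names of the tasks it depends on.
    private final Map<String, List<String>> dependencies = new LinkedHashMap<>();

    void addTask(String name, String... dependsOn) {
        dependencies.put(name, Arrays.asList(dependsOn));
    }

    // Returns one valid execution order, or throws if no task can make progress.
    List<String> executionOrder() {
        List<String> order = new ArrayList<>();
        Set<String> done = new HashSet<>();
        while (order.size() < dependencies.size()) {
            boolean progressed = false;
            for (Map.Entry<String, List<String>> entry : dependencies.entrySet()) {
                String task = entry.getKey();
                // A task is ready when it has not run and all dependencies are done.
                if (!done.contains(task) && done.containsAll(entry.getValue())) {
                    done.add(task);
                    order.add(task);
                    progressed = true;
                }
            }
            if (!progressed) {
                throw new IllegalStateException("Cyclic or missing dependency");
            }
        }
        return order;
    }

    // Assumed example: Task 4 needs Tasks 2 and 3; Task 5 needs Task 4.
    static WorkflowSketch example() {
        WorkflowSketch workflow = new WorkflowSketch();
        workflow.addTask("Task 1");
        workflow.addTask("Task 2", "Task 1");
        workflow.addTask("Task 3", "Task 1");
        workflow.addTask("Task 4", "Task 2", "Task 3");
        workflow.addTask("Task 5", "Task 4");
        return workflow;
    }

    public static void main(String[] args) {
        System.out.println(example().executionOrder());
    }
}
```

In this sketch the "rules" are reduced to plain dependency lists; a real engine would also schedule ready tasks onto Grid resources and move data between them.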

  10. A Plethora of Workflow Languages • Yu & Buyya presented a taxonomy [Yu & Buyya 05] • Based on workflow properties like model representation and scheduling policy • Illustration of divergence in the field • Considered a Grand Challenge [Gil, et al. 07]

  11. Scaling the Experiment [Diagram: a workflow of Task 1 through Task 5, with labels Laboratory, Institution, Co-laboratory, Other Institutions, @Home]

  12. Independent Replication [Diagram: a workflow of Task 1 through Task 5, with labels Collaborator, 3rd Party]

  13. Heterogeneous Environments [Diagram: three copies of the Task 1 through Task 5 workflow, with labels Laboratory, Institution, Co-laboratory, Collaborator, 3rd Party; Workflow Engine 1, Workflow Engine 2; Grid Infrastructure 1, Grid Infrastructure 2]

  14. Research Challenge • Scientific validation requires: • Scaling • Replication • Existing technologies exhibit three challenges: • Require scientists to become engineers or vice versa • Existing workflow specifications entwine scientific and engineering concerns • Existing workflow specifications are not portable

  15. Agenda • Introduction • Scientific Computing Domain • Workflows & Grid Computing • Research Challenge • Project Goals • Project Details • Extracting Kernels • Executable Creation • Workflow Specification • SWSA – A Domain Specific Software Architecture • Timeline

  16. Project Goals

  17. Agenda • Introduction • Scientific Computing Domain • Workflows & Grid Computing • Research Challenge • Project Goals • Project Details • Extracting Kernels • Executable Creation • Workflow Specification • SWSA – A Domain Specific Software Architecture • Timeline

  18. Scientific Software Packages (1/3)

  19. Scientific Software Packages (2/3)

  20. Scientific Software Packages (3/3)

  21. What is Kernel Decomposition? • Decomposition is a process in which scientific kernels are identified and control/data flow is determined. • Scientific kernels are like functions: they have internal scope and a single entry and exit point. In graph-theoretic terms, the call dominancy tree for the basic blocks in the module has only one source and one sink. • Scientific kernels reify a scientific concept.

  22. Decomposition Process • Analyze the code for blocks that you can summarize as a step in the high-level process. Candidates include: • Functions/methods that the original programmer(s) abstracted • Sequences of code that execute linearly • Draw the call graph between these blocks • Design data structures to pass requisite data between these kernels • Iterate, balancing the complexity of the data to be passed against the unity of code captured in your kernels
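As a rough illustration of the decomposition steps above, the following Java sketch splits a small routine into two kernels and makes the data passed between them explicit. The computation (centering samples around their mean) and the names DecompositionSketch, Statistics, computeMean, and subtractMean are hypothetical; they stand in for whatever kernels your own code yields.

```java
import java.util.Arrays;

// Hypothetical decomposition of a small monolithic routine into two kernels.
class DecompositionSketch {
    // Data structure designed to carry the requisite data between the kernels.
    static final class Statistics {
        final double[] samples;
        final double mean;

        Statistics(double[] samples, double mean) {
            this.samples = samples;
            this.mean = mean;
        }
    }

    // Kernel 1: single entry, single exit, no shared state with other kernels.
    static Statistics computeMean(double[] samples) {
        double sum = 0.0;
        for (double sample : samples) {
            sum += sample;
        }
        return new Statistics(samples.clone(), sum / samples.length);
    }

    // Kernel 2: uses only the data that kernel 1 handed over.
    static double[] subtractMean(Statistics stats) {
        double[] centered = new double[stats.samples.length];
        for (int i = 0; i < centered.length; i++) {
            centered[i] = stats.samples[i] - stats.mean;
        }
        return centered;
    }

    public static void main(String[] args) {
        double[] centered = subtractMean(computeMean(new double[] {1.0, 2.0, 6.0}));
        System.out.println(Arrays.toString(centered)); // prints [-2.0, -1.0, 3.0]
    }
}
```

The trade-off named in the last bullet shows up even here: merging both kernels would remove the Statistics structure, while splitting further would require passing more intermediate data.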

  23. Executable Creation For Each Kernel: • Design data/file formats to capture your input and output data structures • Write marshalling & unmarshalling code • Implement kernel as a run() method • Watch for scoping issues (you might not have fully decomposed your system) • Command-line interface: java <KernelName> <input file path> <output file path>
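One possible shape for such a kernel executable is sketched below. The kernel name ScaleKernel, its placeholder computation (doubling each value), and the file format (one number per line) are assumptions for illustration only; the slide prescribes just the run() method, the marshalling code, and the command-line form.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical kernel executable: reads one number per line, writes one per line.
class ScaleKernel {
    // Unmarshalling: turn the input file into the kernel's input data structure.
    static List<Double> unmarshal(Path input) throws IOException {
        List<Double> values = new ArrayList<>();
        for (String line : Files.readAllLines(input)) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                values.add(Double.parseDouble(trimmed));
            }
        }
        return values;
    }

    // The kernel itself; the doubling is a placeholder computation.
    static List<Double> run(List<Double> input) {
        List<Double> output = new ArrayList<>();
        for (double value : input) {
            output.add(2.0 * value);
        }
        return output;
    }

    // Marshalling: write the kernel's output data structure to the output file.
    static void marshal(List<Double> values, Path output) throws IOException {
        List<String> lines = new ArrayList<>();
        for (double value : values) {
            lines.add(Double.toString(value));
        }
        Files.write(output, lines);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java ScaleKernel <input file path> <output file path>");
            System.exit(1);
        }
        marshal(run(unmarshal(Paths.get(args[0]))), Paths.get(args[1]));
    }
}
```

Keeping run() free of file handling means the same method can later be reused when the marshalling code is swapped for another transport.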

  24. Workflow Implementation YAWL: Yet Another Workflow Language http://www.yawl-system.com/ Java-based Implementation Windows, Linux, OS-X

  25. Workflow Implementation Taverna: http://taverna.sourceforge.net/ Java-based Implementation Windows, Linux, OS-X

  26. Orchestrate the Executables • Once you have finalized your executables: • Implement the control flow you have captured via a workflow specification • Incorporate data dependencies • Validate the correctness of the resulting orchestration • Compare timing and memory usage for the original implementation and the workflow-based system (valgrind and JMX are possible tools)
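For Java code, a coarse first measurement can be taken in-process through the standard platform MXBeans that JMX exposes. The sketch below is one assumed way to do that; the class MeasurementSketch and its workload are hypothetical, and heap deltas read this way are only approximate because garbage collection can run at any time.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

// Hypothetical helper for comparing wall-clock time and heap growth of a workload.
class MeasurementSketch {
    static final class Measurement {
        final long elapsedNanos;
        final long heapDeltaBytes;

        Measurement(long elapsedNanos, long heapDeltaBytes) {
            this.elapsedNanos = elapsedNanos;
            this.heapDeltaBytes = heapDeltaBytes;
        }
    }

    static Measurement measure(Runnable workload) {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        memory.gc(); // request a collection so the baseline is less noisy
        long heapBefore = memory.getHeapMemoryUsage().getUsed();
        long start = System.nanoTime();
        workload.run();
        long elapsed = System.nanoTime() - start;
        long heapAfter = memory.getHeapMemoryUsage().getUsed();
        return new Measurement(elapsed, heapAfter - heapBefore);
    }

    public static void main(String[] args) {
        // Placeholder workload; substitute the original code or the orchestrated kernels.
        Measurement m = measure(() -> {
            double[] buffer = new double[1_000_000];
            for (int i = 0; i < buffer.length; i++) {
                buffer[i] = Math.sqrt(i);
            }
        });
        System.out.printf("elapsed: %.3f ms, heap delta: %d bytes%n",
                m.elapsedNanos / 1e6, m.heapDeltaBytes);
    }
}
```

Running each variant several times and reporting the spread gives a fairer comparison than a single run, since JIT warm-up affects early measurements.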

  27. Agenda • Introduction • Scientific Computing Domain • Workflows & Grid Computing • Research Challenge • Project Goals • Project Details • Extracting Kernels • Executable Creation • Workflow Specification • SWSA – A Domain Specific Software Architecture • Timeline

  28. Making Decisions in Design Space • Existing workflow languages violate separation of concerns • Scientists should work in languages applicable to the design space, not the solution space • Engineers should not have to become scientists to be able to scale workflow-based systems • If workflow languages become the realm of the scientist, how does the software engineer effect change? • Manipulation of the system at the architectural level

  29. A Model-Driven Approach [Diagram labels: Computation Independent Model, Workflow Model, Implementation Independent Model, Domain-Specific Software Architecture, Implementation, Deployment]

  30. Workflow Control Elements • Sequence: Do A then B • Conditional: Do A or B • Loop: Do A n times • Branch: Do A and B together • Bound: Do C after completing A and B
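The five control elements can be mimicked in plain Java to make their semantics concrete. This is only an illustrative sketch: the class ControlElementsSketch and its log labels are hypothetical, and using CompletableFuture for Branch and Bound is an assumption about how "together" and "after completing" might be realized, not how any particular workflow engine implements them.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical demonstration of the five control elements; tasks just log their names.
class ControlElementsSketch {
    static List<String> runAll(boolean condition, int n) {
        List<String> log = new CopyOnWriteArrayList<>(); // safe for concurrent appends

        // Sequence: do A then B.
        log.add("seq:A");
        log.add("seq:B");

        // Conditional: do A or B.
        log.add(condition ? "cond:A" : "cond:B");

        // Loop: do A n times.
        for (int i = 0; i < n; i++) {
            log.add("loop:A");
        }

        // Branch: do A and B together (their relative order is not fixed).
        CompletableFuture<Void> a = CompletableFuture.runAsync(() -> log.add("branch:A"));
        CompletableFuture<Void> b = CompletableFuture.runAsync(() -> log.add("branch:B"));

        // Bound: do C only after both A and B have completed.
        CompletableFuture.allOf(a, b).thenRun(() -> log.add("bound:C")).join();

        return log;
    }

    public static void main(String[] args) {
        System.out.println(runAll(true, 3));
    }
}
```

Only the Bound step imposes an ordering on the two branches, so "bound:C" is always the final entry while "branch:A" and "branch:B" may appear in either order.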

  31. Orchestration Through Connectors • Lau, et al., have proposed exogenous connectors [Lau, et al. 06]. • encapsulate both control and data flow in a software system • can be hierarchically composed to simulate control flow • Control can be managed through several constructs: • Sequence • Conditional • Branch & Bound [Diagrams: connector constructs over components A, B, and C]
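The idea that control lives in connectors rather than in components can be sketched in a few lines of Java. The sketch below is loosely inspired by the description on this slide and is not the formal model of [Lau, et al. 06]; the names ConnectorSketch, Invocable, component, sequence, and conditional are hypothetical, and only the Sequence and Conditional constructs are shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

// Hypothetical sketch: components never call each other; connectors hold all control.
class ConnectorSketch {
    // Both components and connectors can be invoked, which is what allows
    // connectors to be composed hierarchically.
    interface Invocable {
        void invoke(List<String> trace);
    }

    // A component only performs its own computation (here: records its name).
    static Invocable component(String name) {
        return trace -> trace.add(name);
    }

    // Sequence connector: invokes its children in order.
    static Invocable sequence(Invocable... children) {
        return trace -> {
            for (Invocable child : children) {
                child.invoke(trace);
            }
        };
    }

    // Conditional connector: invokes exactly one of two children.
    static Invocable conditional(BooleanSupplier test, Invocable ifTrue, Invocable ifFalse) {
        return trace -> (test.getAsBoolean() ? ifTrue : ifFalse).invoke(trace);
    }

    static List<String> demo(boolean flag) {
        // A conditional connector nested inside a sequence connector.
        Invocable top = sequence(
                component("A"),
                conditional(() -> flag, component("B"), component("C")),
                component("D"));
        List<String> trace = new ArrayList<>();
        top.invoke(trace);
        return trace;
    }

    public static void main(String[] args) {
        System.out.println(demo(true));  // prints [A, B, D]
        System.out.println(demo(false)); // prints [A, C, D]
    }
}
```

Because the components contain no calls to one another, changing the orchestration means rewiring connectors only, which is the property the slide attributes to exogenous connectors.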

  32. Implementation • Prism-MW, an architecturally-aware middleware • Components, Connectors, Topologies and Architecture are reified as first class elements • Exogenous connectors, invoking connectors, and component wrappers around tasks are built with Prism

  33. SWSA: A Domain Architecture

  34. Putting it Together • After you have implemented your workflow orchestration: • Modify your executables to be Prism components: replace marshalling & unmarshalling code with Prism event handling code • Develop Prism-based connectors with custom event handlers that implement orchestration logic • Compare this implementation to the original code and the workflow system for computation time and memory footprint

  35. Timeline • Monday, March 30: Email woollard@usc.edu and csci578@usc.edu with the names, email addresses and student ID numbers of each team member. One email per team please. • Thursday, April 9: Status report 1 due. One per team. Details and format will be sent to the teams by April 2. • Thursday, April 23: Status report 2 due. One per team. Details and format will be sent to the teams by April 16. • Friday, May 1: Project materials due (project report, workflow-based system, and SWSA system).

  36. References/Further Reading References on workflow systems: [Yu & Buyya 05] Yu, J. and Buyya, R. A Taxonomy of Workflow Management Systems for Grid Computing. Journal of Grid Computing 3(3-4): pp. 171-200. 2005. [Gil, et al. 07] Gil, Y., et al. Examining the Challenges of Scientific Workflows. IEEE Computer 40(12): pp. 24-32. 2007. References on exogenous connectors: [Lau, et al. 06] Lau, K., et al. A Software Component Model and its Preliminary Formalisation. In F.S. de Boer et al., editors, Proceedings of Fourth International Symposium on Formal Methods for Components and Objects, Lecture Notes in Computer Science 4111: pp. 1-21. 2006. References on SWSA: [Woollard 08] Woollard, D. Supporting the Engineering Aspects of e-Science Through Workflow Services. Proceedings of the First Brazilian e-Science Workshop, Campinas, Brazil, 2008. [Woollard, et al. 09] Woollard, D., et al. Injecting Software Architectural Constraints into Legacy Scientific Applications. To appear in Proceedings of the ICSE 2009 Workshop on Software Engineering for Computational Science and Engineering. Vancouver, Canada, 2009.
