High Performance Workflows for Networks and Grids Andrew H. Sherman Chief Technology Officer sherman@turboworx.com
Outline • Technical Computing Workflows • Deploying Workflows in HPC Environments • TurboWorx Workflow Products
Complex Technical Computations are Critical in Many Industries • Complex technical computing problems and algorithms have become “business critical” • Solutions often involve integrating several applications and many data sources into workflows • Automated coarse-grain parallelism and grid computing are emerging as key technologies
Life Sciences & Medicine
Discovery and Development • Data- & compute-intensive applications • Huge databases from multiple sources & in diverse formats • Manual workflows
Information-Based Medicine • Complex, heterogeneous databases & applications • Better and more effective diagnosis & treatment from faster, more accurate information interpretation
Automotive/Aero
Design and Development • Concurrent Engineering requires integration and collaboration between Concept, Design and Development processes • Global design teams that work around the clock • Suppliers are part of the design and development process
Finance
Portfolio Management/Pricing • Scenario-based modeling • Huge quantities of real-time data • Time is money!
What is a Workflow? “The automation of a business process, in whole or parts, where documents, information or tasks are passed from one participant to another to be processed, according to a set of procedural rules” — Workflow Management Coalition
Technical Computing Workflows How do technical computing workflows differ from traditional business process workflows? • Data flow vs. control flow • Widely distributed data (often with multiple owners) • Dynamic operating environment (e.g., the Grid) • Hierarchical workflow constructs • Requirement for parameterized executions • Evolving/Customized workflow definitions • Significance of collaboration and reuse
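The "data flow vs. control flow" distinction above can be illustrated with a minimal sketch. It is not TurboWorx code: `run_dataflow` and the node table are hypothetical names invented here. The only point is that a step runs when its input data exists, not when a script reaches it.

```python
def run_dataflow(nodes, initial):
    """Run every node once all of its inputs are available.

    nodes maps an output name to (function, list of input names).
    Hypothetical helper for illustration only.
    """
    values = dict(initial)
    pending = dict(nodes)
    while pending:
        ready = [name for name, (_, inputs) in pending.items()
                 if all(i in values for i in inputs)]
        if not ready:
            raise ValueError("unsatisfiable inputs: " + ", ".join(sorted(pending)))
        for name in ready:
            func, inputs = pending.pop(name)
            values[name] = func(*(values[i] for i in inputs))
    return values


# Nodes are listed out of order on purpose: data availability, not
# listing order, decides when each one runs.
nodes = {
    "report": (lambda total, count: f"{count} values, sum {total}", ["total", "count"]),
    "total": (sum, ["numbers"]),
    "count": (len, ["numbers"]),
}
result = run_dataflow(nodes, {"numbers": [3, 4, 5]})
print(result["report"])
```

Because "total" and "count" depend only on "numbers", a data-driven engine is free to run them concurrently; a control-flow script would fix their order in advance.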
Characterizing Technical Computing Workflows [Figure: workflow types plotted on axes of Business Value and Repetition, with Technical Workflows marked] Ref: Production Workflow (Leymann, Roller)
HPC Platforms: SMPs & Clusters
[Figure labels: Database Server, Computation Cluster, Linux, UNIX]
Shared Memory Multiprocessor • Expensive to buy, costly to upgrade • Poor scalability for computation • Best use: Data storage & access
Blade Solutions • Similar attributes to Linux clusters • More compact: better flops/ft³ • Often cheaper
Linux Clusters • Cost-effective • Scalable • Modular: easy to upgrade to faster, better CPUs (e.g. 64-bit) • Great for computation
HPC Platforms: Enterprise Grids
[Figure labels: Database Server, Computation Cluster, and machines running Linux, UNIX, AIX, Windows, and Mac OS X]
Enterprise Grids • Efficient: uses all the hardware available • Provides user comfort and familiarity • More than cycle stealing on idle desktops; usually includes computing on heterogeneous collections of servers • Great for computation, particularly for Life Sciences, where desktop platforms are appropriate for many algorithms
Technical Computing and Workflows Workflows can address some critical computing challenges: • Integrate, manage, and accelerate collections of heterogeneous applications, data, and platforms • Provide horsepower to process massive amounts of data by applying parallelism without source code modification • Address the needs of key user groups (end users, application experts, and IT staff) through easy-to-use interfaces • Facilitate collaboration and reuse to save time in the design, trials and testing, and deployment of new computing solutions
But . . . There are difficulties to overcome: • Scalability & performance: going beyond multithreading with “transparent parallelism” • Management of dynamic computing environments • Automated data and application staging • Integration with rapidly evolving grid standards (to support reuse and collaboration) • Desktop tools for workflow creation; portals for execution • Debugging and monitoring interfaces
Traditional Workflow Implementation • Large, complex scripts to orchestrate applications • Static embedded infrastructure control; usually aimed at single machine • Communication via temp files • “Human-in-the-loop” operation What’s wrong with this?
• Poor performance: mainly aimed at SMPs (but scalability often limited) • Lack of automation is inefficient and error-prone • No support for application integration or data conversion • Difficult to create, maintain, modify (even for skilled programmers) • Little reusability or portability
Traditional Life Science Workflows [Figure: Access Data, A, B, C, Store Data in sequence; stages labeled Fast, Slow, Fast] Typical “Human-in-the-Loop” Workflow: • Manual component startup • “Cut and paste” data movement • Sequential execution • Limited throughput due to “bottleneck components”
A Better Way: Automation & Parallelism [Figure: Access Data, A, three copies of B, C, Store Data; stages labeled Fast, Fast, Fast] TurboWorx High-Performance Workflow: • Automated component startup & data conversion • Pipeline acceleration: asynchronous, dynamic, concurrent execution on distributed machines • Transparent data-driven parallelism to eliminate bottlenecks
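The pipeline idea on this slide can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the TurboWorx implementation: threads on one machine stand in for distributed machines, and `start_stage`, `run_pipeline`, and the three toy stages are hypothetical names. Each stage runs concurrently with the others, and the slow stage B is replicated.

```python
import queue
import threading
import time

_DONE = object()  # end-of-stream marker passed between stages


def start_stage(func, inbox, outbox, workers):
    """Start `workers` threads that apply func to (index, value) items from inbox."""
    state = {"running": workers}
    lock = threading.Lock()

    def work():
        while True:
            item = inbox.get()
            if item is _DONE:
                inbox.put(_DONE)  # pass the marker on to sibling workers
                break
            index, value = item
            outbox.put((index, func(value)))
        with lock:
            state["running"] -= 1
            last_one = state["running"] == 0
        if last_one:
            outbox.put(_DONE)  # close downstream only after every copy has finished

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for thread in threads:
        thread.start()
    return threads


def run_pipeline(items, stages):
    """stages is a list of (function, number of copies); results keep input order."""
    queues = [queue.Queue() for _ in range(len(stages) + 1)]
    threads = []
    for (func, workers), inbox, outbox in zip(stages, queues, queues[1:]):
        threads += start_stage(func, inbox, outbox, workers)
    for index, value in enumerate(items):
        queues[0].put((index, value))
    queues[0].put(_DONE)
    results = {}
    while True:
        item = queues[-1].get()
        if item is _DONE:
            break
        results[item[0]] = item[1]
    for thread in threads:
        thread.join()
    return [results[i] for i in range(len(results))]


def slow_b(text):
    time.sleep(0.01)  # stands in for the bottleneck component
    return text.upper()


# A and C run as single copies; the slow stage B runs as three copies.
output = run_pipeline([" a ", " b ", " c "],
                      [(str.strip, 1), (slow_b, 3), (lambda s: s + "!", 1)])
print(output)
```

Items carry their index so that results can be put back in input order even though the copies of B may finish out of order.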
TurboWorx Enterprise Architecture [Architecture diagram; labels: User Interfaces, Builder, Command Line, Web Portal, TurboWorx Hub, Component Library, Data Repository, Data Storage, Workstations, Compute Clusters (Managed by BQS/DRM Systems), and machines running AIX, Linux, Windows, and Mac OS X]
Workflow Lifecycle
Design • End user or developer? • Component & workflow development environment • Integration with data • Testing & Debugging
Deployment • Local storage vs. centralized storage • Sharing & Collaboration
Execution • Execution interface: CLI, Proprietary GUI, Portal, Web/Grid Service • Access Control for workflows and data • Resource management
Monitoring • Events reported from workflow and service execution
Refinement & Reuse
TurboWorx Workflows Design & Deployment
Atomic Components • Command-line programs (e.g. C/C++/Fortran, Perl), Java, Jython • XML wrappers created by wizards or by editing templates
Dataflow Components • Workflows built from other components (including other workflows) • Automated data flow & transformations between components • Created using visual programming tool
Deployment • Components stored in a “Component Library” (Local or Centralized) • Import/Export and component sharing (collaboration) • Data references via a virtual “Data Repository” interface (supports WebDAV, Avaki, FTP, NFS)
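The relationship between atomic and dataflow components can be sketched as follows. This is a plain-Python analogy, not the TurboWorx XML wrapper format: `CommandComponent` and `Workflow` are hypothetical classes, and the two wrapped "programs" are one-line Python commands chosen only so the sketch runs anywhere.

```python
import subprocess
import sys


class CommandComponent:
    """Atomic component: wraps a command-line program (stdin in, stdout out)."""

    def __init__(self, name, argv):
        self.name = name
        self.argv = argv

    def run(self, data):
        done = subprocess.run(self.argv, input=data, capture_output=True,
                              text=True, check=True)
        return done.stdout


class Workflow:
    """Dataflow component: a chain of components that is itself a component."""

    def __init__(self, name, steps):
        self.name = name
        self.steps = steps

    def run(self, data):
        for step in self.steps:
            data = step.run(data)  # output of one step is the input of the next
        return data


# Stand-ins for real command-line tools.
upper = CommandComponent("upper", [sys.executable, "-c",
                                   "import sys; sys.stdout.write(sys.stdin.read().upper())"])
reverse = CommandComponent("reverse", [sys.executable, "-c",
                                       "import sys; sys.stdout.write(sys.stdin.read()[::-1])"])

inner = Workflow("shout", [upper])
outer = Workflow("shout-backwards", [inner, reverse])  # a workflow used as a component
print(outer.run("abc"))
```

Because `Workflow` exposes the same `run` method as `CommandComponent`, workflows nest inside other workflows, which mirrors the hierarchical construction described on the slide.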
TurboWorx Builder [Figure labels: Wizard; Atomic Component Creation; ClustalW; Application, Java Method, Jython Script; TurboWorx Component; Component Library; Workflow Component Creation]
Special Components: Loops • Support for: “For”, “While”, “Do Until” [Figure: While Loop example]
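The three loop kinds differ mainly in when the condition is tested. The sketch below uses ordinary Python functions as stand-ins for loop components; `for_loop`, `while_loop`, and `do_until` are hypothetical names, and `body` stands for any component that maps a state to a new state.

```python
def for_loop(count, body, state):
    """Run the body a fixed number of times."""
    for _ in range(count):
        state = body(state)
    return state


def while_loop(condition, body, state):
    """Test first: the body may run zero times."""
    while condition(state):
        state = body(state)
    return state


def do_until(condition, body, state):
    """Test last: the body always runs at least once."""
    while True:
        state = body(state)
        if condition(state):
            return state


def halve(x):
    return x // 2


print(for_loop(3, lambda x: x * 2, 1))          # doubles three times
print(while_loop(lambda x: x > 1, halve, 1))    # condition false at once, body skipped
print(do_until(lambda x: x <= 1, halve, 1))     # body runs once before the test
```

Starting from the same state of 1, the while loop returns it unchanged, while the do-until loop applies the body once before checking.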
Special Components: Splitters & Joiners • Components to convert between groups of many data elements and sequences of the individual data elements • Support “Fork-Join” data parallelism • Standard splitters/joiners provided with the TurboWorx system. Examples: • Arrays: Convert between array and individual elements (in order) • Collections: Convert between a java.util.Collection and its elements • Strings/Patterns: Split input stream based on regular expressions • Users may create additional types using Jython or Java
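A fork-join arrangement built from a splitter and a joiner might look like the sketch below. It is an assumption-laden illustration in standard Python, not the TurboWorx splitter/joiner API: `split_by_pattern`, `join_in_order`, `fork_join`, and `gc_count` are hypothetical names, and a thread pool stands in for distributed workers.

```python
import re
from concurrent.futures import ThreadPoolExecutor


def split_by_pattern(text, pattern):
    """Splitter: turn one string into a sequence of non-empty pieces."""
    return [piece for piece in re.split(pattern, text) if piece]


def join_in_order(results):
    """Joiner: collect individual results back into one list, in order."""
    return list(results)


def fork_join(data, splitter, worker, joiner, max_workers=4):
    """Split the input, process the pieces concurrently, then join the results."""
    pieces = splitter(data)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order of the inputs.
        return joiner(pool.map(worker, pieces))


def gc_count(sequence):
    """Toy per-element computation: count G and C characters."""
    return sequence.count("G") + sequence.count("C")


counts = fork_join("ACGT;GGCC;TTAA",
                   lambda text: split_by_pattern(text, r";"),
                   gc_count,
                   join_in_order)
print(counts)
```

The worker never sees the whole input, only one element at a time, which is what allows the same unmodified component to run as many concurrent copies.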
Parallelism in Practice TurboWorx High-Performance Workflow: [Figure labels: SPLIT, Access Data, A, B, C, Store Data, JOIN; stages labeled Fast, Fast, Slow] Splitting enables pipeline parallelism (A, B, C run concurrently on different data)
Parallelism in Practice TurboWorx High-Performance Workflow: [Figure labels: SPLIT, Access Data, A, three copies of B, C, Store Data, JOIN; stages labeled Fast, Fast, Fast] Scheduler determines amount of data parallelism dynamically at run time
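The slide does not say how the scheduler chooses the amount of data parallelism. One simple heuristic, offered purely as an assumption, is to give each stage enough copies that it keeps pace with the fastest stage, up to a cap. `plan_replicas` and the measured timings below are hypothetical.

```python
import math


def plan_replicas(seconds_per_item, max_replicas):
    """Suggest a copy count per stage so no stage lags behind the fastest one.

    seconds_per_item maps stage name to a measured, positive time per item.
    Illustrative heuristic only; not the TurboWorx scheduling policy.
    """
    fastest = min(seconds_per_item.values())
    return {
        name: min(max_replicas, math.ceil(seconds / fastest))
        for name, seconds in seconds_per_item.items()
    }


measured = {"A": 1.0, "B": 3.0, "C": 1.0}  # B is the bottleneck
plan = plan_replicas(measured, max_replicas=4)
print(plan)
```

With three copies of B, the stage emits about one item per second overall, matching A and C. A run-time scheduler could repeat this calculation as new timings arrive and as machines join or leave.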
Protein Characterization Example
Overall Task: Group protein domains into families
Steps and key programs:
• Identify homologous pairs: BLASTP
• Build families around pairs: clustalw, hmmbuild, hmmsearch
• Refine & optimize protein families: clustalw, hmmbuild, hmmsearch
• Find consensus sequences: clustalw
• Compute identity scores vs. leaders: clustalw
[Figure label: Process Family Subworkflow]
Take-Home Points • Technical computing workflows are important in various industries • Effective application of workflows requires HPC, including fault-tolerant automation and dynamic parallelism in a grid-like computing environment • TurboWorx workflow products offer one end-to-end solution for developing and deploying high performance technical workflows