
SAGA-based Frameworks: Supporting Application Usage Modes

Shantenu Jha. Director, Cyber-Infrastructure Development, CCT; Asst Research Professor, CS; e-Science Institute, Edinburgh. http://www.cct.lsu.edu/~sjha http://saga.cct.lsu.edu


Presentation Transcript


  1. SAGA-based Frameworks: Supporting Application Usage Modes. Shantenu Jha, Director, Cyber-Infrastructure Development, CCT; Asst Research Professor, CS; e-Science Institute, Edinburgh. http://www.cct.lsu.edu/~sjha http://saga.cct.lsu.edu

  2. Outline (1) • Understanding Distributed Applications (DA) • How DA differ from HPC or parallel applications; challenges of DA • DA Development Objectives (IDEAS) • Understanding SAGA (and the SAGA landscape) • Rough Taxonomy of Distributed Applications • Using SAGA to develop Distributed Applications • Examples: Applications & Application Frameworks • Discuss how IDEAS are met • Some SAGA-based Tools and Projects • Advantages of Standards • Derive (Initial) User Requirements for FutureGrid

  3. Understanding Distributed Applications: Critical Perspectives • The number of applications that utilize multiple sites sequentially, concurrently or asynchronously is low (~5%): • Not referring to tightly coupled applications running across multiple sites • Distributed CI: Is the whole greater than the sum of the parts? • Managing data and applications across multiple resources is (increasingly) hard: • Distributed Data/Jobs vs Bring it to the Computing • Compute where the data is, or move data to where the computing is • Challenges are set to get worse, both qualitatively and quantitatively: • Increasing complexity, heterogeneity and scale

  4. Understanding Distributed Applications • Distributed Applications Require: • Coordination over Multiple & Distributed sites: • Scale-up and Scale-out • Peta/Exa/Atta - Scientific Applications requiring multiple-runs, ensembles, workflows etc. • Core characteristics of logically and physically distributed applications are the SAME • Application Usage Mode: • Composed using Application as the UNIT of execution • Not a workflow (i.e., composed using control and data flow) • Usage Mode: Closer to an Abstract Workflow (template) • Examples: Run once; or Set of copies of an application with varied input data (Ensemble); Loosely-Coupled ensembles..
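The ensemble usage mode named on this slide (a set of copies of one application, each with different input data) can be sketched in a few lines. This is a minimal local illustration, not SAGA code: `application` is a hypothetical stand-in for a full application run, and a thread pool stands in for distributed resources.

```python
from concurrent.futures import ThreadPoolExecutor


def application(temperature: float) -> float:
    """Hypothetical stand-in for one complete application run."""
    return temperature * 2.0


def run_ensemble(app, inputs, max_workers=4):
    """Run one copy of `app` per input; the application is the unit of execution."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so results line up with inputs.
        return list(pool.map(app, inputs))


ensemble_inputs = [300.0, 310.0, 320.0]
ensemble_results = run_ensemble(application, ensemble_inputs)
print(ensemble_results)
```

Note that nothing here describes control or data flow between the copies, which is the distinction the slide draws between a usage mode and a workflow.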

  5. Understanding Distributed Applications: Development Challenges • Fundamentally a hard problem: • Dynamic resources, heterogeneous resources • Add to it: Complex underlying infrastructure • Programming Systems for Distributed Applications: • Incomplete? Customization? Extensibility? • What should the end-user control? What must the end-user control? • Computational Models of Distributed Computing • Range of DA, no clear taxonomy • More than (peak) performance • Application Usage Mode • Interplay of Application, Infrastructure, Usage Mode

  6. Understanding Distributed Applications: Implicit vs Explicit? • Which approach (implicit vs explicit) is used depends on: • How is the application used? • Is there a need to control/marshal more than one resource? • Why are distributed resources being used? • How much can be kept out of the application? • What if behaviour can’t be predicted in advance? • Not obvious what to do; depends on an application-specific metric • If possible, applications should not be explicitly distributed • GATEWAYS approach: • Implicit for the end-users • Supporting Applications? Or Application Usage Modes?

  7. Understanding Distributed Applications Development Objectives • Interoperability: Ability to work across multiple distributed resources • Distributed Scale-Out: The ability to utilize multiple distributed resources concurrently • Extensibility: Support new patterns/abstractions, different programming systems, functionality & Infrastructure • Adaptivity: Response to fluctuations in dynamic resource and availability of dynamic data • Simplicity: Accommodate above distributed concerns at different levels easily… Challenge: How to develop DA effectively and efficiently with the above as first-class objectives?

  8. SAGA: Basic Philosophy • There is a lack of programmatic approaches that: • Provide general-purpose, common grid functionality for applications and thus hide underlying complexity and varying semantics • Hide “bad” heterogeneity and provide means to address “good” heterogeneity • Provide building blocks upon which to construct higher levels of functionality and abstractions • Meet the needs of a broad spectrum of applications: • Simple Distributed Scripts, Gateways, Smart Applications and Production Grade Tooling, Workflow… • Simple, integrated, stable, uniform and high-level interface • Simple and Stable: 80:20 restricted scope and Standard • Integrated: Similar semantics & style across commonly used distributed functional requirements • Uniform: Same interface for different distributed systems • SAGA: Provides Application* developers with the basic units required to compose high-level functionality across different distributed systems (*) One person’s Application is another person’s Tool

  9. SAGA: In a Thousand Words

  10. SAGA: Job Submission. Role of Adaptors (middleware binding)
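The adaptor idea on this slide (one uniform interface, with the middleware binding chosen at run time) can be illustrated with a toy sketch. Everything below is hypothetical and is not the SAGA API: `JobService`, `ForkAdaptor`, `CondorAdaptor` and the `ADAPTORS` registry are invented names, and the assumption is that the URL scheme selects the backend.

```python
from urllib.parse import urlparse


class ForkAdaptor:
    """Toy adaptor standing in for a local (fork) backend."""

    def submit(self, host, executable):
        return f"fork@{host}: {executable}"


class CondorAdaptor:
    """Toy adaptor standing in for a Condor backend."""

    def submit(self, host, executable):
        return f"condor@{host}: {executable}"


# Hypothetical registry: URL scheme -> adaptor class.
ADAPTORS = {"fork": ForkAdaptor, "condor": CondorAdaptor}


class JobService:
    """Uniform front end; the URL scheme selects the middleware binding."""

    def __init__(self, url):
        parsed = urlparse(url)
        if parsed.scheme not in ADAPTORS:
            raise ValueError(f"no adaptor for scheme {parsed.scheme!r}")
        self._host = parsed.hostname
        self._adaptor = ADAPTORS[parsed.scheme]()

    def run(self, executable):
        # Application code is identical regardless of the backend.
        return self._adaptor.submit(self._host, executable)


print(JobService("fork://localhost").run("/bin/date"))
print(JobService("condor://pool.example.org").run("/bin/date"))
```

The application-facing call (`run`) never changes; only the URL does, which is the sense in which adaptors hide middleware differences.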

  11. SAGA Job API: Example

  12. SAGA Job Package

  13. SAGA File Package

  14. File API: Example

  15. SAGA Advert

  16. SAGA Advert API: Example

  17. SAGA: Other Packages

  18. SAGA: Implementations • Currently there are several implementations under active development: • C++ Reference Implementation (LSU) -- OMII-UK http://saga.cct.lsu.edu/cpp/ • Java Implementation (VU Amsterdam), part of the OMII-UK project http://saga.cct.lsu.edu/java/ • JSAGA (IN2P3/CNRS) http://grid.in2p3.fr/jsaga/ • DEISA (partial) job, file package • C++: Currently at v1.3.3 (October 2009) • Python bindings to the C++ available • Good faith effort to keep things working

  19. SAGA: Available Adaptors • Job Adaptors • Fork (localhost), SSH, Condor, Globus GRAM2, OMII GridSAM, Amazon EC2, Platform LSF • File Adaptors • Local FS, Globus GridFTP, Hadoop Distributed Filesystem (HDFS), CloudStore KFS, OpenCloud Sector-Sphere • Replica Adaptors • PostgreSQL/SQLite3, Globus RLS • Advert Adaptors • PostgreSQL/SQLite3, Hadoop H-Base, Hypertable

  20. SAGA: Available Adaptors • Other Adaptors • Default RPC / Stream / SD • Planned Adaptors • CURL file adaptor, gLite job adaptor • Open issues: • Consolidating the Adaptor code base and adding rigorous tests in order to improve adaptor quality • Capability Provider Interface (CPI - the ‘Adaptor API’) is not documented or standardized (yet), but looking at existing adaptor code should get you started if you want to develop your own adaptor • Proof by example..

  21. SAGA and Distributed Applications

  22. Taxonomy of Distributed Applications • Example of Distributed Execution Mode: • Implicitly Distributed • 1000 job submissions on the TG • SAGA shell example/tutorial • Example of Explicit Coordination and Distribution • Explicitly Distributed • DAG-based Workflows • EnKF-HM application • Example of SAGA-based Frameworks • MapReduce, Pilot-Jobs

  23. Developing Distributed Application Frameworks • Frameworks: Logical structure for capturing Application Requirements, Characteristics & Patterns • Pattern: Commonly recurring modes of computation • Programming, Deployment, Execution, Data-access.. • Abstraction: Mechanism to support patterns and application characteristics • Frameworks are designed to either: • Support Patterns: Map-Reduce, Master-Worker, Hierarchical Job-Submission • Provide the abstractions and/or support the requirements & characteristics of applications • i.e. Encode a Usage-Mode using a Framework

  24. Abstractions for Distributed Computing (1) BigJob: Container Task Adaptive: Type A: Fix number of replicas; vary cores assigned to each replica. Type B: Fix the size of replica, vary number of replicas (Cool Walking) -- Same temperature range (adaptive sampling) -- Greater temperature range (enhanced dynamics)
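The two adaptive types on this slide differ only in which quantity is held fixed when the core allocation changes. A minimal arithmetic sketch, assuming cores are split evenly and leftover cores are simply left idle (both are assumptions, and the function names are hypothetical):

```python
def type_a_cores_per_replica(total_cores: int, n_replicas: int) -> int:
    """Type A: replica count is fixed; cores per replica follow the allocation."""
    if n_replicas <= 0:
        raise ValueError("n_replicas must be positive")
    return total_cores // n_replicas


def type_b_replica_count(total_cores: int, cores_per_replica: int) -> int:
    """Type B: replica size is fixed; the number of replicas follows the allocation."""
    if cores_per_replica <= 0:
        raise ValueError("cores_per_replica must be positive")
    return total_cores // cores_per_replica


# Illustrative allocation sizes only, not measurements from the talk.
for total in (64, 128):
    print(total, type_a_cores_per_replica(total, 8), type_b_replica_count(total, 16))
```

With a fixed ensemble of 8 replicas, doubling the allocation doubles the cores each replica gets (Type A); with a fixed replica size of 16 cores, it doubles the number of replicas (Type B).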

  25. Abstractions for Distributed Computing (2): SAGA Pilot-Job (Glide-In)
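A pilot-job is commonly described as a container job that acquires resources once, after which the application itself decides how sub-jobs are placed inside it. The toy class below sketches that idea only; `PilotJob`, `submit` and `schedule` are hypothetical names, the first-come-first-served wave policy is an assumption, and none of it reflects the actual SAGA Pilot-Job implementation.

```python
from collections import deque


class PilotJob:
    """Toy pilot: holds `cores` acquired from a resource and schedules sub-jobs."""

    def __init__(self, cores: int):
        self.cores = cores
        self.queue = deque()

    def submit(self, name: str, cores: int) -> None:
        """Queue a sub-job; it must fit inside the pilot's allocation."""
        if cores > self.cores:
            raise ValueError(f"sub-job {name!r} needs more cores than the pilot holds")
        self.queue.append((name, cores))

    def schedule(self):
        """Return waves of sub-jobs; each wave fits within the pilot's cores."""
        waves = []
        while self.queue:
            free = self.cores
            wave = []
            # Fill the wave in submission order until the next sub-job no longer fits.
            while self.queue and self.queue[0][1] <= free:
                name, cores = self.queue.popleft()
                free -= cores
                wave.append(name)
            waves.append(wave)
        return waves


pilot = PilotJob(cores=8)
for job_name, job_cores in [("a", 4), ("b", 4), ("c", 2), ("d", 8)]:
    pilot.submit(job_name, job_cores)
pilot_waves = pilot.schedule()
print(pilot_waves)
```

The point of the sketch is that the batch queue is visited once (for the pilot), while the placement of the sub-jobs is an application-level decision.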

  26. Coordinate Deployment & Scheduling of Multiple Pilot-Jobs

  27. Distributed Adaptive Replica Exchange (DARE): Scale-Out, Dynamic Resource Allocation and Aggregation

  28. Multi-Physics Runtime Frameworks: Extensibility • Coupled multi-physics applications require two distinct but concurrent simulations • Can co-scheduling be avoided? • Adaptive execution model: Yes • Load-balancing required. Capability comes for free! • First demonstrated multi-platform Pilot-Job: • TG (MD) – Condor (CFD)

  29. Dynamic Execution: Reduced Time to Solution

  30. Ensemble Kalman Filters: Heterogeneous Sub-Tasks • Ensemble Kalman filters (EnKF) are recursive filters that handle large, noisy data; here the EnKF is used for history matching and reservoir characterization • EnKF is a particularly interesting case of irregular, hard-to-predict run time characteristics

  31. Results: Scale-Out Performance • Using more machines decreases the time to completion (TTC) and the variation between experiments • Using BQP decreases the TTC and the variation between experiments further • The lowest TTC is achieved when using BQP and all available resources

  32. Performance Advantage from Scale-Out. But why does BQP help?

  33. Understanding Distributed Applications Development Objectives Redux • Interoperability: Ability to work across multiple distributed resources • SAGA: Middleware Agnostic • Distributed Scale-Out: The ability to utilize multiple distributed resources concurrently • Support Multiple Pilot-Jobs: Ranger, Abe, QB • Extensibility: Support new patterns/abstractions, different programming systems, functionality & Infrastructure • Pilot-Job also Coupled CFD-MD, Integrated BQP • Adaptivity: Response to fluctuations in dynamic resource and availability of dynamic data • Simplicity: Accommodate above distributed concerns at different levels easily…

  34. SAGA: Bridging the Gap between Infrastructure and Applications Focus on Application Development and Characteristics, not infrastructure details

  35. SAGA-based Tools and Projects • JSAGA from IN2P3 (Lyon) • http://grid.in2p3.fr/jsaga/index.html • Slides Ack: Sylvain Renaud • GANGA-DIANE (EGEE) • http://faust.cct.lsu.edu/trac/saga/wiki/Applications/GangaSAGA • Slides Ack: Jakub Moscicki, Massimo L, O. Weidner • NAREGI/KEK (Active) • DESHL • DEISA-based Shell and Workflow library • XtreemOS • SD Specification • With gLite adaptors • Advantage of Standards

  36. JSAGA: Implementer and user of SAGA • JSAGA uses SAGA in a module, which hides the heterogeneity of grid infrastructures • JSAGA implements SAGA to hide the heterogeneity of middlewares • Diagram labels: Applications; JSAGA jobs collection; SAGA; JSAGA core engine + plug-ins; Legacy APIs

  37. Projects using JSAGA • Elis@ • a web portal for submitting jobs to industrial and research grid infrastructures • SimExplorer • a set of tools for managing simulation experiments • includes a workflow engine that submits jobs to heterogeneous distributed computing resources • JJS • a tool for efficiently running short-lived jobs on EGEE • JUX • a multi-protocol file browser

  38. DIANE INTEGRATION cont. Diane without SAGA Diane with SAGA

  39. Applications on heterogeneous resources Federating resources! Payload distribution (Not in this demo: cloud resources, additional Grid infrastructures…) Master Application-aware (and resource-aware) scheduling Ganga/SAGA (to *) Ganga/SAGA (to TeraGrid) Ganga/gLite Agents scheduling Heterogeneous resources allocation (Ganga + Ganga/SAGA)

  40. Acknowledgements SAGA Team and DPA Team and the UK-EPSRC (UK EPSRC: DPA, OMII-UK , OMII-UK PAL) People: SAGA D&D: Hartmut Kaiser, Ole Weidner, Andre Merzky, Joohyun Kim, Lukasz Lacinski, João Abecasis, Chris Miceli, Bety Rodriguez-Milla SAGA Users: Andre Luckow, Yaakoub el-Khamra, Kate Stamou, Cybertools (Abhinav Thota, Jeff, N. Kim), Owain Kenway Google SoC: Michael Miceli, Saurabh Sehgal, Miklos Erdelyi Collaborators and Contributors: Steve Fisher & Group, Sylvain Renaud (JSAGA), Go Iwai & Yoshiyuki Watase (KEK) DPA: Dan Katz, Murray Cole, Manish Parashar, Omer Rana, Jon Weissman
