SAGA-based Frameworks: Supporting Application Usage Modes
Shantenu Jha
Director, Cyber-Infrastructure Development, CCT
Asst Research Professor, CS
e-Science Institute, Edinburgh
http://www.cct.lsu.edu/~sjha
http://saga.cct.lsu.edu
Outline (1)
• Understanding Distributed Applications (DA)
• How DA differ from HPC or parallel applications; challenges of DA
• DA Development Objectives (IDEAS)
• Understanding SAGA (and the SAGA Landscape)
• Rough Taxonomy of Distributed Applications
• Using SAGA to develop Distributed Applications
• Examples: Application & Application Frameworks
• Discuss how IDEAS are met
• Some SAGA-based Tools and Projects
• Advantages of Standards
• Derive (Initial) User Requirements for FutureGrid
Understanding Distributed Applications: Critical Perspectives
• The number of applications that utilize multiple sites sequentially, concurrently or asynchronously is low (~5%):
  • Not referring to tightly-coupled applications across multiple sites
  • Distributed CI: Is the whole greater than the sum of the parts?
• Managing data and applications across multiple resources is (increasingly) hard:
  • Distributed Data/Jobs vs Bring it to the Computing
  • Compute where data is or Data to where computing is
• Challenges qualitatively and quantitatively set to get worse:
  • Increasing complexity, heterogeneity and scale
Understanding Distributed Applications
• Distributed Applications Require:
  • Coordination over Multiple & Distributed sites:
  • Scale-up and Scale-out
  • Peta/Exa/Atta - Scientific Applications requiring multiple runs, ensembles, workflows etc.
• Core characteristics of logically and physically distributed applications are the SAME
• Application Usage Mode:
  • Composed using the Application as the UNIT of execution
  • Not a workflow (which is composed using control and data flow)
  • Usage Mode: Closer to an Abstract Workflow (template)
  • Examples: Run once; a set of copies of an application with varied input data (Ensemble); loosely-coupled ensembles
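The Ensemble usage mode can be sketched as follows. This is a minimal illustration in plain Python, not SAGA code: `run_application` is a hypothetical stand-in for one complete application run, and a local thread pool stands in for the distributed resources.

```python
from concurrent.futures import ThreadPoolExecutor

def run_application(input_value):
    """Hypothetical stand-in for one full application run (the unit of execution)."""
    return input_value ** 2  # placeholder for the run's result

def run_ensemble(inputs, max_concurrent=4):
    """Run the same application once per input; ensemble members are independent."""
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        # map() returns results in the same order as the inputs
        return list(pool.map(run_application, inputs))

results = run_ensemble([1, 2, 3, 4])
```

The usage mode only fixes how application runs are composed; nothing inside `run_application` needs to know that it is part of an ensemble.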
Understanding Distributed Applications: Development Challenges
• Fundamentally a hard problem:
  • Dynamic resources, heterogeneous resources
  • Add to it: Complex underlying infrastructure
• Programming Systems for Distributed Applications:
  • Incomplete? Customization? Extensibility?
  • What should the end-user control? Must control?
• Computational Models of Distributed Computing
  • Range of DA, no clear taxonomy
  • More than (peak) performance
• Application Usage Mode
  • Inter-play of Application, Infrastructure, Usage Mode
Understanding Distributed Applications: Implicit vs Explicit?
• Which approach (implicit vs explicit) is used depends on:
  • How is the application used?
  • Is there a need to control/marshall more than one resource?
  • Why are distributed resources being used?
  • How much can be kept out of the application?
  • Can’t predict in advance?
  • Not obvious what to do, application-specific metric
• If possible, Applications should not be explicitly distributed
• GATEWAYS approach:
  • Implicit for the end-users
• Supporting Applications? Or Application Usage Modes?
Understanding Distributed Applications: Development Objectives
• Interoperability: Ability to work across multiple distributed resources
• Distributed Scale-Out: The ability to utilize multiple distributed resources concurrently
• Extensibility: Support new patterns/abstractions, different programming systems, functionality & infrastructure
• Adaptivity: Response to fluctuations in dynamic resources and availability of dynamic data
• Simplicity: Accommodate the above distributed concerns at different levels easily…
Challenge: How to develop DA effectively and efficiently with the above as first-class objectives?
SAGA: Basic Philosophy
• There is a lack of programmatic approaches that:
  • Provide general-purpose, common grid functionality for applications and thus hide underlying complexity and varying semantics
  • Hide “bad” heterogeneity and provide means to address “good” heterogeneity
  • Provide building blocks upon which to construct higher levels of functionality and abstractions
• Meets the need for a Broad Spectrum of Applications:
  • Simple Distributed Scripts, Gateways, Smart Applications and Production Grade Tooling, Workflow…
• Simple, integrated, stable, uniform and high-level interface
  • Simple and Stable: 80:20 restricted scope and Standard
  • Integrated: Similar semantics & style across commonly used distributed functional requirements
  • Uniform: Same interface for different distributed systems
• SAGA: Provides Application* developers with the basic units required to compose high-level functionality across different distributed systems
(*) One person’s Application is another person’s Tool
SAGA: Job Submission
Role of Adaptors (middleware binding)
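The role of adaptors can be illustrated with a small dispatch sketch. It is written in plain Python and is not the SAGA API or its Capability Provider Interface: the class names, the `submit` function and the `ADAPTORS` registry are hypothetical. The idea shown is that the caller uses one uniform entry point, and the URL scheme selects the middleware binding.

```python
import subprocess
import sys
from urllib.parse import urlparse

class ForkAdaptor:
    """Hypothetical adaptor: runs the job as a local child process."""
    def run(self, host, command):
        done = subprocess.run(command, capture_output=True, text=True, check=True)
        return done.stdout.strip()

class DryRunSshAdaptor:
    """Hypothetical adaptor: only builds the ssh command line, does not execute it."""
    def run(self, host, command):
        return " ".join(["ssh", host] + command)

# Registry mapping a URL scheme to the adaptor (middleware binding) that serves it.
ADAPTORS = {"fork": ForkAdaptor(), "ssh": DryRunSshAdaptor()}

def submit(resource_url, command):
    """Uniform entry point: the caller never touches middleware-specific code."""
    url = urlparse(resource_url)
    try:
        adaptor = ADAPTORS[url.scheme]
    except KeyError:
        raise ValueError(f"no adaptor for scheme {url.scheme!r}")
    return adaptor.run(url.hostname, command)

local_out = submit("fork://localhost", [sys.executable, "-c", "print('hello')"])
remote_cmd = submit("ssh://cluster.example.org", ["/bin/date"])
```

Supporting a new middleware then means registering one more adaptor; the calling code stays unchanged.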
SAGA: Implementations
• Currently there are several implementations under active development:
  • C++ Reference Implementation (LSU) -- OMII-UK: http://saga.cct.lsu.edu/cpp/
  • Java Implementation (VU Amsterdam), part of the OMII-UK project: http://saga.cct.lsu.edu/java/
  • JSAGA (IN2P3/CNRS): http://grid.in2p3.fr/jsaga/
  • DEISA (partial): job, file package
• C++: Currently at v1.3.3 (October 2009)
• Python bindings to the C++ available
Good faith effort to keep things working
SAGA: Available Adaptors
• Job Adaptors
  • Fork (localhost), SSH, Condor, Globus GRAM2, OMII GridSAM, Amazon EC2, Platform LSF
• File Adaptors
  • Local FS, Globus GridFTP, Hadoop Distributed Filesystem (HDFS), CloudStore KFS, OpenCloud Sector-Sphere
• Replica Adaptors
  • PostgreSQL/SQLite3, Globus RLS
• Advert Adaptors
  • PostgreSQL/SQLite3, Hadoop H-Base, Hypertable
SAGA: Available Adaptors
• Other Adaptors
  • Default RPC / Stream / SD
• Planned Adaptors
  • CURL file adaptor, gLite job adaptor
• Open issues:
  • Consolidating the adaptor code base and adding rigorous tests in order to improve adaptor quality
  • The Capability Provider Interface (CPI, the ‘Adaptor API’) is not documented or standardized (yet), but looking at existing adaptor code should get you started if you want to develop your own adaptor
• Proof by example..
Taxonomy of Distributed Applications
• Example of Distributed Execution Mode:
  • Implicitly Distributed
  • 1000 job submissions on the TG
  • SAGA shell example/tutorial
• Example of Explicit Coordination and Distribution
  • Explicitly Distributed
  • DAG-based Workflows
  • EnKF-HM application
• Example of SAGA-based Frameworks
  • MapReduce, Pilot-Jobs
Developing Distributed Application Frameworks
• Frameworks: Logical structure for capturing Application Requirements, Characteristics & Patterns
• Pattern: Commonly recurring modes of computation
  • Programming, Deployment, Execution, Data-access..
• Abstraction: Mechanism to support patterns and application characteristics
• Frameworks are designed to either:
  • Support Patterns: Map-Reduce, Master-Worker, Hierarchical Job-Submission
  • Provide the abstractions and/or support the requirements & characteristics of applications
  • i.e. Encode a Usage-Mode using a Framework
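As an example of a pattern that a framework can support, here is a minimal Master-Worker sketch. It uses local threads and a queue from the Python standard library; the function name `master_worker` is hypothetical and nothing here is taken from a SAGA-based framework.

```python
import queue
import threading

def master_worker(tasks, work_fn, n_workers=3):
    """Master fills a queue; workers pull tasks until the queue is empty."""
    todo = queue.Queue()
    for index, task in enumerate(tasks):
        todo.put((index, task))
    results = [None] * len(tasks)

    def worker():
        while True:
            try:
                index, task = todo.get_nowait()
            except queue.Empty:
                return  # no work left for this worker
            results[index] = work_fn(task)  # each index is written by one worker only

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

word_lengths = master_worker(["saga", "pilot", "job"], len)
```

The pattern is independent of what `work_fn` does, which is what lets a framework capture it once and reuse it across applications.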
Abstractions for Distributed Computing (1)
BigJob: Container Task
Adaptive:
• Type A: Fix the number of replicas; vary the cores assigned to each replica.
• Type B: Fix the size of each replica; vary the number of replicas (Cool Walking)
  -- Same temperature range (adaptive sampling)
  -- Greater temperature range (enhanced dynamics)
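The difference between the two adaptive types comes down to simple arithmetic on the cores held by the container. The sketch below uses hypothetical function names and made-up core counts; it only illustrates which quantity is fixed and which one varies.

```python
def type_a(total_cores, n_replicas):
    """Type A: replica count is fixed; each replica gets an equal share of the cores."""
    return n_replicas, total_cores // n_replicas

def type_b(total_cores, cores_per_replica):
    """Type B: replica size is fixed; the number of replicas follows the core count."""
    return total_cores // cores_per_replica, cores_per_replica

# Hypothetical numbers: the container grows from 64 to 128 cores.
# Each result is (number of replicas, cores per replica).
for cores in (64, 128):
    print(cores, "A:", type_a(cores, n_replicas=8), "B:", type_b(cores, cores_per_replica=8))
```

With twice the cores, Type A keeps 8 replicas and gives each one more cores, while Type B keeps 8-core replicas and runs more of them.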
Abstractions for Distributed Computing (2)
SAGA Pilot-Job (Glide-In)
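The container idea behind a pilot-job can be sketched with a small simulation: the pilot acquires a block of cores once, and sub-jobs are then scheduled inside that allocation without going back to the batch queue. The function `run_pilot`, the first-come-first-served policy and the numbers are all assumptions made for illustration; this is not how the SAGA Pilot-Job is implemented.

```python
import heapq

def run_pilot(total_cores, subjobs):
    """Simulate a pilot that holds `total_cores` and runs sub-jobs inside it.

    `subjobs` is a list of (name, cores, duration). Sub-jobs start in list
    order as soon as enough cores are free. Returns {name: (start, end)}.
    """
    free = total_cores
    clock = 0.0
    running = []  # min-heap of (end_time, cores) for sub-jobs holding cores
    schedule = {}
    for name, cores, duration in subjobs:
        if cores > total_cores:
            raise ValueError(f"{name} needs more cores than the pilot holds")
        while free < cores:  # wait until earlier sub-jobs release enough cores
            end, released = heapq.heappop(running)
            clock = max(clock, end)
            free += released
        schedule[name] = (clock, clock + duration)
        heapq.heappush(running, (clock + duration, cores))
        free -= cores
    return schedule

plan = run_pilot(8, [("a", 4, 10.0), ("b", 4, 5.0), ("c", 8, 2.0)])
```

In this run, "a" and "b" start together at time 0, and "c" has to wait until both have released their cores.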
Distributed Adaptive Replica Exchange (DARE)
Scale-Out, Dynamic Resource Allocation and Aggregation
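For readers unfamiliar with replica exchange, the exchange step between two replicas is usually decided with the standard Metropolis criterion, P = min(1, exp[(β_i − β_j)(E_i − E_j)]) with β = 1/(k_B T). The slides do not state this formula, so the sketch below shows the textbook criterion rather than DARE's code; the function names are hypothetical.

```python
import math
import random

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)

def exchange_probability(energy_i, temp_i, energy_j, temp_j):
    """Metropolis acceptance probability for swapping replicas i and j."""
    beta_i = 1.0 / (K_B * temp_i)
    beta_j = 1.0 / (K_B * temp_j)
    exponent = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if exponent >= 0 else math.exp(exponent)

def attempt_exchange(energy_i, temp_i, energy_j, temp_j, rng=random):
    """Return True if the swap is accepted for this random draw."""
    return rng.random() < exchange_probability(energy_i, temp_i, energy_j, temp_j)

# Energies in kcal/mol, temperatures in K (illustrative values only).
p_always = exchange_probability(-90.0, 300.0, -100.0, 310.0)
p_sometimes = exchange_probability(-100.0, 300.0, -90.0, 310.0)
```

A swap that moves the lower-energy configuration to the colder replica is always accepted; the opposite swap is accepted only with a probability below one.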
Multi-Physics Runtime Frameworks: Extensibility
• Coupled Multi-Physics requires two distinct, but concurrent simulations
• Can co-scheduling be avoided?
  • Adaptive execution model: Yes
  • Load-balancing required. Capability comes for free!
• First demonstrated multi-platform Pilot-Job:
  • TG (MD) – Condor (CFD)
Ensemble Kalman Filters: Heterogeneous Sub-Tasks
• Ensemble Kalman filters (EnKF) are recursive filters for handling large, noisy data; here the EnKF is used for history matching and reservoir characterization
• EnKF is a particularly interesting case of irregular, hard-to-predict run-time characteristics:
Results: Scale-Out Performance
• Using more machines decreases the time to completion (TTC) and the variation between experiments
• Using BQP decreases the TTC & variation between experiments further
• Lowest time to completion achieved when using BQP and all available resources
Performance Advantage from Scale-Out
But why does BQP help?
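One plausible reading, assuming BQP denotes a batch-queue wait-time predictor (the slides do not spell this out): with predicted waits available, work can be sent first to the resource where it is expected to finish soonest. The sketch below ranks resources by predicted wait plus estimated run time. The function name and all numbers are hypothetical; only the resource names appear in the slides.

```python
def rank_resources(predicted_wait, run_time):
    """Order resources by predicted queue wait plus estimated run time (seconds)."""
    return sorted(predicted_wait, key=lambda name: predicted_wait[name] + run_time[name])

# Hypothetical predictions and run-time estimates, in seconds.
predicted_wait = {"Ranger": 3600, "Abe": 600, "QB": 1800}
run_time = {"Ranger": 900, "Abe": 1500, "QB": 1200}

ranking = rank_resources(predicted_wait, run_time)
```

In this made-up case the machine with the shortest run time ranks last, because its predicted queue wait dominates the total.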
Understanding Distributed Applications: Development Objectives Redux
• Interoperability: Ability to work across multiple distributed resources
  • SAGA: Middleware Agnostic
• Distributed Scale-Out: The ability to utilize multiple distributed resources concurrently
  • Support Multiple Pilot-Jobs: Ranger, Abe, QB
• Extensibility: Support new patterns/abstractions, different programming systems, functionality & infrastructure
  • Pilot-Job also Coupled CFD-MD, Integrated BQP
• Adaptivity: Response to fluctuations in dynamic resources and availability of dynamic data
• Simplicity: Accommodate the above distributed concerns at different levels easily…
SAGA: Bridging the Gap between Infrastructure and Applications
Focus on Application Development and Characteristics, not infrastructure details
SAGA-based Tools and Projects
• JSAGA from IN2P3 (Lyon)
  • http://grid.in2p3.fr/jsaga/index.html
  • Slides Ack: Sylvain Renaud
• GANGA-DIANE (EGEE)
  • http://faust.cct.lsu.edu/trac/saga/wiki/Applications/GangaSAGA
  • Slides Ack: Jackub Mosciki, Massimo L, O. Weidner
• NAREGI/KEK (Active)
• DESHL
  • DEISA-based Shell and Workflow library
• XtreemOS
• SD Specification
  • With gLite adaptors
Advantage of Standards
JSAGA: Implementer and user of SAGA
• JSAGA uses SAGA in a module, which hides the heterogeneity of grid infrastructures
• JSAGA implements SAGA to hide the heterogeneity of middleware
[Figure: layered diagram with the labels Applications, JSAGA jobs collection, SAGA, JSAGA core engine + plug-ins, Legacy APIs]
Projects using JSAGA
• Elis@
  • a web portal for submitting jobs to industrial and research grid infrastructures
• SimExplorer
  • a set of tools for managing simulation experiments
  • includes a workflow engine that submits jobs to heterogeneous distributed computing resources
• JJS
  • a tool for efficiently running short-lived jobs on EGEE
• JUX
  • a multi-protocol file browser
DIANE Integration (cont.)
[Figure: Diane without SAGA compared with Diane with SAGA]
Applications on heterogeneous resources
Federating resources!
[Figure labels: Payload distribution; Master; Application-aware (and resource-aware) scheduling; Ganga/SAGA (to *); Ganga/SAGA (to TeraGrid); Ganga/gLite; Agents scheduling; Heterogeneous resources allocation (Ganga + Ganga/SAGA)]
(Not in this demo: cloud resources, additional Grid infrastructures…)
Acknowledgements
SAGA Team and DPA Team and the UK-EPSRC (UK EPSRC: DPA, OMII-UK, OMII-UK PAL)
People:
SAGA D&D: Hartmut Kaiser, Ole Weidner, Andre Merzky, Joohyun Kim, Lukasz Lacinski, João Abecasis, Chris Miceli, Bety Rodriguez-Milla
SAGA Users: Andre Luckow, Yaakoub el-Khamra, Kate Stamou, Cybertools (Abhinav Thota, Jeff, N. Kim), Owain Kenway
Google SoC: Michael Miceli, Saurabh Sehgal, Miklos Erdelyi
Collaborators and Contributors: Steve Fisher & Group, Sylvain Renaud (JSAGA), Go Iwai & Yoshiyuki Watase (KEK)
DPA: Dan Katz, Murray Cole, Manish Parashar, Omer Rana, Jon Weissman