1 / 21

VGrADS Runtime System Architecture and Research

This paper discusses the challenges of implementing virtual grid abstractions in resource management optimization, including simplifying resource abstraction, enabling better optimization, scaling to larger resource environments, and providing useful and current information for dynamic decisions.

ameliab
Download Presentation

VGrADS Runtime System Architecture and Research

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. VGrADS Runtime System Architecture and Research Andrew Chien, Henri Casanova, Rich Wolski, Carl Kesselman, Fran Berman, Dan Reed, Jack Dongarra Feb 2004 Kickoff Meeting Rice University

  2. Runtime Challenges • Simplify Resource Abstraction for PPS • Enable Simpler and Better Optimization • Scale to Larger and More Complex Resource Environment • Grids resource pools are large and growing • Provide Better Information • Useful and Current Information for Dynamic Decisions

  3. What is a Virtual Grid? “PPS/Application” • A Resource Management Oriented Abstraction • Provides Simple Resource Performance Model for the PPS/Application • Structures Information Collection by the Execution System • Accelerate And Improve Decision Making About Resources • Reduced Scope Of Resource Monitoring And Scheduling • Improved Scalability And Quality • Enables Proactive And Reactive Resource Monitoring, Acquisition To Improve Properties Of Performance, Reliability, Stability, Security, Etc. “Virtual Grids” “The Grid”

  4. SDSC SDSC SDSC Caltech Caltech Caltech 10 Gb/s5ms 10 Gb/s5ms 10 Gb/s5ms 10 Gb/s5ms 10 Gb/s5ms 10 Gb/s5ms TeraGrid Backplane TeraGrid Backplane TeraGrid Backplane 30 Gb/s 20ms 30 Gb/s 20ms 30 Gb/s 20ms 10 Gb/s5ms 10 Gb/s5ms 10 Gb/s5ms 10 Gb/s5ms 10 Gb/s5ms 10 Gb/s5ms 10 Gb/s5ms 10 Gb/s5ms 10 Gb/s5ms 1 Gb/s 1 Gb/s 1 Gb/s 1 Gb/s 1 Gb/s 1 Gb/s 1 Gb/s 1 Gb/s 1 Gb/s NCSA NCSA NCSA PSC PSC PSC ANL ANL ANL Figure 5. Teragrid topology Figure 5. Teragrid topology Figure 5. Teragrid topology 1 Gb/s 1 Gb/s 1 Gb/s 1 Gb/s 1 Gb/s 1 Gb/s How are Virtual Grid Abstractions Defined? App • Top-down (Application => Virtualization) • Bottom-up (Resource Properties – Individual & Aggregate => Virtualization) • Better Aggregate Resource Capabilities – Rapid Selection and Binding) • Attributes: Resource Properties, Communication Structure, Aggregates Over These Such as Reliability, Quality of Service (Into Picture) Resources

  5. Application-Driven Example • Dataset and Computations on Data Elements • EOL: Genomic and Proteomic Databases • Annotation Pipeline Employs Myriad Applications and Heterogeneous Workers • Computations Operate on Parts of the Dataset and Compute New Elements of New Datasets

  6. SDSC Caltech 10 Gb/s5ms 10 Gb/s5ms TeraGrid Backplane 30 Gb/s 20ms 10 Gb/s5ms 10 Gb/s5ms 10 Gb/s5ms 1 Gb/s 1 Gb/s 1 Gb/s NCSA PSC ANL Figure 5. Teragrid topology 1 Gb/s 1 Gb/s Resource-Driven Example • TeraGrid: Five Clusters, Fast Network • Direct Access to Individual Clusters and Parts • Virtualized as Clusters (Multiple Choices) and as a Single Cluster • One/Part of a Cluster Dedicated as Data Servers Single Cluster Data Server+ Clusters Singleton Clusters Teragrid

  7. Example: Uniform Parallel Grid • N Uniform Performance Nodes; Rich connectivity • Range of “close approximations”

  8. Heterogeneous Collection • Oracle Grid: Database and Cluster of Workers • Abstract Component Machine?

  9. And many more… • Cluster of clusters • Big bag of processors • Big bag of disks • Humongous bag of processors • Humongous bag of disks • …

  10. Uniform Cluster Heterogeneous Cluster … Generic or Custom Grid Services Rsc Info/ Perf Monitor Rsc Access Selection Checkpointing & Fault Toler. Schedule/ Reschedule Proactive Rsc Reserve/Bind Narrowed Scope Resource Classes: Classification, Composition, Virtualization Secure Clusters x86 Clusters IA64 Clusters Desktop Grid … Virtual Grid Architecture Application/PPS/Libraries Compute: (inst set, special operations, libraries, Etc.) Comm: (multicast, reduce, P2p, Lambda’s, etc.) Dynamic RM: (add rsc, release, inquire, swap, overallocate, reserve, etc.) VGrid RSpec: (security, reliability, communication, performance, location, QoS, etc)

  11. Virtual Grid Runtime Challenges: Implement the Abstractions • Custom Abstractions: Intelligence for Each • Scheduling, Composition, Proactive techniques • Resource Characterization and Classification • Monitoring and Detecting Pailures and Performance Violations of VG Abstractions • Rapid Rescheduling in Response to Failures and Performance Violations • Scaling and Information and Decision Quality • Others?

  12. Resource Classes • Characterization and Organization of Resources • => Short and Long-term Monitoring and Analysis of Resources • Many Open Questions • What are the Meaningful and Useful Resource Classes? • How do We Both Support Large-scale of Resources, Yet Refined Classification? • Is this a Multi-classification? • Is this Centralized or Decentralized or Both?

  13. Virtual Grid GrADSoft vs. Virtual Grids and GrADS Conceptual GrADSoft • Broad, General-Purpose Model vs. • Narrow / Specialized Model per Abstraction

  14. Virtual Grids and GrADS Architecture • Small Set of Virtual Grid Abstractions = Performance Models = PPS View • Decouples the Optimization Problems • BUT, coordination on adaptation still required • Customized Information Collection (per PPS view) • Customized Resource Management / Scheduling (per PPS view) • Customized monitoring (per PPS view)

  15. Initial Steps and Activities • Take Familiar and Important Application/Workloads and Explore Issues • What Type of Virtual Grid Might an Application Specify? How Might We Exploit These Attributes for Better Selection/Scheduling, Etc. • Initial Work On EOL and Speeding Critical Phases • Take Typical Resource Configurations And Elicit Structure • What are Grid Resource Configuration? • What are Their Characteristics (Static, Dynamic) • Do These Classify Naturally Fall into Structured Classification? • Can We Reduce the Scope Needed thru Virtual Grid Mechanism? • Explore Future Grid Information Systems • What Information Can be Provided with What Resolution and Accuracy, Scaling? • Can Virtual Grid Improve and Organize the Quality of Information for Adaptation? • Techniques To Make The Gathering And Distribution Of Different Types Of Information More Efficient And Scalable

  16. Runtime Deliverables • September 2004, end Year 1 Virtual Grid • Prototype Resource Virtualization and Abstraction Classes [V1] • Virtual Scheduling requirements study [V2] Performance Provisioning: • Initial time-space reasoning for contracts and signatures [PP1] Grid Economy: • Develop rudimentary simulation of VGrADS resource allocation mechanisms. [GE1] • Begin the exploration of Tatonnement, Smale's method, and Continuous-Price Double auctions using simulation. [GE2] Fault Tolerance: • Experimental measurement of Grid & cluster reliability [FT1]

  17. Runtime Deliverables (cont.) • September 2005, end Year 2 Virtual Grid: • Prototype Virtual Grid examples defined [V3] • Prototype virtual scheduler [V4] Performance Provisioning: • Extended time-space reasoning for contracts and signatures [PP2] Grid Economy: • Determine initial pricing conditions and pricing methods that prevent multiple equilibria. [GE3] • Verify stability results using simulation environment. [GE4] Fault Tolerance: • Prototype fault tolerant library [FT2]

  18. And beyond… • September 2006, end Year 3 Virtual Grid: • Novel resource selection and virtual scheduling strategy experiments with application kernels on virtual grid environments [V5] Performance Provisioning: • Limited tunable performance/fault-tolerance capabilities [PP3] Grid Economy: • Begin designing experiments to test pricing techniques using VGrADS framework. [GE5] • Continue simulation experiments to evaluate resource allocation efficiency. [GE6] Fault Tolerance: • Consider novel techniques [FT3] • September 2007, end Year 4 • September 2008, end Year 5

  19. Research Questions I • How General and Precise a Description Language for Virtual Grid Abstractions do We NEED? Or do Application/PPS WANT? • Vgrid is an SOA, All Can Be Used Electively – No Layering • Can We Meaningfully Support Use of the System and Modification at Multiple Levels of Abstraction? • What are the New Scheduling, Adaptation, Monitoring Capabilities and Opportunities of Virtual Grid? Can We Prove Properties Relative to Global Grid Views? • What are a Minimal and Expressive Set of Resource Management Services for Virtual Grid Abstractions? Pass Appropriate Information to Allow Lower Level Optimization; Higher Level Control

  20. Research Questions II • Where Do Ideas Of Transparent Fault-tolerance Fit? Embedded In For Example In Many Reliable Virtual Grid Abstractions? • Classification: What Are The Meaningful/Useful Resource Classes? • How Do We Both Support Large-scale Of Resources, Yet Refined Classification? • Is This A Multi-classification? Is This Centralized Or Decentralized Or Both? • How Do These Affect The Interfaces To The Program Preparation System? • What Interfaces Might be Preserved? • Move Towards A Limited Set Of Descriptions, Conversion, Basic Resource Selection • SOA Based on Java or WS-resource • Potentially A Separate Implementation Of Each VG Abstraction; Significant Sharing Expected • How To They Affect The Functionality Needed In The Program Preparation System? • Incremental Adaptation? Checkpointing? Fault Tolerance?

More Related