Towards a High Performance Extensible Grid Architecture • Klaus Krauter, Muthucumaru Maheswaran • {krauter, mahes}@cs.umanitoba.ca • Computer Science Department, University of Manitoba, Winnipeg, Manitoba
Outline • Grid Computing Issues • Network computing environment • Scalability, Extensibility, and Adaptability • Quality of Service • Grid Models • Resource Management Techniques • Application Execution Models • Grid Architecture • Example Applications • Compiling, Numerical Processing, Grid Aware Application • Related Work
Network Computing Environment • Heterogeneous Nodes • Autonomous administration domains with different resource management policies • Servers, network devices, workstations, PDAs, etc. • Connected by Communication Links • Support differentiated service levels • Use native operating system services • The Grid does not replace existing scheduling and resource control mechanisms • Native operating system is a Grid device driver
Scalability • Target Size • Hundreds to millions of nodes • Different platforms for different scale Grids • Global resource management protocols • Fixed-format messages • Ability to locally tune protocol performance parameters to match local infrastructure and administrative policy • Local policies for resource management • Scheduling, Quality of Service, fault tolerance
Extensibility and Adaptability • Extensible resource protocol content • Fixed message framework with structured extensibility (XML-like) • Extensible resource management protocol processing • Message content extensions are processed by extension modules • Modules are dynamically loaded and register the content identifiers they handle • Variability • Multiple different implementations of the resource protocols • Adaptability • Nodes and resources enter and leave the grid continuously • Fault tolerance by resource replication • Operate in an actively hostile environment • Try to survive Byzantine failures
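The extension mechanism on this slide can be sketched as follows. This is a minimal illustration, not the authors' protocol: the element names (`grid-message`, `extension`), the `id` attribute, the content identifiers, and the `process_message` helper are all assumptions made for the example. It shows a fixed outer message with extension elements dispatched to registered handlers, and unknown extensions skipped rather than rejected.

```python
import xml.etree.ElementTree as ET

# Hypothetical message: a fixed outer framework plus optional extensions.
MESSAGE = """<grid-message type="resource-offer" sender="node-17">
  <extension id="qos.cpu">2</extension>
  <extension id="vendor.unknown">opaque</extension>
</grid-message>"""


def process_message(xml_text, registry):
    """Dispatch each extension element to the handler registered for its id.

    `registry` maps a content identifier to a callable taking the payload
    string. Extensions with no registered handler are collected, not rejected,
    so nodes with different module sets can still exchange messages.
    """
    root = ET.fromstring(xml_text)
    handled, skipped = [], []
    for ext in root.findall("extension"):
        content_id = ext.get("id")
        handler = registry.get(content_id)
        if handler is None:
            skipped.append(content_id)
        else:
            handled.append(handler(ext.text or ""))
    return root.get("type"), handled, skipped


# A module "registers" its content identifier by adding an entry.
registry = {"qos.cpu": lambda payload: ("cpu_shares", int(payload))}
message_type, handled, skipped = process_message(MESSAGE, registry)
```

In a real system the registry entries would come from dynamically loaded modules; a plain dictionary stands in for that step here.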
Quality of Service • Not restricted to end-to-end network • Processor, memory, I/O also need to support QoS specifications • Co-allocation and Co-reservation • Allocation and scheduling need to take into account QoS given to other jobs already in the Grid • Providing Service Level Agreements • Aggregate performance levels or on a per-job basis? • Site autonomy and resource control restrict the ability to provide guarantees • Applications should be able to negotiate QoS with the Grid
Resource Management Techniques • Super Scheduler • Hierarchy of cooperating schedulers • Issues: Co-allocation • Market Based • Auctioning for resources • Issues: Price management and co-allocation • Resource Discovery • Resource attribute and status in a distributed database • Centralized, Agent based, or Hybrid • Issues: devise highly distributed, scalable, fault tolerant schemes
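As one illustration of the market-based approach, the sketch below runs a sealed-bid, first-price auction for a single resource. The slide names auctioning but not the auction type, so the first-price rule, the reserve price, and the `Bid` and `run_auction` names are assumptions for the example. It does not address the co-allocation issue the slide raises.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Bid:
    job: str      # identifier of the job bidding for the resource
    price: float  # amount the job offers, in arbitrary Grid currency


def run_auction(bids: List[Bid], reserve_price: float = 0.0) -> Optional[Bid]:
    """Return the winning bid, or None if no bid meets the reserve price.

    Highest price wins; on a tie the earliest bid in the list wins,
    because max() returns the first maximal element it encounters.
    """
    eligible = [b for b in bids if b.price >= reserve_price]
    if not eligible:
        return None
    return max(eligible, key=lambda b: b.price)


bids = [Bid("render", 4.0), Bid("simulate", 7.5), Bid("compile", 7.5)]
winner = run_auction(bids, reserve_price=5.0)
```

The reserve price is where a resource owner's price management policy would plug in.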
Application Execution Models • Legacy application • Native OS resource and scheduling, implicit QoS • Use external resource description language • Modify native OS and service libraries and infer resource requirements and QoS • Recompile with Grid aware compiler that inserts specialized Grid code • Grid Aware application • Use specialized Grid API • First “applications” will be compilers, service libraries (MPI, PVM), Grid workbenches and monitoring tools
Design Approach • Layered • Grid Kernel • Grid Core Services • Grid toolkits, workbenches, and user interfaces • Fully distributed peer-to-peer model • No centralized information servers • Implementations free to use specialized servers • Minimal configuration • Use Service Location Protocol like service
Grid Kernel Architectural Principles • Functions that use the services are aware of the distributed environment • No guarantees made about reliability of nodes or links • Operate on all types of heterogeneous nodes using minimal resources • Services will be implemented using native OS with minimal changes to trusted computing base • Provide uniform extensible API and services across all nodes • Provide resource management mechanisms but do not implement resource management policies
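The last principle, mechanism without policy, can be sketched as below. This is an assumed illustration, not the Grid Kernel API: `ResourcePool`, its methods, and the example policy are hypothetical names. The pool tracks and grants capacity (the mechanism), while the decision of whether a request is acceptable is delegated to a callable supplied by the local site (the policy).

```python
from typing import Callable, Dict

# A policy receives (job, requested amount, currently free amount).
Policy = Callable[[str, int, int], bool]


class ResourcePool:
    """Mechanism only: bookkeeping for a single countable resource."""

    def __init__(self, capacity: int, policy: Policy) -> None:
        self.capacity = capacity
        self.policy = policy
        self.allocations: Dict[str, int] = {}

    @property
    def free(self) -> int:
        return self.capacity - sum(self.allocations.values())

    def request(self, job: str, amount: int) -> bool:
        # The mechanism enforces physical limits.
        if amount <= 0 or amount > self.free:
            return False
        # The site-supplied policy decides everything else.
        if not self.policy(job, amount, self.free):
            return False
        self.allocations[job] = self.allocations.get(job, 0) + amount
        return True

    def release(self, job: str) -> None:
        self.allocations.pop(job, None)


def at_most_half_of_free(job: str, amount: int, free: int) -> bool:
    """Hypothetical site policy: no request may take over half of what is free."""
    return amount * 2 <= free


pool = ResourcePool(capacity=8, policy=at_most_half_of_free)
```

Swapping in a different policy function changes admission behavior without touching the pool code, which is the separation the slide asks for.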
Applications • Compiling • Ensure similar compiler and libraries are used on all nodes • Estimate how long transfer and compilation will take • Perform deadline scheduling • Legacy Numerical Processing • Dynamic linking of Grid code, variable QoS for job steps • Describe network QoS requirements or infer them dynamically • Much further research required • Collaborative Research Workbench • Negotiate the video bandwidth required • Query whether a simulation can be run and completed quickly, or schedule it for later • Different GUI depending on the resources near a researcher
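For the compiling example, the estimate-then-schedule step might look like the sketch below. The cost model (transfer time plus compile time, each a simple ratio) and all names and units (`Node`, `bandwidth_mb_s`, `compile_rate_kloc_s`, `pick_node`) are assumptions for illustration. The slides do not specify how the estimate is computed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Node:
    name: str
    bandwidth_mb_s: float       # assumed link speed to this node, MB/s
    compile_rate_kloc_s: float  # assumed compile throughput, kLOC/s


def estimated_seconds(node: Node, source_mb: float, kloc: float) -> float:
    """Time to ship the sources to the node plus time to compile them there."""
    return source_mb / node.bandwidth_mb_s + kloc / node.compile_rate_kloc_s


def pick_node(nodes: List[Node], source_mb: float, kloc: float,
              deadline_s: float) -> Optional[Node]:
    """Return the fastest node that can finish by the deadline, else None."""
    feasible = [n for n in nodes
                if estimated_seconds(n, source_mb, kloc) <= deadline_s]
    if not feasible:
        return None
    return min(feasible, key=lambda n: estimated_seconds(n, source_mb, kloc))


nodes = [
    Node("fast-cpu", bandwidth_mb_s=10.0, compile_rate_kloc_s=5.0),
    Node("fast-link", bandwidth_mb_s=100.0, compile_rate_kloc_s=2.0),
]
choice = pick_node(nodes, source_mb=50.0, kloc=100.0, deadline_s=30.0)
```

With these made-up figures the node with the slower link still wins, because compile time dominates transfer time for this job.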
Related Work • Application Enabling Systems • Provide tools to allow applications to access globally distributed resources in an integrated fashion • ATLAS, Globe, Globus/GUSTO, Legion, ParaWeb, Symera • User Access Systems • Provide end users of the Grid transparent access to geographically distributed systems in a location independent manner • CCS, MOL, NetSolve, PUNCH