Towards a High Performance Extensible Grid Architecture

Klaus Krauter, Muthucumaru Maheswaran
{krauter, mahes}@cs.umanitoba.ca
Computer Science Department, University of Manitoba, Winnipeg, Manitoba

Outline
• Grid Computing Issues
  • Network computing environment
  • Scalability, Extensibility, and Adaptability
  • Quality of Service
• Grid Models
  • Resource Management Techniques
  • Application Execution Models
• Grid Architecture
• Example Applications
  • Compiling, Numerical Processing, Grid Aware Application
• Related Work

Grid Computing Issues

Network Computing Environment
• Heterogeneous Nodes
  • Autonomous administration domains with different resource management policies
  • Servers, network devices, workstations, PDA, etc.
• Connected by Communication Links
  • Support differentiated service levels
• Use native operating system services
  • Does not replace existing scheduling and resource control mechanisms
  • Native operating system is a Grid device driver

Scalability
• Target Size
  • Hundreds to Millions of nodes
  • Different platforms for different scale Grids
• Global resource management protocols
  • Fixed format messages
  • Ability to locally tune protocol performance parameters to match local infrastructure and administrative policy
• Local policies for resource management
  • Scheduling, Quality of Service, Tolerance to faults

Extensibility and Adaptability
• Extensible resource protocol content
  • Fixed message framework with structured extensibility (XML like)
• Extensible resource management protocol processing
  • Message content extensions are processed by extension modules
  • Modules are dynamically loaded and register content identifiers
• Variability
  • Multiple different implementations of the resource protocols
• Adaptability
  • Nodes and resources enter and leave the grid continuously
  • Fault tolerance by resource replication
  • Operate in an actively hostile environment
  • Try to survive Byzantine failures
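The extension-module idea above (modules register content identifiers, and message extensions are dispatched to them) can be sketched roughly as follows. This is only an illustration under stated assumptions: the `ExtensionRegistry` class, the `(content_id, payload)` message shape, and the identifier strings are hypothetical and do not come from the slides, and real modules would be loaded dynamically rather than registered inline.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical handler signature: receives an extension payload, returns a result string.
Handler = Callable[[str], str]


class ExtensionRegistry:
    """Maps content identifiers to the extension module that processes them."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def register(self, content_id: str, handler: Handler) -> None:
        # A loaded module announces which content identifier it understands.
        self._handlers[content_id] = handler

    def process(self, extensions: List[Tuple[str, str]]) -> List[str]:
        # Dispatch each extension to its module; unknown content is skipped
        # so that the fixed message framework keeps working (an assumption).
        results = []
        for content_id, payload in extensions:
            handler = self._handlers.get(content_id)
            if handler is None:
                results.append(f"skipped:{content_id}")
            else:
                results.append(handler(payload))
        return results


registry = ExtensionRegistry()
# "qos.cpu" is an illustrative identifier, not one defined by the architecture.
registry.register("qos.cpu", lambda payload: f"cpu-share={payload}")
out = registry.process([("qos.cpu", "0.25"), ("vendor.unknown", "x")])
print(out)
```

Skipping unrecognised identifiers is one possible policy; an implementation could equally reject the message or forward it unchanged.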

Quality of Service
• Not restricted to end-to-end network
  • Processor, memory, I/O also need to support QoS specifications
• Co-allocation and Co-reservation
  • Allocation and scheduling need to take into account QoS given to other jobs already in the Grid
• Providing Service Level Agreements
  • Aggregate performance levels or on a per job basis?
  • Site autonomy and resource control restrict the ability to provide guarantees
• Applications should be able to negotiate QoS with the Grid
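One way to read "take into account QoS given to other jobs already in the Grid" is as an admission check against existing reservations. The sketch below assumes a single resource expressed as a fractional processor share; the `ProcessorQoS` class and its methods are hypothetical names used only for illustration, not an API from the presentation.

```python
from typing import Dict


class ProcessorQoS:
    """Tracks processor shares already promised to jobs on one node."""

    def __init__(self, capacity: float = 1.0) -> None:
        self.capacity = capacity
        self.reserved: Dict[str, float] = {}

    def request(self, job: str, share: float) -> bool:
        # Admit the job only if its share fits beside existing reservations.
        if share <= 0:
            raise ValueError("share must be positive")
        available = self.capacity - sum(self.reserved.values())
        if share > available + 1e-9:  # small tolerance for float rounding
            return False
        self.reserved[job] = share
        return True

    def release(self, job: str) -> None:
        # A finished job frees its share for later requests.
        self.reserved.pop(job, None)


qos = ProcessorQoS(capacity=1.0)
first = qos.request("job-a", 0.6)     # fits on an empty node
second = qos.request("job-b", 0.5)    # would exceed capacity with job-a present
third = qos.request("job-b", 0.4)     # a renegotiated, smaller request fits
print(first, second, third)
```

The rejected request followed by a smaller one loosely mirrors an application negotiating QoS with the Grid; co-allocation across several nodes would need the same check on every node involved.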

Grid Models

Resource Management Techniques
• Super Scheduler
  • Hierarchy of cooperating schedulers
  • Issues: Co-allocation
• Market Based
  • Auctioning for resources
  • Issues: Price management and co-allocation
• Resource Discovery
  • Resource attribute and status in a distributed database
  • Centralized, Agent based, or Hybrid
  • Issues: devise highly distributed, scalable, fault tolerant schemes
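As a rough illustration of the market-based approach, the sketch below runs a sealed-bid, first-price auction for a single resource. The slides do not say which auction format is intended, so the format, the `run_auction` function, and the reserve price parameter are all assumptions made for this example.

```python
from typing import Dict, Optional, Tuple


def run_auction(
    bids: Dict[str, float], reserve_price: float = 0.0
) -> Optional[Tuple[str, float]]:
    """Return (winning job, price paid), or None if no bid meets the reserve."""
    # The resource owner's local policy is modelled as a reserve price.
    eligible = {job: bid for job, bid in bids.items() if bid >= reserve_price}
    if not eligible:
        return None
    # First-price rule: the highest bidder wins and pays its own bid.
    winner = max(eligible, key=eligible.get)
    return winner, eligible[winner]


result = run_auction({"job-a": 3.0, "job-b": 5.0, "job-c": 1.0}, reserve_price=2.0)
print(result)
```

This covers one resource only. The co-allocation issue noted on the slide arises when a job needs several resources at once and must win every auction for them.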

Application Execution Models
• Legacy application
  • Native OS resource and scheduling, implicit QoS
  • Use external resource description language
  • Modify native OS and service libraries and infer resource requirements and QoS
  • Recompile with Grid aware compiler that inserts specialized Grid code
• Grid Aware application
  • Use specialized Grid API
  • First “applications” will be compilers, service libraries (MPI, PVM), Grid workbenches and monitoring tools

Grid Aware Applications

Non-Grid Aware Applications

Grid Architecture

Design Approach
• Layered
  • Grid Kernel
  • Grid Core Services
  • Grid toolkits, workbenches, and user interfaces
• Fully distributed peer-to-peer model
  • No centralized information servers
  • Implementations free to use specialized servers
• Minimal configuration
  • Use Service Location Protocol like service

Grid Kernel Architectural Principles
• Functions that use the services are aware of the distributed environment
• No guarantees made about reliability of nodes or links
• Operate on all types of heterogeneous nodes using minimal resources
• Services will be implemented using native OS with minimal changes to trusted computing base
• Provide uniform extensible API and services across all nodes
• Provide resource management mechanisms but do not implement resource management policies

Grid Architecture

Grid Layers and Core Services

Grid Example Applications

Applications
• Compiling
  • Ensure similar compiler and libraries are used on all nodes
  • Compute how long to transfer and compile
  • Perform deadline scheduling
• Legacy Numerical Processing
  • Dynamic linking of Grid code, variable QoS for job steps
  • Describe network QoS requirements or infer dynamically
  • Much further research required
• Collaborative Research Workbench
  • Negotiate video bandwidth required
  • Query if a simulation can be run and completed quickly, or schedule it later
  • Different GUI depending on resources nearby to a researcher
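For the compiling example, "compute how long to transfer and compile" followed by a deadline decision might look like the sketch below. The cost model (source size divided by link bandwidth, plus source size divided by compiler throughput) and the `Node`, `estimated_seconds`, and `pick_node` names are assumptions for illustration; the slides do not specify how the estimate is made.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    bandwidth_mb_s: float  # link bandwidth to the node (assumed known)
    compile_mb_s: float    # compiler throughput on the node (assumed known)


def estimated_seconds(node: Node, source_mb: float) -> float:
    # Transfer time plus compile time under the simple linear model above.
    return source_mb / node.bandwidth_mb_s + source_mb / node.compile_mb_s


def pick_node(nodes: List[Node], source_mb: float, deadline_s: float) -> Optional[Node]:
    # Keep only nodes that can finish before the deadline, then take the fastest.
    feasible = [n for n in nodes if estimated_seconds(n, source_mb) <= deadline_s]
    return min(feasible, key=lambda n: estimated_seconds(n, source_mb), default=None)


nodes = [Node("a", bandwidth_mb_s=10.0, compile_mb_s=5.0),
         Node("b", bandwidth_mb_s=2.0, compile_mb_s=20.0)]
chosen = pick_node(nodes, source_mb=100.0, deadline_s=40.0)
print(chosen.name if chosen else "no node meets the deadline")
```

This is only the feasibility step. A full deadline scheduler would also order competing compile jobs, for example by earliest deadline, and account for load already on each node.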

Related Work

Related Work
• Application Enabling Systems
  • Provide tools to allow applications to access globally distributed resources in an integrated fashion
  • ATLAS, Globe, Globus/GUSTO, Legion, ParaWeb, Symera
• User Access Systems
  • Provide end users of the Grid transparent access to geographically distributed systems in a location independent manner
  • CCS, MOL, NetSolve, PUNCH

Questions?
