Application Scheduling in a Grid Environment: Nine-month progress talk, Laurie Young
Overview • Introduction to grid computing • Work so far… • Imperial College E-Science Networked Infrastructure (ICENI) • Scheduling within ICENI • Optimisation criteria/Scheduling policy • Scheduling/Mapping algorithms
What is a Grid? [Figure: a grid linking visualisation/steering software, CPU nodes, a scientific instrument and a storage node]
What is a Grid Application?
[Figure: tiered particle-physics data grid, from the online system at CERN down to physicist workstations]
• Online System: there is a “bunch crossing” every 25 nsecs and there are 100 “triggers” per second; each triggered event is ~1 MByte in size. Detector output (~PBytes/sec) is reduced to ~100 MBytes/sec
• Tier 0: CERN Computer Centre, with an Offline Processor Farm of ~20 TIPS, fed at ~100 MBytes/sec
• Tier 1: FermiLab (~4 TIPS) and the France, Germany and Italy Regional Centres, linked at ~622 Mbits/sec or by Air Freight (deprecated)
• Tier 2: Caltech and other Tier2 Centres (~1 TIPS each), linked at ~622 Mbits/sec
• HPSS mass storage at several of the centres
• Institutes (~0.25 TIPS), linked at ~622 Mbits/sec, each holding a physics data cache
• Tier 4: physicist workstations (Pentium II 300 MHz), served at ~1 MBytes/sec
• 1 TIPS is approximately 25,000 SpecInt95 equivalents
• Physicists work on analysis “channels”. Each institute will have ~10 physicists working on one or more channels; data for these channels should be cached by the institute server
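As a rough check, the figures quoted on this slide fit together. The calculation below uses decimal units and takes the ~1 MByte event size as an average. The per-day figure additionally assumes continuous running, which is an assumption and not a claim from the talk.

```latex
\begin{aligned}
100~\text{triggers/s} \times 1~\text{MB/trigger} &\approx 100~\text{MB/s} \\
100~\text{MB/s} \times 86\,400~\text{s/day} &\approx 8.6~\text{TB/day} \\
622~\text{Mbit/s} \div 8~\text{bit/byte} &\approx 78~\text{MB/s} \\
20~\text{TIPS} \times 25\,000~\text{SpecInt95/TIPS} &= 500\,000~\text{SpecInt95}
\end{aligned}
```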
Current Work • Development of Supporting Technologies • Development of EPIC (E-Science Portal @ IC) • GridFTP (High throughput FTP) • Grid/Globus submission of jobs to resources • Development of test application • Parameter sweep analysis of submarine acoustics • Multithreaded and Component versions • Integration with EPIC
ICENI • Imperial College E-Science Networked Infrastructure • Developed by the LeSC (London e-Science Centre) Grid Middleware Group • Collects and provides relevant Grid metadata • Uses this metadata to define and develop higher-level services
The Iceni, under Queen Boudicca, united the tribes of South-East England in a revolt against the occupying Roman forces in AD 60.
ICENI Component Applications • Each ICENI job is composed of multiple components. Each runs on a different resource • Each component is connected to at least one other component. Data is passed along these connections
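A component application can be held as a simple graph. The sketch below is a minimal illustration and not ICENI's own data model. The component names, the data volumes and the helpers `adjacency` and `unconnected` are all hypothetical. The sketch checks the rule that every component is connected to at least one other component.

```python
from collections import defaultdict

# A component application as a graph: nodes are components, edges are the
# connections that data is passed along. Names and volumes are hypothetical.
COMPONENTS = {"Reader", "Solver", "Writer", "Viewer"}
CONNECTIONS = [
    ("Reader", "Solver", 20.0),  # (source, destination, data volume in MB)
    ("Solver", "Writer", 5.0),
]


def adjacency(edges):
    """Undirected adjacency map built from (source, destination, volume) edges."""
    adj = defaultdict(set)
    for src, dst, _volume in edges:
        adj[src].add(dst)
        adj[dst].add(src)
    return adj


def unconnected(components, edges):
    """Sorted list of components with no connection to any *other* component."""
    adj = adjacency(edges)
    return sorted(c for c in components if not (adj[c] - {c}))


if __name__ == "__main__":
    print(unconnected(COMPONENTS, CONNECTIONS))  # ['Viewer']
```

Here "Viewer" is reported because nothing sends data to it or receives data from it. A self-loop does not count as a connection.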
The Scheduling Problem Given a component application and a (large) network of linked computational resources, what is the best mapping of components onto resources?
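One way to state this problem formally is sketched below. The symbols are ours, not the talk's. Let C be the set of components and R the set of resources, and let B(m) be whatever single score the scheduler maximises for a mapping m. Counting the candidate mappings shows why exhaustive search quickly becomes impractical.

```latex
\begin{aligned}
&m : C \to R, \qquad m^{*} = \arg\max_{m} B(m) \\
&\#\{m\} = |R|^{|C|} \quad \text{(components may share a resource)} \\
&\#\{m\} = \frac{|R|!}{(|R|-|C|)!} \quad \text{(one component per resource, } |R| \ge |C|\text{)} \\
&|C| = 11,\ |R| = 4:\quad 4^{11} = 4\,194\,304
\end{aligned}
```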
Scheduler in ICENI [Figure: the ICENI App Builder (GUI), Component Repository, Performance Models, Scheduler, Broker and Resources]
Multiple Metrics (1) • “It is the goal of a scheduler to optimise one or more metrics” (Feitelson & Rudolph) • Generally one metric is most important • Application Optimisation • Execution time • Execution cost • Host Optimisation • Host utilisation • Host throughput • Interaction Latency
Multiple Metrics (2) • In a Grid Environment there are three important metrics for application optimisation • Start time • End time • Cost • Their relative importance varies from user to user and from application to application
Combining Metrics – Benefit Fn • A Benefit Function maps the metrics we are interested in to a single Benefit Value metric • Different benefit functions represent different optimisation preferences
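The slides do not give a formula for a benefit function, so the following is only a sketch under assumed forms. The names `Prediction`, `linear_benefit` and `deadline_benefit` are hypothetical. The weighted-sum and deadline-step shapes are also illustrative choices. Each function maps predicted start time, end time and cost to a single Benefit Value, where larger is better.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Prediction:
    """Predicted metrics for one candidate placement (units are assumptions)."""
    start: float  # predicted start time, seconds after submission
    end: float    # predicted end time, seconds after submission
    cost: float   # predicted cost, arbitrary currency units


# A benefit function collapses the three metrics into one Benefit Value.
BenefitFn = Callable[[Prediction], float]


def linear_benefit(w_start: float, w_end: float, w_cost: float) -> BenefitFn:
    """Hypothetical weighted sum: earlier and cheaper gives a higher benefit."""
    def benefit(p: Prediction) -> float:
        return -(w_start * p.start + w_end * p.end + w_cost * p.cost)
    return benefit


def deadline_benefit(deadline: float, value: float) -> BenefitFn:
    """Hypothetical step form: earn `value` if finished by `deadline`, minus cost."""
    def benefit(p: Prediction) -> float:
        earned = value if p.end <= deadline else 0.0
        return earned - p.cost
    return benefit


fast = Prediction(start=0.0, end=100.0, cost=50.0)
cheap = Prediction(start=60.0, end=400.0, cost=10.0)
by_time_and_cost = linear_benefit(0.0, 1.0, 1.0)
by_deadline = deadline_benefit(deadline=500.0, value=200.0)
print(by_time_and_cost(fast), by_time_and_cost(cheap))  # -150.0 -410.0
print(by_deadline(fast), by_deadline(cheap))            # 150.0 190.0
```

The same two predictions rank differently under the two functions. The weighted sum favours the fast placement. Under the deadline function both placements meet the deadline, so the cheaper one wins.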
Optimisation Preferences • Cost Optimisation • Time Optimisation • Cost/Time Optimisation
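The three preferences can be read as three different benefit functions applied to the same predictions. In this hypothetical sketch, the candidate resources and their predicted (start, end, cost) values are invented for illustration. So is the conversion `rate` used in the cost/time case. The point is only that changing the benefit function changes which resource wins.

```python
# Hypothetical predictions for one component: resource -> (start, end, cost).
CANDIDATES = {
    "cluster": (0.0, 100.0, 50.0),
    "pool": (60.0, 400.0, 10.0),
    "smp": (10.0, 150.0, 25.0),
}


def cost_optimising(start, end, cost):
    """Only cost matters: cheaper is better."""
    return -cost


def time_optimising(start, end, cost):
    """Only completion time matters: earlier is better."""
    return -end


def cost_time_optimising(start, end, cost, rate=0.1):
    """Assumed trade-off: each second of completion time costs `rate` units."""
    return -(cost + rate * end)


def best(benefit):
    """Resource whose predicted metrics give the highest benefit value."""
    return max(CANDIDATES, key=lambda r: benefit(*CANDIDATES[r]))


for fn in (cost_optimising, time_optimising, cost_time_optimising):
    print(fn.__name__, "->", best(fn))
# cost_optimising -> pool
# time_optimising -> cluster
# cost_time_optimising -> smp
```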
Graph Oriented Scheduling (1) • Applications are described as a graph • Nodes represent application components • Edges represent component communication • Resources are described as a graph • Nodes represent resources • Edges represent network connections
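The two graphs, and a candidate mapping between them, can be sketched as plain sets. Only the resource names come from the talk. The network links and the component names are assumptions made for illustration. So is the rule that communicating components must be co-located or share a direct link, since a real network might route over several hops.

```python
# Application graph: nodes are components, edges are communications.
COMPONENTS = {"Scatter", "Mesh", "Gather"}
APP_EDGES = {("Scatter", "Mesh"), ("Mesh", "Gather")}

# Resource graph: nodes are resources, edges are network connections.
# The links below are hypothetical; only the resource names are from the talk.
RESOURCES = {"Atlas", "Saturn", "Viking", "Condor pool"}
NET_LINKS = {frozenset({"Atlas", "Viking"}), frozenset({"Viking", "Condor pool"})}


def feasible(mapping):
    """Check a component -> resource mapping against both graphs."""
    if set(mapping) != COMPONENTS or not set(mapping.values()) <= RESOURCES:
        return False
    for a, b in APP_EDGES:
        ra, rb = mapping[a], mapping[b]
        # Communicating components must share a resource or a direct link.
        if ra != rb and frozenset({ra, rb}) not in NET_LINKS:
            return False
    return True


print(feasible({"Scatter": "Atlas", "Mesh": "Viking", "Gather": "Condor pool"}))  # True
print(feasible({"Scatter": "Atlas", "Mesh": "Condor pool", "Gather": "Viking"}))  # False
```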
Centre Resources
• VIKING 2: P4/Linux cluster, 68 dual nodes, 100Mb
• VIKING 1: P4/Linux cluster, 66 dual nodes, Myrinet
• PIONEER: Athlon cluster, 22 processors, 100Mb
• VOYAGER: Microsoft/Dell Intel cluster, 32 processors, Giganet
• SATURN: Sun E6800 SMP, 24 processors, backplane 9.6GB/s
• ATLAS: Compaq/Quadrics cluster, 32 processors, MPI latency ~5.7us and bandwidth >200 MB/s
• AP3000: 80 Sparc Ultra II, APNet
• CONDOR POOL: ~150 PIII processors
• Storage: 24TB, 1.2TB and 6TB
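Graph Oriented Scheduling (2) [Figure: an application graph with components Design, Factory, Analyse, Scatter, three Mesh components, three DRACS components and Gather, beside a resource graph of Atlas, Saturn, Viking and Condor pool]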
Graph Oriented Scheduling (3) [Figure: components Analyse, Factory, Gather, Scatter and Design shown against the resources Atlas, Saturn, Viking and Condor pool]
Schedule Benefit • Each component and communication has a benefit function • Each resource and network connection has a predicted time & cost for each component or communication that could be deployed on it • Fit the task graph onto the resource graph to get the maximum Total Predicted Benefit
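A brute-force version of this fitting step is easy to write, though it only scales to tiny graphs. Every number in this sketch is hypothetical. The per-placement (time, cost) predictions stand in for the performance models. Every component and communication shares one assumed benefit function, which treats a second as worth one cost unit. A communication between co-located components is assumed to be free.

```python
from itertools import product

COMPONENTS = ["Scatter", "Mesh", "Gather"]
RESOURCES = ["Viking", "Condor pool"]
EDGES = [("Scatter", "Mesh"), ("Mesh", "Gather")]

# Hypothetical predictions: (component, resource) -> (time in s, cost).
COMPUTE = {
    ("Scatter", "Viking"): (10, 4), ("Scatter", "Condor pool"): (30, 1),
    ("Mesh", "Viking"): (100, 40), ("Mesh", "Condor pool"): (60, 10),
    ("Gather", "Viking"): (10, 4), ("Gather", "Condor pool"): (30, 1),
}
# Hypothetical (time, cost) of one communication over the network link.
LINK = (5, 1)


def benefit(time, cost):
    """Assumed benefit function: one second is worth one cost unit."""
    return -(time + cost)


def total_benefit(mapping):
    """Total Predicted Benefit: components plus cross-resource communications."""
    total = sum(benefit(*COMPUTE[(c, mapping[c])]) for c in COMPONENTS)
    for a, b in EDGES:
        if mapping[a] != mapping[b]:  # co-located communication assumed free
            total += benefit(*LINK)
    return total


def best_mapping():
    """Exhaustive search over all |R|**|C| component -> resource mappings."""
    mappings = (dict(zip(COMPONENTS, choice))
                for choice in product(RESOURCES, repeat=len(COMPONENTS)))
    return max(mappings, key=total_benefit)


best = best_mapping()
print(best, total_benefit(best))
# {'Scatter': 'Viking', 'Mesh': 'Condor pool', 'Gather': 'Viking'} -110
```

With 2 resources and 3 components there are only 2^3 = 8 mappings to try. The count grows as |R|^|C|, so real schedulers need something smarter than enumeration.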
Future Work • Develop benefit maximisation algorithms • Test schedulers • On grid simulators, such as SimGrid, GridSim and MicroGrid • On grid testbeds, such as the IC Testbed and the EU DataGrid (EUDG) • Develop brokering methods • Define Scheduler-Broker communications
Summary • Concept of grid computing for HPC/HTC • ICENI middleware for utilising grids • Importance of scheduling metrics • Combining metrics • Mapping application graphs onto resource graphs • Optimisation of total benefit • Need good mapping algorithms…