Scheduling in Heterogeneous Grid Environments: The Effects of Data Migration. Leonid Oliker, Hongzhang Shan (Future Technology Group, Lawrence Berkeley Research Laboratory); Warren Smith, Rupak Biswas (NASA Advanced Supercomputing Division, NASA Ames Research Center)
Motivation • Geographically distributed resources • Difficult to schedule and manage efficiently • Autonomy (local scheduler) • Heterogeneity • Lack of perfect global information • Conflicting requirements between users and system administrators
Current Status • Grid Initiatives • Global Grid Forum, NASA Information Power Grid, TeraGrid, Particle Physics Data Grid, E-Grid, LHC Challenge • Grid Scheduling Services • Enabling multi-site applications • Multi-Disciplinary Applications, Remote Visualization, Co-Scheduling, Distributed Data Mining, Parameter Studies • Job Migration • Improve Time-to-Solution • Avoid dependency on a single resource provider • Optimize application mapping to target architecture • But what are the tradeoffs of data migration?
Our Contributions • Interaction between grid scheduler and local scheduler • Architecture: distributed, centralized, and ideal • Real workloads • Performance metrics • Job migration overhead • Superscheduler scalability • Fault tolerance • Multi-resource requirements
Distributed Architecture • [Diagram: each site pairs a grid environment (grid queue, grid scheduler, middleware) with a local environment (local queue, local scheduler, compute server with PEs); sites exchange jobs and info over the communication infrastructure]
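Interaction between Grid and Local Schedulers • AWT: Approximate Wait Time • CRU: Current Resource Utilization • JR: Job Requirements • [Diagram: a job enters the grid queue; the grid scheduler sends JR through the middleware to the local scheduler, which returns AWT and CRU] • If AWT < [threshold symbol lost in extraction]: [action lost in extraction; the diagram labels show the local queue] • Else: considered for migration using Sender-Initiated (S-I), Receiver-Initiated (R-I), or Symmetrically-Initiated (Sy-I)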
Sender-Initiated (S-I) • [Diagram: the host sends Job i's requirements to Partner 1 and Partner 2; the host and partners report ART0 & CRU0, ART1 & CRU1, ART2 & CRU2; Job i is sent to the selected machine and Results i are returned] • Select the machine with the smallest Approximate Response Time (ART); break ties by CRU • ART = Approximate Wait Time + Estimated Run Time
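The selection rule on this slide can be sketched in Python. This is a minimal illustration, not the authors' simulator: the `Offer` record, its field names, and the sample numbers are hypothetical, and the slide does not say which direction the CRU tie-break goes, so preferring the lower CRU is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    """Hypothetical reply from one machine (host or partner)."""
    machine: str
    awt: float  # approximate wait time, in seconds
    ert: float  # estimated run time, in seconds
    cru: float  # current resource utilization, 0.0 to 1.0

    @property
    def art(self) -> float:
        # ART = Approximate Wait Time + Estimated Run Time (from the slide)
        return self.awt + self.ert


def select_machine(offers):
    """Pick the smallest ART; ties go to the lower CRU (assumed direction)."""
    return min(offers, key=lambda o: (o.art, o.cru))


# Made-up sample values: partner1 and partner2 tie on ART.
offers = [
    Offer("host", awt=600.0, ert=300.0, cru=0.90),
    Offer("partner1", awt=100.0, ert=400.0, cru=0.70),
    Offer("partner2", awt=200.0, ert=300.0, cru=0.50),
]
best = select_machine(offers)
```

In this sample both partners have an ART of 500 seconds, so the tie-break on CRU decides between them.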
Receiver-Initiated (R-I) • [Diagram: Partner 1 and Partner 2 send free signals to the host; the host sends Job i's requirements to them; ART0 & CRU0, ART1 & CRU1, ART2 & CRU2 are compared; Job i is migrated] • Querying begins after receiving a free signal
Symmetrically-Initiated (Sy-I) • First, work in R-I mode • Change to S-I mode if no machines volunteer • Switch back to R-I after the job is scheduled • [Diagram: two states, R-I and S-I, with transitions labeled "No Volunteer After Time Period" and "Have Volunteers"]
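The mode switching described above behaves like a two-state machine. The following Python sketch is one possible reading of the bullets; the function name, its parameters, and the timeout value are hypothetical, since the slide does not give the length of the waiting period.

```python
from enum import Enum


class Mode(Enum):
    RECEIVER_INITIATED = "R-I"
    SENDER_INITIATED = "S-I"


def next_mode(mode, volunteers, waited_s, timeout_s, job_scheduled):
    """Return the mode to use next, following the three bullets on the slide."""
    if mode is Mode.RECEIVER_INITIATED:
        # Change to S-I only if no machine volunteered within the time period.
        if volunteers == 0 and waited_s >= timeout_s:
            return Mode.SENDER_INITIATED
        return mode
    # In S-I mode: switch back to R-I once the job has been scheduled.
    if job_scheduled:
        return Mode.RECEIVER_INITIATED
    return mode


# Illustrative walk-through with a made-up 60-second timeout.
mode = Mode.RECEIVER_INITIATED
mode = next_mode(mode, volunteers=0, waited_s=60.0, timeout_s=60.0, job_scheduled=False)
after_timeout = mode
mode = next_mode(mode, volunteers=0, waited_s=0.0, timeout_s=60.0, job_scheduled=True)
after_scheduling = mode
```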
Centralized Architecture • [Diagram: jobs arrive from web portals or a super shell into a single grid queue; the grid scheduler dispatches them through the middleware] • Advantages: global view • Disadvantages: single point of failure, scalability
Resource Configuration and Site Assignment • Each local site network has a peak bandwidth of 800 Mb/s (gigabit Ethernet LAN) • External network has 40 Mb/s available point-to-point (high-performance WAN) • Assume all data transfers share the network equally (network contention is modeled) • Assume performance is linearly related to CPU speed • Assume users have pre-compiled code for each of the heterogeneous platforms
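As a rough illustration of the modeling assumptions above, the Python sketch below estimates a transfer time under equal bandwidth sharing and rescales a runtime linearly with CPU speed. The function names are hypothetical, "Mb" is read as megabits, and the number of concurrent transfers is held fixed for the whole transfer, which is a simplification of whatever contention model the simulator actually uses.

```python
LAN_MBPS = 800.0  # peak bandwidth inside a site, from the slide
WAN_MBPS = 40.0   # point-to-point bandwidth between sites, from the slide


def transfer_seconds(size_megabits, link_mbps, concurrent_transfers=1):
    """Time to move the data when the link is shared equally by all transfers."""
    if concurrent_transfers < 1:
        raise ValueError("need at least one transfer on the link")
    share_mbps = link_mbps / concurrent_transfers
    return size_megabits / share_mbps


def scaled_runtime(runtime_s, source_speed, target_speed):
    """Runtime on the target machine, assuming performance is linear in CPU speed."""
    return runtime_s * source_speed / target_speed


alone = transfer_seconds(400.0, WAN_MBPS)                           # one transfer on the WAN
shared = transfer_seconds(400.0, WAN_MBPS, concurrent_transfers=4)  # four equal shares
faster_cpu = scaled_runtime(100.0, source_speed=1.0, target_speed=2.0)
```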
Job Workloads • Systems located at Lawrence Berkeley Laboratory, NASA Ames Research Center, Lawrence Livermore Laboratory, and San Diego Supercomputer Center • Data volume information is not available; assume data volume is correlated with the volume of work • B is the number of Kbytes for each work unit (CPUs × runtime) • Our best estimate is B = 1 Kbyte for each CPU-second of application execution
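The data-volume estimate above amounts to a single multiplication. A minimal Python sketch, with a hypothetical function name and a made-up job size:

```python
def data_volume_kbytes(cpus, runtime_s, b_kbytes_per_cpu_second=1.0):
    """Estimated data volume: B Kbytes for every CPU-second of execution."""
    return b_kbytes_per_cpu_second * cpus * runtime_s


# Made-up job: 64 CPUs running for one hour.
baseline = data_volume_kbytes(64, 3600)                                # B
heavier = data_volume_kbytes(64, 3600, b_kbytes_per_cpu_second=10.0)   # 10B
```

Scaling the per-CPU-second factor, as in the 0.1B, 10B, and 100B cases, scales the estimated volume by the same factor.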
Scheduling Policy (12 Sites, Workload B) • Large potential gain using grid superscheduler • Reduced average wait time by 25X compared with the local scheme • Sender-Initiated performance comparable to Centralized • Inverse relationship between migration (FOJM, FDVM) and timing (NAWT, NART) • Very small fraction of response time spent moving data (DMOH)
Data Migration Sensitivity (Sender-I, 12 Sites) • NAWT for 100B is almost 8X higher than for B; NART is 50% higher • DMOH increases to 28% and 44% for 10B and 100B, respectively • As B increases, data migration (FDVM) decreases due to increasing overhead • FOJM is inconsistent because it measures the number of jobs, not data volume
Site Number Sensitivity (Sender-I) • 0.1B causes no site sensitivity • 10B has a noticeable effect as sites decrease from 12 to 3: • Decrease in time (NAWT, NART) due to increase in network bandwidth • Increase in fraction of data volume migrated (FDVM) • 40% increase in fraction of response time spent moving data (DMOH)
Communication-Oblivious Scheduling (Sender-I) • For 10B, if data migration cost is not considered in the scheduling algorithm: • NART increases 14X and 40X for 12 sites and 3 sites, respectively • NAWT increases 28X and 43X for 12 sites and 3 sites, respectively • DMOH is over 96% (only 3% for the B set) • 16% of all jobs are blocked from executing while waiting for data • Compared with practically 0% for communication-aware scheduling
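The difference between the two policies can be illustrated with a small Python sketch. The slides do not state how the communication-aware scheduler accounts for migration cost, so adding an estimated transfer time to each candidate's wait and run time is an assumption here; the tuple layout and the numbers are made up.

```python
def pick(candidates, communication_aware):
    """Choose a machine from (name, awt_s, ert_s, transfer_s) tuples.

    The oblivious variant ranks by wait time plus run time only; the aware
    variant also adds the estimated data transfer time (assumed cost model).
    """
    def cost(candidate):
        _name, awt_s, ert_s, transfer_s = candidate
        extra = transfer_s if communication_aware else 0.0
        return awt_s + ert_s + extra

    return min(candidates, key=cost)[0]


# Made-up example: the remote machine is idle but the data is slow to move.
candidates = [
    ("local", 500.0, 300.0, 0.0),
    ("remote", 100.0, 300.0, 900.0),
]
oblivious_choice = pick(candidates, communication_aware=False)
aware_choice = pick(candidates, communication_aware=True)
```

In this example the oblivious policy sends the job to the remote machine, where it would wait on a 900-second transfer, while the aware policy keeps it local.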
Increased Workload Sensitivity (Sender-I, 12 Sites, Workload B) • Grid scheduling handles 40% more jobs, compared with the non-grid local scheme: • No increase in time (NAWT, NART) • Weighted utilization increased from 66% to 93% • However, there is a fine line: when the number of jobs increases by 45%: • NAWT grows 3.5X, NART grows 2.4X
Conclusions • Studied impact of data migration, simulating: • Compute servers • Grouping of servers into sites • Inter-server networks • Results showed huge benefits of grid scheduling • S-I reduced average turnaround time by 60% compared with the local approach, even in the presence of input/output data migration • Algorithm can execute 40% more jobs in a grid environment and deliver the same turnaround times as the non-grid scenario • For large data files, it is critical to consider migration overhead • 43X increase in NART using communication-oblivious scheduling
Future Work • Superscheduling scalability: • Resource discovery • Fault tolerance • Multi-resource requirements • Architectural heterogeneity • Practical deployment issues