Scheduling From the Perspective of the Application By Francine Berman & Richard Wolski Presenter: Kun-chan Lan
Outline of the talk • Overview • Case study • Application-centric scheduling • AppleS Project • Result • Conclusion
Overview.. • Why scheduling is important in metacomputing systems • Better utilization of resources • Performance efficiency • Application-centric scheduling • Everything is evaluated in terms of its impact on the application
..Overview.. • Metacomputing • Aggregation of distributed and high-performance resources on coordinated networks, providing the performance required to address modern scientific problems • Heterogeneity (administrative domains, software/hardware architectures, protocols, etc.) • Contention
Parallel computing vs. Metacomputing • Parallel computing • Performance-oriented aggregation of resources from a single site (a multi-processor machine) • Communicate via dedicated devices such as switches or shared memory • Homogeneous (hardware/software infrastructure, administrative domain, etc.) • Metacomputing • Performance-oriented aggregation of resources from multiple sites • Communicate via a distributed network • Heterogeneous resources • A software infrastructure is required to coordinate distributed networks into a communication substrate
Scheduling for parallel computing • Multiprocessor nodes generally have uniform capabilities • Usually there is a centralized system scheduler • Processors are dedicated to tasks of a single application -- No contention
Scheduling for Metacomputing • Resources are often managed by separate schedulers which are not coordinated – no single system scheduler • Data conversion between sites • Overlapping of communication and computation to amortize network communication • Separate optimized algorithms for tasks on different machines
Outline of the talk • Overview • Case study • Application-centric scheduling • AppleS • Result • Conclusion
CLEO • A high-energy physics project • Each collision detected by CLEO is called an event • Each event is recorded and passed to a program called “pass2” to compute offline the physical properties of the particles • Records computed by “pass2” are read and compressed by another program for certain frequently-accessed fields • One terabyte of data is generated per year
Nile.. • A by-product of CLEO • Each of CLEO's collaborating institutions is a site • Goal • Provide a scalable, fault-tolerant, heterogeneous system of hundreds of commodity workstations, with access to a distributed database in excess of 100 TB • Resources (CLEO data) are spread across the United States and Canada at 24 collaborating institutions • Resources can be accessed and used transparently from anywhere by any member of the CLEO collaboration
..Nile.. • Not specific to CLEO; can be used by any application that is easily parallelizable • Currently implemented in CORBA/Java • Three components • Nile Control System (NCS) • Data Repository • User Interface • Interconnecting networks include ATM, FDDI and Ethernet
..Nile.. • NCS: • Site manager: • Interface between NCS and clients • Receives job requests • For each job request, creates a job manager, stores the job context in the Job Database and places the job into the queue • Stateless
..Nile.. • NCS: • Job DB: • Stores the state of jobs • Resource DB: • Maintains the state of available hardware resources at the local site • Data Location Manager: • Translates the logical data specification in the job profile to a set of corresponding physical data objects, which can be used to determine the suitable hosts to run the sub-jobs
..Nile.. • NCS: • Job Manager • Divides a single job into a set of sub-jobs which can be executed in parallel • Monitors the state of sub-jobs • Collects and assembles the results, and passes them back to the site manager • Planner • Produces an execution plan consisting of a list of sub-jobs, each having a host machine and a set of data objects
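As a minimal illustration of the planning step (not Nile's actual CORBA/Java code), the sketch below groups a job's data objects by the site that holds them, producing one sub-job per host; the event names and site names are hypothetical.

```python
def plan(job_data_objects, data_location):
    """Group a job's data objects by hosting site into per-host sub-jobs."""
    sub_jobs = {}
    for obj in job_data_objects:
        host = data_location[obj]                  # Data Location Manager lookup
        sub_jobs.setdefault(host, []).append(obj)
    return [{"host": h, "data": objs} for h, objs in sub_jobs.items()]

# Hypothetical data layout: three events stored at two collaborating sites.
data_location = {"event_001": "siteA", "event_002": "siteB", "event_003": "siteA"}
print(plan(["event_001", "event_002", "event_003"], data_location))
```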
Characteristics of CLEO/NILE • The quantity of data for the problem is so large that no single site can provide all the resources needed • Efficient resource allocation is crucial • Execution sites and network interconnections are heterogeneous • Some resources are shared by other applications, so performance might vary greatly based on contention for resources
CASE 2: 3-D REACT • Tries to predict the energy levels of reactions using quantum mechanics • Simulates a hydrogen-deuterium reaction • Essentially calculates the solution to a six-dimensional Schrödinger equation, and can be decomposed into three tasks • LHSF (local hyper-spherical surface function) • Log-D (logarithmic derivative propagation): uses the result of LHSF as input • ASY: an asymptotic analysis on the matrices generated during the Log-D calculation
Scheduling 3D-REACT • Distribute 3D-REACT over two computation units • Cray C90 at SDSC • 64-node Intel Paragon at Caltech • The problem is divided into smaller sub-domains of 5-20 surface functions per sub-domain, so LHSF and Log-D can be executed concurrently • First, the C90 calculates the LHSF for a given sub-domain, and then the result is passed to the Paragon, which calculates the Log-D portion of that sub-domain • While the Paragon is calculating the first sub-domain, the C90 can start calculating the second sub-domain • After all the sub-domains are considered, ASY determines whether the calculation should stop
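A minimal sketch of why this pipelining pays off, assuming per-sub-domain costs for LHSF, Log-D and the inter-machine transfer; the numbers are illustrative placeholders, not measured C90/Paragon times.

```python
def pipelined_time(n_subdomains, t_lhsf, t_logd, t_transfer):
    """Approximate total time when the C90 computes LHSF for sub-domain i
    while the Paragon computes Log-D for sub-domain i-1 (two-stage pipeline)."""
    per_subdomain = max(t_lhsf, t_transfer + t_logd)     # the slower stage sets the rhythm
    return t_lhsf + (n_subdomains - 1) * per_subdomain + t_transfer + t_logd

def sequential_time(n_subdomains, t_lhsf, t_logd, t_transfer):
    """Total time with no overlap: each sub-domain is fully processed in turn."""
    return n_subdomains * (t_lhsf + t_transfer + t_logd)

# Hypothetical per-sub-domain costs in seconds.
print(pipelined_time(10, t_lhsf=30.0, t_logd=25.0, t_transfer=5.0))    # 330 s with overlap
print(sequential_time(10, t_lhsf=30.0, t_logd=25.0, t_transfer=5.0))   # 600 s without overlap
```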
Characteristics of 3D-REACT • The algorithm implemented by a task is optimized for the machine to which it has been assigned • E.g. the Log-D implementation used on the C90 is different from the one used on the Paragon • Computation and communication can be pipelined to amortize communication delays • Data might need to be converted into a different format when being transferred between different sites • E.g. floating-point data must be converted when the C90 sends data to the Paragon • Scheduling is critical for performance • Each of the sub-tasks (LHSF/Log-D/ASY) can be executed on either machine
Outline of the talk • Overview • Case study • Application-centric scheduling • AppleS • Result • Conclusion
Generalization of Application-Centric scheduling • Each application develops a schedule to optimize its own performance without regard to the performance goals of other applications which share the system • The application-centric schedules developed by different applications are unrelated to one another • However, there are still some commonalities which underlie application-centric program development
Components of Application-Centric scheduling.. • Performance criteria/metrics • Dynamic system state • Application-specific resource locality • Application performance characteristics • User preferences • Prediction
Performance criteria/metrics • Performance criteria/metrics vary with the application • E.g. to minimize execution time • 3D-REACT: by maximizing speedup over a single-machine implementation • NILE: by distributing analysis of independent events • Some common metrics • Execution time • Speedup • Cost of execution cycles • Users will attempt to optimize the usage of the same resources for different performance criteria at the same time
Dynamic system state • Mixture of dedicated and non-dedicated resources • Should the application wait until the dedicated resources become available, or • Should it execute, with lower performance, on the non-dedicated resources currently available? • Requirement of dynamic assessment of • Current system state • Resource loads • Short-term, but accurate, prediction
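A minimal sketch of that wait-or-run-now decision, assuming the scheduler already has a predicted wait time and predicted runtimes for both options; all figures are hypothetical.

```python
def choose(wait_s, dedicated_runtime_s, shared_runtime_s):
    """Pick the option with the smaller predicted completion time."""
    finish_dedicated = wait_s + dedicated_runtime_s   # start later, run fast
    finish_shared = shared_runtime_s                  # start now, run slower under contention
    return "dedicated" if finish_dedicated < finish_shared else "non-dedicated"

print(choose(wait_s=600, dedicated_runtime_s=900, shared_runtime_s=2000))   # -> dedicated
```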
Application-specific Resource Locality • Applications seek to use “close” resources • “Closeness” is a function of what the application requires from a resource as well as the resource's capability • “Distance” of resources: the resource performance deliverable to the application • Are X and Y close? [Figure: taskX assigned to resource X, taskY assigned to resource Y]
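A sketch of “distance” as delivered performance rather than physical proximity: the time a candidate resource would take to service the application's demand. The cost model and all numbers are illustrative assumptions.

```python
def distance(data_bytes, flops_needed, bandwidth_bytes_s, latency_s, avail_flops):
    """Lower is "closer": time to move the task's data plus time to compute it."""
    return latency_s + data_bytes / bandwidth_bytes_s + flops_needed / avail_flops

# Two hypothetical placements for taskY relative to taskX's output:
print(distance(1e8, 1e10, bandwidth_bytes_s=1e7, latency_s=0.01, avail_flops=1e9))   # remote, fast CPU: ~20 s
print(distance(1e8, 1e10, bandwidth_bytes_s=1e9, latency_s=0.001, avail_flops=2e8))  # nearby, slow CPU: ~50 s
```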
Application Characteristics • Implementation-dependent and implementation-independent • Some common categories of attributes • Task-specific implementation characteristics • Computation paradigm, number/size of data structures, data communication patterns, memory requirements, etc. • Inter-task communication characteristics • Data format for each task, pipeline size, communication regularity and frequency, etc. • Application structure information • Input/output requirements, iteration patterns, etc.
User Preferences • Not necessarily directly related to application performance • Act as a filter over the possible resources and implementations available to the user
Role of Prediction • Prediction tells you • Potential communication and computation behavior of the application • Potential availability and load of resources • Potential performance of the application with respect to candidate schedules • Sources of prediction • App-specific or app-independent benchmarks • Statistical analysis • Sensed or sampled data • Analytical models
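As a minimal illustration of turning sensed or sampled data into a short-term prediction, the sketch below applies generic exponential smoothing to a series of load samples; it is not the Network Weather Service's actual forecasting code, and the samples are hypothetical.

```python
def forecast(samples, alpha=0.5):
    """Predict the next value of a series by exponential smoothing."""
    estimate = samples[0]
    for x in samples[1:]:
        estimate = alpha * x + (1 - alpha) * estimate   # weight recent samples more heavily
    return estimate

cpu_availability = [0.20, 0.35, 0.30, 0.50, 0.45]   # hypothetical sensed measurements
print(forecast(cpu_availability))                    # short-term prediction of the next sample
```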
Process of scheduling an application • Use user preferences to filter out infeasible schedules • Use application-specific and dynamic information to develop a schedule • Use the individual notion of performance and resource locality to evaluate the schedule • Predict the performance of candidate schedules • Compare and determine the “best” schedule that can be implemented on the available resources
Outline of the talk • Overview • Case study • Application-centric scheduling • AppleS • Result • Conclusion
AppleS (Application-Level Scheduler) • Each application will have its own AppleS agent (a customized scheduler for that application) • What does AppleS do? • Select resources • Determine a performance-efficient schedule • Implement that schedule with respect to the appropriate resource management system • AppleS is NOT a resource management system: it relies on systems such as Globus and Legion
Components of AppleS • Resource Selector: • Chooses and filters different resource combinations • Planner • Generates a description of a resource-dependent schedule from a given resource combination • Performance estimator • Generates an estimate for candidate schedules according to the user's performance metric • Coordinator • Chooses the “best” schedule • Actuator • Implements the “best” schedule on the target resource management system
Input of AppleS: Information Pool • Network Weather Service • Dynamic information on system state and forecasts of resource load • Heterogeneous Application Template (HAT) • Information on the structure, characteristics and implementation of the application and its tasks • Model • Used for performance estimation, planning and resource selection • User Specification (US) • Information on the user's criteria for performance, execution constraints, preferences for implementation, etc.
Using AppleS • User provides information to AppleS via the HAT and US • Coordinator uses this information to filter out infeasible or likely-poor schedules • Resource selector identifies promising sets of resources, and prioritizes them based on the logical “distance” between resources • Planner computes a potential schedule for each viable resource configuration • Performance estimator evaluates each schedule in terms of the user's performance objective • Coordinator chooses the best schedule and then implements it with the Actuator (see the sketch below)
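A sketch, not the actual AppleS code, of how these components might be wired together: each component is reduced to a plain function, and the resource names, cost model and selection logic are all hypothetical stand-ins.

```python
def run_apples_agent(resources, user_filter, select, plan, estimate, actuate):
    feasible = [r for r in resources if user_filter(r)]    # Coordinator filters via the US
    promising = select(feasible)                           # Resource Selector
    candidates = [plan(cfg) for cfg in promising]          # Planner
    best = min(candidates, key=estimate)                   # Performance Estimator + Coordinator
    return actuate(best)                                   # Actuator hands off to the resource manager

# Hypothetical usage with toy stand-ins: resources are (host, relative speed) pairs.
resources = [("hostA", 1.0), ("hostB", 0.6), ("hostC", 0.2)]
best = run_apples_agent(
    resources,
    user_filter=lambda r: r[1] > 0.3,                      # user preference rules out slow hosts
    select=lambda rs: [rs[:2]],                            # one candidate resource set
    plan=lambda cfg: {"hosts": [h for h, _ in cfg], "speed": sum(s for _, s in cfg)},
    estimate=lambda sched: 1000.0 / sched["speed"],        # predicted execution time
    actuate=lambda sched: sched)                           # stand-in for submitting the schedule
print(best)
```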
Using AppleS: Example 3D-REACT • Assume implementations of LHSF and Log-D are available for several architectures • HAT: specifies the computation-to-communication ratios for LHSF and Log-D, the degree of overlap that is possible between the two, etc., for each implementation • Resource selector determines viable pairs of resources • Planner identifies a set of candidate schedules • Performance estimator calculates the transfer unit size between LHSF and Log-D for each candidate schedule • Coordinator sends the best schedule to the Actuator
Outline of the talk • Overview • Case study • Application-centric scheduling • AppleS • Result • Conclusion
Jacobi2D code.. • A distributed, data-parallel, two-dimensional Jacobi iterative solver • Commonly used to solve the finite-difference approximation to Poisson's equation • Variable coefficients are represented as elements of a two-dimensional grid • At each iteration, the new value of each grid element is defined to be the average of its four nearest neighbors during the previous iteration
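A minimal serial sketch of that update rule, assuming a square grid with fixed boundary values; this is an illustration, not the distributed Jacobi2D code used in the experiments.

```python
import numpy as np

def jacobi_step(grid):
    """One iteration: each interior point becomes the average of its four neighbors."""
    new = grid.copy()
    new[1:-1, 1:-1] = 0.25 * (grid[:-2, 1:-1] + grid[2:, 1:-1] +
                              grid[1:-1, :-2] + grid[1:-1, 2:])
    return new

grid = np.zeros((64, 64))
grid[0, :] = 1.0                 # hypothetical fixed boundary condition
for _ in range(100):             # iterate a fixed number of times for illustration
    grid = jacobi_step(grid)
```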
..Jacobi2D code • Typically, the Jacobi computation is parallelized by partitioning the grid into rectangular regions, and then assigning each region to a different processor • Trade-off: parallelism vs. communication overhead • [Figure: example partitioning where processor P0 is twice as fast as processor P1 or P2]
[Testbed figure: RS6000 and Alpha workstations connected via FDDI]
Three partition methods • HPF Uniform/Blocked • Each processor is assigned (at compile time) a relatively equal-sized square region of the grid to compute • Non-Uniform Strip • Uses good static estimates of resource performance and uses resource selection to select a resource set from the total resources • AppleS • Uses resource selection plus dynamic system information and forecasts from the Network Weather Service to compute the partition at run time (sketch of strip partitioning below)
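As a minimal sketch of non-uniform strip partitioning, the function below divides the grid's rows among processors in proportion to an estimate of each processor's speed; the speeds are hypothetical (e.g., P0 twice as fast as P1 or P2).

```python
def strip_partition(n_rows, speeds):
    """Return the number of grid rows assigned to each processor."""
    total = sum(speeds)
    rows = [int(n_rows * s / total) for s in speeds]
    rows[-1] += n_rows - sum(rows)          # hand any rounding remainder to the last processor
    return rows

print(strip_partition(600, speeds=[2.0, 1.0, 1.0]))   # -> [300, 150, 150]
```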
Memory availability • Adding two IBM SP-2 nodes with 128 MB of memory to the resource pool • Dedicated access to the two SP-2 nodes and the link between them • The best partitioning is to split the grid evenly between the two SP-2 nodes as long as neither partition exceeds the available real memory on each node
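A sketch of that memory constraint, assuming double-precision grid elements and that the solver keeps two copies of the grid (old and new); the element size and the two-copy assumption are illustrative, while 128 MB per node follows the slide above.

```python
BYTES_PER_ELEMENT = 8            # double-precision grid element (assumption)
NODE_MEMORY = 128 * 2**20        # 128 MB of real memory per SP-2 node

def fits_in_memory(n, n_nodes=2):
    """Does an even split of an n x n grid fit in each node's real memory?"""
    partition_bytes = 2 * (n * n // n_nodes) * BYTES_PER_ELEMENT   # old + new grid copies
    return partition_bytes <= NODE_MEMORY

for n in (2000, 4000, 6000):
    print(n, fits_in_memory(n))   # the 6000 x 6000 grid no longer fits: expect heavy paging
```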
[Figure: a lot of page swapping once a partition exceeds the available real memory]
Conclusion • A performance-efficient schedule must exploit the concurrency of independent application tasks as well as factor in the impact of resource contention, diversity and autonomy • AppleS: http://apples.ucsd.edu/, still a work in progress • Related work: MARS: http://www.uni-paderborn.de/pc2/projects/mol/mars.htm • CLEO: http://www.lns.cornell.edu/public/CLEO/ • 3D-REACT: http://www.cacr.caltech.edu/Publications/techpubs/CASA/cacr123/web4.htm