Static Process Scheduling

Static Process Scheduling Yi Sun

Overview
• Before execution, processes need to be scheduled and allocated resources
• Objective
  • Enhance an overall system performance metric
  • Process completion time and processor utilization
  • In distributed systems: location and performance transparency
• In distributed systems
  • Local scheduling (on each node) + global scheduling
  • Communication overhead
  • Effect of underlying architecture
  • Dynamic behavior of the system

Process Interaction Models
• Precedence process model: Directed Acyclic Graph (DAG)
  • Represents precedence relationships between processes
  • Minimize total completion time of the task (computation + communication)
• Communication process model
  • Represents the need for communication between processes
  • Optimize the total cost of communication and computation
• Disjoint process model
  • Processes can run independently and complete in finite time
  • Maximize utilization of processors and minimize turnaround time of processes

Process Models [Figure; annotations: communication overhead; partition 4 processes onto two nodes]

System Performance Model
• Attempt to minimize the total completion time (makespan) of a set of interacting processes

System Performance Model (Cont.)
• Related parameters
  • OSPT: optimal sequential processing time; the best time that can be achieved on a single processor using the best sequential algorithm
  • CPT: concurrent processing time; the actual time achieved on an n-processor system with the concurrent algorithm and the specific scheduling method being considered
  • OCPTideal: optimal concurrent processing time on an ideal system; the best time that can be achieved with the concurrent algorithm being considered on an ideal n-processor system (no interprocessor communication overhead) scheduled by an optimal scheduling policy
  • Si: the ideal speedup gained by using a multiple-processor system over the best sequential time
  • Sd: the degradation of the system due to actual implementation compared to an ideal system

System Performance Model (Cont.) [Figure: node computation times P1 to P4 compared against OCPTideal. Pi: the computation time of the concurrent algorithm on node i; RP ≥ 1]

System Performance Model (Cont.)
• RP: the smaller, the better
• RC: the larger, the better

System Performance Model (Cont.)
• RP: relative processing (algorithm)
  • How much speedup is lost by replacing the best sequential algorithm with an algorithm that is better adapted to concurrent implementation but may have a greater total processing need
  • Loss of parallelism due to algorithm conversion
  • Increase in total computation requirement
• Sd
  • Degradation of parallelism due to algorithm implementation
• RC: relative concurrency (algorithm)
  • How far from optimal the usage of the n processors is
  • RC = 1 → best use of the processors
  • Theoretical loss of parallelism
• ρ: loss of parallelism when implemented on a real machine (system architecture + scheduling)
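The parameters above can be tied together numerically. The sketch below assumes the usual decomposition S = OSPT / CPT = Si × Sd, with RP = ΣPi / OSPT, RC = ΣPi / (n × OCPTideal) and efficiency loss ρ = (CPT − OCPTideal) / OCPTideal; these formulas are assumed from the standard formulation rather than quoted from the slides. The function name, field names and all input numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SpeedupReport:
    speedup: float      # S  = OSPT / CPT
    ideal: float        # Si = OSPT / OCPTideal
    degradation: float  # Sd = OCPTideal / CPT
    rp: float           # RP = sum(Pi) / OSPT            (>= 1, smaller is better)
    rc: float           # RC = sum(Pi) / (n * OCPTideal) (<= 1, larger is better)
    rho: float          # efficiency loss = (CPT - OCPTideal) / OCPTideal


def speedup_report(ospt, cpt, ocpt_ideal, node_times):
    """Compute the speedup factors; node_times[i] is Pi, the computation time on node i."""
    n = len(node_times)
    total = sum(node_times)
    return SpeedupReport(
        speedup=ospt / cpt,
        ideal=ospt / ocpt_ideal,
        degradation=ocpt_ideal / cpt,
        rp=total / ospt,
        rc=total / (n * ocpt_ideal),
        rho=(cpt - ocpt_ideal) / ocpt_ideal,
    )


# Hypothetical measurements for a 4-node run (made-up numbers).
report = speedup_report(ospt=100.0, cpt=40.0, ocpt_ideal=32.0,
                        node_times=[30.0, 32.0, 28.0, 30.0])
print(report)
```

Under these assumptions the two views of each factor agree: Si equals (RC / RP) × n and Sd equals 1 / (1 + ρ), so S = (RC / RP) × n / (1 + ρ).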

Efficiency Loss ρ
• Impact factors: scheduling, system, and communication

Efficiency Loss ρ (Cont.)

Workload Distribution
• Performance can be further improved by workload distribution
• Load sharing: static workload distribution
  • Dispatch processes to idle processors statically upon arrival
  • Corresponds to the processor-pool model
• Load balancing: dynamic workload distribution
  • Migrate processes dynamically from heavily loaded processors to lightly loaded processors
  • Corresponds to the migration workstation model
• Modeled by queueing theory: X/Y/c
  • Process arrival time distribution: X; service time distribution: Y; number of servers: c
  • λ: arrival rate; μ: service rate; γ: migration rate
  • γ depends on channel bandwidth, migration protocol, and the context and state information of the process being transferred
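To make the X/Y/c notation concrete, the sketch below compares the mean turnaround time of two isolated M/M/1 nodes with that of one shared M/M/2 pool, using the standard Erlang C formula. It assumes Markovian arrivals and service and negligible communication overhead; the function name and the rates are hypothetical, and process migration (rate γ) is not modeled.

```python
from math import factorial


def mmc_turnaround(arrival_rate, service_rate, servers):
    """Mean turnaround (waiting + service) time of an M/M/c queue."""
    load = arrival_rate / service_rate      # offered load in Erlangs
    util = load / servers                   # utilization of each server
    if util >= 1:
        raise ValueError("unstable queue: arrival rate >= total service rate")
    tail = load**servers / factorial(servers) / (1 - util)
    # Erlang C: probability that an arriving process has to wait.
    p_wait = tail / (sum(load**k / factorial(k) for k in range(servers)) + tail)
    mean_wait = p_wait / (servers * service_rate - arrival_rate)
    return mean_wait + 1 / service_rate


lam, mu = 0.8, 1.0                          # made-up per-node arrival and service rates
separate = mmc_turnaround(lam, mu, 1)       # each node is its own M/M/1 queue
shared = mmc_turnaround(2 * lam, mu, 2)     # both nodes serve one M/M/2 queue
print(f"M/M/1 per node: {separate:.3f}, shared M/M/2: {shared:.3f}")
```

With these rates the shared queue gives the shorter turnaround, which is the motivation for sharing workload when the communication overhead is small.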

Processor-Pool and Workstation Queueing Models [Figure: queueing models for static load sharing and dynamic load balancing; M denotes a Markovian distribution]

Comparison of Performance for Workload Sharing [Figure; annotations: communication overhead; negligible communication overhead]

Static Process Scheduling
• Static process scheduling: deterministic scheduling policy
  • Schedule a set of partially ordered tasks on a non-preemptive multiprocessor system of identical processors to minimize the overall finishing time (makespan)
  • Optimizing makespan → NP-complete
  • Need approximate or heuristic algorithms…
  • Attempt to balance and overlap computation and communication
• The mapping of processes to processors is determined before execution
• Once a process starts, it stays on its processor until completion
• Needs prior knowledge about process behavior (execution time, precedence relationships, communication patterns)
• The scheduling decision is centralized and non-adaptive

Precedence Process and Communication System Models [Figure; labels: execution time; no. of messages to communicate; communication overhead for one message. Example: communication overhead for A (P1) and E (P3) = 4 * 2 = 8]

Precedence Process Model
• Precedence process model: NP-complete
• A program is represented by a DAG (Figure 5.5 (a))
  • Node: task with a known execution time
  • Edge: weight showing message units to be transferred
• Communication system model: Figure 5.5 (b)
• Scheduling strategies
  • List Scheduling (LS): no processor remains idle if there are some tasks available that it could process (no communication overhead)
  • Extended List Scheduling (ELS): LS first + communication overhead
  • Earliest Task First (ETF) scheduling: the earliest schedulable task is scheduled first
• Critical path: longest execution path
  • Lower bound of the makespan
  • Try to map all tasks in a critical path onto a single processor
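A minimal sketch of plain list scheduling (LS, communication overhead ignored) and of the critical-path lower bound. The DAG, the execution times and the priority list are hypothetical and are not the graph of Figure 5.5; ELS and ETF are not implemented here.

```python
def critical_path_length(exec_time, preds):
    """Longest chain of execution times through the DAG (communication ignored)."""
    finish = {}

    def longest(task):
        if task not in finish:
            finish[task] = exec_time[task] + max(
                (longest(p) for p in preds[task]), default=0)
        return finish[task]

    return max(longest(t) for t in exec_time)


def list_schedule(exec_time, preds, priority, n_procs):
    """Greedy LS: an idle processor always takes the first ready task in the list."""
    proc_free = [0] * n_procs               # time at which each processor becomes idle
    start, finish = {}, {}
    remaining = list(priority)              # kept in priority order
    while remaining:
        p = min(range(n_procs), key=proc_free.__getitem__)
        now = proc_free[p]
        ready = [t for t in remaining
                 if all(q in finish and finish[q] <= now for q in preds[t])]
        if ready:
            task = ready[0]
            start[task] = now
            finish[task] = now + exec_time[task]
            proc_free[p] = finish[task]
            remaining.remove(task)
        else:
            # Nothing is ready yet: stay idle until the next running task completes.
            proc_free[p] = min(f for f in finish.values() if f > now)
    return max(finish.values()), start


# Hypothetical task graph: execution times and predecessor lists.
exec_time = {"A": 2, "B": 3, "C": 1, "D": 4, "E": 2}
preds = {"A": [], "B": ["A"], "C": ["A"], "D": ["B"], "E": ["B", "C"]}

bound = critical_path_length(exec_time, preds)
makespan, start = list_schedule(exec_time, preds, ["A", "B", "C", "D", "E"], n_procs=2)
print(bound, makespan, start)
```

In this example the critical path is A, B, D, and the two-processor schedule meets that lower bound. Once communication overhead is charged, as in ELS and ETF, the makespan can only stay the same or grow.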

Makespan Calculation for LS, ELS, and ETF

Communication Process Model
• Maximize resource utilization and minimize inter-process communication
• Undirected graph G = (V, E)
  • V: processes
  • E: edges, weighted by the amount of interaction between processes
• Cost equation
  • e = process execution cost (cost to run process j on processor i)
  • C = communication cost (C = 0 if i = j, i.e. both processes run on the same processor)
• Again, NP-complete
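A small sketch of the cost being minimized, assuming the total cost is the sum of each process's execution cost on its assigned processor plus the weight of every edge whose endpoints sit on different processors. As a simplification, the communication cost here does not depend on which pair of processors is involved. All names and numbers are hypothetical, and the exhaustive search is only practical for tiny graphs because the problem is NP-complete.

```python
from itertools import product


def total_cost(assignment, exec_cost, comm):
    """Execution cost of every process plus the weight of every cross-processor edge."""
    run = sum(exec_cost[proc][assignment[proc]] for proc in assignment)
    talk = sum(w for (a, b), w in comm.items() if assignment[a] != assignment[b])
    return run + talk


def best_assignment(exec_cost, comm, processors):
    """Try every mapping of processes to processors and keep the cheapest one."""
    processes = sorted(exec_cost)
    candidates = (dict(zip(processes, combo))
                  for combo in product(processors, repeat=len(processes)))
    return min(candidates, key=lambda a: total_cost(a, exec_cost, comm))


# Made-up data: exec_cost[process][processor] and undirected edge weights.
exec_cost = {
    "p1": {"A": 5, "B": 10},
    "p2": {"A": 2, "B": 4},
    "p3": {"A": 8, "B": 3},
}
comm = {("p1", "p2"): 6, ("p2", "p3"): 1}

best = best_assignment(exec_cost, comm, processors=["A", "B"])
print(best, total_cost(best, exec_cost, comm))
```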

Stone’s two-processor model to achieve minimum total execution and communication cost
• Example: Figure 5.7 (execution cost not considered)
  • Partition the graph by drawing a line cutting through some edges
  • Results in two disjoint graphs, one for each processor
  • Set of removed edges → cut set
  • Cost of cut set → sum of the weights of its edges
    • Total inter-process communication cost between processors
  • The cost of the cut set is 0 if all processes are assigned to the same node
  • Computation constraints (no more than k, distribute evenly…)
• Example: Figure 5.8 (execution cost considered)
  • Maximum flow and minimum cut in a commodity-flow network
  • Find the maximum flow from source to destination
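The two-processor case can be sketched as a max-flow computation. The construction assumed here is the common one: a source stands for processor A and a sink for processor B, each process gets an edge from the source weighted by its cost on B and an edge to the sink weighted by its cost on A, and communication edges keep their weights in both directions. The minimum cut then equals the minimum total cost. Edmonds-Karp is used only for brevity; the slides do not name a max-flow algorithm. Terminal names and data are hypothetical and are not taken from Figure 5.8.

```python
from collections import defaultdict, deque


def min_cut_assignment(exec_cost, comm):
    """exec_cost[m] = (cost on A, cost on B); comm[(m1, m2)] = communication weight."""
    src, sink = "__A__", "__B__"            # hypothetical terminal node names
    cap = defaultdict(lambda: defaultdict(int))

    def add_edge(u, v, c):
        cap[u][v] += c
        cap[v][u] += 0                      # make sure the residual edge exists

    for m, (on_a, on_b) in exec_cost.items():
        add_edge(src, m, on_b)              # cut when m ends up on processor B
        add_edge(m, sink, on_a)             # cut when m ends up on processor A
    for (u, v), w in comm.items():
        add_edge(u, v, w)                   # undirected edge: capacity both ways
        add_edge(v, u, w)

    def reachable():
        """Breadth-first search over edges with remaining capacity."""
        parent = {src: None}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    flow = 0
    parent = reachable()
    while sink in parent:
        path, v = [], sink
        while parent[v] is not None:        # walk back from sink to source
            path.append((parent[v], v))
            v = parent[v]
        push = min(cap[a][b] for a, b in path)
        for a, b in path:
            cap[a][b] -= push
            cap[b][a] += push
        flow += push
        parent = reachable()
    # Processes still reachable from the source stay on A, the rest go to B.
    return flow, {m: "A" if m in parent else "B" for m in exec_cost}


exec_cost = {"p1": (5, 10), "p2": (2, 4), "p3": (8, 3)}   # made-up costs
comm = {("p1", "p2"): 6, ("p2", "p3"): 1}
cost, placement = min_cut_assignment(exec_cost, comm)
print(cost, placement)
```

For this small input the cut cost matches what an exhaustive search over all eight assignments gives, which is a quick sanity check on the construction.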

Computation Cost and Communication Graphs

Minimum-Cost Cut
• Only the cuts that separate A and B are feasible

Discussion – Static Process Scheduling
• Once a process is assigned to a processor, it remains there until its execution has been completed
• Needs prior knowledge of execution time and communication behavior
• Not realistic

Reference • Distributed Operating Systems & Algorithms, by Randy Chow and Theodore Johnson, 1997
