Distributed Operating Systems CS551
Colorado State University at Lockheed-Martin
Lecture 6 -- Spring 2001
CS551: Lecture 6
• Topics
  • Distributed Process Management (Chapter 7)
    • Distributed Scheduling Algorithm Choices
    • Scheduling Algorithm Approaches
    • Coordinator Elections
    • Orphan Processes
  • Distributed File Systems (Chapter 8)
    • Distributed Name Service
    • Distributed File Service
    • Distributed Directory Service
CS-551, Lecture 6
Distributed Deadlock Prevention
• Assign each process a global timestamp when it starts
• No two processes should have the same timestamp
• Basic idea: “When one process is about to block waiting for a resource that another process is using, a check is made to see which has a larger timestamp (i.e. is younger).” Tanenbaum, DOS (1995)
Distributed Deadlock Prevention
• Somehow put a timestamp on each process, representing its creation time
• Suppose a process needs a resource already owned by another process
• Determine the relative ages of the two processes
• Decide whether the waiting process should preempt, wait, die, or wound the owning process
• Two different algorithms
Distributed Deadlock Prevention
• Allow the wait only if the waiting process is older
  • Since timestamps increase along any chain of waiting processes, cycles are impossible
• Or allow the wait only if the waiting process is younger
  • Here timestamps decrease along any chain of waiting processes, so cycles are again impossible
• Wiser to give older processes priority
Example: wait-die algorithm
• Process 54 (older) wants a resource held by process 79 (younger): 54 waits
• Process 79 (younger) wants a resource held by process 54 (older): 79 dies
Example: wound-wait algorithm
• Process 54 (older) wants a resource held by process 79 (younger): 54 preempts (wounds) 79
• Process 79 (younger) wants a resource held by process 54 (older): 79 waits
Algorithm Comparison
• Wait-die kills the young process
  • When the young process restarts and requests the resource again, it may be killed once more
  • The less efficient of the two algorithms
• Wound-wait preempts the young process
  • When the young process re-requests the resource, it simply waits for the older process to finish
  • The better of the two algorithms
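The two decision rules above can be sketched in a few lines of Python. This is a minimal illustration of the decision each algorithm makes, not a full deadlock-prevention implementation; the function names and the convention that a smaller timestamp means an older process are assumptions for this example.

```python
# Wait-die vs. wound-wait decisions. Timestamps are process creation
# times, so a smaller timestamp means an older process.

def wait_die(requester_ts, holder_ts):
    """Non-preemptive: an older requester waits; a younger one dies
    (is killed and restarted later with its ORIGINAL timestamp)."""
    return "wait" if requester_ts < holder_ts else "die"

def wound_wait(requester_ts, holder_ts):
    """Preemptive: an older requester wounds (preempts) the holder;
    a younger requester simply waits."""
    return "preempt" if requester_ts < holder_ts else "wait"

# Matching the slide examples, where process 54 is older than 79:
print(wait_die(54, 79))    # -> wait   (older wants resource: waits)
print(wait_die(79, 54))    # -> die    (younger wants resource: dies)
print(wound_wait(54, 79))  # -> preempt
print(wound_wait(79, 54))  # -> wait
```

Note that the dying process keeps its original timestamp on restart, which is what guarantees it eventually becomes the oldest and cannot starve.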
Figure 7.7 The Bully Algorithm. (Galli, p. 169)
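The coordinator election shown in the figure can be summarized with a small simulation. This is an illustrative sketch of the bully algorithm's outcome only (the alive process with the highest ID wins); the message exchange (ELECTION, OK, COORDINATOR) is abstracted away, and all names here are assumptions.

```python
# Minimal simulation of a bully election: a process that notices the
# coordinator has failed challenges all higher-numbered processes;
# any alive higher process takes over the election in turn, so the
# highest alive ID always ends up as coordinator.

def bully_election(initiator, alive_ids):
    higher = [p for p in alive_ids if p > initiator]
    if not higher:
        return initiator      # nobody outranks the initiator
    return max(higher)        # the highest alive process "bullies" the rest

alive = {1, 2, 4, 5, 7}       # suppose process 8, the old coordinator, crashed
print(bully_election(4, alive))   # -> 7
```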
Process Management in a Distributed Environment
• Processes in a Uniprocessor
• Processes in a Multiprocessor
• Processes in a Distributed System
  • Why scheduling is needed
  • Scheduling priorities
  • How to schedule
  • Scheduling algorithms
Distributed Scheduling
• Basically resource management
• Want to distribute the processing load among the processing elements (PEs) in order to maximize performance
• Consider several homogeneous PEs on a LAN with equal average workloads
  • The workload may still not be evenly distributed
  • Some PEs may have idle cycles
Efficiency Metrics
• Communication cost
  • Low if very little or no communication is required
  • Low if all communicating processes are
    • on the same PE
    • not distant (a small number of hops)
• Execution cost
  • Relative speed of the PE
  • Relative location of needed resources
  • Type of
    • operating system
    • machine code
    • architecture
Efficiency Metrics, continued
• Resource Utilization
  • May be based upon
    • Current PE loads
    • Load status state
    • Resource queue lengths
    • Memory usage
    • Other resource availability
Level of Scheduling
• When should a process run locally, and when should it be sent to an idle PE?
• Local Scheduling
  • Allocate the process to the local PE
  • Review Galli, Chapter 2, for more information
• Global Scheduling
  • Choose which PE executes which process
  • Also called process allocation
  • Precedes the local scheduling decision
Figure 7.1 Scheduling Decision Chart. (Galli, p. 152)
Distribution Goals
• Load Balancing
  • Tries to maintain an equal load throughout the system
• Load Sharing
  • Simpler
  • Tries to prevent any PE from becoming too busy
Load Balancing / Load Sharing
• Load Balancing
  • Try to equalize loads at the PEs
  • Requires more information
  • More overhead
• Load Sharing
  • Avoid having an idle PE when there is work to do
• Anticipating Transfers
  • Avoid a PE idling while a task is in transit to it
  • Get a new task just before the PE becomes idle
Figure 7.2 Load Distribution Goals. (Galli, p. 153)
Processor Allocation Algorithms
• Assume virtually identical PEs
• Assume PEs are fully interconnected
• Assume processes may spawn children
• Two strategies
  • Non-migratory (static binding, non-preemptive)
  • Migratory (dynamic binding, preemptive)
Processor Allocation Strategies
• Non-migratory (static binding, non-preemptive)
  • Transfer occurs before the process starts execution
  • Once assigned to a machine, the process stays there
• Migratory (dynamic binding, preemptive)
  • Processes may move after execution begins
  • Better load balancing
  • Expensive: must collect and move the entire process state
  • More complex algorithms
Efficiency Goals
• Optimal
  • Completion time
  • Resource utilization
  • System throughput
  • Any combination thereof
• Suboptimal
  • Suboptimal approximate
  • Suboptimal heuristic
Optimal Scheduling Algorithms
• Require the state of all competing processes
• The scheduler must have access to all related information
• Optimization is a hard problem
  • Usually NP-hard for multiple processors
• Thus, consider
  • Suboptimal approximate solutions
  • Suboptimal heuristic solutions
Suboptimal Approximate Solutions
• Similar to optimal scheduling algorithms
• Try to find good solutions, not perfect solutions
• Searches are limited
• Include intelligent shortcuts
Suboptimal Heuristic Solutions
• Heuristics
  • Employ rules of thumb
  • Employ intuition
  • May not be provable
  • Generally considered to work in an acceptable manner
• Examples:
  • If a PE has a heavy load, don’t give it more to do
  • Locality of reference for related processes and data
Figure 7.1 Scheduling Decision Chart. (Galli, p. 152)
Types of Load Distribution Algorithms
• Static
  • Decisions are hard-wired in
• Dynamic
  • Use system state information to make decisions
  • Overhead of keeping track of that information
• Adaptive
  • A type of dynamic algorithm
  • May behave differently at different loads
Load Distribution Algorithm Issues
• Transfer Policy
• Selection Policy
• Location Policy
• Information Policy
• Stability
• Sender-initiated versus receiver-initiated
• Symmetrically-initiated
• Adaptive algorithms
Load Dist. Algs. Issues, cont.
• Transfer Policy
  • When is it appropriate to move a task?
    • If the load at the sending PE > threshold
    • If the load at the receiving PE < threshold
• Location Policy
  • Find a receiver PE
  • Methods:
    • Broadcast messages
    • Polling: random, neighbors, recent candidates
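The threshold-based transfer policy above amounts to two symmetric comparisons. A minimal sketch, assuming a single shared queue-length threshold `T` (the value and function names are illustrative, not from Galli):

```python
# Threshold-based transfer policy: a PE is a candidate sender when
# its CPU queue length exceeds T, and a candidate receiver when its
# queue length is below T.
T = 3

def should_send(queue_len):
    """Sender side: is this PE overloaded enough to transfer a task?"""
    return queue_len > T

def can_receive(queue_len):
    """Receiver side: is this PE underloaded enough to accept a task?"""
    return queue_len < T

print(should_send(5))   # -> True
print(can_receive(1))   # -> True
print(can_receive(3))   # -> False (exactly at threshold)
```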
Load Dist. Algs. Issues, cont.
• Selection Policy
  • Which task should migrate?
  • Simple
    • Select new tasks
    • Non-preemptive
  • Criteria
    • Cost of transfer
      • should be covered by the reduction in response time
    • Size of the task
    • Number of location-dependent system calls (which favor the local PE)
Load Dist. Algs. Issues, cont.
• Information Policy
  • What information should be collected?
  • When? From whom? By whom?
  • Demand-driven
    • Collect info when a PE becomes a sender or receiver
    • Sender-initiated: senders look for receivers
    • Receiver-initiated: receivers look for senders
    • Symmetrically-initiated: either of the above
  • Periodic: at fixed time intervals; not adaptive
  • State-change-driven
    • Nodes send info about their own state changes (rather than being solicited)
Load Dist. Algs. Issues, cont.
• Stability
  • Queuing-theoretic view
    • Stable: sum of arrival load and overhead < capacity
    • Effective: using the algorithm gives better performance than not doing load distribution at all
    • An effective algorithm cannot be unstable
    • A stable algorithm can still be ineffective (overhead)
  • Algorithmic stability
    • E.g. performing overhead operations but making no forward progress
    • E.g. moving a task from PE to PE, only to find that it raises each PE’s workload enough that it must be transferred again
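The queuing-theoretic stability condition above is a one-line inequality. A tiny sketch, with illustrative numeric values (all in the same normalized load units, an assumption for this example):

```python
# Queuing-theoretic stability: the system is stable only when total
# arrival load plus the load-distribution overhead stays strictly
# below the system's capacity.
def is_stable(arrival_load, overhead, capacity):
    return arrival_load + overhead < capacity

print(is_stable(0.7, 0.1, 1.0))   # -> True: 0.8 < 1.0
print(is_stable(0.9, 0.2, 1.0))   # -> False: overhead pushes past capacity
```

This is also why an effective algorithm cannot be unstable: if the combined load exceeds capacity, queues grow without bound and performance is necessarily worse than doing nothing.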
Load Dist Algs: Sender-Initiated
• The sender PE thinks it is overloaded
• Transfer Policy
  • Threshold (T) based on PE CPU queue length (QL)
  • Sender: QL > T
  • Receiver: QL < T
• Selection Policy
  • Non-preemptive
  • Allows only new tasks
  • Long-lived tasks make this policy worthwhile
Load Dist Algs: Sender-Initiated
• Location Policy (3 different policies)
  • Random
    • Select a receiver at random
    • Wasted effort if the destination is already loaded
    • Want to avoid transferring the same task from PE to PE to PE
      • Include a limit on the number of transfers
  • Threshold
    • Poll PEs at random
    • If a ‘receiver’ is found, send the task to it
    • Limit the search to ‘Poll Limit’ polls
    • If the limit is hit, keep the task on the current PE
LDAs: Sender-Initiated
• Location Policy (3 different policies, cont.)
  • Shortest
    • Poll a random set of PEs
    • Choose the PE with the shortest queue length
    • Only a little better than the Threshold location policy
    • Not worth the additional work
LDAs: Sender-Initiated
• Information Policy
  • Demand-driven
  • Triggered after a PE identifies itself as a sender
• Stability
  • At high load, a PE might not find a receiver
  • Polling will be wasted
  • Polling increases the load on the system
  • Could lead to instability
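Putting the sender-initiated pieces together, here is an illustrative sketch (not Galli's code) of the Threshold location policy: poll up to a fixed number of randomly chosen PEs and hand the task to the first one whose queue is below threshold. `T`, `POLL_LIMIT`, and all names are assumptions for this example.

```python
import random

# Sender-initiated load distribution with the Threshold location
# policy: poll up to POLL_LIMIT randomly chosen PEs; transfer to the
# first polled PE whose CPU queue length is below the threshold T.
T, POLL_LIMIT = 3, 5

def find_receiver(queue_lens, self_id, rng=random):
    others = [p for p in range(len(queue_lens)) if p != self_id]
    for target in rng.sample(others, min(POLL_LIMIT, len(others))):
        if queue_lens[target] < T:   # polled PE is a receiver
            return target            # send the new task there
    return None                      # poll limit hit: keep task local

# PE 0 is overloaded (QL = 5); PEs 1-3 are lightly loaded.
print(find_receiver([5, 1, 1, 1], 0))   # some PE in {1, 2, 3}
```

The poll limit is what keeps this policy from destabilizing the system at high load: wasted polling is bounded per task.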
LDAs: Receiver-Initiated
• The receiver is trying to find work
• Transfer Policy
  • If local QL < T, try to find a sender
• Selection Policy
  • Non-preemptive preferred
  • But there may not be any new tasks to take
  • Worth the effort
LDAs: Receiver-Initiated
• Location Policy
  • Select a PE at random
  • If taking a task does not drop that PE’s load below the threshold, take it
  • If no luck after Poll Limit tries:
    • Wait until another local task completes, or
    • Wait another time period, then retry
• Information Policy
  • Demand-driven
LDAs: Receiver-Initiated
• Stability
  • Tends to be stable
  • At high load, a sender should be found quickly
• Problem
  • Transfers tend to be preemptive
  • Tasks on the sender node have already started
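The receiver-initiated counterpart can be sketched under the same assumptions as the sender-initiated example: an underloaded PE polls random PEs and takes a task only if doing so leaves the polled PE's load at or above the threshold. `T`, `POLL_LIMIT`, and all names are illustrative.

```python
import random

# Receiver-initiated load distribution: a PE whose queue length is
# below T polls up to POLL_LIMIT random PEs, looking for one that
# would remain at or above T even after giving up one task.
T, POLL_LIMIT = 3, 5

def find_sender(queue_lens, self_id, rng=random):
    if queue_lens[self_id] >= T:
        return None                       # this PE is not a receiver
    others = [p for p in range(len(queue_lens)) if p != self_id]
    for target in rng.sample(others, min(POLL_LIMIT, len(others))):
        if queue_lens[target] - 1 >= T:   # sender stays loaded after transfer
            return target                 # take one task from it
    return None                           # no sender found: wait, retry later

# PE 0 is idle; PEs 1-3 each have 5 queued tasks.
print(find_sender([0, 5, 5, 5], 0))   # some PE in {1, 2, 3}
```

Note the drawback the slide mentions: whatever task the receiver takes has usually already started on the sender, so the transfer is preemptive and must move process state.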
LDAs: Symmetrically-Initiated
• Both senders and receivers can search for tasks to transfer
• Has both the advantages and the disadvantages of the two previous methods
• The ‘above average’ algorithm
  • Try to keep the load at each PE at an acceptable level
  • Aiming for the exact average can cause thrashing
LDAs: Symmetrically-Initiated
• Transfer Policy
  • Each PE
    • Estimates the average load
    • Sets both an upper and a lower threshold, an equal distance from that estimate
  • If load > upper threshold, the PE acts as a sender
  • If load < lower threshold, the PE acts as a receiver
LDAs: Symmetrically-Initiated
• Location Policy (sender-initiated)
  • The sender broadcasts a TooHigh message and sets a timeout
  • A receiver replies with an Accept message, clears its own timeout, increases its load value, and sets a timeout
  • If the sender still wants to send when the Accept message arrives, it sends the task
  • If the sender gets a TooLow message before an Accept, it sends the task
  • If the sender’s TooHigh timeout expires with no Accept
    • The average estimate is too low
    • Broadcast a ChangeAvg message to all PEs
LDAs: Symmetrically-Initiated
• Location Policy (receiver-initiated)
  • The receiver sends a TooLow message and sets a timeout
  • The rest is the converse of the sender-initiated algorithm
• Selection Policy
  • Use any reasonable policy
  • Non-preemptive, if possible
  • Low cost
LDAs: Symmetrically-Initiated
• Information Policy
  • Demand-driven
  • Determined at each PE
  • Low overhead
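The symmetric transfer policy above can be sketched as a single classification function. The band width `DELTA` and the role names are illustrative assumptions; the point is only that the thresholds sit an equal distance above and below the estimated average.

```python
# Symmetrically-initiated transfer policy: a PE compares its load
# against upper/lower thresholds set DELTA above and below its
# current estimate of the system-wide average load.
DELTA = 1.0

def role(load, avg_estimate):
    if load > avg_estimate + DELTA:
        return "sender"      # would broadcast TooHigh
    if load < avg_estimate - DELTA:
        return "receiver"    # would broadcast TooLow
    return "ok"              # within the acceptable band: do nothing

print(role(6.0, 4.0))   # -> sender
print(role(2.0, 4.0))   # -> receiver
print(role(4.5, 4.0))   # -> ok
```

The band of width 2*DELTA is what prevents the thrashing the slides warn about: PEs near the average stay quiet instead of chasing the exact mean.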
LDAs: Adaptive
• A stable symmetrically-initiated algorithm
• Previous instability was due to too much polling by the sender
• Each PE keeps lists of the other PEs, sorted into three categories
  • Sender (overloaded)
  • Receiver (underloaded)
  • OK
• At the start, each PE has all other PEs on its receiver list
LDAs: Adaptive
• Transfer Policy
  • Based on PE CPU queue length
  • A low threshold (LT) and a high threshold (HT)
• Selection Policy
  • Sender-initiated: only sends new tasks
  • Receiver-initiated: takes any task
  • Aiming for low cost
• Information Policy
  • Demand-driven; maintains the lists
LDAs: Adaptive
• Location Policy (receiver-initiated)
  • Order of polling
    • Sender list: head to tail (newest info first)
    • OK list: tail to head (most out-of-date first)
    • Receiver list: tail to head
  • When a PE becomes a receiver (QL < LT)
    • It starts polling
    • If it finds a sender, a transfer happens
    • Else it uses the replies to update its lists
  • Polling continues until
    • It finds a sender
    • It is no longer a receiver
    • It hits the Poll Limit
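The polling order above is just a fixed traversal over the three per-PE lists. A minimal sketch, with illustrative list contents (the PE IDs are made up for the example):

```python
# Adaptive receiver's polling order: senders head-to-tail (freshest
# information first), then the OK list tail-to-head (most out-of-date
# entries first), then the receiver list tail-to-head.
def poll_order(senders, ok, receivers):
    return list(senders) + list(reversed(ok)) + list(reversed(receivers))

# Suppose this PE last classified PEs 3 and 1 as senders (3 most
# recently), 2 and 5 as OK, and 4 as a receiver:
print(poll_order([3, 1], [2, 5], [4]))   # -> [3, 1, 5, 2, 4]
```

Polling the freshest sender entries first is what makes the scheme stable: a receiver spends its limited polls where a sender is most likely to still exist.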
LDAs: Adaptive
• Notes
  • At high loads, activity is sender-initiated, but a sender will soon find its receiver list empty, so polling stops
    • The system then shifts to receiver-initiated activity
  • At low loads, receiver-initiated polling usually fails
    • But overhead doesn’t matter at low load
    • And the lists get updated along the way
    • So sender-initiated transfers should work quickly
Load Scheduling Algorithms (Galli)
• Usage Points
  • Charged for using remote PEs and resources
• Graph Theory
  • Minimum cutset of the assignment graph
  • Maximum flow of the graph
• Probes
  • Messages to locate available, appropriate PEs
• Scheduling Queues
• Stochastic Learning
Figure 7.3 Usage Points. (Galli, p. 158)