Distributed Process Management

Distributed Process Management Chapter 14 Chapter 14 - 28 pages

Process Migration • Move an active process from one machine to another • The process migrates to a target machine • Transferring a sufficient amount of the state of a process from one machine to another Chapter 14 - 28 pages

Why Migrate? • Load sharing • move processes from heavily loaded to lightly load systems • load can be balanced to improve overall performance • Communications performance • processes that interact intensively can be moved to the same node to reduce communications cost • may be better to move process to where the data reside when the data is large Chapter 14 - 28 pages

Why Migrate? • Availability • long-running process may need to move because the machine it is running on will be down • Utilizing special capabilities • process can take advantage of unique hardware or software capabilities Chapter 14 - 28 pages

Initiation of Migration • Operating system • when goal is load balancing • Process • when goal is to reach a particular resource Chapter 14 - 28 pages

What is Migrated? • Must destroy the process on the source system • Process control block and any links must be moved Chapter 14 - 28 pages

What is Migrated? • Eager (all):Transfer entire address space • no trace of process is left behind • if address space is large and if the process does not need most of it, then this approach my be unnecessarily expensive Chapter 14 - 28 pages

What is Migrated • Precopy: Process continues to execute on the source node while the address space is copied • pages modified on the source during precopy operation have to be copied a second time • reduces the time that a process is frozen and cannot execute during migration Chapter 14 - 28 pages

What is Migrated? • Eager (dirty): Transfer only that portion of the address space that is in main memory • any additional blocks of the virtual address space are transferred on demand • the source machine is involved throughout the life of the process • good if process is temporarily going to another machine • good for a thread since the threads left behind need the same address space Chapter 14 - 28 pages

What is Migrated? • Copy-on-reference: Pages are only brought over on reference • variation of eager (dirty) • has lowest initial cost of process migration Chapter 14 - 28 pages

What is Migrated? • Flushing: Pages are cleared from main memory by flushing dirty pages to disk • later use copy-on-reference strategy • relieves the source of holding any pages of the migrated process in main memory Chapter 14 - 28 pages

KJ KJ KJ KJ KJ Negotiation of Process Migration 1: Will you take P? Starter Starter 2: Yes, migrate to machine 3 6: MigrateIn P 5: Offer P 3: MigrateOut P A B P S D 0 1 2 3 4 4: Offer P 7: Accept offer Chapter 14 - 28 pages

Distributed Global States • Operating system cannot know the current state of all process in the distributed system • A process can only know the current state of all processes on the local system • Remote processes only know state information that is received by messages • these messages represent the state in the past Chapter 14 - 28 pages

Example • Bank account is distributed over two branches • The total amount in the account is the sum at each branch • At 3 PM the account balance is determined • Messages are sent to request the information Chapter 14 - 28 pages

SA = $100.00 l l l l l Branch A SB = $0.00 l l l l l Branch B 3:00 Total = $100.00 Example Chapter 14 - 28 pages

Example • If at the time of balance determination, the balance from branch A is in transit to branch B • The result is a false reading Chapter 14 - 28 pages

Example SA = $0.00 2:59 l l Branch A msg = “Transfer $100 to Branch B” l l Branch B SB = $0.00 3:01 3:00 Total = $0.00 Chapter 14 - 28 pages

Example • All messages in transit must be examined at time of observation • Total consists of balance at both branches and amount in message Chapter 14 - 28 pages

Example • If clocks at the two branches are not perfectly synchronized • Transfer amount at 3:01 from branch A • Amount arrives at branch B at 2:59 • At 3:00 the amount is counted twice Chapter 14 - 28 pages

Example SA = $100.00 l l Branch A 3:00 3:01 msg = “Transfer $100 to Branch B” l l Branch B 2:59 SB = $100.00 3:00 Total = $200.00 Chapter 14 - 28 pages

Example of a Snapshot Process 3 Outgoing channels 2 sent 1,2,3,4,5,6,7,8 Incoming channels 1 received 1,2,3 stored 4,5,6 2 received 1,2,3 stored 4 4 received 1,2,3 Process 1 Outgoing channels 2 sent 1,2,3,4,5,6 3 sent 1,2,3,4,5,6 Incoming channels Process 2 Outgoing channels 3 sent 1,2,3,4 4 sent 1,2,3,4 Incoming channels 1 received 1,2,3,4 stored 5,6 3 received 1,2,3,4,5,6,7,8 Process 4 Outgoing channels 3 sent 1,2,3 Incoming channels 2 received 1,2 stored 3,4 Chapter 14 - 28 pages

Ordering of Events • Events must be order to ensure mutual exclusion and avoid deadlock • Clocks are not synchronized • Communication delays • State information for a process is not up to date Chapter 14 - 28 pages

Ordering of Events • Need to consistently say that one event occurs before another event • Messages are sent when want to enter critical section and when leaving critical section • Time-stamping • orders events on a distributed system • system clock is not used Chapter 14 - 28 pages

Time-Stamping • Each system on the network maintains a counter which functions as a clock • Each site has a numerical identifier • When a message is received, the receiving system sets is counter to one more than the maximum of its current value and the incoming time-stamp (counter) Chapter 14 - 28 pages

Time-Stamping • If two messages have the same time-stamp, they are ordered by the number of their sites • For this method to work, each message is sent from one process to all other processes • ensures all sites have same ordering of messages • for mutual exclusion and deadlock all processes must be aware of the situation Chapter 14 - 28 pages

Token-Passing Algorithm if not token_present then begin clock := clock + 1; Prelude broadcast(Request, clock I); wait(access, token); token_present := True; end endif; token_held := True: <critical section> token(i) := clock; Postlude token_head := False; for j := i + 1 to n, 1 to i - 1 do if (request(j) > token(J)) [Symbol]^token_present then begin token_present := False; send(access, token(j)) end endif; when received (Request, k, j) do Notation: request(j) := max(request(j), k); send(j, access, token) send message of type access, with token, by process j if token_present[Symbol]Ynot token_held then broadcast(request, clock, i) send message from process i of type request, with <text of postlude> timestamp clock, to all other processes endif received(request, t, j) receive message from process j of type request, enddo; with timestamp t Chapter 14 - 28 pages

Distributed Deadlock Detection Algorithm {Transaction Tj receiving update message} begin if Wait_for(Tj) ¹ Wait_for(Ti) then Wait_for(Tj) := Wait_for(Ti); if Wait_for(Tj) Ç Request_Q(Tj) = nil then update(Wait_for(Ti), Request_Q(Tj)) else begin DECLARE DEADLOCK; {initiate deadlock resolution as follows} {Tj is chosen as the transaction to be aborted} {Tj releases all the data objects it holds} send clear(Tj, Held_by(Tj)); allocate each data object Di held by Tj to the first requester Tk in Request_Q(Tj); for every transaction Tn in Request_Q(Tj) requesting data object Di held by Tj do Enqueue(Tn, Request_Q(Tk)); end end. {Date object Dj receiving a lock_request(Ti)} begin if Locked_by(Dj) = nil then send granted else begin send not granted to Ti; send Locked)by(Dj) to Ti end end. {Transaction Ti makes a lock request for data object Dj} begin send lock_request(Ti) to Dj; wait for granted/not granted; if granted then begin Locked_by(Dj) := Ti; Held_by(Ti) := f end else {suppose Dj is being used by transaction Tj} begin Held_by(Ti) := Tj; Enqueue(Ti, Request_Q(Tj)); if Wait_for(Tj) = nil then Wait_for(Ti) := Tj else Wait_for(Ti) := Wait_for(Tj); update(Wait)for(Ti), Request_Q(Ti)) end end. {Transaction Tk receiving a clear(Tj, Tk) message begin purge the tuple having Tj as the requesting transaction from Request_Q(Tk) end. Chapter 14 - 28 pages

T4 T5 T4 T5 T0 T1 T2 T3 T0 T1 T2 T3 T6 T6 Example of Distributed Deadlock Algorithm Chapter 14 - 28 pages

Distributed Process Management