Remote Reference Counting: Distributed Garbage Collection with Low Communication and Computation Overhead. www.cs.technion.ac.il/~assaf/publications/gc.ps.

  1. Remote Reference Counting: Distributed Garbage Collection with Low Communication and Computation Overhead www.cs.technion.ac.il/~assaf/publications/gc.ps

  2. Distributed Systems • Consist of nodes: • Lowest level: local address space • Next level: disk partition, processor • Top level: local net • Interaction through message passing • Failures: • Due to hardware or software problems • Disconnection: due to network overload, reboot...

  3. Distributed GC • Motivations: • Transparent object management • Storage management is complex - not to be handled by users • Goals: • Efficiency • Scalability • Fault tolerance

  4. Distributed GC • The main problem: • A section of GC code running on one node must verify that no other node needs an object before collecting it • Result: • Many modules must cooperate closely, leading to a tight binding between supposedly independent modules

  5. Distributed GC • Problems with simple approaches: • Determining the status of a remote node is costly • Asynchronous systems ⇒ inconsistent data • Failures

  6. Remote References • Terminology: • Owner - node which contains the object • Client - node which has a reference to the object • Creation: • A reference to an object crosses node boundaries • Side effect of message passing • Duplication: • Client of a remote object sends to a receiver node a reference to that object

  7. Naive Reference Counting • Keep a reference count for each object • Upon duplication or creation, inform the owner by sending it a control message so that it updates the counter • Problems: • Increases communication overhead • Loss or duplication of messages • Race between decrement/increment messages
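
The decrement/increment race can be sketched as follows. This is a minimal illustration, assuming a single owner counter and control messages that may be delivered in any order; the `Owner` class and the `run` helper are hypothetical names, not part of the presented system.

```python
class Owner:
    """Owner node holding object V with a naive reference count (illustrative)."""

    def __init__(self, count):
        self.count = count
        self.collected = False

    def deliver(self, delta):
        # Apply a +1 / -1 control message; collect V when the count reaches zero.
        if self.collected:
            return  # too late: V has already been reclaimed
        self.count += delta
        if self.count == 0:
            self.collected = True


def run(order):
    """Deliver control messages in the given order; report whether V was collected."""
    owner = Owner(count=1)  # one client currently holds a reference to V
    for delta in order:
        owner.deliver(delta)
    return owner.collected


# A client duplicates its reference (+1) and then deletes its own copy (-1).
safe = run([+1, -1])   # increment arrives first: V stays alive
raced = run([-1, +1])  # decrement overtakes the increment: V is collected early
```

In the second ordering the object is reclaimed although a live reference still exists, which is the failure the acknowledge-message variant is meant to prevent.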

  8. Race Conditions in Naive Reference Counting: Decrement/Increment • [Diagram: nodes RA, RB, RC; objects X, U, V; Counter_V = 1; messages &V, +1, -1]

  9. Race Conditions in Naive Reference Counting: Increment/Decrement • [Diagram: nodes RA, RB, RC; objects X, U, V; Counter_V = 1; messages &V, +1, -1]

  10. Avoiding Race by Acknowledge Messages • [Diagram: nodes RA, RB, RC; objects X, U, V; Counter_V = 1; labels &V, +1, ack, 2]

  11. Weighted Reference Counting • Each referenced object has a partial weight and a total weight • Object creation: • total weight = partial weight = even value > 0 • [Diagram: node RB holds V with Total = 64, Partial = 64]

  12. Weighted Reference Counting: Reference Duplication • The partial weight is halved and sent with the reference • [Diagram: nodes RA, RB, RC; objects X, U, V; message &V/16; client labels Partial_V = 32 and Partial_V = 16; at RB, Total_V = 64, Partial_V = 32]

  13. Weighted Reference Counting: Reference Deletion • The partial weight is sent to the owner and subtracted from the total weight • [Diagram: nodes RA, RB, RC; objects X, U, V; labels -16 and 48; Partial_V = 16 at each client; at RB, Total_V = 64, Partial_V = 32]

  14. Weighted Reference Counting • Invariant: total weight_v = Σ partial weight_v • When total weight = partial weight there are no remote references • Advantage: Eliminates increment messages, and therefore race conditions
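
The weight bookkeeping from the preceding slides can be sketched as below. This is an illustrative model only: the class names, the `export` method for creating the first remote reference, and raising `ValueError` on underflow are assumptions made for the example; the numbers follow the slides (64, 32, 16, 48).

```python
class WeightedObject:
    """Owner-side record for an object V (weights as on the slides)."""

    def __init__(self, weight=64):
        self.total = weight    # total weight
        self.partial = weight  # partial weight still held at the owner

    def export(self):
        # Creating a remote reference: split off half of the owner's partial weight.
        if self.partial < 2:
            raise ValueError("weight underflow")
        half = self.partial // 2
        self.partial -= half
        return WeightedRef(self, half)

    def has_remote_refs(self):
        return self.total != self.partial


class WeightedRef:
    """A remote reference carrying part of the object's weight."""

    def __init__(self, obj, weight):
        self.obj = obj
        self.weight = weight

    def duplicate(self):
        # Halve the weight locally; no message to the owner is needed.
        if self.weight < 2:
            raise ValueError("weight underflow")
        half = self.weight // 2
        self.weight -= half
        return WeightedRef(self.obj, half)

    def delete(self):
        # Models the decrement message: the owner subtracts this weight.
        self.obj.total -= self.weight
        self.weight = 0


v = WeightedObject(64)
ra = v.export()      # owner keeps 32, RA's reference carries 32
rc = ra.duplicate()  # RA keeps 16, RC's reference carries 16
invariant_ok = v.total == v.partial + ra.weight + rc.weight
rc.delete()          # total drops from 64 to 48
ra.delete()          # total drops to 32, equal to the owner's partial weight
```

After both deletions `has_remote_refs()` is false, matching the rule that equal total and partial weights mean no remote references remain.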

  15. Weighted Reference Counting • Shortcomings: • Weight underflow • Possible solutions: • Use partial weights which are powers of 2, keep only the exponent • [Yu-Cox] “Stop the world”, last resort global trace • Not resilient to message loss or duplication: • Loss may cause garbage objects to remain uncollected • Duplication may cause an object to be prematurely collected

  16. Indirect Reference Counting • Stub contains strong and weak locators • Strong: refers to a scion in the sender node; used only for distributed GC • Weak: refers to the node where target object is located; used to invoke target object in a single hop • Duplication performed locally without informing the owner node • The weak reference is sent along with the message containing the reference

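  17. Indirect Reference Duplication • [Diagram: nodes RA, RB, RC; objects X, U, V; RB holds V and a scion with count 1; stub V_A with a strong locator and a weak locator; labels &scionB, &scionA, 1]

  18. Indirect Reference Deletion • [Diagram: nodes RA, RB, RC; objects X, U, V; stub V_A with a strong locator and a weak locator; scion counts 1 and 1]

  19. Indirect Reference Deletion • [Diagram: nodes RA, RB, RC; objects X, U, V; stub V_A; message -1; scion counts 1 and 1]

  20. Indirect Reference Deletion • [Diagram: nodes RA, RB, RC; objects X, U, V; stub V_A; RB holds V and a scion with count 1]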

  21. Indirect Reference Counting • Advantages: • Unlimited number of duplications • Access to object in one hop through weak locator • Disadvantages: • Not resilient to message failures • Messages are sent whenever an object is deleted

  22. Reference Listing • The object's owner allocates a table of outgoing pointers (scions), one for each client that owns a reference to the object • Client nodes hold tables of incoming pointers (stubs) • [Diagram: nodes RA, RB, RC; objects X, Y, Z; stub and scion entries labelled XB, Ax, Cx]

  23. Use of Timestamps • [Diagram: nodes RB and RC; objects X, Y; entries XB, Cx; events: Sent &X/1, Sent delete X/1, Sent &X/2, Received delete X/1 (ignored)]

  24. Reference Listing • Advantages: • Resilience to message duplication when timestamps are used • Resilience to node failure: Owner can prompt client to send a live/delete message • Owner may explicitly query about a reference that is suspected to be part of a distributed garbage cycle • Owner can decide whether to keep objects referred to by a crashed client node until it recovers or not • Disadvantages: • Memory overhead • Doesn’t collect cycles of garbage
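
One way timestamps can make an owner ignore stale or duplicated delete messages is sketched below. It is a simplified illustration, assuming the owner stamps every reference it sends and keeps only the latest stamp per client; `OwnerTable` and its methods are hypothetical names, and the exact protocol on the slides may differ.

```python
class OwnerTable:
    """Owner-side reference list for one object X (illustrative)."""

    def __init__(self):
        self.clock = 0
        self.entries = {}  # client -> timestamp of the latest reference sent

    def send_reference(self, client):
        # Stamp each outgoing reference, e.g. &X/1, &X/2, ...
        self.clock += 1
        self.entries[client] = self.clock
        return self.clock

    def receive_delete(self, client, stamp):
        # Accept a delete only if it refers to the latest reference sent.
        if self.entries.get(client) == stamp:
            del self.entries[client]
            return True
        return False  # stale or duplicated delete: ignored

    def is_remotely_referenced(self):
        return bool(self.entries)


owner = OwnerTable()
t1 = owner.send_reference("RC")              # Sent &X/1
t2 = owner.send_reference("RC")              # Sent &X/2
stale = owner.receive_delete("RC", t1)       # delete X/1 arrives late: ignored
still_live = owner.is_remotely_referenced()
fresh = owner.receive_delete("RC", t2)       # delete X/2 is accepted
duplicate = owner.receive_delete("RC", t2)   # a duplicated message has no effect
```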

  25. Remote Reference Counting • Advantages: • Overhead depends only on the number of nodes in the system • Independent of pointer operations • Independent of heap size • Messages are sent only during GC, when the chance of collecting an object is very high • Independent of consistency protocols and global order of operations

  26. Remote Reference Counting • Disadvantages: • Doesn’t collect cycles of garbage • Dependent on the number of nodes in the system

  27. The System Model • Communication through a reliable asynchronous message-passing system • Messages are never lost, duplicated or altered • Messages can be delayed or arrive out of order • Processors can share objects • Objects can be replicated

  28. Local and Remote Counters • Local and remote counters are attached to every shared object • Local_i(X) • Increased by m when node i receives a message containing m pointers to X • Otherwise maintained as in traditional reference counting • When Local_i(X) = 0, i is clean - it has no references to X

  29. Local and Remote Counters • Remote_i(X) • Increased by m when some object Y containing m pointers to X is sent from node i • Decreased by m when some object Y containing m pointers to X is received at node i • The sum of Remote_i(X) over all nodes i is the number of pointers to X in transit in the system
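
The counter rules above can be sketched for a single object X as follows. `NodeCounters` and `in_transit` are hypothetical names used only for this illustration; ordinary local reference-count updates (pointer assignment and deletion) are left out.

```python
class NodeCounters:
    """Local and remote counters kept by one node for one object X."""

    def __init__(self, local=0):
        self.local = local  # Local_i(X)
        self.remote = 0     # Remote_i(X)

    def send(self, m):
        # An object containing m pointers to X leaves this node.
        self.remote += m

    def receive(self, m):
        # An object containing m pointers to X arrives at this node.
        self.local += m
        self.remote -= m


def in_transit(nodes):
    """Sum of the remote counters = number of pointers to X still in transit."""
    return sum(n.remote for n in nodes)


i = NodeCounters(local=1)
j = NodeCounters(local=1)
i.send(1)                    # i sends an object holding one pointer to X
before = in_transit([i, j])  # the pointer is in transit
j.receive(1)                 # j receives it
after = in_transit([i, j])   # nothing is in transit any more
```

After the exchange node i holds (local, remote) = (1, 1) and node j holds (2, -1), so the remote counters cancel once the message has been delivered.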

  30. The Algorithm - Layout • Build a spanning tree covering all the nodes • Collection of object X: • The root sends signals to all its children • Inner nodes pass the signal down • When a leaf is clean it sends up a token • An inner node sends up a token when it has received tokens from all its children and is clean • When the root has received tokens from all its children it checks a condition C: • If C = true, X is garbage • Otherwise - another wave begins

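  31. The Algorithm: Signals • [Diagram: spanning tree with signals travelling down from the root; legend: a circle labelled a is a node with local(x) = a]

  32. The Algorithm: Tokens • [Diagram: spanning tree whose nodes are labelled with their local(x) values (0 or 1), with tokens travelling up; legend: a circle labelled a is a node with local(x) = a]

  33. The Algorithm: Tokens • R = R0 ≡ all the nodes outside S are clean • [Diagram: spanning tree with the set S marked; legend: a node in S hasn't sent a token]

  34. Example: R0 falsification • [Diagram: spanning tree with the set S marked; node j performs Y := Z; legend: a node in S hasn't sent a token]

  35. Example: R0 falsification • Local_i(x) = 1, Remote_i(x) = 1 • Local_j(x) = 2, Remote_j(x) = -1 • [Diagram: spanning tree with the set S marked; Z is sent from node i to node j, which performs Y := Z; legend: a node in S hasn't sent a token]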

  36. The Algorithm • Use the remote counter to count pointers sent and received • Δ_i definition: • for a node i outside S, Δ_i is the value held at remote_i(X) when i sent its token • for a node i in S, Δ_i is the value held at remote_i(X) • Δ = Σ_i Δ_i • Δ_fin = Δ at the end of the wave

  37. The Algorithm • A leaf sends in the token the value of its remote counter • An inner node sends up the sum of its remote counter and those of its descendants • R1 ≡ Δ > 0 • R = R0 ∨ R1

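  38. Example (cont.) • Local_i(x) = 1, Remote_i(x) = 1 • Local_j(x) = 2, Remote_j(x) = -1 • Δ = 1 ⇒ R1 is true • [Diagram: spanning tree with the set S marked; Z is sent from node i to node j, which performs Y := Z]

  39. Example: R1 Falsification • Local_i(x) = 1, Remote_i(x) = 1 • Local_j(x) = 2, Remote_j(x) = -1 • [Diagram: spanning tree with the set S marked; node j performs Y := Z; node k performs W := Y]

  40. Example: R1 Falsification • Local_i(x) = 1, Remote_i(x) = 1 • Local_j(x) = 2, Remote_j(x) = 0 • Local_k(x) = 2, Remote_k(x) = -1 • Δ = 0 ⇒ R1 is false • [Diagram: spanning tree with the set S marked; Y is sent from node j to node k, which performs W := Y]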

  41. The Algorithm • Detect if Δ may have decreased due to a node in S: • Initially paint all nodes white • A node that decreases remote(X) turns black • R2 ≡ at least one node in S is black • R = R0 ∨ R1 ∨ R2

  42. Example: R2 Falsification • Local_i(x) = 1, Remote_i(x) = 1 • Local_j(x) = 2, Remote_j(x) = 0 • Local_k(x) = 2, Remote_k(x) = -1 • [Diagram: spanning tree with the set S marked; Y is sent to node k, which performs W := Y]

  43. Example: R2 Falsification • Local_i(x) = 1, Remote_i(x) = 1 • Local_j(x) = 2, Remote_j(x) = 0 • Local_k(x) = 2, Remote_k(x) = -1 • [Diagram: spanning tree with the set S marked; node k performs W := Y]

  44. Example: R2 Falsification • Local_i(x) = 1, Remote_i(x) = 1 • Local_j(x) = 2, Remote_j(x) = 0 • Local_k(x) = 0, Remote_k(x) = -1 • No node in S is black ⇒ R2 is false • [Diagram: spanning tree with the set S marked; node k sends a token]

  45. The Algorithm • Propagate the color information: • A node that is black or has received a black token transmits a black token • Otherwise, it transmits a white token • A node that transmits a black token becomes white • R3 ≡ some node in S has a black token • R = R0 ∨ R1 ∨ R2 ∨ R3

  46. Example (cont.) • Local_i(x) = 1, Remote_i(x) = 1 • Local_j(x) = 2, Remote_j(x) = 0 • Local_k(x) = 0, Remote_k(x) = -1 • [Diagram: spanning tree with the set S marked; a token travels up from node k]

  47. The Algorithm • C = [S = {root} ∧ root is white and local_root(X) = 0 ∧ all tokens at the root are white ∧ Δ_fin = 0] • Once the root has received tokens from all its children and local_root(X) = 0 it checks C: • C = true ⇒ object X is garbage • Otherwise - the root becomes white and initiates another wave
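
The root's check can be sketched over a frozen snapshot of the tree. This is a deliberately simplified, synchronous model, not the asynchronous protocol itself: it assumes no messages are delivered while the wave runs, treats a wave that meets a dirty node as simply failing, and whitens every node after a completed wave. `TreeNode`, `token`, `whiten` and `wave` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class TreeNode:
    local: int = 0       # local counter for X at this node
    remote: int = 0      # remote counter for X at this node
    black: bool = False  # set when the node has decreased its remote counter
    children: list = field(default_factory=list)


def token(node):
    """Token for a subtree: (sum of remote counters, is_black), or None if dirty."""
    total, black = node.remote, node.black
    for child in node.children:
        t = token(child)
        if t is None:
            return None  # a dirty descendant withholds its token
        total += t[0]
        black = black or t[1]
    return None if node.local != 0 else (total, black)


def whiten(node):
    node.black = False
    for child in node.children:
        whiten(child)


def wave(root):
    """Attempt one wave; return True only if condition C holds at the root."""
    tokens = [token(c) for c in root.children]
    if root.local != 0 or any(t is None for t in tokens):
        return False  # some node is dirty, so the wave cannot complete
    final_sum = root.remote + sum(t[0] for t in tokens)
    all_white = not root.black and not any(t[1] for t in tokens)
    whiten(root)  # senders of black tokens and the root become white
    return all_white and final_sum == 0


# A leaf received a pointer during the wave (turning black) and later dropped it.
leaf = TreeNode(remote=-1, black=True)
root = TreeNode(remote=1, children=[TreeNode(), leaf])
first = wave(root)   # a black token reaches the root: X is not yet declared garbage
second = wave(root)  # all white, counters sum to 0, every node clean: X is garbage
```

The black token forces a second wave even though the counters already sum to zero, which is the purpose of the coloring: a zero sum observed while counters were still changing is not trusted.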

  48. Correctness Proof • Layout: • Show that R = (R0 ∨ R1 ∨ R2 ∨ R3) is invariant • C = true ⇒ (R1 ∨ R2 ∨ R3) = false ⇒ R0 = true ⇒ object X is garbage

  49. R0R1R2R3 is invariant • Assume by negation R is false • Look at the wave in which R first became false: • R = false  R0 = false  some node outside S was dirty • i = the first node outside S to become dirty • Case 1: R became false before i first became dirty • Implies that some node became dirty before i - impossible by definition of i

  50. R0R1R2R3 is invariant • Case 2: R became false after i first became dirty • i received a message containing a pointer to X after sending its token • case 2.1: the message was sent in a previous wave • More pointers sent than received g > 0 at the beginning of the wave • If g doesn’t decrease  R1 = true • Otherwise: some node becomes black  R2  R3 = true

