
Chapter 7: roadmap


nicole

Presentation Transcript


  1. Chapter 7: roadmap 7.1 Super stabilization 7.2 Self-Stabilizing Fault-Containing Algorithms 7.3 Error-Detection Codes and Repair Chapter 7 - Local Stabilization

  2. Introduction: We present a scheme that can be used to correct the state of algorithms for ongoing, long-lived tasks. The scheme converts non-stabilizing algorithms for such tasks into self-stabilizing algorithms for the same tasks.

  3. The Malicious Fault Model: Starting from a safe configuration c, k processors experience transient faults and a new configuration c’ is reached. The states of the faulty processors may be chosen as the states that result in the longest convergence time.

  4. The Malicious Fault Model (2) • Designing for this worst-case measure minimizes the convergence time in the worst-case scenario • However, algorithms designed for the worst-case measure may have a larger average convergence time than other algorithms

  5. The Non-malicious Fault Model • In this model, a transient fault assigns a processor a state chosen with equal probability from the processor's state space

  6. Average Convergence Time • Pr(c, k, c’): the probability of reaching a particular configuration c’ from a safe configuration c due to the occurrence of k faults • WorstCase(c’): the maximal number of cycles before the system reaches a safe configuration when it starts in c’

  7. Average Convergence Time (2) • The average convergence time following the occurrence of k non-malicious transient faults is Σ_{c’} Pr(c, k, c’) · WorstCase(c’), where the sum is computed over all possible configurations c’
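
The sum on the slide above can be evaluated directly for a toy system. The following is a minimal sketch, not the authors' method: it assumes the set of faulty processors is fixed, each faulty processor receives a uniformly random state (the non-malicious model), and `toy_worst_case` is a hypothetical stand-in for WorstCase(c’) that charges one cycle per processor differing from the safe configuration.

```python
from fractions import Fraction
from itertools import product


def average_convergence_time(safe, faulty, states, worst_case):
    """Sum Pr(c, k, c') * WorstCase(c') over every c' obtained by
    re-assigning the processors listed in `faulty` (k = len(faulty))."""
    # Each faulty processor picks its state uniformly, so every c' is equally likely.
    pr = Fraction(1, len(states) ** len(faulty))
    total = Fraction(0)
    for assignment in product(states, repeat=len(faulty)):
        c_prime = list(safe)
        for p, s in zip(faulty, assignment):
            c_prime[p] = s
        total += pr * worst_case(tuple(c_prime))
    return total


SAFE = (0, 0, 0, 0)  # hypothetical safe configuration of four processors


def toy_worst_case(c_prime):
    # Hypothetical cost model: one cycle per processor that differs from SAFE.
    return sum(1 for a, b in zip(c_prime, SAFE) if a != b)


avg = average_convergence_time(SAFE, faulty=[1, 3], states=[0, 1, 2],
                               worst_case=toy_worst_case)
print(avg)  # 4/3
```

In this toy model a malicious adversary could force a cost of 2 cycles by corrupting both processors, while the average over random faults is 4/3.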

  8. Error Detection Codes • We use error-detection codes to reduce the average convergence time • Each processor maintains a variable ErrorDetect holding the error-detection code ed of its current state s • The error-detecting function computes the pair <s, ed> given s

  9. Converting the Algorithm: Replace every step a by a step a’ that does the following: (1) Examine whether the value of ErrorDetect fits the current state (2) If (1) holds, execute a; otherwise, execute a special repair step a’’ (3) Compute the new ed’ by applying the error-detecting function to the resulting state s’

  10. Converting the Algorithm (2) • A transient fault can corrupt all the memory bits of a processor, including ErrorDetect • Thus, the probability that the value of ErrorDetect fits the state of the faulty processor decreases as the number of bits in ErrorDetect increases
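
The conversion of a step a into a’ can be sketched as a small wrapper. This is an illustration under assumptions: CRC-32 (via `zlib.crc32`) stands in for the unspecified error-detecting function, the state is serialized with `repr`, and the names `error_detect`, `converted_step`, `increment`, and `reset` are hypothetical.

```python
import zlib


def error_detect(state):
    """Return the pair <s, ed>: the state and its error-detection code."""
    return state, zlib.crc32(repr(state).encode())


def converted_step(state, ed, step, repair):
    """Step a': run `step` (a) if ed fits the state, otherwise `repair` (a'')."""
    _, expected = error_detect(state)
    if ed == expected:
        new_state = step(state)
    else:
        new_state = repair(state)
    return error_detect(new_state)  # attach the fresh code ed'


def increment(st):  # hypothetical original step a
    return {"x": st["x"] + 1}


def reset(st):      # hypothetical repair step a''
    return {"x": 0}


s, ed = error_detect({"x": 1})
s, ed = converted_step(s, ed, increment, reset)   # code fits: a is executed
corrupted = {"x": 7}                              # a fault changes s but not ed
s_after_fault, _ = converted_step(corrupted, ed, increment, reset)  # mismatch: a'' runs
```

For an idealized b-bit code and a fault that overwrites both the state and the code uniformly at random, the chance that the code still fits is about 2^-b, which is the effect the second slide above describes.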

  11. Pyramids: Every processor Pi maintains a pyramid ∆i = vi[0], vi[1], vi[2], …, vi[d] of views, where vi[h] is a view of all the processors within distance h of Pi, as they were h time units ago. In particular, vi[d] is a view of the entire system, d time units ago.

  12. V1[0]: view of V1 now.

  13. V1[1]: view of the colored vertices, one time unit ago.

  14. V1[2]: view of the colored vertices, two time units ago.

  15. V1[3]: view of the colored vertices, three time units ago.

  16. V1[4]: view of the entire system, four time units ago.

  17. V1[5] and V1[6] are views of the entire system as well; they differ only in the time at which the views were taken.
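
The pyramid pictured in the slides above can be sketched as a list of dictionaries, one per level. This is a minimal illustration, assuming a global `history` of past configurations is available for building the example (a real processor only learns these values through message exchange); `within_distance` and `build_pyramid` are hypothetical names.

```python
from collections import deque


def within_distance(graph, source, h):
    """Processors at distance <= h from `source`, found by breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == h:
            continue  # do not expand beyond distance h
        for w in graph[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return set(dist)


def build_pyramid(graph, i, history, d):
    """history[h] maps processor -> state h time units ago; returns v_i[0..d]."""
    return [{p: history[h][p] for p in within_distance(graph, i, h)}
            for h in range(d + 1)]


# A path of five processors with V1 at one end, so the farthest processor is at distance 4.
graph = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
# Toy states: the state of processor p, h time units ago, is the pair (p, h).
history = [{p: (p, h) for p in graph} for h in range(5)]
pyr = build_pyramid(graph, 1, history, d=4)
```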

  18. Neighboring Pyramids • Neighboring processors exchange their pyramids and check agreement on the shared portions • If the shared portions are equal, then all the v[d] views are equal • In addition, every processor checks that vi[d] is a consistent configuration for the input algorithm AL and the current task (that is, the configuration is reachable from the initial state of AL)

  19. Checking Consistent Configuration • Pi checks that its state in the view vi[h], for 0 ≤ h ≤ d-1, is obtained by executing AL on the states of Pi and its neighbors in vi[h+1].
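
The check on the slide above can be sketched as a loop over the pyramid levels. This is an illustration only: `transition` is a hypothetical stand-in for one step of AL, taking a processor's state and the list of its neighbors' states, and each view is assumed to be a dictionary from processor to state.

```python
def is_consistent(i, neighbors, pyramid, transition):
    """Check that P_i's state in v_i[h] follows from v_i[h+1] for 0 <= h <= d-1."""
    for h in range(len(pyramid) - 1):
        older = pyramid[h + 1]  # one time unit earlier than pyramid[h]
        expected = transition(older[i], [older[n] for n in neighbors])
        if pyramid[h][i] != expected:
            return False
    return True


def toy_transition(own, neighbor_states):
    # Hypothetical AL step: adopt the maximum value seen locally.
    return max([own] + neighbor_states)


good = [{1: 5}, {1: 5, 2: 5}, {1: 0, 2: 5, 3: 1}]
bad = [{1: 7}, {1: 5, 2: 5}, {1: 0, 2: 5, 3: 1}]   # v[0] does not follow from v[1]
```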

  20. Updating the Pyramids • In every time unit, Pi receives the pyramid ∆j = vj[0], vj[1], vj[2], …, vj[d] of every neighbor Pj, and uses the values of vj[d-1] to construct the value of the new vi[d] • The values of vj[d-1] contain information about every processor within distance d of Pi, d-1 time units ago • In the same way, Pi uses the received values of vj[k-1], for 1 ≤ k ≤ d-1, (together with vi[k-1]) to compute vi[k]
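
One update round can be sketched as a level-by-level merge. This is a simplified illustration: it assumes each view is a dictionary from processor to state, takes the new level 0 to be the processor's own current state, and omits the agreement check on overlapping entries; `update_pyramid` is a hypothetical name.

```python
def update_pyramid(i, own_state, own_pyramid, neighbor_pyramids):
    """Build the new pyramid of P_i from last round's pyramids.

    New level k (k >= 1) merges the old level k-1 of P_i and of every
    neighbor: those views are now one time unit older and together cover
    every processor within distance k of P_i."""
    d = len(own_pyramid) - 1
    new_pyramid = [{i: own_state}]
    for k in range(1, d + 1):
        view = dict(own_pyramid[k - 1])
        for neighbor_pyramid in neighbor_pyramids:
            view.update(neighbor_pyramid[k - 1])
        new_pyramid.append(view)
    return new_pyramid


# Path 1 - 2 - 3 with d = 2; toy states are (processor, absolute time) pairs.
old_p1 = [{1: (1, 10)}, {1: (1, 9), 2: (2, 9)}, {1: (1, 8), 2: (2, 8), 3: (3, 8)}]
old_p2 = [{2: (2, 10)}, {1: (1, 9), 2: (2, 9), 3: (3, 9)},
          {1: (1, 8), 2: (2, 8), 3: (3, 8)}]
new_p1 = update_pyramid(1, (1, 11), old_p1, [old_p2])
```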

  21. The Repair Scheme • First, we assume that the error-detection code identifies all the faults • In general, the faulty processors initialize their states and collect state information from the non-faulty processors to reconstruct their pyramids

  22. The Repair Scheme (2) • Let c’ be a configuration reached after several faults • There are three groups of processors: faulty, border-non-faulty, and operating • A processor that identifies an error assigns faulty to its local status variable and resets its pyramid

  23. Border-Non-Faulty and Operating • The pyramid of a non-faulty processor that neighbors a faulty processor holds almost all the information that was stored in the faulty processor before the fault • Such a processor assigns the value border-non-faulty to its local status variable • The remaining non-faulty processors are defined as operating

  24. [Figure legend: Faulty, Border-non-faulty, Operating]

  25. Freezing the Pyramids • A border-non-faulty processor does not change its pyramid until all the faulty processors have finished reconstructing theirs • The Topology Collection procedure is used to verify that

  26. Topology Collection • Every faulty and border-non-faulty processor sends the topology it knows at that moment to its neighbors • After several rounds (the diameter of the corrupted region + 1), all the information in the pyramids of the processors next to a faulty one has arrived

  27. Topology Collection (2) • Every processor checks whether there exists a faulty processor with an edge to a processor whose state is unknown • When this test returns false, the processors' pyramids can be reconstructed
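
The test on the slide above can be sketched as a scan over the collected edges. This is an illustration under assumptions: the collected topology is a list of undirected edges, `status` maps each processor whose state is known to its status, and a processor missing from `status` is treated as unknown. The function returns true exactly when the slide's test returns false.

```python
def has_all_topology(edges, status):
    """True iff no faulty processor has an edge to a processor of unknown state."""
    for u, v in edges:
        if status.get(u) == "faulty" and v not in status:
            return False
        if status.get(v) == "faulty" and u not in status:
            return False
    return True


edges = [(1, 2), (2, 3)]
partial = {1: "border-non-faulty", 2: "faulty"}                          # P3 not heard from yet
complete = {1: "border-non-faulty", 2: "faulty", 3: "border-non-faulty"}
```

With `partial`, the faulty processor 2 still has a neighbor of unknown state, so collection must continue; with `complete`, reconstruction can begin.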

  28. Reconstruction • The faulty processors reconstruct their pyramids using the information collected from the other pyramids and the transition functions of the processors

  29. Back to Operating • Using a local counter and the collected topology, the faulty and border-non-faulty processors determine when the rest have finished reconstructing their pyramids • At the end of the repair process, all the processors change their status to operating

  30. The algorithm: State variables: • Status ∈ {operating, faulty, border-non-faulty} • Topology = (V, E) • Pyramid (explained before) • RoundCounter: counts the number of rounds since the most recent fault

  31. The algorithm (cont.) Upon a clock tick:
      1. If (status = operating)
         1.1 if (DetectError())
             1.1.1 status = faulty
             1.1.2 Pyramid = nil
             1.1.3 RoundCounter = 0
         1.2 else if (HaveFaultyNeighbor())
             1.2.1 status = border-non-faulty
             1.2.2 RoundCounter = 0
         1.3 else UpdatePyramid()
      2. Else
         2.1 ExchangeLocalTopologyInformation()
         2.2 if (HasAllTopology() and status = faulty)
             2.2.1 ReconstructPyramid()
         2.3 RoundCounter++
         2.4 if (Diameter(Topology) = RoundCounter)
             2.4.1 status = operating
      Slide annotations: DetectError() detects whether a transient error occurred, using the error-detection codes. HaveFaultyNeighbor() returns true if one of the neighbors is faulty. ExchangeLocalTopologyInformation() sends information to the immediate neighbors and receives information from them. HasAllTopology() returns true iff no edge leads from a faulty processor to a processor whose state is unknown.
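
The clock-tick procedure above translates almost line for line into a small state machine. The following is a sketch, not a full implementation: the helper procedures named on the slide are passed in as callables, and their signatures (no arguments, `reconstruct_pyramid` returning the rebuilt pyramid, `topology_diameter` returning an integer) are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Processor:
    # Helper procedures from the slide, injected as callables (assumed signatures).
    detect_error: Callable[[], bool]
    have_faulty_neighbor: Callable[[], bool]
    update_pyramid: Callable[[], None]
    exchange_topology: Callable[[], None]
    has_all_topology: Callable[[], bool]
    reconstruct_pyramid: Callable[[], list]
    topology_diameter: Callable[[], int]
    # State variables from the slide.
    status: str = "operating"
    pyramid: Optional[list] = field(default_factory=list)
    round_counter: int = 0

    def on_clock_tick(self) -> None:
        if self.status == "operating":
            if self.detect_error():
                self.status = "faulty"
                self.pyramid = None
                self.round_counter = 0
            elif self.have_faulty_neighbor():
                self.status = "border-non-faulty"  # pyramid stays frozen
                self.round_counter = 0
            else:
                self.update_pyramid()
        else:
            self.exchange_topology()
            if self.status == "faulty" and self.has_all_topology():
                self.pyramid = self.reconstruct_pyramid()
            self.round_counter += 1
            if self.round_counter == self.topology_diameter():
                self.status = "operating"


# A processor that detects an error on its first tick, in a region of diameter 2.
p = Processor(
    detect_error=lambda: True,
    have_faulty_neighbor=lambda: False,
    update_pyramid=lambda: None,
    exchange_topology=lambda: None,
    has_all_topology=lambda: True,
    reconstruct_pyramid=lambda: ["rebuilt"],
    topology_diameter=lambda: 2,
)
statuses = []
for _ in range(3):
    p.on_clock_tick()
    statuses.append(p.status)
```

Injecting the helpers keeps the control flow of the slide visible and lets each helper be replaced by a stub when testing.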

  32. Undetected Faults: What happens if the faults are not detected? Transient fault detectors and watchdog counters are used in this situation. When the transient fault detector detects an error, the faulty processor starts counting while the repair scheme tries to fix the problem.

  33. Undetected Faults (2) • When the counter reaches its upper bound, the system is examined again • If the repair failed, a system reset is triggered
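
The watchdog behavior described in the last two slides can be sketched as a bounded loop followed by a re-examination. This is a minimal illustration: `watchdog_repair`, `repair_round`, `is_safe`, and `reset` are hypothetical names, and the toy `system` dictionary merely stands in for the real system state.

```python
def watchdog_repair(upper_bound, repair_round, is_safe, reset):
    """Count rounds while the repair scheme runs, then examine the system again."""
    counter = 0
    while counter < upper_bound:
        repair_round()   # let the repair scheme make progress
        counter += 1
    if is_safe():
        return "repaired"
    reset()              # the repair failed: fall back to a system reset
    return "reset"


system = {"errors": 5}   # toy state: number of outstanding inconsistencies


def repair_round():
    system["errors"] = max(0, system["errors"] - 1)


def is_safe():
    return system["errors"] == 0


def reset():
    system["errors"] = 0


outcome = watchdog_repair(3, repair_round, is_safe, reset)
```

Here three rounds are not enough to clear five inconsistencies, so the watchdog falls back to a reset; with an upper bound of five or more, the repair would have completed on its own.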
