This presentation introduces SAMC, a fast model checker for finding Heisenbugs in distributed systems. SAMC intercepts messages through an interposition layer on each node and lets a model checking server explore different message processing orders, using semantic-aware exploration algorithms to reduce the number of executions needed. The tool is demonstrated on a sample leader election program and integrated with Apache ZooKeeper.
SAMC: A Fast Model Checker for Finding Heisenbugs in Distributed Systems
Tanakorn Leesatapornwongsa, Haryadi S. Gunawi
SAMC @ ISSTA ’15
Distributed-Systems Model Checker
[Figure: node1, node2, and node3 communicating over TCP/UDP]
Distributed-Systems Model Checker
Message processing order
1. Node 2 processes A
2. Node 3 processes B
3. Node 2 processes C
[Figure: node1, node2, and node3 with messages A, B, and C in flight]
Distributed-Systems Model Checker
Message processing order
1. The network delays A
2. Node 3 processes B
3. Node 2 processes C
4. Node 2 processes A
[Figure: node1, node2, and node3 with messages A, B, and C in flight]
Distributed-Systems Model Checker
Message processing order (alternative orderings shown side by side)
1. Node 2 processes A
2. Node 3 processes B
3. Node 2 processes C

1. Node 3 processes B
2. Node 2 processes A
3. Node 2 processes C

1. Node 3 processes B
2. Node 2 processes C
3. Node 2 processes A
[Figure: node1, node2, and node3 with messages A, B, and C in flight]
Model Checker Architecture
[Figure: node1, node2, and node3 each sit on top of an interposition layer. The layers intercept messages A, B, C, and D and report them to the model checking server, which releases them in the order A, B, C, D.]
Model Checker Architecture
[Figure: the same architecture, with the model checking server exploring further orderings of the intercepted messages, such as D, A, C, B; D, C, B, A; and A, B, D, C.]
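The exploration done by the model checking server can be illustrated with a minimal sketch that enumerates every processing order of a fixed set of intercepted messages. The class and method names are hypothetical and not taken from the SAMC code base, and the sketch assumes all messages are in flight at once, so it ignores messages that only appear after another message has been processed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: enumerate every processing order of a set of intercepted messages.
class OrderingEnumerator {
    // Returns all permutations of the given message IDs (n! orderings for n messages).
    static List<List<String>> allOrderings(List<String> messages) {
        List<List<String>> result = new ArrayList<>();
        permute(new ArrayList<>(messages), 0, result);
        return result;
    }

    // Fixes one message per position by swapping it into place, then recurses on the rest.
    private static void permute(List<String> msgs, int start, List<List<String>> out) {
        if (start == msgs.size()) {
            out.add(new ArrayList<>(msgs)); // copy, because msgs keeps being mutated
            return;
        }
        for (int i = start; i < msgs.size(); i++) {
            Collections.swap(msgs, start, i);
            permute(msgs, start + 1, out);
            Collections.swap(msgs, start, i); // undo the swap before trying the next choice
        }
    }

    public static void main(String[] args) {
        List<List<String>> orders = allOrderings(List.of("A", "B", "C", "D"));
        System.out.println(orders.size() + " orderings explored");
    }
}
```

With four messages this already yields 4! = 24 orderings, which is why brute-force exploration becomes slow as the number of messages grows.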
Outline
• SAMC demo
• Integration of SAMC
• Real integration
• Conclusion
SampleSys
• Demo program
• Leader election
  • Find which node has the BIGGEST ID at election time
  • Have only one leader!
SampleSys
• On startup, each node supports itself
• Each node broadcasts its support
• If a received ID is smaller, do nothing
• If it is bigger, change support
• After a support change, broadcast again
• Stop when a majority agrees
[Figure: node1, node2, and node3 exchanging votes V=1, V=2, and V=3; labels: Support = 2, Support = 3, Leader = 3]
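The rules above can be sketched as a per-node message handler. This is an illustrative reading of the slide, not the SampleSys source: the class name, the `onVote` method, the vote bookkeeping, and the use of -1 for "no leader yet" are all assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of one SampleSys node; names are illustrative only.
class SampleNode {
    final int id;
    final int clusterSize;
    int support;        // ID this node currently supports
    int leader = -1;    // -1 until a majority agrees
    private final Map<Integer, Integer> votes = new HashMap<>(); // sender ID -> supported ID

    SampleNode(int id, int clusterSize) {
        this.id = id;
        this.clusterSize = clusterSize;
        this.support = id;      // on startup, a node supports itself
        votes.put(id, id);
    }

    // Handles one vote; returns true if this node changed support and must broadcast again.
    boolean onVote(int senderId, int supportedId) {
        if (leader != -1) {
            return false;       // already stopped
        }
        votes.put(senderId, supportedId);
        boolean changed = false;
        if (supportedId > support) {   // bigger ID: change support
            support = supportedId;
            votes.put(id, support);
            changed = true;
        }                              // smaller ID: do nothing
        long agreeing = votes.values().stream().filter(v -> v == support).count();
        if (agreeing > clusterSize / 2) {
            leader = support;          // a majority agrees: stop
        }
        return changed;
    }
}
```

In this sketch, a node with ID 1 that processes V=2 before V=3 settles on leader 2, while the other nodes can settle on 3. Outcomes that depend on message order in this way are what the model checker is meant to expose.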
Demo
• Run SAMC with 2 exploration algorithms
• Brute force
  • Slow and inefficient
• Local-message independence (LMI)
  • Fast white-box testing
  • Requires semantic information
    • Message semantics and system state
Execution Replay
• Replays a buggy execution path
• Uses the execution path output as the input to the replay
• Debug the execution up to the desired step
Makes it easy for developers to debug code and fix bugs
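The replay idea can be sketched as re-delivering a recorded execution path up to a chosen step. The slides do not show SAMC's replay interface, so the class name, the path format (a list of message IDs), and the `deliver` callback are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of replay: re-deliver a recorded execution path up to a chosen step.
class ExecutionReplayer {
    private final List<String> recordedPath; // message IDs in the order they were processed

    ExecutionReplayer(List<String> recordedPath) {
        this.recordedPath = new ArrayList<>(recordedPath);
    }

    // Delivers the first `untilStep` messages of the path and returns how many were delivered.
    int replay(int untilStep, Consumer<String> deliver) {
        int steps = Math.min(Math.max(untilStep, 0), recordedPath.size());
        for (int i = 0; i < steps; i++) {
            deliver.accept(recordedPath.get(i));
        }
        return steps;
    }
}
```

Stopping at `untilStep` leaves the system in the state just before the step of interest, where a developer could attach a debugger.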
Result
• Reorders messages in any order we want
• Reports the execution path and execution result
• SAMC is semantic-aware
  • Supports semantic-aware exploration algorithms
• Fast model checking
  • SAMC with LMI catches the 2-leader bug in 3 executions!
• Execution replay function
Interposition Layer
• Uses aspect-oriented programming for the interposition layer
• Written separately, so it does not clutter the system code
• Intercepts at the message-sending method
• Reports message semantics to the server as key-value pairs
[Code listing: LeaderElectionAspect.aj]
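The contents of LeaderElectionAspect.aj are not reproduced here. As a plain-Java stand-in for the aspect, the sketch below wraps a sender so that each message is held and described as key-value pairs instead of being sent immediately. The `VoteSender` interface, the key names, and the in-memory `held` list (standing in for the connection to the server) are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sending interface of the system under test.
interface VoteSender {
    void send(int from, int to, int supportedId);
}

// Plain-Java stand-in for the aspect: intercepts sends and holds them for the server.
class InterposedSender implements VoteSender {
    private final VoteSender real;
    // Held messages, each described as key-value pairs (stands in for the server connection).
    final List<Map<String, String>> held = new ArrayList<>();

    InterposedSender(VoteSender real) {
        this.real = real;
    }

    @Override
    public void send(int from, int to, int supportedId) {
        Map<String, String> semantics = new LinkedHashMap<>();
        semantics.put("sender", Integer.toString(from));
        semantics.put("receiver", Integer.toString(to));
        semantics.put("vote", Integer.toString(supportedId));
        held.add(semantics); // the message is not sent until it is released
    }

    // Called when the server decides the held message at `index` is next in the explored order.
    void release(int index) {
        Map<String, String> m = held.remove(index);
        real.send(Integer.parseInt(m.get("sender")),
                  Integer.parseInt(m.get("receiver")),
                  Integer.parseInt(m.get("vote")));
    }
}
```

An aspect achieves the same interception without changing the call sites in the system code, which is the benefit the slide points to.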
SAMC Server
• Basic algorithms
  • Brute force, random, etc.
• Extensible dynamic partial-order reduction (DPOR)
  • LMI is implemented by adding application-specific logic to DPOR
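The slides do not spell out the LMI rules, so the following is only a sketch of what application-specific logic could look like for SampleSys. It assumes a single rule derived from the SampleSys description: a vote whose ID is smaller than the receiver's current support is ignored, so two such votes lead to the same state in either order and need not be reordered. The class and method names are hypothetical.

```java
// Hypothetical sketch of a semantic-aware independence check for SampleSys votes.
class LmiCheck {
    // A vote is discarded when its ID is smaller than the receiver's current support.
    static boolean isDiscarded(int supportedId, int receiverSupport) {
        return supportedId < receiverSupport;
    }

    // Two votes for the same receiver are treated as independent if both would be discarded,
    // because the receiver's state is unchanged whichever one is processed first.
    static boolean independent(int voteA, int voteB, int receiverSupport) {
        return isDiscarded(voteA, receiverSupport) && isDiscarded(voteB, receiverSupport);
    }
}
```

A check like this needs both the message content and the receiver's state, which matches the semantic information that the demo slide lists as a requirement for LMI.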
Workload Driver
• Extends the abstract class WorkloadDriver
• Defines how to start / stop / reset the system
• Defines how to start the workload we want to check
Start Java processes that run SampleSys with the given config files
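A workload driver along these lines could look like the sketch below. The real WorkloadDriver class is not shown on the slides, so the abstract class here is a hypothetical stand-in with assumed method names, and the jar name, main class, and config file paths are placeholders.

```java
import java.util.List;

// Hypothetical stand-in for the driver contract; the real WorkloadDriver API may differ.
abstract class WorkloadDriverSketch {
    abstract void startNode(int nodeId) throws Exception;
    abstract void stopNode(int nodeId);
    abstract void resetSystem();
    abstract void runWorkload() throws Exception;
}

class SampleSysDriver extends WorkloadDriverSketch {
    private final int numNodes;
    private final Process[] processes; // indexed by node ID, starting at 1

    SampleSysDriver(int numNodes) {
        this.numNodes = numNodes;
        this.processes = new Process[numNodes + 1];
    }

    // Builds the command that launches one node; jar, class, and file names are placeholders.
    static List<String> command(int nodeId) {
        return List.of("java", "-cp", "samplesys.jar", "SampleSysNode", "conf/node" + nodeId + ".conf");
    }

    @Override
    void startNode(int nodeId) throws Exception {
        processes[nodeId] = new ProcessBuilder(command(nodeId)).inheritIO().start();
    }

    @Override
    void stopNode(int nodeId) {
        if (processes[nodeId] != null) {
            processes[nodeId].destroy();
            processes[nodeId] = null;
        }
    }

    @Override
    void resetSystem() {
        for (int i = 1; i <= numNodes; i++) {
            stopNode(i); // stop everything so the next execution starts clean
        }
    }

    @Override
    void runWorkload() throws Exception {
        for (int i = 1; i <= numNodes; i++) {
            startNode(i); // starting all nodes triggers the leader election
        }
    }
}
```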
Specification Verifier
• Extends the abstract class SpecificationVerifier
• Does the system behave according to its specification?
  • How many leaders are there?
  • Does everyone agree on one leader?
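The two checks above can be sketched as a verifier over the leader each node reports at the end of an execution. The abstract class is a hypothetical stand-in for SpecificationVerifier, whose real methods are not shown on the slides, and using -1 to mean "no leader decided" is an assumption.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the verifier contract; the real SpecificationVerifier API may differ.
abstract class SpecificationVerifierSketch {
    abstract boolean verify();
}

class LeaderElectionVerifier extends SpecificationVerifierSketch {
    private final List<Integer> decidedLeaders; // leader reported by each node, -1 if undecided

    LeaderElectionVerifier(List<Integer> decidedLeaders) {
        this.decidedLeaders = decidedLeaders;
    }

    // Passes only if every node has decided and all nodes name the same leader.
    @Override
    boolean verify() {
        Set<Integer> distinct = new HashSet<>(decidedLeaders);
        return distinct.size() == 1 && !distinct.contains(-1);
    }
}
```

An execution in which two nodes report different leaders fails this check.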
Apache ZooKeeper Integration
• Non-determinism
  • Network communication
  • Disk I/O
  • Machine crash / machine restart
• Model checked 5 versions
• Reproduced 7 old bugs
  • In the leader election and atomic broadcast protocols
  • Some require multiple crashes and reboots
• Found 1 new bug
ZooKeeper Integration Result
Number of executions needed to reproduce old bugs
[Table: results per bug; data not preserved]
ZAB = ZooKeeper atomic broadcast protocol
ZLE = ZooKeeper leader election protocol
SAMC
• Semantic awareness for fast model checking
• AOP for the interposition layer
• The SAMC server is extensible and comes with a replay function
• Can be integrated with real systems
Future Work
• Timeout interposition
• Catching performance bugs
• Step-by-step replay function
Thank you! Questions?
http://ucare.cs.uchicago.edu
Code can be found at http://ucare.cs.uchicago.edu/projects/samc