Distributed Verification of Multi-threaded C++ Programs

Stefan Edelkamp, joint work with Damian Sulewski and Shahid Jabbar

Motivation: IO-HSF-SPIN
[Figure labels: Same states in both parts; Arrives at the final state; Large jumps due to 2nd heuristic; Current state; Already seen final state; Arrives again at same final state]
2.9 TB; 20 days on 1 node, 8 days on 3 nodes

Overview
• Software Checking in StEAM
• Externalization
• Virtual Addresses
• Parallelization

Software Checking
• Advantages
  + Building a model unnecessary
  + Learning specification language unnecessary
  + Checking can be done more often
• Disadvantages
  - Code has to be executed
  - Huge number of states
  - Huge states

StEAM
• Can check concurrent C++ programs
• Uses a virtual machine for execution
• Supports BFS, DFS, Best-First, A*, IDA*
• Finds
  • Deadlocks
  • Assertion Violations
  • Segmentation Faults

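StEAM - Checking a C++ Program
[Figure labels: Model checker; igcc Compiler; Virtual Machine; Objectcode]

    #include <cstdlib>

    char globalChar;
    int globalBlocksize = 7;

    void allocateBlock(int size);   // declared before its use in main

    int main() {
        allocateBlock(globalBlocksize);
    }

    void allocateBlock(int size) {
        void *memBlock;
        memBlock = (void *) malloc(size);   // block ends up in the memory pool
    }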

StEAM - Interpreting the Object Code
[Figure: the object code of the example program (globalChar, globalBlocksize, main, allocateBlock) is interpreted by the ICVM Virtual Machine. Labels: Register; Objectcode with Text Section, BSS Section, Data Section; Stack; Memory Pool]

StEAM – Generating States
[Figure: the ICVM Virtual Machine (Register, Text Section, BSS Section, Data Section, Stack, Memory Pool) generates the StEAM states Initial State, State 1, and State 2. Component labels in the figure: Register, TextSection, BSSSection, DataSection, Stack, MemoryPool]

Overview
• Software Checking in StEAM
• Externalization
• Virtual Addresses
• Parallelization

Externalization - Motivation
[Plot: time vs. problem size; curves: Internal, External]

Externalization – Mini States [EJMRS 06]
• pointer to a state in RAM or on Disk
• pointer to the predecessor mini state
• constant size
[Figure labels: Disk; RAM]
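The three properties of a mini state can be illustrated with a small sketch. This is not StEAM's actual data layout: the struct name, field names, field widths, and the helper `reconstructPath` are assumptions made for illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical layout of a mini state: a fixed-size record that stands in
// for a full, variable-size program state.
struct MiniState {
    std::uint64_t location;     // cache slot in RAM or byte offset in a disk file
    std::uint32_t predecessor;  // index of the predecessor mini state
    bool onDisk;                // true if `location` refers to the disk
};

// Marker for the root of the search tree (assumed convention).
constexpr std::uint32_t kNoPredecessor = 0xFFFFFFFFu;

// Follow predecessor links from `goal` back to the root and return the
// path as indices into `table`, root first.
std::vector<std::uint32_t> reconstructPath(const std::vector<MiniState>& table,
                                           std::uint32_t goal) {
    assert(goal < table.size());
    std::vector<std::uint32_t> path;
    for (std::uint32_t i = goal; i != kNoPredecessor; i = table[i].predecessor) {
        path.push_back(i);
    }
    std::reverse(path.begin(), path.end());
    return path;
}
```

Because every record has the same size, the table of mini states can stay in RAM while the full states move between the cache and the disk; an error trail is recovered by walking the predecessor links.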

Externalization – Expanding a State
[Figure labels: Mini States; Secondary Memory; Internal Memory; Cache]

Externalization – Flushing the Cache
[Figure labels: Secondary Memory; Internal Memory; Cache; Mini States]

Externalization – Collapse Compression
[Figure labels: State; Caches; Files on Disk; components: Register, Text Section, BSS Section, Data Section, Stack, Memory Pool]
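The general idea of collapse compression is to store each distinct component content once and to represent a state as a tuple of component indices. The sketch below shows that idea under stated assumptions: the class names, the use of `std::string` as a byte container, and the in-memory tables (instead of caches backed by files on disk) are illustrative and not taken from StEAM.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// One table per state component; identical contents are stored only once.
class ComponentTable {
public:
    // Return the index of `bytes`, inserting it if it has not been seen yet.
    std::uint32_t intern(const std::string& bytes) {
        auto it = index_.find(bytes);
        if (it != index_.end()) return it->second;
        std::uint32_t id = static_cast<std::uint32_t>(store_.size());
        store_.push_back(bytes);
        index_.emplace(bytes, id);
        return id;
    }
    const std::string& lookup(std::uint32_t id) const {
        assert(id < store_.size());
        return store_[id];
    }
    std::size_t size() const { return store_.size(); }

private:
    std::map<std::string, std::uint32_t> index_;
    std::vector<std::string> store_;
};

// Components as named on the slide.
enum Component : std::size_t { Registers, Text, Bss, Data, Stack, MemoryPool };
constexpr std::size_t kNumComponents = 6;

// A collapsed state is just one index per component.
using CollapsedState = std::array<std::uint32_t, kNumComponents>;

class CollapseStore {
public:
    CollapsedState collapse(const std::array<std::string, kNumComponents>& parts) {
        CollapsedState s{};
        for (std::size_t c = 0; c < kNumComponents; ++c) {
            s[c] = tables_[c].intern(parts[c]);
        }
        return s;
    }
    std::size_t distinct(Component c) const { return tables_[c].size(); }

private:
    std::array<ComponentTable, kNumComponents> tables_;
};
```

Two successor states that differ only in their registers then share the indices of all other components, so only the changed component costs additional space.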

Virtual Addresses
• programs request memory
• memory assignment done by system
• moving program between nodes impossible
• two possible strategies
  • converting the addresses before executing
  • using virtual addresses

Virtual Addresses – Memory Management
[Figure: sections Data, Stack, Text, BSS with Stack pointer 0, Stack pointer, and Program counter; an AVL-Tree in the Memory pool maps virtual address y to (x, size), where x is the real address in RAM]
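The mapping from a virtual address y to a real address x with a block size can be sketched as follows. This is a minimal illustration, not StEAM's implementation: `std::map` (an ordered tree, typically red-black) stands in for the AVL tree named on the slide, and the class name, the starting address `0x1000`, and the bump-style assignment of virtual addresses are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <map>

class VirtualMemoryPool {
public:
    using VirtualAddress = std::uint64_t;

    VirtualMemoryPool() = default;
    VirtualMemoryPool(const VirtualMemoryPool&) = delete;
    VirtualMemoryPool& operator=(const VirtualMemoryPool&) = delete;
    ~VirtualMemoryPool() {
        for (auto& entry : blocks_) std::free(entry.second.real);
    }

    // Allocate `size` bytes and return the virtual address y of the block.
    // Returns 0 (reserved as the null address) if the allocation fails.
    VirtualAddress allocate(std::size_t size) {
        assert(size > 0);
        void* real = std::malloc(size);
        if (real == nullptr) return 0;
        VirtualAddress v = next_;
        next_ += size;
        blocks_.emplace(v, Block{real, size});
        return v;
    }

    // Translate a virtual address, possibly pointing into the middle of a
    // block, to the real address x. Returns nullptr if it is not mapped.
    void* translate(VirtualAddress v) const {
        auto it = blocks_.upper_bound(v);  // first block starting after v
        if (it == blocks_.begin()) return nullptr;
        --it;                              // block starting at or before v
        std::uint64_t offset = v - it->first;
        if (offset >= it->second.size) return nullptr;
        return static_cast<char*>(it->second.real) + offset;
    }

private:
    struct Block {
        void* real;        // real address x assigned by the system
        std::size_t size;  // block size in bytes
    };
    std::map<VirtualAddress, Block> blocks_;
    VirtualAddress next_ = 0x1000;  // assumed start of the virtual range
};
```

Since the checked program only ever sees virtual addresses, a state can be moved to another node and given fresh real blocks there without rewriting any pointer stored inside the state.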

Virtual Addresses - Overhead
[Plot: time vs. nodes; curves: virtual, real]

Parallelization – Motivation
• Distributed (Shared) Memory → MPI channels / shared RAM communication
• Sending full states too expensive (if not used for expansion)
• Exploit externalization → Dual Channel (Speedup vs. Load Balance)
• Appropriate State Space Partitioning

Parallelization – Dual Channel Communication

Parallelization – Hash Partitioning
• Partitioning by hashing full state
  Problem: Successors often not in same partition → high communication overhead
• Partitioning by hashing partial state, e.g. memory pool
  Problem: Too many states map to one hash value → Load balancing
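The two partitioning schemes differ only in which part of the state feeds the hash function. A minimal sketch, assuming a simplified three-component state and `std::hash` as a placeholder hash function (the struct and function names are hypothetical):

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Simplified state with three components, each held as raw bytes.
struct State {
    std::string registers;
    std::string stack;
    std::string memoryPool;
};

// Full-state partitioning: any change to the state can change the owner,
// so successors are frequently sent to another node.
std::size_t ownerByFullState(const State& s, std::size_t numNodes) {
    assert(numNodes > 0);
    std::hash<std::string> hasher;
    return hasher(s.registers + s.stack + s.memoryPool) % numNodes;
}

// Partial-state partitioning: only the memory pool decides the owner, so
// successors that leave the pool untouched stay on the same node.
std::size_t ownerByMemoryPool(const State& s, std::size_t numNodes) {
    assert(numNodes > 0);
    std::hash<std::string> hasher;
    return hasher(s.memoryPool) % numNodes;
}
```

The trade-off named on the slide is visible here: hashing only the memory pool keeps a state and many of its successors together, but every state with the same pool content lands on one node, which can unbalance the load.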

Parallelization – Incremental Tree Hashing [EM05]
h(s) = (Σ_i s_i · 3^i) mod 17
h(1,2,3,1,2,2,1,2) = (4 + 1·3^2 + 9·3^(2+2)) mod 17 = 11
h(1,2) = (1·3 + 2·9) mod 17 = 4
h(3,1) = (3·3 + 1·9) mod 17 = 1
h(2,2,1,2) = (6 + h(2,1,2)·3^1) mod 17 = (6 + 1·3) mod 17 = 9
h(2) = (2·3^1) mod 17 = 6
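The slide's numbers follow from one rule: the hash of a concatenation is the hash of the left part plus the hash of the right part shifted by 3 to the power of the left part's length, all mod 17. The sketch below reproduces the arithmetic; the function names are chosen for illustration and indices start at 1 as in the slide's formula.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr unsigned kBase = 3;
constexpr unsigned kModulus = 17;

// 3^exponent mod 17, computed by repeated multiplication.
unsigned powMod(std::size_t exponent) {
    unsigned result = 1;
    for (std::size_t i = 0; i < exponent; ++i) {
        result = (result * kBase) % kModulus;
    }
    return result;
}

// h(s) = (sum over i of s_i * 3^i) mod 17, with i starting at 1.
unsigned hashVector(const std::vector<unsigned>& s) {
    unsigned h = 0;
    unsigned weight = 1;
    for (unsigned value : s) {
        weight = (weight * kBase) % kModulus;  // now 3^i mod 17
        h = (h + value * weight) % kModulus;
    }
    return h;
}

// Hash of the concatenation (left ++ right), given only the two partial
// hashes and the length of the left part.
unsigned combine(unsigned hashLeft, std::size_t lengthLeft, unsigned hashRight) {
    assert(hashLeft < kModulus && hashRight < kModulus);
    return (hashLeft + hashRight * powMod(lengthLeft)) % kModulus;
}
```

For example, `combine(combine(4, 2, 1), 4, 9)` joins h(1,2) = 4, h(3,1) = 1, and h(2,2,1,2) = 9 into the hash of the full vector. When one part of a state changes, only the hashes on the path from that part to the root of the tree need to be recombined.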

Parallelization – Search Partitioning
[Figure labels: horizontal slices; vertical slices; DFS [Holzmann & Bosnacki 2006]; Best-First, A*]

Parallelization - Hardware
• Cluster Vision System (PBS)
• SUSE Linux 10.0
• MPI via InfiniBand
• Files via GBit Ethernet
• 224 nodes (464 procs), < 15 used
• AMD Opteron DP 50 (2.4 GHz)

Experiments: 15-Puzzle, Partial Hash
[Plot: speedup and time vs. nodes]

Experiments – Depth-First Slicing
[Plot: 200 Philosophers; time vs. processors]
Top Result: 600 Phils / 6 nodes, 97 KB / state
Ex Collapse Compression & Distribution: 16 GB → 1.5 GB per node

Experiments - Bath-Tub Effect (50 phils-avg.)
[Plot: Time and Size of Depth Layer]
validates Holzmann & Bosnacki

Experiment - Shared Memory, Bakery (pthread)
• 4 Opteron MP 852 (2.6 GHz)
[Plot: speedup and time vs. nodes]

Conclusion
Preceding Work: Full Externalization of States in IO-HSF-SPIN → Constant-Size RAM, e.g. 1.8 GB RAM, 20 days on 1 proc, 8 days on 4 procs, 2.9 TB disk [EJ06]; Distribution via (g+h)-Value
Problem: Huge & Highly Dynamic States
Solution: Mini States as Constant-Size Fingerprints of States in RAM, used for Dual-Channel Communication, to combine External and Parallel Search with Memory-Pool and Best-First Slicing Partitioning
