A New Reachability Algorithm for Symmetric Multi-processor Architecture

D. Sahoo, Stanford
J. Jain, Fujitsu
S. Iyer, UT-Austin
D. Dill, Stanford
ATVA 05

Outline
• Standard Reachability Analysis
• Multithreaded Reachability
• Multithreaded Reachability in SMP machines
• Engineering Issues
• Results
• Conclusion and Future Work

Related Work
• Parallel Reachability Analysis:
  • Explicit state
    • Stern and Dill [CAV, 97]
  • Parallel BDDs
    • Stornetta and Brewer [DAC, 96]
    • Yang, O'Hallaron [97]
  • BDD-based parallel reachability
    • Heyman, Geist, Grumberg, Schuster [CAV, 00]
    • Garavel, Mateescu, Smarandache [SPIN, 01]
    • Grumberg et al. [CHARME, 05], [ATVA, 05]
• Verification and multi-threading
  • Pixley, Havlicek [03]

Reachability using BDD [Burch et al. : 91]
[Figure: starting from Initial State I, image computation over the Partitioned Transition Relation (Tr1 … Tri … Trn) produces R1, R2, …, Ri until the Least Fixed Point is reached]
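The fixed-point iteration on this slide can be sketched in a few lines. This is a minimal illustration only: explicit Python sets stand in for BDDs, each partition Tr_i is assumed to be a next-state function for one state bit, and the names `image`, `reachable`, and the toy `counter` design are hypothetical, not taken from the talk.

```python
from itertools import product

def image(states, next_bits, n_inputs):
    """One image step: successors of `states` over all input valuations."""
    successors = set()
    for s in states:
        for x in product((0, 1), repeat=n_inputs):
            # The next state is assembled bit by bit, one partition Tr_i per bit.
            successors.add(tuple(f(s, x) for f in next_bits))
    return successors

def reachable(initial, next_bits, n_inputs):
    """Iterate R_{i+1} = R_i | Image(R_i) until no new state appears."""
    reached = set(initial)
    steps = 0
    while True:
        new = image(reached, next_bits, n_inputs) - reached
        if not new:
            return reached, steps      # least fixed point
        reached |= new
        steps += 1

# Toy design (assumption): 3-bit counter with an enable input x[0];
# s[0] is the least significant bit.
counter = [
    lambda s, x: s[0] ^ x[0],
    lambda s, x: s[1] ^ (s[0] & x[0]),
    lambda s, x: s[2] ^ (s[0] & s[1] & x[0]),
]
states, steps = reachable({(0, 0, 0)}, counter, n_inputs=1)
```

A symbolic implementation would replace the inner loops with BDD conjunction and existential quantification; the outer loop stays the same.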

Partitioned Reachability using POBDD
POBDD - [Jain : 92] Reachability - [Narayan et al. : 97]
Initial States : I
[Figure: four partitions, each computing its own local fixed point (Local Fixed Point 1, 2, 3, 4)]

Partitioned Reachability using POBDD
POBDD - [Jain : 92] Reachability - [Narayan et al. : 97]
Initial States : I
[Figure: Local Fixed Points 1 to 4, with communication from 1 -> 2, 1 -> 3, and 1 -> 4]

Partitioned Reachability using POBDD
POBDD - [Jain : 92] Reachability - [Narayan et al. : 97]
Initial States : I
[Figure: Local Fixed Points 1 to 4, with communication from 2 -> 1, 2 -> 3, and 2 -> 4]
Similarly repeat for other partitions

Partitioned Reachability using POBDD
POBDD - [Jain : 92] Reachability - [Narayan et al. : 97]
[Figure: Local Fixed Points 1 to 4]
Improvements: [Iyer et al. : 03] [Sahoo et al. : 04]
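The alternation of local fixed points and communication shown on the preceding slides might look as follows. This is a sketch under stated assumptions: explicit sets replace POBDDs, `part_of` is a hypothetical function that assigns each state to a partition, and the `inbox` bookkeeping is one illustrative way to schedule communication, not the scheme of the cited papers.

```python
def local_fixed_point(seed, succ, inside):
    """Least fixed point of `succ` restricted to states where inside(s) holds."""
    reached = set(seed)
    frontier = set(seed)
    while frontier:
        frontier = {t for s in frontier for t in succ(s) if inside(t)} - reached
        reached |= frontier
    return reached

def partitioned_reachability(initial, succ, n_parts, part_of):
    parts = [set() for _ in range(n_parts)]
    # inbox[p] holds states communicated to partition p but not yet explored.
    inbox = [{s for s in initial if part_of(s) == p} for p in range(n_parts)]
    while any(inbox):
        for p in range(n_parts):
            if not inbox[p]:
                continue
            before = set(parts[p])
            seed = parts[p] | inbox[p]
            inbox[p] = set()
            parts[p] = local_fixed_point(seed, succ, lambda s, p=p: part_of(s) == p)
            # Communicate: successors of the new states that leave partition p.
            for s in parts[p] - before:
                for t in succ(s):
                    q = part_of(t)
                    if q != p and t not in parts[q]:
                        inbox[q].add(t)
    return parts

# Toy system (assumption): 16 states split into 4 partitions of 4 states each.
succ = lambda s: [(s * 2) % 16, (s + 6) % 16]
part_of = lambda s: s // 4
parts = partitioned_reachability({1}, succ, 4, part_of)
```

The loop stops once no partition has pending incoming states, which is the global fixed point.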

Motivation for Multi-threaded Approach
• Scheduling Problem
• Increasing availability of powerful SMP machines
• Multi-threading is a way of achieving real parallelism in SMP machines

Multi-threaded Reachability [DAC 05]
Naïve parallelization
[Figure: thread activity over Time]
• Advantage:
  • Parallel speedup
  • Catches a bug faster than the sequential version
• Problems:
  • Not much parallelism
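One way to read the naive scheme is "one thread per partition, communicate only after every thread has finished its local fixed point". The sketch below follows that reading, which is an assumption and not a transcription of the DAC 05 algorithm. Explicit sets stand in for POBDDs, and `parallel_reachability`, `part_of`, and the toy transition function are hypothetical names. In CPython the global interpreter lock prevents real CPU parallelism for pure-Python work, so the code only illustrates the control structure.

```python
from concurrent.futures import ThreadPoolExecutor

def local_fixed_point(seed, succ, inside):
    """Least fixed point of `succ` restricted to states where inside(s) holds."""
    reached = set(seed)
    frontier = set(seed)
    while frontier:
        frontier = {t for s in frontier for t in succ(s) if inside(t)} - reached
        reached |= frontier
    return reached

def parallel_reachability(initial, succ, n_parts, part_of):
    parts = [{s for s in initial if part_of(s) == p} for p in range(n_parts)]

    def work(p):
        # Each thread reads only its own partition and returns fresh sets,
        # so no locking is needed inside a round.
        grown = local_fixed_point(parts[p], succ, lambda s: part_of(s) == p)
        outgoing = {t for s in grown for t in succ(s) if part_of(t) != p}
        return grown, outgoing

    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        while True:
            # list(pool.map(...)) returns only when every partition is done:
            # this barrier is what limits parallelism in the naive scheme.
            results = list(pool.map(work, range(n_parts)))
            new_parts = [grown for grown, _ in results]
            for _, outgoing in results:
                for t in outgoing:
                    new_parts[part_of(t)].add(t)
            if new_parts == parts:       # nothing changed: global fixed point
                return parts
            parts = new_parts

# Toy system (assumption): 16 states split into 4 partitions.
succ = lambda s: [(s * 2) % 16, (s + 6) % 16]
part_of = lambda s: s // 4
parts = parallel_reachability({1}, succ, 4, part_of)
```

Threads that finish early sit idle until the slowest partition reaches the barrier, which matches the "not much parallelism" problem listed on the slide.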

Multi-threaded Reachability [DAC 05]
Early Communication
[Figure: thread activity over Time]
• Advantage:
  • Parallel speedup
  • Finishes the reachability analysis faster
  • Catches bugs faster than the naive version
• Problems:
  • Parallelism could be better

Multi-threaded Reachability [DAC 05]
Early Communication and Partial Communication
[Figure: thread activity over Time]
• Advantage:
  • Parallel speedup
  • Finishes the reachability analysis faster
  • Catches bugs faster than the previous versions

Reachability in SMP Architecture
[Figure: thread activity over Time]
• We find the bugs faster!
• Improved parallelism
• Better parallel speedup

Engineering Issues
• Thread-safe BDD library
  • We don’t really need a parallel BDD library.
• Non-deterministic behavior
  • Runtime varies from 92 sec to 13000 sec
  • Results are not reproducible

Sources of Non-determinism
• Extensive memory-based optimizations
  • Pointer comparisons
  • Hashing based on memory address
• Solutions:
  • Deterministic hashing
  • Deterministic comparisons
  • Use of a pointerless BDD library [Janssen:IWLS, 01]

Thread 1            Thread 2
p = malloc (…)      p = malloc (…)
key = hash(p)       if (p > p1) …
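A minimal sketch of the "deterministic hashing" and "deterministic comparisons" ideas: nodes are identified by an integer index assigned in creation order, so neither the hash key nor the operand ordering depends on where the allocator happened to place a node. The class `NodeTable` and its methods are hypothetical illustrations, not the API of CUDD or of the pointerless library cited above.

```python
class NodeTable:
    """Hash-consed BDD nodes identified by integer indices, not addresses."""

    def __init__(self):
        # Indices 0 and 1 are reserved for the constant leaves.
        self.nodes = [None, None]
        self.unique = {}   # (var, low, high) -> node index

    def make(self, var, low, high):
        if low == high:                 # redundant test: no node needed
            return low
        key = (var, low, high)          # tuple of ints: the hash is address-free
        if key not in self.unique:
            self.unique[key] = len(self.nodes)
            self.nodes.append(key)
        return self.unique[key]

    def cache_key(self, f, g):
        # Order the operands of a commutative operation by index,
        # never by memory address.
        return (f, g) if f <= g else (g, f)

def build(table):
    x1 = table.make(1, 0, 1)            # the variable x1
    x0_and_x1 = table.make(0, 0, x1)    # x0 AND x1
    return x1, x0_and_x1

first = NodeTable()
junk = [object() for _ in range(1000)]  # perturb the allocator between builds
second = NodeTable()
```

Calling `build(first)` and `build(second)` yields the same indices regardless of the allocations in between, whereas keys derived from `id(obj)` or a raw pointer could differ from run to run.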

Sources of Non-determinism
• Thread synchronization
• Solutions
  • Synchronization based on deterministic count
    • Number of ITE operations
    • Number of Sift operations
[Figure: Thread 1 and Thread 2; Image #n, Image #n+1]
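The idea of synchronizing on a deterministic count can be illustrated as follows. Each thread decides when to exchange data from the number of operations it has performed, never from elapsed time, so random scheduling jitter does not change the result. Everything here is an assumption made for illustration: `SYNC_EVERY`, the arithmetic that stands in for an ITE operation, and the two-barrier exchange protocol.

```python
import random
import threading
import time

SYNC_EVERY = 5   # hypothetical: operations between synchronization points

def worker(tid, n_ops, barrier, published, final):
    value = tid + 1
    for op in range(1, n_ops + 1):
        time.sleep(random.random() * 0.001)    # scheduling jitter
        value = (value * 31 + op) % 1009       # stand-in for one ITE operation
        if op % SYNC_EVERY == 0:               # trigger depends on the count only
            published[tid] = value
            barrier.wait()                     # both threads have published
            other = published[1 - tid]
            barrier.wait()                     # both threads have read
            value = (value + other) % 1009
    final[tid] = value

def run(n_ops=20):
    barrier = threading.Barrier(2)
    published, final = [0, 0], [0, 0]
    threads = [
        threading.Thread(target=worker, args=(t, n_ops, barrier, published, final))
        for t in range(2)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return final
```

Two calls to `run()` return the same pair of values even though the sleeps differ, because every exchange happens at the same operation count in both threads.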

BDD Performance : CUDD Vs New

Performance : Non-deterministic Vs Deterministic

Results on Industrial Circuits

Results on public benchmarks

Results : Gantt charts Real execution traces from our multi-threaded reachability program

Performance: Cache or Parallelism

Smart Thread Scheduling
• Problem:
  • Each processor has its own cache
  • A thread is assigned to a processor
  • The cache fills up with the thread’s memory usage
  • The same thread is assigned to a different processor after some time
  • A large number of unnecessary cache misses occur when the thread uses its previously used memory locations
• Solutions:
  • Bind each thread to a processor
    • Leads to suboptimal throughput
    • If the number of threads exceeds the number of processors
[Figure: Thread, CPU1 with Cache1, CPU2 with Cache2; a lookup of 0x07ffd0 results in a cache miss]
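Binding a thread to a processor can be sketched with the Linux affinity calls that Python exposes. This is an illustration under assumptions: `os.sched_setaffinity` and `os.sched_getaffinity` exist only on some platforms (notably Linux), `run_pinned` is a hypothetical helper, and the sketch falls back to running unpinned when the call is unavailable or refused.

```python
import os
import threading

def run_pinned(cpu, fn):
    """Run fn() in a new thread restricted to one CPU where supported.

    Returns (result, affinity); affinity is None if pinning was not possible.
    """
    out = {"affinity": None}

    def body():
        if hasattr(os, "sched_setaffinity"):
            tid = threading.get_native_id()    # kernel thread id of this thread
            try:
                os.sched_setaffinity(tid, {cpu})
                out["affinity"] = os.sched_getaffinity(tid)
            except OSError:
                out["affinity"] = None         # not permitted: run unpinned
        out["result"] = fn()

    t = threading.Thread(target=body)
    t.start()
    t.join()
    return out["result"], out["affinity"]

# Pick a CPU this process may use; default to CPU 0 on other platforms.
if hasattr(os, "sched_getaffinity"):
    allowed = sorted(os.sched_getaffinity(0))
else:
    allowed = [0]
result, affinity = run_pinned(allowed[0], lambda: sum(range(1000)))
```

Pinning keeps a thread's working set in one cache, but with more threads than processors the pinned threads must share a CPU, which is the throughput cost noted on the slide.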

Conclusion and Future Work
• Parallelize the Reachability
  • Multi-threaded Reachability
  • Better results
  • Deterministic behavior
• Future Work
  • Smart thread scheduling
  • Improve the parallelism further
  • Study cache behavior
