Partial Order Reduction for Scalable Testing of SystemC TLM Designs
Sudipta Kundu, University of California, San Diego
Malay Ganai, NEC Laboratories America
Rajesh Gupta, University of California, San Diego
Hardware Design Methodology
• Architecture Level
• Transaction Level Model (TLM) (Non-Synthesizable Subset)
• Mostly Manual
• Micro-architecture Level (Synthesizable Subset)
• High Level Synthesis
• Register Transfer Level (RTL)
Figure labels: SystemC; Functional Verification: Simulation & Formal Methods.
Annotation: Can be 3 orders of magnitude faster.
Outline
• Motivation
• Background
  • SystemC Semantics
  • Partial-order Reduction
• Our Approach
  • Static Analysis
  • Query-based Framework: Satya
• Experiments
• Conclusion
Semantics of SystemC
• C++ library
• Co-operative multitasking
• Asynchronous and synchronous concurrency
• Variables
  • Signals: Blocking variables
  • Non-signals: Non-blocking variables
Design Errors:
• Deadlock
• Write Conflicts
• Assertion Violations
• Data Races
Figure: two processes communicating through events. Immediate notification and wait on event: Process 1 runs … e2.notify() wait(e1) … and Process 2 runs … e1.notify() wait(e2) …. Timed variant with a delta cycle marked: … e2.notify() wait(5) … and … e1.notify() wait(3) ….
Example: Producer-Consumer
Global variables: int num = 0; char data[2]; sc_event e;
Process P ()
• data[num++] = 'A'
• notify (e);
• wait(1, SC_NS)
• //local computation
• return
Process C (bool flag)
• if (!flag)
•   wait(e)
• c = data[--num]
• wait(1, SC_NS)
• //local computation
• return
Figure: control-flow graphs of both processes. P has locations 0, 3, 5 and atomic blocks P1, P2. C has locations 0, 2, 4, 6 and atomic blocks C1 (on the !flag branch), C3 (on the flag branch), C2 and C4. Legend: wait on event, immediate notification.
Pi and Ci are atomic blocks.
Single Interleaving Is Not Enough
Input: flag = false
• Problem 1: The SystemC scheduler is deterministic. For a given input it explores only one interleaving.
• Problem 2: Exponential number of possible interleavings.
Figure: execution tree over states (location of P, location of C), starting at (0, 0). Running C1 before P1 passes through (0, 2) and (3, 2); after the δ-cycle, C2 leads to (3, 4), and after the time step τ, P2 and C4 (in either order) lead to (5, 6): no deadlock. Running P1 before C1 passes through (3, 0) and (3, 2); after τ, P2 leads to (5, 2): deadlock. Block labels: C1: wait(e); P1: data[num++] = 'A', notify (e), wait(1, SC_NS); C2: c = data[--num], wait(1, SC_NS).
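
The two branches of the execution tree can be replayed with a small sketch. This is a hand-written Python model of the producer-consumer pseudocode for the input flag = false, not SystemC code; the function name `simulate` and the boolean `c_waiting_on_e` are assumptions made for illustration. It models only one property of immediate notification: notify(e) wakes a process that is already waiting on e and is otherwise lost.

```python
def simulate(first_delta_order):
    """Replay P and C (flag = False) with the first two atomic blocks
    executed in the given order. Returns (status, consumed_value)."""
    num = 0
    data = [None, None]
    c_waiting_on_e = False

    for block in first_delta_order:
        if block == "C1":
            # !flag branch: C blocks on wait(e)
            c_waiting_on_e = True
        elif block == "P1":
            # data[num++] = 'A'
            data[num] = "A"
            num += 1
            # notify(e): wakes C only if it is already waiting;
            # if C has not reached wait(e) yet, the notification is lost
            c_waiting_on_e = False
        else:
            raise ValueError(f"unknown block {block!r}")

    if c_waiting_on_e:
        # P never notifies e again, so C stays blocked forever
        return ("deadlock", None)

    # C2: c = data[--num]
    num -= 1
    return ("no deadlock", data[num])


print(simulate(["C1", "P1"]))
print(simulate(["P1", "C1"]))
```

A deterministic scheduler would pick one of these two orders every time, which is why a single simulation run can miss the deadlock.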
Partial Order Reduction (POR)
• Reduces the number of interleavings that need to be searched
• Exploits the commutativity of concurrently executed transitions
• Concurrent Software Verification
  • Static POR [Godefroid 95]
  • Dynamic POR [Flanagan 05]
Figure: from state s, the enabled transitions t1 and t2 lead to s1 and s2; executing t2 from s1, or t1 from s2, reaches the same state r. t1 and t2 are commutative (independent), so explore interleaving t1.t2 or t2.t1 (not both).
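
The diamond in the figure can be checked concretely. The sketch below treats transitions as Python functions on a dictionary state and tests whether the two orders reach the same state; `commute_at` and the example transitions are hypothetical names. It checks one given state only, whereas independence in the POR sense has to hold at every state where both transitions are enabled.

```python
import copy


def commute_at(state, t1, t2):
    """True if t1.t2 and t2.t1 lead to the same state from `state`."""
    via_t1_first = t2(t1(copy.deepcopy(state)))
    via_t2_first = t1(t2(copy.deepcopy(state)))
    return via_t1_first == via_t2_first


def inc_i(s):
    s["i"] += 1
    return s


def dec_i(s):
    s["i"] -= 1
    return s


def set_x_to_1(s):
    s["x"] = 1
    return s


def set_x_to_2(s):
    s["x"] = 2
    return s


start = {"i": 0, "x": 0}
print(commute_at(start, inc_i, dec_i))            # increment and decrement commute
print(commute_at(start, set_x_to_1, set_x_to_2))  # conflicting writes do not
```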
Our Approach: Overview
• Adapts POR techniques for SystemC TLM designs
• Exploits SystemC-specific semantics
  • Co-operative multitasking
  • Wait-to-wait atomic blocks
  • Notion of δ-cycle
  • Signal (blocking) variables
• We implemented a query-based framework
  • Combines static and dynamic POR techniques
Our Framework: Satya
Satya is a Sanskrit word that translates into English as "truth" or "correct."
Figure: block diagram of Satya, an Explicit Stateless Model Checker. Components, in order of appearance: SystemC Design, Intermediate Representation, Static Analysis, Partial Order Information, Query Engine, Explore Engine, Modified SystemC Simulator.
Static Analysis: Basic Steps
• Get a control skeleton.
• Find the wait boundaries (atomic blocks).
• Summarize static information (Wns, Rns, Ws, Rs, Notify, Wait), where ns = non-signal and s = signal.
• Compute the dependence relation between atomic blocks (next slide).
Figure: control-flow graph of process C with locations 0, 2, 4, 6 and atomic blocks C1 (!flag), C3 (flag), C2 (c = data[--num], wait(1, SC_NS)) and C4.
Dependence Relation (D)
Given two transitions (atomic blocks) t1 and t2, (t1, t2) ∈ D if
• A write on a shared non-signal variable v in t1 and a read or a write on the same variable v in t2 (data dependency), OR
• A write on a shared signal variable s in t1 and a write on the same variable s in t2 (write-write conflict), OR
• A wait on an event e in t1 and an immediate notification on the same event e in t2 (causal dependency).
Special Case: We consider symmetric writes (increment, decrement) on non-signals as independent.
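
As a rough illustration of how the three rules could be evaluated over the static summaries (Wns, Rns, Ws, Rs, Notify, Wait), here is a minimal sketch. The `Summary` class, the helper names, and the summaries assigned to the producer-consumer blocks are assumptions written by hand for this example, not output of Satya. The sketch applies each rule in both directions, which is also an assumption, and it leaves out the special case for symmetric writes.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Summary:
    """Static summary of one atomic block (hypothetical layout)."""
    w_ns: frozenset = frozenset()    # non-signal variables written
    r_ns: frozenset = frozenset()    # non-signal variables read
    w_s: frozenset = frozenset()     # signal variables written
    r_s: frozenset = frozenset()     # signal variables read
    notify: frozenset = frozenset()  # events notified immediately
    wait: frozenset = frozenset()    # events waited on


def _one_way(t1, t2):
    return bool(
        t1.w_ns & (t2.r_ns | t2.w_ns)   # data dependency
        or t1.w_s & t2.w_s              # write-write conflict on a signal
        or t1.wait & t2.notify          # causal dependency
    )


def dependent(t1, t2):
    """Apply the three rules in both directions."""
    return _one_way(t1, t2) or _one_way(t2, t1)


# Hand-written summaries for the producer-consumer blocks (assumed).
blocks = {
    "P1": Summary(w_ns=frozenset({"data", "num"}),
                  r_ns=frozenset({"num"}),
                  notify=frozenset({"e"})),
    "P2": Summary(),                       # local computation only
    "C1": Summary(wait=frozenset({"e"})),
    "C2": Summary(w_ns=frozenset({"num"}),
                  r_ns=frozenset({"data", "num"})),
    "C4": Summary(),                       # local computation only
}

# Pre-computed query table: one entry per unordered pair of blocks.
query_table = {
    frozenset((a, b)): dependent(blocks[a], blocks[b])
    for a, b in combinations(blocks, 2)
}


def query(a, b):
    return query_table[frozenset((a, b))]


print(query("P1", "C1"), query("P2", "C4"))
```

Because the table is built once from static information, each query during exploration is a dictionary lookup rather than a comparison of dynamically recorded reads and writes.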
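Dependence Relation: Example
Figure: atomic blocks A1, A2, B1 and B2 with pairs illustrating a causal dependency, a symmetric write (i++), no conflict (signal variable), and a data dependency. The resulting Query Table records a "Dependent?" answer (YES or NO) for each pair.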
Our Explore Algorithm
Scheduler State = <Runnable, Sleep, Todo>
• Runnable Set: transitions enabled at the state
• Sleep Set: transitions that no longer need to be explored
• Todo Set: transitions that will be explored next
Algorithm:
• Randomly execute an execution path till some depth.
• Analyze the path bottom up, considering each δ-cycle separately.
• If there exists a transition in (Todo \ Sleep), then execute it from the start (as our algorithm is stateless).
Figure: the execution tree of the producer-consumer example annotated with scheduler states, next to the Query Engine. At the state where P2 and C4 are runnable, the snapshots are <{ P2, C4 }, {}, {}> and <{ P2, C4 }, { P2 }, {}>, with the query "Is ( P2, C4 ) Dependent?". At the initial state, the successive snapshots are <{ P1, C1 }, {}, {}>, <{ P1, C1 }, { C1 }, {}>, <{ P1, C1 }, { C1 }, { P1 }> and <{ P1, C1 }, { C1, P1 }, {}>, with the query "Is ( P1, C1 ) Dependent?".
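
To give a feel for how sleep sets prune interleavings, here is a sketch of a textbook-style sleep-set search in the spirit of static POR. It is not the Satya explore algorithm: it keeps no Todo set, ignores δ-cycles and waits, and recurses over states instead of re-executing from the start. The function `explore`, the process encoding (each process is a list of uniquely named transitions that are always enabled in order), and the `dependent` set of unordered pairs are all assumptions for illustration.

```python
def explore(procs, dependent, pos=None, sleep=frozenset(), trace=()):
    """Depth-first search with sleep sets.

    Returns the complete interleavings that were actually explored."""
    if pos is None:
        pos = tuple(0 for _ in procs)

    # Indices of processes that still have a transition to run.
    runnable = [i for i, p in enumerate(procs) if pos[i] < len(p)]
    if not runnable:
        return [trace]          # every process finished: complete run

    runs = []
    done = set()                # transitions already explored from this state
    for i in runnable:
        t = procs[i][pos[i]]
        if t in sleep:
            continue            # an equivalent run is covered elsewhere
        # Transitions independent of t stay asleep in the successor state.
        child_sleep = frozenset(
            u for u in (sleep | done)
            if frozenset((t, u)) not in dependent
        )
        new_pos = pos[:i] + (pos[i] + 1,) + pos[i + 1:]
        runs += explore(procs, dependent, new_pos, child_sleep, trace + (t,))
        done.add(t)
    return runs


procs = [["A1", "A2"], ["B1", "B2"]]

all_independent = set()
only_first_blocks = {frozenset(("A1", "B1"))}
all_dependent = {frozenset((a, b)) for a in procs[0] for b in procs[1]}

print(len(explore(procs, all_independent)))
print(len(explore(procs, only_first_blocks)))
print(len(explore(procs, all_dependent)))
```

With no dependencies a single interleaving is explored, with one dependent pair two are explored, and when every cross-process pair is dependent all six interleavings remain.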
Our Contributions
• Commutativity checks between transitions are not done across δ-cycles (not required)
• Low-cost commutativity checks
  • No book-keeping for dynamic reads and writes
  • Use pre-computed query table
• Conservative approach
  • Independent transitions are precise, but not the dependent ones
  • Dependent transitions identified statically are most likely dependent
  • Large wait-to-wait atomic blocks
  • Signal variables are commonly used
Experiments and Results 1/2
• No POR: explore all execution paths
• POR: our approach using POR
Fifo Benchmark
• Open SystemC Initiative (OSCI) Repository
• Array Bound Violation (2 producers, 1 consumer)
Experiments and Results 2/2
Transaction Accurate Communication Benchmark (TAC)
• ST Microelectronics
• 6 modules: 2 traffic generators, 2 memories, 1 timer, 1 router
• Static slicing of the router while testing for deadlock
Figure: block diagram with Traffic Generator1, Traffic Generator2, Router, Memory1, Memory2 and Timer.
Conclusion and Future Work
• We presented Satya, a query-based approach built on top of the SystemC simulator
  • Computes and uses static information efficiently
• We exploit SystemC-specific semantics
  • Reduces the interleavings that need to be explored
• Improves on the previous explore algorithm
  • Avoids book-keeping cost
  • Avoids dynamic commutativity checks
• In future
  • We are working on intelligent test bench generation