Optimizing SystemC for higher speed and coverage
Dogan Fennibay
Why? • SystemC is becoming the de facto system-level design language • SystemC emulates parallelism via scheduling • Scheduling is an additional element affecting the result • Make up for this hole in coverage • We want faster simulations • To do more executions / cheaper executions • SystemC’s flexibility adds up to slowness
Outline • Automatic generation of schedulings for higher coverage • Introduction • Related work • Definitions • Algorithms • Evaluation • Scoot • Introduction • Related work • Idea • Evaluation • Conclusion
SystemC • Do you know SystemC? • No • Yes
Introduction • 3 different schedulings => 3 different results • a; b; a; te; b; a => “Ok” • a; b; a; te; a; b => “Ko” • b; a; te; b => deadlock (lost notification) • Include process C • 30 different schedulings • => same 3 different results • Equivalence classes
void top::A() {
    wait(e);
    wait(20, SC_NS);
    if (x) cout << "Ok\n";
    else   cout << "Ko\n";
}
void top::B() {
    e.notify();
    x = 0;
    wait(20, SC_NS);
    x = 1;
}
void top::C() {
    sc_time T(20, SC_NS);
    wait(T);
}
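The three outcomes above can be reproduced without SystemC by a small hand-written model of processes A and B. This is only a sketch: the `State` struct, the `stepA`/`stepB` functions and the exploration routine are hypothetical names introduced here, and the model assumes that an immediate `e.notify()` is lost when no process is waiting and that both 20 ns timeouts expire in the same time step.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hand-written model of processes A and B from the slide (not SystemC itself).
struct State {
    int pcA = 0, pcB = 0;                 // next step of each process
    bool runA = true, runB = true;        // runnable in the current evaluation phase
    bool aWaitsE = false;                 // A is blocked in wait(e)
    bool timedA = false, timedB = false;  // blocked in wait(20, SC_NS)
    int x = 0;                            // shared variable
    std::string out;                      // what A printed, if anything
};

void stepA(State& s) {
    assert(s.runA && s.pcA <= 2);
    s.runA = false;
    if (s.pcA == 0) s.aWaitsE = true;       // wait(e)
    else if (s.pcA == 1) s.timedA = true;   // wait(20, SC_NS)
    else s.out = s.x ? "Ok" : "Ko";         // if (x) ... else ...
    ++s.pcA;
}

void stepB(State& s) {
    assert(s.runB && s.pcB <= 1);
    s.runB = false;
    if (s.pcB == 0) {
        // e.notify(): wakes A only if A is already waiting, otherwise it is lost
        if (s.aWaitsE) { s.aWaitsE = false; s.runA = true; }
        s.x = 0;
        s.timedB = true;                    // wait(20, SC_NS)
    } else {
        s.x = 1;
    }
    ++s.pcB;
}

using Results = std::map<std::string, std::string>;  // scheduling -> outcome

void explore(State s, const std::string& trace, Results& results) {
    auto extend = [&](const char* t) {
        return trace.empty() ? std::string(t) : trace + "; " + t;
    };
    if (!s.runA && !s.runB) {
        if (s.timedA || s.timedB) {         // nothing runnable: time elapses ("te")
            s.runA = s.timedA;
            s.runB = s.timedB;
            s.timedA = s.timedB = false;
            explore(s, extend("te"), results);
        } else {                            // simulation is over
            results[trace] = s.aWaitsE ? "deadlock" : s.out;
        }
        return;
    }
    // The scheduler may pick any runnable process: try each choice.
    if (s.runA) { State n = s; stepA(n); explore(n, extend("a"), results); }
    if (s.runB) { State n = s; stepB(n); explore(n, extend("b"), results); }
}

Results enumerate_schedulings() {
    Results r;
    explore(State{}, "", r);
    return r;
}
```

Calling `enumerate_schedulings()` returns a map from each scheduling string (in the slide's `a; b; te` notation) to its outcome, so the result can be compared directly with the three schedulings listed above.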
Introduction • Scheduling also affects the results • Not just input data • We have to test all possible schedulings • Impossible schedulings: do not test them • Due to synchronization constraints • Equivalent schedulings: test only one • e.g. two reads from a shared variable • Focus is on scheduling • Input data generation is not considered
[Figure: example actions if (x), x = 1, e.notify()]
Introduction • Dynamic Dependency Graph • Legend: green: non-permutable, red: non-commutative
void top::P() {
    wait(e);
    wait(20, SC_NS);
    if (x) cout << "Ok\n";
    else   cout << "Ko\n";
}
void top::Q() {
    e.notify();
    x = 0;
    wait(20, SC_NS);
    x = 1;
}
Related work • Formal model • Extract a formal model from SystemC model • Combine with a formal model of the non-deterministic scheduler • => Model checking • State space explosion • Partial order reduction • Dynamic extension is new • Used by model checkers, but no non-abstract uses • Test case generation & output checker • Assertion-based verification is promising
Definitions • SUTD: System Under Test + one test data • Assume: Independent test case generator • Generator always independent of scheduling • Process: event or thread • p, q, r, … • Transition: one execution of a process in a scheduling • a, b, c, … or p1, p2, q1, r1, p3, r2, … • Scheduling • String of transitions & new cycles (delta or te) • Full state • Full memory dump incl. PC of processes • SUTD: Schedulings → Full states
p:  a = x;              // p1
    wait(e);
    printf("%d\n", a);  // p2
    wait(e2);
    a = x * 2;          // p3
q:  e.notify();         // q1
    b = x;
    wait(20, SC_NS);
    x = b * 2;          // q2
Definitions • Permutation • Modify a scheduling • Change the order of a and b • Other transitions may come in-between • Equivalence • Two different schedulings lead to the same full state • e.g. p: a = x; and q: b = x; (two reads of x)
Definitions • Permutability: a and b in a scheduling can be exchanged • An equivalent scheduling exists in which a and b are consecutive • a and b can be exchanged in this equivalent scheduling • Commutativity: which permutations are useful? • Exchanging a and b produces an equivalent scheduling • Non-commutative permutations are interesting
void t1() { … wait(e1); v2 = 2; }
void t2() { … v2 = 1; e1.notify(); wait(); }
void t3() { … printf("%d\n", v1); wait(); }
void t4() { … printf("%d\n", v1 + 1); wait(); }   // variant: ++v1 instead of v1 + 1
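Commutativity as defined above can be checked by brute force: run the two transitions in both orders and compare the resulting states. The sketch below does this for t3 and the two variants of t4. The `State` struct is a hypothetical stand-in for the full state, and treating the printed values as part of that state is an assumption made here so that a changed output counts as a different result.

```cpp
#include <cassert>
#include <functional>

// Hypothetical stand-in for the "full state": the shared variable v1 plus
// the values printed by t3 and t4 (counting output as state is an assumption).
struct State {
    int v1 = 0;
    int printed_t3 = -1;
    int printed_t4 = -1;
    bool operator==(const State& o) const {
        return v1 == o.v1 && printed_t3 == o.printed_t3 && printed_t4 == o.printed_t4;
    }
};

using Transition = std::function<void(State&)>;

// a and b commute from state s if both execution orders reach the same state.
bool commute(const Transition& a, const Transition& b, const State& s) {
    State ab = s, ba = s;
    a(ab); b(ab);   // a, then b
    b(ba); a(ba);   // b, then a
    return ab == ba;
}

void t3(State& s)        { s.printed_t3 = s.v1; }       // printf("%d\n", v1)
void t4_read(State& s)   { s.printed_t4 = s.v1 + 1; }   // printf("%d\n", v1 + 1)
void t4_preinc(State& s) { s.printed_t4 = ++s.v1; }     // printf("%d\n", ++v1)
```

With these definitions, `commute(t3, t4_read, State{})` holds because both transitions only read `v1`, while `commute(t3, t4_preinc, State{})` does not, because t3 prints a different value depending on whether the increment has already happened.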
Definitions • Dependency • Boolean: permutable' + permutable·commutative' (' = NOT, + = OR, · = AND) • a must come before b, otherwise the scheduling (1) is impossible or (2) produces a different result • Causal order: Permutable transitions wrt dependency • Equivalent schedulings have the same causal order
void t1() { … wait(e1); v2 = 2; }
void t2() { … v2 = 1; e1.notify(); wait(); }
void t3() { … printf("%d\n", v1); wait(); }
void t4() { … printf("%d\n", ++v1); wait(); }
Algorithms: Computing commutativity • Shared variables • Read, then modifying write • Modifying write, then read • Write, then modifying write • Events • Notification, then wait • Wait, then notification • Caught notification, then notification • Non-commutative actions • All other actions do not harm commutativity
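The six patterns above lend themselves to a simple pairwise check on recorded accesses. The following is a minimal sketch, not the tool's actual data model: the `Action` enum and `Access` struct are hypothetical, "write" in the third pattern is read as "any write", and each pattern is matched literally in the order given on the slide.

```cpp
#include <cassert>
#include <string>

// Kinds of accesses a transition can perform on a shared variable or an event.
enum class Action { Read, Write, ModifyingWrite, Wait, Notify, CaughtNotify };

struct Access {
    std::string target;  // name of the shared variable or event
    Action action;
};

bool is_write(Action a) {
    return a == Action::Write || a == Action::ModifyingWrite;
}

// True if `first` followed by `second` matches one of the six listed patterns.
// Accesses to different targets never conflict.
bool non_commutative(const Access& first, const Access& second) {
    if (first.target != second.target) return false;
    const Action f = first.action, s = second.action;
    return (f == Action::Read && s == Action::ModifyingWrite)      // read, then modifying write
        || (f == Action::ModifyingWrite && s == Action::Read)      // modifying write, then read
        || (is_write(f) && s == Action::ModifyingWrite)            // write, then modifying write
        || (f == Action::Notify && s == Action::Wait)              // notification, then wait
        || (f == Action::Wait && s == Action::Notify)              // wait, then notification
        || (f == Action::CaughtNotify && s == Action::Notify);     // caught notification, then notification
}
```

Two transitions would then be treated as non-commutative if any pair of their recorded accesses matches; everything else, such as two reads of the same variable, leaves commutativity intact.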
Algorithms: Causal Partial Order • Computed step-by-step • Start with empty scheduling • Choose candidates a, b; where • a or b are new cycles (delta or te) • a and b from the same process • b is woken up by a • Extend CPO set • Add (a, b) • Add non-commutative transitions of b • Compute & add transitive closure of calculated relations
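The last step, adding a pair and recomputing the transitive closure, can be sketched with a boolean matrix and Warshall's algorithm. This is an illustration under assumptions: transitions are numbered 0..n-1, `Relation`, `empty_relation` and `extend_cpo` are hypothetical names, and the closure is recomputed from scratch after every insertion rather than incrementally.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// r[a][b] == true means "a is causally ordered before b".
using Relation = std::vector<std::vector<bool>>;

Relation empty_relation(std::size_t n) {
    return Relation(n, std::vector<bool>(n, false));
}

// Warshall's algorithm: whenever i -> k and k -> j hold, add i -> j.
void transitive_closure(Relation& r) {
    const std::size_t n = r.size();
    for (std::size_t k = 0; k < n; ++k)
        for (std::size_t i = 0; i < n; ++i)
            if (r[i][k])
                for (std::size_t j = 0; j < n; ++j)
                    if (r[k][j]) r[i][j] = true;
}

// Extend the causal partial order with (a, b) and close it again.
void extend_cpo(Relation& cpo, std::size_t a, std::size_t b) {
    assert(a < cpo.size() && b < cpo.size());
    cpo[a][b] = true;
    transitive_closure(cpo);
}
```

Starting from `empty_relation(4)`, adding (0, 1) and then (1, 2) also yields (0, 2), while unrelated pairs such as (0, 3) stay unordered.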
Algorithms: Generating schedulings • Generating one alternative scheduling • Choose two non-commutative transitions: a and b • Execute the scheduling until a • Execute additional transitions not causally ordered to a • Execute b, then a • Execute the rest • Generating all schedulings
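The steps for one alternative scheduling can be sketched as a reordering of a recorded scheduling. Several things here are assumptions rather than the authors' implementation: transitions are plain indices, `before` is an already closed causal-order matrix, and the tail is appended in its original order purely for illustration. In a real run the remainder is whatever the simulator can still execute after b and a have been swapped.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// before[x][y] == true means "x is causally ordered before y".
using Relation = std::vector<std::vector<bool>>;
using Scheduling = std::vector<std::size_t>;  // transition ids in execution order

// Build a scheduling in which b = sched[ib] runs before a = sched[ia].
Scheduling alternative_scheduling(const Scheduling& sched,
                                  std::size_t ia, std::size_t ib,
                                  const Relation& before) {
    assert(ia < ib && ib < sched.size());
    const std::size_t a = sched[ia], b = sched[ib];
    assert(!before[a][b]);  // a and b must be permutable

    // 1. Execute the scheduling until a.
    Scheduling out(sched.begin(), sched.begin() + static_cast<std::ptrdiff_t>(ia));

    // 2. Execute the transitions between a and b that are not causally after a;
    //    the ones that depend on a have to wait until a has run.
    Scheduling deferred;
    for (std::size_t i = ia + 1; i < ib; ++i) {
        const std::size_t t = sched[i];
        (before[a][t] ? deferred : out).push_back(t);
    }

    // 3. Execute b, then a.
    out.push_back(b);
    out.push_back(a);

    // 4. Execute the rest (original order kept here only as an illustration).
    out.insert(out.end(), deferred.begin(), deferred.end());
    out.insert(out.end(), sched.begin() + static_cast<std::ptrdiff_t>(ib) + 1, sched.end());
    return out;
}
```

For the scheduling 0, 1, 2, 3 with a = 0 and b = 2, an empty causal order gives 1, 2, 0, 3, whereas ordering 0 before 1 gives 2, 0, 1, 3.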
Evaluation: prototype • Model and kernel instrumented • Checker • Get the scheduling, generate new one, feed it to patched kernel • Until no more schedulings available
Evaluation: experiments • Is the overhead of calculating schedulings worth it? • V·T vs. G·T + O • 3 examples • Indexer • Small, V calculable • MPEG Decoder • 50 KLOC, 4 processes • Full SoC • 250 KLOC, 57 processes • Indexer • 128-element array for hash table • n components, each with 2 threads, each writing 4 elements • G << V • n = 2, V = 3.35e11; n = 3, V = 2.43e25
Evaluation: experiments • MPEG decoder • Overhead is insignificant • G·T = 50 s, O = 18 s • Special structures in code not recognized • Persistent events • Complete SoC • Scalability • Not tested fully because of manual instrumentation • Expectation: OK up to 200 transitions • Observation: more detailed models produce more constrained schedulings => longer schedulings testable
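A short worked comparison may help put the figures in perspective. It reads the slide's symbols as V = number of valid schedulings, G = number of generated schedulings, T = time of one simulation run and O = overhead of computing the schedulings; that reading is an interpretation, and the run time of 1 ms used for the Indexer is a purely hypothetical value chosen for illustration.

```latex
% Assumed reading: V = valid schedulings, G = generated schedulings,
% T = time of one simulation run, O = overhead of computing schedulings.
\begin{align*}
\text{generation pays off when}\quad & G \cdot T + O < V \cdot T \\
\text{MPEG decoder:}\quad & G \cdot T + O = 50\,\text{s} + 18\,\text{s} = 68\,\text{s} \\
\text{Indexer, } n = 2,\ T = 1\,\text{ms (assumed):}\quad
  & V \cdot T = 3.35 \times 10^{11} \times 10^{-3}\,\text{s}
    = 3.35 \times 10^{8}\,\text{s} \approx 10.6\ \text{years}
\end{align*}
```

Even with such a short assumed run time, exhaustively running all valid schedulings of the smallest Indexer configuration would be out of reach, which is why G << V matters.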
Scoot • Helmstetter et al. explore all schedulings • Too much time is spent • Let’s go in the opposite direction • Make SystemC less flexible to get it faster • Blanc et al.
Introduction • Faster execution (up to 5 times!) • Use verification back-ends • CBMC, SATABS • => Get a plain C++ model from SystemC • => Use C++ frontend to support more language constructs
Related work • Work on HW synthesis via model extraction • Kostaras & Vergos and Castillo et al • Only for small subset of C++ • Savoiu et al • Speedup via Petri-net reductions • Only 1.5 times • Pérez et al • Static scheduling • Only event processes considered
Idea • SystemC is very flexible • Dynamic run-time binding of ports • Via polymorphism • Sensitivity lists • Module hierarchy • => Consolidate hierarchy • Scheduler’s inefficiencies • Run-time memory allocations • Processes triggered via function pointers • => Convert to static schedule
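The difference between dispatching processes through function pointers and a static schedule can be illustrated in plain C++. This sketch is not Scoot's generated code: the `Signals` struct, the two toy processes and both run functions are hypothetical, and the point is only that a fixed call sequence computes the same result while avoiding the run-time process table and the indirect calls.

```cpp
#include <cassert>
#include <vector>

struct Signals { int a = 0; int b = 0; };

// Two toy processes standing in for SystemC method processes.
void proc_a(Signals& s) { s.a += 1; }
void proc_b(Signals& s) { s.b += s.a; }

// Dynamic style: a process table built at run time, called via function pointers.
Signals run_dynamic(int cycles) {
    assert(cycles >= 0);
    std::vector<void (*)(Signals&)> runnable = {proc_a, proc_b};
    Signals s;
    for (int c = 0; c < cycles; ++c)
        for (auto process : runnable) process(s);
    return s;
}

// Static style: the same order fixed at compile time as direct calls,
// which the compiler is free to inline.
Signals run_static(int cycles) {
    assert(cycles >= 0);
    Signals s;
    for (int c = 0; c < cycles; ++c) {
        proc_a(s);
        proc_b(s);
    }
    return s;
}
```

Both functions return the same `Signals` for the same number of cycles, for example `a == 3` and `b == 6` after three cycles.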
Evaluation • AES encryption/decryption • Speedup achieved up to 5.3 times
Conclusion • Helmstetter et al. • Eliminated the effect of scheduling • At a reasonable overhead • Problems with scalability • Scoot • Significant speedup achieved • Most structures of C++ supported • Preparation for model checking • Further discussion • Why equivalence classes among schedulings? Shouldn’t all schedulings produce the same result? • Why not use Helmstetter’s algorithm for regular software to catch races? • Uses for Scoot?
SystemC Primer • A system-level design language • Used for HW/SW codesign • Based on native C++ • Different abstraction levels: TLM to RTL
SC_MODULE(nand2) {
    sc_in<bool> A, B;
    sc_out<bool> F;
    void do_nand2() {  // a C++ function
        F.write(!(A.read() && B.read()));
    }
    SC_CTOR(nand2) {
        SC_METHOD(do_nand2);
        sensitive << A << B;
    }
};
SystemC Primer: Concepts • Modules • Containers for other SystemC elements incl. modules • Channels • Communication means of modules • Ports • Connection point of modules to channels • Interfaces • Connection point of channels to modules • Method processes • Non-blocking code parts triggered on events they’re sensitive to • Thread processes • Independent flows of executions • May call wait • Events • Basic means of synchronization • Shared variables • Same as C++
SystemC Primer: Scheduler • Non-deterministic specification: unspecified order • Non-preemptive • Delta cycles used to imitate concurrency • True parallelism on the real system
Properties of relations • Reflexivity: • aRa • Symmetry • aRb => bRa • e.g. “is equal to” • Totality • aRb or bRa • Transitivity • aRb and bRc => aRc • e.g. “is ancestor of” • Transitive closure • e.g. all ancestors in a community
References
Blanc, N., Kroening, D. and Sharygina, N., 2008, “Scoot: A Tool for the Analysis of SystemC Models”, TACAS, 2008, 467-470.
Helmstetter, C., Maraninchi, F., Maillet-Contoz, L. and Moy, M., 2006, “Automatic Generation of Schedulings for Improving the Test Coverage of Systems-on-a-Chip”, Verimag Research Report, TR-2006-06.
Helmstetter, C., Maraninchi, F. and Maillet-Contoz, L., 2007, “Test Coverage for Loose Timing Annotations”, Formal Methods: Applications and Technology, 4346/2007, 100-115.