
Optimizing SystemC for higher speed and coverage

Dogan Fennibay


Presentation Transcript


  1. Optimizing SystemC for higher speed and coverage • Dogan Fennibay

  2. Why? • SystemC is becoming the de facto system-level design language • SystemC emulates parallelism via scheduling • Scheduling is an additional element affecting the result • Make up for this hole in coverage • We want faster simulations • To do more executions / cheaper executions • SystemC's flexibility comes at the cost of speed

  3. Outline • Automatic generation of schedulings for higher coverage: Introduction, Related work, Definitions, Algorithms, Evaluation • Scoot: Introduction, Related work, Idea, Evaluation • Conclusion

  4. SystemC • Do you know SystemC? • No • Yes

  5. Introduction • 3 different schedulings => 3 different results (te: time elapse) • a; b; a; te; b; a => "Ok" • a; b; a; te; a; b => "Ko" • b; a; te; b => deadlock (lost notification) • Adding process C gives 30 different schedulings • => still the same 3 different results • => schedulings fall into equivalence classes

      // e: sc_event, x: shared variable, both members of module top
      void top::A() {
          wait(e);
          wait(20, SC_NS);
          if (x) cout << "Ok\n";
          else   cout << "Ko\n";
      }

      void top::B() {
          e.notify();        // immediate notification: lost if A is not yet waiting on e
          x = 0;
          wait(20, SC_NS);
          x = 1;
      }

      void top::C() {
          sc_time T(20, SC_NS);
          wait(T);
      }
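
To make the three outcomes concrete, the following is a minimal plain-C++ replay of processes A and B (C is left out). It is a sketch under stated assumptions, not the SystemC kernel: the names `Proc` and `replay` are hypothetical, and the model mimics only what this example needs, namely immediate notification (lost when nobody waits) and timed waits released by a "te" step.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical plain-C++ model of processes A and B; not the SystemC kernel.
struct Proc {
    int pc = 0;               // index of the next transition to execute
    bool runnable = true;     // both processes start runnable
    bool waitsEvent = false;  // blocked in wait(e)
    bool waitsTime = false;   // blocked in wait(20, SC_NS)
    bool done = false;
};

// Replays a scheduling such as {"a", "b", "a", "te", "b", "a"} and returns what A
// prints ("Ok"/"Ko"), "deadlock" if A is stuck in wait(e) after B finished,
// "invalid" if a step names a process that cannot run, else "incomplete".
std::string replay(const std::vector<std::string>& sched) {
    Proc A, B;
    bool x = false;
    std::string out;
    for (const std::string& step : sched) {
        if (step == "te") {  // time elapses: release every timed wait
            for (Proc* p : {&A, &B})
                if (p->waitsTime) { p->waitsTime = false; p->runnable = true; }
        } else if (step == "a" && A.runnable) {
            A.runnable = false;
            if (A.pc == 0) A.waitsEvent = true;             // wait(e)
            else if (A.pc == 1) A.waitsTime = true;         // wait(20, SC_NS)
            else { out = x ? "Ok" : "Ko"; A.done = true; }  // print, then end
            ++A.pc;
        } else if (step == "b" && B.runnable) {
            B.runnable = false;
            if (B.pc == 0) {
                if (A.waitsEvent) { A.waitsEvent = false; A.runnable = true; }  // e.notify()
                x = false;           // x = 0
                B.waitsTime = true;  // wait(20, SC_NS)
            } else {
                x = true;            // x = 1
                B.done = true;
            }
            ++B.pc;
        } else {
            return "invalid";
        }
    }
    if (!out.empty()) return out;
    if (A.waitsEvent && B.done) return "deadlock";  // the notification was lost
    return "incomplete";
}
```

For example, `replay({"b", "a", "te", "b"})` reproduces the lost-notification deadlock: B notifies e before A has reached `wait(e)`.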

  6. Introduction • Scheduling also affects the results, not just the input data • We have to test all possible schedulings • Impossible schedulings: do not test them • Due to synchronization constraints • Equivalent schedulings: test only one • e.g. two reads from a shared variable • Focus is on scheduling • Input data generation is not considered • (Diagram labels: if (x), x = 1, e.notify())

  7. Introduction • Dynamic Dependency Graph of processes P and Q (graph not reproduced; edge colors: green = non-permutable, red = non-commutative)

      void top::P() {
          wait(e);
          wait(20, SC_NS);
          if (x) cout << "Ok\n";
          else   cout << "Ko\n";
      }

      void top::Q() {
          e.notify();
          x = 0;
          wait(20, SC_NS);
          x = 1;
      }

  8. Related work • Formal model • Extract a formal model from the SystemC model • Combine it with a formal model of the non-deterministic scheduler • => Model checking • State-space explosion • Partial order reduction • The dynamic extension is new • Used by model checkers, but not on non-abstract models • Test case generation & output checker • Assertion-based verification is promising

  9. Definitions • SUTD: System Under Test plus one set of test data • Assumption: an independent test case generator, which is always independent of the scheduling • Process: event or thread • p, q, r, … • Transition: one execution of a process in a scheduling • a, b, c, … or p1, p2, q1, r1, p3, r2, … • Scheduling • String of transitions and new cycles (delta or te) • Full state • Full memory dump, including the PC of each process • An SUTD maps schedulings to full states (SUTD: Schedulings → Full states)

      // process p
      a = x; wait(e);                       // p1
      printf("%d\n", a); wait(e2);          // p2
      a = x * 2;                            // p3

      // process q
      e.notify(); b = x; wait(20, SC_NS);   // q1
      x = b * 2;                            // q2

  10. Definitions • Permutation • Modifies a scheduling by changing the order of a and b • Other transitions may come in between • Equivalence • Two different schedulings are equivalent if they lead to the same full state • Example: p: a = x and q: b = x
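
The equivalence of the two reads can be checked directly: run both orders from the same initial full state and compare. The following plain-C++ sketch is illustrative only; `FullState`, `runOrder` and the extra writer `w` (added for contrast) are hypothetical names not taken from the slides.

```cpp
#include <cassert>

// Hypothetical full state for the example: shared x plus the locals
// a (of process p) and b (of process q).
struct FullState {
    int x, a, b;
    bool operator==(const FullState& o) const { return x == o.x && a == o.a && b == o.b; }
};

void p(FullState& s) { s.a = s.x; }      // p: a = x
void q(FullState& s) { s.b = s.x; }      // q: b = x
void w(FullState& s) { s.x = s.x + 1; }  // hypothetical writer, for contrast

// Executes two transitions in the given order, starting from state s.
FullState runOrder(void (*first)(FullState&), void (*second)(FullState&), FullState s) {
    first(s);
    second(s);
    return s;
}
```

Starting from x = 7, p;q and q;p reach the same full state, so the two schedulings are equivalent, while p;w and w;p leave different values in a.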

  11. Definitions • Permutability: a and b can be exchanged in a scheduling, i.e. there is an equivalent scheduling in which a and b are consecutive, and they can be exchanged in it • Commutativity: which permutations are useful? • a and b commute if exchanging them produces an equivalent scheduling • Non-commutative permutations are the interesting ones

      void t1() { … wait(e1); v2 = 2; }
      void t2() { … v2 = 1; e1.notify(); wait(); }
      void t3() { … printf("%d\n", v1); wait(); }
      void t4() { … printf("%d\n", v1 + 1); wait(); }

  (Slide overlay: t4's v1 + 1 becomes ++v1, as on the next slide)
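
A small plain-C++ sketch of why t3 and t4 commute while t4 only reads v1, but stop commuting once it prints ++v1. To sidestep the interleaving of the two printf outputs, each transition here records the value it would print; `State`, `t4Read`, `t4Incr` and `commutes` are hypothetical names.

```cpp
#include <cassert>

// Hypothetical model of t3 and t4: each transition records the value it would print.
struct State {
    int v1;
    int out3;  // value printed by t3
    int out4;  // value printed by t4
    bool operator==(const State& o) const {
        return v1 == o.v1 && out3 == o.out3 && out4 == o.out4;
    }
};

void t3(State& s) { s.out3 = s.v1; }          // printf("%d\n", v1)
void t4Read(State& s) { s.out4 = s.v1 + 1; }  // printf("%d\n", v1 + 1): only reads v1
void t4Incr(State& s) { s.out4 = ++s.v1; }    // printf("%d\n", ++v1): also writes v1

// Runs t3 and the chosen t4 variant in both orders from the same initial v1 and
// reports whether both orders reach the same state (i.e. the pair commutes).
bool commutes(void (*t4)(State&), int v1) {
    State s{v1, 0, 0}, r{v1, 0, 0};
    t3(s); t4(s);  // t3 before t4
    t4(r); t3(r);  // t4 before t3
    return s == r;
}
```

With the read-only variant both orders agree; with `++v1`, t3 prints a different value depending on whether t4 ran first.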

  12. Definitions • Dependency • Boolean: $\mathit{dep}(a,b) = \lnot\mathit{perm}(a,b) \lor \bigl(\mathit{perm}(a,b) \land \lnot\mathit{comm}(a,b)\bigr)$ • a must come before b because otherwise (1) the scheduling is impossible or (2) a different result is produced • Causal order: the ordering of transitions with respect to dependency (transitions it leaves unordered are permutable) • Equivalent schedulings have the same causal order

      void t1() { … wait(e1); v2 = 2; }
      void t2() { … v2 = 1; e1.notify(); wait(); }
      void t3() { … printf("%d\n", v1); wait(); }
      void t4() { … printf("%d\n", ++v1); wait(); }
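
The Boolean dependency formula on slide 12 (written there as permutable' + permutable.commutative', with ' for negation, + for OR and . for AND) simplifies by distributing OR over AND. Writing perm and comm for the permutability and commutativity of a pair (a, b):

```latex
\begin{aligned}
\mathit{dep}(a,b) &= \lnot\mathit{perm}(a,b) \lor \bigl(\mathit{perm}(a,b) \land \lnot\mathit{comm}(a,b)\bigr)\\
&= \bigl(\lnot\mathit{perm}(a,b) \lor \mathit{perm}(a,b)\bigr) \land \bigl(\lnot\mathit{perm}(a,b) \lor \lnot\mathit{comm}(a,b)\bigr)\\
&= \lnot\mathit{perm}(a,b) \lor \lnot\mathit{comm}(a,b)\\
&= \lnot\bigl(\mathit{perm}(a,b) \land \mathit{comm}(a,b)\bigr)
\end{aligned}
```

So a pair is independent exactly when it is both permutable and commutative; t3 and t4 with `++v1` are permutable but not commutative, hence dependent.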

  13. Algorithms: Computing commutativity • Non-commutative action pairs (first action, then second, on the same variable or event): • Shared variables: read, then modifying write • modifying write, then read • write, then modifying write • Events: notification, then wait • wait, then notification • caught notification, then notification • All other action pairs do not harm commutativity
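
The table of non-commutative pairs translates into a simple pairwise check. This is a sketch under assumptions: the `Kind`/`Action` encoding is hypothetical, variables and events share one name space for brevity, and "notification" is taken to include caught notifications.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical action encoding following the slide's categories.
enum class Kind { Read, Write, ModifyingWrite, Notify, CaughtNotify, Wait };

struct Action {
    Kind kind;
    std::string target;  // shared variable or event name (one name space here)
};

using Transition = std::vector<Action>;

static bool isWrite(Kind k) { return k == Kind::Write || k == Kind::ModifyingWrite; }
static bool isNotify(Kind k) { return k == Kind::Notify || k == Kind::CaughtNotify; }

// True if action x, executed before action y, forms one of the listed
// non-commutative pairs.
bool nonCommutative(const Action& x, const Action& y) {
    if (x.target != y.target) return false;
    Kind a = x.kind, b = y.kind;
    // Shared variables
    if (a == Kind::Read && b == Kind::ModifyingWrite) return true;
    if (a == Kind::ModifyingWrite && b == Kind::Read) return true;
    if (isWrite(a) && b == Kind::ModifyingWrite) return true;
    // Events
    if (isNotify(a) && b == Kind::Wait) return true;
    if (a == Kind::Wait && isNotify(b)) return true;
    if (a == Kind::CaughtNotify && isNotify(b)) return true;
    return false;  // all other pairs do not harm commutativity
}

// Transition t (earlier) and u (later) commute unless some pair of their
// actions is non-commutative.
bool commute(const Transition& t, const Transition& u) {
    for (const Action& x : t)
        for (const Action& y : u)
            if (nonCommutative(x, y)) return false;
    return true;
}
```

With this encoding, t3 (a read of v1) commutes with a t4 that only reads v1, but not with one that reads and then modifies v1 via `++v1`.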

  14. Algorithms: Causal Partial Order (CPO) • Computed step by step, starting with an empty scheduling • Choose candidate pairs (a, b), a before b, where • a or b is a new cycle (delta or te), or • a and b belong to the same process, or • b is woken up by a • Extend the CPO set • Add (a, b) • Add the pairs formed by b and the earlier transitions that do not commute with it • Compute and add the transitive closure of the calculated relations
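
A minimal sketch of the step-by-step CPO construction, in plain C++ rather than an instrumented kernel. The `Step` record, `causalOrder` and the `nonComm` predicate (which could be built from the commutativity rules of slide 13) are hypothetical. Each new step gets its direct predecessors, then inherits their predecessors, which keeps the relation transitively closed.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical step record: one transition, or a cycle boundary (delta or te).
struct Step {
    int process;    // owning process; ignored for cycle boundaries
    bool newCycle;  // true for a delta/te boundary
    int wokenBy;    // index of the step that woke this one, or -1
};

using Relation = std::vector<std::vector<bool>>;  // r[i][j]: step i causally before step j

// Builds the causal partial order of a scheduling, one step at a time.
// nonComm(i, j) reports whether earlier step i and later step j do not commute.
Relation causalOrder(const std::vector<Step>& s,
                     const std::function<bool(int, int)>& nonComm) {
    const int n = static_cast<int>(s.size());
    Relation r(n, std::vector<bool>(n, false));
    for (int b = 0; b < n; ++b) {
        for (int a = 0; a < b; ++a) {
            if (s[a].newCycle || s[b].newCycle          // cycle boundaries
                || s[a].process == s[b].process         // same process
                || s[b].wokenBy == a                    // b woken up by a
                || nonComm(a, b))                       // non-commutative pair
                r[a][b] = true;
        }
        // Transitive closure for the new step: every predecessor of a direct
        // predecessor of b also precedes b (earlier rows are already closed).
        for (int a = 0; a < b; ++a)
            if (r[a][b])
                for (int c = 0; c < a; ++c)
                    if (r[c][a]) r[c][b] = true;
    }
    return r;
}
```

In a wake-up chain (step 1 woken by step 0, step 2 woken by step 1) the closure also orders step 0 before step 2, while steps of different processes with no wake-up or conflict stay unordered and remain permutable.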

  15. Algorithms: Generating schedulings • Generating one alternative scheduling • Choose two non-commutative transitions a and b • Execute the scheduling until a • Execute the additional transitions that are not causally ordered after a • Execute b, then a • Execute the rest • Generating all schedulings
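
The reordering step can be sketched as a pure function over a recorded scheduling and its CPO. Assumptions: `swapPair` is a hypothetical name, `cpo` is indexed by positions in `sched`, and the pair (a, b) is rejected when b depends on a through another transition. The sketch keeps the remaining steps in their old order, whereas a real re-execution may diverge after the swap.

```cpp
#include <cassert>
#include <vector>

using Relation = std::vector<std::vector<bool>>;  // cpo[i][j]: position i causally before j

// Given a scheduling, its causal partial order and positions i < j of two
// non-commutative transitions a and b, build a scheduling running b before a.
// Returns an empty vector if some t lies between them with a -> t -> b.
std::vector<int> swapPair(const std::vector<int>& sched, const Relation& cpo,
                          int i, int j) {
    std::vector<int> before, after;  // the steps between a and b, split in two
    for (int t = i + 1; t < j; ++t) {
        if (cpo[i][t] && cpo[t][j]) return {};  // b depends on a via t: cannot swap
        (cpo[i][t] ? after : before).push_back(sched[t]);
    }
    std::vector<int> out(sched.begin(), sched.begin() + i);     // execute until a
    out.insert(out.end(), before.begin(), before.end());        // not ordered after a
    out.push_back(sched[j]);                                     // then b
    out.push_back(sched[i]);                                     // then a
    out.insert(out.end(), after.begin(), after.end());          // steps ordered after a
    out.insert(out.end(), sched.begin() + j + 1, sched.end());  // the rest
    return out;
}
```

Keeping the original relative order inside each group preserves every causal constraint except the reversed (a, b) pair: a step placed in `before` cannot depend on one placed in `after`, since it would then be ordered after a by transitivity.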

  16. Evaluation: prototype • Model and kernel instrumented • Checker • Gets the executed scheduling, generates a new one and feeds it to the patched kernel • Repeats until no more schedulings are available
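
The checker loop can be pictured as a worklist over schedulings. In this sketch the patched kernel and the generator are abstracted as hypothetical callbacks (`run` executes the model while forcing a scheduling and returns the one actually observed; `alternatives` derives new schedulings from it); neither interface is taken from the prototype.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <set>
#include <utility>
#include <vector>

using Scheduling = std::vector<int>;

// Hypothetical checker loop: returns the number of simulation runs performed.
int explore(const Scheduling& first,
            const std::function<Scheduling(const Scheduling&)>& run,
            const std::function<std::vector<Scheduling>(const Scheduling&)>& alternatives) {
    std::set<Scheduling> seen;
    std::vector<Scheduling> todo{first};
    int runs = 0;
    while (!todo.empty()) {  // until no more schedulings are available
        Scheduling s = todo.back();
        todo.pop_back();
        if (!seen.insert(s).second) continue;  // already executed
        Scheduling observed = run(s);          // feed it to the (patched) kernel
        seen.insert(observed);
        ++runs;
        for (const Scheduling& alt : alternatives(observed))
            if (!seen.count(alt)) todo.push_back(alt);
    }
    return runs;
}
```

With a stand-in kernel that simply echoes its input and a generator that swaps adjacent steps, the loop visits each of the 6 orders of three steps exactly once; a real generator would only propose non-equivalent schedulings.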

  17. Evaluation: experiments • Is the overhead of calculating schedulings worth it? • V·T vs. G·T + O • 3 examples • Indexer • Small enough that V can be calculated • MPEG decoder • 50 KLOC, 4 processes • Full SoC • 250 KLOC, 57 processes • Indexer • 128-element array used as a hash table • n components, each with 2 threads, each thread writing 4 elements • G << V • n = 2: V = 3.35e11; n = 3: V = 2.43e25
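
Reading V as the number of possible schedulings, G as the number of generated schedulings, T as the time of one simulation run and O as the generation overhead (an interpretation of the slide's symbols, not stated on it), the comparison and the MPEG decoder figures from the next slide work out as follows:

```latex
\begin{aligned}
G\,T + O < V\,T &\iff O < (V - G)\,T\\[4pt]
G\,T = 50\,\mathrm{s},\ O = 18\,\mathrm{s} &\implies G\,T + O = 68\,\mathrm{s},\quad T = \frac{50\,\mathrm{s}}{G}\\
V\,T > 68\,\mathrm{s} &\iff V > \frac{68}{50}\,G = 1.36\,G
\end{aligned}
```

Under this reading, generating schedulings pays off for the MPEG decoder as soon as exhaustive simulation would need more than 1.36 times as many runs as the generated set; for the Indexer, where G << V, the condition is met by a wide margin.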

  18. Evaluation: experiments • MPEG decoder • Overhead is insignificant • G·T = 50 s, O = 18 s • Special structures in code are not recognized • Persistent events • Complete SoC • Scalability • Not fully tested because the instrumentation is manual • Expectation: OK up to 200 transitions • Observation: more detailed models produce more constrained schedulings => longer schedulings are testable

  19. Scoot • Helmstetter et al. explore all schedulings • Too much time spent • Let's go in the opposite direction • Make SystemC less flexible to make it faster • Blanc et al.

  20. Introduction • Faster execution (up to 5 times!) • Use verification back-ends • CBMC, SATABS • => Get a plain C++ model from SystemC • => Use a C++ front-end to support more language constructs

  21. Related work • Work on HW synthesis via model extraction • Kostaras & Vergos and Castillo et al. • Only for a small subset of C++ • Savoiu et al. • Speedup via Petri-net reductions • Only 1.5 times • Pérez et al. • Static scheduling • Only event processes considered

  22. Idea • SystemC is very flexible • Dynamic run-time binding of ports • Via polymorphism • Sensitivity lists • Module hierarchy • => Consolidate hierarchy • Scheduler’s inefficiencies • Run-time memory allocations • Processes triggered via function pointers • => Convert to static schedule
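
The gap between dynamic process dispatch and a static schedule can be illustrated in plain C++. This only sketches the idea, not Scoot's actual transformation: `Model`, `runDynamic` and `runStatic` are hypothetical, and the static version is valid here only because the process order is known in advance.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Two toy "processes" communicating through shared members (hypothetical).
struct Model {
    int a = 0, b = 0;
    void producer() { a += 1; }
    void consumer() { b = a * 2; }
};

// Dynamic style, loosely mimicking a generic kernel: processes are registered at
// run time and invoked through type-erased function objects from a run queue.
void runDynamic(Model& m, int cycles) {
    std::vector<std::function<void()>> runQueue;  // built at run time
    runQueue.push_back([&m] { m.producer(); });
    runQueue.push_back([&m] { m.consumer(); });
    for (int c = 0; c < cycles; ++c)
        for (auto& proc : runQueue) proc();  // one indirect call per activation
}

// Static style: the order is fixed at compile time, so the calls are direct,
// can be inlined, and need no run queue or allocation.
void runStatic(Model& m, int cycles) {
    for (int c = 0; c < cycles; ++c) {
        m.producer();
        m.consumer();
    }
}
```

Both versions compute the same result; the static one simply removes the run-time bookkeeping, which is the kind of overhead the slide attributes to the scheduler.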

  23. Evaluation • AES encryption/decryption • Speedup of up to 5.3 times

  24. Conclusion • Helmstetter et al. • Eliminated the effect of scheduling • At a reasonable overhead • Problems with scalability • Scoot • Significant speedup achieved • Most C++ constructs supported • Preparation for model checking • Further discussion • Why equivalence classes among schedulings? Shouldn't all schedulings produce the same result? • Why not use Helmstetter's algorithm for regular software to catch races? • Uses for Scoot?

  25. Extras

  26. SystemC Primer • A system-level design language • Used for HW/SW codesign • Based on native C++ • Different abstraction levels: TLM to RTL

      #include <systemc.h>

      SC_MODULE(nand2) {
          sc_in<bool> A, B;
          sc_out<bool> F;

          void do_nand2() {  // a C++ function
              F.write(!(A.read() && B.read()));
          }

          SC_CTOR(nand2) {
              SC_METHOD(do_nand2);
              sensitive << A << B;
          }
      };

  27. SystemC Primer: Concepts • Modules • Containers for other SystemC elements, including modules • Channels • Communication means of modules • Ports • Connection points of modules to channels • Interfaces • Connection points of channels to modules • Method processes • Non-blocking code parts triggered by the events they are sensitive to • Thread processes • Independent flows of execution • May call wait • Events • Basic means of synchronization • Shared variables • Same as in C++

  28. SystemC Primer: Scheduler • Non-deterministic specification: the order in which runnable processes execute is unspecified • Non-preemptive • Delta cycles are used to imitate concurrency • True parallelism on the real system
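
Delta cycles imitate concurrency by splitting each cycle into an evaluate phase and an update phase: signal writes (as with `sc_signal`) only become visible after all runnable processes have been evaluated. The plain-C++ toy below (hypothetical `Signal` and `deltaCycle`, not the SystemC kernel) shows that two processes swapping values through such signals get the same result in either evaluation order.

```cpp
#include <cassert>

// Hypothetical two-phase signal: writes go to `next` and become visible only
// in the update phase.
struct Signal {
    int cur = 0, next = 0;
    int read() const { return cur; }
    void write(int v) { next = v; }
    bool update() { bool changed = cur != next; cur = next; return changed; }
};

// One delta cycle: evaluate both processes in the given order, then update.
void deltaCycle(Signal& x, Signal& y, bool xFirst) {
    auto p1 = [&] { x.write(y.read()); };  // x <= y
    auto p2 = [&] { y.write(x.read()); };  // y <= x
    if (xFirst) { p1(); p2(); } else { p2(); p1(); }
    x.update();
    y.update();
}
```

Plain shared variables, like `x` on slide 5, have no such update phase, which is why their results can depend on the scheduling.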

  29. Properties of relations • Reflexivity • aRa • Symmetry • aRb => bRa • e.g. "is equal to" • Totality • aRb or bRa • Transitivity • aRb and bRc => aRc • e.g. "is ancestor of" • Transitive closure • Smallest transitive relation containing R • e.g. "is ancestor of" is the transitive closure of "is parent of"
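
These properties are easy to test on a relation stored as a Boolean matrix, and the transitive closure used for the causal partial order can be computed with Warshall's algorithm. The helper names below are hypothetical; this is an illustrative sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Relation = std::vector<std::vector<bool>>;  // r[a][b] means aRb

bool reflexive(const Relation& r) {
    for (std::size_t a = 0; a < r.size(); ++a)
        if (!r[a][a]) return false;
    return true;
}

bool symmetric(const Relation& r) {
    for (std::size_t a = 0; a < r.size(); ++a)
        for (std::size_t b = 0; b < r.size(); ++b)
            if (r[a][b] != r[b][a]) return false;
    return true;
}

bool total(const Relation& r) {
    for (std::size_t a = 0; a < r.size(); ++a)
        for (std::size_t b = 0; b < r.size(); ++b)
            if (!r[a][b] && !r[b][a]) return false;
    return true;
}

bool transitive(const Relation& r) {
    for (std::size_t a = 0; a < r.size(); ++a)
        for (std::size_t b = 0; b < r.size(); ++b)
            for (std::size_t c = 0; c < r.size(); ++c)
                if (r[a][b] && r[b][c] && !r[a][c]) return false;
    return true;
}

// Warshall's algorithm: the smallest transitive relation containing r.
Relation transitiveClosure(Relation r) {
    const std::size_t n = r.size();
    for (std::size_t k = 0; k < n; ++k)
        for (std::size_t i = 0; i < n; ++i)
            if (r[i][k])
                for (std::size_t j = 0; j < n; ++j)
                    if (r[k][j]) r[i][j] = true;
    return r;
}
```

For three generations where 0 is parent of 1 and 1 is parent of 2, "is parent of" is not transitive, and its closure adds 0R2, giving "is ancestor of".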

  30. References
      Blanc, N., Kroening, D. and Sharygina, N., 2008, "Scoot: A Tool for the Analysis of SystemC Models", TACAS 2008, 467-470.
      Helmstetter, C., Maraninchi, F., Maillet-Contoz, L. and Moy, M., 2006, "Automatic Generation of Schedulings for Improving the Test Coverage of Systems-on-a-Chip", Verimag Research Report TR-2006-06.
      Helmstetter, C., Maraninchi, F. and Maillet-Contoz, L., 2007, "Test Coverage for Loose Timing Annotations", Formal Methods: Applications and Technology, 4346/2007, 100-115.
