Towards A C++ based Design Methodology Facilitating Sequential Equivalence Checking


Presentation Transcript


  1. Towards A C++ based Design Methodology Facilitating Sequential Equivalence Checking Venkat Krishnaswamy, Calypto Design Systems, Inc. & Philippe Georgelin, STMicroelectronics

  2. Methodology Goals • Enable EDA tools to unambiguously infer hardware intent • Maintain the performance and productivity benefits of C/C++ coding • Maximize code reuse across models

  3. Talk Outline • C++ Modeling • Achieving Code Reuse • Modeling Hardware Intent • Target Model • Experimental Results • Conclusions and Future Work

  4. Hardware Modeling in C/C++ • Many applications for C/C++ models • Algorithm exploration • System prototyping • Architecture partitioning • Performance tuning • RTL verification • SW development • Simulation speed and functional accuracy primary concerns • Tools and methods around these models largely home-grown. Code reuse across models improves productivity and reduces the chance of model functionality diverging

  5. Achieving Code Reuse [Figure: models split into Communication and Computation layers, shown at cycle accurate and transactional levels] • Module Computation and inter-module Communication are distinct tasks • Separating computation from communication is necessary for reuse • computational code can be reused from model to model • communication detail changes depending on the application of a specific model • The level of detail in inter-module communication is a first order determinant of overall event-simulation speed • events are generated in communication • the lower the level of communication detail, the more events generated

  6. Computational Model Terminology • Slave mode • no attempt at communication from within the body of computational code • complete execution in zero time • no explicit parallelism • communication detail can be specified in wrappers • Master mode • may communicate from within the computational code • communication is at an API level • lends itself to TLM styles • implementation of communication APIs determines the level of communication detail
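As an illustration of the two modes, a minimal sketch (the fifo_if interface and the function names below are hypothetical, not part of the presented methodology):

    // Slave mode: pure computation, no communication, completes in zero time.
    int accumulate(const int data[4]) {
      int sum = 0;
      for (int i = 0; i < 4; ++i)
        sum += data[i];
      return sum;
    }

    // Master mode: communication happens inside the computation through an
    // API; fifo_if is a hypothetical stand-in for a TLM-style channel.
    struct fifo_if {
      virtual ~fifo_if() {}
      virtual int  get()          = 0;   // blocking read
      virtual void put(int val)   = 0;   // blocking write
    };

    void accumulate_master(fifo_if& in, fifo_if& out) {
      int sum = 0;
      for (int i = 0; i < 4; ++i)
        sum += in.get();                 // communication from within the code
      out.put(sum);
    }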

  7. Implementing Slave Mode Computation • Two methods to model computation in slave mode • Classes: enable multiple instantiations with low effort • Globally scoped functions: model state with statics

Class-based version:

    #include <systemc.h>   // for sc_int
    typedef sc_int<16> int16;

    class fir_filter {
    public:
      fir_filter() {}
      virtual ~fir_filter() {}
      int16 run(int16 in, int16 coeffs[8]) {
        for (int i = 7; i > 0; --i)
          regs[i] = regs[i - 1];
        regs[0] = in;
        int tmp = 0;
        for (int i = 0; i < 8; ++i)
          tmp += coeffs[i] * regs[i];
        return (tmp >> 16);
      }
    private:
      int16 regs[8];
    };

Function-based version:

    typedef sc_int<16> int16;

    void fir_filter(int16 in, int16 coeffs[8], int16& out) {
      static int16 regs[8];
      for (int i = 7; i > 0; --i)
        regs[i] = regs[i - 1];
      regs[0] = in;
      int tmp = 0;
      for (int i = 0; i < 8; ++i)
        tmp += coeffs[i] * regs[i];
      out = tmp >> 16;
    }

  8. Coding Computational Master Models • TLM techniques should be used • considerable work in literature on defining and implementing TLM APIs • OSCI TLM working group - tlm-group@systemc.org • TLM methodologies well suited to scale across levels of communication detail • PV (Programmer's View) • PVT (Programmer's View with Timing) • CC (Cycle Callable)
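Purely as an illustration of the PV style, a read/write transaction API might look like the sketch below; pv_if and copy_block are assumed names, and this is not the OSCI TLM API:

    // Hypothetical PV-level (untimed) transaction interface.
    struct pv_if {
      virtual ~pv_if() {}
      virtual int  read(unsigned addr)            = 0;
      virtual void write(unsigned addr, int data) = 0;
    };

    // A master-mode model issues transactions through the interface; moving
    // from PV to PVT or CC refines the implementation behind pv_if (adding
    // timing annotations or cycle accuracy) without changing this caller.
    void copy_block(pv_if& bus, unsigned src, unsigned dst, unsigned n) {
      for (unsigned i = 0; i < n; ++i)
        bus.write(dst + i, bus.read(src + i));
    }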

  9. Modeling Communication • Wrappers can be written around slave mode functions to implement communication • level of detail in wrappers adjusted to the purpose for which the model is intended • SystemC provides excellent facilities with which to implement wrappers at different levels of detail • Intended use dictates the level of communication detail • Architecture exploration: untimed with function interfaces • SW prototyping: coarsely timed with API level interfaces • RTL verification: detailed timing with pin level interfaces • For models coded using master mode, TLM techniques should be used • considerable work in literature on defining and implementing TLM APIs • Ghenassia et al.
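For example, a coarsely timed, transaction-level SystemC wrapper around the slave-mode fir_filter class from slide 7 might look like this sketch (the sc_fifo-based interface, port names, and coefficient values are assumptions for illustration):

    #include <systemc.h>
    // Assumes the fir_filter class from slide 7 is declared above.

    SC_MODULE(fir_tlm_wrapper) {
      sc_fifo_in<sc_int<16> >  in;      // transaction-level input
      sc_fifo_out<sc_int<16> > out;     // transaction-level output

      fir_filter filt;                  // slave-mode computation, untouched
      sc_int<16> coeffs[8];

      void run() {
        for (;;) {
          sc_int<16> sample = in.read();        // blocks until data arrives
          out.write(filt.run(sample, coeffs));  // zero-time computation
        }
      }

      SC_CTOR(fir_tlm_wrapper) {
        for (int i = 0; i < 8; ++i) coeffs[i] = 1;  // placeholder coefficients
        SC_THREAD(run);
      }
    };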

  10. Using Wrappers to Model Communication [Figure: a computational slave model inside a wrapper, with pin-level ports such as In_A, In_B, Out_C, Out_D of various widths and readData/writeData methods] • Wrappers for C models • Provide an "exo-skeleton" • Clock, reset, and other timing related ports as required • Pin level I/O which can be mapped to the RTL model • Intent is to avoid touching computation code • C code can be modified in isolation from wrappers • Wrappers are written for Throughput=1, Latency=1
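A pin-level exo-skeleton along these lines might look like the following sketch, again assuming the fir_filter class from slide 7; port names, widths, and reset polarity are illustrative rather than taken from the case study:

    #include <systemc.h>
    // Assumes the fir_filter class from slide 7 is declared above.

    SC_MODULE(fir_rtl_wrapper) {
      sc_in<bool>         clk;          // clock added by the wrapper
      sc_in<bool>         rst;          // synchronous, active-high reset
      sc_in<sc_int<16> >  din;          // pin-level input, matches RTL port
      sc_out<sc_int<16> > dout;         // pin-level output, matches RTL port

      fir_filter filt;                  // untouched slave-mode computation
      sc_int<16> coeffs[8];

      // Throughput = 1, Latency = 1: one input consumed and one output
      // produced on every rising clock edge.
      void tick() {
        if (rst.read())
          dout.write(0);
        else
          dout.write(filt.run(din.read(), coeffs));
      }

      SC_CTOR(fir_rtl_wrapper) {
        for (int i = 0; i < 8; ++i) coeffs[i] = 1;  // placeholder values
        SC_METHOD(tick);
        sensitive << clk.pos();
        dont_initialize();
      }
    };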

  11. Enabling a System to RTL design flow [Figure: refinement flow from functional, to transactional, to cycle accurate models (each split into Communication and Computation), down to RTL] • EDA tools must be able to infer hardware intent from C/C++ models • Sequential Equivalence Checkers (SEC) • High level synthesis (HLS) • Static analysis • SEC reasons about equivalence of hardware models and C/C++ models • Model compilation • Hardware intent extraction • Static reasoning

  12. Hardware Intent in C++ • It is necessary to define a set of rules for writing computational code • SEC statically creates an abstract HW model • Extraction of efficient HW models • Similar to existing behavioral synthesis rules • Not required for communication code outside SEC wrappers • Rules preclude use of certain common programming idioms • Dynamic memory allocation/de-allocation • Pointer aliasing • Standard library & header files • Statically indeterminate loop bounds
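For instance, the loop-bound rule in the last bullet distinguishes cases like the following sketch (function and variable names are illustrative):

    // Statically determinate bound: the tool can unroll or size hardware for
    // exactly 8 iterations, so hardware intent can be inferred.
    int sum_fixed(const int data[8]) {
      int sum = 0;
      for (int i = 0; i < 8; ++i)
        sum += data[i];
      return sum;
    }

    // Statically indeterminate bound: 'n' is only known at run time, so the
    // iteration count (and hence the hardware) cannot be reasoned about
    // statically; such loops are precluded by the coding rules.
    int sum_variable(const int* data, int n) {
      int sum = 0;
      for (int i = 0; i < n; ++i)
        sum += data[i];
      return sum;
    }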

  13. Dynamic Memory Allocation (malloc)

Static allocation ("a" is an array of 100 integers):

    // In this code fragment, a is statically sized and therefore can be
    // reasoned about in hardware inferencing.
    // In general, very large arrays that are sparsely populated can impact
    // simulation performance. It is therefore a good idea to isolate such
    // memories to modules which can be hierarchically isolated.
    int a[100];

Dynamic allocation (the size of "a" could depend on runtime parameters):

    // This is a runtime call to the OS to ask for memory allocation from
    // the kernel. In general, it is impossible to reason about the size
    // statically.
    int *a;
    a = (int*) malloc (100 * sizeof (int));
    ...
    free (a);

  14. Pointer Aliasing • Occurs if a single pointer points to multiple memory locations over its lifetime • Some tools support a limited form of aliasing (single array indexing)

Array-walking example:

    // x points to several locations over its lifetime.
    // In general, it is impossible to statically determine what x points to.
    int *x;
    int a[100], b[100];
    for (x = a; x < (a + 100); ++x)
      *x = 0xf;
    for (x = b; x < (b + 100); ++x)
      *x = 0xf;

Linked list traversal example:

    // Impossible to extract a HW abstraction from a complex instance of
    // aliasing such as list traversal.
    my_struct *a, *first;
    a = first;
    while (a != NULL)
      a = a->next;
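One way to avoid aliasing altogether is to replace pointer arithmetic with explicit array indexing, which keeps every access statically analyzable; a minimal sketch (illustrative only, not from the slides):

    // Index-based rewrite of the pointer-walking loops: each access is a[i]
    // or b[i] on a statically sized array, so no pointer ever refers to more
    // than one object and the tool can infer a simple memory.
    void clear_arrays(int a[100], int b[100]) {
      for (int i = 0; i < 100; ++i)
        a[i] = 0xf;
      for (int i = 0; i < 100; ++i)
        b[i] = 0xf;
    }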

  15. Case Study: Modeling Style and Design • Design: Video Pipe Subsystem • Untimed C model, slave mode functions • Four algorithm blocks: DCT, IDCT, QUANT, IQUANT • functionality defined by sequential top level calls • RTL hierarchy for each block • Coding Style • Adherence to separation of computation and communication • Algorithm blocks proven by realistic vectors • RTL created from C code using behavioral synthesis • SEC tool (SLEC) driver files automatically created

  16. Case Study: Equivalence Checking and Results • SEC done at block level • Pin accurate wrappers created for each block • Coarse clock notion: Throughput=1 and Latency=1 • Initial SEC runs generated counterexamples • Wrapper issues • Throughput and latency mismatches • Final SEC runs successful • Full formal equivalence proofs • Runs under 30 minutes per block

  17. Case Study: Benefits from a structured C/C++ methodology • Model reuse • Efficiency between design phases and designers • C/C++ for HW design • Ability to quickly make system trade-offs • Automated RTL generation from HLS • Comprehensive verification • Leverage C/C++ verification directly onto RTL • No testbench development using SEC • Exhaustive simulation of these models would have taken years

  18. Conclusions • Separation of computation and communication is essential to achieving reuse • Proper attention to such separation enables teams to build models at different levels with the freedom to choose simulation infrastructure • Most model writers should only care about writing computation blocks • Enabling a multi-hierarchy communication infrastructure involves more C++ expertise than an algorithmic designer/architect has time for • Coding with hardware intent in mind enables tools and methods such as sequential equivalence checking and high level synthesis • The value brought to bear by these tools is worth the effort of coding within guidelines • It is important to continue to expand the range of constructs from which hardware intent can be inferred in a tool independent manner
