Software-Hardware co-design for Real Time Systems Marko Bertogna ReTiS Lab. Scuola S.Anna, Pisa Marko Bertogna - Sw/Hw Co-design

Introduction: Overview
• What is Co-design?
• Co-design typical instruments
  • VHDL
  • SystemC
  • Reconfigurable Devices
  • CSoC
• Co-design for RT Systems

Introduction: Co-design types
• Mechanical vs electrical design
• Analog/digital
• Control vs computing
• Sw/Hw
  • time vs space programming
  • centralized vs distributed computing
  • sequential vs parallel behaviour

Introduction: What is a task in hardware?

Software programming:

    c = a + b;
    result = c / 2;

Assembler expansion (5 operations):

    ldr r0, a
    ldr r1, b
    add r0, r0, r1
    mov r0, r0, LSR #1
    str r0, result

Hardware implementation: a and b feed an adder (+) that produces c, and a shifter turns c into result. All in one clock cycle!
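
As a rough illustration of the contrast above, the following plain C++ sketch puts the two views side by side. The function names `average_sw` and `average_hw` and the 32-bit width are assumptions made for this example; they are not part of the slides. The second function only mimics the adder-plus-shifter datapath as a single expression; it is not synthesizable hardware.

```cpp
#include <cassert>
#include <cstdint>

// Software view: two sequential statements, which a compiler expands into
// load, load, add, shift and store instructions.
// Assumption: a + b fits in 32 bits (the slide does not discuss overflow).
std::uint32_t average_sw(std::uint32_t a, std::uint32_t b) {
    assert(a <= UINT32_MAX - b);
    std::uint32_t c = a + b;       // add
    std::uint32_t result = c / 2;  // unsigned divide by 2 = shift right by 1
    return result;
}

// Hardware view (modelled, not synthesized): the adder output is wired
// straight into the shifter, so the whole task is one combinational expression.
std::uint32_t average_hw(std::uint32_t a, std::uint32_t b) {
    return (a + b) >> 1;
}
```

Both functions return the same value for the same inputs, for example `average_sw(6, 4)` and `average_hw(6, 4)` both give 5; the difference the slide stresses is how many steps it takes to get there.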

VHDL: VHDL / Verilog
• Very High Speed Integrated Circuit Hardware Description Language
• formal model for the behaviour of a system
• simulation
• synthesis: automatic transformation and refinement from a less detailed description… down to existing components
• design reuse

VHDL: VHDL features
• Abstraction, modularity, hierarchy
Abstraction levels:
• Behavioural: o <= i1 + i2*i3 after 100 ns;
• RTL
• Logic:
    …
    U6: ND2 port map(A=>n3, B=>n9, Z=>I7);
    U7: IVP port map(A=>n13, B=>n19);
    U8: ND2 port map(A=>u8, B=>u1, Z=>n4);
    …
• Layout

VHDL: VHDL synthesis steps
• Specification (“paper and pencil”)
• System level: behaviour
• Logic design: all synthesis aspects
• Gate level: mapping to ASIC library or FPGA logic blocks
• Layout
Validation at each step!
[Figure labels: VHDL design; automatic synthesis; netlist]

VHDL: VHDL synthesis
Flow: Behavioural VHDL → (behavioural synthesis) → RTL VHDL → (logic synthesis, using gate libraries) → Netlist VHDL → (placement and route) → Layout, with back annotation of the resulting delays.
Timing information at each level:
• Behavioural: functional timing, e.g. “after 10s signal A switches to 1”
• RTL: clock, functions, events
• Netlist: gate delays
• Layout: path delays
Timing constraint: Σ (gate delays along the longest path) < Tck
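
The timing constraint above can be checked with a few lines of plain C++. This is only a sketch: the function names, the nanosecond unit and the representation of a path as a list of gate delays are assumptions made for this example, and real timing analysis also accounts for setup times, clock skew and wire delays.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Delay of one path: the sum of the delays (in ns) of the gates along it.
double path_delay(const std::vector<double>& gate_delays_ns) {
    return std::accumulate(gate_delays_ns.begin(), gate_delays_ns.end(), 0.0);
}

// Critical path delay: the largest path delay among all register-to-register paths.
double critical_path_delay(const std::vector<std::vector<double>>& paths) {
    assert(!paths.empty());
    double worst = 0.0;
    for (const auto& path : paths) {
        worst = std::max(worst, path_delay(path));
    }
    return worst;
}

// The design meets timing when the critical path is strictly shorter than Tck.
bool meets_timing(const std::vector<std::vector<double>>& paths,
                  double clock_period_ns) {
    return critical_path_delay(paths) < clock_period_ns;
}
```

With two hypothetical paths of gate delays {1.0, 2.0} and {0.5, 0.5, 3.0}, the critical path is 4.0 ns, so a 5 ns clock period satisfies the constraint while a 4 ns period does not.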

VHDL: VHDL structural elements
• Entity/Architecture
• Components
• Configuration
• Process
• Library
• Subprogram (functions and procedures)
• Package/Package Body
• Signals, Testbench

    entity HALFADDER is
      port (A, B: in bit;
            SUM, CARRY: out bit);
    end HALFADDER;

    architecture RTL of HALFADDER is
    begin
      SUM <= A xor B;
      CARRY <= A and B;
    end RTL;
    -- VHDL'93: end architecture RTL;

VHDL: VHDL synthesis example

    library IEEE;
    use IEEE.Std_Logic_1164.all;

    entity IF_EXAMPLE is
      port (A, B, C, X : in std_ulogic_vector..;
            Z : out std_ulogic_vector..);
    end IF_EXAMPLE;

    architecture A of IF_EXAMPLE is
    begin
      process (A, B, C, X)
      begin
        if (X = "1110") then
          Z <= A;
        elsif (X = "0101") then
          Z <= B;
        else
          Z <= C;
        end if;
      end process;
    end A;

VHDL: VHDL optimization examples

    OUT1 <= IN1+IN2+IN3+IN4+IN5+IN6
    OUT2 <= (IN1+IN2)+(IN3+IN4)+(IN5+IN6)

[Figure: refinement of each expression into an adder structure]
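
One way to see why the parenthesized form can matter is to count how many adders a signal must cross in the worst case. The sketch below is plain C++ with assumed function names; it assumes that a synthesis tool maps the unparenthesized sum to a left-to-right chain of two-input adders and the grouped sum to a balanced tree, which depends on the tool and its settings.

```cpp
#include <cassert>

// Worst-case number of two-input adders in series when n operands are summed
// strictly left to right: ((((IN1+IN2)+IN3)+IN4)+IN5)+IN6.
int chain_depth(int n) {
    assert(n >= 1);
    return n - 1;
}

// Worst-case number of adders in series when operands are paired level by
// level, as in (IN1+IN2)+(IN3+IN4)+(IN5+IN6).
int tree_depth(int n) {
    assert(n >= 1);
    int depth = 0;
    while (n > 1) {
        n = (n + 1) / 2;  // each level halves the number of partial sums
        ++depth;
    }
    return depth;
}
```

For the six inputs on the slide, `chain_depth(6)` is 5 and `tree_depth(6)` is 3. Both structures use five adders, so under these assumptions the grouping shortens the longest path without adding hardware.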

SystemC: SystemC
• Integration with C++
• Provides:
  • hardware timing (clock and delay)
  • concurrency support (modules)
  • reactive behaviour (events)
  • signal-based communication support
  • new data types (logic values, bit vectors, etc.)
• No need to translate to HDLs

SystemC: SystemC Design Methodology
[Figures: current system design methodology; SystemC design methodology]

SystemC: SystemC features
• Implemented as a C++ class library (libsystemc.a)
• Inherits all hierarchy features
• Built-in simulation environment
• Easy refinement and reworking
• Lightweight

SystemC: SystemC core language
• Modules
• Processes
• Clocks, custom wait() calls
• Support for events, sensitivity list, watching() construct
• Signals

SystemC: Modules
• Basic building block
• Map functionality of Hw/Sw blocks
• Derived from class sc_module
• Possibility to use hierarchy constructs and sub-modules
• Interface with each other via ports/interfaces/channels

SystemC: Modules

    // my_module.h
    #include <systemc.h>

    SC_MODULE(my_module) {
      // port declarations
      // process declarations
      SC_CTOR(my_module) {
        // process configuration
        // initialization code
      }
    };

SystemC: Ports/Channels/Interfaces
• Ports provide communication functions to modules
• Interfaces connect ports to channels
• Typical channel: signal

SystemC: Processes
• Provide module functionality
• Implemented as C++ member functions
• Run concurrently with each other
• Execute statements sequentially
• Three kinds:
  • SC_METHOD
  • SC_THREAD
  • SC_CTHREAD

SystemC: SC_METHOD

    // my_module.h
    #include <systemc.h>

    SC_MODULE(my_module) {
      sc_in<bool> id;
      sc_in<sc_uint<3> > in_a;
      sc_in<sc_uint<3> > in_b;
      sc_out<sc_uint<3> > out_c;

      void my_method();

      SC_CTOR(my_module) {
        SC_METHOD(my_method);
        // Re-evaluate whenever the select or a data input changes.
        sensitive << id << in_a << in_b;
      }
    };

    // my_module.cpp
    #include "my_module.h"

    // Runs to completion on each trigger: selects in_a or in_b according to id.
    void my_module::my_method() {
      if (id.read())
        out_c.write(in_a.read());
      else
        out_c.write(in_b.read());
    }

SystemC: SC_THREAD

    // my_module.h
    #include <systemc.h>

    SC_MODULE(my_module) {
      sc_in<bool> id;
      sc_in<bool> clock;
      sc_in<sc_uint<3> > in_a;
      sc_in<sc_uint<3> > in_b;
      sc_out<sc_uint<3> > out_c;

      void my_thread();

      SC_CTOR(my_module) {
        SC_THREAD(my_thread);
        sensitive << clock.pos();
      }
    };

    // my_module.cpp
    #include "my_module.h"

    // Started once; suspends at wait() until the next rising clock edge.
    void my_module::my_thread() {
      while (true) {
        if (id.read())
          out_c.write(in_a.read());
        else
          out_c.write(in_b.read());
        wait();
      }
    }

SystemC: Channels
• The most common type is signal
• Signals can be traced: waveform dumping produces a .VCD output file
• Other channels:
  • sc_fifo
  • sc_mutex
  • sc_semaphore

SystemC: SystemC scheduler
• Similar to HDL scheduler
• Two different time steps:
  • Discrete simulation cycle
  • “Delta cycle”
• “Evaluate then update” semantics
• Order of process resumption unknown
• Event objects extend sensitivity
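
The “evaluate then update” idea can be sketched without the SystemC library. In the plain C++ model below, the `Signal` class and `swap_in_one_delta` function are hypothetical names invented for this example; they are not SystemC's `sc_signal` or its scheduler. The point is only that writes become visible after the update phase, so the result does not depend on the order in which the two processes run.

```cpp
#include <cassert>
#include <utility>

// Hypothetical two-phase signal: reads see the current value, writes are
// stored as a pending value until update() commits them.
template <typename T>
class Signal {
public:
    explicit Signal(T initial) : current_(initial), next_(initial) {}
    T read() const { return current_; }
    void write(T value) { next_ = value; }
    // Commit the pending value; returns true if the signal changed.
    bool update() {
        bool changed = (next_ != current_);
        current_ = next_;
        return changed;
    }
private:
    T current_;
    T next_;
};

// One delta cycle with two "processes" that each copy the other's signal.
std::pair<int, int> swap_in_one_delta(int x, int y) {
    Signal<int> a(x), b(y);
    // Evaluate phase: both processes read old values only.
    a.write(b.read());
    b.write(a.read());  // still sees the old value of a
    // Update phase: pending writes become visible together.
    a.update();
    b.update();
    return {a.read(), b.read()};
}
```

Calling `swap_in_one_delta(1, 2)` yields the pair (2, 1): the two values are exchanged, whereas plain variables assigned in sequence would both end up holding 2.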

Reconfigurable Devices: Co-design for embedded systems
• “Programming in Space” versus “Programming in Time”
• Key design choices:
  • Computational units and their granularity
  • Interconnect Network
  • (Re)configuration time and frequency
• Formal verification
• Automatic synthesis

Reconfigurable Devices: Flexibility vs efficiency
[Figure]

Reconfigurable Devices: Reconfigurable devices advantages
• Efficiency AND Flexibility
• Time to market
• Easier upgrade
• Lower cost (in volume production)
• Reusable IP
• Customizable interface

Reconfigurable Devices: Reconfigurable devices parameters
• Block granularity
• Density
• Reconfiguration time
• Compile-Time Reconfiguration (CTR) vs Run-Time Reconfiguration (RTR)
• Partial or Total reprogramming

Reconfigurable Devices: FPGA
• SRAM-based Field Programmable Gate Array
• Basic block is the Logic Element (LE)
• Capacity from 1k to 100k LEs
• Configurable Interconnect
• Need for optimized CAD or pre-bound design libraries

Reconfigurable Devices: FPGA
[Figures: CSL organization; basic Logic Element]

CSoC: Configurable Systems on Chip
• RISC processor
• FPGA block
• On-chip memories
• External memories
• Peripherals
• DIP switches and connectors
• Debug support

CSoC: Research on CSoC
• PRISM (Brown)
• PRISC (Harvard)
• DPGA-coupled uP, Raw processor (MIT)
• V-IRAM, GARP, Pleiades, etc. (UCB)
• OneChip (Toronto)
• REMARC (Stanford)
• NAPA (NSC)
• E5, A7 etc. (Triscend)
• Chameleon
• Quicksilver
• Excalibur (Altera)
• Virtex+PowerPC (Xilinx)
• PIM Processor (Sun)

CSoC: CSoC companies
• Xilinx Triscend (50% market in PLDs and FPGAs)
• Altera
• many others
Triscend and Altera boards available in our lab

CSoC: The Triscend A7S Board
• TA7S20-60Q CSoC
• SDRAM 32Mb
• Flash 2Mb
• Memory sockets
• 2 serial connectors
• 7 segment LED
• Oscillator for CK
• Debug facilities

CSoC: The Triscend A7S chip
[Figure]

CSoC: Triscend Fastchip 2.4
• FPGA optimized module library
• IO Editor
• Generate file.h
• Bind (placement and route) file.csl
• Config file.cfg
• Download

CSoC: Triscend Fastchip modules
[Figure]

Co-design and real-time
• RTOS Booster (Lindh et al.):
  • hardware fixed-priority scheduler
  • no need for clock tick administration
  • interprocess communication, mutexes and semaphores
  • Beware of bus bottlenecks!
• SoC Lock Cache (Lee)
• Configurable Hardware scheduler (GIT)
• Online scheduling of Hardware RT tasks to Partially Reconfigurable Devices (Thiele et al.)
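
To give a feel for what a hardware fixed-priority scheduler computes, here is a minimal plain C++ sketch of a priority encoder over a ready-task bitmask. It is not taken from any of the cited works: the function name, the 32-task limit and the convention that a lower bit index means higher priority are all assumptions made for this example. In hardware the loop would be a combinational priority encoder, so the decision takes no software clock ticks.

```cpp
#include <cassert>
#include <cstdint>

// Bit i of ready_mask is set when task i is ready to run.
// Assumed convention: lower index = higher priority.
// Returns the index of the task to dispatch, or -1 if no task is ready.
int highest_priority_ready(std::uint32_t ready_mask) {
    for (int i = 0; i < 32; ++i) {
        if (ready_mask & (std::uint32_t{1} << i)) {
            return i;
        }
    }
    return -1;
}
```

For instance, with tasks 2 and 4 ready (mask 0x14) the function selects task 2; when task 0 becomes ready as well (mask 0x15) it selects task 0, which corresponds to a preemption in a fixed-priority system.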

Hardware RTOS: the RTU
Lindh et al., RTU (Real Time Unit):
- Accelerator Interface
- Scheduler Unit
- Message, Semaphore and Delay Handler
- Intelligent Interrupt Handler
- Real-Time Control
- General and Technology Dependent Bus Interface

Drawbacks of centralized computing
• Moore’s law is going the wrong way for power consumption
• A memory access consumes far more than a CPU local operation
• Chip area = logic + MEMORY
• Under 100nm many problems:
  • Increasing leakage current
  • Difficult interconnect
  • Lithography and process variability

Power delivery and dissipation
[Figure]

Power efficiency
[Figure]

Road to distributed computing
• Concurrent programming
• Compilers that can exploit parallelism
• High-level debuggers
• Algorithms for intermediate levels of granularity (between C++ and HDLs)
• New benchmarking methods and metrics (MOPS/$ or MOPS/kg W)

Cell processor (IBM, Sony, Toshiba)
[Figure, from IMEC, Hugo de Man]

Thank you for your attention! The end!

SystemC: SystemC layers
• No notion of time (processes and data transfers). Used for functional verification and algorithm validation.
  Refinement step: + formal + time
• Notion of time (processes and data transfers). Used for coarse benchmarking and architectural analysis.
  Refinement step: + pin acc. + cycle acc.
• Cycle accuracy, signal accuracy. Used for detailed benchmarking and microarchitectural analysis.
  Refinement step: + HW mapping

SystemC: UnTimed Functional (UTF) model

    // adder.h
    #include <systemc.h>

    SC_MODULE(adder) {
      sc_fifo_in<float> input1, input2;
      sc_fifo_out<float> output;

      SC_CTOR(adder) { SC_THREAD(adding); }

      // Blocks on the FIFOs; no notion of time.
      void adding() {
        while (true) {
          output.write(input1.read() + input2.read());
        }
      }
    };

    // constgen.h
    #include <systemc.h>

    SC_MODULE(constgen) {
      sc_fifo_out<float> output;

      SC_CTOR(constgen) { SC_THREAD(generating); }

      void generating() {
        while (true) {
          output.write(0.7f);
        }
      }
    };

SystemC: Timed Functional (TF) model

Untimed version:

    // constgen.h
    #include <systemc.h>

    SC_MODULE(constgen) {
      sc_fifo_out<float> output;

      SC_CTOR(constgen) { SC_THREAD(generating); }

      void generating() {
        while (true) {
          output.write(0.7f);
        }
      }
    };

Refining it into the timed version:

    // constgen.h
    #include <systemc.h>

    SC_MODULE(constgen) {
      sc_fifo_out<float> output;

      SC_CTOR(constgen) { SC_THREAD(generating); }

      void generating() {
        while (true) {
          wait(200, SC_NS);   // one value every 200 ns
          output.write(0.7f);
        }
      }
    };

SystemC: Bus Cycle Accurate (BCA) model

    // euclid.h
    #include <systemc.h>

    SC_MODULE(euclid) {
      sc_in_clk clock;
      sc_in<bool> reset;
      sc_in<unsigned int> a, b;
      sc_out<unsigned int> c;
      sc_out<bool> ready;

      void compute();

      SC_CTOR(euclid) {
        SC_CTHREAD(compute, clock.pos());
        // watching() is SystemC 2.0 syntax, deprecated in later releases.
        watching(reset.delayed() == true);
      }
    };

    // euclid.cpp
    #include "euclid.h"

    void euclid::compute() {
      unsigned int tmp_a = 0, tmp_b;   // reset section
      while (true) {
        c.write(tmp_a);                // signaling output
        ready.write(true);
        wait();                        // moving to next cycle
        tmp_a = a.read();              // sampling input
        tmp_b = b.read();
        ready.write(false);
        wait();                        // moving to next cycle
        while (tmp_b != 0) {           // computing
          unsigned int r = tmp_a;
          tmp_a = tmp_b;
          r = r % tmp_b;
          tmp_b = r;
        }
      }
    }
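
Stripped of ports, clock and handshake, the inner loop of the module above is Euclid's algorithm for the greatest common divisor. The plain C++ function below is a reference sketch of that computation only; the name `gcd_euclid` is an assumption for this example, and the function says nothing about the cycle-level behaviour that the BCA model adds.

```cpp
#include <cassert>

// Euclid's algorithm: repeatedly replace (a, b) with (b, a mod b)
// until b is zero; the remaining a is the greatest common divisor.
unsigned int gcd_euclid(unsigned int a, unsigned int b) {
    while (b != 0) {
        unsigned int r = a % b;
        a = b;
        b = r;
    }
    return a;
}
```

Such a function can serve as a golden model in a testbench: for inputs 48 and 18 it returns 6, which is the value the module is expected to present on c once ready goes high again.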

SystemC: Register Transfer Level (RTL) model
- RTL level: signal accurate, cycle accurate, resource accurate
- Cannot use abstractions (functional units, communication infrastructures, …)

SystemC: RTL counter

    // counter.h
    #include <systemc.h>

    SC_MODULE(counter) {
      sc_in<bool> clk;
      sc_in<bool> load;
      sc_in<bool> clear;
      sc_in<sc_uint<8> > din;
      sc_out<sc_uint<8> > dout;

      unsigned int countval;

      void counting();

      SC_CTOR(counter) {
        SC_METHOD(counting);
        sensitive << clk.pos();
      }
    };

    // counter.cpp
    #include "counter.h"

    // On each rising clock edge: clear has priority over load, load over counting.
    void counter::counting() {
      if (clear.read())
        countval = 0;
      else if (load.read())
        countval = (unsigned int)din.read();
      else
        countval++;
      dout.write((sc_uint<8>)countval);
    }
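
The priority between clear, load and counting, and the truncation to 8 bits on output, can be exercised without a simulator. The plain C++ struct below is a behavioural sketch under assumed names (`Counter8`, `on_clock_edge`); it mirrors the body of the counting process but is not SystemC code, and each call stands for one rising clock edge.

```cpp
#include <cassert>
#include <cstdint>

// Behavioural stand-in for the counter: one call = one rising clock edge.
struct Counter8 {
    unsigned int countval = 0;

    // Returns the value that would be driven on the 8-bit output.
    std::uint8_t on_clock_edge(bool clear, bool load, std::uint8_t din) {
        if (clear) {
            countval = 0;        // clear wins over load
        } else if (load) {
            countval = din;      // load wins over counting
        } else {
            countval++;
        }
        // The output is 8 bits wide, so only the low byte is visible.
        return static_cast<std::uint8_t>(countval);
    }
};
```

Loading 254 and then clocking twice more gives outputs 254, 255 and 0: the visible count wraps because the output keeps only the low 8 bits, even though the internal `unsigned int` continues to 256.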