HOPES: Embedded Software Development Environment for MPSoC Feb. 14, 2006 Soonhoi Ha Seoul National University
Contents • Introduction • Proposed Design Flow • Key Techniques • Status • Conclusion
Deep Submicron Era • Increased chip density • MPSoC (Multiprocessor System on Chip), NoC (Network on Chip) • Increased chip density • Increased NRE cost • Increased manufacturing cost • Platform based design • Platform: common (HW/SW) denominator over multiple applications • Pre-built and verified (HW/SW) architecture • SW design and system verification are major challenges
SoC (System-on-a-Chip) • Assembly of “prefabricated components” • Maximize VC (IP) reuse: over 90% • New economics: fast and correct design > optimum design • Design and verification at the system level • Interface between VCs • SW becomes more important [Figure: SoC block diagram. Labels: software; µP; µP; memory; graphics; video unit; DSP; custom; coms]
HOPES Project • Project • Title: Development of embedded software design and verification techniques for MPSoC • Period: 2005.3.1 - 2008.2.29 • Amount: $4,200,000 • Sponsor: Korean Ministry of Information and Communication • Mission • Develop an integrated embedded software development environment based on model-based programming techniques and multi-stage validation techniques for MPSoC (Multi-Processor System on Chip)
Motivation • Current SoC-related projects (in Korea) focus on hardware design and verification • Software design on MPSoC • Embedded software with timing and resource constraints • Parallel programming for heterogeneous multiprocessors • Fast virtual prototyping
Current Practice (in Korea) • Virtual prototyping • Coware ConvergenSC, ARM(Axys) MaxSim • Manual software and hardware design • TLM (transaction level modeling) simulation: 100K inst./sec • Model-based programming • UML based tool: (ex) Telelogic TAU • Limited capability: documentation, code structure. • No tool is available for parallel programming • Manual programming
HOPES Proposal [Figure: proposed flow diagram. Labels: KPN; UML; Model-based Programming (PeaCE model); Common Intermediate Code (CIC); Generic API; CIC (w/ API) translator; Static C code analysis; API lib.; Per-processor code; SW Platform; Perf. & power estimation; Virtual prototyping; Target HW platform; Processor ISS]
Key Techniques • SW design techniques • Model-based specification • Partitioning and automatic code generation from specification • Parallel programming (task parallelism, data parallelism) • Generic APIs (independent of OS and target architecture) • SW verification techniques: 3-phase verification • At specification level: static analysis of program specification and functional simulation • After code generation: static analysis of C code • At simulation time: debugging on virtual prototyping environment
Contents • Introduction • Proposed Design Flow • Key Techniques • Status • Conclusion
Proposed design flow • Model-based programming • PeaCE model (dataflow + FSM + task model) • ESUML (embedded system UML) model • CIC (Common Intermediate Code) • OpenMP pragma + generic API • Static Analysis • Buffer overrun, memory leak, null dereference, stack size • Virtual prototyping • Performance and power estimation • with source-level debugging capability
Software Module Interface [Figure: module interface diagram. Labels: results; GUI; SW Platform; ESUML_Modeler; PeaCE_Modeler; Document generation; Block library DB; PeaCE_Partitioner; ESUML_Verifier; ESUML_Partitioner; Common Intermediate Code (CIC): task_graph.cic; Generic parallel API; Generic API; proc. code with generic API; parallel API; HW component DB; OpenMP translator; OS API Translator; Comm. API Lib.; Code Synthesizer; C-code static analyzer; {processorX.c}; proc. code with OS; proc. code without OS; Verification; OS; Power analysis module; ARM processor simulation; Performance analysis module; MPI Library; virtual prototype system; MPSoC {prototype system}]
[Figure: module interface diagram, detail with file names. Labels: {Application_graph.xml}; user command; App_classDiagram.xml; App_seqDiagram.xml; Action semantics; results; GUI; SW Platform; ESUML_Modeler; PeaCE_Modeler; User_command; sched.xml; Document gen.; Task_graph.xml; Block perf. DB; PeaCE_Partitioner; ESUML_Verifier; TimeCost.xml; ESUML_Partitioner; archi.xml; Common Intermediate Code (CIC): task_graph.cic]
[Figure: module interface diagram, detail below the CIC. Labels: Common Intermediate Code (CIC): task_graph.cic; OpenMP pragma; Generic API; proc. code (generic API); parallel API; HW component DB; OpenMP translator; OS API Translator; comm API Lib.; Code Synthesizer; C-code static analyzer; {processorX.c}; code w/ OS; code w/o OS; Verification; OS; processorX.elf; Power estimation; ARM ISS; Perf. estimation; MPI Library; virtual prototype system; Function symbol table; Cycle-level execution trace; MPSoC {prototype system}; processor parameter]
Demo 1. HOPES Design Flow • Model-based programming (ex: divx player) • PeaCE Model • Dataflow + (FSM) + task model • Fault simulation • sample rate inconsistency (Y;U;V= 3:1:1) • deadlock error • Functional simulation • Host machine
Demo 1. HOPES Design Flow (2) • C code static analysis • Buffer overrun error detection • Fault simulation: wrong array size in “AviFileReader task” • Runtime verification • Simulation with Realview ARM debugger • Fault simulation • Logical error insertion
Demo 2. HOPES (single task) • Single task example: H.263 decoder • C-code static analysis • Performance and power estimation • Custom-designed ARM Simulator • Per-function estimation • Runtime verification • Source level debugging • Fault simulation : logical error insertion
HOPES flow for single processor [Figure: flow diagram. Labels: HOPES GUI; XML files (topology, property); perf. and power estimation; PeaCE code gen. (CIC (xml), CIC (.cic)); ARM ISS; C code static analysis; CIC code with Generic API, OpenMP pragma; Executable with target API; OpenMP translator; CIC code with Generic API; CIC xml; Generic API transformer; Code Synthesizer (C code, Executable); C codes with Generic API; Manual for now]
Demo 3. UML 2.0-based Modeling • ESUML: Embedded Software modeling with UML 2.0 [Figure: development cycle with feedback. Labels: Embedded S/W Needs; Analysis & Design; Verification & Validation; Code Gen. & Optimization; Target MPSoC; ESUML Modeler; ESUML Simulator; ESUML Static analyzer]
ESUML • Model-Driven Application Development [Figure: development flow. Labels: Requirements Capturing; PIM Modeling (Structural model, Behavioral model, Use-Case model); Model Verification (Static Analysis, Model Simulation); PSM Modeling (H/W architecture model, Physical model); Code Generation (CIC code, Java code)] • PIM: Platform Independent Model • PSM: Platform Specific Model
ESUML Characteristics • Light-Weight Development Methodology [Figure: ESUML methodology overview. Labels: Requirement Capturing; PIM and PSM modeling; Analysis; Code Generation; Real-Time Embedded Software; Interaction-based Behavior Modeling; Large-Scale System; Concepts; Process; ESUML; Event-Based Modeling; Hierarchical Decomposition; Modeling Guidelines and Tips; Use Case Diagram; Class Diagram; Interaction Overview Diagram; Sequence Diagram; Hardware Architecture Diagram; Rules; Notations]
ESUML Characteristics • Hierarchical Modeling • Interaction-based Behavior Modeling [Figure: hierarchy from use cases down to sequence diagrams. Labels: UCD; use case 1; use case 2; use case 3; Scenarios; Global IODs; IOD of UC1; IOD of UC2; IOD of UC3; Global behavioral view; SDs of X; SDs of Y; SDs of Z; A; B; C; X; Y]
PSM Modeling (To-Be) • PSM Modeling : Physical Model Automatically Recommended Partition + Modeler Enforced Partition
ESUML Code Gen. (To-Be) • Platform Independent Code Generation [Figure: PIM/PSM Models feed the CIC Code Generator; output labels: C, JAVA, C++]
ESUML Verification and Validation • Rule-based consistency analyzer • Intra-model, inter-model, action-language [Figure label: 64 Rules]
ESUML Simulator [Figure: simulation loop. Labels: MicroOven.XMI File; CFG Generation; Simulation Engine; Define Scenario; Simulation result; SimMicroOven.txt; Feedback to Modeler. Step labels: (1) input; (2) input; (3) output; (4) output; (5) save; (5) modify model (if needed)]
Contents • Introduction • Proposed Design Flow • Key Techniques • PeaCE Modeling • CIC & Generic API • Static Analysis • Status • Conclusion
PeaCE Model • Top model: Task model • Computation task model: Dataflow model • Extended SDF model: SPDF (Synchronous Piggybacked Data Flow) and FRDF (Fractional Rate Data Flow) • Control task model: FSM model • Hierarchical and concurrent FSM model: fFSM (flexible Finite State Machine)
PeaCE Project • PeaCE • Ptolemy extension as a Codesign Environment, based on Ptolemy classic • Open-source research platform • Officially released at DAC 2005 (version 1.0) • PeaCE home page • http://peace.snu.ac.kr/research/peace
Seamless HW/SW Codesign Flow [Figure: codesign flow with steps numbered 1 to 8. Labels: Block Lib.; Algorithm spec.; Architecture templates; Func. Simulation; Graph Analysis; Profiling; TimeCost.xml; clustered.xml; DB; PE Selection; Partition & mapping; sched.xml; PE Code generation + OS API; VHDL-code; C-code; C-code; Comm. arch DSE; Archi.xml; IF Code generation; Coverification & Prototyping]
H.264 Decoder if-then-else construct
SPDF Model • Basic SDF Model • A node represents a function block (ex: FIR, DCT) • A block is fireable (executable) when it has received the required number of input samples • Statically scheduled at compile time, which detects sample rate inconsistency and deadlock conditions • Our extensions • Generic message type data -> code generation only • FRDF: fractional rate data flow • SPDF: synchronous piggybacked dataflow
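The compile-time consistency check mentioned above can be illustrated with the standard SDF balance equations: for every edge, (firings of source) x (samples produced) must equal (firings of sink) x (samples consumed). The following is a minimal sketch, not PeaCE's implementation; `sdf_edge`, `sdf_repetitions`, and the 16-node limit are hypothetical names and choices made for illustration, and rates are assumed to be positive integers on a connected graph.

```c
#include <assert.h>
#include <stddef.h>

enum { SDF_MAX_NODES = 16 };          /* illustrative limit, not from the slides */

/* Hypothetical edge record: src produces `prod` samples per firing,
   dst consumes `cons` samples per firing. */
typedef struct { int src, dst, prod, cons; } sdf_edge;

static long gcd_l(long a, long b)
{
    while (b) { long t = a % b; a = b; b = t; }
    return a;
}

/* Solves q[src]*prod == q[dst]*cons for the smallest positive integers q.
   Returns 1 on success, 0 if the rates are inconsistent or the graph is
   not connected. */
int sdf_repetitions(const sdf_edge *e, size_t n_edges, size_t n_nodes, long *q)
{
    long num[SDF_MAX_NODES] = {0}, den[SDF_MAX_NODES];
    long l = 1, g = 0;

    assert(n_nodes > 0 && n_nodes <= SDF_MAX_NODES);
    for (size_t i = 0; i < n_nodes; i++) den[i] = 1;
    num[0] = 1;                       /* node 0 fires at relative rate 1/1 */

    for (size_t pass = 0; pass < n_nodes; pass++) {
        for (size_t k = 0; k < n_edges; k++) {
            int s = e[k].src, d = e[k].dst;
            if (num[s] && !num[d]) {          /* propagate forward */
                num[d] = num[s] * e[k].prod;
                den[d] = den[s] * e[k].cons;
                long r = gcd_l(num[d], den[d]);
                num[d] /= r; den[d] /= r;
            } else if (!num[s] && num[d]) {   /* propagate backward */
                num[s] = num[d] * e[k].cons;
                den[s] = den[d] * e[k].prod;
                long r = gcd_l(num[s], den[s]);
                num[s] /= r; den[s] /= r;
            } else if (num[s] && num[d]) {    /* both known: check balance */
                if (num[s] * e[k].prod * den[d] != num[d] * e[k].cons * den[s])
                    return 0;                 /* sample rate inconsistency */
            }
        }
    }
    for (size_t i = 0; i < n_nodes; i++) {
        if (!num[i]) return 0;                /* unreachable node */
        l = l / gcd_l(l, den[i]) * den[i];    /* least common multiple */
    }
    for (size_t i = 0; i < n_nodes; i++) {
        q[i] = num[i] * (l / den[i]);
        g = gcd_l(g, q[i]);
    }
    for (size_t i = 0; i < n_nodes; i++) q[i] /= g;
    return 1;
}
```

For a chain A -> B -> C where A produces 2 and B consumes 3, then B produces 1 and C consumes 2, the sketch yields the repetition vector (3, 2, 1). Deadlock detection would additionally require simulating one schedule iteration against the initial samples on each edge, which this sketch does not do.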
Fractional Rate Dataflow • The macro block is 1/99 of the video frame in its size. • For each execution of the ME node, it consumes a macro-block from the input frame and produces a macro-block output. • Schedule: 99(ME,EN) [Figure: ME and EN nodes with port rates 1/99 and 1; macro-block size 16x16]
FRDF (Fractional Rate DF) [Figure: dataflow graph. Nodes: Read From Device; Motion Estimation; Macro Block Encoding; Variable Length Coding; Macro Block Decoding; Motion Compensation; Write To Device. Port rates: 1 and 1/99] • Buffer size: reference code 361KB, SDF 686KB, FRDF 225KB • No. of frame-type data: reference code 5, SDF 8, FRDF 3
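One way to read the fractional rate 1/99 is that each firing of a node works on one macro-block of a frame that stays in place, instead of copying the frame into 99 separate tokens. The sketch below illustrates that access pattern only. It assumes a QCIF luma plane (176x144, which gives 11 x 9 = 99 macro-blocks of 16x16); the frame size and the names `mb_origin`, `mb_luma_sum`, and `frame_luma_sum` are assumptions, not taken from the slides.

```c
#include <assert.h>
#include <stdint.h>

enum {
    FRAME_W = 176, FRAME_H = 144,            /* assumed QCIF luma plane */
    MB = 16,                                 /* macro-block edge, from the slide */
    MB_COLS = FRAME_W / MB,                  /* 11 */
    MB_ROWS = FRAME_H / MB,                  /* 9 */
    MB_PER_FRAME = MB_COLS * MB_ROWS         /* 99 */
};

/* Returns a pointer to the top-left sample of macro-block k inside the
   frame buffer; no data is copied. */
const uint8_t *mb_origin(const uint8_t *frame, int k)
{
    assert(k >= 0 && k < MB_PER_FRAME);
    return frame + (k / MB_COLS) * MB * FRAME_W + (k % MB_COLS) * MB;
}

/* One firing of a hypothetical ME-like node: it reads 1/99 of the frame.
   Summing luma samples stands in for the real computation. */
unsigned mb_luma_sum(const uint8_t *frame, int k)
{
    const uint8_t *p = mb_origin(frame, k);
    unsigned sum = 0;
    for (int y = 0; y < MB; y++)
        for (int x = 0; x < MB; x++)
            sum += p[y * FRAME_W + x];
    return sum;
}

/* One frame period: 99 firings, mirroring the schedule 99(ME,EN). */
unsigned long frame_luma_sum(const uint8_t *frame)
{
    unsigned long total = 0;
    for (int k = 0; k < MB_PER_FRAME; k++)
        total += mb_luma_sum(frame, k);
    return total;
}
```

Because every firing indexes into the same frame buffer, no per-macro-block copies are needed, which is the kind of saving the buffer-size comparison on the slide points to.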
Synchronous Piggybacked Dataflow • Global data structure with limited access • Each data sample is piggybacked with an SU request • A block updates its internal parameters depending on applicability of SU request • Global state table (state name: 1st iteration, 2nd iteration): “gain”: 10, 20 [Figure: SU and PD blocks attach a piggyback to the data passing through blocks A, B, C; labels “gain”= 10 and “gain”= 5]
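The piggybacking idea can be sketched as a data sample that carries an optional update request next to its payload, with each block applying only the requests that concern its own parameters. This is a minimal illustration, not the PeaCE data layout: `spdf_sample`, `gain_block`, and `gain_fire` are hypothetical names, and reading "SU" as a state-update request is an assumption, since the slide does not expand the abbreviation.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical sample layout: payload plus an optional piggybacked request. */
typedef struct {
    int  has_su;        /* nonzero if an SU request rides on this sample */
    char su_name[16];   /* name of the state to update, e.g. "gain" */
    int  su_value;      /* new value for that state */
    int  data;          /* the actual data sample */
} spdf_sample;

/* A block with one internal parameter. */
typedef struct { int gain; } gain_block;

/* One firing: apply the request if it targets this block's parameter,
   then process the data with the (possibly updated) parameter. */
int gain_fire(gain_block *b, const spdf_sample *in)
{
    assert(b != NULL && in != NULL);
    if (in->has_su && strcmp(in->su_name, "gain") == 0)
        b->gain = in->su_value;
    return b->gain * in->data;
}
```

Because the request travels with the sample, the parameter change takes effect at a well-defined point in the data stream, and a block ignores requests that do not apply to it.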
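SPDF (Synchronous Piggybacked DF) • Piggybacking the control variable • Special super-block definition for the body of DC [Figure: two versions of a graph with inputs x and c, blocks A and B, a Mux with inputs 1 and 0, and output y; one version includes a PB block]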
fFSM (flexible FSM) Model • Hierarchical FSM • Concurrent FSM [Figure: example state diagrams with states A, B, C, D, composite states AC, BC, AD, BD, and transition labels such as a, b, c, b/v, c/u, ac, bc, a’c, b’c, ac’, bc’, b’c/u]
fFSM + SPDF [Figure: DivX player control FSM combined with a dataflow graph. Dataflow blocks: AVI Reader; H.263 Decoder; MP3 Decoder. Events: start; suspend; stop; text; exit. Transition labels: start==1; suspend==1; stop==1 / tostart=1; text==4; exitDivx==1; tostart==1. Other labels: divxread; divx run; divx suspend; divxstop. Scripts: in divx: {deliver ui filename divxread filename} {run divx}; in divxrun: {run divx}; in divsuspend: {suspend divx}; in divxstop: {stop divx}; in exit state: {get divx exit exitDivx}]
Task Model • Task execution semantics • periodic, sporadic, functional • Port properties • data rate: static or dynamic • data size: static or variable • port type: queue or buffer [Figure: a Task Wrapper around a Basic Task Block (SDF or FSM), annotated with Execution Type (periodic) and port semantic; example tasks: AviReader, H.263 decoder, MP3 decoder]
CIC Code Generation • Application tasks • Task in task-level specification model: task_name.cic • Generic API • Partitioning considering data parallelism • OpenMP programming
void h263decoder_go (void) {
    ...
    l = MQ_RECEIVE("mq0", (char *)(ld_106->rdbfr), 2048);
    ...
#pragma omp parallel for
    for (i = 0; i < 99; i++) {
        // thread_main()
        ....
    }
    // display the decoded frame
    dither(frame);
}
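The data-parallel part of the slide's `_go()` body can be isolated into a small compilable sketch. It keeps the `#pragma omp parallel for` loop over 99 iterations from the slide; `decode_mb` and `decode_frame` are hypothetical stand-ins for the elided per-macro-block work, and the loop is only safe to parallelize because each iteration writes a distinct output element.

```c
#include <assert.h>
#include <stddef.h>

enum { MB_PER_FRAME = 99 };   /* loop bound taken from the slide */

/* Hypothetical per-macro-block work; stands in for the elided decoder body. */
static int decode_mb(int i)
{
    return 2 * i + 1;
}

/* Data-parallel section: iterations are independent, so an OpenMP-enabled
   compiler may distribute them over threads. Without OpenMP support the
   pragma is ignored and the loop runs sequentially with the same result. */
void decode_frame(int out[MB_PER_FRAME])
{
    int i;
    assert(out != NULL);
#pragma omp parallel for
    for (i = 0; i < MB_PER_FRAME; i++)
        out[i] = decode_mb(i);
}
```

With GCC or Clang, building with `-fopenmp` enables the parallel version; omitting the flag gives the sequential one, which is convenient for functional simulation on a host machine.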
CIC Format • Architecture: Hardware, Constraints, Structure • Task Code: Kernel code (_init(), _go(), _wrapup()), Generic API, Parallel API
DivX player CIC xml (Hardware)
<?xml version="1.0" ?>
<CIC_XML>
  <hardware>
    <processor name="arm926ej-s0">
      <index>0</index>
      <localMem name="lmap0">
        <addr>0x0</addr>
        <size>0x10000</size> <!-- 64KB -->
      </localMem>
      <sharedMem name="shmap0">
        <addr>0x10000</addr>
        <size>0x40000</size> <!-- 256KB -->
        <sharedWith>1</sharedWith>
      </sharedMem>
      <OS>
        <support>TRUE</support>
      </OS>
    </processor>
    <processor name="arm926ej-s1">
      <index>1</index>
      <localMem name="lmap1">
        <addr>0x0</addr>
        <size>0x20000</size> <!-- 128KB -->
      </localMem>
      <sharedMem name="shmap1">
        <addr>0x20000</addr>
        <size>0x40000</size> <!-- 256KB -->
        <sharedWith>0</sharedWith>
      </sharedMem>
      <OS>
        <support>TRUE</support>
      </OS>
    </processor>
  </hardware>
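Hardware Architectures [Figure: processors P1, P2, P3 with LocalMem (TCM) and Shared Mem SM1, SM2] • First memory map: 0x0 to 0x10000 shared=NO; 0x10000 to 0x50000 shared=YES • Second memory map: 0x0 to 0x20000 shared=NO; 0x20000 to 0x60000 shared=YES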
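The boundary addresses in the memory maps follow directly from the `<addr>` and `<size>` values in the hardware XML: a region ends at its base plus its size. A minimal sketch of that arithmetic, with an overlap check that a tool reading the XML might plausibly perform; `mem_region`, `region_end`, and `regions_overlap` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* One <localMem> or <sharedMem> entry: base address and size in bytes. */
typedef struct { uint32_t addr, size; } mem_region;

/* First address past the region. */
uint32_t region_end(mem_region r)
{
    assert(r.size <= UINT32_MAX - r.addr);   /* no wrap-around */
    return r.addr + r.size;
}

/* Nonzero if the two regions share at least one address. */
int regions_overlap(mem_region a, mem_region b)
{
    return a.addr < region_end(b) && b.addr < region_end(a);
}
```

For the first processor, local memory {0x0, 0x10000} ends at 0x10000 and shared memory {0x10000, 0x40000} ends at 0x50000; for the second, {0x0, 0x20000} and {0x20000, 0x40000} end at 0x20000 and 0x60000. In both cases the local and shared regions are adjacent and do not overlap.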
DivX player CIC xml (Constraint)
  <constraints>
    <memory>16MB</memory>
    <power>50mWatt</power>
    <mode name="default">
      <task name="AviReaderI0">
        <period>120000</period>
        <deadline>120000</deadline>
        <priority>0</priority>
        <subtask name="arm926ej-s0">
          <execTime>186</execTime>
        </subtask>
      </task>
      <task name="H263FRDivxI3">
        <period>120000</period>
        <deadline>120000</deadline>
        <priority>0</priority>
        <subtask name="arm926ej-s0">
          <execTime>5035</execTime>
        </subtask>
      </task>
      <task name="MADStreamI5">
        <period>120000</period>
        <deadline>120000</deadline>
        <priority>0</priority>
        <subtask name="arm926ej-s0">
          <execTime>13</execTime>
        </subtask>
      </task>
    </mode>
  </constraints>
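The timing constraints above lend themselves to a simple feasibility check. The sketch below sums the execution times of tasks mapped to one processor and compares the total with each deadline. The slides do not state the units, so treating `<execTime>`, `<period>`, and `<deadline>` as the same time unit is an assumption, as are the names `task_constraint`, `total_exec_time`, and `meets_deadlines`.

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors one <task> entry of the constraint XML.
   Assumption: all three time fields use the same unit. */
typedef struct {
    const char *name;
    long period, deadline, exec_time;
} task_constraint;

/* Sum of execution times of all listed tasks. */
long total_exec_time(const task_constraint *t, size_t n)
{
    long sum = 0;
    assert(t != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        sum += t[i].exec_time;
    return sum;
}

/* Sufficient check for tasks that share one processor and one period and
   run back to back: the last task finishes at the summed execution time,
   so every deadline must be at least that large. */
int meets_deadlines(const task_constraint *t, size_t n)
{
    long sum = total_exec_time(t, n);
    for (size_t i = 0; i < n; i++)
        if (sum > t[i].deadline)
            return 0;
    return 1;
}
```

With the three DivX tasks (186, 5035, and 13 against a deadline of 120000), the total is 5234, so the check passes with a wide margin under the stated unit assumption.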
DivX player CIC xml (Structure)
  <structure>
    <mode name="default">
      <task name="AviReaderI0">
        <subtask name="arm926ej-s0">
          <procMap>0</procMap>
          <fileName>AviReaderI0_arm926ej_s0.cic</fileName>
        </subtask>
      </task>
      <task name="H263FRDivxI3">
        <subtask name="arm926ej-s0">
          <procMap>0</procMap>
          <fileName>H263FRDivxI3_arm926ej_s0.cic</fileName>
        </subtask>
      </task>
      <task name="MADStreamI5">
        <subtask name="arm926ej-s0">
          <procMap>0</procMap>
          <fileName>MADStreamI5_arm926ej_s0.cic</fileName>
        </subtask>
      </task>
    </mode>
    <queue>
      <name>mq0</name>
      <src>AviReaderI0</src>
      <dst>H263FRDivxI3</dst>
      <size>30000</size>
    </queue>
    <queue>
      <name>mq1</name>
      <src>AviReaderI0</src>
      <dst>MADStreamI5</dst>
      <size>30000</size>
    </queue>
  </structure>
</CIC_XML>
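Each `<queue>` entry connects a source task to a destination task with a bounded size. How HOPES implements these queues is not shown in the slides; the following is only a sketch of one possible realization as a byte ring buffer, assuming `<size>` is a capacity in bytes. `byte_queue`, `bq_send`, and `bq_receive` are hypothetical names and are not the generic API.

```c
#include <assert.h>
#include <stddef.h>

enum { MQ_CAPACITY = 30000 };   /* <size> of mq0; the byte unit is assumed */

/* Bounded FIFO of bytes; zero-initialize before first use. */
typedef struct {
    unsigned char buf[MQ_CAPACITY];
    size_t head;    /* index of the oldest byte */
    size_t count;   /* number of bytes currently stored */
} byte_queue;

/* Copies up to n bytes into the queue; returns how many were accepted. */
size_t bq_send(byte_queue *q, const void *src, size_t n)
{
    const unsigned char *s = src;
    size_t room;
    assert(q != NULL && src != NULL);
    room = MQ_CAPACITY - q->count;
    if (n > room) n = room;
    for (size_t i = 0; i < n; i++)
        q->buf[(q->head + q->count + i) % MQ_CAPACITY] = s[i];
    q->count += n;
    return n;
}

/* Copies up to max bytes out of the queue; returns how many were read. */
size_t bq_receive(byte_queue *q, void *dst, size_t max)
{
    unsigned char *d = dst;
    size_t n;
    assert(q != NULL && dst != NULL);
    n = max < q->count ? max : q->count;
    for (size_t i = 0; i < n; i++)
        d[i] = q->buf[(q->head + i) % MQ_CAPACITY];
    q->head = (q->head + n) % MQ_CAPACITY;
    q->count -= n;
    return n;
}
```

A receiver that asks for 2048 bytes, as in the `MQ_RECEIVE("mq0", ..., 2048)` call on the code generation slide, would get at most what is currently stored. A real implementation would also need blocking and synchronization between tasks, which this sketch omits.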
Task Code: Structure • Task kernel code • _init(): before main loop • _go(): in the main loop • _wrapup(): after main loop
(example) h263dec.cic
// header file
// global declaration and definition
// procedure definition
h263dec_init() { … }
h263dec_go() { … }
h263dec_wrapup() { … }
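The three kernel entry points suggest a driver of the following shape: call `_init()` once, call `_go()` on every main-loop iteration, and call `_wrapup()` once at the end. This is a sketch of that calling convention only; the slides do not show the generated wrapper, so `cic_task`, `run_task`, and the `demo_*` functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical descriptor bundling the three kernel entry points of a task. */
typedef struct {
    void (*init)(void);
    void (*go)(void);
    void (*wrapup)(void);
} cic_task;

/* Runs one task: init before the main loop, go inside it, wrapup after it. */
void run_task(const cic_task *t, int iterations)
{
    assert(t != NULL && t->init != NULL && t->go != NULL && t->wrapup != NULL);
    t->init();
    for (int i = 0; i < iterations; i++)
        t->go();
    t->wrapup();
}

/* Demo task that only counts how often each entry point is called. */
static int demo_inits, demo_gos, demo_wrapups;
static void demo_init(void)   { demo_inits++; }
static void demo_go(void)     { demo_gos++; }
static void demo_wrapup(void) { demo_wrapups++; }

const cic_task demo_task = { demo_init, demo_go, demo_wrapup };
```

Calling `run_task(&demo_task, 3)` invokes `demo_init` once, `demo_go` three times, and `demo_wrapup` once. For a periodic task, the loop would presumably wait for the next period between `go()` calls rather than run back to back.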
Generic API • Generic API • OS-independent API • API translator to OS-specific APIs • Generic API Definition • Abstraction from IEEE POSIX 1003.1-2004 • OS services: • Process, memory, synchronization, file system, networking, real-time, thread, I/O, timer, etc. • C library functions based on use frequency • stdio.h, stdlib.h, time.h, signal.h
OS API Translator • Inputs: CIC’s task code; APIs, patterns, and rules (register, modify) • Translation patterns and rules • Pattern mapping: 1-to-1, 1-to-m, n-to-m; initialize/finalize; acquire/release • Parameters: type, number, value • Variable names: naming of additional variables • Translator stages: Code Analysis, Symbol Table Construction, Transformation • Output: OS-Specific C Code [Figure: Generic APIs pass through Transformation Rules to POSIX APIs and Non-POSIX APIs]
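The simplest of the pattern mappings listed above, 1-to-1, amounts to a table lookup from a generic API name to a target API name. The sketch below shows only that lookup. `MQ_RECEIVE` appears in the slides and `mq_receive`/`mq_send` are real POSIX functions, but pairing them this way is an assumption, `MQ_SEND` is a hypothetical generic name, and `api_rule` and `translate_api` are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One 1-to-1 translation rule: generic API name -> target API name. */
typedef struct {
    const char *generic;
    const char *target;
} api_rule;

/* Illustrative rule table for a POSIX target. */
static const api_rule posix_rules[] = {
    { "MQ_RECEIVE", "mq_receive" },   /* generic name from the slides; pairing assumed */
    { "MQ_SEND",    "mq_send"    },   /* hypothetical generic name */
};

/* Returns the target name for a generic API, or NULL if no rule matches,
   in which case the call would be left untouched. */
const char *translate_api(const char *generic_name)
{
    assert(generic_name != NULL);
    for (size_t i = 0; i < sizeof posix_rules / sizeof posix_rules[0]; i++)
        if (strcmp(posix_rules[i].generic, generic_name) == 0)
            return posix_rules[i].target;
    return NULL;
}
```

A name lookup alone is not enough for a real translation. The generic call on the code generation slide takes a queue name string, whereas POSIX `mq_receive` takes a queue descriptor and a priority pointer, so the translator also has to rewrite parameters and introduce extra variables, which is what the parameter and variable-name rules on the slide cover.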