Introduction to Multiprocessor System-on-Chip
Prof. Jan Madsen
Informatics and Mathematical Modeling
Technical University of Denmark
Richard Petersens Plads, Building 321
DK2800 Lyngby, Denmark
Embedded systems
[Figure: an embedded system built from CPU, mem, rom and io blocks; the software (func, if ... then ... else ..., for { ... }) is shown as a bit-pattern]
(c) Jan Madsen
Embedded systems
• Systems which use a computer to perform a specific function, but are neither used nor perceived as a computer
• They are embedded within larger electronic devices
• They repeatedly carry out a particular function
• They are often completely unrecognized by the device's user
Embedded systems design
[Figure: separate hardware and software flows, each going from model through validation to prototype]
• Several design groups
• Separated validations
• Prototype realization
• Problems arise at a very late point in the design process
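Principles of Codesign
[Figure: one specification is the common input to SW synthesis (CPU), interface synthesis, and HW synthesis (ASIC)]
    /* Assumed declarations (not on the slide): controller inputs and outputs. */
    extern volatile int req, floor;      /* requested floor, current floor */
    extern volatile int up, down, open;  /* motor up/down, door open */
    extern void delay(int);

    void UnitControl(void) {
        up = down = 0; open = 1;
        while (1) {
            while (req == floor);        /* wait for a request for another floor */
            open = 0;
            if (req > floor) { up = 1; } else { down = 1; }
            while (req != floor);        /* move until the requested floor is reached */
            up = down = 0;               /* stop the motor (missing on the slide) */
            open = 1;
            delay(10);
        }
    }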
Overview
• Technology
  • Processors
  • IC fabric
• Codesign for speed-up
  • Component execution timing (SW and HW)
• Building sub-system
  • Hardware/software partitioning
• Building system
  • System-level issues of codesign
Software
[Figure: software (func, if ... then ... else ..., for { ... }) running on a processing element (pe)]
• Elements of computation
  • Store data
  • Transform data
  • Move data
Processor
• Architecture components
  • Processing elements: transform data
  • Memories: store data
  • Interconnect: move data
Processor: General Purpose
[Figure: processor with inst mem, controller, datapath and data mem; labels ir, cu, pc, reg, *, +/-]
• Availability
• Low cost (mass production)
• Simple design flow
• High flexibility
Processor: General Purpose - example
[Figure: the general-purpose processor executing the statement with operands A[i] and p1]
• x = x + A[i] * p1 takes 5 cycles
Processor: Custom (ASIC)
[Figure: processor with controller and datapath only; labels cu, mem, +, *, +/-]
• High performance
• Low power
• Complex design flow
• No flexibility
Processor: Custom (ASIC) - example
[Figure: the custom processor executing the statement with operands A[i] and p1]
• x = x + A[i] * p1 takes 1 cycle
Processor: Semicustom (ASIP)
[Figure: processor with inst mem, controller, datapath and data mem; labels ir, cu, pc, reg, +, *, +/-]
• Customized datapath: 16, 8 or 4 bit
• Optimized for a particular class of programs (MACC)
• "Simple" design flow
• High flexibility
Processor: Semicustom - example
[Figure: the semicustom processor executing the statement with operands A[i] and p1]
• x = x + A[i] * p1 takes 2 cycles
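The three example slides quote 5, 2 and 1 cycles for `x = x + A[i] * p1` on the general-purpose, semicustom and custom processor. The sketch below compares the totals under two assumptions that are not on the slides: the statement sits in a loop over n elements, and loop overhead and memory stalls are ignored. The names `mac` and `total_cycles` are hypothetical.

```c
#include <assert.h>

/* Cycles per multiply-accumulate, as quoted on the example slides. */
enum { GPP_CYCLES_PER_MAC = 5, ASIP_CYCLES_PER_MAC = 2, ASIC_CYCLES_PER_MAC = 1 };

/* The statement from the slides, wrapped in a loop (the loop is an assumption). */
long mac(const int *A, int n, int p1)
{
    long x = 0;
    for (int i = 0; i < n; i++)
        x = x + (long)A[i] * p1;
    return x;
}

/* Total cycles for n multiply-accumulates, ignoring loop overhead and stalls. */
long total_cycles(int n, int cycles_per_mac)
{
    assert(n >= 0 && cycles_per_mac > 0);
    return (long)n * cycles_per_mac;
}
```

Under these assumptions the result of `mac` is the same on all three processors; only the cycle count per iteration changes.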
IC fabrics
• An IC is an interconnection of transistors following one of several possible styles, called fabrics
• The fabric defines how and when transistors are composed
  • "the material of processors"
• IC fabrics differ in terms of customizability and generality
IC fabrics: Custom
• Exact implementation of processor components
• High non-recurring engineering (NRE) cost: a mask set costs about $1M
IC fabrics: Semicustom
• Several semicustom fabrics
  • Library of standard cells
  • Cell arrays (sea-of-gates)
• Most processing steps are pre-manufactured (high volume)
IC fabrics: Programmable
• Set of interconnected modules
• Set of modules programmed to implement different components
• FPGA
  • Programmable logic modules, storage and interconnect
Chips: Implementing IC fabric
Hardware/software codesign?
• Many possible mappings
• Processor may not exist yet!
• Exploring the design space
• Need to estimate
Hardware/Software Codesign
• Optimizing
  • Timing (high performance, hard deadlines)
  • Area (cost)
  • Power consumption
  • Flexibility
  • Reliability
  • ...
• We will focus on timing
Processing element timing
• Execution path
  • Control data dependent
  • Input data dependent
• Function implementation
  • Component architecture
  • Compiler or synthesis
Formal execution path timing analysis
[Figure: function F split into basic blocks b1 (if ...), b2 (then ...), b3 (else { ... }) and b4 (for { ... })]
$$t_{pe}(F, pe_j) = \sum_{i \in I} t_{pe}(b_i, pe_j) \times c(b_i)$$
• $b_i$: basic block or program segment
• $t_{pe}(b_i, pe_j)$: execution time of $b_i$ on processing element $pe_j$
• $c(b_i)$: execution frequency of $b_i$
• worst/best case timing bounds
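The sum on this slide can be sketched directly in code: multiply each basic block's execution time on the chosen processing element by its execution frequency and add the products. The struct layout, the field names and the use of cycles as the time unit are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* One basic block b_i, profiled for one processing element pe_j. */
struct basic_block {
    unsigned long t_pe;  /* t_pe(b_i, pe_j): execution time of b_i, here in cycles */
    unsigned long c;     /* c(b_i): execution frequency of b_i */
};

/* t_pe(F, pe_j) = sum over all blocks of t_pe(b_i, pe_j) * c(b_i). */
unsigned long function_time(const struct basic_block *blocks, size_t n)
{
    unsigned long total = 0;
    assert(blocks != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        total += blocks[i].t_pe * blocks[i].c;
    return total;
}
```

Feeding upper bounds for the block times and frequencies gives a worst-case bound for F; feeding lower bounds gives a best-case bound.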
Formal execution path timing analysis
[Figure: $t_{pe}(b_i, pe_j)$ for basic block b2 (then ...) is obtained from a software model or a hardware model of the block's operations (+, *, -)]
Memory models
[Figure: PE with D$ and I$ caches, connected to SDRAM, Flash and RAM]
• Access time
  • Control overhead
  • Burst access (packets)
• Cache
  • hit/miss time overhead
  • Based on execution history
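The cache bullets can be illustrated with a small trace-driven model. This is a minimal sketch assuming a direct-mapped cache with hypothetical sizes and caller-supplied hit and miss costs; none of these parameters come from the slides. It shows why the access time depends on execution history: the same address can hit or miss depending on what was accessed before it.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINES 4   /* hypothetical number of cache lines */
#define LINE_BYTES  16  /* hypothetical line size in bytes */

/* Total access time, in cycles, for a trace of byte addresses run through a
 * direct-mapped cache that starts empty. */
unsigned long trace_time(const unsigned long *addr, size_t n,
                         unsigned hit_cycles, unsigned miss_cycles)
{
    unsigned long tag[CACHE_LINES] = {0};
    int valid[CACHE_LINES] = {0};
    unsigned long total = 0;

    assert(addr != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        unsigned long line = addr[i] / LINE_BYTES;  /* memory line number */
        size_t slot = line % CACHE_LINES;           /* cache slot it maps to */
        if (valid[slot] && tag[slot] == line) {
            total += hit_cycles;
        } else {
            total += miss_cycles;                   /* fetch the line, evicting the old one */
            valid[slot] = 1;
            tag[slot] = line;
        }
    }
    return total;
}
```

In the trace 0, 4, 64, 0 the last access to address 0 misses, because address 64 maps to the same slot and evicted the line in between.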
Advanced architectures
• Modern high-performance processors include architectural features which complicate timing analysis
  • Dynamic instruction scheduling
  • Speculative execution
• Though fast, these features make
  • the processor very power hungry
  • tight bounds on timing very difficult to obtain
  • computation less predictable
• These issues are important for embedded systems
Building sub-systems
[Figure: a processor and an ASIC]
• Initial codesign problem
  • Hardware/software partitioning
• The LYCOS cosynthesis tool
  • Automatic partitioning from C (subset) and VHDL (single process)
  • Developed at DTU
Architectural choices
• Which processor should be selected and how fast should it be?
• Which ASIC technology should be chosen and how fast should the ASIC be?
• How large an ASIC can we afford and which functions should it execute?
• How should the processor and ASIC communicate?
Partitioning Model
[Figure: a specification model made of blocks (BB), partitioned between SW and HW]
• Determines granularity and simplifying assumptions w.r.t. communication, HW sharing, etc.
Estimation
[Figure: three estimators, each backed by a library (Lib): the SW estimator yields $t_S$ and $a_S$, the HW estimator yields $t_H$ and $a_H$, and the communication (Com) estimator yields $t_C$ and $a_C$]
Process communication
[Figure: basic blocks b1 to b4; the else block contains send(...) and receive(...) calls]
• $s(b_i)$: data sent in $b_i$
• $r(b_i)$: data received in $b_i$
• $c(b_i)$: execution frequency of $b_i$
• Communication time: $s(b_i)$ and $r(b_i)$ determined by
  • Data volume
  • Data encoding
  • Communication protocol
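The slide names the inputs to communication time but gives no formula. The sketch below assumes one: each transfer costs a fixed protocol overhead plus a per-word cost, and a block's cost is weighted by its execution frequency. The linear cost model, the word as data unit and all names are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

struct comm_block {
    unsigned long s;  /* s(b_i): words sent in b_i */
    unsigned long r;  /* r(b_i): words received in b_i */
    unsigned long c;  /* c(b_i): execution frequency of b_i */
};

/* Assumed linear cost: fixed protocol overhead plus a per-word cost. */
static unsigned long transfer_cycles(unsigned long words,
                                     unsigned long overhead,
                                     unsigned long per_word)
{
    return words == 0 ? 0 : overhead + words * per_word;
}

/* Total communication time: sum of c(b_i) * (send cost + receive cost). */
unsigned long comm_time(const struct comm_block *b, size_t n,
                        unsigned long overhead, unsigned long per_word)
{
    unsigned long total = 0;
    assert(b != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        total += b[i].c * (transfer_cycles(b[i].s, overhead, per_word) +
                           transfer_cycles(b[i].r, overhead, per_word));
    return total;
}
```

In this model, data encoding would change the word counts and the protocol would change `overhead` and `per_word`.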
Solving the Partitioning Problem
[Figure: blocks 1 to 6 to be assigned to SW or HW]
• Just try all combinations...
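"Just try all combinations" can be sketched as an exhaustive search over every SW/HW assignment. The cost model here is an assumption chosen to keep the example small: blocks execute one after another, hardware areas add up, and communication costs nothing. The names and the area budget are hypothetical.

```c
#include <assert.h>
#include <limits.h>

struct block_cost {
    unsigned t_sw;  /* execution time in software */
    unsigned t_hw;  /* execution time in hardware */
    unsigned a_hw;  /* hardware area if the block is moved to the ASIC */
};

/* Try all 2^n assignments; bit i of the mask set means block i is in HW.
 * Returns the minimum total time whose hardware area fits in max_area and
 * stores the winning mask in *best_mask. */
unsigned long best_partition(const struct block_cost *b, unsigned n,
                             unsigned long max_area, unsigned long *best_mask)
{
    unsigned long best_time = ULONG_MAX;
    assert(n < sizeof(unsigned long) * CHAR_BIT);
    for (unsigned long mask = 0; mask < (1UL << n); mask++) {
        unsigned long time = 0, area = 0;
        for (unsigned i = 0; i < n; i++) {
            if (mask & (1UL << i)) { time += b[i].t_hw; area += b[i].a_hw; }
            else                   { time += b[i].t_sw; }
        }
        if (area <= max_area && time < best_time) {
            best_time = time;
            *best_mask = mask;
        }
    }
    return best_time;
}
```

The outer loop runs 2^n times, so the search is only practical for a handful of blocks.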
Solving the Partitioning Problem
[Figure: three SW/HW partitioning models of blocks 1 to 7]
• Interleaved communication, additive areas
• Parallel execution, non-additive areas
• No communication, interleaved execution, additive areas
• Knapsack stuffing
• Large-scale linear/nonlinear integer programming
• Heuristics needed!
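One simple heuristic for the knapsack view of partitioning is a greedy fill: move blocks to hardware in order of time saved per unit of area until the area budget is used up. This is a generic textbook heuristic shown for illustration, not the algorithm used in LYCOS, and it assumes additive areas and no communication cost. All names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

struct candidate {
    unsigned gain;  /* time saved by moving the block to HW (t_sw - t_hw) */
    unsigned area;  /* hardware area the block needs, must be > 0 */
    int in_hw;      /* set to 1 by greedy_partition if the block is chosen */
};

/* Order candidates by gain per unit area, best first. Cross-multiplying
 * avoids floating point. */
static int by_density(const void *pa, const void *pb)
{
    const struct candidate *a = pa, *b = pb;
    unsigned long long lhs = (unsigned long long)a->gain * b->area;
    unsigned long long rhs = (unsigned long long)b->gain * a->area;
    return (lhs < rhs) - (lhs > rhs);
}

/* Greedy knapsack: fill the area budget in order of gain density.
 * Reorders the array and returns the total time saved. */
unsigned long greedy_partition(struct candidate *c, size_t n,
                               unsigned long max_area)
{
    unsigned long saved = 0, used = 0;
    assert(c != NULL);
    qsort(c, n, sizeof *c, by_density);
    for (size_t i = 0; i < n; i++) {
        c[i].in_hw = 0;
        if (used + c[i].area <= max_area) {
            c[i].in_hw = 1;
            used += c[i].area;
            saved += c[i].gain;
        }
    }
    return saved;
}
```

The greedy fill needs only a sort and one pass, but it is not guaranteed to find the best partition.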
LYCOS Design Flow
[Figure: design flow with the stages Specification, Translate, Functional Require. Analysis, CDFG, SW/HW/Comm. Estim. and Model, Partitioning, CDFG, SW/Comm./HW Synthesis, Assembler, SW/HW Netlist]
Building Systems
[Figure: platform with processors (P), a DSP, a coprocessor (CoP) and memories (M)]
• Platform architectures are heterogeneous
  • Different processing element types
  • Different interconnection networks and communication protocols
  • Different memory types
  • Different scheduling and synchronization strategies
Managing HW platform complexity
• Development of APIs to hide complexity from the application programmer and improve portability
• Specialized RTOS to control resource sharing and interfaces
• → Complex multi-level HW/SW architecture
Software architecture
[Figure: layered HW/SW platform. Software: application, RTOS-APIs, RTOS, drivers, with private and shared parts. Hardware: CPU (pe1), cache, mem, timers, I/O, Int, Bus-CTRL, periphery and bus (ce1)]
Platform design challenges
• Integration
  • Design process integration
  • Heterogeneous component and language integration
• Design space exploration and optimization
• Verification
Complex run-time interdependencies
[Figure: two PEs and a CoP]
• Run-time dependencies of independent components via communication
  • Influence on timing and power
• Need to handle resource sharing
  • Process/task scheduling
  • Communication scheduling
  • Scheduling strategies (static, dynamic, time or priority driven)
Interdependency example
• Complex non-functional interdependencies
  • Periodic task executing on PE
  • Task writes to bus at the end of each periodic execution
• Short execution time → high bus load
• Long execution time → low bus load
• A local decision on improving performance may impact the global system performance
System-on-Chip challenge
[Figure: chip composed of io, router, processor and memory blocks]
Network-on-Chip
[Figure: network connecting nodes a, b, c, d and memories (M)]
• Multi-hop
  • Segmented communication
• Concurrency
  • Multiple simultaneous communications
• Sharing
  • Quasi-simultaneous resource usage
  • Multiple communication events occupying some or all resources in an interleaved fashion
Platform-based design
[Figure: design loop with the labels platform design, specification, IP, platform, mapping, re-design and re-configure]
• New design paradigm ...
Thank you!