Partitioning in Hardware/Software Co-Design
• Introduction
• Overview Of A Partitioner
• Issues
  • Nature of Application
  • Target Architectures
  • Interplay of Granularity and Estimation
  • Closeness Metrics
  • Cost Function
• Partitioning Tools
  • Cosyma
  • Lycos
• Case Study
Issues Involved During the Partitioning Process
• Nature of the Application
• Target Architectures
• Interplay of Granularity and Estimation
• Closeness Metrics
• Cost Function
Nature of the Application
• Computation-oriented systems
  • Workstations, PCs, or scientific parallel computers
• Control-dominated systems
  • React to external events
• Data-dominated systems
  • Complex transformation or transportation of data
  • E.g., a DSP or a router
• Mixed systems
  • E.g., a mobile phone or a motor controller
Architectures for Control-Dominated Systems
• Each FSM mapped to a process
• Small variable set – the FSM state
• Short program segments – the FSM transitions
• Explosion of states and transitions – an issue of code size
• Shared-memory architecture
• Optimizations – bit manipulation, few operations per state transition
• E.g., 8051, Motorola MC68332, Siemens 80C166
Architectures for Data-Dominated Systems
• Emphasis on high throughput rather than short latency deadlines
• Large data variables – memory optimization
• Periodic behaviour of system parts
  • Static schedule
• Transformations for high concurrency, such as loop unrolling
• Specialized control, data-path, and interconnect function units
• A priori known address sequences and operations – memory and address-unit specialization
• E.g., DSP applications – ADSP-21060, TMS320C80
Mixed Systems
• Interconnected data- and control-dominated functions
• Approaches
  • Heterogeneous systems – independently controlled, communicating specialized components
    • Computation applications without specific specialization potential
    • E.g., a printer or scanner controller
  • Tailoring of less specialized systems to an application domain – e.g., minimize power consumption or cost for a required level of performance
    • E.g., ARM family, Motorola ColdFire family
Modern Embedded Architectures
Highly multiplexed data-path processors:
• ASIPs
  • Optimized for the speed, performance, and power characteristics of the application; can be reused, providing cost advantages
• VLIW processors
  • Network of horizontally programmable execution units
• Commercial programmable DSPs (Harvard architecture)
  • Separate program and data memories
  • Instruction set tuned to multiply-accumulate operations
Granularity Level
• Coarse-grain partitioning
  • Task, process, or function level
• Fine-grain partitioning
  • Operator, statement, or basic-block level
• An even lower level (assembly language) is not useful – it depends on processor details
Fine-Grain Granularity
• Becomes important as processor performance and system software improve
• Less obvious, more difficult, and time consuming; can have high overheads:
  • Communication time overhead
  • Communication area overhead – may require buffers or memories
  • Interlocks
  • Changes in the efficiency of compiler optimizations, pipelines, and the utilization of concurrent units
Coarse-Grain Granularity
• Limits parallelism
• Reduces time and error during estimation
• Better suited for manual partitioning
Closeness Metrics
• A closeness metric measures the likelihood that two pieces of the specification are mapped onto the same system component.
• Metrics:
  • Connectivity
    • Measures the number of wires shared between two behaviours
  • Communication
    • Measures the amount of data transferred between two behaviours (a sketch follows after this list)
  • Constrained communication
    • Measures the communication metric between behaviours that have given performance constraints
  • Common accessors
    • Grouping behaviours (or variables) that are accessed via subroutine calls and variable reads/writes by many of the same behaviours reduces inter-component communication
  • Sequential execution
    • If two behaviours are defined sequentially in the specification, mapping them onto the same processor does not affect performance
  • Hardware sharing
    • Measures the amount of hardware that two behaviours can share
  • Balanced size
    • Achieves a final partition with groups that are roughly balanced in hardware size; otherwise the above metrics lead to a single group
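As a minimal sketch of how a communication-based closeness metric might be computed, consider the following Python fragment. The representation of behaviours and the byte counts are assumptions made for illustration only, not taken from the slides.

```python
# Hypothetical illustration of a communication closeness metric.
# 'traffic' maps (source, destination) behaviour pairs to the number of
# bytes transferred between them; names and volumes are made up.
from itertools import combinations

def communication_closeness(a, b, traffic):
    """Data volume exchanged between behaviours a and b, normalised by
    the total traffic each behaviour is involved in."""
    exchanged = traffic.get((a, b), 0) + traffic.get((b, a), 0)
    total_a = sum(v for (s, d), v in traffic.items() if a in (s, d))
    total_b = sum(v for (s, d), v in traffic.items() if b in (s, d))
    if total_a + total_b == 0:
        return 0.0
    return 2 * exchanged / (total_a + total_b)

# Example usage with made-up byte counts:
traffic = {("filter", "fft"): 4096, ("fft", "ctrl"): 64, ("ctrl", "filter"): 16}
for x, y in combinations(["filter", "fft", "ctrl"], 2):
    print(x, y, round(communication_closeness(x, y, traffic), 3))
```

A partitioner would typically combine several such metrics (connectivity, hardware sharing, balanced size) into a single weighted closeness value before clustering.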
Structural/Functional Partitioning
• Functional partitioning
  • Partitions a functional specification into smaller sub-specifications and synthesizes a structure for each
  • Isolates a function to one part
  • Reduces I/O
  • Prevents the critical path from crossing parts, thus reducing the clock period
  • Yields simpler hardware, reducing the clock period
  • Gives complete control over I/O, allowing a tradeoff with performance
  • Reduces synthesis tool run times and memory usage
• Structural partitioning
  • A structure is synthesized for the entire specification and then partitioned
  • Size and delay can be estimated quickly and accurately
  • It cannot satisfy both size and I/O constraints
  • Placement and routing can be done more efficiently
  • Not suitable for large systems
Partitioning Algorithms
• Random mapping
• Multistage clustering
• Hierarchical clustering
• Group migration
• Ratio cut
• Simulated annealing
• Genetic evolution
• ILP formulation
A sketch of one of these – hierarchical clustering driven by a closeness metric – follows below.
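The following is a rough sketch of hierarchical clustering as it might be used for partitioning: groups are repeatedly merged in order of decreasing closeness until the desired number of groups remains. The closeness function and the shared-variable example data are assumptions for illustration, not details from the slides.

```python
# Hypothetical hierarchical clustering for partitioning: merge the pair of
# groups with the highest closeness until 'num_groups' groups remain.
def hierarchical_clustering(objects, closeness, num_groups):
    """objects: list of behaviours; closeness(g1, g2) -> float on groups."""
    groups = [frozenset([o]) for o in objects]
    while len(groups) > num_groups:
        best = None
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                c = closeness(groups[i], groups[j])
                if best is None or c > best[0]:
                    best = (c, i, j)
        _, i, j = best
        merged = groups[i] | groups[j]
        groups = [g for k, g in enumerate(groups) if k not in (i, j)] + [merged]
    return groups

# Example: closeness = number of variables shared between two groups (made up).
shared = {"b1": {"x", "y"}, "b2": {"y"}, "b3": {"z"}, "b4": {"x", "z"}}
def closeness(g1, g2):
    v1 = set().union(*(shared[b] for b in g1))
    v2 = set().union(*(shared[b] for b in g2))
    return len(v1 & v2)

print(hierarchical_clustering(list(shared), closeness, 2))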
Cosyma
• Target architecture:
  • Standard RISC processor core
  • Fast RAM for program and data with single-clock-cycle access time
  • Automatically generated application-specific coprocessor
• Peripheral units must be inserted by the designer.
• Processor and coprocessor communicate via shared memory in mutual exclusion.
Granularity
• Partitioning works at the basic-block level.
• Since communication between basic blocks of a process is implicit, partitioning requires communication analysis.
• Simulation on an RT-level model of the target processor provides profiling and software timing information.
Hardware/Software Partitioning
• Inputs to partitioning are the ESG with profiling (or control-flow analysis) information, the CDR file, and synthesis directives, which include channel-mapping directives, partitioning directives, and component selection.
• Starts with an all-software solution and iteratively extracts hardware components until all timing constraints are met (see the sketch after this list).
• The partitioning goals are:
  • meet real-time constraints
  • minimize hardware costs
  • minimize the CAD system response time
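The iterative extract-until-timing-is-met flow described above can be sketched as follows. The helper names (`estimate_time`, `pick_candidate`) and the toy speedup numbers are assumptions for illustration, not Cosyma's actual interfaces.

```python
# Hypothetical sketch of the iterative extraction loop: start all-software,
# then keep moving candidate blocks to hardware until the deadline is met.
def partition(blocks, deadline, estimate_time, pick_candidate):
    software = set(blocks)          # all-software starting point
    hardware = set()
    while estimate_time(software, hardware) > deadline:
        candidate = pick_candidate(software, hardware)  # e.g. best cost gain
        if candidate is None:       # nothing left to move: constraints unmet
            raise RuntimeError("timing constraints cannot be met")
        software.remove(candidate)
        hardware.add(candidate)
    return software, hardware

# Example usage with made-up per-block cycle savings:
blocks = ["b0", "b1", "b2"]
speedup = {"b0": 10, "b1": 40, "b2": 5}
est = lambda sw, hw: 100 - sum(speedup[b] for b in hw)
pick = lambda sw, hw: max(sw, key=speedup.get) if sw else None
print(partition(blocks, deadline=60, estimate_time=est, pick_candidate=pick))
```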
Algorithm & Cost Function
• It uses simulated annealing, a stochastic optimization algorithm.
• The total (estimated) cost of a single basic block b, assuming it is moved from software to hardware, is computed from the following estimated terms:
• tsw(b) is estimated with a local source-code timing estimation based on simulation data
• thw(b) is estimated with a list scheduler
• tcom(Z ∪ b) is estimated by data-flow analysis
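The cost expression itself is not reproduced here. One plausible form, assuming the cost of moving b is simply the weighted net change in execution time implied by the three terms above (a sketch, not necessarily COSYMA's exact formula; w and Z are assumed to be a weighting factor and the set of blocks already in hardware), is:

\Delta c(b) \approx w \cdot \bigl( t_{hw}(b) + t_{com}(Z \cup \{b\}) - t_{sw}(b) \bigr)

A negative value would indicate that moving b to hardware is expected to pay off once communication overhead is accounted for.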
LYCOS
• Supports easy inclusion of new design tools, algorithms, and design methods.
• Built as a suite of tools centered around an implementation-independent model of computation called Quenya, based on communicating CDFGs.
LYCOS Partitioning Tool
• The input specification is in the form of a CDFG.
• Granularity is chosen interactively by the user.
• Any processor architecture for which a technology file is present can be selected.
• Dedicated hardware units are selected by loading the hardware library file, which contains area, delay, latency, provided operations, storage capabilities, etc.
• Software execution time is estimated using the CDFG and the selected processor technology file.
• Hardware execution time is estimated using a dynamic list-based scheduling algorithm (a sketch follows below).
• Partitioning is done using any of the selected algorithms.
• This allows better design-space exploration.
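As a rough illustration of list-based scheduling for hardware execution-time estimation (not LYCOS's actual scheduler; the operation latencies and unit counts are made-up stand-ins for what a hardware library file might provide):

```python
# Hypothetical list scheduler estimating the schedule length of a small DFG.
# ops: {name: (opcode, [predecessor names])}; 'latency' and 'units' play the
# role of delay/area-style information from a hardware library (assumed values).
def list_schedule(ops, latency, units):
    finish = {}                                         # finish time per op
    busy = {op: [0] * n for op, n in units.items()}     # per-unit free times
    remaining = dict(ops)
    while remaining:
        # Ready ops: all predecessors already scheduled.
        ready = [n for n, (_, preds) in remaining.items()
                 if all(p in finish for p in preds)]
        # Schedule each ready op on the earliest-free unit of its type.
        for name in sorted(ready):
            opcode, preds = remaining.pop(name)
            data_ready = max((finish[p] for p in preds), default=0)
            unit = min(range(len(busy[opcode])), key=lambda i: busy[opcode][i])
            start = max(data_ready, busy[opcode][unit])
            finish[name] = start + latency[opcode]
            busy[opcode][unit] = finish[name]
    return max(finish.values())      # estimated schedule length in cycles

# Example: two multipliers feeding one adder, with made-up latencies.
ops = {"m1": ("mul", []), "m2": ("mul", []), "a1": ("add", ["m1", "m2"])}
print(list_schedule(ops, latency={"mul": 2, "add": 1}, units={"mul": 2, "add": 1}))
```

The point of the sketch is only that the estimate depends on both data dependences and the number of available hardware units, which is why the technology and library files matter for partitioning decisions.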