230 likes | 239 Views
CS244-Introduction to Embedded Systems and Ubiquitous Computing. Instructor: Eli Bozorgzadeh Computer Science Department UC Irvine Winter 2010. CS244 – Lecture 5. Hardware/Software Co-design. Cost. Improving cost is desired. Improving performance beyond threshold Is a waste. Better.
E N D
CS244-Introduction to Embedded Systems and Ubiquitous Computing Instructor: Eli Bozorgzadeh Computer Science Department UC Irvine Winter 2010
CS244 – Lecture 5 Hardware/Software Co-design Winter 2010- CS 244
Cost Improving cost is desired Improving performance beyond threshold Is a waste Better Better Performance Improving quality beyond threshold is desired Better Thresholds Quality Review: Design Objectives Winter 2010- CS 244
Co-design Flow Refine System Model Informal Specification System Simulation Algorithmic Design Hardware/Software Partitioning Partitioned Model Schedule HW/SW Co-simulation Partitioned Model & Sch. Winter 2010- CS 244
Co-design Flow Partitioned Model + Sch. Communication Synthesis Refine Software Model Hardware Model HW/SW Co-simulation Compilation Synthesis Binary Exec. Model Gate-level Model HW/SW Co-simulation Winter 2010- CS 244
Co-design Flow Refine Binary Exec. Model Gate-level Model Emulate or Prototype Fabrication Winter 2010- CS 244
Informal Specification & System Level Model • Informal Specification loosely defines high level behavior, constraints, and optimization objectives of the system • Algorithmic and implementation details absent • Performance estimates not present • System level model formally captures behavior, constraints, and optimization objectives • Can be simulated to obtain early performance estimates • Feedback to refine the system specification • Can serve as a golden model for validation of intermediate or final stages • Algorithmic design Winter 2010- CS 244
F {F1, F2, F3 … Fn} … P1 P2 P3 PM … Hardware Software Partitioning • Decompose (i.e., partition) the function F of the system into N sub-functions F1, F2, F3 … FN • Decompose the constraints and design objectives of the system into sub-constraints and design sub-objectives • Cluster F1, F2, F3, …, Fn into M partitions to run on M processors Winter 2010- CS 244
Scheduling • Scheduling is to obtain an execution sequence such that dependencies are obeyed • Static • During design time the schedule is fixed (the common case) • Dynamic • During execution time, the schedule is determined (reconfigurable computing) F1 F2 F4 F5 F6 F7 F3 F8 P1: F1 F2 F8 P2: F4 F5 P3: F3 F6 P4: F7 Winter 2010- CS 244
Scheduling • A deadline D for the entire schedule • An execution time for each Ti for each Fi • ASAP (as soon as possible) • ALAP (as late as possible) 3 3 F1 F2 F4 6 F5 4 2 F6 F7 3 F3 1 F8 3 P1: F1 F2 F8 P2: F4 F5 P3: F3 F6 P4: F7 Winter 2010- CS 244
Partitioning (Clustering) • Given: • F = { F1, F2, F3 … FN } • P = { P1, P2, P3 … PM } • Find a lowest cost partition (cluster), as computed by an objective function • Exhaustive approach O(MN) • Heuristics • Constructive partitioning (based on closeness function) • Random (good for seeding iterative approaches) • Cluster Growth • Hierarchical clustering • Iterative partitioning • Start with a partition and improve • Gradient search • Controlled random search • Modified Kernighan/Lin and FM algorithm • Partitions a set of nodes (functions) into two bins (processors) • Minimize edges between bins (communication cost, wires, etc.) • Cost function for moving a node from one partition to another • ILP • Genetic evolution • Simulated annealing Winter 2010- CS 244
Partitioning (Clustering) • Given: • F = { F1, F2, F3 … FN } • P = { P1, P2, P3 … PM } • Find a lowest cost partition (cluster), as computed by an objective function • Exhaustive approach O(MN) • Heuristics • Constructive partitioning (based on closeness function) • Random (good for seeding iterative approaches) • Cluster Growth • Hierarchical clustering • Iterative partitioning • Start with a partition and improve • Gradient search • Controlled random search • Modified Kernighan/Lin algorithm • Partitions a set of nodes (functions) into two bins (processors) • Minimize edges between bins (communication cost, wires, etc.) • Cost function for moving a node from one partition to another • ILP • Genetic evolution • Simulated annealing Winter 2010- CS 244
Iterative Partitioning Algorithms • The computation time in an iterative algorithm is spent evaluating large numbers of partitions • Iterative algorithms differ from one another primarily in the ways in which they modify the partition and in which they accept or reject bad modifications
Kernighan-Lin (Min-Cut) Algorithms • Two-way partitioning example • Start with 2 equal subgraphs • Exchange k pairs in each iteration • Continue until no further improvement • Gain function • f(internal – external) cost
Hierarchical Clustering – Example Winter 2010- CS 244
Alternate Partitioning Techniques • Start with all functionality in software and move portions into hardware which are time-critical and can not be allocated to software (software-oriented partitioning) • Start with all functionality in hardware and move portions into software implementation (hardware-oriented partitioning) Winter 2010- CS 244
More Partitioning Issues • Partitioning into hardware and software affects overall system cost and performance • Hardware implementation • Provides higher performance via hardware speeds and parallel execution of operations • Incurs additional design expense • Software implementation • Lower performance • Incurs high cost of developing and maintaining (complex) software Winter 2010- CS 244
Functional Co-simulation • Some of the M processors are single-purpose (e.g., those with a single function mapped on to them), others are general purpose • Functions mapped onto the general-purpose processors are implemented in software and simulated on virtual machines with performance models • Functions mapped onto the single-purpose processors are simulated at the behavioral level with performance models • Communication is done via abstract channels • Feedback is used to refine the partitioning and scheduling tasks Winter 2010- CS 244
Communication Synthesis & Bus-accurate Co-simulation • Abstract channels A1, A2 … An are mapped onto a set of communication channels C1, C2 … Cm • Similar to functional partitioning • Similar to hardware/software scheduling • Channels correspond to physical artifacts of the architecture • Hardware and software models are annotated with detailed communication constructs • A hardware model and software model is obtained and co-simulated • Communication synthesis (or possibly higher levels of design) are refined Winter 2010- CS 244
Compilation & Synthesis & Cycle-accurate Co-simulation • Compiler used to generate binary executables for general-purpose processors • Synthesis used to generate gate-level models of single-purpose processors • Synthesis used to generate gate-level models of general-purpose processors • Cycle accurate co-simulation of the entire system • Note: mixed level co-simulation is common Winter 2010- CS 244
Emulate/Prototype and Fabrication • Use hardware (e.g, FPGAs) to emulate a system as fast as possible (relative to real-time) • Fabrication • Place & route • Mask design • Chip testing • Manufacturing fault models • Test vector generation • Packaging Winter 2010- CS 244
Conclusion • Satisfying performance, cost, and quality metrics of a system entails hardware and software codesign • Partitioning is at the heart of codesign • Functional • Communication • Scheduling • Partitioning techniques • Constructive • Iterative • Heuristics often used to bound the running time Winter 2010- CS 244