ICCAD’03 Review CSE 597B Lin Li
Outline • Overview • Archive download URL • Best paper award • Paper from our group • Interesting tutorial • Paper in related areas • Power and energy optimization • Interconnect-centric SoC design • Reliability issues • Performance optimization • Simulation at the nanometer scale • Other areas in ICCAD
Archive Download URL • Papers and presentation slides can be downloaded from: http://www.iccad.com/archive.html
Best Paper Award • 6C.1 - Noise Analysis for Optical Fiber Communication Systems • Alper Demir • KOC University, Sariyer-Istanbul, Turkey • 8B.1 - Block-Based Static Timing Analysis with Uncertainty • Anirudh Devgan, Chandramouli Kashyap • IBM Research at Austin, IBM Microelectronics
Paper from Our Group • 1A.1 - Adaptive Error Protection for Energy Efficiency • Lin Li, N. Vijaykrishnan, Mahmut Kandemir, Mary Jane Irwin • 3C.1 - Array Composition and Decomposition for Optimizing Embedded Applications • Guilin Chen, Mahmut Kandemir, Ugur Sezer, Avanti Nadgir
Interesting Tutorial • 2C.1 - Design and CAD Challenges in sub-90nm CMOS Technology • Kerry Bernstein, Ching-Te Chuang, Rajiv V. Joshi, Ruchir Puri • IBM T.J. Watson • 11B.1 - Formal Methods for Dynamic Power Management • Rajesh K. Gupta, Sandeep Shukla, Sandy Irani • UCSD, UCI, and VT
2C.1 - Design and CAD Challenges in sub-90nm CMOS Technology • Introduction • CMOS device scaling • New devices for high-performance logic • Planar device structures • Partially-depleted (PD) SOI • Fully-depleted (FD) SOI • Strained-Si & high-k gate • Emerging technologies • Double-gate MOSFETs • 3D integration and interconnects • Carbon Nanotube Transistor (CNT) • Molecular computing • CAD challenges • Challenges of Advanced device technologies • Major issues • Power crisis • Coping with Variability
11B.1 - Formal Methods for Dynamic Power Management • Overview of the formal methods that have been explored for solving the system-level Dynamic Power Management (DPM) problem. • Show how formal reasoning frameworks can unify apparently disparate DPM techniques. • Cover approaches that treat the DPM problem as one of stochastic optimization with probabilistic guarantees on performance.
Power and Energy Optimization • Using dynamic voltage scaling in embedded systems (Section 1B) • Using software techniques in embedded systems (Section 3C) • Energy issues in systems design (Section 7B) • Power-aware design (Section 8C)
1B.1 - Generalized Network Flow Techniques for Dynamic Voltage Scaling in Hard Real-Time Systems • Vishnu Swaminathan, Krishnendu Chakrabarty ECE@Duke • Energy consumption must be carefully balanced with real-time responsiveness in hard real-time systems. • Presents an optimal offline dynamic voltage scaling (DVS) scheme for dynamic power management in such systems.
1B.1 - Generalized Network Flow Techniques for Dynamic Voltage Scaling in Hard Real-Time Systems • [Figure: Generalized Network Flow Models for the DVS problem. The network has nodes for jobs (j1 ... jn), speeds (s1h, s1l, s1i, ..., snh, snl, sni), and intervals (D1, D2, ..., D2n-1), plus nodes s and t; each edge (i, j) is labeled lij, uij, Cij, mij.]
1B.2 - Approaching the Maximum Energy Saving on Embedded Systems with Multiple Voltages • Shaoxiong Hua, Gang Qu ECE@UMCP • For a multiple-voltage DVS system to serve a set of applications {(ei, di, pi): i=1, 2, …, n} without missing their deadlines, • if the system has m voltages {v1, v2,… ,vm}, determine the value of each vi to minimize the energy consumption. • determine m and the value of each vi.
1B.2 - Approaching the Maximum Energy Saving on Embedded Systems with Multiple Voltages (Cont’d) • Voltage set-up is the fundamental problem for a multiple-voltage DVS system. • It is application-specific • 2-voltage DVS system: analytic solutions and a linear search algorithm • m-voltage DVS system: no analytic solution exists, so an approximation method is used • A multiple-voltage system can come very close to the maximum energy saving achievable by DVS.
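As a rough illustration of why the choice of voltage levels matters in a 2-voltage system, the sketch below splits one job's cycles between a low and a high operating point so that the job finishes exactly at its deadline. The energy model (energy per cycle proportional to V squared, free voltage switching), the operating points, and the function name `two_voltage_energy` are assumptions made for illustration; this is not the paper's analytic solution.

```python
def two_voltage_energy(cycles, deadline, lo, hi):
    """Energy to finish `cycles` by `deadline` with two operating points.

    `lo` and `hi` are (voltage, frequency) pairs, with hi faster than lo.
    Assumption: energy per cycle is proportional to voltage**2 and
    switching between the two points costs nothing.
    Returns (energy, cycles_run_at_hi), or None if the deadline is infeasible.
    """
    v_lo, f_lo = lo
    v_hi, f_hi = hi
    if cycles > f_hi * deadline:
        return None  # even the high point misses the deadline
    if cycles <= f_lo * deadline:
        return cycles * v_lo ** 2, 0.0  # the low point alone is enough
    # Solve x / f_hi + (cycles - x) / f_lo = deadline for x,
    # the number of cycles that must run at the high point.
    x = (cycles / f_lo - deadline) / (1.0 / f_lo - 1.0 / f_hi)
    energy = x * v_hi ** 2 + (cycles - x) * v_lo ** 2
    return energy, x


if __name__ == "__main__":
    # Hypothetical operating points: (1.0 V, 1 cycle/s) and (2.0 V, 2 cycles/s).
    mixed = two_voltage_energy(1.5, 1.0, lo=(1.0, 1.0), hi=(2.0, 2.0))
    all_hi = 1.5 * 2.0 ** 2  # run everything at the high point
    print(mixed, all_hi)
```

Under these assumptions, mixing the two points uses less energy than running the whole job at the high voltage, and how much is saved depends on where the two voltages sit relative to the workload, which is why voltage set-up is application-specific.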
1B.3 - Combined Dynamic Voltage Scaling and Adaptive Body Biasing for Heterogeneous Distributed Real-Time Embedded Systems • Le Yan, Jiong Luo, Niraj K. Jha EE@Princeton • New scheduling algorithm that combines DVS and adaptive body biasing (ABB) to simultaneously optimize both dynamic power consumption and leakage power consumption for real-time distributed embedded systems.
1B.3 - Combined Dynamic Voltage Scaling and Adaptive Body Biasing for Heterogeneous Distributed Real-Time Embedded Systems • A novel two-phase approach • Phase I: optimal tradeoff between supply and threshold voltages • Phase II: trade off energy consumption and clock period
1B.3 - Combined Dynamic Voltage Scaling and Adaptive Body Biasing for Heterogeneous Distributed Real-Time Embedded Systems • [Flowchart with boxes: Initializations, Phase I, Phase II] • Extensible tasks exist? If no, return • If yes, allocate slack to the reference task (reference task: highest energy_derivative) • Allocate slack to each other task (energy_derivative: higher than reference level) • EST+WCET > LFT? If yes, invalidate this slack allocation
3C.3 - Energy Optimization of Distributed Embedded Processors by Combined Data Compression and Functional Partitioning • Jinfeng Liu, Pai H. Chou ECE@UCI • Goal • Energy minimization for distributed embedded processors • Combined optimization • Selection of optimal compression algorithm • Functional partitioning
3C.3 - Energy Optimization of Distributed Embedded Processors by Combined Data Compression and Functional Partitioning • [Figure: timelines for PROC1 and PROC2 running N1 and N2, with SEND, RECV, COMP, DECO, and IDLE phases against deadline D] • Without compression (PROC1 and PROC2 at 150MHz): a bad partitioning scheme that produces extra I/O load is non-optimal • With compression (PROC1 and PROC2 at 80MHz): the same scheme could turn out optimal if the data from N1 to N2 can be compressed well
3C.4 - Energy-Aware Fault Tolerance in Fixed-Priority Real-Time Embedded Systems • Ying Zhang, Krishnendu Chakrabarty, Vishnu Swaminathan ECE@Duke • Goal: low power, fault-tolerant real-time systems • Fault tolerance is achieved via checkpointing • Power management is carried out using dynamic voltage scaling (DVS).
7B.1 - A Game Theoretic Approach to Dynamic Energy Minimization in Wireless Transceivers • Ali Iranli, Hanif E. Fatemi, Massoud Pedram EE@USC • A hierarchical formulation for energy optimization of wireless transceivers is proposed • A game theoretic approach to solving this energy minimization is proposed, by which the energy consumption is reduced by 15% for BER = 10^-5 • The proposed hierarchical framework can be used in general for energy optimization of server-client systems
7B.1 - A Game Theoretic Approach to Dynamic Energy Minimization in Wireless Transceivers • Transceiver energy optimization as a Stackelberg game • Leader: transmitter • Leader's policy: transmit power & modulation level • Leader's cost function: overall energy consumption • Follower: receiver • Follower's policy: truncation length • Follower's cost function: receiver's energy consumption
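The leader/follower structure of a Stackelberg game can be sketched by brute-force enumeration: for each leader action, the follower picks its own cheapest response, and the leader then picks the action whose anticipated outcome is cheapest for it. The function name `stackelberg`, the discrete action sets, and the toy cost functions below are hypothetical and only illustrate the game structure; they are not the paper's energy models or its solution method.

```python
def stackelberg(leader_actions, follower_actions, leader_cost, follower_cost):
    """Solve a small discrete Stackelberg game by enumeration.

    For every leader action the follower best-responds by minimizing its
    own cost; the leader keeps the action with the lowest resulting cost.
    Returns (leader_action, follower_action, leader_cost_value).
    """
    best = None
    for a in leader_actions:
        # Follower's best response to leader action a.
        b = min(follower_actions, key=lambda fb: follower_cost(a, fb))
        cost = leader_cost(a, b)
        if best is None or cost < best[2]:
            best = (a, b, cost)
    return best


if __name__ == "__main__":
    # Toy stand-ins (assumptions): p plays the role of a transmit power level,
    # t the role of a truncation length.
    def receiver_energy(p, t):
        return t + 5.0 / (p * t)

    def overall_energy(p, t):
        return p + receiver_energy(p, t)

    print(stackelberg([1, 2, 3], [1, 2, 3], overall_energy, receiver_energy))
```

Enumeration is only practical for small action sets; it is used here because it makes the order of play explicit.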
7B.2 - Communication-Aware Task Scheduling and Voltage Selection for Total Systems Energy Minimization • Girish V. Varatkar, Radu Marculescu ECE@CMU • Recent work in the ES community: performance and energy are crucial! • Voltage selection • The task scheduling algorithm should anticipate that voltage selection will follow the scheduling step • The schedule should provide the maximum slowing-down potential • This work brings the communication aspect into the picture • A ‘communication-centric’ approach • A ‘voltage selection’ approach
7B.3 - LRU-SEQ: A Novel Replacement Policy for Transition Energy Reduction in Instruction Caches • Praveen G. Kalla, Xiaobo Sharon Hu, Joerg Henkel CSE@Notre Dame • LRU to LRU-SEQ (Sequential LRU) • Constraining sequential fetches to the same bank (same way) avoids bank transitions. • It also increases the sleep time of the banks, which helps overcome break-even time requirements. • The LRU nature has to be maintained, otherwise associativity is lost (the hit ratio is affected). • The distance between the last fetched line and the present line is a parameter that affects the performance of this policy.
7B.3 - LRU-SEQ: A Novel Replacement Policy for Transition Energy Reduction in Instruction Caches • State holder 1: P_way (entire cache) • State holder 2: P_line (each cache way) • P_( ): Previous_( ) • C_( ): Current_( )
FOR (every cache access) DO
  IF (access == HIT) THEN
    P_way = C_way
  ELSE
    dist = abs(Curr_Addr - Prev_Addr);
    IF (dist <= SEQ_DST) THEN
      C_way = P_way
    ELSE
      C_way = LRU_Way
    END
  END
  Update LRU state for access.
END
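The policy can be tried out with a small simulation. The sketch below is a simplified reading of the pseudocode, not the authors' implementation: the class name `LruSeqCache`, the set-indexing scheme, and the choice to track one previous way and one previous line address for the whole cache (updated after every access, hit or miss) are all assumptions.

```python
class LruSeqCache:
    """Toy set-associative cache with an LRU-SEQ-style replacement policy."""

    def __init__(self, num_sets, num_ways, seq_dst):
        self.num_sets = num_sets
        self.seq_dst = seq_dst  # max line distance treated as sequential
        # tags[s][w] holds the line address cached in set s, way w.
        self.tags = [[None] * num_ways for _ in range(num_sets)]
        # lru[s] lists the ways of set s, least recently used first.
        self.lru = [list(range(num_ways)) for _ in range(num_sets)]
        self.prev_way = None
        self.prev_addr = None
        self.way_transitions = 0  # how often consecutive accesses change way

    def access(self, line_addr):
        """Access one cache line; returns (hit, way_used)."""
        s = line_addr % self.num_sets
        if line_addr in self.tags[s]:
            hit, way = True, self.tags[s].index(line_addr)
        else:
            hit = False
            sequential = (self.prev_addr is not None
                          and abs(line_addr - self.prev_addr) <= self.seq_dst)
            # Sequential miss: reuse the previous way; otherwise plain LRU.
            way = self.prev_way if sequential else self.lru[s][0]
            self.tags[s][way] = line_addr
        # Update LRU state: the used way becomes most recently used.
        self.lru[s].remove(way)
        self.lru[s].append(way)
        if self.prev_way is not None and way != self.prev_way:
            self.way_transitions += 1
        self.prev_way, self.prev_addr = way, line_addr
        return hit, way


if __name__ == "__main__":
    cache = LruSeqCache(num_sets=4, num_ways=2, seq_dst=1)
    for addr in [0, 1, 2, 3, 8, 0]:
        print(addr, cache.access(addr))
    print("way transitions:", cache.way_transitions)
```

In this run the sequential fetches 0 to 3 all land in the same way, so no way transition occurs until the non-sequential access to line 8 falls back to the LRU way.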
7B.4 - Compiler-Based Register Name Adjustment for Low-Power Embedded Processors • Peter Petrov, Alex Orailoglu CSE@UCSD • Compiler-driven register name adjustment for low-power was proposed • Register names reassigned without incurring any performance or power overhead • No hardware support required whatsoever • Efficient algorithm for Register Name Adjustment proposed with additional frequency skew enhancing phase
8C.1 - Leakage Power Optimization Techniques for Ultra Deep Sub-Micron Multi-Level Caches • Nam S. Kim, David Blaauw, Trevor N. Mudge EECS@UMICH • Cost-effective number of VTH levels for cache leakage reduction • Depends on the target access time, but 1 or 2 high VTHs are enough for leakage reduction • Cache leakage • Another design constraint in processor design • Trade-off among delay / area / leakage • Incorporates realistic cache miss statistics into the leakage optimization
8C.1 - Leakage Power Optimization Techniques for Ultra Deep Sub-Micron Multi-Level Caches • [Figure: ITRS 2002 projections with doubling of # of transistors every two years] • Using high-k dielectric reduces gate-oxide leakage
8C.1 - Leakage Power Optimization Techniques for Ultra Deep Sub-Micron Multi-Level Caches • [Figure: cache sub-bank organization showing the decoder, word-line, bit-line pair, memory cell, sense-amp w/ I/O circuits, Abus buffer w/ repeater, and Dbus buffer w/ repeater, labeled with VTH1 to VTH4] • Circuit model based on CACTI • 70nm Berkeley predictive technology model • Interconnect R/C annotated • Repeaters used to minimize interconnect delay
8C.3 - Dynamic Platform Management for Configurable Platform-Based System-on-Chips • Krishna Sekar, Kanishka Lahiri, Sujit Dey ECE@UCSD • Described design techniques for dynamically customizing a general-purpose configurable platform • Dynamic platform management helps combine benefits of general-purpose & application-specific approaches • Benefits • Improved application performance • More efficient platform resource usage • Improved energy efficiency
8C.3 - Dynamic Platform Management for Configurable Platform-Based System-on-Chips • [Figure: platform spectrum] General-purpose processors • General-purpose configurable platforms • Domain-specific platforms • ASIC, custom SoC • Toward the general-purpose end: improving flexibility, time-to-market, engineering cost, time-in-market • Toward the ASIC end: improving performance, power, size • Platform customization techniques yield customized platforms
8C.3 - Dynamic Platform Management for Configurable Platform-Based System-on-Chips • [Figure: dynamic platform management flow] • Inputs: performance objectives, data properties, and processing requirements of each application (Application 1, 2, 3), plus power constraints • Output: optimized platform configuration • General-purpose configurable platform: programmable voltage regulator, embedded processor, PLD, programmable PLL, on-chip communication architecture, flexible on-chip SRAM, reconfigurable cache, parameterized co-processor
1A.2 - SAMBA-Bus: A High Performance Bus Architecture for System-on-Chips • Ruibing Lu, Cheng-Kok Koh ECE@Purdue • Single Arbitration, Multiple Bus Accesses • Automatically delivers multiple bus transactions • High bandwidth • Bus transactions can be performed even without explicit bus access grant from the arbiter • Communication latency increases only slightly even with high arbitration latency
1A.2 - SAMBA-Bus: A High Performance Bus Architecture for System-on-Chips • Two sub-buses: forward sub-bus and backward sub-bus • [Figure: modules M1, M2, M3, M4 attached to the two sub-buses]
1A.3 - The Y-Architecture for On-Chip Interconnect: Analysis and Methodology • Hongyu Chen, Chung-Kuan Cheng, Andrew B. Kahng et al. CSE@UCSD • The Y-architecture for on-chip interconnect is based on pervasive use of 0-, 120-, and 240-degree oriented semi-global and global wiring. • Communication capability (throughput of meshes) is better than that of the Manhattan architecture and the X-architecture. • Better total wire length than both H and X clock tree structures, and better path length than the H tree. • Achieves 8.5% less IR drop than an equally-resourced power network in the Manhattan architecture.
1A.3 - The Y-Architecture for On-Chip Interconnect: Analysis and Methodology 7 x 7 meshes with different interconnect architectures.
3B.4 - Vectorless Analysis of Supply Noise Induced Delay Variation • Sanjay Pant, David Blaauw, Savithri Sundareswaran UMICH, Motorola • Power Supply Integrity Issues • Functional Failure • Voltage fluctuations inject noise in the circuit • Performance Failure • Gate delay is becoming increasingly sensitive to supply voltage • ±10% variation in supply can result in 30% delay increase • Proposed Approach • Vectorless • Conservative in estimating worst-case drop/delay increase • Takes into account both IR and LdI/dt drops
3B.4 - Vectorless Analysis of Supply Noise Induced Delay Variation • [Figure: flow with input vectors, i/p vector search, simulator, power grid, worst voltage drop, library characterization, STA, and worst-case timing] • Voltage Drop Estimation • Worst drop highly dependent on input vectors • Slow simulation times allow only a few vectors to be tried • Worst-Case Voltage Budget Analysis • Highly conservative • Worst-case drop is localized • Ignores voltage shifts between distant driver-receiver pairs
3B.4 - Vectorless Analysis of Supply Noise Induced Delay Variation • [Figure: power grid and ground grid with current i(t) and waveforms V(t) between VDD and GND; gate delay characterization with VDD and GND as variables] • Divide chip into blocks • Compute unit pulse response • Express delay/voltage using spatial/temporal superposition • Formulate delay/voltage maximization as a linear optimization
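The superposition step can be illustrated with a deliberately reduced version of the optimization. In the sketch below, the drop at one observed node is a sum of unit pulse responses weighted by block currents, and the only constraints are per-block current limits, so the maximum can be read off term by term. The function name `worst_case_drop`, the data layout, and the constraint set are assumptions; a formulation with further constraints on the currents would need a general linear-programming solver, and the paper's exact formulation is not reproduced here.

```python
def worst_case_drop(pulse_responses, i_max):
    """Upper bound on the voltage drop at one node by superposition.

    pulse_responses[b][k]: drop at the observed node, at the analysis time,
        caused by a unit current pulse drawn by block b, k time steps earlier.
        Entries may be negative (for example, ringing in the LdI/dt response).
    i_max[b]: assumed upper bound on block b's current in any time step.

    With only the box constraints 0 <= i[b][k] <= i_max[b], the linear
    objective sum(h * i) is maximized by drawing the maximum current
    wherever the response is positive and no current elsewhere.
    Returns (drop, currents), where currents[b][k] is the maximizing current.
    """
    drop = 0.0
    currents = []
    for h_b, cap in zip(pulse_responses, i_max):
        chosen = [cap if h > 0 else 0.0 for h in h_b]
        currents.append(chosen)
        drop += sum(h * i for h, i in zip(h_b, chosen))
    return drop, currents


if __name__ == "__main__":
    # Hypothetical responses for two blocks and their current limits.
    responses = [[0.5, 0.25, -0.125], [0.25, 0.0]]
    limits = [2.0, 4.0]
    print(worst_case_drop(responses, limits))
```

No input vectors are involved: the bound follows from the current limits and the grid responses alone, which is the sense in which such an analysis is vectorless.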
5B.2 - Fault-Tolerant Techniques for Ambient Intelligent Distributed Systems • Diana Marculescu ECE@CMU • Novel techniques for harnessing redundancy as a way for increasing fault-tolerance • Assume a large number of networked devices • Idle devices can act as surrogates for failing ones via application migration or remapping • Scheduling techniques for optimizing system lifetime • Determine optimal migration schedule, under realistic battery models
8C.2 - Dynamic Fault-Tolerance and Metrics for Battery Powered, Failure-Prone Systems • Phillip Stanley-Marbell, Diana Marculescu ECE@CMU • Introduces the concept of adaptive fault-tolerance management for failure-prone systems, and a classification of local algorithms for achieving system-wide reliability.
5B.1 - Cache Optimization For Embedded Processor Cores: An Analytical Approach • Arijit Ghosh, Tony Givargis CS@UCI • An efficient algorithm to directly compute cache parameters satisfying desired performance criteria.
5B.3 - Performance Efficiency of Context-Flow System-On-Chip Platform • Rami Beidas, Jianwen Zhu ECE@Toronto • A new programming model, called context-flow, that is simple, safe, highly parallelizable yet transparent to the underlying architectural details.
7A.1 - A Probabilistic-Based Design Methodology for Nano-Scale Computation • Iris Bahar, Joseph Mundy, Jie Chen Brown • Based on Markov random fields • Propose a new architectural framework designed to handle faulty processes prevalent with nanoscale devices • Dynamically defect tolerant • Adapts to errors as a natural consequence of probability maximization • Removes need to actually detect faults • Can handle both structure- and signal-based faults
7A.1 - A Probabilistic-Based Design Methodology for Nano-Scale Computation • [Figure: carbon nanotubes with an on junction and an off junction] • Carbon Nanotubes (CNTs) • Excellent conductors • Diodes, FETs, and memory arrays using CNTs have been demonstrated • Physical placement of CNTs is an issue • Alumina substrates have been proposed to fabricate arrays of CNTs
7A.1 - A Probabilistic-Based Design Methodology for Nano-Scale Computation • Molecular devices • Direct use of molecules and their electronic states • Conduction achieved by changes in physical configuration or electronic state • Diodes and memory have been demonstrated • [Figure labels: additional electron, switch on]