Lower Power VLSI Design Research Trends. VLSI Algorithmic Design Automation Lab. At SKKU J.D. Cho. Low power application to system design (L. Claesen).
Lower Power VLSI Design Research Trends VLSI Algorithmic Design Automation Lab. at SKKU J.D. Cho
Low power application to system design (L. Claesen) • Introduces design considerations for battery-powered portable embedded systems, including micromachined sensors, mixed analog/digital interfaces, embedded processing, data processing, and wireless communication.
Low Power Systems Design (J. Shin) • - Personal Communicator • - Wireless Devices • - Network Devices
CAD tools for power optimization • - Synopsys • - Cadence • - USC power estimation tools (M. Pedram) • PUPPET (PUrdue Power Estimation Techniques) (K. Roy) • OFFIS (D. Rabe)
Power modelling II (J. Figueras) • II.1 Models of Power consumption at RT Level. Examples. • II.2 Models of Power consumption at Algorithmic Level. Examples. • II.3 Hierarchical Models: Trends and applications.
Low power circuit techniques (Ch. Svensson) • Power modelling on circuit level. Node activity. Speed and supply voltage. Flip-flops and latches. Logic. Driving large loads. Clocking and clock distribution. Low swing circuit techniques.
Low power design at the logic level (Ch. Piguet) • Reducing activity and parasitic capacitances, as well as supply voltage and operating frequency. • At the cell level, branch-based logic, gate decomposition, and asynchronous design of flip-flops will be described. • At the block level, gated clock synchronous machines, asynchronous machines, and logic parallelization of counters, memories, and shift registers will be demonstrated.
Power Analysis at System Level: Notebook Computers (K. Roy) • Comparative analysis of power drawn by subsystems (CPU, hard disk, display, and standby) of several commercially available notebooks
Asynchronous design for low power (S. Furber) • Latch optimization differs depending on whether the data flowing through the pipeline is highly correlated. • Exercise: compare the power efficiency of latch designs, or take on the challenge of designing the most power-efficient 32-bit pipeline latch.
Low-power embedded system design (S. Furber) • low-power embedded applications: PDAs, mobile phones, etc. • power-efficient processor cores (ARM) • cache/memory organization for low power • power management • embedded system chips
High level optimization for low power (E. Macii) • Use of parallel and/or pipelined structures, the choice of data representations, the exploitation of signal correlations, the synchronization of signals for glitching minimization, and an accurate analysis of the shared resources. • At the algorithmic level, on the other hand, power savings can be obtained by applying arithmetic and logic transformations to the block diagram specification.
Adiabatic techniques (Ch. Svensson) • Basic idea of adiabatic computing. Energy dissipation in logic. T-gate and retractile logic. Power regenerators. Reversible pipelines.
System-level lower power MPEG-2 decoder, H. Deman • H.263, 2-D motion estimation, MPEG-2 storage management for lower power
VLSI Signal Processing Design Methodology • DSP Preliminaries: Sampling, A/D and D/A conversion, Data-flow graphs, Transfer functions, Filter structures, finite-precision analysis • Adaptive Filtering Basics • VLSI Preliminaries: RTL, Logic and Transistor level modelling, Design Verification, Physical Verification, Power, Area, Speed and Testability
VLSI Signal Processing Design Methodology :ECE497 • pipelining, parallel processing, retiming, folding, unfolding, look-ahead, relaxed look-ahead, and approximate filtering • bit-serial, bit-parallel and digit-serial architectures, carry save architecture • redundant and residue systems • Viterbi decoder, motion compensation, 2D-filtering, and data transmission systems
Applications • Signal Compression: HDTV Standard • Signal Compression: ADPCM, Vector Quantization • Digital Communications: Shaping Filters, Equalizers • Digital Communications: Viterbi decoders, Reed-Solomon decoders
SKKU VADA Lab’s 1999 Research Area VLSI Algorithmic Design Automation Lab. Mar. 1999 J.D.Cho
Low Power DCT/IDCT Processor for H.261 • David Johnson, Venkatesh Akella, and Brett Stott, “Micropipelined Asynchronous Discrete Cosine Transform (DCT/IDCT) Processor,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 6, no. 4, December 1998
Lower Power FIR filter • Optimization algorithms, asynchronous design, pipelining, etc. • Mahesh Mehendale, Sunil D. Sherlekar, G. Venkatesh, “Low-Power Realization of FIR Filters on Programmable DSP’s,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 6, no. 4, December 1998
Asynchronous Viterbi Decoder • T. K. Truong, Ming-Tang Shih, Irving S. Reed, E. H. Satorius, “A VLSI Design for a Trace-Back Viterbi Decoder,” IEEE Trans. Commun., vol. 40, Mar. 1992 • G. Fettweis and H. Meyr, “High-Speed Parallel Viterbi Decoding: Algorithm and VLSI-Architecture,” IEEE Communications Magazine, May 1991
ASIC Design of a CDMA Modem Demodulator and an Integrated Co-design Methodology • W. Ye, R. Ernst, T. Benner, and J. Henkel, "Fast Timing Analysis for Hardware-Software Co-synthesis," Proc. of ICCAD, pp. 452-457, 1993. • CDMA Searcher
Lower Power ADC • Image converter • Error correction logic • Track and amplifier comparator • Lower power while maintaining the speed constraint
Low-Power Circuits Using the Adiabatic Switching Technique • W. Athas, L. Svensson, J. G. Koller, N. Tzartzanis, and E. Y. C. Chou, “Low-power digital systems based on adiabatic-switching principles,” IEEE Trans. on VLSI Systems, Vol, No. 4, Dec. 1994 • A. G. Dickinson and J. S. Denker, “Adiabatic dynamic logic,” IEEE J. Solid-State Circuits, vol. 30, pp. 311-355, Mar. 1995 • Chun-Keung Lo and Philip C. H. Chan, “Design of Low-power differential logic using Adiabatic Switching Technique,” IEEE ISCAS, 1997.
Low-Power Asynchronous Arithmetic Circuits • Because a divider is inherently recursive in its computation, implementing it asynchronously is effective in terms of area and power consumption. • 1. Bellaouar and Elmasry, “Low-Power Digital VLSI Design: Circuits and Systems.” • 2. R. Puri, “Design of a Asynchronous Logic Circuits,” IBM T. J. Watson Research Center, 1988.
Multiplier for High-Speed Operation and Reduced Power Consumption • [1] R. Fried, “Minimizing Energy Dissipation in High-Speed Multipliers,” ACM, 1997 • [2] M. Suzuki et al., “A 1.5 ns 32-b CMOS ALU in Double Pass-Transistor Logic,” IEEE JSSC, vol. 28, no. 11, 1993
Self-Timed Multiplier Using an Asynchronous Cell Library (DCVSL) • Y. Pang, W. Sit et al., “An Asynchronous Cell Library for Self-Timed System Designs,” IEICE Trans. Inf. & Syst., vol. E870 D, no. 3, Mar. 1997 • L. Lavagno, Algorithms for Synthesis and Testing of Asynchronous Circuits, Kluwer Academic Publishers, Boston, 1993
Low-Voltage, High Speed Circuit Design for Gigabit DRAM’s • Abdellatif Bellaouar and Mohamed I. Elmasry, “Low-Power Digital VLSI Design,” Kluwer Academic, pp. 313. • T. Yamagata et al., “Low voltage circuit design techniques for battery-operated and/or Giga-scale DRAMs,” IEEE J. Solid-State Circuits, pp. 1183-1188, Nov. 1995.
Wave-pipelining on FPGA, with 이재형 • Problems with conventional pipelining • Balanced partitioning • Delay element overhead • Tclk > Tmax - Tmin + clock skew + setup/hold time • Increased area, power, and total latency • Clock distribution problem • Wave-pipelining = high throughput w/o such overhead = ideal pipelining
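The clock-period constraint on this slide can be tried out with a few numbers. The sketch below is only an illustration: the function name and all delay values are hypothetical, and it simply evaluates the right-hand side of Tclk > Tmax - Tmin + clock skew + setup/hold time.

```python
def min_wave_clock_period(t_max, t_min, clock_skew, setup_hold):
    """Lower bound on Tclk (same time unit as the inputs), per the slide's inequality."""
    return (t_max - t_min) + clock_skew + setup_hold


# Hypothetical delays in ns, chosen only to show the effect of path balancing.
unbalanced = min_wave_clock_period(t_max=20.0, t_min=8.0, clock_skew=0.5, setup_hold=1.5)
balanced = min_wave_clock_period(t_max=20.0, t_min=18.0, clock_skew=0.5, setup_hold=1.5)

print(unbalanced)  # 14.0
print(balanced)    # 4.0
```

With the longest path unchanged, shrinking the gap between Tmax and Tmin is what lowers the bound, which is why delay balancing matters so much for wave-pipelining.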
Wave-pipelining (WP) on FPGA • LUT delay is similar across different logic functions. • Equal delays can therefore be constructed. • FPGA element delay (wire, LUT, interconnection) • Powerful layout editor • Fast design cycle
WP advantages • Area efficient - no registers, clock distribution network, or clock buffers needed. • Low power dissipation • Higher throughput • Low latency
Disadvantages • Degraded performance (lower throughput) in certain cases • Difficult to achieve sharp rise and fall times in synchronous design • Layout is critical for balancing the delay • Parameter variation - power supply and temperature dependence
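Future Work • The WP multiplier needed many additional LUTs for delay balancing, so it showed no large gain in power consumption. • On an FPGA, using dedicated delay elements for delay balancing, rather than LUTs or net delay, would be more effective. • In addition, designing a multiplier whose paths all have the same number of levels would make WP implementation easier and should yield large gains in power consumption and area over a pipelined structure.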
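Reducing bus transitions with bit-swapping for DSP, w/ 김상규 • A new bit-swapping algorithm is proposed to reduce power consumption in programmable DSPs. • In pipelined programmable processors such as DSPs, the external and internal buses are the main contributors to power consumption, so power reduction must be addressed on both. • An algorithm is proposed for reducing power consumption on the ALU input buses.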
Register power model • $\mathrm{Power}(\mathrm{Register}) \equiv \mathrm{switching}(x)\, C_{\mathrm{in,Register}} + \mathrm{switching}(y)\, C_{\mathrm{out,Register}}$
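A minimal sketch of how the register power model might be evaluated, assuming that switching(x) and switching(y) are the total numbers of bit transitions seen on the register input and output over a sequence of words. The function names, word sequences, and capacitance values are hypothetical, and the result is in relative units.

```python
def count_transitions(words):
    """Total number of bit flips between consecutive words on a bus."""
    return sum(bin(prev ^ cur).count("1") for prev, cur in zip(words, words[1:]))


def register_power(x_words, y_words, c_in, c_out):
    """switching(x) * C_in + switching(y) * C_out, in relative units."""
    return count_transitions(x_words) * c_in + count_transitions(y_words) * c_out


# Hypothetical input/output word sequences and capacitances.
x_words = [0b0100, 0b0111, 0b0000]   # 2 + 3 = 5 transitions
y_words = [0b0000, 0b0100, 0b0111]   # 1 + 2 = 3 transitions
power = register_power(x_words, y_words, c_in=2.0, c_out=3.0)
print(power)  # 19.0
```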
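Motivation • Bus switching (transitions) is the main component of power dissipation in bus-based systems. Power consumed in driving off-chip loads is 70% of total chip power. • This includes the clock lines and internal buses that interconnect datapath elements, such as registers and ALUs, in the CPU and memory. Reducing signal transitions on the buses that feed the ALU inputs reduces not only the power lost on the bus but also the power consumed by the ALU itself.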
Bitwise commutativity property • Y1 = 01000010 + 11001101 = A1 + B1 • Y2 = 10111001 + 01101010 = A2 + B2 • C1 = Y1(A2) EXOR Y2(B2) • Swap Y2(A2) and Y2(B2) bit-wise where the bit in C1 is “1”
A New Bit Swapping • Y1 = 01000010 + 11001101 = A1 + B1 • Y2 = 10111001 + 01101010 = A2 + B2 • C1 = Y1(A1) EXOR Y2(B2) • C2 = Y1(B1) EXOR Y2(A2) • C3 = C1 NOR C2 • Swap Y2(A2) and Y2(B2) bit-wise where the bit in C3 is “1”
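The new bit-swapping rule can be checked on the slide's own operands. The sketch below assumes that bits of A2 and B2 are exchanged exactly where C3 is 1 (the slide does not spell out the condition) and that both operands travel on 8-bit buses; the function and variable names are hypothetical. Swapping the same bit position of two addends leaves their sum unchanged, which is what makes the exchange legal.

```python
def hamming(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def swap_operands(a1, b1, a2, b2, width=8):
    """Exchange bits of a2/b2 where a1 == b2 and b1 == a2 (C3 = C1 NOR C2)."""
    mask = (1 << width) - 1
    c1 = a1 ^ b2
    c2 = b1 ^ a2
    c3 = ~(c1 | c2) & mask           # 1 where both XORs are 0
    new_a2 = (a2 & ~c3 & mask) | (b2 & c3)
    new_b2 = (b2 & ~c3 & mask) | (a2 & c3)
    return new_a2, new_b2


a1, b1 = 0b01000010, 0b11001101      # operands of Y1
a2, b2 = 0b10111001, 0b01101010      # operands of Y2
new_a2, new_b2 = swap_operands(a1, b1, a2, b2)

before = hamming(a1, a2) + hamming(b1, b2)
after = hamming(a1, new_a2) + hamming(b1, new_b2)

print(format(new_a2, "08b"), format(new_b2, "08b"))  # 00111010 11101001
print(before, after)                                  # 12 6
print(a2 + b2 == new_a2 + new_b2)                     # True
```

For this operand pair the two input buses toggle 6 bits instead of 12, while Y2 is unchanged.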
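Experimental Results • Alg1: 29%, Alg2: 35% reduction • Reduced signal transitions lower the overall power dissipation • Effect of reduced signal transitions on the register input bus on ALU power • Consider other problems that the added circuitry introduces for the overall circuit • Verify where, and whether, the technique can be applied in target applications. • Test with many test vectors to increase confidence in the simulation.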
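Lower Power Data Encoding by Minimizing Switching Activity • A method that restructures the Huffman code to reduce the number of switching events while preserving the compression ratio produced by the Huffman coding algorithm. • The idea is to apply a low-switching encoding, such as Gray code, to substreams that share many common subsequences. • Among RISC instruction addressing schemes, Gray-code addressing yields up to 50% power reduction compared with binary-code addressing.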
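Gray Code • Define the Hamming distance between two n-dimensional (n-bit) vectors U = (u_1, u_2, … , u_n) and V = (v_1, v_2, … , v_n) as $h(U,V) = \sum_{i=1}^{n} (u_i \oplus v_i)$, where $(u_i \oplus v_i)$ is 1 if the bits u_i and v_i differ and 0 otherwise. This can also be expressed as the distance traveled along the edges of an n-dimensional hypercube G. Gray code = shortest path in G • Huffman code lengths may differ between characters, and the code must remain prefix-free, so conversion into an exact Gray code is impossible; a compression encoding that minimizes the number of bit changes is needed instead.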
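The Hamming-distance definition above can be used to compare binary and Gray-code addressing on a purely sequential address stream. This is a sketch under that assumption (consecutive 8-bit addresses, no jumps); the function names are hypothetical, and the reflected binary Gray code is used as the Gray encoding.

```python
def to_gray(n):
    """Reflected binary Gray code of n."""
    return n ^ (n >> 1)


def hamming(u, v):
    """h(U, V): number of bit positions in which u and v differ."""
    return bin(u ^ v).count("1")


def sequential_transitions(codes):
    """Total bit flips when the codes are driven onto a bus in order."""
    return sum(hamming(a, b) for a, b in zip(codes, codes[1:]))


bits = 8
addresses = list(range(1 << bits))
binary_flips = sequential_transitions(addresses)
gray_flips = sequential_transitions([to_gray(a) for a in addresses])

print(binary_flips, gray_flips)  # 502 255
```

Every Gray-code step flips exactly one bit, so the sequential case comes close to halving the switching of plain binary counting; real instruction streams contain branches and will save less.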
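2-D Traveling Salesman Problem • The proposed problem assigns code pairs with small Hamming distance to character pairs that are frequently adjacent, so it is formulated as a new problem that handles two or more TSPs simultaneously. • An effective heuristic is proposed. • 10% reduction in switching activity for random uncorrelated data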