
Low Power Multimedia Reconfigurable Platforms

Explore the elements essential for implementing high-performance systems, including high speed, reduced-swing logic, low power consumption, and advanced technologies such as deep-submicron processes, low-voltage operation, and channel engineering. Learn design methodologies that minimize power consumption in digital circuits, including supply-voltage control, clocking strategies, and logic design. Discover techniques such as reducing switching activity, reducing transistor count and size, and improving energy efficiency while enhancing system performance.


Presentation Transcript


  1. Low Power Multimedia Reconfigurable Platforms Young-Chul Kim Chonnam National Univ. Dept. of ECE, IT SoC Lab. http://soc.chonnam.ac.kr

  2. Elements for implementing a high-performance system (diagram): High Performance System, High Speed, High Density, Reduced Swing Logic, Deep Submicron Technology, Low Power per Gate, Low Voltage, Channel Engineering, Low Capacitance, Low VT, Advanced Technology

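  3. Considerations on power consumption • Components of power consumption in digital circuits • Dynamic power accounts for the largest share of power consumption. • Given a cell library, the designer can control four factors: switching activity, VDD, clock frequency, and routing capacitance.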
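The dynamic term is commonly written P = α·C·f·VDD², where α is the switching activity. The minimal sketch below evaluates it for assumed, illustrative values (not taken from the slides) and halves each of the four designer-controlled factors in turn to show their relative leverage.

```python
def dynamic_power(activity: float, capacitance_f: float, freq_hz: float, vdd_v: float) -> float:
    """Switching power in watts: P = activity * C * f * Vdd^2."""
    return activity * capacitance_f * freq_hz * vdd_v ** 2


# Illustrative baseline (assumed values): activity 0.1, 1 nF switched, 200 MHz, 1.8 V.
base = dict(activity=0.1, capacitance_f=1e-9, freq_hz=200e6, vdd_v=1.8)
p_base = dynamic_power(**base)

# Halve each knob in turn and report the resulting power relative to the baseline.
for knob in ("activity", "capacitance_f", "freq_hz", "vdd_v"):
    scaled = dict(base, **{knob: base[knob] / 2})
    ratio = dynamic_power(**scaled) / p_base
    print(f"halving {knob:14s} -> {ratio:.2f} x baseline power")
```

Only VDD enters quadratically, so halving it cuts dynamic power to a quarter, while halving any of the other three factors only halves it.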

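  4. Design methods for reducing power consumption • Controlling the supply voltage • Use a high supply voltage only where high speed is needed within the IC. • Put unused blocks into sleep mode to reduce power consumption. • Lowering the operating frequency • Use parallel processing to keep the same throughput at a lower operating frequency; the resulting increase in area is unavoidable. • Avoid large clock buffers. • Use a phase-locked loop (PLL) to raise the frequency only where it is needed.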
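The parallel-processing trade-off can be sketched with the dynamic-power expression P = α·C·f·VDD². The sketch assumes N identical datapaths each clocked at f/N, a capacitance overhead factor for the extra multiplexing and wiring, and a reduced supply voltage that still meets the relaxed timing; all three are illustrative parameters, not values from the slides.

```python
def parallel_power_ratio(n_ways: int, cap_overhead: float, vdd_scale: float) -> float:
    """Power of an N-way parallel design relative to the original single datapath.

    Assumptions (illustrative, not from the slides):
      * each of the N copies runs at f / N, so total throughput is unchanged;
      * total switched capacitance becomes N * cap_overhead times the original
        (cap_overhead > 1 models the extra multiplexing and wiring);
      * the slower clock lets Vdd be reduced to vdd_scale times its original value.
    """
    cap_ratio = n_ways * cap_overhead
    freq_ratio = 1.0 / n_ways
    return cap_ratio * freq_ratio * vdd_scale ** 2


# Without voltage scaling, parallelism alone saves nothing (and costs area):
print(parallel_power_ratio(2, cap_overhead=1.1, vdd_scale=1.0))
# With an assumed 40% supply reduction, the same design needs far less power:
print(parallel_power_ratio(2, cap_overhead=1.1, vdd_scale=0.6))
```

The saving comes entirely from the last factor: parallelism pays off only because the lower clock rate permits a lower supply voltage, and the area increase noted on the slide is the price.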

  5. Design methods for reducing power consumption • Reducing parasitic capacitance • Use short wires on critical nodes. • Avoid fan-outs of three or more. • Reduce wire width when a low supply voltage is used. • Use transistors as small as possible. • Reducing switching activity • Reduce the number of bits. • Use static rather than dynamic circuits. • Reduce the total transistor count. • Assign the most active signals to internal nodes.

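  6. Design methods for reducing power consumption • Reducing switching activity • Design the logic so that the sum over all nodes of (switching frequency × capacitance) is minimized, i.e., so that switching activity is statistically minimized. • When ordering the logic tree, place inputs with higher activity farther from VDD or ground. • Implement high-activity cells as dynamic logic and low-activity cells as static logic. • Turn off the clock of flip-flops whose data do not change. • Provide a way to disable the clock of cells that are not in use at all times.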
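The quantity to minimize, the sum over nodes of toggle rate × capacitance, can be estimated from cycle-by-cycle simulation traces. This is a minimal sketch; the signal names, traces, and the 2 fF / 20 fF node capacitances are hypothetical. It compares putting the most active signal on a lightly loaded internal node versus a heavily loaded output node.

```python
from typing import Dict, List


def toggle_rate(trace: List[int]) -> float:
    """Fraction of clock cycles in which a node changes value (0 -> 1 or 1 -> 0)."""
    if len(trace) < 2:
        return 0.0
    toggles = sum(1 for prev, cur in zip(trace, trace[1:]) if prev != cur)
    return toggles / (len(trace) - 1)


def switched_capacitance(traces: Dict[str, List[int]], caps_ff: Dict[str, float]) -> float:
    """Sum over nodes of (toggle rate x node capacitance), in fF per cycle."""
    return sum(toggle_rate(traces[sig]) * caps_ff[sig] for sig in traces)


# Hypothetical simulation traces (one value per clock cycle).
traces = {
    "sig_fast": [0, 1, 0, 1, 0, 1, 0, 1, 0],  # toggles every cycle
    "sig_slow": [0, 0, 0, 1, 1, 1, 1, 0, 0],  # toggles twice in 8 cycles
}

# Two hypothetical ways of assigning the signals to an internal node (2 fF)
# and a heavily loaded output node (20 fF).
fast_internal = {"sig_fast": 2.0, "sig_slow": 20.0}
fast_output = {"sig_fast": 20.0, "sig_slow": 2.0}

print(switched_capacitance(traces, fast_internal))  # 7.0
print(switched_capacitance(traces, fast_output))    # 20.5
```

Keeping the busy signal on the small internal node cuts the switched capacitance by almost a factor of three in this toy case, which is the intuition behind the "most active node should be internal" guideline.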

  7. ERE Framework • ERE illustrates the performance-energy tradeoff by simultaneously considering the performance improvement, energy savings, and resource efficiency of a system. • i = base configuration with 1 resource • j = new configuration with N resources

$$\mathrm{ERE} = \varepsilon \cdot \eta, \qquad \varepsilon = \frac{E(1,i) - E(N,j)}{E(1,i)}, \qquad \eta = \frac{S(N,j)}{N}, \qquad S(N,j) = \frac{T(1,i)}{T(N,j)}$$

where ε is the fraction of energy saved, η is the normalized efficiency, S(N,j) is the speedup, E is energy, and T is execution time. • ERE suggests 4 DSPs, whereas EDP (energy-delay product), which does not consider efficiency, suggests 12 DSPs.
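A minimal sketch of the ERE computation, reading the metric as (fraction of energy saved) × (speedup / N). The energy and execution-time figures for 1, 4, and 12 DSPs are hypothetical, chosen only to show how ERE and EDP can favor different configurations; they are not the data behind the slide.

```python
def ere(e_base: float, t_base: float, e_new: float, t_new: float, n_resources: int) -> float:
    """ERE = (fraction of energy saved) * (normalized efficiency).

    fraction saved = (E(1,i) - E(N,j)) / E(1,i)
    speedup        = T(1,i) / T(N,j)
    efficiency     = speedup / N
    """
    energy_saved = (e_base - e_new) / e_base
    speedup = t_base / t_new
    efficiency = speedup / n_resources
    return energy_saved * efficiency


def edp(energy: float, delay: float) -> float:
    """Energy-delay product, which ignores how efficiently resources are used."""
    return energy * delay


# Hypothetical (energy, execution time) for 1, 4 and 12 DSPs; not measured data.
configs = {1: (100.0, 10.0), 4: (60.0, 4.0), 12: (50.0, 2.0)}
e1, t1 = configs[1]

best_ere = max((n for n in configs if n > 1), key=lambda n: ere(e1, t1, *configs[n], n))
best_edp = min(configs, key=lambda n: edp(*configs[n]))
print(best_ere, best_edp)  # 4 12
```

EDP keeps rewarding the extra DSPs as long as delay drops, whereas ERE divides the speedup by N, so the 12-DSP configuration is penalized for its poor per-resource efficiency.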

  8. ERE Framework values

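  9. NoC (network on chip), U.C. Berkeley • Implements a communication network structure on a single semiconductor chip • Defines the transfer protocol according to the OSI model • Connects DSPs, microprocessors, memory, etc. on a single chip using HW/SW co-design • Code optimization and a library of low-power software IP • Bus structure for connecting modules • Components • Region: an area that allows a special topology/network structure • Backbone • Wrapper: converts transmitted messages into the appropriate format; complex • Suited to complex, large-scale systems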

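  10. Switch network: CLICHE • Uses the OSI model as the data transfer protocol • Network integrated on the chip (Network on Chip) • Packet data transfer • Components of a large system • Well suited to chip-level integration of heterogeneous components.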

  11. NoC figures of merit (diagram): Scalability, Efficiency, Utilisation, Fault tolerance, Result quality (accuracy), Responsiveness, Materials, Structural, Licencing, Functional, Production, Control, Effort, Time, Risk, Applicability, Coupling, Cohesion, Configurability, Modularity, Computation, Energy consumption, Storage, Communication, Functionality, Capacity, Performance, System Quality, Implementation Complexity, Cost, Variability, Development, Volume, Flexibility, Modifiability, Lifetime, Usability, Manufacturability, Programmability

  12. Low-power issues in NoC

  13. NoC-based application areas (diagram layers: Systems, Platforms, Backbone): Low-power communication systems, High-performance communication systems, Baseband platform, High-capacity communication systems, Personal assistant, Database platform, Data collection systems, Multimedia platform, Entertainment devices, Virtual reality games

  14. NoC design flow R. Marculescu

  15. Structural layers of NOC • Product: system control, product behaviour • Configuration: network management, allocation, operation modes • Applications: resource management, diagnostics, applications • Functions: execution control, functions • Executables: RTOS, code, HW configurations • Hardware units: processors, memories, configurable HW, logic • Resources: resource types, buses, IO • Regions: region types, switches, network interfaces • Communication: channels and protocols

  16. Network protocol (layers: Application, System/Session, Transport, Network, Data link, Physical) • Physical • Signal voltage, timing, bus width, signal synchronization • Data link • Error detection and correction • Arbitration of the physical medium • Network • IP protocol • Data routing • Transport • TCP protocol • End-to-end connection
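Error detection at the data-link layer is often done with a cyclic redundancy check. The slides do not name a specific code, so this minimal sketch assumes CRC-8 with polynomial 0x07 (x⁸ + x² + x + 1), MSB first, zero initial value, and no final XOR; it detects a corrupted frame but does not correct it.

```python
def crc8(data: bytes, poly: int = 0x07, init: int = 0x00) -> int:
    """Bitwise CRC-8 (MSB first, no reflection, no final XOR)."""
    crc = init
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ poly) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc


def make_frame(payload: bytes) -> bytes:
    """Sender side: append the CRC byte to the payload."""
    return payload + bytes([crc8(payload)])


def frame_ok(frame: bytes) -> bool:
    """Receiver side: with this CRC variant, an intact frame re-checks to zero."""
    return crc8(frame) == 0


frame = make_frame(b"\x12\x34\x56\x78")
corrupted = bytes([frame[0] ^ 0x01]) + frame[1:]  # flip one bit, e.g. a crosstalk upset
print(frame_ok(frame), frame_ok(corrupted))       # True False
```

Correction (rather than detection plus retransmission) would need an error-correcting code such as a Hamming code, at the cost of more redundant bits per flit.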

  17. NOC Platform development • Scaling problem • How big a NOC is needed? What are the requirements of the application area? • Region definition problem • What kinds of regions are needed? What kinds of interfaces are needed between regions? What are the capacity requirements for the regions? • Resource design problem • What is needed inside the resources? What type of internal computation and internal communication? • Application mapping flow problem • What kinds of languages, models, and tools must be supported? How can the final products be validated and tested?

  18. NOC Application Development • Mapping problem • How to partition applications onto NOC resources? How to allocate functionality effectively? Is the performance adequate? Is resource usage balanced? • Optimisation problem • How to perform global optimisation of heterogeneous applications? How to define the right optimisation targets? How to utilise application- and resource-type-specific tools? • Validation problem • Are the constraints met? Are there communication bottlenecks or power-consumption hot spots? How to simulate a 10000 GIPS system? How to test all applications?

  19. Network on Chip alternatives • NOC = network of computation and storage resources • NOC parameters: number of resources; types of resources (GPU, DSP, memory, configurable HW, coprocessors, any combination); communication capability

  20. Layered Radio Architecture

  21. Switch network (Srikanteswara) • Stallion processor • Crossbar, similar to circuit switching • Packet data transfer • Layered transfer structure (figure: IFU mesh, Smart Crossbar; Stallion device from Virginia Tech)

  22. Advantages of the Layered Architecture • Defines a methodology for designing multimode radios using hardware paging • Provides a framework for building a flexible soft radio, at the cost of the overhead of packetizing data • Excellent hardware reusability • Libraries of hardware functions can be built, much like software libraries • Good data-flow properties and a simple interface between the processing-layer modules

  23. Stream-based design (figure: Processing Element 1 and Processing Element 2 exchanging stream packets across the Application Layer (Software), Configuration Layer, I/O Layer, and Processing Layer, with processing, configuration, and bypass packet pipelines)

  24. Bus-Based vs. P2P Communication R. Marculescu • Problems with buses: • Interconnections become dominant in DSM (deep submicron) • Huge bandwidth requirements (tens of Gb/s for some applications): buses are not scalable! • Expanding market of mobile and other low-power applications • Increasing cooling costs: buses consume too much power! • P2P communication: • Faster; no bus contention, no bus arbitration • Low-power solution • Can be independently optimized • May need more wiring resources

  25. System Inputs R. Marculescu • A set of IPs: • Hard IP (width × length, provided by different IP providers) • Soft IP (size provided by synthesis or estimation) • Communication Task Graph (CTG)

  26. Target Platform R. Marculescu

  27. MPEG-2 Video Encoder R. Marculescu

  28. Energy Comparison R. Marculescu

  29. Packet-Based On-Chip Communication: Regular Architecture R. Marculescu

  30. Energy-Aware Mapping for Tile-based Architectures R. Marculescu • Objective: minimize the total communication energy consumption • Constraint: meet the communication performance constraints (specified by the designer) • For a 4×4 tile architecture there are 16! (about 2.1 × 10^13) possible mappings
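To make the mapping objective concrete, here is a minimal sketch that evaluates every mapping of a small communication task graph onto a 2×2 mesh and keeps the one with the lowest communication energy. It assumes a per-bit energy model in which a route of Manhattan distance d crosses d + 1 routers and d links; the CTG, core names, and energy constants are hypothetical, and the designer's performance constraint is not modeled. Exhaustive search is only feasible here because 4! = 24; the 16! mappings of a 4×4 mesh call for heuristics.

```python
from itertools import permutations
from typing import Dict, List, Optional, Tuple

Tile = Tuple[int, int]
Edge = Tuple[str, str, int]  # (source core, destination core, volume in bits)


def bit_energy(src: Tile, dst: Tile, e_router: float, e_link: float) -> float:
    """Assumed model: Manhattan distance d means d + 1 routers and d links per bit."""
    d = abs(src[0] - dst[0]) + abs(src[1] - dst[1])
    return (d + 1) * e_router + d * e_link


def mapping_energy(ctg: List[Edge], placement: Dict[str, Tile],
                   e_router: float, e_link: float) -> float:
    """Total communication energy of one mapping: sum of volume x per-bit energy."""
    return sum(vol * bit_energy(placement[a], placement[b], e_router, e_link)
               for a, b, vol in ctg)


def best_mapping(cores: List[str], tiles: List[Tile], ctg: List[Edge],
                 e_router: float = 1.0, e_link: float = 0.5
                 ) -> Optional[Tuple[float, Dict[str, Tile]]]:
    """Exhaustive search over all core-to-tile assignments (one core per tile)."""
    best: Optional[Tuple[float, Dict[str, Tile]]] = None
    for perm in permutations(tiles):
        placement = dict(zip(cores, perm))
        energy = mapping_energy(ctg, placement, e_router, e_link)
        if best is None or energy < best[0]:
            best = (energy, placement)
    return best


# Hypothetical CTG: four cores communicating in a ring.
cores = ["A", "B", "C", "D"]
tiles = [(0, 0), (0, 1), (1, 0), (1, 1)]
ctg = [("A", "B", 100), ("B", "C", 50), ("C", "D", 100), ("A", "D", 10)]

energy, placement = best_mapping(cores, tiles, ctg)
print(energy)  # 650.0
```

In this toy case the optimum places every communicating pair on neighboring tiles, so each bit pays for exactly two routers and one link.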

  31. Tile-based Architecture Platform R. Marculescu

  32. Network-centric Power Management R. Marculescu • Better predictions of future workloads • Network-level power management adds very few overhead packets to the overall communication stream between cores • Less energy is wasted while a core is idle, because the local PM knows ahead of time that no requests will arrive in the near future
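Knowing ahead of time that no requests will arrive is useful only if the local power manager also knows when shutting down pays off. A common rule, assumed here rather than taken from the slides, is a break-even time: sleep only if the predicted idle period is long enough to recover the energy spent entering and leaving the sleep state. The core parameters below are hypothetical.

```python
def break_even_time(p_idle: float, p_sleep: float, e_transition: float, t_transition: float) -> float:
    """Shortest idle period (s) for which sleeping saves energy.

    Staying idle for T costs p_idle * T. Sleeping costs e_transition for the
    transition plus p_sleep for the rest of the period; T must also cover the
    transition time itself.
    """
    t_be = (e_transition - p_sleep * t_transition) / (p_idle - p_sleep)
    return max(t_be, t_transition)


def should_sleep(predicted_idle_s: float, **params: float) -> bool:
    """Local power-manager decision, given an idle time predicted from network traffic."""
    return predicted_idle_s > break_even_time(**params)


# Hypothetical core parameters (W, W, J, s); not from the slides.
core = dict(p_idle=1.0, p_sleep=0.1, e_transition=0.5, t_transition=0.1)
print(round(break_even_time(**core), 3))                     # 0.544
print(should_sleep(1.0, **core), should_sleep(0.2, **core))  # True False
```

The better the network-level prediction of the idle period, the less often the manager pays transition energy for an idle period shorter than the break-even time.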

  33. NoC protocols must be tolerant to common faults R. Marculescu • Data upsets: Crosstalk, EMI • Buffer overflows • Node/link failures • Synchronization errors

  34. Wires-Centric Design • Exploits logic structure to reduce wire loads • Enables use of advanced circuits • wire properties and crosstalk known early and well characterized • Gives a stable design • key wire loads don’t change with small logic changes

  35. Wires dominate - power, area, delay • Problem - Contemporary tools leave wires as an afterthought • result is lack of structure, visibility, and control • Solution 1 - wires first design • route key wires, then place gates • Solution 2 - route packets, not wires • on-chip networks • global wires fixed before the design starts

  36. Wires-first design

  37. On-Chip Interconnection Networks • Replace dedicated global wiring with a shared network (figure: dedicated wiring vs. network)

  38. Most Wires are Idle Most of the Time • Don’t dedicate wires to signals, share wires across multiple signals • Route packets not wires • Organize global wiring as an on-chip interconnection network • allows the wiring resource to be shared keeping wires busy most of the time • allows a single global interconnect to be re-used on multiple designs • makes global wiring regular and highly optimized
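The "most wires are idle" argument can be quantified with a deliberately simple model, which is an assumption here: each global signal is active a given fraction of cycles, a dedicated wire per signal is busy only that fraction of the time, and a shared channel may be loaded up to a target utilization. The activity factors and the 0.8 target are hypothetical.

```python
import math
from typing import List


def dedicated_utilization(activities: List[float]) -> float:
    """Average fraction of cycles a dedicated wire actually carries data."""
    return sum(activities) / len(activities)


def shared_channels_needed(activities: List[float], max_util: float = 0.8) -> int:
    """Channels needed if the same traffic is time-multiplexed over a shared network.

    Simplified model: total demand is the sum of the per-signal activity factors,
    and each shared channel may be loaded up to max_util.
    """
    return math.ceil(sum(activities) / max_util)


# Hypothetical global signals, each active 10% of the time.
signals = [0.1] * 20
print(f"{dedicated_utilization(signals):.2f}")  # 0.10 -> 20 wires, mostly idle
print(shared_channels_needed(signals))          # 3
```

The model ignores packet headers, arbitration, and latency, which is exactly the overhead a real on-chip network adds in exchange for keeping its wires busy.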

  39. Dedicated wires vs. Network

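  40. Power consumption of CMOS circuits

$$P = \underbrace{\alpha\, C_L\, f\, V_{dd}^2}_{\text{charging \& discharging}} + \underbrace{\alpha\, I_{SC}\, t_{sc}\, f\, V_{dd}}_{\text{crowbar current}} + \underbrace{I_{DC}\, V_{dd}}_{\text{static current}} + \underbrace{I_{LEAK}\, V_{dd}}_{\text{subthreshold leakage current}}$$

where α is the switching activity, C_L the load capacitance, f the clock frequency, I_SC and t_sc the short-circuit current and its duration, I_DC the static current, and I_LEAK the subthreshold leakage current.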
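The four-term expression on this slide can be evaluated term by term to see which component dominates. In this minimal sketch the switching-activity factor is written `alpha` for both dynamic terms (the symbols are lost in the original slide text), and all block parameters are hypothetical, lumped values for a whole block.

```python
def cmos_power(alpha: float, c_load: float, f: float, vdd: float,
               i_sc: float, t_sc: float, i_dc: float, i_leak: float) -> dict:
    """Return the four terms of P = a*C*f*V^2 + a*I_sc*t_sc*f*V + I_dc*V + I_leak*V (W)."""
    return {
        "switching (charge/discharge)": alpha * c_load * f * vdd ** 2,
        "short-circuit (crowbar)": alpha * i_sc * t_sc * f * vdd,
        "static (DC)": i_dc * vdd,
        "subthreshold leakage": i_leak * vdd,
    }


# Hypothetical block parameters in SI units, lumped over the whole block.
terms = cmos_power(alpha=0.15, c_load=2e-9, f=100e6, vdd=1.2,
                   i_sc=2.0, t_sc=100e-12, i_dc=0.0, i_leak=2e-3)
total = sum(terms.values())
for name, p in terms.items():
    print(f"{name:30s} {p * 1e3:8.3f} mW  ({100 * p / total:5.1f} %)")
```

With these assumed numbers the switching term is close to 90% of the total, consistent with the earlier observation that dynamic power dominates; leakage grows in relative importance as VT and VDD are scaled down.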

  41. Vdd, power, and current trend (figure: supply voltage [V], power per chip [W], and VDD current [A] versus year, 1998 to 2014; source: International Technology Roadmap for Semiconductors, 1998 update)

  42. New Computing Platforms • SOC power efficiency of more than 10 GOPS/W • Higher on-chip system integration: COTS: 100 W, SOAC: 10 W (inter-chip capacitive loads, I/O buffers) • Speed & performance: shorter interconnections, fewer drivers, faster devices, more efficient processing architectures • Mixed-signal systems • Reuse of IP blocks • Multiprocessor, configurable computing • Domain-specific, combined memory-logic

  43. Power-distribution in integrated PicoRadio (total: 100 mW) Jan M. Rabaey

  44. Web browsing is slow with 802.11 PSM • Users complain about performance degradation (cartoon: Dad: "Son! Haven’t I told you to turn on power-saving mode? Batteries don’t grow on trees, you know!" Son: "But Dad! Performance SUCKS when I turn on power-saving mode!" Dad: "So what! When I was your age, I walked 2 miles through the snow to fetch my Web pages!")

  45. Low Power Methods

  46. Levels for Low Power Design • System: hardware-software partitioning, power down • Algorithm: complexity, concurrency, locality, regularity, data representation • Architecture: parallelism, pipelining, signal correlations, instruction set selection, data representation • Circuit/Logic: sizing, logic style, logic design • Technology: threshold reduction, scaling, advanced packaging, SOI • Expected saving by level of abstraction: algorithm 10 to 100 times; architecture 10 to 90%; logic level 20 to 40%; layout level 10 to 30%; device level 10 to 30%

  47. System Level Power Optimization • Algorithm selection / algorithm transformation • Identification of hot spots • Low Power data encoding • Quality of Service vs. Power • Low Power Memory mapping • Resource Sharing / Allocation
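Bus-invert coding is one well-known instance of low-power data encoding: if sending the next word would toggle more than half of the bus lines, the complement is sent instead and an extra "invert" line is raised. This minimal sketch assumes the bus starts at all zeros and counts toggles on the invert line as well; the data sequence is a worst-case toy example.

```python
from typing import List, Tuple


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def bus_invert_encode(words: List[int], width: int) -> List[Tuple[int, int]]:
    """Return (bus_value, invert_bit) pairs; the bus is assumed to start at all zeros."""
    mask = (1 << width) - 1
    prev_bus = 0
    out = []
    for w in words:
        w &= mask
        if hamming(w, prev_bus) > width // 2:
            bus, inv = (~w) & mask, 1  # fewer lines toggle if we send the complement
        else:
            bus, inv = w, 0
        out.append((bus, inv))
        prev_bus = bus
    return out


def bus_invert_decode(encoded: List[Tuple[int, int]], width: int) -> List[int]:
    """Receiver side: undo the inversion wherever the invert line is set."""
    mask = (1 << width) - 1
    return [(bus ^ mask) if inv else bus for bus, inv in encoded]


def transitions(encoded: List[Tuple[int, int]]) -> int:
    """Total line toggles, including the invert line, starting from all zeros."""
    total, prev_bus, prev_inv = 0, 0, 0
    for bus, inv in encoded:
        total += hamming(bus, prev_bus) + (inv != prev_inv)
        prev_bus, prev_inv = bus, inv
    return total


data = [0x00, 0xFF, 0x00, 0xFF]
raw = transitions([(w, 0) for w in data])              # plain 8-bit bus
coded = transitions(bus_invert_encode(data, width=8))
print(raw, coded)  # 24 3
```

For random data the average saving is far smaller than in this alternating worst case, and one extra wire plus encode/decode logic is the price.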

  48. Flow • C/C++ Compilation • Program Execution • Building design representation • Loading profiling data • Setting constraints • Power estimation • Identification of Hot Spots
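The last two steps of this flow, power estimation and hot-spot identification, can be sketched from profiling data. The sketch assumes a flat average energy per cycle and picks the smallest set of functions covering a chosen fraction of total energy; the function names and cycle counts are hypothetical.

```python
from typing import Dict, List, Tuple


def estimate_energy(profile: Dict[str, int], energy_per_cycle_nj: float) -> Dict[str, float]:
    """Power estimation step: cycles per function x assumed average energy per cycle (nJ)."""
    return {fn: cycles * energy_per_cycle_nj for fn, cycles in profile.items()}


def hot_spots(energy: Dict[str, float], coverage: float = 0.8) -> List[Tuple[str, float]]:
    """Smallest set of functions, largest first, accounting for `coverage` of total energy."""
    total = sum(energy.values())
    ranked = sorted(energy.items(), key=lambda kv: kv[1], reverse=True)
    selected, acc = [], 0.0
    for fn, e in ranked:
        selected.append((fn, e / total))
        acc += e
        if acc >= coverage * total:
            break
    return selected


# Hypothetical cycle counts from profiling a video encoder.
profile = {"motion_estimation": 6_000_000, "dct": 1_500_000,
           "quantize": 800_000, "vlc": 500_000, "io": 200_000}
for fn, share in hot_spots(estimate_energy(profile, energy_per_cycle_nj=0.5)):
    print(f"{fn:18s} {100 * share:5.1f} % of estimated energy")
```

A real estimator would weight cycles by instruction type and memory traffic rather than a single energy-per-cycle constant, but the ranking step stays the same.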

  49. Power-hungry Applications • Signal Compression: HDTV Standard, ADPCM, Vector Quantization, H.263, 2-D motion estimation, MPEG-2 storage management • Digital Communications: Shaping Filters, Equalizers, Viterbi decoders, Reed-Solomon decoders

  50. Clock Network Power Management • Clock network: 50% of the total power • FIR filters (massively pipelined circuits): video processing (edge detection), voice processing, data transmission (e.g., xDSL) • Telephony: idle 50% of the time (70%/30%), because the two parties do not talk at the same time • On every clock cycle, data are loaded into the working register banks even when the data have not changed.
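A minimal behavioral sketch of clock gating: the register bank receives a clock edge only when its enable is asserted, and clock events are counted as a rough proxy for clock and register power. For simplicity the enable here is derived from a data comparison; real designs usually take it from control logic (for example, a data-valid signal), which avoids the comparator. The voice samples are hypothetical.

```python
from typing import Optional


class GatedRegister:
    """Register bank that counts the clock edges it actually receives."""

    def __init__(self, gated: bool) -> None:
        self.gated = gated
        self.value: Optional[int] = None
        self.clock_events = 0

    def tick(self, data: int) -> None:
        # Without gating, every cycle clocks the bank; with gating, only new data does.
        enable = (not self.gated) or data != self.value
        if enable:
            self.clock_events += 1
            self.value = data


# Hypothetical voice-channel samples: silence (repeated zeros) around a short burst.
samples = [0] * 7 + [12, -5, 30] + [0] * 10
plain, gated = GatedRegister(gated=False), GatedRegister(gated=True)
for s in samples:
    plain.tick(s)
    gated.tick(s)
print(plain.clock_events, gated.clock_events)  # 20 5
```

Both registers end up holding the same value; the gated one simply skips the clock edges that would have reloaded unchanged data, which is the waste described on the slide.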
