ASIC Multimedia Chips and a Short Review of the Low Power Multimedia Section of ISSCC 2006. Mentor: Dr. Fakhraii. By: Masoud Rostami
Agenda • PRAMs • Multimedia ASIC Chips • Multimedia Processors in ISSCC06 • Summary • References
Phase-Change Memory • PRAM (Phase-Change Random Access Memory) is attracting great interest as a candidate for the next generation of non-volatile memory devices. • The cell material used in PRAM is a chalcogenide alloy (Ge2Sb2Te5, or GST), which takes either a low-resistivity polycrystalline phase (SET state, for ‘0’) or a high-resistivity amorphous phase (RESET state, for ‘1’). • Conversion between the two phases is realized by resistive heating. • To write a GST cell to the RESET state, the GST compound is heated above its melting point and quenched rapidly. To write a GST cell to the SET state, the GST is heated to a temperature between the crystallization point and the melting point, for a period long enough to crystallize it. • Note: Chalcogenide is the same material used in re-writable optical media (such as CD-RW and DVD-RW).
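The SET/RESET write rule above can be sketched as a tiny state model. This is only an illustration: the temperature and timing thresholds and the function name `apply_pulse` are hypothetical placeholders, not values from the slides or from any PRAM datasheet, and the model assumes every melting pulse ends abruptly enough to quench the cell.

```python
from enum import Enum


class Phase(Enum):
    SET = 0    # low-resistivity polycrystalline phase, stores '0'
    RESET = 1  # high-resistivity amorphous phase, stores '1'


# Hypothetical, illustrative thresholds (not from the slides).
T_CRYSTALLIZE_C = 150.0   # assumed crystallization temperature
T_MELT_C = 600.0          # assumed melting temperature
MIN_SET_HOLD_NS = 100.0   # assumed minimum time needed to crystallize


def apply_pulse(phase, peak_temp_c, hold_ns):
    """Return the cell phase after one resistive-heating pulse."""
    if peak_temp_c > T_MELT_C:
        # Melted, then (by assumption) quenched rapidly: amorphous.
        return Phase.RESET
    if peak_temp_c > T_CRYSTALLIZE_C and hold_ns >= MIN_SET_HOLD_NS:
        # Held between crystallization and melting long enough: crystalline.
        return Phase.SET
    # Pulse too cold or too short: the cell keeps its previous phase.
    return phase
```

A short, hot pulse such as `apply_pulse(Phase.SET, 650.0, 10.0)` gives RESET, while a longer, cooler pulse such as `apply_pulse(Phase.RESET, 300.0, 200.0)` gives SET.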
Multimedia ASIC Chips • Standards and technologies change rapidly, so adaptation to new standards is the key factor of success. • The life span of each piece of hardware keeps getting shorter. • It might be a feast this year but a famine next year. • When a new processor is released for a new standard, it makes huge profits. • Later, that product (the old processor) is out of date while a new processor for the latest standard is still under development. • Therefore, flexible and versatile hardware is required.
Example • SAA7215, SAA7216, SAA7221, SAA7214 by Philips Semiconductors (QFB208): announced in January 2001 and discontinued in March 2002.
Solution: Configurable Video Processors • Since the standards keep changing: • The solution might be a powerful DSP core together with flexible parts. • Look for a stable core and flexible components so that the design can survive minor or major revisions. • The ideal goal is to survive digital video revisions (DVB-S, DVB-T, DVB-S2, HDTV, …). • We should give the consumer access to every kind of peripheral possible: Ethernet, USB, IDE, UART, IrDA, … • It should support as many standards and streams as possible.
Philips: PNX8526 [3]
Continued • It was designed in 0.12 μm technology. • TriMedia is a Philips internal microprocessor core with a proprietary architecture. It is a VLIW with an instruction set optimized for digital media processing. One implementation is the Philips-internal TM3270 synthesizable RTL core. • Nexperia is a product line of chips based on a TriMedia processor with application-specific peripherals. Nexperia chips have part numbers beginning with PNX. [3]
Philips Chips • PNX8526: analog/digital television chip including a 266 MHz MIPS CPU core and a 240 MHz TriMedia processor core, supporting demux and decoding of SDTV (MPEG-2 Main Profile at Main Level) and HDTV (MPEG-2 Main Profile at High Level), with scaling and de-interlacing up to 1920x1080 resolution at 60 interlaced fields per second or 1368x720 resolution at 60 progressive-scan frames per second. • PNX010x: portable audio and multimedia player chip based on ARM7 or ARM9 processors, with NAND flash memory and hard drive interfaces. • PNX1500: media processor based on the TriMedia TM2360 VLIW processor core running at 300 MHz, with an LCD display controller and Ethernet interface.
Continued • PNX1700: features similar to the PNX1500, but based on the TriMedia TM5250 CPU core, with software support for H.264, MPEG-4 (SP, MVP, ASP), WMV9, DivX, and MPEG-2; HDTV-resolution decode is supported for MPEG-2, WMV9, and DivX (but not H.264). • PNX4103: software-programmable mobile multimedia processor, capable of H.264 (unspecified profile) decode at D1 (SDTV) resolution, with stacked DRAM and support for direct and RAM-buffered display interfaces. • PNX7100: DVD recorder chip with MPEG-2 encoding and decoding for interlaced video; includes a 133 MHz MIPS32 system controller core from MIPS Technologies, with additional support for progressive-scan video; fabricated in a Philips 0.12 μm process.
Others • NEC: • uPD61126: MPEG-2 decoder supporting multiple streams at standard television resolution, with noise filters and a range of standard video interfaces, based on two MIPS Technologies 4Kc cores with enhanced security features for set-top boxes. • Broadcom: • BCM2722: VideoCore II multimedia processor, used in the Apple Video iPod; capable of MPEG-4 video encode and decode and designed for low power consumption in battery-powered devices. The package contains a stacked 32-megabit SDRAM, a USB 1.1 slave interface, a camera interface for up to 5 megapixels, and an LCD controller interface, among other interfaces. The BCM2722 is manufactured in a 0.13 μm process technology. • BCM3560:
Low Power Multimedia Section of ISSCC06 • With the availability of increasing data bandwidth, there is a greater demand for much more advanced multimedia processing capabilities, which in turn translates to higher computational and storage requirements on these devices. Compounding this challenge is the ever increasing demand for mobility, dictating that these multimedia functions be performed at the lowest levels of power consumption. • The seven papers in this session focus on recent advances in low power multimedia processing integrated circuits that deliver advanced functionality, such as 3D graphics, high resolution still and video encoding/decoding, and high fidelity audio playback. Results from these papers demonstrate that smart architecture design and implementation techniques, in conjunction with advanced process technology, can deliver very high performance multimedia functionalities at very low power consumption levels.
6.33mW MPEG Audio Decoding on a Multimedia Processor in 0.18 μm Technology • Techniques used to realize low-power multimedia processing: • A parallel-processing DSP for low-voltage operation • Multiple power domains • A conditional pre-charge flip-flop (FF)
Pipelining => Low Frequency => Low Voltage • By making use of hardwired functional blocks and parallel and pipelined processing, the required operating frequency for MPEG decoding can be lowered to 30 MHz. As a result, the supply voltage for MPEG decoding can be reduced to 1.1 V from 1.8 V and 1.3 V, which is especially effective at reducing dynamic power dissipation. The dynamic power is reduced by 62.7%. [4]
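As a sanity check on the 62.7% figure, the sketch below uses the standard first-order CMOS model, in which dynamic power scales with V² · f. It assumes the comparison is between a 1.8 V and a 1.1 V supply at an unchanged clock frequency; the slide does not state how the number was derived, so this is one plausible reading and not the authors' calculation. The function name is illustrative.

```python
def dynamic_power_ratio(v_new, v_old, f_new=1.0, f_old=1.0):
    """Ratio of new to old dynamic power under P ~ C * V^2 * f.

    Switched capacitance and activity factor are assumed unchanged.
    """
    return (v_new / v_old) ** 2 * (f_new / f_old)


# Supply lowered from 1.8 V to 1.1 V, same frequency (assumption).
saving = 1.0 - dynamic_power_ratio(1.1, 1.8)
print(f"dynamic power saving: {saving:.1%}")  # prints "dynamic power saving: 62.7%"
```

Under this model the voltage term alone accounts for the reported saving, because the supply voltage enters quadratically.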
Multi-Bus Architecture • To obtain the high-bandwidth data flow necessary for multimedia signal processing, a multiple-bus architecture is applied. It comprises one high-speed main bus and three peripheral buses. The main bus (288 MB/s) connects data-transfer-intensive blocks, such as the hardwired dedicated DSP, memory card IF, USB2 PHY, etc. The peripheral buses (72 MB/s) connect serial ports, timers, ADC, etc. With this multi-bus architecture, high-volume data can be transferred effectively without conflicting with slow data. External memories are connected via an external memory controller.
A conditional pre-charge FF • In this circuit structure, the clock signal (CLK) is gated by the input signal (D, Db) so that only a minimum number of nodes change, even when the data changes, as shown in Fig. 22.7.3. With a conventional flip-flop, many nodes change uniformly whenever the clock signal toggles, and as a result a large amount of power is dissipated. The proposed conditional pre-charge flip-flop can therefore reduce the dynamic power dissipation associated with the clock signal compared with the conventional flip-flop. [4]
A conditional pre-charge FF • The power dissipation of the flip-flop consists of two parts: • the power dissipation due to transitions of the clock signal (CK) • the power dissipation due to transitions of the data signal (D) [4]
Multi-Power Domain • The processor's power supply is divided into 6 power domains. Each domain is connected to an individual 1.1 V power supply that can be turned off. For example, in the case of AAC decoding, three power domains are turned off. [4]
Chip Micrograph [4]
A 5mW MPEG4 SP Encoder with 2D Bandwidth-Sharing Motion Estimation for Mobile Applications • MPEG-4 codec designs [1-2] have been reported that address the low power requirements demanded by mobile devices. • Three sources consume most of the power in an MPEG-4 encoder: • Motion estimation (ME) generally consumes more than half of the total power because of its high memory access requirements. • The discrete cosine transform/inverse discrete cosine transform (DCT/IDCT) consumes power because of its complex computations. • Data buffering between motion estimation/motion compensation (ME/MC) and quantization/variable length coding (Q/VLC) consumes power because of the SRAM accesses.
System Architecture • At the module level, the design focuses on the ME and DCT designs to reduce power consumption. At the system level, the design reduces the amount of data buffering between the Q and VLC stages. [5]
DCT Architecture • Most DCT coefficients become zero after quantization, so the precision of these coefficients is less important. They can be calculated with less precision to save power, ideally with little drop in quality. A DCT design is adopted that chooses the required precision based on the content. Lower-precision calculations consume less power, which reduces the total power consumption. [5]
DCT & Zero Marker Scheme • A classifier circuit decides the allocation of calculation resources. It is based on the value of the pixel-to-pixel amplitude (PPA) and the quantization parameter (QP). After classification, the number of calculation bits is decided. Both the clock and the combinational circuits are shut down for any unused bits. The quality degradation due to reduced precision is less than 0.1 dB compared with a normal DCT. • A zero marker scheme is adopted to reduce accesses to the SRAM buffer between stages. The data buffered for VLC is already quantized, so it is mostly zero. For every four entries stored in SRAM, a one-bit register records whether they are all zero. If they are, no SRAM read or write is required. This mechanism avoids most buffer accesses between the Q stage and the VLC stage. Depending on the sequence, it can save 86% of data buffering in low bit rate mode and 62% in high bit rate mode.
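The zero marker idea can be illustrated with a small software model. This is a sketch under stated assumptions, not the circuit from [5]: the class name `ZeroMarkerBuffer`, its methods, and the access counter are hypothetical, and the only detail taken from the slide is that one marker bit covers a group of four entries.

```python
GROUP = 4  # entries covered by one marker bit, as described on the slide


class ZeroMarkerBuffer:
    """Toy model of the Q-to-VLC buffer with one zero-marker bit per group."""

    def __init__(self, n_entries):
        if n_entries % GROUP != 0:
            raise ValueError("buffer size must be a multiple of GROUP")
        self.sram = [0] * n_entries
        self.all_zero = [True] * (n_entries // GROUP)  # marker registers
        self.sram_accesses = 0  # counts individual SRAM entry reads/writes

    def write_group(self, g, values):
        """Q stage writes one group of quantized coefficients."""
        if len(values) != GROUP:
            raise ValueError("a group holds exactly GROUP values")
        if all(v == 0 for v in values):
            self.all_zero[g] = True  # only the marker changes, SRAM untouched
            return
        self.all_zero[g] = False
        self.sram[g * GROUP:(g + 1) * GROUP] = values
        self.sram_accesses += GROUP

    def read_group(self, g):
        """VLC stage reads one group back."""
        if self.all_zero[g]:
            return [0] * GROUP  # reconstructed from the marker, no SRAM read
        self.sram_accesses += GROUP
        return list(self.sram[g * GROUP:(g + 1) * GROUP])
```

Writing an all-zero group and then reading it back costs no SRAM accesses in this model, while a group containing any non-zero coefficient costs four on the write and four on the read. The more zeros quantization produces, the more accesses are skipped, which matches the slide's larger saving in low bit rate mode.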
Characteristics [5]
Die Micrograph [5]
A 125μW, Fully Scalable MPEG-2 and H.264/AVC Video Decoder for Mobile Applications [6]
A 120Mvertices/s Multi-threaded VLIW Vertex Processor for Mobile Multimedia Applications [7]
Summary • PRAM seems to be a promising technology for non-volatile memories. • To survive in the multimedia ASIC industry, we must give the consumer some flexibility (configurability) and also some versatility. • In the Multimedia Section of ISSCC06, these techniques were used to lower power consumption without sacrificing performance: • Multi-power domains • Parallel processing for low-voltage operation • Conditional pre-charge DFFs • Multi-threading • Zero-marker scheme • Precision-aware DCT/IDCT block • …
References • [1] S. Kang, et al., “A 0.1um 1.8V 256 Mb 66MHz Synchronous Burst PRAM”, ISSCC 2006 • [2] H. R. Oh, et al., “Enhanced Write Performance of a 64Mb Phase-change Random Access Memory”, ISSCC 2005 • [3] “PNX8526 Datasheet”, Philips Semiconductors • [4] Y. Ueda, et al., “6.33mW MPEG Audio Decoding on a Multimedia Processor”, ISSCC 2006 • [5] C. P. Lin, et al., “A 5mW MPEG4 SP Encoder with 2D Bandwidth-Sharing Motion Estimation for Mobile Applications”, ISSCC 2006 • [6] T. M. Liu, et al., “A 125μW, Fully Scalable MPEG-2 and H.264/AVC Video Decoder for Mobile Applications”, ISSCC 2006 • [7] C. H. Yu, et al., “A 120Mvertices/s Multi-threaded VLIW Vertex Processor for Mobile Multimedia Applications”, ISSCC 2006