Harmonic Inc. Real Time, Single Chip High Definition Video Encoder! December 22, 2004
Agenda • TVP2000 Processor overview • Architecture highlights • Performance Benchmarks • Software tools • Status • Roadmap
Telairity’s Market Leading supplier of Video chips for the Broadcast, Professional and Digital Imaging markets
EagleEye - Video Encoder Module (block diagram) • DIMM module carrying the TVP2000 Video Processor, 512 MB DDR2 DRAM, and a voltage regulator (+5 Volts supply) • Video In: 20-bit YCbCr • Reconstructed Video Out: 20-bit YCbCr • Serial Compressed Video Out: 148.5 – 135 Mbps • Clk: 74.25 – 67.5 MHz • Control signals: Interrupt, Reset, SPI
TVP2000 - Video Processor (block diagram) • Video Controller • Five TVP400 processors (P0, P1, P2, P3, P4) • Bit Packing Unit • DMA & SDRAM Controller • 128-bit interface to 512 MB DRAM (8-DDR2)
TVP400 – Vector DSP Core (block diagram) • Scalar Unit with Scalar Registers • Vector Units 0 to 3 with Vector Registers • 4 KB D Cache • 32 KB I Cache • 128 KB Vector SRAM • 8 KB Scratch • I/O Interface: DMA Controller and PIO Controller • 64-bit, 8 GB/s path to DRAM • Internal data paths are labeled 8 GB/s, 12 GB/s, 24 GB/s, and 48 GB/s in the diagram
H.264 Partitioning & Performance Budget • Sub-sample 2% • Motion estimation 40% • Transform & Quantization 5% • Transform size & rate control 6% • Reorder 2% • Entropy coding 20% • Inverse quantization & transform 5% • De-blocking filter 4% • Up-sample 2% • System control 4% • Total 90%
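As a quick arithmetic check of the budget above, the short Python sketch below sums the per-stage percentages from the slide. The dictionary name and layout are only illustrative; the figures are copied from the slide as-is.

```python
# Per-stage share of the processing budget, in percent, as listed on the slide.
budget_percent = {
    "Sub-sample": 2,
    "Motion estimation": 40,
    "Transform & Quantization": 5,
    "Transform size & rate control": 6,
    "Reorder": 2,
    "Entropy coding": 20,
    "Inverse quantization & transform": 5,
    "De-blocking filter": 4,
    "Up-sample": 2,
    "System control": 4,
}

allocated = sum(budget_percent.values())  # share assigned to named stages
unallocated = 100 - allocated             # share the slide leaves unassigned

print(f"allocated: {allocated}%, unallocated: {unallocated}%")
```

The stages add up to the 90% total stated on the slide, so 10% of the budget is not assigned to any listed stage.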
TVP Performance Benchmark • Motion estimation is ~50% of the problem • Typically implemented in a programmable machine • Hardwired approaches are not necessarily applicable • The N-Step Search algorithm was chosen: • Exposes the need for a “Sum of Absolute Differences” compound instruction • Exposes the cache memory line splitting problem • Exposes the cache memory line replacement efficiency • Exposes the inherent parallelism available in the algorithm
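For readers unfamiliar with the "Sum of Absolute Differences" (SAD) cost mentioned above, here is a minimal scalar sketch in Python. The function name, argument order, and default block size are assumptions made for illustration; this is not the TVP2000 compound instruction or its intrinsic, only the arithmetic such an instruction would accelerate.

```python
def sad(cur, ref, cx, cy, rx, ry, size=4):
    """Sum of absolute differences between two size x size blocks.

    cur, ref : frames as lists of rows of integer luma samples
    (cx, cy) : top-left corner of the block in the current frame
    (rx, ry) : top-left corner of the candidate block in the reference frame
    """
    total = 0
    for dy in range(size):
        for dx in range(size):
            # One subtract, one absolute value, one accumulate per pixel:
            # the pattern a compound SAD instruction fuses into one operation.
            total += abs(cur[cy + dy][cx + dx] - ref[ry + dy][rx + dx])
    return total
```

Every pixel pair is independent until the final accumulation, which is one way to see the inherent parallelism the slide refers to.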
TVP2000 - Entropy Coding • CABAC • Cycle count for binarization of arithmetic encoding • Test case: 8 4×4 transform blocks, 9 non-zero coefficients • Benchmark done on: • TVP2000 simulator @ 1 GHz, Apogee C compiler, C only and with vector intrinsics • AMD Opteron processor @ 2.4 GHz, GCC compiler with -O2 • Results • TVP2000, C only: 1276 cycles @ 1 GHz • TVP2000 with vector intrinsics: 201 cycles @ 1 GHz • AMD Opteron: 24000 cycles @ 2.4 GHz • With vector intrinsics, the TVP2000 chip is ~49 times faster than the AMD Opteron chip for binarization of CABAC encoding
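The ~49x figure can be reproduced from the cycle counts and clock rates on the slide by converting each result to elapsed time. The sketch below does that conversion; the helper name `elapsed_us` is an assumption for illustration, and the inputs are the slide's numbers, not new measurements.

```python
def elapsed_us(cycles, clock_ghz):
    """Elapsed time in microseconds for `cycles` at `clock_ghz` GHz."""
    return cycles / (clock_ghz * 1000.0)

opteron_us = elapsed_us(24000, 2.4)   # AMD Opteron, GCC -O2
tvp_c_us = elapsed_us(1276, 1.0)      # TVP2000 simulator, C only
tvp_vec_us = elapsed_us(201, 1.0)     # TVP2000 simulator, vector intrinsics

speedup_c = opteron_us / tvp_c_us      # C-only speedup over the Opteron
speedup_vec = opteron_us / tvp_vec_us  # vector-intrinsics speedup

print(f"C only: {speedup_c:.1f}x, vector intrinsics: {speedup_vec:.1f}x")
```

The comparison is in elapsed time rather than raw cycles, because the two processors run at different clock rates. The ~49x claim corresponds to the vector-intrinsics result; the C-only build comes out at roughly 8x.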
Scalable Encoders • Broadcast Applications • [Diagram: groups of TVP2000 processors plotted against video quality, with formats labeled 4:2:2, 8b; 4:2:2, 10b; and 4:2:0, 8b]
N-Step-Search Algorithm • This algorithm is most widely known in its three-step form, the three-step search (TSS). • [Figure: search grid with candidate points labeled 1, 2, and 3 according to the search step that evaluates them]
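A minimal Python sketch of the three-step search follows, assuming the common textbook formulation: evaluate a 3×3 pattern of candidates around the current best position, keep the lowest-cost one, halve the step, and repeat for steps 4, 2, 1. The function names, the 4×4 block size, and the bounds handling are assumptions for illustration, not the TVP2000 implementation.

```python
def sad(cur, ref, cx, cy, rx, ry, size):
    """Sum of absolute differences between two size x size blocks."""
    total = 0
    for dy in range(size):
        for dx in range(size):
            total += abs(cur[cy + dy][cx + dx] - ref[ry + dy][rx + dx])
    return total


def three_step_search(cur, ref, bx, by, size=4, step=4):
    """Return (mvx, mvy, cost) for the block of `cur` at (bx, by).

    Starts from zero motion; each pass tests the 3x3 pattern of offsets
    {-step, 0, +step} around the best vector so far, then halves the step.
    """
    height, width = len(ref), len(ref[0])
    best_x, best_y = 0, 0
    best_cost = sad(cur, ref, bx, by, bx, by, size)
    while step >= 1:
        center_x, center_y = best_x, best_y
        for oy in (-step, 0, step):
            for ox in (-step, 0, step):
                mvx, mvy = center_x + ox, center_y + oy
                rx, ry = bx + mvx, by + mvy
                # Skip candidates whose block would fall outside the frame.
                if not (0 <= rx <= width - size and 0 <= ry <= height - size):
                    continue
                cost = sad(cur, ref, bx, by, rx, ry, size)
                if cost < best_cost:
                    best_x, best_y, best_cost = mvx, mvy, cost
        step //= 2
    return best_x, best_y, best_cost
```

With an initial step of 4, the search covers displacements up to ±7 pixels while evaluating at most 25 distinct candidates, compared with 225 for an exhaustive search of the same window. The trade-off is that it can settle on a local minimum when the cost surface is not smooth.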