Performance Enhancement of Video Compression Algorithms using SIMD
Valia, Shamik
Jamkar, Saket
Motivation
• Understand the SSE architecture
• Understand the video compression algorithm and identify its bottlenecks
• Improve the performance of the video compression algorithm using the SSE platform
Components of Video Compression Algorithm
• Motion Estimation
• Motion Compensation and Image Subtraction
• Discrete Cosine Transform
• Quantization
• Run Length Encoding
• Huffman Coding
Bottlenecks
• Motion Estimation
  • The process of calculating motion vectors by searching a new target image for image blocks taken from a reference image
• DCT
  • Technique to change from the spatial (pixel) domain to the spatial-frequency domain
  • Highest energy compaction after the KLT (Karhunen-Loève transform)
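The block search described above can be sketched in plain C as follows. This is a minimal illustration, not the authors' code: it assumes an exhaustive (full) search and a sum-of-absolute-differences (SAD) cost, and the names `MotionVector`, `block_sad`, and `full_search` are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical result type: best displacement and its matching cost. */
typedef struct { int dx, dy; unsigned sad; } MotionVector;

/* Sum of absolute differences between two n x n blocks that share a stride. */
unsigned block_sad(const uint8_t *a, const uint8_t *b, int stride, int n)
{
    unsigned sum = 0;
    for (int y = 0; y < n; y++)
        for (int x = 0; x < n; x++)
            sum += (unsigned)abs((int)a[y * stride + x] - (int)b[y * stride + x]);
    return sum;
}

/* Take the n x n block at (bx, by) in ref and try every displacement within
   +/-range in target, keeping the one with the lowest SAD. Candidates that
   fall outside the frame are skipped. */
MotionVector full_search(const uint8_t *ref, const uint8_t *target,
                         int width, int height,
                         int bx, int by, int n, int range)
{
    assert(bx >= 0 && by >= 0 && bx + n <= width && by + n <= height);
    MotionVector best = {0, 0, UINT_MAX};
    for (int dy = -range; dy <= range; dy++) {
        for (int dx = -range; dx <= range; dx++) {
            int tx = bx + dx, ty = by + dy;
            if (tx < 0 || ty < 0 || tx + n > width || ty + n > height)
                continue;
            unsigned sad = block_sad(ref + by * width + bx,
                                     target + ty * width + tx, width, n);
            if (sad < best.sad) {
                best.dx = dx;
                best.dy = dy;
                best.sad = sad;
            }
        }
    }
    return best;
}
```

The inner SAD loop runs once per candidate displacement, so its cost dominates the search; that is the part a SIMD instruction set can speed up.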
SSE2 Specifics
• Intel C/C++ Compiler 8
• 3 coding styles
  • Intrinsics
  • Assembly
  • Vector Ops
• Use of Intrinsics
  • _mm_sad_epu8 for the __m128i datatype
  • _m_psadbw for the __m64 datatype
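As an illustration of the `_mm_sad_epu8` intrinsic named above, the sketch below computes the SAD of one 16 x 16 block, one 16-pixel row per instruction. The function names `sad16x16` and `sad16x16_scalar` are hypothetical, unaligned loads are an assumption made for simplicity, and a scalar fallback is compiled when SSE2 is not available.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#define HAVE_SSE2_SAD 1
#endif

/* Scalar reference: SAD over a 16 x 16 block. */
unsigned sad16x16_scalar(const uint8_t *a, const uint8_t *b, int stride)
{
    unsigned sum = 0;
    for (int y = 0; y < 16; y++)
        for (int x = 0; x < 16; x++)
            sum += (unsigned)abs((int)a[y * stride + x] - (int)b[y * stride + x]);
    return sum;
}

/* SSE2 version: _mm_sad_epu8 yields two partial sums per row, one in the
   low 16 bits of each 64-bit half of the result. */
unsigned sad16x16(const uint8_t *a, const uint8_t *b, int stride)
{
    assert(stride >= 16);
#ifdef HAVE_SSE2_SAD
    __m128i acc = _mm_setzero_si128();
    for (int y = 0; y < 16; y++) {
        __m128i ra = _mm_loadu_si128((const __m128i *)(a + y * stride));
        __m128i rb = _mm_loadu_si128((const __m128i *)(b + y * stride));
        acc = _mm_add_epi64(acc, _mm_sad_epu8(ra, rb));
    }
    /* Fold the upper 64-bit half onto the lower one and extract the total. */
    acc = _mm_add_epi64(acc, _mm_srli_si128(acc, 8));
    return (unsigned)_mm_cvtsi128_si32(acc);
#else
    return sad16x16_scalar(a, b, stride);
#endif
}
```

The largest possible total is 16 x 16 x 255 = 65280, so the 32-bit extraction at the end cannot overflow.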
Motion Compensated frames
[Figure: motion-compensated frames for 16 x 16 and 8 x 8 blocks]
Discrete Cosine Transform
• The 2-D DCT is used extensively in the JPEG compression algorithm
• Highly computationally intensive
• FOCUS
  • Explore DCT implementation on SSE2
  • Identify a DCT algorithm that scales with the SIMD architecture
DCT Hardware Accelerator
• Distributed Arithmetic (DA)
• Choice of DA implementation of DCT
  • Scalable with the SSE platform
• 2-D 8x8 DCT operations can be performed as
  • Preprocessing
  • 1-D DCT (using DA)
  • Transpose
  • 1-D DCT (using DA)
  • Post processing
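The row pass, transpose, row pass structure listed above can be sketched in C as follows. This is only an illustration under stated assumptions: the 1-D transform here is a direct floating-point DCT-II with orthonormal scaling, not the distributed-arithmetic version, and the final transpose back to the usual orientation is this sketch's own choice, since the slides do not say what the pre- and post-processing steps contain. All function names are hypothetical.

```c
#include <assert.h>

#define N 8

/* cos(m*pi/16) for m = 0..8; other angles follow by symmetry. */
static const double COS16[9] = {
    1.0, 0.9807852804032304, 0.9238795325112867, 0.8314696123025452,
    0.7071067811865476, 0.5555702330196022, 0.3826834323650898,
    0.1950903220161283, 0.0
};

/* cos(m*pi/16) for any m >= 0, using the period of 32 and mirror symmetry. */
double cos_pi16(int m)
{
    assert(m >= 0);
    m %= 32;
    if (m > 16)
        m = 32 - m;
    return (m > 8) ? -COS16[16 - m] : COS16[m];
}

/* Direct 8-point DCT-II with orthonormal scaling. */
void dct_1d(const double in[N], double out[N])
{
    for (int k = 0; k < N; k++) {
        double sum = 0.0;
        for (int n = 0; n < N; n++)
            sum += in[n] * cos_pi16((2 * n + 1) * k);
        out[k] = sum * (k == 0 ? 0.35355339059327379 : 0.5);
    }
}

/* In-place 8 x 8 transpose. */
void transpose8(double m[N][N])
{
    for (int r = 0; r < N; r++)
        for (int c = r + 1; c < N; c++) {
            double t = m[r][c];
            m[r][c] = m[c][r];
            m[c][r] = t;
        }
}

/* 2-D DCT: transform rows, transpose, transform rows again (the original
   columns), then transpose back so block[v][u] holds coefficient (u, v). */
void dct_2d(double block[N][N])
{
    double tmp[N];
    for (int pass = 0; pass < 2; pass++) {
        for (int r = 0; r < N; r++) {
            dct_1d(block[r], tmp);
            for (int c = 0; c < N; c++)
                block[r][c] = tmp[c];
        }
        transpose8(block);
    }
}
```

Transposing between the passes means the same row-oriented 1-D routine serves both dimensions, which is what lets a single SIMD kernel be reused.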
1-D DCT on SSE2 using DA
[Figure: DA datapath. The inputs are the sums x0+x7, x1+x6, x2+x5, x3+x4 and the differences x0-x7, x1-x6, x2-x5, x3-x4. They feed DAP blocks with ROM lookup over 16-bit paths, with scale factors of 0.5 and 0.25, producing the outputs X0 to X7.]
• Total of 8 DAP structures
• Each DAP completes its operations in 8 cycles
• Scalable to datapath widths of 16, 32, 64, and 128 bits
• DAP subword dest,source
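The idea behind a ROM-based DA unit can be sketched in C as a multiplier-free dot product. This is a generic textbook-style illustration, not the DAP design from the slides: it assumes four 16-bit two's-complement inputs, integer fixed-point coefficients, and one input bit per step (16 steps, whereas the slides state 8 cycles per DAP). The names `da_build_rom` and `da_dot` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define DA_INPUTS 4   /* inputs per unit, so the ROM has 2^4 = 16 entries */
#define DA_BITS   16  /* input word length */

/* Precompute the ROM: entry addr holds the sum of the coefficients whose
   index bit is set in addr. */
void da_build_rom(const int32_t coeff[DA_INPUTS], int32_t rom[1 << DA_INPUTS])
{
    assert(coeff != 0 && rom != 0);
    for (unsigned addr = 0; addr < (1u << DA_INPUTS); addr++) {
        int32_t sum = 0;
        for (int i = 0; i < DA_INPUTS; i++)
            if (addr & (1u << i))
                sum += coeff[i];
        rom[addr] = sum;
    }
}

/* Bit-serial dot product: for each bit position, gather that bit from every
   input to form a ROM address, then accumulate the looked-up value weighted
   by 2^bit. The sign bit of two's complement carries a negative weight. */
int64_t da_dot(const int32_t rom[1 << DA_INPUTS], const int16_t x[DA_INPUTS])
{
    int64_t acc = 0;
    for (int bit = 0; bit < DA_BITS; bit++) {
        unsigned addr = 0;
        for (int i = 0; i < DA_INPUTS; i++)
            addr |= (((unsigned)(uint16_t)x[i] >> bit) & 1u) << i;
        int64_t term = (int64_t)rom[addr] * ((int64_t)1 << bit);
        if (bit == DA_BITS - 1)
            acc -= term;
        else
            acc += term;
    }
    return acc;
}
```

The result equals the ordinary sum of coeff[i] * x[i], but each step needs only a table lookup, a shift, and an add, which is why the structure maps well onto narrow subword datapaths.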
Work done
• Accomplished
  • Motion Estimation coding and analysis
  • DCT hardware accelerator in Verilog
  • ISA extension for DCT implementation
• To be done
  • Synthesis to get delay and area estimates
  • Assembly code with SSE-DCT enhancements and its performance analysis