MPEG-2 to H.264/AVC Transcoding Techniques
Jun Xin
Xilient Inc., Cupertino, CA
Digital Video Transcoder
[Figure: coded digital video bit-stream “A” → Transcoder → coded digital video bit-stream “B”]
• “A” and “B” may differ in many aspects:
  • coding formats: e.g., MPEG-2 to H.264/AVC
  • bit-rate, frame rate, resolution …
  • features: e.g., error resilience features
  • contents: e.g., logo insertion
Digital Video Transcoding
Applications
• Media storage
  • Transcode broadcast MPEG-2 video to H.264/AVC format, enabling longer recording times
  • Effective for multi-channel recording
• Home gateway
  • Provide a connection to an IPTV set-top box that supports only H.264/AVC
  • Over wireless networks with limited bandwidth
• Other potential uses:
  • Export to mobile devices
  • Internet streaming
  • …
Goals and Challenges
• H.264/AVC: latest video compression standard
  • Promises the same quality as MPEG-2 at half the bit-rate
  • Is being widely adopted
    • HD consumer storage, e.g., HD-DVD and Blu-ray
    • Mobile devices, e.g., Apple iPod, iPhone, Sony PSP
• Convert MPEG-2 video to H.264/AVC format
  • More efficient storage, export to mobile devices, etc.
• Challenges
  • Yield quality similar to full re-encoding, but at much lower cost
  • Key to low cost and high quality: intelligently reusing the information available in the incoming bitstream
  • May be loosely considered a “two-pass coder”
  • Could achieve better quality than full re-encoding at the same complexity
Outline
• Intra-only transcoding techniques
  • Efficient compressed-domain processing
• Inter transcoding techniques
  • Motion mapping / motion reuse
Intra Transcoder – Pixel Domain
[Figure: input MPEG-2 bitstream → VLD/IQ → IDCT → HT → Q → H.264 entropy coding, with a reconstruction loop (inverse Q → inverse HT → pixel buffer) feeding pixel-domain intra prediction and mode decision]
VLD: variable length decoding; (I)Q: (inverse) quantization; IDCT: inverse discrete cosine transform; HT: H.264/AVC 4x4 transform
Compressed Domain Processing?
[Figure: input MPEG-2 bitstream → VLD/IQ → Q → H.264 entropy coding, with inverse Q → coefficient buffer feeding compressed-domain intra prediction and mode decision; no IDCT or HT stage]
VLD: variable length decoding; (I)Q: (inverse) quantization; IDCT: inverse discrete cosine transform; HT: H.264/AVC 4x4 transform
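AVC 4x4 Transform
• Motivation:
  • The DCT requires real-number operations, which may cause inaccuracies (encoder/decoder mismatch) in the inverse transform
  • Better prediction leaves less spatial correlation in the residual, so there is no strong need for real-number operations
• H.264 uses a simple integer 4x4 transform
  • An approximation of the 4x4 DCT
  • Transform: $Y = C_f X C_f^T$, with $C_f = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix}$
  • Inverse transform: $X' = C_i^T Y C_i$, with $C_i = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & \tfrac{1}{2} & -\tfrac{1}{2} & -1 \\ 1 & -1 & -1 & 1 \\ \tfrac{1}{2} & -1 & 1 & -\tfrac{1}{2} \end{bmatrix}$ (scaling is folded into (de)quantization)
  • Note: the ½ in the inverse transform is implemented as a right shift, so the inverse is non-linear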
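To make the note about the ½ factor concrete, here is a minimal pure-Python sketch of the integer core transform. It assumes the standard H.264/AVC forward core matrix and the butterfly form of the inverse in which each ½ becomes a right shift; the scaling that the standard folds into (de)quantization and the decoder's final rounding shift are deliberately left out. The helper names (`forward_ht`, `inverse_ht_1d`, `inverse_ht_core`) are hypothetical.

```python
# Forward core matrix of the H.264/AVC 4x4 integer transform.
CF = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def forward_ht(x):
    """Y = Cf X Cf^T, using only integer additions and multiplications."""
    return matmul(matmul(CF, x), transpose(CF))


def inverse_ht_1d(d):
    """1-D inverse core transform; each 1/2 factor is a right shift (>> 1)."""
    e0 = d[0] + d[2]
    e1 = d[0] - d[2]
    e2 = (d[1] >> 1) - d[3]
    e3 = d[1] + (d[3] >> 1)
    return [e0 + e3, e1 + e2, e1 - e2, e0 - e3]


def inverse_ht_core(y):
    """Apply the 1-D inverse to each row, then each column (no final scaling)."""
    rows = [inverse_ht_1d(r) for r in y]
    cols = [inverse_ht_1d(list(c)) for c in zip(*rows)]
    return transpose(cols)


# The shift breaks linearity: doubling the input does not double the output.
print(inverse_ht_1d([0, 1, 0, 0]), inverse_ht_1d([0, 2, 0, 0]))
```

The last line prints `[1, 0, 0, -1] [2, 1, -1, -2]`: twice the first result would be `[2, 0, 0, -2]`, so the shifted inverse is not linear, which is exactly why it cannot simply be merged with other linear stages.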
Intra Prediction in H.264/AVC
• Motivation: intra frames are natural images, so they exhibit strong spatial correlation
• Pixels in intra-coded frames are predicted from previously coded neighboring pixels
• Prediction can be based on 4x4 blocks or 16x16 macroblocks (or 8x8 blocks in the High profile)
• An encoded mode specifies which neighboring pixels are used for prediction, and how
4x4 Intra Prediction Example
[Figure: current block and its prediction blocks for the Vertical, Horizontal, and Diagonal_Down_Right modes]
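The example modes can be written down directly. This is a minimal sketch covering only four of the nine 4x4 modes (Vertical, Horizontal, DC, and Diagonal_Down_Right), assuming all neighboring pixels are available; `predict_4x4` and its argument names are hypothetical. The neighbor labels follow the usual A..D (above), I..L (left), M (corner) convention.

```python
def predict_4x4(mode, top, left, top_left):
    """Build a 4x4 intra prediction block (all neighbors assumed available).

    top      -- the 4 reconstructed pixels directly above the block (A..D)
    left     -- the 4 reconstructed pixels directly to its left (I..L)
    top_left -- the corner pixel above and to the left (M)
    """
    if mode == "vertical":
        return [list(top) for _ in range(4)]  # copy the row above downwards
    if mode == "horizontal":
        return [[left[y]] * 4 for y in range(4)]  # copy the left column across
    if mode == "dc":
        dc = (sum(top) + sum(left) + 4) >> 3  # rounded mean of the 8 neighbors
        return [[dc] * 4 for _ in range(4)]
    if mode == "diagonal_down_right":
        # Neighbors ordered along the edge from bottom-left to top-right:
        # L K J I M A B C D, so the corner M sits at index 4.
        edge = list(reversed(left)) + [top_left] + list(top)
        pred = [[0] * 4 for _ in range(4)]
        for y in range(4):
            for x in range(4):
                c = 4 + x - y  # all pixels on one diagonal share a filtered value
                pred[y][x] = (edge[c - 1] + 2 * edge[c] + edge[c + 1] + 2) >> 2
        return pred
    raise ValueError(f"unsupported mode: {mode}")


top, left = [10, 20, 30, 40], [50, 60, 70, 80]
for name in ("vertical", "horizontal", "dc", "diagonal_down_right"):
    print(name, predict_4x4(name, top, left, 5))
```

The encoder evaluates each candidate block against the current block and keeps the mode that minimizes its cost.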
Compressed Domain Processing?
• Challenges
  • Different transforms
    • MPEG-2 uses a floating-point DCT
    • H.264/AVC uses an integer transform
  • New prediction modes in H.264/AVC
    • Can prediction be performed in the compressed domain?
• Goals
  • Simpler computation and architecture
Intra Transcoder – Proposed
[Figure: input MPEG-2 bitstream → VLD/IQ → DCT-to-HT conversion (S-transform) → Q → entropy coding, with a reconstruction loop (inverse Q → inverse HT → pixel buffer) feeding HT-domain intra prediction and HT-domain mode decision]
VLD: variable length decoding; (I)Q: (inverse) quantization; IDCT: inverse discrete cosine transform; HT: H.264/AVC 4x4 transform
Techniques
• DCT-to-HT conversion
• Compressed (HT) domain prediction
  • Very simple for some prediction modes
• Compressed-domain distortion calculation in mode decision
• Advantages
  • Lower computational complexity
  • No quality loss
DCT-to-HT Conversion
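One way to derive the conversion, assuming the orthonormal 8x8 DCT of MPEG-2 and that each 8x8 block is split into four 4x4 blocks for the HT (the symbols $T$, $K$, and $S$ are introduced here for illustration): let $C$ be the dequantized 8x8 DCT coefficients, $T$ the 8-point DCT matrix, and $C_f$ the 4x4 HT core matrix. Then

$$
X = T^{T} C\, T, \qquad
K = \begin{bmatrix} C_f & 0 \\ 0 & C_f \end{bmatrix}, \qquad
Y = K X K^{T} = (K T^{T})\, C\, (K T^{T})^{T} = S\, C\, S^{T}, \quad S = K T^{T}.
$$

Here $Y$ holds the four 4x4 HT blocks in the same arrangement as the 8x8 block, so the pixel-domain IDCT and the HT collapse into one fixed 8x8 kernel $S$ applied along the columns and then the rows.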
DCT-to-HT Conversion: Transform Kernel Matrix
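The kernel can be checked numerically. The sketch below assumes the orthonormal 8-point DCT and a kernel of the form $S = \mathrm{diag}(C_f, C_f)\,T^T$; the helper names `dct8_matrix`, `dct_to_ht`, and `pixel_domain_path` are hypothetical. It builds $S$ in floating point and compares the two-stage conversion (columns, then rows) with the IDCT-then-HT path. It does not reproduce the fast factorization behind the operation counts on the complexity slide.

```python
import math
import random

# Forward core matrix of the H.264/AVC 4x4 integer transform.
CF = [[1, 1, 1, 1], [2, 1, -1, -2], [1, -1, -1, 1], [1, -2, 2, -1]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def dct8_matrix():
    """Orthonormal 8-point DCT matrix T, so that C = T X T^T and X = T^T C T."""
    rows = []
    for k in range(8):
        scale = math.sqrt(1 / 8) if k == 0 else math.sqrt(2 / 8)
        rows.append([scale * math.cos((2 * n + 1) * k * math.pi / 16)
                     for n in range(8)])
    return rows


def block_diag_cf():
    """K = diag(Cf, Cf): a 4-point HT on each half of an 8-sample vector."""
    k = [[0] * 8 for _ in range(8)]
    for i in range(4):
        for j in range(4):
            k[i][j] = CF[i][j]
            k[i + 4][j + 4] = CF[i][j]
    return k


T = dct8_matrix()
K = block_diag_cf()
S = matmul(K, transpose(T))  # fixed 8x8 conversion kernel


def dct_to_ht(coeffs):
    """Two stages: S along the columns, then S along the rows (Y = S C S^T)."""
    return matmul(matmul(S, coeffs), transpose(S))


def pixel_domain_path(coeffs):
    """Reference: 8x8 IDCT to pixels, then the HT on each 4x4 sub-block."""
    pixels = matmul(matmul(transpose(T), coeffs), T)
    return matmul(matmul(K, pixels), transpose(K))


random.seed(0)
c = [[random.uniform(-100, 100) for _ in range(8)] for _ in range(8)]
direct, reference = dct_to_ht(c), pixel_domain_path(c)
print(max(abs(direct[i][j] - reference[i][j]) for i in range(8) for j in range(8)))
```

The printed difference is at floating-point round-off level, since both paths compute the same product; the practical gain comes from applying the precomputed $S$ with a fast algorithm and without intermediate rounding to pixels.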
Fast Algorithm (1D)
Complexity Analysis
• Transform-domain DCT-to-HT conversion (S-transform): 704 operations
  • 352 multiplications
  • 352 additions
• Pixel-domain mapping (IDCT* followed by HT): 992 operations
  • 256 multiplications
  • 64 shifts
  • 672 additions
• Advantages
  • 29% saving in total operations
  • Two-stage vs. six-stage implementation
  • Better performance: no intermediate rounding
* W. H. Chen, C. H. Smith, and S. C. Fralick, "A Fast Computational Algorithm for the Discrete Cosine Transform," IEEE Trans. on Communications, vol. COM-25, pp. 1004-1009, 1977.
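As a quick check, the totals and the quoted saving follow from the listed operation counts:

$$
352 + 352 = 704, \qquad 256 + 64 + 672 = 992, \qquad \frac{992 - 704}{992} = \frac{288}{992} \approx 0.290 \;(29\%).
$$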
Conventional Mode Decisions
• Given all possible prediction modes, the encoder needs to decide which one to use
• Low-complexity mode decision rule (RDO_Off): choose the mode with the minimum SATD cost
• High-complexity mode decision rule with rate-distortion optimization (RDO_On): choose the mode with the minimum RD cost
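For reference, a minimal sketch of the two cost functions on a 4x4 block. The forms used here (SATD plus λ times the mode bits; SSD of the reconstruction plus λ times all bits) are the generic ones commonly used in H.264/AVC encoders and are assumptions, as are the function names and the `lam`, `mode_bits`, and `total_bits` inputs; the slide's exact weighting may differ, and SATD normalization varies between encoders.

```python
# A 4x4 Hadamard matrix (rows are mutually orthogonal +/-1 vectors).
H4 = [[1, 1, 1, 1], [1, 1, -1, -1], [1, -1, -1, 1], [1, -1, 1, -1]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def satd_4x4(orig, pred):
    """Sum of absolute Hadamard-transformed differences (unnormalized)."""
    diff = [[orig[i][j] - pred[i][j] for j in range(4)] for i in range(4)]
    coeffs = matmul(matmul(H4, diff), transpose(H4))
    return sum(abs(c) for row in coeffs for c in row)


def ssd_4x4(orig, recon):
    """Sum of squared differences between original and reconstructed pixels."""
    return sum((orig[i][j] - recon[i][j]) ** 2 for i in range(4) for j in range(4))


def satd_cost(orig, pred, lam, mode_bits):
    """RDO_Off-style cost: needs only the prediction, not a full encode."""
    return satd_4x4(orig, pred) + lam * mode_bits


def rd_cost(orig, recon, lam, total_bits):
    """RDO_On-style cost: needs the reconstruction and the exact bit count."""
    return ssd_4x4(orig, recon) + lam * total_bits


orig = [[10] * 4 for _ in range(4)]
pred = [[9] * 4 for _ in range(4)]
print(satd_cost(orig, pred, lam=2, mode_bits=1), rd_cost(orig, pred, lam=2, total_bits=5))
```

The asymmetry is the point: `satd_cost` works on the prediction alone, while `rd_cost` needs a reconstruction and an exact bit count, which is why RDO_On requires a full encode and decode per mode.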
Conventional RD Cost Computation
• The entire encoding and decoding process needs to be performed for every candidate mode
Motivation & Previous Approaches
• RD_Cost-based mode decision gives the best performance, but is very expensive to compute
• Previous efforts on fast intra mode decision
  • Directional field
  • Edge histogram
  • Other pixel-domain approaches
  • They all lead to lower coding performance
• Our approach is based on transform-domain processing: no loss in coding performance
Transform Domain RD Cost Computation
• No inverse transform
• Transforms of some prediction signals are easy to compute
• Distortion is calculated in the transform domain
HT of DC Prediction
• No HT needs to be performed
• P_dc has only one non-zero element
HT of Horizontal Prediction
• Only one 1-D HT is needed
• P_h has only four non-zero elements (the first column)
HT of Vertical Prediction
• Only one 1-D HT is needed
• P_v has only four non-zero elements (the first row)
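The sparsity claims for the DC, horizontal, and vertical predictions follow from the fact that every non-DC row of $C_f$ sums to zero. A minimal sketch (function names hypothetical) that builds each transformed prediction directly, using at most one 1-D HT, and checks it against the full 2-D HT of the pixel-domain prediction block:

```python
# Forward core matrix of the H.264/AVC 4x4 integer transform.
CF = [[1, 1, 1, 1], [2, 1, -1, -2], [1, -1, -1, 1], [1, -2, 2, -1]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def forward_ht(x):
    """Full 2-D HT, Y = Cf X Cf^T, used only as a reference."""
    return matmul(matmul(CF, x), transpose(CF))


def ht_1d(v):
    """One 4-point HT of a vector: Cf v."""
    return [sum(CF[i][k] * v[k] for k in range(4)) for i in range(4)]


def ht_of_dc_prediction(dc):
    """A flat block transforms to a single coefficient: 16 * dc at (0, 0)."""
    y = [[0] * 4 for _ in range(4)]
    y[0][0] = 16 * dc
    return y


def ht_of_horizontal_prediction(left):
    """Rows are constant, so only column 0 survives: 4 * (Cf left)."""
    col = ht_1d(left)
    y = [[0] * 4 for _ in range(4)]
    for i in range(4):
        y[i][0] = 4 * col[i]
    return y


def ht_of_vertical_prediction(top):
    """Columns are constant, so only row 0 survives: 4 * (Cf top)."""
    row = ht_1d(top)
    y = [[0] * 4 for _ in range(4)]
    y[0] = [4 * v for v in row]
    return y


top, left = [10, 20, 30, 40], [50, 60, 70, 80]
print(ht_of_vertical_prediction(top))
```

The saving comes from replacing a full 2-D transform (eight 1-D transforms) per candidate mode with one 1-D transform, or none at all for DC.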
Calculate Distortion in Transform Domain
• Distortion in the pixel domain: computed on reconstructed pixels
• Distortion in the transform domain: computed directly on HT coefficients, with no inverse transform
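Because the rows of $C_f$ are orthogonal with squared norms 4, 10, 4, 10, a pixel-domain SSD can be recovered exactly from HT coefficients by weighting each squared coefficient difference by $1/(n_i n_j)$. A minimal sketch of that identity (the names `ssd_pixel`, `ssd_transform`, and `ROW_NORM2` are hypothetical). It holds for the linear transform; in a real codec the reconstruction also passes through dequantization and the integer inverse with right shifts, so the slide's exact transform-domain formula may differ.

```python
import random
from fractions import Fraction

# Forward core matrix of the H.264/AVC 4x4 integer transform.
CF = [[1, 1, 1, 1], [2, 1, -1, -2], [1, -1, -1, 1], [1, -2, 2, -1]]
# Cf Cf^T = diag(4, 10, 4, 10): the rows are orthogonal with these squared norms.
ROW_NORM2 = [4, 10, 4, 10]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def forward_ht(x):
    return matmul(matmul(CF, x), transpose(CF))


def ssd_pixel(a, b):
    """Reference distortion computed on pixels."""
    return sum((a[i][j] - b[i][j]) ** 2 for i in range(4) for j in range(4))


def ssd_transform(ya, yb):
    """Same SSD from HT coefficients: sum of dY_ij^2 / (n_i * n_j), exactly."""
    return sum(Fraction((ya[i][j] - yb[i][j]) ** 2, ROW_NORM2[i] * ROW_NORM2[j])
               for i in range(4) for j in range(4))


random.seed(0)
a = [[random.randint(0, 255) for _ in range(4)] for _ in range(4)]
b = [[random.randint(0, 255) for _ in range(4)] for _ in range(4)]
print(ssd_pixel(a, b), ssd_transform(forward_ht(a), forward_ht(b)))
```

Using `Fraction` keeps the check exact; in an encoder the weights would be folded into fixed-point constants instead.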
Ranking-based Fast Mode Decision
• Two cost functions: SATD_Cost and RD_Cost
• Observation: the best mode according to RD_Cost usually has a relatively small SATD_Cost
• Proposed algorithm (mode reduction): rank the modes by SATD_Cost, then calculate RD_Cost only for the top few modes
• The algorithm can be carried out in the transform domain
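The mode-reduction rule fits in a few lines. A minimal sketch, assuming the two costs are available as per-mode callables; `ranked_mode_decision` and the toy cost tables are hypothetical.

```python
def ranked_mode_decision(modes, satd_cost, rd_cost, k=3):
    """Rank modes by the cheap SATD cost, then evaluate RD cost for the top k only."""
    ranked = sorted(modes, key=satd_cost)  # stable sort keeps mode order on ties
    return min(ranked[:k], key=rd_cost)


# Toy per-mode costs for the nine 4x4 intra modes (illustrative values only).
satd = {0: 50, 1: 40, 2: 30, 3: 80, 4: 90, 5: 60, 6: 70, 7: 20, 8: 100}
rd = {0: 5, 1: 12, 2: 9, 3: 3, 4: 20, 5: 8, 6: 7, 7: 15, 8: 30}

print(ranked_mode_decision(range(9), satd.get, rd.get))  # evaluates RD cost 3 times
print(min(range(9), key=rd.get))                         # full search: 9 evaluations
```

Only three RD_Cost evaluations are needed instead of nine. The toy numbers also show the risk: the full search picks mode 3, which the SATD ranking excludes, so the shortcut settles for mode 2. How often this happens on real content is what the verification experiment measures.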
Verification Experiment
• Count the percentage of cases in which the best mode according to RD_Cost is among the best k modes ranked by SATD_Cost
• k is fixed at 3 in all simulations
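The statistic itself is simple to compute. A minimal sketch, assuming per-block SATD and RD costs have been logged for every mode; `top_k_hit_rate` and the input layout are hypothetical.

```python
def top_k_hit_rate(trials, k=3):
    """Percentage of blocks whose best RD mode is among the k best SATD modes.

    trials: list of (satd_costs, rd_costs) pairs, each a dict keyed by mode.
    """
    hits = 0
    for satd, rd in trials:
        best_rd_mode = min(rd, key=rd.get)
        top_k = sorted(satd, key=satd.get)[:k]
        hits += best_rd_mode in top_k  # True counts as 1
    return 100.0 * hits / len(trials)


satd = {0: 1, 1: 2, 2: 3, 3: 4}
trials = [
    (satd, {0: 9, 1: 1, 2: 5, 3: 7}),  # best RD mode 1 is in the top 3: hit
    (satd, {0: 9, 1: 8, 2: 7, 3: 1}),  # best RD mode 3 ranks 4th: miss
]
print(top_k_hit_rate(trials))
```

With one hit and one miss, the example reports 50.0.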
Simulation Conditions
• Three transcoders
  • PDT: reference pixel-domain transcoder, with fast IDCT implemented
  • TDT: transform-domain transcoder
  • TDT-R: transform-domain transcoder with ranking-based mode decision
• Test sequences
  • 100 frames, CIF size, 30 fps
  • Input: MPEG-2, all-I, at 6 Mbps
Simulation – “Mobile”
Simulation – “Stefan”
Complexity: Run-time Results
Summary of Intra Transcoding
• Efficient transcoder architecture
• Efficient mode decision
  • Transform-domain distortion calculation
  • Ranking-based mode decision
• Achieved virtually the same quality as the reference transcoder with significantly lower complexity
Prediction Transcoder Architecture
[Figure: MPEG-2 decoder → decoded picture and macroblock data; motion and modes → motion/mode mapping → H.264/AVC encoding loop (HT/Q → entropy coding; inverse Q/inverse HT → deblocking filter → pixel buffers)]
Assumptions
• Input
  • MPEG-2 frame pictures
• Output
  • H.264/AVC Baseline profile (no B slices) and Main profile
  • Frame pictures; MBAFF not considered
  • Block partition sizes considered for motion compensation: 16x16, 16x8, 8x16, and 8x8
Motion Mapping: Problems
Motion Mapping Algorithm
• Field-to-frame mapping: convert MPEG-2 field motion vectors (if any) to frame vectors
• Reference picture mapping: needed for B-to-P frame type conversion
• Block size mapping: map the MPEG-2 motion vectors to H.264/AVC motion vectors for the target block sizes
  • Algorithm: distance-weighted average (DWA)
• Motion refinement: (1+1/2+1/4) around the estimated motion vectors, for all block partitions
• Note: for B-slice output, the above mapping is performed for the motion vectors of both directions
Field-to-frame Conversion
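MPEG-2 field motion vectors measure vertical motion in field lines, so they must be rescaled before they can serve as frame vectors. A minimal sketch, assuming half-pel MPEG-2 vectors, a parity convention of 0 = top and 1 = bottom field, and simple averaging when a macroblock carries two field vectors. The slides do not say how the two vectors are combined, and `field_to_frame_mv` is a hypothetical name.

```python
def field_to_frame_mv(field_mvs):
    """Convert the MPEG-2 field motion vectors of one macroblock to one frame vector.

    field_mvs: list of (mv_x, mv_y, cur_parity, ref_parity) tuples, with the
    vector in half-pel field units and parity 0 = top field, 1 = bottom field.
    Returns (mv_x, mv_y) in half-pel frame units.
    """
    xs, ys = [], []
    for mv_x, mv_y, cur_parity, ref_parity in field_mvs:
        xs.append(mv_x)  # horizontal sampling is the same for fields and frames
        # One field line spans two frame lines, and a bottom-field line sits one
        # frame line (two half-pels) below the top-field line with the same index.
        ys.append(2 * mv_y + 2 * (ref_parity - cur_parity))
    n = len(field_mvs)
    return sum(xs) / n, sum(ys) / n


# Top field predicted from the top field, bottom field predicted from the top field.
print(field_to_frame_mv([(4, 3, 0, 0), (6, 2, 1, 0)]))
```

The example prints `(5.0, 4.0)`. The result would still need rounding and conversion to H.264/AVC quarter-pel units (a factor of 2) before refinement.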
Reference Picture Mapping
[Figure: input GOP I B B P mapped to output I P P P; input reference distance ti = 3, output distance to = 1; input vectors MVcol, MVi,forw, and MVi,back mapped to the output vector MVo]
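When IBBP input becomes IPPP output, each output P picture predicts from the immediately preceding picture (to = 1), while the input vectors span longer distances, for example ti = 3 for the co-located vector MVcol. One common way to bridge this, shown here only as a sketch under a constant-motion assumption, is to rescale vectors linearly by temporal distance and to flip the sign of backward vectors. The function names are hypothetical, and the slides do not specify how MVcol, MVi,forw, and MVi,back are combined into MVo.

```python
from fractions import Fraction


def scale_mv(mv, t_in, t_out):
    """Linearly rescale a motion vector that spans t_in pictures to span t_out."""
    factor = Fraction(t_out, t_in)
    return (mv[0] * factor, mv[1] * factor)


def backward_to_forward(mv_back, t_back, t_out):
    """Turn a backward MV (pointing t_back pictures ahead) into a forward MV
    pointing t_out pictures back, assuming constant motion."""
    return scale_mv((-mv_back[0], -mv_back[1]), t_back, t_out)


print(scale_mv((6, -3), 3, 1))          # MVcol over ti = 3 -> candidate for to = 1
print(backward_to_forward((4, 2), 2, 1))
```

For instance, an MVcol of (6, -3) spanning ti = 3 becomes (2, -1) for to = 1. `Fraction` keeps the scaling exact, so rounding to the output vector precision can be done in one place.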
Block Size Mapping: 16x8 and 8x16
Block Size Mapping: 8x8
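Because MPEG-2 carries at most one frame vector per 16x16 macroblock, a smaller H.264/AVC partition has to borrow motion from the co-located and neighboring macroblocks. A minimal sketch of a distance-weighted average, assuming inverse Euclidean distance between the partition center and each macroblock center as the weight. The slides name DWA but do not give the weights or the neighbor set, and `dwa_motion_vector` is a hypothetical name.

```python
import math


def dwa_motion_vector(partition_center, neighbors):
    """Distance-weighted average of MPEG-2 macroblock vectors for one partition.

    partition_center: (x, y) pixel center of the target H.264/AVC partition.
    neighbors: list of ((x, y), (mv_x, mv_y)) pairs, one per MPEG-2 macroblock
    considered (e.g., the co-located macroblock and its neighbors).
    Weights are inversely proportional to the center-to-center distance.
    """
    weights = []
    for (cx, cy), mv in neighbors:
        d = math.hypot(cx - partition_center[0], cy - partition_center[1])
        if d == 0:
            return float(mv[0]), float(mv[1])  # partition centered on this block
        weights.append(1.0 / d)
    total = sum(weights)
    mv_x = sum(w * mv[0] for w, (_, mv) in zip(weights, neighbors)) / total
    mv_y = sum(w * mv[1] for w, (_, mv) in zip(weights, neighbors)) / total
    return mv_x, mv_y


# Top 16x8 partition of the macroblock at (0, 0): its center is (8, 4).
neighbors = [((8, 8), (4, 0)),    # co-located macroblock, distance 4
             ((8, -8), (8, 0))]   # macroblock above, distance 12
print(dwa_motion_vector((8, 4), neighbors))
```

Here the co-located vector gets three times the weight of the vector above, giving approximately (5.0, 0.0). The mapped vector then serves as the starting point for the motion refinement step.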
Simulation Conditions
• Test sequences:
  • 1920x1080i, 30 fps, 450 frames
• MPEG-2 input:
  • 30 Mbps, GOP (N, M) = (30, 3)
• H.264/AVC output:
  • UVLC; output bit-rate of interest ~10 Mbps
  • Baseline profile (B pictures must be converted to P slices) and Main profile
• Comparison points
  • Mapping algorithm
  • B slices
  • RD optimization
Baseline output: no B slices
Main Output: with B slices
Complexity: Run-time Results