JCTVC-A116 Video Coding Technology Proposal by Fraunhofer HHI. M. Winken, S. Boße, B. Bross, P. Helle, T. Hinz, H. Kirchhoffer, H. Lakshman, D. Marpe, S. Oudin, M. Preiß, H. Schwarz, M. Siekmann, K. Sühring, T. Wiegand.
Outline • Overview • Generalized Picture Partitioning for Prediction and Transform Coding • Signalling using two nested quad-tree structures • Codec components • Spatial intra prediction • Motion representation and coding • Merging of motion partitions • Sub-sample interpolation for inter prediction • In-Loop filtering • Transform coding of prediction residuals • Entropy coding • Encoder Control • Average Objective Coding Efficiency • Summary
Basic Approach & Summary • Generalization of concepts in H.264/AVC • Idea: use a simple structure to show potential
Overview • Hybrid video coding approach • Conceptual generalization of the H.264/AVC design • Simple individual building blocks (similar to those in H.264/AVC) • Larger prediction and transform blocks • Flexible partitioning into prediction and transform blocks • Two nested quad-tree structures • Merging for inter-coded partitions • Spatial intra prediction • Motion-compensated prediction (non-adaptive filters) • Deblocking and adaptive in-loop filter • New entropy coding concept • Supports parallelized entropy decoding • Supports usage of VLC without compromising efficiency
Overview • High-level syntax is similar to H.264/AVC • NAL units • Sequence parameter sets • Picture parameter sets • Internal bit depth • Accuracy of 14 bits for • intra prediction signal • motion-compensated prediction signal • reconstructed residual signal • Rounding to 8 bits after reconstruction of a block • Reference pictures have an accuracy of 8 bits
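The rounding step from the 14-bit internal representation to 8-bit samples might look like the sketch below. It assumes that the 14-bit value carries 6 additional low-order bits relative to the 8-bit sample and that rounding is half-up followed by clipping; the slides state neither detail, and `round_to_8bit` is a hypothetical name.

```python
def round_to_8bit(value: int, internal_bits: int = 14) -> int:
    """Round an internal-precision sample to 8 bits.

    Assumptions (not stated in the slides): round half up, then clip to 0..255.
    """
    shift = internal_bits - 8                      # 6 extra bits for 14-bit input
    rounded = (value + (1 << (shift - 1))) >> shift
    return max(0, min(255, rounded))
```

For example, an internal value of 32 rounds up to 1, while 31 rounds down to 0.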
Picture Subdivision for Prediction and Coding • Generalized picture plane grouping • Partitioning of the colour planes into plane groups (with the possibility of inter-plane prediction of parameters) • Same partitioning and coding parameters for a plane group • Submitted bitstreams: Single plane group (Y,U,V) • Quadtree-based partitioning of the plane groups • Division of a plane group into square blocks (“tree blocks”) of maximum block size • Maximum block size is signalled in slice header (64x64 for submitted streams) • Quadtree-based subdivision of the tree block into prediction and transform blocks
Partitioning for Prediction and Transform • Two nested quad-tree structures • Partitioning into prediction blocks (intra or inter prediction) • Partitioning of prediction blocks into transform blocks (specifying the transform sizes)
Intra prediction • Spatial intra prediction using neighbouring samples (conceptually similar to H.264/AVC) • Generalization of H.264/AVC intra prediction for arbitrary block size • 8 directional intra prediction modes • DC prediction mode • Adaptive smoothing of neighbouring samples (signalled via a flag) • 3-tap filter (1,2,1)
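The (1,2,1) smoothing of the neighbouring samples can be illustrated with a minimal sketch. The normalisation by 4 with half-up rounding and the handling of the two end samples (left unfiltered here) are assumptions, and `smooth_neighbours` is a hypothetical name.

```python
def smooth_neighbours(samples: list[int]) -> list[int]:
    """Apply a (1,2,1)/4 filter to a line of neighbouring samples.

    Assumptions: the first and last sample stay unfiltered, rounding is half up.
    """
    if len(samples) < 3:
        return list(samples)
    out = list(samples)
    for i in range(1, len(samples) - 1):
        out[i] = (samples[i - 1] + 2 * samples[i] + samples[i + 1] + 2) >> 2
    return out
```

A step edge such as `[0, 0, 8, 8]` becomes `[0, 2, 6, 8]`, which shows the low-pass effect on the reference samples before directional prediction.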
Motion Representation • Generalized multi-hypothesis prediction • More than two motion hypotheses are supported • Only up to two hypotheses are used for the submitted streams • Each motion hypothesis is specified by • a reference list index (into a single reference picture list) • a displacement vector • Displacement vector accuracy is selectable on a slice basis • Quarter-luma sample accuracy used in submitted bitstreams • Motion vector prediction and coding • Interleaved prediction and coding of horizontal and vertical displacement vector components • Motion partition merging for inference of motion information from neighbouring blocks • No Skipped or Direct blocks
Motion Vector Prediction and Coding • Interleaved motion vector prediction and coding • Coding of reference index • Selection of neighbouring blocks with same reference index • Prediction of vertical component using median prediction • Coding of vertical component of the difference vector • Selection of neighbouring motion vectors with minimum absolute difference in vertical component • Prediction of horizontal component using selected motion vectors (single vector or median prediction) • Coding of horizontal component
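The interleaved scheme above can be sketched as follows. The sketch assumes the candidate list already contains only neighbouring vectors with the same reference index, uses the lower median when the candidate count is even, and falls back to 0 when no candidate exists; these choices and the function names are illustrative assumptions, not taken from the proposal.

```python
from statistics import median_low

def predict_vertical(neighbours):
    """Median prediction of the vertical component from (x, y) candidates."""
    if not neighbours:
        return 0                                   # assumed fallback
    return median_low([y for _, y in neighbours])

def predict_horizontal(neighbours, coded_y):
    """Predict x from the candidates whose y is closest to the already coded y."""
    if not neighbours:
        return 0                                   # assumed fallback
    best = min(abs(y - coded_y) for _, y in neighbours)
    selected = [x for x, y in neighbours if abs(y - coded_y) == best]
    # One selected vector: use it directly. Several: median of the selection.
    return selected[0] if len(selected) == 1 else median_low(selected)
```

With candidates `[(4, -2), (10, 3), (5, 3)]`, the vertical predictor is 3. Once the actual vertical component is known to be 3, only the last two candidates remain, and the horizontal predictor is drawn from them.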
Motion Partition Merging • Concept • Reduction of side information rate for motion information • Adaptive inference of motion information from neighbouring inter-predicted partitions (prediction blocks) • Signalling using up to two flags per inter prediction block • R: region with the same motion information • B: first block of R in the decoding order (transmission of motion information) • For the remaining blocks of the region R only up to two flags specifying the merging information are coded
Signalling of Motion Partition Merging • merge_flag • transmitted if one or both neighbours are inter coded • if equal to 1: block X is merged with one of the neighbours • otherwise: motion data are transmitted for block X • merge_left_flag • transmitted if merge_flag is equal to 1 and both neighbours are inter-coded with different motion parameters (inferred otherwise) • specifies whether current block is merged with left or top neighbour • X: current inter-coded prediction block • L: left neighbour of current block X • T: top neighbour of current block X
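The decoder-side logic implied by the two flags can be sketched as below. `read_flag` is a hypothetical callable that returns the next decoded flag, and a neighbour's motion data is passed as `None` when that neighbour is not inter coded; both conventions are assumptions made for illustration.

```python
def decode_merge(read_flag, left_motion, top_motion):
    """Return the inferred motion data, or None if motion data must be parsed.

    left_motion / top_motion: motion data of neighbours L and T, or None when
    the neighbour is not inter coded (assumed convention).
    """
    if left_motion is None and top_motion is None:
        return None                        # merge_flag is not transmitted
    if not read_flag():                    # merge_flag == 0: explicit motion data
        return None
    if left_motion is None:
        return top_motion                  # merge_left_flag inferred
    if top_motion is None:
        return left_motion                 # merge_left_flag inferred
    if left_motion == top_motion:
        return left_motion                 # same parameters: flag inferred
    return left_motion if read_flag() else top_motion   # merge_left_flag
```

The second flag is read only in the last branch, which matches the statement that at most two flags are coded per inter prediction block.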
Sub-Sample Interpolation for Inter Prediction • Overview • Non-adaptive sub-sample interpolation • Concept is based on interpolation with MOMS (basis functions with Maximal Order and Minimal Support) • Implementation in 16-bit integer arithmetic • 2D separable IIR pre-filter (one coefficient) • 2D separable FIR interpolation filter with short support (4-tap) • Both the IIR and FIR filter steps are highly parallelizable • [Figure: reference picture (W x H), IIR pre-filter in horizontal and vertical direction, pre-filtered reference picture (W x H), FIR interpolation filter in horizontal and vertical direction, up-sampled reference picture (4W x 4H)]
IIR Pre-filter • 1D IIR filter in horizontal direction • Causal and anti-causal filtering • Same IIR filter in vertical direction • Pole value (scaled by 2^15): z1 = -11726
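A one-dimensional causal plus anti-causal recursion with a single pole can be sketched as follows. The sketch reads the scale factor as 2^15 (so the pole is about -0.358), works in floating point rather than the 16-bit integer arithmetic of the proposal, starts both passes from a zero state at the picture border, and normalises to unit DC gain; all of these are assumptions, since the slides give only the pole value.

```python
POLE = -11726 / 2**15          # pole from the slide, assuming a 2^15 scale

def iir_prefilter_1d(signal, pole=POLE):
    """Causal then anti-causal first-order recursive filtering of one row.

    Assumptions: floating point, zero initial state at both borders,
    gain chosen so that a constant signal keeps its level away from borders.
    """
    n = len(signal)
    causal = [0.0] * n
    state = 0.0
    for k in range(n):                     # causal pass, left to right
        state = signal[k] + pole * state
        causal[k] = state
    out = [0.0] * n
    state = 0.0
    for k in range(n - 1, -1, -1):         # anti-causal pass, right to left
        state = causal[k] + pole * state
        out[k] = state
    gain = (1.0 - pole) ** 2               # inverse DC gain of the two passes
    return [gain * v for v in out]
```

Applying the same function to every row and then to every column gives the separable 2D pre-filter; because rows (and columns) are independent, they can be processed in parallel.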
FIR Interpolation Filter • 4-tap FIR filter (scaled by 2^15): [coefficient table not reproduced] • Applied in horizontal direction on the pre-filtered reference picture • Applied in vertical direction on the horizontally filtered picture • Extendable to arbitrary motion vector accuracy • e.g. 1/8, 1/12, 1/16 luma sample accuracy • by changing the FIR kernel while maintaining the same IIR filter
Filtering inside Motion-Compensation Loop • Deblocking filter • Similar to H.264/AVC • Extended for larger block sizes • Adaptive in-loop filter • Separable Wiener filter • Vertical filtering followed by horizontal filtering • Potentially different filters in horizontal and vertical direction • Filter size is chosen by minimizing a Lagrangian cost functional • Supported filter sizes are 3, 5, 7, 9, and 11 • Filters are separately estimated for luma and chroma planes • Filters may be re-used for reducing the side information
Adaptive In-Loop Filter • Quad-tree based block-wise filter decision • Quad-tree is independent of prediction partitioning • Quad-tree is transmitted as side information • Estimation of filter coefficients • Estimate filter coefficients and filter size for entire picture • Determine quad-tree based filter decisions • Re-estimate filter coefficients and filter size for selected regions
Transform Coding of Prediction Residuals • Segmentation of prediction blocks • Segmentation into transform blocks using a quadtree • Signalling of the partitioning into transform blocks • Maximum and minimum transform size are signalled in slice header • For quad-tree nodes between these bounds, subdivision flags are transmitted • [Figure: transform segmentation tree example showing max. transform size, transmitted subdivision flags, and min. transform size]
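Parsing the subdivision flags of such a transform quadtree might be sketched as below. The sketch assumes that blocks larger than the maximum transform size are split without a flag, that blocks at the minimum size are never split, that a flag is read for every size in between (including the maximum size itself), and that children are visited depth-first in z-order. `read_flag` is a hypothetical callable returning the next flag.

```python
def parse_transform_tree(read_flag, size, max_size, min_size, x=0, y=0):
    """Return the (x, y, size) transform blocks of one prediction block."""
    if size > max_size:
        split = True                       # inferred, no flag transmitted
    elif size > min_size:
        split = bool(read_flag())          # flag transmitted between the bounds
    else:
        split = False                      # inferred at the minimum size
    if not split:
        return [(x, y, size)]
    half = size // 2
    blocks = []
    for dy in (0, half):                   # z-order: top row first
        for dx in (0, half):
            blocks += parse_transform_tree(
                read_flag, half, max_size, min_size, x + dx, y + dy)
    return blocks
```

For a 32x32 prediction block with bounds 32 and 8, the flag sequence 1, 0, 1, 0, 0 yields three 16x16 transform blocks and four 8x8 blocks in the top-right quadrant.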
Transform Coding • Transform kernels • Separable NxN transforms • Integer approximations of DCT-II kernels (obtained by scaling and rounding of DCT-II kernel) • 32-bit integer implementation with multiplications and additions (employing symmetries of basis functions) • Integer transform kernels have not been optimized for low-complexity implementations (using bit shifts and additions) • Quantization • Similar to H.264/AVC • Uniform scalar quantization without extra dead-zone • 52 quantizers with logarithmically increasing step size
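Obtaining an integer kernel by scaling and rounding the DCT-II matrix can be illustrated with a short sketch. The scale factor used here (2^6 times the square root of N) is an assumption chosen for illustration; the slides do not state the scaling used in the proposal, and `integer_dct_kernel` is a hypothetical name.

```python
import math

def integer_dct_kernel(n: int, scale_bits: int = 6) -> list[list[int]]:
    """Scale the orthonormal NxN DCT-II matrix and round to integers.

    The scale 2**scale_bits * sqrt(n) is an assumed value, not from the slides.
    """
    kernel = []
    for k in range(n):
        norm = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        row = [norm * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
               for i in range(n)]
        kernel.append([round(v * (1 << scale_bits) * math.sqrt(n)) for v in row])
    return kernel
```

The rows are alternately symmetric and antisymmetric, which is the property a butterfly-style implementation can exploit to save multiplications.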
Entropy Coding • Novel entropy coding concept • Binarization and context modelling as in CABAC of H.264/AVC • Modified coding of binary decisions (bins) • LPB probabilities are quantized (12 classes in implementation) • Separate bin encoders for each class (fixed LPB probabilities) • Supports high degree of parallelization • Supports variable length codes without compromising coding efficiency
Entropy Coding with Arithmetic Codes • Parallelization for large slice data NAL units • All arithmetic coders are operated at fixed probabilities • Arithmetic codewords for the different bin encoders are written to different partitions of the slice data NAL unit • Partitioning of the slice data NAL unit is signalled in header • Multiple arithmetic decoders can be operated in parallel • Remaining entropy coding process simply reads bins from multiple bin buffers • Disabling multi-codeword approach for small slices • Parallelization is not required for small slices • Overhead of partitioning information can be significant • Usage of conventional arithmetic coding engine for small slices (signalled in slice header) • Arithmetic coding is used in submitted bitstreams
Entropy Coding with Variable Length Codes • Alternative to arithmetic coding engines • Bin encoders/decoders operate at fixed probabilities • Arithmetic coding engines can be replaced by simple variable-length coders • Bin coders map a variable number of bins onto a variable-length codeword and vice versa • Potential termination of bin sequences at the end of a slice (use shortest codeword) • Example: VNB2VLC mapping for P=0.15 (0.25% overhead relative to entropy)
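The idea of mapping a variable number of bins onto a variable-length codeword can be illustrated with a toy table. The table below is hypothetical and is not the VNB2VLC mapping from the proposal, so its overhead is considerably larger than the 0.25% quoted on the slide; it only shows the mechanism. Bin value 1 denotes the less probable bin.

```python
import math

# Hypothetical toy table (NOT the proposal's mapping):
# prefix-free bin sequences on the left, prefix-free codewords on the right.
V2V_TABLE = {"000": "0", "1": "10", "01": "110", "001": "111"}

def encode_bins(bins: str) -> str:
    """Parse a bin string into table entries and emit their codewords."""
    out, current = [], ""
    for b in bins:
        current += b
        if current in V2V_TABLE:
            out.append(V2V_TABLE[current])
            current = ""
    if current:
        raise ValueError("unterminated bin sequence; a termination rule is needed")
    return "".join(out)

def overhead(p_lpb: float) -> float:
    """Relative excess of the table's rate (bits per bin) over the binary entropy."""
    def prob(seq):
        return math.prod(p_lpb if b == "1" else 1.0 - p_lpb for b in seq)
    bits = sum(prob(s) * len(c) for s, c in V2V_TABLE.items())
    bins = sum(prob(s) * len(s) for s in V2V_TABLE)
    entropy = -(p_lpb * math.log2(p_lpb) + (1 - p_lpb) * math.log2(1 - p_lpb))
    return bits / bins / entropy - 1.0
```

Larger tables with longer bin sequences bring the rate closer to the entropy, which is how a well-designed mapping can reach a small overhead for a fixed probability.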
Entropy Coding with Variable Length Codes • Codeword interleaving • Interleaving of codewords without any overhead • Codeword buffer at encoder • Instantaneous decoding • Low-delay interleaving • Specification of maximum buffer delay • Codeword termination at encoder and decoder if maximum delay is reached • Coding efficiency • Lossless transcoding of submitted bitstreams showed virtually the same coding efficiency • 0.18% rate savings with codeword interleaving • 0.10% rate savings with low-delay control (64 Byte)
Encoder Control • Coding structure • Hierarchical B pictures for constraint set 1 configuration • Low-delay hierarchical P pictures for constraint set 2 configuration • Motion estimation • Rate-constrained motion estimation (as in JM, JSVM, JMVM) • Fast integer sample motion search (same as in JSVM, JMVM) • Sub-sample refinement search • Quantization (for a transform block) • Rate-distortion optimized quantization (RDOQ) • Similar to JM • Coding mode decision (for a prediction block) • Rate-constrained mode decision (as in JM, JSVM, JMVM) • Abort criterion for complexity reduction: • Intra modes are not tested if, for the inter mode, • all transform coefficient levels (RDOQ) are equal to 0 • all transform coefficients are below a certain threshold (depending on quantization step size)
Encoder Control • Selection of Prediction and Transform Segmentation • Use top-to-bottom and depth-first decision strategy with abort criterion • Decision is based on Lagrangian costs • Same abort criterion as for coding mode selection • Smaller blocks are not tested if • all transform coefficient levels (RDOQ) are equal to 0 • all transform coefficients are smaller than a threshold (threshold depends on quantization step size) • Uses quad-tree structure for reducing the computational complexity of the partition selection process
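A top-to-bottom, depth-first partition decision based on Lagrangian costs J = D + λR might be sketched as follows. `evaluate` and `split` are hypothetical callbacks, `LAMBDA` is an arbitrary placeholder value, and only the first abort criterion (all levels equal to 0) is modelled; the threshold-based criterion is omitted.

```python
LAMBDA = 10.0   # placeholder; a real encoder derives it from the quantizer

def choose_partition(block, size, min_size, evaluate, split):
    """Return (cost, tree); tree is the block itself or a list of four subtrees.

    evaluate(block) -> (distortion, rate, all_levels_zero)   (hypothetical)
    split(block)    -> four sub-blocks                       (hypothetical)
    """
    distortion, rate, all_zero = evaluate(block)
    cost_whole = distortion + LAMBDA * rate
    # Abort: smaller blocks are not tested at the minimum size
    # or when all transform coefficient levels are zero.
    if size <= min_size or all_zero:
        return cost_whole, block
    cost_split, subtrees = 0.0, []
    for sub in split(block):
        cost, tree = choose_partition(sub, size // 2, min_size, evaluate, split)
        cost_split += cost
        subtrees.append(tree)
    if cost_split < cost_whole:
        return cost_split, subtrees
    return cost_whole, block
```

Because a subtree is only explored when its parent was not aborted, the quad-tree structure prunes large parts of the search space for well-predicted regions.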
Software • Standard C++ implementation • Platform independent • Compiles under Windows and Linux (32/64 bit) • Focus on modular design and easy extensibility • Slim code base • Only ~55,000 LOC (vs. ~150,000 LOC for JM 17.0) • Virtually no redundant code • English naming of variables and comments • No external libraries needed • Optional multi-threaded encoding (Boost library needed) • Parallel encoding of independent pictures (depends on the GOP string) • No need to consider multi-threading issues when changing the encoding algorithm inside a picture • E.g. CS1 bitstreams were encoded almost 8 times faster than single-threaded (on a computer with 8 cores)
Summary • Hybrid video coding approach • Generalization of H.264/AVC concepts • Support of larger block sizes for prediction and transform • Flexible quad-tree based partitioning into prediction blocks(with additional merging for inter-coded blocks) • Flexible quad-tree based partitioning into transform blocks • Spatial intra prediction • Motion-compensated prediction using non-adaptive filters • Deblocking and adaptive loop filter • Novel entropy coding approach • Average objective coding results • About 29-30% bit rate savings for high-delay cases • About 22% bit rate savings for low-delay cases