1. Overview of the H.264/AVC Video Coding Standard
T. Wiegand, G. Sullivan, G. Bjontegaard, and A. Luthra, "Overview of the H.264/AVC Video Coding Standard," IEEE Transactions on Circuits and Systems for Video Technology, Vol. 13, No. 7, pp. 560-576, July 2003.
CMPT 820: Multimedia Systems
2. Outline Overview
Network Abstraction Layer (NAL)
Video Coding Layer (VCL)
Profiles and Applications
Feature Highlights
Conclusions
3. Evolution of Video Compression Standards MPEG-2: an enabling technology for digital television systems
Initially designed to extend MPEG-1 and support interlaced video coding
Used for transmission of SD/HD TV signals and storage of SD video on DVDs
H.263/+/++: conversational/communication applications
Supports diverse networks and adapts to their characteristics, e.g., PSTN, mobile, LAN/Internet
Considers loss and error robustness requirements
MPEG-4 Visual: object-based video coding
Provides shape coding
4. H.264/AVC Coding Standard Various Applications
Broadcast: cable, satellites, terrestrial, and DSL
Storage: DVDs (HD DVD and Blu-ray)
Video Conferencing: over different networks
Multimedia Streaming: live and on-demand
Multimedia Messaging Services (MMS)
Challenge:
How to handle all these applications and networks
Flexibility and customizability
H.264/AVC considers a variety of applications over heterogeneous networks, such as broadcast over unidirectional communication channels, optical/magnetic media that support sequential playout as well as random access (jumping to arbitrary points), low-bit-rate video conferencing, possibly high-bit-rate video streaming, and digital video mail.
These applications often impose different requirements, e.g., high or low bit rates, while the networks have different characteristics, e.g., low or high latency (satellite links) and low or high loss rates (wired vs. wireless).
5. Structure of H.264/AVC Codec Layered design
Network Abstraction Layer (NAL)
formats video and meta data for variety of networks
Video Coding Layer (VCL)
represents video in an efficient way
To address the heterogeneity of applications and networks, H.264/AVC adopts a layered design, where the NAL adapts the data to the network and the VCL compresses the video.
6. Outline Overview
Network Abstraction Layer (NAL)
Video Coding Layer (VCL)
Profiles and Applications
Feature Highlights
Conclusions
7. Network Abstraction Layer Provide network friendliness for sending video data over various network transports, such as:
RTP/IP for Internet applications
MPEG-2 streams for broadcast services
ISO File formats for storage applications
We present a few NAL concepts
The main objective of the NAL is to enable simple and effective customization of how video data is sent over different network systems.
8. NAL Units Packets that carry video data
short packet header: one byte
Support two types of transports
stream-oriented: no unit boundaries come for free → use a 3-byte start code prefix
packet-oriented: start code prefix is a waste
Can be classified into:
VCL units: data for video pictures
Non-VCL units: metadata and additional info
NAL units are packets that carry video data.
NAL units have a very short packet header; most of their capacity is used for payload.
… We next present non-VCL units, after a short sketch of start-code framing below.
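To make the stream-oriented case concrete, here is a minimal sketch (not from the paper) that splits an Annex B byte stream on the 3-byte start code prefix and parses the one-byte NAL unit header; it ignores the optional 4-byte start code variant and emulation-prevention bytes.

```python
def split_nal_units(stream: bytes):
    # Find every 3-byte start code prefix 0x000001; a unit runs from
    # just after one prefix to just before the next.
    starts = [i + 3 for i in range(len(stream) - 2)
              if stream[i:i + 3] == b"\x00\x00\x01"]
    ends = [s - 3 for s in starts[1:]] + [len(stream)]
    return [stream[s:e] for s, e in zip(starts, ends)]

def nal_header(unit: bytes):
    # The one-byte header: 1 forbidden bit, 2 bits nal_ref_idc,
    # 5 bits nal_unit_type.
    return {"nal_ref_idc": (unit[0] >> 5) & 0x3,
            "nal_unit_type": unit[0] & 0x1F}

units = split_nal_units(b"\x00\x00\x01\x67AA\x00\x00\x01\x68BB")
print([nal_header(u) for u in units])   # types 7 (SPS) and 8 (PPS)
```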
9. Non-VCL NAL Units Two types of non-VCL NAL units
Parameter sets: headers shared by a large number of VCL NAL units
a VCL NAL unit has a pointer to its picture parameter set
a picture parameter set points to its sequence parameter set
Supplemental enhancement info (SEI): optional info for higher-quality reconstruction and/or better application usability
Sent over in-band or out-of-band channels
Parameter set units reduce the traffic amount by exploiting the header redundancy among VCL NAL units.
There are two types of parameter sets. A picture parameter set contains headers for one or a few pictures; a sequence parameter set contains (less frequently changing) headers for even more pictures.
SEI units are not mandatory and are used to improve playout quality. For example, rate-distortion info can be carried by SEI units. SEI is also quite extensible, and applications can define their own custom SEI messages.
Out-of-band delivery often enables more flexibility, such as sending parameter sets with a stronger FEC code, possible retransmission, and higher routing priority.
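The two-level pointer chain can be pictured with a small sketch (field names follow the standard's syntax element names, but the real parameter sets carry many more fields):

```python
from dataclasses import dataclass

@dataclass
class SequenceParameterSet:
    seq_parameter_set_id: int
    profile_idc: int                  # plus many more syntax elements

@dataclass
class PictureParameterSet:
    pic_parameter_set_id: int
    seq_parameter_set_id: int         # pointer to the SPS it uses

@dataclass
class SliceHeader:                    # part of a VCL NAL unit
    pic_parameter_set_id: int         # pointer to the PPS it uses

def resolve(slice_hdr, pps_table, sps_table):
    # Follow the chain: slice -> picture parameter set -> sequence parameter set.
    pps = pps_table[slice_hdr.pic_parameter_set_id]
    sps = sps_table[pps.seq_parameter_set_id]
    return pps, sps

sps_table = {0: SequenceParameterSet(0, profile_idc=66)}
pps_table = {0: PictureParameterSet(0, seq_parameter_set_id=0)}
print(resolve(SliceHeader(pic_parameter_set_id=0), pps_table, sps_table))
```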
10. Access Units A set of NAL units
Decoding an access unit results in one picture
Structure:
Delimiter: for seeking in a stream
SEI: timing and other info
primary coded picture: VCL
redundant coded picture: for error recovery
Now we know the building blocks, but how do we put NAL units together to represent ONE picture?
* Delimiter: for seeking in the byte-stream format.
* SEI: timing information and other supplemental data that may enhance application usability.
* Primary coded picture: a set of VCL NAL units that together compose one picture.
* Redundant coded pictures: several redundant slices for error recovery.
Redundant coded pictures are usually not decoded. Only when a decoder fails to decode the primary coded picture will it try to decode a redundant coded picture.
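A minimal sketch of assembling one access unit in the order listed above (types and ordering are illustrative; decoding the whole unit yields exactly one picture):

```python
def build_access_unit(primary_slices, sei=None, redundant=None):
    au = [("AU_DELIMITER", None)]               # for seeking in a byte stream
    au += [("SEI", m) for m in (sei or [])]     # timing / supplemental info
    au += [("VCL_PRIMARY", s) for s in primary_slices]
    au += [("VCL_REDUNDANT", s) for s in (redundant or [])]
    return au

au = build_access_unit(["slice0", "slice1"],
                       sei=["timing"], redundant=["slice0_copy"])
print([kind for kind, _ in au])
```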
11. Video Sequences and IDR Frames Sequence: an independently decodable NAL unit stream → doesn't need NALs from other sequences
with one sequence parameter set
starts with an instantaneous decoding refresh (IDR) access unit
IDR frames: random access points
Intra-coded frames
no subsequent picture of an IDR frame will require reference to pictures prior to the IDR frame
decoders mark buffered reference pictures as unusable once they see an IDR frame
Intuitively, aggregating several access units (pictures) gives us a sequence. But a video sequence carries additional requirements in H.264/AVC.
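A minimal sketch (illustrative, not the standard's decoded picture buffer management) of the IDR rule: once an IDR arrives, nothing decoded before it may be referenced again.

```python
class ReferenceBuffer:
    def __init__(self):
        self.refs = []                 # decoded pictures kept for prediction

    def on_picture(self, pic_id, is_idr):
        if is_idr:
            self.refs.clear()          # mark all buffered references unusable
        self.refs.append(pic_id)       # assume every picture is a reference

buf = ReferenceBuffer()
for pic_id, is_idr in [(0, True), (1, False), (2, False), (3, True), (4, False)]:
    buf.on_picture(pic_id, is_idr)
print(buf.refs)                        # [3, 4]: nothing before the IDR at 3
```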
12. Outline Overview
Network Abstraction Layer (NAL)
Video Coding Layer (VCL)
Profiles and Applications
Feature Highlights
Conclusions
13. Video Coding Layer (VCL) Hybrid video coding (like prior standards): H.264/AVC represents pictures in macroblocks
Motion compensation: temporal redundancy
Transform: spatial redundancy
Small improvements add up to huge gain
Combining many coding tools together
Pictures/Frames/Fields
Fields: top/bottom field contains even/odd rows
Interlaced: the two fields were captured at different times
Progressive: otherwise
Like prior video coding standards, H.264 is a hybrid coder, which uses motion compensation to exploit temporal redundancy and a transform to exploit spatial redundancy.
These coding tools are not totally new. In fact, many of them were proposed years ago but were thought impractical due to computational complexity. They have become acceptable today because of advances in computer hardware.
Each picture contains a frame, and each frame has two fields; in interlaced video the two fields were captured at different times.
14. Macroblocks and Slices Fixed size MBs: 16x16 for luma, and 8x8 for chroma
Slice: a set of MBs that can be decoded without use of other slices
I slice: intra-prediction (I-MB)
P slice: possibly one inter-prediction signal (I- and P-MBs)
B slice: up to two inter-prediction signals (I- and B-MBs)
SP slice: efficient switch among streams
SI slice: used in conjunction with SP slices
The only reason we may require adjacent slices is the deblocking filter.
The main feature of SP-frames is that identical SP-frames can be reconstructed even when different reference frames are used for their prediction.
SI-frames are used in conjunction with SP-frames.
This property makes them useful for applications such as random access, adaptation to network conditions, and error recovery/resilience.
We next talk about the ordering of MBs.
15. Flexible Macroblock Ordering (FMO) ROI: several foreground objects, and one left-over background
Checkerboard: for error concealment.
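A sketch of the checkerboard slice-group mapping (one simple FMO pattern; the standard supports several mapping types): if one slice group is lost, every missing MB still has its four neighbors in the other group, which eases concealment.

```python
def checkerboard_slice_group(mb_x: int, mb_y: int) -> int:
    # Assign each macroblock to one of two slice groups.
    return (mb_x + mb_y) % 2

# Slice-group ids for a picture that is 4 MBs wide and 4 MBs tall:
for mb_y in range(4):
    print([checkerboard_slice_group(mb_x, mb_y) for mb_x in range(4)])
# [0, 1, 0, 1]
# [1, 0, 1, 0]  ... alternating rows
```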
16. Adaptive Field Coding Two fields of a frame can be coded as:
A single frame (frame mode)
Two separate fields (field mode)
A single frame with adaptive mode (mixed mode)
Picture-adaptive frame/field (PAFF)
frame/field decision is made at frame level
16% - 20% bit rate reduction over frame only
Macroblock-adaptive frame/field (MBAFF)
frame/field decision is made at MB level
14% - 16% bit rate reduction over PAFF
Why do we need different modes? In interlaced frames, two adjacent rows may have low correlation, because the two fields were captured at different time instances.
For regions with more motion, field mode is better; for regions with less motion, frame mode is better. This is why we want MBAFF.
17. Intra-frame Prediction In the spatial domain, using samples to the left and/or above to predict samples in a MB
Types of intra-frame prediction:
Intra_4x4: detailed luma block
Intra_16x16: smooth luma blocks
Chroma_8x8: similar to Intra_16x16 as chroma components are smooth
I_PCM: bypass prediction/transform, send samples
anomalous picture content, lossless coding, and predictable bit rate
Unlike MPEG-4, intra-frame prediction in H.264 is done in the spatial domain.
There are four types of intra-frame prediction: 4x4, 16x16, chroma, and I_PCM.
Why I_PCM: (1) anomalous content where prediction hurts coding efficiency, (2) lossless coding, and (3) a predictable bit rate
18. Intra_4x4 Prediction Samples in 4x4 block are predicted using 13 neighboring sample
8 prediction mode: 1 DC and 8 directional
Sample D is used if E-H is not available DC: use one value to predict the whole block.
For example: samples in mode 0 are copied into the 4x4 block.
E-H may be unavailable because of (1) decoding order, (2) outside the subject slice
First macroblock coded in a picture, we use DC prediction.
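A minimal sketch of two of these modes, assuming the neighboring samples have already been reconstructed (rounding follows the usual sum-plus-half integer convention):

```python
import numpy as np

def intra4x4_vertical(above):
    # Mode 0 (vertical): copy the 4 samples above the block into every row.
    return np.tile(above, (4, 1))

def intra4x4_dc(above, left):
    # DC mode: one value, the rounded mean of the available neighbors,
    # predicts the whole block.
    vals = np.concatenate([above, left])
    return np.full((4, 4), (vals.sum() + len(vals) // 2) // len(vals))

above = np.array([100, 102, 104, 106])
left = np.array([98, 99, 100, 101])
print(intra4x4_vertical(above)[0])    # [100 102 104 106], repeated down
print(intra4x4_dc(above, left)[0])    # [101 101 101 101]
```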
19. Sample Intra_4x4 Prediction Interpolation is used in some modes
20. Intra_16x16 Prediction 4 modes
Vertical
Horizontal
DC
Planar (plane)
Uses 33 neighboring samples.
Good for smooth/flat areas.
21. Inter-Prediction in P Slices Two-level segmentation of MBs
Luma
MBs are divided into at most 4 partitions (as small as 8x8)
8x8 partitions are divided into at most 4 partitions
Chroma – half size horizontally and vertically
Maximum of 16 motion vectors for each MB
Why do we need different partition sizes? Motion vectors are expensive to code.
Large partition → smooth area
Small partition → detailed area
Note: motion vectors are coded using predictive coding (see the sketch below)
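A sketch of the predictive coding of motion vectors, using the component-wise median of three neighboring partitions' MVs as the predictor (the common case; the special rules for unavailable neighbors and 16x8/8x16 partitions are omitted):

```python
def median_mv_predictor(mv_a, mv_b, mv_c):
    # Component-wise median of the left (A), above (B), and
    # above-right (C) neighbors' motion vectors.
    med = lambda x, y, z: sorted([x, y, z])[1]
    return (med(mv_a[0], mv_b[0], mv_c[0]),
            med(mv_a[1], mv_b[1], mv_c[1]))

mv = (5, -2)                                  # this partition's actual MV
pred = median_mv_predictor((4, -2), (6, -1), (3, -3))
mvd = (mv[0] - pred[0], mv[1] - pred[1])      # only this difference is coded
print(pred, mvd)                              # (4, -2) (1, 0)
```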
22. Examples of Segmentation
23. Inter-Prediction Accuracy 1/4-pixel for luma, 1/8-pixel for chroma
Half-pixel samples:
6-tap FIR filter
Quarter-pixel samples:
average of neighbors
Chroma prediction: bilinear interpolation
The granularity of motion vectors
finite impulse response (FIR) filter
J can be computed in two ways…
Actual computations are done with additions, bit-shifts, and integer arithmetic
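A sketch of the luma interpolation in that integer style (the 6-tap filter weights (1, -5, 20, 20, -5, 1) are the standard's; the sample names are illustrative):

```python
def half_pel(e, f, g, h, i, j):
    # 6-tap FIR filter across six consecutive integer samples,
    # rounded and clipped to the 8-bit range with shifts only.
    val = e - 5 * f + 20 * g + 20 * h - 5 * i + j
    return min(255, max(0, (val + 16) >> 5))

def quarter_pel(a, b):
    # Quarter-sample positions: rounded average of the two nearest
    # integer/half-sample values.
    return (a + b + 1) >> 1

row = [10, 20, 30, 40, 50, 60]
h = half_pel(*row)                    # half-pel value between 30 and 40
print(h, quarter_pel(30, h))          # 35 33
```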
24. Multiframe Inter-Prediction in P Slices More than one prior reference picture (selectable by different MBs)
Encoders/decoders buffer the same reference pictures for inter-prediction
Reference index is used when coding MVs
MVs for regions smaller than 8x8 use the same index for all MVs in the 8x8 region
P_skip mode:
Don’t send residual signals nor MVs nor reference index
Use buffered frame 0 as the reference picture
Use neighbor’s MVs
Large areas with no change, or with constant motion like slow panning, can be represented with very few bits.
Prior standards often couple the playout order with the reference order; e.g., a B frame uses the two adjacent I or P frames for inter-prediction. This is no longer true in H.264/AVC.
Several reference frames can be used, and they are organized as a buffered frame list.
Full-sample, half-sample, and quarter-sample predictions represent different degrees of low-pass filtering, chosen automatically by motion estimation.
P_skip is a mode of a P-predicted macroblock (16x16).
25. Multiframe Inter-Prediction in B Slices Weighted average of 2 predictions
B slices can be used as references
Two reference picture lists are used
One out of four pred. methods for each partition:
list 0
list 1
bi-predictive
direct prediction: inferred from prior MBs
The MB can be coded in B_skip mode (similar to P_skip)
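A minimal sketch of the bi-predictive case (plain average by default; nonuniform weights give the weighted prediction mentioned above):

```python
import numpy as np

def bi_predict(pred0, pred1, w0=1, w1=1):
    # Weighted average of one prediction from list 0 and one from
    # list 1, with integer rounding.
    return (w0 * pred0 + w1 * pred1 + (w0 + w1) // 2) // (w0 + w1)

p0 = np.full((4, 4), 100)       # prediction from a list-0 reference
p1 = np.full((4, 4), 110)       # prediction from a list-1 reference
print(bi_predict(p0, p1)[0])    # [105 105 105 105]
```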
26. 4x4 Integer Transform Why a smaller transform:
Only adds and shifts are used, and an exact inverse transform is possible → no decoding mismatch
Not much residual is left to code
Less noise around edge (ringing or mosquito noise)
Less computations and shorter data type (16-bit)
An approximation to 4x4 DCT:
Ringing is still there; it is just smaller and harder to see.
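The core 4x4 matrix below is the standard's integer approximation to the DCT; this sketch applies it with matrix multiplies for clarity (a real codec uses only adds and shifts, and folds the scaling into quantization):

```python
import numpy as np

C = np.array([[1,  1,  1,  1],
              [2,  1, -1, -2],
              [1, -1, -1,  1],
              [1, -2,  2, -1]])

def forward4x4(block):
    # Forward transform Y = C X C^T (unscaled).
    return C @ block @ C.T

X = np.arange(16).reshape(4, 4)   # a toy residual block
Y = forward4x4(X)
print(Y[0, 0])                    # 120: the unscaled DC term is the block sum
```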
27. 2nd Transform and Quantization Parameter 2nd Transform: Intra_16x16 and Chroma modes are for smooth areas
DC coefficients are transformed again to cover the whole MB
Quantization step is adjusted by an exponential function of the quantization parameter → covers a wider range of QS
QP increases by 6 => QS doubles
QP increases by 1 => QS increases by ~12% => bit rate decreases by ~12%
The 2nd transform is needed to take advantage of spatial redundancy in flat areas.
In addition to covering a wider range, the exponential quantization step also simplifies the problem of rate control.
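The QP-to-step-size rule can be written as a one-liner (0.625 is the step size at QP 0 in H.264/AVC; the standard specifies a table that follows this exponential rule exactly at multiples of 6):

```python
def quantization_step(qp, base=0.625):
    # Doubles every 6 QP steps; each single step is a factor of
    # 2**(1/6), roughly a 12% increase.
    return base * 2 ** (qp / 6)

for qp in (0, 1, 6, 12, 24):
    print(qp, round(quantization_step(qp), 3))
# 0 0.625 | 1 0.702 | 6 1.25 | 12 2.5 | 24 10.0
```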
28. Entropy Coding Non-transform coefficients: an infinite-extent codeword table
Transform coefficients:
Context-Adaptive Variable Length Coding (CAVLC)
several VLC tables are switched depending on previously transmitted data → better than a single VLC table
Context-Adaptive Binary Arithmetic Coding (CABAC)
more flexible symbol probability modeling than CAVLC → 5-15% rate reduction
efficient: multiplication-free
Previous standards usually have SEPARATE VLC tables for different syntax elements, because they tend to have different probability characteristics.
In H.264/AVC, a simple and efficient infinite-extent codeword table is shared by all elements.
For each element, only a mapping onto this single codeword table is required.
Arithmetic coding: assigns a non-integer number of bits to each symbol
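The shared infinite-extent codeword table is the zero-order Exp-Golomb code; a minimal encoder for it:

```python
def exp_golomb(v):
    # Codeword for value v: M zeros, then the (M+1)-bit binary of v+1,
    # where M = floor(log2(v+1)). Short codes for small values, and a
    # valid codeword exists for every non-negative integer.
    bits = bin(v + 1)[2:]               # binary of v+1; leading bit is '1'
    return "0" * (len(bits) - 1) + bits

print([exp_golomb(v) for v in range(6)])
# ['1', '010', '011', '00100', '00101', '00110']
```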
29. In-loop Deblocking Filter Operate within coding loop
Use filtered frames as ref. frames
→ improves coding efficiency
Adaptive deblocking, need to determine
Blocking effects or object edges
Strong or weak deblocking
Intuitions
Large difference near a block edge -> likely a block artifact
If the difference is too large to be explained by the QP difference -> likely a real edge
An in-loop filter is better than a post-filter because it also improves the quality of the reference frames.
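A sketch of the edge decision described above (threshold values here are illustrative; in the standard, alpha and beta are derived from QP, so coarser quantization permits stronger filtering):

```python
def should_filter(p1, p0, q0, q1, alpha, beta):
    # p1 p0 | q0 q1 are the samples nearest the block edge. Filter only
    # if the step across the edge is small enough to be a quantization
    # artifact; a step too large to be explained by the QP is kept as
    # a real object edge.
    return (abs(p0 - q0) < alpha and
            abs(p1 - p0) < beta and
            abs(q1 - q0) < beta)

print(should_filter(80, 82, 95, 94, alpha=20, beta=6))    # True: artifact
print(should_filter(80, 82, 160, 158, alpha=20, beta=6))  # False: real edge
```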
30. Hypothetical Reference Decoder (HRD) Standard receiver buffer models → encoders must produce bit streams that are decodable by the HRD
Two buffers
Coded picture buffer (CPB)
models the bit arrival and removal time
Decoded picture buffer (DPB)
models the frame decoding and output times and the reference frame lists
So, if a designer mimics the behavior of the HRD, the decoder will work.
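A leaky-bucket sketch of the CPB side of this model (fixed frame interval and illustrative parameters; the real HRD also covers the DPB and variable timing):

```python
def cpb_conforms(frame_bits, bits_per_interval, buffer_size, initial_fullness):
    # Bits arrive at a constant rate; each frame's bits are removed at
    # its decoding time. A conforming stream never underflows (a frame
    # not fully arrived when needed) or overflows the buffer.
    fullness = initial_fullness
    for bits in frame_bits:
        if bits > fullness:
            return False               # underflow: frame arrives too late
        fullness = min(fullness - bits + bits_per_interval, buffer_size)
    return True

print(cpb_conforms([3, 1, 1, 4, 1], bits_per_interval=2,
                   buffer_size=8, initial_fullness=4))     # True
```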
31. Outline Overview
Network Abstraction Layer (NAL)
Video Coding Layer (VCL)
Profiles and Applications
Feature Highlights
Conclusions
32. Profiles and Applications Defines a set of coding tools and algorithms
Conformance points for interoperability
3 Profiles for different applications
Baseline – video conferencing
Main – broadcast, media storage, digital cinema
Extended – streaming over IP (wire/wireless)
15 Levels
pic size
decoding rate (MB/s)
bit rate
buffer size
Baseline: low-latency, real-time applications
33. Outline Overview
Network Abstraction Layer (NAL)
Video Coding Layer (VCL)
Profiles and Applications
Feature Highlights
Conclusions
34. Feature Highlights -- Prediction Variable blocksize MCs
Quarter-sample accurate MCs
MVs over picture boundaries
Multiple reference pictures
Weighted Bi-directional prediction
Decoupling of referencing from display orders
Decoupling prediction mode from reference capability (B frames can be used as references)
Improved Skip/Direct modes
Intra prediction in Spatial domain
In-loop deblocking filter
35. Feature Highlights -- Transform Small block-size transform
2-level block transform (repeated DC transform)
Short data-type transform (16-bit)
Exact inverse transform
Context-adaptive entropy coding
Arithmetic entropy coding
36. Feature Highlights -- Network Parameter set structure (efficient)
NAL unit syntax structure (flexibility)
Flexible slice size
Flexible macroblock ordering (FMO)
Arbitrary slice ordering (ASO)
Redundant pictures
Data partitioning (unequal error protection)
SP/SI switching pictures
37. Outline Overview
Network Abstraction Layer (NAL)
Video Coding Layer (VCL)
Profiles and Applications
Feature Highlights
Conclusions
38. Conclusions Key improvements
Enhanced prediction (intra- and inter-)
Small block size exact match transform
Adaptive in-loop deblocking filter
Enhanced entropy coding method