
Data Formats and Codecs

Presentation Transcript


  1. INF 5070 – Media Storage and Distribution Systems Data Formats and Codecs 30/8 – 2004

  2. Why codecs and formats? • Codecs (coders/decoders) • Determine how information is represented • Important for servers and distribution systems • Required sending speed • Amount of loss allowed • Buffers required • … • Formats • Determine how data is stored • Important for servers and distribution systems • Where is the data? • Where is the data about the data?

  3. Media data

  4. Media Data • Medium: "thing in the middle"; here: a means to distribute and present information • Media affect human-computer interaction • The mantra of multimedia users: • Speaking is faster than writing • Listening is easier than reading • Showing is easier than describing

  5. Dependence of Media • Time-independent media (discrete media): text, graphics • Time-dependent media (continuous media): audio, video • Interdependent media: multimedia • "Continuous" refers to the user's impression of the data, not necessarily to its representation • Combined video and audio is multimedia; the relations between the media must be specified

  6. Dependence of Media • Defined by the presentation of the data, not its representation • Discrete media • Text • Graphics • Video stills (image displayed by pausing a video stream) • Continuous media • Audio • Video • Animation • Ticker news (continuously scrolling text) • Multimedia • Multiplexed audio and video

  7. Properties of a Multimedia System • Flexibility • Provide mechanisms to handle all kinds of media, in particular discrete and continuous media • A VCR or a desktop publishing system for text and graphics is not a multimedia system • An editor with voice annotation is a multimedia system • Integration • Independent media storage • Computer-controlled media combination • Definition: A multimedia system is characterized by the integrated, computer-controlled handling of independent discrete and continuous media

  8. Coding for distribution

  9. Compression – Necessity • Example: video sequence • 25 images/s (PAL standard) • 3 bytes/pixel • YUV (luminance + 2 chrominance values) or RGB (red, green, blue values) • Image resolution 640 * 480 pixels • Data rate = 640 * 480 * 3 bytes * 25/s = 23,040,000 bytes/s ≈ 22 MByte/s • Ethernet (10 Mbit/s) carries only approx. 1/16 of such a stream • Fast Ethernet (100 Mbit/s) carries only approx. 1/2 of such a stream • Compression is necessary
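
The data-rate arithmetic on this slide can be checked with a few lines of Python; this is just the slide's calculation spelled out, with the 10 and 100 Mbit/s Ethernet capacities as assumed reference values.

    # Uncompressed PAL-style video: 640 x 480 pixels, 3 bytes/pixel, 25 frames/s.
    width, height = 640, 480
    bytes_per_pixel = 3               # one byte each for Y, U, V (or R, G, B)
    frames_per_second = 25

    bytes_per_second = width * height * bytes_per_pixel * frames_per_second
    print(bytes_per_second)           # 23040000 bytes/s
    print(bytes_per_second / 2**20)   # ~21.97, i.e. ~22 MByte/s
    bits_per_second = bytes_per_second * 8
    print(10e6 / bits_per_second)     # ~0.054: about 1/18 of the stream (the slide rounds to 1/16)
    print(100e6 / bits_per_second)    # ~0.54: Fast Ethernet carries about 1/2 of the stream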

  10. Compression – General Requirements • Dependence on application type: • Dialogue mode • Retrieval mode

  11. Compression – Mode Dependent Requirements • Dialogue and retrieval mode requirements: • Synchronization of audio, video, and other media • Dialogue mode requirements: • End-to-end delay < 150ms • Compression and decompression in real-time • Symmetric • Retrieval mode requirements: • Fast forward and backward data retrieval • Random access within 1/2 s • Asymmetric • We look mainly at retrieval mode!

  12. Compression Categories

  13. Basic Encoding Steps

  14. Run-Length Coding • Assumption • Long sequences of identical symbols • Example
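
The example itself is a figure on the slide; as a stand-in, here is a minimal run-length coder in Python, assuming the simple (symbol, run length) pair scheme the slide implies.

    def rle_encode(data):
        """Collapse runs of identical symbols into (symbol, run_length) pairs."""
        runs = []
        for symbol in data:
            if runs and runs[-1][0] == symbol:
                runs[-1][1] += 1
            else:
                runs.append([symbol, 1])
        return [(s, n) for s, n in runs]

    def rle_decode(runs):
        return "".join(s * n for s, n in runs)   # assumes string symbols

    assert rle_encode("AAAABBBCCD") == [("A", 4), ("B", 3), ("C", 2), ("D", 1)]
    assert rle_decode(rle_encode("AAAABBBCCD")) == "AAAABBBCCD"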

  15. Bit-Plane Coding • Assumption • Even longer sequences of identical bits • Example (each plane is coded as pairs of the form (number of 0s before the next 1, end-of-plane flag)):
10,0,6,0,0,3,0,2,2,0,0,2,0,0,1,0, … ,0,0 (absolute values)
0,x,1,x,x,1,x,0,0,x,x,1,x,x,0,x, … ,x,x (sign bits)
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, … ,0,0 (MSB) → (0,1)
0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0, … ,0,0 (MSB-1) → (2,1)
1,0,1,0,0,1,0,1,1,0,0,1,0,0,0,0, … ,0,0 (MSB-2) → (0,0)(1,0)(2,0)(1,0)(0,0)(2,1)
0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0, … ,0,0 (MSB-3) → (5,0)(8,1)
• Up to 20% savings over run-length coding can be achieved
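
A small Python sketch of the decomposition step: the magnitudes are split into bit-planes (MSB first), and each plane is then run-length coded on its own, which is where the extra savings over plain run-length coding come from. The (run, end-of-plane) pair coding shown on the slide is not reproduced here.

    def bit_planes(magnitudes, num_planes=4):
        """Split sample magnitudes into bit-planes, most significant plane first.
        Sign bits are coded separately, one per non-zero sample, as on the slide."""
        return [[(m >> p) & 1 for m in magnitudes]
                for p in range(num_planes - 1, -1, -1)]

    samples = [10, 0, 6, 0, 0, 3, 0, 2, 2, 0, 0, 2, 0, 0, 1, 0]
    for i, plane in enumerate(bit_planes(samples)):
        print("MSB" if i == 0 else f"MSB-{i}", plane)   # reproduces the four plane rows above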

  16. Huffman Coding • Assumption • Some symbols occur more often than others • E.g., character frequencies of the English language • Fundamental principle • Frequently occurring symbols are coded with shorter bit strings

  17. Huffman Coding • Example • Characters to be encoded: • A, B, C, D, E • Probability to occur: • p(A)=0.3, p(B)=0.3, p(C)=0.1, p(D)=0.15, p(E)=0.15
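
A compact Huffman construction for exactly these probabilities, offered as a sketch: repeatedly merge the two least probable subtrees and prepend a bit on each side. The concrete codewords depend on tie-breaking, but with the probabilities above the average code length comes out at 2.25 bits/symbol.

    import heapq
    from itertools import count

    def huffman_code(probabilities):
        tiebreak = count()                       # keeps heap entries comparable on equal probability
        heap = [(p, next(tiebreak), {sym: ""}) for sym, p in probabilities.items()]
        heapq.heapify(heap)
        while len(heap) > 1:
            p1, _, codes1 = heapq.heappop(heap)  # the two least probable subtrees
            p2, _, codes2 = heapq.heappop(heap)
            merged = {s: "0" + c for s, c in codes1.items()}
            merged.update({s: "1" + c for s, c in codes2.items()})
            heapq.heappush(heap, (p1 + p2, next(tiebreak), merged))
        return heap[0][2]

    codes = huffman_code({"A": 0.3, "B": 0.3, "C": 0.1, "D": 0.15, "E": 0.15})
    print(codes)   # frequent symbols get 2-bit codes, rare ones 3-bit codes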

  18. Huffman • Table and example of application to data stream

  19. JPEG • "JPEG": Joint Photographic Experts Group • International standard • For digital compression and coding of continuous-tone still images: • Gray-scale • Color • Since 1992 • Joint effort of: • ISO/IEC JTC1/SC2/WG10 • Commission Q.16 of CCITT SGVIII • A compression rate of 1:10 yields reasonable results

  20. JPEG • Very general compression scheme • Independence of • Image resolution • Image and pixel aspect ratio • Color representation • Image complexity and statistical characteristics • Well-defined interchange format of encoded data • Implementation in • Software only • Software and hardware

  21. JPEG • Sequence of compression steps • Different resolutions possible • Lossy or lossless mode • Lossless compression factor ~1.6:1 • Symmetrical codec

  22. JPEG – Baseline Mode: Quantization • Use of quantization tables for the DCT coefficients • Map an interval of real numbers to one integer number • Allows a different granularity to be used for each coefficient
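
A sketch of the quantization step in Python: each DCT coefficient is divided by its own quantization-table entry and rounded, so only the quantized interval can be recovered at the decoder. The table values below are made-up illustrations, not the standard JPEG tables.

    import numpy as np

    def quantize(dct_block, q_table):
        return np.round(dct_block / q_table).astype(int)

    def dequantize(quantized_block, q_table):
        return quantized_block * q_table      # recovers only the centre of each interval

    q_table = np.linspace(16, 120, 64).reshape(8, 8)              # coarser steps for higher frequencies
    dct_block = np.random.default_rng(0).normal(0, 50, (8, 8))    # stand-in for real DCT output
    print(quantize(dct_block, q_table))                           # small integers, many zeros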

  23. JPEG – 4 Modes of Compression

  24. Motion JPEG • Use series of JPEG frames to encode video • Pro • Lossless mode – editing advantage • Frame-accurate seeking – editing advantage • Arbitrary frame rates – playback advantage • Arbitrary frame skipping – playback advantage • Scaling through progressive mode – distribution advantage • Min transmission delay = 1/framerate – conferencing advantage • Supported by popular frame grabbers • Contra • Series of JPEG-compressed images • No standard, no specification • Worse, several competing quasi-standards • No relation to audio • No inter-frame compression

  25. H.261 (p×64) • International standard: video codec for video conferences at p × 64 kbit/s (ISDN) • Real-time encoding/decoding, max. signal delay of 150 ms • Constant data rate • Intraframe coding • DCT as in JPEG baseline mode • Interframe coding with motion estimation • Search for a similar macroblock in the previous image and compare • The position of this macroblock defines the motion vector • The difference between the similar macroblocks is coded
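
A naive full-search version of the motion estimation described above, as a sketch with assumed 16x16 macroblocks, sum-of-absolute-differences matching and an 8-pixel search window; real H.261 encoders use far cheaper search strategies.

    import numpy as np

    def motion_vector(prev_frame, cur_frame, bx, by, block=16, search=8):
        """Find the offset of the best-matching macroblock in the previous frame."""
        cur = cur_frame[by:by + block, bx:bx + block].astype(int)
        best_sad, best_mv = None, (0, 0)
        for dy in range(-search, search + 1):
            for dx in range(-search, search + 1):
                y, x = by + dy, bx + dx
                if y < 0 or x < 0 or y + block > prev_frame.shape[0] or x + block > prev_frame.shape[1]:
                    continue                                  # candidate block lies outside the frame
                cand = prev_frame[y:y + block, x:x + block].astype(int)
                sad = int(np.abs(cur - cand).sum())           # sum of absolute differences
                if best_sad is None or sad < best_sad:
                    best_sad, best_mv = sad, (dx, dy)
        return best_mv, best_sad    # the motion vector and its residual cost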

  26. MPEG (Moving Picture Experts Group) • International standard: compression of audio and video for playback (1.5 Mbit/s) • Real-time decoding • Sequence of I-, P-, and B-frames • Random access at I-frames • At P-frames: the previous I-frame must be decoded first • At B-frames: the surrounding I- and P-frames must be decoded first

  27. MPEG-2 • From MPEG-1 to MPEG-2 • Improvement in quality • From VCR to TV to HDTV • No CD-ROM based constraints • Higher data rates • MPEG-1: about 1.5 MBit/s • MPEG-2: 2-100 MBit/s • Evolution • 1994: International Standard • Also later known as H.262 • Prominent role for digital TV in DVB (digital video broadcasting) and DVD (digital video disk) • Commercial MPEG-2 realizations available

  28. MPEG-2 • Beyond MPEG-1: • Higher-quality encoding • Higher data rates • Interlaced modes • Use cases • Broadcast-quality production • DVB-T: terrestrial • DVB-S: satellite • DVB-C: cable • Program Stream • for post-processing, storage, and DVD distribution • Transport Stream • for broadcasting, error resilience • Scaling: • Signal-to-Noise Ratio (SNR) scaling – progressive compression, error-correcting codes • Spatial scaling – several pixel resolutions • Temporal scaling – frame dropping

  29. MPEG-4 • MPEG-4 (ISO 14496) originally • Targeted at systems with very scarce resources • To support applications like • Mobile communication • Videophone and E-mail • Max. data rates and dimensions (roughly) • Between 4800 and 64000 bits/s • 176 columns x 144 lines x 10 frames/s • Further demand • To provide enhanced functionality to allow for analysis and manipulation of image contents

  30. MPEG-4 • Hence: find standardized ways to • Represent units of aural, visual or audiovisual content • "audio-visual objects" or AVOs • object coding independent of other objects, surroundings and background • natural and synthetic objects • Compose these objects • i.e., creation of compound objects that form audiovisual scenes • Multiplex and synchronize the data associated with AVOs • for transport over network channels providing QoS (Quality of Service) • Interact with the audiovisual scene generated at the decoder's site

  31. MPEG-4: Scope • Definition of • „System Decoder Model“ • specification for decoder implementations • Description language • binary syntax of an AV object’s bitstream representation • scene description information • Corresponding concepts, tools and algorithms, especially for • content-based compression of simple and compound audiovisual objects • manipulation of objects • transmission of objects • random access to objects • animation • scaling • error robustness

  32. MPEG-4: Scope • Targeted bit rates for video and audio: • VLBV core • „Very Low Bit-rate Video“ • 5 - 64 Kbit/s • image sequences with CIF resolution and up to 15 frames/s • Higher-quality video • 64 Kbit/s - 4 Mbit/s • quality like digital TV • Natural audio coding • 2 - 64 Kbit/s

  33. MPEG-4: Video and Image Encoding • Encoding / decoding of • Rectangular images and video • coding similar to MPEG-1/2 • motion prediction • texture coding • Images and video of arbitrary shape • as done in conventional approach • 8x8 DCT or shape-adaptive DCT • plus coding of shape and transparency information • Encoder • Must generate timing information • speed of the encoder clock = time base • desired decoding times and/or expiration times • by using time stamps attached to the stream • Can specify the minimum buffer resources needed for decoding

  34. MPEG-4: Composition of Scenes • Scene description includes: • Tree to define hierarchical relationships between objects • Objects’ positions in space and time • by converting the objects’ local coordinate system into a global coordinate system • Attribute value selection • e.g. pitch of sound, color, texture, animation parameters • Description based on some VRML concepts • VRML = „Virtual Reality Modeling Language“ • Interaction with scenes • e.g. change viewing point, drag object, start/stop streams, select language

  35. MPEG-4: Example of a Composition

  36. MPEG-4: Synthetic Objects • Visual objects: • Virtual parts of scenes • e.g. virtual background • Animation • e.g. animated faces • Audio objects: • „Text-to-speech“ • speech generation from given text and prosodic parameters • face animation control • „Score driven synthesis“ • music generation from a score • more general than MIDI • Special effects

  37. MPEG-4: Error Handling • Mobile communication: • Low bit-rate (< 64 Kbps) • Error-prone • MPEG-4 concepts for error handling: • Resynchronization • enables receiver to „tune in“ again • based on markers within bitstream • Data recovery • enables receiver to reconstruct lost data • encode data in an error-resilient manner • Error concealment • enables receiver to bridge gaps in data • e.g. by repeating parts of old frames

  38. Network-aware coding

  39. Network-aware coding • Adapt to the reality of the Internet • Content • Is created once, off-line • Is sent many times, under different circumstances • No guarantees concerning • Throughput • Jitter • Packet loss • Sending rate • Must adhere to rules • Often: don't send more than TCP would • Cannot simply send at the rate of the best available encoding

  40. Approaches • Simulcast • Scalable coding • SNR Scalability • Temporal Scalability • Spatial Scalability • Fine Grained Scalability • Multiple Description Coding

  41. Simulcast • Choose a set of sending rates • During content creation • Encode the content in the best possible quality below each chosen sending rate • During transmission • Choose the version with the best admissible quality • [Figure: quality vs. sending rate, comparing three simulcast rates and a single-rate codec against the best possible quality at each sending rate]
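
The transmission-time choice is a simple maximization, sketched below with hypothetical encoding rates: pick the highest-rate (best-quality) version that still fits the currently available bandwidth.

    def pick_simulcast_version(encoded_rates_kbps, available_kbps):
        feasible = [r for r in encoded_rates_kbps if r <= available_kbps]
        return max(feasible) if feasible else None   # None: even the lowest rate is too high

    print(pick_simulcast_version([300, 800, 2000], available_kbps=950))   # -> 800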

  42. Scalable coding • Typically used as layered coding • A base layer • Provides basic quality • Must always be transferred • One or more enhancement layers • Improve quality • Transferred if possible • [Figure: quality vs. sending rate for a base layer plus enhancement layer, against the best possible quality at each sending rate]

  43. Temporal Scalability • Frames can be dropped • In a controlled manner • Frame dropping does not violate dependencies • Low-gain example: B-frame dropping in MPEG-1
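
A tiny sketch of the MPEG-1 example: B-frames are not used as reference frames, so dropping them does not break the decoding dependencies of the remaining I- and P-frames.

    def drop_b_frames(frame_types):
        return [f for f in frame_types if f != "B"]

    gop = ["I", "B", "B", "P", "B", "B", "P", "B", "B", "I"]
    print(drop_b_frames(gop))   # ['I', 'P', 'P', 'I'] - lower frame rate, still decodable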

  44. Spatial Scalability • Idea • Base layer • Downsample the original image (code only 1 pixel instead of 4) • Send it like a lower-resolution version • Less data to code • Enhancement layer • Subtract the base-layer pixels from all pixels • Send it like a normal-resolution version • Better compression due to the small values • If the enhancement layer arrives at the client • Decode both layers • Add the layers • [Figure: example pixel values 73, 72, 61, 75, 83 with enhancement-layer residuals -1, 2, -12, 10]
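
A numpy sketch of the two layers, assuming plain 2x downsampling by pixel dropping and nearest-neighbour upsampling for the subtraction (and even image dimensions); the enhancement layer then consists of small residual values that compress well.

    import numpy as np

    def spatial_layers(image):
        base = image[::2, ::2]                                   # keep 1 pixel out of 4
        upsampled = base.repeat(2, axis=0).repeat(2, axis=1)
        enhancement = image.astype(int) - upsampled.astype(int)  # small residual values
        return base, enhancement

    def reconstruct(base, enhancement):
        upsampled = base.repeat(2, axis=0).repeat(2, axis=1).astype(int)
        return upsampled + enhancement

    img = np.arange(64, dtype=np.uint8).reshape(8, 8)
    base, enh = spatial_layers(img)
    assert np.array_equal(reconstruct(base, enh), img)   # lossless before any coding loss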

  45. Spatial Scalability • [Block diagram: raw video → downsampling (DS) → DCT → Q → VLC → base layer; the difference signals are DCT/Q/VLC-coded into an enhancement layer and an enhancement layer 2] • DS – downsampling, DCT – discrete cosine transformation, Q – quantization, VLC – variable-length coding

  46. SNR Scalability • SNR – signal-to-noise ratio • Idea • Base layer • Is regularly DCT encoded • A lot of data is removed using quantization • Enhancement layer is regularly DCT encoded • Run Inverse DCT on quantized base layer • Subtract from original • DCT encode the result • If enhancement layer arrives at client • Add base and enhancement layer before running Inverse DCT
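
A sketch of the layering applied to the DCT coefficients, following the block diagram on the next slide (because the DCT is linear, forming the residual in the DCT domain is equivalent to the pixel-domain description above); plain scalar quantizers stand in for the table-driven quantization of JPEG/MPEG.

    import numpy as np

    def snr_layers(dct_coeffs, q_base=32, q_enh=4):
        base = np.round(dct_coeffs / q_base).astype(int)       # coarsely quantized base layer
        residual = dct_coeffs - base * q_base                  # inverse-quantize and subtract
        enhancement = np.round(residual / q_enh).astype(int)   # finer quantizer for the residual
        return base, enhancement

    def snr_reconstruct(base, enhancement=None, q_base=32, q_enh=4):
        coeffs = base * q_base
        if enhancement is not None:
            coeffs = coeffs + enhancement * q_enh              # add the layers before the inverse DCT
        return coeffs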

  47. SNR Scalability • [Block diagram: raw video → DCT → Q → VLC → base layer; the quantized coefficients are inverse-quantized (IQ), subtracted from the DCT coefficients, and the residual is Q/VLC-coded into the enhancement layer] • DCT – discrete cosine transformation, Q – quantization, IQ – inverse quantization, VLC – variable-length coding

  48. Fine Grained Scalability • Idea • Cut off the compressed tail bits of the samples • Base layer • As in SNR coding • Enhancement layer • Use bit-plane coding for the enhancement layer instead of run-level coding • Cut tail bits off until the target data rate is reached
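
A minimal sketch of the truncation idea: because the enhancement-layer magnitudes are sent bit-plane by bit-plane (MSB first), the stream can be cut after any plane and the receiver still gets a coarser approximation of every sample.

    def truncate_bit_planes(magnitudes, planes_kept, total_planes=8):
        mask = ~((1 << (total_planes - planes_kept)) - 1)   # zero out the dropped low-order planes
        return [m & mask for m in magnitudes]

    print(truncate_bit_planes([200, 37, 5, 90], planes_kept=4))   # [192, 32, 0, 80]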

  49. Fine Grained Scalability • Bit-plane coded enhancement layer (cf. the bit-plane example of slide 15): • MSB: (0,1) • MSB-1: (2,1) • MSB-2: (0,0)(1,0)(2,0)(1,0)(0,0)(2,1) • MSB-3: (5,0)(8,1) • … • [Figure: goal of FGS – quality that tracks the best possible quality at every sending rate]

  50. Fine Grained Scalability • [Block diagram: raw video → DCT → Q → VLC → base layer; the quantized coefficients are inverse-quantized (IQ), subtracted from the DCT coefficients, and the residual is Q/BC-coded into the enhancement layer] • DCT – discrete cosine transformation, Q – quantization, IQ – inverse quantization, VLC – variable-length coding, BC – bit-plane coding
