An Introduction to the “Thor-like” Power of Ogg Vorbis! Robert W. Ferguson III, January 30, 2003
Xiphophorus • Xiphophorus is a freshwater fish genus comprising 23 species. • Since the 1920s it has been known that hybrids between the different species are easy to produce. In some cases, one simply had to place one Xiphophorus species next to another in an aquarium, and they would reproduce.
XIPH.COM • Xiphophorus is a non-profit organization responsible for the Ogg project. • Xiphophorus is GPL. • All cool companies have an X to start their name.
What Is Ogg Vorbis? • The Ogg project is an open-source alternative to proprietary and patented codecs for digital media (both audio and video). • The Vorbis project is responsible for the creation of a perceptual audio encoder similar to the famous, inherently evil, proprietary codecs popularized by global, illegal file sharing.
It Is Not MP3 • Vorbis is in the same category as • MPEG-4 (AAC) • And similar to, but higher performance than • MPEG-1/2 audio layer 3 • MPEG-4 audio (TwinVQ) • WMA - Windows Media Audio • PAC
Classification • Vorbis I • Vorbis I is a forward-adaptive monolithic transform CODEC based on the Modified Discrete Cosine Transform. • The codec is structured to allow addition of a hybrid wavelet filter bank in Vorbis II to offer better transient response and reproduction using a transform better suited to localized time events.
Packets • Vorbis uses free-form packets that have no minimum size, maximum size, or fixed/expected size. Packets are designed so that they may be truncated (or padded) and remain decodable.
Error Detection • Vorbis provides none of its own protection against errors. • It is solely a method of accepting input audio, dividing it into individual frames and compressing these frames into raw, unformatted 'packets'.
ATH – Absolute Threshold of Hearing • Most codecs assume volume is fixed during playback. Vorbis assumes that volume can be adjusted.
Tone Masking • Tone masking is when louder frequencies mask out adjacent quieter ones. • Most codecs use a psychoacoustic model to calculate what is left as well as possible within given bit-rate limits. • Vorbis approximates the same thing using as many bits as it takes.
Coupling • Most sounds consist of many channels and have redundancy between these channels. This is exploited to lower the bit-rate if the channels are encoded in some joint representation. • The simplest example is to encode the average and the difference between channels (for a stereo sound) – this is called mid/side representation and it requires fewer bits for sections that are close to mono.
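The mid/side idea above can be illustrated with a minimal Python sketch. The function names `ms_encode` and `ms_decode` are hypothetical, and this is the textbook average/difference scheme the slide describes, not necessarily the coupling the Vorbis encoder itself performs.

```python
def ms_encode(left, right):
    """Convert left/right samples into average (mid) and difference (side)."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [l - r for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Recover left/right samples from the mid/side representation."""
    left = [m + s / 2 for m, s in zip(mid, side)]
    right = [m - s / 2 for m, s in zip(mid, side)]
    return left, right


# A section that is effectively mono: both channels carry the same samples.
left = [4, 8, -2, 6]
right = [4, 8, -2, 6]
mid, side = ms_encode(left, right)
# The side channel is all zeros here, so it costs almost nothing to encode.
restored_left, restored_right = ms_decode(mid, side)
```

For near-mono material the side channel stays close to zero, which is where the bit savings come from.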
Channel Support • Vorbis supports up to 255 channels. • At the moment the encoder knows to use coupling for 2-channel files only, but eventually it will scale.
Vector Quantization • Vector Quantization (VQ) is a lossy data compression method where vectors are rounded off into encoding regions. • Basically if you group together numbers describing different channels, your channels become automatically coupled (normally a group would be picked from data describing a single channel, so channels would be approximated independently).
Vector Quantization… • The process of VQ introduces vector quantization noise: the difference between the approximation (chosen from a limited set of candidates) and the original group of numbers. • All codecs suffer from quantization problems. VQ should suffer less.
Memory Usage • The vector codebooks used in the first stage of decoding are packed, in their entirety, into the Vorbis bit-stream headers. • In packed form, these codebooks occupy only a few kilobytes; the extent to which they are pre-decoded into a cache is the dominant factor in decoder memory usage.
Following the Standard • Vorbis is defined by its decoding standard: any file that decodes correctly under that standard is compliant, regardless of the encoding method used to produce it.
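A minimal sketch of the idea, assuming a tiny hand-made codebook of two-dimensional vectors where each vector groups one sample from the left channel with one from the right. The codebook values and the helper `nearest_codeword` are illustrative only; real Vorbis codebooks are carried in the setup header and are far larger.

```python
def nearest_codeword(vector, codebook):
    """Return the index of the codebook entry closest to vector."""
    best_index, best_dist = 0, float("inf")
    for index, codeword in enumerate(codebook):
        dist = sum((v - c) ** 2 for v, c in zip(vector, codeword))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


# Hypothetical codebook: each entry is a (left, right) pair.
codebook = [(0.0, 0.0), (1.0, 1.0), (1.0, -1.0), (-1.0, 1.0)]

# Input vectors: one sample from each channel grouped together,
# so the two channels are quantized jointly.
vectors = [(0.9, 1.2), (0.1, -0.2)]

indices = [nearest_codeword(v, codebook) for v in vectors]  # what gets stored
decoded = [codebook[i] for i in indices]                    # what the decoder sees

# Quantization noise: original minus approximation, per component.
noise = [tuple(o - a for o, a in zip(orig, approx))
         for orig, approx in zip(vectors, decoded)]
```

Only the indices need to be transmitted. The noise is whatever the nearest codebook entry fails to capture.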
Headers • Identification Header • The identification header identifies the bitstream as Vorbis, Vorbis version, and the simple audio characteristics of the stream such as sample rate and number of channels. • Comment Header • The comment header includes user text comments ["tags"] and a vendor string for the application/library that produced the bitstream. • Setup Header • The setup header includes extensive CODEC setup information as well as the complete VQ and Huffman codebooks needed for decode.
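As a rough illustration of the identification header, the following Python sketch parses the 30-byte packet layout as I understand it from the Vorbis I specification (packet type 1, the string "vorbis", then little-endian fields). The function name and the returned dictionary keys are my own choices, and the layout should be checked against the specification before relying on it.

```python
import struct


def parse_identification_header(packet: bytes) -> dict:
    """Parse a Vorbis I identification header packet (assumed 30-byte layout)."""
    if len(packet) < 30:
        raise ValueError("identification header is too short")
    if packet[0] != 1 or packet[1:7] != b"vorbis":
        raise ValueError("not a Vorbis identification header")
    # version, channels, sample rate, three bitrate hints, packed block sizes,
    # and the byte holding the framing bit, all little-endian.
    (version, channels, sample_rate, bitrate_max, bitrate_nominal,
     bitrate_min, blocksizes, framing) = struct.unpack_from("<IBIiiiBB", packet, 7)
    blocksize_0 = 1 << (blocksizes & 0x0F)  # short window, stored as an exponent
    blocksize_1 = 1 << (blocksizes >> 4)    # long window, stored as an exponent
    if version != 0 or channels == 0 or sample_rate == 0:
        raise ValueError("unsupported or invalid stream parameters")
    if not 64 <= blocksize_0 <= blocksize_1 <= 8192:
        raise ValueError("illegal block sizes")
    if framing & 1 == 0:
        raise ValueError("framing bit not set")
    return {
        "version": version,
        "channels": channels,
        "sample_rate": sample_rate,
        "bitrate_nominal": bitrate_nominal,
        "blocksizes": (blocksize_0, blocksize_1),
    }


# Hand-built example packet: stereo, 44.1 kHz, block sizes 256 and 2048.
example = b"\x01vorbis" + struct.pack("<IBIiiiBB", 0, 2, 44100, 0, 128000, 0, 0xB8, 1)
info = parse_identification_header(example)
```

The comment and setup headers use variable-length fields and are not covered by this sketch.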
Decoding Procedure • The decoding and synthesis procedure for all audio packets is fundamentally the same. • 1. decode packet type flag • 2. decode mode number • 3. decode window shape [long windows only] • 4. decode floor • 5. decode residue into residue vectors • 6. inverse channel coupling of residue vectors • 7. generate floor curve from decoded floor data
Decoding Procedure... • 8. compute dot product of floor and residue, producing audio spectrum vector • 9. inverse monolithic transform of audio spectrum vector, always an MDCT in Vorbis I • 10. overlap/add left-hand output of transform with right-hand output of previous frame • 11. store right-hand data from transform of current frame for future lapping • 12. if not first frame, return results of overlap/add as audio result of current frame • Rearrangement of the synthesis arithmetic is possible.
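Steps 8 and 10 through 12 can be sketched in Python as follows. This assumes equal-sized frames that have already been inverse-transformed and windowed, and it treats the "dot product" of step 8 as an element-by-element multiply of the floor curve and the residue vector. The names `audio_spectrum` and `FrameLapper` are hypothetical.

```python
def audio_spectrum(floor_curve, residue):
    """Step 8: multiply floor and residue element by element."""
    return [f * r for f, r in zip(floor_curve, residue)]


class FrameLapper:
    """Steps 10-12 for equal-sized frames: overlap/add with the previous frame."""

    def __init__(self):
        self.saved_right = None  # right-hand half of the previous frame

    def push(self, windowed_frame):
        half = len(windowed_frame) // 2
        left, right = windowed_frame[:half], windowed_frame[half:]
        result = None
        if self.saved_right is not None:
            # Step 10: add this frame's left half to the stored right half.
            result = [a + b for a, b in zip(left, self.saved_right)]
        # Step 11: keep the right half for the next call.
        self.saved_right = right
        # Step 12: the first frame returns nothing; later frames return audio.
        return result


lapper = FrameLapper()
first = lapper.push([1, 2, 3, 4])       # first frame: nothing to return yet
second = lapper.push([10, 20, 30, 40])  # left half [10, 20] + saved [3, 4]
```

Each call after the first yields half a frame of finished PCM samples, which is why the decoder always lags its input by one frame.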
Controversy • The entire probability model of the codec, the Huffman and VQ codebooks, is packed into the bitstream header along with extensive CODEC setup parameters (often several hundred fields). • It’s impossible to embed a simple frame type flag in each audio packet, or begin decode at any frame in the stream without having previously fetched the codec setup header. • Vorbis can initiate decode at any arbitrary packet within a bitstream so long as the codec has been initialized/setup with the setup headers.
Window Shape Decode • Vorbis frames use one of two PCM sample sizes specified during codec setup. In Vorbis I, legal frame sizes are powers of two from 64 to 8192 samples. Aside from coupling, Vorbis handles channels as independent vectors and these frame sizes are in samples per channel.
Overlapping Windows • Vorbis uses an overlapping transform, namely the MDCT, to blend one frame into the next, avoiding most inter-frame block boundary artifacts. The MDCT output of one frame is windowed according to MDCT requirements, overlapped 50% with the output of the previous frame and added. The window shape assures seamless reconstruction.
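The seamless reconstruction can be checked numerically with a small Python sketch. It uses a direct (slow) textbook MDCT and inverse MDCT, and the window formula sin(π/2 · sin²(π(n + 0.5)/N)), which is the Vorbis window as I recall it from the specification. The scaling convention and all function names are assumptions of this sketch, not the reference decoder's code.

```python
import math


def vorbis_window(n):
    """Window of length n whose overlapping halves satisfy w1^2 + w2^2 = 1."""
    return [math.sin(math.pi / 2 * math.sin(math.pi * (i + 0.5) / n) ** 2)
            for i in range(n)]


def mdct(frame):
    """Direct MDCT: 2m windowed samples in, m coefficients out."""
    m = len(frame) // 2
    return [sum(frame[n] * math.cos(math.pi / m * (n + 0.5 + m / 2) * (k + 0.5))
                for n in range(2 * m))
            for k in range(m)]


def imdct(coeffs):
    """Direct inverse MDCT: m coefficients in, 2m time-aliased samples out."""
    m = len(coeffs)
    return [2.0 / m * sum(coeffs[k] * math.cos(math.pi / m * (n + 0.5 + m / 2) * (k + 0.5))
                          for k in range(m))
            for n in range(2 * m)]


def analyze_synthesize(signal, m):
    """Window, transform, inverse transform, window again, overlap 50% and add."""
    win = vorbis_window(2 * m)
    padded = [0.0] * m + list(signal) + [0.0] * m  # len(signal) must be a multiple of m
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * m + 1, m):
        frame = [padded[start + i] * win[i] for i in range(2 * m)]
        lapped = imdct(mdct(frame))
        for i in range(2 * m):
            out[start + i] += lapped[i] * win[i]
    return out[m:m + len(signal)]


signal = [math.sin(0.3 * i) + 0.5 * (i % 5) for i in range(32)]
rebuilt = analyze_synthesize(signal, 8)
max_error = max(abs(a - b) for a, b in zip(signal, rebuilt))
```

Each frame alone comes back aliased; the aliasing cancels only once neighboring frames are added, which is why `max_error` ends up at floating-point rounding level.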
Dealing with Windows • Lapping is slightly more complex when the overlapping windows are of unequal size. [Figure: overlap of unequal-sized windows]
Inverse Monolithic Transform • The audio spectrum is converted back into time domain PCM audio via an inverse modified discrete cosine transform (MDCT). A detailed description of the MDCT is available in the paper "The use of multirate filter banks for coding of high quality digital audio", by T. Sporer, K. Brandenburg and B. Edler.