Compression & Streaming Serving, shrinking, and otherwise messing about with perfectly good audio files Dr Paul Vickers
Loudness and power • Loudness is related to the force with which a sound presses on your eardrum • The more power, the louder the sound • Power is proportional to the square of a sound’s amplitude (pressure, or voltage in the electrical signal)
Sampling error and noise • CD audio uses 44.1 kHz sampling at 16-bit resolution • Sampled voltages are quantised to –32768…32767 • Quantisation introduces error (through rounding) • The largest error is 0.5, which is 2^-16 times the amplitude of the loudest sample value • Power is related to the square of amplitude, so the error has 2^-32 times the power of the loudest signal • Ratio of signal to error (noise) is 2^32:1 • Or 96.3 dB (10 log10(2^32)) • = SNR of 96 dB
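The arithmetic on this slide can be checked with a few lines of Python. This is only a sketch of the slide's own model (peak amplitude 2^(bits-1), worst-case rounding error 0.5); the function name `quantisation_snr_db` is made up for illustration.

```python
import math

def quantisation_snr_db(bits: int) -> float:
    """SNR of linear PCM under the slide's model.

    Assumes a peak sample value of 2**(bits - 1) and a worst-case
    rounding error of 0.5, so the amplitude ratio is 2**bits.
    """
    amplitude_ratio = 2 ** (bits - 1) / 0.5   # equals 2**bits
    power_ratio = amplitude_ratio ** 2        # power goes as amplitude squared
    return 10 * math.log10(power_ratio)

if __name__ == "__main__":
    for bits in (8, 16, 20):
        print(bits, "bits:", round(quantisation_snr_db(bits), 1), "dB")
```

Each extra bit doubles the amplitude ratio, which is where the figure of roughly 6 dB per bit comes from.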
Signal to noise ratio • So, CD audio has an SNR of 96 dB • 8-bit sampling has an SNR of 48 dB • Therefore, 1 bit of resolution adds approx. 6 dB to the dynamic range • Threshold of pain is 120 dB, so we need 20-bit resolution to capture the dynamic range of the human auditory system • Loud samples are rare, so noise is more noticeable than the theory would suggest
Coding • A standard .WAV file (no such thing) stores samples as 16-bit values • These values are codes representing the voltages (amplitudes) of the signal • This system is called pulse code modulation (contrast with pulse amplitude modulation and pulse width modulation) • The WAV format actually supports nearly 100 different coding systems
Compression • General-purpose lossless compression (e.g. LZW) does not work well on audio as there are very few repeating patterns • Sampled audio tends to have random noise in the least significant bits, making very few bytes identical • WinZip hardly compresses audio files at all • Try girl2.wav and 528 Hz.wav. Why does the second file compress 2.33:1?
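A rough way to explore the question on this slide without the original files is to compress synthetic data. The sketch below is an assumption-laden stand-in: it uses Python's `zlib` (DEFLATE, not LZW), a synthetic 528 Hz tone rather than the actual 528 Hz.wav, and made-up helper names. It suggests one plausible reason a pure tone compresses: its sample pattern repeats exactly, while noise in the low bits destroys the repetition.

```python
import math
import random
import struct
import zlib

RATE = 44100  # samples per second, as for CD audio

def pure_tone(freq_hz: int, n_samples: int, peak: int = 16000) -> list[int]:
    """Synthetic 16-bit sine wave; integer phase keeps it exactly periodic."""
    samples = []
    for n in range(n_samples):
        phase = (freq_hz * n) % RATE / RATE
        samples.append(round(peak * math.sin(2 * math.pi * phase)))
    return samples

def add_lsb_noise(samples: list[int], spread: int = 8, seed: int = 0) -> list[int]:
    """Add small random noise, disturbing only the least significant bits."""
    rng = random.Random(seed)
    return [s + rng.randint(-spread, spread) for s in samples]

def zlib_ratio(samples: list[int]) -> float:
    """Uncompressed size divided by zlib-compressed size."""
    raw = struct.pack(f"<{len(samples)}h", *samples)
    return len(raw) / len(zlib.compress(raw, 9))

if __name__ == "__main__":
    tone = pure_tone(528, RATE)  # one second of audio
    print("pure tone :", round(zlib_ratio(tone), 2), ": 1")
    print("with noise:", round(zlib_ratio(add_lsb_noise(tone)), 2), ": 1")
```

The exact ratios depend on the compressor and the file, so they will not match the 2.33:1 quoted on the slide.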
Other techniques • Need some different compression techniques • Popular ones are: • Differential PCM (DPCM) • Adaptive DPCM (ADPCM) • A-Law • µ-Law • Logarithmic & non-linear codings • Perceptual codings
Differential PCM • Consider the differences in value between successive samples at rates of, say, 44.1 kHz • Usually fairly small • Small differences need fewer bits than the samples themselves • So DPCM stores sample differences, hence the name • Leads to some inaccuracy and requires look-ahead to balance things out
DPCM example • To reduce 8-bit sample values to 4-bit differences • Consider three samples of 17, 28, 30 • Differences: 11, 2 • A 4-bit system only allows values -8…+7 (1000…0111) • Thus 11 overflows and is clipped to 7 • But decompressing 7, 2 would then give 17, 24, 26 • Instead, take each difference between the last decompressed sample and the next actual sample: • 28 - 17 = 11 -> 7. 17 + 7 = 24. Diff. 30 - 24 = 6 • This gives 7, 6 which, when decompressed, gives 17, 24, 30
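The worked example above can be reproduced in code. This is a minimal sketch, assuming the first sample is stored in full and every later sample as a 4-bit difference; the function names are illustrative only.

```python
def clip4(diff: int) -> int:
    """Clip a difference to the signed 4-bit range -8..+7."""
    return max(-8, min(7, diff))

def dpcm_encode_naive(samples: list[int]) -> list[int]:
    """Differences between actual neighbours; clipping errors accumulate."""
    return [clip4(b - a) for a, b in zip(samples, samples[1:])]

def dpcm_encode(samples: list[int]) -> list[int]:
    """Differences against the decoder's own running value, so errors are corrected."""
    decoded = samples[0]
    diffs = []
    for actual in samples[1:]:
        d = clip4(actual - decoded)
        diffs.append(d)
        decoded += d  # track what the decompressor will reconstruct
    return diffs

def dpcm_decode(first: int, diffs: list[int]) -> list[int]:
    """Rebuild samples by accumulating the stored differences."""
    out = [first]
    for d in diffs:
        out.append(out[-1] + d)
    return out

if __name__ == "__main__":
    samples = [17, 28, 30]
    print(dpcm_decode(17, dpcm_encode_naive(samples)))  # [17, 24, 26]
    print(dpcm_decode(17, dpcm_encode(samples)))        # [17, 24, 30]
```

The second encoder recovers from the clipped step because it measures each difference from the value the decompressor actually holds.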
Predictor based compression • Try to predict the next sample on the basis of previous samples • If correct, there is no need to store the sample, as the decompressor uses the same rules and so can work it out too • If the prediction is correct, output 1; else output 0 followed by the actual sample
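As a sketch of the flag scheme just described, the code below uses a hypothetical predictor (straight-line extrapolation from the last two samples); the slides do not say which predictor is intended. Tokens are kept as Python tuples rather than packed bits to keep the example short.

```python
def predict(history: list[int]) -> int:
    """Hypothetical predictor: extend the line through the last two samples."""
    if not history:
        return 0
    if len(history) == 1:
        return history[-1]
    return 2 * history[-1] - history[-2]

def compress(samples: list[int]) -> list[tuple]:
    """Emit (1,) for a correct prediction, else (0, actual_sample)."""
    tokens, history = [], []
    for s in samples:
        tokens.append((1,) if predict(history) == s else (0, s))
        history.append(s)
    return tokens

def decompress(tokens: list[tuple]) -> list[int]:
    """Apply the same predictor; fall back to the stored sample on a 0 flag."""
    history = []
    for token in tokens:
        history.append(predict(history) if token[0] == 1 else token[1])
    return history

if __name__ == "__main__":
    ramp = [0, 2, 4, 6, 8, 7, 6, 5]
    tokens = compress(ramp)
    print(tokens)
    print(decompress(tokens) == ramp)
```

Only the two samples where the slope changes need to be stored; the rest are reconstructed from the shared rule.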
Adaptive DPCM • ADPCM uses prediction • Outputs predicted differences • If the prediction is accurate, the diff. between actual and predicted samples has lower variance than the actual samples and thus takes fewer bits • Uses 4-bit codes representing the predicted diff. between two 16-bit samples
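To give a feel for the "adaptive" part, here is a deliberately simplified sketch. It is not IMA ADPCM or any standard codec: the previous-sample predictor, the step-doubling and step-halving rule, and all names are assumptions chosen for brevity. Each sample becomes a 4-bit code that is scaled by a step size both sides adjust in the same way.

```python
def _adapt(step: int, code: int) -> int:
    """Assumed rule: grow the step after large codes, shrink it after small ones."""
    if abs(code) >= 6:
        return min(step * 2, 4096)
    if abs(code) <= 1:
        return max(step // 2, 1)
    return step

def adpcm_encode(samples: list[int], step: int = 16) -> list[int]:
    """Encode each sample as a 4-bit code (-8..7) relative to the prediction."""
    predicted, codes = 0, []
    for s in samples:
        code = max(-8, min(7, round((s - predicted) / step)))
        codes.append(code)
        predicted += code * step   # mirror what the decoder will compute
        step = _adapt(step, code)
    return codes

def adpcm_decode(codes: list[int], step: int = 16) -> list[int]:
    """Rebuild samples using the same prediction and step adaptation."""
    predicted, out = 0, []
    for code in codes:
        predicted += code * step
        out.append(predicted)
        step = _adapt(step, code)
    return out

if __name__ == "__main__":
    samples = [0, 100, 400, 1000, 1200, 1250, 1240, 1235]
    codes = adpcm_encode(samples)
    print(codes)
    print(adpcm_decode(codes))
```

The decoder lags behind the sudden rise and then catches up as the step grows, which illustrates why the scheme is lossy.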
Sub-band coding • Low frequencies have fewer cycles per second and thus lots of small differences • High frequencies have larger differences • Dividing the signal into frequency bands allows low frequencies to be coded with fewer bits than high frequencies • Bands to which the ear is less sensitive can be stored less accurately
Speech compression • Musical sound has little silence • Speech has many pauses and silences • These can be replaced by duration codes • Can reduce a signal by 50% by doing this
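Replacing silences with duration codes might look something like the sketch below. The threshold, the minimum run length and the token layout are all assumptions for illustration; real speech codecs use more careful silence detection.

```python
from itertools import groupby

def compress_silence(samples: list[int], threshold: int = 2, min_run: int = 4) -> list[tuple]:
    """Replace long runs of near-silent samples with ("silence", duration)."""
    tokens = []
    for quiet, group in groupby(samples, key=lambda s: abs(s) <= threshold):
        run = list(group)
        if quiet and len(run) >= min_run:
            tokens.append(("silence", len(run)))
        elif tokens and tokens[-1][0] == "samples":
            tokens[-1][1].extend(run)      # merge into the preceding literal run
        else:
            tokens.append(("samples", run))
    return tokens

def expand_silence(tokens: list[tuple]) -> list[int]:
    """Rebuild the signal; silent stretches come back as exact zeros."""
    out = []
    for kind, value in tokens:
        out.extend([0] * value if kind == "silence" else value)
    return out

if __name__ == "__main__":
    speech = [50, -40, 1, 0, 0, -1, 0, 1, 30, 1, 20]
    tokens = compress_silence(speech)
    print(tokens)
    print(expand_silence(tokens))
```

The scheme is slightly lossy, since low-level noise in the pauses is replaced by true silence.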
Checkpointing • Predictive techniques need knowledge of what has gone before • If a stream (e.g. live radio feed) is opened in the middle, this state information is unavailable • Therefore, insert checkpoints that contain • Uncompressed samples, or • Compressor state vector • Checkpoints allow the decompressor to reset itself
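The idea can be sketched with a toy difference coder that stores an uncompressed sample at regular intervals. The interval, the token names and the unclipped differences are assumptions made to keep the example small.

```python
def encode_with_checkpoints(samples: list[int], interval: int = 4) -> list[tuple]:
    """Store a raw sample every `interval` samples and differences in between."""
    stream = []
    for i, s in enumerate(samples):
        if i % interval == 0:
            stream.append(("checkpoint", s))
        else:
            stream.append(("diff", s - samples[i - 1]))
    return stream

def decode_from(stream: list[tuple], start: int) -> list[int]:
    """Decode from an arbitrary position, skipping data until a checkpoint arrives."""
    out, current = [], None
    for kind, value in stream[start:]:
        if kind == "checkpoint":
            current = value            # decoder state is reset here
        elif current is None:
            continue                   # no state yet, so this diff is unusable
        else:
            current += value
        out.append(current)
    return out

if __name__ == "__main__":
    samples = [10, 12, 15, 14, 13, 13, 16, 20, 21]
    stream = encode_with_checkpoints(samples)
    print(decode_from(stream, 0))  # whole signal
    print(decode_from(stream, 2))  # joined mid-stream: starts at the next checkpoint
```

A shorter interval lets a listener start sooner after joining, at the cost of more uncompressed data in the stream.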
Non-linear coding • High sample resolution gives wide dynamic range • Reducing from 16 bits to 8 bits halves storage requirements, but reduces dynamic range by about 63,000 times (96 dB down to 48 dB; a 48 dB drop is a power ratio of 10^4.8 ≈ 63,000) • Standard PCM is linear • Sample value 50 is twice the amplitude of 25 • In an 8-bit system, sounds less than 1/256th of the loudest possible signal disappear
Non-linear coding • The ear is quite insensitive to small changes in loud sounds but very sensitive to the same small change in quieter sounds • Linear coding is ideal for computational manipulation but wasteful • Non-linear coding uses a logarithmic scale • A value of 1 may be much less than 1/50th of the intensity represented by a value of 50 • More bits for quiet sounds and fewer bits for very loud sounds
µ-Law & A-Law • µ-Law and A-Law use logarithmic compression to convert linear-coded PCM samples into 8-bit codes • Provide greater accuracy for the small (quiet) samples that form the bulk of an audio signal • The human auditory system has an (approx.) logarithmic response, so these techniques give highest accuracy where most audible • Dynamic range is 14 bits & 13 bits respectively (84 dB and 78 dB)
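The effect of logarithmic coding on quiet samples can be illustrated with the continuous µ-law curve, F(x) = sgn(x) ln(1 + µ|x|) / ln(1 + µ) with µ = 255. The sketch below is an assumption-based illustration, not the segmented table used by telephone codecs: samples are normalised to -1…1 and codes are signed values in -127…127.

```python
import math

MU = 255  # companding constant commonly used for µ-law

def mulaw_encode(x: float) -> int:
    """Map a sample in [-1, 1] to a signed code in [-127, 127]."""
    y = math.log1p(MU * abs(x)) / math.log1p(MU)
    return int(math.copysign(round(y * 127), x))

def mulaw_decode(code: int) -> float:
    """Invert the curve: expand a code back to a sample in [-1, 1]."""
    y = abs(code) / 127
    x = math.expm1(y * math.log1p(MU)) / MU
    return math.copysign(x, code)

def linear8_roundtrip(x: float) -> float:
    """Same sample through plain linear coding with the same number of levels."""
    return round(x * 127) / 127

if __name__ == "__main__":
    quiet = 0.001
    print("mu-law:", mulaw_decode(mulaw_encode(quiet)))
    print("linear:", linear8_roundtrip(quiet))
```

With linear coding the quiet sample rounds to zero and vanishes, while the logarithmic code keeps it with an error of a few per cent.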
Perceptual coding • DPCM, ADPCM, µ-Law & A-Law do not give high enough compression for demanding multimedia and web applications • Using psychoacoustic models of our auditory system we can take information out of the audio signal without changing its perceptual characteristics (well, sort of) • Linear PCM captures sound as it is • Perceptual coding captures audio as it sounds
Perceptual coding • Perceptual coding (PC) uses knowledge of the masking properties of the human auditory system and our sensitivity to different frequency bands • PC introduces significant noise into the signal… • … but in such a way that we don’t hear it • MP3, ATRAC (MiniDisc) and DCC use perceptual coding techniques
Masking • Part of an audio signal can be inaudible • A loud sound can mask a simultaneous quiet sound • A quiet sound immediately following a very loud sound may also be inaudible • E.g. you have to turn up the radio when your car goes faster • E.g. a handclap (normally loud) heard straight after a gunshot would sound quiet • PC assigns fewer bits to masked signals
MPEG audio • MPEG audio layers 1, 2, & 3 • Layer 3 is the most commonly used, hence MP3 • A standard for coding an audio stream into a bit stream at various bit rates • The higher the bit rate, the more data • At a bit rate of 96 kbps, achieve a bandwidth of about 15 kHz and compression of 16:1 • At 128 kbps, get closer to 20 kHz and compression of about 12:1
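The compression ratios quoted here can be related to the uncompressed bit rate. In the sketch below the source format is an assumption: the 16:1 and 12:1 figures come out exactly for 16-bit stereo at 48 kHz, whereas CD audio at 44.1 kHz gives roughly 14.7:1 and 11:1. Function names are illustrative.

```python
def pcm_kbps(sample_rate_hz: int, bits: int, channels: int) -> float:
    """Bit rate of uncompressed linear PCM in kilobits per second."""
    return sample_rate_hz * bits * channels / 1000

def mpeg_ratio(source_kbps: float, coded_kbps: float) -> float:
    """Compression ratio of the coded stream relative to the PCM source."""
    return source_kbps / coded_kbps

if __name__ == "__main__":
    for label, rate in (("48 kHz", 48000), ("44.1 kHz (CD)", 44100)):
        source = pcm_kbps(rate, 16, 2)
        for coded in (96, 128):
            print(label, coded, "kbps ->", round(mpeg_ratio(source, coded), 1), ": 1")
```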
ATRAC • MiniDisc uses Adaptive Transform Acoustic Coding (ATRAC) • Compression of 5:1 • Like MP3, uses perceptual coding and sub-band compression • ATRAC uses three sub-bands, MP3 uses 32
Streaming • Streaming is the process of sending an audio file as a continuous stream that can be played back the moment the stream starts • Avoids having to download the file first • Suitable for live situations, e.g. webcasts, internet radio, etc. • Need to know about the network capabilities of the client • E.g. no point sending 128 kbps MP3 audio to a 56k modem client
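Matching the stream to the client's connection could be sketched as below. The function name and the 80% headroom allowance for protocol overhead and congestion are assumptions, not part of any particular streaming server.

```python
from typing import Optional

def pick_bit_rate(available_kbps: list[int], client_kbps: float,
                  headroom: float = 0.8) -> Optional[int]:
    """Choose the highest encoded bit rate that fits the client's bandwidth.

    `headroom` is an assumed safety factor: only that fraction of the
    nominal link speed is treated as usable for audio data.
    """
    budget = client_kbps * headroom
    fitting = [rate for rate in available_kbps if rate <= budget]
    return max(fitting) if fitting else None

if __name__ == "__main__":
    streams = [32, 64, 128]
    print(pick_bit_rate(streams, 56))    # 56k modem client
    print(pick_bit_rate(streams, 512))   # faster connection
    print(pick_bit_rate([128], 56))      # nothing fits
```

Returning `None` when nothing fits leaves it to the caller to decide whether to refuse the stream or offer a download instead.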
Streaming • Smooth signal heard where the transmitter sends data at least as fast as the client can decode it • Low-bandwidth connections and network congestion lead to a low stream rate = either poorer quality audio, or glitches and pauses • Popular formats are RealAudio, MS ASF, Apple QuickTime
Creating streamed content • Very simple • Connect a live feed to a streaming-enabled media producer • Use tools such as Windows Media Encoder or Real’s Helix Producer to turn audio files into streamable files. Even Sound Forge can save as .ASF and .RM • Select the required bit rate/bandwidth • Some services provide multiple bit rates
Example http://computing.unn.ac.uk/staff/cgpv1/music!.htm