Introducing Audio Signal Processing & Audio Coding

Introducing Audio Signal Processing & Audio Coding • Dr Michael Mason • Senior Manager, CE Technology • Dolby Australia Pty Ltd

Overview • Audio Signal Processing Applications @ Dolby • Audio Signal Processing Basics • Sampling • What is an audio signal? • Signal Processing Domains • Case Study – Headphone Virtualisation • Frequency Response • FIR filtering • Computational Complexity

Audio Signal Processing Applications @ Dolby

Audio Signal Processing Applications @ Dolby • Cinema • Delivering channel-based audio – 5.1 – 7.1 • Distribute movies to multiple screens in a multiplex • Cinemas use speaker arrays rather than single speakers, so processing is required to fill the arrays from single-channel feeds • Rendering immersive audio – Dolby Atmos • The cinema soundtrack is expressed as individual objects and locations; in every cinema the movie is rendered for that specific cinema’s speaker locations • Speaker equalization & protection • Process the audio sent to each speaker to compensate for the electro-acoustic properties of the speaker (e.g., frequency response, distortion characteristics) • Ensure that the audio sent to the speakers doesn’t overdrive them, which would damage them • High channel count amplifiers

Audio Signal Processing Applications @ Dolby • Broadcast / Home Theatre • Compression of Audio for Streaming / DVD / Blu-ray Disc • Perceptual audio coding (case study later) • Matrix encoding (Pro Logic) • Multi-channel audio coding • Multiple languages • Multiple playback formats (stereo / 5.1 / etc) • Broadcast end-to-end • Capture, mixing, coding, transmission, playback • AV Receivers (AVRs), Set Top Boxes (STBs), Digital Media Adapters (DMAs) • Games consoles

Audio Signal Processing Applications @ Dolby • Personal Audio • Devices • Mobile phones (feature phones & smart phones) • Tablets • Music players • PCs • Same issues as Home Theatre, but usually more limited acoustic hardware (i.e. cheap speakers) • Headphone playback is a big use case (case study later)

Audio Signal Processing Applications @ Dolby • Voice Processing • Many of the ‘same’ basic challenges, but because speech has characteristics that differ from general audio, different solutions exist • Speech coders use different approaches from audio codecs • What makes a good codec is measured differently • The transmission bandwidths available for the data are much more limited • Conferencing & Telephony

DOLBY DIMENSION

The Products with Dolby processing

Audio Signal Processing Basics

Audio Signal Processing Basics • Sampling • Digital signals have samples which are discrete in time and magnitude • The process of converting a continuous signal to the digital domain is sampling • Two key questions when sampling are: how often to sample, and how precisely? • [Figure: signal chain of Analogue to Digital Converter (ADC), Digital Signal Processing, Digital to Analogue Converter (DAC)]

Audio Signal Processing Basics • Sampling Frequency (how often?) • Number of samples per second • Nyquist rate: sample at greater than twice the highest frequency in the signal • [Figure: sinusoid of period T, so f0 = 1/T; sampling every T/2 corresponds to fs = 2/T]
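
To illustrate why the sampling rate has to exceed twice the highest frequency, here is a minimal sketch of aliasing: a tone above fs/2 produces the same samples as a lower-frequency tone. The function name `alias_frequency` and the example frequencies are assumptions made for illustration, not part of the original slides.

```python
def alias_frequency(f_hz: float, fs_hz: float) -> float:
    """Apparent frequency (Hz) of a real sinusoid at f_hz after sampling at fs_hz."""
    f_mod = f_hz % fs_hz               # sampling cannot tell f apart from f + k*fs
    return min(f_mod, fs_hz - f_mod)   # fold the result into the range 0..fs/2


if __name__ == "__main__":
    fs = 48_000.0
    for f in (1_000.0, 20_000.0, 30_000.0, 47_000.0):
        apparent = alias_frequency(f, fs)
        print(f"{f:8.0f} Hz sampled at {fs:.0f} Hz appears at {apparent:.0f} Hz")
```

Tones below fs/2 come back unchanged, while the 30 kHz and 47 kHz tones fold down into the audible band, which is why an ADC is normally preceded by a low-pass (anti-aliasing) filter.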

Audio Signal Processing Basics • Resolution (how precisely?) • Each sample is represented by a number; how many bits should we use? • Converting a continuous value to a discrete value requires quantisation • Quantisation error, e.g. with a single bit: • ‘1’ → 0.5 • ‘0’ → -0.5 • [Figure: 1-bit quantiser mapping the analogue range -1.0 to +1.0 onto the digital codes 0 and 1]

Audio Signal Processing Basics • Resolution (how precisely?) • By using more bits, we reduce the error … skipping all the math … • Each additional bit of resolution improves SNR (signal-to-noise ratio) by 6.02 dB • [Figure: multi-bit quantiser mapping the analogue range -1.0 to +1.0 onto digital codes such as 000 and 101]
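
The 6.02 dB per bit figure can be checked numerically. The sketch below quantises a full-scale sine wave with a simple uniform rounding quantiser and measures the resulting SNR; the quantiser design, the test frequency and the names `quantise` and `measured_snr_db` are all assumptions chosen for illustration.

```python
import math


def quantise(x: float, bits: int) -> float:
    """Round x to the nearest level of a uniform quantiser with 2**bits steps over -1..+1.

    No clipping is applied, to keep the sketch short.
    """
    step = 2.0 / (2 ** bits)
    return round(x / step) * step


def measured_snr_db(bits: int, num_samples: int = 20_000) -> float:
    """SNR in dB of a full-scale sine wave after quantisation to `bits` bits."""
    signal_power = 0.0
    noise_power = 0.0
    for i in range(num_samples):
        x = math.sin(0.1234567 * i)    # arbitrary frequency, not locked to the sample grid
        error = quantise(x, bits) - x
        signal_power += x * x
        noise_power += error * error
    return 10.0 * math.log10(signal_power / noise_power)


if __name__ == "__main__":
    for bits in (8, 12, 16):
        print(f"{bits:2d} bits: measured SNR = {measured_snr_db(bits):6.2f} dB")
```

Comparing the printed values for successive bit depths should show a gain of roughly 6 dB for each extra bit.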

Audio Signal Processing Basics • Audio Signal • Sampling Frequency • Human perception – 20 Hz – 20,000 Hz • Nyquist says Fs > 40 kHz • CD Audio: 44.1 kHz • Blu-ray (and before that DAT): 48 kHz • Bit depth • Range of loudness relative to human hearing… • Threshold of hearing – 0 dB • Jet Engines – 110-140 dB • Busy Road (standing at the curb) – 100 dB • Sustained exposure will cause damage – 85 dB • 16 bits per sample gives ~ 96 dB of dynamic range • 24 bits per sample gives ~ 144 dB • When/where might we use more (a higher sampling rate or more bits)?
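
As a small worked example of the bit-depth figures, the sketch below converts between bit depth and dynamic range using the roughly 6.02 dB per bit rule. The helper names `dynamic_range_db` and `bits_for_range` are hypothetical.

```python
import math

DB_PER_BIT = 20.0 * math.log10(2.0)   # about 6.02 dB per bit


def dynamic_range_db(bits: int) -> float:
    """Approximate dynamic range (dB) of linear PCM with the given bit depth."""
    return bits * DB_PER_BIT


def bits_for_range(range_db: float) -> int:
    """Smallest whole number of bits whose dynamic range covers range_db."""
    return math.ceil(range_db / DB_PER_BIT)


if __name__ == "__main__":
    for bits in (16, 24):
        print(f"{bits} bits -> {dynamic_range_db(bits):.1f} dB")
    for level_db in (85, 100, 140):
        print(f"{level_db} dB of range needs {bits_for_range(level_db)} bits")
```

Under this rule, covering the full span from the threshold of hearing to a jet engine takes more than 16 bits, which is one reason 24-bit formats exist.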

Audio Signal Processing Basics • Audio Signal data rates • 48 kHz, 16 bits per sample = 768 kbps / ch • ~4.15 GB (3.86 GiB) for a 2hr movie (5.1 channels) (NB: DVD capacity = 4.7 GB) • 4G has 5-12 Mbps bandwidth (down), compared to ~4.6 Mbps for 5.1 channels of audio • For practical transmission, audio needs to be compressed
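
The data-rate arithmetic can be reproduced in a few lines. This sketch assumes that "5.1" is counted as six full-rate channels; the function names are hypothetical.

```python
def pcm_bitrate_bps(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> int:
    """Bit rate of uncompressed PCM audio in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


def storage_bytes(bitrate_bps: int, duration_s: int) -> float:
    """Storage needed for a stream of the given bit rate and duration."""
    return bitrate_bps * duration_s / 8


if __name__ == "__main__":
    per_channel = pcm_bitrate_bps(48_000, 16)
    surround = pcm_bitrate_bps(48_000, 16, channels=6)   # assumption: 5.1 = 6 channels
    movie = storage_bytes(surround, 2 * 60 * 60)
    print(f"per channel: {per_channel / 1e3:.0f} kbps")
    print(f"5.1 total:   {surround / 1e6:.3f} Mbps")
    print(f"2 hr movie:  {movie / 1e9:.2f} GB ({movie / 2**30:.2f} GiB)")
```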

Audio Signal Processing Basics • Processing domains • Sampled audio, i.e., Pulse Code Modulated (PCM) data, is in the time domain • Not everything we want to do with audio is formulated as a time domain operation • e.g., flattening the frequency response of a speaker • The Fourier Transform expresses a signal in terms of its frequency components (sinusoids). Using it we can formulate processing in the frequency domain • Whether processing is implemented in the time or the frequency domain can depend on where it is most efficient • Signal processing also has other useful transform domains which may offer advantages for specific types of processing • e.g., image coding often uses the Discrete Cosine Transform (DCT)
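
To make the time-domain versus frequency-domain distinction concrete, here is a minimal sketch that computes a direct (slow, O(N²)) discrete Fourier transform of a sampled sine wave and finds the bin with the most energy. The sample rate, block length and tone frequency are arbitrary values chosen for illustration.

```python
import cmath
import math


def dft(x):
    """Direct discrete Fourier transform of a list of samples (O(N^2), for illustration)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


fs_hz = 8_000.0    # assumed sample rate
n = 64             # assumed block length
tone = [math.sin(2.0 * math.pi * 1_000.0 * t / fs_hz) for t in range(n)]

spectrum = dft(tone)
# Only bins 0..n/2 are unique for a real signal; pick the strongest one.
peak_bin = max(range(n // 2 + 1), key=lambda k: abs(spectrum[k]))
peak_hz = peak_bin * fs_hz / n

if __name__ == "__main__":
    print(f"strongest bin: {peak_bin}, i.e. {peak_hz:.0f} Hz")
```

In the time domain the tone is 64 numbers; in the frequency domain it is essentially one non-zero bin, which is what makes frequency-selective processing easy to express there.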

Headphone Virtualisation • Case Study 1

Headphone Virtualisation • How do you get surround sound out of a pair of headphones?

Headphone Virtualisation • Two things we need to achieve: • Make it sound like the audio is coming from different directions • Make it sound like the listener is in a room • Both can be achieved by filtering the signal using the room impulse response (RIR) and the head-related transfer functions (HRTFs)

Headphone Virtualisation • Room impulse response • By measuring how a short impulsive sound is altered by a room, the room’s reflections and echoes can be characterised to create an impulse response. https://www.youtube.com/watch?v=PkZjIHTJ4jc • The impulse response can in turn be used to filter any signal, to make it sound as if it were played in the room • The process of filtering a signal using an impulse response is convolution: y[n] = Σ_k h[k] x[n − k], where x is the input signal, h is the impulse response and y is the filtered output
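
Filtering with an impulse response can be sketched as a direct time-domain convolution. The toy impulse response below (direct sound plus one quieter echo two samples later) is made up for illustration; a measured room response would be far longer.

```python
def convolve(signal, impulse_response):
    """Direct time-domain convolution of two sequences of numbers."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for n, x in enumerate(signal):
        for k, h in enumerate(impulse_response):
            out[n + k] += x * h      # each input sample launches a scaled copy of h
    return out


if __name__ == "__main__":
    toy_room = [1.0, 0.0, 0.5]       # hypothetical: direct path, then an echo at half level
    dry = [1.0, 2.0]
    print(convolve(dry, toy_room))
```

Every output sample costs one multiply-accumulate per impulse-response tap, which is what drives the computational cost of long room responses.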

Headphone Virtualisation • Room impulse response • How many points would be required to capture a room? (i.e. how long is the impulse response?) • Limiting the impulse response to 30 ms gives us 1440 points (@ 48 kHz) • Considering the computational cost: 1440 * 48k -> 69 MFLOPS (one multiply-accumulate per point per output sample)

Headphone Virtualisation • Computational load • On a DSP chip with a single-cycle MAC -> 69 MCPS • On an ARM, a MAC takes ~3.5 cycles -> ~240 MCPS • 5.1 channels -> 10 filters -> ~2,400 MCPS
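
The load figures above follow from simple arithmetic, reproduced here as a sketch. The 1440-point filter, the 3.5 cycles per MAC and the 10 filters are the slide's own numbers; the function name is hypothetical.

```python
def direct_fir_macs_per_second(taps: int, sample_rate_hz: int) -> int:
    """Multiply-accumulates per second for a direct time-domain FIR filter."""
    return taps * sample_rate_hz     # one MAC per tap for every output sample


if __name__ == "__main__":
    macs = direct_fir_macs_per_second(1_440, 48_000)
    arm_cycles = macs * 3.5          # slide's figure: ~3.5 cycles per MAC on an ARM
    all_filters = arm_cycles * 10    # slide's figure: 10 filters for 5.1 channels
    print(f"single-cycle DSP: {macs / 1e6:.1f} MCPS")
    print(f"ARM, one filter:  {arm_cycles / 1e6:.1f} MCPS")
    print(f"ARM, 10 filters:  {all_filters / 1e6:.0f} MCPS")
```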

Headphone Virtualisation • The solution? • Convolution in the time domain <-> multiplication in the frequency domain • Fourier transform the impulse response & the signal • Block based, e.g., blocks of N = 2048 • Cost is O[N.log2(N)]: 2048 * 11 = 22,528, times a constant k = 3.5 -> ~78,848 operations • Operate in the frequency domain • Complex multiplies -> 4 * 2048 -> 8,192 • Transform the result back to the time domain • Same cost as the forward transform • Blocks per second? • 48,000 / 2048 ~ 23 blocks/sec … ~4 MFLOPS / filter • What about the HRTFs?
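
The sketch below shows the idea of fast convolution with a small recursive radix-2 FFT written for clarity rather than speed, plus a function that mirrors the slide's own cost accounting. It is an illustration under simplifying assumptions: the whole signal is processed as one zero-padded block, whereas a real-time implementation would split the audio into blocks and stitch them together (for example with overlap-add). The value k = 3.5 is inferred from the slide's figures (78,848 / 22,528), and all function names are hypothetical.

```python
import cmath
import math


def fft(x, inverse=False):
    """Recursive radix-2 FFT; len(x) must be a power of two. The inverse is unscaled."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1.0 if inverse else -1.0
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def fast_convolve(signal, impulse_response):
    """Convolve two real sequences by multiplying their spectra."""
    out_len = len(signal) + len(impulse_response) - 1
    n = 1
    while n < out_len:               # zero-pad to a power of two to avoid wrap-around
        n *= 2
    sig_f = fft(list(signal) + [0.0] * (n - len(signal)))
    ir_f = fft(list(impulse_response) + [0.0] * (n - len(impulse_response)))
    product = [a * b for a, b in zip(sig_f, ir_f)]
    result = fft(product, inverse=True)
    return [value.real / n for value in result[:out_len]]


def fft_filter_ops_per_second(block_size=2048, sample_rate_hz=48_000, k=3.5):
    """Operations per second using the slide's accounting: two FFTs plus the multiplies."""
    fft_ops = k * block_size * math.log2(block_size)
    multiply_ops = 4 * block_size    # 4 real multiplies per complex multiply
    per_block = 2 * fft_ops + multiply_ops
    return per_block * sample_rate_hz / block_size


if __name__ == "__main__":
    print(fast_convolve([1.0, 2.0, 3.0], [1.0, 1.0]))
    print(f"{fft_filter_ops_per_second() / 1e6:.2f} M operations per second per filter")
```

The result of `fast_convolve` matches a direct convolution up to floating-point rounding, while the operation count drops from about 69 million to about 4 million per second per filter under the slide's assumptions.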

Headphone Virtualisation • Head-related Transfer Function • Measured on a dummy head • Applied as filters • The same computational arguments lead us to apply these in the frequency domain • NB: we don’t need to go back to the time domain between the two sets of filters

Dolby Atmos for Headphones debuted in Blizzard’s Overwatch

Dolby Atmos for Headphones is available in Windows through the Dolby Access app

Perceptual Audio Coding • Case study 2

Perceptual Audio Coding • How do you reduce the storage and transmission bandwidth requirements of audio signals? • Bitrates: • Uncompressed: 768 kbps / ch • DVD (AC3): 448 kbps (5.1 channels) (~10:1 compression ratio)
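
The quoted compression ratio can be checked directly. This sketch assumes the uncompressed reference is six channels of 48 kHz, 16-bit PCM, i.e. that "5.1" counts as six full-rate channels; the function name is hypothetical.

```python
def compression_ratio(uncompressed_bps: float, coded_bps: float) -> float:
    """Ratio of the uncompressed bit rate to the coded bit rate."""
    return uncompressed_bps / coded_bps


if __name__ == "__main__":
    uncompressed = 6 * 48_000 * 16   # assumption: 5.1 counted as 6 channels of PCM
    coded = 448_000                  # slide's figure for 5.1 audio on DVD
    print(f"compression ratio is about {compression_ratio(uncompressed, coded):.1f}:1")
```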

Perceptual Audio Coding • Audio Coding is Lossy • Lossless compression: must perfectly reconstruct its source (e.g. zip files) • Lossy compression: can ‘throw away’ data if it isn’t ‘needed’. The reconstruction need only be ‘good enough’ • Deciding which bits to ‘throw away’ and what is ‘good enough’ is the hard part

Perceptual Audio Coding • [Block diagram of the encoder: Time/Frequency analysis, Psychoacoustic analysis, Bit allocation, Quantisation, Entropy coding]

Perceptual Audio Coding • Psychoacoustics • The study of sound perception • Perception implies the human experience, which includes physiological and psychological factors. http://auditoryneuroscience.com/vocalizations-speech/mcgurk-effect • It is at the heart of the question of which parts of an audio signal are important or unimportant

Perceptual Audio Coding • Psychoacoustics • Most perceptual quantities are non-linear and subjective • Loudness • Non-linearly related to sound pressure • Scales include: sone, phon • Pitch • Non-linearly related to frequency • Scales include: Bark, Mel, ERB
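
To show what "non-linearly related" looks like in practice, the sketch below implements three commonly used approximations: the O'Shaughnessy formula for mel, the Zwicker and Terhardt formula for Bark, and the standard phon-to-sone relation (valid from about 40 phon upwards). Several variants of each scale exist; these particular formulas are a choice made for illustration, not necessarily the ones used in any specific codec.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    """Mel scale, O'Shaughnessy approximation."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def hz_to_bark(f_hz: float) -> float:
    """Bark scale, Zwicker and Terhardt approximation."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def phon_to_sone(level_phon: float) -> float:
    """Loudness in sones; perceived loudness doubles for every 10 phon above 40."""
    return 2.0 ** ((level_phon - 40.0) / 10.0)


if __name__ == "__main__":
    for f in (100.0, 1_000.0, 4_000.0, 16_000.0):
        print(f"{f:7.0f} Hz -> {hz_to_mel(f):7.1f} mel, {hz_to_bark(f):5.2f} Bark")
    for level in (40.0, 50.0, 60.0):
        print(f"{level:.0f} phon -> {phon_to_sone(level):.1f} sone")
```

Equal steps in Hz become progressively smaller steps in mel or Bark at high frequencies, reflecting the ear's coarser frequency resolution there.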

Perceptual Audio Coding • Frequency Masking

Perceptual Audio Coding • Temporal Masking

Perceptual Audio Coding • [Block diagram: Time/Frequency analysis, Psychoacoustic analysis, Bit allocation, Quantisation] • Time/Frequency analysis • Break the incoming signal into time blocks and transform into the frequency domain • Coding is always block based • The frequency representation is analysed in bins of equal perceptual bandwidth (Bark) • Psychoacoustic analysis • Use the frequency representation of the current block to calculate the masking curve • Use the frequency masking curves from previous frames to account for temporal masking

Perceptual Audio Coding • Masking Curve • Areas of the spectrum where the masking curve is above the signal energy represent ‘things we can’t hear’ • If we can’t hear them, we shouldn’t spend bits encoding them

Perceptual Audio Coding • [Block diagram: Time/Frequency analysis, Psychoacoustic analysis, Bit allocation, Quantisation] • Bit allocation • Using the masking curve, we can calculate the allowed signal-to-noise ratio in each of the frequency bands • Knowing that allocating a bit to a quantiser improves SNR by ~6 dB, iteratively allocate the bits available in the bit pool to bands until we either run out of bits or meet the SNR requirements in all bands • (any left-over bits can be used to code the next frame) • The bit distribution must be sent to the decoder • Quantiser • Quantise the frequency domain representation to send to the decoder
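
The iterative allocation described above can be sketched as a greedy loop: repeatedly give one bit to the band whose quantisation noise is furthest above the masking curve. This is a simplified illustration, not the allocation routine of any particular codec; the band levels in the example and the function name `allocate_bits` are made up.

```python
DB_PER_BIT = 6.02   # SNR improvement per quantiser bit


def allocate_bits(signal_db, mask_db, bit_pool):
    """Greedy bit allocation across bands.

    Returns (bits per band, bits left in the pool).
    """
    # Required SNR per band: how far the signal sits above the masking curve.
    needed_db = [max(0.0, s - m) for s, m in zip(signal_db, mask_db)]
    bits = [0] * len(needed_db)
    if not bits:
        return bits, bit_pool
    while bit_pool > 0:
        shortfall = [need - b * DB_PER_BIT for need, b in zip(needed_db, bits)]
        worst = max(range(len(bits)), key=lambda i: shortfall[i])
        if shortfall[worst] <= 0.0:
            break                    # every band already meets its SNR requirement
        bits[worst] += 1
        bit_pool -= 1
    return bits, bit_pool


if __name__ == "__main__":
    signal = [60.0, 40.0, 20.0]      # hypothetical band energies (dB)
    mask = [30.0, 35.0, 25.0]        # hypothetical masking curve (dB)
    print(allocate_bits(signal, mask, bit_pool=10))
```

In the example the third band lies below the masking curve and receives no bits at all, and the bits left over once every band is satisfied remain in the pool for the next frame.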

Perceptual Audio Coding • Decoding is ‘simple’ • Recreate the frequency representation of each frame • Transform back to the time domain • Additional processing can be used to enhance the reconstructed signal

Summary

Summary • Audio Signal Processing Applications • Audio Signal Processing Basics • DSP requires sampling • Audio signals are those we can hear, which tells us the sampling rate and bit depth we need • We can process in different domains, e.g., time domain or frequency domain • Case Study – Headphone Virtualisation • Reproduce a scene through headphones by simulating the room and your head shape • Key tool is FIR filtering with impulse responses • Due to computational complexity, we apply the filters using the FFT • Case Study – Perceptual Audio Coding • Audio Coding is Lossy Compression • Psychoacoustics is the study of how humans perceive sound • Masking phenomena can be used to tell us which bits to ‘throw away’ when encoding
