
Introducing Audio Signal Processing & Audio Coding

Dr Michael Mason, Senior Manager, CE Technology, Dolby Australia Pty Ltd.


Presentation Transcript


  1. Introducing Audio Signal Processing & Audio Coding • Dr Michael Mason • Senior Manager, CE Technology • Dolby Australia Pty Ltd

  2. Overview • Audio Signal Processing Applications @ Dolby • Audio Signal Processing Basics • Sampling • What is an audio signal? • Signal Processing Domains • Case Study – Headphone Virtualisation • Frequency Response • FIR filtering • Computational Complexity

  3. Audio Signal Processing Applications @ Dolby

  4. Audio Signal Processing Applications @ Dolby • Cinema • Delivering channel-based audio – 5.1 – 7.1 • Distribute movies to multiple screens in a multiplex • Cinemas use speaker arrays rather than single speakers, so processing is required to fill the arrays from single-channel feeds • Rendering immersive audio – Dolby Atmos • The cinema soundtrack is expressed as individual objects and locations; in every cinema the movie is rendered for that specific cinema’s speaker locations • Speaker equalization & protection • Process the audio sent to each speaker to compensate for the electro-acoustic properties of the speaker (e.g., frequency response, distortion characteristics) • Ensure that the audio sent to the speakers doesn’t overdrive them, which would damage them • High channel count amplifiers

  5. Audio Signal Processing Applications @ Dolby • Broadcast / Home Theatre • Compression of Audio for Streaming / DVD / Blu-ray Disc • Perceptual audio coding (case study later) • Matrix encoding (Pro Logic) • Multi-channel audio coding • Multiple languages • Multiple playback formats (stereo / 5.1 / etc.) • Broadcast end-to-end • Capture, mixing, coding, transmission, playback • AV Receivers (AVRs), Set Top Boxes (STBs), Digital Media Adapters (DMAs) • Games consoles

  6. Audio Signal Processing Applications @ Dolby • Personal Audio • Devices • Mobile phones (feature phones & smart phones) • Tablets • Music players • PCs • Same issues as Home Theatre, but usually more limited acoustic hardware (i.e. cheap speakers) • Headphone playback is a big use case (case study later)

  7. Audio Signal Processing Applications @ Dolby • Voice Processing • Many of the ‘same’ basic challenges, but because speech has some specifically different characteristics from general audio, different solutions exist • Speech coders use different approaches than audio codecs • What makes a good codec is measured differently • The transmission bandwidths available for the data are much more limited • Conferencing & Telephony

  8. DOLBY DIMENSION

  9. The  Products with Dolby processing

  10. Audio Signal Processing Basics

  11. Audio Signal Processing Basics • Sampling • Digital signals have samples which are discrete in time and magnitude • The process of converting a continuous signal to the digital domain is sampling • Two key questions when sampling are: how often to sample & how precisely? • [Figure: signal chain: Analogue to Digital Converter (ADC) → Digital Signal Processing → Digital to Analogue Converter (DAC)]

  12. Audio Signal Processing Basics • Sampling Frequency (how often?) • Number of samples per second • Nyquist rate: • Sample at greater than twice the highest frequency in the signal • [Figure: sinusoid of period T, so f0 = 1/T; sampling every T/2 gives fs = 2/T]
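
Why the sampling rate must exceed twice the highest frequency can be illustrated with a small sketch. The 48 kHz rate and the two tone frequencies below are assumptions chosen for illustration, not values from the slide: a 30 kHz tone sampled at 48 kHz produces exactly the same samples as an 18 kHz tone, so the two cannot be told apart after sampling.

```python
import math

def sample_tone(freq_hz, fs_hz, n_samples):
    """Sample a unit-amplitude cosine of frequency freq_hz at rate fs_hz."""
    return [math.cos(2 * math.pi * freq_hz * n / fs_hz) for n in range(n_samples)]

FS = 48_000                                   # sampling rate in Hz (illustrative assumption)
above_nyquist = sample_tone(30_000, FS, 64)   # 30 kHz is above FS / 2 = 24 kHz
alias = sample_tone(18_000, FS, 64)           # FS - 30 kHz = 18 kHz

# The two sample sequences agree to within floating-point rounding.
max_diff = max(abs(a - b) for a, b in zip(above_nyquist, alias))
print(f"largest sample difference: {max_diff:.2e}")
```

This is the reason an anti-aliasing filter is normally placed before the ADC: anything above half the sampling rate folds back into the audible band.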

  13. Audio Signal Processing Basics • Resolution (how precisely?) • Each sample is represented by a number; how many bits should we use? • Converting a continuous value to a discrete value requires quantisation • Quantisation Error • ‘1’ → 0.5 • ‘0’ → -0.5 • [Figure: 1-bit quantiser mapping the analogue range -1.0 to +1.0 onto the digital codes 0 and 1]

  14. Audio Signal Processing Basics • Resolution (how precisely?) • By using more bits, we reduce the error … skipping all the math … • Each additional bit of resolution improves SNR (signal to noise ratio) by 6.02 dB • [Figure: multi-bit quantiser mapping the analogue range -1.0 to +1.0 onto 3-bit codes such as 000 and 101]
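
The 6.02 dB-per-bit figure can be checked numerically. The sketch below is an illustration under assumptions (a uniform mid-rise quantiser over -1.0 to +1.0 and a full-scale sine test signal; the function names are hypothetical). With 1 bit it reproduces the slide's mapping of ‘1’ → 0.5 and ‘0’ → -0.5.

```python
import math

def quantise(x, bits):
    """Uniform mid-rise quantiser over [-1.0, +1.0] with 2**bits levels."""
    levels = 2 ** bits
    step = 2.0 / levels
    index = min(levels - 1, max(0, math.floor((x + 1.0) / step)))
    return -1.0 + (index + 0.5) * step

def snr_db(bits, n_samples=20_000):
    """Measured SNR of a full-scale sine after quantisation to `bits` bits."""
    signal_power = noise_power = 0.0
    for n in range(n_samples):
        x = math.sin(2 * math.pi * 0.1234567 * n)   # arbitrary test frequency
        err = quantise(x, bits) - x
        signal_power += x * x
        noise_power += err * err
    return 10 * math.log10(signal_power / noise_power)

def dynamic_range_db(bits):
    """Ratio of the largest to the smallest representable step, in dB."""
    return 20 * math.log10(2 ** bits)

for bits in (8, 9, 10):
    print(bits, "bits:", round(snr_db(bits), 1), "dB SNR")
print("16 bits:", round(dynamic_range_db(16), 1), "dB dynamic range")
```

Each extra bit halves the step size, which halves the error amplitude; 20·log10(2) ≈ 6.02 dB is where the per-bit figure comes from.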

  15. Audio Signal Processing Basics • Audio Signal • Sampling Frequency • Human perception – 20 Hz – 20,000 Hz • Nyquist says Fs > 40 kHz • CD Audio: 44.1 kHz • Blu-ray (and before that DAT): 48 kHz • Bit depth • Range of loudness relative to human hearing… • Threshold of hearing – 0 dB • Jet Engines – 110-140 dB • Busy Road (standing at the curb) – 100 dB • Sustained exposure will cause damage – 85 dB • 16 bits per sample gives ~ 96 dB of dynamic range • 24 bits per sample gives ~ 144 dB • When/where might we use more (a higher sampling rate or more bits)?

  16. Audio Signal Processing Basics • Audio Signal data rates • 48 kHz, 16 bits per sample = 768 kbps / ch • ≈ 4.15 GB (3.86 GiB) for a 2 hr movie (5.1 channels) (NB: DVD capacity = 4.7 GB) • 4G has 5-12 Mbps bandwidth (down), compared to ~4.6 Mbps for 5.1 channels of audio • For practical transmission, audio needs to be compressed
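
The data-rate figures follow from simple arithmetic, sketched below. The only assumption is that 5.1 is counted as six full-rate channels; the variable names are illustrative.

```python
FS_HZ = 48_000             # samples per second
BITS_PER_SAMPLE = 16
CHANNELS = 6               # 5.1 counted as six full-rate channels (assumption)
DURATION_S = 2 * 60 * 60   # 2 hour movie

per_channel_bps = FS_HZ * BITS_PER_SAMPLE    # 768,000 bits per second
total_bps = per_channel_bps * CHANNELS       # 4,608,000 bits per second
movie_bytes = total_bps * DURATION_S // 8    # bytes for the whole movie

print(f"per channel: {per_channel_bps / 1e3:.0f} kbps")
print(f"5.1 total:   {total_bps / 1e6:.2f} Mbps")
print(f"2 hr movie:  {movie_bytes / 1e9:.2f} GB = {movie_bytes / 2**30:.2f} GiB")
print(f"ratio to a 448 kbps stream: {total_bps / 448_000:.1f}:1")
```

The last line gives the compression ratio against a 448 kbps 5.1 stream, which comes out at roughly 10:1.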

  17. Audio Signal Processing Basics • Processing domains • Sampled audio, i.e., Pulse Code Modulated (PCM) data, is in the time domain • Not everything we want to do with audio is formulated as a time domain operation • e.g., flattening the frequency response of a speaker • The Fourier Transform expresses a signal in terms of its frequency components (sinusoids). Using it we can formulate processing in the frequency domain • Whether processing is implemented in the time or the frequency domain can depend on where it is most efficient • Signal processing also has other useful transform domains which may offer advantages for specific types of processing • e.g., image coding often uses the Discrete Cosine Transform (DCT)

  18. Headphone Virtualisation • Case Study 1

  19. Headphone Virtualisation • How do you get surround sound out of a pair of headphones?

  20. Headphone Virtualisation • Two things we need to achieve: • Make it sound like the audio is coming from different directions • Make it sound like the listener is in a room. • Both can be achieved by filtering the signal using the impulse response of the room (RIR) and the head-related transfer functions (HRTF).

  21. Headphone Virtualisation • Room impulse response • By measuring how a short impulsive sound is altered by a room, the room’s reflections and echoes can be characterised to create an impulse response. https://www.youtube.com/watch?v=PkZjIHTJ4jc • The impulse response can in turn be used to filter any signal, to make it sound like it was in the room. • The process of filtering a signal x[n] using an N-point impulse response h[k] is convolution: $y[n] = \sum_{k=0}^{N-1} h[k]\, x[n-k]$
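
Filtering with an impulse response can be sketched as a direct time-domain convolution. The toy impulse response below (direct sound plus two decaying echoes) is made up for illustration and is not a measured room.

```python
def fir_filter(signal, impulse_response):
    """Direct convolution: y[n] = sum over k of h[k] * x[n - k]."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for n, x in enumerate(signal):
        for k, h in enumerate(impulse_response):
            out[n + k] += h * x   # each input sample launches a scaled copy of h
    return out

# Hypothetical toy "room": direct sound followed by two decaying echoes.
toy_rir = [1.0, 0.0, 0.5, 0.0, 0.25]

# Feeding in a single click returns the impulse response itself.
click = [1.0, 0.0, 0.0]
print(fir_filter(click, toy_rir))
```

Note the nested loop: every output sample costs one multiply-accumulate per impulse response point, which is what drives the cost of long room responses.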

  22. Headphone Virtualisation • Room impulse response • How many points would be required to capture a room? (i.e. how long is the impulse response?) • Limiting the impulse response to 30 ms gives us 1440 points (@ 48 kHz) • Considering the computational cost: 1440 × 48k → ~69 MFLOPS (counting one multiply-accumulate per point per output sample)

  23. Headphone Virtualisation • Computational load • On a DSP chip with a single-cycle MAC (multiply-accumulate) → 69 MCPS (million cycles per second) • On an ARM, a MAC takes ~3.5 cycles → ~240 MCPS • 5.1 channels → 10 filters ≈ 2,400 MCPS
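
The load figures for time-domain filtering can be reproduced with a few lines of arithmetic. The variable names are illustrative, and the 3.5 cycles per MAC is the slide's rough figure rather than a measured value.

```python
FS_HZ = 48_000
TAPS = 1440                  # impulse response length in samples
ARM_CYCLES_PER_MAC = 3.5     # rough figure taken from the slide
FILTERS = 10                 # filters needed for 5.1 playback over two ears

macs_per_second = TAPS * FS_HZ                 # one MAC per tap per output sample
dsp_mcps = macs_per_second / 1e6               # single-cycle MAC on a DSP
arm_mcps = dsp_mcps * ARM_CYCLES_PER_MAC
total_arm_mcps = arm_mcps * FILTERS

print(f"impulse response length: {1000 * TAPS / FS_HZ:.0f} ms")
print(f"DSP: {dsp_mcps:.1f} MCPS per filter")
print(f"ARM: {arm_mcps:.1f} MCPS per filter, {total_arm_mcps:.0f} MCPS for {FILTERS} filters")
```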

  24. Headphone Virtualisation • The solution? • Convolution in the time domain <-> multiplication in the frequency domain • Fourier transform the impulse response & the signal • Block based, e.g., blocks of N = 2048 • O[N·log2(N)] → k × 22,528 ≈ 78,848 operations (k ≈ 3.5) • Operate in the frequency domain • Complex multiplies → 4 × 2048 = 8,192 • Transform the result back to the time domain • Same cost as the forward transform • Blocks per second? • ~23 blocks/sec … ~4 MFLOPS / filter • What about the HRTFs?
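
A minimal sketch of the frequency-domain approach follows, using a small pure-Python radix-2 FFT written for illustration (a real implementation would use an optimised FFT library and process the stream block by block with overlap handling). The cost estimate at the end uses the slide's numbers; the constant k = 3.5 is inferred from 78,848 / 22,528 and is an assumption.

```python
import cmath
import math

def fft(values, inverse=False):
    """Recursive radix-2 FFT (unnormalised); len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def fast_convolve(signal, impulse_response):
    """Linear convolution: zero-pad, FFT both, multiply, inverse FFT."""
    out_len = len(signal) + len(impulse_response) - 1
    size = 1
    while size < out_len:
        size *= 2
    spec_x = fft([complex(v) for v in signal] + [0j] * (size - len(signal)))
    spec_h = fft([complex(v) for v in impulse_response]
                 + [0j] * (size - len(impulse_response)))
    product = [a * b for a, b in zip(spec_x, spec_h)]
    result = fft(product, inverse=True)
    return [v.real / size for v in result[:out_len]]

# Cost estimate with the slide's numbers (K is an inferred constant).
N, K, FS_HZ = 2048, 3.5, 48_000
fft_ops = K * N * math.log2(N)            # 78,848 per transform
multiply_ops = 4 * N                      # 4 real multiplies per complex multiply
ops_per_block = 2 * fft_ops + multiply_ops
flops_per_second = ops_per_block * FS_HZ / N
print(f"{flops_per_second / 1e6:.2f} MFLOPS per filter")
```

The transform of the impulse response only has to be computed once, so the per-block cost is one forward transform of the signal, the multiplies, and one inverse transform.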

  25. Headphone Virtualisation • Head-related Transfer Function • Measured on a dummy head • Applied as filters • The same computational arguments lead us to apply these in the frequency domain. • NB: we don’t need to go back to the time domain between the two sets of filters

  26. Dolby Atmos for Headphones debuted in Blizzard’s Overwatch

  27. Dolby Atmos for Headphones is available in Windows through the Dolby Access app

  28. Perceptual Audio Coding • Case study 2

  29. Perceptual Audio Coding • How do you reduce the storage and transmission bandwidth requirements of Audio signals? • Bitrates: • Uncompressed : 768 kbps / ch • DVD (AC3) : 448 kbps (5.1 channels) (~10:1 compression ratio)

  30. Perceptual Audio Coding • Audio Coding is Lossy • Lossless compression: must perfectly reconstruct its source (e.g., zip files). • Lossy compression: can ‘throw away’ data if it isn’t ‘needed’. The reconstruction need only be ‘good enough.’ • Deciding which bits to ‘throw away’ and what is ‘good enough’ is the hard part.

  31. Perceptual Audio Coding • [Figure: encoder block diagram with the stages Time/Frequency analysis, Psychoacoustic analysis, Bit allocation, Quantisation and Entropy coding]

  32. Perceptual Audio Coding • Psychoacoustics • Study of sound perception • Perception implies the human experience, which includes physiological and psychological factors. http://auditoryneuroscience.com/vocalizations-speech/mcgurk-effect • Is at the heart of the question of which parts of an audio signal are important or unimportant.

  33. Perceptual Audio Coding • Psychoacoustics • Most perceptual quantities are non-linear and subjective • Loudness • Non-linearly related to sound pressure • Scales include: sone, phon • Pitch • Non-linearly related to frequency • Scales include: Bark, Mel, ERB
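
The non-linear relation between frequency and perceived pitch can be made concrete with two commonly quoted approximations. These particular formulas are not from the slides: the mel conversion is the widely used 2595·log10(1 + f/700) form, and the Bark conversion is Traunmüller's approximation. Other variants of both scales exist.

```python
import math

def hz_to_mel(freq_hz):
    """Common mel-scale approximation: 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)

def hz_to_bark(freq_hz):
    """Traunmueller's approximation of the Bark scale."""
    return 26.81 * freq_hz / (1960.0 + freq_hz) - 0.53

for f in (100, 1000, 2000, 8000, 9000):
    print(f"{f:>5} Hz -> {hz_to_mel(f):7.1f} mel, {hz_to_bark(f):5.2f} Bark")
```

On both scales a 1 kHz step near the bottom of the spectrum covers far more perceptual distance than the same step near the top, which is why codecs analyse low frequencies with finer resolution.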

  34. Perceptual Audio Coding • Frequency Masking

  35. Perceptual Audio Coding • Temporal Masking

  36. Perceptual Audio Coding • Time/Frequency analysis • Break the incoming signal into time blocks and transform into the frequency domain • Coding is always block based • The frequency representation is analysed in bins of equal perceptual bandwidth (Bark) • Psychoacoustic analysis • Use the frequency representation of the current block to calculate the masking curve • Use the frequency masking curves from previous frames to account for temporal masking

  37. Perceptual Audio Coding • Masking Curve • Areas of the spectrum where the masking curve is above the signal energy represent ‘things we can’t hear’ • If we can’t hear them, we shouldn’t spend bits encoding them

  38. Perceptual Audio Coding • Bit allocation • Using the masking curve, we can calculate the allowed signal to noise ratio in each of the frequency bands • Knowing that allocating a bit to a quantiser improves SNR by 6 dB, iteratively allocate the bits available in the bit pool to bands until we either run out of bits or meet the SNR requirements in all bands • (any left over bits can be used to code the next frame) • The bit distribution must be sent to the decoder • Quantiser • Quantise the frequency domain representation to send to the decoder.
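
The iterative allocation described above can be sketched as a greedy loop. This is an illustration only, not the allocation routine of any particular codec: the function name, the "largest shortfall first" rule and the example SNR targets are all assumptions.

```python
def allocate_bits(required_snr_db, bit_pool, db_per_bit=6.02):
    """Greedy allocation: give one bit at a time to the band furthest from its target.

    required_snr_db: per-band SNR needed to keep quantisation noise under the
                     masking curve (hypothetical inputs).
    Returns (bits per band, bits left over for the next frame).
    """
    bits = [0] * len(required_snr_db)
    while bit_pool > 0:
        shortfall = [req - db_per_bit * b for req, b in zip(required_snr_db, bits)]
        worst = max(range(len(bits)), key=lambda i: shortfall[i])
        if shortfall[worst] <= 0:
            break                 # every band already meets its SNR target
        bits[worst] += 1
        bit_pool -= 1
    return bits, bit_pool

# Made-up targets: band 2 is fully masked, so it needs no bits at all.
targets_db = [30.0, 12.0, 0.0, 20.0]
print(allocate_bits(targets_db, bit_pool=20))   # enough bits: some are left over
print(allocate_bits(targets_db, bit_pool=3))    # starved: targets are not all met
```

When the pool is too small, the bands with the largest shortfall are served first, so the remaining audible noise is spread where it is closest to being masked.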

  39. Perceptual Audio Coding • Decoding is ‘simple’ • Recreate the frequency representation of each frame • Transform back to the time domain • Additional processing can be used to enhance the reconstructed signal

  40. Summary

  41. Summary • Audio Signal Processing Applications • Audio Signal Processing Basics • DSP requires sampling. • Audio signals are those we can hear, which tells us the sampling rate and bit depth we need. • We can process in different domains, e.g., time domain or frequency domain. • Case Study – Headphone Virtualisation • Reproduce a scene through headphones by simulating the room and your head shape • Key tool is FIR filtering with impulse responses. • Due to computational complexity, we apply the filters using the FFT • Case Study – Perceptual Audio Coding • Audio Coding is Lossy Compression • Psychoacoustics is the study of how humans perceive sound • Masking phenomena can be used to tell us which bits to ‘throw away’ when encoding
