380 likes | 764 Views
NCC 2014 Kanpur, 28 Feb.- 2 Mar. 2014, Paper No. 1569847357 (Session III, Sat., 1 st Mar., 1020 – 1200) A Sliding-band Dynamic Range Compression for Use in Hearing Aids Nitya Tiwari Prem C. Pandey { nitya , pcpandey } @ ee.iitb.ac.in. IIT Bombay. Overview Introduction
E N D
NCC2014 Kanpur, 28 Feb.- 2 Mar. 2014, Paper No. 1569847357 (Session III, Sat., 1st Mar., 1020 – 1200) A Sliding-band Dynamic Range Compression for Use in Hearing Aids NityaTiwari Prem C. Pandey {nitya, pcpandey} @ ee.iitb.ac.in IIT Bombay
Overview Introduction Sliding-band Dynamic Range Compression Offline & Real-time Implementations Test Results Summary & Conclusion
1. Introduction • Sensorineuralhearing loss • Causes: abnormalities in the cochlear hair cells or the auditory nerve • Characteristics • Increase in hearing thresholds (due to loss of inner hair cells) • Loudness recruitment (abnormal loudness growth) & decrease in dynamic range (due to loss of outer hair cells) • Increased spectral & temporal masking, leading to degraded speech perception • Signal processing in hearing aids • Frequency selective amplification to compensate for frequency dependent elevation of hearing thresholds • Amplitude compression to compensate for decreased dynamic range
Dynamic range compression • Objective • To present sounds comfortably within the limited dynamic range of the listener by amplifying the low level sounds without making the high level sounds uncomfortably loud. • Processing steps • Input level estimation • Gain calculation based on input level • Multiplication of input with gain function • Output resynthesis • Classification • On the basis of signal level calculation: single-band or multiband • On the basis of gain control method: feedback or feed-forward
Single-band dynamic range compression • Processing • Gain dependent on the dynamically varying signal level. • Parameters • Compression threshold (Th) • Compression ratio (CR) • Attack & release time • Problems • Does not account for frequency dependent loudness growth function • Power mostly contributed by low-frequency components • → amplification of high-frequency components depends low-frequency components • → Inaudibility of high frequency components, distortions in temporal envelope
Multiband dynamic range compression General scheme of processing Spectral components of the input signal divided in multiple bands and the gain for each band calculated on the basis of signal power in that band. Parameters (band specific): compression threshold Th, compression ratio CR, attack & release time for detection.
Lippmann et al. (1980): 16-channel compression • 9% improvement in recognition score over linear amplification. • Asano et al.(1991): Multiband dynamic range compression realized as a single time-varying FIR filter & implemented on a 32-bit DSP fixed-point processor • Less spectral distortion due to smoothened frequency response of FIR filter. • Stone et al. (1999): Comparison of single and four-channel compression schemes & effect of varying CR, Th, and attack & release times • Intelligibility & quality tests showed no specific preference for schemes. • Li et al. (2000): Wavelet-based compression (7 octave sub-band analysis using wavelet filter bank & resynthesis after applying a logarithmic compression on the wavelet coefficients) • Increase in intelligibility without introducing noticeable distortions. • Magotraet al. (2000): Multiband dynamic range compression using a 16-bit fixed-point processor • Taylor's series approximation used for the compression function to reduce computations in gain calculation.
Disadvantages of multiband dynamic range compression • Spurious spectral distortions • Reduction in spectral contrasts and modulation depth • Distortion in spectral shape of formants lying across the band boundaries • Distortion of formant transitions across the adjacent bands • Time-varying magnitude response without corresponding variation in the phase response leading to quality degradation • → Audible distortions, perceptible discontinuities, adverse effect on the perception of certain speech cues
Example of distortion due to multiband dynamic range compression during spectral transition Swept sinusoidal input: constant amplitude, 125 –250 Hz linearly swept frequency, 200 ms sweep duration Time (s) Processed output: multiband compression with 18 auditory critical bands, CR = 30, Ta = 6.4 ms, Tr = 192 ms Time (s)
Research objective • Real-time dynamic range compression to compensate for frequency-dependent loudness recruitment associated with sensorineural hearing loss for use in hearing aids with a low-power processor. • Low distortions • Low computational complexity & memory requirement • Low signal delay (algorithmic + computational)
Sliding-band compression • Proposed for significantly reducing the temporal and spectral distortions associated with the currently used single-band and multiband compressions in hearing aids. • Realized with computational complexity acceptable for implementation on a 16-bit fixed-point DSP processor and signal delay acceptable for real-time application. • Investigations using offline & real-time implementations • Selection of processing parameters • Evaluation of the implementations • Informal listening, PESQ measure
2. Sliding-band Dynamic Range Compression Processing Applying a frequency-dependent gain function, with the gain for each spectral sample determined by the short-time power in auditory critical bandwidth centered at it & in accordance with the specified hearing thresholds, compression ratios, and attack and release times. Processing steps • Short-time spectral analysis: windowing, zero-padding, DFT calculation • Spectral modification: gain calculation, output spectrum calculation • Resynthesis: IDFT calculation, windowing, overlap-add
Short-time spectral analysis: windowing (length L, shift S), zero-padding, N-point DFT Spectral modification Pmc(k): Power at upper comfortable listening level CR(k): Compression ratio Resynthesis: N-point IDFT, overlap-add
Gain calculation • Auditory critical bandwidth • BW(k) = 25 + 75(1 + 1.4f 2)0.69, freq. sample = k, freq. = f • Target gain calculation • Power at upper comfortable listening level: Pmc(k) • Compression ratio: CR(k) • Input power: Pic(k), Output power: Poc(k) • Target gain: Gt(k) = Poc(k) / Pic(k) • Compression relation • dB scale: [Poc(k) / Pmc(k)]dB = [Pic(k) / Pmc(k)]dB / CR(k) • linear scale: Poc(k) / Pmc(k) = [Pic(k) / Pmc(k)]1/ CR(k) • Target gain for kth spectral sample • [Gt(k)]dB = [1 − 1 / CR(k)] [Pmc(k) / Pic(k)]dB
Gain calculation (contd.) Gain changed in steps from the previous value towards the target value with settable attack and release times Number of steps during attack phase = sa Number of steps during release phase = sr Target gain corresponding to min. input level = Gmax Target gain corresponding to max. input level = Gmin Gain ratio for attack phase γa = (Gmax/ Gmin)1/sa Gain ratio for release phase γr = (Gmax/ Gmin)1/sr Gain for ith window & kth spectral sample G(i,k) = max[G(i − 1,k) / γa, Gt(i,k)] for Gt(i,k) < G(i − 1,k) min[G(i − 1,k) γr, Gt(i,k)] for Gt(i,k) > G(i − 1,k) Attack time Ta = saS/ fs, Release time Tr = srS/ fs [fs= sampling freq., S = window shift] Fast attack: to avoid the output level from exceeding UCL during transients Slow release: to avoid the pumping effect or amplification of breathing
Implementation related challenges • Modifications in the short-time magnitude spectrum without corresponding changes in the phase spectrum can cause audible distortions. • Computational complexity: log or series approximation based gain calculations not suitable for use in sliding-band compression. Solution • Analysis-synthesis using least-square error based signal estimation from modified STFT (Griffin & Lim, 1984): Processing artifacts reduced by masking the effect of phase discontinuities in the modified short-time complex spectrum. • Look-up table based gain calculation: Two-dimensional look-up table relating the input power with gain as a function of frequency. • Reduces computations for real-time implementation. • Permits compression function most suited to compensate for the abnormal loudness growth.
3. Offline & Real-time Implementations • Implementation for offline processing • Implementation using Matlab 7.10 for evaluating the performance of the proposed technique and the effect of processing parameters. • Processing parameters • ◦ fs= 10 kHz ◦ Frame length = 25.6 ms (L = 256) • ◦ Overlap = 75% (S = 64) ◦ FFT size N =512 • 2D look-up table for frequency-dependent compression based on a linear relation between input-dB and output-dB, with settable CR(k) and Pmc(k). • ◦ Input range: 20 log intervals (trade-off: small gain increments, look-up table size). • ◦ Look-up table with 256×20 entries • Attack and release times • ◦ sa=1, Ta = 6.4 ms: Fast attack to avoid uncomfortable level during transients • ◦ sr=30, Tr = 192 ms: Slow release to avoid pumping & amplification of breathing
Implementation for real-time processing • Implementation on a 16-bit fixed-point DSP board to examine suitability of the technique for use in hearing aids. • DSP chip: TI/TMS320C5515 • 16 MB memory space (320 KB on-chip RAM with 64 KB dual access, 128 KB on-chip ROM) • Three 32-bit programmable timers • 4 DMA controllers each with 4 channels • FFT hardware accelerator (up to 1024-point FFT) • Max. clock speed: 120 MHz • DSP Board: eZdsp • 4 MB on-board NOR flash for user program • Stereo codec TLV320AIC3204: 16/20/24/32-bit ADC & DAC, 8 – 192 kHz sampling • Software development: C using TI's 'CCStudio ver. 4.0
Implementation • Input-output operations: DMA based I/O with cyclic buffers • ADC and DAC: one codec (left channel) with 16-bit quantization • Processing parameters (same as for offline processing): fs= 10 kHz, L = 256, S = 64, N =512 • Data representation (input samples, spectral values, processed samples): 16-bit real & 16-bit imaginary
Data transfers & buffering operations (S = L/4) • DMA cyclic buffers • 5-block S-sample input buffer • 2-block S-sample output buffer • Pointers • Current input block • Just-filled input block • Current output block • Write-to output block • (incremented cyclically on DMA interrupt) • Signal delay • Algorithmic: 1 frame (25.6 ms) • Computational ≤ frame shift (6.4 ms)
4. Test Results • Tests for verification and evaluation • Offline processing • Verification of the compression technique for speech input with a large level variation and examination of the effect of different set of processing parameters. • Assessment of output speech quality (using informal listening) for different input speech materials and time varying levels. • Comparison of distortions introduced by different compression techniques during spectral transitions. • Real-time processing • Comparison of the processed outputs from offline & real-time implementation: informal listening, PESQ measure (0 – 4.5). • Signal delay & computational requirement.
Results from offline processing Example: "you will mark ut please" concatenated with scaling factors for variation in the input level. CR = 2, Ta = 6.4 ms, Tr = 6.4 & 192 ms. Input waveform Scaling factor Unprocessed waveform Processed Tr= 6.4 ms, low Pmc Processed Tr= 192 ms, low Pmc Processed Tr= 6.4 ms, highPmc Processed Tr= 192 ms, highPmc Time (s) • Processing of different speech materials with varying levels: No audible roughness or distortion during informal listening.
Distortions during spectral transitions: Example of swept sinusoidal input. Input: constant amplitude, 125 –250 Hz linearly swept frequency, 200 ms sweep duration Single-band compression output Multiband compression (18 auditory critical bands) output Sliding band compression output CR = 30, Ta = 6.4 ms, Tr = 192 ms. Time (s)
Results from real-time processing Example: "you will mark ut please" concatenated with scaling factors for variation in the input level. CR = 2, Ta = 6.4 ms, Tr = 192 ms, low Pmc. Unprocessed waveform Offline processed waveform Real-time processed waveform Time (s) Informal listening: real-time output perceptually similar to the offline output PESQ for real-time w.r.t. offline : 3.5 Signal delay = 36 ms Use of processing capacity: 41% (lowest proc. clock for satisfactory operation = 50 MHz, max. clock = 120 MHz)
5. Summary & Conclusions • Sliding-band dynamic range compression presented to compensate for frequency-dependent loudness recruitment associated with sensorineural hearing loss without introducing the distortions associated with single-band & multiband compression. • Realized using modified fixed-frame analysis-synthesis for low computational complexity & without distortions associated with phase discontinuities. • Suitable for speech & non-speech audio & provision for settable attack time, release time, & compression ratios. • Implemented using 16-bit fixed-point DSP chip & tested for satisfactory operation: 36 ms signal delay, 41% use of processing capacity, indicating scope for combination with other processing techniques. • Further work • Evaluation of speech perception by hearing impaired listeners. • Implementation in combination with other techniques (spectral subtraction, multiband frequency compression, etc.) & evaluation.
National Conference on Communications, 28th Feb. to 2nd Mar., 2014, Kanpur, India (NCC 2014) • A Sliding-band Dynamic Range Compression for Use in Hearing Aids • NityaTiwari and Prem C. Pandey • Dept. of Electrical Engineering • IIT Bombay, Mumbai, India • Email: { nitya, pcpandey } @ ee.iitb.ac.in • Abstract— Sensorineural hearing loss is associated with elevated hearing thresholds, reduced dynamic range, and loudness recruitment. Dynamic range compression in the hearing aids is provided for restoring normal loudness of low level sounds without making the high level sounds uncomfortably loud. A sliding-band compression is presented for significantly reducing the temporal and spectral distortions generally associated with the currently used single and multiband compression techniques. It uses a frequency-dependent gain function calculated on the basis of critical bandwidth based short-time power spectrum and the specified hearing thresholds, compression ratios, and attack and release times. It is realized using FFT-based analysis-synthesis and can be integrated with other FFT-based signal processing in hearing aids to save computation. The technique is implemented and tested for satisfactory real-time operation, with sampling frequency of 10 kHz, window length of 25.6 ms with 75% overlap on a 16-bit fixed-point DSP processor with on-chip FFT hardware.
References [1] H. Levitt, J. M. Pickett, and R. A. Houde, Eds., Senosry Aids for the Hearing Impaired. New York: IEEE Press, 1980. [2] B. C. J. Moore, An Introduction to the Psychology of Hearing, London, UK: Academic, 1997, pp 66–107. [3] S. A. Gelfand, Hearing: An Introduction to Psychological and Physiological Acoustics, 3rd ed., New York: Marcel Dekker, 1998, pp. 314–318 [4] P. N. Kulkarni, P. C. Pandey, and D. S. Jangamashetti, “Binaural dichotic presentation to reduce the effects of spectral masking in moderate bilateral sensorineural hearing loss,” Int. J. Audiol., vol. 51, no. 4, pp. 334–344, 2012. [5]J. Yang, F. Luo, and A. Nehorai, “Spectral contrast enhancement: Algorithms and comparisons,” Speech Commun., vol. 39, no. 1–2, pp. 33–46, 2003. [6] T. Arai, K. Yasu, and N. Hodoshima, “Effective speech processing for various impaired listeners,” Proc. 18th Int. Congr. Acoust., Kyoto, Japan, 2004, pp. 1389–1392. [7] P. N. Kulkarni, P. C. Pandey, and D. S. Jangamashetti, “Multiband frequency compression for improving speech perception by listeners with moderate sensorineural hearing loss,” Speech Commun., vol. 54, no. 3 pp. 341–350, 2012. [8] N. Tiwari, P. C. Pandey, and P. N. Kulkarni, “Real-time implementation of multi-band frequency compression for listeners with moderate sensorineural impairment,” in Proc. Interspeech 2012, Portland, Oregon, 2012, paper no. 689. [9] P. C. Loizou, Speech Enhancement: Theory and Practice. New York: CRC, 2007. [10] S. K. Waddi, P. C. Pandey, and N. Tiwari, “Speech enhancement using spectral subtraction and cascaded-median based noise estimation for hearing impaired listeners,” in Proc. Nat. Conf. Commun. 2013, New Delhi, India, doi: 10.1109/NCC.2013. 6487989. [11] H. Dillon, Hearing Aids. New York: Thieme Medical Publisher, 2001. [12] R. E. Sandlin, Textbook of Hearing Aid Amplification, San Diego, Cal.: Singular 2000, pp. 210–220. [13] L. D.Braida, N. I. Durlach, R. P. Lippmann, B. L. Hicks, W. M. Rabinowitz, and C. M. Reed, “Hearing aids–a review of past research on linear amplification, amplitude compression, and frequency lowering,” Journal of the American Speech and Hearing Association Monographs 19, pp. 1–114, 1979.
[14] R. P. Lippmann, L . D. Braida, and N. I. Durlach, " Study of multichannel amplitude compression and linear amplification for persons with sensorineural hearing loss," J. Acoust. Soc. Am., vol. 69,no. 2, pp. 524–534, 1981. [15] F. Asano, Y. Suzuki, T. Sone, S. Kakehata, M. Satake, K. Ohyama, T. Kobayashi, and T. Takasaka, “A digital hearing aid that compensates for sensorineural impaired listeners,” in Proc. IEEE ICASSP, pp. 3625 – 3628, 1991. [16] M. A. Stone, B. C. Moore, J. I. Alcántara, and B. R. Glasberg, "Comparison of different forms of compression using wearable digital hearing aids," J. Acoust. Soc. Am., vol. 106, no. 6, pp. 3603–3619, 1999. [17] M. Li, H. G. McAllister, N. D. Black, and T. A. Perez, “ Wavelet based non-linear AGC method for hearing aid loudness compensation,” in Proc. IEE Vision, Image and Signal Proc., vol. 147, no. 6, pp. 502-507, 2000. [18] E. Zwicker, “Subdivision of the audible frequency range into critical bands (Freqenzgruppen),” J. Acoust. Soc. Am., vol. 33, no. 2, pp. 248, 1961. [19] J. W. Picone, “Signal modeling techniques in speech recognition, ” in Proc. IEEE, vol. 81, no. 9, pp. 1512 – 1547, 1993. [20] N. Magotra, S. Kamath, F. Livingston, and M.Ho, “Development and fixed-poinot implementation of a multiband dynamic range compression (MDRC) algorithm, in Proc. ACSSC, vol. 1, pp. 428-432, 2000. [21] D. W. Griffin and J. S. Lim, “Signal estimation from modified short-time Fourier transform,” IEEE Trans. Acoustics, Speech, Signal Proc., vol. 32, no. 2, pp. 236 – 243, 1984. [22] Texas Instruments, Inc., “TMS320C5515 Fixed-Point Digital Signal Processor,” 2011, [online] Available: focus.ti.com/lit/ds/symlink/ tms320c5515.pdf. [23] Spectrum Digital, Inc., “TMS320C5515 eZdsp USB Stick Technical Reference,” 2010, [online] Available: support.spectrumdigital.com/ boards/usbstk5515/reva/files/usbstk5515_TechRef_RevA.pdf [24] Texas Instruments, Inc., “TLV320AIC3204 Ultra Low Power Stereo Audio Codec,” 2008, [online] Available: focus.ti.com/lit/ds/ symlink/tlv320aic3204.pdf. [25] ITU, “Perceptual evaluation of speech quality (PESQ): an objective method for end-to-end speech quality assessment of narrow-band telephone networks and speech codecs,” ITU-T Rec., P.862, 2001. [26] N. Tiwari, “Dynamic range compression results”, 2014, [online] Available: www.ee.iitb.ac.in/~spilab/material/nitya/ncc2014.