Real-time Implementation of Multi-band Frequency Compression for Listeners with Moderate Sensorineural Impairment
[Ref.: N. Tiwari, P. C. Pandey, P. N. Kulkarni, Proc. Interspeech 2012, Portland, Oregon, 9-13 Sept 2012, Paper 689]
Outline
• Introduction
• Signal processing
• Real-time implementation
• Results
• Conclusion
1. Introduction
Signal processing for reducing the effects of increased intra-speech spectral masking
• Binaural hearing (moderate bilateral loss)
– Binaural dichotic presentation (Lunner et al. 1993, Kulkarni et al. 2012)
• Monaural hearing
– Spectral contrast enhancement (Yang et al. 2003): errors in identifying peaks and an increased dynamic range lead to degraded speech perception
– Multi-band frequency compression (Arai et al. 2004, Kulkarni et al. 2012)
Multi-band frequency compression
Presentation of speech energy in relatively narrow bands to avoid masking by adjacent spectral components, without causing perceptible distortions.
Research objective
Implementation of multi-band frequency compression for real-time operation for use in hearing aids
• Reduction in computational requirement for implementation on a DSP chip without speech quality degradation
• Low algorithmic and computational delay for real-time implementation on a DSP board with a 16-bit fixed-point processor
2. Signal Processing
Signal processing for multi-band frequency compression
Processing steps
• Segmentation and spectral analysis
• Spectral modification
• Resynthesis
Multi-band frequency compression (Arai et al. 2004)
• Processing using auditory critical bandwidth
• Compressed magnitude spectrum combined with the original phase spectrum to form the complex spectrum
Multi-band frequency compression (Kulkarni et al. 2009, 2012)
• Investigations using different bandwidths, segmentation & frequency mappings
• Processing using the complex spectrum
• Reduced computation and processing artifacts
Multi-band frequency compression (Kulkarni et al. 2009, 2012)
Signal processing
• Segmentation using analysis window with 50 % overlap
– fixed-frame (FF) segmentation: window length L = 20 ms
– pitch-synchronous (PS) segmentation: L = 2 pitch periods for voiced segments, previous L for unvoiced segments
• Zero padding to form an N-point frame
• Complex spectra by N-point DFT
• Analysis bands for compression
– fixed bandwidth
– one-third octave
– auditory critical band (ACB)
• Mappings for spectral modification
– sample-to-sample
– spectral sample superposition
– spectral segment
• Resynthesis by overlap-add method
[Figure: Frequency mapping with compression factor α = 0.6, as shown by Kulkarni et al. 2012]
Details of signal processing
Analysis band for compression
• 18 bands based on ACB
Mapping for spectral modification
• Spectral segment mapping: no irregular variations in spectrum
Resynthesis
• Resynthesis by N-point IDFT
• Reconstruction by overlap-add method
[Figure: Spectral segment mapping, as shown by Kulkarni et al. 2012]
[Figure: (a) Unprocessed and (b) processed (α = 0.6) spectra of vowel /i/ and broadband (BB) noise]
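The slide names spectral segment mapping but the figure defining it is not reproduced here. The following is a minimal sketch of one plausible reading: each destination bin in a band collects the source spectrum over the segment that the compression maps onto it, with fractional weights for partially covered source bins. The function name `segment_map_band`, the bin-interval convention, and the toy band are assumptions for illustration; Kulkarni et al. 2012 should be consulted for the exact formulation.

```python
def segment_map_band(spectrum, lo, hi, alpha):
    """Compress bins lo..hi-1 of a complex spectrum toward the band centre.

    Assumption of this sketch: bin k occupies [k - 0.5, k + 0.5] on a
    continuous bin axis, and a point f maps to centre - alpha * (centre - f).
    Returns the processed values for bins lo..hi-1.
    """
    left, right = lo - 0.5, hi - 0.5          # band edges on the bin axis
    centre = 0.5 * (left + right)
    out = [0j] * (hi - lo)
    for i, k_dst in enumerate(range(lo, hi)):
        # Source segment that lands on destination bin k_dst.
        a = centre - (centre - (k_dst - 0.5)) / alpha
        b = centre - (centre - (k_dst + 0.5)) / alpha
        a, b = max(a, left), min(b, right)    # stay inside the band
        if b <= a:
            continue                          # bin lies outside the compressed band
        acc = 0j
        for j in range(lo, hi):
            overlap = min(b, j + 0.5) - max(a, j - 0.5)
            if overlap > 0:
                acc += overlap * spectrum[j]  # fractional weight for partial bins
        out[i] = acc
    return out


# Toy band of 10 bins, compressed with alpha = 0.6.
X = [complex(k + 1, 0) for k in range(10)]
Y = segment_map_band(X, 0, 10, 0.6)
```

In this sketch the outer bins of the band end up empty and the sum of the complex samples in the band is preserved. A full implementation would also have to keep the conjugate symmetry of the DFT so that the resynthesized frame stays real.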
Evaluation and results
• Modified Rhyme Test (MRT): best results with PS segmentation, ACB, spectral segment mapping & compression factor of 0.6
• Normal-hearing subjects with loss simulated by additive noise
– 17 % improvement in recognition scores
– 0.88 s decrease in response time
• Subjects with moderate sensorineural loss
– 16.5 % improvement in recognition scores
– 0.89 s decrease in response time
Real-time implementation requirements
Comparison of processing schemes with different segmentation
• FF processing: fixed-frame segmentation
– Discontinuities masked by 50 % overlap-add
– Perceptible distortions in the form of another superimposed pitch related to the shift duration
• PS processing: pitch-synchronous segmentation
– Less perceptible distortions
– Pitch estimation needed (large computational requirement)
– Not suitable for music
Modified segmentation scheme for real-time implementation
• Lower perceptual distortions
• Lower computational requirement
Modified multi-band frequency compression (LSEE processing)
• Griffin and Lim's method of signal estimation from modified STFT
– Minimizes the mean square error between the STFT of the estimated signal and the modified STFT
– Multiplication by the analysis window before overlap-add
• Window requirement
– For partial overlap, only a few windows meet the requirement
• Fixed window length L and shift S, with S = L/4 (75 % overlap)
• Modified Hamming window with parameters p = 0.54 and q = −0.46
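The window equation did not survive on the slide. As a hedged illustration, the sketch below uses the modified Hamming window in the form given by Griffin and Lim, which for S = L/4 reduces to w(n) = [p + q cos(2π(n + 0.5)/L)] / sqrt(4p² + 2q²), and checks that windowing each frame a second time before overlap-add returns the input when the spectrum is left unmodified. The function names and the test signal are assumptions of this sketch, and the spectral modification step is only marked by a comment.

```python
import math

P, Q = 0.54, -0.46        # window parameters from the slide


def modified_hamming(L):
    """Modified Hamming window for shift S = L/4 (form assumed from Griffin and Lim)."""
    scale = 1.0 / math.sqrt(4 * P * P + 2 * Q * Q)
    return [scale * (P + Q * math.cos(2 * math.pi * (n + 0.5) / L))
            for n in range(L)]


def lsee_identity(x, L):
    """Analysis and LSEE-style resynthesis with no spectral modification."""
    S = L // 4                                 # 75 % overlap
    w = modified_hamming(L)
    y = [0.0] * len(x)
    for start in range(0, len(x) - L + 1, S):
        frame = [x[start + n] * w[n] for n in range(L)]     # analysis window
        # (DFT, multi-band compression and IDFT would go here.)
        for n in range(L):
            y[start + n] += frame[n] * w[n]    # window again, then overlap-add
    return y


L = 260                                        # 26 ms at 10 kHz, as on a later slide
S = L // 4
w = modified_hamming(L)
# Squared windows shifted by S add up to one at every sample position.
window_sums = [sum(w[n + m * S] ** 2 for m in range(4)) for n in range(S)]

x = [math.sin(0.05 * n) for n in range(4 * L)]
y = lsee_identity(x, L)
```

Only samples covered by four overlapping frames are reconstructed exactly; the first and last L − S samples of a finite signal are attenuated in this sketch.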
Investigations using offline implementation
• Comparison of analysis-synthesis techniques for multi-band frequency compression
– Pitch-synchronous implementation
– LSEE method based implementation
• Evaluation
– Informal listening
– Objective evaluation using PESQ measure (0 – 4.5)
• Test material: sentence “Where were you a year ago?” from a male speaker, vowels /a/-/i/-/u/, music
• Results: PESQ score of 4.3 to 3.7 (sentence / vowels) for compression factors from 1 to 0.6
[Figure: Comparison of analysis-synthesis methods: (a) unprocessed, (b) FF, (c) PS, and (d) LSEE. Processing with spectral segment mapping, auditory critical bandwidth, α = 0.6.]
Processed outputs from offline implementation
• Conclusion
– LSEE processing suited for non-speech and audio applications
3. Real-time Implementation
• 16-bit fixed-point DSP: TI/TMS320C5515
– 16 MB memory space: 320 KB on-chip RAM with 64 KB DARAM, 128 KB on-chip ROM
– Three 32-bit programmable timers, 4 DMA controllers each with 4 channels
– FFT hardware accelerator (8 to 1024-point FFT)
– Max. clock speed: 120 MHz
• DSP board: eZdsp
– 4 MB on-board NOR flash for user program
– Codec TLV320AIC3204: stereo ADC & DAC, 16/20/24/32-bit quantization, 8 – 192 kHz sampling
• Development environment for C: TI's 'CCStudio, ver. 4.0'
Signal acquisition & processing
Implementation details
• ADC & DAC quantization: 16-bit
• Sampling rate: 10 kHz
• FFT length N: 1024 / 512
• Analysis-synthesis: LSEE-based fixed frame, 260-sample (26 ms) window, 75 % overlap
• Mapping for spectral modification: spectral segment mapping with auditory critical bands
• Input samples, spectral values & processed output: 32 bits each, with 16-bit real and imaginary parts
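To make the frame sizes above concrete, the sketch below zero-pads a 260-sample frame to N = 512 and converts critical-band edges in Hz into DFT bin ranges at the 10 kHz sampling rate. The edge frequencies are Zwicker's published critical-band edges; using exactly these 18 bands (with the lowest edge set to 0 Hz), as well as the helper names, is an assumption of this sketch and not something the slides state.

```python
# Zwicker's critical-band edges in Hz for the first 18 bands. Whether the
# implementation uses exactly these edges is an assumption of this sketch.
ACB_EDGES_HZ = [0, 100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270,
                1480, 1720, 2000, 2320, 2700, 3150, 3700, 4400]

FS = 10_000   # sampling rate in Hz (from the slide)
N = 512       # FFT length (from the slide)
L = 260       # window length in samples, 26 ms at 10 kHz (from the slide)


def zero_pad(frame, n_fft):
    """Append zeros so that the frame has n_fft samples."""
    return list(frame) + [0.0] * (n_fft - len(frame))


def band_bins(edges_hz, fs, n_fft):
    """Return (first_bin, end_bin) pairs, end exclusive, for each band."""
    idx = [round(f * n_fft / fs) for f in edges_hz]
    return list(zip(idx[:-1], idx[1:]))


bands = band_bins(ACB_EDGES_HZ, FS, N)
padded = zero_pad([1.0] * L, N)
```

With N = 512 the bin spacing is about 19.5 Hz, so the lowest band spans only five bins while the highest spans 36.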
Data transfer and buffering operations (S = L/4)
• DMA cyclic buffers (each block of S samples)
– 5-block input buffer
– 2-block output buffer
• Four pointers (incremented cyclically on DMA interrupt)
– current input block
– current output block
– just-filled input block
– write-to output block
• Efficient realization of 75 % overlap & zero padding
• Total delay
– Algorithmic delay: L = 4S samples (26 ms)
– Computational delay: L/4 = S samples (6.5 ms)
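The pointer scheme above can be sketched as a small simulation. It assumes that the DMA fills one input block while the four most recently completed blocks form the current L = 4S analysis frame, which is one way a 5-block buffer gives 75 % overlap without copying. The class name, method names, and initial pointer positions are hypothetical; the actual firmware is not shown in the slides.

```python
S = 65                         # block length: 260-sample window / 4
IN_BLOCKS, OUT_BLOCKS = 5, 2   # buffer sizes from the slide


class CyclicBuffers:
    """Hypothetical model of the DMA cyclic buffers and their four pointers."""

    def __init__(self):
        self.inp = [[0.0] * S for _ in range(IN_BLOCKS)]
        self.out = [[0.0] * S for _ in range(OUT_BLOCKS)]
        self.cur_in = 0                    # block the DMA is filling
        self.just_filled = IN_BLOCKS - 1   # newest complete input block
        self.cur_out = 0                   # block the DMA is sending to the DAC
        self.write_out = 1                 # block the processor writes into

    def on_dma_interrupt(self):
        """Advance all four pointers cyclically."""
        self.cur_in = (self.cur_in + 1) % IN_BLOCKS
        self.just_filled = (self.just_filled + 1) % IN_BLOCKS
        self.cur_out = (self.cur_out + 1) % OUT_BLOCKS
        self.write_out = (self.write_out + 1) % OUT_BLOCKS

    def dma_fill(self, samples):
        """Model the DMA completing one input block of S samples."""
        self.inp[self.cur_in] = list(samples)
        self.on_dma_interrupt()

    def analysis_frame(self):
        """Four newest complete blocks, oldest first (L = 4S samples)."""
        order = [(self.just_filled - k) % IN_BLOCKS for k in (3, 2, 1, 0)]
        return [s for i in order for s in self.inp[i]]


buf = CyclicBuffers()
for b in range(6):                          # feed a ramp, six blocks long
    buf.dma_fill(range(b * S, (b + 1) * S))
frame = buf.analysis_frame()
```

After six blocks the frame holds samples 2S to 6S − 1, so each new frame shares three of its four blocks with the previous one.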
4. Results
Comparison of processed output from the offline implementation & the DSP board
• Test material: sentence “Where were you a year ago?”, vowels /a/-/i/-/u/, music
• Evaluation methods
– Informal listening
– PESQ measure (0 – 4.5)
• Results
– Lowest clock for satisfactory operation: 20 MHz / 12 MHz (N = 1024 / 512)
– No perceptible change in output with N = 512; processing capacity used ≈ 1/10 of the capacity at the highest clock (120 MHz)
– Informal listening: processed output from the DSP similar to the corresponding output from the offline implementation
– PESQ scores for compression factors from 0.6 to 1: 2.5 to 3.4 for the sentence & 3.0 to 3.4 for the vowels
– Total processing delay: 35 ms
5. Conclusion
• Multi-band frequency compression using LSEE processing
– Processed speech output perceptually similar to that from the earlier established pitch-synchronous processing
– Suitable for speech as well as non-speech audio signals
– Well suited for real-time processing due to the use of fixed-frame segmentation
• Implementation for real-time operation using the 16-bit fixed-point processor TI/TMS320C5515: processing capacity used ≈ 1/10, delay = 35 ms
• Further work: combining multi-band frequency compression with other types of processing for use in hearing aids
– Frequency-selective gain
– Multi-band dynamic range compression (settable gain and compression ratios)
– Noise reduction
– CVR modification
Abstract
Widening of auditory filters in persons with sensorineural hearing impairment leads to increased spectral masking and degraded speech perception. Multi-band frequency compression of the complex spectral samples using pitch-synchronous processing has been reported to improve speech perception by persons with moderate sensorineural loss. It is shown that implementation of multi-band frequency compression using fixed-frame processing along with least-squares error based signal estimation reduces the processing delay, and that the speech output is perceptually similar to that from pitch-synchronous processing. The processing is implemented on a DSP board based on the 16-bit fixed-point processor TMS320C5515, and real-time operation is achieved using about one-tenth of its computing capacity.
References
[1] H. Levitt, J. M. Pickett, and R. A. Houde (eds.), Sensory Aids for the Hearing Impaired. New York: IEEE Press, 1980.
[2] B. C. J. Moore, An Introduction to the Psychology of Hearing. London, UK: Academic, 1997, pp. 66–107.
[3] J. M. Pickett, The Acoustics of Speech Communication: Fundamentals, Speech Perception Theory, and Technology. Boston, Mass.: Allyn Bacon, 1999, pp. 289–323.
[4] H. Dillon, Hearing Aids. New York: Thieme Medical, 2001.
[5] T. Lunner, S. Arlinger, and J. Hellgren, “8-channel digital filter bank for hearing aid use: preliminary results in monaural, diotic, and dichotic modes,” Scand. Audiol. Suppl., vol. 38, pp. 75–81, 1993.
[6] P. N. Kulkarni, P. C. Pandey, and D. S. Jangamashetti, “Binaural dichotic presentation to reduce the effects of spectral masking in moderate bilateral sensorineural hearing loss,” Int. J. Audiol., vol. 51, no. 4, pp. 334–344, 2012.
[7] T. Baer, B. C. J. Moore, and S. Gatehouse, “Spectral contrast enhancement of speech in noise for listeners with sensorineural hearing impairment: effects on intelligibility, quality, and response times,” Int. J. Rehab. Res., vol. 30, no. 1, pp. 49–72, 1993.
[8] J. Yang, F. Luo, and A. Nehorai, “Spectral contrast enhancement: Algorithms and comparisons,” Speech Commun., vol. 39, no. 1–2, pp. 33–46, 2003.
[9] T. Arai, K. Yasu, and N. Hodoshima, “Effective speech processing for various impaired listeners,” in Proc. 18th Int. Cong. Acoust., 2004, Kyoto, Japan, pp. 1389–1392.
[10] K. Yasu, M. Hishitani, T. Arai, and Y. Murahara, “Critical-band based frequency compression for digital hearing aids,” Acoustical Science and Technology, vol. 25, no. 1, pp. 61–63, 2004.
[11] P. N. Kulkarni, P. C. Pandey, and D. S. Jangamashetti, “Multi-band frequency compression for reducing the effects of spectral masking,” Int. J. Speech Tech., vol. 10, no. 4, pp. 219–227, 2009.
[12] P. N. Kulkarni, P. C. Pandey, and D. S. Jangamashetti, “Multi-band frequency compression for improving speech perception by listeners with moderate sensorineural hearing loss,” Speech Commun., vol. 54, no. 3, pp. 341–350, 2012.
[13] E. Zwicker, “Subdivision of the audible frequency range into critical bands (Frequenzgruppen),” J. Acoust. Soc. Am., vol. 33, no. 2, p. 248, 1961.
[14] D. G. Childers and H. T. Hu, “Speech synthesis by glottal excited linear prediction,” J. Acoust. Soc. Am., vol. 96, no. 4, pp. 2026–2036, 1994.
[15] D. W. Griffin and J. S. Lim, “Signal estimation from modified short-time Fourier transform,” IEEE Trans. Acoustics, Speech, Signal Proc., vol. 32, no. 2, pp. 236–243, 1984.
[16] X. Zhu, G. T. Beauregard, and L. L. Wyse, “Real-time signal estimation from modified short-time Fourier transform magnitude spectra,” IEEE Trans. Audio, Speech, Language Process., vol. 15, no. 5, pp. 1645–1653, 2007.
[17] Texas Instruments Inc., TMS320C5515 Fixed-Point Digital Signal Processor. 2011, [online] Available: http://focus.ti.com/lit/ds/symlink/tms320c5515.pdf.
[18] Texas Instruments Inc., TLV320AIC3204 Ultra Low Power Stereo Audio Codec. 2008, [online] Available: http://focus.ti.com/lit/ds/symlink/tlv320aic3204.pdf.