Real-time Implementation of Multi-band Frequency Compression for Listeners with Moderate Sensorineural Impairment [Ref.: N. Tiwari, P. C. Pandey, P. N. Kulkarni, Proc. Interspeech 2012, Portland, Oregon, 9-13 Sept 2012, Paper 689]


Presentation Transcript


  1. Real-time Implementation of Multi-band Frequency Compression for Listeners with Moderate Sensorineural Impairment [Ref.: N. Tiwari, P. C. Pandey, P. N. Kulkarni, Proc. Interspeech 2012, Portland, Oregon, 9-13 Sept 2012, Paper 689]

  2. Outline • Introduction • Signal processing • Real-time implementation • Results • Conclusion

  3. 1. Introduction
     Signal processing for reducing the effects of increased intra-speech spectral masking
     • Binaural hearing (moderate bilateral loss)
       – Binaural dichotic presentation (Lunner et al. 1993, Kulkarni et al. 2012)
     • Monaural hearing
       – Spectral contrast enhancement (Yang et al. 2003): errors in identifying peaks and an increased dynamic range lead to degraded speech perception
       – Multi-band frequency compression (Arai et al. 2004, Kulkarni et al. 2012)
     Multi-band frequency compression: presentation of speech energy in relatively narrow bands to avoid masking by adjacent spectral components, without causing perceptible distortions.

  4. Research objective
     Implementation of multi-band frequency compression for real-time operation for use in hearing aids
     • Reduction in computational requirement for implementing on a DSP chip without speech quality degradation
     • Low algorithmic and computational delay for real-time implementation on a DSP board with 16-bit fixed-point processor

  5. 2. Signal Processing
     Signal processing for multi-band frequency compression
     Processing steps
     • Segmentation and spectral analysis
     • Spectral modification
     • Resynthesis
     Multi-band frequency compression (Arai et al. 2004)
     • Processing using auditory critical bandwidth
     • Compressed magnitude & original phase spectrum >> complex spectrum
     Multi-band frequency compression (Kulkarni et al. 2009, 2012)
     • Investigations using different bandwidth, segmentation & frequency mapping
     • Processing using complex spectrum
     • Reduced computation and processing artifacts

  6. Multi-band frequency compression (Kulkarni et al. 2009, 2012)
     Signal processing
     • Segmentation using analysis window with 50 % overlap
       – fixed-frame (FF) segmentation: window length L = 20 ms
       – pitch-synchronous (PS) segmentation: L = 2 pitch periods for voiced segments, previous L for unvoiced segments
     • Zero padding >> N-point frame
     • Complex spectra by N-point DFT
     • Analysis bands for compression
       – fixed bandwidth
       – one-third octave
       – auditory critical band (ACB)
     • Mappings for spectral modification
       – sample-to-sample
       – spectral sample superposition
       – spectral segment
     • Resynthesis by overlap-add method
     [Figure: Frequency mapping with compression factor α = 0.6, as shown by Kulkarni et al. 2012]
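To make the ACB analysis bands concrete, the following sketch converts critical-band edges into DFT bin ranges. It assumes the band edges tabulated by Zwicker (1961) and uses the sampling rate (10 kHz) and FFT length (512) reported in the implementation slides; the function name `band_bin_ranges` and the rule of assigning each bin to a band by its center frequency are illustrative assumptions, not taken from the presentation.

```python
# Critical-band edges in Hz after Zwicker (1961). Using exactly these edges
# is an assumption; the slides only say the bands are based on ACB.
ZWICKER_EDGES_HZ = [0, 100, 200, 300, 400, 510, 630, 770, 920, 1080,
                    1270, 1480, 1720, 2000, 2320, 2700, 3150, 3700, 4400]


def band_bin_ranges(edges_hz, fs_hz, n_fft):
    """Return (first_bin, last_bin), inclusive, for each band.

    Bin k (center frequency k * fs / n_fft) is assigned to the band whose
    edges satisfy lo <= k * fs / n_fft < hi. Integer arithmetic is used so
    that the ceiling is exact.
    """
    def ceil_div(a, b):
        return -(-a // b)

    ranges = []
    for lo, hi in zip(edges_hz[:-1], edges_hz[1:]):
        first = ceil_div(lo * n_fft, fs_hz)
        last = ceil_div(hi * n_fft, fs_hz) - 1
        ranges.append((first, last))
    return ranges


if __name__ == "__main__":
    for band, (first, last) in enumerate(band_bin_ranges(ZWICKER_EDGES_HZ, 10_000, 512)):
        print(f"band {band + 1:2d}: bins {first:3d} to {last:3d}")
```

With these assumed edges, 19 edge values give 18 contiguous bands below the 5 kHz Nyquist frequency, which matches the band count quoted in the slides.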

  7. Details of signal processing
     Analysis band for compression
     • 18 bands based on ACB
     Mapping for spectral modification
     • Spectral segment mapping: no irregular variations in spectrum
     Resynthesis
     • Resynthesis by N-point IDFT
     • Reconstruction by overlap-add method
     [Figure: Spectral segment mapping, as shown by Kulkarni et al. 2012]
     [Figure: (a) Unprocessed and (b) processed (α = 0.6) spectra of vowel /i/ and broad-band (BB) noise]
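A minimal sketch of a segment-style mapping for one analysis band follows. The slides do not give the mapping formula, so this is one plausible reading rather than the authors' exact method: each output bin is treated as a unit-width segment, stretched by 1/α about the band center to find the input segment that feeds it, and the complex input samples are summed with weights equal to their overlap with that segment. The function name `compress_band` and the band-center convention are assumptions.

```python
def compress_band(spectrum, lo, hi, alpha):
    """Compress bins lo..hi (inclusive) toward the band center.

    spectrum : sequence of complex DFT samples
    alpha    : compression factor, 0 < alpha <= 1 (1 leaves the band unchanged)
    Returns the modified samples for bins lo..hi as a list.

    Assumption: bin k occupies the interval [k - 0.5, k + 0.5] on the
    frequency-index axis, and the band center is (lo + hi) / 2.
    """
    center = (lo + hi) / 2.0
    band_lo, band_hi = lo - 0.5, hi + 0.5
    out = [0j] * (hi - lo + 1)
    for i, k_out in enumerate(range(lo, hi + 1)):
        # Input segment that maps onto output bin k_out after compression.
        a = center + (k_out - 0.5 - center) / alpha
        b = center + (k_out + 0.5 - center) / alpha
        # Segments reaching beyond the band contribute nothing outside it.
        a, b = max(a, band_lo), min(b, band_hi)
        for k_in in range(lo, hi + 1):
            overlap = min(b, k_in + 0.5) - max(a, k_in - 0.5)
            if overlap > 0:
                out[i] += overlap * spectrum[k_in]
    return out


if __name__ == "__main__":
    # A 4-bin band compressed by alpha = 0.5: energy moves to the two middle bins.
    print(compress_band([1, 2, 3, 4], 0, 3, 0.5))
```

Because the input segments tile the whole band, every input sample is used with total weight one, so the sum of the complex samples in the band is preserved while the outer bins of the band are emptied.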

  8. Evaluation and results
     • Modified Rhyme Test (MRT): best results with PS segmentation, ACB, spectral segment mapping & compression factor of 0.6
     • Normal-hearing subjects with loss simulated by additive noise
       – 17 % improvement in recognition scores
       – 0.88 s decrease in response time
     • Subjects with moderate sensorineural loss
       – 16.5 % improvement in recognition scores
       – 0.89 s decrease in response time

  9. Real-time implementation requirements
     Comparison of processing schemes with different segmentation
     • FF processing: fixed-frame segmentation
       – Discontinuities masked by 50 % overlap-add
       – Perceptible distortions in the form of another superimposed pitch related to the shift duration
     • PS processing: pitch-synchronous segmentation
       – Less perceptible distortions
       – Pitch estimation (large computational requirement)
       – Not suitable for music
     Modified segmentation scheme for real-time implementation
     • Lower perceptual distortions
     • Lower computational requirement

  10. Modified multi-band frequency compression (LSEE processing)
      • Griffin and Lim's method of signal estimation from modified STFT
        – Minimize mean square error between estimated and modified STFT
        – Multiplication by analysis window before overlap-add
      • Window requirement
        – For partial overlap only a few windows meet the requirement
      • Fixed window length L and shift S, with S = L/4 (75 % overlap)
      • Modified Hamming window with p = 0.54 and q = -0.46
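The window formulas did not survive in this transcript. The sketch below therefore assumes the modified Hamming window in the form usually quoted from Griffin and Lim for S = L/4, w(n) = (p + q cos(2π(n + 0.5)/L)) / sqrt(4p² + 2q²), and assumes that the window requirement is that the squared, shifted windows sum to one. It checks that property numerically and shows the LSEE-style resynthesis (multiplication by the analysis window before overlap-add). All function names are illustrative.

```python
import math

P, Q = 0.54, -0.46  # window parameters quoted on the slide


def modified_hamming(length):
    """Modified Hamming window, assumed form for shift S = length / 4."""
    scale = 1.0 / math.sqrt(4 * P * P + 2 * Q * Q)
    return [scale * (P + Q * math.cos(2 * math.pi * (n + 0.5) / length))
            for n in range(length)]


def squared_window_sum(window, shift):
    """Sum of w^2 over all frames covering each position within one shift."""
    frames_per_sample = len(window) // shift
    return [sum(window[r + m * shift] ** 2 for m in range(frames_per_sample))
            for r in range(shift)]


def analysis_frames(signal, window, shift):
    """Windowed frames taken every `shift` samples (no spectral modification)."""
    length = len(window)
    count = (len(signal) - length) // shift + 1
    return [[window[n] * signal[m * shift + n] for n in range(length)]
            for m in range(count)]


def lsee_overlap_add(frames, window, shift):
    """Multiply each frame by the analysis window again, then overlap-add.

    With the squared-window sum equal to one, the least-squares estimate
    needs no further normalization.
    """
    length = len(window)
    out = [0.0] * (shift * (len(frames) - 1) + length)
    for m, frame in enumerate(frames):
        for n in range(length):
            out[m * shift + n] += window[n] * frame[n]
    return out


if __name__ == "__main__":
    L, S = 260, 65  # 26 ms window and 75 % overlap at 10 kHz, as in the slides
    w = modified_hamming(L)
    print(max(abs(v - 1.0) for v in squared_window_sum(w, S)))
```

When the frames are left unmodified, the overlap-add output equals the input wherever four frames overlap, which is a quick sanity check before inserting the spectral modification between analysis and synthesis.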

  11. Investigations using offline implementation
      • Comparison of analysis-synthesis techniques for multi-band frequency compression
        – Pitch-synchronous implementation
        – LSEE method based implementation
      • Evaluation
        – Informal listening
        – Objective evaluation using PESQ measure (0 – 4.5)
      • Test material: sentence "Where were you a year ago?" from a male speaker, vowels /a/-/i/-/u/, music
      • Results: PESQ score of 4.3 to 3.7 (sentence / vowels) for compression factors from 1 to 0.6
      [Figure: Comparison of analysis-synthesis methods: (a) unprocessed, (b) FF, (c) PS, and (d) LSEE. Processing with spectral segment mapping, auditory critical bandwidth, α = 0.6.]

  12. Processed outputs from off-line implementation • Conclusion • LSEE processing suited for non-speech and audio applications

  13. 3. Real-time Implementation
      • 16-bit fixed-point DSP: TI/TMS320C5515
        – 16 MB memory space: 320 KB on-chip RAM with 64 KB DARAM, 128 KB on-chip ROM
        – Three 32-bit programmable timers, 4 DMA controllers each with 4 channels
        – FFT hardware accelerator (8 to 1024-point FFT)
        – Max. clock speed: 120 MHz
      • DSP board: eZdsp
        – 4 MB on-board NOR flash for user program
        – Codec TLV320AIC3204: stereo ADC & DAC, 16/20/24/32-bit quantization, 8 – 192 kHz sampling
      • Development environment for C: TI's 'CCStudio, ver. 4.0'

  14. Signal acquisition & processing
      Implementation details
      • ADC & DAC quantization: 16-bit
      • Sampling rate: 10 kHz
      • FFT length N: 1024 / 512
      • Analysis-synthesis: LSEE-based fixed frame, 260-sample (26 ms) window, 75 % overlap
      • Mapping for spectral modification: spectral segment mapping with auditory critical bands
      • Input samples, spectral values & processed output: 32 bits each, with 16-bit real and imaginary parts

  15. Data transfer and buffering operations (S = L/4)
      • DMA cyclic buffers
        – 5-block input buffer
        – 2-block output buffer (each of S samples)
      • Four pointers (incremented cyclically on DMA interrupt)
        – current input block
        – current output block
        – just-filled input block
        – write-to output block
      • Efficient realization of 75 % overlap & zero padding
      • Total delay
        – Algorithmic delay: L = 4S samples (26 ms)
        – Computational delay: L/4 = S samples (6.5 ms)
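The buffering scheme can be illustrated with a small simulation. This is a sketch under assumptions, not the DSP code: it assumes each input block also holds S samples, that the analysis frame is formed from the four most recently filled input blocks while the DMA fills the fifth, and that the frame is zero-padded to N = 512. The names `frame_block_indices`, `assemble_frame`, and `delays_ms` are hypothetical.

```python
FS_HZ = 10_000        # sampling rate from the implementation details
L = 260               # window length in samples
S = L // 4            # shift for 75 % overlap (65 samples)
N_FFT = 512           # one of the two FFT lengths reported
N_IN_BLOCKS = 5       # input cyclic buffer size, in blocks of S samples


def frame_block_indices(just_filled):
    """Indices of the four most recently filled input blocks, oldest first."""
    return [(just_filled - 3 + i) % N_IN_BLOCKS for i in range(4)]


def assemble_frame(input_blocks, just_filled):
    """Concatenate four blocks into an L-sample frame and zero-pad to N_FFT."""
    frame = []
    for idx in frame_block_indices(just_filled):
        frame.extend(input_blocks[idx])
    frame.extend([0] * (N_FFT - len(frame)))
    return frame


def delays_ms():
    """Algorithmic (one window) and computational (one shift) delay in ms."""
    return 1000.0 * L / FS_HZ, 1000.0 * S / FS_HZ


if __name__ == "__main__":
    # Fill each block with its own index so the frame order is visible.
    blocks = [[b] * S for b in range(N_IN_BLOCKS)]
    frame = assemble_frame(blocks, just_filled=0)
    print(frame_block_indices(0), len(frame))
    print(delays_ms())
```

Since consecutive frames share three of their four blocks, advancing the pointer by one block per DMA interrupt gives the 75 % overlap without copying samples. The two delays computed here add up to 32.5 ms; the sketch does not model whatever accounts for the difference from the 35 ms total reported in the results.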

  16. 4. Results
      Comparison of processed output from offline implementation & DSP board
      • Test material: sentence "Where were you a year ago?", vowels /a/-/i/-/u/, music
      • Evaluation methods
        – Informal listening
        – PESQ measure (0 – 4.5)
      • Results
        – Lowest clock for satisfactory operation: 20 MHz / 12 MHz (N = 1024 / 512)
        – No perceptible change in output with N = 512; processing capacity used ≈ 1/10 of the capacity with highest clock (120 MHz)
        – Informal listening: processed output from DSP similar to corresponding output from offline implementation
        – PESQ scores for compression factors from 0.6 to 1: 2.5 to 3.4 for the sentence and 3.0 to 3.4 for the vowels
        – Total processing delay: 35 ms

  17. Processed outputs from DSP board

  18. 5. Conclusion
      • Multi-band frequency compression using LSEE processing
        – Processed speech output perceptually similar to that from earlier established pitch-synchronous processing
        – Suitable for speech as well as non-speech audio signals
        – Well-suited for real-time processing due to use of fixed-frame segmentation
      • Implementation for real-time operation using 16-bit fixed-point processor TI/TMS320C5515: used-up processing capacity ≈ 1/10, delay = 35 ms
      • Further work: combining multi-band frequency compression with other types of processing for use in hearing aids
        – Frequency-selective gain
        – Multi-band dynamic range compression (settable gain and compression ratios)
        – Noise reduction
        – CVR modification

  19. THANK YOU

  20. Abstract Widening of auditory filters in persons with sensorineural hearing impairment leads to increased spectral masking and degraded speech perception. Multi-band frequency compression of the complex spectral samples using pitch-synchronous processing has been reported to increase speech perception by persons with moderate sensorineural loss. It is shown that implementation of multi-band frequency compression using fixed-frame processing along with least-squares error based signal estimation reduces the processing delay and the speech output is perceptually similar to that from pitch-synchronous processing. The processing is implemented on a DSP board based on the 16-bit fixed-point processor TMS320C5515, and real-time operation is achieved using about one-tenth of its computing capacity.

  21. References
      [1] H. Levitt, J. M. Pickett, and R. A. Houde (eds.), Sensory Aids for the Hearing Impaired. New York: IEEE Press, 1980.
      [2] B. C. J. Moore, An Introduction to the Psychology of Hearing. London, UK: Academic, 1997, pp. 66–107.
      [3] J. M. Pickett, The Acoustics of Speech Communication: Fundamentals, Speech Perception Theory, and Technology. Boston, Mass.: Allyn Bacon, 1999, pp. 289–323.
      [4] H. Dillon, Hearing Aids. New York: Thieme Medical, 2001.
      [5] T. Lunner, S. Arlinger, and J. Hellgren, “8-channel digital filter bank for hearing aid use: preliminary results in monaural, diotic, and dichotic modes,” Scand. Audiol. Suppl., vol. 38, pp. 75–81, 1993.
      [6] P. N. Kulkarni, P. C. Pandey, and D. S. Jangamashetti, “Binaural dichotic presentation to reduce the effects of spectral masking in moderate bilateral sensorineural hearing loss,” Int. J. Audiol., vol. 51, no. 4, pp. 334–344, 2012.
      [7] T. Baer, B. C. J. Moore, and S. Gatehouse, “Spectral contrast enhancement of speech in noise for listeners with sensorineural hearing impairment: effects on intelligibility, quality, and response times,” Int. J. Rehab. Res., vol. 30, no. 1, pp. 49–72, 1993.
      [8] J. Yang, F. Luo, and A. Nehorai, “Spectral contrast enhancement: Algorithms and comparisons,” Speech Commun., vol. 39, no. 1–2, pp. 33–46, 2003.
      [9] T. Arai, K. Yasu, and N. Hodoshima, “Effective speech processing for various impaired listeners,” in Proc. 18th Int. Cong. Acoust., 2004, Kyoto, Japan, pp. 1389–1392.
      [10] K. Yasu, M. Hishitani, T. Arai, and Y. Murahara, “Critical-band based frequency compression for digital hearing aids,” Acoustical Science and Technology, vol. 25, no. 1, pp. 61–63, 2004.
      [11] P. N. Kulkarni, P. C. Pandey, and D. S. Jangamashetti, “Multi-band frequency compression for reducing the effects of spectral masking,” Int. J. Speech Tech., vol. 10, no. 4, pp. 219–227, 2009.
      [12] P. N. Kulkarni, P. C. Pandey, and D. S. Jangamashetti, “Multi-band frequency compression for improving speech perception by listeners with moderate sensorineural hearing loss,” Speech Commun., vol. 54, no. 3, pp. 341–350, 2012.
      [13] E. Zwicker, “Subdivision of the audible frequency range into critical bands (Frequenzgruppen),” J. Acoust. Soc. Am., vol. 33, no. 2, p. 248, 1961.
      [14] D. G. Childers and H. T. Hu, “Speech synthesis by glottal excited linear prediction,” J. Acoust. Soc. Am., vol. 96, no. 4, pp. 2026–2036, 1994.

  22. [15] D. W. Griffin and J. S. Lim, “Signal estimation from modified short-time Fourier transform,” IEEE Trans. Acoustics, Speech, Signal Proc., vol. 32, no. 2, pp. 236–243, 1984.
      [16] X. Zhu, G. T. Beauregard, and L. L. Wyse, “Real-time signal estimation from modified short-time Fourier transform magnitude spectra,” IEEE Trans. Audio, Speech, Language Process., vol. 15, no. 5, pp. 1645–1653, 2007.
      [17] Texas Instruments Inc., TMS320C5515 Fixed-Point Digital Signal Processor. 2011, [online] Available: http://focus.ti.com/lit/ds/symlink/tms320c5515.pdf.
      [18] Texas Instruments Inc., TLV320AIC3204 Ultra Low Power Stereo Audio Codec. 2008, [online] Available: http://focus.ti.com/lit/ds/symlink/tlv320aic3204.pdf.
