ICA 2010: 20th Int. Congress on Acoustics, 23-27 August 2010, Sydney, Australia [Wed, 25th Aug, R. 201, Speech processing & communication systems 2, 15.40]
Enhancement of Electrolaryngeal Speech by Spectral Subtraction, Spectral Compensation, and Introduction of Jitter and Shimmer
Prem C. Pandey, S. Khadar Basha
{pcpandey, basha}@ee.iitb.ac.in
http://www.ee.iitb.ac.in/~spilab
IIT Bombay, India
OVERVIEW
1. Introduction
2. Spectral subtraction
3. Estimation of noise spectrum
4. Jitter, shimmer, & spectral compensation
5. Results
6. Conclusion
1 INTRODUCTION
[Figure: Natural speech, with glottal excitation to the vocal tract]
[Figure: Electrolaryngeal speech, with excitation to the vocal tract from an external vibrator]
Problems with electrolarynx
• Dynamic control of level, voicing, & pitch not feasible
• Background noise due to leakage of acoustic energy, affecting the intelligibility
• Unnatural quality due to
  ▪ Low-frequency spectral deficit
  ▪ Constant pitch & level
Methods of noise reduction
• Acoustic shielding of vibrator (Espy-Wilson et al 1996)
• 2-input noise cancellation based on LMS algorithm (Espy-Wilson et al 1996)
• Single-input noise cancellation using spectral subtraction
  ▪ Averaging-based noise est. & pitch-synch. generalized spectral subtraction (Pandey et al 2002)
  ▪ Quantile-based noise estimation (Pandey et al 2004)
  ▪ Parameter adaptation using freq.-domain auditory masking (Liu et al 2006)
  ▪ Min.-statistics-based noise estimation (Mitra & Pandey 2006, Kabir et al 2008)
2 SPECTRAL SUBTRACTION
Noise generation (Pandey et al 2002)
• Leakage of vibrations produced by the vibrator membrane
• Improper coupling of vibrations to the neck tissue
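$s(n) = e(n) * h_v(n)$, $l(n) = e(n) * h_l(n)$, $x(n) = s(n) + l(n)$   (* : convolution)
$X_n(e^{j\omega}) = E_n(e^{j\omega})\,[H_{vn}(e^{j\omega}) + H_{ln}(e^{j\omega})]$
• Assuming $h_v(n)$ & $h_l(n)$ to be uncorrelated:
$|X_n(e^{j\omega})|^2 = |E_n(e^{j\omega})|^2\,[\,|H_{vn}(e^{j\omega})|^2 + |H_{ln}(e^{j\omega})|^2\,]$
• For short-time spectra calculated using a pitch-synchronous window, $|E_n(e^{j\omega})|^2$ may be considered a constant $|E(e^{j\omega})|^2$
• During non-speech intervals, $s(n)$ is negligible:
$|X_n(e^{j\omega})|^2 = |L_n(e^{j\omega})|^2 = |E(e^{j\omega})|^2\,|H_{ln}(e^{j\omega})|^2$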
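The signal model can be checked numerically. The NumPy sketch below uses a hypothetical impulse-train excitation and two made-up decaying impulse responses standing in for $h_v(n)$ and $h_l(n)$; it verifies the exact relation $X = E\,(H_v + H_l)$ and the non-speech case. It does not test the uncorrelatedness approximation, which holds only on average.

```python
import numpy as np

def mixture(e, hv, hl):
    # zero-pad the two (hypothetical) impulse responses to a common length
    L = max(len(hv), len(hl))
    hv = np.pad(hv, (0, L - len(hv)))
    hl = np.pad(hl, (0, L - len(hl)))
    s = np.convolve(e, hv)   # s(n) = e(n) * hv(n): vocal-tract path
    l = np.convolve(e, hl)   # l(n) = e(n) * hl(n): leakage path
    return s, l, s + l       # x(n) = s(n) + l(n)

rng = np.random.default_rng(0)
P = 80                                                     # assumed vibrator period (samples)
e = np.zeros(4 * P)
e[::P] = 1.0                                               # impulse-train excitation
hv = rng.standard_normal(30) * np.exp(-np.arange(30) / 8)  # toy vocal-tract response
hl = rng.standard_normal(20) * np.exp(-np.arange(20) / 4)  # toy leakage response
s, l, x = mixture(e, hv, hl)

M = 1024  # FFT length >= len(x), so circular convolution equals linear convolution
X = np.fft.fft(x, M)
E = np.fft.fft(e, M)
Hv, Hl = np.fft.fft(hv, M), np.fft.fft(hl, M)
print(np.allclose(X, E * (Hv + Hl)))   # True

_, _, x_ns = mixture(e, np.zeros_like(hv), hl)   # non-speech interval: s(n) = 0
print(np.allclose(np.abs(np.fft.fft(x_ns, M)) ** 2, np.abs(E) ** 2 * np.abs(Hl) ** 2))  # True
```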
Generalized spectral subtraction (Berouti et al 1979) using FFT
$E(k) = |X_n(k)|^{\gamma} - \alpha\,|L_n(k)|^{\gamma}$
Clean magnitude spectrum:
$|Y_n(k)| = \begin{cases} [E(k)]^{1/\gamma}, & \text{if } E(k) > [\beta\,|L_n(k)|]^{\gamma} \\ \beta\,|L_n(k)|, & \text{otherwise} \end{cases}$
(α: subtraction factor, β: spectral floor, γ: subtraction power)
$y_n(m) = \mathrm{IDFT}\,[\,|Y_n(k)|\,e^{j\theta_n(k)}\,]$
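The subtraction rule can be sketched for a single frame as follows. This is a minimal illustration, not the authors' implementation: it assumes a real-valued frame, a noise magnitude estimate already available on the `np.fft.rfft` grid, and resynthesis with the noisy phase; the function name is hypothetical.

```python
import numpy as np

def generalized_spectral_subtraction(x_frame, noise_mag, alpha=1.5, beta=0.001, gamma=1.0):
    """Generalized spectral subtraction on one analysis frame (sketch).

    x_frame   : real-valued frame of noisy speech
    noise_mag : estimated noise magnitude |L_n(k)| on the np.fft.rfft grid
    Returns the enhanced frame, resynthesized with the noisy phase.
    """
    X = np.fft.rfft(x_frame)
    E = np.abs(X) ** gamma - alpha * noise_mag ** gamma   # E(k) = |X_n(k)|^g - a |L_n(k)|^g
    floor = beta * noise_mag                              # spectral floor b |L_n(k)|
    # clip at zero before the 1/gamma power so the unused branch never yields NaN
    Y_mag = np.where(E > floor ** gamma, np.maximum(E, 0.0) ** (1.0 / gamma), floor)
    theta = np.angle(X)                                   # noisy phase
    return np.fft.irfft(Y_mag * np.exp(1j * theta), n=len(x_frame))

rng = np.random.default_rng(0)
frame = rng.standard_normal(160)              # e.g. two periods of 80 samples
noise = 0.5 * np.ones(len(frame) // 2 + 1)    # hypothetical flat noise estimate
y = generalized_spectral_subtraction(frame, noise, alpha=1.5, beta=0.001, gamma=1.0)
```

With γ = 1 this is magnitude subtraction; the α, β values reported under Results would be passed in the same way.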
Phase estimation
▪ Noisy phase: $\theta_n(k) = \angle X_n(k)$
▪ Zero phase: $\theta_n(k) = 0$
▪ Random phase: $\theta_n(k) = r$, with r uniformly distributed over [0, 2π]
▪ Minimum phase: iterative technique (Quatieri & Oppenheim 1981); cepstrum-based non-iterative calculation (Oppenheim & Schafer 1975, Rabiner & Schafer 1978, Yegnanarayana & Dhayalan 1983)
▪ Phase set for continuity across frames: $\theta_n(k) = \theta_{n-1}(k) + 2\pi n_d k / N$, where $n_d$ = window shift, N = FFT size
▪ Noisy phase resulted in better quality than the others
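The phase options above can be sketched as small functions. The names are hypothetical, and the minimum-phase routine is the textbook real-cepstrum folding method (a non-iterative technique of the kind cited), assuming an even FFT length and a magnitude spectrum of a real signal; it is not claimed to be the exact procedure used in the study.

```python
import numpy as np

def noisy_phase(X):
    return np.angle(X)                                  # theta_n(k) = angle of X_n(k)

def zero_phase(X):
    return np.zeros(X.shape)                            # theta_n(k) = 0

def random_phase(X, rng):
    return rng.uniform(0.0, 2.0 * np.pi, X.shape)       # theta_n(k) = r, r ~ U[0, 2*pi)

def continuity_phase(prev_theta, nd, N):
    k = np.arange(len(prev_theta))
    return prev_theta + 2.0 * np.pi * nd * k / N        # theta_{n-1}(k) + 2*pi*nd*k/N

def min_phase_spectrum(mag, eps=1e-12):
    """Cepstrum-based minimum-phase spectrum with magnitude `mag`.

    mag : full-length magnitude spectrum (length N, N even) of a real signal,
          i.e. symmetric so that mag[k] == mag[N - k].
    """
    N = len(mag)
    c = np.fft.ifft(np.log(np.maximum(mag, eps))).real  # real cepstrum
    fold = np.zeros(N)
    fold[0] = c[0]                                      # keep c(0)
    fold[1:N // 2] = 2.0 * c[1:N // 2]                  # double positive quefrencies
    fold[N // 2] = c[N // 2]                            # keep the middle sample
    return np.exp(np.fft.fft(fold))                     # minimum-phase spectrum

rng = np.random.default_rng(0)
mag = np.abs(np.fft.fft(rng.standard_normal(64)))
theta_min = np.angle(min_phase_spectrum(mag))           # minimum-phase estimate of theta_n(k)
```

The folded cepstrum keeps the magnitude unchanged, since the real part of its FFT is the FFT of its even part, which is the original real cepstrum.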
[Figure: Block diagram of spectral subtraction]
3 ESTIMATION OF NOISE SPECTRUM
• Noise varies with changes in electrolarynx orientation
• Voice activity detection is difficult in electrolaryngeal speech
• Averaging-based noise est. (Pandey et al 2002): unsuitable for long-term use
• Quantile-based noise est. (Stahl et al 2000), used for electrolaryngeal speech (Pandey et al 2004): difficult to implement for real-time processing
• Minimum-statistics-based method (Martin 1994), used for electrolaryngeal speech (Mitra & Pandey 2006, Kabir et al 2008): not effective with fixed subtraction parameters
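A causal estimate from a set of past frames can be sketched as a per-bin quantile over a sliding buffer of magnitude spectra: q = 0.5 gives a median-based estimate and q = 0 the plain minimum. This is a simplified, hypothetical sketch; the buffer length is an assumption, and Martin's minimum-statistics method additionally uses recursive smoothing and bias compensation, which are not shown.

```python
import numpy as np
from collections import deque

class PastFrameNoiseEstimator:
    """Hypothetical sketch: per-bin quantile of the last n_frames magnitude spectra."""

    def __init__(self, n_frames=50, q=0.5):
        self.buf = deque(maxlen=n_frames)   # oldest frame drops out automatically
        self.q = q                          # 0.5 -> median, 0.0 -> minimum

    def update(self, frame_mag):
        self.buf.append(np.asarray(frame_mag, dtype=float))
        # quantile across frames (axis 0), separately for each frequency bin
        return np.quantile(np.stack(list(self.buf)), self.q, axis=0)

est = PastFrameNoiseEstimator(n_frames=3, q=0.5)
for v in [1.0, 5.0, 3.0]:
    noise_est = est.update(np.full(4, v))   # median of {1, 5, 3} in every bin
```

Because only a fixed number of past frames is kept, the cost per frame is bounded, unlike a quantile taken over the whole utterance.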
4 INTRODUCTION OF JITTER & SHIMMER, & SPECTRAL COMPENSATION
• Introduction of jitter and shimmer, using LPC-based analysis-synthesis after spectral subtraction, for reducing unnaturalness
• Spectral compensation for the low-frequency spectral deficit
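LPC analysis-synthesis can be sketched as autocorrelation-method LPC (Levinson-Durbin recursion) followed by all-pole filtering of a new excitation. The function names, the order, and the frame handling are assumptions for illustration; framing, windowing, and interpolation of coefficients between frames are omitted.

```python
import numpy as np

def lpc(frame, order):
    """Autocorrelation-method LPC via Levinson-Durbin.
    Returns A(z) coefficients a (a[0] = 1) and gain g (sqrt of prediction-error energy)."""
    L = len(frame)
    r = np.correlate(frame, frame, mode="full")[L - 1:L + order]  # lags 0..order
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                              # reflection coefficient
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]         # update previous coefficients
        a[i] = k
        err *= 1.0 - k * k
    return a, np.sqrt(err)

def lpc_synthesize(excitation, a, g):
    """All-pole synthesis: y(n) = g u(n) - sum_j a_j y(n - j)."""
    p = len(a) - 1
    y = np.zeros(len(excitation))
    for n in range(len(excitation)):
        acc = g * excitation[n]
        for j in range(1, min(p, n) + 1):
            acc -= a[j] * y[n - j]
        y[n] = acc
    return y

a_true = np.array([1.0, -0.9, 0.4])     # toy stable all-pole model
imp = np.zeros(400)
imp[0] = 1.0
h = lpc_synthesize(imp, a_true, 1.0)    # impulse response of 1/A(z)
a_est, g_est = lpc(h, 2)                # analysis recovers A(z) and the gain
```

In the scheme described, the excitation passed to `lpc_synthesize` would be an impulse train at the vibrator frequency instead of the residual.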
Implementation of shimmer
Impulse amplitude = $a(1 + s\,r_1)$
 a = mean amplitude, s = peak-to-peak shimmer, $r_1$ = random number uniformly distributed over ±0.5
Implementation of jitter
Impulse repetition period = $N(1 + j\,r_2)$
 N = mean pitch period in number of samples, j = peak-to-peak jitter, $r_2$ = random number uniformly distributed over ±0.5
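The two formulas translate directly into an excitation generator. In this sketch (hypothetical function name), each period is rounded to an integer number of samples, which is an assumption, and $r_1$, $r_2$ are drawn independently for every impulse.

```python
import numpy as np

def jittered_impulse_train(n_samples, N, a, j, s, rng):
    """Impulse train with mean period N samples and mean amplitude a.
    amplitude = a (1 + s r1), period = N (1 + j r2), with r1, r2 ~ U[-0.5, 0.5)."""
    u = np.zeros(n_samples)
    t = 0
    while t < n_samples:
        r1, r2 = rng.uniform(-0.5, 0.5, size=2)
        u[t] = a * (1.0 + s * r1)                        # shimmer: amplitude perturbation
        t += max(1, int(round(N * (1.0 + j * r2))))      # jitter: period perturbation
    return u

rng = np.random.default_rng(1)
# 6 % peak-to-peak jitter, 10 % peak-to-peak shimmer, mean period 80 samples
u = jittered_impulse_train(8000, 80, 1.0, 0.06, 0.1, rng)
pulses = np.flatnonzero(u)
periods = np.diff(pulses)
```

With N = 80 and j = 0.06 every period falls between 78 and 82 samples, i.e. a peak-to-peak spread of 6 % around the mean.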
Spectral compensation
• Low-frequency spectral deficit in electrolaryngeal speech
• High-frequency spectral emphasis in resynthesized speech due to impulse-train excitation in LPC analysis-synthesis
• Spectral compensation filter designed by comparing LPC-smoothed spectra of natural and resynthesized /a/, /i/, /u/, and inserted in the excitation path
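One way to turn the comparison into an FIR filter is frequency sampling: average the log ratio of natural to resynthesized LPC envelopes over the vowels, take the inverse FFT of the resulting zero-phase target, center it, truncate, and window it. The averaging rule, tap count, and Hamming window below are assumptions, not details given in the source.

```python
import numpy as np

def lpc_envelope(a, g, n_fft=512):
    """LPC-smoothed magnitude spectrum g / |A(e^jw)| on the rfft grid."""
    return g / np.abs(np.fft.rfft(a, n_fft))

def compensation_fir(env_natural, env_resynth, n_taps=65):
    """Hypothetical frequency-sampling design of the compensation filter.
    env_natural, env_resynth : arrays of shape (n_vowels, n_bins) of LPC envelopes."""
    log_ratio = np.mean(np.log(env_natural) - np.log(env_resynth), axis=0)
    target = np.exp(log_ratio)                   # desired magnitude response
    n_fft = 2 * (len(target) - 1)
    h = np.fft.irfft(target, n_fft)              # zero-phase impulse response
    h = np.roll(h, n_taps // 2)[:n_taps]         # center it and keep n_taps samples
    return h * np.hamming(n_taps)                # taper the truncation

# sanity case: identical envelopes should give a (delayed) unit impulse
env = lpc_envelope(np.array([1.0, -0.9, 0.4]), 1.0)
h_comp = compensation_fir(np.stack([env, env]), np.stack([env, env]))
```

Averaging in the log domain (a geometric mean) keeps one vowel with large envelope values from dominating the correction.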
5 RESULTS
Material: “…Where were you a year ago? 1 2 3 4 5 6 7 8 9 10”; electrolarynx: Solatone
Subtraction parameters (γ = 1):
• Averaging: α = 10, β = 0.001
• Min.: α = 25, β = 0.005
• Median: α = 1.5, β = 0.001
[Figure: Electrolaryngeal speech, and enhanced electrolaryngeal speech after spectral subtraction with MBNE (α = 1.2, β = 0.001, γ = 1). Material: “…Where were you a year ago?”; electrolarynx: Solatone]
6 CONCLUSION
▪ Median-based noise estimation could be used for noise suppression without varying the oversubtraction factor.
▪ Phase estimation based on minimum phase and phase continuity did not improve the quality above that of the noisy speech.
▪ Introduction of shimmer did not improve speech quality.
▪ Introduction of peak-to-peak jitter of up to 6 % and spectral compensation helped in improving the quality.
P. C. Pandey, S. K. Basha, “Enhancement of electrolaryngeal speech by spectral subtraction, spectral compensation, and introduction of jitter and shimmer”, Proc. 20th International Congress on Acoustics (ICA 2010), 23-27 August 2010, Sydney, Australia.

Abstract -- An electrolarynx, a verbal communication aid used by laryngectomy patients, is a vibrator held against the neck tissue to provide excitation to the vocal tract as a substitute for that provided by the glottal vibrations. Although the user can set the vibration level and pitch, dynamic control of level, voicing, and pitch during speech production is not feasible. In addition to this basic limitation, electrolaryngeal speech suffers from (i) background noise caused by leakage of acoustic energy from the vibrator and the vibrator-tissue interface, (ii) low-frequency spectral deficiency, and (iii) unnatural quality due to constant pitch and level. Background noise decreases the intelligibility, while the other two factors affect the speech quality. The present study investigated methods for improving the intelligibility and quality of electrolaryngeal speech. Pitch-synchronous generalized spectral subtraction was used for reducing the background noise. In order to track the variation in the spectrum of the leakage noise due to changes in vibrator orientation and pressure during speech production, the noise was estimated dynamically from a set of past frames. The estimated noise spectrum was subtracted from that of the noisy speech, and the resulting magnitude spectrum was combined with the original phase spectrum. The speech signal was resynthesized using the overlap-add method, with two-pitch-period analysis frames and one-period overlap. Estimation of the phase spectrum by the minimum-phase assumption and by the assumption of phase continuity did not improve the speech quality.
The introduction of jitter and shimmer into the speech signal, using LPC-based analysis-synthesis, was investigated for improving its naturalness. The excitation for synthesis was an impulse train with frequency equal to that of the vibrator, with random frequency and amplitude modulations providing the jitter and the shimmer, respectively. FIR filtering of the excitation was used to match the long-term average spectral envelope of the processed electrolaryngeal speech to that of normal speech. A peak-to-peak jitter of up to 6 % increased the naturalness, while the introduction of shimmer decreased the quality.
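The framing described in the abstract (analysis frames of two pitch periods, shifted by one period) can be sketched as a windowed overlap-add loop. Assumptions: a fixed period P in samples (reasonable for an electrolarynx with constant pitch) and a periodic Hann window, chosen because two copies shifted by P sum to one, so identity processing reconstructs the signal; the source does not state the window. The per-frame spectral processing would be supplied as `process`.

```python
import numpy as np

def pitch_sync_ola(x, P, process=lambda frame: frame):
    """Overlap-add with 2P-sample frames and a hop of P samples (one-period overlap)."""
    L = 2 * P
    n = np.arange(L)
    w = 0.5 - 0.5 * np.cos(2.0 * np.pi * n / L)   # periodic Hann: w[n] + w[n + P] = 1
    y = np.zeros(len(x) + L)
    for start in range(0, len(x) - L + 1, P):
        y[start:start + L] += process(w * x[start:start + L])
    return y[:len(x)]

rng = np.random.default_rng(0)
x = rng.standard_normal(1000)
y = pitch_sync_ola(x, 80)   # identity processing
```

Samples covered by two frames (here indices 80 to 879) are reconstructed exactly; only the first and last period, covered by a single frame, are attenuated.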
REFERENCES
1 M. Weiss, G. Y. Komshian, and J. Heinz, “Acoustic and perceptual characteristics of speech produced with an electronic artificial larynx,” J. Acoust. Soc. Am., 65, 1298-1308 (1979).
2 H. L. Barney, F. E. Haworth, and H. K. Dunn, “An experimental transistorized artificial larynx,” Bell System Tech. J., 38, 1337-1356 (1959).
3 Q. Yingyong and B. Weinberg, “Low frequency energy deficit in electrolaryngeal speech,” J. Speech Hearing Res., 34, 1250-1256 (1991).
4 C. Y. Espy-Wilson, V. R. Chari, and C. B. Haung, “Enhancement of alaryngeal speech by adaptive filtering,” Proc. ICSLP, 764-771 (1996).
5 P. C. Pandey, S. M. Bhandarkar, G. K. Baccher, and P. K. Lehana, “Enhancement of alaryngeal speech using spectral subtraction,” Proc. 14th Int. Conf. Digital Signal Processing (DSP 2002), Santorini, Greece, 591-594 (2002).
6 P. C. Pandey, S. S. Pratapwar, and P. K. Lehana, “Enhancement of electrolaryngeal speech by reducing leakage noise using spectral subtraction with quantile based dynamic estimation of noise,” Proc. 18th Int. Congress Acoustics (ICA 2004), Kyoto, Japan, 3029-3032 (2004).
7 H. Liu, Q. Zhao, M. Wan, and S. Wang, “Application of spectral subtraction method on enhancement of electrolaryngeal speech,” J. Acoust. Soc. Am., 120, 398-406 (2006).
8 H. Liu, Q. Zhao, M. Wan, and S. Wang, “Enhancement of electrolarynx speech based on auditory masking,” IEEE Trans. Biomed. Eng., 53, 865-874 (2006).
9 P. Mitra and P. C. Pandey, “Enhancement of electrolaryngeal speech by spectral subtraction with minimum statistics-based noise estimation,” J. Acoust. Soc. Am., 120, 3039 (2006).
10 R. Kabir, A. Greenblatt, K. Panetta, and S. Agaian, “Enhancement of alaryngeal speech utilizing spectral subtraction and minimum statistics,” Proc. 7th International Conference on Machine Learning and Cybernetics, Kunming, 12-15 July (2008).
11 S. F. Boll, “Suppression of acoustic noise in speech using spectral subtraction,” IEEE Trans. Acoust., Speech, Signal Process., 27, 113-120 (1979).
12 M. Berouti, R. Schwartz, and J. Makhoul, “Enhancement of speech corrupted by acoustic noise,” Proc. IEEE ICASSP ’79, 208-211 (1979).
13 V. Stahl, A. Fisher, and R. Bipus, “Quantile based noise estimation for spectral subtraction and Wiener filtering,” Proc. IEEE ICASSP ’00, 3, 1875-1878 (2000).
14 R. Martin, “Spectral subtraction based on minimum statistics,” Proc. 7th European Signal Processing Conf. (EUSIPCO-94), Edinburgh, Scotland, 1182-1185 (1994).
15 T. F. Quatieri and A. V. Oppenheim, “Iterative techniques for minimum phase signal reconstruction from phase or magnitude,” IEEE Trans. Acoust., Speech, Signal Process., 29, 1187-1193 (1981).
16 B. Yegnanarayana and A. Dhayalan, “Noniterative techniques for minimum phase signal reconstruction from phase or magnitude,” Proc. IEEE ICASSP, 639-642 (1983).
17 A. V. Oppenheim and R. W. Schafer, Digital Signal Processing (Prentice-Hall, Englewood Cliffs, New Jersey, 1975).
18 L. R. Rabiner and R. W. Schafer, Digital Processing of Speech Signals (Prentice-Hall, Englewood Cliffs, New Jersey, 1978).