• On improving the intelligibility of synchronized over-lap-and-add (SOLA) at low TSM factor. Wong, P.H.W.; Au, O.C.; Wong, J.W.C.; Lau, W.H.B. TENCON '97. IEEE Region 10 Annual Conference. • Fast time scale modification using envelope-matching technique (EM-TSM). Wong, J.W.C.; Au, O.C.; Wong, P.H.W. Circuits and Systems, 1998. ISCAS '98.
Outline • Introduction • Review of Synchronized Overlap-and-Add (SOLA) • Modified SOLA • Simulation and Results • Conclusions
Introduction • TSM (Time-Scale Modification) • to change the time scale of a signal • to make degraded speech more intelligible • TSM factor α • α = 1 : the signal is unchanged • α > 1 : the signal is time expanded • α < 1 : the signal is time compressed
Introduction • TSM algorithms • time domain techniques • OLA, SOLA …. • frequency domain techniques • LSEE_MSTFTM (Least Square Error Estimation from Modified Short Time Fourier Transform Magnitude)
Introduction • SOLA • based on OLA, which simply overlaps and adds adjacent frames • unlike OLA, overlaps each frame only at the point of highest similarity between the two overlapping frames.
Review of SOLA • x[n] : the analysis signal (input) • segmented into frames spaced Sa apart • y[n] : the synthesis signal (output) • segmented into frames spaced Ss apart
Review of SOLA • Ss = Sa × α • [Figure: frames of x[n] starting at 0, Sa, 2Sa, 3Sa are placed in y[n] near 0, Ss, 2Ss, 3Ss, each within a search range from kmin to kmax]
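The relation Ss = Sa × α fixes where each frame nominally lands in the output before any SOLA shift is applied. A minimal sketch, assuming hops measured in whole samples; the function name `frame_positions` and the example values are hypothetical:

```python
def frame_positions(num_frames, sa, alpha):
    """Return (analysis_starts, synthesis_starts) for Ss = Sa * alpha."""
    ss = round(sa * alpha)  # synthesis hop, rounded to whole samples (assumption)
    analysis = [m * sa for m in range(num_frames)]
    synthesis = [m * ss for m in range(num_frames)]
    return analysis, synthesis


# Time compression by a factor of two: frames 100 samples apart in the
# input are placed 50 samples apart in the output.
analysis, synthesis = frame_positions(num_frames=4, sa=100, alpha=0.5)
print(analysis)   # [0, 100, 200, 300]
print(synthesis)  # [0, 50, 100, 150]
```

With α > 1 the same call spreads the frames further apart, which corresponds to time expansion.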
Review of SOLA • The normalized cross-correlation function
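One common form of the normalized cross-correlation used in SOLA is sketched below. It is stated as an assumption rather than as the authors' exact definition: L (the number of overlapping samples) and the frame index m are symbols introduced here, while x, y, Sa, Ss, kmin and kmax are the quantities from the preceding slides.

```latex
R_m[k] = \frac{\sum_{j=0}^{L-1} y[mS_s + k + j]\, x[mS_a + j]}
              {\sqrt{\sum_{j=0}^{L-1} y^2[mS_s + k + j] \;
                     \sum_{j=0}^{L-1} x^2[mS_a + j]}},
\qquad k_{\min} \le k \le k_{\max}
```

Under this form, frame m is overlap-added at the offset k that maximizes R_m[k] within the search range.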
On improving the intelligibility of synchronized over-lap-and-add (SOLA) at low TSM factor
Modified SOLA for small TSM factor • Use a time-varying TSM factor α(t) rather than a fixed constant α. • α should be small when adjacent analysis frames are very similar and large when they are less similar. • In addition, remove silent frames, which are characterized by very low frame energy. • Use the cross-correlation as a check.
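Silent-frame removal by frame energy can be sketched as follows. The slides do not give the energy measure or the threshold, so the mean squared amplitude and the value `1e-4` are assumptions, as are the function names:

```python
def frame_energy(frame):
    """Mean squared amplitude of one frame (assumed energy measure)."""
    return sum(s * s for s in frame) / len(frame)


def drop_silent_frames(frames, energy_threshold=1e-4):
    """Keep only frames whose energy exceeds the (assumed) threshold."""
    return [f for f in frames if frame_energy(f) > energy_threshold]


frames = [
    [0.0, 0.001, -0.001, 0.0],  # near-silent frame
    [0.5, -0.4, 0.3, -0.2],     # active frame
]
kept = drop_silent_frames(frames)
print(len(kept))  # 1
```

In practice the threshold would be tuned to the recording level and noise floor of the speech material.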
Modified SOLA for small TSM factor 1. Every frame is tested for silence, and silent frames are discarded. 2. All non-silent frames are initially assumed to be vowel-like and use a smaller-than-target TSM factor. 3. If the cross-correlation exceeds 0.9 anywhere within the search range, the frame is confirmed to be vowel-like, and the first peak above 0.9 is taken as the optimal position.
Modified SOLA for small TSM factor 4. If the cross-correlation does not exceed 0.9 anywhere in the search range, the frame is considered a transient frame and a larger TSM factor is used. The search range is extended to cover the range for the larger TSM factor, and the search continues.
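The vowel-like versus transient decision in steps 3 and 4 can be sketched as below. Only the 0.9 threshold and the "first peak above 0.9" rule come from the slides. The function names, the list-based signal representation, and the treatment of out-of-range offsets are assumptions, and the extended search with a larger TSM factor for transient frames is left to the caller.

```python
import math


def normalized_xcorr(a, b):
    """Normalized cross-correlation of two equal-length sequences."""
    num = sum(p * q for p, q in zip(a, b))
    den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
    return num / den if den > 0 else 0.0


def search_offsets(y, pos, frame, k_min, k_max, overlap):
    """Correlate the head of `frame` with y at each offset k in the range."""
    corr = []
    for k in range(k_min, k_max + 1):
        start = pos + k
        seg = y[start:start + overlap] if start >= 0 else []
        if len(seg) == overlap:
            corr.append(normalized_xcorr(seg, frame[:overlap]))
        else:
            corr.append(0.0)  # offset falls outside the output built so far
    return corr


def first_peak_above(corr, threshold=0.9):
    """Index of the first local maximum above the threshold, or None."""
    for i, c in enumerate(corr):
        left = corr[i - 1] if i > 0 else float("-inf")
        right = corr[i + 1] if i + 1 < len(corr) else float("-inf")
        if c > threshold and c >= left and c >= right:
            return i
    return None


def classify_frame(y, pos, frame, k_min, k_max, overlap, threshold=0.9):
    """Return ("vowel-like", k) or ("transient", None) for one frame."""
    corr = search_offsets(y, pos, frame, k_min, k_max, overlap)
    i = first_peak_above(corr, threshold)
    if i is None:
        return "transient", None
    return "vowel-like", k_min + i


# A periodic (vowel-like) output and a frame that lines up at k = -2.
y = [math.sin(2 * math.pi * n / 8) for n in range(64)]
voiced = [math.sin(2 * math.pi * (n + 2) / 8) for n in range(16)]
print(classify_frame(y, 20, voiced, -4, 4, 8))   # ('vowel-like', -2)

# A frame that never correlates well with the output is a transient.
burst = [(-1.0) ** n for n in range(16)]
print(classify_frame(y, 20, burst, -4, 4, 8))    # ('transient', None)
```

Stopping at the first peak above the threshold, instead of scanning for the global maximum, lets the search end early for vowel-like frames.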
Fast time scale modification using envelope-matching technique (EM-TSM)
Simulation and Results • The mean square difference • a smaller mean square difference indicates better quality.
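A mean square difference between two signals can be computed as in the sketch below. The slides do not say which pair of signals is compared, so the equal-length inputs `a` and `b` and the function name are assumptions:

```python
def mean_square_difference(a, b):
    """Average of squared sample differences between equal-length signals."""
    if len(a) != len(b):
        raise ValueError("signals must have the same length")
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


# Only the last sample differs (by 2), so the result is 4/3.
print(mean_square_difference([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```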
Conclusions • Using a time-varying TSM factor rather than a constant one improves intelligibility when the TSM factor is small. • With the fast technique for measuring signal similarity, a speed-up factor on the order of 10² can be obtained with very good speech quality.