Chapter 5 Homomorphic Processing (1)

neil
Presentation Transcript


  1. Chapter 5 Homomorphic Processing (1)
     • In the frequency domain, $X(z) = E(z)\,V(z)$ (ignoring $R(z)$).
     • In the time domain, $x(n)$ is the convolution of the excitation $e(n)$ with the unit-sample response $v(n)$ of the system: $x(n) = e(n) * v(n)$.
     • Recovering $e(n)$ and $v(n)$ from $x(n)$ is the central problem. An algorithm that does this is called a deconvolution algorithm. There are two categories: parametric deconvolution and non-parametric deconvolution.

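  2. 5.1 Principle of Homomorphic Processing (1)
     • The general system for homomorphic processing is as follows:
       $x_1(n) \triangle x_2(n) \rightarrow D_\triangle[\cdot] \rightarrow \hat{x}_1(n) + \hat{x}_2(n) \rightarrow L[\cdot] \rightarrow \hat{y}_1(n) + \hat{y}_2(n) \rightarrow D_\triangle^{-1}[\cdot] \rightarrow y_1(n) \triangle y_2(n)$
     • Here $\triangle$ is an operation (multiplication or convolution). $D_\triangle[\cdot]$ is called the characteristic system; it turns $x_1(n) \triangle x_2(n)$ into $\hat{x}_1(n) + \hat{x}_2(n)$. $L[\cdot]$ is a linear system whose output is $\hat{y}_1(n) + \hat{y}_2(n)$. $D_\triangle^{-1}[\cdot]$ produces $y_1(n) \triangle y_2(n)$.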
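
The three-stage structure is easiest to see for the multiplicative case, where the characteristic system is just a logarithm. The following is a minimal sketch under that assumption; the function names and the choice of linear system (`halve`) are hypothetical and only serve the illustration.

```python
import numpy as np

def multiplicative_homomorphic(x, linear_op):
    """Characteristic system for multiplication: log -> linear system -> exp."""
    x_hat = np.log(x)          # D[.]: products become sums (requires x > 0)
    y_hat = linear_op(x_hat)   # L[.]: any linear operation on the sum
    return np.exp(y_hat)       # inverse of D[.]: sums become products again

def halve(s):
    # Hypothetical linear system, chosen only for illustration
    return 0.5 * s

x1 = np.array([1.0, 2.0, 4.0])
x2 = np.array([3.0, 0.5, 2.0])

y = multiplicative_homomorphic(x1 * x2, halve)
y1 = multiplicative_homomorphic(x1, halve)
y2 = multiplicative_homomorphic(x2, halve)

# The output of the combined input is the product of the individual outputs
print(np.allclose(y, y1 * y2))
```

With this particular linear system each output is the square root of its input, so the product structure of the input carries through to the output.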

  3. Principle of Homomorphic Processing (2)
     • Only the homomorphic system for convolution is discussed here.
     • $D_*[\cdot]$:
       $X(z) = Z[x(n)] = \sum_{n=N_1}^{N_2} x(n)\, z^{-n}$
       $\hat{X}(z) = \ln[X(z)]$
       $\hat{x}(n) = Z^{-1}[\hat{X}(z)] = \frac{1}{2\pi j} \oint \ln[X(z)]\, z^{n-1}\, dz$
     • $L[\cdot]$: $\hat{y}(n) = L[\hat{x}(n)]$

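  4. Principle of Homomorphic Processing (3)
     • $D_*^{-1}[\cdot]$:
       $\hat{Y}(z) = Z[\hat{y}(n)] = \sum_{n=-\infty}^{\infty} \hat{y}(n)\, z^{-n}$
       $Y(z) = \exp[\hat{Y}(z)]$
       $y(n) = Z^{-1}[Y(z)] = \frac{1}{2\pi j} \oint \exp[\hat{Y}(z)]\, z^{n-1}\, dz$
     • If $x(n) = x_1(n) * x_2(n)$, then $\hat{x}(n) = \hat{x}_1(n) + \hat{x}_2(n)$ and $y(n) = y_1(n) * y_2(n)$.
     • $\hat{x}(n)$ is called the complex cepstrum of $x(n)$.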

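  5. Principle of Homomorphic Processing (4)
     • In most cases the regions of convergence of $X(z)$, $\hat{X}(z)$, $Y(z)$ and $\hat{Y}(z)$ include the unit circle, so the DTFT can replace the Z transform:
       $X(e^{j\omega}) = F[x(n)] = \sum_{n} x(n)\, e^{-j\omega n}$
       $\hat{X}(e^{j\omega}) = \ln[X(e^{j\omega})]$
       $\hat{x}(n) = \frac{1}{2\pi} \int_{-\pi}^{\pi} \hat{X}(e^{j\omega})\, e^{j\omega n}\, d\omega$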

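  6. Principle of Homomorphic Processing (5)
     • The inverse characteristic system in DTFT form:
       $\hat{Y}(e^{j\omega}) = F[\hat{y}(n)] = \sum_{n} \hat{y}(n)\, e^{-j\omega n}$
       $Y(e^{j\omega}) = \exp[\hat{Y}(e^{j\omega})]$
       $y(n) = \frac{1}{2\pi} \int_{-\pi}^{\pi} Y(e^{j\omega})\, e^{j\omega n}\, d\omega$
     • Some properties:
       $\hat{X}(e^{j\omega}) = \sum_{n} \hat{x}(n)\, e^{-j\omega n}$
       $\hat{Y}(e^{j\omega}) = \sum_{n} \hat{y}(n)\, e^{-j\omega n}$
     • If $x(n)$ is a real sequence, $\hat{x}(n)$ is also a real sequence.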

  7. Principle of Homomorphic Processing (6)
     • How is deconvolution done?
     • If $x(n) = x_1(n) * x_2(n)$ in the discrete time domain, then $\hat{x}(n) = \hat{x}_1(n) + \hat{x}_2(n)$ in the complex cepstrum domain.
     • Suppose $\hat{x}_1(n)$ is zero outside $[n_1, n_2]$, $\hat{x}_2(n)$ is zero outside $[n_3, n_4]$, and the two intervals do not overlap. Then a properly designed $L[\cdot]$ can separate $\hat{x}_1(n)$: take $L[\cdot]$ to be a rectangular window over $[n_1, n_2]$. For this reason $L[\cdot]$ is called a lifter.

  8. Principle of Homomorphic Processing (7)
     • Another homomorphic processing system:
       $X(e^{j\omega}) = F[x(n)] = \sum_{n} x(n)\, e^{-j\omega n}$
       $C(e^{j\omega}) = \ln|X(e^{j\omega})|$
       $c(n) = \frac{1}{2\pi} \int_{-\pi}^{\pi} C(e^{j\omega})\, e^{j\omega n}\, d\omega$
     • The inverse transformation is the same as before. The only difference is that $\ln[X(e^{j\omega})]$ is replaced with $\ln|X(e^{j\omega})|$. $c(n)$ is called the cepstrum.

  9. Principle of Homomorphic Processing (8)
     • If $c_1(n)$ and $c_2(n)$ are the cepstra of $x_1(n)$ and $x_2(n)$, and $x(n) = x_1(n) * x_2(n)$, then the cepstrum of $x(n)$ is $c(n) = c_1(n) + c_2(n)$.
     • The difference from the complex cepstrum is that the cepstrum discards the phase of $X(e^{j\omega})$, so passing $x(n)$ through the forward and inverse transformations no longer returns $x(n)$ itself.
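
Both statements on this slide can be checked numerically. The sketch below uses the FFT as a stand-in for the DTFT; the two short sequences and the helper names are assumptions made for illustration.

```python
import numpy as np

def real_cepstrum(x, n_fft):
    """Cepstrum c(n): inverse DFT of ln|X(k)|, with x zero-padded to n_fft."""
    spectrum = np.fft.fft(x, n_fft)
    return np.fft.ifft(np.log(np.abs(spectrum))).real

def inverse_from_cepstrum(c):
    """Inverse system applied to c(n): only |X(k)| is restored, not the phase."""
    magnitude = np.exp(np.fft.fft(c).real)
    return np.fft.ifft(magnitude).real

# Illustrative sequences (assumed); neither has zeros on the unit circle
x1 = np.array([1.0, 0.5, 0.25])
x2 = np.array([1.0, -0.4])
x = np.convolve(x1, x2)

N = 64  # long enough that circular and linear convolution agree
c1 = real_cepstrum(x1, N)
c2 = real_cepstrum(x2, N)
c = real_cepstrum(x, N)

x_padded = np.zeros(N)
x_padded[: len(x)] = x
x_back = inverse_from_cepstrum(c)

print(np.allclose(c, c1 + c2))         # convolution became addition
print(np.allclose(x_back, x_padded))   # the round trip does not return x(n)
```

The round trip returns a zero-phase sequence with the same magnitude spectrum as $x(n)$, which is why the cepstrum alone cannot restore the original signal.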

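  10. Principle of Homomorphic Processing (9)
     • $c(n)$ can be obtained from $\hat{x}(n)$.
     • Split $\hat{x}(n)$ into even and odd parts: $\hat{x}(n) = \hat{x}_e(n) + \hat{x}_o(n)$, where
       $\hat{x}_e(n) = \hat{x}_e(-n)$, $\hat{x}_o(n) = -\hat{x}_o(-n)$
       $\hat{x}_e(n) = [\hat{x}(n) + \hat{x}(-n)]/2$
       $\hat{x}_o(n) = [\hat{x}(n) - \hat{x}(-n)]/2$
     • For real $\hat{x}(n)$, the DTFT of the even part $\hat{x}_e(n)$ is the real part of $\hat{X}(e^{j\omega})$, which is $\ln|X(e^{j\omega})| = C(e^{j\omega})$. Therefore $c(n) = \hat{x}_e(n) = [\hat{x}(n) + \hat{x}(-n)]/2$.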
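
A small numerical check of the even-part relation, as a sketch. It assumes the two-point sequence $x(n) = \{1, a\}$ with $|a| < 1$, whose complex cepstrum follows from the series $\ln(1 + a z^{-1}) = \sum_{n \ge 1} (-1)^{n+1} a^n z^{-n} / n$ and is zero for $n \le 0$. The cepstrum is computed with the FFT and compared against half of that closed form.

```python
import numpy as np

a = 0.5    # assumed coefficient, |a| < 1
N = 1024   # FFT length, large enough that cepstral aliasing is negligible

x = np.zeros(N)
x[0], x[1] = 1.0, a

# Cepstrum c(n) from the log magnitude spectrum
c = np.fft.ifft(np.log(np.abs(np.fft.fft(x)))).real

# Closed-form complex cepstrum of {1, a} for n = 1..5 (it is zero for n <= 0)
n = np.arange(1, 6)
x_hat = (-1.0) ** (n + 1) * a ** n / n

# Even part: c(n) = [x_hat(n) + x_hat(-n)] / 2 = x_hat(|n|) / 2 for n != 0
expected = x_hat / 2

print(np.allclose(c[1:6], expected))         # positive quefrencies
print(np.allclose(c[-5:][::-1], expected))   # negative quefrencies, stored at N - n
```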

  11. 5.3 Practical Algorithms for Finding the Complex Cepstrum and Cepstrum (1)
     • Computing $\hat{x}(n)$ directly from $X(z)$ requires solving high-order algebraic equations, so it is not practical.
     • The DTFT formulas can be used instead, but on a computer they have to be evaluated with the DFT or FFT.
     • Suppose $x(n)$ has finite length and is zero-padded to cover $[0, N-1]$.

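  12. Practical Algorithms for Finding the Complex Cepstrum and Cepstrum (2)
     • $X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi n k / N}$, $k = 0, \dots, N-1$
     • $\hat{X}(k) = \ln X(k)$, $k = 0, \dots, N-1$
     • or $C(k) = \ln|X(k)|$, $k = 0, \dots, N-1$
     • $\hat{x}(n) = \frac{1}{N} \sum_{k=0}^{N-1} \hat{X}(k)\, e^{j 2\pi n k / N}$, $n = 0, \dots, N-1$
     • $c(n) = \frac{1}{N} \sum_{k=0}^{N-1} C(k)\, e^{j 2\pi n k / N}$, $n = 0, \dots, N-1$
     • To avoid aliasing, choose $N > 2\max\{n_a, n_b\}$.
     • See Fig. 4-5 on page 57 for the system.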
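
The DFT recipe above can be sketched in a few lines with NumPy. This is a minimal sketch, not a general implementation: the complex logarithm needs a continuous phase, which is obtained here with `np.unwrap`, and the code assumes the unwrapped phase has no linear-phase term to remove (true for the minimum-phase test sequences chosen below). Function names and test sequences are assumptions for illustration.

```python
import numpy as np

def complex_cepstrum(x, n_fft):
    """Complex cepstrum via the DFT: inverse DFT of ln|X(k)| + j*arg X(k)."""
    spectrum = np.fft.fft(x, n_fft)
    log_mag = np.log(np.abs(spectrum))
    phase = np.unwrap(np.angle(spectrum))  # continuous phase; no linear-phase removal
    return np.fft.ifft(log_mag + 1j * phase).real

def inverse_complex_cepstrum(x_hat):
    """Inverse characteristic system: DFT, exp, inverse DFT."""
    return np.fft.ifft(np.exp(np.fft.fft(x_hat))).real

def real_cepstrum(x, n_fft):
    """Cepstrum via the DFT: inverse DFT of ln|X(k)|."""
    return np.fft.ifft(np.log(np.abs(np.fft.fft(x, n_fft)))).real

# Minimum-phase test sequences (assumed for illustration)
x1 = np.array([1.0, 0.5])
x2 = np.array([1.0, -0.3, 0.1])
x = np.convolve(x1, x2)

N = 1024  # large N keeps cepstral aliasing negligible
x_hat1 = complex_cepstrum(x1, N)
x_hat2 = complex_cepstrum(x2, N)
x_hat = complex_cepstrum(x, N)

x_back = inverse_complex_cepstrum(x_hat)
c = real_cepstrum(x, N)
even_part = 0.5 * (x_hat + np.roll(x_hat[::-1], 1))  # [x_hat(n) + x_hat(-n)] / 2

print(np.allclose(x_hat, x_hat1 + x_hat2))   # convolution became addition
print(np.allclose(x_back[: len(x)], x))      # forward + inverse returns x(n)
print(np.allclose(c, even_part))             # cepstrum is the even part
```

For signals that are not minimum phase, such as a delayed or windowed speech frame, the linear-phase component would have to be estimated and removed before the inverse DFT; that step is left out of this sketch.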

  13. 5.4 Applications of Homomorphic Processing
     • Characteristics of the complex cepstrum and cepstrum of speech signals
     • Application in U/V decision and pitch estimation
     • Application in extraction of formants
     • Application in speech synthesis

  14. Characteristics of the Complex Cepstrum and Cepstrum of Speech Signals (1)
     • In the Z domain, $X(z) = E(z)\,V(z)$.
     • In the complex cepstrum domain, $\hat{x}(n) = \hat{e}(n) + \hat{v}(n)$.
     • For voiced phones $e(n)$ is a periodic sequence. Suppose the period is $N_p$ samples, with $N_p = T_p f_s$:
       $e(n) = \sum_{r=0}^{R} \delta(n - r N_p)$ (see page 59)
     • So $\hat{e}(n) \neq 0$ only at $n = m N_p$, $m = 1, 2, 3, \dots$
     • $T_p$ is 2.5 ms to 20 ms. If $f_s = 10$ kHz, $N_p$ is 25 to 200.
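
The arithmetic and the location of the cepstral peaks can be reproduced with a short sketch. One assumption differs from the slide: the pulses are given decaying amplitudes $\alpha^r$ (with a hypothetical $\alpha = 0.6$) so that $E(z)$ has no zeros on the unit circle and the logarithm stays finite. The period $N_p = 80$ is also an assumed example value.

```python
import numpy as np

fs = 10_000                      # sampling rate in Hz (value from the text)
Tp_range = (2.5e-3, 20e-3)       # pitch period range in seconds (from the text)
Np_range = tuple(round(Tp * fs) for Tp in Tp_range)   # pitch period in samples

Np = 80       # assumed pitch period in samples (8 ms at 10 kHz)
alpha = 0.6   # assumed pulse decay, keeps ln|E| finite
R = 5         # index of the last pulse

# Excitation: pulses at n = 0, Np, 2*Np, ..., R*Np with amplitudes alpha**r
e = np.zeros(R * Np + 1)
e[::Np] = alpha ** np.arange(R + 1)

N = 1024
c = np.fft.ifft(np.log(np.abs(np.fft.fft(e, N)))).real

# Search for the largest cepstral value inside the allowed pitch range
lags = np.arange(Np_range[0], Np_range[1] + 1)
peak_lag = lags[np.argmax(c[lags])]

print(Np_range, peak_lag)
```

Inside the search range the cepstrum is essentially zero except at multiples of $N_p$, and the largest value sits at the first multiple.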

  15. Characteristics of the Complex Cepstrum and Cepstrum of Speech Signals (2)
     • $\hat{v}(n)$ is small outside $[-25, 25]$.
     • So if a lifter with $L[n] = 1$ for $|n| < 25$ and $L[n] = 0$ for $|n| \ge 25$ is used, $\hat{v}(n)$ can be separated. $v(n)$ can then be estimated with the inverse characteristic system.
     • If a lifter with $L[n] = 1$ for $|n| \ge 25$ and $L[n] = 0$ for $|n| < 25$ is used, $\hat{e}(n)$ can be separated and $e(n)$ can be restored.
     • For unvoiced phones $\hat{e}(n)$ behaves like noise: it has no obvious peaks and is spread over the whole time axis, while $\hat{v}(n)$ is confined to the low-time region.
     • See the examples on pages 60-61, diagrams 4-6 and 4-7.
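
A sketch of the two lifters applied to a synthetic voiced signal. Several things here are assumptions rather than part of the slides: the excitation is a decaying pulse train, the vocal-tract response is a single damped resonance near 1 kHz, and the real cepstrum is used instead of the complex cepstrum, so only the log magnitude spectra $\ln|V|$ and $\ln|E|$ are recovered, not $v(n)$ and $e(n)$ themselves.

```python
import numpy as np

fs, N, Np = 10_000, 1024, 80

# Assumed excitation: pulses every Np samples with decaying amplitudes
alpha, R = 0.6, 5
e = np.zeros(R * Np + 1)
e[::Np] = alpha ** np.arange(R + 1)

# Assumed vocal-tract response: one damped resonance near 1 kHz
r, theta = 0.9, 2 * np.pi * 1000 / fs
n = np.arange(100)
v = r ** n * np.sin((n + 1) * theta) / np.sin(theta)

x = np.convolve(e, v)                     # speech model x(n) = e(n) * v(n)
c = np.fft.ifft(np.log(np.abs(np.fft.fft(x, N)))).real

# |n| on the circular quefrency axis, then the two complementary lifters
quefrency = np.minimum(np.arange(N), N - np.arange(N))
low_time = (quefrency < 25).astype(float)   # L[n] = 1 for |n| < 25
high_time = 1.0 - low_time                  # L[n] = 1 for |n| >= 25

log_env = np.fft.fft(c * low_time).real     # estimate of ln|V(k)|
log_exc = np.fft.fft(c * high_time).real    # estimate of ln|E(k)|

true_log_env = np.log(np.abs(np.fft.fft(v, N)))
max_err = np.max(np.abs(log_env - true_log_env))
print(max_err)
```

The residual error comes mainly from the part of the vocal-tract cepstrum that lies beyond $|n| = 25$ and is therefore cut off by the low-time lifter.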

  16. Application in U/V Decision and Pitch Estimation (1)
     • The complex cepstrum and cepstrum of voiced speech show peaks at multiples of the pitch period $N_p$. This is the main basis for distinguishing unvoiced (U) from voiced (V) speech. $T_p$ can also be estimated from $N_p$ and $f_s$, since $T_p = N_p / f_s$.
     • The difficulty is that for voiced speech the peaks are sometimes not obvious, and for unvoiced speech a random peak is possible.
     • Use an absolute threshold and a relative threshold within one frame.
     • Base the decision on several frames.
     • The first peak should be at $N_p$. If $f_s = 10$ kHz, $N_p$ = 25 to 200, so the search should cover this range. The frame length should be at least 200 points (20 ms).
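
A minimal sketch of a single-frame decision using only an absolute threshold. The threshold value of 0.2, the function name, and the synthetic test frames are all assumptions for illustration; a practical detector would add the relative threshold and the multi-frame logic listed above.

```python
import numpy as np

def cepstral_pitch(frame, fs, n_fft, threshold=0.2, tp_min=2.5e-3, tp_max=20e-3):
    """Return (voiced, Tp): Tp is the pitch period in seconds, or None if unvoiced."""
    c = np.fft.ifft(np.log(np.abs(np.fft.fft(frame, n_fft)))).real
    lags = np.arange(round(tp_min * fs), round(tp_max * fs) + 1)
    peak_lag = lags[np.argmax(c[lags])]       # strongest peak in the search range
    voiced = bool(c[peak_lag] > threshold)    # absolute threshold only (assumed value)
    return voiced, (peak_lag / fs if voiced else None)

fs, N = 10_000, 1024

# Synthetic voiced frame (assumed): decaying pulse train through a damped resonance
Np, alpha, R = 80, 0.6, 5
e = np.zeros(R * Np + 1)
e[::Np] = alpha ** np.arange(R + 1)
n = np.arange(100)
theta = 2 * np.pi * 1000 / fs
v = 0.9 ** n * np.sin((n + 1) * theta) / np.sin(theta)
voiced_frame = np.convolve(e, v)

# Synthetic unvoiced frame (assumed): white noise with a fixed seed
unvoiced_frame = np.random.default_rng(0).standard_normal(500)

is_voiced, Tp = cepstral_pitch(voiced_frame, fs, N)
is_voiced_u, Tp_u = cepstral_pitch(unvoiced_frame, fs, N)
print(is_voiced, Tp, is_voiced_u, Tp_u)
```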

  17. Application in Extraction of Formants
     • Once $\hat{v}(n)$ or $c_v(n)$ has been separated, the logarithmic spectrum $\ln|V(e^{j\omega})|$ can be found by taking the DTFT of $c_v(n)$.
     • The formants can be obtained by further processing.
     • The window function should not change rapidly (a Hamming window is better than a rectangular window).
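
As a sketch of the "further processing" step, the example below picks the highest local maximum of the liftered log spectrum. It assumes a vocal-tract response with a single resonance placed at 1 kHz (a hypothetical value), so exactly one formant is expected; real speech would need several peaks and a smooth frame window.

```python
import numpy as np

fs, N = 10_000, 1024

# Assumed vocal-tract response with one resonance at f_res
r, f_res = 0.9, 1000.0
theta = 2 * np.pi * f_res / fs
n = np.arange(100)
v = r ** n * np.sin((n + 1) * theta) / np.sin(theta)

# Cepstrum of v(n), kept only in the low-time region |n| < 25
c_v = np.fft.ifft(np.log(np.abs(np.fft.fft(v, N)))).real
quefrency = np.minimum(np.arange(N), N - np.arange(N))
c_v_low = np.where(quefrency < 25, c_v, 0.0)

# Smoothed log magnitude spectrum on 0 .. fs/2
log_spectrum = np.fft.fft(c_v_low).real[: N // 2 + 1]
freqs = np.arange(N // 2 + 1) * fs / N

# Local maxima of the smoothed spectrum; the highest one is taken as the formant
inner = log_spectrum[1:-1]
is_peak = (inner > log_spectrum[:-2]) & (inner > log_spectrum[2:])
peak_bins = np.flatnonzero(is_peak) + 1
formant_hz = freqs[peak_bins[np.argmax(log_spectrum[peak_bins])]]

print(formant_hz)
```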

  18. Application in Speech Synthesis (1)
     • For high-quality speech synthesis, prosodic rules must be introduced into the system. The speech database holds only single recorded syllables. When a word is uttered, these syllables must be modified according to the prosodic rules (changes in amplitude, duration, tone and so on).
     • If $e(n)$ and $v(n)$ are separated and stored in the database for every syllable, these changes are easy to implement.
     • The modified $e(n)$ is convolved with $v(n)$ to generate the new speech for various words. This is one way to do speech synthesis.
     • In a real system the speech quality is high, but the smooth concatenation of syllables still needs improvement.
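
The resynthesis step can be sketched as follows: keep a stored $v(n)$, build a new excitation with a different pitch period and gain, and convolve. The stored response, the pulse-train excitation model, and the helper names are assumptions for illustration, not the system described in the slides.

```python
import numpy as np

def pulse_train(period, n_pulses, gain=1.0):
    """Excitation e(n): n_pulses impulses of height gain, spaced period samples apart."""
    e = np.zeros((n_pulses - 1) * period + 1)
    e[::period] = gain
    return e

def synthesize(e, v):
    """Speech model: excitation convolved with the vocal-tract response."""
    return np.convolve(e, v)

fs = 10_000
n = np.arange(100)
theta = 2 * np.pi * 1000 / fs
v = 0.9 ** n * np.sin((n + 1) * theta) / np.sin(theta)   # stored v(n), assumed

# Same vocal tract, two different prosodic settings
low_pitch = synthesize(pulse_train(100, 4), v)             # 100 Hz pitch
high_pitch = synthesize(pulse_train(50, 8, gain=0.5), v)   # 200 Hz pitch, half amplitude

print(len(low_pitch), len(high_pitch))
```

Because only $e(n)$ changes, the spectral envelope set by $v(n)$ is preserved while pitch, amplitude, and duration follow the new excitation.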
