Personal Information Name: Li, Zongge 李宗葛 Group: Automation Office: 505 Computer Building Tel: 65642071(o) 65250749(h) Fax: 65642071 Email: zgli@fudan.edu.cn
Part I Speech Representation, Models and Analysis. This part covers Chapters 1 through 6 and is common to all kinds of speech applications.
Chapter 1 Fundamentals of Digital Speech Processing and Probability Theory (1)
• 1.1 Discrete-Time Signals and Systems
• 1.2 Transform Representation of Signals and Systems
• 1.2.1 The Z-Transform
• 1.2.2 The Fourier Transform
• 1.2.3 The Discrete Fourier Transform
• 1.3 Fundamentals of Digital Filters
• 1.3.1 FIR Systems
• 1.3.2 IIR Systems
• 1.4 Sampling
• 1.4.1 The Sampling Theorem
• 1.4.2 Decimation and Interpolation of Sampled Waveforms
Fundamentals of Digital Speech Processing and Probability Theory (2)
• 1.5 Basics of Probability Theory
• 1.5.1 Probability of Events
• 1.5.2 Random Variables and Their Distributions
• 1.5.3 Mean and Variance
• 1.5.4 Bayes' Theorem
• 1.5.5 Covariance and Correlation
• 1.5.6 Random Vectors and Multivariate Distributions
• 1.6 Basics of Information Theory
• 1.6.1 Entropy
• 1.6.2 Conditional Entropy
• 1.6.3 Source and Channel Coding Theorems
• 1.7 Basics of Stochastic Processes
• 1.7.1 Stochastic Processes and Their Distributions
• 1.7.2 Numerical Characteristics of Stochastic Processes
• 1.7.3 Stationary Stochastic Processes
• 1.8 Problems
1.1 Discrete-Time Signals and Systems
• The original speech signal is a continuous-time function xa(t)
• After sampling, we have the discrete sequence x(n) = xa(nT)
• Signal processing involves transforming a signal into a desired form
• Single-input / single-output system: y(n) = T[x(n)]
• Single-input / multiple-output system: y(n) = T[x(n)], where the output y(n) is a vector
• For a linear shift-invariant (LSI) system:
• T[a1x1(n) + a2x2(n)] = a1T[x1(n)] + a2T[x2(n)] (linearity)
• if y(n) = T[x(n)], then T[x(n - n0)] = y(n - n0) (shift invariance)
• y(n) = Σk=0..n x(k)h(n-k) = Σk=0..n x(n-k)h(k) = x(n)*h(n) (convolution with the impulse response h(n), for causal sequences)
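As a quick check on the convolution sum, here is a minimal Python sketch (not part of the original slides; the sequences x and h are arbitrary illustrative values) that evaluates y(n) = Σ x(k)h(n-k) directly for short causal sequences and compares the result with numpy.convolve.

```python
import numpy as np

def convolve_direct(x, h):
    """Direct evaluation of the convolution sum for causal finite-length sequences."""
    N = len(x) + len(h) - 1
    y = np.zeros(N)
    for n in range(N):
        for k in range(len(x)):
            if 0 <= n - k < len(h):
                y[n] += x[k] * h[n - k]
    return y

x = np.array([1.0, 2.0, 3.0, 4.0])   # input sequence x(n)
h = np.array([0.5, 0.25])            # impulse response h(n)
assert np.allclose(convolve_direct(x, h), np.convolve(x, h))
```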
1.2 Transform Representation of Signals and Systems
• 1.2.1 The Z-Transform of a sequence x(n):
• X(z) = Σn=0..∞ x(n) z^(-n),  x(n) = (1/2πj) ∮ X(z) z^(n-1) dz
• 1.2.2 The Fourier Transform of a discrete-time signal:
• X(e^jω) = Σn=0..∞ x(n) e^(-jωn)
• x(n) = (1/2π) ∫ from -π to π of X(e^jω) e^(jωn) dω
• 1.2.3 The Discrete Fourier Transform of a finite-length sequence x(n):
• X(k) = Σn=0..N-1 x(n) e^(-j2πkn/N),  k = 0, 1, …, N-1
• x(n) = (1/N) Σk=0..N-1 X(k) e^(j2πkn/N),  n = 0, 1, …, N-1
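The DFT analysis and synthesis equations of 1.2.3 can be verified numerically. The sketch below (illustrative only; the test signal is made up) builds the DFT matrix explicitly and checks it against numpy's FFT.

```python
import numpy as np

N = 8
n = np.arange(N)
x = np.cos(2 * np.pi * n / N) + 0.5 * np.sin(4 * np.pi * n / N)  # made-up test signal

# Analysis: X(k) = sum_n x(n) e^(-j 2 pi k n / N)
W = np.exp(-2j * np.pi * np.outer(n, n) / N)
X = W @ x
assert np.allclose(X, np.fft.fft(x))

# Synthesis: x(n) = (1/N) sum_k X(k) e^(+j 2 pi k n / N)
x_rec = (W.conj() @ X) / N   # W is symmetric, so its conjugate is the synthesis matrix
assert np.allclose(x_rec, x)
```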
1.3 Fundamentals of Digital Filters (1)
• A digital filter is a discrete-time linear shift-invariant system for which
• Y(z) = H(z) X(z), where H(z) is called the system function
• H(e^jω) is called the frequency response; it is complex-valued:
• H(e^jω) = Hr(e^jω) + j Hi(e^jω), or
• H(e^jω) = |H(e^jω)| exp{j arg[H(e^jω)]}
• The inverse Fourier transform of H(e^jω) is the impulse response:
• h(n) = (1/2π) ∫ from -π to π of H(e^jω) e^(jωn) dω
• The input and output of a filter satisfy the difference equation:
• y(n) - Σk=1..N ak y(n-k) = Σr=0..M br x(n-r)
Fundamentals of Digital Filters (2)
• 1.3.1 FIR (Finite Impulse Response) Systems
• All ak are 0: y(n) = Σr=0..M br x(n-r)
• so h(n) = bn for 0 ≤ n ≤ M, and 0 otherwise
• FIR systems have no nonzero poles, only zeros
• They can be designed to have exactly linear phase
• 1.3.2 IIR (Infinite Impulse Response) Systems
• Not all ak are 0: y(n) = Σk=1..N ak y(n-k) + Σr=0..M br x(n-r)
• IIR systems have both poles and zeros, and their impulse responses have infinite duration
• An IIR filter is usually more efficient to implement than an FIR filter meeting the same magnitude specification
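A small scipy sketch of both cases of the difference equation (the coefficient values are illustrative, not from the text): an FIR moving-average filter with all ak = 0, and a one-pole IIR filter.

```python
import numpy as np
from scipy.signal import lfilter

x = np.random.randn(100)                 # arbitrary input sequence

# FIR example: all a_k = 0, so h(n) = b_n (here a 5-point moving average)
b_fir = np.ones(5) / 5.0
y_fir = lfilter(b_fir, [1.0], x)

# IIR example: one-pole recursive smoother y(n) = 0.9 y(n-1) + 0.1 x(n)
y_iir = lfilter([0.1], [1.0, -0.9], x)
```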
1.4 Sampling
• x(n) = xa(nT), -∞ < n < ∞, where n is an integer
• T is called the sampling period
• 1.4.1 The Sampling Theorem (Shannon)
• If a signal xa(t) has a bandlimited Fourier transform Xa(jΩ):
• Xa(jΩ) = 0 for |Ω| ≥ 2πFN (FN is the Nyquist frequency)
• then xa(t) can be reconstructed exactly from x(n) provided 1/T > 2FN
• 1.4.2 Decimation and Interpolation of Sampled Waveforms
• New sampling period T' = MT (reduced sampling rate): the decimated sequence is y(n) = xa(nT') = xa(nMT) = x(Mn)
• New sampling period T' = T/L (increased sampling rate): the interpolated sequence is y(n) = xa(nT') = xa(nT/L); it equals x(n/L) when n is a multiple of L, and the remaining samples are filled in by interpolation filtering
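A brief sketch of rate conversion in Python (the tone frequency and rates are arbitrary examples). scipy.signal.resample_poly performs the anti-aliasing and interpolation filtering that the bare index changes x(Mn) and x(n/L) leave implicit.

```python
import numpy as np
from scipy.signal import resample_poly

fs = 8000                                  # original rate 1/T
t = np.arange(0, 0.1, 1 / fs)
x = np.sin(2 * np.pi * 440 * t)            # 440 Hz tone, well below the Nyquist frequency

x_dec = resample_poly(x, up=1, down=2)     # T' = 2T, new rate 4 kHz (decimation)
x_int = resample_poly(x, up=2, down=1)     # T' = T/2, new rate 16 kHz (interpolation)
```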
1.5 Basics of Probability Theory (1)
• 1.5.1 Probability of an event A (an outcome of an experiment):
• P(A) = lim NA/Ns, where Ns is the total number of trials and NA is the number of trials in which event A occurred; the limit exists when the number of trials is large enough
• P(AB) = P(B|A)P(A) = P(A|B)P(B)
• P(A1A2…An) = P(An|A1…An-1) … P(A2|A1)P(A1)
• 1.5.2 Random Variables and Their Distributions
• fX(x) = P(X = x) is the probability function of a discrete random variable X
• P(X ≤ x) = ∫ from -∞ to x of f(u) du, where f(x) is the probability density function of a continuous random variable and P(X ≤ x) is the distribution function
Basics of Probability Theory (2)
• 1.5.3 Mean and Variance
• E(X) = ∫ from -∞ to ∞ of x f(x) dx (expectation, or mean)
• Var(X) = ∫ from -∞ to ∞ of (x - E(X))^2 f(x) dx (variance)
• 1.5.4 Bayes' Theorem
• P(X) = Σi=1..M P(X|Ci)P(Ci)
• P(Ci|X)P(X) = P(X|Ci)P(Ci), so P(Ci|X) = P(X|Ci)P(Ci)/P(X)
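A small numerical illustration of Bayes' theorem (all probabilities are made-up values): two classes with given priors and likelihoods, combined into posteriors.

```python
import numpy as np

prior = np.array([0.7, 0.3])            # P(C1), P(C2)
likelihood = np.array([0.2, 0.6])       # P(X|C1), P(X|C2) for one observed X

evidence = np.sum(likelihood * prior)   # P(X) = sum_i P(X|Ci) P(Ci)
posterior = likelihood * prior / evidence
print(posterior)                        # [0.4375 0.5625], sums to 1
```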
Basics of Probability Theory (3)
• 1.5.5 Covariance and Correlation
• Covariance: Cov(X, Y) = E[(X - μx)(Y - μy)], where X and Y are random variables with a given joint distribution, E(X) = μx, E(Y) = μy, Var(X) = σx^2, Var(Y) = σy^2
• Correlation coefficient of X and Y: ρxy = Cov(X, Y)/(σx σy), with -1 ≤ ρxy ≤ 1
• 1.5.6 Random Vectors and Multivariate Distributions
• X = (X1, …, Xn), fX(x1, …, xn) = P(X1 = x1, …, Xn = xn)
• Vector form: X = [X1 X2 … Xn]', E[X] = [E[X1] E[X2] … E[Xn]]'
• The covariance matrix Cov(X) is the n × n matrix whose (i, j) entry is Cov(Xi, Xj), from Cov(X1, X1) in the upper-left corner to Cov(Xn, Xn) in the lower-right corner
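A minimal numpy sketch of the covariance matrix and correlation coefficient, estimated from synthetic multivariate samples (the covariance used to generate the data is an arbitrary example).

```python
import numpy as np

rng = np.random.default_rng(0)
samples = rng.multivariate_normal(mean=[0, 0, 0],
                                  cov=[[1.0, 0.5, 0.0],
                                       [0.5, 2.0, 0.3],
                                       [0.0, 0.3, 1.5]],
                                  size=10000)

C = np.cov(samples, rowvar=False)              # n x n sample covariance matrix Cov(X)
rho_12 = C[0, 1] / np.sqrt(C[0, 0] * C[1, 1])  # correlation coefficient of X1 and X2
```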
1.6 Basics of Information Theory (1)
• 1.6.1 Entropy
• An outcome xi carries the amount of information I(xi) = log(1/P(xi))
• An information source S has the average amount of information H(S) = Σi P(xi) I(xi) = Σi P(xi) log(1/P(xi)) = E[-log P(xi)]
• H(S) is the entropy of the information source S, and H(S) ≥ 0
• 1.6.2 Conditional Entropy
• When X = (x1, x2, …, xs) are the input symbols of a channel and Y = (y1, y2, …, yl) are the output symbols, the channel is defined by the matrix Mij = P(yj|xi)
• Before transmission, the uncertainty about X is H(X)
• Once yj is received, the uncertainty about X is reduced to H(X|Y = yj)
• H(X, Y) = H(X) + H(Y|X)
• H(X1, …, Xn) = H(Xn|X1, …, Xn-1) + … + H(X2|X1) + H(X1)
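A small Python sketch (the joint distribution is a made-up example) computing the entropy of a discrete source and obtaining the conditional entropy from the chain rule H(X, Y) = H(X) + H(Y|X).

```python
import numpy as np

def entropy(p):
    """Entropy in bits of a discrete distribution given as an array of probabilities."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

p_xy = np.array([[0.25, 0.25],          # illustrative joint distribution P(X, Y)
                 [0.40, 0.10]])
p_x = p_xy.sum(axis=1)                  # marginal P(X)

H_x = entropy(p_x)
H_xy = entropy(p_xy.ravel())
H_y_given_x = H_xy - H_x                # chain rule: H(Y|X) = H(X, Y) - H(X)
```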
Basics of Information Theory (2)
• 1.6.3 Source and Channel Coding Theorems, and Mutual Information
• Shannon's source coding theorem says that a source cannot be coded with fewer bits per symbol than its entropy
• I(X;Y) = H(X) - H(X|Y) is called the mutual information between X and Y
• I(X;Y) = I(Y;X) = E[ log( P(X,Y) / (P(X)P(Y)) ) ]
• 0 ≤ I(X;Y) ≤ min[H(X), H(Y)]
• The channel capacity is C = max I(X;Y), where the maximum is taken over all input distributions
• Shannon's channel coding theorem says that for a given channel there exists a code that permits essentially error-free transmission across the channel, provided R ≤ C, where R is the rate of the communication system
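Continuing with the same made-up joint distribution, the sketch below computes the mutual information in two equivalent ways and checks that they agree.

```python
import numpy as np

p_xy = np.array([[0.25, 0.25],           # illustrative joint distribution P(X, Y)
                 [0.40, 0.10]])
p_x = p_xy.sum(axis=1, keepdims=True)    # marginal P(X)
p_y = p_xy.sum(axis=0, keepdims=True)    # marginal P(Y)

def H(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))       # entropy in bits

# I(X;Y) = E[ log( P(X,Y) / (P(X)P(Y)) ) ]
mask = p_xy > 0
I_def = np.sum(p_xy[mask] * np.log2((p_xy / (p_x * p_y))[mask]))

# I(X;Y) = H(X) + H(Y) - H(X,Y), equivalent to H(X) - H(X|Y)
I_alt = H(p_x.ravel()) + H(p_y.ravel()) - H(p_xy.ravel())
assert np.isclose(I_def, I_alt)
```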
1.7 Basics of Stochastic Processes (1)
• 1.7.1 Stochastic Processes and Their Distributions
• ξ(t) is a stochastic process (a random function of time); ξ(t1) is a random variable
• One-dimensional probability distribution function: F1(x1; t1) = P[ξ(t1) ≤ x1]
• One-dimensional probability density function: f1(x1; t1) = ∂F1(x1; t1)/∂x1
• Extension to n dimensions: Fn(x1, x2, …, xn; t1, t2, …, tn) and fn(x1, x2, …, xn; t1, t2, …, tn)
Basics of Stochastic Processes (2)
• 1.7.2 Numerical Characteristics of ξ(t)
• The mathematical expectation of ξ(t) at time t: a(t) = E{ξ(t)} = ∫ from -∞ to ∞ of x f1(x; t) dx
• The variance of ξ(t) at time t: σ^2(t) = E{[ξ(t) - a(t)]^2} = ∫ from -∞ to ∞ of x^2 f1(x; t) dx - [a(t)]^2
• The correlation function of ξ(t): R(t1, t2) = E{ξ(t1)ξ(t2)} = ∫∫ over (-∞, ∞) of x1 x2 f2(x1, x2; t1, t2) dx1 dx2
• 1.7.3 Stationary Stochastic Processes
• Definition: ξ(t) is stationary if for any n and h,
• fn(x1, x2, …, xn; t1, t2, …, tn) = fn(x1, x2, …, xn; t1+h, t2+h, …, tn+h)
• For such a process, R(t1, t2) = R(t2 - t1) = R(τ)
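A minimal sketch of estimating the mean a(t) and correlation function R(τ) of a stationary process from a single realization; the AR(1) process used here is an arbitrary illustrative example, and ergodicity is assumed so that time averages stand in for ensemble averages.

```python
import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(1)
# Illustrative stationary process: white noise through a one-pole filter (AR(1))
xi = lfilter([1.0], [1.0, -0.8], rng.standard_normal(50000))

a_hat = xi.mean()                        # estimate of the (constant) mean a(t)
max_lag = 10
R_hat = np.array([np.mean(xi[:len(xi) - tau] * xi[tau:])
                  for tau in range(max_lag)])
# For a stationary process R(t1, t2) depends only on the lag tau = t2 - t1.
```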
1.8 Problems (1)
• 1.8.1 Use any software tool to obtain a speech data file (.wav) with a sampling rate of 8 kHz and 16-bit resolution. Display it on the screen and play it through the audio output.
• 1.8.2 A device (or an algorithm) has the output y(n) = x(n) - αx(n-1), where x(n) is the input and α ≈ 0.95-1.0. Discuss the main function of this device.
• 1.8.3 Write a program to plot |H(ω)| versus ω for the device in 1.8.2, using dB as the unit of |H(ω)|. (See the sketch below.)
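As a starting point for problems 1.8.2 and 1.8.3 (not a full solution), the sketch below plots |H(ω)| in dB for the filter y(n) = x(n) - αx(n-1); α = 0.95 is just one value from the stated range.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import freqz

alpha = 0.95
# H(e^jw) = 1 - alpha * e^(-jw) for the difference equation of problem 1.8.2
w, H = freqz([1.0, -alpha], [1.0], worN=512)

plt.plot(w, 20 * np.log10(np.abs(H)))    # magnitude response in dB
plt.xlabel("omega (rad/sample)")
plt.ylabel("|H(omega)| (dB)")
plt.show()
```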
Problems (2)
• 1.8.4 Find the z-transform and the Fourier transform of each of the following sequences:
• (1) Rectangular window: w1(n) = 1 for 0 ≤ n ≤ N-1, and 0 otherwise
• (2) Hamming window: w2(n) = 0.54 - 0.46 cos[2πn/(N-1)] for 0 ≤ n ≤ N-1, and 0 otherwise
• (3) Hanning window: w3(n) = 0.5{1 - cos[2πn/(N-1)]} for 0 ≤ n ≤ N-1, and 0 otherwise
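For problem 1.8.4, the Fourier transforms of the three windows can be previewed numerically before deriving them analytically; the sketch below (window length N = 31 is an arbitrary choice) evaluates samples of each W(e^jω) with a zero-padded FFT.

```python
import numpy as np

N = 31
n = np.arange(N)
w1 = np.ones(N)                                       # rectangular window
w2 = 0.54 - 0.46 * np.cos(2 * np.pi * n / (N - 1))    # Hamming window
w3 = 0.5 * (1 - np.cos(2 * np.pi * n / (N - 1)))      # Hanning window

nfft = 1024
spectra_db = {}
for name, w in [("rectangular", w1), ("Hamming", w2), ("Hanning", w3)]:
    W = np.fft.rfft(w, nfft)                          # samples of W(e^jw) for 0 <= w <= pi
    spectra_db[name] = 20 * np.log10(np.abs(W) / np.abs(W).max() + 1e-12)
```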