Current HOARSE related activities 6-7 Sept 2002
…include the following (+ more)
Novel architectures
• All-combinations HMM/ANN
• Tandem HMM/ANN hybrid
• DBNs: exploring new topologies
• HMM2: formant ftrs for spkr norm.
Clustering & segmentation
• Speech/music segmentation
• Speaker clustering
Evidence weighting
• Mic arrays for MD mask estimation
• Entropy based MS combination
• Confusion based entropy correction
• Noise PDF transformation in MD ASR
All-combinations HMM/ANN
• MAP static weighting leads to MAP combination after decoding
• AC sum rule
• ACMS overcomes assumption of conditional independence between data streams
Tandem HMM/ANN hybrid
Tandem multi-stream
• Output from one or more MLPs is appended and orthogonalised, then used as discriminative feature data for training standard HMM/GMM
• e.g. combine MSG with PLP
Tandem multi-band
• Training narrow sub-band MLPs with noisy data results in robust features which are independent of noise type
• Robust sub-band features concatenated before input to speech feature extractor
DBNs: exploring new topologies
[Figure: baseline DBN for IWR; Topology 1; Topology 2; Topology 3]
Aux variables tested: articulator (quantised)(+); pitch(-); speech rate(-); energy(+)
HMM2: formant ftrs for spkr norm.
WER avg over SNR:
• 4 fmnt = 28.1%
• MFCC = 14.8%
• fmnt + MFCC = 14.3%
Speech/Music Segmentation
• Best results from concatenated Entropy & Dynamism ftrs. ACMS not tested
• Whether GMM or MLP gives the best results is task dependent
[Figure panels: Entropy; Dynamism]
Speaker Clustering
• Clustering: start with many clusters. Repeat (merge the cluster pair with the most negative dist) until no such pair remains.
• Usual distance is the BIC (Bayesian Information Criterion) dist.
• New model: proposed distance avoids estimation of lambda by ensuring K = 0, where K is the diff. in size (# params) between the merged cluster and the sum of sizes of the separate clusters.
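The merge loop described on this slide can be sketched as follows. This is a minimal illustration, not the authors' system: it assumes scalar features, one Gaussian per cluster, and the conventional BIC distance with an explicit penalty weight `lam` (the lambda that the proposed K = 0 distance is meant to avoid). The names `half_n_log_var`, `delta_bic` and `agglomerate` are hypothetical.

```python
import math

def half_n_log_var(x):
    """(n/2) * log of the ML variance of the frames in x (variance floored)."""
    n = len(x)
    mean = sum(x) / n
    var = max(sum((v - mean) ** 2 for v in x) / n, 1e-6)  # floor is an assumption
    return 0.5 * n * math.log(var)

def delta_bic(a, b, lam=1.0):
    """Conventional BIC distance between clusters a and b.

    Negative values mean one Gaussian explains the pooled data well enough
    that merging is preferred. Two Gaussians have 2 more parameters
    (mean and variance) than one, hence extra_params = 2.
    """
    merged = a + b
    extra_params = 2
    penalty = 0.5 * lam * extra_params * math.log(len(merged))
    return (half_n_log_var(merged) - half_n_log_var(a)
            - half_n_log_var(b) - penalty)

def agglomerate(segments, lam=1.0):
    """Repeatedly merge the pair with the most negative distance."""
    clusters = [list(s) for s in segments]
    while len(clusters) > 1:
        best = None  # (distance, i, j) of the best candidate merge
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                dist = delta_bic(clusters[i], clusters[j], lam)
                if dist < 0 and (best is None or dist < best[0]):
                    best = (dist, i, j)
        if best is None:  # no pair with negative distance: stop
            break
        _, i, j = best
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters

# Toy data: two segments around 0 and two around 10 (made-up values).
seg_a = [-1.0, 0.0, 1.0] * 50
seg_b = [-1.0, 0.0, 1.0] * 50
seg_c = [9.0, 10.0, 11.0] * 50
seg_d = [9.0, 10.0, 11.0] * 50
speakers = agglomerate([seg_a, seg_b, seg_c, seg_d])
```

In this sketch the stopping point depends on `lam`, which is the tuning burden the slide's K = 0 proposal is said to remove.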
Mic arrays for MD mask estimation
• MA + MD => 40% rel. err. red. over MA enhancement
• Advantage still greater with 2 mics
• Filter-sum beamformer with post filter
• “one” 4 mic array used
[Figure labels: Reliability mask; Oracle; 1 chan; 2 chan]
Entropy based MS combination
• Various functions of the stream entropies were tested for recognition performance.
• Combination used weighted ACMS sum rule.
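As a rough illustration of entropy-derived stream weights feeding a weighted sum rule, the sketch below uses inverse-entropy weighting. That choice is an assumption: the slide does not say which functions of the entropies were tested. The sketch also combines one expert per stream rather than all stream combinations as in ACMS, and the function names are hypothetical.

```python
import math

def entropy(posteriors):
    """Shannon entropy (nats) of one frame's class posterior vector."""
    return -sum(p * math.log(p) for p in posteriors if p > 0.0)

def inverse_entropy_weights(stream_posteriors, eps=1e-6):
    """One weight per stream, proportional to 1 / entropy, summing to 1.

    Low-entropy (confident) streams get larger weights. eps avoids
    division by zero and is an assumption of this sketch.
    """
    inv = [1.0 / (entropy(p) + eps) for p in stream_posteriors]
    total = sum(inv)
    return [w / total for w in inv]

def weighted_sum_rule(stream_posteriors, weights):
    """Combined posterior: weighted sum of the per-stream posteriors."""
    n_classes = len(stream_posteriors[0])
    return [sum(w * p[k] for w, p in zip(weights, stream_posteriors))
            for k in range(n_classes)]

# Made-up posteriors for one frame: a confident stream and a flat one.
confident = [0.9, 0.05, 0.05]
flat = [1.0 / 3, 1.0 / 3, 1.0 / 3]
weights = inverse_entropy_weights([confident, flat])
combined = weighted_sum_rule([confident, flat], weights)
```

Because the weights sum to one and each input is a probability vector, the combined vector is again a valid posterior.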
Confusion based entropy correction
[Figure: confusion matrix for 1/6 fullband MFCC expert with band 1 and band 6; axis labels include silence, space]
• With multi-condition trained narrowband models, entropy first increases with noise level, but then decreases to zero
• Misleading expert entropies can be avoided if posterior probabilities are corrected by a linear transformation obtained from the corresponding cross-validation confusion matrix
Noise PDF transf. in MD ASR
• Usual SMD clean data mix pdf has 2 mix comps (uniform & dirac) for SNR < 0 or SNR >= 0. But “max” assumption used here is inaccurate.
• Better to use 3 mix pdf: SNR < SNRlo or SNR > SNRhi or neither
• For case “neither”, with noise pdf $p_N(\cdot)$, compression function $C(\cdot)$ with inverse $B(\cdot)$, and noisy obs. $z$, clean data mix pdf $p_X(\cdot)$ is $p_X(x) = p_N(B(z) - B(x))\,B'(x)$, over $x \in [0, z]$
• e.g. for $p_N(\cdot)$ uniform and cube root compression, $p_X(x) = 3x^2 / z^3$
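The cube-root example can be checked numerically. The sketch below assumes the uniform noise pdf spans [0, B(z)], which is what the stated result 3x^2/z^3 implies, with B(x) = x^3 as the inverse of cube root compression. Function names and the value z = 2 are illustrative only.

```python
def clean_pdf(x, z, noise_pdf, B, dB):
    """p_X(x) = p_N(B(z) - B(x)) * B'(x) for x in [0, z], else 0."""
    if not 0.0 <= x <= z:
        return 0.0
    return noise_pdf(B(z) - B(x)) * dB(x)

def B(x):
    """Inverse of cube root compression C(u) = u ** (1/3)."""
    return x ** 3

def dB(x):
    """Derivative of B."""
    return 3.0 * x ** 2

def make_uniform(upper):
    """Uniform pdf on [0, upper] (assumed support for the noise)."""
    def pdf(n):
        return 1.0 / upper if 0.0 <= n <= upper else 0.0
    return pdf

def integrate(f, lo, hi, steps=10000):
    """Midpoint-rule integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))

z = 2.0
noise = make_uniform(B(z))
value = clean_pdf(1.0, z, noise, B, dB)   # closed form: 3 * 1**2 / 2**3
area = integrate(lambda x: clean_pdf(x, z, noise, B, dB), 0.0, z)
```

With these assumptions `value` matches 3x^2/z^3 at x = 1 and `area` comes out close to 1, so the example pdf is properly normalised over [0, z].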