
Current HOARSE related activities


louisa

Presentation Transcript


  1. Current HOARSE related activities 6-7 Sept 2002

  2. …include the following (+ more)
     Novel architectures
     • All-combinations HMM/ANN
     • Tandem HMM/ANN hybrid
     • DBNs: exploring new topologies
     • HMM2: formant ftrs for spkr norm.
     Clustering & segmentation
     • Speech/music segmentation
     • Speaker clustering
     Evidence weighting
     • Mic arrays for MD mask estimation
     • Entropy based MS combination
     • Confusion based entropy correction
     • Noise PDF transformation in MD ASR

  3. All-combinations HMM/ANN
     • MAP static weighting leads to MAP combination after decoding
     • AC sum rule
     • ACMS overcomes assumption of conditional independence between data streams
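
The AC sum rule named on this slide can be sketched as a weighted sum of class posteriors, one posterior vector per combination of streams. This is only an illustration: the function names, the dictionary layout, the restriction to non-empty combinations, and the example posteriors and weights are all assumptions, not taken from the slides.

```python
from itertools import combinations

def all_combinations(streams):
    """Return every non-empty subset of stream names, as tuples."""
    names = sorted(streams)
    return [c for r in range(1, len(names) + 1) for c in combinations(names, r)]

def ac_sum_rule(posteriors, weights):
    """Weighted sum of per-combination class posteriors.

    posteriors: {combination: [P(class | data in that combination), ...]}
    weights:    {combination: reliability weight}, assumed to sum to 1
    """
    n_classes = len(next(iter(posteriors.values())))
    combined = [0.0] * n_classes
    for combo, post in posteriors.items():
        for k, p in enumerate(post):
            combined[k] += weights[combo] * p
    return combined

# Hypothetical two-stream example with made-up posteriors over 2 classes.
combos = all_combinations(["band1", "band2"])
posteriors = {
    ("band1",): [0.6, 0.4],
    ("band2",): [0.2, 0.8],
    ("band1", "band2"): [0.3, 0.7],
}
weights = {("band1",): 0.25, ("band2",): 0.25, ("band1", "band2"): 0.5}
combined = ac_sum_rule(posteriors, weights)
```

Because each combination has its own expert, the combined posterior does not have to be factored into a product of per-stream terms, which is one way to read the slide's remark about conditional independence.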

  4. Tandem HMM/ANN hybrid
     Tandem multi-stream
     • Outputs from one or more MLPs are appended and orthogonalised, then used as discriminative feature data for training a standard HMM/GMM
     • e.g. combine MSG with PLP
     Tandem multi-band
     • Training narrow sub-band MLPs with noisy data results in robust features which are independent of noise type
     • Robust sub-band features are concatenated before input to the speech feature extractor

  5. DBNs: exploring new topologies
     • Aux variables tested: articulator (quantised) (+); pitch (-); speech rate (-); energy (+)
     [Figure panels: Baseline DBN for IWR; Topology 1; Topology 2; Topology 3]

  6. HMM2: formant ftrs for spkr norm.
     WER avg over SNR:
     • 4 fmnt = 28.1%
     • MFCC = 14.8%
     • fmnt + MFCC = 14.3%

  7. Speech/Music Segmentation
     • Best results from concatenated Entropy & Dynamism ftrs.
     • ACMS not tested
     • Whether best results come from GMM or MLP is task dependent
     [Figure panels: Entropy; Dynamism]
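
A minimal sketch of how concatenated Entropy and Dynamism features might be computed from frame-level class posteriors. The slides only name the two features, so the definitions used here (mean per-frame posterior entropy, and mean squared change between successive posterior vectors) are assumptions, as are all function names and the toy data.

```python
import math

def frame_entropy(post):
    """Entropy (in nats) of one frame's posterior vector."""
    return -sum(p * math.log(p) for p in post if p > 0.0)

def mean_entropy(frames):
    """Average per-frame entropy over a segment."""
    return sum(frame_entropy(f) for f in frames) / len(frames)

def mean_dynamism(frames):
    """Average squared change between successive posterior vectors."""
    diffs = [
        sum((a - b) ** 2 for a, b in zip(f0, f1))
        for f0, f1 in zip(frames, frames[1:])
    ]
    return sum(diffs) / len(diffs)

def segment_features(frames):
    """Concatenate the two features into one vector for a GMM or MLP."""
    return [mean_entropy(frames), mean_dynamism(frames)]

# Made-up posteriors: confident and fast-changing vs. flat and static.
speech_like = [[0.9, 0.1], [0.1, 0.9], [0.9, 0.1]]
music_like = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]
speech_ftrs = segment_features(speech_like)
music_ftrs = segment_features(music_like)
```

In this toy case the speech-like segment has lower entropy and higher dynamism than the music-like one, which is the kind of contrast a downstream classifier would exploit.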

  8. Speaker Clustering
     • Clustering: start with many clusters. Repeat (merge the cluster pair with the most negative dist) until no such pair remains.
     • Usual distance is the BIC (Bayesian Information Criterion) dist.
     • New model: proposed distance avoids estimation of lambda by ensuring K = 0, where K is the diff. between the size (# params) of the merged cluster and the sum of the sizes of the separate clusters.
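
The merge loop described on this slide can be sketched as follows. The distance is passed in as a function; `toy_dist` is a hypothetical stand-in on 1-D data (it is neither the BIC distance nor the proposed distance), used only so the loop can be run.

```python
def agglomerative_cluster(clusters, dist):
    """Repeatedly merge the pair with the most negative distance.

    Stops when no pair has a negative distance.
    """
    clusters = [list(c) for c in clusters]
    while len(clusters) > 1:
        best = None  # (distance, i, j) of the most negative pair so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = dist(clusters[i], clusters[j])
                if d < 0 and (best is None or d < best[0]):
                    best = (d, i, j)
        if best is None:
            break  # no pair with negative distance: clustering is done
        _, i, j = best
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters

def toy_dist(a, b, threshold=1.0):
    """Hypothetical distance: negative when the cluster means are close."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return abs(mean_a - mean_b) - threshold

# Start with one cluster per (made-up) segment.
segments = [[0.0], [0.2], [0.1], [5.0], [5.3]]
result = agglomerative_cluster(segments, toy_dist)
```

A useful property of a sign-based stopping rule is that the number of speakers does not need to be fixed in advance; it falls out of the distance itself.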

  9. Mic arrays for MD mask estimation
     • MA + MD => 40% rel. err. red. over MA enhancement
     • Advantage still greater with 2 mics
     • Filter-sum beamformer with post filter
     • 4 mic array used
     [Figure labels: Reliability mask; Oracle; 1 chan; 2 chan; “one”]

  10. Entropy based MS combination
      • Various functions of the stream entropies were tested for recognition performance.
      • Combination used the weighted ACMS sum rule.
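
As one illustration of a weighting function of the entropies, the sketch below uses inverse-entropy weights in a weighted sum rule. The slide does not say which functions were tested, so inverse entropy is an assumed example, and the function names, the `eps` guard and the posteriors are hypothetical.

```python
import math

def entropy(post):
    """Entropy (in nats) of a posterior vector."""
    return -sum(p * math.log(p) for p in post if p > 0.0)

def inverse_entropy_weights(expert_posteriors, eps=1e-6):
    """Weight each expert by 1 / entropy, normalised to sum to 1."""
    inv = [1.0 / (entropy(p) + eps) for p in expert_posteriors]
    total = sum(inv)
    return [v / total for v in inv]

def weighted_sum(expert_posteriors, weights):
    """Sum-rule combination of the experts' posteriors."""
    n_classes = len(expert_posteriors[0])
    return [
        sum(w * p[k] for w, p in zip(weights, expert_posteriors))
        for k in range(n_classes)
    ]

# Made-up posteriors from a confident expert and an uncertain one.
confident = [0.9, 0.05, 0.05]
uncertain = [0.4, 0.3, 0.3]
weights = inverse_entropy_weights([confident, uncertain])
combined = weighted_sum([confident, uncertain], weights)
```

The low-entropy expert receives the larger weight here, so the combined posterior sits closer to its output.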

  11. Confusion based entropy correction
      • With multi-condition trained narrowband models, entropy first increases with noise level, but then decreases to zero
      • Misleading expert entropies can be avoided if posterior probabilities are corrected by a linear transformation obtained from the corresponding cross-validation confusion matrix
      [Figure: confusion matrix for 1/6 fullband MFCC expert with band 1 and band 6; label: silence space]
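
One possible reading of the linear correction is sketched below: the confusion matrix is column-normalised into estimates of P(true class | predicted class), and the expert's posterior vector is multiplied by that matrix. The slide does not give the exact transformation, so this construction, the function names and the counts are assumptions.

```python
def correction_matrix(confusion):
    """Column-normalise counts confusion[true][predicted].

    Entry [t][p] then estimates P(true = t | predicted = p).
    """
    n = len(confusion)
    col_totals = [sum(confusion[t][p] for t in range(n)) for p in range(n)]
    return [
        [confusion[t][p] / col_totals[p] if col_totals[p] else 1.0 / n
         for p in range(n)]
        for t in range(n)
    ]

def correct_posteriors(post, matrix):
    """Apply the linear transformation to one posterior vector."""
    n = len(matrix)
    return [sum(matrix[t][p] * post[p] for p in range(n)) for t in range(n)]

# Made-up counts: when this expert predicts class 1 it is right half the time.
confusion = [[8, 5],
             [2, 5]]
matrix = correction_matrix(confusion)

# A zero-entropy (fully confident) posterior for class 1...
overconfident = [0.0, 1.0]
# ...is spread out according to the expert's actual confusions.
corrected = correct_posteriors(overconfident, matrix)
```

Since each column of the matrix sums to 1, the corrected vector is still a probability distribution, but its entropy now reflects how often the expert is actually wrong.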

  12. Noise PDF transf. in MD ASR
      • Usual SMD clean data mix pdf has 2 mix comps (uniform & dirac) for SNR < 0 or SNR >= 0. But the “max” assumption used here is inaccurate.
      • Better to use 3 mix pdf: SNR < SNRlo or SNR > SNRhi or neither
      • For case “neither”, with noise pdf $p_N(\cdot)$, compression function $C(\cdot)$ with inverse $B(\cdot)$, and noisy obs. $z$, the clean data mix pdf $p_X(\cdot)$ is $p_X(x) = p_N(B(z) - B(x))\,B'(x)$, over $x \in [0, z]$
      • e.g. for $p_N(\cdot)$ uniform and cube root compression, $p_X(x) = 3x^2 / z^3$
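
The cube-root example on this slide can be checked step by step. The derivation below assumes the uniform noise pdf is spread over $[0, B(z)]$, i.e. over the noise values compatible with the observation $z$; that support is an assumption, chosen because it reproduces the slide's result.

```latex
\begin{align*}
C(x) &= x^{1/3}, \qquad B(y) = y^{3}, \qquad B'(x) = 3x^{2}, \\
p_N(n) &= \frac{1}{B(z)} = \frac{1}{z^{3}}, \qquad 0 \le n \le B(z), \\
p_X(x) &= p_N\bigl(B(z) - B(x)\bigr)\, B'(x) = \frac{3x^{2}}{z^{3}}, \qquad 0 \le x \le z, \\
\int_0^z p_X(x)\, dx &= \frac{1}{z^{3}} \Bigl[ x^{3} \Bigr]_0^z = 1 .
\end{align*}
```

The last line confirms that the result is a properly normalised pdf on $[0, z]$.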
