ESTER Project
Balamand work
Rania Bayeh, Chafic Mokbel
Introduction
• Participation in two tasks:
  • Detection (SES)
  • Transcription (TRS)
• Tools used:
  • GMM (Spro + Becars for detection)
  • HMM (HCM for transcription)
Detection
• Feature extraction (using Spro):
  • 20 MFCCs including energy, plus first-order derivatives (40 coefficients)
  • Frame duration 64 ms, frame shift 20 ms
• 128-Gaussian GMMs (using Becars) for four classes:
  • Female speech, male speech, music and silence
• Window-based detection (20 frames, ~450 ms)
• A simple time-smoothing algorithm:
  • A single window whose detected class differs from that of its surrounding windows is merged into them
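The window size and the smoothing rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slides do not say how the per-frame GMM scores are combined within a window, so summing log-likelihoods is an assumption, and the function names and input layout are hypothetical (they are not part of Spro or Becars).

```python
FRAME_MS = 64            # frame duration given on the slide
SHIFT_MS = 20            # frame shift given on the slide
FRAMES_PER_WINDOW = 20   # window length given on the slide


def window_span_ms(n_frames=FRAMES_PER_WINDOW, frame_ms=FRAME_MS, shift_ms=SHIFT_MS):
    """Time covered by n overlapping frames: (n - 1) shifts plus one full frame."""
    return (n_frames - 1) * shift_ms + frame_ms


def classify_window(frame_loglik):
    """Pick the class with the highest summed log-likelihood over one window.

    frame_loglik maps a class name to a list of per-frame log-likelihoods.
    Summing over the window is an assumption, not a documented detail.
    """
    return max(frame_loglik, key=lambda name: sum(frame_loglik[name]))


def smooth(labels):
    """Relabel any single window that differs from two agreeing neighbours."""
    out = list(labels)
    for i in range(1, len(labels) - 1):
        prev_label, next_label = labels[i - 1], labels[i + 1]
        if prev_label == next_label and labels[i] != prev_label:
            out[i] = prev_label
    return out
```

With these values, 20 frames cover 19 × 20 + 64 = 444 ms, which matches the "~450 ms" quoted above. The smoothing step only touches isolated windows: a run of two or more windows with the same label is left alone.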
Transcription
• Acoustic modeling (using HCM, a full toolkit):
  • Feature extraction (using HTK): 13 MFCCs + first- and second-order derivatives (39 coefficients)
  • Triphone models:
    • 3-state models with 32 Gaussian pdfs per state
    • Classified using the CART algorithm
    • Trained on the ESTER database
    • Word-boundary models are of star type
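To make the 39-coefficient count concrete, here is a minimal sketch of how first- and second-order derivatives can be appended to 13 static MFCCs. It uses the usual regression formula for delta features; the half-window `theta=2` and the edge handling (repeating the first and last frames) are assumptions, since the slides do not give the configuration used. The function names are hypothetical.

```python
def deltas(frames, theta=2):
    """Regression deltas: d_t = sum_k k * (c[t+k] - c[t-k]) / (2 * sum_k k^2)."""
    n = len(frames)
    denom = 2 * sum(k * k for k in range(1, theta + 1))
    result = []
    for t in range(n):
        row = []
        for d in range(len(frames[t])):
            num = 0.0
            for k in range(1, theta + 1):
                ahead = frames[min(t + k, n - 1)][d]   # repeat last frame at the end
                behind = frames[max(t - k, 0)][d]      # repeat first frame at the start
                num += k * (ahead - behind)
            row.append(num / denom)
        result.append(row)
    return result


def add_derivatives(static):
    """Append first- and second-order deltas to each static feature vector."""
    d1 = deltas(static)
    d2 = deltas(d1)
    return [s + a + b for s, a, b in zip(static, d1, d2)]


# Six toy frames of 13 coefficients each, rising linearly over time.
static = [[float(t)] * 13 for t in range(6)]
features = add_derivatives(static)
print(len(features[0]))  # 13 static + 13 delta + 13 delta-delta
```

The same construction with 20 static coefficients and first-order derivatives only gives the 40 coefficients used for detection.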
Transcription
• Language modeling:
  • SRILM used to build a bigram model
  • The resulting bigram model was compiled to fit HCM
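For readers unfamiliar with bigram models, the sketch below estimates P(w2 | w1) by plain relative frequency on a toy corpus. It is not what SRILM does internally: SRILM applies discounting and back-off, which are omitted here, and the step that compiles the model for HCM is not shown because the slides do not describe it. The function name and the example sentences are hypothetical.

```python
from collections import Counter


def train_bigram(sentences):
    """Return P(w2 | w1) estimated by relative frequency, with <s> and </s> markers."""
    history_counts = Counter()
    pair_counts = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for w1, w2 in zip(tokens, tokens[1:]):
            history_counts[w1] += 1
            pair_counts[(w1, w2)] += 1
    return {pair: count / history_counts[pair[0]] for pair, count in pair_counts.items()}


# Toy corpus for illustration only.
model = train_bigram(["le chat dort", "le chien dort"])
print(model[("le", "chat")])
```

An unsmoothed model like this assigns zero probability to any unseen word pair, which is why a toolkit with back-off is used in practice.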
Transcription
• Comments:
  • HCM training is real-time
  • Problems during decoding and a high error rate (which is why no results were submitted). The problems were:
    • No silence model was included (error in the HCM scripts)
    • Error in the phoneme attributes provided to the CART algorithm: several phonemes are confused (grouped together)
  • Retraining is under way and results will be submitted soon
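The phoneme-attribute problem can be illustrated with a small consistency check. If two phonemes are given exactly the same attributes, no attribute-based question in a decision tree can separate them, so they end up grouped together. The sketch below flags such cases; the attribute table is hypothetical and is not the one used in the system, and the function name is an assumption.

```python
from collections import defaultdict


def indistinguishable_groups(attributes):
    """Return groups of phonemes that share an identical attribute set."""
    groups = defaultdict(list)
    for phoneme, attrs in attributes.items():
        groups[frozenset(attrs)].append(phoneme)
    return [sorted(group) for group in groups.values() if len(group) > 1]


# Hypothetical table: the voicing attribute is missing, so "p" and "b" collide.
table = {
    "p": {"consonant", "plosive"},
    "b": {"consonant", "plosive"},
    "a": {"vowel", "open"},
}
print(indistinguishable_groups(table))
```

Running a check of this kind on the attribute file before training is one inexpensive way to catch unintended groupings.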