CALO Decoder Progress Report for June Arthur (Decoder, Trainer, ICSI Training) Yitao (Live-mode Decoder) Ziad (ICSI Training) Carnegie Mellon University July 6, 2004
This Presentation • Progress report for June (15 pages) • Review and Highlight (2 pages) • ICSI AM training (4 pages) • Infrastructure (2 pages) • Decoder (8 pages) • Summary and Outlook (1 page) • Review of Q2 2004 • Live-mode APIs not completed • Sphinx not yet tested on any task with vocabulary > 2k • ICSI training just started
June Highlights • They are completed! • (to some extent) • Live-mode API prototype is completed • A demo is built. • Sphinx 3.4 went through the WSJ 5k task successfully • Without pruning • First two phases of ICSI training are completed
ICSI Training - Grand Plan • By Ziad and ArthurC • Transcript conversion is completed • 4 Phases • Phase I – Replication of Rita’s training • Phase II – Fixing Resources • Use corrected train/test/dev sets • Fixed transcriptions and dictionary • Phase III – Tuning • Training: On topology/#senones/#mix • Recognition: Parameter tuning • Phase IV – Further Improvement • Use SCHMM to generate trees? • Automatic question generation? • Others?
ICSI Training - Current Status • Phase I completed • Within 0.5% difference from Rita’s results • Tested on transcriber’s meeting • 47.3% WERR. (45.2% WERR when equivalence pairs were considered) • Phase II completed • In the development set and testing set • Results varied from 47% to 29% • Clipped speech deletion found to be ineffective.
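For reference, word error rate is conventionally the word-level edit distance between the reference and the hypothesis, divided by the number of reference words. The Python sketch below illustrates how scoring with and without equivalence pairs can differ. The `EQUIVALENT` table and its entries are hypothetical examples of what an equivalence pair might look like; they are not the list used in these experiments, and this is not the scoring tool that produced the numbers above.

```python
# Hypothetical equivalence pairs: spellings that are scored as the same word.
EQUIVALENT = {"ok": "okay", "uh-huh": "uhhuh"}


def normalize(words, equivalent=None):
    """Map each word to its canonical form when an equivalence table is given."""
    if not equivalent:
        return list(words)
    return [equivalent.get(w, w) for w in words]


def word_errors(ref, hyp):
    """Minimum number of substitutions, insertions and deletions (Levenshtein)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,              # deletion of a reference word
                           cur[j - 1] + 1,           # insertion of a hypothesis word
                           prev[j - 1] + (r != h)))  # substitution, or match at no cost
        prev = cur
    return prev[-1]


def wer(ref, hyp, equivalent=None):
    """Word error rate in percent, relative to the reference length."""
    ref_words = normalize(ref.split(), equivalent)
    hyp_words = normalize(hyp.split(), equivalent)
    if not ref_words:
        raise ValueError("reference transcript is empty")
    return 100.0 * word_errors(ref_words, hyp_words) / len(ref_words)


reference = "okay so we start"
hypothesis = "ok so we start now"
strict = wer(reference, hypothesis)               # "ok" counts as a substitution
lenient = wer(reference, hypothesis, EQUIVALENT)  # "ok" is treated as "okay"
```

With the toy sentences above, the lenient score is lower than the strict one because the "ok"/"okay" mismatch is no longer counted, which is the same direction as the 47.3% versus 45.2% difference reported on the slide.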
ICSI Training - Before we go to Phase III • From the last two phases • We have some results that look good. • BUT, results vary with meeting conditions • # of speakers? • Speaker speaking rate entropy? • Cross talk? • Understanding is more important than typing! • Plan for next month • Understand why recognition results vary • Complete Phase III and IV with current test sets. • Obtain standard test set from NIST
Infrastructure (2 pages) - Workshops and Presentations • 2 CVS Workshops • Had great discussions in the workshops • Slides can be found at ArthurC’s web page • Will re-do them in the new semester. • 2 Speech Developer’s meetings • Next meeting this Thursday: • “From main() to GMM computation.”
Infrastructure - CVS • What is in CVS? • MRCP source code (v1 and v2) • Standard training scripts: • ICSI Conversion Scripts • Communicator Training Scripts • Guaranteed to give you 100% Satisfaction and 12% WERR. • WSJ 5k Training Scripts • Guaranteed to give you 100% Satisfaction and 8% WERR. • Outlook • Need to migrate to other machines. • Next: ICSI training scripts (P1 to P4) • Communicator/WSJ testing scripts.
Decoder work (7 pages) - Interface • By Yitao (he didn’t even get hurt!) • Prototype of the Sphinx 2-like APIs is completed, functions completed • Initialization • A demo is also built. • Will be officially included in Sphinx 3.5. • Latest code already available in CVS • Plan for July • Let the APIs go through their ultimate challenge: being used in an application. • Enable logging of the recognizer
Decoder work - Speed • With big help from Evandro • WSJ 5k task evaluation completed • NVP, perplexity ~= 90 • Tested under a 2G machine • None of the results are tuned (very wide beam width, no fast GMM computation) • S3 (s3flat): WERR 6.5%, Speed 2.7xRT • S3.4 (s3fast): WERR 6.65%, Speed 0.94xRT • Conclusion: WSJ 5k task is not our challenge. • Plan for July -> It is time to try a 20k task. (ICSI or WSJ 20k)
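The "xRT" figures above are real-time factors: decoding time divided by the duration of the audio, so a value below 1 means faster than real time. A small Python sketch of the arithmetic, using only the numbers on the slide (the function and variable names are illustrative, not part of Sphinx):

```python
def real_time_factor(decode_seconds, audio_seconds):
    """Decoding time divided by audio duration (the 'xRT' figure)."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return decode_seconds / audio_seconds


def relative_change(baseline, new):
    """Change from baseline to new, as a percentage of the baseline."""
    return 100.0 * (new - baseline) / baseline


# Figures reported on the slide.
s3flat = {"wer": 6.5, "xrt": 2.7}
s3fast = {"wer": 6.65, "xrt": 0.94}

# How many times faster s3fast ran than s3flat on this task.
speedup = s3flat["xrt"] / s3fast["xrt"]

# Relative increase in error rate paid for that speedup, in percent.
wer_increase = relative_change(s3flat["wer"], s3fast["wer"])
```

On these figures, s3fast is roughly 2.9 times faster than s3flat, at a relative error increase of a little over 2%.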
SphinxTrain work • In the current Baum-Welch trainer of SphinxTrain (v0.92) • Silence is not optionally deleted in Baum-Welch • Multiple pronunciations are not allowed in Baum-Welch • We rely on forced alignment to get the correct alignment
SphinxTrain 0.93 progress • Silence Modeling • Optional silence deletion is now allowed • Progress: Completed • Multiple Pronunciations • To be allowed in Baum-Welch • Progress: nearly completed (need 2-3 days) • Correct Triphone Expansion • May not have time to finish it in Q3. • Plan for July • Enable multiple pronunciations in Baum-Welch • Legacy is a problem! (We could fix Sphinx 4 Trainer instead.)
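To illustrate what optional silence and multiple pronunciations mean for training, the Python sketch below enumerates every phone sequence a transcript would permit under those two relaxations. It is an illustration only, not SphinxTrain code: a real Baum-Welch trainer would represent the alternatives as a graph and sum over its paths rather than list them, and the `LEXICON` entries, the `SIL` symbol, and the restriction to silence between words are all assumptions made for this example.

```python
from itertools import product

# Hypothetical toy dictionary: each word maps to one or more pronunciations.
LEXICON = {
    "the": [["DH", "AH"], ["DH", "IY"]],
    "cat": [["K", "AE", "T"]],
}
SIL = "SIL"  # assumed name for the silence phone


def allowed_phone_sequences(words, lexicon=LEXICON):
    """Yield every phone sequence permitted for the transcript.

    Each word may use any of its listed pronunciations, and silence may be
    present or absent between consecutive words.
    """
    pron_choices = [lexicon[w] for w in words]
    gap_choices = [([], [SIL])] * max(len(words) - 1, 0)
    for prons in product(*pron_choices):
        for gaps in product(*gap_choices):
            seq = list(prons[0]) if prons else []
            for gap, pron in zip(gaps, prons[1:]):
                seq.extend(gap)   # empty list: silence deleted at this boundary
                seq.extend(pron)
            yield seq


sequences = list(allowed_phone_sequences(["the", "cat"]))
```

For the two-word toy transcript this gives four alternatives (two pronunciations of "the", times silence present or absent). Without these relaxations the trainer sees only one fixed sequence, which is why a forced-alignment pass is needed to pick the right one beforehand.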
Decoder work - Adaptation • Mainly code-tracing in this part • Situation: • Two versions of MLLR adaptation (Sam Joo’s and SphinxTrain’s) • Some code needs to be refined before we expose it • S3flat has MLLR but S3fast does not • Plan for this month • After finishing the trainer work, we will tackle it.
Decoder work – Packaging and Distribution • Official Web page: • cmusphinx.sourceforge.net/ • Release Process • 1. Set n = 1 • 2. Loop • Distribute Release Candidate n • See whether anyone yells within one week (calm-down period) • If yes, n = n + 1, loop again. • If no, break • 3. Copy the RC to Sourceforge’s standard distribution web site. • Current status: • People yelled about RC II during the calm-down period (Yitao fixed the problems) • Create RC III this week.
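The release process above can be sketched in a few lines of Python. The callback `yelled_during_calm_down` and the `max_candidates` bound are hypothetical additions for illustration; the slide itself does not cap the number of release candidates.

```python
def run_release(yelled_during_calm_down, max_candidates=10):
    """Return the number of the release candidate that ends up published.

    yelled_during_calm_down(n) is a hypothetical callback that returns True
    if anyone reported a problem with release candidate n during the
    one-week calm-down period.
    """
    n = 1                                    # step 1: set n = 1
    while n <= max_candidates:               # step 2: loop (bounded here as a safeguard)
        # Release candidate n is distributed, then we wait one week.
        if not yelled_during_calm_down(n):
            return n                         # step 3: this RC goes to the distribution site
        n += 1                               # someone yelled: fix, then try the next RC
    raise RuntimeError("no release candidate survived the calm-down period")


# Example with made-up complaints: problems are reported for candidates 1 and 2.
complaints = {1, 2}
published = run_release(lambda n: n in complaints)
```

In this made-up example the third candidate is the one published; if nobody complains about the first candidate, it is published directly.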
Decoder work -Miscellaneous • Continuous HMM for Communicator model is also completed. • Ready for combination (Do we want to?) • Possibly we want to combine ICSI model and CMU model. • Training script is still a big headache for use • Still have no time to fix it.
Decoder work – Documentation (aka sphinxDoc) • Only makes progress when • ArthurC procrastinates and doesn’t want to read or play video games • Draft I of Chapters I and II is completed. • Chapter I: License Agreement and user responsibility • Chapter II: • Speech recognition for dummies • History of speech recognition • History of Sphinx • Versions of Sphinx (when to use what)
Summary and Outlook • We have done something in June • We had better do more in the next 3 months. • Priorities – We have to deal with the “CALO Grand Challenge” • Recorder/Classifier/Recognizer Integration • Improvement of Acoustic/Language Modeling • Speaker Adaptation • Uncompleted tasks always stay on the list and will pop up at the right time.