CALO Recorder/Decoder Progress Report for Summer 2004 (July and August) • Yitao Sun (Recorder/Decoder) • Jason Cohen (Recorder/End-pointer) • Thomas Quisel (Recorder) • Ziad Al Bawab (Recorder/End-pointer) • Rong Zhang (ICSI Training) • Arthur Chan (Recorder/End-pointer/Decoder/Trainer) • Carnegie Mellon University • Aug 30, 2004
Summer Highlight • This presentation (15 pages) • Review of June and Highlight (2 pages) • Recorder (3 pages) • Decoder (6 pages) • ICSI Training (1 page) • Trainer (1 page) • Documentation (1 page) • Conclusion (1 page)
Review of June • Three goals we set in June • 1. Recorder/Classifier/Decoder Integration • 2. Further Improvement of ICSI Training • 3. Speaker Adaptation • Summer highlight • We solved 2 (1 + 0.5 + 0.5) of the 3 problems • Plus more
Problems we faced in the Summer • Summer is a nice season • Many of us were on vacation or away • Alex: went to Spain for the last three weeks of July • Jason: left and went to Texas • ThomasQ, Yash, Moss: internships in other states • Ziad: back in Lebanon from Aug 1 – Aug 21 • Mock: back in Thailand from Aug 1 – Aug 15 • (Evandro): on vacation from Aug 1 – Aug 15 • Arthur: broke down from Aug 12 – Aug 22 • Lack of manpower was a big problem.
Recorder (Integration) • Ziad/Yitao/Arthur • Recorder + Classifier + Decoder • Code integration is completed • Classifier and end-pointer are now modularized and incorporated into the CALO Recorder. • “FSM” of the end-pointer is now implemented • Classifier + Decoder had a hard time • Trapped by a feature mismatch • Now fixed. • Yitao also separated the classifier and decoder into separate threads. • Outlook: Before code check-in, we may need to fix speed problems • (Our weakness) The 3 components are closely coupled
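The slide does not describe the states of the end-pointer “FSM”. As an illustration only, the sketch below shows one common way to build such a state machine on top of per-frame speech/non-speech decisions from a classifier. The state names, the `endpoint` function, and the `start_frames` and `end_frames` thresholds are all hypothetical and are not taken from the CALO Recorder code.

```python
from enum import Enum


class State(Enum):
    SILENCE = 0        # no utterance in progress
    MAYBE_SPEECH = 1   # speech frames seen, but not enough to commit yet
    SPEECH = 2         # inside a confirmed utterance
    MAYBE_SILENCE = 3  # non-speech frames seen, waiting to confirm the end


def endpoint(labels, start_frames=3, end_frames=5):
    """Turn per-frame speech flags into (start, end) segments, end exclusive.

    Hypothetical thresholds: an utterance starts after `start_frames`
    consecutive speech frames and ends after `end_frames` consecutive
    non-speech frames.
    """
    state = State.SILENCE
    segments = []
    start = 0  # first frame of the current (candidate) utterance
    end = 0    # first non-speech frame after the utterance
    run = 0    # length of the run being counted in a MAYBE_* state
    for i, is_speech in enumerate(labels):
        if state is State.SILENCE:
            if is_speech:
                start, run = i, 1
                state = State.SPEECH if run >= start_frames else State.MAYBE_SPEECH
        elif state is State.MAYBE_SPEECH:
            if is_speech:
                run += 1
                if run >= start_frames:
                    state = State.SPEECH
            else:
                state = State.SILENCE  # burst too short: discard it
        elif state is State.SPEECH:
            if not is_speech:
                end, run = i, 1
                if run >= end_frames:
                    segments.append((start, end))
                    state = State.SILENCE
                else:
                    state = State.MAYBE_SILENCE
        else:  # State.MAYBE_SILENCE
            if is_speech:
                state = State.SPEECH  # short pause inside the utterance
            else:
                run += 1
                if run >= end_frames:
                    segments.append((start, end))
                    state = State.SILENCE
    # Flush an utterance that is still open when the input ends.
    if state is State.SPEECH:
        segments.append((start, len(labels)))
    elif state is State.MAYBE_SILENCE:
        segments.append((start, end))
    return segments


frames = [0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0]
print(endpoint(frames))
```

In this sketch the one-frame pause at frame 6 stays inside the utterance, and the isolated speech frame at index 14 is dropped as too short. Keeping the thresholds as parameters is one way to loosen the coupling between classifier and end-pointer that the slide lists as a weakness.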
Recorder (Portability) • By Jason/Arthur • We are not yet “CP” • On Windows, Cygwin, Linux and Mac OS X, our codebase in CVS • compiles • links • It now works on the following platforms: • Windows - Fully functioning, with extra functions specific to Windows • Cygwin - Small problems in the GUI, NTP works now • Mac OS X - Fully functioning, just need to fix some memory leaks and invalid memory reads/writes • On Linux • The AD97 chipset still confuses the PortAudio library
Recorder Outlook in Q4 • What should we do? • Linux: Focus on the Linux port • Fix the PortAudio problem • Fix the offline classifier • Barely able to support more feature requests without Thomas. • We need to implement a switch for processing routines. • Reduce the gap between release and development • After we fix the portability problem, it’s time to move to SRI’s CVS. • Memory management can be an issue • Need to scan the code using memory checking tools
Decoder (Live Mode APIs) • More robust than in June • Fixed a couple of memory problems • Now going through an in-depth code review • Documented and commented • An advantage for our partners.
Decoder (Speed) • We finally have an s3.x setup for ICSI • A quick hack without careful tuning • 0.6xRT on a 2G machine with a relative 20% degradation (from 69% -> 63%) • Outlook: speed becomes an important Q4 goal again
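The slide does not say which metric the 69% and 63% figures refer to. One reading that makes the numbers consistent, and it is only an assumption, is that they are word accuracies, so that the "relative 20%" is the relative increase in word error rate (31% to 37%). Read as a relative drop in accuracy, the same figures would give only about 9%. The short sketch below works through that arithmetic, together with the usual definition of the real-time factor (xRT) as decoding time divided by audio duration. Both function names are hypothetical.

```python
def relative_wer_increase(acc_before, acc_after):
    """Relative increase in word error rate, given word accuracies in percent.

    Assumes WER = 100 - accuracy, which is an interpretation of the slide,
    not something the slide states.
    """
    wer_before = 100.0 - acc_before
    wer_after = 100.0 - acc_after
    return (wer_after - wer_before) / wer_before


def real_time_factor(decode_seconds, audio_seconds):
    """xRT: seconds of compute spent per second of audio decoded."""
    return decode_seconds / audio_seconds


# 69% -> 63% accuracy means 31% -> 37% WER, a relative increase of 6/31.
print(f"relative WER increase: {relative_wer_increase(69.0, 63.0):.1%}")
# Relative drop in accuracy for the same figures, for comparison: 6/69.
print(f"relative accuracy drop: {(69.0 - 63.0) / 69.0:.1%}")
# Illustrative numbers only: 36 s of compute for 60 s of audio is 0.6xRT.
print(f"xRT: {real_time_factor(36.0, 60.0):.1f}")
```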
Decoder (Speaker Adaptation) • Single regression class MLLR is now fully supported • Produces exactly the same results as Sam-Joo’s package • Lack of test cases for now • Outlook: In Q4, we need to • Test the current package with more test cases. • If time allows, enable multiple regression classes and MAP.
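For readers unfamiliar with the term: in mean-only MLLR with a single regression class, one affine transform (a matrix A and a bias vector b) is shared by every Gaussian in the acoustic model, and each mean mu is replaced by A*mu + b. The sketch below shows only that application step on plain Python lists. It does not estimate the transform from adaptation data, and the `apply_mllr` name and the data layout are hypothetical rather than the Sphinx or SphinxTrain interface.

```python
def apply_mllr(means, A, b):
    """Apply one shared affine transform mu' = A*mu + b to every Gaussian mean.

    means: list of mean vectors (one per Gaussian), each a list of floats
    A:     square matrix as a list of rows
    b:     bias vector
    With a single regression class, the same (A, b) is used for all means.
    """
    adapted = []
    for mu in means:
        adapted.append([
            sum(a_ij * m_j for a_ij, m_j in zip(row, mu)) + b_i
            for row, b_i in zip(A, b)
        ])
    return adapted


# Toy 2-dimensional example with made-up numbers.
means = [[1.0, 2.0], [0.0, -1.0]]
A = [[1.0, 0.5],
     [0.0, 2.0]]
b = [0.5, -1.0]
print(apply_mllr(means, A, b))
```

Multiple regression classes, listed as a Q4 item above, would amount to keeping one (A, b) pair per class and choosing the pair by the class of each Gaussian.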
Decoder (s3.0/s3.x code merging) • align, astar, allphone, dag, decode-anytopo are now in the s3.5 codebase • Thanks to Carl Quillen • Merging is 80% complete • Code compiles, links and runs. • align and allphone are fixed. Small differences remain because of small differences between s3.0 and s3.x • astar/dag/decode-anytopo in progress. • 12k lines of code are saved • from s3.0 + s3.2 (63k) to s3.5 (51k) • Only a slight increase in the package size • 0.3 M to 0.5 M
Decoder (s3.0/s3.x code merging) (cont.) • As a consequence of merging, it will be possible to use s3.x to • Generate alignments • Generate N-best lists • Do phoneme recognition • Search for the best path in the lattice • Do flat lexicon search. • An interface for reading N-best lists is also available, but not exposed yet. • Outlook: More code merging activities will happen in the next two quarters.
Decoder (Release) • We need to provide our partners a recognizer • With state-of-the-art technology • High performance • Sphinx 3.5 will be released at the beginning of September • Still need to • Write two more chapters of documentation • Polish the live-mode APIs • Do some small code clean-ups • Will also announce a corresponding tag for SphinxTrain. • A simultaneous release of s3.5 + ST
ICSI Training (Phase III) • By Rong • Phases I and II were completed in May and June. • Now in Phase III: Tuning • We have already tuned parameters such as the number of senones and the number of mixtures. • Ziad and Arthur were too busy in the Summer • Outlook: an area which was under-worked in the Summer. Need to do more in Q4.
Trainer (Clean-up) • Unification of the front-end • Sphinx 2/Sphinx 3/SphinxTrain • Thanks to Evandro • No need to worry about code-level mismatch • Unification of the command-line interface • 36 out of 37 tools now have a standard command-line interface. • All support the options -example and -help • Appendix A.2 of Hieroglyph • A comprehensive, formatted 94-page document can now be found on-line
Documentation • Project Hieroglyph • An effort to build a comprehensive set of documentation • on using Sphinx, SphinxTrain and the CMU LM Toolkit to build speech applications • In the Summer • 1st draft of “Speaker Adaptation” (Chapter 9) is completed • 1st draft of “SphinxTrain command line reference” (Chapter A.2) is completed. • 2nd draft of “License of Sphinx” is completed. • All can be found at • www.cs.cmu.edu/~archan/sphinxDoc.html
Conclusion • We have done something in the Summer • But with great pain • We need to put more emphasis on some weak areas in Q4 • Outlook for September and Q4 • September: ICASSP 2005 and ICSLP 2004 preparation • October: Polish Speaker Adaptation • November: Complete dynamic LM addition/deletion • December: Search refinement, further speed-up.