CALO Recorder/Decoder Progress Report for Summer 2004 (July and August)

Yitao Sun (Recorder/Decoder), Jason Cohen (Recorder/End-pointer), Thomas Quisel (Recorder), Ziad Al Bawab (Recorder/End-pointer), Rong Zhang (ICSI Training), Arthur Chan (Recorder/End-pointer/Decoder/Trainer)

Presentation Transcript


  1. CALO Recorder/Decoder Progress Report for Summer 2004 (July and August) • Yitao Sun (Recorder/Decoder) • Jason Cohen (Recorder/End-pointer) • Thomas Quisel (Recorder) • Ziad Al Bawab (Recorder/End-pointer) • Rong Zhang (ICSI Training) • Arthur Chan (Recorder/End-pointer/Decoder/Trainer) • Carnegie Mellon University • Aug 30, 2004

  2. Summer Highlight • This presentation (15 pages) • Review of June and Highlight (2 pages) • Recorder (3 pages) • Decoder (6 pages) • ICSI Training (1 page) • Trainer (1 page) • Documentation (1 page) • Conclusion (1 page)

  3. Review of June • Three goals we set in June • 1. Recorder/Classifier/Decoder Integration • 2. Further Improvement of ICSI Training • 3. Speaker Adaptation • Summer highlight • We solved 2 (1 + 0.5 + 0.5) of the 3 problems • Plus more

  4. Problems we faced in the Summer • Summer is a nice season • Many of us were on vacation or away • Alex : Went to Spain for the last three weeks of July • Jason : Left and went to Texas • ThomasQ, Yash, Moss : Internships in other states • Ziad : Back in Lebanon from Aug 1 – Aug 21 • Mock : Back in Thailand from Aug 1 – Aug 15 • (Evandro) : On vacation from Aug 1 – Aug 15 • Arthur : Broke down from Aug 12 – Aug 22 • Lack of manpower was a big problem.

  5. Recorder (Integration) • Ziad/Yitao/Arthur • Recorder + Classifier + Decoder • Code integration is complete • The classifier and end-pointer are now modularized and incorporated into the CALO Recorder. • The “FSM” of the end-pointer is now implemented • Classifier + Decoder had a hard time • Trapped by a feature mismatch • Now fixed. • Yitao also separated the classifier and decoder into separate threads. • Outlook: Before code check-in, we may need to fix speed-up problems • (Our weakness) The 3 components are closely coupled
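The slide does not describe the end-pointer's states, so the following is only a minimal sketch of what a two-state end-pointer FSM can look like, not the CALO Recorder's implementation. It assumes the classifier emits one speech/non-speech label per frame; the names `State`, `endpoint`, `min_speech` and `min_silence` are hypothetical.

```python
from enum import Enum


class State(Enum):
    SILENCE = 0
    SPEECH = 1


def endpoint(labels, min_speech=3, min_silence=5):
    """Turn per-frame speech labels (truthy = speech) into utterance spans.

    Returns a list of (start, end) frame indices, with end exclusive.
    Hypothetical sketch: thresholds and state names are assumptions.
    """
    segments = []
    state = State.SILENCE
    run = 0       # consecutive frames that contradict the current state
    start = None  # first frame of the utterance being tracked
    for i, is_speech in enumerate(labels):
        if state is State.SILENCE:
            run = run + 1 if is_speech else 0
            if run >= min_speech:          # enough speech: open an utterance
                start = i - run + 1
                state = State.SPEECH
                run = 0
        else:
            run = 0 if is_speech else run + 1
            if run >= min_silence:         # enough silence: close the utterance
                segments.append((start, i - run + 1))
                state = State.SILENCE
                run = 0
    if state is State.SPEECH:              # input ended inside an utterance
        segments.append((start, len(labels) - run))
    return segments
```

Requiring several consecutive frames before switching state keeps short pauses and isolated noisy frames from splitting or starting an utterance.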

  6. Recorder (Portability) • By Jason/Arthur • We are not yet “CP” (cross-platform) • On Windows, Cygwin, Linux and Mac OS X, our codebase in CVS • compiles • links • It now works on the following platforms: • Windows - Fully functioning, with extra functions specific to Windows • Cygwin - Small problems in the GUI; NTP works now • Mac OS X - Fully functioning; just need to fix some memory leaks and invalid memory reads/writes • On Linux • The AD97 chipset still confuses the PortAudio library

  7. Recorder Outlook in Q4 • What should we do? • Linux : Focus on the Linux port • Fix the PortAudio problem • Fix the offline classifier • We are barely able to support more feature requests without Thomas. • We need to implement a switch for processing routines. • Reduce the gap between release and development • After we fix the portability problem, it’s time to move to SRI’s CVS. • Memory management can be an issue • Need to scan the code using memory-checking tools

  8. Decoder (Live Mode APIs) • More robust than in June • Fixed a couple of memory problems • Now going through an in-depth code review • Documented and commented • An advantage for our partners.

  9. Decoder (Speed) • We finally have an s3.x setup for ICSI • A quick hack without careful tuning • 0.6xRT on a 2G machine with a relative 20% degradation (from 69% -> 63%) • Outlook: speed becomes an important Q4 goal again
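The two speed figures can be read as follows. This is a rough sketch under an assumption the slide does not state: that 69% and 63% are word accuracies, so the "relative 20%" is the relative increase in error rate (31% to 37%). The function names are hypothetical.

```python
def real_time_factor(processing_seconds, audio_seconds):
    """xRT: decoding time divided by audio duration (below 1.0 is faster than real time)."""
    return processing_seconds / audio_seconds


def relative_error_increase(accuracy_before, accuracy_after):
    """Relative growth of the error rate, given accuracies in percent.

    Assumption: the slide's percentages are accuracies, so error = 100 - accuracy.
    """
    error_before = 100.0 - accuracy_before
    error_after = 100.0 - accuracy_after
    return (error_after - error_before) / error_before


# 36 s of decoding for 60 s of audio corresponds to 0.6xRT.
print(real_time_factor(36.0, 60.0))
# Error rate grows from 31% to 37%, i.e. about 19%, close to the quoted 20%.
print(round(relative_error_increase(69.0, 63.0) * 100, 1))
```

Under this reading the absolute drop is 6 points, while the relative degradation of roughly 19% is consistent with the slide's rounded figure.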

  10. Decoder (Speaker Adaptation) • Single regression class MLLR is now fully supported • Produces exactly the same result as Sam-Joo’s package • Lack of test cases for now • Outlook: In Q4, we need to • Test the current package with more test cases. • If time allows, enable multiple regression classes and MAP.
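With a single regression class, MLLR mean adaptation applies one shared affine transform to every Gaussian mean: the adapted mean is A·μ + b. The sketch below shows only that application step with plain Python lists. It does not estimate A and b from adaptation data and is not the Sphinx API; `apply_mllr_mean` and `adapt_all` are hypothetical names.

```python
def apply_mllr_mean(mean, A, b):
    """Return the adapted mean A @ mean + b for one Gaussian (plain lists)."""
    return [
        sum(a_ij * m_j for a_ij, m_j in zip(row, mean)) + b_i
        for row, b_i in zip(A, b)
    ]


def adapt_all(means, A, b):
    """Single regression class: the same (A, b) is applied to every mean."""
    return [apply_mllr_mean(m, A, b) for m in means]


# Toy 2-dimensional example with an assumed transform.
A = [[1.0, 0.0],
     [0.0, 2.0]]
b = [0.5, -1.0]
print(adapt_all([[1.0, 2.0], [3.0, 4.0]], A, b))
```

Multiple regression classes, listed in the outlook, would replace the single (A, b) pair with one pair per class of Gaussians.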

  11. Decoder (s3.0/s3.x code merging) • align, astar, allphone, dag, decode-anytopo are now in the s3.5 codebase • Thanks to Carl Quillen • Merging is 80% complete • Code compiled, linked and ran. • align and allphone are fixed. There are still small differences in output because s3.0 and s3.x differ slightly • astar/dag/decode-anytopo in progress. • 12k lines of code are saved • from s3.0 + s3.2 (63k) to s3.5 (51k) • Only a slight increase in the package size • 0.3 M to 0.5 M

  12. Decoder (s3.0/s3.x code merging) (cont.) • As a consequence of merging, it will be possible to use 3.x to • Generate alignments • Generate n-best lists • Do phoneme recognition • Search for the best path in the lattice • Do flat lexicon search. • An interface for reading N-best lists is also available. Not exposed yet. • Outlook : More code merging activities will happen in the next two quarters.

  13. Decoder (Release) • We need to provide our partners a recognizer • With state-of-the-art technology • High performance • Sphinx 3.5 will be released at the beginning of September • Still need to • Write two more chapters of documentation • Polish live-mode APIs • Do some small code clean-ups • Will also announce a corresponding tag for SphinxTrain. • A simultaneous release of s3.5 + ST

  14. ICSI Training (Phase III) • By Rong • Phases I and II were completed in May and June. • Now in Phase III: Tuning • We have already tuned parameters such as the number of senones and the number of mixtures. • Ziad and Arthur were too busy in the Summer • Outlook: an area which was under-worked in the Summer. Need to do more in Q4.

  15. Trainer (Clean-up) • Unification of the front-end • Sphinx 2/Sphinx 3/SphinxTrain • Thanks to Evandro • No need to worry about code-level mismatch • Unification of the command-line interface • 36 out of 37 tools now have a standard command-line interface. • All support the options -example and -help • Appendix A.2 of Hieroglyph • A comprehensive, formatted 94-page document can now be found on-line

  16. Documentation • Project Hieroglyph • An effort to build a comprehensive set of documentation • on using Sphinx, SphinxTrain and the CMU LM Toolkit to build speech applications • In the Summer • 1st draft of “Speaker Adaptation” (Chapter 9) is completed • 1st draft of “SphinxTrain command line reference” (Chapter A.2) is completed. • 2nd draft of “License of Sphinx” is completed. • All can be found at • www.cs.cmu.edu/~archan/sphinxDoc.html

  17. Conclusion • We have done something in the Summer • But with great pain • We need to put more stress on some weak areas in Q4 • Outlook in September and Q4 • September : ICASSP 2005 and ICSLP 2004 preparation • October : Polish Speaker Adaptation • November : Complete dynamic LM addition/deletion • December : Search refinement, further speed-up.
