Sphinx 3.4 Development Progress Report in February

Sphinx 3.4 Development Progress Report in February. Arthur Chan, Jahanzeb Sherwani Carnegie Mellon University Mar 1, 2004. This Presentation. S3.4 Development Progress Speed-up Language Model facilities CALO and S3.5 Development Which features should be there to make CALO better?


Presentation Transcript


  1. Sphinx 3.4 Development Progress Report in February
     Arthur Chan, Jahanzeb Sherwani
     Carnegie Mellon University
     Mar 1, 2004

  2. This Presentation
     • S3.4 Development Progress
       • Speed-up
       • Language Model facilities
     • CALO and S3.5 Development
       • Which features should be there to make CALO better?
       • Schedule for next three months

  3. Review of Last Month's Progress
     • Last month
       • Wrote a speed-up version of s3.
       • Completed some coding of the s3.4 speed-up task.
     • This month
       • Backbone of the s3.4 speed-up functionality completed and tested.
       • Basic LM facilities completed and smoke-tested.

  4. Current Systems Specifications (without Gaussian Selection)

  5. Speed-up Facilities in s3.3
     • GMM Computation
       • Frame-Level: Not implemented
       • Senone-Level: Not implemented
       • Gaussian-Level: SVQ-based GMM Selection (sub-vector constrained to 3)
       • Component-Level: SVQ code removed
     • Search
       • Lexicon Structure: Tree
       • Pruning: Standard
       • Heuristic Search Speed-up: Not implemented

  6. Speed-up Facilities in s3.4
     • GMM Computation
       • Frame-Level: (New) Naïve Down-Sampling; (New) Conditional Down-Sampling
       • Senone-Level: (New) CI-based GMM Selection
       • Gaussian-Level: (New) VQ-based GMM Selection; (New) Unconstrained no. of sub-vectors in SVQ-based GMM Selection
       • Component-Level: (New) SVQ code enabled
     • Search
       • Lexicon Structure: Tree
       • Pruning: (New) Improved Word-end Pruning
       • Heuristic Search Speed-up: (New) Phoneme-Look-ahead
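The senone-level item above, CI-based GMM selection, can be sketched roughly as follows. This is a minimal illustration, not the s3.4 code. It assumes one diagonal Gaussian per senone instead of a full mixture, and the names `ci_based_selection`, `cd_to_ci`, and `beam` are hypothetical. The assumed idea is that context-independent (CI) senones are scored first, and a context-dependent (CD) senone is scored in full only when its parent CI senone falls within a beam of the best CI score. Otherwise it backs off to the CI score.

```python
import math


def diag_gaussian_loglik(x, mean, var):
    """Log-likelihood of x under one diagonal-covariance Gaussian."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def ci_based_selection(x, ci_models, cd_models, cd_to_ci, beam):
    """Score CD senones, skipping those whose CI parent is outside the beam.

    ci_models / cd_models map a senone name to (mean, var).
    cd_to_ci maps each CD senone name to its parent CI senone name.
    Returns (cd_scores, number_of_cd_senones_fully_computed).
    """
    # Step 1: score every CI senone (there are few of them).
    ci_scores = {
        name: diag_gaussian_loglik(x, mean, var)
        for name, (mean, var) in ci_models.items()
    }
    threshold = max(ci_scores.values()) - beam

    # Step 2: compute a CD senone only if its CI parent survives the beam.
    cd_scores = {}
    computed = 0
    for name, (mean, var) in cd_models.items():
        parent_score = ci_scores[cd_to_ci[name]]
        if parent_score >= threshold:
            cd_scores[name] = diag_gaussian_loglik(x, mean, var)
            computed += 1
        else:
            cd_scores[name] = parent_score  # back off to the CI score
    return cd_scores, computed
```

A wider `beam` computes more CD senones and stays closer to the exact scores. A narrower one saves more computation at the risk of accuracy loss.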

  7. S3.4 Speed Performance in Communicator Task

  8. Issues in Speed Optimization
     • Implementation issues:
       • Beams applied on GMMs make many techniques hard to implement.
       • Some facilities were hardwired for a specific purpose.
     • Performance issues:
       • Each technique reduced computation by 40-50% with <5% degradation.
       • However, the gains did not add up.
       • Computation cannot be reduced below a certain bound (usually a 75%-80% reduction is the maximum).
       • Overhead is huge in some techniques.
         • E.g. VQ-based Gaussian Selection takes 0.25xRT.
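The overhead point above can be made concrete with a little arithmetic. The sketch below is illustrative only. The baseline split of 1.0 xRT for GMM computation and 0.5 xRT for search is a made-up assumption, not a measurement from the slides. Only the 50% reduction and the 0.25 xRT overhead echo figures quoted above.

```python
def real_time_factor(gmm_xrt, search_xrt, gmm_reduction=0.0, overhead_xrt=0.0):
    """Total decoding cost in xRT after a GMM speed-up technique is applied."""
    return gmm_xrt * (1.0 - gmm_reduction) + search_xrt + overhead_xrt


# Hypothetical baseline: 1.0 xRT in GMM computation, 0.5 xRT in search.
baseline = real_time_factor(1.0, 0.5)

# Halve the GMM computation, but pay 0.25 xRT of selection overhead.
with_selection = real_time_factor(1.0, 0.5, gmm_reduction=0.5, overhead_xrt=0.25)

net_saving = 1.0 - with_selection / baseline
print(f"baseline={baseline:.2f} xRT, with selection={with_selection:.2f} xRT, "
      f"net saving={net_saving:.0%}")
```

Under these assumed numbers, a technique that halves GMM computation yields only about a 17% net saving, which is one way a 40-50% reduction in computation can fail to show up as a comparable overall speed-up.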

  9. Language Model Facilities
     • S3.3 only accepts a single LM, without classes, in binary format.
     • So far, S3.4 is able to accept multiple class-based LMs in binary format.
     • One major modification of the code
       • Affects 6-7 files.
     • Caveats:
       • Not a perfect implementation.
       • Text format is not yet supported. Backward compatibility is an issue.
       • Lack of test cases. Only slightly smoke-tested.
       • ~1 more week of work
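As a rough illustration of what "multiple class-based LMs" means, the sketch below uses the standard class-based factorization P(word | history) = P(class | history) × P(word | class) on a toy bigram. It is not the s3.4 data structure or file format. The class `ClassBigramLM`, the `lms` table, the class tag `[city]`, and all probabilities are hypothetical.

```python
import math


class ClassBigramLM:
    """Toy class-based bigram LM storing natural-log probabilities."""

    def __init__(self, class_bigrams, word_classes, floor=-99.0):
        # (prev_token, token) -> log P(token | prev_token); a token is a
        # class tag such as "[city]" or an ordinary word.
        self.class_bigrams = class_bigrams
        # word -> (class_tag, log P(word | class_tag))
        self.word_classes = word_classes
        # Score returned for unseen bigrams (no real back-off in this toy).
        self.floor = floor

    def _token(self, word):
        """Map a word to (token, in-class log-prob); plain words map to themselves."""
        return self.word_classes.get(word, (word, 0.0))

    def logprob(self, prev_word, word):
        """log P(word | prev_word) = log P(class | prev) + log P(word | class)."""
        prev_token, _ = self._token(prev_word)
        token, log_in_class = self._token(word)
        return self.class_bigrams.get((prev_token, token), self.floor) + log_in_class


# Several named LMs held side by side; the decoder would pick one by name.
lms = {
    "travel": ClassBigramLM(
        {("to", "[city]"): math.log(0.5)},
        {"boston": ("[city]", math.log(0.25)),
         "pittsburgh": ("[city]", math.log(0.75))},
    ),
    "generic": ClassBigramLM({("to", "boston"): math.log(0.01)}, {}),
}

active = lms["travel"]
print(active.logprob("to", "boston"))
```

Keeping every LM in one table mirrors the "all loaded into memory" approach mentioned in the slides. Loading on demand would trade memory for switch latency.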

  10. Problems with s3.4 (valid as of Feb 29th, 2004)
      • Only accepts DMP files.
        • The text-format reader is very complex in Sphinx 2.
        • Straight conversion is not clean.
      • LMs are all loaded into memory.
        • We can work on this.
      • Lexical trees are all built at the beginning.
        • We tried to avoid the overhead of rebuilding the tree in every utterance.

  11. Summary of Sphinx 3.4 Development
      • Derivative of s3.3
        • With speed optimization
        • Better LM facilities
      • Algorithmic optimization is 90% completed.
        • Still need to improve overhead performance. Tree-based GMM selection is desirable.
        • Improvement for individual techniques.
      • Got through the major hurdle of multiple LMs and class-based LMs.
        • Need more time to make it more stable.
      • Expected internal release time: March 8, 2004

  12. Sphinx 3.4 and CALO
      • Which pieces are missing?
        • Sphinx 3.4’s decoding is still not streamlined => continuous listening is not yet enabled.
        • Sphinx’s speed may still not be ideal.
          • From s3 to s3.3, ~10% degradation.
        • Sphinx 3.4 doesn’t learn from data yet.

  13. Sphinx 3.5: What should we do in the next 3 months?
      • Expected release time: May – June
      • Interfaces:
        • Streamlined front-end and decoding
        • (?) PortAudio-based audio routine
      • Speed/Accuracy
        • Improved lexical tree search
        • Machine optimization of Gaussian computation
        • Combination of multiple recognizers
      • Learning
        • Acoustic Model adaptation
        • (?) Language Model adaptation
        • (In Phoenix) Better semantic parsing
      • Resource Acquisition and Load Balancing

  14. Highlight I: Speed/Accuracy
      • Improved lexical tree search
        • Current implementation uses a single lexical tree.
        • May be desirable to create tree copies.
      • Machine optimization of Gaussian computation
        • SIMD (Single Instruction, Multiple Data)
        • Requires help from assembly language experts. (Jason/Thomas)

  15. Highlight II: Multiple Recognizer Combination and Resource Acquisition
      • Research by Rong suggests that combining multiple recognizers can improve accuracy.
        • Speed worsens by 100% if we run two recognizers.
      • An interesting solution:
        • Computation can be shared by other machines in the meeting.
        • Inspired by routing implementation.
        • A very natural solution in a meeting scenario because usually only one person will be speaking.
      • Challenges: Bandwidth and Load Balancing

  16. Highlight III: Learning
      • Acoustic Model
        • Maximum Likelihood Linear Regression (MLLR)
        • Jahanzeb will be responsible for this.
      • (?) Language Model
        • How?
        • Cache-based LM?
      • (?) Improved Robust Parsing
        • Better parsing based on previous command history
        • Phoenix’s source code is not easy to trace.
        • Thomas Harris’s implementation may be a good place to start.
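For the acoustic-model item above: MLLR adapts the Gaussian means of an acoustic model with a shared affine transform, mu' = A·mu + b, estimated from adaptation data. The sketch below only shows how an already-estimated transform would be applied to one mean vector. Estimating A and b is omitted, and the function name and toy values are hypothetical rather than taken from any Sphinx code.

```python
def mllr_adapt_mean(mean, A, b):
    """Apply an MLLR mean transform: returns A * mean + b.

    mean: list of floats (one Gaussian mean vector)
    A:    square matrix as a list of rows
    b:    bias vector with the same length as mean
    """
    return [
        sum(a_ij * m_j for a_ij, m_j in zip(row, mean)) + b_i
        for row, b_i in zip(A, b)
    ]


# Toy transform (hypothetical values): scale the first dimension, shift both.
A = [[2.0, 0.0],
     [0.0, 1.0]]
b = [1.0, -1.0]

adapted = mllr_adapt_mean([1.0, 2.0], A, b)
print(adapted)
```

Because one transform is shared by many Gaussians, a small amount of adaptation data can shift the whole model toward a new speaker or environment.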

  17. Arthur and Jahanzeb’s Proposed Schedule

  18. Cont.
