Sphinx 3.4 Development: Progress Report in February
Arthur Chan, Jahanzeb Sherwani
Carnegie Mellon University
Mar 1, 2004
This Presentation
• S3.4 Development Progress
  • Speed-up
  • Language Model facilities
• CALO and S3.5 Development
  • Which features should be there to make CALO better?
• Schedule for the next three months
Review of Last Month's Progress
• Last month:
  • Wrote a speed-up version of s3.
  • Completed some coding of the s3.4 speed-up task.
• This month:
  • Backbone of the s3.4 speed-up functionality completed and tested.
  • Basic LM facilities completed and smoke-tested.
Speed-up Facilities in s3.3
• GMM Computation:
  • Frame-Level: not implemented
  • Senone-Level: not implemented
  • Gaussian-Level: SVQ-based GMM Selection (sub-vectors constrained to 3)
  • Component-Level: SVQ code removed
• Search:
  • Lexicon Structure: tree
  • Pruning: standard
  • Heuristic Search Speed-up: not implemented
Speed-up Facilities in s3.4
• GMM Computation:
  • Frame-Level: (New) Naïve Down-Sampling; (New) Conditional Down-Sampling
  • Senone-Level: (New) CI-based GMM Selection
  • Gaussian-Level: (New) VQ-based GMM Selection; (New) unconstrained number of sub-vectors in SVQ-based GMM Selection
  • Component-Level: (New) SVQ code enabled
• Search:
  • Lexicon Structure: tree
  • Pruning: (New) Improved Word-end Pruning
  • Heuristic Search Speed-up: (New) Phoneme Look-ahead
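To illustrate the frame-level techniques above: naïve down-sampling computes full GMM scores only on every k-th frame and reuses the last scores in between. This is a minimal sketch under that assumption, not Sphinx source code; `score_frames`, `compute_scores`, and the toy score dictionary are hypothetical names.

```python
def score_frames(frames, compute_scores, rate=2):
    """Return per-frame senone scores, recomputing only every `rate` frames
    (naive down-sampling); skipped frames reuse the previous scores."""
    scores, last = [], None
    for t, frame in enumerate(frames):
        if t % rate == 0 or last is None:
            last = compute_scores(frame)   # expensive GMM evaluation
        scores.append(last)                # reused on skipped frames
    return scores

# Toy stand-in for the GMM evaluator, counting how often it is called.
calls = []
def fake_gmm(frame):
    calls.append(frame)
    return {"senone0": -frame}

out = score_frames([1, 2, 3, 4, 5], fake_gmm, rate=2)
# fake_gmm runs on frames 1, 3, 5 only; frames 2 and 4 reuse scores.
```

With `rate=2` roughly half of the GMM evaluations are skipped, which is the source of the speed/accuracy trade-off the slides describe.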
Issues in Speed Optimization
• Implementation issues:
  • Beams applied to GMM computation make many techniques hard to implement.
  • Some facilities were hardwired for specific purposes.
• Performance issues:
  • Each technique reduces computation by 40-50% with <5% accuracy degradation.
  • However, they don't add up.
  • The reduction in computation has a lower bound (usually 75%-80% reduction is the maximum).
  • Overhead is huge in some techniques, e.g. VQ-based Gaussian Selection takes 0.25 xRT.
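A toy model makes the "they don't add up" point concrete: techniques overlap (each one can only skip work the previous ones left behind), some fraction of Gaussians must always be computed, and per-frame overhead is paid back on top. The function, the 80% floor, and the weights here are illustrative assumptions, not measured Sphinx numbers.

```python
def combined_reduction(reductions, floor=0.80, overhead=0.0):
    """Combine per-technique computation reductions.

    Each technique skips a fraction of the *remaining* work, the total
    skippable fraction is capped at `floor`, and fixed overhead (as a
    fraction of the baseline cost) is added back.
    """
    remaining = 1.0
    for r in reductions:
        remaining *= (1.0 - r)                 # overlap: acts on what is left
    remaining = max(remaining, 1.0 - floor)    # some Gaussians must be computed
    return 1.0 - (remaining + overhead)

# Two techniques that each cut 45% alone cut ~70%, not 90%, together:
r = combined_reduction([0.45, 0.45])
# And 0.10 xRT of selection overhead eats further into the saving:
r_with_overhead = combined_reduction([0.45, 0.45], overhead=0.10)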
Language Model Facilities
• S3.3 accepts only a single LM, without classes, in binary format.
• So far, S3.4 is able to accept multiple class-based LMs in binary format.
• One major modification of the code, affecting 6-7 files.
• Caveats:
  • Not a perfect implementation.
  • Text format is not yet supported; backward compatibility is an issue.
  • Lack of test cases; only lightly smoke-tested.
  • ~1 more week of work.
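For readers unfamiliar with class-based LMs: a word's probability factors into the class n-gram probability times the word's probability within its class. This is a hedged toy sketch of that scoring rule, not the Sphinx 3.4 API; all names and probability values below are made up.

```python
import math

# Toy class-based LM tables (made-up values, bigram history for brevity).
class_of = {"monday": "DAY", "tuesday": "DAY"}
p_class_bigram = {("on", "DAY"): 0.30}             # P(class | history)
p_word_in_class = {"monday": 0.5, "tuesday": 0.5}  # P(word | class)

def class_lm_logprob(history, word):
    """log P(word | history) = log P(class(word) | history)
                             + log P(word | class(word))."""
    c = class_of[word]
    return math.log(p_class_bigram[(history, c)]) + math.log(p_word_in_class[word])

lp = class_lm_logprob("on", "monday")   # log(0.30 * 0.5)
```

The appeal for CALO-style domains is that new words (names, dates) can be added to a class without retraining the class n-gram itself.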
Problems with s3.4 (as of Feb 29th, 2004)
• Accepts only DMP files.
  • The text-format reader in Sphinx 2 is very complex; a straight conversion is not clean.
• LMs are all loaded into memory.
  • We can work on this.
• The lexical tree is built entirely at the beginning.
  • This avoids the overhead of rebuilding the tree for every utterance.
Summary of Sphinx 3.4 Development
• A derivative of s3.3:
  • with speed optimization
  • with better LM facilities
• Algorithmic optimization is 90% complete.
  • Still need to reduce overhead; tree-based GMM selection is desirable.
  • Improvements for individual techniques.
• Got through the major hurdle of multiple LMs and class-based LMs.
  • Need more time to make it more stable.
• Expected internal release: March 8, 2004
Sphinx 3.4 and CALO
• Which pieces are missing?
  • Sphinx 3.4's decoding is still not streamlined, so continuous listening is not yet enabled.
  • Sphinx's speed may still not be ideal: from s3 to s3.3, ~10% degradation.
  • Sphinx 3.4 doesn't learn from data yet.
Sphinx 3.5: What Should We Do in the Next 3 Months?
• Expected release time: May-June
• Interfaces:
  • Streamlined front-end and decoding
  • (?) PortAudio-based audio routines
• Speed/Accuracy:
  • Improved lexical tree search
  • Machine optimization of Gaussian computation
  • Combination of multiple recognizers
• Learning:
  • Acoustic model adaptation
  • (?) Language model adaptation
  • (In Phoenix) Better semantic parsing
• Resource acquisition and load balancing
Highlight I: Speed/Accuracy
• Improved lexical tree search
  • The current implementation uses a single lexical tree.
  • It may be desirable to create tree copies.
• Machine optimization of Gaussian computation
  • SIMD (Single Instruction, Multiple Data)
  • Requires help from assembly-language experts (Jason/Thomas).
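Why SIMD helps here: the diagonal-Gaussian log-likelihood that dominates decoding is a pure multiply-add loop over feature dimensions, exactly the shape SIMD units (e.g. 4 floats per SSE instruction) accelerate. A minimal reference sketch of that computation, not Sphinx source:

```python
import math

def diag_gaussian_logpdf(x, mean, var):
    """log N(x; mean, diag(var)) for one mixture component.

    The loop body is an independent multiply-add per dimension, so it
    vectorizes directly; in an optimized decoder the log term and 1/var
    are precomputed per Gaussian, leaving only (x - mean)^2 * precision."""
    s = 0.0
    for xd, md, vd in zip(x, mean, var):
        s += (xd - md) ** 2 / vd + math.log(2.0 * math.pi * vd)
    return -0.5 * s

# Standard 2-D Gaussian evaluated at its mean: equals -log(2*pi).
lp = diag_gaussian_logpdf([0.0, 0.0], [0.0, 0.0], [1.0, 1.0])
```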
Highlight II: Multiple Recognizer Combination and Resource Acquisition
• Research by Rong suggests that combining multiple recognizers can improve accuracy.
  • Speed worsens by 100% if we run two recognizers.
• An interesting solution:
  • Computation can be shared by other machines in the meeting.
  • Inspired by routing implementations.
  • A very natural solution in the meeting scenario, because usually only one person is speaking.
• Challenges: bandwidth and load balancing
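The slides do not say which combination scheme is intended; the simplest possibility is word-level majority voting over aligned hypotheses (a much-simplified ROVER-style scheme). This sketch assumes the hypotheses are already aligned to equal length, which a real system would have to arrange first.

```python
from collections import Counter

def combine(hypotheses):
    """Majority vote per word position over equal-length word lists
    produced by different recognizers (toy ROVER-style combination)."""
    return [Counter(words).most_common(1)[0][0] for words in zip(*hypotheses)]

combined = combine([
    ["schedule", "the", "meeting"],
    ["schedule", "a", "meeting"],
    ["schedule", "the", "meeting"],
])
# Two of three recognizers agree on "the", so it wins the middle slot.
```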
Highlight III: Learning
• Acoustic model
  • Maximum Likelihood Linear Regression (MLLR)
  • Jahanzeb will be responsible for this.
• (?) Language model
  • How? A cache-based LM?
• (?) Improved robust parsing
  • Better parsing based on previous command history.
  • Phoenix's source code is not easy to trace.
  • Thomas Harris's implementation may be a good place to start.
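To make the "cache-based LM?" idea concrete: a cache LM interpolates the static model's probability with a unigram distribution over recently observed words, so words the user has just spoken become more likely. A minimal sketch; the class name, interpolation weight, and cache size are all illustrative assumptions.

```python
class CacheLM:
    """Static LM interpolated with a unigram cache of recent words."""

    def __init__(self, static_prob, lam=0.9, cache_size=200):
        self.static_prob = static_prob   # callable: P_static(word | history)
        self.lam = lam                   # weight on the static LM
        self.cache = []
        self.cache_size = cache_size

    def observe(self, word):
        """Record a decoded word, keeping only the most recent entries."""
        self.cache.append(word)
        self.cache = self.cache[-self.cache_size:]

    def prob(self, history, word):
        p_cache = self.cache.count(word) / len(self.cache) if self.cache else 0.0
        return self.lam * self.static_prob(history, word) + (1.0 - self.lam) * p_cache

# Toy static LM assigning every word probability 0.01:
lm = CacheLM(lambda h, w: 0.01)
lm.observe("calo")
lm.observe("meeting")
p = lm.prob("the", "calo")   # 0.9 * 0.01 + 0.1 * 0.5
```

A scheme like this adapts between utterances without retraining, which fits the "learn from data" goal for CALO.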