Technical Aspects of the CALO Recorder • By Satanjeev Banerjee, Thomas Quisel, Jason Cohen, Arthur Chan, Yitao Sun, David Huggins-Daines, Alex Rudnicky
Role of the CALO Recorder • A centralized mechanism to collect all perceptual events • Speech, text • CMU provides technology for • Event recording • Speech recognition
Role of the CALO Recorder • One of the components of CAMPER • The four: • CALO recorder • Speechalizer • End-pointing Information • Prosodic Information • Speech Recognition • CAMSeg • Speech Segmentation • Understanding
An Architecture Diagram (Client Side) • [Diagram; labeled components: Audio Capturing, Text Capturing through Keyboard, Other Events, Ring Buffers, End-Pointer, VU Meter, Speech Decoder, Storage]
Persistence of Data • Background Intelligent Transfer Service (BITS) • Used to transfer data off-line
Technical Challenges in the Recorder • Threading • Audio Buffering • Time-synchronization • Real-time processing • End-pointing • Speech processing • Portability • Maintenance/Distribution
Threading • Several kinds of processing need to run concurrently • VU meter • Speech processing and higher-level understanding • Graphical user interface • Much development time was invested in getting the communication between threads right. • (By Thomas Quisel) See the architecture diagram on the next slides • Example issue: on some platforms, the WX implementation prevents other threads from calling the GUI thread's drawing functions.
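One common way around the GUI-thread restriction mentioned above is to let worker threads post events to a queue that only the GUI thread drains. The sketch below illustrates that pattern in Python using the standard library; it is not the recorder's actual code, and the names `worker`, `run_gui_loop`, and `demo` are hypothetical.

```python
import queue
import threading


def worker(name, values, events):
    """Background thread: computes values but never draws; it only posts events."""
    for value in values:
        events.put((name, value))
    events.put((name, None))  # sentinel: this worker is finished


def run_gui_loop(events, n_workers):
    """Stand-in for the GUI thread: the only place where 'drawing' happens."""
    drawn = []
    finished = 0
    while finished < n_workers:
        name, value = events.get()  # blocks until some worker posts an event
        if value is None:
            finished += 1
        else:
            drawn.append((name, value))  # a real GUI would repaint here
    return drawn


def demo():
    events = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=("vu_meter", [0.1, 0.5, 0.9], events)),
        threading.Thread(target=worker, args=("decoder", ["hello"], events)),
    ]
    for t in threads:
        t.start()
    drawn = run_gui_loop(events, n_workers=len(threads))
    for t in threads:
        t.join()
    return drawn
```

Because every drawing call happens inside `run_gui_loop`, the workers never touch GUI state directly, and events from a single worker arrive in the order they were posted.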
Audio Buffering • Sphinx 2 and 3.X libaudio require the application to • Capture audio • Process the audio buffer • If the processing thread is even slightly slower than 1xRT • Audio will be lost • (By Jason Cohen) A ring buffer structure is implemented.
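A ring buffer decouples the capture thread from the processing thread, so short processing stalls do not immediately lose audio. The following is a minimal sketch of the idea, not the recorder's implementation; the class name `RingBuffer`, the lock-based design, and the overwrite-oldest overflow policy are all assumptions made for illustration.

```python
import threading


class RingBuffer:
    """Fixed-capacity FIFO of audio frames shared by a capture and a processing thread."""

    def __init__(self, capacity):
        self._slots = [None] * capacity
        self._capacity = capacity
        self._head = 0      # index of the oldest unread frame
        self._count = 0     # number of unread frames
        self._lock = threading.Lock()
        self.dropped = 0    # frames overwritten before they were read

    def write(self, frame):
        """Called by the capture thread; never blocks."""
        with self._lock:
            tail = (self._head + self._count) % self._capacity
            self._slots[tail] = frame
            if self._count == self._capacity:
                # Buffer was full: the oldest frame has just been overwritten.
                self._head = (self._head + 1) % self._capacity
                self.dropped += 1
            else:
                self._count += 1

    def read(self):
        """Called by the processing thread; returns None when no frame is waiting."""
        with self._lock:
            if self._count == 0:
                return None
            frame = self._slots[self._head]
            self._head = (self._head + 1) % self._capacity
            self._count -= 1
            return frame
```

With this policy the `dropped` counter makes audio loss visible: it stays at zero as long as the processing thread catches up before the buffer fills.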
Time Synchronization • By David Huggins-Daines • Simple NTP (SNTP) is used to get Coordinated Universal Time (UTC) from an arbitrary NTP server • Clone of the standard NTP implementation • Internal synchronization • Synchronization time between machines • 50-60 ms • The major challenge is the delay imposed by the OS and the audio capturing software.
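For reference, the standard SNTP clock-offset calculation uses four timestamps from one request/response exchange. The slides do not show the recorder's own code, so the sketch below only illustrates the textbook formula; the function name and the example numbers are hypothetical.

```python
def sntp_offset_and_delay(t0, t1, t2, t3):
    """Standard SNTP arithmetic, all times in seconds.

    t0: client clock when the request was sent
    t1: server clock when the request was received
    t2: server clock when the reply was sent
    t3: client clock when the reply was received
    """
    # Estimated amount to add to the client clock to match the server clock.
    offset = ((t1 - t0) + (t2 - t3)) / 2.0
    # Round-trip network delay, excluding the server's processing time.
    delay = (t3 - t0) - (t2 - t1)
    return offset, delay


# Hypothetical exchange: the server clock runs 50 ms ahead of the client,
# each network leg takes 10 ms, and the server spends 2 ms on the request.
offset, delay = sntp_offset_and_delay(100.000, 100.060, 100.062, 100.022)
```

The offset estimate assumes the two network legs take equal time; any asymmetry, including delay added by the OS or the audio capture path, shows up directly as synchronization error.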
Real-time Processing • Role of end-pointing and recognition • After a long debate • A two-stage end-pointing and recognition architecture was chosen • By Ziad • A high-performance end-pointing routine was created • Gaussian mixture model (GMM)-based • End-pointer implemented as a frame voter within segments • The parameters were further tuned manually. • Speed-optimized. • Now in s3ep, a customized version of Sphinx
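The slide gives no detail on how the GMM-based frame voter works, so the following is only one plausible reading: each frame is labeled speech or non-speech by comparing its likelihood under two GMMs, and each fixed-length segment takes the majority label of its frames. The one-dimensional feature, the segment length, and all function names are assumptions, not the s3ep design.

```python
import math


def gmm_log_likelihood(x, components):
    """Log-likelihood of scalar x under a 1-D GMM given as (weight, mean, variance) tuples."""
    total = 0.0
    for weight, mean, var in components:
        density = math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)
        total += weight * density
    return math.log(total) if total > 0.0 else float("-inf")


def classify_frames(features, speech_gmm, silence_gmm):
    """Label each frame True (speech) when the speech GMM explains it better."""
    return [
        gmm_log_likelihood(x, speech_gmm) > gmm_log_likelihood(x, silence_gmm)
        for x in features
    ]


def vote_segments(frame_is_speech, segment_len):
    """Majority vote over consecutive fixed-length segments of frame labels."""
    labels = []
    for start in range(0, len(frame_is_speech), segment_len):
        segment = frame_is_speech[start:start + segment_len]
        labels.append(sum(segment) * 2 > len(segment))
    return labels
```

Voting over a segment smooths out isolated misclassified frames, which is one reason an end-pointer might vote rather than act on single-frame decisions.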
Speech Recognizer • The resulting output is fed to the recognizer • Speech recognition in meetings • Regarded as one of the biggest challenges in the field • Results vary widely with meeting style, number of attendees, topics, and speaker disfluencies.
Accuracy Performance (Still Under Heavy Work): Currently • In the cleanest meeting (Bdb001) • With one very dominant male speaker • With one very dominant female speaker • Speaker speaking-rate entropy is lowest • Error rate 29.4%
Phase IV of Accuracy Improvement (Core) • Boosting-based training • Confidence-based N-best re-ranking • Speaker adaptation based on transformation • Speaker normalization • Include BN, SWB material in LM training • Dictionary refinement
Phase IV of Accuracy Improvement (Optional) • STC • MLLT • DT • PLP, TRAP • LM with disfluencies and back-channeling
Speed • 2.2 GHz machine • Communicator • S2: 17.3%, 0.34xRT • S3.X BL: 11.8%, 4xRT • S3.X Tuned: 12.8%, 0.87xRT • WSJ 5k • S3.X BL: 7.4%, 1.61xRT • S3.X BL: 8.3%, 0.5xRT • ICSI • With SVQ and CIGMMS tuning, 0.7xRT is achieved. • We may be able to tune the results further. • Benchmarking results need time to be prepared
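The "xRT" figures above are real-time factors: processing time divided by audio duration, so anything below 1xRT keeps up with live audio. A small sketch of that arithmetic follows; the function names and the example durations are hypothetical and are not measurements from the benchmarks.

```python
def real_time_factor(processing_seconds, audio_seconds):
    """Real-time factor (xRT): time spent decoding per second of audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def keeps_up_with_live_audio(rtf):
    """A decoder slower than 1xRT falls further behind the longer it runs."""
    return rtf < 1.0


# Hypothetical: decoding 60 s of audio in 240 s is 4xRT, too slow for live use.
slow = real_time_factor(240.0, 60.0)
# Hypothetical: decoding 60 s of audio in 30 s is 0.5xRT.
fast = real_time_factor(30.0, 60.0)
```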
Maintenance and Distribution • All in local CVS • C, Java • Will soon move to SRI • Regular releases are created; use of SRI’s CVS will blur this line.
Conclusion • Engineering work on the recorder is mostly done • Time to improve individual components. • Everyone is welcome to join the effort.