  1. Update on WordWave Fisher Transcription
  Owen Kimball, Chia-lin Kao, Jeff Ma, Rukmini Iyer, Rich Schwartz, John Makhoul

  2. Outline
  • Schedule update
  • Investigating WordWave + auto segmentation quality
  • Updated evaluation method
  • Separating effect of transcripts and segmentation
  • Improved segmentation algorithm
  • Plans
  • Update on using Fisher data in training

  3. Data Schedule
  • BBN has received 925 hours from WordWave (WWave)
  • Processed and released 478 hours via LDC
    • 91 hrs on 8/1/03
    • 300 hrs on 9/24/03
    • 87 hrs on 10/21/03
  • WWave is currently running more slowly than planned
    • Reason: CTS transcription is hard!
  • They will complete 1600 hrs by the end of Jan 04, with the remaining 200 hrs to follow as quickly as possible.

  4. Segmentation Quality as of Sept 03
  • Auto segmentation goal: given audio and a transcript but no timing information, break the audio into fairly short segments and align the correct text to each segment
  • In September, we compared transcription and segmentation approaches on a 20-hour Swbd set:
    • LDC/MSU careful transcription and manual segmentation vs.
    • LDC fast transcription and manual segmentation vs.
    • WWave transcripts + BBN automatic segmentation
  • Compared 2 different segmentation algorithms (a sketch of Alg I follows this slide)
    • Alg I: run the recognizer and segment at "reliable" silences; decode using that segmentation and reject segments based on sclite alignment errors
    • Alg II: use the recognizer to get a coarse initial segmentation; then run forced alignment within the coarse segments to find finer segments; final rejection pass as before
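
A minimal sketch of an Alg I-style pass, assuming recognizer output arrives as (word, start, end) tuples that include silence tokens, and with an error_rate hook standing in for the sclite alignment check; the token name and thresholds are illustrative assumptions, not BBN's actual settings:

MIN_SIL = 0.5   # assumed pause length (sec) counted as a "reliable" silence
MAX_ERR = 0.3   # assumed rejection threshold on per-segment error rate

def segment_at_silences(rec_words, min_sil=MIN_SIL):
    """Cut a conversation side wherever the recognizer found a long silence."""
    segments, current = [], []
    for word, start, end in rec_words:
        if word == "<sil>" and end - start >= min_sil:
            if current:
                segments.append(current)
                current = []
        else:
            current.append((word, start, end))
    if current:
        segments.append(current)
    return segments

def reject_unreliable(segments, ref_texts, error_rate, max_err=MAX_ERR):
    """Rejection pass: drop segments whose decode disagrees too much with
    the transcript, standing in for the sclite-based check."""
    return [seg for seg, ref in zip(segments, ref_texts)
            if error_rate([w for w, _, _ in seg], ref) <= max_err]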

  5. Performance Comparison in Sept
  • Unadapted recognition; acoustic models trained on the 20-hour Swbd1 set, LM trained on full Switchboard
  • ML, GI, VTL, HLDA-trained models

  6. Improving the Evaluation Method
  • A number of issues and shortcuts in the training and test setup clouded the comparisons.
  • We therefore:
    • Adopted an improved training sequence, including new binaries
    • Reduced pruning errors in the decode
    • Converted from fast, approximate VTL warp estimation to a more careful approach (a warp-search sketch follows this slide)
    • Adopted more stable VTL models
      • VTL models trained on 20 hours differed dramatically for small changes in segmentation
      • This is a bug in our VTL model estimation that we need to fix
      • For the following experiments we used stable VTL models from the RT03 eval
    • Switched from our historic LDC+MSU baseline to all-MSU for simplicity
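
As one illustration of careful warp estimation, a standard maximum-likelihood grid search over per-speaker warp factors might look like the sketch below; the search range, step size, and loglik_fn scoring hook are assumptions for illustration, not a description of BBN's actual estimator:

import numpy as np

def estimate_vtl_warp(log_spec, loglik_fn, alphas=None):
    """Return the warp factor maximizing model log-likelihood.

    log_spec:  (frames x bins) log spectrogram for one speaker side
    loglik_fn: scores warped features under a warp-independent model
    """
    if alphas is None:
        alphas = np.arange(0.80, 1.21, 0.02)  # assumed search grid
    bins = np.arange(log_spec.shape[1])
    best_alpha, best_score = 1.0, -np.inf
    for a in alphas:
        # Linear frequency-axis warp: read each frame at f / a.
        warped = np.stack([np.interp(bins / a, bins, f) for f in log_spec])
        score = loglik_fn(warped)
        if score > best_score:
            best_alpha, best_score = a, score
    return best_alpha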

  7. Comparison with Better Train and Test

  8. Separating Effect of Segmentation
  • Compare segmentations using identical (MSU) transcripts
  • With Alg I segmentation, WER is the same for WWave and MSU transcripts
  • Segmentation may be the biggest, or only, problem.

  9. Segmentation Algorithm III
  • Algorithm II used forced alignment within coarse segments provided by an initial recognition pass, but examination revealed unrecoverable errors (words in the wrong segment) caused by the coarse initial segmentation.
  • Tried forced alignment of complete conversation sides instead (a pipeline sketch follows this slide)
  • Overcame initial problems of failed alignments by:
    • Pre-chopping out long silences, where our system tends to get confused
      • Used the auto-segmenter developed for the RT03 CTS eval for this
    • Changing the forced alignment program to do much less pruning at the beginning and end of a conversation
      • This accommodated things like beeps, line noise, and words cut off by recording start and stop
  • Forced alignment is followed by a script that breaks segments at silences, then a rejection pass
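
A high-level sketch of the Alg III pipeline; the segmenter, aligner, decoder, and error_rate objects are placeholder assumptions standing in for the real components, and max_err is an assumed rejection threshold:

def algorithm_iii(audio, transcript, segmenter, aligner, decoder,
                  error_rate, max_err=0.3):
    # 1. Pre-chop long silences, where the system tends to get confused
    #    (here, via an RT03 CTS eval style auto-segmenter).
    chunks = segmenter.remove_long_silences(audio)

    # 2. Force-align the complete conversation side to its transcript,
    #    pruning much less at the start and end so beeps, line noise,
    #    and words cut off by recording start/stop don't derail it.
    alignment = aligner.force_align(chunks, transcript,
                                    relax_edge_pruning=True)

    # 3. Break the alignment into utterance-sized segments at silences.
    segments = segmenter.break_at_silences(alignment)

    # 4. Rejection pass: decode each segment and drop those whose
    #    hypothesis disagrees too much with the aligned text.
    return [seg for seg in segments
            if error_rate(decoder.decode(seg.audio), seg.text) <= max_err]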

  10. Algorithm III with MSU transcripts
  • Manually comparing MSU and Alg III segmentations showed that Alg III:
    • had more, shorter segments
    • had less silence padding around utterances
    • allowed utterances > 15 seconds when the speaker did not pause
  • Modified Alg III to approximate MSU's statistics (a sketch of the modified breaking step follows this slide)
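
One way the modified segment-breaking step could look; the pause, padding, and length parameters below are illustrative assumptions chosen to mimic MSU-like statistics, not the measured values:

MIN_PAUSE = 0.3  # assumed: sec of silence that normally triggers a break
MAX_SEG = 15.0   # assumed: sec; exceeded only when the speaker never pauses
PAD = 0.15       # assumed: sec of silence padding kept around each utterance

def break_alignment(words, min_pause=MIN_PAUSE, max_seg=MAX_SEG, pad=PAD):
    """words: aligned (word, start, end) tuples for one conversation side.
    Returns (seg_start, seg_end, word_list) segments."""
    segments, cur = [], []
    for i, (word, start, end) in enumerate(words):
        cur.append((word, start, end))
        gap = words[i + 1][1] - end if i + 1 < len(words) else float("inf")
        over = end - cur[0][1] >= max_seg
        # Break at a real pause, or at any detectable gap once the segment
        # is over-long; a truly pause-free stretch may still exceed max_seg.
        if gap >= min_pause or (over and gap > 0.0):
            segments.append((max(cur[0][1] - pad, 0.0),
                             end + min(pad, gap / 2),
                             [w for w, _, _ in cur]))
            cur = []
    return segments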

  11. Improved Algorithm III
  • Matching MSU's utterance lengths and silence padding improves WER slightly
  • Alg III seems good enough, at least for this task

  12. Results with WordWave Transcripts
  • WWave transcripts seem fine given the improved segmentation

  13. Plans
  • Confirm the quality of WWave transcripts with Alg III segmentation:
    • On the Swbd 20-hour set, train MMI models to compare all-MSU vs. WWave/Alg III
    • Repeat the Swbd + 150-hour Fisher experiment, where we got gains using Alg I segmented data
      • Performance should not degrade
  • Improve the speed of Alg III
  • Resegment and redistribute all data that has been released so far
  • Catch up with, and continue segmenting, the latest WWave transcript deliveries.

  14. Update on Adding Fisher Data
  • In Martigny, we showed a 1.4% gain from adding 150 hrs of Fisher data (Alg I segmented) to the RT03 training set
  • Hoped to have results with 350 hours, but we had bugs in our initial runs
  • Did train MMI on RT03 (sw370) vs. RT03+Fisher150
    • Results are from the 2nd adaptation pass with POS LM rescoring
  • CAVEAT: non-rigorous comparison! The Fisher150 system was optimized (gaining 0.1-0.2%), and it used a different phone set and faster training (which degrades results by 0.2% in other comparisons).
