Realtime Speech to Text: A Means to an End

Mark J. Golden, CAE
Executive Director and Chief Executive Officer
National Court Reporters Association
LangTech 2008
February 28, 2008

Speech Capture
• Three variables
  • The human element
    • Skill and other human judgments that go into creating the input to the text processing system
  • Technology
  • The functional requirements of the party for whom the text is being produced
    • What will the text be used for?

Functionality
• Two parameters
  • Speed
    • How quickly is text required?
  • Accuracy
    • How exact must the translation be?
• Interrelationship of these two parameters
  • For example, if some delay in producing the final text is acceptable, there is an opportunity to review and revise it to correct any deficiencies in the initial capture and translation.

Realtime
• There are many speech capture/text production applications where text is needed as soon as possible, but not immediately
• There are some applications where comprehensible text is needed immediately, but a rough translation is adequate
• There are numerous applications, however, that rely on true realtime:
  • Instant text production, with no opportunity to revise or correct
  • A high degree of accuracy

High-Tech Courtroom
• Reduce costs
• Improve efficiency
• Better customer service
• Secure data

Litigation Support
• Immediate access to the record
• Annotation
• Highlights
• Search
• Instant message
• Complete multimedia record

Evidence Presentation
• Realtime text hyperlinked to the multimedia record
• Immediate display of evidence and other documentation

Virtual Justice
• Complete access to the multimedia record
• Remote video participation by parties or witnesses, from anywhere in the world

Communication Access
• Full and effective participation
• Complete understanding

Stenographic Realtime
• The only current method for high-accuracy, immediate voice-to-text translation
• Multiple speakers
• Near-perfect accuracy at high speeds
• Serves as the foundation for the applications previously described

The Question of Accuracy

Realtime Accuracy: 100 Percent
The hurricane, with winds exceeding 150 miles per hour, is expected to hit the coast by 10 in the morning. Local government officials have ordered all coastal residents to leave their homes and move farther inland. More to come later.

Realtime Accuracy: 90 Percent
The hurricane, with winds exceeding 150 miles per hour, is expected to hit the coast by TEPB in the morning. Local government officials have ORD/-D all coastal residents to HRAOEFB their homes and PHOFB farther inland. More to come later.

Realtime Accuracy: 80 Percent
The hurricane, with winds exceeding 150 miles per hour, is EBGS/PEBGT/-D to HEUT the coast by TEPB in the morning. Local government officials have ORD/-D all coastal RES/TKEPBTS to HRAOEFB their homes and PHOFB farther EUPB/HRAPBD. More to come later.
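The slides do not say how these accuracy percentages were measured. One common convention, assumed here, is word accuracy = 1 − word error rate, where the word error rate is the word-level edit distance (substitutions, insertions and deletions) divided by the number of words in the correct text. The Python sketch below applies that assumption to the hurricane passage; the names `tokenize`, `word_errors` and `word_accuracy` are hypothetical, and each untranslated steno stroke (such as TEPB) is simply counted as a wrong word. Under this metric, the 4 untranslated strokes in the 90 percent version and the 8 in the 80 percent version, out of 40 correct words, give exactly 90 and 80 percent.

```python
import string


def tokenize(text: str) -> list[str]:
    """Split on whitespace, lowercase, and strip surrounding punctuation."""
    words = (w.strip(string.punctuation).lower() for w in text.split())
    return [w for w in words if w]


def word_errors(reference: list[str], hypothesis: list[str]) -> int:
    """Levenshtein distance over words: minimum substitutions + insertions + deletions."""
    # prev[j] = edit distance between the reference prefix processed so far and hypothesis[:j]
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i] + [0] * len(hypothesis)
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1]


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Assumed metric: 1 - (word errors / number of reference words)."""
    ref = tokenize(reference)
    if not ref:
        raise ValueError("reference text must contain at least one word")
    return 1.0 - word_errors(ref, tokenize(hypothesis)) / len(ref)


REFERENCE = (
    "The hurricane, with winds exceeding 150 miles per hour, is expected to "
    "hit the coast by 10 in the morning. Local government officials have "
    "ordered all coastal residents to leave their homes and move farther "
    "inland. More to come later."
)
HYP_90 = (
    "The hurricane, with winds exceeding 150 miles per hour, is expected to "
    "hit the coast by TEPB in the morning. Local government officials have "
    "ORD/-D all coastal residents to HRAOEFB their homes and PHOFB farther "
    "inland. More to come later."
)
HYP_80 = (
    "The hurricane, with winds exceeding 150 miles per hour, is EBGS/PEBGT/-D to "
    "HEUT the coast by TEPB in the morning. Local government officials have "
    "ORD/-D all coastal RES/TKEPBTS to HRAOEFB their homes and PHOFB farther "
    "EUPB/HRAPBD. More to come later."
)

if __name__ == "__main__":
    print(f"{word_accuracy(REFERENCE, HYP_90):.0%}")  # 90%
    print(f"{word_accuracy(REFERENCE, HYP_80):.0%}")  # 80%
```

Under the same assumed metric, the question "Do you recognize this document I've just marked as Exhibit 1?" on the Mistranslates slide has five word errors in the voice output and two in the steno output. Because insertions count as errors, accuracy computed this way can fall below zero for a badly garbled output.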

Mistranslates: Steno and Voice
• How do you justify stopping the defendant's vehicle?
  • How do you just if I stopping the defendant's vehicle? (voice)
  • How do you just if he stopping the defendant's vehicle? (steno)
• Do you recognize this document I've just marked as Exhibit 1?
  • Do you recognize the stockman I've just marked as it's a bit 1? (voice)
  • Do you recognize this doctor I've just marked as Kent 1? (steno)

Voice Capture ≠ Instant Translation
Appropriate Technology + Competent Reporter
↓
High Accuracy and Immediate Translation
↓
Full, Complete and Instant Understanding

Thank You! Questions and Answers
