Speech Recognition and its clinical applications Thankam Thyvalikakath, MDS Center for Biomedical Informatics University of Pittsburgh
Outline • In-class assignment • Background • SpeechActs paper • Clinical application of speech recognition • Speech recognition in dentistry
Speech recognition? Speech recognition technologies are of particular interest for their support of direct communication between humans and computers, through a communications mode humans commonly use among themselves and at which they are highly skilled. Rudnicky, Hauptmann, and Lee http://starbase.cs.trincoll.edu/~ram/cpsc352/
What was the first success story of speech recognition? “Radio Rex,” a sound-activated toy from the 1920s, was the first success story in the field of speech recognition. www.stanford.edu/class/linguist236/lec1.pdf
Timeline of speech recognition • 1936 – AT&T's Bell Labs began studying speech recognition • 1974 – optical character recognition • 1975 – text-to-speech synthesis (Kurzweil reading machine) • 1978 – Speak & Spell toy released by Texas Instruments • 1980 – Xerox began producing the TextBridge reading machine • 1997 – Dragon Systems produced the first continuous speech recognition product http://starbase.cs.trincoll.edu
How did speech recognition evolve? • Acoustic approach (pre-1960s) • Pattern recognition approach (1960s) • Linguistic approach (1970s) • Pragmatic approach (1980s)
Types of speech recognition • Isolated words • Connected words • Continuous speech • Spontaneous speech (automatic speech recognition) • Voice verification and identification "Fundamentals of Speech Recognition," L. Rabiner & B. Juang, 1993
Speech recognition – uses and applications • Dictation • Command and control • Telephony • Medical/disabilities "Fundamentals of Speech Recognition," L. Rabiner & B. Juang, 1993
Challenges of speech recognition • Ease of use • Robust performance • Automatic learning of new words and sounds • Grammar for spoken language • Control of synthesized voice quality • Integrated learning for speech recognition and synthesis B. S. Atal. Speech recognition in 2001: New research directions. Proc. Natl. Acad. Sci. USA, Vol. 92, pp. 10046–10051, Oct 1995
SpeechActs SpeechActs is a prototype testbed for developing spoken natural language applications Paul Martin, Fredrick Crabbe, Stuart Adams, Eric Baatz, Nicole Yankelovich. SpeechActs: A Spoken Language Framework, IEEE Computer, Vol. 29, Number 7, July 1996.
Why develop SpeechActs? • Integrated conversational applications • No specialized language expertise • Technology independence Paul Martin, Fredrick Crabbe, Stuart Adams, Eric Baatz, Nicole Yankelovich. SpeechActs: A Spoken Language Framework, IEEE Computer, Vol. 29, Number 7, July 1996.
Information flow in SpeechActs Paul Martin, Fredrick Crabbe, Stuart Adams, Eric Baatz, Nicole Yankelovich. SpeechActs: A Spoken Language Framework, IEEE Computer, Vol. 29, Number 7, July 1996.
SpeechActs - Framework • The audio server presents raw digitized audio to the speech recognizer • Swiftus parses the resulting word list to produce a set of feature-value pairs • The discourse manager maintains a stack of information about the current conversation • The discourse manager and the application respond to the user by sending a text string to the text-to-speech manager Paul Martin, Fredrick Crabbe, Stuart Adams, Eric Baatz, Nicole Yankelovich. SpeechActs: A Spoken Language Framework, IEEE Computer, Vol. 29, Number 7, July 1996.
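The paper describes this flow only at the architectural level; below is a minimal, purely illustrative Python sketch of how such a pipeline could be wired together. All class, method, and function names are hypothetical and do not come from the SpeechActs code base.

```python
# Hypothetical sketch of the SpeechActs-style information flow described
# above; the component objects are stand-ins, not real SpeechActs APIs.

def handle_utterance(raw_audio, recognizer, swiftus, discourse_manager, tts):
    # The audio server hands raw digitized audio to the speech recognizer,
    # which returns its best-guess word list.
    word_list = recognizer.recognize(raw_audio)

    # Swiftus parses the word list into a set of feature-value pairs.
    feature_values = swiftus.parse(word_list)   # e.g. {"action": "read", "object": "message"}

    # The discourse manager interprets the parse against the state of the
    # current conversation and produces a textual reply.
    reply_text = discourse_manager.interpret(feature_values)

    # The reply is handed to the text-to-speech manager to be spoken back.
    tts.speak(reply_text)
```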
SpeechActs: A Spoken Language Framework • Continuous-speech recognizers require grammars that specify every possible utterance a user could say to the application • The recognizer grammar should stay closely synchronized with the Swiftus semantic grammar • This was solved by inventing the Unified Grammar Paul Martin, Fredrick Crabbe, Stuart Adams, Eric Baatz, Nicole Yankelovich. SpeechActs: A Spoken Language Framework, IEEE Computer, Vol. 29, Number 7, July 1996.
Unified grammar • A collection of rules • Each rule consists of a pattern, such as a Backus-Naur Form expression, followed by augmentations, which are statements written in a Pascal-like form • A compiler produces both a grammar specific to the speech recognizer and the corresponding Swiftus grammar Paul Martin, Fredrick Crabbe, Stuart Adams, Eric Baatz, Nicole Yankelovich. SpeechActs: A Spoken Language Framework, IEEE Computer, Vol. 29, Number 7, July 1996.
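The slides do not show the concrete Unified Grammar syntax, so the following is an invented illustration of the idea: one shared rule (a BNF-like pattern plus an augmentation that assigns feature-value pairs) is compiled into both a recognizer grammar entry and the corresponding Swiftus semantic entry, so the two grammars cannot drift apart.

```python
# Invented illustration of a unified-grammar rule; the concrete syntax is
# hypothetical, only the single-rule / two-outputs idea is from the paper.

rule = {
    # BNF-like pattern describing what the user may say
    "pattern": "<read_command> ::= read [the] (next | previous) message",
    # Augmentation, reduced here to the feature-value pairs it assigns
    "augmentation": {"action": "read", "object": "message"},
}

def compile_rule(rule):
    """Toy 'unified grammar compiler': derive the recognizer grammar entry
    and the corresponding Swiftus (semantic) grammar entry from one rule."""
    recognizer_entry = rule["pattern"]                      # what may be heard
    nonterminal = rule["pattern"].split("::=")[0].strip()
    swiftus_entry = (nonterminal, rule["augmentation"])     # what it means
    return recognizer_entry, swiftus_entry

print(compile_rule(rule))
```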
Swiftus – the natural language processor • Semantic representation generated in real time to facilitate conversation • Accurate understanding • Tolerance of misrecognized words • Wide variation among applications • Ease of use Paul Martin, Fredrick Crabbe, Stuart Adams, Eric Baatz, Nicole Yankelovich. SpeechActs: A Spoken Language Framework, IEEE Computer, Vol. 29, Number 7, July 1996.
Swiftus performance - solved Swiftus was designed to combine coarse keyword matching with full, in-depth semantic analysis Paul Martin, Fredrick Crabbe, Stuart Adams, Eric Baatz, Nicole Yankelovich. SpeechActs: A Spoken Language Framework, IEEE Computer, Vol. 29, Number 7, July 1996.
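As a rough illustration of why coarse keyword matching tolerates misrecognized words (this is not the actual Swiftus algorithm), a parser that only looks for grammar keywords can still produce the right feature-value pairs even when filler words are recognized incorrectly:

```python
# Rough illustration only, not the actual Swiftus algorithm: coarse keyword
# matching maps recognized words to feature-value pairs and simply ignores
# words it does not know, including misrecognitions of filler words.

KEYWORD_FEATURES = {
    "read":     ("action", "read"),
    "delete":   ("action", "delete"),
    "message":  ("object", "message"),
    "calendar": ("object", "calendar"),
}

def coarse_parse(word_list):
    features = {}
    for word in word_list:
        if word in KEYWORD_FEATURES:
            key, value = KEYWORD_FEATURES[word]
            features[key] = value
    return features

# "plaze" is a misrecognition of "please", but the parse still succeeds.
print(coarse_parse(["plaze", "read", "my", "message"]))
# {'action': 'read', 'object': 'message'}
```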
Discourse management • To support more natural speech, we need at least rudimentary discourse management • Should support discourse-segment pushing and popping • Prompt design • Error-correcting mechanism Paul Martin, Fredrick Crabbe, Stuart Adams, Eric Baatz, Nicole Yankelovich. SpeechActs: A Spoken Language Framework, IEEE Computer, Vol. 29, Number 7, July 1996.
Discourse manager • A discourse is represented as a data structure consisting of functions for handling user output • The manager maintains a stack of these structures, and the top one handles the default discourse for the current application or current dialogue • The current application or dialogue is popped off the stack when the user cancels the activity or the problem is resolved • It also keeps a simple stack of referenced items to avoid entering into a subdialogue Paul Martin, Fredrick Crabbe, Stuart Adams, Eric Baatz, Nicole Yankelovich. SpeechActs: A Spoken Language Framework, IEEE Computer, Vol. 29, Number 7, July 1996.
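A minimal sketch of the stack-based bookkeeping described above, assuming invented class and method names (the real SpeechActs discourse manager is more elaborate):

```python
# Minimal, hypothetical sketch of stack-based discourse management; names
# are invented and not taken from the SpeechActs implementation.

class DiscourseManager:
    def __init__(self):
        self.discourse_stack = []   # top element handles the current dialogue
        self.referent_stack = []    # recently referenced items ("it", "that one")

    def push_discourse(self, discourse):
        """Enter a subdialogue, e.g. an error-correction exchange."""
        self.discourse_stack.append(discourse)

    def pop_discourse(self):
        """Leave the subdialogue when the user cancels the activity or the
        problem is resolved, returning to the enclosing dialogue."""
        if self.discourse_stack:
            self.discourse_stack.pop()

    def current_discourse(self):
        """The structure on top of the stack handles the next utterance."""
        return self.discourse_stack[-1] if self.discourse_stack else None

    def note_referent(self, item):
        """Remember a referenced item so a later 'it' can be resolved
        without entering a clarification subdialogue."""
        self.referent_stack.append(item)
```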
To simulate human conversation…. • conversational pacing • explicit error corrections • define the functional boundaries of an application Paul Martin, Fredrick Crabbe, Stuart Adams, Eric Baatz, Nicole Yankelovich. SpeechActs: A Spoken Language Framework, IEEE Computer, Vol. 29, Number 7, July 1996.
Clinical applications • Medical transcription, mainly in radiology and pathology • First use of speech recognition in the field of radiology in 1981 • Mean accuracy rate for reading pathology reports using IBM ViaVoice Pro software – 93.6%, compared to human transcription at 99.6% M. Al-Aynati, K. Chorneyko. Comparison of Voice-Automated Transcription and Human Transcription in General Pathology Reports. Arch Pathol Lab Med. 2003;127:721–725
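Figures like the 93.6% and 99.6% above are typically word-level accuracies; the sketch below shows one common way such a number can be computed. It is illustrative only and not necessarily the exact method used in the cited study.

```python
# Illustrative word-accuracy calculation (not necessarily the method used
# in the Al-Aynati & Chorneyko study): the fraction of reference words
# reproduced without a substitution, deletion, or insertion error.

def word_accuracy(num_reference_words, num_errors):
    return 1.0 - num_errors / num_reference_words

# e.g. a 500-word pathology report containing 32 transcription errors
print(f"{word_accuracy(500, 32):.1%}")   # 93.6%
```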
Speech recognition in clinical dentistry? • 13% used voice recognition • 16% discontinued using voice recognition • 21% believed chairside computer use could be improved with better voice recognition • Using automatic speech recognition will be the way to go! T. Schleyer et al. (unpublished data), Chairside Computer Use in Clinical Dentistry