Explore the history and core disciplines of speech recognition, including signal processing, linguistics, and psychology. Learn about the word recognition model and higher-level processors that decode speech into meaningful words. Gain insights into automatic speech recognition's goal and how it relies on various scientific fields.
Fundamentals of Speech Recognition
• Goal: automatic recognition of speech by machine.
Fundamentals of Speech Recognition
• Disciplines applied to most speech recognition problems:
• Signal processing: the process of extracting relevant information from the speech signal in an efficient and robust manner.
• Physics: the science of understanding the relationship between the physical speech signal and the physiological mechanisms that produce speech and through which speech is perceived.
• Pattern recognition: the research area that studies the operation and design of systems that recognize patterns in data.
Fundamentals of Speech Recognition
• Communication and information theory: the methods for detecting the presence of particular speech patterns.
• Linguistics: the relationships between sounds (phonology), words in a language (syntax), the meaning of spoken words (semantics), and the sense derived from that meaning (pragmatics).
• Physiology: the understanding of the mechanisms within the human central nervous system that account for speech production and perception.
Fundamentals of Speech Recognition
• Computer science: the study of efficient algorithms for implementing, in software and hardware, the various methods used in a practical speech recognition system.
• Psychology: the science of understanding the factors that enable a technology to be used by human beings in practical tasks.
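As a rough illustration of the signal processing discipline listed above, the sketch below cuts a signal into short overlapping frames and computes two simple per-frame measurements: log energy and zero-crossing rate. The slides do not prescribe any particular features. The function names, the frame length, the hop size, and the synthetic test signal are all assumptions chosen only for illustration.

```python
import math

def frame_signal(samples, frame_len, hop):
    """Split a list of samples into overlapping frames (any ragged tail is dropped)."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def log_energy(frame, floor=1e-10):
    """Log of the mean squared amplitude; the floor avoids log(0) on silent frames."""
    energy = sum(x * x for x in frame) / len(frame)
    return math.log(max(energy, floor))

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def extract_features(samples, frame_len=200, hop=80):
    """Return one (log_energy, zero_crossing_rate) pair per frame."""
    return [(log_energy(f), zero_crossing_rate(f))
            for f in frame_signal(samples, frame_len, hop)]

# Synthetic stand-in for speech (an assumption): 0.1 s of silence, then 0.1 s of a 200 Hz tone.
SAMPLE_RATE = 8000
silence = [0.0] * 800
tone = [math.sin(2 * math.pi * 200 * n / SAMPLE_RATE) for n in range(800)]
features = extract_features(silence + tone)

for index, (energy, zcr) in enumerate(features):
    print(f"frame {index:2d}: log energy {energy:8.3f}, zero-crossing rate {zcr:.3f}")
```

In this toy signal, the silent frames sit at the energy floor while the tone frames have much higher energy, which is the kind of compact, robust information a later pattern recognition stage could work from.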
The Paradigm for Speech Recognition
• Word recognition model: the user's spoken output is recognized, that is, the speech signal is decoded into a series of words that are meaningful according to syntax, semantics, and pragmatics.
• Higher-level processor: obtains the meaning of the recognized words. The processor uses a dynamic knowledge representation to modify the syntax, semantics, and pragmatics according to the context of what it has previously recognized.
• This feedback limits the search for valid input sentences from the user.
• The system responds to the user in the form of a voice output.
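The feedback loop described above can be sketched with a toy example in which the context of previously recognized words restricts which words are considered next. This is only a minimal sketch under stated assumptions: the `EXPECTED_WORDS` table, the function names, and the numeric scores are all hypothetical, and the score dictionaries stand in for whatever a real word recognition model would produce.

```python
# Hypothetical context table: each state maps to the words the system expects next.
EXPECTED_WORDS = {
    "start": {"call", "play"},
    "call": {"alice", "bob"},
    "play": {"music", "news"},
}

def recognize_word(acoustic_scores, allowed):
    """Stand-in for the word recognition model: best-scoring word among the allowed ones."""
    candidates = {word: score for word, score in acoustic_scores.items() if word in allowed}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)

def run_dialogue(utterance_scores):
    """Recognize words one at a time, letting the context narrow the search each step."""
    state = "start"
    recognized = []
    for scores in utterance_scores:
        allowed = EXPECTED_WORDS.get(state, set())  # feedback limits the valid inputs
        word = recognize_word(scores, allowed)
        if word is None:
            break
        recognized.append(word)
        state = word  # stand-in for the higher-level processor updating the context
    return recognized

# Made-up scores in which an acoustically similar wrong word scores highest.
scores = [
    {"call": 0.6, "tall": 0.7, "play": 0.2},
    {"bob": 0.5, "mob": 0.8, "alice": 0.1},
]
result = run_dialogue(scores)
print(result)  # ['call', 'bob']
```

Without the context restriction, the highest-scoring candidates would be "tall" and "mob". Restricting the search to words that are valid in the current context is what lets the sketch recover "call" and "bob".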