ITCS 6010 Spoken Language Systems: Architecture
Elements of a Spoken Language System
• Endpointing
• Feature extraction
• Recognition
• Natural language understanding
• Dialog management
[Figure: block diagram of the five components: Endpointing, Feature Extraction, Recognition, Natural Language Understanding, Dialog Management]
Elements of a Spoken Language System (cont’d)
• Endpointing
  • Detects the beginning and end of speech
  • Represents the caller’s spoken utterance as a waveform
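One common way to detect the beginning and end of speech is to threshold short-time energy. The sketch below is only an illustration under that assumption; the function name `find_endpoints`, the frame size, and the threshold are hypothetical and not taken from the slides.

```python
def find_endpoints(samples, frame_size=4, threshold=0.1):
    """Return (start, end) sample indices of detected speech, or None."""
    # Mean-square energy of each non-overlapping frame.
    energies = []
    for i in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[i:i + frame_size]
        energies.append(sum(s * s for s in frame) / frame_size)
    # Frames whose energy exceeds the threshold are treated as speech.
    speech = [k for k, e in enumerate(energies) if e > threshold]
    if not speech:
        return None
    return speech[0] * frame_size, (speech[-1] + 1) * frame_size


# Silence, a burst of "speech", then silence again.
signal = [0.0] * 8 + [0.5, -0.6, 0.7, -0.5, 0.6, -0.7, 0.5, -0.6] + [0.0] * 8
print(find_endpoints(signal))
```

The returned pair delimits the part of the waveform that would be passed on to feature extraction.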
Elements of a Spoken Language System (cont’d)
• Feature extraction
  • Transforms the endpointed utterance into a sequence of feature vectors
  • Feature vector: a list of numbers that represent measurable characteristics of speech
  • The characteristics relate to the amount of energy at various frequencies
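To make "energy at various frequencies" concrete, the sketch below splits the signal into frames and sums spectral power in a few equal-width bands, giving one feature vector per frame. It assumes a naive DFT and two bands purely for illustration; `frame_features` and `extract_features` are hypothetical names, and production front ends typically use more elaborate features.

```python
import cmath
import math


def frame_features(frame, n_bands=2):
    """Return the energy in n_bands equal-width frequency bands for one frame."""
    n = len(frame)
    half = n // 2
    # Naive DFT power for bins 0 .. n/2 - 1.
    power = []
    for k in range(half):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        power.append(abs(acc) ** 2 / n)
    band = half // n_bands
    return [sum(power[b * band:(b + 1) * band]) for b in range(n_bands)]


def extract_features(samples, frame_size=8):
    """Turn a waveform into a sequence of feature vectors, one per frame."""
    return [
        frame_features(samples[i:i + frame_size])
        for i in range(0, len(samples) - frame_size + 1, frame_size)
    ]


# One frame of a low-frequency tone followed by one frame of a high-frequency tone.
low = [math.sin(2 * math.pi * 1 * t / 8) for t in range(8)]
high = [math.sin(2 * math.pi * 3 * t / 8) for t in range(8)]
vectors = extract_features(low + high)
```

In this toy input the first vector has most of its energy in the low band and the second in the high band.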
Elements of a Spoken Language System (cont’d)
• Recognizer
  • Determines the spoken words from the feature vectors
  • Recognition model
    • Contains all word strings the caller can say
    • Consists of:
      • Acoustic model
      • Dictionary
      • Grammar
Elements of a Spoken Language System (cont’d)
• Acoustic model
  • Internal representation of the pronunciation of each basic sound (phoneme)
  • Created by a training process
  • Models the same features as those in the feature vectors
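As a rough picture of "created by a training process", the sketch below averages labelled feature vectors per phoneme. This is a deliberately simplified assumption: real acoustic models are statistical models rather than a single mean vector, and `train_acoustic_model` and the sample data are hypothetical.

```python
def train_acoustic_model(labelled):
    """Average the feature vectors seen for each phoneme.

    labelled: iterable of (phoneme, feature_vector) training pairs.
    Returns {phoneme: mean_feature_vector}.
    """
    sums, counts = {}, {}
    for phoneme, vec in labelled:
        acc = sums.setdefault(phoneme, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
        counts[phoneme] = counts.get(phoneme, 0) + 1
    return {p: [x / counts[p] for x in acc] for p, acc in sums.items()}


# Hypothetical training data: two examples of "a", one of "s".
model = train_acoustic_model([
    ("a", [1.0, 0.0]),
    ("a", [3.0, 2.0]),
    ("s", [0.0, 4.0]),
])
```

The model stores the same kind of numbers that feature extraction produces, which is what allows the two to be compared during recognition.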
Elements of a Spoken Language System (cont’d)
• Dictionary
  • List of words and their pronunciations
  • Indicates which acoustic models make up a word
  • Can contain multiple entries (pronunciations) for a word

  Word        Pronunciation
  Dallas      d a l * s
  Boston      b o s t * n
  economics   E k * n A m I k s
  economics   i k * n A m I k s
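The dictionary entries above map naturally onto a lookup table from each word to a list of pronunciations. The sketch below uses the entries from the slide; the names `PRONUNCIATIONS` and `pronunciations` are hypothetical.

```python
# Each word maps to one or more pronunciations; each pronunciation is a
# sequence of phoneme symbols that name acoustic models.
PRONUNCIATIONS = {
    "Dallas": [["d", "a", "l", "*", "s"]],
    "Boston": [["b", "o", "s", "t", "*", "n"]],
    "economics": [
        ["E", "k", "*", "n", "A", "m", "I", "k", "s"],
        ["i", "k", "*", "n", "A", "m", "I", "k", "s"],
    ],
}


def pronunciations(word):
    """Return every known pronunciation of word (empty list if unknown)."""
    return PRONUNCIATIONS.get(word, [])


for variant in pronunciations("economics"):
    print(" ".join(variant))
```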
Elements of a Spoken Language System (cont’d)
• Grammar
  • Definition of everything the caller can say to the system
  • Includes all possible strings of words and the rules that associate meaning with those strings
  • Two types of grammars:
    • Rule-based grammar: a set of explicit rules completely defines the grammar
    • Statistical language model (SLM): a statistical grammar built from the probability of word occurrence in a given context
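The two grammar types can be contrasted with a small sketch. It assumes a one-rule grammar ("fly to CITY") and a bigram model estimated from a three-sentence corpus; the rule, the corpus, and all names are hypothetical illustrations rather than anything defined in the slides.

```python
from collections import Counter

# Rule-based grammar: an explicit rule completely defines what is allowed.
CITIES = {"Boston", "Dallas"}


def in_rule_grammar(utterance):
    """Accept exactly the strings of the form 'fly to CITY'."""
    words = utterance.split()
    return len(words) == 3 and words[:2] == ["fly", "to"] and words[2] in CITIES


# Statistical language model: word probabilities estimated from example data.
def train_bigram(sentences):
    """Return prob(prev, cur), the relative frequency of cur following prev."""
    context_counts, pair_counts = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split()
        for prev, cur in zip(words, words[1:]):
            context_counts[prev] += 1
            pair_counts[(prev, cur)] += 1

    def prob(prev, cur):
        if context_counts[prev] == 0:
            return 0.0
        return pair_counts[(prev, cur)] / context_counts[prev]

    return prob


prob = train_bigram(["fly to Boston", "fly to Dallas", "fly to Boston"])
```

The rule-based grammar gives a yes/no answer, while the SLM assigns each continuation a probability given its context.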
Elements of a Spoken Language System (cont’d)
• Recognition search
  • Each word model, as defined in the grammar:
    • Is defined in the dictionary
    • Has the appropriate sequence of acoustic models
  • Feature vectors are compared to each word model
• Recognition
  • Compares the possible models against the sequence of feature vectors to find the best match
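The search described above can be sketched by building a word model from the dictionary and acoustic model, then scoring it against the feature vectors. The sketch below uses dynamic time warping as a stand-in scoring method, which is an assumption made for brevity and not the method named in the slides; the tiny `ACOUSTIC` and `DICTIONARY` tables are hypothetical.

```python
import math

# Hypothetical acoustic model: one template feature vector per phoneme.
ACOUSTIC = {"a": (1.0, 0.0), "b": (0.0, 1.0), "s": (1.0, 1.0)}
# Hypothetical dictionary: word -> phoneme sequence.
DICTIONARY = {"ab": ["a", "b"], "sa": ["s", "a"], "bas": ["b", "a", "s"]}


def dtw(features, templates):
    """Dynamic-time-warping distance between two vector sequences."""
    n, m = len(features), len(templates)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(features[i - 1], templates[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize(features, grammar_words):
    """Return the grammar word whose model is closest to the feature vectors."""
    scored = []
    for word in grammar_words:
        # Word model: the sequence of acoustic models listed in the dictionary.
        templates = [ACOUSTIC[p] for p in DICTIONARY[word]]
        scored.append((dtw(features, templates), word))
    return min(scored)[1]


utterance = [(0.9, 0.1), (1.0, 0.0), (0.1, 0.9)]
best = recognize(utterance, ["ab", "sa", "bas"])
```

The lowest alignment cost plays the role of the "best match" here.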
Elements of a Spoken Language System (cont’d)
• Three important features of recognition
  • Confidence measures
  • N-best processing
  • Barge-in
Elements of a Spoken Language System (cont’d)
• Confidence measures
  • Quantitative measure of the recognizer’s confidence that it found the right match
  • Measures the closeness between the feature vectors of the caller’s utterance and the best-matching path
  • Used by designers in the design process
    • e.g. to determine whether explicit confirmation is required
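A designer might use the confidence measure to choose between accepting a result, confirming it explicitly, or rejecting it. The thresholds and the three-way split in this sketch are assumptions for illustration; the slides mention only the explicit-confirmation decision.

```python
def next_action(confidence, accept_at=0.8, reject_below=0.4):
    """Map a recognizer confidence in [0, 1] to a dialog action.

    The threshold values are hypothetical and would be tuned per application.
    """
    if confidence >= accept_at:
        return "accept"    # confident enough to proceed without asking
    if confidence >= reject_below:
        return "confirm"   # ask the caller, e.g. "Did you say Boston?"
    return "reject"        # too uncertain; re-prompt the caller


for score in (0.9, 0.6, 0.2):
    print(score, next_action(score))
```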
Elements of a Spoken Language System (cont’d)
• N-best processing
  • A number of results (the best possible matches) are returned with their confidence measures
• Barge-in
  • Allows callers to interrupt a prompt
  • The recognizer starts listening at the beginning of the prompt
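N-best processing amounts to returning the top few hypotheses ranked by confidence. The sketch below assumes hypotheses arrive as (text, confidence) pairs; the function name `n_best` and the sample scores are hypothetical.

```python
def n_best(hypotheses, n=3):
    """Return the n highest-confidence (text, confidence) pairs, best first."""
    return sorted(hypotheses, key=lambda h: h[1], reverse=True)[:n]


# Hypothetical recognizer output for one utterance.
results = [("Boston", 0.62), ("Austin", 0.81), ("Dallas", 0.10), ("Houston", 0.05)]
top_two = n_best(results, n=2)
```

An application could then check the alternatives in order, for example skipping a city the caller has already rejected.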
Elements of a Spoken Language System (cont’d)
• Natural language understanding
  • Assigns meaning to the spoken words
  • Slots are defined for each item of information required
  • Example
• Dialog manager
  • Determines the application’s next step
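Slot filling and the dialog manager's decision can be sketched together. The example assumes a travel application with two required slots, `origin` and `destination`; the slot names, the keyword rules, and the action strings are all hypothetical.

```python
REQUIRED_SLOTS = ("origin", "destination")  # hypothetical slot names
CITIES = {"Boston", "Dallas"}


def fill_slots(words):
    """Assign meaning to recognized words by filling slots."""
    slots = {}
    for prev, cur in zip(words, words[1:]):
        if cur in CITIES:
            if prev == "from":
                slots["origin"] = cur
            elif prev == "to":
                slots["destination"] = cur
    return slots


def next_step(slots):
    """Dialog manager: ask for the first missing slot, else proceed."""
    for name in REQUIRED_SLOTS:
        if name not in slots:
            return "ask_" + name
    return "proceed"


complete = fill_slots("fly from Boston to Dallas".split())
partial = fill_slots("fly to Dallas".split())
```

With `partial`, the dialog manager would prompt for the missing origin before continuing.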