Spoken Language Recognition and Understanding
Renato De Mori, McGill University / University of Avignon
LUNA IST contract no. 33549
IEEE DL 2009, Athens, October 17th, 2009
Summary
• Short discussion on automatic language recognition problems
• Automatic speech recognition in a working system
• Interpretation process
• Full / shallow parsing
• Generative vs. discriminative models
• Semantic composition
• Confidence and learning
• Examples of working systems
Dialog system architecture
[Figure: block diagram — input → RECOGNITION (acoustic and language models) → UNDERSTANDING and DIALOG (semantic and dialog models) → ANSWER GENERATION (synthesis) → answer.]
The channel model
Consider the following coding scheme: a source generates a word sequence W, and an acoustic channel encodes it into acoustic evidence A. The objective of recognition is to reconstruct W based on the observation of A. Both the source and the channel contain knowledge sources (KSs). The source generates a variety of sequences W with a given probability distribution. The channel, for a given W, generates a variety of sequences A with a given probability distribution.
Decoding as search
Search for a sequence of words W maximizing

  Ŵ = argmax_W P(A | W) P(W)

where P(W) is computed by the language model (LM) and P(A | W) is computed by the acoustic model (AM). W: sequence of hypothesized words; A: acoustic evidence.
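To make the search concrete, here is a minimal sketch of noisy-channel decoding as an n-best rescoring step; the hypotheses, scores, and the lm_weight parameter are invented for illustration and are not from any real system.

```python
# Hypothetical n-best list: each entry is (word sequence,
# log P(A|W) from the acoustic model, log P(W) from the language model).
nbest = [
    (("give", "me", "a", "restaurant"), -120.3, -9.1),
    (("give", "me", "the", "restaurant"), -120.8, -8.4),
    (("gift", "me", "a", "restaurant"), -119.8, -14.2),
]

def map_decode(hypotheses, lm_weight=1.0):
    """Return the word sequence maximizing log P(A|W) + lm_weight * log P(W)."""
    return max(hypotheses, key=lambda h: h[1] + lm_weight * h[2])[0]

print(map_decode(nbest))  # ('give', 'me', 'the', 'restaurant')
```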
Word hypothesis generation
[Figure: a text corpus is used to train the language model and a signal corpus to train the acoustic models; language, lexical, and acoustic models are compiled into an integrated network, which hypothesis generation uses to map a sequence of acoustic descriptors to word hypotheses.]
[Figure: a bigram language model as a stochastic finite-state network — each word Wi is a node with initial probability P(Wi), and arcs are weighted by transition probabilities P(Wj | Wi).]
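As a small illustration of such a network, a bigram model assigns a word sequence the product of an initial probability and word-to-word transition probabilities; all numbers below are invented, not estimated from data.

```python
# Illustrative bigram parameters for a three-word vocabulary.
P_init = {"w1": 0.5, "w2": 0.3, "w3": 0.2}
P_trans = {
    ("w1", "w2"): 0.6, ("w1", "w3"): 0.4,
    ("w2", "w1"): 0.2, ("w2", "w3"): 0.8,
    ("w3", "w1"): 0.9, ("w3", "w3"): 0.1,
}

def sequence_probability(words):
    """P(W) = P(w_1) * product over i of P(w_i | w_{i-1})."""
    p = P_init[words[0]]
    for prev, cur in zip(words, words[1:]):
        p *= P_trans[(prev, cur)]
    return p

print(sequence_probability(["w1", "w2", "w3"]))  # 0.5 * 0.6 * 0.8 = 0.24
```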
Left-to-right models with skips and empty transitions
[Figure: two states Si, Sj with self-loop probability Pii and forward transition probability Pij, where Pii + Pij = 1; gij(a) is the probability of observing a during transition i → j, and gii(a) during the self-loop.]
Mixture densities
Observation densities are mixtures of Gaussians:

  g(a) = Σk ck N(a; μk, Σk)

If the components of a are statistically independent, the covariance matrices are diagonal and each Gaussian factors into a product of univariate densities, one per component of a.
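A minimal sketch of such an observation density, assuming diagonal covariances (the statistically-independent case); all parameter values are invented for illustration.

```python
import numpy as np

def log_gmm_density(a, weights, means, variances):
    """Log-density of observation vector `a` under a Gaussian mixture
    with diagonal covariances: each multivariate Gaussian factors into
    a product over dimensions.
    weights: (K,), means/variances: (K, D)."""
    a = np.asarray(a)
    # Per-component log N(a; mu_k, diag(var_k)), summed over dimensions.
    log_comp = -0.5 * np.sum(
        np.log(2 * np.pi * variances) + (a - means) ** 2 / variances, axis=1)
    # log sum_k c_k N_k(a), computed in a numerically stable way.
    return np.logaddexp.reduce(np.log(weights) + log_comp)

# Toy 2-component, 3-dimensional mixture.
w = np.array([0.4, 0.6])
mu = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]])
var = np.ones((2, 3))
print(log_gmm_density([0.5, 0.2, 0.1], w, mu, var))
```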
Major problems
Impressive performance, but not robust enough. Some reasons:
• acoustic variability
• noise
• disfluency
• pronunciation variability
• language variety
• reduced feature size
• modeling limits
• non-admissible search
Demo: LUNAVIZ — OK 340.1, HS 267 3, 267 4, DISF 298 1, COM 238 6, COM 238 4
Possible applications
• Constrained environments (broadcast transcriptions)
• Partial automation (telephone services)
• Redundant messages (spoken surveys)
• Specific content extraction (understanding and dialog)
[Figure: growth of human-machine interaction complexity, functionality, and technology from 1990 to 2009 — command and control (e.g., simple call routing, VRCP, voice dialing) with simple ASR for isolated words and connected digits; prompt-constrained natural language (e.g., travel reservations, finance, directory assistance) with larger vocabularies and hand-built grammars; free-form natural language dialogue (customer care, help desks, e-commerce) with very large vocabularies, NL understanding, dialog management, and TTS.]
Interpretation problem decomposition
Speech → signs → meaning: acoustic features are mapped to 1-best, n-best, or lattice word hypotheses; words → constituents → structures provide features for interpretation. Problem reduction: the representation is context-sensitive.
Interpretation is a composite decision process. Many decompositions are possible, involving a variety of methods and KSs, which suggests a modular approach to process design. Robustness is obtained by evaluating and possibly integrating different KSs and methods used for the same sub-task.
Understanding process overview
[Figure: a long-term memory holds the AM, LM, and interpretation KSs driving the mapping from speech to conceptual structures and MRL: speech → signs → words → concept tags → concept structures → MRL description. A short-term memory connects the process to the dialogue manager, and learning updates the long-term memory.]
Interpretation as a translation process
Interpretation of written text can be seen as a process that uses procedures for translating a sequence of words in natural language into a set of semantic hypotheses (just constituents, or structures) described by a semantic language.
W: [S [VP [V give, PR me] NP [ART a, N restaurant] PP [PREP near, NP [N Montparnasse, N station]]]]
G: [Action REQUEST ([Thing RESTAURANT], [Path NEAR ([Place IN ([Thing MONTPARNASSE])])])]
An interesting discussion appears in (Jackendoff, 1990): each major syntactic constituent of a sentence maps into a conceptual constituent, but the inverse is not true.
Semantic building actions in full parsing
Use tree-kernel methods for learning argument matching (Moschitti, Raymond, Riccardi, ASRU 2007). Not robust enough if the WER is high.
Predicate/argument structures and parsers
Recently, classifiers were proposed for detecting concepts and roles. This detection process was integrated with a stochastic parser (e.g., Charniak, 2001). A solution using this parser and tree-kernel-based classifiers for predicate/argument detection in SLU is proposed in (Moschitti et al., ASRU 2007). Other relevant contributions on stochastic semantic parsing can be found in (Goddeau and Zue, 1992; Goodman, 1996; Chelba and Jelinek, 2000; Roark, 2002; Collins, 2003). Lattice-based parsers are reviewed in (Hall, 2005).
Probabilistic interpretation in the CHRONUS system
[Figure: concept Markov model with states such as Org., Dest., Date, and else.]
The joint probability P(C, W) is computed using Markov models as P(C, W) = P(W | C) P(C) (Pieraccini et al., 1991; Pieraccini, Levin, and Vidal, 1993).
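The decoding step can be sketched as a Viterbi search over concept sequences maximizing log P(W | C) + log P(C); the concept inventory and all probabilities below are invented toy values, not the CHRONUS parameters.

```python
import math

NEG_INF = float("-inf")

# Invented toy parameters for the utterance "from boston to denver".
concepts = ["Org", "Dest", "else"]
log_init = {"Org": math.log(0.2), "Dest": math.log(0.1), "else": math.log(0.7)}
log_trans = {
    ("else", "Org"): math.log(0.4), ("else", "Dest"): math.log(0.3),
    ("else", "else"): math.log(0.3), ("Org", "else"): math.log(0.6),
    ("Org", "Org"): math.log(0.4), ("Dest", "Dest"): math.log(0.5),
    ("Dest", "else"): math.log(0.5),
}
log_emit = {
    ("else", "from"): math.log(0.3), ("else", "to"): math.log(0.3),
    ("else", "boston"): math.log(0.05), ("else", "denver"): math.log(0.05),
    ("Org", "boston"): math.log(0.7), ("Org", "from"): math.log(0.1),
    ("Dest", "denver"): math.log(0.7), ("Dest", "to"): math.log(0.1),
}

def viterbi(words):
    """argmax over concept sequences C of log P(W|C) + log P(C)."""
    # Each trellis cell holds (best log score, best concept path).
    trellis = [{c: (log_init[c] + log_emit.get((c, words[0]), NEG_INF), [c])
                for c in concepts}]
    for w in words[1:]:
        prev = trellis[-1]
        trellis.append({
            c: max(((prev[p][0] + log_trans.get((p, c), NEG_INF)
                     + log_emit.get((c, w), NEG_INF), prev[p][1] + [c])
                    for p in concepts), key=lambda t: t[0])
            for c in concepts})
    return max(trellis[-1].values(), key=lambda t: t[0])[1]

print(viterbi(["from", "boston", "to", "denver"]))
# ['else', 'Org', 'else', 'Dest']
```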
Semantic classification trees (Kuhn and De Mori, 1995)
[Figure: a decision tree asking questions such as "from ... City?" and "to ... City?"; yes/no answers lead to labels such as Origin and Dest.]
Conceptual language models as FSTs
This architecture is also used for separating in-domain from out-of-domain message segments (Damnati, 2007) and for spoken opinion analysis (Camelin et al., 2006). In this way, the whole ASR knowledge models a relation between signal features and meaning.
Hypothesis generation from lattices
An initial ASR activity generates a word graph (WG) of scored word hypotheses with a generic LM. The conceptual language model network is then composed with WG, resulting in the assignment of semantic tags to paths in WG.
Word graph
[Figure: word graph WG for a French utterance — paths through arcs labeled autour, de, vingt, euros, with alternatives such as un/une/du, leading to Trocadéro.]
Composition
[Figure: the transducer obtained by composition — arcs pair words with semantic output symbols, e.g. autour/NEAR (RANGE) with begin marker <b(PRICE)>, euros/<PRICE>, Trocadéro/<PLACE> with begin marker <b(PLACE)>, and words such as de, vingt, un, du mapped to ε.]
Projection
In order to obtain the conceptual hypotheses that are most likely to be expressed by the analyzed utterance, SEMG is projected on its outputs (the second elements of the pairs associated with transitions), leading to a weighted finite-state machine (FSM) with only indicators of the beginning and end words of semantic tags. The resulting FSM is then determinized and minimized, leading to an FSM SWG given by SWG = OUTPROJ(SEMG). Operations on SFSTs are performed using the AT&T FSM library (Mohri et al.). The objective is to have the correct interpretation be the most likely among the hypotheses coherent with the dialogue predictions.
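A minimal sketch of the composition and output-projection operations on toy arc lists follows; real systems use the AT&T FSM or OpenFst libraries, and epsilon handling, determinization, and minimization are omitted here.

```python
# Arcs: (src_state, dst_state, input_label, output_label, weight).
WG = [  # word graph: an acceptor, so input and output labels coincide
    (0, 1, "autour", "autour", 0.1),
    (1, 2, "de", "de", 0.0),
    (2, 3, "vingt", "vingt", 0.2),
    (3, 4, "euros", "euros", 0.1),
]
SEM = [  # conceptual transducer: word -> semantic tag (or epsilon)
    (0, 1, "autour", "<b(PRICE)>", 0.0),
    (1, 1, "de", "<eps>", 0.0),
    (1, 1, "vingt", "<eps>", 0.0),
    (1, 2, "euros", "<e(PRICE)>", 0.0),
]

def compose(a, b):
    """Product construction: pair arcs where the output of `a`
    matches the input of `b`; weights add (tropical semiring)."""
    return [((p, r), (q, s), i, o2, w1 + w2)
            for (p, q, i, o, w1) in a
            for (r, s, i2, o2, w2) in b
            if o == i2]

def out_project(t):
    """Keep only output labels, yielding an acceptor over tags."""
    return [(p, q, o, o, w) for (p, q, _, o, w) in t]

SEMG = compose(WG, SEM)
SWG = out_project(SEMG)  # plays the role of OUTPROJ(SEMG)
```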
Demo: LUNAVIZ (OK 340.1) — the best path avoids insertions.
CRFs
CRFs offer the possibility of using features from long-term dependencies. Results for LUNA are from Riccardi, Raymond, Ney, and Hahn.
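A hedged sketch of concept tagging with a linear-chain CRF follows, using the third-party sklearn-crfsuite package; the features, data, and hyperparameters are illustrative and are not those of the LUNA systems.

```python
import sklearn_crfsuite

def token_features(sent, i):
    """Feature dict for token i; features may reach beyond the current
    position, which is the long-range advantage noted above."""
    w = sent[i]
    feats = {"word": w, "prefix3": w[:3], "suffix3": w[-3:]}
    if i > 0:
        feats["prev_word"] = sent[i - 1]
    if i + 1 < len(sent):
        feats["next_word"] = sent[i + 1]
    return feats

# Toy training data: one utterance with BIO-style concept tags.
sents = [["autour", "de", "vingt", "euros"]]
X_train = [[token_features(s, i) for i in range(len(s))] for s in sents]
y_train = [["B-PRICE", "I-PRICE", "I-PRICE", "I-PRICE"]]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1,
                           max_iterations=50)
crf.fit(X_train, y_train)
print(crf.predict(X_train))
```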
Results
French MEDIA corpus:
• 3,000 test dialog turns
• telephone speech
• vocabulary: 2K words
• WER: 27.4%
Computational semantics
Epistemology, the science of knowledge, considers a datum as its basic unit. Semantics deals with the organization of meanings and the relations between sensory signs or symbols and what they denote or mean. Computer epistemology deals with observable facts and their representation in a computer. Natural language interpretation by computers performs a conceptualization of the world, using computational processes to compose a meaning representation structure from available signs and their features.
Frames as computational structures (intension)
A frame schema with defining properties represents types of conceptual structures (intension) as well as instances of them (extension). Relations with signs can be established by attached procedures (S. Young et al., 1989).
{address
  loc TOWN (facet) ... attached procedures
  area DEPARTMENT OR PROVINCE OR STATE ... attached procedures
  country NATION ... attached procedures
  street NUMBER AND NAME ... attached procedures
  zip ORDINAL NUMBER ... attached procedures }
Frame instances (extension)
A convenient way of asserting properties of, and reasoning about, semantic knowledge is to represent it as a set of logic formulas. A frame instance (extension) can be obtained from predicates that are related and composed into a computational structure. Frame schemata can be derived from knowledge obtained by applying semantic theories. Interesting theories can be found, for example, in (Jackendoff, 1990, 2002) or in (Brachman, 1978; reviewed by Woods, 1985).
Frame instance
Schemata contain collections of properties and values expressing relations. A property or role is represented by a slot filled by a value:
{a0001 instance_of address
  loc Avignon
  area Vaucluse
  country France
  street 1, avenue Pascal
  zip 84000 }
Composition process 1
Step 1 — concept tags → frame instance fragments
Concept tag: hotel-parking
Fragment: HOTEL.[hotel_facilities FACILITY.[facility_type parking]]
Composition process 2
Step 2 — composition by fusion
HOTEL.[luxury four_stars]
HOTEL.[h_facility FACILITY.[hotel_service tennis]]
Fusion rule result:
HOTEL.[luxury four_stars, h_facility FACILITY.[hotel_service tennis]]
Composition process 3
Step 3 — composition by attachment
New fragment: ADDRESS.[adr_city Lyon]
Composition result:
HOTEL.[luxury four_stars, h_facility FACILITY.[hotel_service tennis], at_loc ADDRESS.[adr_city Lyon]]
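A minimal sketch of the fusion and attachment steps above, with frame fragments represented as nested Python dicts (a simplification of the actual frame structures):

```python
def fuse(frag_a, frag_b):
    """Fusion: merge two fragments of the same frame type, recursing
    into slots that both fragments fill with sub-frames."""
    merged = dict(frag_a)
    for slot, value in frag_b.items():
        if slot in merged and isinstance(merged[slot], dict):
            merged[slot] = fuse(merged[slot], value)
        else:
            merged[slot] = value
    return merged

def attach(frame, slot, fragment):
    """Attachment: fill a slot of `frame` with a new sub-frame."""
    frame = dict(frame)
    frame[slot] = fragment
    return frame

hotel = fuse({"luxury": "four_stars"},
             {"h_facility": {"hotel_service": "tennis"}})
hotel = attach(hotel, "at_loc", {"adr_city": "Lyon"})
print(hotel)
# {'luxury': 'four_stars',
#  'h_facility': {'hotel_service': 'tennis'},
#  'at_loc': {'adr_city': 'Lyon'}}
```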
Demo: LUNAVIZ (OK 340.1)
Confidence
Evaluate the confidence of components and compositions; the overall confidence of an interpretation is expressed by the confidence indicators or a function of them. Notice that it is difficult to compare competing interpretation hypotheses based on the probability P(hypothesis | Y), where Y is a time sequence of acoustic features, because different semantic constituents may have been hypothesized on different time segments of the stream Y.
Probabilistic frame-based systems
In probabilistic frame-based systems (Koller, 1998), a frame slot S of a frame F is associated with a facet Q with value Y: Q(F, S, Y). A probability model is part of a facet, as it represents a restriction on the values Y. It is possible to have a probability model for a slot value which depends on a slot chain. It is also possible to inherit probability models from classes to subclasses, to use probability models in multiple instances, and to have probability distributions representing structural uncertainty about a set of entities.
Dependency graphs with cycles
[Figure: acoustic evidence supports a concept, which fills a slot; support links may form cycles.]
If the dependency graph has cycles, then possible worlds can be considered. The computation of probabilities of possible worlds is discussed in (Nilsson, 1986). A general method for computing probabilities of possible worlds based on Markov logic networks (MLNs) is proposed in (Richardson and Domingos, 2006).
Probabilistic models of relational data Relational Markov Networks (RMN) are a generalization of CRFs that allow for collective classification of a set of related entities by integrating information from features of individual entities as well as relations between them (Taskar et al., 2002). Methods for probabilistic logic learning are reviewed in (De Raedt, 2003).
Defining confidence-related situations
Consensus among classifiers and the SFST is used to produce confidence indicators in a sequential interpretation strategy (Raymond et al., 2005, 2007). The classifiers used are SCTs, SVMs, and AdaBoost. Committee-based active learning uses multiple classifiers to select samples (Seung et al., 1992).
[Figure: fusion strategy combining FSM, SCT, SVM, and AdaBoost outputs.]
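A brief sketch of committee-based sample selection, using vote entropy as the disagreement measure; the committee outputs below are invented, and vote entropy is one common choice rather than necessarily the measure used in the cited work.

```python
import math
from collections import Counter

def vote_entropy(labels):
    """Entropy of the committee's label votes for one sample."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log(c / total) for c in counts.values())

# Each row: labels hypothesized by (FSM, SCT, SVM, AdaBoost) for a sample.
committee_votes = [
    ["PRICE", "PRICE", "PRICE", "PRICE"],   # full consensus: confident
    ["PRICE", "PLACE", "PRICE", "DATE"],    # disagreement: worth annotating
]
ranked = sorted(range(len(committee_votes)),
                key=lambda i: vote_entropy(committee_votes[i]),
                reverse=True)
print(ranked)  # sample indices ordered by disagreement, most uncertain first
```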
Confidence measures
Two basic steps:
1) generate as many features as possible based on the speech recognition and/or natural language understanding process, and
2) estimate correctness probabilities from these features using a combination model.
Features for confidence
Many features are based on empirical considerations: semantic weights assigned to words, the percentage of uncovered words, the number of gaps, the number of slots, and word, word-pair, and word-triplet occurrence counts.
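A hedged sketch of step 2, combining such features into a correctness probability with logistic regression (one possible combination model); all feature values and labels below are invented.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Columns: semantic word weight, uncovered-word fraction, gap count, slot count.
X = np.array([
    [0.9, 0.05, 0, 3],
    [0.4, 0.40, 2, 1],
    [0.8, 0.10, 1, 2],
    [0.2, 0.60, 3, 1],
])
y = np.array([1, 0, 1, 0])  # 1 = the interpretation was correct

model = LogisticRegression().fit(X, y)
print(model.predict_proba(X)[:, 1])  # estimated correctness probabilities
```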
Adaptive Learning in Practice (Riccardi, Tur, and Hakkani-Tür, 2005)