Towards Learning Dialogue Structures from Speech Data and Domain Knowledge: Challenges to Conceptual Clustering using Multiple and Complex Knowledge Sources Jens-Uwe Moller Natural Language Systems Division, Dept. of Computer Science, Univ. of Hamburg
Overview • Dialog modeling is based on a set of units called dialog acts • Dialog acts taken from theory do not fit a specific domain well • Labeling dialogs is time-consuming and subjective • Learn application-specific dialog acts from speech data using conceptual clustering
The learning task • Learning dialog acts from turns • Unsupervised classification (no prior definition of dialog acts is given) • Hierarchical classification with inspectable classification rules
Features • Domain knowledge: structure of the task, task knowledge represented by goals and plans • Word recognizer: word hypotheses • Prosodic data: pauses and stress mark important units • Lexical semantics • Syntax (less important in spoken dialog) • Semantics (larger units of lexical semantics)
COBWEB • Symbolic machine learning algorithm • Builds a classification tree • Distinctions between subnodes are made by a function over all attributes • Supports probabilistic data • Supports multiple overlapping hierarchies (for ambiguous cases) • Can handle multiple entries of one attribute (e.g. a stream of words)
COBWEB (2) • Learning from simultaneous events • Learning from structured data: conceptual graphs • Learning case descriptions from terminological descriptions • Subsumption = correlation criterion over structured data, e.g. subsumption of individuals to classes
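The slide does not name the "function over all attributes". In Fisher's COBWEB algorithm, which this slide appears to describe, that function is category utility: it rewards partitions whose clusters make attribute values more predictable than they are in the data as a whole. The following is a minimal sketch for nominal attributes only; the attribute names `word` and `stress` are hypothetical, and every cluster is assumed to be non-empty.

```python
from collections import Counter

def category_utility(clusters):
    """Category utility of a partition.

    clusters: list of non-empty clusters; each cluster is a list of
    instances; each instance is a dict mapping attribute -> nominal value.
    """
    all_instances = [inst for cluster in clusters for inst in cluster]
    n = len(all_instances)

    def sum_sq_probs(instances):
        # Sum over all (attribute, value) pairs of P(attribute = value)^2.
        counts = Counter((a, v) for inst in instances for a, v in inst.items())
        return sum((c / len(instances)) ** 2 for c in counts.values())

    baseline = sum_sq_probs(all_instances)
    # Weighted gain in predictability of each cluster over the whole data set.
    gain = sum(len(c) / n * (sum_sq_probs(c) - baseline) for c in clusters)
    return gain / len(clusters)

# Hypothetical turns described by two nominal attributes.
pure = [
    [{"word": "hello", "stress": "high"}, {"word": "hello", "stress": "high"}],
    [{"word": "when", "stress": "low"}, {"word": "when", "stress": "low"}],
]
mixed = [
    [{"word": "hello", "stress": "high"}, {"word": "when", "stress": "low"}],
    [{"word": "hello", "stress": "high"}, {"word": "when", "stress": "low"}],
]
```

A partition that separates the two kinds of turns (`pure`) scores higher than one that mixes them (`mixed`), which is the preference a tree-building step would act on.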
Metrics for Measuring Domain Independence of Semantic Classes Andrew Pargellis, Eric Fosler-Lussier, Alexandros Potamianos, Chin-Hui Lee Dialogue Systems Research Dept., Bell Labs, Lucent Technologies, Murray Hill, NJ, USA
Introduction • Employ semantic classes (concepts) from another domain • Need to identify domain-independent concepts based on comparison across domains • Domain-independent concepts should occur in similar syntactic (lexical) contexts across domains
Comparing concepts across domains • Concept-comparison method • Concept-projection method
Concept-comparison method • Find the similarity between all pairs of concepts across the two domains • Two concepts are similar if their respective bigram contexts are similar • Use left and right context bigram language models
Kullback-Leibler (KL) distance • Compare how san francisco and newark are used in the Travel domain with how comedies and westerns are used in the Movie domain • Distance between two concepts
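As a rough illustration of the concept-comparison idea, the sketch below estimates left and right context distributions for each concept and sums the KL divergences in both directions. The add-one smoothing, the shared vocabulary, and the way the four terms are combined are assumptions made here for the example, not details given on the slides; the function names and toy sentences are hypothetical.

```python
import math
from collections import Counter

def context_distribution(sentences, concept_words, side, vocab, alpha=1.0):
    """Smoothed distribution of the words adjacent to any member of a concept."""
    counts = Counter()
    for sent in sentences:
        for i, w in enumerate(sent):
            if w in concept_words:
                j = i - 1 if side == "left" else i + 1
                if 0 <= j < len(sent):
                    counts[sent[j]] += 1
    total = sum(counts.values()) + alpha * len(vocab)
    return {v: (counts[v] + alpha) / total for v in vocab}

def kl(p, q):
    """KL divergence D(p || q) for two distributions over the same keys."""
    return sum(p[v] * math.log(p[v] / q[v]) for v in p)

def concept_distance(sents_a, concept_a, sents_b, concept_b):
    """Symmetric KL distance summed over left and right contexts."""
    vocab = {w for s in sents_a + sents_b for w in s}
    d = 0.0
    for side in ("left", "right"):
        p = context_distribution(sents_a, concept_a, side, vocab)
        q = context_distribution(sents_b, concept_b, side, vocab)
        d += kl(p, q) + kl(q, p)
    return d

# Toy corpora standing in for the Travel and Movie domains.
travel = [["fly", "to", "newark", "tomorrow"], ["fly", "to", "san_francisco", "today"]]
movie = [["show", "me", "comedies", "tonight"], ["show", "me", "westerns", "tonight"]]
cities = {"newark", "san_francisco"}
genres = {"comedies", "westerns"}
```

Calling `concept_distance(travel, cities, movie, genres)` gives a positive value because the two concepts appear in different contexts, while comparing a concept with itself in the same corpus gives zero.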
Concept-projection method • How well a single concept from one domain is represented in another domain. • How the words comedies and westerns are used in both domains • Useful for identifying the degree of domain-independence for a particular concept.
Semi-Automatic Acquisition of Domain-Specific Semantic Structures Siu K.C., Meng H.M. Human-Computer Communications Laboratory Department of Systems Engineering and Engineering Management The Chinese University of Hong Kong
Grammar induction • Uses unannotated corpora • Portable across domains and languages • Output grammar has reasonable coverage of within-domain data and rejects out-of-domain data • Amenable to interactive refinement by a human • Supports optional injection of prior knowledge
Spatial clustering • Use Kullback-Leibler distance • Use left and right contexts • Compare words w1, w2 (later concepts c1, c2) pairwise, considering only words that have at least a preset minimum number of occurrences (set to 5)
Temporal clustering • Use mutual information (MI) • The N highest-MI pairs are clustered (N=5 in the experiment) • Spatial clustering and temporal clustering are applied iteratively • Post-processing by a human
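The selection of the N highest-MI pairs can be sketched as follows. The slide does not say which MI variant is used, so this example assumes pointwise mutual information over adjacent word pairs; `top_mi_pairs` and the toy sentences are hypothetical.

```python
import math
from collections import Counter

def top_mi_pairs(sentences, n=5, min_count=1):
    """Return the n adjacent word pairs with the highest pointwise MI."""
    unigrams = Counter(w for s in sentences for w in s)
    bigrams = Counter(pair for s in sentences for pair in zip(s, s[1:]))
    total_uni = sum(unigrams.values())
    total_bi = sum(bigrams.values())
    scores = {}
    for (w1, w2), c in bigrams.items():
        if c < min_count:
            continue  # skip pairs that are too rare to score reliably
        p_pair = c / total_bi
        p1 = unigrams[w1] / total_uni
        p2 = unigrams[w2] / total_uni
        scores[(w1, w2)] = math.log(p_pair / (p1 * p2))
    return sorted(scores, key=scores.get, reverse=True)[:n]

sentences = [
    ["i", "want", "new", "york"],
    ["i", "want", "boston"],
    ["new", "york", "please"],
    ["boston", "please"],
]
best = top_mi_pairs(sentences, n=2)
```

On this toy input the two selected pairs are `("i", "want")` and `("new", "york")`; in an iterative scheme each selected pair would then be merged into a single unit before the next round of spatial clustering.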
Automatic Concept Identification in Goal-Oriented Conversations Ananlada Chotimongkol and Alexander I. Rudnicky Language Technologies Institute Carnegie Mellon University
Concept identification • First step towards the goal of automatically inferring domain ontologies • Goal-oriented human-human conversation has a clear structure • This structure can be used to automatically identify domain topics, e.g. dialog classification
Clustering algorithm • Hierarchical clustering • Mutual information based • Criterion = minimize the loss of average mutual information • Kullback-Leibler based • Criterion = word pair with minimum distance
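The slide does not spell out the average mutual information criterion. A common definition, used in class-based n-gram clustering and assumed here rather than taken from the slides, sums over adjacent class pairs $(c_1, c_2)$:

```latex
I = \sum_{c_1, c_2} P(c_1, c_2) \, \log \frac{P(c_1, c_2)}{P(c_1)\, P(c_2)}
```

Under this reading, each step of the hierarchical clustering merges the two classes whose merge reduces $I$ the least.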
Evaluation metrics • Reference concept from class-based n-gram model • Cluster concept=majority concept • Precision • Recall • Singularity score (SS) • Quality score (QS)
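Precision and recall for a single cluster can be sketched as below, under the assumption that they follow the usual definitions relative to the cluster's majority concept. The singularity and quality scores are not defined on the slide, so they are left out; the function name and the toy reference mapping are hypothetical, and the cluster is assumed to contain at least one word that has a reference concept.

```python
from collections import Counter

def cluster_precision_recall(cluster, reference):
    """Score one cluster against reference concepts.

    cluster: list of words; reference: dict mapping word -> reference concept.
    Returns (majority concept, precision, recall).
    """
    labels = Counter(reference[w] for w in cluster if w in reference)
    majority, hits = labels.most_common(1)[0]
    concept_size = sum(1 for c in reference.values() if c == majority)
    precision = hits / len(cluster)   # share of the cluster in the majority concept
    recall = hits / concept_size      # share of the concept captured by the cluster
    return majority, precision, recall

reference = {
    "monday": "day", "tuesday": "day", "friday": "day",
    "boston": "city", "denver": "city",
}
result = cluster_precision_recall(["monday", "tuesday", "boston"], reference)
```

Here the majority concept is `"day"`: two of the three clustered words belong to it, and two of the three reference `"day"` words were captured.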