Jens-Uwe Moller Natural Language Systems Division, Dept. of Computer Science, Univ. of Hamburg

Towards Learning Dialogue Structures from Speech Data and Domain Knowledge: Challenges to Conceptual Clustering using Multiple and Complex Knowledge Sources

Overview • Dialog modeling is based on a set of units called dialog acts • Dialog acts taken from theory do not fit a specific domain • Labeling dialogs is time-consuming and subjective • Learn application-specific dialog acts from speech data using conceptual clustering

The learning task • Learning dialog acts from turns • Unsupervised classification (no prior definition of dialog acts is given) • Hierarchical classification with inspectable classification rules

Features • Domain knowledge: structure of the task, task knowledge represented by goals and plans • Word recognizer: word hypotheses • Prosodic data: pauses and stress mark important units • Lexical semantics • Syntax (less important in spoken dialog) • Semantics (larger units of lexical semantics)

COBWEB • Symbolic machine learning algorithm • Builds a classification tree • Distinctions between subnodes are made by a function over all attributes • Supports probabilistic data • Supports multiple overlapping hierarchies (for ambiguous cases) • Can handle multiple entries for one attribute (e.g. a stream of words)
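
The "function over all attributes" can be illustrated with category utility, the partition score used by the standard COBWEB algorithm. The following is a minimal sketch for nominal attributes only; it does not cover the probabilistic, multi-valued, or structured extensions listed on the slides. The attribute names (`stress`, `pause`) and the function name `category_utility` are hypothetical, chosen for illustration.

```python
from collections import Counter


def category_utility(clusters):
    """Score a partition of instances (standard COBWEB category utility).

    clusters: list of non-empty clusters; each cluster is a list of
    instances; each instance is a dict {attribute name: nominal value}.
    """
    all_instances = [inst for cluster in clusters for inst in cluster]
    n = len(all_instances)

    def expected_correct(instances):
        # Sum over attributes i and values j of P(A_i = V_ij)^2,
        # estimated from the given instances.
        counts = Counter()
        for inst in instances:
            for attr, value in inst.items():
                counts[(attr, value)] += 1
        return sum((c / len(instances)) ** 2 for c in counts.values())

    baseline = expected_correct(all_instances)
    gain = sum(
        (len(cluster) / n) * (expected_correct(cluster) - baseline)
        for cluster in clusters
    )
    return gain / len(clusters)


# Toy turns described by two hypothetical prosodic attributes.
a = {"stress": "high", "pause": "long"}
b = {"stress": "high", "pause": "long"}
c = {"stress": "low", "pause": "short"}
d = {"stress": "low", "pause": "short"}

good_split = category_utility([[a, b], [c, d]])  # groups similar turns
bad_split = category_utility([[a, c], [b, d]])   # mixes dissimilar turns
```

A partition that groups similar turns scores higher than one that mixes them, which is the signal the tree-building step uses when deciding where to place a new instance.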

COBWEB (2) • Learning from simultaneous events • Learning from structured data: Conceptual Graphs • Learning case descriptions from terminological descriptions • Subsumption = correlation criterion over structured data, e.g. subsumption of individuals to classes

Metrics for Measuring Domain Independence of Semantic Classes Andrew Pargellis, Eric Fosler-Lussier, Alexandros Potamianos, Chin-Hui Lee Dialogue Systems Research Dept., Bell Labs, Lucent Technologies, Murray Hill, NJ, USA

Introduction • Employ semantic classes (concepts) from another domain • Need to identify domain-independent concepts based on comparison across domains • Domain-independent concepts should occur in similar syntactic (lexical) contexts across domains

Comparing concepts across domains • Concept-comparison method • Concept-projection method

Concept-comparison method • Find the similarity between all pairs of concepts across the two domains • Two concepts are similar if their respective bigram contexts are similar • Use left and right context bigram language models

Kullback-Leibler (KL) distance • Compare how san francisco and newark are used in the Travel domain with how comedies and westerns are used in the Movie domain • Distance between two concepts
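
As a rough illustration of the concept-comparison idea, the sketch below estimates left- and right-context word distributions for a concept in each domain and sums the KL divergences between them. Several details are assumptions rather than the authors' exact formulation: the add-alpha smoothing, the symmetrization (summing both directions), the shared vocabulary, and the toy sentences. The function names are hypothetical.

```python
import math
from collections import Counter


def context_distribution(sentences, concept_words, side, vocab, alpha=1.0):
    """Add-alpha smoothed distribution over the words that appear
    immediately to the left or right of any word in the concept."""
    counts = Counter()
    for sent in sentences:
        tokens = sent.split()
        for i, tok in enumerate(tokens):
            if tok in concept_words:
                j = i - 1 if side == "left" else i + 1
                if 0 <= j < len(tokens):
                    counts[tokens[j]] += 1
    total = sum(counts.values()) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def kl(p, q):
    """KL divergence D(p || q) for distributions over the same keys."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)


def concept_distance(sents_a, concept_a, sents_b, concept_b):
    """Symmetrized KL distance summed over left and right contexts."""
    vocab = {tok for s in sents_a + sents_b for tok in s.split()}
    total = 0.0
    for side in ("left", "right"):
        p = context_distribution(sents_a, concept_a, side, vocab)
        q = context_distribution(sents_b, concept_b, side, vocab)
        total += kl(p, q) + kl(q, p)
    return total


# Toy corpora (invented for illustration, not from the original data).
travel = ["fly to newark today", "fly to denver today"]
movies = ["show me comedies tonight", "show me westerns tonight"]

same = concept_distance(travel, {"newark", "denver"},
                        travel, {"newark", "denver"})
cross = concept_distance(travel, {"newark", "denver"},
                         movies, {"comedies", "westerns"})
```

A smaller distance means the two concepts occur in more similar bigram contexts; a concept compared with itself in the same corpus yields zero.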

Concept-projection method • How well a single concept from one domain is represented in another domain. • How the words comedies and westerns are used in both domains • Useful for identifying the degree of domain-independence for a particular concept.

Result: Concept-comparison

Result: Concept-projection

Concept Example

Semi-Automatic Acquisition of Domain-Specific Semantic Structures Siu K.C., Meng H.M. Human-Computer Communications Laboratory Department of Systems Engineering and Engineering Management The Chinese University of Hong Kong

Grammar induction • Uses unannotated corpora • Portable across domains and languages • Output grammar has reasonable coverage of within-domain data and rejects out-of-domain data • Amenable to interactive refinement by humans • Supports optional injection of prior knowledge

Spatial clustering • Use Kullback-Leibler distance • Use left and right contexts • Consider words w1, w2 (later c1, c2) pairwise, restricted to words with at least a pre-set minimum number of occurrences (set to 5)

Temporal clustering • Use Mutual Information (MI) • The N highest-MI pairs are clustered (N=5 in the experiment) • Spatial clustering and temporal clustering are applied iteratively • Post-processing by a human
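
The selection of the N highest-MI pairs can be sketched as follows. The MI formula used here, P(w1,w2) · log(P(w1,w2) / (P(w1)·P(w2))) over adjacent word pairs, is an assumption for illustration; the slides do not give the exact definition used in the experiments. The function name `top_mi_pairs` and the sample sentences are hypothetical.

```python
import math
from collections import Counter


def top_mi_pairs(sentences, n_pairs=5, min_count=1):
    """Return the n_pairs adjacent word pairs with the highest MI score."""
    unigrams = Counter()
    bigrams = Counter()
    for sent in sentences:
        tokens = sent.split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))

    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue
        p12 = count / n_bi
        p1 = unigrams[w1] / n_uni
        p2 = unigrams[w2] / n_uni
        # Assumed scoring: joint probability times log association ratio.
        scores[(w1, w2)] = p12 * math.log(p12 / (p1 * p2))

    return sorted(scores, key=scores.get, reverse=True)[:n_pairs]


sentences = [
    "i want to fly to new york",
    "show flights to new york",
    "new york please",
]
best = top_mi_pairs(sentences, n_pairs=1)
top_five = top_mi_pairs(sentences, n_pairs=5)
```

In an iterative setup, each selected pair would be merged into a single unit (for example `new_york`) before the next round of spatial clustering.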

Automatic Concept Identification in Goal-Oriented Conversations Ananlada Chotimongkol and Alexander I. Rudnicky Language Technologies Institute Carnegie Mellon University

Concept identification • First step towards the goal of automatically inferring domain ontologies • Goal-oriented human-human conversation has a clear structure • This structure can be used to automatically identify domain topics, e.g. dialog classification

Clustering algorithm • Hierarchical clustering • Mutual information based • Criterion = minimize the loss of average mutual information • Kullback-Leibler based • Criterion = word pair with minimum distance

Evaluation metrics • Reference concept from class-based n-gram model • Cluster concept=majority concept • Precision • Recall • Singularity score (SS) • Quality score (QS)
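
The majority-concept labeling with precision and recall can be sketched as below. The definitions are assumptions for illustration: precision is taken as the share of a cluster's words that belong to its majority reference concept, and recall as the share of that reference concept's words found in the cluster. The singularity and quality scores are not sketched because the slides do not define them. The function name and toy data are hypothetical.

```python
from collections import Counter


def evaluate_cluster(cluster_words, reference):
    """Label a cluster with its majority reference concept and score it.

    reference: dict mapping word -> reference concept name.
    Returns (concept, precision, recall).
    """
    labels = Counter(reference[w] for w in cluster_words if w in reference)
    if not labels:
        return None, 0.0, 0.0

    concept, hits = labels.most_common(1)[0]
    concept_size = sum(1 for c in reference.values() if c == concept)

    precision = hits / len(cluster_words)  # assumed definition
    recall = hits / concept_size           # assumed definition
    return concept, precision, recall


# Toy reference concepts (invented for illustration).
reference = {
    "boston": "city", "denver": "city", "newark": "city",
    "monday": "day", "friday": "day",
}
concept, precision, recall = evaluate_cluster(
    ["boston", "denver", "monday"], reference
)
```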
