Semantic annotation of a dialog corpus
Silvie Cinková
Institute of Formal and Applied Linguistics, Charles University in Prague, Czech Republic
COMPANIONS (www.companions-project.org)
European Commission Sixth Framework Programme, Information Society Technologies, Integrated Project IST-34434
Data for machine learning
• audio-synchronized transcription
• linguistic annotation
• Charles University (Czech Republic)
• Napier University (Edinburgh, UK)
• University of Sheffield (UK)
• Oxford University (UK)
Functional Generative Description
• formal language description
• Prague structuralism + computational linguistics
• since the 1960s
• stratifies language
  • phonology
  • morphology
  • surface syntax
  • underlying syntax (tectogrammatics)
• transition between syntax and semantics
• a "poor man's interlingua"
Tectogrammatical representation: "underlying syntax"
• linguistic meaning
• syntactic and semantic relations: parent-child node(s)
• valency
• ellipsis restoration
• coreference across sentence boundaries
• information structure (TFA)
• synonymous functions get an identical representation
Tectogrammatical representation
Example: "Is that Jess on the left?"
Tectogrammatical representation
• ellipsis restoration
• coreference
Example: "Yes it is, laughing."
Current... and challenges

written
• Prague Dependency Treebank
• Czech newspapers
• 800 k words manually
• LDC 2006
• Wall Street Journal: in progress, 15% so far
• monolog
• reporting
• standard language

spoken
• dialogs
• real time interaction
• clause fragments
• exophora, deixis
• (syntax deviations)
Non-sentential utterances (NSU)
• phrases (NP, PP, ADVP, ADJP)
  • Me.
  • At 5 o'clock.
  • Blue.
• interjections
  • Mhm.
  • Oh, no!
• interjections attached to phrases
  • No, Billy.
  • Oh, sure.
• subordinate clause without main clause
  • If he goes with me.
  • Skiing.
• phrase combinations in coordination or apposition
  • With Mary in the morning or shopping at Tesco.
  • Or without.
Utterance-response pair
• utterance U: "Who's that?"
• response NSU: "Peggy."
• figure labels: UPred, UMods, functors (semantic labels)
Utterance-response pair
U: Who's that?
NSU: Peggy. (restored: [That is] Peggy, i.e. "That is Peggy.")
Predicate with interjections
• Mhm.
• Yes.
• No, Billy.
NSUMods versus UMods
• attribute: response_type
• values:
  • overrules
  • bridging
  • wh-path
  • other
• form: reference (arrow) to antecedent node
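The attribute described above could be modeled roughly as follows. This is only a minimal sketch: the class names `Node` and `NSUMod`, the field names `node_id`, `form` and `antecedent_id`, and the identifier strings are hypothetical and not taken from the annotation scheme. Only the attribute name `response_type` and its four values come from the slide.

```python
from dataclasses import dataclass
from enum import Enum


class ResponseType(Enum):
    """The four response_type values listed on the slide."""
    OVERRULES = "overrules"
    BRIDGING = "bridging"
    WH_PATH = "wh-path"
    OTHER = "other"


@dataclass
class Node:
    node_id: str  # hypothetical identifier, not part of the original scheme
    form: str     # word form carried by the node


@dataclass
class NSUMod(Node):
    """A modifier of the NSU that points back to a node of the utterance U."""
    response_type: ResponseType
    antecedent_id: str  # the "arrow": id of the antecedent node in U


# Overruling example from the slides: "Ellenthorpe" corrected to "Hellenthorpe".
antecedent = Node("u-ellenthorpe", "Ellenthorpe")
correction = NSUMod(
    "nsu-hellenthorpe", "Hellenthorpe",
    ResponseType.OVERRULES, antecedent.node_id,
)
```

Storing the reference as an identifier instead of an object pointer keeps the sketch easy to serialize, which is one plausible way to represent an arrow between nodes of two different trees.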
Non-conflicting modifier addition
• [It will be] probably not [worth getting].
• Yes [I brought the book].
Overruling
A: I'm at a little place called Ellenthorpe.
B: Hellenthorpe.
Bridging
A: There are only two people in the class.
B: Two students?
Wh-path
A: "Who's that?"
B: "Peggy."
Wh-path: different functor matches
• up to the annotator
• we expect regular alternation patterns
A: Where would you like to go tomorrow?
B: Shopping with Mary.
Other
A: He entered the largest room.
B: Room 128?
A: I don't know the number.
Summary
• U-NSU pairs
• NSU inherits the predicate of U (coreference)
• NSU inherits all modifiers of U
• NSU's own modifiers overrule the inherited ones; response_type values:
  • overrules
  • bridging
  • wh-path
  • other
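The inheritance rules in the summary can be sketched in a few lines. This is an illustration under strong simplifying assumptions, not the project's annotation tool: trees are flattened to a predicate plus a list of modifiers, every NSU modifier with an antecedent simply takes that antecedent's place whatever its response_type, and the names `Mod`, `Utterance`, `resolve` and all identifiers are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Mod:
    mod_id: str                           # hypothetical identifier
    form: str
    response_type: Optional[str] = None   # set only on NSU modifiers
    antecedent: Optional[str] = None      # mod_id of the U modifier referred to


@dataclass
class Utterance:
    predicate: str
    mods: List[Mod] = field(default_factory=list)


def resolve(u: Utterance, nsu_mods: List[Mod]) -> Utterance:
    """Read an NSU as a full clause by inheriting from the utterance U."""
    u_ids = {m.mod_id for m in u.mods}
    # NSU modifiers that point at a modifier of U overrule it.
    overruling = {m.antecedent: m for m in nsu_mods if m.antecedent in u_ids}
    mods = [overruling.get(m.mod_id, m) for m in u.mods]
    # Modifiers with no matching antecedent are added without conflict.
    mods.extend(m for m in nsu_mods if m.antecedent not in u_ids)
    # The predicate of U is inherited unchanged.
    return Utterance(u.predicate, mods)


# Wh-path example from the slides: "Who's that?" answered by "Peggy."
question = Utterance("be", [Mod("u1", "that"), Mod("u2", "who")])
answer = [Mod("n1", "Peggy", response_type="wh-path", antecedent="u2")]
resolved = resolve(question, answer)

# Non-conflicting addition: "Yes [I brought the book]."
statement = Utterance("bring", [Mod("u1", "I"), Mod("u2", "the book")])
confirmed = resolve(statement, [Mod("n1", "yes")])
```

In the first example the resolved clause keeps the predicate "be" and the modifier "that", and "Peggy" takes the place of "who", which corresponds to the restored reading "That is Peggy." on the utterance-response slide.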
References
• Raquel Fernández, Jonathan Ginzburg, and Shalom Lappin (2007): Classifying Non-Sentential Utterances in Dialogue: A Machine Learning Approach. Computational Linguistics, Volume 33, Nr. 3. MIT Press for the Association for Computational Linguistics.
• Eva Hajičová (ed.) (1995): Text-And-Inference-Based Approach to Question Answering. Prague.