CS3730/ISP3120 Corpus Annotation
Manual Annotation
• Also known as coding or labeling
From Webber’s Chapter
• The aims of computational work in [discourse and dialog]:
  • Modeling particular phenomena in discourse and dialog in terms of underlying computational processes
  • Providing useful natural language services whose success depends in part on handling aspects of discourse and dialog
• What computation contributes is a coherent framework for modeling these phenomena in terms of … search through a space of possible candidate interpretations (in language analysis) or candidate realizations (in language generation)
Desiderata
• Interesting and rich enough
• Not so rich that automating it lies too far beyond the current state of the art:
  • Logical structure too complex
  • Knowledge bottleneck (no viable knowledge source)
  • Distinctions too fine-grained or subtle
• Feasible annotation instructions (aka coding manual):
  • Time required for training is reasonable
  • Annotators can reliably perform the annotations in a reasonable amount of time
Minimal Process for NLP
1. Develop an initial coding manual
2. Have at least two people perform sample annotations, then discuss their disagreements and experiences
3. Revise the coding manual
4. Repeat steps 2-3 until agreement on training data is sufficient
5. Independently annotate a fresh test set
6. Evaluate agreement
Additional Steps
1. Develop an initial coding manual
2. Have at least two people perform sample annotations, then discuss their disagreements and experiences; analyze patterns of agreement and disagreement using probability models (Wiebe et al. ACL-99; Bruce & Wiebe NLE-99; drawing on work in applied statistics)
3. Revise the coding manual
4. Repeat steps 2-3 until agreement on training data is sufficient
5. Independently annotate a fresh test set
6. Evaluate agreement
7. Train more annotators; assess average time for training and annotation
8. Evaluate other types of reliability (see the psychology, content analysis, and applied statistics literatures)
Measures of Agreement
• Percentage agreement: OK, but not sufficient
  • If the distribution of classes is highly skewed, the baseline algorithm of always assigning the most frequent class would already have high agreement
• Kappa: measures agreement over and above the agreement expected by chance
  • Details available in Section 3 of this paper by our group
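As a minimal sketch (not from the slides), the code below computes percentage agreement and Cohen’s kappa for two annotators; the label sequences are hypothetical and chosen to show how a skewed class distribution inflates raw agreement:

```python
from collections import Counter

def percentage_agreement(a, b):
    """Fraction of items on which two annotators assign the same label."""
    assert len(a) == len(b)
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    """Cohen's kappa: (P_o - P_e) / (1 - P_e), where P_o is the observed
    agreement and P_e is the agreement expected by chance given each
    annotator's own label distribution."""
    n = len(a)
    p_o = percentage_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    p_e = sum((counts_a[l] / n) * (counts_b[l] / n) for l in set(a) | set(b))
    if p_e == 1.0:
        return 1.0  # degenerate case: both annotators always use the same single label
    return (p_o - p_e) / (1 - p_e)

# Hypothetical annotations with a skewed class distribution: both
# annotators label almost everything "neutral".
ann1 = ["neutral"] * 18 + ["subjective"] * 2
ann2 = ["neutral"] * 19 + ["subjective"] * 1
print(percentage_agreement(ann1, ann2))  # 0.95 -- looks impressive
print(cohens_kappa(ann1, ann2))          # ~0.64 -- chance-corrected, much lower
```

Because “neutral” dominates, the chance agreement P_e is already 0.86 here, so the chance-corrected kappa (about 0.64) is well below the raw 95% agreement, which is exactly why percentage agreement alone is not sufficient.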