Columbia CCLS: Committed Belief and Dialog Acts
Mona Diab, Rebecca Passonneau, Owen Rambow
{mdiab,becky,rambow}@ccls.columbia.edu
Columbia CCLS Activities: Committed Belief
• Identify in text what the writer (speaker) actually believes is true
• “The Marines attacked rebels in the South”
  – committed belief: the writer believes this is true
• “An Iraqi government spokesman said that the Marines attacked rebels in the South”
  – non-committed belief: the writer could believe this, but does not indicate that s/he believes it to be true
• “I have demanded for days that the Marines attack rebels in the South”
  – not applicable: this is not something that the writer could believe to be true (in this case, because it is a desired state)
• Team leader: Mona Diab, with input from Lori Levin (CMU) and Owen Rambow
• 1 summer student
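The three-way distinction above can be written down as a small label set. This is only an illustrative sketch: the names `BeliefTag`, `EXAMPLES`, and `tag_counts` are hypothetical and do not come from the annotation manual, and each tag here is attached to the proposition about the Marines' attack in the corresponding example sentence.

```python
from enum import Enum


class BeliefTag(Enum):
    """Hypothetical names for the three categories described on the slide."""
    COMMITTED = "committed belief"
    NON_COMMITTED = "non-committed belief"
    NOT_APPLICABLE = "not applicable"


# The slide's three example sentences, paired with the tag the slide assigns
# to the proposition about the attack.
EXAMPLES = [
    ("The Marines attacked rebels in the South",
     BeliefTag.COMMITTED),
    ("An Iraqi government spokesman said that the Marines attacked rebels "
     "in the South",
     BeliefTag.NON_COMMITTED),
    ("I have demanded for days that the Marines attack rebels in the South",
     BeliefTag.NOT_APPLICABLE),
]


def tag_counts(examples):
    """Count how many examples carry each tag."""
    counts = {tag: 0 for tag in BeliefTag}
    for _sentence, tag in examples:
        counts[tag] += 1
    return counts
```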
Columbia CCLS Activities: Dialog Acts
• Identify dialog acts in spoken and written dialog
• Extend existing work by linking dialog acts across greater distance
• Extend existing work by annotating complex dialogs, including written interactions (email), in which many dialog acts happen in parallel
• Address the issue that a single turn (= a coherent utterance in spoken dialog, or a single email) may perform several dialog acts; identify Dialog Function Units (DFU)
• Team leaders: Becky Passonneau and Owen Rambow
• 1 Master's student (partially funded by TTO3)
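One way to picture a turn that performs several dialog acts is as a list of Dialog Function Units, each with its own act label and an optional link back to an earlier unit. The sketch below is an assumption made for illustration: the class names, field names, and act labels ("question", "answer") are hypothetical and are not taken from the project's annotation scheme.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DialogFunctionUnit:
    dfu_id: str
    text: str
    dialog_act: str                      # hypothetical label, e.g. "question"
    links_back_to: Optional[str] = None  # id of the earlier DFU this responds to


@dataclass
class Turn:
    speaker: str
    units: List[DialogFunctionUnit] = field(default_factory=list)


def forward_links(turns: List[Turn]) -> Dict[str, List[str]]:
    """Invert the backward links: earlier DFU id -> ids of later DFUs answering it."""
    forward: Dict[str, List[str]] = {}
    for turn in turns:
        for unit in turn.units:
            if unit.links_back_to is not None:
                forward.setdefault(unit.links_back_to, []).append(unit.dfu_id)
    return forward


# Two emails: the first asks two questions in parallel, the second answers both.
email_1 = Turn("A", [
    DialogFunctionUnit("a1", "Can you send the report?", "question"),
    DialogFunctionUnit("a2", "Are we meeting Friday?", "question"),
])
email_2 = Turn("B", [
    DialogFunctionUnit("b1", "Report attached.", "answer", links_back_to="a1"),
    DialogFunctionUnit("b2", "Yes, Friday works.", "answer", links_back_to="a2"),
])
links = forward_links([email_1, email_2])
```

Storing only the backward link on each unit and deriving the forward links keeps the two directions consistent by construction.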
Committed Belief: Accomplishments
• Manual Annotation
  • Completed double annotation for the basic document collection in English (August 31st, 2008)
  • Updated the manual based on observations by annotators (August 31st, 2008)
• Automatic Annotation
  • Created a preliminary supervised system for the prediction of committed belief (SPCB)
  • For SPCB, we experimented with the original annotations from April (the new annotations are much cleaner; we will be able to give inter-annotator agreement later, in mid-September)
  • SPCB is an IOB sequence model using YAMCHA SVMs
  • Split the data 80/10/10 into train, test, and dev sets
  • Features: character n-grams, lemma as obtained using lingo, context size, POS tag
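The data preparation described above (IOB tags, character n-gram and context features, an 80/10/10 split) might look roughly like the following. This is a minimal sketch under stated assumptions, not the SPCB implementation: it does not call YAMCHA, the label name `CB` and all function names are hypothetical, the lemma feature is left out because it needs an external lemmatizer, and the split order (train, test, dev) simply follows the slide's wording.

```python
import random


def to_iob(tokens, spans):
    """Convert (start, end, label) token spans, end exclusive, to IOB tags."""
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


def char_ngrams(word, max_n=4):
    """Prefixes and suffixes of up to max_n characters."""
    feats = {}
    for n in range(1, min(max_n, len(word)) + 1):
        feats[f"pre{n}"] = word[:n]
        feats[f"suf{n}"] = word[-n:]
    return feats


def token_features(tokens, pos_tags, i, left=1, right=1):
    """Features for token i: POS tag, character n-grams, and a word window."""
    feats = {"pos": pos_tags[i]}
    feats.update(char_ngrams(tokens[i].lower()))
    for offset in range(-left, right + 1):
        j = i + offset
        inside = 0 <= j < len(tokens)
        feats[f"w[{offset}]"] = tokens[j].lower() if inside else "<PAD>"
    return feats


def split_80_10_10(items, seed=0):
    """Shuffle deterministically, then cut into train / test / dev portions."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = len(items) * 8 // 10
    n_test = len(items) // 10
    train = items[:n_train]
    test = items[n_train:n_train + n_test]
    dev = items[n_train + n_test:]
    return train, test, dev


tokens = ["The", "Marines", "attacked", "rebels"]
pos_tags = ["DT", "NNPS", "VBD", "NNS"]
iob = to_iob(tokens, [(2, 3, "CB")])   # hypothetical label "CB"
feats = token_features(tokens, pos_tags, 2)
```

The `left` and `right` parameters correspond to the word-window sizes (such as +1/-1) that the experiments vary.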
Committed Belief: Preliminary Results for Automatic Annotation
• Preliminary results on the dev data (all numbers are Precision, Recall, and F-measure):
  1. Baseline, default YAMCHA settings: P 55.80%, R 27.37%, F 36.73%
  2. Best contextual features (window size of +1/-1 words, -2/-1 tags before the current word): P 57.66%, R 34.69%, F 43.32%
  3. Adding lemma to #2: P 60.98%, R 33.88%, F 43.55%
  4. Adding POS tag to #3: P 52.94%, R 46.34%, F 49.42%
  5. Adding POS tag to #2: 45.53% 47.46% 47.46%
  6. Adding n-gram features “only end of word” to #2: P 59.62%, R 42.82%, F 49.84%
  7. Combining #6 and #4: P 54.43%, R 46.61%, F 50.22%
  8. Adding up to 4 character n-grams from the beginning and end of words to #2: P 57.77%, R 46.34%, F 51.43%
  9. Combining #8 and #4: P 55.94%, R 48.51%, F 51.96%
  10. Similar to #9 but with a different context (-1/0 words, -2/-1 tags before the current word): P 57.19%, R 48.51%, F 52.49%
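The F-measure figures above can be recomputed from precision and recall, assuming the slide reports the balanced F1 score, the harmonic mean F = 2PR / (P + R). The sketch below (the names `ROWS` and `inconsistent_rows` are illustrative) does this for the ten configurations in the order listed. On the figures as transcribed, nine of the ten agree to within rounding; the fifth configuration (adding POS tag to #2) does not, which suggests that one of its three values was garbled somewhere along the way.

```python
def f_measure(precision, recall):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (configuration number, P, R, reported F), all in percent, as listed above.
ROWS = [
    (1, 55.80, 27.37, 36.73),
    (2, 57.66, 34.69, 43.32),
    (3, 60.98, 33.88, 43.55),
    (4, 52.94, 46.34, 49.42),
    (5, 45.53, 47.46, 47.46),
    (6, 59.62, 42.82, 49.84),
    (7, 54.43, 46.61, 50.22),
    (8, 57.77, 46.34, 51.43),
    (9, 55.94, 48.51, 51.96),
    (10, 57.19, 48.51, 52.49),
]


def inconsistent_rows(rows, tol=0.02):
    """Return configuration numbers whose reported F differs from 2PR/(P+R)."""
    return [n for n, p, r, f in rows if abs(f_measure(p, r) - f) > tol]


flagged = inconsistent_rows(ROWS)
```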
Committed Belief: Future Work
• Manual Annotation (until money runs out)
  • Calculate inter-annotator agreement
  • Add more annotations on different genres
  • Annotate Arabic based on the improved manual
• Automatic Annotation (subject to new funding)
  • Create a supervised system for Arabic committed belief annotation
  • Run cross-validation experiments on the supervised system using the new (clean) data
  • Experiment with more features, such as shallow syntactic features, syntactic dependency features, TAG features, semantic role features, and word senses
  • Experiment with semi-supervised approaches
  • Bootstrap from multilingual data using parallel corpora
Dialog Acts: Accomplishments
• Manual Annotation
  • Hired and trained new annotators
  • Started annotation of dialog corpora; email corpora to follow
  • Note: core data includes little dialog, so we are annotating non-core data
• Automatic Annotation
  • Student read background material over the summer
  • Successful launch meeting with student
Dialog Acts: Future Work
• Manual Annotation: Ongoing
• Automatic Annotation
  • Phase I: Basic dialog act tagging by Sep 15
  • Phase II: Forward and backward links by Sep 30
  • Phase III: Dynamic turn segmentation into Dialog Function Units by Oct 31
  • Phase IV: Improvements to all three functionalities by Dec 31
  • Note 1: schedule negotiated previously
  • Note 2: student partially funded by TTO3, partially by other sources