
Learning to Classify Email into “Speech Acts”


Presentation Transcript


  1. Learning to Classify Email into “Speech Acts” William W. Cohen, Vitor R. Carvalho and Tom M. Mitchell Presented by Vitor R. Carvalho IR Discussion Series - August 12th 2004 - CMU

  2. Imagine a hypothetical email assistant that can detect “speech acts”…
  (1) “Do you have any data with xml-tagged names? I need it ASAP!” An urgent Request is detected: the assistant may take action, and the request is marked pending.
  (2) “Sure. I’ll put it together by Sunday.” A Commitment is detected: the assistant can ask “Should I send Vitor a reminder on Sunday?” or “Should I add this Commitment to your to-do list?”
  (3) “Here’s the tar ball on afs: ~vitor/names.tar.gz” A Delivery of data is detected: the delivery is sent, the to-do list is updated, and the pending request is cancelled.

  3. Outline
  • Setting the base
    • “Email speech act” taxonomy
    • Data
    • Inter-annotator agreement
  • Results
    • Learnability of “email acts”
    • Different learning algorithms, “acts”, etc.
    • Different representations
  • Improvements
    • Collective/relational/iterative classification

  4. Related Work
  • Email classification for topic/folder identification and for spam/non-spam
  • Speech-act classification in conversational speech; email is a new domain, with multiple acts per message
  • Winograd’s Coordinator (1987): users manually annotated email with intent, which meant extra work for (lazy) users
  • Murakoshi et al. (1999): hand-coded rules for identifying speech-act-like labels in Japanese emails

  5. “Email Acts” Taxonomy
  Example message:
    From: Benjamin Han
    To: Vitor Carvalho
    Subject: LTI Student Research Symposium
    Hey Vitor
    When exactly is the LTI SRS submission deadline?  [Request - Information]
    Also, don’t forget to ask Eric about the SRS webpage.  [Reminder - action/task]
    See you
    Ben
  • A single email message may contain multiple acts
  • An act is described as a verb-noun pair (e.g., propose meeting, request information); not all pairs make sense
  • The taxonomy tries to describe commonly observed behaviors rather than all possible speech acts in English
  • It also includes non-linguistic usage of email (e.g., delivery of files)
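
To make the verb-noun representation concrete, here is a minimal Python sketch; it is not from the paper, and the EmailAct class and label strings are illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EmailAct:
    """An email act as a verb-noun pair (illustrative, not the paper's code)."""
    verb: str  # e.g. "Request", "Deliver", "Commit", "Remind"
    noun: str  # e.g. "Information", "Data", "Meeting"

# A single message may carry multiple acts, as in the example message above:
ben_msg_acts = [
    EmailAct("Request", "Information"),  # "When exactly is the ... deadline?"
    EmailAct("Remind", "Activity"),      # "don't forget to ask Eric ..."
]
```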

  6. A Taxonomy of “Email Acts”: Verbs
  • Negotiate
    • Initiate: Request, Propose, Amend
    • Conclude: Commit, Refuse, Deliver
  • Other: Remind, Greet

  7. A Taxonomy of “Email Acts”: Nouns
  • Information
    • Data: Meeting Logistics Data, Other Data
    • Opinion
  • Activity
    • Ongoing Activity: Committee, Other
    • Single Event: Short Term Task, Meeting
  An email act is a <Verb><Noun> pair.

  8. Corpora
  • Few large, natural email corpora are available
  • CSPACE corpus (Kraut & Fussell)
    • Email associated with a semester-long project for GSIA MBA students in 1997
    • 15,000 messages from 277 students in 50 teams (4 to 6 per team)
    • Rich in task negotiation
    • N02F2, N01F3, N03F2: all messages from students in three teams (341, 351, and 443 messages)
  • SRI’s “Project World” CALO corpus
    • 6 people in an artificial task scenario over four days
    • 222 messages (publicly available)
  • These corpora were double-labeled (each message annotated by two people)

  9. Inter-Annotator Agreement
  • Kappa statistic: Kappa = (A - R) / (1 - R)
  • A = probability of agreement in a category
  • R = probability of agreement for 2 annotators labeling at random
  • Kappa range: -1…+1
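
The bullets above amount to the standard Cohen's kappa computation; here it is as a small self-contained Python sketch (the kappa helper and toy labels are ours, not the paper's):

```python
from collections import Counter

def kappa(labels_a, labels_b):
    # Cohen's kappa: (A - R) / (1 - R), with A the observed agreement and
    # R the agreement expected if both annotators labeled at random.
    n = len(labels_a)
    A = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    ca, cb = Counter(labels_a), Counter(labels_b)
    R = sum((ca[c] / n) * (cb[c] / n) for c in set(ca) | set(cb))
    return (A - R) / (1 - R)

# Toy check: the annotators agree on 3 of 4 messages.
print(kappa(["Req", "Dlv", "Cmt", "Req"],
            ["Req", "Dlv", "Dlv", "Req"]))  # -> 0.6
```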

  10. Inter-Annotator Agreement for messages with a single “verb”

  11. Learnability of Email Acts. Features: un-weighted word frequency counts (BOW); 5-fold cross-validation. (Directive = Request, Propose, or Amend)
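
A rough modern re-creation of this setup (scikit-learn postdates the paper; LogisticRegression is only a stand-in for the learners compared on the next slide, and the four messages are placeholder data):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Placeholder messages; the experiments used the CSPACE/CALO corpora.
messages = [
    "could you send me the xml-tagged names",
    "i will put the numbers together by sunday",
    "here is the tar ball with the data",
    "please advise on the meeting logistics",
]
is_directive = [1, 0, 0, 1]  # Directive = Request, Propose, or Amend

# Un-weighted word frequency counts (BOW) feeding a linear classifier.
bow = make_pipeline(CountVectorizer(), LogisticRegression())
# The paper uses 5-fold cross-validation; cv=2 here only because this
# toy dataset has 4 examples.
print(cross_val_score(bow, messages, is_directive, cv=2))
```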

  12. Using Different Learners (Directive Act = Request, Propose, or Amend)

  13. Learning Requests only

  14. Learning Commissives (Commissive Act = Delivery or Commitment)

  15. Learning Deliveries only

  16. Learning to recognize Commitments

  17. Most Informative Features (are common words), listed per act group: Request+Amend+Propose, Commit, and Deliver

  18. Learning: document representation
  • Variants explored:
    • TFIDF -> TF weighting (don’t downweight common words)
    • Bigrams
      • For Commitment: “i will”, “i agree” in the top 5 features
      • For Directive: “do you”, “could you”, “can you”, “please advise” in the top 25
    • Count of time expressions
    • Words near a time expression
    • Words near a proper noun or pronoun
    • POS counts
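
A sketch of two of these variants, assuming scikit-learn for the TF-weighted bigram features and a hand-rolled regex (not the paper's time-expression extractor):

```python
import re
from sklearn.feature_extraction.text import TfidfVectorizer

# TF weighting (no IDF, so common words are not downweighted), over word
# unigrams and bigrams:
tf_bigrams = TfidfVectorizer(ngram_range=(1, 2), use_idf=False)
X = tf_bigrams.fit_transform(["i will put it together by sunday",
                              "could you send the data"])
print(tf_bigrams.get_feature_names_out()[:5])

# A crude, assumed pattern for counting time expressions:
TIME_RE = re.compile(
    r"\b(today|tomorrow|monday|sunday|\d{1,2}(:\d{2})?\s*(am|pm))\b", re.I)
print(len(TIME_RE.findall("let's meet sunday at 5pm or tomorrow")))  # -> 3
```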

  19. Baseline classifier: linear-kernel SVM with TFIDF weighting
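
In scikit-learn terms, this baseline could be re-created as follows (an assumed sketch, not the original implementation):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Linear-kernel SVM over TFIDF-weighted features, mirroring the baseline.
baseline = make_pipeline(TfidfVectorizer(), LinearSVC())
# baseline.fit(train_messages, train_labels)
# predictions = baseline.predict(test_messages)
```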

  20. Collective Classification (relational)

  21. Collective Classification
  • BOW classifier outputs used as features (7 binary features: req, dlv, amd, prop, etc.)
  • MaxEnt learner; training set = N03f2, test set = N01f3
  • Features come from the current msg + parent msg + child msg (1st child only)
  • “Related” msgs = messages with a parent and/or child message; the collective features are useful for these “related” messages
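
A structural sketch of these relational features, with an assumed message format (dicts with parent/child links) and a toy stand-in for the trained BOW classifier; only the four acts named above are spelled out, since the slide's "etc." covers the rest:

```python
def relational_features(msg, bow_predict, acts=("req", "dlv", "amd", "prop")):
    # The BOW classifier's binary act predictions for the current message,
    # its parent, and its first child become the input features of a
    # second-stage (MaxEnt-style) learner.
    feats = {}
    for role, m in (("cur", msg), ("parent", msg.get("parent")),
                    ("child", msg.get("child"))):
        if m is None:
            continue
        pred = bow_predict(m["text"])
        for act in acts:
            feats[f"{role}_{act}"] = int(pred.get(act, False))
    return feats

# Toy stand-in for the BOW classifier's per-act outputs:
def bow_predict(text):
    return {"req": "?" in text, "dlv": "attached" in text,
            "amd": "instead" in text, "prop": "propose" in text}

child = {"text": "attached is the file", "parent": None, "child": None}
msg = {"text": "can you send the file?", "parent": None, "child": child}
print(relational_features(msg, bow_predict))
```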

  22. Collective/Iterative Classification
  [Figure: messages on a timeline, each with its posterior probability: 0.53, 0.65, 0.85, 0.85, 0.95, 0.93]
  • Start with the baseline (BOW) predictions
  • How to make updates?
    • In chronological order
    • Using “family heuristics” (child first, parent first, etc.)
    • Using posterior probability from the Maximum Entropy learner (threshold, ranking, etc.)
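
One way to read the update scheme as code (an illustrative sketch, not the authors' procedure), here using the posterior-confidence ordering:

```python
def iterative_classify(messages, base_predict, relational_predict, rounds=3):
    # Start from the baseline BOW predictions, then repeatedly re-classify
    # each message using relational features computed from the current
    # predictions of its parent/child, updating the most confident
    # messages first (one of the orderings listed above).
    preds = {m["id"]: base_predict(m) for m in messages}
    for _ in range(rounds):
        order = sorted(messages,
                       key=lambda m: -preds[m["id"]]["confidence"])
        for m in order:
            preds[m["id"]] = relational_predict(m, preds)
    return preds
```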

  23. Iterative Classification: Commitment

  24. Iterative Classification: Request

  25. Iterative Classification: Dlv+Cmt

  26. Conclusions/Summary
  • Negotiating/managing shared tasks is a central use of email
  • Proposed a taxonomy of “email acts”; it could be useful for tracking commitments, delegations, and pending answers, and for integrating to-do lists and calendars with email
  • Inter-annotator agreement reaches kappa values in the 0.70-0.80s
  • Learned classifiers can recognize these acts with reasonable accuracy (90% precision at 50-60% recall for the top level of the taxonomy)
  • Fancy tricks with IE, bigrams, and POS offer modest improvement over baseline TF-weighted systems

  27. Conclusions/Future Work
  • Teamwork (collective/iterative classification) seems to help a lot!
  • Future work:
    • Integrate all features + best learners + tricks… tune the system
    • Social network analysis
