Selected Papers from COLING 2004 XIAO Jing
Three Papers • Cross-lingual information extraction system evaluation • Detection of question-answer pairs in email conversation • Generating overview summaries of ongoing email thread discussions
Cross-lingual information extraction system evaluation (K. Sudo, S. Sekine and R. Grishman) • Query-Driven Information Extraction (QDIE): takes the description of the event type (the query) from the user as input and acquires extraction patterns for the given scenario. (Pipeline diagram: query → document retrieval over the training documents → relevant document set → document preprocessing → pattern candidates → pattern scoring.)
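The pattern acquisition step ranks candidate patterns by how strongly they are associated with the retrieved relevant documents. Below is a minimal, hedged sketch of TF-IDF-style pattern scoring in that spirit; the pattern representation (QDIE uses dependency subtrees), the scoring function, and the toy data are illustrative assumptions, not the paper's exact formulation.

```python
from collections import Counter
from math import log

def score_patterns(doc_patterns, relevant_ids):
    """Rank candidate patterns TF-IDF-style: frequent within the retrieved
    (relevant) documents, rare across the whole collection."""
    n_docs = len(doc_patterns)
    df = Counter()       # document frequency of each pattern over all documents
    tf_rel = Counter()   # frequency of each pattern inside the relevant documents
    for doc_id, patterns in doc_patterns.items():
        for p in set(patterns):
            df[p] += 1
        if doc_id in relevant_ids:
            tf_rel.update(patterns)
    scores = {p: tf_rel[p] * log(n_docs / df[p]) for p in tf_rel}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical toy data: the strings stand in for QDIE's dependency-subtree patterns.
docs = {
    "d1": ["<X> kidnap <Y>", "<X> release <Y>"],
    "d2": ["<X> kidnap <Y>"],
    "d3": ["<X> meet <Y>"],
}
print(score_patterns(docs, relevant_ids={"d1", "d2"}))
```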
Cross-lingual information extraction system evaluation (K. Sudo, S. Sekine and R. Grishman) • Cross-lingual IE: accessing information in languages different from the user's own, i.e. the language of the documents differs from the language of the user's query. • Useful when the user needs information from a source in a language other than his/her own (diagram: an English query is used to extract English templates from Japanese text).
Cross-lingual information extraction system evaluation (K. Sudo, S. Sekine and R. Grishman) • Compare two approaches: (a) Cross-lingual QDIE: acquire patterns and perform extraction in the document language, then translate the resulting templates into the user's language. (b) Translation-based QDIE: translate the documents into the user's language, then perform pattern discovery and extraction there.
Cross-lingual information extraction system evaluation (K. Sudo, S. Sekine and R. Grishman) • Translation-based system: (1) an MT system translates the Japanese training and test documents into English; (2) the English pattern discovery system performs document retrieval, pattern scoring, and pattern matching to produce the extraction patterns.
Cross-lingual information extraction system evaluation (K. Sudo, S. Sekine and R. Grishman) • Cross-lingual system: (1) an MT system translates the user's English query into Japanese; (2) the Japanese pattern discovery system performs document retrieval, pattern scoring, and pattern matching over the Japanese training and test documents to produce the extraction patterns; (3) the extracted table is translated back into English.
Cross-lingual information extraction system evaluation (K.Sudo, S.Sekine and R.Grishman)
Cross-lingual information extraction system evaluation (K. Sudo, S. Sekine and R. Grishman) (Precision-recall comparison of the cross-lingual and translation-based systems: the cross-lingual system does better, and the translation-based system reaches a lower maximum recall.)
Cross-lingual information extraction system evaluation (K. Sudo, S. Sekine and R. Grishman) • The cross-lingual QDIE system performs better than the translation-based QDIE system. • The translation-based system suffers from NE recognition errors. • Structural errors and incorrect dependency analyses in the MT output lead to fewer and noisier pattern candidates.
Cross-lingual information extraction system evaluation (K. Sudo, S. Sekine and R. Grishman) • The paper suggests that exploiting the basic tools available for the language of the documents will boost the performance of the whole cross-lingual information extraction system. • Future work will focus on additional techniques for query translation, such as query translation over expanded queries, and on building a translation dictionary from parallel corpora.
Detection of Question-Answer Pairs in Email Conversations (L. Shrestha and K. McKeown) • Goal: detect the question-answer pairs in an email conversation for the task of email summarization. • Challenges: the email thread as a whole is a collaborative effort with interaction among several participants; replies do not happen immediately, so responders must take special care to identify the relevant elements of the discourse context, for example by quoting previous messages.
Detection of Question-Answer Pairs in Email Conversations (L. Shrestha and K. McKeown) • Sample summary obtained with sentence extraction:
Regarding “acm home/bjarney”, on Apr 9, 2001, Muriel Danslop wrote: Two things: Can someone be responsible for the press releases for Stroustrup?
Responding to this on Apr 10, 2001, Theresa Feng wrote: I think Phil, who is probably a better writer than most of us, is writing up something for dang and Dave to send out to various ACM chapters. Phil, we can just use that as our “press release”, right?
In another subthread, on Apr 12, 2001, Kevin Danquoit wrote: Are you sending out upcoming events for this week?
Detection of Question-Answer Pairs in Email Conversations (L. Shrestha and K. McKeown) • Aim: to distinguish questions pertaining to different issues in an email thread and to associate the answers with their questions. • Develop one approach for detecting questions in email messages, and a separate approach for detecting the corresponding answers.
Detection of Question-Answer Pairs in Email Conversations (L. Shrestha and K. McKeown) Automatic Question Detection • The question mark alone is not adequate: a question mark may be used to denote uncertainty; people may omit the question mark after a question; a question may be stated in declarative form (I’m wondering if …); and not every question is meant to be answered. • Use supervised rule induction (the Ripper system) for the detection of questions in interrogative form. • Features used: PoS tags of the first five and last five terms, the length of the utterance, and PoS bigrams. • Precision: 96%; Recall: 72%.
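As a hedged, minimal sketch of the feature representation just described (PoS tags of the first/last five tokens, utterance length, PoS bigrams), the example below uses NLTK for tagging and a decision tree standing in for Ripper, which is not commonly packaged today. The toy utterances and labels are illustrative, not from the paper's corpus.

```python
# Sketch only: PoS-based question-detection features with a stand-in learner.
import nltk
from sklearn.feature_extraction import DictVectorizer
from sklearn.tree import DecisionTreeClassifier

nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

def question_features(utterance):
    """PoS tags of the first/last five tokens, utterance length, PoS bigrams."""
    tags = [tag for _, tag in nltk.pos_tag(nltk.word_tokenize(utterance))]
    feats = {"length": len(tags)}
    for i, tag in enumerate(tags[:5]):
        feats[f"first_{i}_{tag}"] = 1
    for i, tag in enumerate(tags[-5:]):
        feats[f"last_{i}_{tag}"] = 1
    for a, b in zip(tags, tags[1:]):
        feats[f"bigram_{a}_{b}"] = 1
    return feats

# Hypothetical toy training data (1 = question, 0 = not a question).
utterances = ["Can someone write the press release", "I will send it tomorrow"]
labels = [1, 0]

vec = DictVectorizer()
X = vec.fit_transform([question_features(u) for u in utterances])
clf = DecisionTreeClassifier(random_state=0).fit(X, labels)
print(clf.predict(vec.transform([question_features("Are you coming today")])))
```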
Detection of Question-Answer Pairs in Email Conversations (L. Shrestha and K. McKeown) Automatic Answer Detection • One assumption: while a number of issues may be pursued in parallel, users tend to use separate paragraphs to address separate issues within the same email message. Other text segmentation tools would also be possible. • For each question segment in an email message, a list of candidate answer segments is built.
Detection of Question-Answer Pairs in Email Conversations (L. Shrestha and K. McKeown) Automatic Answer Detection • Notation: thread t; container message of the question segment mq; container message of the candidate answer segment ma; question segment q; candidate answer segment a. • Features used: (a) the number of non-stop words in segment q and in segment a; (b) the cosine similarity and Euclidean distance between segments q and a; (c) the number of intermediate messages between mq and ma in t; (d) the ratio of the number of messages in t sent earlier than mq to the number of all messages in t, and similarly for ma;
Detection of Question-Answer Pairs in Email Conversations (L. Shrestha and K. McKeown) Automatic Answer Detection • Features used (cont.): (e) whether a is the first segment in the list of candidate answer segments of q; (f) the number of candidate answer segments of q and the number of candidate answer segments of q after a; (g) the ratio of the number of candidate answer segments before a to the number of all candidate answer segments; (h) whether q is the segment most similar to a among all segments from ancestor messages of ma, based on cosine similarity. (A small sketch of computing the similarity-based features follows below.)
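As referenced above, here is a minimal sketch of features (a) and (b): non-stop-word counts and the cosine similarity / Euclidean distance between a question segment q and a candidate answer segment a. The bag-of-words space and stop list are illustrative choices, not necessarily the paper's.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity, euclidean_distances

def qa_similarity_features(q, a):
    vec = CountVectorizer(stop_words="english")
    X = vec.fit_transform([q, a])          # row 0: q vector, row 1: a vector
    return {
        "q_content_words": int(X[0].sum()),
        "a_content_words": int(X[1].sum()),
        "cosine": float(cosine_similarity(X[0], X[1])[0, 0]),
        "euclidean": float(euclidean_distances(X[0], X[1])[0, 0]),
    }

print(qa_similarity_features(
    "Can someone be responsible for the press releases?",
    "Phil is writing up something to send out as our press release."))
```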
Detection of Question-Answer Pairs in Email Conversations (L. Shrestha and K. McKeown) • Inducing question-answer pairs: use Ripper to automatically induce classifiers over question and candidate-answer pairs, using the features discussed earlier. • The baseline uses the cosine similarity feature only. Experimental results show that the precision with the full feature set is comparable to that of the baseline (~60%), while recall is significantly improved (from ~40% to ~60%).
Detection of Question-Answer Pairs in Email Conversations (L. Shrestha and K. McKeown) • The automatic detection of questions in declarative form and of rhetorical questions has not yet been addressed. • How to integrate the identified question-answer pairs into a full summary also remains an open research question.
Generating overview summaries of ongoing email thread discussions (S. Wan and K. McKeown) • Goal: extract a set of sentences consisting of one issue and the corresponding responses, one per participant. • Use the structure of the thread dialogue and word-vector techniques to determine which sentence in the thread should be extracted as the main issue. • Finding: the sentence containing the issue under discussion is more informative than the subject line.
Generating overview summaries of ongoing email thread discussions (S. Wan and K. McKeown) • Here’s the plaque info. • http://www.affordableawards.com/plaques/ordecon.htm • I like the plaque, and aside for exchanging Dana’s name for “Sally Slater” • and ACM for “Ladies Auxiliary”, the wording is nice. • 4. We just need to contact the plaque folks and ask what format they need • for the logo. Example summary from a conventional sentence extraction summarizer
Generating overview summaries of ongoing email thread discussions (S. Wan and K. McKeown) • Example summary from the paper’s system:
Issue: Let me know if you agree or disagree w/choice of plaque and (especially) wording.
Response 1: I like the plaque, and aside for exchanging Dana’s name for “Sally Slater” and ACM for “Ladies Auxiliary”, the wording is nice.
Response 2: I prefer Christy’s wording to the plaque original.
Generating overview summaries of ongoing email thread discussions (S. Wan and K. McKeown) • Assumptions: 1. The threads have been correctly constructed and classified as discussions supporting decision-making. 2. The issue being discussed is to be found in the first email. 3. The email thread doesn’t shift task, nor does it contain multiple issues.
Generating overview summaries of ongoing email thread discussions (S. Wan and K. McKeown) • General framework for issue detection (a runnable sketch follows below):
1. Separate the thread into issue_email and replies
2. Create a “comparison vector” V representing the replies
3. For each sentence s in issue_email:
3.1 Construct a vector representation S for sentence s
3.2 Compare V and S using cosine similarity
4. Rank sentences according to their cosine similarity scores
5. Extract the top-ranking sentence
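As referenced above, here is a minimal runnable sketch of this framework, using TF-IDF bag-of-words vectors and the Centroid method for step 2 (the comparison vector V is the mean of the reply vectors). The sentence lists, vectorizer, and weighting are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

def detect_issue(issue_email_sentences, reply_texts):
    vec = TfidfVectorizer()
    vec.fit(issue_email_sentences + reply_texts)
    S = vec.transform(issue_email_sentences).toarray()      # one row per issue-email sentence
    V = vec.transform(reply_texts).toarray().mean(axis=0)   # comparison vector (centroid of replies)
    denom = np.linalg.norm(S, axis=1) * np.linalg.norm(V)
    denom[denom == 0] = 1.0                                  # guard against empty vectors
    scores = S @ V / denom                                   # cosine similarity of each sentence with V
    best = int(np.argmax(scores))
    return issue_email_sentences[best], scores

issue_email = ["Here is the plaque info.",
               "Let me know if you agree or disagree with the wording."]
replies = ["I like the plaque, the wording is nice.",
           "I prefer Christy's wording to the original."]
print(detect_issue(issue_email, replies)[0])
```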
Generating overview summaries of ongoing email thread discussions (S. Wan and K. McKeown) • Four methods for building the comparison vector: 1. The Centroid method 2. The SVD (Singular Value Decomposition) Centroid method 3. The SVD Key Sentence method 4. Combinations of the Centroid and SVD methods: oracles, and two oracles based on heuristic rules
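The SVD-based variants work in a reduced latent space of the reply sentences. Below is a rough sketch of the idea: decompose the term-by-sentence matrix of the replies, then either average a low-rank reconstruction (an SVD Centroid-style vector) or pick the reply sentence that loads most strongly on the first singular vector (a stand-in for the SVD Key Sentence idea). The rank, weighting, and toy data are assumptions, not the paper's exact formulation.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

def svd_comparison_vectors(reply_sentences, k=2):
    A = TfidfVectorizer().fit_transform(reply_sentences).toarray().T  # terms x sentences
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(k, len(s))
    A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # rank-k reconstruction
    svd_centroid = A_k.mean(axis=1)               # (a) averaged reconstruction
    key_idx = int(np.argmax(np.abs(Vt[0])))       # (b) sentence strongest on the first dimension
    return svd_centroid, reply_sentences[key_idx]

replies = ["I like the plaque, the wording is nice.",
           "I prefer Christy's wording.",
           "Are you sending out upcoming events this week?"]
centroid_vector, key_sentence = svd_comparison_vectors(replies)
print(key_sentence)
```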
Generating overview summaries of ongoing email thread discussions (S. Wan and K. McKeown) • Extracting the responses to the issue: simply take the first sentence of each responding participant's reply, extracting only one response per participant. An alternative, applying the issue detection algorithm to the reply email in question, was also considered; however, most of the tagged responses occur at the start of each reply email, so the more complex approach was unnecessary and potentially introduced more errors.
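A minimal sketch of this response-extraction step; the thread representation (a chronologically ordered list of (sender, text) replies) and the naive sentence split are assumptions made for illustration.

```python
def extract_responses(replies):
    """One response per participant: the first sentence of that participant's first reply."""
    responses = {}
    for sender, text in replies:            # replies in chronological order
        if sender not in responses:         # keep only the first reply per participant
            first_sentence = text.replace("\n", " ").split(". ")[0].strip()
            responses[sender] = first_sentence
    return responses

thread_replies = [
    ("Theresa", "I like the plaque, the wording is nice. We just need the logo."),
    ("Kevin", "I prefer Christy's wording to the plaque original."),
    ("Theresa", "Also, are we sending out upcoming events?"),
]
print(extract_responses(thread_replies))
```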
Generating overview summaries of ongoing email thread discussions (S. Wan and K. McKeown)
Generating overview summaries of ongoing email thread discussions (S. Wan and K. McKeown) • From the experimental results, the authors drew the following conclusions: the Centroid method performs impressively, and the combination of the Centroid and SVD methods shows improved performance, suggesting that such techniques could be useful for email thread summarization.
Generating overview summaries of ongoing email thread discussions (S. Wan and K. McKeown) • The methods described in this paper would form part of a larger email thread summarizer able to identify task boundaries and then initiate the appropriate summarization strategy for that task. • Future work would focus on testing the assumptions; examining issues of scalability, such as domain independence and how issue detection might be integrated with a more complete solution to email thread summarization.
Other IE and summarization papers • IE using kernel methods • IE vs QA; IE vs summarization • Multi-document summarization • Named entity and relation extraction in biomedical texts (Workshop on NLP in Biomedicine and its Applications) • ……