Novelty Detection in Repeated MEAD Summarization Richard Murphy EECS 597 06 December 2002
The Problem with MEAD
• Works well for one-time summaries
    • Summaries produced are readable and fairly informative
• News stories are ongoing, not one-time
    • New, relevant articles may appear after the cluster is summarized
    • The expanded cluster will include new information
    • A second summary of the cluster will include a lot of already-known information
    • New information is often demoted, because it lies further from the cluster centroid
• Repeated summaries lose value
    • The reader can be assumed to remember past summaries
    • The most informative summary will focus on new information, with only brief repetition of key points
    • More repetition = less new information = less useful summary
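Why does new information sink? A toy sketch of centroid-based scoring makes the effect concrete. It is simplified to average word counts per document (MEAD's actual centroid uses TF*IDF weights, and its final score also combines other features such as sentence position); the function names and example sentences are hypothetical.

```python
from collections import Counter


def centroid(documents: list[list[str]]) -> dict[str, float]:
    """Toy centroid: average count of each word per document (no IDF weighting)."""
    totals: Counter = Counter()
    for doc in documents:
        for sentence in doc:
            totals.update(sentence.lower().split())
    return {word: count / len(documents) for word, count in totals.items()}


def centroid_score(sentence: str, cent: dict[str, float]) -> float:
    """Sum of the centroid weights of the distinct words in a sentence."""
    return sum(cent.get(word, 0.0) for word in set(sentence.lower().split()))


# Hypothetical cluster: three early articles on one event, one later article.
cluster = [
    ["small plane hit pirelli building milan"],
    ["small plane crashed into pirelli building"],
    ["plane hit building in milan"],
    ["officials later released new details"],
]
cent = centroid(cluster)
old_info = centroid_score("small plane hit pirelli building", cent)
new_info = centroid_score("officials later released new details", cent)
print(old_info, new_info)  # 3.0 1.25
```

Because the early articles dominate the centroid, the sentence carrying the new detail scores well below the one repeating the original story.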
Example: repeated summaries of the Milan plane-crash cluster

Summary 1:
[1] CNN.com - Plane hits skyscraper in Milan - April 18, 2002
[2] CNNenEspanol.com A small plane has hit a skyscraper in central Milan, setting the top floors of the 30-story building on fire, an Italian journalist told CNN.
[3] The crash by the Piper tourist plane into the 26th floor occurred at 5:50 p.m. (1450 GMT) on Thursday, said journalist Desideria Cavina.
[4] Several storeys of the building were engulfed in fire, she said.
[5] Italian TV says the crash put a hole in the 25th floor of the Pirelli building, and that smoke is pouring from the opening.
[6] U.N. envoy horror at Jenin camp U.S. bombing kills Canadians Chinese missiles concern U.S. 2002 Cable News Network LP, LLLP.
[7] The building houses government offices and is next to the city's central train station.

Summary 2:
[1] CNN.com - Plane hits skyscraper in Milan - April 18, 2002
[2] The crash by the Piper tourist plane into the 26th floor occurred at 5:50 p.m. (1450 GMT) on Thursday, said journalist Desideria Cavina.
[3] The building houses government offices and is next to the city's central train station.
[4] Italian TV says the crash put a hole in the 25th floor of the Pirelli building, and that smoke is pouring from the opening.
[5] U.N. envoy horror at Jenin camp U.S. bombing kills Canadians Chinese missiles concern U.S. 2002 Cable News Network LP, LLLP.
[6] The Pirelli Building in Milan, Italy, was hit by a small plane.
[7] (ABCNEWS.com) - A small plane crashed into a skyscraper in downtown Milan today, setting several floors of the 30-story building on fire.
[8] The plane crashed into the 25th floor of the Pirelli building in downtown Milan.
[9] A small airplane crashed into a government building in heart of Milan, setting the top floors on fire, Italian police reported.
[10] WITNESSES REPORTED hearing a loud explosion from the 30-story office building, which houses the administrative offices of the local Lombardy region and sits next to the city's central train station.
[11] Italian state television said the crash put a hole in the 25th floor of the Pirelli building.
[12] CNNenEspanol.com A small plane has hit a skyscraper in central Milan, setting the top floors of the 30-story building on fire, an Italian journalist told CNN.
Solution: MEAD with a memory
• Save summaries with cluster information
• When summarizing a cluster in the future, check for archived summaries
• During reranking, compare candidate sentences to sentences in old summaries
    • The existing default-reranker.pl module compares sentences in the summary to each other using a cosine similarity metric, and eliminates those that are too similar to other sentences in the summary
    • After this step, use cosine similarity to demote sentences in the new summary that are too similar to sentences in the old summary
    • Don't completely eliminate sentences similar to known information: if the user requests a large enough summary, "background" (already seen) information should still appear, lower in the new summary
• User specific
    • In a MEAD-based system like NewsInEssence, users could log in to get updated summaries of ongoing stories
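The two reranking steps above can be sketched as follows. This is a minimal illustration, not the actual default-reranker.pl code: sentences are compared as whitespace-tokenized bag-of-words vectors, `redundancy_cutoff` is a hypothetical value (the slides do not give default-reranker.pl's threshold), `novelty_cutoff=0.7` and `demotion=0.1` mirror the settings used in the evaluation, and a sentence is assumed to be demoted once no matter how many old sentences it matches.

```python
import math
from collections import Counter


def cosine(a: str, b: str) -> float:
    """Cosine similarity of the bag-of-words vectors of two sentences."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def rerank_with_memory(scored, old_summary, redundancy_cutoff=0.7,
                       novelty_cutoff=0.7, demotion=0.1):
    """Rerank (sentence, score) pairs; returns sentences, best first."""
    # Step 1 (redundancy, like a default reranker): walk sentences best-first
    # and drop any that is too similar to a sentence already kept.
    kept = []
    for sentence, score in sorted(scored, key=lambda p: p[1], reverse=True):
        if all(cosine(sentence, k) < redundancy_cutoff for k, _ in kept):
            kept.append((sentence, score))
    # Step 2 (memory): demote, but do not delete, sentences that are too
    # similar to any sentence the reader has already seen.
    rescored = []
    for sentence, score in kept:
        if any(cosine(sentence, old) >= novelty_cutoff for old in old_summary):
            score -= demotion
        rescored.append((sentence, score))
    rescored.sort(key=lambda p: p[1], reverse=True)
    return [sentence for sentence, _ in rescored]


old = ["a small plane hit a skyscraper in milan"]
candidates = [
    ("a small plane hit a skyscraper in milan", 0.90),
    ("a small plane hit a skyscraper in central milan", 0.88),
    ("the building houses government offices", 0.85),
    ("officials later released new details", 0.82),
]
print(rerank_with_memory(candidates, old))
```

Here the near-duplicate second candidate is removed in step 1, while the already-seen first candidate survives but drops from the top of the ranking to the bottom. Demoting rather than deleting is what lets background information reappear when the user asks for a longer summary.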
Evaluating Multiple Summaries
• Evaluation of a single (first) summary
    • Create a manual extract from the current cluster
    • Run meadeval.pl to calculate precision, recall, and kappa of the automated summary
• Evaluation of subsequent summaries
    • Create a manual extract from the current cluster, taking past automated summaries into account (not past manual summaries, since the reader will have seen the automated output)
    • Run meadeval.pl
• Always use the cluster that was available to MEAD at the time of automated summarization
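The slides do not spell out meadeval.pl's formulas. As a hedged sketch, assuming the standard definitions of precision and recall and Cohen's kappa computed over the in/out decision for every sentence in the cluster, the scores for one automated extract against one manual extract could be computed like this (`extract_agreement` is a hypothetical name):

```python
def extract_agreement(system: set, judge: set, n_sentences: int):
    """Precision, recall and Cohen's kappa for two sentence extracts.

    system, judge: non-empty sets of selected sentence ids;
    n_sentences: number of sentences in the cluster being summarized.
    """
    both = len(system & judge)
    precision = both / len(system)
    recall = both / len(judge)
    # Observed agreement: sentences both extracts include, plus those both leave out.
    neither = n_sentences - len(system | judge)
    p_observed = (both + neither) / n_sentences
    # Chance agreement, from each extract's selection rate.
    p_sys, p_judge = len(system) / n_sentences, len(judge) / n_sentences
    p_chance = p_sys * p_judge + (1 - p_sys) * (1 - p_judge)
    kappa = (p_observed - p_chance) / (1 - p_chance)
    return precision, recall, kappa


# Hypothetical example: 100-sentence cluster, 7-sentence extracts sharing 4 sentences.
p, r, k = extract_agreement(set(range(7)), {0, 1, 2, 3, 10, 11, 12}, 100)
print(round(p, 4), round(r, 4), round(k, 4))  # 0.5714 0.5714 0.5392
```

When both extracts have the same length, precision and recall are equal, so kappa is the score that also rewards agreement on the sentences both extracts leave out.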
Comparing MEAD to MEAD with memory

System              Summary    Precision   Recall    Kappa
Default MEAD        Initial    0.5714      0.5714     0.5392
Default MEAD        Second     0.2500      0.2500     0.1477
Default MEAD        Third      0.0833      0.0833    -0.0417
MEAD with memory    Initial    0.5714      0.5714     0.5392
MEAD with memory    Second     0.3333      0.3333     0.2424
MEAD with memory    Third      0.8333      0.8333     0.8106

Settings: a sentence is demoted by 0.1 points when its cosine similarity to a sentence in an old summary is >= 0.7
Remaining / Future Work
• More testing
    • More test clusters
    • Different values of the demotion increment and the demotion similarity cutoff
• Command-line options for demotion settings
• Varying levels of demotion based on position in the old summary
• Multiple users
    • Currently assumes a cluster belongs to an individual user
    • Add command-line identification of the user so that multiple users can summarize a cluster without being affected by each other's archives
• NewsInEssence interface
    • Remember website visitors, keep unique archives for each
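The multi-user items above amount to keying the summary archive by user as well as by cluster. A minimal sketch of that idea follows; the `SummaryArchive` class and its methods are hypothetical, and it keeps everything in memory, whereas a real MEAD add-on would persist archives to disk. The list returned by `seen_sentences` is what would be handed to the reranker as the "old summary".

```python
from collections import defaultdict


class SummaryArchive:
    """Hypothetical in-memory archive of past summaries, kept per (user, cluster)."""

    def __init__(self):
        self._store = defaultdict(list)  # (user, cluster) -> list of summaries

    def save(self, user: str, cluster: str, summary: list) -> None:
        """Record a summary (list of sentences) shown to this user for this cluster."""
        self._store[(user, cluster)].append(list(summary))

    def seen_sentences(self, user: str, cluster: str) -> list:
        """All sentences this user has already seen for this cluster, oldest first."""
        past = self._store.get((user, cluster), [])
        return [sentence for summary in past for sentence in summary]


archive = SummaryArchive()
archive.save("alice", "milan-crash", ["s1", "s2"])
archive.save("bob", "milan-crash", ["s3"])
archive.save("alice", "milan-crash", ["s4"])
print(archive.seen_sentences("alice", "milan-crash"))  # ['s1', 's2', 's4']
print(archive.seen_sentences("bob", "milan-crash"))    # ['s3']
```

Because each user's history is looked up separately, one visitor's summaries never cause sentences to be demoted for another.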