
Exploiting Topic Pragmatics for New Event Detection in TDT-2004

Presentation Transcript


  1. Exploiting Topic Pragmatics for New Event Detection in TDT-2004 TDT-2004 Evaluation Workshop December 2-3, 2004 Ronald K. Braun 1107 NE 45th St., Suite 310, Seattle, WA 98105 206-545-2941 FAX: 206-545-7227 rbraun@stottlerhenke.com http://www.stottlerhenke.com

  2. Who We Are • Stottler Henke is a small business specializing in AI consulting and R&D. • Our Seattle office focuses on information retrieval and text mining. • This work constitutes part of a DARPA-sponsored Small Business Innovation Research (SBIR) contract (#DAAH01-03-C-R108).

  3. Project Overview Leverage topic pragmatics and committee methods to increase accuracy in the new event detection task. • Pragmatics: non-semantic structure arising from how a topic is reported through time. • Committee methods: combining evidence from multiple perspectives (e.g., ensemble learning).

  4. An Informal Experiment • Reviewed, case by case, the errors made on the TDT3 corpus (topic-weighted CFSD = 0.4912, pMiss = 0.4000, pFA = 0.0186). • Examined 30 misses and 20 false alarms and asked what percentage of these errors is computationally tractable. • 28% of the misses and 35% of the false alarms in our sample had computationally visible features. • With copious caveats, we estimate that a CFSD limit of 0.35 exists for the TDT3 corpus under current NED evaluation conditions. • The limit might be higher due to one-topic bias.
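
For context, the topic-weighted CFSD above follows the standard TDT normalized first-story-detection cost; assuming the customary evaluation parameters (C_Miss = 1, C_FA = 0.1, P_target = 0.02), the three figures on this slide are mutually consistent:

```latex
% Standard TDT detection cost and its normalized form; the parameter values
% are the customary TDT evaluation settings, assumed here for illustration.
\begin{align*}
C_{Det} &= C_{Miss}\,P_{Miss}\,P_{target} + C_{FA}\,P_{FA}\,(1 - P_{target}) \\
C_{Det}^{norm} &= \frac{C_{Det}}{\min\!\left(C_{Miss}\,P_{target},\; C_{FA}\,(1 - P_{target})\right)}
 = \frac{1 \cdot 0.4000 \cdot 0.02 + 0.1 \cdot 0.0186 \cdot 0.98}{0.02} \approx 0.491
\end{align*}
```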

  5. Error Classes • Annotation effects – limited annotation time, possible keyword biases. • Lack of a priori topic definitions – topic structure not computationally accessible. • Lack of semantic knowledge – causality, abstraction relationships not modeled. • Multiple topics within a story – at event level, single topic per story may be exceptional.

  6. Error Classes (continued) • High overlap of entities due to subject marginality or class membership – “Podunk country syndrome”, topics in same topic category. • Topics joined in later stages of activity – earliest event activities are ossified into shorthand tags. • Sparseness of topical allusions – “a season of crashing banks, plunging rubles, bouncing paychecks, failing crops and rotating governments” == Russian economic crisis. • Outlier / peripheral events – human interest stuff.

  7. TDT5 Corpus Effects • 280,000 stories → an order of magnitude larger than TDT3 or TDT4. • Reduced the evaluation, in part, to an exercise in scalability (only 9 seconds per story for all processing). • Lots of optimization. • Threw out several techniques that relied on POS tagging because the tagger was not sufficiently efficient.

  8. TDT5 Corpus Effects (continued) • Performed worse on TDT5 than on TDT4 for the topic-weighted CFSD metric, suggesting the TDT5 topic set has some different attribute.

  9. TDT5 Corpus Effects (continued) • An increase in p(miss) rate was expected. • Less annotation time per topic implies an increased likelihood of missed annotations. • Possible conflation of stories due to ubiquitous Iraq verbiage.

  10. NED Classifiers • Made use of three classifiers in our official submission. • Vector Cosine (Baseline) • Sentence Linkage • Location Association

  11. Vector Cosine (Baseline) • Traditional full-text similarity. • Stemmed, stopped bag-of-words feature vector. • TF/IDF weighting, vector cosine distance. • Non-incremental raw DF statistics, generated from all manual stories of TDT3.
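
A minimal sketch of this kind of baseline, assuming a toy tokenizer and a precomputed document-frequency table (the actual stemmer, stopword list, and TDT3-derived DF statistics are not reproduced here):

```python
# Minimal sketch of a TF/IDF vector-cosine baseline for first-story detection.
# This is an illustrative reconstruction, not the authors' implementation: the
# stopword list, crude stemmer, and DF table are stand-ins for the real ones.
import math
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "on", "for"}  # stand-in list

def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stopwords, crude suffix stemming."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w.rstrip("s") for w in words if w not in STOPWORDS]

def tfidf_vector(text, df, n_docs):
    """Bag-of-words vector weighted by TF * IDF; the DF table is fixed in
    advance (non-incremental), mirroring the slide's use of TDT3 statistics."""
    tf = Counter(tokenize(text))
    return {w: c * math.log(n_docs / (1 + df.get(w, 0))) for w, c in tf.items()}

def cosine(u, v):
    dot = sum(w_u * v[t] for t, w_u in u.items() if t in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def is_first_story(story, history_vectors, df, n_docs, threshold=0.2):
    """Flag a story as a first story when its closest prior story (by cosine)
    falls below the novelty threshold (0.2 here is only a placeholder)."""
    vec = tfidf_vector(story, df, n_docs)
    best = max((cosine(vec, h) for h in history_vectors), default=0.0)
    return best < threshold
```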

  12. Sentence Linkage • Detect linking sentences in a story that refer to events also described or referenced by previous or future stories. • For TDT-2003, we used a temporal reference heuristic to identify event candidates. • Sentence Linkage generalizes this technique by treating every sentence (>= 15 unique features, >= one capitalized feature) as a potential event reference candidate. • Candidates in a new story are compared against all previous stories and all future stories.

  13. Sentence Linkage (continued) • If all capitalized features in a candidate sentence occur in the other story and at least a threshold fraction of its unique features also overlap, the stories are linked. • Targets these error classes: multiple topics within a story, shared-event enforcement in high-entity-overlap stories, linking across topic activities, and outlier / peripheral stories. • Problems: contextual events, ambient events.
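
A minimal sketch of the linkage test described on slides 12 and 13, with simplified tokenization and an assumed overlap threshold:

```python
# Illustrative sketch of the sentence-linkage test on slides 12-13. The
# tokenization is simplified and the overlap threshold is a placeholder;
# the real system's feature extraction (stemming, stopping) is omitted.
import re

def candidate_sentences(story_text, min_unique=15):
    """Yield sentences with >= min_unique unique features and at least one
    capitalized feature, treated as potential event-reference candidates."""
    for sent in re.split(r"(?<=[.!?])\s+", story_text):
        tokens = re.findall(r"\w+", sent)
        unique = {t.lower() for t in tokens}
        caps = {t.lower() for t in tokens if t[0].isupper()}
        if len(unique) >= min_unique and caps:
            yield unique, caps

def stories_linked(candidate_story, other_story, overlap_threshold=0.5):
    """Link the two stories if, for some candidate sentence, every capitalized
    feature appears in the other story and the fraction of unique features
    that overlap meets the threshold (0.5 is an assumed value)."""
    other_vocab = {t.lower() for t in re.findall(r"\w+", other_story)}
    for unique, caps in candidate_sentences(candidate_story):
        if caps <= other_vocab and len(unique & other_vocab) / len(unique) >= overlap_threshold:
            return True
    return False
```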

  14. Location Association • Looks for pairs of strongly associated location entities and non-location words in a story. • Co-occurrence frequencies are maintained for all BBN locations and non-location words in a moving window (the deferment window plus twice that span into the past). • Thresholds (from the slide's figure): A + B > 5, A + C > 5, assoc > 0.7.

  15. Location Association (continued) • For each interesting pair in a story, the pair is added to the feature vector and the constituent location words and the non-location word are removed. • The feature weight is the non-location word's TF/IDF weight plus the maximum TF/IDF weight of the words in the location. • Uses the Baseline TF/IDF methodology otherwise. • Addresses the high-entity-overlap error class.
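
A rough sketch of the co-occurrence bookkeeping behind slides 14 and 15; the association measure and the interpretation of the A/B/C counts are assumptions, and only the threshold values come from the slide:

```python
# Sketch of the location-association statistics on slides 14-15. The association
# measure used here (co-occurrence count over the more frequent term's count) and
# the exact meaning of the A/B/C cells behind the slide's thresholds are
# assumptions; only the threshold values (> 5, > 0.7) come from the slide.
from collections import Counter

class LocationAssociator:
    def __init__(self, min_count=5, min_assoc=0.7):
        self.pair_counts = Counter()   # (location, word) co-occurrence counts
        self.loc_counts = Counter()
        self.word_counts = Counter()
        self.min_count = min_count
        self.min_assoc = min_assoc

    def update(self, locations, words):
        """Add one story from the moving window to the co-occurrence statistics."""
        for loc in set(locations):
            self.loc_counts[loc] += 1
            for w in set(words):
                self.pair_counts[(loc, w)] += 1
        for w in set(words):
            self.word_counts[w] += 1

    def interesting_pairs(self, locations, words):
        """Return (location, word) pairs whose counts and association clear the
        thresholds; each such pair becomes a combined feature whose weight would
        be the word's TF/IDF plus the location's maximum TF/IDF (slide 15)."""
        pairs = []
        for loc in set(locations):
            for w in set(words):
                n = self.pair_counts[(loc, w)]
                if n > self.min_count:
                    assoc = n / max(self.loc_counts[loc], self.word_counts[w])
                    if assoc > self.min_assoc:
                        pairs.append((loc, w))
        return pairs
```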

  16. Evidence Combination • Authority voting – a single classifier is primary; other classifiers may override with a non-novel judgment based on their expertise. • Non-primary members of the committee are trained to low miss error rates. • The reported confidence is the claimant's normalized confidence for non-first-story (non-FS) decisions and the least normalized confidence of all classifiers for first-story (FS) decisions. • Evaluation run of Baseline + Sentence Linkage.
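
A minimal sketch of the authority-voting rule, assuming a simple normalized-confidence interface for the classifiers:

```python
# Sketch of authority voting as described on slide 16. The Decision structure
# and the [0, 1] normalization are assumptions about the interface, not the
# authors' actual code.
from typing import List, NamedTuple

class Decision(NamedTuple):
    is_novel: bool      # True = first story (FS), False = not novel
    confidence: float   # assumed normalized to [0, 1]

def authority_vote(primary: Decision, secondaries: List[Decision]) -> Decision:
    # Any secondary member (trained to a low miss rate) may override the
    # primary classifier with a "not novel" judgment.
    override = next((d for d in secondaries if not d.is_novel), None)
    if override is not None:
        # Non-FS decision: report the claimant's normalized confidence.
        return Decision(False, override.confidence)
    if primary.is_novel:
        # FS decision: report the least normalized confidence of all classifiers.
        return Decision(True, min(d.confidence for d in [primary] + secondaries))
    # Primary itself judges the story not novel; primary is the claimant.
    return Decision(False, primary.confidence)
```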

  17. Evidence Combination (continued) • Majority voting – the members of the committee are each polled for a NED judgment, and the majority decision is the system decision. • Trained all classifiers to minimize topic-weighted CFSD over TDT3 and TDT4. • The confidence is the average normalized distance between each majority classifier's confidence value and its decision threshold. • Ties: the side (novel versus non-novel voters) with the maximal average normalized difference decides the system decision. • Used for our official submission SHAI1.
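
A minimal sketch of the majority-voting rule and its tie-break, again assuming a common normalized confidence scale:

```python
# Sketch of majority voting as described on slide 17. Confidence values and
# per-classifier decision thresholds are assumed to share a common normalized
# scale; the tie-break follows the slide's wording.
from typing import List, Tuple

def majority_vote(votes: List[Tuple[bool, float, float]]) -> Tuple[bool, float]:
    """votes: one (is_novel, normalized_confidence, normalized_threshold) per
    classifier. Returns (system_decision, system_confidence)."""
    novel = [(c, t) for is_n, c, t in votes if is_n]
    not_novel = [(c, t) for is_n, c, t in votes if not is_n]

    def avg_margin(group):
        # Average normalized distance between confidence and decision threshold.
        return sum(abs(c - t) for c, t in group) / len(group) if group else 0.0

    if len(novel) != len(not_novel):
        majority_is_novel = len(novel) > len(not_novel)
        winners = novel if majority_is_novel else not_novel
        return majority_is_novel, avg_margin(winners)
    # Tie: the side with the larger average normalized margin decides.
    tie_novel = avg_margin(novel) >= avg_margin(not_novel)
    return tie_novel, max(avg_margin(novel), avg_margin(not_novel))
```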

  18. Evaluation Results • 5 runs: three singletons to gauge individual classifier performance and two committees.

  19. Evaluation Results (continued) • The authority committee was not useful. • Explained by a poor threshold on the Baseline, which made the Baseline promiscuous in issuing non-FS judgments. • The majority committee did surprisingly well given the non-optimized thresholds of its classifiers. • Topic-weighted performance was worse than last year, but story-weighted performance improved. • The committee outperformed all of its constituent classifiers again this year. • This suggests less sensitivity to initial thresholding than was expected.
