Query Chain Focused Summarization
Tal Baumel, Rafi Cohen, Michael Elhadad
Jan 2014
Generic Summarization
• Generic Extractive Multi-doc Summarization:
  • Given a set of documents Di
  • Identify a set of sentences Sj s.t.
    • |Sj| < L
    • The "central information" in Di is captured by Sj
    • Sj does not contain redundant information
• Representative methods:
  • KLSum
  • LexRank
• Key concepts: centrality, redundancy
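Below is a minimal sketch of KLSum-style greedy selection, one of the representative methods named above. It is illustrative only, not the authors' implementation: it assumes whitespace-tokenized sentences, a word-count length budget, and add-one smoothing, and it greedily adds the sentence that keeps the summary's unigram distribution closest (in KL divergence) to the distribution of the whole document set.

```python
# Minimal KLSum-style sketch (illustrative; parameters are assumptions).
import math
from collections import Counter

def unigram_dist(tokens, vocab, smoothing=1.0):
    counts = Counter(tokens)
    total = len(tokens) + smoothing * len(vocab)
    return {w: (counts[w] + smoothing) / total for w in vocab}

def kl_divergence(p, q):
    # KL(P || Q); both distributions are smoothed, so q[w] > 0 for every w.
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)

def klsum(sentences, length_budget):
    """Greedily add the sentence that keeps the summary's unigram
    distribution closest to the document distribution."""
    doc_tokens = [t for s in sentences for t in s.lower().split()]
    vocab = set(doc_tokens)
    doc_dist = unigram_dist(doc_tokens, vocab)
    remaining = list(sentences)
    summary, summary_tokens = [], []
    while remaining and len(summary_tokens) < length_budget:
        def divergence_after_adding(sent):
            candidate = summary_tokens + sent.lower().split()
            return kl_divergence(doc_dist, unigram_dist(candidate, vocab))
        best = min(remaining, key=divergence_after_adding)
        summary.append(best)
        summary_tokens.extend(best.lower().split())
        remaining.remove(best)
    return summary
```

LexRank, by contrast, scores centrality via eigenvector centrality over a sentence-similarity graph; the same greedy length-budget loop would then apply to the ranked sentences.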
Update Summarization
• Given a set of documents split as A = {ai} / B = {bj}, defined as background / new sets
• Select a set of sentences Sk s.t.
  • |Sk| < L
  • Sk captures central information in B
  • Sk does not repeat information conveyed by A
• Key concepts: centrality, redundancy, novelty
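As a rough illustration of these constraints (not a method from the slides), the sketch below scores each sentence of B by term-frequency centrality within B and subtracts a penalty for overlap with the background set A; the novelty_weight parameter and the simple tokenization are assumptions.

```python
# Illustrative update-summarization sketch: centrality in B minus overlap with A.
from collections import Counter

def toks(text):
    return text.lower().split()

def update_summary(background_sents, new_sents, length_budget, novelty_weight=2.0):
    a_vocab = {t for s in background_sents for t in toks(s)}
    b_counts = Counter(t for s in new_sents for t in toks(s))

    def score(sent):
        t = toks(sent)
        centrality = sum(b_counts[w] for w in t) / max(len(t), 1)
        background_overlap = sum(w in a_vocab for w in t) / max(len(t), 1)
        return centrality - novelty_weight * background_overlap

    summary, used = [], 0
    for sent in sorted(new_sents, key=score, reverse=True):
        if used + len(toks(sent)) > length_budget:
            continue
        summary.append(sent)
        used += len(toks(sent))
    return summary
```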
Query-Focused Summarization
• Given a set of documents Di and a query Q
• Select a set of sentences Sj s.t.:
  • |Sj| < L
  • Sj captures information in Di relevant to Q
  • Sj does not contain redundant information
• Key concepts: relevance, redundancy
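A minimal, assumed sketch of query-focused selection: rank sentences by term overlap with Q, then greedily skip sentences that are mostly covered by what is already in the summary. The 0.7 redundancy threshold is an arbitrary assumption, not a value from the slides.

```python
# Illustrative query-focused selection with a simple redundancy filter.
def toks(text):
    return set(text.lower().split())

def query_focused_summary(sentences, query, length_budget, redundancy_threshold=0.7):
    q = toks(query)
    summary, covered, used = [], set(), 0
    for sent in sorted(sentences, key=lambda s: len(toks(s) & q), reverse=True):
        t = toks(sent)
        # Redundancy: skip sentences mostly covered by the summary so far.
        if covered and len(t & covered) / max(len(t), 1) > redundancy_threshold:
            continue
        if used + len(sent.split()) > length_budget:
            continue
        summary.append(sent)
        covered |= t
        used += len(sent.split())
    return summary
```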
Query-Chain Focused Summarization
• We define a new task to clarify the distinctions among key concepts:
  • Relevance
  • Novelty
  • Contrast
  • Similarity
  • Redundancy
• The task is also useful for Exploratory Search
QCFS Task
• Given a set of topic-related documents Di and a chain of queries qj
• Output a chain of summaries {Sjk} s.t.:
  • |Sjk| < L
  • Sjk is relevant to qj
  • Sjk does not contain information already conveyed in Slk for l < j
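The loop below sketches these constraints only; it is not the system proposed in the paper. For each query qj it builds a summary from sentences relevant to qj while filtering sentences whose content was already conveyed by summaries for earlier queries. The overlap threshold and token-overlap relevance measure are assumptions.

```python
# Illustrative QCFS loop: summarize per query, filter already-conveyed content.
def toks(text):
    return set(text.lower().split())

def qcfs(sentences, query_chain, length_budget, overlap_threshold=0.5):
    summaries, covered = [], set()
    for query in query_chain:
        q = toks(query)
        step, used = [], 0
        for sent in sorted(sentences, key=lambda s: len(toks(s) & q), reverse=True):
            t = toks(sent)
            if covered and len(t & covered) / max(len(t), 1) > overlap_threshold:
                continue  # already conveyed by a summary for an earlier query
            if used + len(sent.split()) > length_budget:
                continue
            step.append(sent)
            used += len(sent.split())
        for sent in step:
            covered |= toks(sent)
        summaries.append(step)
    return summaries
```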
Query Chains
• Query Chains are observed in query logs:
  • PubMed search log mining
  • Extract query chains (length 3) from the same session / with related terms (manually)
• Query Chain evolution may correspond to:
  • Zoom in (asthma → atopic dermatitis)
  • Query reformulation (respiratory problem → pneumonia)
  • Focus change (asthma → cancer)
Query Chains vs. Novelty Detection
• TREC Novelty Detection Task (2005)
  • Task 1: Given a set of documents for the topic, identify all relevant and novel sentences.
  • Task 2: Given the relevant sentences in all documents, identify all novel sentences.
  • Task 3: Given the relevant and novel sentences in the first 5 docs only, find the relevant and novel sentences in the remaining docs.
  • Task 4: Given the relevant sentences from all documents and the novel sentences from the first 5 docs, find the novel sentences in the remaining docs.
Novelty Detection Task
• Create 50 topics:
  • Compose topic (textual description)
  • Select 25 relevant docs from a news collection
  • Sort docs chronologically
  • Mark relevant sentences
  • Among relevant sentences, mark novel ones (not covered by previous relevant sentences)
• 28 "events" topics / 22 "opinion" topics
TREC Novelty – Dataset Analysis
• Select parts of documents (not full docs)
• Statistics (events topics / opinion topics):
  • Relevant rate: 25% / 15%
  • Consecutive relevant sentences: 85% / 65%
  • Relevance agreement: 68% / 50%
  • Novelty rate: 38% / 42%
  • Novelty agreement: 45% / 29%
TREC Novelty Methods
• Relevance = similarity to topic
• Novelty = dissimilarity to past sentences
• Methods:
  • tf.idf and Okapi (BM25) with a threshold for retrieval
  • Topic expansion
  • Sentence expansion
  • Named entities as features
  • Coreference resolution
  • Named-entity normalization (entity linking)
• Results:
  • High recall / low precision
  • Almost no distinction between relevant and novel
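The sketch below illustrates a tf.idf-with-threshold baseline in the spirit of the methods listed above; the thresholds, tokenization, and exact idf formula are assumptions, not values from the track. Relevance is cosine similarity to the topic above a threshold, and a relevant sentence counts as novel if it is sufficiently dissimilar from every earlier relevant sentence.

```python
# Illustrative tf.idf baseline for relevance + novelty detection.
import math
from collections import Counter

def tfidf_vectors(texts):
    """Very small tf.idf: raw term counts weighted by log(N / df)."""
    bags = [Counter(t.lower().split()) for t in texts]
    n = len(bags)
    df = Counter(w for bag in bags for w in bag)
    return [{w: c * math.log(n / df[w]) for w, c in bag.items()} for bag in bags]

def cosine(u, v):
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def relevant_and_novel(topic, sentences, rel_threshold=0.05, nov_threshold=0.8):
    vecs = tfidf_vectors([topic] + sentences)
    topic_vec, sent_vecs = vecs[0], vecs[1:]
    relevant, novel, seen = [], [], []
    for sent, vec in zip(sentences, sent_vecs):
        if cosine(vec, topic_vec) < rel_threshold:
            continue                       # relevance = similarity to topic
        relevant.append(sent)
        # novelty = dissimilarity to every earlier relevant sentence
        if all(cosine(vec, prev) < nov_threshold for prev in seen):
            novel.append(sent)
        seen.append(vec)
    return relevant, novel
```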
QCFS and Contrast
• QCFS is different from Query-Focused Summarization:
  • When generating S2, we must take S1 into account.
• QCFS is different from Update Summarization:
  • The split A/B is not observed.
• QCFS is different from Novelty Detection:
  • Chronology is not relevant.
• Key concepts:
  • Query relevance
  • Query distinctiveness (how qi+1 contrasts with qi)
Contrastive IR
• CWS: A Comparative Web Search System (Sun et al., WWW 2006)
• Given 2 queries q1 and q2:
  • Rank a set of "contrastive pairs" (p1, p2), where p1 and p2 are snippets of relevant docs.
• Method:
  • Retrieve relevant snippets SR1 = {p1i} and SR2 = {p2j}
  • Score(p1, p2) = a·R(p1, q1) + b·R(p2, q2) + c·T(p1, p2, q1, q2)
  • T(p1, p2, q1, q2) = x·Sim(url1, url2) + (1 - x)·Sim(p1\q1, p2\q2)
  • Greedy ranking of pairs:
    • Rank all pairs (p1, p2) by score; take the top pair.
    • Remove p1top and p2top from all remaining pairs; iterate.
  • Cluster pairs into comparative clusters
  • Extract terms from comparative clusters
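A minimal sketch of the pair scoring and greedy ranking just described. Sim is reduced here to plain Jaccard overlap over whitespace tokens (including for URLs), the weights a, b, c, x default to assumed values, and snippets are represented as dicts with hypothetical "text" and "url" fields.

```python
# Illustrative CWS-style pair scoring and greedy pair ranking.
from itertools import product

def toks(text):
    return set(text.lower().split())

def jaccard(a, b):
    return len(a & b) / len(a | b) if a or b else 0.0

def pair_score(p1, p2, q1, q2, a=1.0, b=1.0, c=1.0, x=0.5):
    """Score(p1, p2) = a*R(p1,q1) + b*R(p2,q2) + c*T(p1,p2,q1,q2)."""
    r1 = jaccard(toks(p1["text"]), toks(q1))
    r2 = jaccard(toks(p2["text"]), toks(q2))
    # T mixes URL similarity with similarity of the snippets minus their query terms.
    t = (x * jaccard(toks(p1["url"]), toks(p2["url"]))
         + (1 - x) * jaccard(toks(p1["text"]) - toks(q1),
                             toks(p2["text"]) - toks(q2)))
    return a * r1 + b * r2 + c * t

def greedy_pair_ranking(snippets1, snippets2, q1, q2, top_k=5):
    pairs = list(product(snippets1, snippets2))
    ranked = []
    while pairs and len(ranked) < top_k:
        best = max(pairs, key=lambda pr: pair_score(pr[0], pr[1], q1, q2))
        ranked.append(best)
        p1_top, p2_top = best
        # Remove the chosen snippets from all remaining pairs, then iterate.
        pairs = [(s1, s2) for s1, s2 in pairs if s1 is not p1_top and s2 is not p2_top]
    return ranked
```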
Document Clustering
• A Hierarchical Monothetic Document Clustering Algorithm for Summarization and Browsing Search Results (Kummamuru et al., WWW 2004)
• Desirable properties of a clustering:
  • Coverage
  • Compactness
  • Sibling distinctiveness
  • Reach time
• Incremental algorithm:
  • Decide on the width n of the tree (# children / node)
  • Nodes are represented by "concepts" (terms)
  • Rank concepts by score and add the top ones under the current node
  • Score(Sak, cj) = a·ScoreC(Sak-1, cj) + b·ScoreD(Sak-1, cj)
    • ScoreC = document coverage
    • ScoreD = sibling distinctiveness
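The sketch below renders the concept-scoring step in a simplified form: coverage counts the documents a candidate concept would newly cover, and sibling distinctiveness is approximated as (negated) maximum document overlap with already-chosen siblings. The docs_by_concept mapping and the alpha/beta weights are assumptions, not the paper's exact formulation.

```python
# Illustrative concept scoring for monothetic clustering: coverage + distinctiveness.
def score_concept(candidate, chosen_siblings, docs_by_concept, alpha=1.0, beta=1.0):
    """Weighted sum of document coverage and sibling distinctiveness,
    with distinctiveness approximated as negated overlap with chosen siblings."""
    covered = set()
    for sib in chosen_siblings:
        covered |= docs_by_concept[sib]
    candidate_docs = docs_by_concept[candidate]
    coverage = len(candidate_docs - covered)                        # ~ ScoreC
    sibling_overlap = max((len(candidate_docs & docs_by_concept[s])
                           for s in chosen_siblings), default=0)    # ~ -ScoreD
    return alpha * coverage - beta * sibling_overlap

def pick_children(candidates, docs_by_concept, width):
    """Incrementally add the best-scoring concepts as children of the current node."""
    chosen, pool = [], list(candidates)
    while pool and len(chosen) < width:
        best = max(pool, key=lambda c: score_concept(c, chosen, docs_by_concept))
        chosen.append(best)
        pool.remove(best)
    return chosen
```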