Hierarchical Summaries

Hierarchical Summaries for Search. By: Dawn J. Lawrie, University of Massachusetts, Amherst.

Presentation Transcript


  1. Hierarchical Summaries for Search. By: Dawn J. Lawrie, University of Massachusetts, Amherst

  2. The Problem

  3. Possible Solution

  4. Possible Solution

  5. Solution: Automatic Hierarchies

  6. Strengths of Automatic Hierarchies
      • Word-based summary
      • Focus on topics of the documents
      • Allows users to navigate through the results
      • Easy to understand
      • Bonus: Useful for summarizing documents

  7. Example
      • Hand-generated hierarchy of 50 documents
      • Query: “Endangered Species (Mammals)”
      • [Figure: hierarchy tree. Node labels with counts: Endangered Animals (2910), Endangered plants (70), mammals (1710), marine (128), fish (70), whales (74), marine mammals (188), birds (30), sea lions (22), permits (102), insects (30), jaguars (20), amphibians (10), Critical Habitat (160), deer (11), Endangered Species Act (10), Hawaii (30), Melicope Species (10), manatees (11), California (20), Waianae Plant Cluster Recovery Plan (10), Threatened (10), legislation (64), rats (10), Utah (10), habitat protection (11), Ecosystem Management (20), Waianae Mountains (10), Virginia (10)]

  8. Proposed Framework
      • [Diagram labels: Document Set, Language Model, Term Selection Algorithm, Hierarchy]
      • “Term” = word or phrase

  9. Challenges
      • Selecting terms for the hierarchy
      • Displaying the hierarchy
      • Showing that it works

  10. Outline
      • Introduction
      • Description of framework for creating hierarchies
      • Examples
      • Methods of evaluation
      • Future Improvements

  11. Methodology
      • Build probabilistic word model of documents
      • Find “best” terms
          • On topic
          • Predictive
      • Recursive definition creates hierarchy

  12. Term characteristics
      • Why topicality?
          • Distinguish topic terms from the rest of the vocabulary
          • Example text: “The Secretary of Interior listed bald eagles south of the 40th parallel as endangered under the Endangered Species Preservation Act of 1966.”
      • Why predictiveness?
          • Topic words can be strongly related
          • Represent different facets of the vocabulary
          • Example: P(“Endangered” | “Steller sea lions”) = 1.00
      • [Callout text: “Endangered Steller sea lions”]

  13. Statistical Model
      • A_T refers to topicality with respect to topic T
          • Find if the word w is in set T
      • B refers to predictiveness
          • Precondition for other terms to occur
          • Find if word w is in set P

  14. Probabilistic Word Model
      • Captures statistical information about text
      • Called a “language model” in speech recognition
      • Provides basis for estimation of probabilities
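As a rough illustration of what a probabilistic word model can look like, the sketch below builds a maximum-likelihood unigram model from a list of documents. The function name `unigram_model`, the whitespace tokenization, and the sample documents are assumptions made for this example; the talk does not specify which estimator or preprocessing was used.

```python
from collections import Counter


def unigram_model(documents):
    """Maximum-likelihood unigram model: P(w) = count(w) / total tokens."""
    counts = Counter()
    for doc in documents:
        # Assumed preprocessing: lowercase and split on whitespace.
        counts.update(doc.lower().split())
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


# Hypothetical two-document collection.
docs = ["marine mammals are protected", "endangered marine species"]
model = unigram_model(docs)
print(model["marine"])  # 2 of the 7 tokens are "marine"
```

A model like this only records how often each word occurs; the topicality and predictiveness estimates described on the following slides would be computed on top of such counts.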

  15. Estimating Topicality
      • Use term’s contribution to relative entropy
      • Compares two models using K-L divergence
          • Model of documents in hierarchy
          • Model of general English
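One way to read "term's contribution to relative entropy" is the per-term summand of the K-L divergence, p(w) · log(p(w) / q(w)), where p is the model of the retrieved documents and q is the model of general English. The sketch below ranks terms by that quantity. The probabilities are made-up illustrative numbers, and the `floor` parameter for words unseen in the general model is an assumption of this example, not something stated in the talk.

```python
import math


def kl_contributions(p_docs, q_general, floor=1e-9):
    """Per-term contribution p(w) * log(p(w) / q(w)) to KL(p_docs || q_general)."""
    return {
        w: p * math.log(p / max(q_general.get(w, 0.0), floor))
        for w, p in p_docs.items()
        if p > 0
    }


# Illustrative (invented) probabilities for three words.
p_docs = {"marine": 0.4, "species": 0.4, "the": 0.2}
q_general = {"marine": 0.05, "species": 0.15, "the": 0.8}

scores = kl_contributions(p_docs, q_general)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

Words that are much more frequent in the document set than in general English get large positive scores, while common function words such as "the" get negative scores and fall to the bottom of the ranking.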

  16. KL Example
      • [Figure: example terms shown: endangered, marine, species, fishery, mammal]

  17. Estimating Predictiveness
      • Relates the vocabulary to a set of candidate topic terms
      • Use conditional probability P_x(t|v)
          • x is the maximum distance between t and v
      • [Table: P(t|v) for t and v over the terms mammal, species, fishery, marine. Values as extracted, cell positions not recoverable: .98, .31, .35, .99, .31, .35, .50, .65, .65, .04, .03, .01]
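The windowed conditional probability P_x(t|v) can be sketched as follows. The estimator used here is an assumption: it counts the share of occurrences of v that have t within x tokens on either side. The talk only states that x is the maximum distance between t and v, so the exact counting scheme may differ. The function name `predictiveness` and the sample sentence are hypothetical, and only single-word terms are handled.

```python
def predictiveness(tokens, t, v, x):
    """Estimate P_x(t | v): share of occurrences of v with t within x tokens."""
    positions_v = [i for i, tok in enumerate(tokens) if tok == v]
    if not positions_v:
        return 0.0
    hits = 0
    for i in positions_v:
        # Window of up to x tokens on each side of this occurrence of v.
        window = tokens[max(0, i - x): i + x + 1]
        if t in window:
            hits += 1
    return hits / len(positions_v)


# Hypothetical token sequence.
tokens = "endangered steller sea lions and endangered whales near sea ice".split()
print(predictiveness(tokens, "endangered", "sea", 2))
print(predictiveness(tokens, "sea", "endangered", 3))
```

Under this definition the measure is asymmetric and depends on the window size: a larger x lets more distant co-occurrences count as evidence that t predicts v.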

  18. Dominating Set Approximation
      • Interpret predictive language model as a graph, with edges weighted by the conditional probability
      • Finds terms that are connected to lots of terms with a high weight
      • Chooses topic terms until vocabulary is dominated (predicted)
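A greedy selection in the spirit of the dominating set approximation might look like the sketch below. It is not the talk's algorithm: the `threshold` that decides when a vocabulary word counts as "dominated", the `max_terms` cap, and the gain function (sum of edge weights to still-undominated words) are all assumptions introduced for illustration, as are the sample weights.

```python
def gain(term, undominated, weights, threshold):
    """Total weight of edges from term to undominated words above the threshold."""
    return sum(
        weights.get((term, v), 0.0)
        for v in undominated
        if weights.get((term, v), 0.0) >= threshold
    )


def greedy_topic_terms(weights, candidates, vocabulary, threshold=0.5, max_terms=10):
    """weights[(t, v)] = P(t | v). Pick terms greedily until the vocabulary is covered."""
    undominated = set(vocabulary)
    chosen = []
    while undominated and len(chosen) < max_terms:
        remaining = [t for t in candidates if t not in chosen]
        if not remaining:
            break
        best = max(remaining, key=lambda t: gain(t, undominated, weights, threshold))
        if gain(best, undominated, weights, threshold) == 0:
            break  # no remaining candidate predicts any uncovered word
        chosen.append(best)
        undominated -= {v for v in undominated
                        if weights.get((best, v), 0.0) >= threshold}
    return chosen


# Invented edge weights for a tiny graph.
weights = {
    ("marine", "mammal"): 0.9, ("marine", "whales"): 0.8, ("marine", "fishery"): 0.6,
    ("species", "habitat"): 0.7, ("species", "mammal"): 0.6, ("species", "permit"): 0.2,
}
topics = greedy_topic_terms(
    weights,
    candidates=["marine", "species"],
    vocabulary=["mammal", "whales", "fishery", "habitat", "permit"],
)
print(topics)
```

In this toy graph "marine" is chosen first because it has the heaviest connections, and "species" is chosen next because it still covers "habitat"; "permit" stays undominated because no candidate predicts it above the threshold.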

  19. Term Selection Example
      • [Figure labels: t, v, P(t|v)]

  20. Generating a Summary
      • 4-step process
          (1) Preprocess document set
          (2) Generate a language model
          (3) Select the terms
          (4) Create a hierarchy (recursive)
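The recursive step can be sketched as: select terms for a document set, then repeat the selection on the subset of documents containing each chosen term. In the sketch below, `select_terms` is a deliberately simple placeholder that ranks words by document frequency; it stands in for the topicality and predictiveness based selection described in the talk and is not that method. The parameters `k` and `depth` and the sample documents are also assumptions.

```python
from collections import Counter


def select_terms(docs, exclude, k):
    """Placeholder selector: top-k words by document frequency (NOT the talk's method)."""
    df = Counter()
    for doc in docs:
        df.update(set(doc.split()) - exclude)
    return [w for w, _ in df.most_common(k)]


def build_hierarchy(docs, exclude=frozenset(), k=2, depth=2):
    """Return {term: (doc_count, sub_hierarchy)}, built recursively."""
    if depth == 0 or not docs:
        return {}
    tree = {}
    for term in select_terms(docs, exclude, k):
        # Documents under this node are those containing the chosen term.
        subset = [d for d in docs if term in d.split()]
        tree[term] = (len(subset),
                      build_hierarchy(subset, exclude | {term}, k, depth - 1))
    return tree


# Hypothetical preprocessed (lowercased) documents.
docs = ["marine mammals permit", "marine mammals", "marine fish"]
tree = build_hierarchy(docs, k=1, depth=2)
print(tree)
```

Each node carries a document count, matching the "term (count)" labels shown in the example hierarchies, and terms already used on the path to a node are excluded from its children.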

  21. Outline
      • Introduction
      • Description of framework for creating hierarchies
      • Examples
      • Methods of evaluation
      • Future Improvements

  22. Example Hierarchies
      • Generated from 50 documents retrieved for the query: Endangered Species - Mammals
      • Demonstrate the difference between using different topic models
      • Web hierarchy using same query

  23. Uniform Topic Model Hierarchy
      • [Figure: hierarchy tree. Node labels with counts: species (439), marine mammals (187), plan (192), marine (187), amended (154), endangered (86), regulations (124), Act (41), fish (117), State (32), permit (146), Committee (43), number (93), address (85), bill (51), operations (43), Secretary (73), incidental take (42), research (105), NMFS (64), population (32), commercial fishing operations (42)]

  24. KL-Topic Model Hierarchy
      • [Figure: hierarchy tree. Node labels with counts: marine mammals (187), species (439), marine (187), Marine Mammal Protection Act (73), management plan (51), mammals (126), Endangered Species Act (294), marine mammal stocks (20), endangered species (204), marine mammal species (42), habitat (283), fishery (53), Marine Mammal Commission (21), Secretary (42), fish (277), NMFS (83), National Marine Fisheries Service (113), stock (51), fish species (32), Act (313), MMPA (51), permit (164), incidental (74), protection (244), research (63)]

  25. Web Hierarchies
      • Submit query to a web search engine
      • Gather titles and snippets of documents
      • Each title-plus-snippet text is treated as a document
      • Documents are about 30 words

  26. Example of Web Hierarchy
      • [Figure: hierarchy tree. Node labels with counts: marine (76), Endangered Species (440), endangered (491), mammals (600), marine species (4), marine mammals (91), marine mammals (97), terrestrial mammals (2), animal species (1), birds (114), Endangered Mammals (22), Critically Endangered Mammals (2), endangered marine species (2), Endangered Mammals (13), threatened (144), Endangered Species Act (8), species of marine mammals (1), birds (140), threatened (78), species of mammals (27), Animal Info (2), Species Management (2), Listed Species (1), species of marine mammals (1), Mammals species (4), Ecosystems (2), Species Information (1), listing of species (1), Scientists (2), Endangered Species Coalition (2), Canadian Endangered Species (3), protected species (2), Protected Resources (2), native species (1), small mammals (13), Endangered Spaces (2), endangered mammal species (4), Candidate species (2), large mammals (12), sea otter (2), 100 species (1), British mammals (4), new species (1), dolphins (7), whales (13), List of Endangered Species (5), federal Endangered Species (1), Cetaceans (2)]

  27. Outline
      • Introduction
      • Description of framework for creating hierarchies
      • Examples
      • Methods of evaluation
      • Future Improvements

  28. Evaluations
      • Summary Evaluation
          • Tests how well the topic terms chosen predict the vocabulary
      • Access Evaluation
          • Compare number of documents a user can find
      • Relevance Evaluation
          • Path length to find all relevant documents

  29. Automatic Evaluation Test Set
      • Use 50 standard queries
      • Document sets
          • 500 documents retrieved from TREC volumes 4 and 5 (have relevance judgments)
          • 200 documents retrieved from a news database
          • 1000 titles and snippets retrieved using Google™ Search Engine

  30. Evaluating Hypotheses
      • Legend: one mark (symbol lost in extraction) denotes that an evaluation confirmed the hypothesis; “?” denotes that the evaluation showed no significant difference
      • [Table: TREC Collection and News Documents. Evaluations: Summary, Access, Relevance. Hypotheses: Use KL-topic model, Use sub-collections. Cell marks not recoverable]

  31. Web Document Evaluation
      • Results completely different
      • Best hierarchy: uniform topic model
      • Hierarchies do not look as good on human inspection

  32. User Study
      • Include 12 to 16 users
      • Compare ranked list and hierarchy to ranked list alone
      • Users asked to find all instances that are relevant to the query
          • Only have to identify one document about a particular instance
      • Study includes 10 queries

  33. Future Work
      • Complete user study
      • Failure Analysis
      • Explore the use of topic hierarchies in other organizational tasks
          • Personal collections of documents
          • E-mails

  34. Conclusions
      • Developed a formal framework for topic hierarchies
      • Created hierarchies from full text and snippets of documents
      • Verified intuition concerning hierarchies generated from full text

  35. Questions?
      • Demo: http://www-ciir.cs.umass.edu/~lawrie/categories/google-qry/