Metro Maps of

Presentation Transcript


  1. Metro Maps of • Dafna Shahaf, Carlos Guestrin, Eric Horvitz

  2. “The abundance of books is a distraction” • Lucius Annaeus Seneca, 4 BC – 65 AD

  3. … and it does not get any better • 129,864,880 books (Google estimate) • Research: • PubMed: 19 million papers (one paper added per minute!) • Scopus: 40 million papers

  4. Innovative Papers • Papers

  5. So, you want to understand a research topic… Now what?

  6. Search Engines are Great • But do not show how it all fits together

  7. Timeline Systems e.g., NewsJunkie [Gabrilovich, Dumais, Horvitz]

  8. Research is not Linear

  9. Metro Map • A map is a set of lines of articles • Each line follows a coherent narrative thread • Temporal dynamics + structure • [Figure labels: labor unions, Merkel, bailout, Germany, protests, junk status, austerity, strike]

  10. Map Definition • A map M is a pair (G, P) where • G = (V, E) is a directed graph • P is a set of paths in G (metro lines) • Each e ∈ E must belong to at least one metro line • [Figure labels: labor unions, Merkel, bailout, Germany, protests, junk status, austerity, strike]
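
The definition on slide 10 can be checked mechanically. The following is a minimal sketch, not the authors' code: the class name `MetroMap`, its field names, and the article ids `a1`…`a4` are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class MetroMap:
    """A map M = (G, P): a directed graph G = (V, E) plus a set of paths P."""
    vertices: set   # V: article ids (hypothetical ids in the example below)
    edges: set      # E: directed (u, v) pairs
    lines: list     # P: each metro line is a list of vertices, in order

    def is_valid(self):
        # Collect every edge that lies on some metro line.
        on_lines = set()
        for line in self.lines:
            on_lines.update(zip(line, line[1:]))
        # Every edge endpoint must be a vertex of G.
        endpoints_ok = all(u in self.vertices and v in self.vertices
                           for u, v in self.edges)
        # Lines must be paths in G, and each e in E must lie on some line.
        return endpoints_ok and on_lines == self.edges


good = MetroMap(
    vertices={"a1", "a2", "a3", "a4"},
    edges={("a1", "a2"), ("a2", "a3"), ("a2", "a4")},
    lines=[["a1", "a2", "a3"], ["a1", "a2", "a4"]],
)
# Same lines, but with one extra edge that no metro line uses.
bad = MetroMap(good.vertices, good.edges | {("a3", "a4")}, good.lines)
```

Here `good.is_valid()` holds, while `bad.is_valid()` fails because the edge (a3, a4) belongs to no line.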

  11. Game Plan

  12. Properties of a Good Map • 1. Coherence • ???

  13. Coherence: Main Idea • Connecting the Dots [S, Guestrin, KDD’10] • Coherence is not a property of local interactions • Incoherent: each pair shares different words • [Figure: chain of articles 1–5 with words: Greece, Debt default, Europe, Republican, Italy, Protest]

  14. Coherence: Main Idea • Connecting the Dots [S, Guestrin, KDD’10] • A more-coherent chain • Coherent: a small number of words captures the story • [Figure: chain of articles 1–5 with words: Greece, Debt default, Austerity, Republican, Italy, Protest]
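
The contrast between slides 13 and 14 can be illustrated with a toy score. This is a simplified sketch under stated assumptions, not the formulation of the cited KDD’10 paper: articles are plain word sets, the function name `chain_coherence` and the word budget `max_words` are hypothetical, and the search is brute force. The idea it keeps is that a chain is scored by its weakest transition under a small, global choice of words.

```python
from itertools import combinations


def chain_coherence(chain, max_words):
    """Best weakest-link score using at most `max_words` global words.

    `chain` is a list of at least two word sets, one per article.
    A transition's strength is the number of chosen words present in
    both adjacent articles; the chain's score is its weakest transition.
    """
    vocab = sorted(set().union(*chain))
    best = 0
    for k in range(1, max_words + 1):
        for words in combinations(vocab, k):
            chosen = set(words)
            weakest = min(len(chosen & a & b)
                          for a, b in zip(chain, chain[1:]))
            best = max(best, weakest)
    return best


# Each adjacent pair shares a word, but a different one every time.
incoherent = [{"greece", "debt"}, {"debt", "europe"},
              {"europe", "italy"}, {"italy", "protest"}]
# One word ("greece") runs through the whole chain.
coherent = [{"greece", "debt"}, {"greece", "debt", "austerity"},
            {"greece", "austerity"}, {"greece", "austerity", "protest"}]
```

With a budget of one or two words the first chain scores 0 even though every adjacent pair overlaps, while the second scores 1: local overlap alone does not make a chain coherent.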

  15. Words are too Simple • Bayesian networks • Social networks • Sensor networks • [Figure: articles 1–3 with words: Probability, Cost, Network]

  16. Using the Citation Graph • Create a graph per word • All papers mentioning the word • Edge weight = strength of influence [El-Arini, Guestrin, KDD ’11] • Do papers 8 and 9 mean the same thing? Where did paper 8 get the idea? • [Figure: citation graph for the word “Network” over papers 1–9]

  17. Words are too Simple • Bayesian networks • Social networks • Sensor networks • Incoherent • [Figure: articles 1–3 with words: Probability, Cost, Network]

  18. Properties of a Good Map • 1. Coherence • Is it enough?

  19. Max-coherence Map • Query: Reinforcement Learning

  20. Properties of a Good Map • 1. Coherence • 2. Coverage • Should cover diverse topics important to the user

  21. Coverage: What to Cover? • Perhaps words? • Not enough: • (1) “SVM in Oracle Database 10g”, Milenova et al., VLDB '05 • (2) “Support Vector Machines in Relational Databases”, Ruping, SVM '02

  22. Similar Content • [Figure: papers 1 and 2]

  23. Different Impact • Citing venues and authors of papers 1 and 2 • Very little intersection • Affected more authors/venues

  24. What to Cover? • Instead of words… • Cover papers • A paper covers papers that it had an impact on • High-coverage map: impact on a lot of the corpus • Why descendants? • Soft notion: [0,1]

  25. p has High Impact on q if… • Many paths (especially short) • coherent • Formalize with coherent random walks • [Figure: papers p, r, q with citation contexts “We use the algorithm of…” and “Note that our protocol is different from previous work…”]

  26. Map Coverage • Documents cover pieces of the corpus • [Figure: Corpus Coverage]
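
Slides 24–26 describe coverage as a soft notion in [0,1] but the transcript does not show how per-paper coverage is combined into map coverage. The sketch below assumes one common choice, a noisy-or combination: a corpus paper q is left uncovered only if every selected paper fails to cover it. The function name `map_coverage`, the `impact` dictionary, and the paper ids are hypothetical.

```python
def map_coverage(selected, impact, corpus):
    """Average soft coverage of `corpus` by the papers in `selected`.

    `impact[(p, q)]` in [0, 1] is how strongly paper p covers paper q
    (missing pairs count as 0). Assumed combination rule (noisy-or):
    cover(q) = 1 - prod over p of (1 - impact[(p, q)]).
    """
    total = 0.0
    for q in corpus:
        miss = 1.0
        for p in selected:
            miss *= 1.0 - impact.get((p, q), 0.0)
        total += 1.0 - miss
    return total / len(corpus)


impact = {("p1", "q1"): 0.5, ("p2", "q1"): 0.5, ("p2", "q2"): 1.0}
corpus = ["q1", "q2"]

only_p1 = map_coverage(["p1"], impact, corpus)        # (0.5 + 0.0) / 2
both = map_coverage(["p1", "p2"], impact, corpus)     # (0.75 + 1.0) / 2
```

Under this assumption, adding p2 to a map that already contains p1 gains less than adding p2 to an empty map, because q1 is already half covered. That diminishing-returns behaviour is what makes such a coverage function submodular.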

  27. High-coverage, Coherent Map

  28. Properties of a Good Map • 1. Coherence • 2. Coverage • 3. Connectivity

  29. Definition: Connectivity • Experimented with formulations • Users do not care about connection type • Encourage connections between pairs of lines
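
One simple reading of "encourage connections between pairs of lines" is to count the pairs of lines that share at least one article. The sketch below assumes exactly that and nothing more; the function name `connectivity` and the article ids are hypothetical, and the transcript does not say which of the formulations the authors experimented with was adopted.

```python
from itertools import combinations


def connectivity(lines):
    """Number of unordered pairs of metro lines that share an article."""
    return sum(1 for a, b in combinations(lines, 2) if set(a) & set(b))


lines = [["a1", "a2", "a3"],   # intersects the second line at a2
         ["a2", "a4"],
         ["a5", "a6"]]         # isolated line
```

For these three lines the count is 1, since only the first two intersect. Consistent with the slide, the type of connection is ignored.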

  30. Solution: Reward lines that had impact on each other • Lines with no intersection • [Figure: lines Perceptrons, SVM, Kernel SVM, Optimizing Kernels for SVM, Face Detection, SVM for Facial Recognition; articles: Generalized Portrait Method; Kernel functions; Optimizing kernels; Perceptrons; Automatic extraction of face features; Applying perceptrons to facial feature location; View-based human face detection; Training SVMs for face detection; Face recognition by SVM]

  31. Tying it all Together: Map Objective • Consider all coherent maps with maximum possible coverage. Find the most connected one. • Coherence • Either coherent or not: constraint • Coverage • Must have! • Connectivity • Nice to have
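
The objective on slide 31 is lexicographic: coherence is a hard constraint, coverage comes first, and connectivity only breaks ties. A minimal sketch over an explicit list of candidate maps follows. Enumerating candidates is an assumption made for illustration (the talk uses a coherence graph and orienteering instead), and `pick_map` and the dictionary fields are hypothetical names.

```python
def pick_map(candidates, is_coherent, coverage, connectivity):
    """Among coherent maps with maximum coverage, return the most connected."""
    coherent = [m for m in candidates if is_coherent(m)]   # constraint
    if not coherent:
        return None
    best_cov = max(coverage(m) for m in coherent)          # must have
    tied = [m for m in coherent if coverage(m) == best_cov]
    return max(tied, key=connectivity)                     # nice to have


candidates = [
    {"name": "A", "coherent": True,  "coverage": 0.8, "connectivity": 1},
    {"name": "B", "coherent": True,  "coverage": 0.8, "connectivity": 3},
    {"name": "C", "coherent": False, "coverage": 0.9, "connectivity": 5},
    {"name": "D", "coherent": True,  "coverage": 0.6, "connectivity": 4},
]
chosen = pick_map(candidates,
                  lambda m: m["coherent"],
                  lambda m: m["coverage"],
                  lambda m: m["connectivity"])
```

Map C is excluded despite its high coverage because it is incoherent, D loses on coverage, and B beats A only on connectivity. The exact-equality tie test is fine for this toy data; with real floating-point scores a tolerance would be needed.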

  32. Game Plan

  33. Approach Overview • Documents D • 1. Coherence graph G • 2. Coverage function f • f( ) = ? • … • 3. Increase Connectivity

  34. Coherence Graph: Main Idea • Vertices correspond to short coherent chains • Directed edges between chains which can be conjoined and remain coherent • [Figure: example chains over articles 1–9; conjoined chain 1 2 3 5 8 9]
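
The construction on slide 34 can be sketched as follows. Several things here are assumptions for illustration: chains are tuples of integer article ids ordered by time, "conjoined" is read as plain concatenation, and coherence is delegated to a caller-supplied predicate `is_coherent`. The toy predicate below (all articles share one topic) is a stand-in, not the authors' coherence test.

```python
def build_coherence_graph(chains, is_coherent):
    """Vertices are short coherent chains; add an edge a -> b when b can
    follow a in time and the concatenated chain is still coherent."""
    graph = {c: [] for c in chains}
    for a in chains:
        for b in chains:
            if a != b and a[-1] < b[0] and is_coherent(a + b):
                graph[a].append(b)
    return graph


# Toy stand-in for coherence: every article in the chain has the same topic.
topic = {1: "svm", 2: "svm", 3: "svm", 5: "svm", 8: "svm", 9: "svm",
         4: "rl", 6: "rl"}


def same_topic(chain):
    return len({topic[d] for d in chain}) == 1


chains = [(1, 2, 3), (5, 8, 9), (4, 6)]
graph = build_coherence_graph(chains, same_topic)
```

The only edge goes from (1, 2, 3) to (5, 8, 9), so the path through those two vertices encodes the longer chain 1 2 3 5 8 9. Longer coherent chains thus appear as paths in the graph rather than being enumerated directly.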

  35. Finding High-Coverage Chains • Paths correspond to coherent chains. • Problem: find a path of length K maximizing coverage of underlying articles • Cover( ) > Cover( ) ? • [Figure: candidate chains over articles 1–9, e.g. 1 2 3 and 5 8 9]

  36. Reformulation • Paths correspond to coherent chains. • Problem: find a path of length K maximizing coverage of underlying articles • Coverage is a function of the nodes visited: orienteering • Submodular orienteering • [Chekuri and Pal, 2005] • Quasipolynomial time recursive greedy • O(log OPT) approximation
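
To make the recursive-greedy idea concrete, here is a simplified sketch in the spirit of the cited algorithm, not a faithful implementation of it. The assumptions are: every edge has unit length, the budget counts edges, the base case only accepts a direct edge, and the recursion depth is passed in by hand. All names (`recursive_greedy`, `topics`, `f`) are hypothetical, and no approximation guarantee is claimed for this version.

```python
def recursive_greedy(graph, f, s, t, budget, depth, visited=frozenset()):
    """Return a path from s to t with at most `budget` edges, or None.

    `f` maps a set of nodes to a coverage value; candidate paths are
    compared by their marginal gain over the nodes in `visited`.
    """
    # Base candidate: stay in place, or take a direct edge s -> t.
    if s == t:
        best = [s]
    elif budget >= 1 and t in graph.get(s, ()):
        best = [s, t]
    else:
        best = None
    if depth == 0 or budget < 2:
        return best

    def gain(path):
        return f(visited | set(path)) - f(visited)

    best_gain = gain(best) if best is not None else float("-inf")
    # Guess a middle node v and a budget split, then recurse on both halves.
    for v in graph:
        for b1 in range(1, budget):
            left = recursive_greedy(graph, f, s, v, b1, depth - 1, visited)
            if left is None:
                continue
            # The second half is scored given what the first half covers.
            right = recursive_greedy(graph, f, v, t, budget - b1, depth - 1,
                                     visited | frozenset(left))
            if right is None:
                continue
            path = left + right[1:]
            if gain(path) > best_gain:
                best, best_gain = path, gain(path)
    return best


graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
topics = {"a": {1}, "b": {1}, "c": {2, 3}, "d": {4}}


def f(nodes):
    # Coverage = number of distinct topics touched (a submodular function).
    return len(set().union(*(topics[n] for n in nodes)))


path = recursive_greedy(graph, f, "a", "d", budget=2, depth=1)
```

On this toy graph the route through c is preferred over the route through b, because b adds no topic that a does not already cover.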

  37. Approach Overview: Recap • Documents D • 1. Coherence graph G • Encodes all coherent chains as graph paths • 2. Coverage function f • f( ) = ? • Submodular orienteering [Chekuri & Pal, 2005] • Quasipoly time recursive greedy • O(log OPT) approximation • 3. Increase Connectivity

  38. Example Map: Reinforcement Learning • [Figure labels: multi-agent, cooperative, joint, team, mdp, states, pomdp, transition, option, control, motor, robot, skills, arm, bandit, regret, dilemma, exploration, arm, q-learning, bound, optimal, rmax, mdp]

  39. Example Map Detail: SVM

  40. Game Plan

  41. User Study • Tricky! • No double-blind, no within-subject • Domain: understandable yet unfamiliar • Reinforcement Learning (RL)

  42. User Study • 30 participants • First-year grad student, Reinforcement Learning project • Update a survey paper from 1996 • Identify research directions + relevant papers • Google Scholar • Map and Google Scholar • Baselines: Map, Wikipedia

  43. Results (in a nutshell) • Map users find better papers, and cover more important areas • [Figure: two charts comparing Us and Google; axis label: Better]

  44. User Comments • Helpful • great starting point • noticed directions I didn't know about • … get a basic idea of what science is up to • why don't you draw words on edges? • Legend is confusing • hard to get an idea from paper title alone

  45. Conclusions • Formulated metrics characterizing good maps for the scientific domain • Efficient methods with theoretical guarantees • User studies highlight the promise of the method • Website on the way! • Personalization • Thank you!
