Metro Maps of. Dafna Shahaf, Carlos Guestrin, Eric Horvitz
“The abundance of books is a distraction.” Lucius Annaeus Seneca, 4 BC – 65 AD
… and it does not get any better • 129,864,880 books (Google estimate) • Research: • PubMed: 19 million papers (one paper added per minute!) • Scopus: 40 million papers
[Figure labels: Innovative Papers, Papers]
Search Engines are Great • But do not show how it all fits together
Timeline Systems e.g., NewsJunkie [Gabrilovich, Dumais, Horvitz]
Metro Map • A map is a set of lines of articles • Each line follows a coherent narrative thread • Temporal Dynamics + Structure • [Figure: example map; labels: labor unions, Merkel, bailout, Germany, protests, junk status, austerity, strike]
Map Definition • A map M is a pair (G, P) where • G = (V, E) is a directed graph • P is a set of paths in G (metro lines) • Each e ∈ E must belong to at least one metro line • [Figure: example map; labels: labor unions, Merkel, bailout, Germany, protests, junk status, austerity, strike]
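The definition above can be written down as a small data structure. This is a minimal sketch, not the authors' implementation: the class name `MetroMap`, its fields, and the `is_valid` check are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple

Edge = Tuple[str, str]  # directed edge (from_article, to_article)


@dataclass
class MetroMap:
    """A map M = (G, P): a directed graph G = (V, E) plus a set of paths P (metro lines)."""
    vertices: Set[str]
    edges: Set[Edge]
    lines: List[List[str]]  # each metro line is an ordered list of vertices

    def line_edges(self) -> Set[Edge]:
        """All edges traversed by at least one metro line."""
        return {(a, b) for line in self.lines for a, b in zip(line, line[1:])}

    def is_valid(self) -> bool:
        """True if every line is a path in G and every edge of G lies on some line."""
        used = self.line_edges()
        on_vertices = all(v in self.vertices for line in self.lines for v in line)
        return on_vertices and used == self.edges


# Two lines that meet at the article "bailout" (article names are made up).
m = MetroMap(
    vertices={"strike", "bailout", "austerity", "junk status"},
    edges={("strike", "bailout"), ("bailout", "austerity"), ("junk status", "bailout")},
    lines=[["strike", "bailout", "austerity"], ["junk status", "bailout"]],
)
print(m.is_valid())
```

An edge that no line uses makes `is_valid` return `False`, which mirrors the requirement that each e ∈ E belongs to at least one metro line.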
Properties of a Good Map • ??? • 1. Coherence
Coherence: Main Idea • Connecting the Dots [S, Guestrin, KDD’10] • Coherence is not a property of local interactions • Incoherent: each pair shares different words • [Figure: chain of articles 1 to 5; words: Greece, Debt default, Europe, Republican, Italy, Protest]
Coherence: Main Idea • Connecting the Dots [S, Guestrin, KDD’10] • A more-coherent chain • Coherent: a small number of words captures the story • [Figure: chain of articles 1 to 5; words: Greece, Debt default, Austerity, Republican, Italy, Protest]
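The contrast between the two chains can be illustrated with a toy score. This is a deliberately simplified sketch, not the formulation from the cited KDD’10 paper: articles are plain word sets, `local_coherence` looks only at adjacent pairs, and `global_coherence` (a hypothetical name) asks how well the best set of at most `k` words ties every transition together.

```python
from itertools import combinations
from typing import List, Set


def local_coherence(chain: List[Set[str]]) -> int:
    """Weakest link when each adjacent pair may share *any* words."""
    return min(len(a & b) for a, b in zip(chain, chain[1:]))


def global_coherence(chain: List[Set[str]], k: int) -> int:
    """Weakest link when the whole story must be told with only k chosen words."""
    vocab = sorted(set().union(*chain))
    best = 0
    for words in combinations(vocab, k):
        chosen = set(words)
        score = min(len(a & b & chosen) for a, b in zip(chain, chain[1:]))
        best = max(best, score)
    return best


# Each adjacent pair shares a different word: locally fine, globally incoherent.
incoherent = [{"greece", "debt"}, {"debt", "europe"},
              {"europe", "italy"}, {"italy", "protest"}]
# One word runs through the whole chain.
coherent = [{"greece", "debt"}, {"greece", "default"},
            {"greece", "austerity"}, {"greece", "protest"}]

print(local_coherence(incoherent), global_coherence(incoherent, k=1))
print(local_coherence(coherent), global_coherence(coherent, k=1))
```

Both chains look equally good to the local score; only the restriction to a small number of words separates them.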
Words are too Simple • [Figure: chain of papers 1 to 3 on Bayesian networks, Social networks, Sensor networks; words: Probability, Cost, Network]
Using the Citation Graph • Create a graph per word • All papers mentioning the word • Edge weight = strength of influence [El-Arini, Guestrin, KDD’11] • Do papers 8 and 9 mean the same thing? • Where did paper 8 get the idea? • [Figure: citation graph for the word "Network" over papers 1 to 9]
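A per-word graph can be sketched as a filtered citation graph. This is an illustrative sketch only: the influence weights of El-Arini and Guestrin are not reproduced here, so `influence` is a caller-supplied placeholder, and `idea_sources` is a hypothetical helper that reads "where did paper 8 get the idea?" as "which ancestors of paper 8 in the word's graph also use the word".

```python
from typing import Callable, Dict, Hashable, Iterable, Set, Tuple

Paper = Hashable


def build_word_graph(
    word: str,
    paper_words: Dict[Paper, Set[str]],
    citations: Iterable[Tuple[Paper, Paper]],          # (cited, citing) pairs
    influence: Callable[[Paper, Paper, str], float],   # placeholder weight function
) -> Dict[Paper, Dict[Paper, float]]:
    """Citation graph restricted to papers mentioning `word`; graph[citing][cited] = weight."""
    mentions = {p for p, words in paper_words.items() if word in words}
    graph: Dict[Paper, Dict[Paper, float]] = {p: {} for p in mentions}
    for cited, citing in citations:
        if cited in mentions and citing in mentions:
            graph[citing][cited] = influence(cited, citing, word)
    return graph


def idea_sources(graph: Dict[Paper, Dict[Paper, float]], paper: Paper) -> Set[Paper]:
    """All ancestors of `paper` inside the word graph."""
    seen: Set[Paper] = set()
    stack = [paper]
    while stack:
        current = stack.pop()
        for parent in graph[current]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# Toy corpus: papers 8 and 9 both say "network" but inherit it from different ancestors.
paper_words = {1: {"network"}, 5: {"network"}, 6: {"network"}, 7: {"network"},
               8: {"network"}, 9: {"network"}, 2: {"cost"}}
citations = [(1, 5), (5, 8), (6, 7), (7, 9), (2, 8)]
g = build_word_graph("network", paper_words, citations, lambda cited, citing, w: 1.0)
print(idea_sources(g, 8), idea_sources(g, 9))
```

Under this reading, disjoint ancestor sets for papers 8 and 9 hint that the two papers may use the word in different senses.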
Words are too Simple • Incoherent • [Figure: chain of papers 1 to 3 on Bayesian networks, Social networks, Sensor networks; words: Probability, Cost, Network]
Properties of a Good Map • Is it enough? • 1. Coherence
Properties of a Good Map • 1. Coherence • 2. Coverage • Should cover diverse topics important to the user
Coverage: What to Cover? • Perhaps words? • Not enough: • Paper 1: SVM in Oracle Database 10g, Milenova et al., VLDB '05 • Paper 2: Support Vector Machines in Relational Databases, Ruping, SVM '02
Similar Content • [Figure: papers 1 and 2]
Different Impact • Citing Venues and Authors: [Figure: papers 2 and 1] • Affected more authors/venues • Very little intersection
What to Cover? • Instead of words… • Cover papers • A paper covers papers that it had an impact on • High-coverage map: impact on a lot of the corpus • Why descendants? • Soft notion: [0,1]
p has High Impact on q if… • Many paths (especially short) • Coherent • Formalize with coherent random walks • [Figure: papers p, r, q with citing sentences "We use the algorithm of…" and "Note that our protocol is different from previous work…"]
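The "many paths, especially short" intuition can be sketched with a plain random walk over references. This is a simplified illustration, not the slides' coherent random walk: the coherence requirement is left out, the citation graph is assumed to be acyclic, and `alpha` (the chance of continuing at each step) is a hypothetical parameter.

```python
from typing import Dict, Hashable, List

Paper = Hashable


def impact(p: Paper, q: Paper, references: Dict[Paper, List[Paper]], alpha: float = 0.5) -> float:
    """Probability that a walk starting at q and following references reaches p.

    At every step the walk continues with probability alpha and then picks one of
    the current paper's references uniformly at random. Assumes an acyclic graph.
    """
    memo: Dict[Paper, float] = {}

    def hit(node: Paper) -> float:
        if node == p:
            return 1.0
        if node not in memo:
            refs = references.get(node, [])
            memo[node] = alpha * sum(hit(r) for r in refs) / len(refs) if refs else 0.0
        return memo[node]

    return hit(q)


# q cites r and s; r cites p; s cites nothing relevant.
references = {"q": ["r", "s"], "r": ["p"], "s": [], "p": []}
print(impact("p", "r", references))  # one short path from r back to p
print(impact("p", "q", references))  # a longer, diluted route from q back to p
```

The result is a soft score in [0, 1]: more paths raise it, and every extra step multiplies in another factor of `alpha`, so short paths count most.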
Map Coverage • Documents cover pieces of the corpus • [Figure labels: Corpus, Coverage]
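One way to combine per-document coverage into map coverage is a probabilistic "at least one document covers it" rule. The formula below is an assumption made for illustration (the slide does not spell it out), as are the names `map_coverage`, `cover`, and `weights`.

```python
from math import prod
from typing import Dict, Hashable, Iterable

Doc = Hashable
Item = Hashable  # a piece of the corpus, e.g. a paper to be covered


def map_coverage(
    docs: Iterable[Doc],
    cover: Dict[Doc, Dict[Item, float]],  # cover[d][e] in [0, 1]
    weights: Dict[Item, float],           # importance of each corpus item
) -> float:
    """Weighted sum over items of 1 - prod_d (1 - cover[d][e])."""
    docs = list(docs)
    return sum(
        w * (1 - prod(1 - cover.get(d, {}).get(e, 0.0) for d in docs))
        for e, w in weights.items()
    )


cover = {"d1": {"x": 0.5}, "d2": {"x": 0.5, "y": 1.0}}
weights = {"x": 1.0, "y": 1.0}

gain_alone = map_coverage(["d1"], cover, weights) - map_coverage([], cover, weights)
gain_after_d2 = map_coverage(["d1", "d2"], cover, weights) - map_coverage(["d2"], cover, weights)
print(gain_alone, gain_after_d2)
```

Adding `d1` helps less once `d2` is already in the map. This diminishing-returns behavior is what makes a coverage function of this shape submodular.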
Properties of a Good Map • 1. Coherence • 2. Coverage • 3. Connectivity
Definition: Connectivity • Experimented with formulations • Users do not care about connection type • Encourage connections between pairs of lines
Lines with No Intersection • Solution: Reward lines that had impact on each other • [Figure: lines Optimizing Kernels for SVM, Perceptrons, SVM, SVM for Facial Recognition, Face Detection; papers: Generalized Portrait Method, Kernel SVM, Kernel functions, Optimizing kernels, Perceptrons, Automatic extraction of face features, Applying perceptrons to facial feature location, View-based human face detection, Training SVMs for face detection, Face recognition by SVM]
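A connectivity score along these lines could count the pairs of lines that are linked. This is a hedged sketch rather than the authors' objective: the function name `connectivity`, the `threshold` parameter, and the rule "linked if the lines intersect or some paper on one line had impact on a paper on the other" are assumptions for illustration, and `impact` is any caller-supplied score in [0, 1].

```python
from itertools import combinations
from typing import Callable, Hashable, List, Sequence

Paper = Hashable


def connectivity(
    lines: Sequence[List[Paper]],
    impact: Callable[[Paper, Paper], float],
    threshold: float = 0.0,
) -> int:
    """Number of line pairs that intersect or influence each other."""

    def linked(a: List[Paper], b: List[Paper]) -> bool:
        if set(a) & set(b):
            return True
        return any(
            impact(x, y) > threshold or impact(y, x) > threshold
            for x in a for y in b
        )

    return sum(linked(a, b) for a, b in combinations(lines, 2))


# Toy impact table: a paper on the SVM line influenced a paper on the face line.
table = {("kernel svm", "face recognition by svm"): 0.4}
toy_impact = lambda p, q: table.get((p, q), 0.0)

lines = [
    ["generalized portrait method", "kernel svm"],
    ["view-based face detection", "face recognition by svm"],
    ["perceptrons"],
]
print(connectivity(lines, toy_impact))
```

The first two lines share no paper, yet they still count as connected because of the impact between them.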
Tying it all Together: Map Objective • Consider all coherent maps with maximum possible coverage. Find the most connected one. • Coherence • Either coherent or not: Constraint • Coverage • Must have! • Connectivity • Nice to have
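The objective is lexicographic: coherence is a hard constraint, coverage comes first, and connectivity breaks ties. A brute-force sketch over an explicit candidate list makes the ordering concrete. It is for illustration only, since enumerating all maps is not feasible in practice, and `best_map` and its arguments are hypothetical names.

```python
from typing import Callable, Iterable, Optional, TypeVar

M = TypeVar("M")


def best_map(
    candidates: Iterable[M],
    is_coherent: Callable[[M], bool],
    coverage: Callable[[M], float],
    connectivity: Callable[[M], float],
) -> Optional[M]:
    """Among coherent maps with maximum coverage, return the most connected one."""
    coherent = [m for m in candidates if is_coherent(m)]
    if not coherent:
        return None
    top = max(coverage(m) for m in coherent)
    return max((m for m in coherent if coverage(m) == top), key=connectivity)


# Toy candidates described by (coherent?, coverage, connectivity).
stats = {
    "A": (True, 0.9, 1),
    "B": (True, 0.9, 3),
    "C": (False, 1.0, 5),  # highest coverage, but incoherent
    "D": (True, 0.7, 9),   # very connected, but covers less
}
choice = best_map(stats, lambda m: stats[m][0], lambda m: stats[m][1], lambda m: stats[m][2])
print(choice)
```

Map C is excluded by the constraint, D loses on coverage, and B beats A only on connectivity.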
Approach Overview • Documents D • 1. Coherence graph G • 2. Coverage function f • f( ) = ? … • 3. Increase Connectivity
Coherence Graph: Main Idea • Vertices correspond to short coherent chains • Directed edges between chains which can be conjoined and remain coherent • [Figure: short chains over articles 1 to 9; example conjoined chain 1 2 3 5 8 9]
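The construction can be sketched by brute force on a toy corpus. This is a minimal illustration under stated assumptions: documents are listed in chronological order, chains have a fixed length `m`, and `is_coherent` is a caller-supplied predicate (here a toy stand-in that requires one word shared by every document in the chain). A real system would not enumerate all chains this way.

```python
from itertools import combinations
from typing import Callable, Dict, List, Sequence, Tuple

Chain = Tuple[int, ...]


def coherence_graph(
    docs: Sequence[int],                      # document ids in chronological order
    is_coherent: Callable[[Chain], bool],
    m: int = 2,
) -> Dict[Chain, List[Chain]]:
    """Vertices: coherent chains of length m. Edge a -> b if a followed by b stays coherent."""
    order = {d: i for i, d in enumerate(docs)}
    chains = [c for c in combinations(docs, m) if is_coherent(c)]
    edges: Dict[Chain, List[Chain]] = {c: [] for c in chains}
    for a in chains:
        for b in chains:
            if order[a[-1]] < order[b[0]] and is_coherent(a + b):
                edges[a].append(b)
    return edges


# Toy stand-in for coherence: all documents in the chain share at least one word.
words = {1: {"greece"}, 2: {"greece"}, 3: {"greece"},
         4: {"greece", "strike"}, 5: {"strike"}}


def shares_a_word(chain: Chain) -> bool:
    return bool(set.intersection(*(words[d] for d in chain)))


graph = coherence_graph([1, 2, 3, 4, 5], shares_a_word, m=2)
print(graph[(1, 2)])
```

Every path in the resulting graph spells out a longer chain, which is how the graph encodes coherent chains as paths.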
Finding High-Coverage Chains • Paths correspond to coherent chains. • Problem: find a path of length K maximizing coverage of underlying articles • Cover( ) > Cover( ) ? • [Figure: coherence graph over chains of articles 1 to 9; candidate paths 1 2 3 4 5 6 and 1 2 3 5 8 9]
Reformulation • Paths correspond to coherent chains. • Problem: find a path of length K maximizing coverage of underlying articles • Coverage is a function of the nodes visited: Orienteering • Submodular orienteering [Chekuri and Pal, 2005] • Quasipolynomial-time recursive greedy • O(log OPT) approximation
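The recursive greedy idea can be sketched as follows: guess a midpoint and a budget split, solve the left half, then solve the right half for the marginal gain given what the left half already covers. This is a simplified sketch in the spirit of Chekuri and Pal's algorithm, not a faithful reproduction: edges have unit length, the graph is assumed acyclic, and the set function `f` and all names are illustrative. A recursion depth of about log2(K) is needed to build a path with K edges.

```python
from typing import Callable, Dict, FrozenSet, List, Optional

Graph = Dict[str, List[str]]
SetFn = Callable[[FrozenSet[str]], float]


def recursive_greedy(graph: Graph, f: SetFn, s: str, t: str, budget: int,
                     covered: FrozenSet[str], depth: int) -> Optional[List[str]]:
    """Best s-t path found with at most `budget` edges, judged by marginal gain over `covered`."""

    def gain(path: List[str]) -> float:
        return f(covered | frozenset(path)) - f(covered)

    # Base candidate: the direct edge s -> t, if it exists and fits the budget.
    best = [s, t] if budget >= 1 and t in graph.get(s, []) else None
    if depth == 0:
        return best
    best_gain = gain(best) if best is not None else float("-inf")

    for mid in graph:                      # guess the middle vertex
        if mid in (s, t):
            continue
        for left_budget in range(1, budget):   # guess the budget split
            left = recursive_greedy(graph, f, s, mid, left_budget, covered, depth - 1)
            if left is None:
                continue
            right = recursive_greedy(graph, f, mid, t, budget - left_budget,
                                     covered | frozenset(left), depth - 1)
            if right is None:
                continue
            candidate = left + right[1:]
            if gain(candidate) > best_gain:
                best, best_gain = candidate, gain(candidate)
    return best


# Toy instance: each node covers a set of topics; f counts distinct topics covered.
graph = {"a": ["b", "c"], "b": ["c", "d"], "c": ["d"], "d": []}
topics = {"a": {1}, "b": {2, 3}, "c": {3}, "d": {4}}
f = lambda nodes: float(len(set().union(*(topics[n] for n in nodes)))) if nodes else 0.0

path = recursive_greedy(graph, f, "a", "d", budget=3, covered=frozenset(), depth=2)
print(path, f(frozenset(path)))
```

On this toy graph the route through `b` wins over the route through `c`, because `c` adds nothing that `b` does not already cover.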
Approach Overview: Recap • Documents D • 1. Coherence graph G • Encodes all coherent chains as graph paths • 2. Coverage function f • f( ) = ? … • Submodular orienteering [Chekuri & Pal, 2005] • Quasipolynomial-time recursive greedy • O(log OPT) approximation • 3. Increase Connectivity
Example Map: Reinforcement Learning • [Figure: example map; keywords: multi-agent, cooperative, joint, team, mdp, states, pomdp, transition, option, control, motor, robot, skills, arm, bandit, regret, dilemma, exploration, arm, q-learning, bound, optimal, rmax, mdp]
User Study • Tricky! • No double-blind, no within-subject • Domain: understandable yet unfamiliar • Reinforcement Learning (RL)
User Study • 30 participants • First-year grad student, Reinforcement Learning project • Update a survey paper from 1996 • Identify research directions + relevant papers • Google Scholar • Map and Google Scholar • Baselines: Map, Wikipedia
Results (in a nutshell) • Map users find better papers, and cover more important areas • [Figure: two charts comparing Us and Google; axis label: Better]
User Comments • Helpful • great starting point • noticed directions I didn't know about • … get a basic idea of what science is up to • why don't you draw words on edges? • Legend is confusing • hard to get an idea from paper title alone
Conclusions • Formulated metrics characterizing good maps for the scientific domain • Efficient methods with theoretical guarantees • User studies highlight the promise of the method • Website on the way! • Personalization • Thank you!