
A Probabilistic Optimization Framework for the Empty-Answer Problem

Davide Mottin, Alice Marascu, Senjuti Basu Roy, Gautam Das, Themis Palpanas, Yannis Velegrakis. Talk by Davide Mottin at Yahoo! Research Barcelona.


Presentation Transcript


  1. A Probabilistic Optimization Framework for the Empty-Answer Problem • Davide Mottin, Alice Marascu, Senjuti Basu Roy, Gautam Das, Themis Palpanas, Yannis Velegrakis • Talk by Davide Mottin at Yahoo! Research Barcelona

  2. Who am I? • Born in Marostica • Live in Trento • I'm a member of the dbTrento group at the University of Trento • Advisors: Themis Palpanas and Yannis Velegrakis

  3. Empty-Answer Problem

  4. Empty-Answer Problem • Query on the CAR DB: Alarm, DSL, Manual • No answer: {}

  5. Issues • Users need a product matching their preferences • It is difficult to propose an approximate answer close to the user's needs • The system does not provide sufficient help

  6. Outline • Background • Query relaxation • Interactive query relaxation • User Model • Query Relaxation Tree • Problem definition • Solutions • Exact Solutions • Approximate Solution • Experimental Results • Conclusions

  7. Existing Solutions • Ranking function • Propose ranked results that are close to the user preferences • Both IR [Baeza11] and database solutions [Chaudhuri04] • Query relaxation • Remove or change one of the conditions in the user query [Mishra09]

  8. Query Relaxation • Query on the CAR DB: Alarm, DSL, Manual • Answer: {}

  9. How Many Relaxations? • Exponential in the size of the query
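
To make the exponential growth concrete, here is a minimal sketch that enumerates relaxations, assuming (for illustration only) that a relaxation simply drops a non-empty subset of the query conditions. The function name `relaxations` is hypothetical and not taken from the talk.

```python
from itertools import combinations


def relaxations(conditions):
    """Yield every relaxed query obtained by dropping a non-empty subset of conditions.

    Assumption: a relaxation only removes conditions; it never changes a value.
    """
    n = len(conditions)
    for k in range(1, n + 1):
        for dropped in combinations(conditions, k):
            # Keep the conditions that were not dropped, in their original order.
            yield tuple(c for c in conditions if c not in dropped)


query = ("Alarm", "DSL", "Manual")
for relaxed in relaxations(query):
    print(relaxed)

# A query with n conditions has 2**n - 1 such relaxations.
print(len(list(relaxations(query))))
```

Under this assumption a query with 10 conditions already has 1023 candidate relaxations, which is why exploring all of them is impractical.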

  10. Challenges • Too many relaxations proposed • Lack of a principled method to propose the next relaxation • Exploring all the relaxations is impractical • Limited user interaction with the system • Lack of a user-centric model and motivation for a refinement • Solution: Interactive Query Relaxation

  11. Interactive Query Relaxation • Query on the CAR DB: Alarm, DSL, Manual • System proposals: Remove DSL? Remove Alarm? • User answers: YES, NO • Results shown: {Askari, A10, …} and {}

  12. Applications • Small mobile • Hand-held devices • Interaction with an agent via telephone • Reservations in restaurants • …

  13. Outline • Background • Query relaxation • Interactive query relaxation • User Model • Query Relaxation Tree • Problem definition • Solutions • Exact Solutions • Approximate Solution • Experimental Results • Conclusions

  14. Query Relaxation Tree • Relax DSL? • DSL is not relaxable anymore • Relax Alarm? • Problem: find the optimal path that maximizes or minimizes some user-centric quantity

  15. Query Relaxation Tree • Nodes represent • Next relaxation proposed (relaxation nodes) • Yes/No user choices (choice nodes) • Leaves represent • Non-relaxable query • Non-empty query • Choice branches have relpref_yes and relpref_no probabilities • A refused relaxation means the condition cannot be relaxed further (hard constraint) • For each node we compute a cost that depends on the optimization adopted (Dynamic, Semi-Dynamic, Static)

  16. User-Centric Model • Prior(t, Q, Q') • User knowledge about the existence of a tuple t satisfying the relaxed query Q' • Pref(t, Q') [preference function] • User preference for a tuple t given the query Q'

  17. User-Centric Model • Q: What is the probability that the user says NO to a relaxation? • A: The probability that the user doesn't like any of the tuples that satisfy the relaxed query Q'
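
The answer above can be sketched numerically. This is only an illustration under two assumptions that the transcript does not state: the user judges tuples independently, and each tuple comes with a single probability of being known and liked (a hypothetical stand-in for whatever combination of Prior and Pref the talk uses). The name `prob_no` is made up for this example.

```python
from math import prod


def prob_no(like_probs):
    """Probability that the user rejects a relaxation.

    like_probs[i] is the (assumed) probability that the user likes tuple i,
    one entry per tuple satisfying the relaxed query Q'.
    Under the independence assumption, the user says NO exactly when
    none of the tuples is liked.
    """
    return prod(1.0 - p for p in like_probs)


# Two tuples satisfy Q'; the user likes them with probability 0.5 and 0.2.
p_no = prob_no([0.5, 0.2])
p_yes = 1.0 - p_no
print(p_no, p_yes)
```

With more tuples satisfying Q', the product shrinks, so a relaxation that admits many attractive tuples is more likely to be accepted.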

  18. Problem Definition Given a query Q and a database D, find the sequence of relaxations in the Query Relaxation Tree that (3 separate goals): • Minimize the number of relaxations (Dynamic) • Maximize the user satisfaction (Semi-Dynamic) • Maximize some profit/benefit (Static)

  19. Cost function • optimize = Min if the goal is Dynamic (minimum number of steps) and Max otherwise • Cost of a leaf: • 0 for Dynamic (minimize the number of steps) • Max preference (using Pref) of the tuples for Semi-Dynamic • Max value (e.g. price, revenue) of the tuples for Static
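
The cost formula itself did not survive in this transcript. One reading for the Dynamic goal that agrees with the fragments on the later slides ("relpref * (1 + cost(n'))", sum at choice nodes, min at relaxation nodes) is sketched below. It is an assumption, not the slide's original formula, and the symbols c, n and ℓ are introduced here for illustration.

```latex
\begin{align*}
\mathrm{Cost}(c) &= \mathit{relpref}_{\mathrm{yes}}(c)\,\bigl(1 + \mathrm{Cost}(n_{\mathrm{yes}})\bigr)
                  + \mathit{relpref}_{\mathrm{no}}(c)\,\bigl(1 + \mathrm{Cost}(n_{\mathrm{no}})\bigr)
                  && \text{(choice node } c\text{)} \\
\mathrm{Cost}(n) &= \min_{c \,\in\, \mathrm{children}(n)} \mathrm{Cost}(c)
                  && \text{(relaxation node } n\text{)} \\
\mathrm{Cost}(\ell) &= 0
                  && \text{(leaf } \ell\text{)}
\end{align*}
```

For example, a choice node whose two children are both leaves costs 0.3 · (1 + 0) + 0.7 · (1 + 0) = 1 step.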

  20. Outline • Background • Query relaxation • Interactive query relaxation • User Model • Query Relaxation Tree • Problem definition • Solutions • Exact Solutions (FullTree and FastOpt) • Approximate Solution • Experimental Results • Conclusions

  21. Exact Solution (FullTree) • Input: query Q, database D • Output: optimal cost • Construct the Query Relaxation Tree • Compute the cost for each node (bottom-up) • Return the cost of the root
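
The bottom-up computation can be sketched on a hand-built tree. This is a toy illustration for the Dynamic goal only, assuming each proposed relaxation costs one step and that a relaxation node takes the minimum over its candidate proposals. The classes `Choice` and `Relaxation` and the function `cost` are hypothetical names, and the tree is given directly rather than constructed from a query and a database.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Choice:
    """One proposed relaxation with its yes/no outcomes."""
    p_yes: float          # probability that the user accepts the proposal
    yes: "Relaxation"     # subtree reached after a YES
    no: "Relaxation"      # subtree reached after a NO


@dataclass
class Relaxation:
    """A query state; it is a leaf when there is nothing left to propose."""
    choices: List[Choice] = field(default_factory=list)


def cost(node: Relaxation) -> float:
    """Expected number of steps from this node (Dynamic goal, assumed recurrence)."""
    if not node.choices:
        return 0.0  # leaf: non-empty or non-relaxable query
    return min(
        c.p_yes * (1 + cost(c.yes)) + (1 - c.p_yes) * (1 + cost(c.no))
        for c in node.choices
    )


leaf = Relaxation()
inner = Relaxation([Choice(0.3, leaf, leaf)])      # always ends after one more step
root = Relaxation([
    Choice(0.3, leaf, inner),                      # expected cost 0.3*1 + 0.7*2
    Choice(0.7, leaf, inner),                      # expected cost 0.7*1 + 0.3*2
])
print(cost(root))
```

The recursion visits every node, so its running time grows with the size of the full tree, which is what motivates the pruning and approximation strategies that follow.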

  22. FullTree Algorithm (Dynamic) • Figure: example tree annotated with node costs and branch probabilities 0.3 and 0.7

  23. Fast Solution (FastOpt) • Idea: prune the unpromising branches in advance and expand only the good ones • Associate an upper and a lower bound with each node • The upper bound is the cost of the node when the probability is 1 on the no nodes • The lower bound is the opposite of the upper bound (probability 1 on the yes nodes) • Remove a node if its lower bound is greater than the upper bound of some sibling node
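
The pruning rule in the last bullet can be sketched in isolation. The example assumes the Dynamic goal (lower cost is better) and takes the bounds as given; how they are computed is not shown. The function name `prune` and the sibling names are hypothetical.

```python
def prune(bounds):
    """Return the siblings that survive bound-based pruning.

    bounds maps a sibling name to a (lower, upper) pair of cost bounds.
    A sibling is dropped when its lower bound exceeds the upper bound of
    some other sibling, because that other sibling is then certainly cheaper.
    """
    kept = []
    for name, (lower, _) in bounds.items():
        best_other_upper = min(
            (upper for other, (_, upper) in bounds.items() if other != name),
            default=float("inf"),  # a node without siblings is never pruned
        )
        if lower <= best_other_upper:
            kept.append(name)
    return kept


siblings = {
    "remove DSL": (1.0, 2.0),
    "remove Alarm": (3.0, 5.0),   # lower bound 3.0 > upper bound 2.0 of "remove DSL"
    "remove Manual": (1.5, 4.0),
}
print(prune(siblings))
```

Only the surviving siblings need to be expanded further, which is where the savings over building the full tree come from.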

  24. FastOpt Algorithm (Dynamic) • Prune!!!

  25. Outline • Background • Query relaxation • Interactive query relaxation • User Model • Query Relaxation Tree • Problem definition • Solutions • Exact Solutions (FullTree and FastOpt) • Approximate Solution • Experimental Results • Conclusions

  26. Approximate Solution (CDR) • Idea: compute the cost distribution of each node and expand only the node that maximizes the probability that its cost is less than the cost of all its siblings • Associate a b-size histogram with each node • Construct the tree for the first L levels • Assign a uniform probability to the nodes • Use convolution to find the sum (choice nodes) / min (relaxation nodes) distributions of the costs • Expand the branch that has the highest probability of having the lowest cost

  27. Compute cost distributions • Query size 5 • Remember the cost formula • Choice node at level 2, cost uniformly distributed in [1, 3] • Compute relpref * (1 + cost(n'))

  28. Compute cost distributions • Probability distribution of n_yes • Probability distribution of n_yes • Sum the distributions of the yes child and the no child using sum-convolution
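
A sum-convolution of two discrete cost distributions can be sketched as follows, assuming the two children are independent and that each distribution is a small dictionary from cost value to probability (a stand-in for the b-size histograms). The function name `sum_convolution` is hypothetical.

```python
from collections import defaultdict


def sum_convolution(dist_a, dist_b):
    """Distribution of A + B for independent discrete variables A and B.

    Each distribution maps a cost value to its probability.
    """
    out = defaultdict(float)
    for a, pa in dist_a.items():
        for b, pb in dist_b.items():
            out[a + b] += pa * pb  # every pair (a, b) contributes to the sum a + b
    return dict(out)


yes_child = {1: 0.5, 2: 0.5}   # cost contribution of the yes branch
no_child = {1: 0.5, 2: 0.5}    # cost contribution of the no branch
print(sum_convolution(yes_child, no_child))
```

The work is quadratic in the number of buckets, so keeping the histograms small keeps each step cheap.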

  29. Compute Cost Distributions • Compute the min-convolution of the children of the relaxation node
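
The min-convolution can be sketched the same way: it gives the distribution of the smaller of two costs. The example assumes independent children and dictionary-based distributions from cost value to probability; the name `min_convolution` is hypothetical.

```python
from collections import defaultdict
from functools import reduce


def min_convolution(dist_a, dist_b):
    """Distribution of min(A, B) for independent discrete variables A and B."""
    out = defaultdict(float)
    for a, pa in dist_a.items():
        for b, pb in dist_b.items():
            out[min(a, b)] += pa * pb  # the cheaper of the two children wins
    return dict(out)


children = [
    {1: 0.5, 2: 0.5},
    {1: 0.5, 2: 0.5},
    {2: 1.0},
]
# Fold the children pairwise to get the distribution at the relaxation node.
node_dist = reduce(min_convolution, children)
print(node_dist)
```

Folding pairwise works because the minimum is associative, so a relaxation node with many children needs only repeated two-way convolutions.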

  30. Choose the Branch to Expand • Idea: for each child of the root, compute the probability that its cost is smaller than that of its siblings, and choose the child with the highest probability • Expand this! • Pr(n2 < n1) = 0.4 • Pr(n1 < n2) = 0.6 • n1, n2
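
The comparison between two siblings can be sketched as below, assuming independent discrete cost distributions. The two distributions are invented so that the result matches the 0.6 / 0.4 figures on the slide; they are not taken from the talk, and `prob_less` is a hypothetical name.

```python
def prob_less(dist_a, dist_b):
    """Pr(A < B) for independent discrete cost distributions A and B.

    Each distribution maps a cost value to its probability.
    """
    return sum(
        pa * pb
        for a, pa in dist_a.items()
        for b, pb in dist_b.items()
        if a < b
    )


n1 = {1: 0.6, 3: 0.4}   # made-up cost distribution for child n1
n2 = {2: 1.0}           # made-up cost distribution for child n2

p12 = prob_less(n1, n2)
p21 = prob_less(n2, n1)
# Expand the child that is more likely to be the cheaper one.
best = "n1" if p12 >= p21 else "n2"
print(p12, p21, best)
```

With more than two siblings, the same idea applies by comparing each child against the min-convolution of the others.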

  31. Outline • Background • Query relaxation • Interactive query relaxation • User Model • Query Relaxation Tree • Problem definition • Solutions • Exact Solutions (FullTree and FastOpt) • Approximate Solution • Experimental Results • Conclusions

  32. Experimental Setup • Datasets: • US Home dataset: 38k tuples, 18 attributes • Car dataset: 100k tuples, 31 attributes • Synthetic datasets: 20k to 500k tuples • Baseline algorithms: • Query refinement algorithm [Mishra09] (QueryRef) • Random relaxation • Greedy: choose the first non-empty relaxation, otherwise random

  33. Experimental Setup • Effectiveness: • Query time • Size of the tree (number of nodes) • Cost of the root (expected number of steps) • CDR calibration: • Impact of L and number of buckets • User study: • 125 users with Mechanical Turk • Random queries with 4-8 attributes • Evaluation of the usefulness of the system

  34. Root Cost • CDR is close to optimal • QueryRef is 30% worse on average • Random is 150% worse than FullTree

  35. Goal comparison • All the objective functions correctly optimize their goals • Dynamic and Semi-Dynamic are very similar in performance

  36. Query Time • Exponential behaviour • Efficient for small queries • 1.4 sec for query size 10

  37. User Study Q1 - Rate the suggested refinements Q2 - Did you like the system guiding you? Q3 - Did the system help you arrive to the results fast? Q4 - Did you prefer using the help of this system to relaxing the query by yourself?

  38. Conclusions • We propose • a novel principled, user-centric and interactive approach for the empty-answer problem • two exact algorithms and an approximate algorithm • We show that • the framework can deal with the combinatorial explosion • the user effort is minimized • the user is generally satisfied by the system

  39. Bibliography • [Mishra09] C. Mishra and N. Koudas, "Interactive query refinement," in EDBT, 2009. • [Roy08] S. Basu Roy, H. Wang, G. Das, U. Nambiar, and M. Mohania, "Minimum-effort driven dynamic faceted search in structured databases," in CIKM, 2008. • [Chaudhuri04] S. Chaudhuri, G. Das, V. Hristidis, and G. Weikum, "Probabilistic ranking of database query results," in VLDB, 2004. • [Baeza11] R. A. Baeza-Yates and B. A. Ribeiro-Neto, Modern Information Retrieval, 2011.

  40. Cost Probability • Probability that the cost of n1 is less than the cost of n2 • Relaxation of the root
