A Probabilistic Optimization Framework for the Empty-Answer Problem
Davide Mottin, Alice Marascu, Senjuti Basu Roy, Gautam Das, Themis Palpanas, Yannis Velegrakis
Talk by Davide Mottin at Yahoo! Research Barcelona
Who am I?
• Born in Marostica
• Live in Trento
• I'm a member of the dbTrento group at the University of Trento
• Advisors: Themis Palpanas and Yannis Velegrakis
Empty-Answer Problem
[Figure: the query {Alarm, DSL, Manual} posed to the CAR DB returns no answer: {}]
Issues
• Users need a product matching their preferences
• It is difficult to propose an approximate answer close to the user's needs
• The system does not provide sufficient help
Outline
• Background
• Query relaxation
• Interactive query relaxation
• User Model
• Query Relaxation Tree
• Problem definition
• Solutions
• Exact Solutions
• Approximate Solution
• Experimental Results
• Conclusions
Existing Solutions
• Ranking function
  • Propose ranked results that are close to the user preferences
  • Both IR [Baeza11] and database solutions [Chaudhuri04]
• Query relaxation
  • Remove or change one of the conditions in the user query [Mishra09]
Query Relaxation
[Figure: the query {Alarm, DSL, Manual} against the CAR DB, with empty result {}]
How Many Relaxations?
Exponential in the size of the query
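To make the exponential growth concrete, here is a minimal sketch that enumerates candidate relaxations. It assumes a relaxation simply drops a non-empty subset of the query conditions (the talk also allows changing a condition, which would only add more candidates); the function name `relaxations` is illustrative.

```python
from itertools import combinations

def relaxations(conditions):
    """Yield every non-empty subset of conditions that could be dropped."""
    for k in range(1, len(conditions) + 1):
        for dropped in combinations(conditions, k):
            yield dropped

query = ["Alarm", "DSL", "Manual"]
all_relaxations = list(relaxations(query))
# A query with n conditions has 2**n - 1 such relaxations (7 for n = 3).
print(len(all_relaxations))
```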
Challenges
• Too many relaxations proposed
• Lack of a principled method to propose the next relaxation
• Exploring all the relaxations is impractical
• Limited user interaction with the system
• Lack of a user-centric model and motivation for a refinement
Solution: Interactive Query Relaxation
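Interactive Query Relaxation
[Figure: for the query {Alarm, DSL, Manual} on the CAR DB, the system asks "Remove DSL?" and "Remove Alarm?"; the user answers YES or NO; the outcomes shown are the result {Askari, A10, …} and the empty answer {}]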
Applications
• Small mobile
• Hand-held devices
• Interaction with an agent via telephone
• Reservations in restaurants
• …
Query Relaxation Tree
[Figure: tree with the annotations "Relax DSL?", "DSL is not relaxable anymore", and "Relax Alarm?"]
Problem: find the optimal path that maximizes or minimizes some user-centric quantity
Query Relaxation Tree
• Nodes represent
  • the next relaxation proposed (relaxation nodes)
  • Yes/No user choices (choice nodes)
• Leaves represent
  • a non-relaxable query
  • a non-empty query
• Choice branches have probabilities relprefyes and relprefno
• A refused relaxation cannot be relaxed further (hard constraint)
• For each node we compute a cost that depends on the optimization adopted (Dynamic, Semi-Dynamic, Static)
User-Centric Model
• Prior(t, Q, Q')
  • user knowledge about the existence of a tuple t satisfying the relaxed query Q'
• Pref(t, Q') [preference function]
  • user preference about a tuple given the query Q'
[Figure: a tuple t, the relaxed query Q', and the DB]
User-Centric Model
Q: What is the probability that the user says NO to a relaxation?
A: It is the probability that the user doesn't like any of the tuples that satisfy the relaxed query Q'
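One way to turn this answer into a number is sketched below. It is an assumption, not a formula taken from the slides: the user is taken to like a tuple with probability Prior × Pref, independently across tuples, so the probability of a NO is the product of the per-tuple "dislike" probabilities. The names `rel_pref_no`, `prior`, and `pref` are illustrative.

```python
def rel_pref_no(tuples, prior, pref):
    """Probability that the user likes none of the tuples satisfying Q'.

    Assumes (hypothetically) that a tuple is liked with probability
    prior(t) * pref(t), independently of the other tuples.
    """
    p_no = 1.0
    for t in tuples:
        p_no *= 1.0 - prior(t) * pref(t)
    return p_no

# Toy example: two tuples satisfy the relaxed query Q'.
prefs = {"t1": 0.8, "t2": 0.4}
p_no = rel_pref_no(["t1", "t2"], prior=lambda t: 0.5, pref=prefs.get)
p_yes = 1.0 - p_no
```

Under this assumption, relprefyes is simply 1 minus relprefno.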
Problem Definition
Given a query Q and a database D, find the sequence of relaxations in the Query Relaxation Tree that achieves one of three separate goals:
• Minimize the number of relaxations (Dynamic)
• Maximize the user satisfaction (Semi-Dynamic)
• Maximize some profit/benefit (Static)
Cost function
[Equation: recursive cost of a node, using the operator optimize]
where optimize = Min if the goal is Dynamic (minimum number of steps) and Max otherwise
• Cost of a leaf:
  • 0 for Dynamic (minimize the number of steps)
  • Max preference (using Pref) of the tuples for Semi-Dynamic
  • Max value (e.g. price, revenue) of the tuples for Static
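For the Dynamic goal, a plausible form of the recursion can be pieced together from the rest of the talk (sum at choice nodes, min at relaxation nodes, and the term relpref * (1 + cost(n')) used later for the cost distributions). The block below is a reconstruction under those assumptions, not the formula from the slide; n_yes and n_no denote the children reached by a yes and a no answer.

```latex
\mathrm{Cost}(n) =
\begin{cases}
0 & \text{if } n \text{ is a leaf} \\[4pt]
\min\limits_{c \,\in\, \mathrm{children}(n)} \mathrm{Cost}(c) & \text{if } n \text{ is a relaxation node} \\[4pt]
\mathrm{relpref}_{yes}\,\bigl(1 + \mathrm{Cost}(n_{yes})\bigr) + \mathrm{relpref}_{no}\,\bigl(1 + \mathrm{Cost}(n_{no})\bigr) & \text{if } n \text{ is a choice node}
\end{cases}
```

As a sanity check, a choice node with probabilities 0.3 and 0.7 and two leaf children costs 0.3 · (1 + 0) + 0.7 · (1 + 0) = 1.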
Outline
• Background
• Query relaxation
• Interactive query relaxation
• User Model
• Query Relaxation Tree
• Problem definition
• Solutions
• Exact Solutions (FullTree and FastOpt)
• Approximate Solution
• Experimental Results
• Conclusions
Exact Solution (FullTree)
Input: query Q, database D
Output: optimal cost
• Construct the Query Relaxation Tree
• Compute the cost for each node (bottom-up)
• Return the cost of the root
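The bottom-up computation can be sketched as a recursion over an already built tree. This is a minimal illustration for the Dynamic goal only, assuming each step costs 1, choice nodes take the expectation over the yes/no branches, and relaxation nodes take the minimum over their options. The classes `Leaf`, `Choice`, and `Relaxation` are hypothetical, not the authors' data structures.

```python
from dataclasses import dataclass
from typing import List, Union

@dataclass
class Leaf:
    cost: float = 0.0          # a leaf costs 0 under the Dynamic goal

@dataclass
class Choice:
    p_yes: float               # probability that the user accepts (relprefyes)
    yes: "Node"
    no: "Node"

@dataclass
class Relaxation:
    options: List[Choice]      # relaxations that can be proposed next

Node = Union[Leaf, Choice, Relaxation]

def full_tree_cost(node):
    """Expected number of steps (Dynamic goal), computed bottom-up."""
    if isinstance(node, Leaf):
        return node.cost
    if isinstance(node, Choice):
        p_no = 1.0 - node.p_yes
        return (node.p_yes * (1 + full_tree_cost(node.yes))
                + p_no * (1 + full_tree_cost(node.no)))
    # Relaxation node: propose the option with the smallest expected cost.
    return min(full_tree_cost(c) for c in node.options)

inner = Relaxation([Choice(0.3, Leaf(), Leaf())])
root = Relaxation([
    Choice(0.3, Leaf(), inner),    # expected cost 0.3*1 + 0.7*2 = 1.7
    Choice(0.7, Leaf(), Leaf()),   # expected cost 1
])
root_cost = full_tree_cost(root)   # about 1.0: the second option wins
```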
FullTree Algorithm (Dynamic)
[Figure: example tree annotated with node costs (0 at the leaves, 1 and 2 at inner nodes) and branch probabilities 0.3 and 0.7]
Fast Solution (FastOpt)
Idea: prune the unpromising branches in advance and expand only the good ones
• Associate an upper and a lower bound with each node
• The upper bound is the cost of the node when the probability is 1 on the no nodes
• The lower bound is the opposite case (probability 1 on the yes nodes)
• Remove a node if its lower bound is greater than the upper bound of some sibling node
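The pruning rule in the last bullet can be sketched on its own, independently of how the bounds are obtained. The snippet assumes a minimization goal (Dynamic) and that each sibling already carries a (lower, upper) pair; the function name `prune` and the example bounds are illustrative.

```python
def prune(bounds):
    """Return the sibling names that survive pruning.

    bounds maps a node name to its (lower, upper) cost bounds.
    A node is dropped when its lower bound exceeds the smallest
    upper bound among the siblings, since it can never be the minimum.
    """
    best_upper = min(upper for _, upper in bounds.values())
    return [name for name, (lower, _) in bounds.items() if lower <= best_upper]

siblings = {"a": (1, 2), "b": (3, 4), "c": (2, 5)}
survivors = prune(siblings)   # "b" is pruned: its lower bound 3 exceeds 2
```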
FastOpt Algorithm (Dynamic)
[Figure: example tree in which a branch is pruned]
Approximate Solution (CDR)
Idea: compute the cost distribution of each node and expand only the node that maximizes the probability that its cost is less than the cost of all its siblings.
• Associate a b-size histogram with each node
• Construct the tree for the first L levels
• Assign a uniform probability to the nodes
• Use convolution to find the sum (choice nodes) / min (relaxation nodes) distributions of the costs
• Expand the branch that has the highest probability of having the lowest cost
Compute Cost Distributions
Query size 5. Remember the cost formula.
Choice node at level 2, cost uniformly distributed in [1,3]
Compute relpref * (1 + cost(n'))
Compute Cost Distributions
[Figure: probability distributions of the yes child and the no child]
Sum the distributions of the yes child and the no child using sum-convolution
Compute Cost Distributions
Compute the min-convolution of the children of the relaxation node
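The distribution steps can be sketched with small discrete distributions. This sketch swaps the b-size histograms for exact value-to-probability dictionaries and assumes the child costs are independent. The helper names (`scale_shift`, `sum_convolution`, `min_convolution`) are illustrative.

```python
from collections import defaultdict

def scale_shift(dist, relpref, step=1.0):
    """Distribution of relpref * (step + X) for X distributed as dist."""
    out = defaultdict(float)
    for x, px in dist.items():
        out[relpref * (step + x)] += px
    return dict(out)

def sum_convolution(d1, d2):
    """Distribution of X + Y (choice node: yes term plus no term)."""
    out = defaultdict(float)
    for x, px in d1.items():
        for y, py in d2.items():
            out[x + y] += px * py
    return dict(out)

def min_convolution(d1, d2):
    """Distribution of min(X, Y) (relaxation node: best of its children)."""
    out = defaultdict(float)
    for x, px in d1.items():
        for y, py in d2.items():
            out[min(x, y)] += px * py
    return dict(out)

# Cost uniformly distributed on {1, 2, 3}, as in the level-2 example.
uniform = {1: 1 / 3, 2: 1 / 3, 3: 1 / 3}
best_of_two = min_convolution(uniform, uniform)
coin = {0: 0.5, 1: 0.5}
two_coins = sum_convolution(coin, coin)
```

With real histograms the same operations would be applied bucket by bucket, at the price of some approximation error.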
Choose the Branch to Expand
Idea: for each child of the root, compute the probability that its cost is smaller than that of its siblings, and choose the child with the highest probability
[Figure: root with children n1 and n2; Pr(n1 < n2) = 0.6, Pr(n2 < n1) = 0.4; "Expand this!" marks the chosen branch]
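The selection step can be sketched as follows, again assuming independent discrete cost distributions stored as value-to-probability dictionaries. The two toy distributions were picked to reproduce the 0.6 / 0.4 split from the slide and are not the talk's data; `prob_cheapest` and `choose_branch` are illustrative names.

```python
def prob_cheapest(name, children):
    """P(cost of `name` is strictly smaller than every sibling's cost)."""
    total = 0.0
    for x, px in children[name].items():
        p_all_greater = 1.0
        for other, dist in children.items():
            if other != name:
                p_all_greater *= sum(py for y, py in dist.items() if y > x)
        total += px * p_all_greater
    return total

def choose_branch(children):
    """Pick the child most likely to have the lowest cost."""
    return max(children, key=lambda n: prob_cheapest(n, children))

children = {
    "n1": {1: 0.6, 3: 0.4},   # hypothetical cost distribution of n1
    "n2": {2: 1.0},           # hypothetical cost distribution of n2
}
p_n1 = prob_cheapest("n1", children)   # Pr(n1 < n2)
p_n2 = prob_cheapest("n2", children)   # Pr(n2 < n1)
chosen = choose_branch(children)
```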
Experimental Setup
• Datasets:
  • US Home dataset: 38k tuples, 18 attributes
  • Car dataset: 100k tuples, 31 attributes
  • Synthetic datasets: 20k to 500k tuples
• Baseline algorithms:
  • Query refinement algorithm [Mishra09] (QueryRef)
  • Random relaxation
  • Greedy: choose the first non-empty relaxation, otherwise a random one
Experimental Setup
• Effectiveness:
  • Query time
  • Size of the tree (number of nodes)
  • Cost of the root (expected number of steps)
• CDR calibration:
  • Impact of L and of the number of buckets
• User study:
  • 125 users with Mechanical Turk
  • Random queries with 4-8 attributes
  • Evaluation of the usefulness of the system
Root Cost
• CDR is close to optimal
• QueryRef is 30% worse on average
• Random is 150% worse than FullTree
Goal Comparison
• All the objective functions correctly optimize their goals
• Dynamic and Semi-Dynamic are very similar in performance
Query Time
Exponential behaviour
Efficient for small queries
1.4 sec for query size 10
User Study
Q1 - Rate the suggested refinements
Q2 - Did you like the system guiding you?
Q3 - Did the system help you arrive at the results fast?
Q4 - Did you prefer using the help of this system to relaxing the query by yourself?
Conclusions
• We propose
  • a novel, principled, user-centric and interactive approach for the empty-answer problem
  • two exact algorithms and an approximate algorithm
• We show that
  • the framework can deal with the combinatorial explosion
  • the user effort is minimized
  • the user is generally satisfied by the system
Bibliography
[Mishra09] C. Mishra and N. Koudas, “Interactive query refinement,” in EDBT, 2009.
[Roy08] S. Basu Roy, H. Wang, G. Das, U. Nambiar, and M. Mohania, “Minimum-effort driven dynamic faceted search in structured databases,” in CIKM, 2008.
[Chaudhuri04] S. Chaudhuri, G. Das, V. Hristidis, and G. Weikum, “Probabilistic ranking of database query results,” in VLDB, 2004.
[Baeza11] R. A. Baeza-Yates and B. A. Ribeiro-Neto, Modern Information Retrieval, 2011.
Cost Probability
[Figure: probability that the cost of n1 is less than the cost of n2, for the relaxation of the root]