A Probabilistic Optimization Framework for the Empty-Answer Problem

Davide Mottin, Alice Marascu, Senjuti Basu Roy, Gautam Das, Themis Palpanas, Yannis Velegrakis
Talk by Davide Mottin at Yahoo! Research Barcelona

Who am I?
• Born in Marostica
• Live in Trento
• I'm a member of the dbTrento group at the University of Trento
• Advisors: Themis Palpanas and Yannis Velegrakis

Empty-Answer Problem

Empty-Answer Problem
[Figure: a query for a car with Alarm, DSL, Manual on the CAR DB returns no answer: {}]

Issues
• Users need a product matching their preferences
• It is difficult to propose an approximate answer close to the user's needs
• The system does not provide sufficient help

Outline
• Background
• Query relaxation
• Interactive query relaxation
• User Model
• Query Relaxation Tree
• Problem definition
• Solutions
• Exact Solutions
• Approximate Solution
• Experimental Results
• Conclusions

Existing Solutions
• Ranking function
  • Propose ranked results that are close to the user's preferences
  • Both IR [Baeza11] and database solutions [Chaudhuri04]
• Query relaxation
  • Remove or change one of the conditions in the user query [Mishra09]

Query Relaxation
[Figure: relaxing the query Alarm, DSL, Manual on the CAR DB, which returns {}]

How Many Relaxations?
Exponential in the size of the query
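To make the growth concrete, here is a minimal Python sketch that enumerates every relaxation obtained by removing at least one condition from a query. It models a query as a set of conditions and only considers removal (not changing a condition, which [Mishra09] also allows); the function name `relaxations` is hypothetical.

```python
from itertools import combinations


def relaxations(query):
    """Yield every query obtained by dropping at least one condition.

    A query is modelled as a set of conditions; only removal is considered.
    """
    conds = sorted(query)
    # Keep k of the n conditions, for k = n-1 down to 0 (the empty query).
    for k in range(len(conds) - 1, -1, -1):
        for kept in combinations(conds, k):
            yield frozenset(kept)


car_query = {"Alarm", "DSL", "Manual"}
print(len(list(relaxations(car_query))))  # 7 = 2**3 - 1
```

A query with n conditions has 2^n - 1 such relaxations, so a query with 10 conditions already has 1023.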

Challenges
• Too many relaxations are proposed
• Lack of a principled method to propose the next relaxation
• Exploring all the relaxations is impractical
• Limited user interaction with the system
• Lack of a user-centric model and of a motivation for each refinement
Solution: Interactive Query Relaxation

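Interactive Query Relaxation
[Figure: the query Alarm, DSL, Manual on the CAR DB returns {}; the system asks "Remove DSL?" and "Remove Alarm?", the user answers YES or NO, and the session ends with the result {Askari, A10, …}]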

Applications
• Small mobile and hand-held devices
• Interaction with an agent via telephone
• Restaurant reservations
• …

Outline
• Background
• Query relaxation
• Interactive query relaxation
• User Model
• Query Relaxation Tree
• Problem definition
• Solutions
• Exact Solutions
• Approximate Solution
• Experimental Results
• Conclusions

Query Relaxation Tree
[Figure: the system asks "Relax DSL?"; after a NO, DSL is not relaxable anymore and the system asks "Relax Alarm?"]
Problem: find the optimal path that maximizes or minimizes some user-centric quantity

Query Relaxation Tree
• Nodes represent
  • the next relaxation proposed (relaxation nodes)
  • the user's Yes/No choices (choice nodes)
• Leaves represent
  • a non-relaxable query
  • a non-empty query
• Choice branches have the probabilities relpref_yes and relpref_no
• A refused relaxation cannot be relaxed further (it becomes a hard constraint)
• For each node we compute a cost that depends on the optimization goal adopted (Dynamic, Semi-Dynamic, Static)
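The tree semantics can be illustrated by walking one root-to-leaf path for a scripted user: a YES removes the condition, a NO turns it into a hard constraint. This is a minimal Python sketch; the toy database, the `session` function, and the alphabetical choice of the next question are hypothetical, whereas the framework chooses the question by optimizing node costs.

```python
# Hypothetical toy CAR DB: each tuple is the set of features it offers.
DB = [frozenset({"Alarm", "Manual"}), frozenset({"DSL"})]


def answers(query):
    """Tuples that satisfy every condition of the query."""
    return [t for t in DB if query <= t]


def session(query, user_says_yes):
    """Follow one path of the query relaxation tree.

    user_says_yes(cond) simulates the user's reply to "Relax cond?".
    Returns (final query, its answers, number of questions asked).
    """
    query, hard, steps = frozenset(query), frozenset(), 0
    while not answers(query):
        relaxable = sorted(query - hard)
        if not relaxable:
            break  # leaf: every remaining condition is a hard constraint
        cond = relaxable[0]  # hypothetical policy: ask in alphabetical order
        steps += 1
        if user_says_yes(cond):
            query = query - {cond}  # YES branch: drop the condition
        else:
            hard = hard | {cond}  # NO branch: it can no longer be relaxed
    return query, answers(query), steps


# A user who refuses to give up DSL but accepts every other relaxation.
final, result, asked = session({"Alarm", "DSL", "Manual"}, lambda c: c != "DSL")
```

Here the user is asked three questions (Alarm: YES, DSL: NO, Manual: YES) and ends with the non-empty query {DSL}.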

User-Centric Model
• Prior(t, Q, Q')
  • the user's knowledge about the existence of a tuple t satisfying the relaxed query Q'
• Pref(t, Q') [preference function]
  • the user's preference for a tuple t given the query Q'
[Figure: a tuple t in the DB that may satisfy Q']

User-Centric Model
Q: What is the probability that the user says NO to a relaxation?
A: The probability that the user doesn't like any of the tuples that satisfy the relaxed query Q'
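If we additionally assume that the user judges each tuple independently (our assumption, not stated on the slide), the NO probability is the product of the per-tuple "dislike" probabilities. A minimal Python sketch, where `pref_values` stands for hypothetical Pref(t, Q') scores of the tuples in the answer of Q' and the Prior term is left out:

```python
from math import prod


def relpref_no(pref_values):
    """Pr(user says NO to Q'): the user likes none of the tuples in Q'(D).

    Assumes independent per-tuple preferences in [0, 1]; Prior is ignored.
    """
    return prod(1.0 - p for p in pref_values)


def relpref_yes(pref_values):
    """Pr(user says YES to Q'), the complement of relpref_no."""
    return 1.0 - relpref_no(pref_values)


# Hypothetical Pref scores of the two tuples returned by a relaxed query.
no = relpref_no([0.5, 0.2])
yes = relpref_yes([0.5, 0.2])
```

With Pref scores 0.5 and 0.2, Pr(NO) = 0.5 × 0.8 = 0.4 and Pr(YES) = 0.6; a relaxed query with no answers gives Pr(NO) = 1.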

Problem Definition
Given a query Q and a database D, find the sequence of relaxations in the Query Relaxation Tree that (three separate goals):
• Minimizes the number of relaxations (Dynamic)
• Maximizes the user's satisfaction (Semi-Dynamic)
• Maximizes some profit or benefit (Static)

Cost function, where optimize = min if the goal is Dynamic (minimum number of steps) and max otherwise
• Cost of a leaf:
  • 0 for Dynamic (minimize the number of steps)
  • the maximum preference (using Pref) among the tuples for Semi-Dynamic
  • the maximum value (e.g., price, revenue) among the tuples for Static
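A plausible form of the cost function, reconstructed from the relpref * (1 + cost(n')) step and the sum/min rules used later by CDR, is given below. Treat it as an assumption, especially the 1 + step term, which fits the Dynamic goal but may differ for Semi-Dynamic and Static; leaf(n) denotes the leaf cost listed above.

```latex
\mathit{cost}(n) =
\begin{cases}
\mathit{leaf}(n) & \text{if } n \text{ is a leaf},\\[2pt]
\displaystyle\sum_{n' \in \{n_{yes},\, n_{no}\}} \mathit{relpref}_{n'}\,\bigl(1 + \mathit{cost}(n')\bigr) & \text{if } n \text{ is a choice node},\\[2pt]
\displaystyle\operatorname*{optimize}_{n' \in \mathit{children}(n)} \mathit{cost}(n') & \text{if } n \text{ is a relaxation node}.
\end{cases}
```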

Outline
• Background
• Query relaxation
• Interactive query relaxation
• User Model
• Query Relaxation Tree
• Problem definition
• Solutions
• Exact Solutions (FullTree and FastOpt)
• Approximate Solution
• Experimental Results
• Conclusions

Exact Solution (FullTree)
Input: query Q, database D
Output: optimal cost
• Construct the Query Relaxation Tree
• Compute the cost of each node (bottom-up)
• Return the cost of the root
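A minimal Python sketch of the FullTree idea for the Dynamic goal on a toy database. It assumes a hypothetical constant `P_YES` for every relaxation instead of the user model, explores the tree implicitly through memoized recursion rather than materializing it, scores a choice node as P_YES·(1 + cost(yes)) + (1 - P_YES)·(1 + cost(no)), takes the min at relaxation nodes, and gives leaves cost 0.

```python
from functools import lru_cache

# Hypothetical toy CAR DB: each tuple is the set of features it offers.
DB = [
    frozenset({"Alarm", "Manual"}),
    frozenset({"DSL"}),
    frozenset({"Alarm", "DSL", "Automatic"}),
]
P_YES = 0.5  # hypothetical: probability that the user accepts any relaxation


def is_empty(query):
    """True if no tuple satisfies every condition of the query."""
    return not any(query <= t for t in DB)


@lru_cache(maxsize=None)
def relaxation_cost(query, hard):
    """Expected number of questions still needed (Dynamic goal)."""
    relaxable = query - hard
    if not is_empty(query) or not relaxable:
        return 0.0  # leaf: non-empty answer or non-relaxable query
    # Relaxation node: propose the relaxation with the minimum cost.
    return min(choice_cost(query, hard, c) for c in sorted(relaxable))


def choice_cost(query, hard, cond):
    """Choice node for "Relax cond?": expectation over the YES/NO branches."""
    yes = relaxation_cost(query - {cond}, hard)  # condition dropped
    no = relaxation_cost(query, hard | {cond})  # condition becomes hard
    return P_YES * (1 + yes) + (1 - P_YES) * (1 + no)


def full_tree(query):
    """Cost of the root; child costs are aggregated bottom-up."""
    return relaxation_cost(frozenset(query), frozenset())


def best_question(query):
    """First relaxation to propose: the root child with the lowest cost."""
    q = frozenset(query)
    return min(sorted(q), key=lambda c: choice_cost(q, frozenset(), c))


print(full_tree({"Alarm", "DSL", "Manual"}), best_question({"Alarm", "DSL", "Manual"}))
# 1.75 DSL
```

For this database, asking about DSL (or Manual) first gives an expected 1.75 questions, while asking about Alarm first costs 2.5 because dropping Alarm still leaves an empty answer.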

FullTree Algorithm (Dynamic)
[Figure: example tree whose node costs are computed bottom-up; leaves have cost 0 and choice branches have probabilities 0.3 and 0.7]

Fast Solution (FastOpt)
Idea: prune unpromising branches in advance and expand only the good ones
• Associate an upper and a lower bound with each node
• The upper bound is the cost of the node when the probability of every NO branch is 1
• The lower bound is the opposite: the probability of every YES branch is 1
• Remove a node if its lower bound is greater than the upper bound of some sibling node
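The pruning rule can be sketched in Python for a minimization goal such as Dynamic. Each candidate child comes with hypothetical precomputed (lower, upper) bounds; computing the bounds themselves (all answers YES vs. all answers NO) is left out, and the name `prune` is ours.

```python
def prune(children):
    """Keep only the children that could still be optimal (minimization).

    children: list of (name, lower_bound, upper_bound) with lower <= upper.
    A child is discarded when its lower bound exceeds the upper bound of
    some sibling, i.e. the smallest upper bound among all children.
    """
    best_upper = min(upper for _, _, upper in children)
    return [name for name, lower, _ in children if lower <= best_upper]


# Hypothetical bounds for three candidate relaxations of the root.
print(prune([("Alarm", 2.0, 5.0), ("DSL", 1.0, 2.0), ("Manual", 3.0, 4.0)]))
# ['Alarm', 'DSL']
```

Including a child's own upper bound in the minimum is harmless, since its lower bound never exceeds its own upper bound.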

FastOpt Algorithm (Dynamic)
[Figure: a branch whose lower bound exceeds a sibling's upper bound is pruned]

Approximate Solution (CDR)
Idea: compute the cost distribution of each node and expand only the node that maximizes the probability that its cost is less than the cost of all its siblings
• Associate a histogram with b buckets with each node
• Construct the tree for the first L levels
• Assign a uniform cost distribution to the unexpanded nodes
• Use convolution to compute the cost distributions: sum-convolution for choice nodes, min-convolution for relaxation nodes
• Expand the branch that has the highest probability of having the lowest cost

Compute Cost Distributions
Query size 5. Recall the cost formula.
Choice node at level 2, with cost uniformly distributed in [1, 3]
Compute relpref * (1 + cost(n'))
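As a rough Python sketch, a cost distribution can be kept as a dictionary from cost value to probability. Here the uniform [1, 3] frontier distribution is discretized into b equally likely points (our simplification of a b-bucket histogram), and each point c is mapped to relpref * (1 + c). The helper names and the value relpref = 0.3 are hypothetical.

```python
def uniform(lo, hi, b):
    """b equally likely cost values spread evenly over [lo, hi] (b >= 2)."""
    step = (hi - lo) / (b - 1)
    return {lo + i * step: 1.0 / b for i in range(b)}


def branch_term(dist, relpref):
    """Distribution of relpref * (1 + C), where C follows `dist`."""
    out = {}
    for cost, p in dist.items():
        value = relpref * (1 + cost)
        out[value] = out.get(value, 0.0) + p
    return out


frontier = uniform(1, 3, 5)            # costs 1.0, 1.5, 2.0, 2.5, 3.0
yes_term = branch_term(frontier, 0.3)  # values from about 0.6 to about 1.2
```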

Compute Cost Distributions
[Figures: probability distributions of n_yes and n_no]
Sum the distributions of the yes child and the no child using sum-convolution
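Assuming the YES and NO terms are independent (an assumption of this sketch), the sum-convolution of two discrete distributions can be written in Python as follows; re-bucketing the result back to b buckets is omitted.

```python
def sum_convolution(d1, d2):
    """Distribution of X + Y for independent X ~ d1 and Y ~ d2 (value -> probability)."""
    out = {}
    for x, px in d1.items():
        for y, py in d2.items():
            out[x + y] = out.get(x + y, 0.0) + px * py
    return out


yes_term = {1: 0.5, 2: 0.5}  # hypothetical distribution of the YES term
no_term = {1: 0.5, 3: 0.5}   # hypothetical distribution of the NO term
choice = sum_convolution(yes_term, no_term)
print(choice)  # {2: 0.25, 4: 0.25, 3: 0.25, 5: 0.25}
```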

Compute Cost Distributions
Compute the min-convolution of the children of the relaxation node
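The min-convolution follows the same pattern, keeping the smaller of the two costs; folding it over all children with `functools.reduce` gives the distribution of a relaxation node. This Python sketch assumes independent children and the Dynamic goal (for max goals, replace `min` with `max`); the distributions are hypothetical.

```python
from functools import reduce


def min_convolution(d1, d2):
    """Distribution of min(X, Y) for independent X ~ d1 and Y ~ d2."""
    out = {}
    for x, px in d1.items():
        for y, py in d2.items():
            m = min(x, y)
            out[m] = out.get(m, 0.0) + px * py
    return out


# Hypothetical cost distributions of three children of a relaxation node.
children = [{1: 0.5, 3: 0.5}, {2: 1.0}, {4: 1.0}]
node = reduce(min_convolution, children)
print(node)  # {1: 0.5, 2: 0.5}
```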

Choose the Branch to Expand
Idea: for each child of the root, compute the probability that its cost is smaller than the cost of its siblings, and choose the child with the highest probability
[Figure: root with children n1 and n2; Pr(n1 < n2) = 0.6 and Pr(n2 < n1) = 0.4, so n1 is expanded]
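The selection step can be sketched in Python by computing, for each child, the probability that its cost is strictly smaller than every sibling's, assuming independent children. The distributions below are hypothetical and chosen only to reproduce the slide's 0.6 / 0.4 split.

```python
def prob_smallest(dists, i):
    """Pr(cost of child i < cost of every sibling), children independent."""
    total = 0.0
    for a, pa in dists[i].items():
        term = pa
        for j, d in enumerate(dists):
            if j != i:
                term *= sum(pb for b, pb in d.items() if b > a)
        total += term
    return total


def branch_to_expand(dists):
    """Index of the child most likely to have the lowest cost."""
    return max(range(len(dists)), key=lambda i: prob_smallest(dists, i))


# Hypothetical cost distributions of the root's children n1 and n2.
n1 = {1: 0.6, 3: 0.4}
n2 = {2: 1.0}
print(prob_smallest([n1, n2], 0), prob_smallest([n1, n2], 1))  # 0.6 0.4
print(branch_to_expand([n1, n2]))  # 0 (expand n1)
```

Because the comparison is strict, the probabilities of all children sum to 1 only when ties have probability 0, as in this example.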

Experimental Setup
• Datasets:
  • US Home dataset: 38k tuples, 18 attributes
  • Car dataset: 100k tuples, 31 attributes
  • Synthetic datasets: 20k to 500k tuples
• Baseline algorithms:
  • Query refinement algorithm [Mishra09] (QueryRef)
  • Random relaxation
  • Greedy: choose the first relaxation that gives a non-empty answer, otherwise a random one

Experimental Setup
• Effectiveness:
  • Query time
  • Size of the tree (number of nodes)
  • Cost of the root (expected number of steps)
• CDR calibration:
  • Impact of L and of the number of buckets
• User study:
  • 125 users on Mechanical Turk
  • Random queries with 4 to 8 attributes
  • Evaluation of the system's usefulness

Root Cost
• CDR is close to optimal
• QueryRef is 30% worse on average
• Random is 150% worse than FullTree

Goal Comparison
• All the objective functions correctly optimize their goals
• Dynamic and Semi-Dynamic are very similar in performance

Query Time
• Exponential behaviour
• Efficient for small queries: 1.4 s for query size 10

User Study
Q1 - Rate the suggested refinements
Q2 - Did you like the system guiding you?
Q3 - Did the system help you arrive at the results quickly?
Q4 - Did you prefer using the help of this system to relaxing the query yourself?

Conclusions
• We propose
  • a novel, principled, user-centric, and interactive approach to the empty-answer problem
  • two exact algorithms and an approximate algorithm
• We show that
  • the framework can deal with the combinatorial explosion
  • the user effort is minimized
  • users are generally satisfied with the system

Bibliography
[Mishra09] C. Mishra and N. Koudas, “Interactive query refinement,” in EDBT, 2009.
[Roy08] S. Basu Roy, H. Wang, G. Das, U. Nambiar, and M. Mohania, “Minimum-effort driven dynamic faceted search in structured databases,” in CIKM, 2008.
[Chaudhuri04] S. Chaudhuri, G. Das, V. Hristidis, and G. Weikum, “Probabilistic ranking of database query results,” in VLDB, 2004.
[Baeza11] R. A. Baeza-Yates and B. A. Ribeiro-Neto, Modern Information Retrieval, 2011.

Cost Probability
Probability that the cost of n1 is less than the cost of n2
[Figure: relaxations of the root]
