A FAIR ASSIGNMENT FOR MULTIPLE PREFERENCE QUERIES Leong Hou U, Nikos Mamoulis , Kyriakos Mouratidis. Gruppo 10: Paolo Barboni, Tommaso Campanella, Simone Manco. Scenario. Some users want to select objects with specific features, based on their preferences
A FAIR ASSIGNMENT FOR MULTIPLE PREFERENCE QUERIES • Leong Hou U, Nikos Mamoulis, Kyriakos Mouratidis • Gruppo 10: Paolo Barboni, Tommaso Campanella, Simone Manco
Scenario • Some users want to select objects with specific features, based on their preferences • These requests are performed as database queries • Queries express users’ preferences by different weights on the attributes of the searched objects • These are the so-called Preference Queries
Scenario • The result of a preference query is the object in the database with the highest aggregate score • If multiple preference queries are issued simultaneously, an object may be the best solution for many of them: • Who will be assigned the object? • Which results will the other users receive? • A FAIR ASSIGNMENT PROBLEM
Scenario - Example • Internship assignment, based on students’ preferences in terms of: • nature of the job • salary • office location • other features… • For a single student, the system returns a set of top-k results with respect to his/her preference function • An available internship position could be the top-1 choice of many interested students, but it can be assigned to only one of them • The system must look for a fair 1-1 matching between the users and the objects • Stable Marriage Problem (SMP)
Scenario - Example • Users’ preference functions: f1 = 0.8X + 0.2Y, f2 = 0.5X + 0.5Y • Positions’ attributes: a = (0.5, 0.6), b = (0.2, 0.7), c = (0.8, 0.2), d = (0.4, 0.4) • [Figure: positions a, b, c, d plotted with X = salary and Y = standing, together with the best point and the directions of f1 and f2]
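To make the example concrete, the following minimal sketch scores the four positions under the two preference functions and reports each user's top-1 choice. The helper names `score` and `top1` are illustrative assumptions, not part of the original slides.

```python
# Preference functions as weight vectors over (X, Y), taken from the slide.
functions = {"f1": (0.8, 0.2), "f2": (0.5, 0.5)}
# Internship positions as (X, Y) attribute vectors, taken from the slide.
positions = {"a": (0.5, 0.6), "b": (0.2, 0.7), "c": (0.8, 0.2), "d": (0.4, 0.4)}


def score(weights, point):
    """Aggregate score of a point under a linear preference function."""
    return sum(w * x for w, x in zip(weights, point))


def top1(weights, objects):
    """Name of the object with the highest aggregate score."""
    return max(objects, key=lambda name: score(weights, objects[name]))


for fname, weights in functions.items():
    print(fname, "prefers", top1(weights, positions))
```

Here the two users prefer different positions, so no conflict arises; the assignment problem appears as soon as two functions share the same top-1 object.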
Related Algorithms • The 1-1 assignment problem is related to three types of search: • Spatial assignment problem (model: SMP): Chain algorithm • Skyline queries: Branch-and-Bound Skyline (BBS) algorithm • Top-k search: Threshold Algorithm (TA) • Stable matching: given two datasets A and B, a 1-1 matching M is stable if there are no two pairs (a, b) and (a’, b’) in M such that a prefers b’ to b and b’ prefers a to a’ (where a, a’ ∈ A and b, b’ ∈ B).
Spatial Assignment Problem – Chain Algorithm • Its goal is to find stable pairs • The preference function is based on Euclidean distance: a prefers b’ to b if dist(a,b’) < dist(a,b) • A pair (a,b) is stable if and only if a’s closest object is b and b’s closest object is a, where a and b are among the unassigned (remaining) objects in A and B • Chain algorithm: • pick an object a from the queue Q or, if Q is empty, randomly from A; • find the NN (Nearest Neighbour) aB ∈ B of a; • find the NN a’ ∈ A of aB; • if a ≠ a’, aB is pushed into the queue Q; otherwise the pair (a, aB) is output as a result pair and a, aB are removed from A and B.
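The chain steps above can be sketched in memory as follows. This is a simplified illustration under stated assumptions: nearest neighbours are found by linear scan rather than through a spatial index, the "random" pick is replaced by the first remaining object for reproducibility, and ties are broken by name. The function names `nearest` and `chain_matching` are hypothetical.

```python
import math
from collections import deque


def nearest(point, candidates):
    """Name of the candidate closest to point (ties broken by name)."""
    return min(candidates, key=lambda n: (math.dist(point, candidates[n]), n))


def chain_matching(A, B):
    """Output mutual-nearest-neighbour (stable) pairs between A and B."""
    A, B = dict(A), dict(B)          # work on copies of the unassigned sets
    queue, pairs = deque(), []
    while A and B:
        side, name = None, None
        while queue:                 # skip queued objects already assigned
            s, n = queue.popleft()
            if n in (A if s == "A" else B):
                side, name = s, n
                break
        if name is None:             # queue exhausted: pick from A
            side, name = "A", next(iter(A))
        own, other = (A, B) if side == "A" else (B, A)
        nn = nearest(own[name], other)     # NN in the opposite set
        back = nearest(other[nn], own)     # NN of that NN, back in our set
        if back == name:             # mutual NNs: the pair is stable
            pairs.append((name, nn) if side == "A" else (nn, name))
            del own[name]
            del other[nn]
        else:                        # follow the chain from nn next
            queue.append(("B" if side == "A" else "A", nn))
    return pairs


A = {"a1": (0.0, 0.0), "a2": (3.0, 0.0)}
B = {"b1": (2.0, 0.0), "b2": (10.0, 0.0)}
print(chain_matching(A, B))
```

In this toy input a1's nearest neighbour is b1, but b1 is closer to a2, so the chain moves on to b1 and the pair (a2, b1) is output first.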
Skyline Queries – BBS Algorithm • A different approach exploits the concept of the skyline of a set • The skyline of O consists of all points o ∈ O that are not dominated by any other point in O • Skyline computation is faster if the objects are indexed by an R-tree • BBS algorithm: • Computes the skyline of O by accessing the minimum number of R-tree nodes (it is I/O optimal) • Accesses the nodes of the tree in ascending order of distance from the sky point • The sky point is the (imaginary) most preferable object possible • Once a data object is found, it is added to the skyline, and all R-tree nodes/subtrees dominated by it are pruned
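The dominance test and the "ascending distance from the sky point" ordering can be illustrated without an R-tree. The sketch below is not BBS itself: it scans an in-memory dictionary instead of traversing index nodes, and it assumes larger attribute values are better and that the sky point is (1, 1). It relies on the fact that a dominating point is always closer (in L1 distance) to the sky point than the points it dominates, so every object only needs to be checked against the skyline found so far.

```python
def dominates(p, q):
    """p dominates q: at least as good in every dimension, better in one."""
    return all(x >= y for x, y in zip(p, q)) and any(x > y for x, y in zip(p, q))


def skyline(objects, sky_point):
    """Skyline of objects, visiting them in ascending L1 distance from sky_point."""
    def dist(name):
        return sum(s - x for s, x in zip(sky_point, objects[name]))

    result = []
    for name in sorted(objects, key=dist):
        # Any dominator of this object was visited earlier, so checking the
        # current skyline is sufficient.
        if not any(dominates(objects[s], objects[name]) for s in result):
            result.append(name)
    return result


positions = {"a": (0.5, 0.6), "b": (0.2, 0.7), "c": (0.8, 0.2), "d": (0.4, 0.4)}
print(skyline(positions, (1.0, 1.0)))
```

With the internship positions of the earlier example, d is dominated by a, so the skyline consists of a, b and c.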
BBS Algorithm – Example • [Figure: R-tree over the objects a to m, with leaf nodes m1 to m7, intermediate nodes M1, M2, M3 and the sky point. Heap states shown in the animation include {M1, M2, M3}, {m1, m2, m3, M2, M3}, {e, i, m1, m2, M2, M3}, {m2} and {a}; the skyline grows from Osky = {e} to Osky = {e, a}]
Skyline Queries – DeltaSky Algorithm • Used for dynamic datasets, where objects can be added/removed • It determines the intersection between an MBR and the EDR without explicitly calculating the EDR itself • EDR: Exclusive Dominance Region • MBR: Minimum Bounding Rectangle • For each deletion from Osky, DeltaSky traverses the R-tree once • If many deletions are performed, DeltaSky incurs a high I/O cost
Top-k search – Threshold Algorithm • O is a collection of n objects; each object o has D attributes • There are D sorted lists S1, S2, …, SD, one for each attribute, ordered by the atomic scores • A top-k query, based on an aggregate function f, retrieves a k-subset Otopk of O (k<n), such that f(o) ≥ f(o’), ∀o ∈ Otopk, o’ ∈ (O−Otopk) • The most widely used algorithm for top-k queries is the Threshold Algorithm (TA): • it pops objects from the sorted lists in a round-robin manner • for each object o, f(o) is computed • the set of k objects with the highest scores is maintained • the search terminates when the k-th score is greater than or equal to the threshold T, the aggregate score of the last values seen in the lists
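A compact in-memory sketch of TA for a linear aggregate function is given below. It assumes non-negative weights (so that f is monotone), builds the sorted lists on the fly, and checks the stopping condition after each full round of sorted accesses; the function name and data layout are illustrative choices.

```python
import heapq


def threshold_algorithm(objects, weights, k):
    """Top-k objects under f(o) = sum(w_i * o_i), using sorted-list access."""
    dims = len(weights)

    def f(values):
        return sum(w * v for w, v in zip(weights, values))

    # One list per attribute, sorted by that attribute in descending order.
    lists = [sorted(objects, key=lambda o: -objects[o][i]) for i in range(dims)]
    seen = {}                                  # object name -> f(o)
    for depth in range(len(objects)):
        last = []
        for i in range(dims):                  # one round of sorted accesses
            name = lists[i][depth]
            seen.setdefault(name, f(objects[name]))
            last.append(objects[name][i])
        threshold = f(last)                    # best score any unseen object can have
        top = heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])
        if len(top) == k and top[-1][1] >= threshold:
            return top
    return heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])


positions = {"a": (0.5, 0.6), "b": (0.2, 0.7), "c": (0.8, 0.2), "d": (0.4, 0.4)}
print(threshold_algorithm(positions, (0.8, 0.2), k=1))
```

For f1 = 0.8X + 0.2Y the search stops after two rounds: the threshold drops to 0.52 while c already scores 0.68, so d is never scored.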
Top-k search - BRS & Onion • Branch-and-bound Ranked Search (BRS): • Visits R-tree nodes in an order determined by a preference function f • maxscore(M) is an upper bound of the score of any object inside the MBR M • Nodes are accessed in descending maxscore order • Terminates when the score of the k-th best object is no smaller than the next node’s maxscore • Onion: • Computes the convex hull of the data objects and sets it as the first layer • Removes the hull objects and repeats on the remaining objects to build the inner layers • A query expands the layers from the first one, moving inwards
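The best-first idea behind BRS can be sketched on a toy hierarchy. The code below is an assumption-laden illustration: the "R-tree" is a nested Python list whose leaves are tuples, maxscore is taken as f evaluated at the upper corner of a node's bounding rectangle (valid for non-negative weights), and upper corners are recomputed on demand instead of being stored in the nodes.

```python
import heapq


def upper_corner(entry):
    """Upper corner of the bounding rectangle of a point or a nested node."""
    if isinstance(entry, tuple):               # a data point bounds itself
        return entry
    corners = [upper_corner(child) for child in entry]
    return tuple(max(c[i] for c in corners) for i in range(len(corners[0])))


def brs_topk(root, weights, k):
    """Visit entries in descending maxscore order until k points are found."""
    def f(point):
        return sum(w * x for w, x in zip(weights, point))

    counter = 0                                # tie-breaker, avoids comparing entries
    heap = [(-f(upper_corner(root)), counter, root)]
    results = []
    while heap and len(results) < k:
        neg_score, _, entry = heapq.heappop(heap)
        if isinstance(entry, tuple):
            # A point at the top of the heap beats every remaining maxscore.
            results.append((entry, -neg_score))
        else:
            for child in entry:
                counter += 1
                heapq.heappush(heap, (-f(upper_corner(child)), counter, child))
    return results


# Two leaf nodes holding the positions a, b and c, d of the earlier example.
root = [[(0.5, 0.6), (0.2, 0.7)], [(0.8, 0.2), (0.4, 0.4)]]
print(brs_topk(root, (0.8, 0.2), k=2))
```

Because a point's maxscore equals its own score, popping a point from the heap corresponds to the termination test on the slide: nothing left in the heap can score higher.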
Problem Statement • A set F of user preference functions over a set O of multidimensional objects • The score f(o) of an object o is the weighted sum $f(o) = \sum_{i=1}^{D} f.\alpha_i \cdot o_i$, where f.αi is the weight of f on dimension i and oi is the value of o in that dimension • Our goal is to find a stable 1-1 matching between F and O • A function-object pair (f, o) in F × O is stable if there is no function f’ ∈ F, f’ ≠ f, such that f’(o) > f(o), and no object o’ ∈ O, o’ ≠ o, such that f(o’) > f(o), where F and O are the sets of the unassigned (remaining) functions and objects.
Algorithms – Brute Force Search • Assumption: F is kept in memory, O is indexed by an R-tree (RO) on disk • Progressive technique: • Issue top-1 queries against O, one for every function in F (|F| pairs) • The pair (f,o) with the highest f(o) value is stable: • o is the top-1 preference of f • f’(o) cannot be greater than f(o) for any function f’ ≠ f • After the pair (f,o) is added to the query result: • o is removed from RO • If o was the top-1 object for another function f’ ≠ f, the top-1 search must be re-applied for f’ • Improvement: by maintaining the search heap of each top-1 query, the search can be resumed instead of restarted • Drawback: large amount of memory!
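The progressive brute-force scheme can be sketched in memory as follows. The R-tree is replaced by a plain dictionary and each top-1 query by a linear scan, so the sketch shows only the matching logic, not the I/O behaviour; the function names are illustrative.

```python
def score(weights, point):
    """Aggregate score of a point under a linear preference function."""
    return sum(w * x for w, x in zip(weights, point))


def brute_force_matching(functions, objects):
    """Repeatedly output the (f, o) pair with the highest f(o) among top-1 results."""
    F, O = dict(functions), dict(objects)
    top1 = {}                                  # cached top-1 object per function
    result = []
    while F and O:
        for f in F:
            # (Re-)apply the top-1 search only if the cached object is gone.
            if top1.get(f) not in O:
                top1[f] = max(O, key=lambda o: score(F[f], O[o]))
        best_f = max(F, key=lambda f: score(F[f], O[top1[f]]))
        best_o = top1.pop(best_f)
        result.append((best_f, best_o))        # this pair is stable
        del F[best_f]
        del O[best_o]
    return result


functions = {"f1": (0.8, 0.2), "f2": (0.5, 0.5)}
positions = {"a": (0.5, 0.6), "b": (0.2, 0.7), "c": (0.8, 0.2), "d": (0.4, 0.4)}
print(brute_force_matching(functions, positions))
```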
Algorithms – Skyline-Based Search • Assumption: if F contains only monotone functions, then the top-1 objects must be in the skyline Osky • Osky is computed and maintained • Stable function-object pairs between Osky and F are found and output
SB(set F, R-tree RO)
  Osky := ∅
  while |F| > 0 do ⊳ while there are unassigned functions
    if Osky = ∅ then
      Osky := ComputeSkyline(RO) ⊳ first, the skyline Osky is computed
    else
      UpdateSkyline(Osky, o, RO) ⊳ afterwards, the skyline is only updated
    (f,o) := BestPair(F, Osky) ⊳ the pair (f,o) with the highest f(o) score is found
    Output (f,o)
    F := F − f; O := O − o; Osky := Osky − o ⊳ finally, f and o are removed from F and O, and Osky is updated
Implementation - BestPair • A brute-force implementation is not efficient: it requires |F| * |Osky| comparisons (cross product F × Osky) • Another approach is to index either F or Osky • Indexing Osky is not practical (too many updates) • F is indexed instead, since only one deletion is performed on it at each loop • Functions are indexed as sorted lists, one for each coefficient • A reverse top-1 search is applied on the lists, where the roles of objects and functions are swapped • Each list L1,…, LD (D is the dimensionality) holds the (f.αi, f) pairs of all functions f ∈ F, sorted on f.αi in descending order • The threshold T can be calculated as $T = \sum_{i=1}^{D} l_i \cdot o_i$, where li is the last value seen in list Li • The sum of the li values can be greater than 1, whereas the coefficients of a normalized function sum to 1, so a tighter threshold Ttight is obtained by normalizing the li values • Normalization algorithm: • Rank the dimensions in descending order of o’s corresponding values • Set B = 1; for each dimension i in that order: βi = min{B, li}, B = B − βi • $T_{tight} = \sum_{i=1}^{D} \beta_i \cdot o_i$
Implementation - BestPair (Example) • o = (10, 6, 8) • fa = 0.8X + 0.1Y + 0.1Z • fb = 0.2X + 0.8Y + 0.0Z • fc = 0.5X + 0.4Y + 0.1Z • fd = 0.0X + 0.1Y + 0.9Z • fe = 0.2X + 0.4Y + 0.4Z • Step 1: l1=0.8, l2=0.8, l3=0.9; scores seen: fa(o)=9.4, fb(o)=6.8, fd(o)=7.8 • B=1; β1 = min{B,l1} = 0.8, B = B−0.8 = 0.2; β3 = min{B,l3} = 0.2, B=0; β2 = 0 • Ttight = 9.6 > fbest = fa(o) = 9.4, so the search continues • Step 2: l1=0.5, l2=0.8, l3=0.9; newly seen: fc(o)=8.2 • B=1; β1 = min{B,l1} = 0.5, B = B−0.5 = 0.5; β3 = min{B,l3} = 0.5, B=0; β2 = 0 • Ttight = 9 ≤ fbest = fa(o) = 9.4, so the search stops with fa as the best function
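The example can be replayed with a short sketch of the reverse top-1 search. Several details are assumptions made for illustration: lists are accessed one at a time in round-robin order, the threshold is re-evaluated after every access, a list that has not been accessed yet contributes the maximum possible coefficient 1.0, and the returned `trace` of thresholds exists only to make the run inspectable.

```python
def tight_threshold(last_seen, obj):
    """Normalized threshold: spend a budget of 1 on the best dimensions of obj."""
    budget, threshold = 1.0, 0.0
    for i in sorted(range(len(obj)), key=lambda i: -obj[i]):
        beta = min(budget, last_seen[i])
        threshold += beta * obj[i]
        budget -= beta
    return threshold


def best_function(functions, obj):
    """Function maximizing f(obj), found by TA over per-coefficient sorted lists."""
    dims = len(obj)
    lists = [sorted(functions, key=lambda f: -functions[f][i]) for i in range(dims)]
    last_seen = [1.0] * dims                   # assumption: 1.0 before first access
    best_name, best_score, trace = None, float("-inf"), []
    for step in range(len(functions) * dims):
        i, depth = step % dims, step // dims   # round-robin over the lists
        name = lists[i][depth]
        last_seen[i] = functions[name][i]
        s = sum(a * x for a, x in zip(functions[name], obj))
        if s > best_score:
            best_name, best_score = name, s
        trace.append(tight_threshold(last_seen, obj))
        if best_score >= trace[-1]:            # no unseen function can do better
            break
    return best_name, best_score, trace


functions = {
    "fa": (0.8, 0.1, 0.1),
    "fb": (0.2, 0.8, 0.0),
    "fc": (0.5, 0.4, 0.1),
    "fd": (0.0, 0.1, 0.9),
    "fe": (0.2, 0.4, 0.4),
}
print(best_function(functions, (10, 6, 8)))
```

Under these assumptions the threshold stays at about 9.6 while fa, fb and fd are read, and drops to 9 once fc is read from L1, at which point fa (score 9.4) is returned without ever scoring fe.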
Implementation - BestPair (Improvements) • TA access order: • The access order changes from round-robin to descending order of li*oi (li is the last value seen in each list Li) • Resuming search: • The state of the previous search for each object in Osky is stored, so the search can be resumed if necessary • The drawback of this method is the extra memory required • Iterative solution: the queue’s maximum capacity is set to Ω = ω * |F| • the queue stores only the top-Ω functions • Ω is decreased by 1 when an element is popped from the queue; if Ω=0, its value is reset to ω * |F| • this allows controlling the tradeoff between execution time and memory usage
Implementation – UpdateSkyline (Example) • To minimize the tree traversal cost during skyline maintenance, the entries dominated by a skyline object o are pruned and added to its pruned list o.plist • To minimize the required memory, each pruned entry is kept in the plist of only one skyline object
algorithm UpdateSkyline(set Osky, object o, R-tree RO)
  Scand := ∅
  Scand := {E | E ∈ o.plist, E ∉ o’.plist, ∀o’ ∈ Osky}
  new Osky := ResumeSkyline(Scand, Osky)
algorithm ResumeSkyline(set Scand, set Osky)
  while Scand is not empty do
    de-heap top entry E of Scand
    if E is dominated by any o ∈ Osky then
      add E to o.plist
    else ⊳ not dominated by any skyline object
      if E is non-leaf entry then
        visit node N pointed by E
        for all entries E’ ∈ N do
          push E’ into Scand
      else
        Osky := Osky ∪ E
[Figure: step-by-step example on an R-tree with nodes m1, M2, M3 and objects a, b, c, d. States shown include Osky = {c}, {a, c}, {a, b, c}; Scand = {m1, c, M2, M3}, {c, M2, M3, a, b, d}, {M2, M3, a, b, d}, {M3, a, b, d}, {a, b, d}, {b, d}, {d}, {}; c.plist = {M2, M3}, c.plist = {M2}, b.plist = {d}]
Algorithms – Skyline-Based Search (Optimization) • The number of loops required can be reduced if multiple stable object-function pairs are output at each loop
SB(set F, R-tree RO)
  Osky := ∅; Odel := ∅
  while |F| > 0 do ⊳ more unassigned functions
    if Osky = ∅ then
      Osky := ComputeSkyline(RO)
    else
      UpdateSkyline(Osky, Odel, RO)
    Odel := ∅; Fbest := ∅
    for all o ∈ Osky do
      find function o.fbest ∈ F that maximizes f(o)
      Fbest := Fbest ∪ o.fbest
    for all f ∈ Fbest do
      find object f.obest ∈ Osky that maximizes f(o)
    for all f ∈ Fbest do
      if (f.obest).fbest = f then
        F := F − f; O := O − f.obest
        Osky := Osky − f.obest; Odel := Odel ∪ f.obest
• Fbest is the subset of F that includes the functions o.fbest that maximize f(o) • For each f ∈ Fbest, the object f.obest that maximizes f(o) is coupled with the function f • If (f.obest).fbest = f, then (f, f.obest) is stable and the function/object is removed from F/O and Osky • At least one pair is guaranteed to be output at each loop
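The loop structure of the optimized SB algorithm can be sketched in memory as follows. This is only an illustration of the mutual-best test: the skyline is recomputed from scratch in every loop instead of being maintained with UpdateSkyline, the best function and best object are found by linear scans rather than by the BestPair lists, and larger attribute values are assumed to be better.

```python
def score(weights, point):
    """Aggregate score of a point under a linear preference function."""
    return sum(w * x for w, x in zip(weights, point))


def dominates(p, q):
    """p dominates q: at least as good in every dimension, better in one."""
    return all(x >= y for x, y in zip(p, q)) and any(x > y for x, y in zip(p, q))


def skyline(O):
    """Names of the objects not dominated by any other object (naive scan)."""
    return sorted(n for n, p in O.items()
                  if not any(dominates(q, p) for q in O.values()))


def sb_matching(functions, objects):
    """Output every mutually-best (function, skyline object) pair per loop."""
    F, O = dict(functions), dict(objects)
    result = []
    while F and O:
        sky = skyline(O)
        # o.fbest: the function that maximizes f(o), for each skyline object.
        f_best = {o: max(F, key=lambda f: score(F[f], O[o])) for o in sky}
        # f.obest: the best skyline object, for each function in Fbest.
        o_best = {f: max(sky, key=lambda o: score(F[f], O[o]))
                  for f in sorted(set(f_best.values()))}
        for f, o in o_best.items():
            if f_best[o] == f:                 # (f.obest).fbest = f: stable pair
                result.append((f, o))
                del F[f]
                del O[o]
    return result


functions = {"f1": (0.8, 0.2), "f2": (0.5, 0.5)}
positions = {"a": (0.5, 0.6), "b": (0.2, 0.7), "c": (0.8, 0.2), "d": (0.4, 0.4)}
print(sb_matching(functions, positions))
```

On the internship example both stable pairs, (f1, c) and (f2, a), are found in a single loop, whereas the one-pair-per-loop version needs two.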
Problem Variants • Objects and functions with capacities • Multiple objects/functions may share the same features; they are represented by a single object/function with a capacity attribute • Once a pair is found, the capacities of f and o are each reduced by 1 • Functions with different priorities • f.γ is the priority of the function • To increase the efficiency of TA, a skyline Fsky is built on the functions
Experiments • Three types of synthetic datasets: • independent: values are generated uniformly and independently • correlated: an object’s values are close in all dimensions (if an object is good in one dimension, it is likely to be good in the other ones too) • anti-correlated: objects that are good in one dimension tend to be poor in the other ones
Conclusions • SB is proven to be: • I/O optimal, thanks to an incremental skyline maintenance algorithm that is itself proven to be I/O optimal • CPU optimal, by accelerating the matching between functions and skyline objects and by identifying multiple stable pairs in each iteration
Conclusions THANK YOU FOR YOUR ATTENTION Dedicated to Chip…. RIP