
Stochastic Skyline Operator

Stochastic Skyline Operator. Xuemin Lin, School of Computer Science, University of New South Wales, Australia. Joint work with: Ying Zhang (UNSW), Wenjie Zhang (UNSW), Muhammad Aamir Cheema (UNSW). Introduction: Skyline. A user preference ≺ is given on each dimension of R^d.


Presentation Transcript


  1. Stochastic Skyline Operator. Xuemin Lin, School of Computer Science, University of New South Wales, Australia. Joint work with: Ying Zhang (UNSW), Wenjie Zhang (UNSW), Muhammad Aamir Cheema (UNSW)

  2. Introduction: Skyline • A user preference ≺ is given on each dimension of R^d. • For two points u and v in R^d, u dominates v (u ≺ v) if • ∀ i (1 ≤ i ≤ d), u.i ⪯ v.i; ∃ j (1 ≤ j ≤ d), u.j ≺ v.j • Skyline: • The points not dominated by any other point. • Multiple-criteria optimal decision making: the minimum set of candidate best options with respect to any monotonic function.
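The dominance test and skyline on slide 2 can be sketched as follows. This is a minimal illustration, assuming smaller values are preferred on every dimension; the names `dominates` and `skyline` are hypothetical, and the quadratic scan is chosen for clarity rather than speed.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(u: Sequence[float], v: Sequence[float]) -> bool:
    """u dominates v: no worse on every dimension, strictly better on one.

    Assumption: smaller is better on each dimension.
    """
    no_worse = all(a <= b for a, b in zip(u, v))
    strictly_better = any(a < b for a, b in zip(u, v))
    return no_worse and strictly_better


def skyline(points: List[Point]) -> List[Point]:
    """Keep every point that no other point dominates (O(n^2) comparisons)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


points = [(1, 4), (2, 2), (3, 3), (4, 1)]
print(skyline(points))  # (3, 3) is dominated by (2, 2) and drops out
```

A point never dominates itself under this definition, so the scan does not need to skip `q == p`.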

  3. Skyline of Uncertain Objects. Probabilistic Skyline (VLDB07, PODS09, etc.): • Skyline probabilities are defined over possible worlds. • Gives each object's probability of not being worse than any other object. Does it provide a minimal candidate set of optimal solutions? • How should optimal options be defined? • How can the minimum candidate set be characterized?
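For discrete uncertain objects, the skyline probability from slide 3 can be computed without enumerating possible worlds. The sketch below assumes that objects are mutually independent, that each object is a list of `(point, probability)` instances whose probabilities sum to 1, and that smaller values are preferred; the function names and the sample data are illustrative only.

```python
def dominates(u, v):
    """u dominates v, assuming smaller is better on each dimension."""
    return all(a <= b for a, b in zip(u, v)) and any(a < b for a, b in zip(u, v))


def skyline_probability(target, others):
    """Probability that `target` is not dominated by any object in `others`.

    For each instance u of the target, multiply the probabilities that
    each other object does NOT take an instance dominating u
    (valid under the independence assumption stated above).
    """
    total = 0.0
    for u, pu in target:
        survive = 1.0
        for obj in others:
            survive *= 1.0 - sum(pv for v, pv in obj if dominates(v, u))
        total += pu * survive
    return total


# Hypothetical data, not the example from the slides.
A = [((1, 1), 1.0)]
B = [((2, 2), 0.5), ((0, 3), 0.5)]
print(skyline_probability(A, [B]), skyline_probability(B, [A]))
```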

  4. Expected Utility & Stochastic Order. Expected Utility Principle: • Given a set 𝒰 of uncertain objects and a decreasing utility function f, select U in 𝒰 to maximize E[f(U)]. Stochastic Order: • Given a family ℱ of utility functions, U ≺ℱ V if E[f(U)] ≥ E[f(V)] for each f in ℱ. Decreasing Multiplicative Functions: • ℱ = {f(x) = ∏_{i=1}^{d} f_i(x.i)} where each f_i is nonnegative decreasing. Lower orthant order: the stochastic order defined over the family of decreasing multiplicative functions.
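To make the expected utility principle concrete, the sketch below evaluates E[f(U)] for a discrete object. It assumes a multiplicative utility is a product of one nonnegative decreasing factor per dimension; the factor 1/(1+x) and the two sample objects are arbitrary choices for illustration, not taken from the slides.

```python
import math


def expected_utility(obj, factors):
    """E[f(U)] for f(x) = product of factors[i](x[i]).

    obj: list of (point, probability) instances.
    factors: one nonnegative decreasing function per dimension (assumed).
    """
    return sum(
        p * math.prod(f(x) for f, x in zip(factors, point))
        for point, p in obj
    )


def decay(x):
    """An arbitrary nonnegative decreasing factor on x >= 0."""
    return 1.0 / (1.0 + x)


U = [((0, 0), 0.5), ((1, 1), 0.5)]
V = [((1, 1), 1.0)]
factors = [decay, decay]
# U is preferred under this f because its expected utility is larger.
print(expected_utility(U, factors), expected_utility(V, factors))
```

A single utility function only ranks objects for that function; the stochastic order asks for the inequality to hold for every function in the family.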

  5. Example • Utility function: • : nonnegative decreasing • : nonnegative decreasing e.g. ; ; 1. B is never preferred by the expected utility principle! 2. Psky(A) = 1, Psky(B) = 0.5, Psky(C) = 0.01

  6. Contributions • Introduce a novel skyline operator: stochastic skyline. • Guarantees the minimal candidate set of optimal solutions with respect to decreasing multiplicative functions. • NP-completeness of computing the stochastic skyline with respect to the dimensionality d. • Novel statistics-based pruning techniques. • Efficient partition-based verification algorithms: polynomial if d is fixed.

  7. Problem Statement. Stochastic Order (lower orthant order): given U and V, U stochastically dominates V (U ≺sd V) if U.cdf(x) ≥ V.cdf(x) for every x and there exists y such that U.cdf(y) > V.cdf(y). U.cdf(x): the probability mass of U in the rectangular region R((0,0,…,0), x); see the shaded region. Stochastic Skyline: the objects in the given set that are not stochastically dominated by any other object. Problem Statement: efficiently compute the stochastic skyline for the discrete case.
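For the discrete case, U.cdf(x) is a sum over the instances that fall inside the rectangle from the origin to x. A minimal sketch, assuming an object is a list of `(point, probability)` instances with nonnegative coordinates (the representation and the sample object are illustrative):

```python
def cdf(obj, x):
    """Probability mass of obj inside R((0, ..., 0), x).

    An instance counts when it is <= x on every dimension.
    """
    return sum(p for point, p in obj if all(a <= b for a, b in zip(point, x)))


U = [((1, 1), 0.5), ((3, 3), 0.5)]
print(cdf(U, (2, 2)))  # only the instance (1, 1) lies inside the rectangle
print(cdf(U, (3, 3)))  # both instances lie inside
print(cdf(U, (0, 5)))  # no instance has first coordinate <= 0
```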

  8. Minimality of stochastic skyline. The stochastic skyline removes all objects that are not preferred by any non-negative decreasing multiplicative function!

  9. Framework • Phase 1: filtering. Remove non-promising objects. • Phase 2: verification. Test stochastic dominance between two objects. BBS combined with a heap: • the “near” progressiveness • in most cases only one of U ≺sd V or V ≺sd U needs to be tested (not both).

  10. Testing if U ≺sd V • Violation point: a point x in R^d_+ is a violation point regarding U ≺sd V if U.cdf(x) < V.cdf(x). • Testing algorithm: if there are no violation points, then U ≺sd V. • Testing only the instances is not enough.

  11. Reduce to Grid Points • Test if U.cdf ≥ V.cdf against grid points only (see (a)). • Test the switching grid points only (see the solid lines in (b)).
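The grid reduction in (a) works because both cdfs are step functions that change only at instance coordinates. The brute-force sketch below builds the grid from the coordinates of both objects on each dimension and compares the cdfs at every grid point. It visits up to m^d points, so it only illustrates the idea; it does not implement the switching-point refinement in (b), and the function names and tolerance `eps` are assumptions.

```python
from itertools import product


def cdf(obj, x):
    """Probability mass of obj at or below x on every dimension."""
    return sum(p for point, p in obj if all(a <= b for a, b in zip(point, x)))


def stochastically_dominates(u_obj, v_obj, eps=1e-12):
    """Brute-force test of U <sd V over the grid of instance coordinates.

    Requires U.cdf >= V.cdf at every grid point and U.cdf > V.cdf at one.
    """
    d = len(u_obj[0][0])
    axes = [sorted({pt[i] for pt, _ in u_obj + v_obj}) for i in range(d)]
    strict = False
    for x in product(*axes):
        cu, cv = cdf(u_obj, x), cdf(v_obj, x)
        if cu < cv - eps:
            return False  # x is a violation point
        if cu > cv + eps:
            strict = True
    return strict


U = [((1, 1), 0.5), ((3, 3), 0.5)]
V = [((2, 2), 0.5), ((4, 4), 0.5)]
print(stochastically_dominates(U, V), stochastically_dominates(V, U))
```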

  12. Algorithm • Given a rectangular region R(x, y), if U.cdf(x) ≥ V.cdf(y), then there is no violation point in R(x, y). • Partition-based testing algorithm: • Get the switching points • Initial check • Iteratively partition the grid to discard non-promising sub-grids
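The pruning rule on slide 12 follows from monotonicity: for any z in R(x, y), U.cdf(z) ≥ U.cdf(x) and V.cdf(y) ≥ V.cdf(z). The sketch below applies that rule in a simple recursive bisection of the grid. It is a simplified stand-in, assuming a plain coordinate grid and a halve-the-widest-dimension split; it omits the switching points and the initial check, and all names are hypothetical.

```python
def cdf(obj, x):
    """Probability mass of obj at or below x on every dimension."""
    return sum(p for point, p in obj if all(a <= b for a, b in zip(point, x)))


def has_violation(u_obj, v_obj, axes, lo, hi, eps=1e-12):
    """Search grid indices lo..hi (tuples) for a point with U.cdf < V.cdf."""
    x = tuple(ax[i] for ax, i in zip(axes, lo))  # lower corner
    y = tuple(ax[i] for ax, i in zip(axes, hi))  # upper corner
    if cdf(u_obj, x) >= cdf(v_obj, y) - eps:
        return False  # pruning rule: no violation point inside R(x, y)
    if lo == hi:
        return True  # single grid point with U.cdf(x) < V.cdf(x)
    # Split the widest dimension into two halves and recurse.
    k = max(range(len(lo)), key=lambda i: hi[i] - lo[i])
    mid = (lo[k] + hi[k]) // 2
    left_hi = hi[:k] + (mid,) + hi[k + 1:]
    right_lo = lo[:k] + (mid + 1,) + lo[k + 1:]
    return (has_violation(u_obj, v_obj, axes, lo, left_hi, eps)
            or has_violation(u_obj, v_obj, axes, right_lo, hi, eps))


def no_violation_point(u_obj, v_obj):
    """True when U.cdf >= V.cdf on the whole coordinate grid."""
    d = len(u_obj[0][0])
    axes = [sorted({pt[i] for pt, _ in u_obj + v_obj}) for i in range(d)]
    lo = tuple(0 for _ in axes)
    hi = tuple(len(ax) - 1 for ax in axes)
    return not has_violation(u_obj, v_obj, axes, lo, hi)


U = [((1, 1), 0.5), ((3, 3), 0.5)]
V = [((2, 2), 0.5), ((4, 4), 0.5)]
print(no_violation_point(U, V), no_violation_point(V, U))
```

When the rule fires on a large sub-grid, none of its grid points need to be visited, which is where the savings over a full grid scan come from.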

  13. Complexity • The algorithm runs in O(dm log m + m^d (T(U_artree) + T(V_artree))) time, where m is the number of instances in V. • NP-complete with respect to d. • Convert (the decision version of) the minimal set cover problem to a special case of the testing problem.

  14. Filtering Techniques Pruning Rule 1: throw away fully dominated entries.

  15. Filtering Techniques. Pruning Rule 2: apply Cantelli’s inequality to get upper bounds.
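Cantelli's inequality bounds a one-sided tail from the mean and variance alone: for λ > 0, P(X ≤ μ − λ) ≤ σ² / (σ² + λ²). The slide does not spell out how the bound enters the pruning rule, so the sketch below only shows the inequality itself, applied to one coordinate of a discrete object to upper-bound its marginal cdf. The helper names and the sample values are assumptions.

```python
def mean_and_variance(values):
    """Mean and variance of a discrete distribution [(value, prob), ...]."""
    mean = sum(v * p for v, p in values)
    var = sum(p * (v - mean) ** 2 for v, p in values)
    return mean, var


def cantelli_cdf_upper_bound(mean, var, t):
    """Upper bound on P(X <= t) using only the mean and variance.

    Cantelli: P(X <= mean - lam) <= var / (var + lam^2) for lam > 0.
    At or above the mean the inequality gives nothing better than 1.
    """
    if t >= mean:
        return 1.0
    lam = mean - t
    return var / (var + lam * lam)


# Hypothetical one-dimensional marginal: values 2 and 4, each with prob 0.5.
marginal = [(2.0, 0.5), (4.0, 0.5)]
mean, var = mean_and_variance(marginal)
for t in (1.0, 2.0, 3.5):
    exact = sum(p for v, p in marginal if v <= t)
    print(t, exact, cantelli_cdf_upper_bound(mean, var, t))
```

Since a multidimensional cdf never exceeds any of its marginal cdfs, a per-dimension bound of this kind also bounds U.cdf(x) from above.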

  16. Size Estimation. Expected size: the size of the stochastic skyline in R^d is bounded by that of the conventional skyline in R^(d+1), i.e., ln^d(n)/(d+1)!

  17. Empirical Study • C++ with STL, compiled with GNU GCC, on a 2.4GHz machine running Debian • Real dataset: NBA players' game-by-game statistics • Synthetic datasets: anti-correlated, correlated, independent

  18. Summary • A novel skyline operator: stochastic skyline. • Guarantees minimality. • Testing the stochastic order (lower orthant order) is NP-complete. • Novel efficient algorithms to compute the stochastic order. Future work: • What if ℱ is the set of all decreasing functions?

  19. Thank you!
