
Mining Favorable Facets

Mining Favorable Facets. Raymond Chi-Wing Wong (the Chinese University of Hong Kong), Jian Pei (Simon Fraser University), Ada Wai-Chee Fu (the Chinese University of Hong Kong), Ke Wang (Simon Fraser University). KDD ’07, August 12-15, 2007, San Jose, California, USA.


Presentation Transcript


  1. Mining Favorable Facets
     Raymond Chi-Wing Wong (the Chinese University of Hong Kong), Jian Pei (Simon Fraser University), Ada Wai-Chee Fu (the Chinese University of Hong Kong), Ke Wang (Simon Fraser University)
     KDD ’07, August 12-15, 2007, San Jose, California, USA

  2. Outline • Introduction • Skyline • Algorithm • Empirical Study • Conclusion

  3. 1. Introduction
     Suppose we want to look for a vacation package among 3 packages.
     • We want a cheaper price.
     • We want a higher hotel-class.
     Suppose we compare packages a and b. We know that package a is “better” than package b because
     • the price of package a is lower, and
     • the hotel-class of package a is higher.
     We say that package a “dominates” package b.

  4. 1. Introduction
     Thus, we do not need to consider package b. We know that
     • package a has the cheapest price, and
     • package c has the highest hotel-class.
     Packages a and c are not dominated by any other point. Thus, packages a and c are all of the “best” possible choices. We call packages a and c skyline points.
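The dominance test and the skyline described on the two slides above can be sketched in a few lines. The package values are not in the transcript, so the prices and hotel-classes below are hypothetical, chosen only so that package a dominates package b as the slide states.

```python
from typing import NamedTuple

class Package(NamedTuple):
    name: str
    price: int        # lower is better
    hotel_class: int  # higher is better

def dominates(q: Package, p: Package) -> bool:
    """q dominates p: no worse on every attribute, strictly better on at least one."""
    no_worse = q.price <= p.price and q.hotel_class >= p.hotel_class
    better = q.price < p.price or q.hotel_class > p.hotel_class
    return no_worse and better

def skyline(packages):
    """Packages that are not dominated by any other package."""
    return [p for p in packages if not any(dominates(q, p) for q in packages)]

# Hypothetical values (assumed, not taken from the slides).
packages = [Package("a", 1600, 4), Package("b", 2400, 1), Package("c", 3000, 5)]
print([p.name for p in skyline(packages)])  # ['a', 'c']
```

This brute-force check compares every pair of packages, which is fine for a handful of points but quadratic in general.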

  5. Suppose we want to look for a vacation package among 6 packages. Different customers may have different preferences on hotel-group.
     • Suppose a customer has the preference H < T < M. The skyline points are packages a and c.
     • Suppose another customer has the preference H < M < T. The skyline points are packages a, c and e.
     In other words, different preferences give different skyline points.

  6. 1. Introduction
     Suppose hotel-group Mozilla wants to promote its own packages (e.g., package f) to potential customers. What preferences make package f a skyline point?
     • Alice: T < M; skyline {a, c}
     • Bob: no special preference; skyline {a, c, e, f}
     • Chris: H < M; skyline {a, c, e}
     • David: H < M < T; skyline {a, c, e}
     • Emily: H < T < M; skyline {a, c}
     • Fred: M < T; skyline {a, c, e, f}
     Bob and Fred are the potential customers.
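The customer list above can be reproduced with a small sketch of a preference-aware skyline. Two things are assumptions here rather than content of the slides: the package table (the transcript lost it, so the values below are hypothetical and chosen to match the skylines listed), and the reading of `x < y` as "hotel-group x is preferred to hotel-group y", encoded as the pair `(x, y)`.

```python
from itertools import product

# Hypothetical package table: name -> (price, hotel class, hotel group).
PACKAGES = {
    "a": (1600, 4, "T"), "b": (2400, 1, "T"), "c": (3000, 5, "H"),
    "d": (3600, 4, "H"), "e": (2400, 2, "M"), "f": (3000, 3, "M"),
}

def closure(pref):
    """Transitive closure of a set of (preferred, less preferred) pairs."""
    closed = set(pref)
    changed = True
    while changed:
        changed = False
        for (x, y), (u, v) in product(list(closed), repeat=2):
            if y == u and (x, v) not in closed:
                closed.add((x, v))
                changed = True
    return closed

def dominates(q, p, closed):
    """q dominates p under the (already closed) hotel-group preference."""
    (q_price, q_class, q_group), (p_price, p_class, p_group) = PACKAGES[q], PACKAGES[p]
    group_better = (q_group, p_group) in closed
    no_worse = (q_price <= p_price and q_class >= p_class
                and (q_group == p_group or group_better))
    return no_worse and (q_price < p_price or q_class > p_class or group_better)

def skyline(pref=()):
    closed = closure(pref)
    return sorted(p for p in PACKAGES
                  if not any(dominates(q, p, closed) for q in PACKAGES))

customers = {
    "Alice": [("T", "M")],
    "Bob": [],
    "Chris": [("H", "M")],
    "David": [("H", "M"), ("M", "T")],
    "Emily": [("H", "T"), ("T", "M")],
    "Fred": [("M", "T")],
}
for name, pref in customers.items():
    print(name, skyline(pref))
```

With the assumed table, package f appears only in the skylines of Bob and Fred, in line with the slide.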

  7. 1. Introduction
     Problem: Given a package, we want to find the preferences or conditions under which this package is a skyline point (its favorable facets).

  8. 1. Introduction
     Problem: Given a package, we want to find the preferences (favorable facets) under which this package is a skyline point.
     We can solve the problem by a naive method: Lattice Search.
     [Figure: lattice of preference sets, each node labeled with its skyline. The root {} has SKY = {a, c, e, f}. First level: {T < M} has SKY = {a, c}; {H < M} has SKY = {a, c, e}; {T < H}, {H < T}, {M < T} and {M < H} each have SKY = {a, c, e, f}. Lower levels combine these relations, e.g. {T < M, H < M} with SKY = {a, c} and {T < H, M < H} with SKY = {a, c, e, f}.]

  9. 1. Introduction
     Problem: Given a package, we want to find the preferences (favorable facets) under which this package is a skyline point.
     We can solve the problem by a naive method: Lattice Search.
     Consider package f. Preferences that qualify f as a skyline point: {}, {T < H}, {M < T}, {H < T}, {M < H}, {T < H, M < H}
     [Figure: the preference lattice, each node labeled with its skyline; the nodes listed above all have SKY = {a, c, e, f}.]

  10. This approach has two disadvantages.
      1. Computation is costly: we need to compute all skyline points for each possible preference.
      2. It is difficult to interpret the results: there are many preferences which qualify package f as a skyline point.
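To make the cost of the naive method concrete, the following sketch enumerates every consistent preference over the three hotel-groups and recomputes the skyline for each one. It is only an illustration under assumptions: the package table is hypothetical (the transcript does not include it), `(x, y)` encodes "x is preferred to y", and the function name `consistent_preferences` is made up for this example.

```python
from itertools import combinations, product

# Hypothetical package table: name -> (price, hotel class, hotel group).
PACKAGES = {
    "a": (1600, 4, "T"), "b": (2400, 1, "T"), "c": (3000, 5, "H"),
    "d": (3600, 4, "H"), "e": (2400, 2, "M"), "f": (3000, 3, "M"),
}
GROUPS = ("T", "H", "M")

def closure(pref):
    """Transitive closure of a set of (preferred, less preferred) pairs."""
    closed = set(pref)
    changed = True
    while changed:
        changed = False
        for (x, y), (u, v) in product(list(closed), repeat=2):
            if y == u and (x, v) not in closed:
                closed.add((x, v))
                changed = True
    return closed

def dominates(q, p, closed):
    (q_price, q_class, q_group), (p_price, p_class, p_group) = PACKAGES[q], PACKAGES[p]
    group_better = (q_group, p_group) in closed
    no_worse = (q_price <= p_price and q_class >= p_class
                and (q_group == p_group or group_better))
    return no_worse and (q_price < p_price or q_class > p_class or group_better)

def skyline(pref):
    closed = closure(pref)
    return sorted(p for p in PACKAGES
                  if not any(dominates(q, p, closed) for q in PACKAGES))

def consistent_preferences():
    """All distinct preferences over GROUPS that contain no contradiction."""
    pairs = [(x, y) for x in GROUPS for y in GROUPS if x != y]
    found = set()
    for k in range(len(pairs) + 1):
        for subset in combinations(pairs, k):
            closed = closure(subset)
            # Reject sets that imply both x < y and y < x.
            if all((y, x) not in closed for x, y in closed):
                found.add(frozenset(closed))
    return found

lattice = consistent_preferences()
favorable = [pref for pref in lattice if "f" in skyline(pref)]
print(len(lattice), len(favorable))  # 19 10
```

Even with three hotel-groups there are 19 distinct preferences to evaluate, each needing a full skyline computation, and the answer is a list of 10 preference sets rather than a compact description.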

  11. 1. Introduction
      Problem: Given a package, we want to find the preferences (favorable facets) under which this package is a skyline point.
      We can solve the problem by a naive method: Lattice Search.
      Consider package f. We find that whenever the preference contains “T < M” or “H < M”, package f is not a skyline point. We can say that each of “T < M” and “H < M” is a minimal disqualifying condition (MDC).
      [Figure: the preference lattice with the border for f, which separates the nodes whose skyline contains f from the nodes that contain “T < M” or “H < M”.]

  12. 3. Algorithm
      • How to find the MDCs of a point?
      Problem: Given a package, we want to find the minimal conditions under which this package is NOT a skyline point.

  13. 3. Algorithm
      Point q is said to quasi-dominate point p if all attributes of point q are NOT worse than those of point p.
      e.g. Package a quasi-dominates package f because
      1. Package a has a lower (or better) price than package f
      2. Package a has a higher (or better) hotel-class than package f
      If package a quasi-dominates package f, we define Raf as follows: Raf = {T < M}
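A minimal sketch of quasi-dominance and of the condition Rqp follows. It rests on assumptions: the package table is hypothetical, "all attributes" is read as the totally ordered ones (price and hotel-class), as in the slide's example, and Rqp is taken to be the hotel-group preference that would have to hold for q to dominate p (empty when both packages share a hotel-group).

```python
# Hypothetical package table: name -> (price, hotel class, hotel group).
PACKAGES = {
    "a": (1600, 4, "T"), "b": (2400, 1, "T"), "c": (3000, 5, "H"),
    "d": (3600, 4, "H"), "e": (2400, 2, "M"), "f": (3000, 3, "M"),
}

def quasi_dominates(q, p):
    """q is a different package that is no worse than p on price and hotel class."""
    (q_price, q_class, _), (p_price, p_class, _) = PACKAGES[q], PACKAGES[p]
    return q != p and q_price <= p_price and q_class >= p_class

def R(q, p):
    """Preference pairs needed for q to dominate p; (x, y) means x is preferred to y."""
    q_group, p_group = PACKAGES[q][2], PACKAGES[p][2]
    return frozenset() if q_group == p_group else frozenset({(q_group, p_group)})

print(quasi_dominates("a", "f"), set(R("a", "f")))  # True {('T', 'M')}
```

With a single categorical attribute, Rqp holds at most one pair; with several categorical attributes it would hold one pair per attribute on which the two points differ.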

  14. 3. Algorithm
      Problem: Given a package, we want to find the minimal conditions under which this package is NOT a skyline point.
      • Two Algorithms
        • MDC-O: Computing MDC On-the-fly
          • Does not store the MDCs of points
          • Computes the MDC of a given point on-the-fly
        • MDC-M: A Materialization Method
          • Stores the MDCs of all points
      • Indexing Method for Speed-up
        • R*-tree

  15. 3.1 MDC-O: Computing MDC On-the-fly
      Problem: Given a package, we want to find the minimal conditions under which this package is NOT a skyline point.
      • On-the-fly Algorithm
        • Given: data point p
        • Variable: MDC(p), the minimal disqualifying condition
        • Algorithm
          • MDC(p) ← ∅
          • For each data point q which quasi-dominates p
            • if MDC(p) does not contain Rqp, insert Rqp into MDC(p)
          • Return MDC(p)
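The on-the-fly steps above can be sketched as follows. This follows the slide's pseudocode literally and is not the authors' implementation; the package table is hypothetical, the helper names are made up, and `(x, y)` encodes "x is preferred to y". With more than one categorical attribute, a fuller version would presumably also have to discard any condition that is a superset of another in order to keep only minimal ones.

```python
# Hypothetical package table: name -> (price, hotel class, hotel group).
PACKAGES = {
    "a": (1600, 4, "T"), "b": (2400, 1, "T"), "c": (3000, 5, "H"),
    "d": (3600, 4, "H"), "e": (2400, 2, "M"), "f": (3000, 3, "M"),
}

def quasi_dominates(q, p):
    (q_price, q_class, _), (p_price, p_class, _) = PACKAGES[q], PACKAGES[p]
    return q != p and q_price <= p_price and q_class >= p_class

def R(q, p):
    q_group, p_group = PACKAGES[q][2], PACKAGES[p][2]
    return frozenset() if q_group == p_group else frozenset({(q_group, p_group)})

def mdc_on_the_fly(p):
    """Scan all points once and collect the disqualifying conditions of p."""
    mdc = set()                      # MDC(p) <- empty set
    for q in PACKAGES:               # for each q that quasi-dominates p
        if quasi_dominates(q, p):
            condition = R(q, p)
            if condition not in mdc:
                mdc.add(condition)
    return mdc

print(mdc_on_the_fly("f"))
```

For the assumed data, package f gets the two conditions T < M and H < M, and package a gets none, so a stays in the skyline under every preference.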

  16. 3.2 MDC-M: A Materialization Method
      Problem: Given a package, we want to find the minimal conditions under which this package is NOT a skyline point.
      • Materialization Algorithm
        • Variable: MDC(p), the minimal disqualifying condition
        • Algorithm
          • For each data point p
            • MDC(p) ← ∅
            • For each data point q which quasi-dominates p
              • if MDC(p) does not contain Rqp, then insert Rqp into MDC(p)
            • Store MDC(p)
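A sketch of the materialization idea: compute MDC(p) once for every point, store it, and answer later queries by checking whether a customer's preference contains any stored condition. The package table, the helper names, and the query function `is_skyline` are assumptions for illustration; the sketch also assumes no two packages are identical on all attributes.

```python
from itertools import product

# Hypothetical package table: name -> (price, hotel class, hotel group).
PACKAGES = {
    "a": (1600, 4, "T"), "b": (2400, 1, "T"), "c": (3000, 5, "H"),
    "d": (3600, 4, "H"), "e": (2400, 2, "M"), "f": (3000, 3, "M"),
}

def closure(pref):
    """Transitive closure of a set of (preferred, less preferred) pairs."""
    closed = set(pref)
    changed = True
    while changed:
        changed = False
        for (x, y), (u, v) in product(list(closed), repeat=2):
            if y == u and (x, v) not in closed:
                closed.add((x, v))
                changed = True
    return closed

def quasi_dominates(q, p):
    (q_price, q_class, _), (p_price, p_class, _) = PACKAGES[q], PACKAGES[p]
    return q != p and q_price <= p_price and q_class >= p_class

def R(q, p):
    q_group, p_group = PACKAGES[q][2], PACKAGES[p][2]
    return frozenset() if q_group == p_group else frozenset({(q_group, p_group)})

def materialize():
    """Compute and store MDC(p) for every data point p."""
    store = {}
    for p in PACKAGES:
        mdc = set()
        for q in PACKAGES:
            if quasi_dominates(q, p):
                mdc.add(R(q, p))  # set insertion skips conditions already present
        store[p] = mdc
    return store

MDC = materialize()

def is_skyline(p, pref=()):
    """p is a skyline point unless the preference contains one of its stored conditions."""
    closed = closure(pref)
    return not any(condition <= closed for condition in MDC[p])

customers = {
    "Alice": [("T", "M")], "Bob": [], "Chris": [("H", "M")],
    "David": [("H", "M"), ("M", "T")], "Emily": [("H", "T"), ("T", "M")],
    "Fred": [("M", "T")],
}
print([name for name, pref in customers.items() if is_skyline("f", pref)])  # ['Bob', 'Fred']
```

The trade-off is storage for speed: a query touches only the stored conditions of one point instead of scanning the whole dataset.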

  17. 4. Empirical Study
      • Datasets
        • Synthetic Dataset
        • Real Datasets (from UCI)
          • Nursery Dataset
          • Automobile Dataset
      • Default Values (Synthetic)
        • No. of tuples = 500K
        • No. of numeric dimensions = 3
        • No. of categorical dimensions = 1
        • No. of values in a nominal dimension = 20

  18. 4. Empirical Study
      • Without indexing
        • MDC-O: slowest search time
        • MDC-M: faster search time
        • Storage of MDC: 8MB
      • With indexing
        • MDC-O and MDC-M: fast search time

  19. 4. Empirical Study
      • Automobile
        • Three car models
          • A salesperson should NOT promote this car to a customer who prefers Toyota to Honda.
          • A salesperson should NOT promote this car to a customer who prefers Toyota to Honda.
          • A salesperson should promote this car to ANY customer.

  20. 5. Conclusion • Skyline • Favorable Facets • Minimal Disqualifying Condition • Algorithm • On-the-fly • Materialization • Empirical Study
