Mining Favorable Facets Presenter : Wei-Hao Huang Authors : Raymond Chi-Wing Wong, Jian Pei, Ada Wai-Chee Fu, Ke Wang SIGKDD, 2008
Outline • Motivation • Objectives • Methodology • Experiments • Conclusions • Comments
Motivation • Dominance and skyline analysis are important in multi-criteria decision-making applications. • Existing analysis assumes a fixed order on each attribute, but different customers may have different preferences on nominal attributes. • Goal: finding favorable facets.
Objectives • Propose the minimal disqualifying condition (MDC), which summarizes favorable facets and is meaningful to the user. • Develop two algorithms: • Computing MDC On-the-fly (MDC-O) • A Materialization Method (MDC-M) • Use real data sets and a synthetic data set to verify effectiveness and efficiency.
Methodology • Skyline analysis • Naïve Method • Minimal Disqualifying Conditions (MDC) • MDC On-the-fly (MDC-O) • A Materialization Method (MDC-M)
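To make the skyline step concrete, here is a minimal sketch of a brute-force skyline over points that mix numeric attributes with one nominal attribute. It is not taken from the slides: the point names, the values, the convention that smaller numeric values are better, and the encoding of a preference as a pair `(better, worse)` are all assumptions made for illustration.

```python
# Hypothetical data: name -> (numeric attributes, nominal attribute).
# Assumption: smaller numeric values are better.
points = {
    "a": ((1000, 2), "T"),
    "b": ((1200, 1), "H"),
    "c": ((2000, 3), "T"),
    "f": ((1500, 2), "M"),
}

def dominates(q, p, prefs):
    """True if q dominates p, given preference pairs (better, worse)."""
    q_num, q_nom = q
    p_num, p_nom = p
    # q must be no worse than p on every numeric attribute.
    if any(qv > pv for qv, pv in zip(q_num, p_num)):
        return False
    if q_nom == p_nom:
        # Same nominal value: q must be strictly better somewhere numeric.
        return q_num != p_num
    # Different nominal values: q wins only if its value is preferred.
    return (q_nom, p_nom) in prefs

def skyline(points, prefs):
    """Names of the points that no other point dominates."""
    return sorted(
        name for name, p in points.items()
        if not any(dominates(q, p, prefs) for q in points.values())
    )

no_prefs = skyline(points, set())          # no order on the nominal attribute
with_pref = skyline(points, {("T", "M")})  # a customer who prefers T over M
```

With no preference on the nominal attribute, `f` stays in the skyline; once a customer prefers `T` over `M`, point `a` dominates `f` and it drops out. This is the effect of differing customer preferences that the motivation slide describes.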
Minimal Disqualifying Conditions • Used to summarize favorable facets effectively. • Example: R' = {(T, M)}, R'' = {(H, M)}, MDC(f) = {(T, M), (H, M)}
MDC-O: Computing MDC On-the-fly • Input: point P, data set D, template R • Output: MDC(P)
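The input/output shape of this slide (point P, data set D, template R, result MDC(P)) can be sketched as a single scan over the data. This is a simplified illustration, not the authors' MDC-O algorithm: it assumes one nominal attribute, so every disqualifying condition is a single preference pair and is trivially minimal. It also assumes smaller numeric values are better and that P is in the skyline under the template. The data and the function name `mdc_on_the_fly` are hypothetical.

```python
# Hypothetical data: name -> (numeric attributes, nominal attribute).
D = {
    "a": ((1000, 2), "T"),
    "b": ((1200, 1), "H"),
    "c": ((2000, 3), "T"),
    "f": ((1500, 2), "M"),
}

def mdc_on_the_fly(p_name, data, template=frozenset()):
    """Collect the preference pairs (better, worse) that would disqualify p.

    Simplification: with a single nominal attribute, each condition is one
    pair, so no subset-minimality check is needed.
    """
    p_num, p_nom = data[p_name]
    conditions = set()
    for q_name, (q_num, q_nom) in data.items():
        if q_name == p_name:
            continue
        # q must be no worse than p on every numeric attribute.
        if any(qv > pv for qv, pv in zip(q_num, p_num)):
            continue
        # With the same nominal value, no preference pair changes the outcome.
        if q_nom == p_nom:
            continue
        # Skip pairs that would contradict the template order.
        if (p_nom, q_nom) in template:
            continue
        conditions.add((q_nom, p_nom))
    return conditions

mdc_f = mdc_on_the_fly("f", D)
mdc_a = mdc_on_the_fly("a", D)
```

On this toy data `mdc_f` contains the pairs `("T", "M")` and `("H", "M")`, the same shape as the MDC(f) example on the previous slide, while `mdc_a` is empty because no other point is at least as good as `a` on the numeric attributes.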
MDC-M: A Materialization Method • Input: data set D, template R • Results: SKY(R), MDC
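One way to read the materialization idea is that the MDCs of the points in SKY(R) are computed once and stored, after which the skyline for any customer's refinement can be answered by lookup alone. The sketch below illustrates only that lookup step. The table contents are hypothetical, and the representation (each condition as a `frozenset` of `(better, worse)` pairs) and the function name `skyline_for` are assumptions. It also assumes the refinement passed in is already transitively closed.

```python
# Hypothetical materialized table: skyline point -> its minimal
# disqualifying conditions, each a frozenset of (better, worse) pairs.
MDC_TABLE = {
    "a": set(),
    "b": set(),
    "f": {frozenset({("T", "M")}), frozenset({("H", "M")})},
}

def skyline_for(refinement, mdc_table):
    """Points of SKY(R) that survive a given refinement of the template.

    A point is disqualified as soon as the refinement contains every pair
    of at least one of its stored conditions.
    """
    refinement = set(refinement)
    return sorted(
        name for name, conditions in mdc_table.items()
        if not any(cond <= refinement for cond in conditions)
    )

base = skyline_for(set(), MDC_TABLE)
prefers_h = skyline_for({("H", "M")}, MDC_TABLE)
```

Only points in SKY(R) need to be stored: a refinement only adds preferences, so a point dominated under the template stays dominated under every refinement.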
Indexing for Speed-up • Use an R-tree index structure. • An R-tree can be built on the totally ordered attributes T. • To find the points that quasi-dominate p, a range search is conducted on the R-tree.
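The range search can be illustrated without an R-tree library. The sketch below swaps in a much simpler index, a list sorted on the first totally ordered attribute and searched with `bisect`, purely to show the idea of narrowing the candidates before checking them. It assumes that "quasi-dominates" means "no worse on every totally ordered attribute" and that smaller values are better. The data and function names are hypothetical.

```python
import bisect

# Hypothetical data: (name, totally ordered attributes).
points = [
    ("a", (1000, 2)),
    ("b", (1200, 1)),
    ("c", (2000, 3)),
    ("f", (1500, 2)),
]

def build_index(points):
    """Sort by the first attribute; a stand-in for building an R-tree."""
    ordered = sorted(points, key=lambda pt: pt[1][0])
    keys = [pt[1][0] for pt in ordered]
    return keys, ordered

def quasi_dominators(p, index):
    """Names of points that are no worse than p on every ordered attribute."""
    keys, ordered = index
    # Range step: only points whose first attribute is <= p's can qualify.
    upper = bisect.bisect_right(keys, p[1][0])
    return [
        q[0] for q in ordered[:upper]
        if q[0] != p[0] and all(qv <= pv for qv, pv in zip(q[1], p[1]))
    ]

index = build_index(points)
candidates = quasi_dominators(("f", (1500, 2)), index)
```

A real R-tree prunes on all totally ordered attributes at once rather than on the first one only, so this stand-in shows the access pattern but not the pruning power.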
Experiments • Synthetic data set, varying: • Dimension (numeric attributes, nominal attributes) • Tuples • Template size • Cardinality of nominal attributes • Zipfian parameter • Real data sets: • Nursery • Automobile
Synthetic Data Set: Tuples • 500k -> 1000k
Real Data Sets • Nursery Data Set • There are 12,960 instances and 8 attributes. • The performance results are similar to those on the synthetic data sets. • Automobile Data Set • Computation times were negligibly small. • Honda, Mitsubishi and Toyota.
Conclusions • MDC is effective in summarizing the favorable facets. • The experimental results show that the proposed methods are effective. • Extending the approach to dynamic data and dynamic orderings is an interesting topic for future work.
Comments • Advantages • Addresses the problem of finding favorable facets, which has not been studied before. • The mining is effective and efficient. • Applications • Information retrieval