Department of Information & Computer Education, NTNU. Mining Association Rules from Stars. Eric Ka Ka Ng, Ada Wai-Chee Fu, and Ke Wang, 2002 IEEE International Conference on Data Mining (ICDM'02) , December 09 - 12 2002, Maebashi City, Japan. Advisor : Jia-Ling Koh Speaker : Chen-Yi Lin.
Outline
• Introduction
• Problem Definition
• The Proposed Method
• Experimental Results
• Conclusions
Introduction
• In real life, a database is typically made up of multiple tables, and one important case is where some of the tables form a star schema.
[Figure: star schema; labels: Dimension table, Fact table (FT)]
Problem Definition (1/2)
• Each dimension table contains a primary key (tid), some other attributes, and no foreign keys.
• The attributes in the dimension tables are unique.
• The attributes take categorical values.
• The fact table (FT) stores the tids from the dimension tables as foreign keys.
Problem Definition (2/2)
[Figure: dimension table and its binary representation; labels: tid, categorical value]
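The binary representation named in the caption above can be sketched as follows. This is only an illustration under assumptions: the table contents, the attribute names (`color`, `size`) and the helper `to_binary` are invented here and do not come from the slides. Each distinct attribute/value pair becomes one binary column (an item).

```python
# Hypothetical dimension table: tid -> {attribute: categorical value}.
# The attribute names and values are made up for illustration.
dimension_a = {
    "a1": {"color": "red", "size": "S"},
    "a2": {"color": "blue", "size": "S"},
    "a3": {"color": "red", "size": "L"},
}


def to_binary(table):
    """Turn a categorical table into binary rows, one column per (attribute, value)."""
    # Every distinct (attribute, value) pair becomes one item / binary column.
    items = sorted({(attr, val) for row in table.values() for attr, val in row.items()})
    # A cell is 1 when the row takes that value for that attribute, else 0.
    rows = {
        tid: [int(row.get(attr) == val) for attr, val in items]
        for tid, row in table.items()
    }
    return items, rows


items, rows = to_binary(dimension_a)
for tid in sorted(rows):
    print(tid, rows[tid])
```

Sorting the items only fixes a stable column order; any consistent order would serve.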
The Proposed Method (1/8)
• tid_list is an ordered list of elements of the form tid(count).
• [Three notation examples followed on the slide, each introduced by "e.g."; the notation is not preserved.]
The Proposed Method (2/8)
• Minsup = 5
[Figure: example tid_lists; labels: count=6, count=5]
• Hence the itemset is frequent.
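A minimal sketch of how tid_lists with counts could be used to test an itemset against Minsup. Two assumptions are made here that the slides do not spell out: the count attached to a tid is the number of fact-table rows that reference it, and the support of an itemset drawn from one dimension table is the sum of counts over the tids shared by all of its items. The data and the helpers `tid_lists` and `support` are invented for illustration.

```python
from collections import Counter

# Hypothetical fact table: each row holds one tid from A and one tid from B.
fact_table = [("a1", "b1"), ("a1", "b2"), ("a2", "b1"),
              ("a3", "b1"), ("a1", "b1"), ("a3", "b2")]
# Hypothetical dimension table A in item form: tid -> set of items.
dimension_a = {"a1": {"x", "y"}, "a2": {"x"}, "a3": {"y", "z"}}


def tid_lists(dimension, fact_column):
    """Build item -> ordered list of (tid, count) pairs."""
    counts = Counter(fact_column)  # assumed meaning of count: occurrences in FT
    lists = {}
    for tid in sorted(dimension):  # sorted tids keep every list ordered
        for item in dimension[tid]:
            lists.setdefault(item, []).append((tid, counts[tid]))
    return lists


def support(itemset, lists):
    """Sum the counts of the tids that appear in every item's tid_list."""
    per_item = [dict(lists[item]) for item in itemset]
    common = set(per_item[0]).intersection(*per_item[1:])
    return sum(per_item[0][tid] for tid in common)


lists_a = tid_lists(dimension_a, [a for a, _ in fact_table])
minsup = 5
print(lists_a["y"], support(["y"], lists_a) >= minsup)
print(support(["x", "y"], lists_a) >= minsup)
```

With this toy data, item `y` reaches the threshold while the pair `x`, `y` does not, and neither check requires joining the fact table with the dimension table.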
The Proposed Method (3/8)
• Binding multiple dimension tables:
• (1) Assign a new tid to each combination of a tid from A and a tid from B that occurs in FT.
• (2) Set the tids in the tid_lists for items in AB to the corresponding new tids.
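The two binding steps can be sketched roughly as below. This is an illustration under assumptions, not the authors' implementation: the function `bind`, the integer numbering of new tids and the sample data are all hypothetical, and each new tid is assumed to carry the number of fact-table rows with that combination as its count.

```python
from collections import Counter

# Hypothetical fact table rows (tid from A, tid from B).
fact_pairs = [("a1", "b1"), ("a1", "b2"), ("a2", "b1"),
              ("a3", "b1"), ("a1", "b1"), ("a3", "b2")]
# Hypothetical items per table: item -> tids of the rows that contain it.
items_a = {"x": ["a1", "a2"], "y": ["a1", "a3"]}
items_b = {"p": ["b1"], "q": ["b1", "b2"]}


def bind(fact_pairs, items_a, items_b):
    """Step (1): number each (tidA, tidB) combination; step (2): remap tid_lists."""
    combo_counts = Counter(fact_pairs)
    combos = sorted(combo_counts)
    new_tid = {combo: i for i, combo in enumerate(combos)}  # step (1)
    bound = {}
    # Step (2): an item keeps every new tid whose combination contains one of
    # its old tids (position 0 for table A, position 1 for table B).
    for position, table in ((0, items_a), (1, items_b)):
        for item, tids in table.items():
            old = set(tids)
            bound[item] = [(new_tid[c], combo_counts[c])
                           for c in combos if c[position] in old]
    return new_tid, bound


new_tid, bound = bind(fact_pairs, items_a, items_b)
# Support of an itemset spanning A and B: intersect the rebound tid_lists.
x, p = dict(bound["x"]), dict(bound["p"])
support_xp = sum(x[t] for t in x if t in p)
print(bound["x"], bound["p"], support_xp)
```

After binding, items from A and B share one tid space, so a cross-table itemset such as `x`, `p` is handled by the same intersection used within a single table.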
The Proposed Method (4/8)
[Figure: an example of "binding" order; labels: the set of frequent itemsets with items from tables A; the set of frequent itemsets with items from tables A and/or B]
The Proposed Method (5/8)
[Figure: panels (1) and (2)]
The Proposed Method (6/8)
• The fact table FT is scanned once and the information is stored in a data structure.
• Prefix tree: each node has a label (a tid) and a counter.
The Proposed Method (7/8)
[Figure: prefix tree structure; node labels: tid, counter]
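A minimal sketch of such a prefix tree, built in a single pass over a fact table. The class `Node`, the function `build_prefix_tree`, the sample rows and the choice to count all rows at the root are assumptions made for illustration; the slides state only that each node has a tid label and a counter.

```python
class Node:
    """One prefix-tree node: a tid label, a counter, and child nodes."""

    def __init__(self, tid):
        self.tid = tid
        self.count = 0
        self.children = {}


def build_prefix_tree(fact_table):
    """Scan the fact table once, sharing a path between rows with equal prefixes."""
    root = Node(None)  # the root has no tid; its counter is the number of rows
    for row in fact_table:
        node = root
        node.count += 1
        for tid in row:  # one tree level per dimension table
            child = node.children.get(tid)
            if child is None:
                child = node.children[tid] = Node(tid)
            child.count += 1
            node = child
    return root


# Hypothetical fact table rows (tid from A, tid from B).
fact_table = [("a1", "b1"), ("a1", "b2"), ("a2", "b1"),
              ("a3", "b1"), ("a1", "b1"), ("a3", "b2")]
root = build_prefix_tree(fact_table)
print(root.children["a1"].count, root.children["a1"].children["b1"].count)
```

A node's counter then gives the number of fact-table rows that share the prefix leading to it, which is the information the tid(count) elements need.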
The Proposed Method (8/8)
[Figure: collapsing the prefix tree]
Experimental Results (1/5)
• All experiments were conducted on a SUN Ultra-Enterprise Generic_106541-18 with SunOS 5.7 and 8192 MB of main memory.
• Programs were written in C++.
Experimental Results (2/5)
• In the first dataset, items in A and B are strongly related, so that frequent itemsets contain items across A and B, while items in C are not involved.
• In the second dataset, items in A, B and C are all strongly related, so that maximal frequent itemsets always contain items from all of A, B and C.
Experimental Results (3/5)
• masl: implements tid_list as a linked list structure.
• masb: implements tid_list as a fixed-size bitmap and an array of counts.
• fpt: the join-before-mine approach with the FP-tree algorithm [HPY00].
[Figure: running time for (A, B) related and (A, B, C) related datasets]
Experimental Results (4/5)
• Mixture datasets:
• 10% of transactions contain frequent itemsets from only A, only B, and only C, respectively (10% each).
• 15% contain frequent itemsets from AB, BC, and AC, respectively (15% each).
• 10% contain frequent itemsets from ABC.
• 15% are random noise.
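As a quick arithmetic check of the mixture, reading "respectively" as one share per listed group gives 3 × 10% + 3 × 15% + 10% + 15% = 100%. The sketch below confirms this and splits a transaction count accordingly; the names `mixture` and `allocate` and the count of 1000 transactions are hypothetical and not taken from the paper.

```python
# Share of transactions per group, in percent, as listed above.
mixture = {
    "A": 10, "B": 10, "C": 10,     # frequent itemsets from a single table
    "AB": 15, "BC": 15, "AC": 15,  # frequent itemsets across two tables
    "ABC": 10,                     # frequent itemsets across all three tables
    "noise": 15,                   # random noise
}


def allocate(n_transactions):
    """Split a transaction count by the mixture shares (integer division)."""
    return {group: n_transactions * pct // 100 for group, pct in mixture.items()}


total_percent = sum(mixture.values())
sizes = allocate(1000)  # 1000 is an arbitrary illustrative size
print(total_percent, sizes)
```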
Experimental Results (5/5)
[Figure: running time for mixture datasets]
Conclusions
• The paper proposes a new algorithm for mining association rules on a star schema without performing the natural join.
• The proposed method can be generalized to a snowflake structure.