Inverted Matrix: Efficient Discovery of Frequent Items in Large Datasets in the Context of Interactive Mining
KDD 2003
Mohammad El-Hajj, Osmar R. Zaïane
Department of Computing Science, University of Alberta, Canada
Outline
• Introduction
• Pre-Processing Phase: Transactional Layouts
• Mining Phase: Building COFI-trees, Mining COFI-trees
• Experimental Studies
• Conclusion and Future Work
Association Rule Mining
Association rule mining is crucial in many applications and plays an essential role in many important mining tasks.
It proceeds in two steps:
1. Frequent Itemset Mining (FIM)
2. Association Rules Generation
A rule has the form Antecedent → Consequent (also called Body → Head).
Challenges for FIM
1. High memory dependency
2. Repetitive tasks, (I/O) readings (superfluous processing)
3. Non-interactive mining
Existing approaches need either an expensive candidacy generation step or huge memory-based data structures.
Example: the frequent 1-itemsets depend on the support threshold.
Support > 4: frequent 1-itemsets {A, B, C, D, E, F}; non-frequent items {G, H, I, J, K, L, M, N, O, P, Q, R}
Support > 9: frequent 1-itemsets {A, B, C}; non-frequent items {D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R}
Changing the support level means expensive steps (the whole process is redone).
[Figure: knowledge discovery process: Databases, Data warehouse, Selection and Transformation, Data Mining, Patterns, Evaluation and Presentation, Knowledge]
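The effect of the threshold on the frequent 1-itemsets can be reproduced with a short Python sketch. It assumes the 18 example transactions (T1 to T18) used later in this deck, encoded as strings of single-letter items; the name `frequent_items` is hypothetical and not taken from the slides.

```python
from collections import Counter

# The 18 example transactions (T1..T18) from the deck, one letter per item.
TRANSACTIONS = ("AGDCB BCHED BDEAM CEFAN ABNOP ACQRG ACHIG LEFKB AFMNO "
                "CFPJR ADBHI DEBKL MDCGO CFPQJ BDEFI JEBAD AKEFC CDLBA").split()


def frequent_items(transactions, min_support):
    """Return the items that occur in more than `min_support` transactions."""
    counts = Counter(item for t in transactions for item in set(t))
    return sorted(item for item, n in counts.items() if n > min_support)


print(frequent_items(TRANSACTIONS, 4))  # items with support > 4
print(frequent_items(TRANSACTIONS, 9))  # items with support > 9
```

In this naive form every change of threshold rescans all transactions, which is the kind of repeated work the slides call superfluous processing.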
Motivation
• A new association rule mining algorithm with the following features:
1. Low memory dependency
2. Removes superfluous processing
3. Interactive mining ready
Without compromising scalability.
Transactional Layouts
• Horizontal Layout
Advantage: candidacy generation can be removed (FP-Growth).
Drawback: superfluous processing.
Transactional Layouts
• Vertical Layout
Advantage: minimizes superfluous processing.
Drawback: candidacy generation is required.
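The difference between the two layouts can be illustrated with a minimal Python sketch. The horizontal layout maps each transaction ID to its items, and the vertical layout maps each item to the IDs of the transactions that contain it. The three transactions are the first three of the deck's example; the function name `to_vertical` is an assumption made for illustration.

```python
from collections import defaultdict

# Horizontal layout: transaction ID -> items in that transaction.
horizontal = {
    "T1": ["A", "G", "D", "C", "B"],
    "T2": ["B", "C", "H", "E", "D"],
    "T3": ["B", "D", "E", "A", "M"],
}


def to_vertical(layout):
    """Convert a horizontal layout into a vertical one (item -> list of IDs)."""
    vertical = defaultdict(list)
    for tid, items in layout.items():
        for item in items:
            vertical[item].append(tid)
    return dict(vertical)


vertical = to_vertical(horizontal)
print(vertical["B"])  # IDs of the transactions that contain B
```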
Suggested Layout
• Inverted Matrix Layout: combines the horizontal and vertical layouts
• Built in 2 I/O passes
Transactional Layouts
• Inverted Matrix Layout
Pass 1 generates the sorted item list (based on frequency).
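Pass 1 can be sketched in Python as a frequency count followed by a sort. The sketch assumes the deck's 18 example transactions and orders items by ascending frequency. The tie-breaking rule (reverse alphabetical) is an assumption inferred from the example index, where C is listed before B although both occur 10 times. The name `sorted_item_list` is hypothetical.

```python
from collections import Counter

# The 18 example transactions (T1..T18) from the deck, one letter per item.
TRANSACTIONS = ("AGDCB BCHED BDEAM CEFAN ABNOP ACQRG ACHIG LEFKB AFMNO "
                "CFPJR ADBHI DEBKL MDCGO CFPQJ BDEFI JEBAD AKEFC CDLBA").split()


def sorted_item_list(transactions):
    """Return (location, item, frequency) triples in ascending frequency order."""
    counts = Counter(item for t in transactions for item in set(t))
    # Sort by descending frequency with alphabetical ties, then reverse.
    # The tie-break is an assumption chosen to match the example index.
    ordered = sorted(counts, key=lambda item: (-counts[item], item))[::-1]
    return [(loc, item, counts[item]) for loc, item in enumerate(ordered, start=1)]


index = sorted_item_list(TRANSACTIONS)
for loc, item, freq in index:
    print(loc, item, freq)
```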
Transactional Layouts
• Inverted Matrix Layout
Pass 2 generates the transactional array of the IM.

Transactions:
T#   Items
T1   A G D C B
T2   B C H E D
T3   B D E A M
T4   C E F A N
T5   A B N O P
T6   A C Q R G
T7   A C H I G
T8   L E F K B
T9   A F M N O
T10  C F P J R
T11  A D B H I
T12  D E B K L
T13  M D C G O
T14  C F P Q J
T15  B D E F I
T16  J E B A D
T17  A K E F C
T18  C D L B A

Index and transactional array (columns 1 to 11) after inserting T1:
Loc  Item  Freq  Transactional array
1    R     2
2    Q     2
3    P     3
4    O     3
5    N     3
6    M     3
7    L     3
8    K     3
9    J     3
10   I     3
11   H     3
12   G     4     (15,1)
13   F     7
14   E     8
15   D     9     (16,1)
16   C     10    (17,1)
17   B     10    (18,1)
18   A     11    (¤,¤)
Transactional Layouts
• Inverted Matrix Layout
Index and transactional array after inserting T1 and T2 (rows 1 to 10 are still empty):
Loc  Item  Freq  Transactional array
11   H     3     (14,1)
12   G     4     (15,1)
13   F     7
14   E     8     (15,2)
15   D     9     (16,1) (16,2)
16   C     10    (17,1) (17,2)
17   B     10    (18,1) (¤,¤)
18   A     11    (¤,¤)
Transactional Layouts
• Inverted Matrix Layout
Complete index and transactional array (columns 1 to 11):
Loc  Item  Freq  Transactional array
1    R     2     (2,1) (3,2)
2    Q     2     (12,2) (3,3)
3    P     3     (4,1) (9,1) (9,2)
4    O     3     (5,2) (5,3) (6,3)
5    N     3     (13,1) (17,4) (6,2)
6    M     3     (14,2) (13,3) (12,4)
7    L     3     (8,1) (8,2) (15,9)
8    K     3     (13,2) (14,5) (13,7)
9    J     3     (13,4) (13,5) (14,7)
10   I     3     (11,2) (11,3) (13,6)
11   H     3     (14,1) (12,3) (15,4)
12   G     4     (15,1) (16,4) (16,5) (15,6)
13   F     7     (14,3) (14,4) (18,7) (16,6) (16,8) (14,6) (14,8)
14   E     8     (15,2) (15,3) (16,3) (17,5) (15,5) (15,7) (15,8) (16,9)
15   D     9     (16,1) (16,2) (17,3) (17,6) (17,7) (16,7) (17,8) (17,9) (16,10)
16   C     10    (17,1) (17,2) (18,3) (18,5) (18,6) (¤,¤) (¤,¤) (¤,¤) (18,10) (17,10)
17   B     10    (18,1) (¤,¤) (18,2) (18,4) (¤,¤) (18,8) (¤,¤) (¤,¤) (18,9) (18,11)
18   A     11    (¤,¤) (¤,¤) (¤,¤) (¤,¤) (¤,¤) (¤,¤) (¤,¤) (¤,¤) (¤,¤) (¤,¤) (¤,¤)
There is no minimum support involved in building the Inverted Matrix.
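The construction of the transactional array can be sketched in Python as follows. Each transaction is sorted by index location, and every item stores a pointer (row, column) to the next item of the same transaction; the last item stores a null pointer, written `None` here in place of (¤,¤). This is a minimal in-memory sketch under stated assumptions, not the authors' disk-based implementation: it assumes no item repeats within a transaction, it reuses the reverse-alphabetical tie-break inferred from the example index, and the name `build_inverted_matrix` is hypothetical.

```python
from collections import Counter

# The 18 example transactions (T1..T18) from the deck, one letter per item.
TRANSACTIONS = ("AGDCB BCHED BDEAM CEFAN ABNOP ACQRG ACHIG LEFKB AFMNO "
                "CFPJR ADBHI DEBKL MDCGO CFPQJ BDEFI JEBAD AKEFC CDLBA").split()


def build_inverted_matrix(transactions):
    """Return (loc, rows): item -> index location, item -> list of pointers."""
    # Pass 1: ascending-frequency index (tie-break is an assumption).
    counts = Counter(item for t in transactions for item in t)
    ordered = sorted(counts, key=lambda item: (-counts[item], item))[::-1]
    loc = {item: n for n, item in enumerate(ordered, start=1)}

    # Pass 2: append one entry per item occurrence to that item's row.
    rows = {item: [] for item in ordered}
    for t in transactions:
        items = sorted(t, key=loc.get)
        # Column that each item of this transaction is about to occupy.
        cols = [len(rows[item]) + 1 for item in items]
        for k, item in enumerate(items):
            if k + 1 < len(items):
                rows[item].append((loc[items[k + 1]], cols[k + 1]))
            else:
                rows[item].append(None)  # end of transaction, shown as (¤,¤)
    return loc, rows


loc, rows = build_inverted_matrix(TRANSACTIONS)
print(loc["G"], rows["G"])
```

No support threshold appears anywhere in the function, which matches the slide's remark that the Inverted Matrix is built independently of the minimum support.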
Transactional Layouts
• Inverted Matrix Layout
Support > 4: the border support separates the frequent items (F, E, D, C, B, A at locations 13 to 18) from the rest of the index.
Sub-transactions generated from the IM
• Frequent sub-transactions with item E
• Frequent sub-transactions with item F
• Frequent sub-transactions with item D
• Frequent sub-transactions with item C
• Frequent sub-transactions with item B
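One way to read sub-transactions out of the Inverted Matrix is to start at each entry of an item's row and follow the pointers until a null entry is reached. The Python sketch below does this for the deck's example data. It is a sketch under stated assumptions rather than the authors' implementation: the matrix is held in memory, the tie-break in the index is inferred from the example, and the names `build_inverted_matrix` and `sub_transactions` are hypothetical. Because pointers always lead to rows further down the index, a chain that starts at a frequent item only visits items that are at least as frequent.

```python
from collections import Counter

# The 18 example transactions (T1..T18) from the deck, one letter per item.
TRANSACTIONS = ("AGDCB BCHED BDEAM CEFAN ABNOP ACQRG ACHIG LEFKB AFMNO "
                "CFPJR ADBHI DEBKL MDCGO CFPQJ BDEFI JEBAD AKEFC CDLBA").split()


def build_inverted_matrix(transactions):
    """Return (ordered, rows): items by index location, item -> pointer list."""
    counts = Counter(item for t in transactions for item in t)
    ordered = sorted(counts, key=lambda item: (-counts[item], item))[::-1]
    loc = {item: n for n, item in enumerate(ordered, start=1)}
    rows = {item: [] for item in ordered}
    for t in transactions:
        items = sorted(t, key=loc.get)
        cols = [len(rows[item]) + 1 for item in items]
        for k, item in enumerate(items):
            last = k + 1 == len(items)
            rows[item].append(None if last else (loc[items[k + 1]], cols[k + 1]))
    return ordered, rows


def sub_transactions(ordered, rows, item):
    """Follow every pointer chain that starts in the row of `item`."""
    result = []
    for pointer in rows[item]:
        chain = [item]
        while pointer is not None:
            row, col = pointer
            nxt = ordered[row - 1]        # locations are 1-based
            chain.append(nxt)
            pointer = rows[nxt][col - 1]  # columns are 1-based
        result.append("".join(chain))
    return result


ordered, rows = build_inverted_matrix(TRANSACTIONS)
print(sub_transactions(ordered, rows, "E"))
```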
Co-Occurrences Frequent Item tree (COFI-tree)
Building the F-COFI-tree
Each node carries a frequency count and a participation count.
[Figure sequence: step-by-step construction of the F-COFI-tree]
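Since the construction figures are not reproduced here, the following Python sketch shows only the general idea: a prefix tree rooted at F whose nodes hold the two counters named on the slide. Several points are assumptions rather than facts from the deck: the F sub-transactions were derived from the example transactions with Support > 4, items are inserted in index order, the participation count is assumed to start at 0 and to be updated only during mining, and any local pruning or reordering the authors may apply is omitted. The names `Node` and `build_cofi_tree` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    item: str
    frequency: int = 0      # number of sub-transactions passing through the node
    participation: int = 0  # assumed to start at 0; mining is not shown here
    children: dict = field(default_factory=dict)


def build_cofi_tree(root_item, sub_transactions):
    """Insert each sub-transaction as a path below the root item."""
    root = Node(root_item)
    for sub in sub_transactions:
        root.frequency += 1
        node = root
        for item in sub[1:]:  # sub[0] is the root item itself
            node = node.children.setdefault(item, Node(item))
            node.frequency += 1
    return root


# Sub-transactions containing F, restricted to items more frequent than F
# (derived from the deck's example transactions; an assumption of this sketch).
F_SUBS = ["FECA", "FEB", "FA", "FC", "FC", "FEDB", "FECA"]

tree = build_cofi_tree("F", F_SUBS)
print(tree.frequency, sorted(tree.children))
```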
Mining COFI-trees: E-COFI-tree
Support = Frequency count - Participation count
[Figure sequence: step-by-step mining of the E-COFI-tree]
Mining COFI-trees: frequent patterns found
• D-COFI-tree: DBA:5, DA:5, DB:8
• B-COFI-tree: BA:6
• C-COFI-tree: CA:6
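The supports listed above can be cross-checked against the example transactions by direct counting. The Python sketch below is a brute-force check, not the COFI-tree mining procedure; the function name `support` is an assumption made for illustration.

```python
# The 18 example transactions (T1..T18) from the deck, one letter per item.
TRANSACTIONS = ("AGDCB BCHED BDEAM CEFAN ABNOP ACQRG ACHIG LEFKB AFMNO "
                "CFPJR ADBHI DEBKL MDCGO CFPQJ BDEFI JEBAD AKEFC CDLBA").split()


def support(pattern):
    """Count the transactions that contain every item of `pattern`."""
    return sum(1 for t in TRANSACTIONS if set(pattern) <= set(t))


for pattern in ["DBA", "DA", "DB", "BA", "CA"]:
    print(pattern, support(pattern))
```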
Experimental Studies
[Figure: time needed to mine 1M transactions with different support levels]
Platform: Pentium 700 MHz with 256 MB of RAM
Experimental Studies
[Figure: accumulated time needed to mine 1M transactions using 4 different support levels]
[Figure: time needed in seconds to mine different transaction sizes]
Platform: Pentium 700 MHz with 256 MB of RAM
Conclusion and Future Work
New AR algorithm
• Low memory dependency
• No superfluous processing
• Interactive mining ready
• Scalable
Future work
• Updateable Inverted Matrix for native storage of transactions
• Compressing the size of the Inverted Matrix
• Parallelizing the mining process as well as the construction of the Inverted Matrix