Mohammad El-Hajj

Inverted Matrix: Efficient Discovery of Frequent Items in Large Datasets in the Context of Interactive Mining
KDD 2003
Mohammad El-Hajj, Osmar R. Zaïane
Department of Computing Science, University of Alberta, Canada

Introduction Pre-processing Mining Phase Experiments Conclusion
Outline
• Introduction
• Pre-Processing Phase
  Transactional Layouts
• Mining Phase
  Building COFI-trees
  Mining COFI-trees
• Experimental Studies
• Conclusion and Future Work

Association Rule Mining
Association rule mining is crucial in many applications and plays an essential role in many important mining tasks. It has two steps:
1. Frequent Itemset Mining (FIM)
2. Association Rules Generation
A rule has the form Antecedent → Consequent (Body → Head).

Challenges for FIM
1. High memory dependency
2. Repetitive tasks and I/O reads (superfluous processing)
3. Non-interactive mining
Existing approaches need either an expensive candidate generation step or huge memory-based data structures.

Challenges for FIM: Support > 4
Frequent 1-itemsets: {A, B, C, D, E, F}
Non-frequent items: {G, H, I, J, K, L, M, N, O, P, Q, R}

Challenges for FIM: Support > 9
Frequent 1-itemsets: {A, B, C}
Non-frequent items: {D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R}
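The two support levels above can be reproduced with a few lines of Python. This is only a sketch: the frequencies are copied from the index of the example dataset shown later in this presentation, and the function name `frequent_items` is made up for illustration.

```python
# Item frequencies copied from the example index in this presentation.
FREQ = {"A": 11, "B": 10, "C": 10, "D": 9, "E": 8, "F": 7, "G": 4,
        "H": 3, "I": 3, "J": 3, "K": 3, "L": 3, "M": 3, "N": 3,
        "O": 3, "P": 3, "Q": 2, "R": 2}

def frequent_items(freq, min_support):
    """Return the items whose frequency is strictly greater than min_support."""
    return sorted(item for item, count in freq.items() if count > min_support)

print(frequent_items(FREQ, 4))  # ['A', 'B', 'C', 'D', 'E', 'F']
print(frequent_items(FREQ, 9))  # ['A', 'B', 'C']
```

Raising the threshold from 4 to 9 changes the set of frequent items, which is why an approach that rebuilds its data structures per support level has to start over.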

Challenges for FIM: Non-interactive mining
[Figure: knowledge discovery process: Databases, Data warehouse, Selection and Transformation, Data Mining, Patterns, Evaluation and Presentation, Knowledge]
Changing the support level means expensive steps: the whole process is redone.

Motivation
• A new association rule mining algorithm with the following features:
1. Low memory dependency
2. No superfluous processing
3. Ready for interactive mining
All without compromising scalability.

Transactional Layouts
• Horizontal Layout
  Advantage: candidate generation can be removed (FP-Growth)
  Drawback: superfluous processing

Transactional Layouts
• Vertical Layout
  Advantage: minimizes superfluous processing
  Drawback: candidate generation is required

Suggested Layout
• Inverted Matrix Layout: combines the horizontal and vertical layouts and is built in 2 I/O passes

Transactional Layouts
• Inverted Matrix Layout: Pass 1 generates the item list, sorted by frequency
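A minimal sketch of Pass 1, using the 18 example transactions from this presentation. The slides do not say how items with equal frequency are ordered; breaking ties by descending item name is an assumption chosen only because it reproduces the index order shown in the example. `build_index` is a hypothetical name.

```python
from collections import Counter

# The 18 example transactions, one 5-letter word per transaction.
TRANSACTIONS = [list(t) for t in (
    "AGDCB BCHED BDEAM CEFAN ABNOP ACQRG ACHIG LEFKB AFMNO "
    "CFPJR ADBHI DEBKL MDCGO CFPQJ BDEFI JEBAD AKEFC CDLBA").split()]

def build_index(transactions):
    """Pass 1: count every item, then sort ascending by frequency.

    Ties are broken by descending item name (an assumption, see above).
    Items are assumed to be single characters.
    """
    counts = Counter(item for t in transactions for item in t)
    ordered = sorted(counts, key=lambda item: (counts[item], -ord(item)))
    return [(item, counts[item]) for item in ordered]

index = build_index(TRANSACTIONS)
for loc, (item, freq) in enumerate(index, start=1):
    print(loc, item, freq)
```

Because the list is built without any support threshold, the same index can serve every later mining request.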

Transactional Layouts
• Inverted Matrix Layout: Pass 2 generates the transactional array of the IM

T#   Items
T1   A G D C B
T2   B C H E D
T3   B D E A M
T4   C E F A N
T5   A B N O P
T6   A C Q R G
T7   A C H I G
T8   L E F K B
T9   A F M N O
T10  C F P J R
T11  A D B H I
T12  D E B K L
T13  M D C G O
T14  C F P Q J
T15  B D E F I
T16  J E B A D
T17  A K E F C
T18  C D L B A

Index and transactional array (slots 1 to 11) after inserting T1:

Loc  Item  Freq  Transactional array
1    R     2
2    Q     2
3    P     3
4    O     3
5    N     3
6    M     3
7    L     3
8    K     3
9    J     3
10   I     3
11   H     3
12   G     4     (15,1)
13   F     7
14   E     8
15   D     9     (16,1)
16   C     10    (17,1)
17   B     10    (18,1)
18   A     11    (¤, ¤)

Transactional Layouts
• Inverted Matrix Layout

Index and transactional array (slots 1 to 11) after inserting T1 and T2:

Loc  Item  Freq  Transactional array
1    R     2
2    Q     2
3    P     3
4    O     3
5    N     3
6    M     3
7    L     3
8    K     3
9    J     3
10   I     3
11   H     3     (14,1)
12   G     4     (15,1)
13   F     7
14   E     8     (15,2)
15   D     9     (16,1) (16,2)
16   C     10    (17,1) (17,2)
17   B     10    (18,1) (¤, ¤)
18   A     11    (¤, ¤)

Transactional Layouts
• Inverted Matrix Layout
There is no minimum support involved in building the Inverted Matrix.

Loc  Item  Freq  Transactional array (slots 1 to 11)
1    R     2     (2,1) (3,2)
2    Q     2     (12,2) (3,3)
3    P     3     (4,1) (9,1) (9,2)
4    O     3     (5,2) (5,3) (6,3)
5    N     3     (13,1) (17,4) (6,2)
6    M     3     (14,2) (13,3) (12,4)
7    L     3     (8,1) (8,2) (15,9)
8    K     3     (13,2) (14,5) (13,7)
9    J     3     (13,4) (13,5) (14,7)
10   I     3     (11,2) (11,3) (13,6)
11   H     3     (14,1) (12,3) (15,4)
12   G     4     (15,1) (16,4) (16,5) (15,6)
13   F     7     (14,3) (14,4) (18,7) (16,6) (16,8) (14,6) (14,8)
14   E     8     (15,2) (15,3) (16,3) (17,5) (15,5) (15,7) (15,8) (16,9)
15   D     9     (16,1) (16,2) (17,3) (17,6) (17,7) (16,7) (17,8) (17,9) (16,10)
16   C     10    (17,1) (17,2) (18,3) (18,5) (18,6) (¤, ¤) (¤, ¤) (¤, ¤) (18,10) (17,10)
17   B     10    (18,1) (¤, ¤) (18,2) (18,4) (¤, ¤) (18,8) (¤, ¤) (¤, ¤) (18,9) (18,11)
18   A     11    (¤, ¤) (¤, ¤) (¤, ¤) (¤, ¤) (¤, ¤) (¤, ¤) (¤, ¤) (¤, ¤) (¤, ¤) (¤, ¤) (¤, ¤)
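The construction of the transactional array can be sketched in Python as follows. This is an illustrative reading of the two passes, not the authors' implementation: the function name `build_inverted_matrix`, the dictionary-of-lists storage, and the tie-breaking rule (descending item name among equal frequencies) are all assumptions. `None` plays the role of the (¤, ¤) end marker, and each transaction is assumed to contain no repeated item.

```python
from collections import Counter

# The 18 example transactions, one 5-letter word per transaction.
TRANSACTIONS = [list(t) for t in (
    "AGDCB BCHED BDEAM CEFAN ABNOP ACQRG ACHIG LEFKB AFMNO "
    "CFPJR ADBHI DEBKL MDCGO CFPQJ BDEFI JEBAD AKEFC CDLBA").split()]

def build_inverted_matrix(transactions):
    """Return (ordered_items, rows); rows[loc] lists the pointers of that location."""
    # Pass 1: ascending frequency, descending name on ties (assumption).
    counts = Counter(item for t in transactions for item in t)
    ordered = sorted(counts, key=lambda item: (counts[item], -ord(item)))
    loc = {item: i for i, item in enumerate(ordered, start=1)}
    rows = {i: [] for i in range(1, len(ordered) + 1)}
    # Pass 2: thread every transaction through the rows of its items.
    for t in transactions:
        chain = sorted(t, key=loc.get)  # least frequent item first
        # Cell each item will occupy: the next free slot of its row.
        cells = [(loc[item], len(rows[loc[item]]) + 1) for item in chain]
        for i, (row, _) in enumerate(cells):
            # Point to the next item's cell, or None at the end of the transaction.
            rows[row].append(cells[i + 1] if i + 1 < len(cells) else None)
    return ordered, rows

ordered, rows = build_inverted_matrix(TRANSACTIONS)
print(ordered[11], rows[12])  # G [(15, 1), (16, 4), (16, 5), (15, 6)]
```

Each pointer is a (location, slot) pair, so a whole transaction can be recovered by starting at its least frequent item and following pointers until the end marker.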

Transactional Layouts
• Inverted Matrix Layout: with Support > 4, the border support marks where the frequent items begin in the index
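If the border support is read as the first index location whose frequency exceeds the threshold (an interpretation of the slide, not a definition it states), it can be found with a binary search because the index is sorted by frequency. The function name `border_location` is hypothetical; the frequency list is copied from the example index.

```python
from bisect import bisect_right

# Frequencies of index locations 1..18, in ascending order.
FREQS = [2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 7, 8, 9, 10, 10, 11]

def border_location(freqs, min_support):
    """1-based location of the first item with frequency > min_support.

    Returns len(freqs) + 1 when no item is frequent.
    """
    return bisect_right(freqs, min_support) + 1

print(border_location(FREQS, 4))  # 13, the location of item F
print(border_location(FREQS, 9))  # 16, the location of item C
```

Changing the support level only moves this border; the Inverted Matrix itself stays as built.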

Transactional Layouts
• Inverted Matrix Layout

Sub-transactions generated from the IM
• Frequent sub-transactions with item E
• Frequent sub-transactions with item F
• Frequent sub-transactions with item D
• Frequent sub-transactions with item C
• Frequent sub-transactions with item B
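One way to picture how these sub-transactions come out of the Inverted Matrix: start at every slot of an item's row and follow the pointers until the end marker. The sketch below does this for item F (location 13) on the example data. It is an illustration under assumptions, not the authors' code: `build_inverted_matrix` and `sub_transactions` are hypothetical names, ties in the index are broken by descending item name, and `None` stands for (¤, ¤).

```python
from collections import Counter

TRANSACTIONS = [list(t) for t in (
    "AGDCB BCHED BDEAM CEFAN ABNOP ACQRG ACHIG LEFKB AFMNO "
    "CFPJR ADBHI DEBKL MDCGO CFPQJ BDEFI JEBAD AKEFC CDLBA").split()]

def build_inverted_matrix(transactions):
    """Return (ordered_items, rows); rows[loc] lists the pointers of that location."""
    counts = Counter(item for t in transactions for item in t)
    ordered = sorted(counts, key=lambda item: (counts[item], -ord(item)))
    loc = {item: i for i, item in enumerate(ordered, start=1)}
    rows = {i: [] for i in range(1, len(ordered) + 1)}
    for t in transactions:
        chain = sorted(t, key=loc.get)
        cells = [(loc[item], len(rows[loc[item]]) + 1) for item in chain]
        for i, (row, _) in enumerate(cells):
            rows[row].append(cells[i + 1] if i + 1 < len(cells) else None)
    return ordered, rows

def sub_transactions(ordered, rows, location):
    """Count the item chains that start at each slot of the given location."""
    found = Counter()
    for slot in range(1, len(rows[location]) + 1):
        items, cell = [], (location, slot)
        while cell is not None:
            row, s = cell
            items.append(ordered[row - 1])   # item stored at this location
            cell = rows[row][s - 1]          # follow the pointer
        found["".join(items)] += 1
    return found

ordered, rows = build_inverted_matrix(TRANSACTIONS)
print(sub_transactions(ordered, rows, 13))
```

Since every pointer leads to a location further down the index, a chain started at or after the border contains only frequent items, so no filtering against the support threshold is needed while following it.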

Co-Occurrence Frequent Item tree (COFI-tree)
Building the F-COFI-tree: each node carries a frequency count and a participation count.

Co-Occurrence Frequent Item tree
Building the F-COFI-tree

Co-Occurrence Frequent Item tree

Mining COFI-trees
E-COFI-tree

Mining COFI-trees
E-COFI-tree
Support = Frequency count - Participation count

Mining COFI-trees
E-COFI-tree

Mining COFI-trees
D-COFI-tree: DBA:5, DA:5, DB:8
B-COFI-tree: BA:6
C-COFI-tree: CA:6
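The pattern counts above can be cross-checked by brute force on the 18 example transactions. The sketch below is deliberately not the COFI-tree algorithm: it enumerates every itemset of two or more items in every transaction and counts it, which is only feasible because the example is tiny. The function name `frequent_itemsets` is made up, and itemsets are written with their items in alphabetical order, so DBA appears as ABD.

```python
from collections import Counter
from itertools import combinations

TRANSACTIONS = [list(t) for t in (
    "AGDCB BCHED BDEAM CEFAN ABNOP ACQRG ACHIG LEFKB AFMNO "
    "CFPJR ADBHI DEBKL MDCGO CFPQJ BDEFI JEBAD AKEFC CDLBA").split()]

def frequent_itemsets(transactions, min_support, min_size=2):
    """Brute-force count of all itemsets with support > min_support."""
    counts = Counter()
    for t in transactions:
        items = sorted(t)
        for size in range(min_size, len(items) + 1):
            for combo in combinations(items, size):
                counts["".join(combo)] += 1
    return {k: v for k, v in counts.items() if v > min_support}

result = frequent_itemsets(TRANSACTIONS, 4)
for itemset in sorted(result, key=lambda k: (len(k), k)):
    print(itemset, result[itemset])
```

With Support > 4 this yields ABD:5, AD:5, BD:8, AB:6 and AC:6, matching the D-, B- and C-COFI-tree results listed above, plus BE:6, DE:5 and BDE:5, which involve item E.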

Experimental Studies
[Figure: time needed to mine 1M transactions with different support levels]
Test machine: Pentium 700 MHz with 256 MB of RAM

Experimental Studies
[Figure: accumulated time needed to mine 1M transactions using 4 different support levels]
[Figure: time needed in seconds to mine different transaction sizes]
Test machine: Pentium 700 MHz with 256 MB of RAM

Conclusion and Future Work
New association rule mining algorithm:
• Low memory dependency
• No superfluous processing
• Ready for interactive mining
• Scalable
Future work:
• An updateable Inverted Matrix for native storage of transactions
• Compressing the Inverted Matrix
• Parallelizing the mining process as well as the construction of the Inverted Matrix
