
Data Mining



Presentation Transcript


  1. Data Mining: Frequent-Pattern Tree Approach Towards ARM (Lecture 11-12)

  2. Is Apriori Fast Enough? — Performance Bottlenecks
  • The core of the Apriori algorithm:
    • Use frequent (k – 1)-itemsets to generate candidate frequent k-itemsets
    • Use database scans and pattern matching to collect counts for the candidate itemsets
  • The bottleneck of Apriori: candidate generation
    • Huge candidate sets: 10^4 frequent 1-itemsets generate on the order of 10^7 candidate 2-itemsets; to discover a frequent pattern of size 100, e.g., {a1, a2, …, a100}, one needs to generate 2^100 ≈ 10^30 candidates
    • Multiple scans of the database: needs (n + 1) scans, where n is the length of the longest pattern
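The candidate explosion above is simple arithmetic; a quick check of the two figures quoted on the slide:

```python
import math

# Candidate 2-itemsets from 10^4 frequent 1-itemsets:
# every pair of frequent items is a candidate.
n_frequent_1 = 10**4
candidates_2 = math.comb(n_frequent_1, 2)
print(candidates_2)  # 49995000 -- on the order of 10^7

# To reach one frequent 100-itemset, Apriori must generate every
# non-empty subset as a candidate along the way: 2^100 - 1 of them.
candidates_for_100 = 2**100 - 1
print(f"{candidates_for_100:.2e}")  # about 1.27e+30
```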

  3. Mining Frequent Patterns Without Candidate Generation
  • Steps:
    • Compress a large database into a compact Frequent-Pattern tree (FP-tree) structure
      • highly condensed, but complete for frequent-pattern mining
      • avoids costly repeated database scans
    • Develop an efficient, FP-tree-based frequent-pattern mining method
      • a divide-and-conquer methodology: decompose mining tasks into smaller ones
      • avoids candidate generation: sub-database tests only!

  4. FP-tree Construction
  • Steps:
    • Scan DB once, find the frequent 1-itemsets (single-item patterns)
    • Order frequent items in frequency-descending order
    • Scan DB again, construct the FP-tree

  TID  Items bought               (ordered) frequent items
  100  {f, a, c, d, g, i, m, p}   {f, c, a, m, p}
  200  {a, b, c, f, l, m, o}      {f, c, a, b, m}
  300  {b, f, h, j, o}            {f, b}
  400  {b, c, k, s, p}            {c, b, p}
  500  {a, f, c, e, l, p, m, n}   {f, c, a, m, p}

  Item  frequency  head
  f     4
  c     4
  a     3
  b     3
  m     3
  p     3
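The first scan can be sketched in a few lines. This is a minimal version of the step above, assuming min_support = 3 (implied by the frequency table, which lists only items occurring at least 3 times); ties among equally frequent items are broken here just to match the slide's ordering:

```python
from collections import Counter

# The five transactions from the slide.
db = {
    100: ['f', 'a', 'c', 'd', 'g', 'i', 'm', 'p'],
    200: ['a', 'b', 'c', 'f', 'l', 'm', 'o'],
    300: ['b', 'f', 'h', 'j', 'o'],
    400: ['b', 'c', 'k', 's', 'p'],
    500: ['a', 'f', 'c', 'e', 'l', 'p', 'm', 'n'],
}
MIN_SUPPORT = 3  # assumed; the slide's table keeps items with count >= 3

# Scan 1: count every item, keep only the frequent ones.
freq = Counter(item for items in db.values() for item in items)
frequent = {i: n for i, n in freq.items() if n >= MIN_SUPPORT}

# Order each transaction's frequent items in descending frequency.
# a, b, m, p all have count 3; break the tie to match the slide.
rank = {item: i for i, item in enumerate('fcabmp')}
ordered = {
    tid: sorted((i for i in items if i in frequent), key=rank.__getitem__)
    for tid, items in db.items()
}
print(ordered[100])  # ['f', 'c', 'a', 'm', 'p']
print(ordered[400])  # ['c', 'b', 'p']
```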

  5. FP-tree Construction (contd.)
  • Scan of the first transaction leads to the construction of the first branch of the tree:

  {}
  └─ f:1
     └─ c:1
        └─ a:1
           └─ m:1
              └─ p:1

  (ordered) frequent items: {f, c, a, m, p}, {f, c, a, b, m}, {f, b}, {c, b, p}, {f, c, a, m, p}

  6. FP-tree Construction (contd.)
  • The second transaction shares a common prefix, ⟨f, c, a⟩, with the existing path, so the count of each node along the prefix is incremented by 1
  • Two new nodes, (b:1) and (m:1), are created and linked as children of (a:2) and (b:1), respectively

  {}
  └─ f:2
     └─ c:2
        └─ a:2
           ├─ m:1 ─ p:1
           └─ b:1 ─ m:1

  7. FP-tree Construction (contd.)
  • Similarly for the third transaction, {f, b}: f's count becomes 3 and a new child (b:1) is linked to it

  {}
  └─ f:3
     ├─ c:2
     │  └─ a:2
     │     ├─ m:1 ─ p:1
     │     └─ b:1 ─ m:1
     └─ b:1

  8. FP-tree Construction (contd.)
  • The scan of the fourth transaction, {c, b, p}, leads to the construction of the second branch of the tree: (c:1), (b:1), (p:1)

  {}
  ├─ f:3
  │  ├─ c:2
  │  │  └─ a:2
  │  │     ├─ m:1 ─ p:1
  │  │     └─ b:1 ─ m:1
  │  └─ b:1
  └─ c:1
     └─ b:1
        └─ p:1

  9. FP-tree Construction (contd.)
  • For the last transaction, {f, c, a, m, p}, the frequent-item list is identical to the first transaction's, so the existing path is shared and its counts are incremented

  {}
  ├─ f:4
  │  ├─ c:3
  │  │  └─ a:3
  │  │     ├─ m:2 ─ p:2
  │  │     └─ b:1 ─ m:1
  │  └─ b:1
  └─ c:1
     └─ b:1
        └─ p:1

  10. FP-tree Construction (contd.)
  • Create a header table: each entry in the frequent-item header table consists of two fields, (1) the item-name and (2) the head of node-links (a pointer to the first node in the FP-tree carrying that item-name; all nodes with the same item-name are chained together via node-links)

  Header table
  Item  frequency  head of node-links
  f     4          → f:4
  c     4          → c:3 → c:1
  a     3          → a:3
  b     3          → b:1 → b:1 → b:1
  m     3          → m:2 → m:1
  p     3          → p:2 → p:1

  {}
  ├─ f:4
  │  ├─ c:3
  │  │  └─ a:3
  │  │     ├─ m:2 ─ p:2
  │  │     └─ b:1 ─ m:1
  │  └─ b:1
  └─ c:1
     └─ b:1
        └─ p:1
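The construction steps above can be sketched as follows. This is a minimal illustration, not the paper's full implementation: the `Node` and `build_fp_tree` names are mine, and the header table collects nodes in a plain list rather than threading a node-link pointer chain through the tree as a real FP-tree does:

```python
from collections import defaultdict

class Node:
    """One FP-tree node: item name, count, parent link, children by item."""
    def __init__(self, item, parent):
        self.item, self.count, self.parent = item, 0, parent
        self.children = {}

def build_fp_tree(ordered_transactions):
    root = Node(None, None)
    header = defaultdict(list)   # item -> all nodes carrying that item
    for items in ordered_transactions:
        node = root
        for item in items:       # walk/extend the path, bumping counts
            if item not in node.children:
                node.children[item] = Node(item, node)
                header[item].append(node.children[item])
            node = node.children[item]
            node.count += 1
    return root, header

# The five ordered transactions from slide 4.
ordered = [['f','c','a','m','p'], ['f','c','a','b','m'], ['f','b'],
           ['c','b','p'], ['f','c','a','m','p']]
root, header = build_fp_tree(ordered)
print(root.children['f'].count)            # 4
print(sum(n.count for n in header['c']))   # 4: c:3 under f, plus c:1
```

Summing the counts along an item's node-links recovers exactly the frequency column of the header table, which is what makes the tree "complete" for mining.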

  11. Mining frequent patterns using the FP-tree
  • Mining frequent patterns from the FP-tree is based on the following node-link property:
    • For any frequent item a_i, all possible patterns containing only frequent items together with a_i can be obtained by following a_i's node-links, starting from a_i's head in the FP-tree header table.
  • Let's go through an example to understand the full implication of this property in the mining process.

  12. Mining frequent patterns of p
  • For node p, its immediate frequent pattern is (p:3), and it has two paths in the FP-tree: ⟨f:4, c:3, a:3, m:2, p:2⟩ and ⟨c:1, b:1, p:1⟩
  • These two prefix paths of p, {(fcam:2), (cb:1)}, form p's conditional pattern base
  • Now build an FP-tree on p's conditional pattern base
  • This leads to an FP-tree with a single branch, (c:3); hence the only frequent pattern grown from p is cp:3
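The step from conditional pattern base to conditional FP-tree is just item counting within the base. A minimal sketch, starting from the two prefix paths read off the slide (min_support = 3 assumed, as in the running example):

```python
from collections import Counter

# p's two prefix paths with p's count on each branch (slide 12).
cond_base_p = [(['f', 'c', 'a', 'm'], 2), (['c', 'b'], 1)]
MIN_SUPPORT = 3  # assumed, as implied by the example

# Count each item within the conditional base.
counts = Counter()
for path, n in cond_base_p:
    for item in path:
        counts[item] += n

# Only c reaches min support, so p's conditional FP-tree is the single
# branch (c:3) and the only pattern grown from p is {c, p} with support 3.
survivors = {i: n for i, n in counts.items() if n >= MIN_SUPPORT}
print(survivors)  # {'c': 3}
```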

  13. Mining frequent patterns of m
  • m's conditional pattern base: fca:2, fcab:1
  • Constructing an FP-tree on this base, we derive m's conditional FP-tree, ⟨f:3, c:3, a:3⟩, a single frequent-pattern path
  • This conditional FP-tree is then mined recursively
  • All frequent patterns concerning m: m, fm, cm, am, fcm, fam, cam, fcam

  m-conditional FP-tree:
  {} → f:3 → c:3 → a:3

  14. Mining frequent patterns of m (contd.)
  • m-conditional FP-tree: {} → f:3 → c:3 → a:3
  • Cond. pattern base of "am": (fc:3)  →  am-conditional FP-tree: {} → f:3 → c:3
  • Cond. pattern base of "cm": (f:3)   →  cm-conditional FP-tree: {} → f:3
  • Cond. pattern base of "cam": (f:3)  →  cam-conditional FP-tree: {} → f:3

  15. Mining Frequent Patterns by Creating Conditional Pattern-Bases

  Item  Conditional pattern-base   Conditional FP-tree
  p     {(fcam:2), (cb:1)}         {(c:3)}|p
  m     {(fca:2), (fcab:1)}        {(f:3, c:3, a:3)}|m
  b     {(fca:1), (f:1), (c:1)}    Empty
  a     {(fc:3)}                   {(f:3, c:3)}|a
  c     {(f:3)}                    {(f:3)}|c
  f     Empty                      Empty
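The table's results can be cross-checked by brute force. This is not FP-growth itself, just a sanity check: enumerate every itemset occurring in some transaction and keep those with support ≥ 3:

```python
from itertools import combinations

# The five transactions of the running example.
db = [['f','a','c','d','g','i','m','p'], ['a','b','c','f','l','m','o'],
      ['b','f','h','j','o'], ['b','c','k','s','p'],
      ['a','f','c','e','l','p','m','n']]
MIN_SUPPORT = 3

# Support of every itemset that occurs in at least one transaction.
support = {}
for t in db:
    for k in range(1, len(t) + 1):
        for itemset in combinations(sorted(t), k):
            support[itemset] = support.get(itemset, 0) + 1
frequent = {s: n for s, n in support.items() if n >= MIN_SUPPORT}

print(frequent[('c', 'p')])            # 3: as mined from p
print(frequent[('a', 'c', 'f', 'm')])  # 3: fcam, as mined from m
print(('b', 'c') in frequent)          # False: b's conditional tree is empty
```

Every pattern produced by the conditional FP-trees above appears here with the same support, and nothing else does.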

  16. Single FP-tree Path Generation
  • Suppose an FP-tree T has a single path P
  • The complete set of frequent patterns of T can be generated by enumerating all combinations of the sub-paths of P
  • Example: the m-conditional FP-tree {} → f:3 → c:3 → a:3 yields all frequent patterns concerning m: m, fm, cm, am, fcm, fam, cam, fcam
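The enumeration above can be sketched directly: combine every subset of the single path with the suffix m, taking the minimum count along the chosen sub-path as the support (all counts are 3 here, so every pattern has support 3):

```python
from itertools import combinations

# m's conditional FP-tree is the single path f:3 -> c:3 -> a:3.
path = [('f', 3), ('c', 3), ('a', 3)]
items = [i for i, _ in path]

patterns = {}
for r in range(len(items) + 1):
    for subset in combinations(items, r):
        # Support = min count along the sub-path; the empty subset
        # gives m alone, whose own support is 3.
        support = min((c for i, c in path if i in subset), default=3)
        patterns[subset + ('m',)] = support

print(sorted(''.join(p) for p in patterns))
# ['am', 'cam', 'cm', 'fam', 'fcam', 'fcm', 'fm', 'm']
print(len(patterns))  # 8 = 2^3
```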

  17. Why Is Frequent-Pattern Growth Fast?
  • Our performance study shows that FP-growth is an order of magnitude faster than Apriori, and also faster than tree-projection
  • Reasoning:
    • No candidate generation, no candidate tests
    • Uses a compact data structure
    • Eliminates repeated database scans
    • The basic operations are counting and FP-tree building

  18. FP-Growth vs. Apriori: Scalability with the Support Threshold (figure; data set T25I20D10K)

  19. Frequent Itemset Using FP-Growth (Example)
  • Transaction database (figure), the resulting FP-tree, and its header table; header-table pointers are used to assist frequent itemset generation

  null
  ├─ A:7
  │  ├─ B:5
  │  │  ├─ C:3
  │  │  │  └─ D:1
  │  │  └─ D:1
  │  ├─ C:1
  │  │  └─ D:1
  │  │     └─ E:1
  │  └─ D:1
  │     └─ E:1
  └─ B:3
     └─ C:3
        ├─ D:1
        └─ E:1

  20. Frequent Itemset Using FP-Growth (Example)
  • FP-growth algorithm: FP-tree mining
  • Build the conditional pattern base for E: P = {(A:1, C:1, D:1), (A:1, D:1), (B:1, C:1)}
  • Recursively apply FP-growth on P
  (full FP-tree as on slide 19)

  21. Frequent Itemset Using FP-Growth (Example)
  • Conditional pattern base for E: P = {(A:1, C:1, D:1, E:1), (A:1, D:1, E:1), (B:1, C:1, E:1)}
  • Count for E is 3: {E} is a frequent itemset
  • Recursively apply FP-growth on P (next: conditional tree for D within the conditional tree for E)

  Conditional tree for E:
  null
  ├─ A:2
  │  ├─ C:1
  │  │  └─ D:1
  │  │     └─ E:1
  │  └─ D:1
  │     └─ E:1
  └─ B:1
     └─ C:1
        └─ E:1

  22. Frequent Itemset Using FP-Growth (Example)
  • Conditional pattern base for D within the conditional base for E: P = {(A:1, C:1, D:1), (A:1, D:1)}
  • Count for D is 2: {D, E} is a frequent itemset
  • Recursively apply FP-growth on P (next: conditional tree for C within the conditional tree for D within the conditional tree for E)

  Conditional tree for D within the conditional tree for E:
  null
  └─ A:2
     ├─ C:1
     │  └─ D:1
     └─ D:1
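Each step of this walk-through is the same counting operation over a conditional pattern base. A minimal sketch for E's base as given on the slides, assuming min_support = 2 (the slides treat a count of 2 as frequent and 1 as not):

```python
from collections import Counter

# E's conditional pattern base: three prefix paths, each carrying
# E's count on that branch (1).
cond_base_e = [(['A', 'C', 'D'], 1), (['A', 'D'], 1), (['B', 'C'], 1)]
MIN_SUPPORT = 2  # assumed from the slides' walk-through

counts = Counter()
for path, n in cond_base_e:
    for item in path:
        counts[item] += n
print(dict(counts))  # {'A': 2, 'C': 2, 'D': 2, 'B': 1}

# D, C, and A each reach min support, so {D,E}, {C,E}, {A,E} are
# frequent; B does not, matching the slides.
```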

  23. Frequent Itemset Using FP-Growth (Example)
  • Conditional pattern base for C within D within E: P = {(A:1, C:1)}
  • Count for C is 1: {C, D, E} is NOT a frequent itemset
  • Recursively apply FP-growth on P (next: conditional tree for A within the conditional tree for D within the conditional tree for E)

  Conditional tree for C within D within E:
  null
  └─ A:1
     └─ C:1

  24. Frequent Itemset Using FP-Growth (Example)
  • Count for A is 2: {A, D, E} is a frequent itemset
  • Next step: construct the conditional tree for C within the conditional tree for E

  Conditional tree for A within D within E:
  null
  └─ A:2

  25. Frequent Itemset Using FP-Growth (Example)
  • Recursively apply FP-growth on P (conditional tree for C within the conditional tree for E)

  Conditional tree for E:
  null
  ├─ A:2
  │  ├─ C:1
  │  │  └─ D:1
  │  │     └─ E:1
  │  └─ D:1
  │     └─ E:1
  └─ B:1
     └─ C:1
        └─ E:1

  26. Frequent Itemset Using FP-Growth (Example)
  • Conditional pattern base for C within the conditional base for E: P = {(B:1, C:1), (A:1, C:1)}
  • Count for C is 2: {C, E} is a frequent itemset
  • Recursively apply FP-growth on P (conditional tree for B within the conditional tree for C within the conditional tree for E)

  Conditional tree for C within the conditional tree for E:
  null
  ├─ A:1
  │  └─ C:1
  │     └─ E:1
  └─ B:1
     └─ C:1
        └─ E:1

  27. Frequent Itemset Using FP-Growth (Example)
  • Transaction database and full FP-tree with header table (as on slide 19)
