
Frequent-Pattern Tree



Presentation Transcript


  1. Frequent-Pattern Tree

  2. Bottleneck of Frequent-pattern Mining • Multiple database scans are costly • Mining long patterns needs many passes of scanning and generates lots of candidates • To find the frequent itemset i1, i2, …, i100: • # of scans: 100 • # of candidates: C(100,1) + C(100,2) + … + C(100,100) = 2^100 − 1 ≈ 1.27 × 10^30 ! • Bottleneck: candidate generation and test • Can we avoid candidate generation?
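As a quick sanity check on the arithmetic above, the candidate count for a length-100 itemset can be reproduced in a few lines of Python (illustrative only; not part of the original slides):

```python
# Sum of binomial coefficients C(100, k) for k = 1..100, i.e. every
# non-empty candidate itemset that Apriori-style mining would have to test.
from math import comb

n = 100
total_candidates = sum(comb(n, k) for k in range(1, n + 1))
assert total_candidates == 2**n - 1          # closed form: 2^100 - 1
print(f"{total_candidates:.2e}")             # ~1.27e+30 candidates
```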

  3. Mining Freq Patterns w/o Candidate Generation • Grow long patterns from short ones using local frequent items • “abc” is a frequent pattern • Get all transactions having “abc”: DB|abc (the projected database on abc) • “d” is a local frequent item in DB|abc → abcd is a frequent pattern • Get all transactions having “abcd” (the projected database on “abcd”) and find longer itemsets
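A minimal sketch of the projected-database idea on this slide, assuming the transaction DB is held in memory as item sets (the helper names and the toy data are mine, not from the slides):

```python
from collections import Counter

def project(db, pattern):
    """DB|pattern: the transactions that contain every item of `pattern`."""
    return [t for t in db if pattern <= t]

def local_frequent_items(db, pattern, min_count):
    """Items that are frequent inside the projected database DB|pattern."""
    counts = Counter(item for t in project(db, pattern) for item in t - pattern)
    return {item for item, c in counts.items() if c >= min_count}

db = [frozenset("abcd"), frozenset("abce"), frozenset("abcdf"), frozenset("bde")]
print(local_frequent_items(db, frozenset("abc"), min_count=2))   # {'d'} -> abcd is frequent
```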

  4. Mining Freq Patterns w/o Candidate Generation • Compress a large database into a compact, Frequent-Pattern tree (FP-tree) structure • Highly condensed, but complete for frequent pattern mining • Avoid costly database scans • Develop an efficient, FP-tree-based frequent pattern mining method • A divide-and-conquer methodology: decompose mining tasks into smaller ones • Avoid candidate generation: examine sub-database (conditional pattern base) only!

  5. Construct FP-tree from a Transaction DB (min_sup = 50%)
  TID  Items bought               (ordered) frequent items
  100  {f, a, c, d, g, i, m, p}   {f, c, a, m, p}
  200  {a, b, c, f, l, m, o}      {f, c, a, b, m}
  300  {b, f, h, j, o}            {f, b}
  400  {b, c, k, s, p}            {c, b, p}
  500  {a, f, c, e, l, p, m, n}   {f, c, a, m, p}
  Steps: • Scan the DB once and find the frequent 1-itemsets (single-item patterns) • Order the frequent items in frequency-descending order: f, c, a, b, m, p (L-order) • Process the DB based on the L-order
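A sketch of this first scan and the L-ordering step, assuming min_sup is given as a fraction of the DB size (the helper name l_order_transactions is mine):

```python
from collections import Counter

def l_order_transactions(db, min_sup):
    """First DB scan: count items, keep the frequent ones, and rewrite each
    transaction in frequency-descending (L-) order."""
    min_count = min_sup * len(db)
    counts = Counter(item for t in db for item in t)
    freq = {item for item, c in counts.items() if c >= min_count}
    order = sorted(freq, key=lambda item: -counts[item])
    rank = {item: r for r, item in enumerate(order)}
    ordered = [sorted((i for i in t if i in freq), key=rank.__getitem__) for t in db]
    return order, ordered

db = [set("facdgimp"), set("abcflmo"), set("bfhjo"), set("bcksp"), set("afcelpmn")]
order, ordered_db = l_order_transactions(db, min_sup=0.5)
print(order)          # the slides' L-order f, c, a, b, m, p (ties among equal counts may come out in another order)
print(ordered_db[0])  # TID 100 -> ['f', 'c', 'a', 'm', 'p'] under that ordering
```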

  6. Construct FP-tree from a Transaction DB (same transaction DB as slide 5) • Initial FP-tree: only the root {} • Header Table (item / frequency / head): f 0 nil, c 0 nil, a 0 nil, b 0 nil, m 0 nil, p 0 nil

  7. Construct FP-tree from a Transaction DB • Insert {f, c, a, m, p} (TID 100): a new path f:1 → c:1 → a:1 → m:1 → p:1 is created under the root {} • Header Table frequencies: f 1, c 1, a 1, b 0 (head nil), m 1, p 1

  8. Construct FP-tree from a Transaction DB • Insert {f, c, a, b, m} (TID 200): the prefix f, c, a is shared, so those counts are incremented and the path branches at a • Tree: f:2 → c:2 → a:2, with a branching to m:1 → p:1 and to b:1 → m:1 • Header Table frequencies: f 2, c 2, a 2, b 1, m 2, p 1

  9. Construct FP-tree from a Transaction DB • Insert {f, b} (TID 300): only the prefix f is shared, so f becomes f:3 and a new branch b:1 is added under f • Header Table frequencies: f 3, c 2, a 2, b 2, m 2, p 1

  10. Construct FP-tree from a Transaction DB • Insert {c, b, p} (TID 400): no prefix is shared, so a new path c:1 → b:1 → p:1 is added directly under the root • Header Table frequencies: f 3, c 3, a 2, b 3, m 2, p 2

  11. Construct FP-tree from a Transaction DB • Insert {f, c, a, m, p} (TID 500): the whole transaction follows an existing path, so only its counts are incremented: f:4 → c:3 → a:3 → m:2 → p:2 • Final FP-tree: that main path plus the branches f:4 → b:1, a:3 → b:1 → m:1, and c:1 → b:1 → p:1 under the root • Header Table frequencies: f 4, c 4, a 3, b 3, m 3, p 3
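The insertion behaviour walked through on slides 6–11 can be sketched as follows; the FPNode class and its field names are my own, not taken from the original slides:

```python
class FPNode:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}        # item -> child FPNode
        self.node_link = None     # next node in the tree carrying the same item

def insert(root, ordered_items, header):
    """Insert one L-ordered transaction, sharing any existing prefix path."""
    node = root
    for item in ordered_items:
        child = node.children.get(item)
        if child is None:
            child = FPNode(item, node)
            node.children[item] = child
            # thread the new node onto the header table's node-link chain
            child.node_link = header.get(item)
            header[item] = child
        child.count += 1          # shared prefixes just get their counts bumped
        node = child

root, header = FPNode(None, None), {}
for t in [list("fcamp"), list("fcabm"), list("fb"), list("cbp"), list("fcamp")]:
    insert(root, t, header)
print(root.children["f"].count)   # 4, matching the final tree on slide 11
```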

  12. Benefits of FP-tree Structure • Completeness: • Preserves complete DB information for frequent pattern mining (given a prior min support) • Each transaction is mapped onto one FP-tree path; counts are stored at each node • Compactness: • One FP-tree path may correspond to multiple transactions; the tree is never larger than the original database (not counting node-links and counts) • Reduces irrelevant information: infrequent items are gone • Frequency-descending ordering: more frequent items are closer to the top of the tree and more likely to be shared

  13. How Effective Is FP-tree? Dataset: Connect-4 (a dense dataset)

  14. Mining Frequent Patterns Using FP-tree • General idea (divide-and-conquer): recursively grow frequent pattern paths using the FP-tree • Frequent patterns can be partitioned into subsets according to the L-order • L-order = f-c-a-b-m-p • Patterns containing p • Patterns having m but no p • Patterns having b but neither m nor p • … • Patterns having c but none of a, b, m, p • Pattern f

  15. Mining Frequent Patterns Using FP-tree • Step 1: Construct a conditional pattern base for each item in the header table • Step 2: Construct a conditional FP-tree from each conditional pattern base • Step 3: Recursively mine the conditional FP-trees and grow the frequent patterns obtained so far • If a conditional FP-tree contains a single path, simply enumerate all combinations of the items on that path

  16. Step 1: Construct Conditional Pattern Base • Start at the header table of the FP-tree • Traverse the FP-tree by following the node-links of each frequent item • Accumulate all transformed prefix paths of that item to form its conditional pattern base
  Conditional pattern bases (from the final FP-tree of slide 11, Header Table f 4, c 4, a 3, b 3, m 3, p 3):
  item  conditional pattern base
  c     f:3
  a     fc:3
  b     fca:1, f:1, c:1
  m     fca:2, fcab:1
  p     fcam:2, cb:1
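Step 1 can be sketched on top of the FPNode structure from the earlier construction sketch: follow an item's node-link chain from the header table and collect each occurrence's prefix path together with that node's count (the helper name is mine):

```python
def conditional_pattern_base(item, header):
    """All transformed prefix paths of `item`, each weighted by the node's count."""
    base = []                               # list of (prefix_path, count)
    node = header.get(item)
    while node is not None:                 # walk the node-link chain
        path, p = [], node.parent
        while p is not None and p.item is not None:
            path.append(p.item)             # climb towards the root
            p = p.parent
        if path:
            base.append((path[::-1], node.count))
        node = node.node_link
    return base

# With the tree built in the earlier sketch: conditional_pattern_base('m', header)
# -> [(['f', 'c', 'a', 'b'], 1), (['f', 'c', 'a'], 2)], i.e. m's base fca:2, fcab:1.
```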

  17. Step 2: Construct Conditional FP-tree • For each pattern base: • Accumulate the count for each item in the base • Construct an FP-tree over the frequent items of the pattern base • min_sup = 50% of 5 transactions, i.e. a minimum count of 3 • Example: p's conditional pattern base is fcam:2, cb:1; only c reaches count 3, so p's conditional FP-tree is the single node {} → c:3

  18. Mining Frequent Patterns by Creating Conditional Pattern Bases
  Item  Conditional pattern base     Conditional FP-tree
  p     {(fcam:2), (cb:1)}           {(c:3)}|p
  m     {(fca:2), (fcab:1)}          {(f:3, c:3, a:3)}|m
  b     {(fca:1), (f:1), (c:1)}      Empty
  a     {(fc:3)}                     {(f:3, c:3)}|a
  c     {(f:3)}                      {(f:3)}|c
  f     Empty                        Empty
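Step 2 boils down to counting items across a conditional pattern base and keeping those that reach the minimum count; a small sketch (my own helper, using the (path, count) pattern-base format from the previous sketch) reproduces the p row of the table above:

```python
from collections import Counter

def conditional_items(pattern_base, min_count):
    """Items of a conditional pattern base that survive the support threshold."""
    counts = Counter()
    for path, count in pattern_base:
        for item in path:
            counts[item] += count
    return {item: c for item, c in counts.items() if c >= min_count}

# p's pattern base is fcam:2, cb:1; with min count 3 (50% of 5 transactions)
# only c survives, giving the single-node conditional FP-tree {(c:3)}|p.
print(conditional_items([(list("fcam"), 2), (list("cb"), 1)], min_count=3))   # {'c': 3}
```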

  19. Step 3: Recursively Mine the Conditional FP-trees • Collect all patterns that end at p (p's conditional FP-tree is the single node c:3) • Suffix p(3): frequent pattern p:3, conditional pattern base fcam:2, cb:1 • Suffix cp(3): frequent pattern cp:3, conditional pattern base nil, so the recursion ends

  20. Step 3: Recursively Mine the Conditional FP-trees • Collect all patterns that end at m (m's conditional FP-tree is the path f:3 → c:3 → a:3) • Suffix m(3): frequent pattern m:3, conditional pattern base fca:2, fcab:1 • Suffix fm(3): frequent pattern fm:3, conditional pattern base nil • Suffix cm(3): frequent pattern cm:3, conditional pattern base f:3 • Suffix fcm(3): frequent pattern fcm:3, conditional pattern base nil • Suffix am(3): continued on the next slide

  21. Collect all patterns that end at m (cont'd) • Suffix am(3): frequent pattern am:3, conditional pattern base fc:3, conditional FP-tree f:3 → c:3 • Suffix fam(3): frequent pattern fam:3, conditional pattern base nil • Suffix cam(3): frequent pattern cam:3, conditional pattern base f:3, conditional FP-tree f:3 • Suffix fcam(3): frequent pattern fcam:3, conditional pattern base nil
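The recursion traced on slides 19–21 can be written compactly as below. This is a simplified, self-contained sketch of my own: it recurses directly on conditional pattern bases rather than building explicit conditional FP-trees, which yields the same patterns for this example:

```python
from collections import Counter

def fp_growth(pattern_base, suffix, min_count, results):
    """Grow every frequent pattern that ends with `suffix`."""
    counts = Counter()
    for path, count in pattern_base:
        for item in path:
            counts[item] += count
    for item, count in counts.items():
        if count < min_count:
            continue
        new_suffix = [item] + suffix
        results[tuple(new_suffix)] = count
        # conditional pattern base of `item` within the current base, then recurse
        cond = []
        for path, c in pattern_base:
            if item in path:
                prefix = path[:path.index(item)]
                if prefix:
                    cond.append((prefix, c))
        fp_growth(cond, new_suffix, min_count, results)

# Mine everything ending at m, starting from m's pattern base fca:2, fcab:1
# (the pattern m:3 itself was already reported on slide 20).
results = {}
fp_growth([(list("fca"), 2), (list("fcab"), 1)], ["m"], 3, results)
print(results)   # fm, cm, fcm, am, fam, cam, fcam, each with count 3, as on slides 20-21
```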

  22. FP-growth vs. Apriori: Scalability with the Support Threshold (data set: T25I20D10K)

  23. Why Is Frequent Pattern Growth Fast? • Performance study shows • FP-growth is an order of magnitude faster than Apriori • Reasoning • No candidate generation, no candidate test • Use compact data structure • Eliminate repeated database scan • Basic operations are counting and FP-tree building

  24. Weaknesses of FP-growth • Support dependent; cannot accommodate dynamic support threshold • Cannot accommodate incremental DB update • Mining requires recursive operations
