Data Mining Frequent-Pattern Tree Approach Towards ARM Lecture 11-12
Is Apriori Fast Enough? — Performance Bottlenecks
• The core of the Apriori algorithm:
  • Use frequent (k−1)-itemsets to generate candidate frequent k-itemsets
  • Use database scans and pattern matching to collect counts for the candidate itemsets
• The bottleneck of Apriori: candidate generation
  • Huge candidate sets:
    • 10^4 frequent 1-itemsets will generate more than 10^7 candidate 2-itemsets
    • To discover a frequent pattern of size 100, e.g., {a1, a2, …, a100}, one needs to generate 2^100 ≈ 10^30 candidates
  • Multiple scans of the database:
    • Needs (n + 1) scans, where n is the length of the longest pattern
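The quadratic blow-up in candidate generation is easy to see in a short sketch. This is an illustrative Python version of Apriori's join step only (no pruning, no support counting); the function name is my own:

```python
from itertools import combinations

def gen_candidates(frequent, k):
    """Apriori join step: merge two frequent (k-1)-itemsets that agree
    on their first k-2 items to form a candidate k-itemset."""
    candidates = set()
    for a, b in combinations(sorted(frequent), 2):
        if a[:k - 2] == b[:k - 2]:              # shared (k-2)-prefix
            candidates.add(tuple(sorted(set(a) | set(b))))
    return candidates

# For 1-itemsets every pair joins, so n frequent items yield
# n*(n-1)/2 candidate 2-itemsets -- hence 10^4 items give ~5*10^7.
f1 = [(i,) for i in range(100)]
print(len(gen_candidates(f1, 2)))   # 4950 = 100*99/2
```

Because every pair of frequent 1-itemsets shares the (empty) prefix, nothing prunes the join at k = 2, which is exactly the bottleneck the slide describes.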
Mining Frequent Patterns Without Candidate Generation
• Steps:
  • Compress a large database into a compact Frequent-Pattern tree (FP-tree) structure
    • highly condensed, but complete for frequent pattern mining
    • avoids costly database scans
  • Develop an efficient, FP-tree-based frequent pattern mining method
    • A divide-and-conquer methodology: decompose mining tasks into smaller ones
    • Avoid candidate generation: sub-database test only!
FP-tree Construction
• Steps:
  • Scan the DB once, find frequent 1-itemsets (single-item patterns); here min support = 3
  • Order frequent items in frequency-descending order
  • Scan the DB again, construct the FP-tree

TID   Items bought                (ordered) frequent items
100   {f, a, c, d, g, i, m, p}    {f, c, a, m, p}
200   {a, b, c, f, l, m, o}       {f, c, a, b, m}
300   {b, f, h, j, o}             {f, b}
400   {b, c, k, s, p}             {c, b, p}
500   {a, f, c, e, l, p, m, n}    {f, c, a, m, p}

Item  frequency
f     4
c     4
a     3
b     3
m     3
p     3
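The two scans above can be sketched as follows. A minimal illustration (variable names mine); the tie-break between f and c, which both have count 4, is fixed explicitly to reproduce the slide's header-table order:

```python
from collections import Counter

MIN_SUP = 3
transactions = [
    ['f', 'a', 'c', 'd', 'g', 'i', 'm', 'p'],
    ['a', 'b', 'c', 'f', 'l', 'm', 'o'],
    ['b', 'f', 'h', 'j', 'o'],
    ['b', 'c', 'k', 's', 'p'],
    ['a', 'f', 'c', 'e', 'l', 'p', 'm', 'n'],
]

# Scan 1: count every item, keep those meeting min support.
counts = Counter(item for t in transactions for item in t)
freq = {item: c for item, c in counts.items() if c >= MIN_SUP}

# Frequency-descending order; ties broken to match the slide (f before c).
# Any consistent tie order produces a valid FP-tree.
header_order = sorted(freq, key=lambda i: (-freq[i], 'fcabmp'.index(i)))
rank = {item: r for r, item in enumerate(header_order)}

# Scan 2 would insert each ordered transaction into the FP-tree.
def ordered_items(t):
    return sorted((i for i in t if i in freq), key=rank.__getitem__)

print([ordered_items(t) for t in transactions])
```

The printed lists match the "(ordered) frequent items" column of the table.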
FP-tree Construction (contd.)
• Scan of the first transaction, {f, c, a, m, p}, leads to the construction of the first branch of the tree:

{}
└─ f:1
   └─ c:1
      └─ a:1
         └─ m:1
            └─ p:1
FP-tree Construction (contd.)
• The second transaction, {f, c, a, b, m}, shares the common prefix ⟨f, c, a⟩ with the existing path, so the count of each node along the prefix is incremented by 1
• Two new nodes, (b:1) and (m:1), are created and linked as children of (a:2) and (b:1) respectively:

{}
└─ f:2
   └─ c:2
      └─ a:2
         ├─ m:1
         │  └─ p:1
         └─ b:1
            └─ m:1
FP-tree Construction (contd.)
• Similarly for the third transaction, {f, b}: the count of (f) is incremented and a new node (b:1) is created as a child of (f:3):

{}
└─ f:3
   ├─ c:2
   │  └─ a:2
   │     ├─ m:1
   │     │  └─ p:1
   │     └─ b:1
   │        └─ m:1
   └─ b:1
FP-tree Construction (contd.)
• The scan of the fourth transaction, {c, b, p}, leads to the construction of the second branch of the tree, ⟨(c:1), (b:1), (p:1)⟩:

{}
├─ f:3
│  ├─ c:2
│  │  └─ a:2
│  │     ├─ m:1
│  │     │  └─ p:1
│  │     └─ b:1
│  │        └─ m:1
│  └─ b:1
└─ c:1
   └─ b:1
      └─ p:1
FP-tree Construction (contd.)
• For the last transaction, {f, c, a, m, p}, the frequent item list is identical to the first one, so the existing path is shared and its counts are incremented. The complete FP-tree:

{}
├─ f:4
│  ├─ c:3
│  │  └─ a:3
│  │     ├─ m:2
│  │     │  └─ p:2
│  │     └─ b:1
│  │        └─ m:1
│  └─ b:1
└─ c:1
   └─ b:1
      └─ p:1
FP-tree Construction (contd.)
• Create a header table
• Each entry in the frequent-item header table consists of two fields: (1) item-name, and (2) head of node-link — a pointer to the first node in the FP-tree carrying that item-name

Header Table:
Item  frequency
f     4
c     4
a     3
b     3
m     3
p     3

(each entry also stores the head of that item's node-link chain)
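The insertion procedure walked through on the slides above can be sketched as below. An illustrative implementation (class and function names mine); new nodes are prepended to their item's node-link chain, so the header reaches the most recently created node first rather than the first created, which does not affect the mining:

```python
class Node:
    def __init__(self, item, parent):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children = {}       # item -> child Node
        self.link = None         # next node carrying the same item

def build_fptree(ordered_transactions):
    root = Node(None, None)
    header = {}                  # item -> head of its node-link chain
    for trans in ordered_transactions:
        node = root
        for item in trans:
            child = node.children.get(item)
            if child is None:    # shared prefix ends: grow a new branch
                child = Node(item, node)
                node.children[item] = child
                child.link = header.get(item)
                header[item] = child
            child.count += 1     # shared prefix: just bump the count
            node = child
    return root, header

# The slide's five ordered transactions:
db = [['f', 'c', 'a', 'm', 'p'], ['f', 'c', 'a', 'b', 'm'],
      ['f', 'b'], ['c', 'b', 'p'], ['f', 'c', 'a', 'm', 'p']]
root, header = build_fptree(db)
print(root.children['f'].count, root.children['c'].count)   # 4 1
```

The resulting counts reproduce the final tree: (f:4) and (c:1) as children of the root, with (c:3), (a:3), (m:2), (p:2) along the main branch.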
Mining frequent patterns using the FP-tree
• Mining frequent patterns from the FP-tree is based on the following node-link property:
  • For any frequent item ai, all possible patterns containing only frequent items and ai can be obtained by following ai's node-links, starting from ai's head in the FP-tree header table.
• Let's go through an example to understand the full implication of this property in the mining process.
Mining frequent patterns of p
• For node p, its immediate frequent pattern is (p:3), and it lies on two paths in the FP-tree: ⟨f:4, c:3, a:3, m:2, p:2⟩ and ⟨c:1, b:1, p:1⟩
• The two prefix paths of p, {(fcam:2), (cb:1)}, form p's conditional pattern base
• Now we build an FP-tree on p's conditional pattern base
• This leads to an FP-tree with only one branch, (c:3); hence the only frequent pattern associated with p (besides p itself) is cp
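The conditional pattern base of p can also be read directly off the ordered transactions, since each FP-tree path is just a merged run of identical prefixes; following p's node-links yields exactly the same multiset. A small sketch (function name mine):

```python
from collections import Counter

db = [['f', 'c', 'a', 'm', 'p'], ['f', 'c', 'a', 'b', 'm'],
      ['f', 'b'], ['c', 'b', 'p'], ['f', 'c', 'a', 'm', 'p']]

def conditional_pattern_base(item, db):
    """Prefix paths that precede `item`, with multiplicities -- the same
    information obtained by following the item's node-links in the tree."""
    base = Counter()
    for t in db:
        if item in t:
            base[tuple(t[:t.index(item)])] += 1
    return base

p_base = conditional_pattern_base('p', db)
print(p_base)                     # {('f','c','a','m'): 2, ('c','b'): 1}

# Item counts inside the base: only c reaches min support 3,
# so the only extra frequent pattern for p is cp:3.
in_base = Counter()
for path, cnt in p_base.items():
    for i in path:
        in_base[i] += cnt
print({i: c for i, c in in_base.items() if c >= 3})   # {'c': 3}
```

This reproduces the slide's result: p's conditional pattern base {(fcam:2), (cb:1)} collapses to the single-branch conditional FP-tree (c:3).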
Mining frequent patterns of m
• m's conditional pattern base: {(fca:2), (fcab:1)}
• Constructing an FP-tree on this base, we derive m's conditional FP-tree, ⟨f:3, c:3, a:3⟩, a single frequent-pattern path
• This conditional FP-tree is then mined recursively
• All frequent patterns concerning m: m, fm, cm, am, fcm, fam, cam, fcam
Mining frequent patterns of m (contd.)
• m-conditional FP-tree: ⟨f:3, c:3, a:3⟩
• Conditional pattern base of "am": (fc:3) → am-conditional FP-tree: ⟨f:3, c:3⟩
• Conditional pattern base of "cm": (f:3) → cm-conditional FP-tree: ⟨f:3⟩
• Conditional pattern base of "cam": (f:3) → cam-conditional FP-tree: ⟨f:3⟩
Mining Frequent Patterns by Creating Conditional Pattern-Bases

Item  Conditional pattern-base       Conditional FP-tree
p     {(fcam:2), (cb:1)}             {(c:3)}|p
m     {(fca:2), (fcab:1)}            {(f:3, c:3, a:3)}|m
b     {(fca:1), (f:1), (c:1)}        Empty
a     {(fc:3)}                       {(f:3, c:3)}|a
c     {(f:3)}                        {(f:3)}|c
f     Empty                          Empty
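The whole table can be generated by the recursion itself. The sketch below is a simplified FP-growth that represents every (conditional) database as a list of (prefix-path, count) pairs instead of a physical tree; that loses the tree's compression but keeps the divide-and-conquer logic identical (names mine, not the paper's code):

```python
from collections import Counter

def fpgrowth(db, min_sup, suffix=()):
    """db: list of (ordered-item tuple, count). Returns {pattern: support}."""
    counts = Counter()
    for items, cnt in db:
        for item in items:
            counts[item] += cnt
    result = {}
    for item, sup in counts.items():
        if sup < min_sup:
            continue
        result[tuple(sorted(suffix + (item,)))] = sup
        # Conditional pattern base: prefixes that precede `item`.
        cond = [(items[:items.index(item)], cnt)
                for items, cnt in db
                if item in items and items.index(item) > 0]
        result.update(fpgrowth(cond, min_sup, suffix + (item,)))
    return result

db = [('f', 'c', 'a', 'm', 'p'), ('f', 'c', 'a', 'b', 'm'),
      ('f', 'b'), ('c', 'b', 'p'), ('f', 'c', 'a', 'm', 'p')]
patterns = fpgrowth([(t, 1) for t in db], min_sup=3)
print(sorted(patterns.items()))
```

On the slide's database this yields exactly the patterns implied by the table, e.g. cp:3 from p's row, all eight m-patterns from m's row, and nothing beyond b:3 from b's empty conditional FP-tree.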
Single FP-tree Path Generation
• Suppose an FP-tree T has a single path P
• The complete set of frequent patterns of T can be generated by enumerating all combinations of the sub-paths of P
• Example: the m-conditional FP-tree is the single path ⟨f:3, c:3, a:3⟩, which yields all frequent patterns concerning m: m, fm, cm, am, fcm, fam, cam, fcam
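For the single-path case the enumeration is just itertools.combinations over the path's nodes, with each pattern's support equal to the minimum count among the chosen nodes (all 3 here). A sketch with names of my own choosing:

```python
from itertools import combinations

def single_path_patterns(path, suffix):
    """path: [(item, count), ...] down a single-path FP-tree.
    Every non-empty combination of nodes, joined with the suffix,
    is frequent with support = min count over the chosen nodes."""
    patterns = {}
    for r in range(1, len(path) + 1):
        for combo in combinations(path, r):
            items = tuple(sorted(i for i, _ in combo)) + suffix
            patterns[items] = min(c for _, c in combo)
    return patterns

# The m-conditional FP-tree is the single path <f:3, c:3, a:3>:
out = single_path_patterns([('f', 3), ('c', 3), ('a', 3)], suffix=('m',))
print(sorted(out))   # the 7 proper supersets of m: fm, cm, am, fcm, fam, cam, fcam
```

Together with the suffix pattern m:3 itself, this gives the 8 patterns listed on the slide without any recursion.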
Why Is Frequent-Pattern Growth Fast?
• Our performance study shows:
  • FP-growth is an order of magnitude faster than Apriori, and is also faster than tree-projection
• Reasoning:
  • No candidate generation, no candidate test
  • Uses a compact data structure
  • Eliminates repeated database scans
  • Basic operations are counting and FP-tree building
FP-Growth vs. Apriori: Scalability With the Support Threshold
[Run-time comparison chart omitted; data set T25I20D10K]
Frequent Itemset Using FP-Growth (Example)
• FP-tree built from a second transaction database; header-table pointers (node-links) are used to assist frequent itemset generation:

null
├─ A:7
│  ├─ B:5
│  │  ├─ C:3
│  │  │  └─ D:1
│  │  └─ D:1
│  ├─ C:1
│  │  └─ D:1
│  │     └─ E:1
│  └─ D:1
│     └─ E:1
└─ B:3
   └─ C:3
      ├─ D:1
      └─ E:1
Frequent Itemset Using FP-Growth (Example)
FP-Growth Algorithm: FP-Tree Mining
• Build the conditional pattern base for E: P = {(A:1,C:1,D:1), (A:1,D:1), (B:1,C:1)}
• Recursively apply FP-growth on P
Frequent Itemset Using FP-Growth (Example)
FP-Growth Algorithm: FP-Tree Mining
• Conditional pattern base for E: P = {(A:1,C:1,D:1), (A:1,D:1), (B:1,C:1)}
• Count for E is 3: {E} is a frequent itemset
• Recursively apply FP-growth on P (conditional tree for D within the conditional tree for E)

Conditional tree for E:
null
├─ A:2
│  ├─ C:1
│  │  └─ D:1
│  │     └─ E:1
│  └─ D:1
│     └─ E:1
└─ B:1
   └─ C:1
      └─ E:1
Frequent Itemset Using FP-Growth (Example)
FP-Growth Algorithm: FP-Tree Mining
• Conditional pattern base for D within the conditional tree for E: P = {(A:1,C:1), (A:1)}
• Count for D is 2: {D, E} is a frequent itemset
• Recursively apply FP-growth on P (conditional tree for C within the conditional tree for D within the conditional tree for E)

Conditional tree for D within the conditional tree for E:
null
└─ A:2
   ├─ C:1
   │  └─ D:1
   └─ D:1
Frequent Itemset Using FP-Growth (Example)
FP-Growth Algorithm: FP-Tree Mining
• Conditional pattern base for C within D within E: P = {(A:1)}
• Count for C is 1: {C, D, E} is NOT a frequent itemset
• Recursively apply FP-growth on P (conditional tree for A within the conditional tree for D within the conditional tree for E)

Conditional tree for C within D within E:
null
└─ A:1
   └─ C:1
Frequent Itemset Using FP-Growth (Example)
FP-Growth Algorithm: FP-Tree Mining
• Count for A is 2: {A, D, E} is a frequent itemset
• Next step: construct the conditional tree for C within the conditional tree for E

Conditional tree for A within D within E:
null
└─ A:2
Frequent Itemset Using FP-Growth (Example)
FP-Growth Algorithm: FP-Tree Mining
• Back in the conditional tree for E: recursively apply FP-growth on P (conditional tree for C within the conditional tree for E)

Conditional tree for E:
null
├─ A:2
│  ├─ C:1
│  │  └─ D:1
│  │     └─ E:1
│  └─ D:1
│     └─ E:1
└─ B:1
   └─ C:1
      └─ E:1
Frequent Itemset Using FP-Growth (Example)
FP-Growth Algorithm: FP-Tree Mining
• Conditional pattern base for C within the conditional tree for E: P = {(B:1), (A:1)}
• Count for C is 2: {C, E} is a frequent itemset
• Recursively apply FP-growth on P (conditional tree for B within the conditional tree for C within the conditional tree for E)

Conditional tree for C within the conditional tree for E:
null
├─ A:1
│  └─ C:1
└─ B:1
   └─ C:1