Frequent Itemset Mining of Uncertain Data Streams Using the Damped Window Model • Carson Kai-Sang Leung, Fan Jiang • SAC 2011
Outline • Motivation • Background • Method • Experimental Result • Conclusion
Motivation • Compared with frequent itemset mining (FIM) of traditional static DBs, FIM of streaming data is more challenging because • (i) data streams are continuous and unbounded • (ii) data in the streams are not necessarily uniformly distributed.
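Background • x: item • X: itemset • DB: transaction database • t_i: transaction • The expected support of X in the DB can be computed by summing (over all transactions) the product of the existential probabilities of the items within X: $\text{expSup}(X) = \sum_{i=1}^{|DB|} \prod_{x \in X} P(x, t_i)$, where $P(x, t_i)$ denotes the existential probability of item x in transaction t_i.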
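The definition above can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the function name `exp_sup`, the dict-per-transaction layout, and the two-transaction `toy_db` are all assumptions made for the example, and an item missing from a transaction is assumed to have probability 0.

```python
from math import prod

def exp_sup(itemset, db):
    """Sum, over all transactions, of the product of the items' probabilities."""
    # Assumption: an item absent from a transaction has existential probability 0.
    return sum(prod(t.get(x, 0.0) for x in itemset) for t in db)

# Hypothetical batch of two uncertain transactions (not taken from the slides).
toy_db = [
    {"a": 0.9, "c": 0.8, "d": 0.6},
    {"a": 0.9, "c": 0.8, "d": 0.6},
]

print(round(exp_sup({"a", "c"}, toy_db), 2))
```

For this toy batch, each transaction contributes 0.9 × 0.8 = 0.72 to the expected support of {a, c}.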
Logic Algorithm • minsup: 1, preMinsup: 0.8 • expSup(a) = 0.9 + 0.9 = 1.8 • expSup(b) = 1.0 + 0.7 = 1.7 • expSup(c) = 0.8 + 0.8 = 1.6 • expSup(d) = 0.6 + 0.1 + 0.6 = 1.3 • expSup(e) = 0.4 < preMinsup, so e is pruned
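The pruning step on this slide might look like the sketch below. The item supports and the preMinsup value of 0.8 come from the slide; the helper name `split_by_pre_minsup` is hypothetical, and keeping items whose support exactly equals preMinsup is an assumption, since the slide only shows the strictly-below case.

```python
def split_by_pre_minsup(supports, pre_minsup):
    """Separate items that survive the preMinsup threshold from those pruned."""
    # Assumption: an item with support exactly equal to pre_minsup is kept.
    kept = {x: s for x, s in supports.items() if s >= pre_minsup}
    pruned = {x: s for x, s in supports.items() if s < pre_minsup}
    return kept, pruned

# Expected supports of the single items, as listed on the slide.
item_supports = {"a": 1.8, "b": 1.7, "c": 1.6, "d": 1.3, "e": 0.4}

kept, pruned = split_by_pre_minsup(item_supports, pre_minsup=0.8)
print(sorted(kept), sorted(pruned))
```

Only e falls below the threshold here, which matches the slide's final bullet.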
Logic Algorithm • {d}-projected database • expSup({a, d}, B1) = 1.08 • expSup({c, d}, B1) = 0.96 • expSup({a, c, d}, B1) = 0.86 • All: {a}, {a, c}, {a, c, d}, {a, d}, {b}, {c}, {c, d}, {d}
Logic Algorithm • When batch B1 arrives, we find {a}, {a, c}, {a, c, d}, {a, d}, {b}, {c}, {c, d}, {d} with expected supports of 1.8, 1.44, 0.86, 1.08, 1.7, 1.6, 0.96, 1.3, respectively
Logic Algorithm • When the next batch arrives, we find {a}, {a, c}, {b}, {b, d}, {c}, {d} with expected supports of 1.0, 0.8, 2.0, 1.0, 0.9, 1.0, respectively
Improved Algorithm • When batch B1 arrives, we find {a}, {a, c}, {a, c, d}, {a, d}, {b}, {c}, {c, d}, {d} with expected supports of 1.8, 1.44, 0.86, 1.08, 1.7, 1.6, 0.96, 1.3, respectively • When the next batch arrives, we find {a}, {a, c}, {b}, {b, d}, {c}, {d} with expected supports of 1.0, 0.8, 2.0, 1.0, 0.9, 1.0, respectively
DUF-streaming Algorithm • When batch B1 arrives, we find {a}, {a, c}, {a, c, d}, {a, d}, {b}, {c}, {c, d}, {d} with expected supports of 1.8, 1.44, 0.86, 1.08, 1.7, 1.6, 0.96, 1.3, respectively • When the next batch arrives, we find {a}, {a, c}, {b}, {b, d}, {c}, {d} with expected supports of 1.0, 0.8, 2.0, 1.0, 0.9, 1.0, respectively • Example: • expSup({a, c}, B1:2) • expSup({a, c}, B1:3)
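The slides do not show how the two batches are combined, so the following is a sketch under a stated assumption: that the damped window update multiplies the accumulated expected support by a decay factor α before adding the new batch's expected support. The function name `damped_update` and the value `ALPHA = 0.9` are hypothetical; only the per-batch supports are taken from the slide.

```python
def damped_update(accumulated, batch, alpha):
    """Decay the accumulated supports by alpha, then add the new batch's supports."""
    # Assumed update rule: new = old * alpha + support in the newest batch.
    itemsets = set(accumulated) | set(batch)
    return {
        X: accumulated.get(X, 0.0) * alpha + batch.get(X, 0.0)
        for X in itemsets
    }

fs = frozenset  # frozenset("ac") is the itemset {a, c}

# Expected supports per batch, as listed on the slide.
b1 = {fs("a"): 1.8, fs("ac"): 1.44, fs("acd"): 0.86, fs("ad"): 1.08,
      fs("b"): 1.7, fs("c"): 1.6, fs("cd"): 0.96, fs("d"): 1.3}
next_batch = {fs("a"): 1.0, fs("ac"): 0.8, fs("b"): 2.0,
              fs("bd"): 1.0, fs("c"): 0.9, fs("d"): 1.0}

ALPHA = 0.9  # hypothetical decay factor; the slides do not state one

b1_2 = damped_update(b1, next_batch, ALPHA)
print(round(b1_2[fs("ac")], 3))
```

With this hypothetical α, {a, c} would come to 1.44 × 0.9 + 0.8 = 2.096 after the second batch. A different decay factor changes the numbers but not the shape of the update, and a full implementation would presumably also re-apply the preMinsup pruning after each batch.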
Conclusion • Experimental results showed the effectiveness of our DUF-streaming algorithm in using the damped window model for mining frequent itemsets from streams of uncertain data.