1. estMax: Tracing Maximal Frequent Item Sets Instantly over Online Transactional Data Streams
Source: IEEE Transactions on Knowledge and Data Engineering, vol. 21, no. 10, October 2009
Authors: Ho Jin Woo, Won Suk Lee
Reporter: Cheng-Ting Hsieh
2. Outline
Abstract
Introduction
Related work
Preliminaries
estMax method
Experiments
Conclusions
3. Abstract (1/2)
The number of frequent item sets in a typical data set is very large.
Solution: represent frequent item sets in a more compact notation:
maximal frequent item set (MFI)
closed frequent item set (CFI)
Finding such item sets over online transactional data streams is not easy.
Solution: the estMax method
(tracing the set of MFIs instantly over an online data stream)
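To make the two compact notations concrete, the following is a minimal brute-force sketch on a toy data set. It only illustrates the standard definitions of MFI and CFI; it is not the estMax method, and it uses exactly the kind of superset checking that estMax is designed to avoid. The function names and the toy transactions are assumptions made for illustration.

```python
from itertools import combinations


def frequent_item_sets(transactions, min_count):
    """Return {item set: count} for every item set seen in >= min_count transactions."""
    items = sorted({item for t in transactions for item in t})
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            count = sum(1 for t in transactions if set(candidate) <= t)
            if count >= min_count:
                result[frozenset(candidate)] = count
    return result


def maximal(frequent):
    """Frequent item sets that have no frequent proper superset (MFIs)."""
    return {s for s in frequent if not any(s < other for other in frequent)}


def closed(frequent):
    """Frequent item sets that have no proper superset with the same count (CFIs)."""
    return {
        s for s in frequent
        if not any(s < other and frequent[other] == frequent[s] for other in frequent)
    }


# Hypothetical toy data: four transactions over the items a, b, c.
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"a"}]
freq = frequent_item_sets(transactions, min_count=2)
mfis = maximal(freq)
cfis = closed(freq)
```

On this toy input there are five frequent item sets, three of them closed and two of them maximal, which shows why the MFI representation is the most compact of the three.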
4. Abstract (2/2)
Advantages of the estMax method:
no superset/subset checking mechanism is needed
MFIs can be extracted at any moment over online data streams
5. Introduction
estMax method
based on the estDec method
the underlying node structure is a prefix tree
6. Related Work
MOMENT
Closed Enumeration Tree (CET)
Direct Update tree (DIU)
INSTANT
Single-phase algorithm for finding MFIs over a data stream
7. Preliminaries (1/5)
Prefix tree:
the root node has a null value
each node has two fields
item-id
cnt
an n-item set e = (i1, i2, …, in) is represented by the path
n_root → i1 → i2 → … → in
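The prefix tree described above can be sketched as follows. This is a minimal illustration, assuming that items are kept in sorted order along each path and that children are stored in a dictionary; the names `Node`, `insert` and `find` are hypothetical and not taken from the paper.

```python
class Node:
    """One prefix-tree node with the two fields named on the slide."""

    def __init__(self, item_id=None):
        self.item_id = item_id  # None plays the role of the root's null value
        self.cnt = 0            # occurrence count of the item set ending here
        self.children = {}      # assumption: children indexed by item id


def insert(root, item_set):
    """Walk (and create) the path root -> i1 -> ... -> in, then count one occurrence."""
    node = root
    for item in sorted(item_set):
        node = node.children.setdefault(item, Node(item))
    node.cnt += 1
    return node


def find(root, item_set):
    """Return the node representing item_set, or None if its path does not exist."""
    node = root
    for item in sorted(item_set):
        node = node.children.get(item)
        if node is None:
            return None
    return node


root = Node()
insert(root, ["b", "a"])
insert(root, ["a", "b"])
```

Because items are sorted before the walk, both insertions above follow the same path, so the node for (a, b) ends with a count of 2.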
8. Preliminaries (2/5)
estDec method
examines each item set generated by a new transaction
operations of the estDec method
delayed insertion
pruning operation
9. Preliminaries (3/5)
delayed insertion
Case 1: a new 1-item set appears in a newly generated transaction
10. Preliminaries (4/5)
Case 2: an n-item set e (n ≥ 2) that used to be insignificant has just become significant
if its support is at least Ssig, the set e is significant
11. Preliminaries (5/5)
pruning operation
applied when the current support of an n-item set (n ≥ 2) becomes less than Ssig
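The pruning rule above can be sketched on a small prefix tree. This is an illustrative sketch only: it assumes support is computed as cnt divided by the total number of transactions, and the names `Node`, `prune`, `total` and `s_sig` are hypothetical. Nodes at depth 1 (1-item sets) are never pruned, matching the n ≥ 2 condition.

```python
class Node:
    """Prefix-tree node; children are indexed by item id (an assumption)."""

    def __init__(self, item_id=None, cnt=0):
        self.item_id = item_id
        self.cnt = cnt
        self.children = {}


def prune(node, total, s_sig, depth=0):
    """Drop every item set of size >= 2 whose support cnt/total is below s_sig.

    Removing a node also removes its whole subtree, because the subtree
    is only reachable through that node.
    """
    for item, child in list(node.children.items()):
        child_depth = depth + 1
        if child_depth >= 2 and child.cnt < s_sig * total:
            del node.children[item]
        else:
            prune(child, total, s_sig, child_depth)


# Hypothetical tree: a (cnt 10) with children b (cnt 1) and c (cnt 5); d (cnt 0).
root = Node()
a = root.children.setdefault("a", Node("a", 10))
a.children["b"] = Node("b", 1)
a.children["c"] = Node("c", 5)
root.children["d"] = Node("d", 0)

prune(root, total=10, s_sig=0.3)
```

With 10 transactions and a significance threshold of 0.3, the item set (a, b) is removed, (a, c) survives, and the 1-item set d is kept even though its count is zero.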
12. estMax method (1/8)
ML: maximal lifetime
IS_MAX
if the item set e is an MFI, IS_MAX = true
cnt: count Ck(e)
err: estimated error e(e)
tid: the identifier of the latest transaction that contains the item set e
13. estMax method (2/8)
Top-q t-max
14. estMax method (3/8)
Error reduction
15. estMax method (4/8)
estMax method:
Parameter updating phase
Count updating phase
Item set insertion phase
MFI selection phase
16. estMax method (5/8)
Parameter updating phase
The total number of transactions in the current data stream Dk is updated
17. estMax method (6/8)
Count updating phase
If v.cnt < Ssig * |Dk|
→ prune this node and all of its descendant nodes
If v.cnt ≥ Ssig * |Dk| and v.err ? Serr * |Dk|
→ the item set e is a new MFI
→ v.IS_MAX = True
18. estMax method (7/8)
Item set insertion phase
(i) New 1-items (not contained in Pk-1) → insert into Pk-1
ML = k, IS_MAX = True
(ii) Filter Tk with Ssig
significant item sets
(iii) Find new significant item sets and insert them into Pk-1
19. estMax method (8/8)
MFI selection phase
retraverse the prefix tree Pk
if Sk(e) ≥ Smin and IS_MAX(e) = True, then e is an MFI
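The selection rule above can be sketched as a single traversal that only reads each node's own fields, with no comparison between item sets. This is a simplified illustration: it assumes Sk(e) is cnt divided by the number of transactions, and the IS_MAX flags are set by hand here, since how estMax maintains them is not modeled. The names `Node`, `select_mfis`, `total` and `s_min` are hypothetical.

```python
class Node:
    """Prefix-tree node carrying the cnt and IS_MAX fields used by the selection rule."""

    def __init__(self, item_id=None, cnt=0, is_max=False):
        self.item_id = item_id
        self.cnt = cnt
        self.is_max = is_max
        self.children = {}


def select_mfis(root, total, s_min):
    """Traverse the tree and collect item sets with support >= s_min and IS_MAX set."""
    result = []

    def visit(node, prefix):
        for item, child in node.children.items():
            item_set = prefix + (item,)
            if child.is_max and child.cnt / total >= s_min:
                result.append(item_set)
            visit(child, item_set)

    visit(root, ())
    return result


# Hypothetical tree with hand-set flags: a -> b is flagged maximal, c is flagged but rare.
root = Node()
a = root.children.setdefault("a", Node("a", cnt=8, is_max=False))
a.children["b"] = Node("b", cnt=6, is_max=True)
root.children["c"] = Node("c", cnt=2, is_max=True)

mfis = select_mfis(root, total=10, s_min=0.5)
```

Here only (a, b) is reported: the node for a is frequent but not flagged, and the node for c is flagged but falls below the minimum support.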
20. Experiments (1/6)
21. Experiments (2/6)
22. Experiments (3/6)
False negative errors
False positive errors
23. Experiments (4/6)
24. Experiments (5/6)
25. Experiments (6/6)
26. Conclusions
By using the two parameters ML and IS_MAX
MFIs are traced without superset/subset checking
By using several predefined thresholds
false positive and false negative errors are diminished
Serr
controls the accuracy of MFIs
top-q-Tk-maxes
provide a trade-off between accuracy and processing time
27. Thank you for your attention