An Adaptive Algorithm for Finding Frequent Sets in Landmark Windows
Xuan Hong Dang, Kok-Leong Ong and Vincent Lee
Dept. Computer Science, Aarhus University, Denmark; School of IT, Deakin University, Australia; Faculty of IT, Monash University, Australia
Applications • Sensors of all sorts are generating a lot of data streams • Many applications consume these data streams to discover evolving knowledge about the data stream
Problem • Data rates can exceed compute capacity • Machine must adapt to produce results on time • HOW?
A solution for finding frequent sets • Our method • Approximates frequency counts • Builds adaptability into processing through load shedding • Applies to landmark, forgetful and sliding windows
StreamL • Given a transaction stream • {t1, t2, t3, …, ti, tj, …} • ti = {x1, x2, …}, where xa is a literal • [Figure: landmark window]
StreamL • Capacity is bounded by the number of transactions in the window and the size of each transaction • How to measure this capacity? • A simple way is to use MFS to estimate how many itemsets to process in each transaction, i.e.,
StreamL • For n transactions in the window, the number of itemsets to process is • If r is the rate, then the capacity to process each transaction can be
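The load estimate described on these slides can be illustrated with a small sketch. The slide formulas did not survive, so the counting rule here is an assumption: it reads MFS as a cap `max_size` on the size of itemsets worth enumerating, and counts the non-empty subsets of each transaction up to that size. The function names are hypothetical.

```python
from math import comb


def itemsets_per_transaction(length: int, max_size: int) -> int:
    """Count non-empty itemsets of at most max_size items in one transaction.

    Assumption: max_size stands in for the MFS-based cap; the exact
    formula used on the slide is not available.
    """
    return sum(comb(length, k) for k in range(1, min(length, max_size) + 1))


def window_load(transaction_lengths, max_size: int) -> int:
    """Total itemsets to process over all transactions in the window."""
    return sum(itemsets_per_transaction(n, max_size) for n in transaction_lengths)


if __name__ == "__main__":
    lengths = [3, 4, 2]  # sizes of three transactions in the window
    print(window_load(lengths, max_size=2))
```

Multiplying a per-transaction count of this kind by the arrival rate r gives one way to compare demand against the available capacity.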
StreamL • When the rate increases, the idea is to add a P such that • to maintain a non-overload situation. • To achieve a load of C, the adjustment made by P is therefore achieved by dropping transactions
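A minimal sketch of load shedding along these lines follows. The condition that defines P is missing from the slide, so this assumes the simplest reading: P scales the demand (rate times per-transaction work) down to the capacity C, and each transaction is kept with probability P. The names `shedding_probability` and `shed` are hypothetical.

```python
import random


def shedding_probability(rate: float, work_per_transaction: float,
                         capacity: float) -> float:
    """Return the keep-probability P so that P * demand <= capacity.

    Assumption: demand is modelled as rate * work_per_transaction.
    """
    demand = rate * work_per_transaction
    if demand <= capacity:
        return 1.0  # no overload, keep every transaction
    return capacity / demand


def shed(transactions, keep_prob: float, rng: random.Random):
    """Keep each transaction independently with probability keep_prob."""
    return [t for t in transactions if rng.random() < keep_prob]


if __name__ == "__main__":
    p = shedding_probability(rate=100, work_per_transaction=10, capacity=500)
    stream = [{"a", "b"}, {"b", "c"}, {"a", "c"}, {"a"}]
    kept = shed(stream, p, random.Random(0))
    print(p, len(kept))
```

Dropping whole transactions at random keeps the per-transaction processing unchanged, so the only cost of shedding is the sampling error in the counts.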
StreamL • When transactions are dropped in a window, • {t1, t2, t3, …, ti, tj, …} • Frequency of X becomes inaccurate • Qualify this with an error e • [Figure: landmark window]
StreamL • Qualify this with an error e, which is the result of dropping transactions with probability 1 - P (< 1) • We can use e to compute a guarantee using the Chernoff bounds, i.e., • The guarantee states how confident we can be that the true support of X lies within ±e of the estimated support of X
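To make the guarantee concrete, here is a hedged sketch using the additive Chernoff (Hoeffding) form, which may differ from the exact bound in the paper. It assumes the retained transactions behave like independent draws, so that for a sample of n transactions the estimated support deviates from the true support by e or more with probability at most 2·exp(−2ne²). The function names and the confidence parameter `delta` are illustrative, not taken from the slides.

```python
from math import ceil, exp, log


def deviation_probability_bound(sample_size: int, e: float) -> float:
    """Upper bound on P(|estimated support - true support| >= e)."""
    return min(1.0, 2.0 * exp(-2.0 * sample_size * e * e))


def sample_size_for(e: float, delta: float) -> int:
    """Smallest sample size for which the bound above is at most delta."""
    return ceil(log(2.0 / delta) / (2.0 * e * e))


if __name__ == "__main__":
    n = sample_size_for(e=0.01, delta=0.05)
    print(n, deviation_probability_bound(n, 0.01))
```

Read in reverse, the same relation shows how far P can be lowered: the fewer transactions are kept, the larger e must be for the same confidence.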
Details • We presented a sketch of the idea • See the paper for the landmark-window algorithm • The idea can be extended to other windows; see the technical report for forgetful and sliding windows