Frequent Itemsets Mining in Distributed Wireless Sensor Networks
Manjunath Rajashekhar
Motivation
• Sensor network:
  • Battery powered, wireless communication
  • Limited RAM (10 KB – 32 MB), large flash (512 MB – 1 GB)
  • Communication over wireless
  • Clock speed (4 MHz – 40 MHz)
• Centralized → Distributed
• I/O → Communication
• Different data rates
• Can the data be thought of as baskets?
• Data is not uniformly distributed across all nodes!
• Trivial solution
Algorithm (1)
• Preprocessing
  • Each node sends {node-id, #baskets-count} to the base station
• Sampling
  • Query the network to collect the random sample
• Generation of Frequent Itemsets
  • Apriori algorithm
  • Scaled threshold
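The sampling and itemset-generation steps above can be sketched roughly as follows. This is a minimal illustration, not the author's implementation: the function names (`draw_sample`, `scaled_threshold`, `apriori`) and the in-memory `node_baskets` dictionary are assumptions made for the example, and the "query the network" step is simulated by direct lookups. The scaled threshold is taken to be the support threshold multiplied by a scaling factor (for example 0.9 × 25%, the values listed in the Experiments setup).

```python
import random
from itertools import combinations


def draw_sample(node_baskets, sample_size, rng):
    """Pick `sample_size` baskets uniformly at random across all nodes.

    node_baskets: hypothetical mapping node-id -> list of baskets (sets).
    Only the per-node basket counts are needed to choose which baskets
    to request, which is what the preprocessing step reports.
    """
    counts = {node: len(b) for node, b in node_baskets.items()}
    # Every (node, local index) slot is equally likely, so nodes holding
    # more baskets contribute proportionally more to the sample.
    slots = [(node, i) for node, n in counts.items() for i in range(n)]
    chosen = rng.sample(slots, sample_size)
    # Stand-in for querying the network for the selected baskets.
    return [node_baskets[node][i] for node, i in chosen]


def scaled_threshold(support, scale):
    """Lowered threshold used on the sample (assumed: support * scale)."""
    return support * scale


def apriori(baskets, min_support):
    """Return {itemset: count} for itemsets with support >= min_support."""
    min_count = min_support * len(baskets)

    # Level 1: count single items.
    counts = {}
    for basket in baskets:
        for item in basket:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_count}
    frequent = dict(current)

    k = 2
    while current:
        # Join frequent (k-1)-itemsets, then prune any candidate that
        # has an infrequent (k-1)-subset.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in current
                for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        counts = {
            c: sum(1 for basket in baskets if c <= basket)
            for c in candidates
        }
        current = {s: c for s, c in counts.items() if c >= min_count}
        frequent.update(current)
        k += 1
    return frequent
```

A call such as `apriori(draw_sample(node_baskets, 200, random.Random(0)), scaled_threshold(0.25, 0.9))` would then yield the candidate frequent itemsets to be verified over the whole network; the sample size of 200 is an arbitrary illustrative value.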
Algorithm (2)
• Verification of Frequent Itemsets
  • Eliminate False Negatives
    • Negative border
    • Aggregate counts of the negative border over the network
    • Fails: repeat the whole algorithm
  • Eliminate False Positives
    • Aggregate counts of the frequent itemsets over the network
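The verification step can be illustrated with a small sketch. It uses the standard definition of the negative border (itemsets that are not frequent in the sample but whose immediate subsets all are), and it simulates in-network aggregation by summing local counts over a dictionary. The names `negative_border`, `verify`, and `node_baskets` are hypothetical, and returning `None` to signal "repeat the whole algorithm" is a choice made for this example only.

```python
from itertools import combinations


def negative_border(frequent, items):
    """Itemsets not in `frequent` whose immediate subsets all are."""
    frequent = set(frequent)
    # Infrequent single items belong to the border: their only proper
    # subset is the empty set, which is trivially frequent.
    border = {frozenset([i]) for i in items if frozenset([i]) not in frequent}
    for itemset in frequent:
        for item in items:
            if item in itemset:
                continue
            cand = itemset | {item}
            if cand in frequent:
                continue
            if all(
                frozenset(sub) in frequent
                for sub in combinations(cand, len(cand) - 1)
            ):
                border.add(cand)
    return border


def verify(node_baskets, frequent, border, min_support):
    """Check sample results against the full data held on the nodes.

    Returns None if a negative-border itemset is globally frequent
    (a possible false negative, so the algorithm should be repeated);
    otherwise returns the frequent itemsets with false positives removed.
    """
    total = sum(len(b) for b in node_baskets.values())
    min_count = min_support * total

    def global_count(itemset):
        # Stand-in for aggregation: each node counts locally and the
        # partial counts are summed on the way to the base station.
        return sum(
            sum(1 for basket in baskets if itemset <= basket)
            for baskets in node_baskets.values()
        )

    if any(global_count(s) >= min_count for s in border):
        return None
    return {s for s in frequent if global_count(s) >= min_count}
```

Note that verification uses the original support threshold, not the scaled one, since the counts are exact over all baskets.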
Experiments
• Setup:
  • # Nodes = 100
  • # Baskets = 10400, distributed non-uniformly across nodes
  • Threshold scaling factor = 0.9
  • Support threshold = 25%
  • Synthetic dataset
  • Values averaged over 100 trials
• ~73% saving in communication
• Insights?
Analysis
• Preprocessing (C1)
  • size-of {node-id, count} * # nodes * cumulative-communication-distance
• Sampling (C2)
  • average-size-of-baskets * size-of-random-sample * cumulative-communication-distance
• False Negatives (C3)
  • size-of-negative-border * # nodes * aggregation-distance
• False Positives (C4)
  • size-of-frequent-itemsets * # nodes * aggregation-distance
• Total Cost = C1 + C2 + C3 + C4
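The cost model above translates directly into a small calculation. The sketch below simply multiplies out the four terms; the function name, parameter names, and the numbers in the usage note are illustrative assumptions, not values from the experiments, and the units (for example bytes × hops) depend on how sizes and distances are measured.

```python
def communication_cost(pair_size, num_nodes, cumulative_distance,
                       avg_basket_size, sample_size,
                       border_size, frequent_size, aggregation_distance):
    """Evaluate C1..C4 and their sum, term by term as in the Analysis."""
    # C1: every node reports its {node-id, count} pair to the base station.
    c1 = pair_size * num_nodes * cumulative_distance
    # C2: the sampled baskets travel to the base station.
    c2 = avg_basket_size * sample_size * cumulative_distance
    # C3: counts for the negative border are aggregated over the network.
    c3 = border_size * num_nodes * aggregation_distance
    # C4: counts for the frequent itemsets are aggregated over the network.
    c4 = frequent_size * num_nodes * aggregation_distance
    return {"C1": c1, "C2": c2, "C3": c3, "C4": c4,
            "total": c1 + c2 + c3 + c4}
```

For instance, with made-up inputs `communication_cost(2, 100, 3, 5, 200, 10, 20, 1)`, the terms come out as C1 = 600, C2 = 3000, C3 = 1000, C4 = 2000, for a total of 6600.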