A learning approach for reducing data packets in sensor networks Yinghui Na
Problems • Sensors have very limited computation capability • Traditionally, sensors report ALL collected data to the base station (BS) • In some situations, most of this data is not of interest • E.g., rare event (intrusion) detection • Is there a way to report only the data of interest to the BS, and thus save limited resources and maximize the lifetime of the sensors?
Classification • Interaction between data mining algorithms and network protocols • Classification: an inductive task of finding patterns • Assigns objects to one of several predefined categories • A 'supervised' approach: unknown examples (test data) are classified based on well-known examples (training data)
Approach • We denote the class label of the i-th example xi by yi, where yi ∈ Y = {0,1}; 0 is negative and 1 is positive • Collected data points can be labeled at the base station as positive (interesting) or negative (not interesting) • Process • Initialization: at the beginning, the BS has no data points; the sensors send all data points until the first model from the base station is received • Classification model creation: the BS forms the classification model once it has received a minimum number of positive examples • Sensors report collected data selectively: sensors report all positive data points and part of the negative data points, based on the model • BS updates the model: the BS retains all received data and updates the model
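The four process steps can be sketched as a small simulation. This is only an illustration under stated assumptions, not the author's implementation: the threshold "model", the labelling rule (readings above 10.0 count as interesting), the `min_positives` value, and the `update_every` schedule are all hypothetical stand-ins for the classifier and protocol details.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# A model maps a reading to True when the sensor should report it.
Model = Callable[[float], bool]


@dataclass
class BaseStation:
    min_positives: int = 3  # hypothetical minimum before a model is built
    data: List[Tuple[float, int]] = field(default_factory=list)

    def receive(self, x: float) -> None:
        # The BS labels every received point. The labelling rule here is
        # made up for the example: readings above 10.0 are "interesting".
        self.data.append((x, 1 if x > 10.0 else 0))

    def build_model(self) -> Optional[Model]:
        positives = [x for x, y in self.data if y == 1]
        if len(positives) < self.min_positives:
            return None  # not enough positive examples yet
        # Stand-in model: a threshold a little below the smallest positive,
        # so all positives and some borderline negatives are reported.
        threshold = min(positives) - 1.0
        return lambda x: x >= threshold


def run(readings: List[float], bs: BaseStation, update_every: int = 5) -> int:
    """Return how many readings the sensor actually transmitted."""
    model: Optional[Model] = None
    sent = 0
    for i, x in enumerate(readings, start=1):
        # Initialization: without a model, everything is sent.
        if model is None or model(x):
            bs.receive(x)
            sent += 1
        # Periodically the BS rebuilds the model from all retained data.
        if i % update_every == 0:
            new_model = bs.build_model()
            if new_model is not None:
                model = new_model
    return sent


readings = [1, 2, 11, 3, 12, 13, 1, 2, 14, 3, 1, 2, 15, 1, 1]
station = BaseStation(min_positives=3)
print(run(readings, station), "of", len(readings), "readings sent")
```

In this toy run the sensor sends everything for the first ten readings, because the BS has not yet seen enough positives, and filters afterwards.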
Cost comparison • If all collected data is reported, the cost is Cb = N*c, where N is the number of all data points and c, the cost of sending a data point, is assumed to be constant • In the proposed approach, the total cost is C = Ns*c + Nfp*Cfp + Nfn*Cfn + Nm*Cm, where Ns is the number of selected data points sent from the sensors to the BS; Nfp and Nfn are the numbers of false positives and false negatives, respectively; Cfp and Cfn are their corresponding costs per data point; Nm is the number of models sent by the base station to the sensors and Cm is the cost of each such communication • The approach is profitable only if C < Cb, i.e., the cost of the proposed approach is lower than the cost of the traditional approach
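As a worked example of the two cost formulas, the sketch below evaluates Cb and C and checks the profitability condition. The function names and all numeric values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def baseline_cost(n_points: int, c: float) -> float:
    """Cb = N*c: every collected data point is sent to the BS."""
    return n_points * c


def selective_cost(n_sent: int, c: float,
                   n_fp: int, c_fp: float,
                   n_fn: int, c_fn: float,
                   n_models: int, c_model: float) -> float:
    """C = Ns*c + Nfp*Cfp + Nfn*Cfn + Nm*Cm."""
    return n_sent * c + n_fp * c_fp + n_fn * c_fn + n_models * c_model


def is_profitable(selective: float, baseline: float) -> bool:
    """The approach pays off only when it is strictly cheaper."""
    return selective < baseline


# Hypothetical numbers, for illustration only.
cb = baseline_cost(n_points=1000, c=1.0)
cost = selective_cost(n_sent=120, c=1.0,
                      n_fp=100, c_fp=0.5,
                      n_fn=2, c_fn=16.0,
                      n_models=3, c_model=4.0)
print(cb, cost, is_profitable(cost, cb))
```

With these made-up values the selective scheme costs 120 + 50 + 32 + 12 = 214 against a baseline of 1000; raising the false negative cost Cfn far enough would reverse the comparison.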
Cost matrix • The penalties of classifying the data points at the BS can be represented by a 2×2 matrix with elements c(i,j): c(0,0) denotes the penalty for not sending a negative data point, c(0,1) the penalty for a false negative, c(1,0) the penalty for sending a negative example, and c(1,1) the penalty for sending a true positive data point • We assume that c(0,0) = 0 and c(1,1) = c; the penalties c(0,1) = Cfn and c(1,0) = c + Cfp are varied
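One common way to use such a cost matrix is a minimum expected cost decision: send a point when the expected penalty of sending is no larger than that of withholding it. The slides do not state which decision rule was used, so the sketch below is an assumption; it reads c(0,1) as the false negative penalty and c(1,0) as the penalty for sending a negative, as defined above, and `p_pos` is a hypothetical posterior probability that the point is positive.

```python
def expected_costs(p_pos: float, c00: float, c01: float,
                   c10: float, c11: float) -> tuple:
    """Expected penalty of (not sending, sending) a point.

    c(i, j): i = decision (0 = do not send, 1 = send),
             j = true class (0 = negative, 1 = positive).
    """
    p_neg = 1.0 - p_pos
    cost_skip = p_neg * c00 + p_pos * c01
    cost_send = p_neg * c10 + p_pos * c11
    return cost_skip, cost_send


def should_send(p_pos: float, c: float, c01: float, c10: float) -> bool:
    """Minimum expected cost rule with c(0,0) = 0 and c(1,1) = c."""
    cost_skip, cost_send = expected_costs(p_pos, 0.0, c01, c10, c)
    return cost_send <= cost_skip


# Hypothetical penalties: c = 1, c(0,1) = 16, c(1,0) = 4.
for p in (0.02, 0.2, 0.25, 0.5):
    print(p, should_send(p, c=1.0, c01=16.0, c10=4.0))
```

Under this rule the sending threshold on `p_pos` works out to c(1,0) / (c(0,1) + c(1,0) - c), so a larger false negative penalty makes the sensor report more of the uncertain points.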
Classification modeling • Naïve Bayes classifier (NBC) • A direct application of Bayes' theorem • P(C|X) = P(X|C)P(C)/P(X), where X is a vector of attributes x1, x2, ..., xn • We used NBC as the classification modeling technique at the base station
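A minimal sketch of a Naïve Bayes classifier follows. The slides do not say how P(X|C) was modeled, so the per-attribute Gaussian likelihood here is an assumption, and the function names `fit` and `posterior` and the toy data are hypothetical.

```python
import math
from collections import defaultdict


def fit(X, y):
    """Estimate P(C) and a per-attribute mean/variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # Small constant keeps the variance strictly positive.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(columns, means)]
        model[label] = (n / len(X), means, variances)
    return model


def posterior(model, x):
    """Return P(C|X) for each class, using the naive independence assumption."""
    log_joint = {}
    for label, (prior, means, variances) in model.items():
        total = math.log(prior)  # log P(C)
        for v, m, var in zip(x, means, variances):
            # log of the Gaussian density for one attribute
            total += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        log_joint[label] = total
    # Dividing by P(X) is done by normalizing over the classes.
    top = max(log_joint.values())
    unnorm = {k: math.exp(v - top) for k, v in log_joint.items()}
    norm = sum(unnorm.values())
    return {k: v / norm for k, v in unnorm.items()}


# Toy training set: class 0 clusters near (0, 0), class 1 near (5, 5).
X = [[0.0, 0.1], [0.2, -0.1], [-0.1, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 4.9]]
y = [0, 0, 0, 0, 1, 1]
nb = fit(X, y)
print(posterior(nb, [5.0, 5.0]))
```

Working in log space and subtracting the largest log score before exponentiating avoids numerical underflow when many attributes are multiplied together.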
Performance evaluation • Simulation • TOSSIM • The network simulation parameters were: packet sizes of 32 bytes (sensor data) and 140 bytes (BS learning model); 100 nodes • PRTools • We used the PRTools data generators [] and obtained examples using the gendatd and gendatb routines. The dataset contained 1,000,000 examples. Furthermore, we assumed that the probability of a positive example was 0.02 and the probability of a negative example was 0.98.
Results • We assume that c(1,0) ∈ {1,4,16,64} and c(0,1) ∈ {1,4,16,64,256,1024,4096} • Traditionally, transmitting all 1,000,000 data points has a total cost of 1,000,000 • In the table, increasing the false negative penalty beyond 1024 resulted in a non-profitable system.
References
[1] F. Zhao and L. Guibas. Wireless Sensor Networks: An Information Processing Approach. Morgan Kaufmann, 2004.
[2] H. Kargupta. Distributed Data Mining for Sensor Networks. PKDD 2004, Tutorial.
[3] W. Heinzelman, A. Chandrakasan, and H. Balakrishnan. Energy-efficient communication protocol for wireless microsensor networks. In Proceedings of the Hawaii Conference on System Sciences, January 2000.
[4] W. Heinzelman, A. Chandrakasan, and H. Balakrishnan. An application-specific protocol architecture for wireless microsensor networks. IEEE Transactions on Wireless Communications, 1(4):660-670, 2002.
[5] P. Radivojac, U. Korad, K. M. Sivalingam, and Z. Obradovic. Learning from class-imbalanced data in wireless sensor networks. In 58th IEEE Semiannual Conf. Vehicular Technology Conference (VTC), volume 5, pages 3030-3034, Orlando, FL, October 2003.
[6] S. S. Ghiasi, A. Srivastava, X. Yang, and M. Sarrafzadeh. Optimal energy aware clustering in sensor networks. Sensors, 2:258-269, 2002.
[7] O. Younis and S. Fahmy. HEED: A hybrid, energy-efficient, distributed clustering approach for ad-hoc sensor networks. IEEE Transactions on Mobile Computing, 3(4), 2004.
[8] W. Chen, J. C. Hou, and L. Sha. Dynamic clustering for acoustic target tracking in wireless sensor networks. IEEE Transactions on Mobile Computing, 3(3):258-271, 2004.
[9] D. Zeinalipour-Yazti, Z. Vagena, D. Gunopulos, V. Kalogeraki, V. Tsotras, M. Vlachos, N. Koudas, and D. Srivastava. The threshold join algorithm for top-k queries in distributed sensor networks. In DMSN '05: Proceedings of the 2nd International Workshop on Data Management for Sensor Networks, pages 61-66, New York, NY, USA, 2005. ACM Press.
[10] T. Palpanas, D. Papadopoulos, V. Kalogeraki, and D. Gunopulos. Distributed deviation detection in sensor networks. SIGMOD Record, 32(4):77-82, December 2003.
[11] K. Loo, I. Tong, B. Kao, and D. Cheung. Online algorithms for mining inter-stream associations from large sensor networks. In Proceedings of the 9th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), 2005.
[12] S. Vucetic, D. Pokrajac, H. Xie, and Z. Obradovic. Detection of underrepresented biological sequences using class-conditional distribution models. In Proceedings of the Third SIAM International Conference on Data Mining, May 2003.
[13] Department of University of California, Berkeley. TOSSIM: Simulating TinyOS Networks. http://www.cs.berkeley.edu/~pal/research/tossim.html
[14] PRTools, a Matlab Toolbox for Pattern Recognition. http://www.ph.tn.tudelft.nl/prtools, 2002.