This article explores the utility of sensor data in mobile and pervasive computing, including analysis tools, anomaly detection, learning engines, and data characterization.
Sensor Data Processing
CNT 5517-5564, Mobile & Pervasive Computing
Dr. Sumi Helal, Computer & Information Science & Engineering Department, University of Florida, Gainesville, FL 32611, helal@cise.ufl.edu
Credit: slides designed from two articles that appeared in IEEE Pervasive Computing magazine: Dr. Diane Cook, "Making Sense of Sensor Data," April-June 2007 issue, and Dr. Hani Hagras, "Embedding Computational Intelligence in Pervasive Spaces," Sept-Dec 2007 issue.
Utility of Sensor Data
(Figure: many parallel streams of raw sensor values, e.g. 0.02344442, 0100111101101, 0.997342, 67 66 69, being interpreted as meaningful states such as Hot, Warm, Having a Good Day, Dehydrated, Insecure, and Feeling Good!)
The Utility of Sensor Data
(Figure: a layered view of pervasive & ubiquitous systems)
• Raw sensor data
• Sensor virtualization: filtering; for reliability & availability
• Analysis, learning and discovery: analysis tools, data characterization, sensor fusion, learning engines (fuzzy logic), classification, detecting anomalies, detecting trends, detecting patterns, phenomena detection
• Smart space applications: activities, events
Noise: The Curse of Large Dimensionality
• Background noise can challenge sensor data analysis when the data appear to have large dimensionality.
• It is important to reduce the dimensionality of the data to a minimum, to focus on the most important data and to filter out the noise.
Reducing Dimensionality
• Principal Component Analysis (PCA)
• Project the data onto the principal components (PCs) that reflect the greatest variance in the data
• Subsequent, orthogonal PCs capture the variance missed by the first PCs
• Data that do not lie along the retained PCs are ignored; they should be treated as noise
• PCA is good at finding new, more informative, uncorrelated features; it reduces dimensionality by rejecting low-variance features
(Figure: a scatter of data with the axes PC1 and PC2 overlaid)
Finding Principal Components
• We look for a transformation of the data matrix X (n×p) such that Y = α^T X = α1·X1 + α2·X2 + ... + αp·Xp
• where α = (α1, α2, ..., αp)^T is a column vector of weights with α1² + α2² + ... + αp² = 1
Maximize the Variance: Var(α^T X)
(Figure: two candidate projections of the same data, labeled "Good" and "Better")
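To make the projection concrete, here is a minimal PCA sketch using only NumPy; the function name and the synthetic sensor_data array are illustrative, not from the slides.

```python
import numpy as np

def pca(X, n_components):
    """Project the rows of X onto the top principal components."""
    # Center the data so each feature has zero mean.
    X_centered = X - X.mean(axis=0)
    # The PCs are the eigenvectors of the covariance matrix; the
    # eigenvalues give the variance captured along each one.
    cov = np.cov(X_centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)      # returned in ascending order
    order = np.argsort(eigvals)[::-1]           # highest-variance PCs first
    components = eigvecs[:, order[:n_components]]
    # Keeping only the first PCs rejects the low-variance (noise) directions.
    return X_centered @ components

# Example: reduce 100 five-dimensional sensor readings to 2 dimensions.
rng = np.random.default_rng(0)
sensor_data = rng.normal(size=(100, 5))
print(pca(sensor_data, n_components=2).shape)   # (100, 2)
```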
Characterizing Sensor Data
• When dealing with a large amount of sensor data, it may be crucial to understand the nature of the data, that is, to characterize the data in a way that is meaningful to the analyst.
• Clustering is a well-known technique for achieving this goal.
• It is also known as unsupervised learning.
Clustering Example: Euclidean Metric
• Definition: clustering is the classification of objects into different groups, or more precisely, the partitioning of a data set into subsets (clusters), so that the data in each cluster (ideally) share some common trait, often proximity according to some defined distance measure.
The k-means Algorithm 1. Decide on a value for k. 2. Initialize the k cluster centers (randomly, if necessary). 3. Decide the class memberships of the N objects by assigning them to the nearest cluster center. 4. Re-estimate the k cluster centers, by assuming the memberships found above are correct. 5. If none of the N objects changed membership in the last iteration, exit. Otherwise go to 3.
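A minimal sketch of the five steps above in NumPy; the initialization strategy and the handling of corner cases (e.g., empty clusters) are illustrative simplifications.

```python
import numpy as np

def kmeans(points, k, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Steps 1-2: k is given; initialize centers at randomly chosen points.
    centers = points[rng.choice(len(points), size=k, replace=False)]
    labels = None
    for _ in range(max_iter):
        # Step 3: assign each object to the nearest center (Euclidean distance).
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        # Step 5: exit when no object changed membership in the last iteration.
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 4: re-estimate each center as the mean of its members.
        # (A robust version would re-seed any cluster that becomes empty.)
        centers = np.array([points[labels == j].mean(axis=0) for j in range(k)])
    return centers, labels
```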
Goal of the k-means Algorithm
(Figure: a scatter of points on a 10×10 grid, with the objective function to be minimized: the total squared distance from each point to its assigned cluster center)
K-means Clustering: Steps 1-5
Algorithm: k-means; distance metric: Euclidean distance
(Figures: five successive iterations on a 5×5 grid, showing the three centers k1, k2, k3 moving and the point assignments updating until convergence)
Pros and Cons
• Pros
• Relatively efficient: O(tkn), where n is the number of objects, k the number of clusters, and t the number of iterations; normally k, t << n
• Cons
• It often terminates at a local optimum; the global optimum may be found using techniques such as deterministic annealing and genetic algorithms
• Applicable only when a mean is defined; what, then, about categorical data?
• The number of clusters, k, must be specified in advance
• Unable to handle noisy data and outliers
• Not suitable for discovering clusters with non-convex shapes
Selecting k
(Figure: the objective function value, from 0 to 1.00E+03, plotted against k = 1..6; the "elbow" of the curve suggests a good choice of k)
For in-depth study of clustering:
• Duda, R.O., Hart, P.E., Stork, D.G., Pattern Classification (2nd edition), Wiley, 2001.
• Bishop, C., Pattern Recognition and Machine Learning (1st edition), Springer, 2006.
Detecting Patterns
• A form of knowledge discovery that helps us understand the raw sensor data better and learn more about the phenomena generating the data
• Approach I: association analysis (a.k.a. affinity analysis) finds frequent co-occurrences between values of raw sensors (a.k.a. frequent item sets)
• Frequent item sets are important patterns that can lead to if-then rules guiding the classification of future data points
• Approach II: the episode detection algorithm
Example
(Figure: scatter plot of Sensor 1 readings, roughly -40 to 100, against Sensor 2 readings, roughly -1.0 to 2, with the discovered region highlighted)
• The example shows an association generated from sensors on a body-worn monitoring device. The rule generated from this association is:
• IF (heat flux is in [4.48, 12.63]) AND (accelerometer is in [0.86, 1.04]) THEN action = sleep, with accuracy = 99.94%.
The Apriori Algorithm
• Description
• The algorithm finds the "frequent" subsets in the power set of possible sensor data by scanning a database of occurrences of subsets of the sensor data and counting how many times each subset appears (its support)
• Then a series of rules is generated from the frequent subsets
• For example, if {a,b,c,d} is frequent, generate the candidate rule {a,b,c} → {d}
• Divide the support of {a,b,c,d} by the support of {a,b,c} to get the rule's confidence; if the confidence is high enough, accept the rule!
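The following sketch illustrates the support/confidence computation on a toy transaction set; for brevity it enumerates candidate itemsets by brute force rather than using Apriori's level-wise pruning, and the item names and thresholds are illustrative.

```python
from itertools import combinations

# Toy database: each transaction is a set of discretized sensor values.
transactions = [{"heat_high", "accel_low", "sleep"},
                {"heat_high", "accel_low", "sleep"},
                {"heat_high", "accel_high", "awake"},
                {"heat_low", "accel_low", "sleep"}]

def support(itemset):
    """Fraction of transactions containing every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

# Find frequent itemsets (here by brute force over all candidates).
items = set().union(*transactions)
frequent = [frozenset(c) for r in range(1, len(items) + 1)
            for c in combinations(items, r) if support(set(c)) >= 0.5]

# Generate rules: for frequent S and subset A, accept A -> S-A when
# confidence = support(S) / support(A) clears the threshold.
for s in frequent:
    for r in range(1, len(s)):
        for a in map(frozenset, combinations(s, r)):
            conf = support(s) / support(a)
            if conf >= 0.9:
                print(f"{set(a)} -> {set(s - a)}  (conf={conf:.2f})")
```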
For more on affinity analysis:
• Agrawal, R., Imielinski, T., Swami, A.N., "Mining Association Rules between Sets of Items in Large Databases," Proc. ACM SIGMOD, June 1993 (SIGMOD Record 22(2):207-216).
Episode Discovery Algorithm
• Based on: a sliding window
• ... and: the Minimum Description Length (MDL) principle
• Any regularity in the data can be used to compress the data, i.e., to describe it using fewer symbols than would be needed to describe the data literally. The more regularities there are, the more the data can be compressed.
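A small sketch of the sliding-window idea on a toy event sequence; a real episode-discovery implementation would additionally score candidate episodes by how much they compress the data (the MDL principle), which is omitted here.

```python
from collections import Counter

events = list("ABCABCXABCABACB")  # toy time-ordered sensor events
window = 3

# Count how often each ordered window of events occurs.
episodes = Counter(tuple(events[i:i + window])
                   for i in range(len(events) - window + 1))

# Frequent windows are candidate episodes: regularities that would let
# us describe the sequence with fewer symbols (the MDL intuition).
for episode, count in episodes.most_common(3):
    print(episode, count)
```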
Classifying Sensor Data
• Unlike data clustering, classification is a supervised learning process
• A common data-analysis goal is to map sensor data points to pre-defined class labels, i.e., to classify the data
• For example, if each individual in a house is a class, we want to map each time-ordered data point to one of the individuals
• Examples of classifiers:
• Bayesian classifiers
• Linear classifiers
• Neural networks
• Support vector machines
• and more
Example: Linear Classifiers
• A linear classifier computes an output y = f(w·x), where x holds the input data for a point and w holds the weights for each input
• The decision boundary is a line (in general, a hyperplane): it has points below it and points above it, and you can use that for classification
(Figures: Class 1 and Class 2 separated by a line; several alternative linear classifiers for the same data)
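As one concrete instance, here is a perceptron-style sketch that learns the weights w and bias b of a separating line; the training data and learning rate are illustrative.

```python
import numpy as np

def train_perceptron(X, y, epochs=50, lr=0.1):
    """Learn w, b so that sign(w.x + b) separates two classes (y in {-1, +1})."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            # A point on the wrong side of the line nudges the line toward it.
            if yi * (w @ xi + b) <= 0:
                w += lr * yi * xi
                b += lr * yi
    return w, b

X = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 1.0], [4.0, 2.0]])
y = np.array([1, 1, -1, -1])
w, b = train_perceptron(X, y)
print(np.sign(X @ w + b))  # predicted classes: above vs. below the line
```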
For more in-depth information about classifiers:
• Duda, R.O., Hart, P.E., Stork, D.G., Pattern Classification (2nd edition), Wiley, 2001.
• Bishop, C., Pattern Recognition and Machine Learning (1st edition), Springer, 2006.
• Haykin, S., Neural Networks: A Comprehensive Foundation, Prentice Hall, 2007.
Detecting Trends
• Because sensor data has a time component, it can be analyzed to determine trends:
• increasing patterns
• decreasing patterns
• cyclic patterns
• stable patterns
• Two main methods:
• temporal autocorrelation plots
• anomaly detection
Temporal Autocorrelation
(Figure: a cyclic signal and its autocorrelation plot, showing high correlation at multiples of the cycle length)
• Temporal autocorrelation refers to the correlation between time-shifted values of a time series. It reflects the fact that the signal value at a given time point is not completely independent of its past signal values.
A Little Bit of Math
• The autocorrelation function looks like R(s, t) = E[(Xt − μt)(Xs − μs)] / (σt·σs)
• where E is the expectation, Xt represents the signal at time t, Xs represents the shifted signal at a different time s, and μ and σ are the corresponding means and standard deviations.
Then the process can be seen as...
• trying to match the signal that you have against a version of the same signal shifted in time, as the sketch below shows.
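A short sketch of that matching process; the synthetic cyclic signal and the lag values are illustrative.

```python
import numpy as np

def autocorrelation(x, lag):
    """Correlation between the signal and a copy of itself shifted by lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    # Match x[t] against the shifted version x[t + lag], roughly normalized.
    return np.dot(x[:-lag], x[lag:]) / np.dot(x, x)

t = np.arange(200)
signal = np.sin(2 * np.pi * t / 20) + 0.1 * np.random.default_rng(0).normal(size=200)
for lag in (5, 10, 20):
    print(lag, round(autocorrelation(signal, lag), 2))
# Near zero at a quarter cycle, negative at a half cycle, and high at
# lag 20: the high correlation reveals the 20-sample cycle.
```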
For more information:
• Denning, D., "An Intrusion Detection Model," Proc. Seventh IEEE Symposium on Security and Privacy, May 1986, pp. 119-131.
• Teng, H.S., Chen, K., and Lu, S.C-Y., "Adaptive Real-time Anomaly Detection Using Inductively Generated Sequential Patterns," Proc. 1990 IEEE Symposium on Security and Privacy.
• Jones, A.K., and Sielken, R.S., "Computer System Intrusion Detection: A Survey," Technical Report, Department of Computer Science, University of Virginia, Charlottesville, VA, 1999.
Phenomena Clouds
• A phenomenon cloud P is expressed as a 5-tuple P = <a, b, pT, m, n>, where:
• [a, b] defines the range of magnitude of the phenomenon
• pT defines the threshold probability
• m defines the sliding window size
• n defines the minimum quorum
Classification of Roles Assigned to Sensors
• Each sensor is in one of four roles: Tracking, Candidate, Potential Candidate, or Idle.
Role Transition Rules
• R1: Candidate → Tracking: if a sensor satisfies the phenomenon condition, it transitions into the tracking category.
• R2: Potential Candidate → Candidate: a potential-candidate sensor transitions to a candidate sensor if any of its neighbors transitions into a tracking sensor.
• R3: Idle → Potential Candidate: an idle sensor transitions into a potential candidate if any of its neighbors becomes a candidate sensor.
• R4: Tracking → Candidate: a tracking sensor transitions down to the candidate category if it is unable to satisfy the phenomenon condition.
• R5: Candidate → Potential Candidate: a candidate sensor transitions to a potential candidate if none of its neighbors is a tracking sensor.
• R6: Potential Candidate → Idle: a potential candidate transitions into an idle sensor if all its neighbors are either potential candidates or idle, that is, if none of its neighbors is in the candidate category.
• The rules are sketched in code after the figure below.
Role Transition Rules
(Figure: state diagram of the four roles; a sensor's transitions depend on its own readings and on the roles of its neighboring sensors)
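A sketch of rules R1-R6 as a per-sensor update function; the role names come from the slides, while the function signature and the satisfies_condition flag (whether the sensor currently meets the phenomenon condition derived from the 5-tuple) are illustrative.

```python
TRACKING, CANDIDATE, POTENTIAL, IDLE = "tracking", "candidate", "potential", "idle"

def next_role(role, satisfies_condition, neighbor_roles):
    """Apply rules R1-R6 to one sensor, given its neighbors' current roles."""
    if role == CANDIDATE and satisfies_condition:
        return TRACKING                         # R1
    if role == POTENTIAL and TRACKING in neighbor_roles:
        return CANDIDATE                        # R2
    if role == IDLE and CANDIDATE in neighbor_roles:
        return POTENTIAL                        # R3
    if role == TRACKING and not satisfies_condition:
        return CANDIDATE                        # R4
    if role == CANDIDATE and TRACKING not in neighbor_roles:
        return POTENTIAL                        # R5
    if role == POTENTIAL and all(r in (POTENTIAL, IDLE) for r in neighbor_roles):
        return IDLE                             # R6
    return role                                 # otherwise, keep the role

print(next_role(CANDIDATE, True, {POTENTIAL}))   # -> 'tracking' (R1)
print(next_role(POTENTIAL, False, {TRACKING}))   # -> 'candidate' (R2)
```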
Steps in the Detection & Tracking Process
(Figures: initial selection & monitoring, initial occurrence, growth of the phenomenon cloud, and shrinking of the phenomenon cloud, with each sensor shown in its role: idle, potential candidate, candidate, or tracking)
A Practical Demonstration of Phenomena Detection and Tracking
(Figure: the effect of a footstep on the Smart Floor of the Gator Tech Smart House (GTSH))
Experimental Analysis: Detection Performance
(Figure: the effect of varying the quorum n on detection performance)
The Role of Computational Intelligence in Pervasive Spaces
• Learn over time from user interactions and behavior, and from the environment
• Robustly manage user uncertainty:
• intra-user uncertainty: the same user approaches the same problem differently over time
• inter-user uncertainty: different users approach the same problem differently
• Similarly, robustly manage environmental uncertainty, such as noise, change of seasons, etc.
• CI techniques are tolerant of imprecision, uncertainty, approximation, and partial truths.
Three widely used techniques in computational intelligence
• Fuzzy systems: they mimic human gray logic.
• Neural networks: they are function-approximation devices.
• Evolutionary systems: they mimic the evolutionary abilities of complex organisms.
Fuzzy Systems • They mimic human gray logic • We do not say “If the temperature is above 24 degrees and the cloud cover is less than 10 percent, and I have three hours time, I will go for a hike with a probability of 0.47.” • We say “If the weather is nice and I have a little time, I will probably go for a walk.”
Type-1 Fuzzy Sets
• Definition: a fuzzy set is a pair (A, m), where A is a set and m: A → [0, 1] is a membership function.
• The function m represents the confidence (degree of membership) of each element in the fuzzy set (A, m).
Example: Representing Linguistic Variables
(Figure: fuzzy membership functions over the quality of a car, on a scale from 0 to 1000)
Relationship to Boolean Logic
• The Boolean set is a special case of the fuzzy set.
• There are no crisp boundaries between the different fuzzy sets. This means that apparent paradoxes are acceptable.
• For example, a plate can be (partly) cold and (partly) hot at the same time.
Fuzzy Logic as a Generalization of Boolean Logic
• Union is the operator max: m(A∪B)(x) = max(mA(x), mB(x))
• Intersection is the operator min: m(A∩B)(x) = min(mA(x), mB(x))
• Complement is defined as m(¬A)(x) = 1 − mA(x)
• The implication rule is commonly built from these operators (e.g., the Mamdani implication min(mA(x), mB(y))).
Example
• Assume the set of temperatures with range 0 to 35 Celsius; we have the fuzzy sets Cold, Warm, and Hot.
(Figure: three overlapping membership functions over the temperature range)
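A small sketch of Cold/Warm/Hot membership functions together with the max/min/complement operators from the previous slide; the exact shapes and breakpoints below are illustrative, not the curves from the figure.

```python
def tri(x, a, b, c):
    """Triangular membership: rises from a to a peak at b, falls to c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Illustrative membership functions over 0-35 Celsius.
cold = lambda t: max(0.0, min(1.0, (15 - t) / 10))   # fully cold below 5 C
warm = lambda t: tri(t, 10, 20, 30)
hot = lambda t: max(0.0, min(1.0, (t - 22) / 8))     # fully hot above 30 C

t = 24.0
print(f"cold={cold(t):.2f} warm={warm(t):.2f} hot={hot(t):.2f}")
# The plate can be partly warm AND partly hot at the same time.
print("warm OR hot :", max(warm(t), hot(t)))   # union = max
print("warm AND hot:", min(warm(t), hot(t)))   # intersection = min
print("NOT hot     :", 1 - hot(t))             # complement = 1 - m
```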