Discovery of Activity Patterns using Topic Models
Tâm Huýnh, Mario Fritz and Bernt Schiele
Computer Science Department, TU Darmstadt, Germany
Presented by John Wilkinson, CS88/188, 10/23/08
Motivation
• Recognition of daily routines
• A routine is higher level than an activity
  • Usually composed of several lower-level activities, and varies depending on context
• Examples: commuting, office work, meeting
Contributions of this Paper
• Daily routines = probabilistic combination of activity patterns
• Use of topic models on sensor data to detect patterns
• Works in both supervised and unsupervised settings
• Experimental validation of the approach
Activities are Hierarchical
• Routines such as commuting, office work, the lunch routine or the dinner routine are combinations of lower-level activities
• Patterns of multiple activities
  • Cover a longer time period than individual activities
  • Can vary significantly from one instance to another
• Example: office work “mostly consists of sitting”, but “may (or may not) contain small amounts of using the toilet, or discussions at the whiteboard”
• Example: commuting “mostly consists of driving car, but usually contains short walking instances as well”
Topic Models
• Popular tool in text processing
• A document is modeled as a mixture of topics, where each topic is a mixture of words
• The authors use latent Dirichlet allocation (LDA), an extension of probabilistic latent semantic analysis (pLSA)
Topic Models (cont.)
• pLSA (Hofmann 1999)
  • d is a document
  • w is a word in the corpus
  • z is a latent variable representing a topic
• Marginalize over topics to find the probability of a word given a document:
  p(w | d) = Σ_z p(w | z) p(z | d)
• Matrix representation (see the sketch below)
[Figure: matrix view of pLSA — words (w) × documents (d) decomposed via topics (z); topic activation in a document is p(z|d)]
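To make the marginalization and the matrix view concrete, here is a minimal NumPy sketch on toy data (the dimensions and values are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions: V words, T topics, D documents.
V, T, D = 6, 2, 3

# p(w|z): each column is a distribution over words for one topic.
p_w_given_z = rng.dirichlet(np.ones(V), size=T).T   # shape (V, T)
# p(z|d): each column is the topic activation for one document.
p_z_given_d = rng.dirichlet(np.ones(T), size=D).T   # shape (T, D)

# Marginalizing over topics is just a matrix product:
# p(w|d) = sum_z p(w|z) p(z|d)
p_w_given_d = p_w_given_z @ p_z_given_d             # shape (V, D)

# Each column is a valid distribution over words.
print(p_w_given_d.sum(axis=0))  # -> [1. 1. 1.]
```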
LDA
• pLSA extended to LDA (Blei, Ng, Jordan 2003)
• Adds a Dirichlet prior α on the document–topic distribution
• Adds a parameter β for the topic–word distributions
• Fitting the model is equivalent to finding parameters α and β that maximize the likelihood:
  p(w | α, β) = ∫ p(θ | α) ∏ₙ Σ_z p(z | θ) p(wₙ | z, β) dθ
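As an illustration of fitting such a model (not the authors' implementation), scikit-learn's LatentDirichletAllocation exposes the Dirichlet prior directly; here doc_topic_prior plays the role of α, and the counts are toy data:

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

# Toy word-count matrix: 5 "documents" over a 6-word vocabulary.
X = np.array([
    [4, 3, 0, 0, 1, 0],
    [5, 2, 1, 0, 0, 0],
    [0, 0, 4, 5, 0, 1],
    [0, 1, 3, 4, 0, 0],
    [1, 0, 0, 1, 4, 5],
])

# 3 topics; doc_topic_prior corresponds to the Dirichlet alpha.
lda = LatentDirichletAllocation(n_components=3,
                                doc_topic_prior=0.01,
                                random_state=0)
doc_topic = lda.fit_transform(X)   # per-document topic activations
print(doc_topic.round(2))          # rows sum to 1
```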
Data Collection
• One subject over 16 weekdays (164 hours total; 28 hours lost due to sensor failure)
• Two sensors
  • Right hip pocket
  • Dominant (right) wrist
• Sensor data
  • 3D accelerometer
  • Time
  • 9 binary tilt switches
  • Temperature
  • 2 light sensors
• Samples collected at 100 Hz, sub-sampled to 2.5 Hz
• 7 days of data were annotated with routines and activities
Annotation
• 3 methods used to collect annotations
  • Experience sampling: subject periodically prompted on a cell phone
  • Time diary: written log
  • Camera snapshots: subject took photographs with the cell phone camera
• The time diary was most effective, since the subject was often near a laptop to record annotations
• Experience sampling missed short events, posed redundant questions, and was less accurate for activity start and stop times
• All 3 methods were analyzed by the researchers for the final labeling of activities and routines
Activities in Dataset
• 75 distinct activities
• After filtering out activities that occur only once or only for very short durations, and merging similar activities into single classes, 34 activities remain
Routines in Dataset
• 4 daily routines annotated (plus an unlabelled class)
Activity Classification
• HMM, SVM, and Naïve Bayes were evaluated
• Naïve Bayes was chosen for its speed and only slightly lower accuracy (72.7%)
• Mean and variance of the accelerometer data were used as features, along with time of day; frequency features did not improve recognition (see the sketch below)
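A minimal sketch of this classification step, assuming windowed 3D accelerometer data; the window size, sampling details, and labels here are illustrative stand-ins, not the paper's:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

def window_features(acc, times, win=100):
    """Per-window mean and variance of 3D accelerometer data,
    plus the time of day of the window's first sample."""
    feats = []
    for start in range(0, len(acc) - win + 1, win):
        w = acc[start:start + win]            # (win, 3) slice
        feats.append(np.concatenate([
            w.mean(axis=0),                   # per-axis mean
            w.var(axis=0),                    # per-axis variance
            [times[start] % 86400],           # seconds since midnight
        ]))
    return np.array(feats)

# Toy data: 1000 samples of 3-axis acceleration with timestamps.
rng = np.random.default_rng(0)
acc = rng.normal(size=(1000, 3))
times = np.arange(1000) * 0.4                 # 2.5 Hz sampling
labels = rng.integers(0, 3, size=10)          # one label per window

X = window_features(acc, times)
clf = GaussianNB().fit(X, labels)             # Naive Bayes classifier
posteriors = clf.predict_proba(X)             # soft outputs used later
```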
Topic Model
• A document is constructed from a sliding window of length D
• The posterior probabilities from the classifier are used instead of hard classifications
• Summed over the window, then normalized (see the sketch below)
[Figure: example of document construction (not their data)]
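A sketch of the document construction, with toy posteriors standing in for the classifier outputs from the previous sketch; doc_len and step are illustrative stand-ins for the paper's window parameters:

```python
import numpy as np

def make_documents(posteriors, doc_len=12, step=1):
    """Build 'documents' by summing classifier posteriors over a
    sliding window and normalizing to a distribution over activities."""
    docs = []
    for start in range(0, len(posteriors) - doc_len + 1, step):
        window = posteriors[start:start + doc_len].sum(axis=0)
        docs.append(window / window.sum())
    return np.array(docs)

# Example: 10 windows of 3-class posteriors -> sliding documents.
rng = np.random.default_rng(0)
posteriors = rng.dirichlet(np.ones(3), size=10)
docs = make_documents(posteriors, doc_len=4)
print(docs.shape)  # (7, 3); each row sums to 1
```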
Results
• 30-minute sliding window, moved 2.5 min at a time; T = 10 topics, α = 0.01
• Plots: 6 days of data on top, the left-out day on the bottom
Quantifying Performance
• Correlation
  • Perform LDA estimation on 6 of the 7 days, then assign each activity the topic with which its correlation to ground truth is highest
  • Then perform LDA inference on the 7th day and note, for each activity, the correlation with its assigned topic
• Recognition performance
  • Use topic activation vectors as features for supervised learning
  • Perform estimation and inference on 6 of the 7 days and classify activation vectors using nearest neighbor (see the sketch below)
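A hedged sketch of the recognition measure, with made-up topic activation vectors and routine labels; the 1-nearest-neighbor classifier follows the slide, everything else is illustrative:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Toy topic activation vectors (T=10 topics) with routine labels.
train_act = rng.dirichlet(np.ones(10), size=60)
train_lab = rng.integers(0, 4, size=60)       # 4 daily routines
test_act = rng.dirichlet(np.ones(10), size=20)

# 1-nearest-neighbor over the activation vectors.
knn = KNeighborsClassifier(n_neighbors=1).fit(train_act, train_lab)
pred = knn.predict(test_act)
print(pred[:5])
```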
Baseline Results
• To obtain a baseline, an HMM was trained on the same features used for LDA
Supervised LDA Results
• T = 10 topics, document length 30 min
• Document lengths below 30 minutes hurt performance
• More than 10 topics can help, but hurts interpretability
• Shows improvement over the HMM baseline
[Figure: recognition results, LDA vs. baseline]
Unsupervised Approach
• Vocabulary
  • Use K-means clustering and set K to the desired vocabulary size
  • For each feature i, store the distances to each of the K centroids, then convert them to weights (see the sketch after this list)
  • Smaller distances = larger weights
  • Weights sum to 1
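The weight formula on the slide did not survive extraction; the sketch below assumes a normalized inverse-distance conversion, which matches the stated properties (smaller distances give larger weights, weights sum to 1) but may differ from the paper's exact formula:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
features = rng.normal(size=(500, 7))          # toy feature vectors

K = 20                                        # vocabulary size
km = KMeans(n_clusters=K, n_init=10, random_state=0).fit(features)

# transform() gives each sample's distance to every centroid.
dists = km.transform(features)                # shape (500, K)

# Assumed conversion: inverse distances, normalized to sum to 1,
# so smaller distances yield larger weights.
inv = 1.0 / (dists + 1e-9)                    # avoid divide-by-zero
weights = inv / inv.sum(axis=1, keepdims=True)
print(weights.sum(axis=1)[:3])                # -> [1. 1. 1.]
```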
Unsupervised Results (cont.)
[Figure: topic activations, unsupervised vs. supervised]
Supervised vs. Unsupervised
• Supervised
  • Requires time-consuming annotation
  • Results slightly better
  • Topics easily interpreted
• Unsupervised
  • No need for annotation
  • Discovers topics that were not obvious to annotators
  • Topics cannot be interpreted easily from the vocabulary
Strengths
• Authors’ view
  • Concurrent activities
  • Overlapping activities
  • Transitions between activities shown by rising and falling activations
  • Decomposes routines into their low-level constituents
• Validation against existing approaches (HMM)
• Correlation and recognition measures
Weaknesses/Extensions
• Authors’ extensions
  • Semi-supervised learning
  • More sophisticated classification of activation vectors
  • Additional features such as location
• The topic model is inherently a static model being adapted to dynamic data
  • The text processing community has developed hybrid HMM/LDA techniques, as in “Integrating Topics and Syntax” by Griffiths, Steyvers, Blei, and Tenenbaum
• The vocabulary could be n-grams instead of single activities