Contextual and Social Media Understanding and Usage. Contexts-As-Clustering: Making Sense of Social Contexts from Low-level Sensory Data. Dinh Phung, Curtin University of Technology, Australia (joint work with Brett Adams and Svetha Venkatesh).
Motivation • Contexts provide fundamental units for context-aware applications • But what sorts of context? How do we extract them? • Sparseness problem: • in users’ behaviours: devices powered on/off inconsistently • during data collection: signal loss, measurement errors • in the structure of social activity
Approaches • Non-parametric clustering: DBSCAN, Affinity Propagation • Scales well with data size, can deal with online and incremental data • Robust to outliers and noise, easy to incorporate constraints • Applications: • Extraction of significant places (social functions) from GPS data • Social rhythms as a combinatorial clustering process • Probabilistic clustering: latent Dirichlet allocation • Can jointly model statistical strengths across multiple modalities, co-occurrences, dynamic behaviours, temporal behaviours, group membership, dyadic data, etc. • Applications: • Computable high-order patterns from location data • Topic inference in blogospheres
Case 1: Locations from GPS • Minor signal noise from fix tolerance and lag • Major signal noise from signal loss • Usage inconsistency
Locations • Preprocessing • Removal of points above a speed threshold • Often missing precisely the samples we want! (e.g. buildings) • Interpolation within a day and across days
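The speed-threshold step above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the sample layout `(t_seconds, lat, lon)`, the function names, and the default threshold of 1.5 m/s are all assumptions made for the example.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    r = 6371000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def drop_fast_points(samples, max_speed_mps=1.5):
    """Drop samples whose speed relative to the previous raw sample exceeds the threshold.

    samples: list of (t_seconds, lat, lon) tuples sorted by time (assumed layout).
    The first sample has no predecessor and is always kept.
    """
    if not samples:
        return []
    kept = [samples[0]]
    for prev, cur in zip(samples, samples[1:]):
        dt = cur[0] - prev[0]
        if dt <= 0:
            continue  # skip duplicate or out-of-order timestamps
        speed = haversine_m(prev[1], prev[2], cur[1], cur[2]) / dt
        if speed <= max_speed_mps:
            kept.append(cur)
    return kept
```

Note that a filter like this only removes in-transit points; it cannot recover the samples lost indoors, which is why the slide pairs it with interpolation.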
Density-based clustering • Clustering using DBSCAN • Handles arbitrary cluster shapes (GPS is trajectory data) • No initialization required and is deterministic • Excludes noise, outliers and abnormal points • Incremental version • [Figure: DBSCAN concepts on dataset D with radius ε and points p, q: directly density reachable, density reachable, maximally density connected]
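For readers unfamiliar with DBSCAN, the batch algorithm can be sketched in a few lines. This is a plain textbook version under stated assumptions, not the incremental variant mentioned on the slide: points are taken to be 2-D planar coordinates (for GPS data, already projected to metres), distance is Euclidean, and the names `region_query`, `dbscan`, `eps` and `min_pts` are chosen for the example.

```python
def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    px, py = points[i]
    return [j for j, (qx, qy) in enumerate(points)
            if (px - qx) ** 2 + (py - qy) ** 2 <= eps * eps]

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbours)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # j is a core point, so keep expanding
    return labels
```

The brute-force neighbour search is quadratic in the number of points; with millions of GPS samples a spatial index would be needed in practice.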
Applications • Many more super-contexts can be derived: • social ties, co-location patterns, a rough measure of the entropy of daily life, or implications for social relationships. • Socially collaborative inference! • Social Context-Aware Media Browsing:
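One possible reading of the "rough measure of entropy" mentioned above is the Shannon entropy of how a day's time is split across the extracted places. The sketch below assumes that reading; the slides do not define the measure, and the function name and input layout (a mapping from place label to seconds spent there) are hypothetical.

```python
import math

def daily_entropy(seconds_by_place):
    """Shannon entropy (in bits) of the share of time spent at each place."""
    total = sum(seconds_by_place.values())
    if total <= 0:
        return 0.0
    h = 0.0
    for s in seconds_by_place.values():
        if s > 0:
            p = s / total
            h -= p * math.log2(p)
    return h
```

Under this definition a day spent entirely in one place scores 0 bits, and a day split evenly between two places scores 1 bit.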
Case 2: Rhythm Extraction • Social rhythms: a complex set of projections of repeated occurrences onto the dimensions of people, place and time. • Extraction of rhythms as a combinatorial clustering process by folding in certain dimensions!
Rhythms extraction • Rare vs. Frequent rhythms • Functional of (normalized) time experienced at a place • Timed vs Optional rhythms • Social aspect of punctuality • Relational rhythms • People-based clustering
Rhythms extraction • Relational
Case 3: Computable Patterns • Repeated patterns found in daily activities over time. • Driven by a social theme – a cognitive aspect of mind: • need to go to work every day, pick up children every Tuesday • going to church on Sunday, going to the gym to keep fit, … • Go beyond simple counts: require the order of activities over time to derive patterns. • Challenge: need to exploit and cluster on n-gram-style order statistics.
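To make "n-gram-style order statistics" concrete, the sketch below counts ordered runs of activities within each day. It is only an illustration of the statistic, not the clustering itself; the function name and the representation of a day as a list of activity labels are assumptions.

```python
from collections import Counter

def ngram_counts(pages, n=2):
    """Count ordered n-grams of activity labels, never crossing day boundaries.

    pages: list of days, each a list of labels in time order (assumed layout).
    """
    counts = Counter()
    for page in pages:
        for i in range(len(page) - n + 1):
            counts[tuple(page[i:i + n])] += 1
    return counts
```

Unlike a bag-of-activities count, this distinguishes "home then work" from "work then home", which is the ordering information the slide says simple counts miss.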
Approach • Translate social data into ‘text’ documents and use Bayesian document modelling tools!
Social codebook • Social footprint = <start time, duration, location label> • Translate each footprint into a code • start time = 1,2,3,…, 24 (24 hours) • duration = short, medium, long • location label = {set of unique names} • Each day is translated into a document = social page • A collection of social pages = social corpus
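The translation from footprints to social pages can be sketched as below. The duration cut-offs (30 and 180 minutes), the code string format, and all function names are hypothetical choices for illustration; the slides specify only the three fields and the short/medium/long binning.

```python
def duration_bin(minutes, short_max=30, medium_max=180):
    """Map a duration in minutes to short/medium/long (cut-offs are assumed)."""
    if minutes <= short_max:
        return "short"
    if minutes <= medium_max:
        return "medium"
    return "long"

def footprint_code(start_hour, minutes, location):
    """Encode one footprint <start time, duration, location label> as a single token."""
    if not 1 <= start_hour <= 24:
        raise ValueError("start_hour must be in 1..24")
    return f"{start_hour:02d}_{duration_bin(minutes)}_{location}"

def build_corpus(footprints_by_day):
    """Turn {day: [(start_hour, minutes, location), ...]} into social pages and a codebook."""
    corpus = {day: [footprint_code(*fp) for fp in fps]
              for day, fps in footprints_by_day.items()}
    codebook = sorted({code for page in corpus.values() for code in page})
    return corpus, codebook
```

Each social page is then an ordered list of tokens, so it can be fed to document-modelling tools in the same way as a tokenized text.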
Latent Social Dirichlet Allocation (LSDA) • Extends latent Dirichlet allocation (LDA, Blei ’03) to generate n-grams rather than single words. • Can be viewed as (i) a clustering method, or (ii) a discrete dimensionality reduction method: • a word = a social code • a topic = a distribution over n-grams of codes • Inference can be done efficiently with Gibbs sampling. • A personalized version of the Ngram Topic Model (Wang ’07)
Experiments • Data collected over 1.5 years, 8 subjects, 10+ million GPS samples • noisy, fragmented and very sparse • Family, friends and workmates • Exhibits all types of noise • 1709 footprints found, codebook size = 381 • Small hyper-parameters (<< 1) to favour discriminative patterns and themes.
Top themes and patterns WEEKDAYS
Top themes and patterns WEEKENDS
Conclusion • What is context? • Traditional view: as a representation problem (Dourish ’04) • Context is a form of information • Context is delineable • Context is stable • Context and activities (contents) are separable • Dourish’s view: context as an interaction problem! • Contexts possess dynamic properties • Contexts are driven by contents and vice versa. • Where to proceed from here? • Is context domain-specific? • Is there a unified framework?
Rhythms extraction • Ranked timed rhythms