GDC: Group Discovery using Co-location Traces

Steve Mardenfeld, Daniel Boston, Susan Juan Pan, Quentin Jones†, Adriana Iamnitchi‡, Cristian Borcea
Department of Computer Science, New Jersey Institute of Technology
†Department of Information Systems, NJIT
‡Department of Computer Science, USF

Physical Groups • Informally: groups of people that meet face to face • Formal definition: Homans’ sociology book “The Human Group” • Groups can be used in social or socially aware applications • Recommender systems: recommend concerts to people who go to concerts together • Data forwarding in delay-tolerant ad hoc networks: give priority to members of same group as destination when selecting next hop • How to detect groups automatically?

Group Detection Using Location Traces • Users carry mobile phones and upload location to central server • Server analyzes location traces to detect groups • In previous work, we developed an algorithm for group/place detection • Achieved 96% accuracy with low false positives • Problems • Location privacy • Battery power

GDC: Use Bluetooth Co-location Traces • Advantages • Improved location privacy • Low power consumption • Practicality due to Bluetooth ubiquity in mobile phones • Accuracy due to Bluetooth transmission range [Figure labels: A, B, C, Internet]

Challenges • Attendance at a group is variable • People may be merely passing near a group, not remaining part of it • Group members spend different lengths of time with the group • Sampling frequency and user mobility can affect data completeness • Each user may have a different perspective on the same meeting

Outline GDC Algorithm User Study Results Distributed GDC Conclusions

GDC in a Nutshell • Transform raw Bluetooth records into meeting records between pairs of users • Discover and record all combinations of users appearing at the same meeting (user clusters) • Resolve differences in user perspectives on shared clusters • Select all significant clusters and output as user groups

Creating Pair-wise Meeting Records • Decreasing Meeting Granularity (MG) from 5 min to 2½ min produces noticeable changes
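A minimal sketch of this step, under stated assumptions: each Bluetooth record is an (observer, seen, timestamp-in-seconds) tuple, and consecutive sightings of the same pair belong to one meeting when they are at most MG seconds apart. The record layout, the name build_meetings, and the merging rule are illustrative and are not taken from the slides.

```python
from collections import defaultdict

def build_meetings(records, mg_seconds=300):
    """Merge raw sightings into (start, end) meetings per unordered user pair.

    records: iterable of (observer, seen, timestamp) tuples (assumed layout).
    mg_seconds: Meeting Granularity; a gap larger than this starts a new
                meeting (assumed interpretation of MG).
    """
    sightings = defaultdict(list)
    for observer, seen, ts in records:
        pair = tuple(sorted((observer, seen)))  # A-sees-B equals B-sees-A
        sightings[pair].append(ts)

    meetings = {}
    for pair, times in sightings.items():
        times.sort()
        spans = [[times[0], times[0]]]
        for ts in times[1:]:
            if ts - spans[-1][1] <= mg_seconds:
                spans[-1][1] = ts        # extend the current meeting
            else:
                spans.append([ts, ts])   # gap too large: start a new meeting
        meetings[pair] = [tuple(span) for span in spans]
    return meetings
```

With a smaller mg_seconds the same sightings split into more, shorter meetings, which is one way the MG setting can change the output noticeably.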

Creating User Clusters
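One way to picture this step, assuming a meeting is represented as the set of users one phone saw together plus a duration in seconds: every combination of two or more attendees is recorded as a user cluster, with accumulated time and meeting count. The input layout and the name user_clusters are hypothetical. Enumeration is exponential in the meeting size, which is only workable because few users fit within Bluetooth range.

```python
from collections import defaultdict
from itertools import combinations

def user_clusters(meetings):
    """Enumerate clusters from one user's perspective.

    meetings: list of (attendees, duration_seconds) pairs (assumed layout).
    Returns {frozenset(cluster): [total_seconds, meeting_count]}.
    """
    stats = defaultdict(lambda: [0, 0])
    for attendees, duration in meetings:
        members = sorted(set(attendees))
        # Every subset of size >= 2 appeared together at this meeting.
        for size in range(2, len(members) + 1):
            for combo in combinations(members, size):
                entry = stats[frozenset(combo)]
                entry[0] += duration
                entry[1] += 1
    return dict(stats)
```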

Creating Global Clusters • Resolve Perspective Differences • Use Minimum Group Time (MGT) • Use Minimum Group Meeting Frequency (MGMF)
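A hedged sketch of the filtering step. The slides do not say how perspective differences are resolved, so the rule below (keep the largest time and count that any member reported) is purely an assumption, as are the function name and input layout. The default thresholds mirror the MGT = 2000s and MGMF = 2 settings used in the GDC vs. K-Clique comparison.

```python
def global_clusters(per_user_stats, mgt=2000, mgmf=2):
    """Merge per-user cluster statistics and keep the significant clusters.

    per_user_stats: {user: {frozenset(cluster): (total_seconds, meetings)}}
    mgt:  Minimum Group Time in seconds.
    mgmf: Minimum Group Meeting Frequency (number of meetings).
    """
    merged = {}
    for stats in per_user_stats.values():
        for cluster, (seconds, count) in stats.items():
            best_seconds, best_count = merged.get(cluster, (0, 0))
            # Assumed reconciliation rule: keep the largest reported values.
            merged[cluster] = (max(best_seconds, seconds), max(best_count, count))
    return {
        cluster: totals
        for cluster, totals in merged.items()
        if totals[0] >= mgt and totals[1] >= mgmf
    }
```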

Selecting the User Groups • Identify and remove subgroups of significant groups • Keep a subgroup if it meets double the time of the group that includes it
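The subgroup rule can be sketched as follows, assuming the input maps each significant cluster to its total meeting time in seconds. Reading "double the time" as "at least twice the time of every larger cluster that contains it" is an interpretation, and select_groups is a hypothetical name.

```python
def select_groups(clusters):
    """Drop subgroups unless they meet much longer than their supergroups.

    clusters: {frozenset(members): total_seconds} for significant clusters.
    Returns the list of clusters kept as user groups.
    """
    groups = []
    for cluster, seconds in clusters.items():
        # Times of all strictly larger clusters that contain this one.
        superset_times = [t for other, t in clusters.items() if cluster < other]
        # Keep top-level clusters, and subgroups meeting at least twice as long.
        if all(seconds >= 2 * t for t in superset_times):
            groups.append(cluster)
    return groups
```

For example, if {A, B, C} meets for 3000s and {A, B} for 7000s, both are kept: A and B evidently also meet as a pair, apart from the larger group.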

Complexity Analysis • R: total number of Bluetooth records • N: total number of users in the dataset • L: maximum number of users in a group • L is small because relatively few users are within the transmission range (10 m) • Our experiments: max = 15, avg = 6.8

Evaluation • Goals • Analyze effect of group meeting frequency and time • Compare GDC and K-Clique • K-Clique uses a time threshold to select graph edges and analyzes the graph for k-cliques • Experiments • Collect data from mobile phones carried by 100+ volunteer students on campus for one month • Run GDC and K-Clique on collected data • Also tested on Reality Mining data from MIT • Ask users to rank groups using Likert Scale • 1 to 5, 5 is best
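To make the K-Clique baseline concrete, here is a simplified sketch that follows only the description above: keep an edge between two users when their total co-location time reaches the threshold, then list the k-cliques of the resulting graph by brute force. It omits any later merging of adjacent cliques into communities, and the input layout and function name are assumptions.

```python
from itertools import combinations

def k_cliques(pair_seconds, threshold=2000, k=3):
    """List k-cliques of the thresholded co-location graph.

    pair_seconds: {(u, v): total co-location seconds} (assumed layout).
    """
    edges = {frozenset(pair) for pair, secs in pair_seconds.items() if secs >= threshold}
    nodes = sorted({user for edge in edges for user in edge})
    return [
        set(combo)
        for combo in combinations(nodes, k)
        # A clique requires every pair inside the combination to share an edge.
        if all(frozenset(pair) in edges for pair in combinations(combo, 2))
    ]
```

Because edges are built from pairwise totals, a clique {A, B, C} can appear even if the three users were never in the same place at the same time.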

Data Collection Details • 78 users each contributed less than 24 hours of recorded data • Sparse data: random volunteers, many students are commuters • Demographics: 72% male, 28% female, 25% graduate, 75% undergraduate

Effect of Meeting Time and Frequency • Detection accuracy increases significantly with meeting frequency and total meeting time

GDC vs. K-Clique • Overall, GDC groups rated 30% better than the popular K-Clique algorithm • GDC groups are guaranteed to meet • Not all K-Clique groups meet • Some GDC groups are rated poorly because members do not know one another’s names • Parameters • GDC: MGT = 2000s, MGMF = 2 • K-Clique: threshold = 2000s

GDC Groups: NJIT Dataset vs. Reality Mining Dataset • Group distributions as a function of size are relatively similar despite the fact that Reality Mining is a denser dataset • NJIT: MGT = 2000s, MGMF = 1 • Reality Mining: MGT = 18000s, MGMF = 9 (normalized for 9 months)

Outline GDC Algorithm User Study Results Distributed GDC Conclusions

Distributed GDC (D-GDC) • GDC executed on the phones • Benefits • Better privacy • Avoid “Big Brother” scenario • Ability to control message exchange on a per-case basis • Resiliency: no bottleneck & no single point of failure • Flexibility: each user controls how often to run D-GDC

D-GDC Implementation • Collect Bluetooth records locally through message exchange • No global aggregation like in GDC • Control exchange with heuristic policies • These policies can be specified by users • Allows greater individual privacy control • Run remainder of GDC device-local • Evaluated using replay simulation over our real traces

Preliminary Results • Overall similarity: compute similarity of each user’s GDC groups against the closest matches in D-GDC and average the results • Compared D-GDC with a version running only on data collected locally by phones • D-GDC performs significantly better than local-only version
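The overall-similarity metric can be sketched as below. The slides do not name the set-similarity measure, so Jaccard similarity is used here as a stand-in assumption; the function names and the {user: list of groups} layout are also illustrative.

```python
def jaccard(a, b):
    """Jaccard similarity of two groups (assumed measure, not from the slides)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0

def overall_similarity(gdc_groups, dgdc_groups):
    """Average, over users, of how well D-GDC matches each GDC group.

    Both arguments: {user: list of groups}, each group an iterable of user ids.
    """
    per_user = []
    for user, groups in gdc_groups.items():
        if not groups:
            continue  # nothing to compare for this user
        candidates = dgdc_groups.get(user, [])
        # For each GDC group, score the closest D-GDC group of the same user.
        best = [max((jaccard(g, c) for c in candidates), default=0.0) for g in groups]
        per_user.append(sum(best) / len(best))
    return sum(per_user) / len(per_user) if per_user else 0.0
```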

Conclusion • Physical groups enable new socially-aware features in applications • GDC: practical, high-accuracy, no location collection • Validated by users and outperforms K-Clique by 30% • Higher accuracy can be achieved by increasing frequency and time parameters • A decentralized version improves privacy and produces promising results

Thank You! • Mobius project: http://www.cs.njit.edu/~borcea/mobius/ • Acknowledgement: NSF grants CNS-0831753 and CNS-0834585
