Mining Trajectory Profiles for Discovering User Communities

Chih-Chieh Hung, Chih-Wen Chang, Wen-Chih Peng • Speaker: Chih-Wen Chang • National Chiao Tung University, Taiwan • 2009.11.03

Outline • Motivation • Goal • Framework • Preprocess • Construct User’s Profiles • Formulate Distance function • Identify Community • Experiments • Conclusion

Motivation (1/2) • With the rapid development of positioning techniques, users can easily collect their trajectories • GPS loggers, smart phones, and navigation devices

Motivation (2/2) • Many GPS community sites have been established • Users can share their own trajectories • Users can search (query) trajectories • Examples shown: Every Trail, My tracks

Goal • Mine user communities from raw trajectories • User Communities • Sets of users who have similar moving behaviors • Applications • Find new friends • Recommendation • Rank of trajectories

1. Construct User’s Profile 2. Formulate distance function 3. Identify user communities [Figure: user profiles are compared by measuring the distance between users and grouped into Community 1 and Community 2]

Framework • Preprocess • Construct User’s Profile • Measure Distance Between Users • Identify Community

Preprocessing • Step 1: • Find frequent regions • Input: all trajectories of users • Output: frequent regions • Density-based approach • Step 2: • Transform trajectories into sequences of frequent region ids • Example: T1 : <A, B, D>
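The two preprocessing steps can be sketched in Python as follows. This is only an illustration under stated assumptions: a fixed grid stands in for the density-based approach (the slides do not say which density-based method is used), and the names `cell_of`, `find_frequent_regions`, `to_region_sequence`, `cell_size`, and `min_points` are hypothetical.

```python
from collections import Counter

def cell_of(point, cell_size):
    """Map an (x, y) point to the grid cell that contains it."""
    x, y = point
    return (int(x // cell_size), int(y // cell_size))

def find_frequent_regions(trajectories, cell_size, min_points):
    """Step 1 (assumed grid variant): cells holding at least `min_points`
    trajectory points are treated as frequent regions and given an id."""
    counts = Counter(cell_of(p, cell_size) for t in trajectories for p in t)
    dense = sorted(cell for cell, n in counts.items() if n >= min_points)
    return {cell: f"R{i}" for i, cell in enumerate(dense)}

def to_region_sequence(trajectory, regions, cell_size):
    """Step 2: replace points by region ids, skipping points outside any
    frequent region and collapsing consecutive repeats of the same id."""
    seq = []
    for p in trajectory:
        rid = regions.get(cell_of(p, cell_size))
        if rid is not None and (not seq or seq[-1] != rid):
            seq.append(rid)
    return seq

# Hypothetical toy data: two short trajectories of (x, y) points.
trajectories = [
    [(0.1, 0.1), (0.2, 0.3), (1.5, 0.5), (5.0, 5.0)],
    [(0.4, 0.4), (1.2, 0.2), (1.7, 0.9)],
]
regions = find_frequent_regions(trajectories, cell_size=1.0, min_points=2)
sequences = [to_region_sequence(t, regions, 1.0) for t in trajectories]
```

With this toy data both trajectories reduce to the same two-region sequence, because the isolated point at (5.0, 5.0) does not fall in a frequent region.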

Construct User’s Profiles (1/2) • User’s Profile • Probabilistic Suffix Tree (abbreviated as PST) • Find and organize trajectory patterns • Record the probability of next movements [Figure labels: tree nodes hold frequent moving sequences; conditional tables hold the next possible movements]

Construct User’s Profiles (2/2) • Construct PST • Level by level • Two operations: • Create a child node • The counts of Before symbol > MinSup • Add a symbol into the related conditional table • The counts of After symbol > MinSup
Example (MinSup = 0.2) • Sequences: ABE, ABA, AC, B, ADF, H, JHI, EDH • Nodes under root: A: 0.5, B: 0.375 • After symbol for node B: A : 1 → 1/2 = 0.5, E : 1 → 1/2 = 0.5 • Before symbol for node B: A : 2 → 2/3 × 0.375 = 0.25, giving child node AB: 0.25
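The numbers in the example can be reproduced with a short Python sketch. It is not a full PST construction: it only computes the two statistics the example relies on, under the assumption that a node's probability is the fraction of sequences containing its pattern and that the conditional table is the distribution of the symbol that follows the pattern. The function names `support` and `next_symbol_table` are hypothetical.

```python
from collections import Counter

def support(pattern, sequences):
    """Fraction of sequences that contain `pattern` as a contiguous run."""
    return sum(pattern in s for s in sequences) / len(sequences)

def next_symbol_table(pattern, sequences):
    """Distribution of the symbol that immediately follows `pattern`."""
    counts = Counter()
    for s in sequences:
        start = s.find(pattern)
        while start != -1:
            nxt = start + len(pattern)
            if nxt < len(s):
                counts[s[nxt]] += 1
            start = s.find(pattern, start + 1)
    total = sum(counts.values())
    return {sym: c / total for sym, c in counts.items()} if total else {}

# Region-id sequences from the slide example.
sequences = ["ABE", "ABA", "AC", "B", "ADF", "H", "JHI", "EDH"]

node_a = support("A", sequences)             # 4 of 8 sequences
node_b = support("B", sequences)             # 3 of 8 sequences
node_ab = support("AB", sequences)           # 2 of 8 sequences
after_b = next_symbol_table("B", sequences)  # what follows B
```

A level-by-level builder would keep a candidate child only when its support exceeds MinSup, so with MinSup = 0.2 the node AB (0.25) would be kept.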

Formulate Distance function (1/3) • Determine the distance between users • Transform the PST into a Moving Sequence List • Each element in the moving sequence list is a branch of the PST with its probability • Example: L1[1..2] = <[(A,0.5)], [(B,0.375)(AB,0.33)]>

Formulate Distance function (2/3) • Define the distance between PSTs • Find the minimal dist(Li[1..m], Lj[1..n]) • Use three editing operations • Insertion • Before: L1={m1:0.3,m2:0.2,m3:0.3}, L2={m1:0.3,m2:0.2} • Insert m3:0.3 into L2 (Cost = 0.3) • After: L1={m1:0.3,m2:0.2,m3:0.3}, L2={m1:0.3,m2:0.2,m3:0.3}

Formulate Distance function (3/3) • Deletion • Before: L1={m1:0.2,m2:0.3}, L2={m1:0.2,m2:0.3,m3:0.3} • After deleting m3 from L2: L1={m1:0.2,m2:0.3}, L2={m1:0.2,m2:0.3,____} • Replacement • Before: L1={m1:0.2,m2:0.2,m3:0.2}, L2={m1:0.2,m2:0.2,m4:0.3} • After replacing m4 in L2: L1={m1:0.2,m2:0.2,m3:0.2}, L2={m1:0.2,m2:0.2,m3:0.2} • Costs shown in the figure: Cost = 0.3 and Cost = 0.3+0.2 = 0.5
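The minimal-cost search over the three editing operations can be sketched as a weighted edit distance in Python. The slides do not fully define the cost of each operation, so the cost functions here are assumptions: inserting or deleting an element costs its probability, and replacing an element costs the probability gap when the pattern is the same, or the sum of both probabilities otherwise. The names `element_cost`, `replace_cost`, and `list_distance` are hypothetical.

```python
def element_cost(elem):
    """Assumed cost of inserting or deleting one (pattern, probability) element."""
    return elem[1]

def replace_cost(a, b):
    """Assumed replacement cost: probability gap for the same pattern,
    otherwise the cost of removing one element and adding the other."""
    if a[0] == b[0]:
        return abs(a[1] - b[1])
    return a[1] + b[1]

def list_distance(src, dst):
    """Minimal total cost of turning `src` into `dst` using insertion,
    deletion, and replacement (standard dynamic-programming table)."""
    m, n = len(src), len(dst)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + element_cost(src[i - 1])
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + element_cost(dst[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(
                d[i - 1][j] + element_cost(src[i - 1]),                   # delete
                d[i][j - 1] + element_cost(dst[j - 1]),                   # insert
                d[i - 1][j - 1] + replace_cost(src[i - 1], dst[j - 1]),  # replace or match
            )
    return d[m][n]

# The insertion example: the shorter list is missing m3:0.3.
shorter = [("m1", 0.3), ("m2", 0.2)]
longer = [("m1", 0.3), ("m2", 0.2), ("m3", 0.3)]
insertion_distance = list_distance(shorter, longer)
```

Under these assumed costs the insertion example evaluates to 0.3, the same value as the cost given for inserting m3:0.3. A different cost definition only requires swapping the two cost functions.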

Identify Community (1/4) • User community • The same community: δMLS(Ti,Tj) < thresholdδ • The number of communities is minimal • Transform the relation between PSTs into a graph • A vertex represents a user • An edge exists between two vertices when δMLS(Ti,Tj) < thresholdδ [Figure: example graph with vertices O1 to O5]

Identify Community (2/4) • Model as a minimum clique problem • A clique is a set of pair-wise adjacent vertices [Figure: example cliques in the graph with vertices O1 to O5]
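The graph construction and the grouping into cliques can be sketched in Python. This is a greedy heuristic, not necessarily the method used by the authors, and it does not guarantee the minimal number of communities. The function names, the toy distances, and the threshold value are all hypothetical.

```python
from itertools import combinations

def build_graph(users, dist, threshold):
    """Connect two users when their distance is below the threshold."""
    adj = {u: set() for u in users}
    for u, v in combinations(users, 2):
        if dist(u, v) < threshold:
            adj[u].add(v)
            adj[v].add(u)
    return adj

def greedy_clique_partition(adj):
    """Greedy heuristic: start from the highest-degree unassigned vertex and
    add every unassigned neighbour that is adjacent to all current members."""
    unassigned = set(adj)
    communities = []
    for u in sorted(adj, key=lambda x: (-len(adj[x]), x)):
        if u not in unassigned:
            continue
        clique = [u]
        unassigned.discard(u)
        for v in sorted(adj[u] & unassigned):
            if all(v in adj[w] for w in clique):
                clique.append(v)
                unassigned.discard(v)
        communities.append(clique)
    return communities

# Hypothetical distances: O1, O2, O3 are mutually close, and so are O4, O5.
close_pairs = {frozenset(p) for p in [("O1", "O2"), ("O1", "O3"),
                                      ("O2", "O3"), ("O4", "O5")]}

def toy_dist(u, v):
    return 0.1 if frozenset((u, v)) in close_pairs else 0.9

users = ["O1", "O2", "O3", "O4", "O5"]
graph = build_graph(users, toy_dist, threshold=0.5)
communities = greedy_clique_partition(graph)
```

In this toy setting the heuristic returns two communities, {O1, O2, O3} and {O4, O5}. Every community is a clique by construction, so all of its members are pairwise within the threshold.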

Identify Community (3/4) • Select a representative PST for each community • Represent all PSTs in the same community • Advantages • Reduce storage overhead • Speed up query processing • Assign new users to their communities [Figure: a new user is matched against the representative PSTs to decide which community to add them into]

Identify Community (4/4) • Two factors 1. Size of the representative PST • The number of tree nodes, denoted as N(Ti) 2. Distance between the selected PST and the others in the same community • The error sum, denoted as ES • Sum of the distances between the selected PST and the others • Representative PST • Minimize an objective over the two factors, N(Ti) and ES

Experiments (1/4) • Simulator Model • Use real trajectories from CarWeb to simulate the group mobility of users • Total: 2400 trajectories

Experiments (2/4) • Compare to the Generalized Sequential Pattern (GSP) mining algorithm • Set of sequential patterns, e.g. sp1, sp2, ..., spn • Trajectory profile of a user represented as a vector Vi • Distance function between profiles • Cosine similarity measurement: similarity(Vi, Vj) = (Vi · Vj) / (|Vi| |Vj|) • Example: similarity = (<1,1,0,0> · <0,1,1,1>) / (|<1,1,0,0>| |<0,1,1,1>|)
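The cosine similarity example can be checked with a few lines of Python. The function name `cosine_similarity` and the handling of all-zero vectors are assumptions made for this sketch.

```python
import math

def cosine_similarity(u, v):
    """Dot product of u and v divided by the product of their lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumption: an empty profile is similar to nothing
    return dot / (norm_u * norm_v)

# Vectors from the slide example: the two profiles share one pattern.
similarity = cosine_similarity([1, 1, 0, 0], [0, 1, 1, 1])
```

For the example vectors the dot product is 1 and the lengths are √2 and √3, so the similarity is 1/√6, roughly 0.408.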

Experiments (3/4) • Impact of Trajectory Profiles • GSP is always larger than PST, especially when MinSup is smaller than 0.15 • Figure panels: Storage, Prediction

Experiments (4/4) • Impact of the thresholdδ and MinSup • A smaller thresholdδ yields more communities • Figure panels: Storage, Prediction

Conclusion • Explore the problem of mining communities from trajectories • Preprocess: find frequent regions; replace trajectories by region ids • Construct User’s Profile: build a probabilistic suffix tree (abbreviated as PST) • Measure Distance Between Users: formulate a distance function • Identify Community: cluster users by the distance function; select representative PSTs

Thank you!
