Mining Trajectory Profiles for Discovering User Communities Chih-Chieh Hung, Chih-Wen Chang, Wen-Chih Peng Speaker: Chih-Wen Chang National Chiao Tung University, Taiwan 2009.11.03
Outline • Motivation • Goal • Framework • Preprocess • Construct User’s Profiles • Formulate Distance function • Identify Community • Experiments • Conclusion
Motivation (1/2) • With the rapid development of positioning techniques, users can easily collect their trajectories • GPS loggers, smartphones, and navigation devices
Motivation (2/2) • Many GPS community sites have been established • Users can share their own trajectories • Users can search trajectories • Examples: EveryTrail, My Tracks
Goal • Mine user communities from raw trajectories • User communities: sets of users who have similar moving behaviors • Applications: finding new friends, recommendation, ranking of trajectories
1. Construct User’s Profile 2. Formulate distance function (measure distance between users) 3. Identify user communities (Community 1, Community 2, ...)
Framework • Preprocess • Construct User’s Profile • Measure Distance Between Users • Identify Community
Preprocessing • Step 1: Find frequent regions • Input: all trajectories of users • Output: frequent regions • Density-based approach • Step 2: Transform trajectories into sequences of frequent region ids • Example: T1 : <A, B, D>
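The two preprocessing steps can be sketched in Python as follows. The slide only says the approach is density-based, so the uniform grid, the "at least min_trajs distinct trajectories per cell" rule, and all function and parameter names here are assumptions for illustration, not the authors' method.

```python
from collections import defaultdict

def cell_of(point, cell_size):
    """Map an (x, y) point to the grid cell that contains it."""
    x, y = point
    return (int(x // cell_size), int(y // cell_size))

def find_frequent_regions(trajectories, cell_size, min_trajs):
    """Step 1 (assumed grid variant): a cell is a frequent region when
    at least min_trajs distinct trajectories pass through it."""
    visitors = defaultdict(set)
    for tid, traj in enumerate(trajectories):
        for point in traj:
            visitors[cell_of(point, cell_size)].add(tid)
    frequent = sorted(c for c, tids in visitors.items() if len(tids) >= min_trajs)
    # Label regions A, B, C, ... (only meaningful for up to 26 regions).
    return {cell: chr(ord("A") + i) for i, cell in enumerate(frequent)}

def to_region_sequence(traj, regions, cell_size):
    """Step 2: replace points by region ids, dropping points outside any
    frequent region and collapsing consecutive repeats."""
    seq = []
    for point in traj:
        rid = regions.get(cell_of(point, cell_size))
        if rid is not None and (not seq or seq[-1] != rid):
            seq.append(rid)
    return seq

# Hypothetical toy data: three short trajectories.
trajs = [[(0.5, 0.5), (1.5, 0.5), (2.5, 0.5)],
         [(0.2, 0.7), (1.1, 0.9)],
         [(5.5, 5.5)]]
regions = find_frequent_regions(trajs, cell_size=1.0, min_trajs=2)
print([to_region_sequence(t, regions, 1.0) for t in trajs])
```

A real density-based method would typically adapt region shapes to the data; the fixed grid is used here only because it keeps the sketch short.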
Construct User’s Profiles (1/2) • User’s Profile • Probabilistic Suffix Tree (abbreviated as PST) • Find and organize trajectory patterns • Record the probability of next movements • Each node stores a frequent moving sequence and a conditional table of the next possible movements
Construct User’s Profiles (2/2)
• Construct PST level by level
• Two operations:
  • Create a child node when the count of the "before" symbol > MinSup
  • Add a symbol into the related conditional table when the count of the "after" symbol > MinSup
• Example (MinSup = 0.2), sequences: ABE, ABA, AC, B, ADF, H, JHI, EDH
  • Nodes under root: A: 0.5, B: 0.375
  • Before symbol of B: A occurs 2 times, 2/3 × 0.375 = 0.25, so child node AB: 0.25 is created
  • After symbols: A: 1, 1/2 = 0.5; E: 1, 1/2 = 0.5
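A minimal Python sketch of the level-by-level construction, under assumptions the slide does not state: region ids are single characters, a pattern's probability is the fraction of sequences that contain it, and next-symbol probabilities are normalized over occurrences that have a successor. The function names are hypothetical, and a flat dict stands in for the tree.

```python
from collections import Counter

def support(pattern, sequences):
    """Fraction of sequences containing the pattern contiguously."""
    return sum(pattern in s for s in sequences) / len(sequences)

def next_symbol_table(pattern, sequences, min_sup):
    """Conditional table: distribution of the symbol right after the pattern."""
    counts = Counter()
    for s in sequences:
        start = s.find(pattern)
        while start != -1:
            end = start + len(pattern)
            if end < len(s):
                counts[s[end]] += 1
            start = s.find(pattern, start + 1)
    total = sum(counts.values())
    return {sym: c / total for sym, c in counts.items() if c / total > min_sup}

def build_profile(sequences, min_sup):
    """Grow patterns by prepending a 'before' symbol, one level at a time."""
    alphabet = sorted(set("".join(sequences)))
    profile = {}
    level = [a for a in alphabet if support(a, sequences) > min_sup]
    while level:
        next_level = []
        for node in level:
            profile[node] = {
                "prob": support(node, sequences),
                "next": next_symbol_table(node, sequences, min_sup),
            }
            for a in alphabet:
                child = a + node  # suffix-tree child: one more preceding symbol
                if support(child, sequences) > min_sup:
                    next_level.append(child)
        level = next_level
    return profile

seqs = ["ABE", "ABA", "AC", "B", "ADF", "H", "JHI", "EDH"]
profile = build_profile(seqs, min_sup=0.2)
print(profile["B"]["prob"], profile["AB"]["prob"], profile["AB"]["next"])
```

On the slide's eight sequences this reproduces A: 0.5, B: 0.375, and AB: 0.25, with E and A each at 0.5 after AB.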
Formulate Distance function (1/3) • Determine the distance between users • Transform the PST into a Moving Sequence List • Each element of the moving sequence list is a branch of the PST with its probabilities • Example: L1[1..2] = <[(A, 0.5)], [(B, 0.375), (AB, 0.33)]>
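The transformation can be sketched in Python as below, assuming the PST is given as a flat dict from moving sequence to probability and that every suffix of a stored sequence is also stored. The function name and the dict layout are hypothetical.

```python
def moving_sequence_list(pst):
    """Turn a PST (dict: sequence -> probability) into a list of branches.

    A branch is the root-to-leaf path of a suffix tree: for leaf "AB" the
    path visits "B" first, then "AB".
    """
    def has_child(node):
        # A child is one symbol longer and ends with its parent's sequence.
        return any(len(m) == len(node) + 1 and m.endswith(node) for m in pst)

    leaves = sorted(n for n in pst if not has_child(n))
    branches = []
    for leaf in leaves:
        path = [leaf[i:] for i in range(len(leaf) - 1, -1, -1)]
        branches.append([(seq, pst[seq]) for seq in path])
    return branches

# Values taken from the slide's example.
pst = {"A": 0.5, "B": 0.375, "AB": 0.33}
print(moving_sequence_list(pst))
```

With the slide's values this yields two branches, [(A, 0.5)] and [(B, 0.375), (AB, 0.33)], matching L1[1..2].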
Formulate Distance function (2/3)
• Define the distance between PSTs
• Find the minimal dist(Li[1..m], Lj[1..n])
• Use three editing operations
• Insertion
  • Before: L1 = {m1:0.3, m2:0.2, m3:0.3}, L2 = {m1:0.3, m2:0.2}
  • Insert m3:0.3 into L2: L1 = {m1:0.3, m2:0.2, m3:0.3}, L2 = {m1:0.3, m2:0.2, m3:0.3}
  • Cost = 0.3
Formulate Distance function (3/3)
• Deletion
  • Before: L1 = {m1:0.2, m2:0.3}, L2 = {m1:0.2, m2:0.3, m3:0.3}
  • Delete m3:0.3 from L2: L1 = {m1:0.2, m2:0.3}, L2 = {m1:0.2, m2:0.3, ____}
  • Cost = 0.3
• Replacement
  • Before: L1 = {m1:0.2, m2:0.2, m3:0.2}, L2 = {m1:0.2, m2:0.2, m4:0.3}
  • Replace m4:0.3 with m3:0.2: L1 = {m1:0.2, m2:0.2, m3:0.2}, L2 = {m1:0.2, m2:0.2, m3:0.2}
  • Cost = 0.3 + 0.2 = 0.5
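The three editing operations suggest a weighted edit distance, sketched below in Python. The cost model is read off the slide examples and is partly an assumption: insertion or deletion costs the probability of the affected element, replacement costs the sum of both probabilities, and a matching sequence with different probabilities costs their absolute difference (this last rule is not on the slides). The moving sequence lists are flattened to plain lists of (sequence, probability) pairs for brevity.

```python
def moving_list_distance(l1, l2):
    """Minimal total edit cost turning l2 into l1 (dynamic programming)."""
    m, n = len(l1), len(l2)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + l1[i - 1][1]  # insert every element of l1
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + l2[j - 1][1]  # delete every element of l2
    for i in range(1, m + 1):
        s1, p1 = l1[i - 1]
        for j in range(1, n + 1):
            s2, p2 = l2[j - 1]
            # Assumed rule: same sequence costs |p1 - p2|, otherwise p1 + p2.
            replace = abs(p1 - p2) if s1 == s2 else p1 + p2
            d[i][j] = min(d[i - 1][j] + p1,            # insertion
                          d[i][j - 1] + p2,            # deletion
                          d[i - 1][j - 1] + replace)   # replacement or match
    return d[m][n]

# The insertion and replacement examples from the slides.
print(moving_list_distance([("m1", 0.3), ("m2", 0.2), ("m3", 0.3)],
                           [("m1", 0.3), ("m2", 0.2)]))
print(moving_list_distance([("m1", 0.2), ("m2", 0.2), ("m3", 0.2)],
                           [("m1", 0.2), ("m2", 0.2), ("m4", 0.3)]))
```

Under these assumed costs the two calls give 0.3 and 0.5, the costs shown in the slide examples.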
Identify Community (1/4) • User community • Users belong to the same community when δMLS(Ti,Tj) < thresholdδ • The number of communities is minimal • Transform the relation between PSTs into a graph • A vertex represents a user • An edge exists between two vertices when δMLS(Ti,Tj) < thresholdδ • Example graph over users O1, O2, O3, O4, O5
Identify Community (2/4) • Model as a minimum clique problem: cover all vertices with as few cliques as possible • A clique is a set of pairwise adjacent vertices • Example graph over users O1, O2, O3, O4, O5
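The graph construction and clique grouping can be sketched in Python as follows. Covering a graph with the fewest cliques is NP-hard in general, so this sketch uses a simple greedy heuristic that is not guaranteed to be minimal; the slides do not say which algorithm the authors use. The function names and the toy distance are hypothetical.

```python
def build_graph(users, dist, threshold):
    """Connect two users when their distance is below the threshold."""
    adj = {u: set() for u in users}
    for i, u in enumerate(users):
        for v in users[i + 1:]:
            if dist(u, v) < threshold:
                adj[u].add(v)
                adj[v].add(u)
    return adj

def greedy_clique_partition(adj):
    """Greedy heuristic: put each user into the first community whose
    members are all adjacent to it, else open a new community."""
    communities = []
    for u in sorted(adj, key=lambda x: -len(adj[x])):  # high degree first
        for community in communities:
            if all(v in adj[u] for v in community):
                community.append(u)
                break
        else:
            communities.append([u])
    return communities

# Hypothetical toy distance: users placed on a line.
pos = {"O1": 0.0, "O2": 0.1, "O3": 0.2, "O4": 1.0, "O5": 1.1}
adj = build_graph(list(pos), lambda a, b: abs(pos[a] - pos[b]), threshold=0.3)
print(greedy_clique_partition(adj))
```

Every community returned is a clique, so all its members are pairwise within the threshold, which is the property the slide requires.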
Identify Community (3/4) • Select a representative PST for each community • It represents all PSTs in the same community • Advantages • Reduce storage overhead • Speed up query processing • Assign new users to their communities by comparing them with the representative PSTs
Identify Community (4/4)
• Two factors
  1. Size of the representative PST
     • The number of tree nodes, denoted as N(Ti)
  2. Distance between the selected PST and the others in the same community
     • The error sum, denoted as ES: the sum of the distances between the selected PST and the others
• Representative PST
  • Minimize a combination of N(Ti) and ES
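A small Python sketch of the selection step. The exact objective that combines N(Ti) and ES is not recoverable from the slide text, so the combining function is a parameter here, and the default (a plain sum) is purely an illustrative assumption. All names are hypothetical.

```python
def select_representative(community, size, dist, combine=lambda n, es: n + es):
    """Pick the member whose combined score of tree size N(Ti) and
    error sum ES is smallest. `combine` is an assumed placeholder."""
    def error_sum(i):
        # ES: sum of distances from candidate i to every other member.
        return sum(dist(community[i], other)
                   for j, other in enumerate(community) if j != i)

    best = min(range(len(community)),
               key=lambda i: combine(size(community[i]), error_sum(i)))
    return community[best]

# Hypothetical toy community of three PSTs, identified by name.
sizes = {"T1": 5, "T2": 3, "T3": 9}
pair_dist = {frozenset(("T1", "T2")): 0.2,
             frozenset(("T1", "T3")): 0.4,
             frozenset(("T2", "T3")): 0.3}
rep = select_representative(["T1", "T2", "T3"],
                            size=sizes.get,
                            dist=lambda a, b: pair_dist[frozenset((a, b))])
print(rep)
```

Node counts and distance sums live on different scales, so a real objective would likely normalize or weight the two factors; passing a different `combine` function is the place to do that.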
Experiments (1/4) • Simulation model • Use real trajectories from CarWeb to simulate the group mobility of users • Total: 2400 trajectories
Experiments (2/4)
• Compare to the Generalized Sequential Pattern mining algorithm (GSP)
  • Set of sequential patterns, e.g., sp1, sp2, ..., spn
  • Trajectory profile of a user represented as a vector over these patterns
• Distance function between profiles
  • Cosine similarity measurement: similarity(Vi, Vj) = (Vi · Vj) / (|Vi| |Vj|)
  • Example: similarity = (<1,1,0,0> · <0,1,1,1>) / (|<1,1,0,0>| |<0,1,1,1>|) = 1 / (√2 × √3) ≈ 0.408
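The baseline comparison can be sketched in Python as below. The binary encoding (1 if the user has the pattern, 0 otherwise) is inferred from the slide's example vectors, and the function names are hypothetical.

```python
import math

def pattern_vector(user_patterns, all_patterns):
    """Binary profile: 1 where the user has the pattern, else 0."""
    return [1 if p in user_patterns else 0 for p in all_patterns]

def cosine_similarity(v1, v2):
    """Dot product divided by the product of the vector lengths."""
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    if norm1 == 0 or norm2 == 0:
        return 0.0  # convention chosen here for empty profiles
    return dot / (norm1 * norm2)

patterns = ["sp1", "sp2", "sp3", "sp4"]
vi = pattern_vector({"sp1", "sp2"}, patterns)         # <1,1,0,0>
vj = pattern_vector({"sp2", "sp3", "sp4"}, patterns)  # <0,1,1,1>
print(round(cosine_similarity(vi, vj), 3))  # 0.408
```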
Experiments (3/4) • Impact of Trajectory Profiles • GSP profiles are always larger than PST profiles, especially when MinSup is smaller than 0.15 • Figure panels: Storage, Prediction
Experiments (4/4) • Impact of the thresholdδ and MinSup • A smaller thresholdδ yields a larger number of communities • Figure panels: Storage, Prediction
Conclusion
• Explore the problem of mining communities from trajectories
• Preprocess
  • Find frequent regions
  • Replace trajectories by region ids
• Construct User’s Profile
  • Build a probabilistic suffix tree (abbreviated as PST)
• Measure Distance Between Users
  • Formulate distance function
• Identify Community
  • Cluster users by distance function
  • Select representative PSTs