Efficient Trajectory Joins using Symbolic Representations Petko Bakalov, Marios Hadjieleftheriou, Eamonn Keogh, Vassilis J. Tsotras University of California, Riverside
Overview • Motivation • Problem description • Trajectory join evaluation • Symbolic representation and distance measure • Join algorithm • Index structure • Experimental results • Conclusion and future work
Motivation (1) • Spatiotemporal data is generated in many novel applications • Typical queries involve “range” or “nearest neighbor” • Example: [Figure: a trajectory plotted as X against time]
Motivation (2) • In this paper we address a novel query, the Trajectory Join: “identify all pairs of similar trajectories between two datasets”. • Example: “Find the pairs of supply trucks that were never more than 1 mile apart from each other this morning”
Problem description • A moving-object trajectory is defined as a sequence of location/time-instant pairs. • Given: two sets of object trajectories R and S, a distance function D(), a threshold ε, and a time interval δt • Trajectory join result: the set V of pairs <Ri, Sj>, where Ri ∈ R and Sj ∈ S, such that during the time interval δt their distance satisfies Dδt(Ri, Sj) < ε
Problem description (2) • [Figure: trajectories in X-Y space along the time axis T, with the threshold ε and the query interval δt marked]
Trajectory join evaluation • Naïve solution: compare each trajectory in the first dataset with every trajectory in the second, which takes O(n²) comparisons • Proposed solution • Use an appropriate trajectory approximation to reduce the size of the problem • Define a lower-bound distance function for this approximation • Using this distance function, prune as many trajectory-pair similarity evaluations as possible.
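The naïve solution can be sketched as a nested loop over both datasets. This is only an illustration under stated assumptions: trajectories are lists of (x, y) positions sampled at the same time instants, the interval δt is given as a range of sample indices, and the distance is taken to be the largest point-to-point Euclidean distance inside that range. The names `max_distance` and `naive_join` are hypothetical.

```python
import math

def max_distance(r, s, start, end):
    """Largest Euclidean distance between r and s over sample indices [start, end).

    Assumption: both trajectories are sampled at the same time instants.
    """
    return max(math.dist(p, q) for p, q in zip(r[start:end], s[start:end]))

def naive_join(R, S, eps, start, end):
    """Compare every trajectory in R with every trajectory in S (|R| * |S| pairs)."""
    return [(i, j)
            for i, r in enumerate(R)
            for j, s in enumerate(S)
            if max_distance(r, s, start, end) < eps]

# Toy data (made up for illustration): one trajectory in R, two in S.
R = [[(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]]
S = [[(0.0, 0.5), (1.0, 0.5), (2.0, 0.5)],   # always 0.5 away from R[0]
     [(0.0, 5.0), (1.0, 5.0), (2.0, 5.0)]]   # always 5.0 away from R[0]
result = naive_join(R, S, eps=1.0, start=0, end=3)
print(result)  # [(0, 0)]
```

Every pair is evaluated on the raw data, which is the cost the approximation and pruning steps aim to avoid.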
Trajectory approximation • Approximation requirements: • Should support lower-bounding measures for a large number of trajectory distance functions • Should allow varying approximation accuracy according to given space constraints • Should be amenable to efficient indexing that enables fast computation of the lower-bounding distances • Piecewise Aggregate Approximation (PAA) has the aforementioned properties
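PAA • Proposed by Lin et al. in DMKD 2003 • The algorithm takes as input a trajectory of size n and divides it into m equal-sized frames • The trajectory values in each frame are replaced by their average • [Figure: a trajectory plotted as X against Time, with the value axis divided into ranges A to D]
PAA • [Figure: the same trajectory divided into frames, with the frames labeled C, C, B, C, D, B on a value axis divided into ranges A to D] • Hence the above trajectory is represented by the “symbolic string” CCBCDB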
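A minimal sketch of the frame-averaging step, assuming a one-dimensional sequence of values whose length n is a multiple of m. The function name `paa` and the sample values are illustrative only.

```python
def paa(values, m):
    """Split `values` into m equal-sized frames and return the mean of each frame.

    Assumption: len(values) is a multiple of m.
    """
    n = len(values)
    if m <= 0 or n % m != 0:
        raise ValueError("the number of values must be a positive multiple of m")
    size = n // m
    return [sum(values[k * size:(k + 1) * size]) / size for k in range(m)]

frame_means = paa([1, 3, 2, 4, 10, 12], m=3)
print(frame_means)  # [2.0, 3.0, 11.0]
```

A smaller m gives a coarser approximation that takes less space, which is how the accuracy can be traded against space constraints.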
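Turning frame averages into a symbolic string can be sketched as follows. The sketch assumes a uniform grid on the value axis (cells of equal width `cell`, starting at `low`), with one letter per cell; the frame averages used here are made up so that the result matches the string on the slide.

```python
import math

def to_symbols(frame_means, low, cell, alphabet="ABCD"):
    """Map each frame average to the letter of the grid cell that contains it.

    Assumptions: cells have equal width `cell`, the first cell starts at `low`,
    and values outside the grid are clamped to the first or last letter.
    """
    symbols = []
    for value in frame_means:
        index = math.floor((value - low) / cell)
        index = min(max(index, 0), len(alphabet) - 1)
        symbols.append(alphabet[index])
    return "".join(symbols)

string = to_symbols([2.5, 2.1, 1.5, 2.9, 3.2, 1.0], low=0.0, cell=1.0)
print(string)  # CCBCDB
```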
Lower bound distance function • Having defined a symbolic representation for trajectories, we need a distance function that lower-bounds the Euclidean distance between trajectories:
Lower bound distance function • We propose the use of the following distance function, defined over symbolic representations of trajectories • Here, d() is the distance between two alphabet symbols
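Lower bound function • [Figure: two trajectories plotted as X against Time on a grid with value ranges A to F; their symbolic strings are EFDBAABCD and DDDDCAAAC] • Hence, the distance between two trajectories is now computed between their “symbolic strings”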
Problem reduction • Nevertheless: • Hence two trajectories join if, in their “symbolic strings”, each pair of corresponding symbols is no more than a symbol-level threshold apart. • Here the symbol-level threshold is computed from ε and the grid discretization.
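One way to derive a symbol-level threshold from ε and the grid is sketched below. The formulas are assumptions for illustration, not taken from the slides: values are one-dimensional, grid cells have equal width `cell`, consecutive cells map to consecutive letters, and two frame averages that differ by less than ε can land at most ceil(ε / cell) cells apart. The helper names are hypothetical.

```python
import math

def symbol_gap_limit(eps, cell):
    """Largest letter gap two frame averages closer than eps can produce (assumed rule)."""
    return math.ceil(eps / cell)

def symbol_distance(a, b, cell):
    """Lower bound on the distance between two values, given only their grid cells.

    Same or adjacent cells may hold values arbitrarily close together, so the bound is 0.
    """
    gap = abs(ord(a) - ord(b))
    return max(0, gap - 1) * cell

def may_join(string_r, string_s, eps, cell):
    """True if no pair of corresponding symbols rules the pair out."""
    limit = symbol_gap_limit(eps, cell)
    return all(abs(ord(a) - ord(b)) <= limit for a, b in zip(string_r, string_s))

# The two strings from the lower-bound slide; their largest letter gap is 2.
print(may_join("EFDBAABCD", "DDDDCAAAC", eps=1.0, cell=1.0))  # False
print(may_join("EFDBAABCD", "DDDDCAAAC", eps=2.0, cell=1.0))  # True
```

Because the test only ever over-accepts, a pair that passes is a candidate rather than a confirmed result and still has to be checked against the raw trajectories.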
Join algorithm over strings (1) • Each “symbolic string” of length t can be viewed as a point in the t-dimensional space • We can define an order among these strings by choosing an origin O and sorting them according to their distance to O. • For all strings in both sets we compute their distance to origin O and then place them on a 1-dimensional line:
Join algorithm over strings (2) • Two strings join if their corresponding symbols are no further apart than the threshold • To do the join we use a sliding-window algorithm on the sorted list of strings. The window size is set according to the threshold. • Since the distance function is a metric, if the distance between two strings exceeds the threshold, then the actual trajectories are more than ε apart for at least one time instant.
Join algorithm over strings (3) • The algorithm works in two steps • Step 1 • Position the center of the window over the first string of the first dataset • Report as candidate pairs all strings of the second dataset that fall inside the window • Continue with the next string from the first dataset • Step 2: load the actual trajectory data for all candidate pairs and verify the result. This is required because the first step may produce false positives (but never false negatives)
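Step 1 can be sketched as below. Several details are assumptions made for the sketch, not statements about the authors' implementation: the origin O is the all-'A' string, letters are treated as integer coordinates, the distance to O is Euclidean, and the window half-width is `limit * sqrt(t)`, which follows from the triangle inequality when every one of the t symbols may differ by at most `limit`. A binary search over the sorted keys plays the role of the sliding window.

```python
import bisect
import math

def origin_distance(string):
    """Euclidean distance of a string from the all-'A' origin (assumed choice of O)."""
    return math.sqrt(sum((ord(c) - ord("A")) ** 2 for c in string))

def candidate_pairs(R, S, limit):
    """Step 1: report (i, j) index pairs whose symbols differ by at most `limit` everywhere."""
    t = len(R[0])
    radius = limit * math.sqrt(t) + 1e-9   # small tolerance for floating-point rounding
    s_sorted = sorted((origin_distance(s), j) for j, s in enumerate(S))
    keys = [key for key, _ in s_sorted]
    pairs = []
    for i, r in enumerate(R):
        center = origin_distance(r)
        lo = bisect.bisect_left(keys, center - radius)
        hi = bisect.bisect_right(keys, center + radius)
        for _, j in s_sorted[lo:hi]:        # only strings inside the window
            if all(abs(ord(a) - ord(b)) <= limit for a, b in zip(r, S[j])):
                pairs.append((i, j))
    return pairs

R = ["CCBCDB"]
S = ["CCBCDA", "AAAAAA", "DDCDDC"]
pairs = candidate_pairs(R, S, limit=1)
print(pairs)  # [(0, 0), (0, 2)]
```

In this toy input, "AAAAAA" falls outside the window and is never compared symbol by symbol. The reported pairs are still only candidates; step 2 would load the raw trajectories and apply the exact distance test.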
Join algorithm - Example • Position the window over the first string of the first dataset. • No strings from the second dataset fall inside the window, so nothing is reported.
Join algorithm - Example • Position the window over the second string of the first dataset. • A string from the second dataset falls inside the window, so we report the two strings as a candidate pair.
Join algorithm - Example • Position the window over the third string of the first dataset. • No strings from the second dataset fall inside this window, so nothing is reported.
Join algorithm - Example • In step 2, the trajectories of the candidate pair produced in step 1 are retrieved from disk and checked to see whether they indeed satisfy the join criterion.
Indexing scheme (1) • A join query specifies a time interval δt, so only the trajectory substrings that correspond to δt need to be joined. • To do this faster, we use an indexing scheme that quickly identifies trajectory substrings by time frame.
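The mapping from a query interval to a substring can be illustrated with plain slicing. This is not the index structure from the talk, only a sketch of what "substring by time frame" means, assuming every frame covers `frame_len` time units and the first frame starts at time `t0`. Both function names are hypothetical.

```python
def frames_for_interval(t_start, t_end, t0, frame_len):
    """Indices of the first and last frame touched by the interval [t_start, t_end]."""
    first = int((t_start - t0) // frame_len)
    last = int((t_end - t0) // frame_len)
    return first, last

def substring_for_interval(string, t_start, t_end, t0=0.0, frame_len=1.0):
    """Symbols of `string` whose frames overlap the query interval."""
    first, last = frames_for_interval(t_start, t_end, t0, frame_len)
    return string[max(first, 0):last + 1]

# Frames of 10 time units: the interval [20, 45] touches frames 2, 3 and 4.
part = substring_for_interval("CCBCDB", t_start=20, t_end=45, frame_len=10)
print(part)  # BCD
```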
Conclusion and future work • We proposed a technique that uses symbolic trajectory representations to build a very small index structure that can quickly evaluate join queries • As future work, we plan to extend our approach to more general trajectory joins without temporal constraints