341 likes | 508 Views
Index for city trajectories. Mobile Group, IDKE Renmin University of China Caifeng Lai. 城市路网的特点:. 城市路网中的路线一般较为平直,且地面一般较为平坦,现代城市的路网规划也较为合理,因此,我们考虑可以用一种很简单的方法对其降维。 城市路网中的车流量较大,在同一路线上的距离相近的移动对象的运动状态基本相似,移动对象也比较密集,因此我们可以考虑将这类的移动对象封装起来,在其之上建立索引,然后进行相应的操作。. 两个解决的方法. 降维 分为两种情况:
E N D
Index for city trajectories Mobile Group, IDKE Renmin University of China Caifeng Lai
城市路网的特点: • 城市路网中的路线一般较为平直,且地面一般较为平坦,现代城市的路网规划也较为合理,因此,我们考虑可以用一种很简单的方法对其降维。 • 城市路网中的车流量较大,在同一路线上的距离相近的移动对象的运动状态基本相似,移动对象也比较密集,因此我们可以考虑将这类的移动对象封装起来,在其之上建立索引,然后进行相应的操作。
两个解决的方法 • 降维 分为两种情况: • 如果路线的斜率小于1,则将路线上的点映射到x轴上,求出其相对该路线的相对位置 • 反之,则映射到y轴上,求出其相对该路线的相对位置 • 分组移动对象 • 将在同一路线上的、距离相近、移动方向一致、速度接近的移动对象组合成一组,将其称为AU。
Group moving objects Previous approach –TPR-tree ( t interval ) Our approach –AU ( t interval ) Previous approach –TPR-tree ( t+1 interval ) Our approach –AU ( t+1 interval )
AU index Architecture • The AU index scheme consists of four components: • The top R*-tree captures the network connectivity. Each leaf node contains several road segments. • The bottom R*-tree contains current-AU index and past-AU index. • The current-AU is a main memory index which represents the current state of each AU. • The past-AU represents the past state of each AU and was left in a container on disk.
root R* tree current AU leaf past AU T AU index Architecture (cont.)
R-tree的特点 • R-tree是B-Tree对多维对象(点和区域)的扩展 • R-tree是一棵平衡树 • 一个多维对象只能被分到一个子空间中去 • 若用动态插入算法构建R-tree,在树的结点中会引起过多的空间重叠和死区(dead-space),使算法性能降低
R-tree的典型算法 • 查找 • 插入 • 选择叶子结点 • 分裂结点(有多种算法) • 调整树 • 必要时增加树的高度 • 删除 • 找到包含要删除记录的叶子结点 • 删除 • 压缩树 • 必要时减小树的高度 • 更新 • 先删除老的记录索引,在插入新的记录索引
R-tree的典型算法 • 查找 • 插入 • 选择叶子结点 • 分裂结点(有多种算法) • 调整树 • 必要时增加树的高度 • 删除 • 找到包含要删除记录的叶子结点 • 删除 • 压缩树 • 必要时减小树的高度 • 更新 • 先删除老的记录索引,在插入新的记录索引
R*-Tree(N. Beckmann SIGMOD’1990) • R*-Tree通过修改插入、分裂算法,并通过引入强制重插机制对R-Tree的性能进行改进。 • R*-Tree和R-Tree一样允许矩形的重叠, • R*-Tree在选择插入路径时同时考虑矩形的面积、空白区域和重叠的大小,而R-Tree只考虑面积的大小。
R*-tree • Recursively cluster objects into minimum bounding rectangles (MBR). • Organize the MBRs into a dynamic, disk-based, balanced tree structure, similar to the B+-tree.
P7 R4 R6 R5 R6 R3 R4 P8 P6 P1 R3 R1 R2 R1 P5 P2 P3 R5 P1 P2 P3 P4 P5 P6 P7 P8 R2 P4 The R*-tree (cont.)
MAP current-AU • At a certain time, in each road segments, several current-AU exits, the movement of current-AU is performed as function of time. • current-AU is created by relatively strict rules according to the features of traffic system, so they scarcely expend too much and will not cause significant overlaps.
past-AU • Past-AU is designed for indexing vehicle’s historical movements. • An interesting property about the history records is that the tuples are sorted by their end time stamps, so we can store these history records in the end time order. • We save the current-AU as past-AU, to meet the need of history query, we store these history records in the form of data stream.
Current-AU & Past-AU • Current-AU is a finite sequences of tuples. • Past-AU is a possibly infinite sequences of tuples, so it can be represented to be a stream. • Streams come with the natural (partial) order implied by their time attribute.
Indexing Moving Objects • Index on moving objects can be classified into two different types: • one is indexing current and anticipated future positions of moving object • TPR-tree (time-parameterized R-tree) [11] is a balanced, multi-way tree with the structure of an R-tree that can answer prediction queries on moving objects. • the other is indexing histories of the positions of moving objects • mainly focus on summarized data (or aggregate information) • Aggregate R-tree
What is data stream ? • A data stream is a continuous, time-varying, unbounded sequence of data-items. Stream items usually take the form of relational tuples that are disposed after they get processed, implying that online stream algorithms are restricted to only one pass over the data. Data arrival rates are arbitrary and cannot be controlled by the processing system. They are unpredictable and fluctuate vastly over time. Moreover, explicitly storing a stream in its entirety is impossible, due to its unbounded nature.
数据流 • 数据流处理方法比传统的处理方法要快很多.尽管该方法得到的结果并不精确,但是往往并不影响用户的最终决策. • 数据流的处理思路就是设计高效的单遍数据集扫描算法,在一个远小于数据规模的内存空间里不断更新一个代表数据集的结构——概要数据结构,从而实时、高效地获得近似查询结果. • 我将考虑采用一种比较简单的数据流处理方法来获得移动对象的聚集信息。
Type Definition • City trajectory network • Regions: regionid, MBR • roads: rid, ps,pe // start point,end point • lines: lid, lps,lpe // linear start point, linear end point. • Current-AU: auid, ts,pos, length, v, m// start time, end time, the relative position of the polyline,v is a vector, the total moving object of the AU • Past-AU: auid, ts,te, plid, pos, length, v, m • Mo: moid, (x,y), v // v is a vector
current-AU operations • Current-AU supports the following operations: • Create_AU() returns an AU at the entrance of a road segment. • the algorithm selects appropriate moving objects which have the same direction, nearly velocity and distance, to buildup an AU, and take the AU as one moving objects. • Drop_AU() is specific in AU index. • When vehicles goes out the road segment, their direction may change thus can not fit the roles of AU. So we drop it to disk for the AU records all details of the movements in this segment. • Delete_obj() and Insert_obj() is invoked when the position of the vehicle is outside of the AU at a certain time. • It finds the nearby AU in a greedy manner (including the original AU) to check yes or not it fit the rules for create AU. If an AU is fined, an Insertion will be operated, otherwise, a new AU is created for this moving object.
Query • Window query • Range query • NN queries: “find which object became the closest to a given point s during time interval T,” • Aggregate queries: “find how many objects passed through area Q during time interval T,” or, “find the fastest object that will pass through area Q in the next 5 minutes from now” • similarity queries: “find objects that moved similarly to the movement of a given object o over an interval T”
MON-Tree index Trajectory Bundle tree (TB-tree) Strictly preserves trajectories: Each leaf node only contains line segments belonging to the same trajectory. Strength of TB-tree is the efficient retrieval of trajectory for a small number of moving objects. SEB-tree Range query Time slice query Related work
MON-Tree index • The index structure consists of three components: • a 2D R-Tree (the top R-Tree) • indexing polyline bounding boxes • a set of 2D R-Trees (the bottom R-Trees) • indexing objects’ movements along the polylines. • a hash structure in the top level containing entries of the form hpolyid, bottreepti, The hash structure is organized by polyid. • polyidis the polyline identification • bottreept is a pointer to the corresponding bottom R-Tree. • Hence, we have two top level index structures: an R-Tree and a hash structure
SEB-tree (Start/End time stamp B-tree) (MDM2003) • Suppose n is the population of objects. In zone z, after a period of time, N tuples, which include both the current status records and the history records, are generated. There is at most one current status record for each object, therefore, at most n among N records could be the current status record, where n N.The format of each tuple is (id, z, ts, te). Since all records have identical zoneids, we omit field z. Now each tuple has three fields (id, ts, te), where ts < te.The maximum number of records that can be stored in one disk page is B.
An interesting property about the history records is that the tuples are sorted by their end time stamps, To index the history records, the first step is to construct a 2-dimensional space where the start time stamp and the end time stamp are the horizontal and the vertical axes. Each record are mapped to a point in the space. Since in each record ts < te, the points are all in upper-left half of the plane.
Three steps are executed when we insert a new point: 1) check the start stamp list and find the column where the point is inserted; 2) check the end stamp list and find the last node in the column. If the node is full, create a new node in the column; 3) insert the point into the node. The cost for each step is O(logB N), O(1) and O(1), and the overall cost for one insertion is O(logB N).
The query becomes to retrieve all points in the polygon. Observe that if a leaf node is inside the polygon, the points in the node must be in the query results; and if a leaf node is intersected with the polygon, the points in it require further check. The total query cost is O((logB N) + ((n logB n)/B) + ((n + B)/B) + (t/B)), where t is the size of the results. The first part logB N is for searching the columns that contains t1 and t2; the second part (n logB n)/B is for searching one node of each n/B 4 columns before column i; the third part (n+B)/B is for searching column j and the fourth part t/B is for retrieving the nodes that fall into the query range.
TPR-Tree • Entries of TPR-Tree in leaf nodes are pairs of the position of a moving point which is represented by a reference position and a corresponding velocity – (x, v). Reference position is the location at time tref which usually takes the index creating time tl. • A internal node of TPR-Tree is represented with • (1) minimum bounding rectangle (MBR) • (2) a velocity vector
Query future location • When answering query about future location, TPR-tree employs a simple linear function x(t) =x(tref ) + v(t − tref ) to predict object position at future time t. • The MBRs at the future query time will also be computed based on the current extents and velocity vectors. • So the queries at future time can be processed in exactly the same way as in the tradition R-Tree, except that the extents of the MBRs are computed dynamically and then compared with the query window. • TPR-Tree is a successful index method in answering queries on moving objects. But due to its simple prediction method, it is unavoidably inaccuracy on prediction.