STAR-Tree Spatio-Temporal Self Adjusting R-Tree. John Tran Duke University Department of Computer Science Adviser: Pankaj K. Agarwal. Problem. Large Moving Data Sets Many static data structures exist, but not many account for motion, which is realistic. Examples of Use.
STAR-Tree: Spatio-Temporal Self-Adjusting R-Tree • John Tran • Duke University, Department of Computer Science • Adviser: Pankaj K. Agarwal
Problem • Large Moving Data Sets • Many static data structures exist, but few account for motion, even though real-world data moves
Examples of Use • Geographic Information Systems • Air-Traffic Control • Protein Interactions • Traffic Patterns
Defining the data • Data can be represented as points in R^d • For our problem: • Set of data points in R^2: S = {p_1, p_2, …, p_n} • Each point can be parameterized by time: p_i(t) = (x_i(t), y_i(t)) • Velocities are piecewise differentiable • A bounding box can be represented by 2 points
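A minimal sketch of this data model, assuming each point moves with a single constant velocity (the slides allow more general piecewise motion). The names `MovingPoint` and `bounding_box` are hypothetical and only illustrate the idea that a box is stored as two corner points.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MovingPoint:
    """A point p(t) = (x(t), y(t)) under an assumed constant velocity."""
    x0: float
    y0: float
    vx: float
    vy: float

    def position(self, t):
        # Linear motion is an assumption made for this sketch only.
        return (self.x0 + self.vx * t, self.y0 + self.vy * t)


def bounding_box(points, t):
    """Return the box at time t as two corners: (lower-left, upper-right)."""
    xs, ys = zip(*(p.position(t) for p in points))
    return (min(xs), min(ys)), (max(xs), max(ys))
```

For example, `bounding_box([MovingPoint(0, 0, 1, 0), MovingPoint(2, 2, 0, -1)], 1)` gives the two corners of the box enclosing both points at time 1.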
Queries • Query 1 – Report all points of S that lie inside rectangle R at time t
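Query 1 can be sketched as a brute-force scan, which is the baseline an index is meant to beat. This assumes linear motion and a hypothetical tuple layout `(x0, y0, vx, vy)` for each point; `position` and `window_query` are illustrative names, not part of the original work.

```python
def position(p, t):
    """Position at time t of a point stored as (x0, y0, vx, vy)."""
    x0, y0, vx, vy = p
    return (x0 + vx * t, y0 + vy * t)


def window_query(points, rect, t):
    """Report all points inside rect = ((xmin, ymin), (xmax, ymax)) at time t."""
    (xmin, ymin), (xmax, ymax) = rect
    hits = []
    for p in points:
        x, y = position(p, t)
        if xmin <= x <= xmax and ymin <= y <= ymax:
            hits.append(p)
    return hits
```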
Queries • Query 2 – Report all points of S that lie inside rectangle R at any time between t1 and t2
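For Query 2, one possible approach under an assumed constant velocity is to compute, per axis, the time interval during which the coordinate lies inside the rectangle, and then check that the x and y intervals share a moment within [t1, t2]. The function names and the `(x0, y0, vx, vy)` tuple layout are hypothetical.

```python
def axis_interval(c0, v, lo, hi, t1, t2):
    """Times in [t1, t2] with lo <= c0 + v*t <= hi, as (start, end) or None."""
    if v == 0:
        return (t1, t2) if lo <= c0 <= hi else None
    ta, tb = (lo - c0) / v, (hi - c0) / v
    start, end = max(t1, min(ta, tb)), min(t2, max(ta, tb))
    return (start, end) if start <= end else None


def inside_during(p, rect, t1, t2):
    """True if the point is inside rect at some time between t1 and t2."""
    x0, y0, vx, vy = p
    (xmin, ymin), (xmax, ymax) = rect
    ix = axis_interval(x0, vx, xmin, xmax, t1, t2)
    iy = axis_interval(y0, vy, ymin, ymax, t1, t2)
    if ix is None or iy is None:
        return False
    # Both coordinates must be in range at the same moment.
    return max(ix[0], iy[0]) <= min(ix[1], iy[1])
```

Checking the two axes separately is not enough: a point may cross the x-range and the y-range at different times without ever entering the rectangle, which is why the intervals are intersected.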
Queries • Query 3 – Report the nearest neighbor in S of a given query point
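A brute-force sketch of Query 3. The slide does not say at which time the neighbor is evaluated, so the time argument `t` here is an assumption, as are linear motion and the `(x0, y0, vx, vy)` tuple layout.

```python
import math


def nearest_neighbor(points, q, t):
    """Return the point of S closest to query location q at time t."""
    def dist_at_t(p):
        x0, y0, vx, vy = p
        return math.dist((x0 + vx * t, y0 + vy * t), q)

    return min(points, key=dist_at_t)
```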
R-Tree • Bounding Box Hierarchy • All child nodes are bounded by their parent's bounding box • Points are stored in leaf nodes
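The hierarchy can be sketched as a small static structure with a recursive range search. This is an illustration only: `Node`, `intersects`, and `search` are hypothetical names, and insertion and node splitting are omitted.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    box: tuple                                      # ((xmin, ymin), (xmax, ymax))
    children: list = field(default_factory=list)    # internal nodes
    points: list = field(default_factory=list)      # leaf nodes only


def intersects(a, b):
    (ax0, ay0), (ax1, ay1) = a
    (bx0, by0), (bx1, by1) = b
    return ax0 <= bx1 and bx0 <= ax1 and ay0 <= by1 and by0 <= ay1


def search(node, rect):
    """Report stored points inside rect, pruning subtrees whose box misses it."""
    if not intersects(node.box, rect):
        return []
    (xmin, ymin), (xmax, ymax) = rect
    hits = [(x, y) for x, y in node.points
            if xmin <= x <= xmax and ymin <= y <= ymax]
    for child in node.children:
        hits.extend(search(child, rect))
    return hits
```

Because every child box lies inside its parent's box, a query that misses the parent can skip the whole subtree.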
STAR-Tree • Same concept as R-Tree • Incorporate movement into tree structure
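The slide does not specify how movement is stored in the tree. One common way to do it, shown here purely as an assumption, is a time-parameterized box whose lower corner moves with the minimum child velocity and whose upper corner moves with the maximum, so the box stays conservative for all later times. `moving_box` is a hypothetical name.

```python
def moving_box(points, t_ref=0.0):
    """Build a conservative moving box for points stored as (x0, y0, vx, vy).

    Returns box_at(t), valid for t >= t_ref under assumed linear motion.
    """
    pos = [(x0 + vx * t_ref, y0 + vy * t_ref) for x0, y0, vx, vy in points]
    lo = (min(x for x, _ in pos), min(y for _, y in pos))
    hi = (max(x for x, _ in pos), max(y for _, y in pos))
    vlo = (min(p[2] for p in points), min(p[3] for p in points))
    vhi = (max(p[2] for p in points), max(p[3] for p in points))

    def box_at(t):
        dt = t - t_ref
        return ((lo[0] + vlo[0] * dt, lo[1] + vlo[1] * dt),
                (hi[0] + vhi[0] * dt, hi[1] + vhi[1] * dt))

    return box_at
```

Such a box is tight only at `t_ref` and grows over time, which is one reason a self-adjusting structure needs to rebuild or tighten boxes periodically.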
Conflicts • As bounding boxes change, overlap occurs • Need to adjust for these overlap conflicts
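A sketch of how an overlap conflict could be scheduled in advance, assuming each box edge moves linearly. Each box is given per axis as a hypothetical tuple `(lo, hi, v_lo, v_hi)`; the function returns the earliest time t >= 0 at which the two boxes overlap, or `None` if they never do.

```python
import math


def first_overlap_time(box_a, box_b):
    """Earliest t >= 0 at which two linearly moving boxes overlap, else None."""
    start, end = 0.0, math.inf
    for (alo, ahi, avlo, avhi), (blo, bhi, bvlo, bvhi) in zip(box_a, box_b):
        # Overlap on this axis needs a_lo(t) <= b_hi(t) and b_lo(t) <= a_hi(t).
        # Each condition has the form c + v*t <= 0.
        for c, v in ((alo - bhi, avlo - bvhi), (blo - ahi, bvlo - avhi)):
            if v == 0:
                if c > 0:
                    return None          # never satisfied
            elif v < 0:
                start = max(start, -c / v)   # satisfied from this time on
            else:
                end = min(end, -c / v)       # satisfied until this time
    return start if start <= end else None
```

In a kinetic setting, times like this would typically be placed in an event queue so the tree is adjusted only when a conflict actually occurs.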
Road Simplification • Road data from the U.S. Census Bureau (TIGER) • Paths are determined using Dijkstra's shortest-path algorithm • These paths typically have simple shapes but include many vertices • Each path is simplified using the Douglas-Peucker heuristic (5 vertices max)
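A minimal sketch of path simplification with a vertex budget. The slides do not say how the 5-vertex limit is enforced; this version assumes a greedy variant of Douglas-Peucker that keeps the two endpoints and repeatedly adds the vertex farthest from the current simplified chain until the budget is reached. `simplify` and `point_segment_distance` are hypothetical names.

```python
import math


def point_segment_distance(p, a, b):
    """Distance from point p to the segment ab."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def simplify(path, max_vertices=5, tolerance=0.0):
    """Keep at most max_vertices vertices, adding the farthest one each round."""
    if len(path) <= 2:
        return list(path)
    keep = [0, len(path) - 1]            # indices of retained vertices
    while len(keep) < max_vertices:
        best_d, best_i = tolerance, None
        for lo, hi in zip(keep, keep[1:]):
            for i in range(lo + 1, hi):
                d = point_segment_distance(path[i], path[lo], path[hi])
                if d > best_d:
                    best_d, best_i = d, i
        if best_i is None:               # nothing deviates more than tolerance
            break
        keep.append(best_i)
        keep.sort()
    return [path[i] for i in keep]
```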
Road Simplification • Simplify the road network • TIGER data is not perfect • Roads are polygonal chains stored as vertex lists • Roads that should be matched are sometimes not matched
Analysis of RDU Roads • [Chart: Vertices with n streets; axis label: n streets]
Analysis of RDU Roads • [Chart: Streets with n vertices; axis label: n vertices]
Problem • Match two proteins based on similarity or dissimilarity using intramolecular distance comparison
Data • Start from PDB files • Parse to get vertex list
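A sketch of the parsing step, relying on the fixed-column layout of PDB `ATOM` records (atom name in columns 13-16, coordinates in columns 31-54). The slides do not say which atoms become vertices; taking one alpha-carbon (`CA`) per residue is an assumption, and `parse_ca_vertices` is a hypothetical name.

```python
def parse_ca_vertices(pdb_text):
    """Extract (x, y, z) for each alpha-carbon ATOM record in PDB text."""
    vertices = []
    for line in pdb_text.splitlines():
        # Columns are 1-based in the PDB format, so slices are shifted by one.
        if line.startswith("ATOM  ") and line[12:16].strip() == "CA":
            x = float(line[30:38])
            y = float(line[38:46])
            z = float(line[46:54])
            vertices.append((x, y, z))
    return vertices
```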
Calculating Distance Matrix • Given a vertex list
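Given a vertex list, the intramolecular distance matrix can be sketched as all pairwise Euclidean distances. `distance_matrix` is a hypothetical name; the result is symmetric with zeros on the diagonal.

```python
import math


def distance_matrix(vertices):
    """Entry [i][j] is the Euclidean distance between vertices i and j."""
    return [[math.dist(p, q) for q in vertices] for p in vertices]
```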
Defining cost • Example alignment with indel gaps:
-GCTGATACTAGCT
 | ||||  |||||
GGGTGAT-GTAGCT
• Let g(k) = α + β(k-1) be the cost of a gap of length k • α is the cost of starting a new indel gap • β is the cost of continuing a gap
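The affine gap cost can be sketched as follows, with `alpha` and `beta` as assumed names for the gap-opening and gap-continuation costs. `total_gap_cost` is a hypothetical helper that sums g(k) over every maximal run of `-` in either row of an alignment.

```python
def gap_cost(k, alpha, beta):
    """Affine cost of one gap of length k >= 1: alpha + beta * (k - 1)."""
    if k < 1:
        raise ValueError("gap length must be at least 1")
    return alpha + beta * (k - 1)


def total_gap_cost(row_a, row_b, alpha, beta):
    """Sum the gap cost over all runs of '-' in both alignment rows."""
    total = 0
    for row in (row_a, row_b):
        run = 0
        for ch in row + "X":             # sentinel closes a trailing run
            if ch == "-":
                run += 1
            else:
                if run:
                    total += gap_cost(run, alpha, beta)
                run = 0
    return total
```

With this cost, one long gap is cheaper than several short gaps of the same total length whenever `beta` is smaller than `alpha`.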
Cost Function • E(i,j) = min{D(i,j-1) + α, E(i,j-1) + β} • F(i,j) = min{D(i-1,j) + α, F(i-1,j) + β} • D(i,j) = min{D(i-1,j-1) + δ(i,j), E(i,j), F(i,j)} • where α is the cost of starting a gap, β is the cost of continuing one, and δ(i,j) = normalized sum of the differences between the distances from A_i to all the matched vertices and the distances from B_j to the corresponding matched vertices
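The recurrence can be sketched as a three-table dynamic program. Here `alpha`, `beta`, and `delta` are assumed names for the gap-opening cost, the gap-continuation cost, and the match cost. The match cost is passed in as a callable on 0-based indices because the distance-based cost described on the slide depends on earlier matches; the boundary values (a leading gap of length k costs `alpha + beta * (k - 1)`) are also an assumption of this sketch.

```python
import math


def affine_alignment_cost(n, m, delta, alpha, beta):
    """Minimum alignment cost of sequences of length n and m.

    delta(i, j) is the cost of matching element i of A with element j of B.
    """
    INF = math.inf
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    E = [[INF] * (m + 1) for _ in range(n + 1)]   # alignment ends in a gap in A
    F = [[INF] * (m + 1) for _ in range(n + 1)]   # alignment ends in a gap in B
    D[0][0] = 0.0
    for j in range(1, m + 1):
        E[0][j] = D[0][j] = alpha + beta * (j - 1)
    for i in range(1, n + 1):
        F[i][0] = D[i][0] = alpha + beta * (i - 1)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            E[i][j] = min(D[i][j - 1] + alpha, E[i][j - 1] + beta)
            F[i][j] = min(D[i - 1][j] + alpha, F[i - 1][j] + beta)
            D[i][j] = min(D[i - 1][j - 1] + delta(i - 1, j - 1),
                          E[i][j], F[i][j])
    return D[n][m]
```

As a usage example with a simple stand-in match cost, aligning `a = "ACGT"` with `b = "AT"` via `affine_alignment_cost(len(a), len(b), lambda i, j: 0 if a[i] == b[j] else 10, 2, 1)` matches the two shared letters and pays for a single gap of length 2.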