1.28k likes | 1.3k Views
Explore methods like k-d trees, Point Quadtrees, and R-trees to store and query geometric data efficiently on disk. Understand how to answer point, range, k-nn queries, and spatial joins using spatial access methods.
E N D
Spatial and Temporal Data Mining V. Megalooikonomou Spatial Access Methods (SAMs) I (slides are based on notes by C. Faloutsos)
General Overview • Multimedia Indexing • Spatial Access Methods (SAMs) • k-d trees • Point Quadtrees • MX-Quadtree • z-ordering • R-trees
SAMs - Detailed outline • spatial access methods • problem dfn • k-d trees • point quadtrees • MX-quadtrees • z-ordering • R-trees
Spatial Access Methods - problem • Given a collection of geometric objects (points, lines, polygons, ...) • organize them on disk, to answer spatial queries (like??)
Spatial Access Methods - problem • Given a collection of geometric objects (points, lines, polygons, ...) • organize them on disk, to answer • point queries • range queries • k-nn queries • spatial joins (‘all pairs’ queries)
Spatial Access Methods - problem • Given a collection of geometric objects (points, lines, polygons, ...) • organize them on disk, to answer • point queries • range queries • k-nn queries • spatial joins (‘all pairs’ queries)
Spatial Access Methods - problem • Given a collection of geometric objects (points, lines, polygons, ...) • organize them on disk, to answer • point queries • range queries • k-nn queries • spatial joins (‘all pairs’ queries)
Spatial Access Methods - problem • Given a collection of geometric objects (points, lines, polygons, ...) • organize them on disk, to answer • point queries • range queries • k-nn queries • spatial joins (‘all pairs’ queries)
Spatial Access Methods - problem • Given a collection of geometric objects (points, lines, polygons, ...) • organize them on disk, to answer • point queries • range queries • k-nn queries • spatial joins (‘all pairs’ within ε)
SAMs - motivation • Q: applications?
SAMs - motivation traditional DB GIS age salary
SAMs - motivation traditional DB GIS age salary
SAMs - motivation CAD/CAM find elements too close to each other
SAMs - motivation CAD/CAM
SAMs - motivation eg,. std S1 F(S1) 1 365 day F(Sn) Sn eg, avg 1 365 day
SAMs: solutions • K-d trees • point quadtrees • MX-quadtrees • z-ordering • R-trees • (grid files) Q: how would you organize, e.g., n-dim points, on disk? (C points per disk page)
SAMs - Detailed outline • spatial access methods • problem dfn • k-d trees • point quadtrees • MX-quadtrees • z-ordering • R-trees
k-d trees • Used to store k dimensional point data • It is not used to store region data • A 2-d tree (i.e., for k=2) stores 2-dimensional point data while a 3-d tree stores 3-dimensional point data, etc.
2-d trees – node structure • Binary trees • Info: information field • Xval,Yval: coordinates of a point associated with the node • Llink, Rlink: pointers to children • Properties (N: node): • If level N even -> • for all nodes M in the subtree rooted at N.Llink: M.Xval < N.Xval • for all nodes P in the subtree rooted at N.Rlink: P.Xval >= N.Xval • If level N odd -> • Similarly use Yvals
2-d trees: Insertion/Search • To insert a node N into the tree pointed by T • If N and T agree on Xval, Yval then overwrite T • Else, branch left if N.Xval < T.xval, right otherwise (even levels) • Similarly for odd levels (branching on Yvals)
2-d trees – Example of Insertion Splitting of region by Banja Luka Splitting of region by Derventa Splitting of region by Toslic Splitting of region by Sinj
2-d trees: Deletion • Deletion of point (x,y) from T • If N is a leaf node easy • Otherwise either Tl (left subtree) or Tr (right subtree) is non-empty • Find a “candidate replacement” node R in Tl or Tr • Replace all of N’s non-link fields by those of R • Recursively delete R from Ti • Recursion guaranteed to terminate - Why?
2-d trees: Deletion • Finding candidate replacement nodes for deletion • Replacement node R must bear same spatial relation to all nodes in Tl and Tr as node N
2-d trees: Range Queries • Q: Given a point (xc, yc) and a distance r find all points in the 2-d tree that lie within the circle • A: Each node N in a 2-d tree implicitly represents a region RN – If the circle (specified by the query) has no intersection with RN then there is no point in searching the subtree rooted at node N
SAMs - Detailed outline • spatial access methods • problem dfn • k-d trees • point quadtrees • z-ordering • R-trees
Point Quadtrees • Represent point data • Always split regions into 4 parts • 2-d tree: a node N splits a region into two by drawing one line through the point (N.xval, N.yval) • Point quadtree: a node N splits a region by drawing a horizontal and a vertical line through the point (N.xval, N.yval) • Four parts: NW, SW, NE, and SE quadrants • Q: Quadtree nodes have 4 children?
Point Quadtrees • Nodes in point quadtrees represent regions
Point quadtrees - Insertion Splitting of region by Banja Luka Splitting of region by Derventa Splitting of region by Toslic Splitting of region by Tuzla Splitting of region by Sinj
Point quadtrees: Deletion • Deletion of point (x,y) from T • If N is a leaf node easy • Otherwise a subtree (N.NW, N.SW, N.NE. N.SE) is non-empty • Find a “candidate replacement” node R in one of the subtrees such that: • Every other node R1 in N.NW is to the NW of R • Every other node R2 in N.SW is to the SW of R • etc… • Replace all of N’s non-link fields by those of R • Recursively delete R from Ti • In general, it may not always be possible to find such as replacement node • Q: What happens in the worst case?
Point quadtrees: Deletion • Deletion of point (x,y) from T • If N is a leaf node easy • Otherwise a subtree (N.NW, N.SW, N.NE. N.SE) is non-empty • Find a “candidate replacement” node R in one of the subtrees such that: • Every other node R1 in N.NW is to the NW of R • Every other node R2 in N.SW is to the SW of R • etc… • Replace all of N’s non-link fields by those of R • Recursively delete R from Ti • In general, it may not always be possible to find such as replacement node • Q: What happens in the worst case? May require all nodes to be reinserted
Point quadtrees: Range Searches • Each node in a point quadtree represents a region • Do not search regions that do not intersect the circle defined by the query
SAMs - Detailed outline • spatial access methods • problem dfn • k-d trees • point quadtrees • MX-quadtrees • z-ordering • R-trees
MX-Quadtrees • Drawbacks of 2-d trees, point quadtrees: • shape of tree depends upon the order in which objects are inserted into the tree • splits may be uneven depending upon where the point (N.xval, N.yval) is located inside the region (represented by N) • MX-quadtrees: shape (and height) of tree independent of number of nodes and order of insertion
MX-Quadtrees • Assumption: the map is represented as a grid of size (2k x 2k) for some k • When a region gets “split” it splits down the middle
MX-Quadtrees - Insertion After insertion of A, B, C, and D respectively
MX-Quadtrees - Insertion After insertion of A, B, C, and D respectively
MX-Quadtrees - Deletion • Fairly easy – why? • All point are represented at the leaf level • Total time for deletion: O(k)
MX-Quadtrees –Range Queries • Same as in point quadtrees • One difference: • Checking to see if a point is in the circle defined by the range query needs to be performed at the leaf level (points are stored at the leaf level)
SAMs - Detailed outline • spatial access methods • problem dfn • k-d trees • point quadtrees • MX-quadtrees • z-ordering • R-trees
z-ordering Q: how would you organize, e.g., n-dim points, on disk? (C points per disk page) Hint: reduce the problem to 1-d points(!!) Q1: why? A: Q2: how?
z-ordering Q: how would you organize, e.g., n-dim points, on disk? (C points per disk page) Hint: reduce the problem to 1-d points (!!) Q1: why? A: B-trees! Q2: how?
z-ordering Q2: how? A: assume finite granularity; z-ordering = bit-shuffling = N-trees = Morton keys = geo-coding = ...
z-ordering Q2: how? A: assume finite granularity (e.g., 232x232 ; 4x4 here) Q2.1: how to map n-d cells to 1-d cells?
z-ordering Q2.1: how to map n-d cells to 1-d cells?
z-ordering Q2.1: how to map n-d cells to 1-d cells? A: row-wise Q: is it good?
z-ordering Q: is it good? A: great for ‘x’ axis; bad for ‘y’ axis
z-ordering Q: How about the ‘snake’ curve?
z-ordering Q: How about the ‘snake’ curve? A: still problems: 2^32 2^32