Spatial and Temporal Data Mining

Explore methods like k-d trees, Point Quadtrees, and R-trees to store and query geometric data efficiently on disk. Understand how to answer point, range, k-nn queries, and spatial joins using spatial access methods.

blancal

Presentation Transcript


  1. Spatial and Temporal Data Mining V. Megalooikonomou Spatial Access Methods (SAMs) I (slides are based on notes by C. Faloutsos)

  2. General Overview • Multimedia Indexing • Spatial Access Methods (SAMs) • k-d trees • Point Quadtrees • MX-Quadtree • z-ordering • R-trees

  3. SAMs - Detailed outline • spatial access methods • problem dfn • k-d trees • point quadtrees • MX-quadtrees • z-ordering • R-trees

  4. Spatial Access Methods - problem • Given a collection of geometric objects (points, lines, polygons, ...) • organize them on disk, to answer spatial queries (like??)

  5. Spatial Access Methods - problem • Given a collection of geometric objects (points, lines, polygons, ...) • organize them on disk, to answer • point queries • range queries • k-nn queries • spatial joins (‘all pairs’ queries)

  9. Spatial Access Methods - problem • Given a collection of geometric objects (points, lines, polygons, ...) • organize them on disk, to answer • point queries • range queries • k-nn queries • spatial joins (‘all pairs’ within ε)

  10. SAMs - motivation • Q: applications?

  11. SAMs - motivation • traditional DB • GIS [figure: axes 'age' and 'salary']

  13. SAMs - motivation • CAD/CAM: find elements too close to each other

  14. SAMs - motivation CAD/CAM

  15. SAMs - motivation [figure: time sequences S1, ..., Sn over days 1-365, mapped to points F(S1), ..., F(Sn) in a feature space, e.g., with axes 'avg' and 'std']

  16. SAMs: solutions • k-d trees • point quadtrees • MX-quadtrees • z-ordering • R-trees • (grid files) Q: how would you organize, e.g., n-dim points, on disk? (C points per disk page)

  17. SAMs - Detailed outline • spatial access methods • problem dfn • k-d trees • point quadtrees • MX-quadtrees • z-ordering • R-trees

  18. k-d trees • Used to store k-dimensional point data • Not used to store region data • A 2-d tree (i.e., k=2) stores 2-dimensional point data, a 3-d tree stores 3-dimensional point data, etc.

  19. 2-d trees – node structure • Binary trees • Info: information field • Xval, Yval: coordinates of the point associated with the node • Llink, Rlink: pointers to the children • Properties (N: node): • If N is at an even level: • for all nodes M in the subtree rooted at N.Llink: M.Xval < N.Xval • for all nodes P in the subtree rooted at N.Rlink: P.Xval >= N.Xval • If N is at an odd level: • same conditions, using Yval instead of Xval

  20. 2-d trees – Example

  21. 2-d trees: Insertion/Search • To insert a node N into the tree pointed to by T: • If N and T agree on Xval and Yval, overwrite T • Else, on even levels branch left if N.Xval < T.Xval, right otherwise • Similarly on odd levels, branching on Yval • Repeat at the chosen child until an empty link is reached, and place N there (search follows the same path)

  22. 2-d trees – Example of Insertion [figure panels: splitting of region by Banja Luka; by Derventa; by Toslic; by Sinj]

  23. 2-d trees: Deletion • Deletion of point (x,y) from T; let N be the node storing (x,y) • If N is a leaf node, deletion is easy: just remove it • Otherwise either Tl (left subtree) or Tr (right subtree) of N is non-empty • Find a “candidate replacement” node R in Tl or Tr • Replace all of N’s non-link fields by those of R • Recursively delete R from Ti (the subtree, Tl or Tr, that R came from) • Recursion guaranteed to terminate - Why?

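  24. 2-d trees: Deletion • Finding candidate replacement nodes for deletion • Replacement node R must bear the same spatial relation to all nodes in Tl and Tr as node N did • e.g., if N splits on Xval, a node of Tr with the smallest Xval qualifies: every other node of Tr has Xval >= R.Xval, and every node of Tl has Xval < N.Xval <= R.Xval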
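One way to turn the replacement rule into code, sketched in Python. It follows a common textbook variant, which is an assumption rather than necessarily the slides' exact procedure: if Tr is non-empty, R is the node of Tr with the smallest value on N's splitting coordinate; otherwise R is the smallest such node of Tl, and what remains of Tl becomes N's right subtree, so the `<` / `>=` properties of slide 19 still hold. All names are hypothetical.

```python
# Sketch only: the "min of Tr, else min of Tl moved right" rule is an assumed variant.
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    x: float
    y: float
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def coord(node, d):
    return node.x if d == 0 else node.y            # d = 0: x, d = 1: y


def insert(t, x, y, level=0):
    if t is None:
        return Node(x, y)
    d = level % 2
    if (t.x, t.y) != (x, y):
        if (x, y)[d] < coord(t, d):
            t.left = insert(t.left, x, y, level + 1)
        else:
            t.right = insert(t.right, x, y, level + 1)
    return t


def find_min(t, d, level):
    """Node with the smallest coordinate d in the subtree rooted at t."""
    if t is None:
        return None
    if level % 2 == d:                             # t splits on d: min is in t.left, or t
        return find_min(t.left, d, level + 1) or t
    found = [n for n in (t, find_min(t.left, d, level + 1),
                         find_min(t.right, d, level + 1)) if n is not None]
    return min(found, key=lambda n: coord(n, d))


def delete(t, x, y, level=0):
    """Delete point (x, y) from the subtree rooted at t; return the new subtree root."""
    if t is None:
        return None                                # point not present
    d = level % 2
    if (t.x, t.y) != (x, y):                       # keep descending
        if (x, y)[d] < coord(t, d):
            t.left = delete(t.left, x, y, level + 1)
        else:
            t.right = delete(t.right, x, y, level + 1)
        return t
    if t.left is None and t.right is None:
        return None                                # leaf: just drop it
    if t.right is not None:                        # R = min of Tr on coordinate d
        r = find_min(t.right, d, level + 1)
        t.x, t.y = r.x, r.y                        # copy R's non-link fields into N
        t.right = delete(t.right, r.x, r.y, level + 1)
    else:                                          # R = min of Tl; rest of Tl moves right
        r = find_min(t.left, d, level + 1)
        t.x, t.y = r.x, r.y
        t.right, t.left = delete(t.left, r.x, r.y, level + 1), None
    return t


def points(t):
    return [] if t is None else [(t.x, t.y)] + points(t.left) + points(t.right)


pts = [(50, 50), (30, 70), (70, 20), (60, 80), (80, 90), (55, 10)]
root = None
for p in pts:
    root = insert(root, *p)
root = delete(root, 50, 50)                        # delete the (internal) root
print((root.x, root.y))                            # (55, 10)
```

Each recursive call to `delete` starts one level deeper than the node being replaced, which is one way to see why the recursion on slide 23 terminates.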

  25. 2-d trees: Range Queries • Q: Given a point (xc, yc) and a distance r, find all points in the 2-d tree that lie within the circle of radius r centered at (xc, yc) • A: Each node N in a 2-d tree implicitly represents a region RN – If the circle (specified by the query) does not intersect RN, there is no point in searching the subtree rooted at node N
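A Python sketch of the pruning idea, assuming each call carries the rectangle RN of its node as explicit bounds (the root's region is the whole plane) and using the same level-0-splits-on-x convention as before. The helper `circle_meets_rect` is a hypothetical name: it measures the gap between the circle's centre and the nearest point of the rectangle.

```python
# Sketch only: region bounds passed as arguments are an illustrative choice.
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    x: float
    y: float
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(t, x, y, level=0):
    if t is None:
        return Node(x, y)
    if (t.x, t.y) != (x, y):
        goes_left = x < t.x if level % 2 == 0 else y < t.y
        if goes_left:
            t.left = insert(t.left, x, y, level + 1)
        else:
            t.right = insert(t.right, x, y, level + 1)
    return t


def circle_meets_rect(xc, yc, r, xlo, xhi, ylo, yhi):
    """Does the circle of radius r centred at (xc, yc) intersect [xlo, xhi] x [ylo, yhi]?"""
    dx = max(xlo - xc, 0.0, xc - xhi)              # gap between centre and rectangle, per axis
    dy = max(ylo - yc, 0.0, yc - yhi)
    return dx * dx + dy * dy <= r * r


def range_query(t, xc, yc, r, level=0,
                xlo=-math.inf, xhi=math.inf, ylo=-math.inf, yhi=math.inf):
    """Points within distance r of (xc, yc); (xlo, xhi, ylo, yhi) is the region RN of t."""
    if t is None or not circle_meets_rect(xc, yc, r, xlo, xhi, ylo, yhi):
        return []                                  # circle misses RN: prune this subtree
    hits = [(t.x, t.y)] if (t.x - xc) ** 2 + (t.y - yc) ** 2 <= r * r else []
    if level % 2 == 0:                             # t splits its region with a vertical line
        hits += range_query(t.left, xc, yc, r, level + 1, xlo, t.x, ylo, yhi)
        hits += range_query(t.right, xc, yc, r, level + 1, t.x, xhi, ylo, yhi)
    else:                                          # ... or with a horizontal line
        hits += range_query(t.left, xc, yc, r, level + 1, xlo, xhi, ylo, t.y)
        hits += range_query(t.right, xc, yc, r, level + 1, xlo, xhi, t.y, yhi)
    return hits


root = None
for p in [(50, 50), (30, 70), (70, 20), (60, 80), (80, 90), (55, 10)]:
    root = insert(root, *p)
print(sorted(range_query(root, 50, 50, 30)))      # [(30, 70), (50, 50)]
```

Treating each child region as closed (sharing the splitting line with its sibling) may visit a few extra nodes but never misses a point.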

  26. SAMs - Detailed outline • spatial access methods • problem dfn • k-d trees • point quadtrees • MX-quadtrees • z-ordering • R-trees

  27. Point Quadtrees • Represent point data • Always split regions into 4 parts • 2-d tree: a node N splits a region into two by drawing one line through the point (N.xval, N.yval) • Point quadtree: a node N splits a region by drawing a horizontal and a vertical line through the point (N.xval, N.yval) • Four parts: NW, SW, NE, and SE quadrants • Q: Quadtree nodes have 4 children?
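A minimal Python sketch of point-quadtree insertion, where each node has at most four children keyed 'NW', 'SW', 'NE', 'SE'. The tie-breaking convention (a point on a dividing line goes to the north and/or east side) and all names are assumptions for illustration.

```python
# Sketch only: names and the north/east tie rule are assumptions.
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class QNode:
    x: float
    y: float
    info: object = None
    kids: Dict[str, "QNode"] = field(default_factory=dict)   # 'NW', 'SW', 'NE', 'SE'


def quadrant(node: QNode, x, y) -> str:
    """Quadrant of node that (x, y) falls in; ties go north and/or east (assumption)."""
    return ('N' if y >= node.y else 'S') + ('E' if x >= node.x else 'W')


def insert(t: Optional[QNode], x, y, info=None) -> QNode:
    """Insert (x, y) into the point quadtree rooted at t and return the root."""
    if t is None:
        return QNode(x, y, info)
    node = t
    while (node.x, node.y) != (x, y):
        q = quadrant(node, x, y)
        if q not in node.kids:                # empty quadrant: new leaf
            node.kids[q] = QNode(x, y, info)
            return t
        node = node.kids[q]
    node.info = info                          # point already present: overwrite
    return t


root = None
for name, x, y in [("A", 50, 50), ("B", 20, 80), ("C", 70, 30), ("D", 30, 60)]:
    root = insert(root, x, y, name)
print(root.kids["NW"].info, root.kids["SE"].info)   # B C
print(root.kids["NW"].kids["SE"].info)              # D
```

As with 2-d trees, the resulting shape depends on insertion order: inserting D before B would make D the NW child of A.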

  28. Point Quadtrees • Nodes in point quadtrees represent regions

  29. Point quadtrees - Insertion [figure panels: splitting of region by Banja Luka; by Derventa; by Toslic; by Tuzla; by Sinj]

  30. Point Quadtrees - Insertion

  31. Point quadtrees: Deletion • Deletion of point (x,y) from T; let N be the node storing (x,y) • If N is a leaf node, deletion is easy • Otherwise a subtree (N.NW, N.SW, N.NE, N.SE) is non-empty • Find a “candidate replacement” node R in one of the subtrees such that: • Every other node R1 in N.NW is to the NW of R • Every other node R2 in N.SW is to the SW of R • etc. • Replace all of N’s non-link fields by those of R • Recursively delete R from Ti (the subtree that contained R) • In general, it may not always be possible to find such a replacement node • Q: What happens in the worst case?

  32. Point quadtrees: Deletion • Deletion of point (x,y) from T; let N be the node storing (x,y) • If N is a leaf node, deletion is easy • Otherwise a subtree (N.NW, N.SW, N.NE, N.SE) is non-empty • Find a “candidate replacement” node R in one of the subtrees such that: • Every other node R1 in N.NW is to the NW of R • Every other node R2 in N.SW is to the SW of R • etc. • Replace all of N’s non-link fields by those of R • Recursively delete R from Ti (the subtree that contained R) • In general, it may not always be possible to find such a replacement node • Q: What happens in the worst case? A: May require all nodes to be reinserted

  33. Point quadtrees: Range Searches • Each node in a point quadtree represents a region • Do not search regions that do not intersect the circle defined by the query
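A Python sketch of that pruning rule for point quadtrees, assuming each node's region is passed down as explicit bounds and each child's region is one quarter of it, cut by the lines through the node's point. Names and the tie rule are assumptions.

```python
# Sketch only: explicit region bounds and the north/east tie rule are assumptions.
import math
from dataclasses import dataclass, field


@dataclass
class QNode:
    x: float
    y: float
    kids: dict = field(default_factory=dict)       # 'NW', 'SW', 'NE', 'SE' -> QNode


def quadrant(node, x, y):
    return ('N' if y >= node.y else 'S') + ('E' if x >= node.x else 'W')


def insert(t, x, y):
    if t is None:
        return QNode(x, y)
    if (t.x, t.y) != (x, y):
        q = quadrant(t, x, y)
        t.kids[q] = insert(t.kids.get(q), x, y)
    return t


def circle_meets_rect(xc, yc, r, xlo, xhi, ylo, yhi):
    dx = max(xlo - xc, 0.0, xc - xhi)              # gap from centre to rectangle, per axis
    dy = max(ylo - yc, 0.0, yc - yhi)
    return dx * dx + dy * dy <= r * r


def range_query(t, xc, yc, r, xlo=-math.inf, xhi=math.inf, ylo=-math.inf, yhi=math.inf):
    """Points within distance r of (xc, yc); skips regions the circle does not touch."""
    if t is None or not circle_meets_rect(xc, yc, r, xlo, xhi, ylo, yhi):
        return []
    out = [(t.x, t.y)] if (t.x - xc) ** 2 + (t.y - yc) ** 2 <= r * r else []
    sub_regions = {                                # the four quarters cut by t's point
        'NW': (xlo, t.x, t.y, yhi), 'NE': (t.x, xhi, t.y, yhi),
        'SW': (xlo, t.x, ylo, t.y), 'SE': (t.x, xhi, ylo, t.y),
    }
    for q, kid in t.kids.items():
        out += range_query(kid, xc, yc, r, *sub_regions[q])
    return out


root = None
for x, y in [(50, 50), (20, 80), (70, 30), (30, 60)]:
    root = insert(root, x, y)
print(sorted(range_query(root, 40, 60, 15)))      # [(30, 60), (50, 50)]
```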

  34. SAMs - Detailed outline • spatial access methods • problem dfn • k-d trees • point quadtrees • MX-quadtrees • z-ordering • R-trees

  35. MX-Quadtrees • Drawbacks of 2-d trees, point quadtrees: • shape of tree depends upon the order in which objects are inserted into the tree • splits may be uneven depending upon where the point (N.xval, N.yval) is located inside the region (represented by N) • MX-quadtrees: shape (and height) of tree independent of number of nodes and order of insertion

  36. MX-Quadtrees • Assumption: the map is represented as a grid of size (2^k x 2^k) for some k • When a region gets “split”, it splits down the middle
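A Python sketch of MX-quadtree insertion under the slide's assumption: integer coordinates on a 2^k x 2^k grid, every split exactly down the middle. Points live only at depth k (1 x 1 cells), so the tree's shape depends only on which cells are occupied, not on insertion order. The names and the tie rule (the upper half of a split counts as north/east) are illustrative assumptions.

```python
# Sketch only: names and the north/east convention for the upper halves are assumptions.
from dataclasses import dataclass, field
from typing import Dict, Iterator, Optional


@dataclass
class MXNode:
    kids: Dict[str, "MXNode"] = field(default_factory=dict)   # 'NW', 'SW', 'NE', 'SE'
    info: object = None                                        # used only at leaf level


def descend(x: int, y: int, k: int) -> Iterator[str]:
    """Quadrant labels on the path from the root to cell (x, y): always exactly k of them."""
    x0, y0, size = 0, 0, 2 ** k                # current square: [x0, x0+size) x [y0, y0+size)
    while size > 1:
        half = size // 2
        east, north = x >= x0 + half, y >= y0 + half
        yield ('N' if north else 'S') + ('E' if east else 'W')
        x0, y0, size = x0 + (half if east else 0), y0 + (half if north else 0), half


def insert(root: MXNode, k: int, x: int, y: int, info=None) -> None:
    if not (0 <= x < 2 ** k and 0 <= y < 2 ** k):
        raise ValueError("point outside the 2^k x 2^k grid")
    node = root
    for q in descend(x, y, k):
        node = node.kids.setdefault(q, MXNode())   # create missing internal nodes
    node.info = info                               # leaf for cell (x, y)


def search(root: MXNode, k: int, x: int, y: int) -> Optional[MXNode]:
    node = root
    for q in descend(x, y, k):
        node = node.kids.get(q)
        if node is None:
            return None
    return node


k = 3                                              # an 8 x 8 grid
pts = [("A", 1, 6), ("B", 5, 5), ("C", 6, 1), ("D", 2, 2)]
t1, t2 = MXNode(), MXNode()
for name, x, y in pts:
    insert(t1, k, x, y, name)
for name, x, y in reversed(pts):
    insert(t2, k, x, y, name)
print(list(descend(5, 5, k)))                      # ['NE', 'SW', 'NE']
print(t1 == t2)                                    # True: same tree for either order
```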

  37. MX-Quadtrees - Insertion After insertion of A, B, C, and D respectively

  39. MX-Quadtrees - Deletion • Fairly easy – why? • All points are represented at the leaf level • Total time for deletion: O(k)
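A Python sketch of why deletion is cheap: walk the single root-to-leaf path (k steps), unlink the leaf, then prune ancestors that became empty (at most k more steps), so the total is O(k). Pruning empty ancestors is an assumed design choice here; names are hypothetical.

```python
# Sketch only: pruning empty internal nodes after deletion is an assumed design choice.
from dataclasses import dataclass, field
from typing import Dict, Iterator


@dataclass
class MXNode:
    kids: Dict[str, "MXNode"] = field(default_factory=dict)   # 'NW', 'SW', 'NE', 'SE'


def descend(x: int, y: int, k: int) -> Iterator[str]:
    """Quadrant labels on the root-to-leaf path of cell (x, y) in a 2^k x 2^k grid."""
    x0, y0, size = 0, 0, 2 ** k
    while size > 1:
        half = size // 2
        east, north = x >= x0 + half, y >= y0 + half
        yield ('N' if north else 'S') + ('E' if east else 'W')
        x0, y0, size = x0 + (half if east else 0), y0 + (half if north else 0), half


def insert(root, k, x, y):
    node = root
    for q in descend(x, y, k):
        node = node.kids.setdefault(q, MXNode())


def delete(root, k, x, y) -> bool:
    """Remove the point at cell (x, y); return False if it is absent. O(k) steps."""
    path, node = [], root
    for q in descend(x, y, k):                     # one root-to-leaf walk: k steps
        if q not in node.kids:
            return False
        path.append((node, q))
        node = node.kids[q]
    for parent, q in reversed(path):               # unlink the leaf, then prune empty parents
        del parent.kids[q]
        if parent.kids:
            break
    return True


k = 3
t, ref = MXNode(), MXNode()
insert(t, k, 1, 6)
insert(t, k, 5, 5)
insert(ref, k, 1, 6)
print(delete(t, k, 5, 5), t == ref)               # True True
print(delete(t, k, 5, 5))                         # False: already gone
```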

  40. MX-Quadtrees – Range Queries • Same as in point quadtrees • One difference: • Checking whether a point is in the circle defined by the range query needs to be performed at the leaf level (points are stored at the leaf level)
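A Python sketch of the MX-quadtree range query: the same region pruning as for point quadtrees, but the point-in-circle test happens only when a 1 x 1 leaf cell is reached. It assumes integer cells on a 2^k x 2^k grid with k >= 1; names are hypothetical.

```python
# Sketch only: assumes k >= 1 and integer cell coordinates; names are illustrative.
from dataclasses import dataclass, field
from typing import Dict, Iterator


@dataclass
class MXNode:
    kids: Dict[str, "MXNode"] = field(default_factory=dict)   # 'NW', 'SW', 'NE', 'SE'


def descend(x: int, y: int, k: int) -> Iterator[str]:
    x0, y0, size = 0, 0, 2 ** k
    while size > 1:
        half = size // 2
        east, north = x >= x0 + half, y >= y0 + half
        yield ('N' if north else 'S') + ('E' if east else 'W')
        x0, y0, size = x0 + (half if east else 0), y0 + (half if north else 0), half


def insert(root, k, x, y):
    node = root
    for q in descend(x, y, k):
        node = node.kids.setdefault(q, MXNode())


def range_query(root, k, xc, yc, r):
    """Occupied cells within distance r of (xc, yc); points are tested only at leaves."""
    hits = []

    def visit(node, x0, y0, size):
        # gap from the centre to the cells [x0, x0+size-1] x [y0, y0+size-1]
        dx = max(x0 - xc, 0, xc - (x0 + size - 1))
        dy = max(y0 - yc, 0, yc - (y0 + size - 1))
        if dx * dx + dy * dy > r * r:
            return                                 # circle misses this region: prune
        if size == 1:                              # leaf: dx, dy are the exact offsets
            hits.append((x0, y0))
            return
        half = size // 2
        for q, kid in node.kids.items():
            visit(kid, x0 + (half if q[1] == 'E' else 0),
                  y0 + (half if q[0] == 'N' else 0), half)

    visit(root, 0, 0, 2 ** k)
    return hits


k = 3
t = MXNode()
for x, y in [(1, 6), (5, 5), (6, 1), (2, 2)]:
    insert(t, k, x, y)
print(sorted(range_query(t, k, 3, 4, 3)))         # [(1, 6), (2, 2), (5, 5)]
```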

  41. SAMs - Detailed outline • spatial access methods • problem dfn • k-d trees • point quadtrees • MX-quadtrees • z-ordering • R-trees

  42. z-ordering Q: how would you organize, e.g., n-dim points, on disk? (C points per disk page) Hint: reduce the problem to 1-d points (!!) Q1: why? A: Q2: how?

  43. z-ordering Q: how would you organize, e.g., n-dim points, on disk? (C points per disk page) Hint: reduce the problem to 1-d points (!!) Q1: why? A: B-trees! Q2: how?

  44. z-ordering Q2: how? A: assume finite granularity; z-ordering = bit-shuffling = N-trees = Morton keys = geo-coding = ...
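A Python sketch of bit-shuffling (Morton keys) for 2-d integer cells. Which coordinate supplies the higher bit of each pair is a convention; here x is assumed to come first, which makes each 2 x 2 block be visited in an "N"-shaped order. The function name is hypothetical.

```python
# Sketch only: x-before-y bit order is an assumed convention.
def z_value(x: int, y: int, bits: int) -> int:
    """Morton key: interleave the bits of x and y, most significant first (x bit before y bit)."""
    z = 0
    for i in reversed(range(bits)):
        z = (z << 2) | (((x >> i) & 1) << 1) | ((y >> i) & 1)
    return z


# 4 x 4 grid (bits = 2), printed with the top row (y = 3) first
for y in reversed(range(4)):
    print([z_value(x, y, 2) for x in range(4)])
# [5, 7, 13, 15]
# [4, 6, 12, 14]
# [1, 3, 9, 11]
# [0, 2, 8, 10]
```

Each aligned 2 x 2 block (and each 4 x 4, 8 x 8, ... block at larger granularity) occupies one contiguous run of z-values, which is what lets an ordinary B-tree on the 1-d keys serve spatial queries. With `bits=32` the same function covers a 2^32 x 2^32 grid.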

  45. z-ordering Q2: how? A: assume finite granularity (e.g., 2^32 x 2^32; 4x4 here) Q2.1: how to map n-d cells to 1-d cells?

  46. z-ordering Q2.1: how to map n-d cells to 1-d cells?

  47. z-ordering Q2.1: how to map n-d cells to 1-d cells? A: row-wise Q: is it good?

  48. z-ordering Q: is it good? A: great for ‘x’ axis; bad for ‘y’ axis
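A small Python check of that answer, assuming cells are numbered row by row starting from the bottom row (an illustrative convention). Neighbours along x stay adjacent in 1-d, while neighbours along y end up a whole row apart.

```python
# Sketch only: bottom-row-first numbering is an assumed convention.
def row_wise(x: int, y: int, side: int) -> int:
    """Row-major order: row y = 0 left to right, then row y = 1, and so on."""
    return y * side + x


for side in (4, 2 ** 32):
    x_gap = row_wise(2, 1, side) - row_wise(1, 1, side)   # neighbours along x
    y_gap = row_wise(1, 2, side) - row_wise(1, 1, side)   # neighbours along y
    print(side, x_gap, y_gap)
# 4 1 4
# 4294967296 1 4294967296
```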

  49. z-ordering Q: How about the ‘snake’ curve?

  50. z-ordering Q: How about the ‘snake’ curve? A: still problems [figure: snake curve on a 2^32 x 2^32 grid]
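A Python sketch of the snake (boustrophedon) order, assuming it runs along rows starting at the bottom-left cell. It links the ends of consecutive rows, but most vertically adjacent cells are still far apart in 1-d.

```python
# Sketch only: row-based snake starting at the bottom-left cell is an assumption.
def snake(x: int, y: int, side: int) -> int:
    """Boustrophedon order: even rows left to right, odd rows right to left."""
    return y * side + (x if y % 2 == 0 else side - 1 - x)


side = 2 ** 32
print(snake(side - 1, 1, side) - snake(side - 1, 0, side))   # 1: the row ends are linked
print(snake(0, 1, side) - snake(0, 0, side))                 # 8589934591 = 2 * 2^32 - 1
```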
