CPSC 695 Week 6 Query Processing in Databases Dr. M. Gavrilova
Overview • Introduction • I/O algorithms for large databases • Complex geometric operations in graphical querying • Applications
Introduction • Geometric algorithms studied before dealt with RAM • In databases, a problem of accessing “pages” of memory stored on disk is encountered. • We will see how traditional algorithm design techniques can be useful.
Example • 4 pages of memory, 10 items in each • To list all items sequentially, 4 disk accesses are required • To list items in random order, up to 40 disk accesses are required if only 1 page can be held in memory at a time: too expensive!
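The arithmetic in this example can be checked with a small simulation. This is only a sketch: the function name `count_disk_accesses` and the single-page buffer model are assumptions made for illustration, not part of the lecture.

```python
def count_disk_accesses(access_order, items_per_page=10):
    """Count page loads when only one page fits in memory at a time."""
    loaded_page = None
    loads = 0
    for item in access_order:
        page = item // items_per_page
        if page != loaded_page:  # requested item is on a page not in memory
            loaded_page = page
            loads += 1
    return loads

# 40 items spread over 4 pages of 10 items each.
sequential = list(range(40))
# Worst case: every request falls on a different page than the previous one.
worst_case = [page * 10 + slot for slot in range(10) for page in range(4)]

print(count_disk_accesses(sequential))  # 4
print(count_disk_accesses(worst_case))  # 40
```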
PART 1 Techniques for large data sets • External sorting • Distributed sweeping • Two-step processing
“External” sorting problem • n pages in dataset on disk • m pages of memory, m < n
“Divide and conquer” strategy • Step 1. Sorted “runs” of size m are created in memory using an internal sorting algorithm, then written to disk. • Step 2. Load the first records of each run into memory and merge them in sorted order. Once a block is merged, write it back to disk. • Complexity: O(n log_m n)
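The two steps can be sketched in a few lines. Here plain Python lists stand in for disk pages, and `external_sort` and `memory_size` are hypothetical names. The sketch merges all runs in one pass, which assumes the number of runs does not exceed what memory can buffer; with more runs, the merge is repeated in several passes, which is where the log_m n factor comes from.

```python
import heapq

def external_sort(records, memory_size):
    """Sort `records` using sorted runs of at most `memory_size` records."""
    # Step 1: build sorted runs that each fit in "memory".
    runs = []
    for start in range(0, len(records), memory_size):
        run = sorted(records[start:start + memory_size])  # internal sort
        runs.append(run)  # stands in for writing the run to disk
    # Step 2: k-way merge. heapq.merge pulls lazily from the front of
    # each run, so only the head of every run is needed at any moment.
    return list(heapq.merge(*runs))

print(external_sort([5, 3, 8, 1, 9, 2, 7], memory_size=3))
# [1, 2, 3, 5, 7, 8, 9]
```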
Distributed sweeping • Segment intersection problem, orthogonal segments. • In RAM: sweep-line algorithm, O(n log n + k), where n is the number of segments and k is the number of intersections. • In DB: O(n log_m n + k) algorithm, where m is the number of pages in RAM. [Figure labels: range query, segment v, sweep line]
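For reference, the in-RAM sweep for orthogonal segments can be sketched as below. The tuple layouts `(x1, x2, y)` and `(x, y1, y2)` and the function name are assumptions for illustration. A plain sorted list is used as the active structure, so insertions cost O(n) here; a balanced tree would be needed to reach the O(n log n + k) bound.

```python
import bisect

def orthogonal_intersections(horizontals, verticals):
    """Report intersection points of horizontal and vertical segments.

    horizontals: (x1, x2, y) with x1 <= x2
    verticals:   (x, y1, y2) with y1 <= y2
    """
    events = []
    for x1, x2, y in horizontals:
        events.append((x1, 0, y, None))  # 0: segment enters the active list
        events.append((x2, 2, y, None))  # 2: segment leaves the active list
    for x, y1, y2 in verticals:
        events.append((x, 1, y1, y2))    # 1: range query on the active list
    # At equal x: insert, then query, then delete, so touching counts.
    events.sort(key=lambda e: (e[0], e[1]))

    active = []  # y-coordinates of horizontals crossing the sweep line
    result = []
    for x, kind, a, b in events:
        if kind == 0:
            bisect.insort(active, a)
        elif kind == 2:
            active.pop(bisect.bisect_left(active, a))
        else:
            lo = bisect.bisect_left(active, a)
            hi = bisect.bisect_right(active, b)
            result.extend((x, y) for y in active[lo:hi])
    return result

print(orthogonal_intersections([(0, 10, 5), (2, 4, 1)],
                               [(3, 0, 6), (8, 4, 9)]))
# [(3, 1), (3, 5), (8, 5)]
```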
Distributed sweeping • Idea: split space into m horizontal strips, each containing approximately n/m segments. • An active list is created for each strip: L1, L2, …, Lm. • When a vertical segment is met, it is tested for intersection with the segments in the active lists of the strips it overlaps. [Figure: vertical segment v crossing strips 1 to 4]
Distributed sweeping • However, in the worst case, all strips have to be tested for every vertical segment. • In the picture, segment v intersects strip 4, while no intersections are reported. • Solution: split each vertical segment into 3 parts: • One (middle) part completely spans some number of strips • The other two (end) parts each partially cover a strip. [Figure: segment v across strips 1 to 4, with an end part in strip 4, a middle part in strips 3 and 2, and an end part in strip 1]
Distributed sweeping • Test for intersection between the vertical segment and all segments in the “middle” strips • Then recursively do the same for the two “end” strips. • Recursion terminates when all processing can be carried out in RAM • O(n log_m n + k)
Rectangle intersection • The same idea carries over to the case of rectangle intersection • The Θ(n log_m n + k) bound is met again
Two-step processing: Spatial Join • Spatial predicates: • Overlaps • Contains • Adjacent • etc. • 2 steps: • Filter step • Refinement step
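A minimal sketch of the filter and refinement steps for an “overlaps” join follows. To keep the exact test short it assumes the objects are circles given as `(cx, cy, r)`; the helper names are hypothetical, and the nested loop over all pairs is only for illustration (a real system would drive the filter step from an index).

```python
from itertools import product

def mbr(circle):
    """Minimum bounding rectangle (xmin, ymin, xmax, ymax) of a circle."""
    cx, cy, r = circle
    return (cx - r, cy - r, cx + r, cy + r)

def mbrs_overlap(a, b):
    """Cheap test used by the filter step."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def circles_overlap(c1, c2):
    """Exact test used by the refinement step."""
    dx, dy = c1[0] - c2[0], c1[1] - c2[1]
    return dx * dx + dy * dy <= (c1[2] + c2[2]) ** 2

def spatial_join(set_a, set_b):
    # Filter step: keep pairs whose bounding rectangles overlap.
    candidates = [(a, b) for a, b in product(set_a, set_b)
                  if mbrs_overlap(mbr(a), mbr(b))]
    # Refinement step: run the exact geometric test on candidates only.
    matches = [(a, b) for a, b in candidates if circles_overlap(a, b)]
    return candidates, matches

candidates, matches = spatial_join(
    [(0, 0, 1)],
    [(1.8, 1.8, 1), (1, 0, 1), (5, 5, 1)],
)
print(len(candidates), len(matches))  # 2 1
```

The pair with the circle at (1.8, 1.8) passes the filter but is rejected by the refinement step, which is exactly the kind of false hit the second step exists to remove.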
Additional Database Specifics • In databases: challenges with I/O (file access) are resolved using techniques discussed above. • Specific methods exist for: • Grid files (linear structures) • R-trees • Unindexed collections of objects
PART 2 Computer Graphics Applications • DB operations: windowing and clipping • Windowing(g,r) is a Boolean operator that tests whether object g intersects rectangle r. • Clipping(g,r) computes the part of g inside r. [Figure: object g and rectangle r]
Computer Graphic Primitives • Windowing: • scan edges of g • test for intersection with r • checking vertices is not enough • O(n) • Clipping: • consider each edge of r as a half-plane • clip g against each of those • combine results • O(n)
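The clipping steps above match the classic Sutherland-Hodgman scheme, which is what the following sketch uses; the lecture does not name a specific algorithm, so treat that choice as an assumption. The rectangle layout `(xmin, ymin, xmax, ymax)` and the function name are also illustrative. The output is a single vertex list, which is reliable for convex g; for concave g it may contain degenerate edges along the rectangle boundary.

```python
def clip_polygon(polygon, rect):
    """Clip a polygon (list of (x, y) vertices) against an axis-aligned rectangle."""
    xmin, ymin, xmax, ymax = rect

    def clip_against(points, inside, intersect):
        out = []
        for i, cur in enumerate(points):
            prev = points[i - 1]  # wraps around to the last vertex for i == 0
            if inside(cur):
                if not inside(prev):
                    out.append(intersect(prev, cur))  # entering the half-plane
                out.append(cur)
            elif inside(prev):
                out.append(intersect(prev, cur))      # leaving the half-plane
        return out

    def cross_x(x):
        # Only called for edges that cross the line, so q[0] != p[0].
        return lambda p, q: (x, p[1] + (x - p[0]) / (q[0] - p[0]) * (q[1] - p[1]))

    def cross_y(y):
        return lambda p, q: (p[0] + (y - p[1]) / (q[1] - p[1]) * (q[0] - p[0]), y)

    # Each rectangle edge is treated as a half-plane.
    half_planes = [
        (lambda p: p[0] >= xmin, cross_x(xmin)),
        (lambda p: p[0] <= xmax, cross_x(xmax)),
        (lambda p: p[1] >= ymin, cross_y(ymin)),
        (lambda p: p[1] <= ymax, cross_y(ymax)),
    ]
    result = list(polygon)
    for inside, intersect in half_planes:
        if not result:
            break  # nothing left inside the rectangle
        result = clip_against(result, inside, intersect)
    return result

square = [(1, 1), (3, 1), (3, 3), (1, 3)]
clipped = clip_polygon(square, (0, 0, 2, 2))
print(clipped)
```

Each pass is linear in the number of vertices, and there are four passes, which agrees with the O(n) bound on the slide.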
Computer Graphic Primitives • Polygon partitioning (for large data sets) • Polygon triangulation • Intersections (polyline, polygon)
Polygon partitioning • Sort vertices of polygon P according to the X coordinate • Use sweep-line technique: vertical line L, for each vertex v compute the maximum vertical segment of L, internal to P and containing v. This is done by examining nearest edges above/below v. • The visibility segments define trapezoids. • Complexity O(n lg n) • Note: complex polygons can be triangulated, if trapezoids are further triangulated.
Polygon partitioning • The visibility segments define trapezoids (geometric object with 2 parallel edges)
Triangulation of a simple polygon • Triangulation involves finding diagonals within the polygon, i.e. segments v_i v_j between vertices of P that lie inside P. • v_i and v_j are then said to be visible to each other. • Each triangulation of a polygon with n vertices has (n-3) diagonals and (n-2) triangles
Triangulation of a monotone polygon • Idea: monotone polygons can be triangulated in linear time. A simple polygon can be partitioned into monotone polygons. [Figure: a monotone polygon and a simple polygon]
Triangulation of a monotone polygon • Idea: sweep-line, sort all vertices of P • If the angle formed by the 3 most recently processed points is convex, create a triangle and remove the point from list L. • If the angle is reflex, add the next point. • Partitioning a polygon into monotone polygons is done similarly to trapezoidation: sweep-line by edges, find trapezoids; they represent monotone chains. O(n lg n)
Convex partitioning • Convex partitioning – partitioning into convex components, can minimize the number of components, done in O(n).
Geometric Relationships • Computing intersections: • Point in a polygon • Polyline intersection • Polygon intersection (general and convex)
Point in a polygon (simple) • Draw a ray (half-line) from p • Count the number of intersections with the boundary • If the count is odd, p is inside; if even, p is outside • O(n) algorithm [Figure: points p and q]
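The crossing-count test can be sketched as follows, casting a horizontal ray to the right of p. The function name is illustrative, and points lying exactly on the boundary are not treated specially in this sketch.

```python
def point_in_polygon(p, polygon):
    """Even-odd test: count boundary crossings of a rightward ray from p."""
    x, y = p
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # The edge crosses the ray's supporting line only if its endpoints
        # lie on opposite sides of y; this also guarantees y1 != y2.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > x:          # crossing is to the right of p
                inside = not inside  # each crossing flips the parity
    return inside

square = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(point_in_polygon((2, 2), square))  # True
print(point_in_polygon((5, 2), square))  # False
```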
Polyline intersection • Given a set of line segments, detect if any 2 segments intersect. • O(n^2) – straightforward
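The straightforward quadratic approach tests every pair of segments. A minimal sketch, using the standard orientation test for a single pair (function names are illustrative; segments are pairs of `(x, y)` points and touching counts as intersecting):

```python
from itertools import combinations

def orient(a, b, c):
    """Sign of the cross product (b - a) x (c - a): 1 left, -1 right, 0 collinear."""
    v = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (v > 0) - (v < 0)

def in_box(a, b, c):
    """True if c lies in the bounding box of segment ab (used for collinear cases)."""
    return (min(a[0], b[0]) <= c[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= c[1] <= max(a[1], b[1]))

def segments_intersect(s, t):
    p1, p2 = s
    p3, p4 = t
    d1, d2 = orient(p3, p4, p1), orient(p3, p4, p2)
    d3, d4 = orient(p1, p2, p3), orient(p1, p2, p4)
    if d1 * d2 < 0 and d3 * d4 < 0:
        return True  # proper crossing: each segment straddles the other
    # Degenerate cases: an endpoint lies on the other segment.
    return ((d1 == 0 and in_box(p3, p4, p1)) or (d2 == 0 and in_box(p3, p4, p2))
            or (d3 == 0 and in_box(p1, p2, p3)) or (d4 == 0 and in_box(p1, p2, p4)))

def any_intersection(segments):
    """O(n^2): test all pairs."""
    return any(segments_intersect(s, t) for s, t in combinations(segments, 2))

crossing = [((0, 0), (2, 2)), ((0, 2), (2, 0)), ((5, 5), (6, 6))]
disjoint = [((0, 0), (1, 1)), ((2, 2), (3, 3)), ((0, 5), (5, 5))]
print(any_intersection(crossing))  # True
print(any_intersection(disjoint))  # False
```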
Polyline intersection • Plane-sweep O(n lg n), with the segments currently crossed by the sweep line kept in an ordered list L: • The sweep line meets the leftmost point of a segment S: S is inserted into L, and the two neighboring segments below and above S are tested for intersection with S. • The sweep line meets the rightmost point of S: S is deleted from L, and the segments above and below S are tested for intersection with each other.
Polygon intersection • Two simple polygons P and Q. • Possible cases: • One edge of P intersects one edge of Q (use segment intersection test) • P is inside Q (point inside polygon) • Q is inside P (point inside polygon) • Otherwise, P and Q don’t intersect. • O(n lg n)
Convex polygon intersection • Convexity allows a faster O(n) algorithm. • Idea: synchronized scan of the edges of P and Q, so that all intersection points are eventually found and the “inner” intersection boundary is known at each step. • Scanned edges are advanced if they “point” at each other.
Summary • Dealing with large data sets requires additional resources • Methods such as the following can be useful: • External sorting • Distributed sweeping • Two-step processing • Other applications (spatial map querying) require computer graphics primitives • Various intersection operations exist