Learn advanced I/O algorithms and geometric operations in graphical querying with practical examples. Explore external sorting, distributed sweeping, spatial join, and more for efficient query processing in databases. Discover specific methods for resolving challenges in computer graphics applications.
CPSC 461 Query Processing in Databases Dr. M. Gavrilova
Overview • Introduction • I/O algorithms for large databases • Complex geometric operations in graphical querying • Applications
Introduction • Geometric algorithms studied before dealt with RAM • In databases, a problem of accessing “pages” of memory stored on disk is encountered. • We will see how traditional algorithm design techniques can be useful.
Example • 4 pages of memory, 10 items in each • Listing all items sequentially requires 4 disk accesses • Listing the items in random order requires up to 40 disk accesses if only 1 page is held in memory at a time: too expensive!
PART 1 Techniques for large data sets • External sorting • Distributed sweeping • Two-step processing
“External” sorting problem • n pages in dataset on disk • m pages of memory, m < n
“Divide and conquer” strategy • Step 1. Sorted “runs” of size m are created in memory using an internal sorting algorithm, then written to disk. • Step 2. Load the first records of each run into memory and merge them in sorted order. Once a block is merged, write it back to disk. • Complexity: O(n log_m n)
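The two steps above can be sketched in Python as follows. This is a minimal illustration, not the course's implementation: the function name `external_sort`, the parameter `run_size`, and the one-integer-per-line temporary file format are assumptions made for the example, and it performs a single merge pass over all runs rather than the repeated m-way merge passes that give the O(n log_m n) bound.

```python
import heapq
import os
import tempfile


def external_sort(items, run_size):
    """Sort integers using sorted runs on disk plus one k-way merge pass.

    `run_size` (hypothetical parameter) stands in for the memory limit m.
    """
    run_paths = []

    def write_run(run):
        # Step 1: sort one run in memory and write it to its own file.
        run.sort()
        fd, path = tempfile.mkstemp(text=True)
        with os.fdopen(fd, "w") as f:
            f.writelines(f"{x}\n" for x in run)
        run_paths.append(path)

    run = []
    for x in items:
        run.append(x)
        if len(run) == run_size:
            write_run(run)
            run = []
    if run:
        write_run(run)

    # Step 2: stream every run back and merge the streams in sorted order.
    files = [open(p) for p in run_paths]
    try:
        streams = [(int(line) for line in f) for f in files]
        return list(heapq.merge(*streams))
    finally:
        for f in files:
            f.close()
        for p in run_paths:
            os.remove(p)
```

For example, `external_sort([5, 3, 9, 1, 7, 2, 8], 3)` builds three runs and merges them. `heapq.merge` reads lazily from each run, so only the head of each run needs to be in memory at a time.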
Distributed sweeping • Segment intersection problem, orthogonal segments. • In RAM: sweep-line algorithm, O(n log n + k), where n is the number of segments and k is the number of intersections. • In DB: O(n log_m n + k) algorithm, where m is the number of pages in RAM. [Figure: sweep line meeting a vertical segment v, handled as a range query]
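The in-RAM sweep for orthogonal segments can be sketched as below. This is an illustrative sketch under assumptions: horizontal segments are given as `(x1, x2, y)` with `x1 <= x2`, vertical segments as `(x, y1, y2)` with `y1 <= y2`, and the active list is a plain sorted Python list, so insertions and deletions are linear rather than logarithmic. A balanced search tree would be needed to actually reach O(n log n + k).

```python
import bisect


def orthogonal_intersections(horizontals, verticals):
    """Report intersection points between horizontal and vertical segments."""
    INSERT, QUERY, REMOVE = 0, 1, 2
    events = []
    for x1, x2, y in horizontals:
        events.append((x1, INSERT, y, None))
        events.append((x2, REMOVE, y, None))
    for x, y1, y2 in verticals:
        events.append((x, QUERY, y1, y2))
    # At equal x: insert, then query, then remove, so touching ends count.
    events.sort(key=lambda e: (e[0], e[1]))

    active = []  # sorted y-coordinates of horizontals crossing the sweep line
    hits = []
    for x, kind, a, b in events:
        if kind == INSERT:
            bisect.insort(active, a)
        elif kind == REMOVE:
            active.pop(bisect.bisect_left(active, a))
        else:
            # Range query: every active y in [a, b] intersects this vertical.
            lo = bisect.bisect_left(active, a)
            hi = bisect.bisect_right(active, b)
            hits.extend((x, y) for y in active[lo:hi])
    return hits
```

Each vertical segment turns into a one-dimensional range query on the active list, which is the step that distributed sweeping later splits across strips.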
Distributed sweeping • Idea: split the space into m horizontal strips, each containing approximately n/m segments. • An active list is created for each strip: L1, L2, …, Lm. • When a vertical segment is met, it is tested for intersection with the segments in the active lists of the strips that it overlaps. [Figure: vertical segment v crossing strips 1 to 4]
Distributed sweeping • However, in the worst case, all strips have to be tested for every vertical segment. • In the picture, segment v intersects strip 4, while no intersections are reported. • Solution: split each vertical segment into 3 parts: • One lies completely within some number of strips • The other two each partially cover a strip. [Figure: segment v split into an end part in strip 4, a middle part spanning strips 3 and 2, and an end part in strip 1]
Distributed sweeping • Test for intersections between the vertical segment and all segments in the “middle” strips • Then recurse on the two “end” strips. • Recursion terminates when all processing can be carried out in RAM • O(n log_m n + k)
Rectangle intersection • The same idea carries over to the case of rectangle intersection • The Θ(n log_m n + k) bound is met again
Two-step processing: Spatial Join • Spatial predicates: • Overlaps • Contains • Adjacent • etc. • 2 steps: • Filter step • Refinement step
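The filter and refinement steps can be illustrated with a small nested-loop join. Everything specific here is an assumption for the example: the objects are circles `(cx, cy, r)`, the predicate is "overlaps", and the names `mbr`, `mbr_overlap`, `circles_overlap` and `spatial_join` are hypothetical. The filter step compares cheap minimum bounding rectangles; the refinement step runs the exact geometric test only on the surviving candidate pairs.

```python
def mbr(circle):
    """Minimum bounding rectangle (xmin, ymin, xmax, ymax) of a circle."""
    cx, cy, r = circle
    return (cx - r, cy - r, cx + r, cy + r)


def mbr_overlap(a, b):
    """Cheap filter test: do two axis-aligned rectangles overlap?"""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def circles_overlap(c1, c2):
    """Exact refinement test for the example geometry."""
    dx, dy = c1[0] - c2[0], c1[1] - c2[1]
    return dx * dx + dy * dy <= (c1[2] + c2[2]) ** 2


def spatial_join(R, S):
    """Return (candidate pairs after filtering, pairs after refinement)."""
    candidates = [
        (i, j)
        for i, a in enumerate(R)
        for j, b in enumerate(S)
        if mbr_overlap(mbr(a), mbr(b))
    ]
    results = [(i, j) for i, j in candidates if circles_overlap(R[i], S[j])]
    return candidates, results
```

With `R = [(0, 0, 1)]` and `S = [(1.8, 1.8, 1), (5, 5, 1), (1, 0, 1)]`, the first pair passes the filter because the rectangles overlap, but is rejected during refinement because the circles themselves do not.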
Additional Database Specifics • In databases: challenges with I/O (file access) are resolved using techniques discussed above. • Specific methods exist for: • Grid files (linear structures) • R-trees • Unindexed collections of objects
PART 2 Computer Graphics Applications • DB operations: windowing and clipping • Windowing(g,r) is a Boolean operator that tests whether object g intersects rectangle r. • Clipping(g,r) computes the part of g inside r. [Figure: object g and rectangle r]
Computer Graphics Primitives • Windowing: • scan edges of g • test for intersection with r • checking vertices is not enough • O(n) • Clipping: • consider each edge of r as a half-plane • clip g against each of those • combine results • O(n)
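Clipping against each edge of r in turn matches the classical Sutherland-Hodgman scheme, sketched below for an axis-aligned rectangle. This is an illustration under assumptions: the polygon is a list of `(x, y)` vertices in order, the function names are hypothetical, and for concave polygons this scheme can produce degenerate overlapping edges in the output.

```python
def clip_polygon(poly, xmin, ymin, xmax, ymax):
    """Clip a polygon against a rectangle, one half-plane at a time."""

    def clip(points, inside, intersect):
        out = []
        for i, cur in enumerate(points):
            prev = points[i - 1]  # wraps around to the last vertex for i == 0
            if inside(cur):
                if not inside(prev):
                    out.append(intersect(prev, cur))  # entering the half-plane
                out.append(cur)
            elif inside(prev):
                out.append(intersect(prev, cur))  # leaving the half-plane
        return out

    def at_x(x):
        def f(p, q):
            t = (x - p[0]) / (q[0] - p[0])
            return (x, p[1] + t * (q[1] - p[1]))
        return f

    def at_y(y):
        def f(p, q):
            t = (y - p[1]) / (q[1] - p[1])
            return (p[0] + t * (q[0] - p[0]), y)
        return f

    half_planes = [
        (lambda p: p[0] >= xmin, at_x(xmin)),
        (lambda p: p[0] <= xmax, at_x(xmax)),
        (lambda p: p[1] >= ymin, at_y(ymin)),
        (lambda p: p[1] <= ymax, at_y(ymax)),
    ]
    result = list(poly)
    for inside, intersect in half_planes:
        if not result:
            break
        result = clip(result, inside, intersect)
    return result


def polygon_area(poly):
    """Unsigned area by the shoelace formula (helper for checking results)."""
    s = sum(
        poly[i - 1][0] * poly[i][1] - poly[i][0] * poly[i - 1][1]
        for i in range(len(poly))
    )
    return abs(s) / 2
```

Clipping the triangle `[(0, 0), (4, 0), (0, 4)]` to the rectangle with corners (0, 0) and (3, 3) cuts off two corners and leaves a pentagon. An empty result means no part of g lies inside r.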
Computer Graphics Primitives • Polygon partitioning (for large data sets) • Polygon triangulation • Intersections (polyline, polygon)
Polygon partitioning • Sort vertices of polygon P according to the X coordinate • Use sweep-line technique: vertical line L, for each vertex v compute the maximum vertical segment of L, internal to P and containing v. This is done by examining nearest edges above/below v. • The visibility segments define trapezoids. • Complexity O(n lg n) • Note: complex polygons can be triangulated, if trapezoids are further triangulated.
Polygon partitioning • The visibility segments define trapezoids (geometric object with 2 parallel edges)
Triangulation of a simple polygon • Triangulation involves finding diagonals within the polygon, i.e. segments v_i v_j between vertices of P. • v_i and v_j are said to be visible to each other. • Each triangulation of a polygon with n vertices has (n-3) diagonals and (n-2) triangles
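The (n-2) triangle count can be checked with a small ear-clipping sketch. Ear clipping is a different, simpler technique than the sweep-based methods on the following slides and is swapped in here only for illustration. It assumes a simple polygon given in counter-clockwise order, runs in roughly cubic time as written, and the function names are hypothetical.

```python
def cross(o, a, b):
    """Twice the signed area of triangle o, a, b (positive if counter-clockwise)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def in_triangle(p, a, b, c):
    """True if p is inside or on the boundary of counter-clockwise triangle abc."""
    return cross(a, b, p) >= 0 and cross(b, c, p) >= 0 and cross(c, a, p) >= 0


def ear_clip(poly):
    """Triangulate a simple CCW polygon; returns triples of vertex indices."""
    idx = list(range(len(poly)))
    triangles = []
    while len(idx) > 3:
        for k in range(len(idx)):
            i, j, l = idx[k - 1], idx[k], idx[(k + 1) % len(idx)]
            a, b, c = poly[i], poly[j], poly[l]
            if cross(a, b, c) <= 0:
                continue  # reflex or degenerate corner, not an ear
            if any(in_triangle(poly[m], a, b, c)
                   for m in idx if m not in (i, j, l)):
                continue  # another vertex blocks the diagonal from i to l
            triangles.append((i, j, l))
            idx.pop(k)  # cut the ear off; segment i-l becomes a diagonal
            break
        else:
            raise ValueError("no ear found; expected a simple CCW polygon")
    triangles.append(tuple(idx))
    return triangles
```

For the L-shaped polygon `[(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]` with n = 6, the sketch returns 4 triangles, obtained by cutting along 3 diagonals.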
Triangulation of a monotone polygon • Idea: monotone polygons can be triangulated in linear time. A simple polygon can be partitioned into monotone polygons. [Figure: a monotone polygon and a simple polygon]
Triangulation of a monotone polygon • Idea: sweep-line, sort all vertices of P • If the angle between 3 previously processed points is convex, create a triangle and remove the point from list L. • If the angle is reflex, add the next point. • Partitioning a polygon into monotone polygons is done similarly to trapezoidation: sweep-line by edges, find trapezoids, they represent monotone chains. O(n lg n)
Convex partitioning • Convex partitioning – partitioning into convex components, can minimize the number of components, done in O(n).
Geometric Relationships • Computing intersections: • Point in a polygon • Polyline intersection • Polygon intersection (general and convex)
Point in a polygon (simple) • Draw a half-ray from p • Count the number of intersections with the boundary • If the count is odd, p is inside; if even, p is outside • O(n) algorithm [Figure: points p and q relative to the polygon]
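The parity test can be sketched as follows, casting the half-ray horizontally to the right of p. This is a minimal version under assumptions: the polygon is a list of `(x, y)` vertices in order, the function name is hypothetical, and points lying exactly on the boundary are not handled consistently.

```python
def point_in_polygon(p, poly):
    """Ray-casting parity test: True if p is strictly inside the polygon."""
    x, y = p
    inside = False
    for i in range(len(poly)):
        (x1, y1), (x2, y2) = poly[i - 1], poly[i]
        # Does this edge straddle the horizontal line through p?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > x:
                inside = not inside  # one more crossing to the right of p
    return inside
```

The half-open comparison `(y1 > y) != (y2 > y)` counts a vertex lying on the ray for only one of its two edges, which avoids double counting, and it also guarantees that `y2 - y1` is nonzero in the division.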
Polyline intersection • Given a set of line segments, detect whether any 2 segments intersect. • O(n^2): straightforward
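The straightforward O(n^2) approach tests every pair of segments with an orientation-based predicate. The sketch below assumes segments are given as pairs of `(x, y)` points and treats touching endpoints as intersections; the function names are hypothetical.

```python
from itertools import combinations


def orient(a, b, c):
    """Sign of the turn a -> b -> c: 1 left, -1 right, 0 collinear."""
    v = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (v > 0) - (v < 0)


def on_segment(a, b, c):
    """For c collinear with a and b: does c lie within segment ab?"""
    return (min(a[0], b[0]) <= c[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= c[1] <= max(a[1], b[1]))


def segments_intersect(s, t):
    a, b = s
    c, d = t
    o1, o2 = orient(a, b, c), orient(a, b, d)
    o3, o4 = orient(c, d, a), orient(c, d, b)
    if o1 != o2 and o3 != o4:
        return True  # each segment's endpoints lie on opposite sides of the other
    # Collinear cases: an endpoint lying on the other segment
    return ((o1 == 0 and on_segment(a, b, c))
            or (o2 == 0 and on_segment(a, b, d))
            or (o3 == 0 and on_segment(c, d, a))
            or (o4 == 0 and on_segment(c, d, b)))


def any_intersection(segments):
    """Brute force: test all pairs of segments."""
    return any(segments_intersect(s, t) for s, t in combinations(segments, 2))
```

The same pairwise predicate is what a plane-sweep version would call, but only on segments that become neighbors along the sweep line.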
Polyline intersection • Plane-sweep O(n lg n): • The sweep line meets the leftmost point of S: S is inserted into L, and S is tested for intersection with its neighboring segments above and below. • The sweep line meets the rightmost point of S: S is deleted from L, and the segments that were above and below S are tested for intersection with each other.
Polygon intersection • Two simple polygons P and Q. • Possible cases: • One edge of P intersects one edge of Q (use segment intersection test) • P is inside Q (point inside polygon) • Q is inside P (point inside polygon) • Otherwise, P and Q don’t intersect. • O(n lg n)
Convex polygon intersection • Convexity allows a faster O(n) algorithm. • Idea: synchronized scan of the edges of P and Q, so that all intersection points are eventually found and the “inner” intersection boundary is known at each step. • Scanned edges are advanced if they “point” at each other.
Summary • Dealing with large data sets requires additional resources • The following methods can be useful: • External sorting • Distributed sweeping • Two-step processing • Other applications (spatial map querying) require computer graphics primitives • Various intersection operations exist