1 / 12

Spatial (or N-Dimensional) Search in a Relational World

Spatial (or N-Dimensional) Search in a Relational World. Jim Gray, Microsoft Alex Szalay, Johns Hopkins U. y. x=c/a. x. ax + by = c. y=c/b. y. x. Equations Define Subspaces. For (x,y) above the line ax+by > c Reverse the space by -ax + -by > -c

zariel
Download Presentation

Spatial (or N-Dimensional) Search in a Relational World

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Spatial (or N-Dimensional) Searchin a Relational World Jim Gray, Microsoft Alex Szalay, Johns Hopkins U.

  2. y x=c/a x ax + by = c y=c/b y x Equations Define Subspaces • For (x,y) above the lineax+by > c • Reverse the space by-ax + -by > -c • Intersect 3 half-spaces: a1x + b1y > c1a2x + b2y > c2a3x + b3y > c3

  3. Domain is Union of Convex Hulls Not a convex hull • Simple volumes are unions of convex hulls. • Higher order curves also work • Complex volumes have holes and their holes have holes. (that is harder). +

  4. Now in Relational Terms create table HalfSpace ( domainID int not null -- domain name foreign key references Domain(domainID), convexID int not null, -- grouping a set of ½ spaces halfSpaceID int identity(), -- a particular ½ space x float not null, -- the (a,b,..) parameters y float not null, -- defining the ½ space z float not null, c float not null, -- the constant (“c” above) primary key (domainID, convexID, halfSpaceID) (x,y,z) inside a convex if it is inside all lines of the convex (x,y,z) inside a convex if it is NOT OUTSIDE ANY line of the convex select convexID -- return the convex hulls from HalfSpace -- from the constraints where @x * x + @y * y + @x * z < l -- point outside the line? group by all convexID -- consider all the lines of a convexID having count(*) = 0 -- count outside == 0

  5. The Algebra is Simple (Boolean) @domainID = spDomainNew (@type varchar(16), @comment varchar(8000)) @convexID = spDomainNewConvex (@domainID int) @halfSpaceID = spDomainNewConvexConstraint (@domainID int, @convexID int, @x float, @y float, @z float, @l float) @returnCode = spDomainDrop(@domainID) select * from fDomainsContainPoint(@x float, @y float, @z float) Once constructed they can be manipulated with the Boolean operations. @domainID = spDomainOr (@domainID1 int, @domainID2 int, @type varchar(16), @comment varchar(8000)) @domainID = spDomainAnd (@domainID1 int, @domainID2 int, @type varchar(16), @comment varchar(8000)) @domainID = spDomainNot (@domainID1 int, @type varchar(16), @comment varchar(8000))

  6. What! No Bounding Box? • Bounding box limits search.A subset of the convex hulls. • If query runs at 3M halfspace/sec then no need for bounding box, unless you have more than 10,000 lines. • But, if you have a lot of half-spaces then bounding box is good.

  7. A Different Problem • Table-valued function find points near a point • Select * from fGetNearbyEq(ra,dec,r) • Use Hierarchical Triangular Mesh www.sdss.jhu.edu/htm/ • Space filling curve, bounding triangles… • Standard approach • 13 ms/call… So 70 objects/second. • Too slow, so precompute neighbors: Materialized view. • At 70 objects/sec it takes 6 months to compute a billion objects.

  8. Zone Based Spatial Join • Divide space into zones • Key points by Zone, offset(on the sphere this need wrap-around margin.) • Point search look in a few zones at a limited offset: ra ± r a bounding box that has 1-π/4 false positives • All inside the relational engine • Avoids “impedance mismatch” • Can “batch” all-all comparisons • 33x faster and parallel6 days, not 6 months! r ra-zoneMax x √(r2+(ra-zoneMax)2) cos(radians(zoneMax)) zoneMax Ra ± x

  9. In SQL: points near point select o1.objID -- find objects from zone o1 -- in the zoned table where o1.zoneID between -- where zone # floor((@dec-@r)/@zoneHeight) and -- overlaps the circle floor((@dec+@r)/@zoneHeight) and o1.ra between @ra - @r and @ra + @r -- quick filter on ra and o1.dec between @dec-@r and @dec+@r -- quick filter on dec and ( (sqrt( power(o1.cx-@cx,2)+power(o1.cy-@cy,2)+power(o1.cz-@cz,2)))) < @r -- careful filter on distance Bounding box Eliminates the ~ 21% = 1-π/4 False positives

  10. Do the following for @deltaZone in {-1, 0, 1} example ignores some spherical geometry details in full paper insert neighbors -- insert one zone's neighbors select o1.objID as objID, -- object pairs o2.objID as NeighborObjID, .. other fields elided from zone o1 join zone o2 -- force a nested loop on o1.zoneID-@deltaZone = o2.zoneID -- using zone number and ra and o2.ra between o1.ra - @r and o1.ra + @r -- elided 1/cos(@dec) where -- elided margin logic, see paper. and o2.dec between o1.dec-@r and o1.dec+@r -- quick filter on dec andsqrt(power(o1.x-o2.x,2)+power(o1.y-o2.y,2)+power(o1.z-o2.z,2)) < @r -- careful filter on distance

  11. Summary • SQL is a set oriented language • You can express constraints as rows • Then You • Can evaluate LOTS of predicates per second • Can do set algebra on the predicates. • Benefits from SQL parallelism • SQL == Prolog? 

  12. References • Representing Polygon Areas and Testing Point-in-Polygon Containment in a Relational Databasehttp://research.microsoft.com/~Gray/papers/Polygon.doc • A Purely Relational Way of Computing Neighbors on a Sphere, http://research.microsoft.com/~Gray/papers/Neighbors.doc

More Related