120 likes | 290 Views
Spatial (or N-Dimensional) Search in a Relational World. Jim Gray, Microsoft Alex Szalay, Johns Hopkins U. y. x=c/a. x. ax + by = c. y=c/b. y. x. Equations Define Subspaces. For (x,y) above the line ax+by > c Reverse the space by -ax + -by > -c
E N D
Spatial (or N-Dimensional) Searchin a Relational World Jim Gray, Microsoft Alex Szalay, Johns Hopkins U.
y x=c/a x ax + by = c y=c/b y x Equations Define Subspaces • For (x,y) above the lineax+by > c • Reverse the space by-ax + -by > -c • Intersect 3 half-spaces: a1x + b1y > c1a2x + b2y > c2a3x + b3y > c3
Domain is Union of Convex Hulls Not a convex hull • Simple volumes are unions of convex hulls. • Higher order curves also work • Complex volumes have holes and their holes have holes. (that is harder). +
Now in Relational Terms create table HalfSpace ( domainID int not null -- domain name foreign key references Domain(domainID), convexID int not null, -- grouping a set of ½ spaces halfSpaceID int identity(), -- a particular ½ space x float not null, -- the (a,b,..) parameters y float not null, -- defining the ½ space z float not null, c float not null, -- the constant (“c” above) primary key (domainID, convexID, halfSpaceID) (x,y,z) inside a convex if it is inside all lines of the convex (x,y,z) inside a convex if it is NOT OUTSIDE ANY line of the convex select convexID -- return the convex hulls from HalfSpace -- from the constraints where @x * x + @y * y + @x * z < l -- point outside the line? group by all convexID -- consider all the lines of a convexID having count(*) = 0 -- count outside == 0
The Algebra is Simple (Boolean) @domainID = spDomainNew (@type varchar(16), @comment varchar(8000)) @convexID = spDomainNewConvex (@domainID int) @halfSpaceID = spDomainNewConvexConstraint (@domainID int, @convexID int, @x float, @y float, @z float, @l float) @returnCode = spDomainDrop(@domainID) select * from fDomainsContainPoint(@x float, @y float, @z float) Once constructed they can be manipulated with the Boolean operations. @domainID = spDomainOr (@domainID1 int, @domainID2 int, @type varchar(16), @comment varchar(8000)) @domainID = spDomainAnd (@domainID1 int, @domainID2 int, @type varchar(16), @comment varchar(8000)) @domainID = spDomainNot (@domainID1 int, @type varchar(16), @comment varchar(8000))
What! No Bounding Box? • Bounding box limits search.A subset of the convex hulls. • If query runs at 3M halfspace/sec then no need for bounding box, unless you have more than 10,000 lines. • But, if you have a lot of half-spaces then bounding box is good.
A Different Problem • Table-valued function find points near a point • Select * from fGetNearbyEq(ra,dec,r) • Use Hierarchical Triangular Mesh www.sdss.jhu.edu/htm/ • Space filling curve, bounding triangles… • Standard approach • 13 ms/call… So 70 objects/second. • Too slow, so precompute neighbors: Materialized view. • At 70 objects/sec it takes 6 months to compute a billion objects.
Zone Based Spatial Join • Divide space into zones • Key points by Zone, offset(on the sphere this need wrap-around margin.) • Point search look in a few zones at a limited offset: ra ± r a bounding box that has 1-π/4 false positives • All inside the relational engine • Avoids “impedance mismatch” • Can “batch” all-all comparisons • 33x faster and parallel6 days, not 6 months! r ra-zoneMax x √(r2+(ra-zoneMax)2) cos(radians(zoneMax)) zoneMax Ra ± x
In SQL: points near point select o1.objID -- find objects from zone o1 -- in the zoned table where o1.zoneID between -- where zone # floor((@dec-@r)/@zoneHeight) and -- overlaps the circle floor((@dec+@r)/@zoneHeight) and o1.ra between @ra - @r and @ra + @r -- quick filter on ra and o1.dec between @dec-@r and @dec+@r -- quick filter on dec and ( (sqrt( power(o1.cx-@cx,2)+power(o1.cy-@cy,2)+power(o1.cz-@cz,2)))) < @r -- careful filter on distance Bounding box Eliminates the ~ 21% = 1-π/4 False positives
Do the following for @deltaZone in {-1, 0, 1} example ignores some spherical geometry details in full paper insert neighbors -- insert one zone's neighbors select o1.objID as objID, -- object pairs o2.objID as NeighborObjID, .. other fields elided from zone o1 join zone o2 -- force a nested loop on o1.zoneID-@deltaZone = o2.zoneID -- using zone number and ra and o2.ra between o1.ra - @r and o1.ra + @r -- elided 1/cos(@dec) where -- elided margin logic, see paper. and o2.dec between o1.dec-@r and o1.dec+@r -- quick filter on dec andsqrt(power(o1.x-o2.x,2)+power(o1.y-o2.y,2)+power(o1.z-o2.z,2)) < @r -- careful filter on distance
Summary • SQL is a set oriented language • You can express constraints as rows • Then You • Can evaluate LOTS of predicates per second • Can do set algebra on the predicates. • Benefits from SQL parallelism • SQL == Prolog?
References • Representing Polygon Areas and Testing Point-in-Polygon Containment in a Relational Databasehttp://research.microsoft.com/~Gray/papers/Polygon.doc • A Purely Relational Way of Computing Neighbors on a Sphere, http://research.microsoft.com/~Gray/papers/Neighbors.doc