140 likes | 269 Views
Recent spatial work by Jim Gray and Alex Szalay. Bob Mann. Recent work on spatial querying by Jim Gray & Alex Szalay. SkyServer HTM approach works, but precomputing neighbours table take long time
E N D
Recent work on spatial querying by Jim Gray & Alex Szalay • SkyServer HTM approach works, but precomputing neighbours table take long time • Partially solved by new version of HTM library (soon): >10 times faster than current version • C# interface to next version of SQL Server • Want a purely relational approach • vendor-neutral • no “impedance mismatch” with external calls • So, Gray and Szalay working on more geometrical approach
Relational method for computing neighbours on sphere • Problem: • fGetNearbyObjEq() function can compute neighbours of ~70 objects/sec (GHz PC) • Scaling to full SDSS: 2 months of computation! • One solution: • Don’t use fGetNearbyObjEq/HTM index for computing neighbours • Subdivide sphere into Dec zones, then cut by RA [Following slides stolen/adapted from Jim Gray]
Zone Based Spatial Join • Divide space into zones • Key points by Zone (by Dec), offset (RA) (on the sphere this need wrap-around margin.) • Point search look in a few zones at a limited offset: ra ± r a bounding box that has 1-π/4 false positives • All inside the relational engine • Avoids “impedance mismatch” • Can “batch” all-all comparisons • 32x faster and parallel r ra-zoneMax x √(r2+(ra-zoneMax)2) cos(radians(zoneMax)) zoneMax Ra ± x
In SQL: points near point create table zone (zone int, objID bigint, ra float, dec float, x float, y float, z float,primary key (zone, ra, objID)) insert into zone select floor((dec+90) /zoneHeight), ra, dec, cx, cy, cz from PhotoObj select o1.objID -- find objects from zone o1 -- in the zoned table where o1.zoneID between -- where zone # floor((@dec-@r)/@zoneHeight) and -- overlaps the circle ceiling((@dec+@r)/@zoneHeight) and o1.ra between @ra - @r and @ra + @r -- quick filter on ra and o1.dec between @dec-@r and @dec+@r -- quick filter on dec and ( (sqrt( power(o1.cx-@cx,2)+power(o1.cy-@cy,2)+power(o1.cz-@cz,2)))) < @r -- careful filter on distance
Some details to add • Correct for “compression” of RA by cos(dec) and avoid division by zero at poles • o.ra between (ra-R)/ (cos(dec)+) and (ra+R)/(cos(dec)+) • Include wrap around in RA (RA=0 & 360º) • Insert margins into zone table (duplicate entries) • Result: 32x speed up over use of fGetNearbyObjEq • More by parallelisation Neighbour precomputation looks OK for WSA/VSA!
“Relational approach to testing point-in-polygon containment” • Think geometrically • Point within circular radius on sphere translated to half-space cut in 3D Cartesian space • Combine spatial cuts as half-space constraints • Store geom. constraint info in DB & query [Graphic from Alex Szalay]
y x=c/a x ax + by = c y=c/b y x The Idea:Equations Define Subspaces • For (x,y) above the lineax+by > c • Reverse the space by-ax + -by > -c • Intersect 3 half-spaces: a1x + b1y > c1a2x + b2y > c2a3x + b3y > c3
Some terminology • A half-space, H, of the N-dimensional space S can be expressed as H = {x in S | f(x) > 0} • The intersection of a set of half-spaces {Hi} is a convex hull of points, C C = {x in S | x in Hi for all i} • A domain, D, is the union of a set of convex hulls {Ci} D = {x in Ci for all i}
y y x x The Idea:Equations Define Subspaces a1x + b1y > c1a2x + b2y > c2a3x + b3y > c3 1 2 1 3 select count(*) from convex where a*@x + b*@y < c 2 2 1 2 1 2 select count(*) from convex where a*@x + b*@y > c 0 1 1 2
Domain is Union of Convex Hulls Not a convex hull • Simple volumes are unions of convex hulls. • Higher order curves also work • Complex volumes have holes and their holes have holes. (that is harder). +
Now in Relational Terms create table HalfSpace ( domainID int not null -- domain name foreign key references Domain(domainID), convexID int not null, -- grouping a set of ½ spaces halfSpaceID int identity(), -- a particular ½ space a float not null, -- the (a,b,..) parameters b float not null, -- defining the ½ space c float not null, -- the constraint (“c” above) primary key (domainID, convexID, halfSpaceID) (x,y) inside a convex if it is inside all lines of the convex (x,y) inside a convex if it is NOT OUTSIDE ANY line of the convex Convexes containing point (@x,@y): select convexID -- return the convex hulls from HalfSpace -- from the constraints where (@x * a + @y * b) < c -- point outside the line? group by all convexID -- insist no line of convex having count(*) = 0 -- is outside (count outside == 0)
All Domains Containing this Point • The group byis supported by the domain/convex index, so it’s a sequential scan (pre-sorted!). select distinct domainID -- return domains from HalfSpace -- from constraints where (@x * a + @y * b) < c -- point outside group by all domainID, convexID -– never happens having count(*) = 0 -- count outside == 0
The Algebra is Simple (Boolean) @domainID = spDomainNew (@type varchar(16), @comment varchar(8000)) @convexID = spDomainNewConvex (@domainID int) @halfSpaceID = spDomainNewConvexConstraint (@domainID int, @convexID int, @a float, @b float, @c float) @returnCode = spDomainDrop(@domainID) select * from fDomainsContainPoint(@x float, @y float) Once constructed they can be manipulated with the Boolean operations. @domainID = spDomainOr (@domainID1 int, @domainID2 int, @type varchar(16), @comment varchar(8000)) @domainID = spDomainAnd (@domainID1 int, @domainID2 int, @type varchar(16), @comment varchar(8000)) @domainID = spDomainNot (@domainID1 int, @type varchar(16), @comment varchar(8000)) Very different approach to spatial querying – “constraint DB” Reference: Representing Polygon Areas and Testing Point-in-Polygon Containment in a Relational Databasehttp://research.microsoft.com/~Gray/papers/Polygon.doc