1 / 14

Recent spatial work by Jim Gray and Alex Szalay

Recent spatial work by Jim Gray and Alex Szalay. Bob Mann. Recent work on spatial querying by Jim Gray & Alex Szalay. SkyServer HTM approach works, but precomputing neighbours table take long time

rhea
Download Presentation

Recent spatial work by Jim Gray and Alex Szalay

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Recent spatial work by Jim Gray and Alex Szalay Bob Mann

  2. Recent work on spatial querying by Jim Gray & Alex Szalay • SkyServer HTM approach works, but precomputing neighbours table take long time • Partially solved by new version of HTM library (soon): >10 times faster than current version • C# interface to next version of SQL Server • Want a purely relational approach • vendor-neutral • no “impedance mismatch” with external calls • So, Gray and Szalay working on more geometrical approach

  3. Relational method for computing neighbours on sphere • Problem: • fGetNearbyObjEq() function can compute neighbours of ~70 objects/sec (GHz PC) • Scaling to full SDSS: 2 months of computation! • One solution: • Don’t use fGetNearbyObjEq/HTM index for computing neighbours • Subdivide sphere into Dec zones, then cut by RA [Following slides stolen/adapted from Jim Gray]

  4. Zone Based Spatial Join • Divide space into zones • Key points by Zone (by Dec), offset (RA) (on the sphere this need wrap-around margin.) • Point search look in a few zones at a limited offset: ra ± r a bounding box that has 1-π/4 false positives • All inside the relational engine • Avoids “impedance mismatch” • Can “batch” all-all comparisons • 32x faster and parallel r ra-zoneMax x √(r2+(ra-zoneMax)2) cos(radians(zoneMax)) zoneMax Ra ± x

  5. In SQL: points near point create table zone (zone int, objID bigint, ra float, dec float, x float, y float, z float,primary key (zone, ra, objID)) insert into zone select floor((dec+90) /zoneHeight), ra, dec, cx, cy, cz from PhotoObj select o1.objID -- find objects from zone o1 -- in the zoned table where o1.zoneID between -- where zone # floor((@dec-@r)/@zoneHeight) and -- overlaps the circle ceiling((@dec+@r)/@zoneHeight) and o1.ra between @ra - @r and @ra + @r -- quick filter on ra and o1.dec between @dec-@r and @dec+@r -- quick filter on dec and ( (sqrt( power(o1.cx-@cx,2)+power(o1.cy-@cy,2)+power(o1.cz-@cz,2)))) < @r -- careful filter on distance

  6. Some details to add • Correct for “compression” of RA by cos(dec) and avoid division by zero at poles • o.ra between (ra-R)/ (cos(dec)+) and (ra+R)/(cos(dec)+) • Include wrap around in RA (RA=0 & 360º) • Insert margins into zone table (duplicate entries) • Result: 32x speed up over use of fGetNearbyObjEq • More by parallelisation Neighbour precomputation looks OK for WSA/VSA!

  7. “Relational approach to testing point-in-polygon containment” • Think geometrically • Point within circular radius on sphere translated to half-space cut in 3D Cartesian space • Combine spatial cuts as half-space constraints • Store geom. constraint info in DB & query [Graphic from Alex Szalay]

  8. y x=c/a x ax + by = c y=c/b y x The Idea:Equations Define Subspaces • For (x,y) above the lineax+by > c • Reverse the space by-ax + -by > -c • Intersect 3 half-spaces: a1x + b1y > c1a2x + b2y > c2a3x + b3y > c3

  9. Some terminology • A half-space, H, of the N-dimensional space S can be expressed as H = {x in S | f(x) > 0} • The intersection of a set of half-spaces {Hi} is a convex hull of points, C C = {x in S | x in Hi for all i} • A domain, D, is the union of a set of convex hulls {Ci} D = {x in Ci for all i}

  10. y y x x The Idea:Equations Define Subspaces a1x + b1y > c1a2x + b2y > c2a3x + b3y > c3 1 2 1 3 select count(*) from convex where a*@x + b*@y < c 2 2 1 2 1 2 select count(*) from convex where a*@x + b*@y > c 0 1 1 2

  11. Domain is Union of Convex Hulls Not a convex hull • Simple volumes are unions of convex hulls. • Higher order curves also work • Complex volumes have holes and their holes have holes. (that is harder). +

  12. Now in Relational Terms create table HalfSpace ( domainID int not null -- domain name foreign key references Domain(domainID), convexID int not null, -- grouping a set of ½ spaces halfSpaceID int identity(), -- a particular ½ space a float not null, -- the (a,b,..) parameters b float not null, -- defining the ½ space c float not null, -- the constraint (“c” above) primary key (domainID, convexID, halfSpaceID) (x,y) inside a convex if it is inside all lines of the convex (x,y) inside a convex if it is NOT OUTSIDE ANY line of the convex Convexes containing point (@x,@y): select convexID -- return the convex hulls from HalfSpace -- from the constraints where (@x * a + @y * b) < c -- point outside the line? group by all convexID -- insist no line of convex having count(*) = 0 -- is outside (count outside == 0)

  13. All Domains Containing this Point • The group byis supported by the domain/convex index, so it’s a sequential scan (pre-sorted!). select distinct domainID -- return domains from HalfSpace -- from constraints where (@x * a + @y * b) < c -- point outside group by all domainID, convexID -– never happens having count(*) = 0 -- count outside == 0

  14. The Algebra is Simple (Boolean) @domainID = spDomainNew (@type varchar(16), @comment varchar(8000)) @convexID = spDomainNewConvex (@domainID int) @halfSpaceID = spDomainNewConvexConstraint (@domainID int, @convexID int, @a float, @b float, @c float) @returnCode = spDomainDrop(@domainID) select * from fDomainsContainPoint(@x float, @y float) Once constructed they can be manipulated with the Boolean operations. @domainID = spDomainOr (@domainID1 int, @domainID2 int, @type varchar(16), @comment varchar(8000)) @domainID = spDomainAnd (@domainID1 int, @domainID2 int, @type varchar(16), @comment varchar(8000)) @domainID = spDomainNot (@domainID1 int, @type varchar(16), @comment varchar(8000)) Very different approach to spatial querying – “constraint DB” Reference: Representing Polygon Areas and Testing Point-in-Polygon Containment in a Relational Databasehttp://research.microsoft.com/~Gray/papers/Polygon.doc

More Related