SkyQuery: A distributed query engine for astronomy
László Dobos¹, Tamás Budavári², Alex Szalay², István Csabai¹
¹ Eötvös Loránd University, Hungary
² Johns Hopkins University, Baltimore
The multiwavelength sky
• infrared (2MASS)
• visible (DSS)
• ultraviolet (GALEX)
Crossmatching
• Astronomical catalogs
  • stored in an RDBMS
  • of order 100 million objects
  • database sizes of order 1–10 TB
• Done by coordinates (RA, Dec)
• Astrometric error
• Different sky coverage
• Different wavelength ranges
• Moving objects, etc.
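Crossmatching by coordinates ultimately comes down to comparing angular distances on the sphere. The brute-force Python sketch below assumes each catalog is a list of (id, ra, dec) tuples in degrees and uses a single fixed match radius in arcseconds, ignoring per-object astrometric errors. The function names and toy coordinates are made up for illustration; SkyQuery itself does this work inside the database.

```python
import math


def radec_to_xyz(ra_deg, dec_deg):
    """Convert equatorial coordinates in degrees to a Cartesian unit vector."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra),
            math.cos(dec) * math.sin(ra),
            math.sin(dec))


def angular_separation_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle distance between two sky positions, in arcseconds."""
    chord = math.dist(radec_to_xyz(ra1, dec1), radec_to_xyz(ra2, dec2))
    # 2*asin(chord/2) stays accurate for the tiny angles typical of crossmatching.
    return math.degrees(2.0 * math.asin(min(1.0, chord / 2.0))) * 3600.0


def naive_crossmatch(cat_a, cat_b, radius_arcsec):
    """Return (id_a, id_b, separation) for every pair closer than radius_arcsec.

    Compares every pair, i.e. O(len(cat_a) * len(cat_b)): fine for a toy
    example, hopeless for catalogs of ~1e8 objects.
    """
    matches = []
    for id_a, ra_a, dec_a in cat_a:
        for id_b, ra_b, dec_b in cat_b:
            sep = angular_separation_arcsec(ra_a, dec_a, ra_b, dec_b)
            if sep <= radius_arcsec:
                matches.append((id_a, id_b, sep))
    return matches


# Made-up toy catalogs: object 10 lies 0.5" east of object 1, object 11 far away.
toy_a = [(1, 165.7, 0.3)]
toy_b = [(10, 165.7 + 0.5 / 3600, 0.3), (11, 166.0, 0.3)]
print(naive_crossmatch(toy_a, toy_b, radius_arcsec=1.0))
```

The nested loop is the point of the sketch: with catalogs of order 100 million objects, comparing all pairs is infeasible, which is what index-based pre-filtering such as the zones approach addresses.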
Crossmatching on demand
• Crossmatch any number of catalogs
• Not all combinations can be precomputed
  • Maybe catalog pairs?
• User can specify
  • list of catalogs to match
  • region of interest
  • priors for non-coordinate-based matching
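A quick count shows why precomputing every combination is out of reach while precomputing pairs might be feasible: with n catalogs there are n(n−1)/2 pairs but 2^n − n − 1 subsets of two or more catalogs. This sketch counts catalog subsets only; user-chosen regions and priors would multiply the possibilities further.

```python
from math import comb


def catalog_pairs(n):
    """Number of distinct catalog pairs among n catalogs."""
    return comb(n, 2)


def catalog_combinations(n):
    """Number of subsets with at least two catalogs: 2^n minus the empty set
    and the n single-catalog subsets."""
    return 2 ** n - n - 1


for n in (5, 10, 20):
    print(n, catalog_pairs(n), catalog_combinations(n))
```

For 20 catalogs this gives 190 pairs but 1,048,555 multi-catalog combinations.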
Problem description
• Astronomers "script" what they do
  • multiple re-runs, parameter tweaking, etc.
  • huge web forms are a no-go
• All data in an RDBMS
  • run the computation inside the database
  • use multiple servers and parallelize
  • must be transparent to users
• Problem description in SQL
  • functions and language extensions to support astronomy
  • syntax to formulate the coordinate-based probabilistic join
  • spatial constraints: celestial regions
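A celestial region constraint can act as a cheap filter before any matching. Below is a minimal sketch of a circular region (a spherical cap) test. It assumes the centre is given in degrees and the radius in arcminutes. The CircleRegion class is hypothetical and is not SkyQuery's region library.

```python
import math


def radec_to_xyz(ra_deg, dec_deg):
    """Equatorial coordinates in degrees -> Cartesian unit vector."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra),
            math.cos(dec) * math.sin(ra),
            math.sin(dec))


class CircleRegion:
    """Hypothetical circular region (spherical cap) on the sky.

    Centre in degrees; radius in arcminutes (an assumed convention).
    """

    def __init__(self, ra_deg, dec_deg, radius_arcmin):
        self.center = radec_to_xyz(ra_deg, dec_deg)
        # A point lies inside the cap iff its dot product with the centre
        # is at least cos(radius); precompute the threshold once.
        self.cos_radius = math.cos(math.radians(radius_arcmin / 60.0))

    def contains(self, ra_deg, dec_deg):
        p = radec_to_xyz(ra_deg, dec_deg)
        return sum(c * q for c, q in zip(self.center, p)) >= self.cos_radius


region = CircleRegion(165.7, 0.3, 60)   # 1 degree radius under the assumed units
print(region.contains(165.7, 1.2))      # True: 0.9 deg from the centre
print(region.contains(165.7, 1.4))      # False: 1.1 deg from the centre
```

Working with unit vectors avoids special cases at RA = 0/360, since the dot product does not care where the RA origin lies.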
Sample SQL query
SELECT s.objId, g.objID, t.objID, s.ra, s.dec, g.ra, g.dec, t.ra, t.dec, x.ra, x.dec
FROM SDSSDR7:Galaxies AS s
  CROSS JOIN Galex:Galaxies AS g
  CROSS JOIN TwoMASS:ExtendedSources AS t
XMATCH BAYESIAN AS x
  MUST s ON POINT(s.cx, s.cy, s.cz), 0.1
  MUST g ON POINT(g.ra, g.dec), 0.2
  MAY t ON POINT(t.ra, t.dec), 0.5
  HAVING LIMIT 1e3
REGION CIRCLE J2000 165.7, 0.3, 60
• Standard SQL: SELECT … FROM … CROSS JOIN
• Probabilistic crossmatch: XMATCH … HAVING LIMIT
• Spatial constraint: REGION
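The BAYESIAN crossmatch scores candidate tuples probabilistically. As an illustration only, the sketch below evaluates the closed-form Bayes factor of Budavári & Szalay (2008) for n detections with astrometric errors σ_i, using weights w_i = 1/σ_i² (σ in radians) and a small-angle approximation. It assumes the numbers after each POINT(...) are astrometric errors in arcseconds. MUST/MAY handling and the way the HAVING LIMIT threshold is applied are not modelled, and bayes_factor is a hypothetical helper, not SkyQuery's implementation.

```python
import math

ARCSEC = math.radians(1.0 / 3600.0)  # one arcsecond in radians


def radec_to_xyz(ra_deg, dec_deg):
    """Equatorial coordinates in degrees -> Cartesian unit vector."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra),
            math.cos(dec) * math.sin(ra),
            math.sin(dec))


def bayes_factor(positions, sigmas_arcsec):
    """Bayes factor for the hypothesis that all detections come from one source.

    positions: list of (ra, dec) in degrees, one detection per catalog.
    sigmas_arcsec: astrometric error of each detection, in arcseconds.
    Computes B = 2^(n-1) * prod(w) / sum(w)
                 * exp(-sum_{i<j} w_i w_j psi_ij^2 / (2 sum(w)))
    with w_i = 1 / sigma_i^2 in radians^-2.
    """
    xs = [radec_to_xyz(ra, dec) for ra, dec in positions]
    ws = [1.0 / (s * ARCSEC) ** 2 for s in sigmas_arcsec]
    w_sum = sum(ws)
    quad = 0.0
    for i in range(len(xs)):
        for j in range(i + 1, len(xs)):
            # Squared chord length: equals psi_ij^2 up to a term of order psi^4.
            psi2 = sum((a - b) ** 2 for a, b in zip(xs[i], xs[j]))
            quad += ws[i] * ws[j] * psi2
    # Work in logs: each weight is ~1e12 or more, so the product grows quickly.
    log_b = ((len(xs) - 1) * math.log(2.0) + sum(math.log(w) for w in ws)
             - math.log(w_sum) - quad / (2.0 * w_sum))
    return math.exp(log_b)


# Hypothetical three-way candidate with errors 0.1", 0.2", 0.5".
tight = [(165.70000, 0.30000), (165.70001, 0.30001), (165.70003, 0.29998)]
print(f"{bayes_factor(tight, [0.1, 0.2, 0.5]):.3g}")
```

Larger separations relative to the errors shrink B exponentially, so a threshold on B separates likely associations from chance alignments.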
Zones algorithm
• Pure SQL: can leverage the query optimizer of SQL Server
• Divide the sphere into zones
  • ZoneID: a very simple hash on declination
  • indexes built on ZoneID and right ascension allow very quick pre-filtering of match candidates
• Parallelizes very well on multi-core machines
• [Gray, Szalay & Nieto-Santisteban 2006, The Zones Algorithm for Finding Points-Near-a-Point or Cross-Matching Spatial Datasets]
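The zone idea can be sketched outside the database. This minimal Python version assumes the zone height h and the match radius are both in degrees. Per-zone lists sorted by RA, searched with bisect, stand in for the SQL Server index on (ZoneID, ra). The function names are hypothetical, and this is not the SQL from the cited paper. RA wrap-around is handled here by splitting the search window in two.

```python
import bisect
import math
from collections import defaultdict


def zone_id(dec_deg, h):
    """The 'simple hash on declination': floor((dec + 90) / h)."""
    return int(math.floor((dec_deg + 90.0) / h))


def separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle distance in degrees (haversine form, stable for small angles)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    hav = (math.sin((dec2 - dec1) / 2) ** 2
           + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2.0 * math.asin(math.sqrt(min(1.0, hav))))


def build_zone_index(catalog, h):
    """Bucket (id, ra, dec) rows by zone; sort each bucket by RA."""
    buckets = defaultdict(list)
    for obj_id, ra, dec in catalog:
        buckets[zone_id(dec, h)].append((ra % 360.0, dec, obj_id))
    index = {}
    for z, rows in buckets.items():
        rows.sort()
        index[z] = ([row[0] for row in rows], rows)
    return index


def ra_range(index, z, ra_lo, ra_hi):
    """Rows of zone z with ra_lo <= ra <= ra_hi, found by binary search."""
    if z not in index:
        return []
    ras, rows = index[z]
    return rows[bisect.bisect_left(ras, ra_lo):bisect.bisect_right(ras, ra_hi)]


def zones_crossmatch(cat_a, cat_b, radius_deg, h):
    """All (id_a, id_b) pairs within radius_deg, pre-filtered by zone and RA."""
    index = build_zone_index(cat_b, h)
    matches = []
    for id_a, ra_a, dec_a in cat_a:
        ra_a %= 360.0
        # Exact RA half-width of the search circle (plus a tiny safety margin);
        # if the circle reaches a pole, scan the whole ring.
        if abs(dec_a) + radius_deg < 90.0:
            alpha = math.degrees(math.asin(math.sin(math.radians(radius_deg))
                                           / math.cos(math.radians(dec_a)))) + 1e-9
            if ra_a - alpha < 0.0:        # window wraps past RA = 0
                windows = [(0.0, ra_a + alpha), (ra_a - alpha + 360.0, 360.0)]
            elif ra_a + alpha >= 360.0:   # window wraps past RA = 360
                windows = [(ra_a - alpha, 360.0), (0.0, ra_a + alpha - 360.0)]
            else:
                windows = [(ra_a - alpha, ra_a + alpha)]
        else:
            windows = [(0.0, 360.0)]
        z_lo = zone_id(max(-90.0, dec_a - radius_deg), h)
        z_hi = zone_id(min(90.0, dec_a + radius_deg), h)
        for z in range(z_lo, z_hi + 1):
            for lo, hi in windows:
                for ra_b, dec_b, id_b in ra_range(index, z, lo, hi):
                    # Exact distance test only on the pre-filtered candidates.
                    if separation_deg(ra_a, dec_a, ra_b, dec_b) <= radius_deg:
                        matches.append((id_a, id_b))
    return matches


toy_a = [(1, 359.9999, 45.0)]
toy_b = [(10, 0.0001, 45.0), (11, 10.0, 45.0)]
print(zones_crossmatch(toy_a, toy_b, radius_deg=1.0 / 3600.0, h=0.01))  # [(1, 10)]
```

The pre-filter touches only the zones spanning dec ± r and a narrow RA window inside each, so the trigonometric test runs on a handful of candidates instead of every pair. Zones are also independent of one another, which is one reason the approach parallelizes well.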