1 / 20

Introduction to Sky Survey Problems

Introduction to Sky Survey Problems. Bob Mann. Introduction to sky survey database problems. Astronomical data Astronomical databases The Virtual Observatory – concept & status Large sky survey databases Spatial indexing in astronomical databases Case Study: SDSS & SkyServer.

lexiss
Download Presentation

Introduction to Sky Survey Problems

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Introduction to Sky Survey Problems Bob Mann

  2. Introduction to sky survey database problems • Astronomical data • Astronomical databases • The Virtual Observatory – concept & status • Large sky survey databases • Spatial indexing in astronomical databases • Case Study: SDSS & SkyServer

  3. ROSAT ~keV DSS Optical IRAS 25m 2MASS 2m GB 6cm WENSS 92cm NVSS 20cm IRAS 100m Observational Astronomy • Electromagnetic spectrum

  4. Astronomical data – in original form • Optical • Image: array of pixel values • X-ray • Event list: positions, arrival times, energies of all detected photons • Radio • Interferometric visibilities: sparse Fourier transform of a region of the sky Very different types of data

  5. Astronomical data – in final form • Most research done using catalogue data • i.e. tables of attributes of detected sources – mainly discrete sources (stars, galaxies, etc) • Data compression • Catalogue - few% of image data volume • Amenable to representation in relational DB • Natural indexing by location in sky

  6. Astronomical Databases • Sky survey archives • Homogeneous data, standard reduction pipeline • “Science Archive” – do science on DB • Telescope archives • Semi-indexed collections of raw data files from all observations taken – heterogeneous • Download data for reduction and analysis • Specialist data centres – collections of catalogues • Bibliographic databases– scans of major journals

  7. The Virtual Observatory • Concept: • Interoperable federation of all the world’s significant astronomical databases • Facilitate multi-wavelength astronomy • Status: • Several projects underway – AstroGrid in UK • 5+ years’ work to create a fully working VO The VO sets the context for the design of new sky survey databases

  8. AstroGrid: www.astrogrid.org • Consortium: • Edinburgh, Leicester, Cambridge, RAL, MSSL, Jodrell Bank, Queens Belfast • 3 year (~£4M) project: • 1 yr Phase A Study – finished end of 2002 • 2 yr Phase B Implementation – to end 2004 • Web (later Grid) service framework; in Java • Currently building web services, portals, etc - researching OGSA and OGSA-DAI

  9. Large sky survey databases • Major science driver for AstroGrid – and VO • New science – mining multi-wavelength data • Largest are optical/near-infrared sky surveys • Largest of these hosted in Edinburgh: • current - SuperCOSMOS, SDSS (mirror) • future - WFCAM, VISTA • Each yield 1-10TB of catalogue data in RDBMS

  10. Spatial queries in astronomy • Two important types: • Select entries (with predicate) in area of sky • Match entries (esp. between two tables) • Second is special case of first • i.e. both boil down to “point-within-distance-of-point” • but distances in two cases can be very different • Advantage in using a hierarchical spatial indexing scheme • Perform spatial query at appropriate granularity

  11. Spatial Indexingin Astronomy • The Celestial Sphere • Many coordinate systems • Most common is the equatorial system, with Right Ascension and Declination as analogues of Longitude & Latitude

  12. Spatial indexing in astronomical databases • Basic DBMS indexes are 1-D – e.g. B-trees • Some DBMSs support general 2-D indexing • Usually using R-trees (or variants) – rectangles: astronomical experiments not too successful: [Clive] • Some DBMSs have native spatial indexing • Little knowledge of this in astronomy - want to know more But • The Celestial Sphere is a sphere(!) • Many geographical spatial DBs use planar projections • So, astronomers have felt the need to develop spatial indexing prescriptions of their own

  13. Hierarchical Triangular Mesh - HTM • Developed by Sloan survey archive team at JHU • Start with projection of octahedron on sphere and subdivide triangles at their midpoints • Generate unique pixel ID code based on position in the sky and level in hierarchy – can index that with B-tree

  14. Hierarchical Equal Area Iso-Latitude Pixelisation (HEALPix) • Developed by Kris Gorski (now JPL/Caltech) • Start with division of sphere into twelve equal area curvilinear quadrilaterals, then divide each into four • Like HTM, produces a pixel code on which a B-tree index can be made (Ian – HEALPix in Oracle?)

  15. Sky survey DB case study:SkyServer for SDSS • Sloan Digital Sky Survey (SDSS): • first of new generation of sky surveys • US-led team, dedicated telescope & camera • Image half of northern sky in 5 optical bands • Then obtain optical spectra for 1,000,000 galaxies • Estimated ~1TB of catalogue data

  16. SDSS Archive • First of new generation of sky survey archives • Represents the state-of-the-art in sky survey databases • Developed by Alex Szalay’s team at Johns Hopkins • Project started in earnest in about 1996 • OODBMSs seen as the coming thing • SDSS chose Objectivity/DB for their archive: ~15 staff-years of effort later, they’d rewritten much of the DBMS themselves…and then jumped ship and started using MS SQL Server! - SkyServer (in collaboration with Jim Gray, MS Research)

  17. SkyServer design considerations • Power & flexibility to pose arbitrary queries • Simple – astronomers ignorant of SQL! • Hide messy spherical trigonometry • Distance on sphere between (a1,d1) and (a2,d2) is given in SQL by 2.0*asin(sqrt(square(sin(0.5*(radians(d1-d2)))) + cos(radians(d1))*cos(radians(d2))* square(sin(0.5*(radians(a1-a2))))) • Don’t want users typing this • Don’t really want DBMS to evaluate expressions like this often

  18. SkyServer spatial queries • Simple table-valued functions exposed to user: • E.g. select count(*) fromfGetNearbyObjEq(a,d,radius) (a,d)=(Right Ascension, Declination) • Functions call SQL Server Extended Stored Procedure • HTM index manipulation routines, implemented in a Dynamically Linked Library (DLL) • DLL generated from HTM package in C++

  19. Lessons from HTM implementation in SkyServer • SQL is not great for spherical trigonometry • Messy to write, slow to compute • Have to define stored procedures/functions • Expose a clean interface to users • Let them pose queries the way they want to • Replace trig operations by integer arithmetic • Library of HTM index operations underneath • Precompute tables of neighbouring objects • Far fewer spatial match operations at query time

  20. Problems with this approach • How easy to develop stored procedures, etc? • Needs detailed knowledge of DBMS • Extended Stored Procedure calls slow • How well will query optimiser use HTM? • …less well than built-in spatial index?… • …but that might be poorly suited to astronomical applications… • How easy to implement all this in DBMSs other than SQL Server? But this works reasonably well in practice!

More Related