Features of the SDSS
• Special 2.5m telescope, at Apache Point, NM
  • 3 degree field of view
  • Zero distortion focal plane
• Two surveys in one
  • Photometric survey in 5 bands - 200 million objects
  • Spectroscopic redshift survey - 1 million distances
• Automated data reduction
  • Over 120 man-years of development (Fermilab + collaboration scientists)
• Very high data volume
  • Expect over 40 TB of raw data
  • About 2 TB processed catalogs
  • Data made available to the public

SDSS Data Products
• Object catalog: 500 GB, parameters of >10^8 objects
• Redshift catalog: 1 GB, parameters of 10^6 objects
• Atlas images: 1500 GB, 5 color cutouts of >10^8 objects
• Spectra: 60 GB, in a one-dimensional form
• Derived catalogs: 20 GB (clusters, QSO absorption lines)
• 4x4 pixel all-sky map: 60 GB, heavily compressed
• Corrected frames: 15 TB
• All raw data (40 TB) saved at Fermilab

Accessing the Data
• Few fixed access patterns
  • one cannot build indices for all possible queries
  • worst case scenario is linear scan of the whole table
• Increasingly large differences between
  • Random access
  • Sequential I/O
• Often much faster to scan than to seek
• Good layout of data => more sequential I/O
• Geometric indexing – partitioning in storage
• Using Objectivity/DB
• Ported to MS SQL Server (w. Jim Gray)
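
The scan-versus-seek trade-off above can be made concrete with a rough cost model. This is only a sketch under stated assumptions: the 10 ms seek time and 50 MB/s sequential rate are illustrative disk parameters, not figures from the slides, and the row size is simply the 500 GB object catalog divided by 200 million objects.

```python
# Rough cost model: full sequential scan vs. fetching rows by random seeks.
# SEEK_S and SEQ_BYTES_PER_S are assumed, illustrative disk parameters.

def full_scan_seconds(table_bytes, seq_bytes_per_s):
    """Time to read the whole table sequentially."""
    return table_bytes / seq_bytes_per_s

def random_fetch_seconds(n_rows, row_bytes, seek_s, seq_bytes_per_s):
    """Time to fetch n_rows individually, paying one seek per row."""
    return n_rows * (seek_s + row_bytes / seq_bytes_per_s)

def break_even_fraction(row_bytes, seek_s, seq_bytes_per_s):
    """Fraction of rows above which a full scan beats per-row seeks."""
    transfer_s = row_bytes / seq_bytes_per_s
    return transfer_s / (seek_s + transfer_s)

TABLE_BYTES = 500e9               # object catalog size from the slides
N_ROWS = 200e6                    # 200 million objects from the slides
ROW_BYTES = TABLE_BYTES / N_ROWS  # derived average row size
SEEK_S = 0.010                    # assumption: 10 ms per random access
SEQ_BYTES_PER_S = 50e6            # assumption: 50 MB/s sequential read

scan_s = full_scan_seconds(TABLE_BYTES, SEQ_BYTES_PER_S)
fraction = break_even_fraction(ROW_BYTES, SEEK_S, SEQ_BYTES_PER_S)
print(f"full scan: {scan_s / 3600:.1f} h on one disk")
print(f"scan wins once a query touches more than {fraction:.2%} of rows")
```

Under these assumptions a query that touches more than roughly half a percent of the rows is already cheaper as a scan, which is why a layout that keeps neighbouring sky regions together on disk pays off.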

SDSS in GriPhyN
• Two Tier2 Nodes (FNAL+JHU)
  • testing framework on real data in different scenarios
• FNAL node
  • massive reprocessing of images
  • full regeneration of catalogs from the images (on disk)
  • gravitational lensing, finer morphological classification
  • image coaddition, differencing
• JHU node
  • catalog calculations, integrated with database
  • tasks require lots of data, can be run in parallel
  • various statistical calculations, likelihood analyses
  • power spectra, correlation functions, Monte-Carlo
• Public access
  • creating virtual data for NVO services (implemented later)

The SDSS Southern Survey
• Scanning a single stripe on the sky >30 times over
  • Coaddition => extra depth
  • Differencing => time dimension
• Multiple ways to combine the stripes
• Rerun the pipelines with custom parameters
• Build a new object catalog
• Perform particular science analysis (lensing map)
• On the right timescale to try GriPhyN framework
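
Why coaddition buys depth can be illustrated with a toy example. The sketch below is not the SDSS pipeline: it assumes identical exposures with uncorrelated Gaussian noise, in which case averaging N exposures lowers the noise by about sqrt(N), i.e. a gain of roughly 1.25 log10(N) magnitudes. The function names and the fake sky values are made up for illustration.

```python
import math
import random
import statistics

def coadd(exposures):
    """Pixel-by-pixel mean of equally sized exposures (lists of floats)."""
    n = len(exposures)
    return [sum(pixels) / n for pixels in zip(*exposures)]

def difference(a, b):
    """Pixel-by-pixel difference of two exposures (the time dimension)."""
    return [x - y for x, y in zip(a, b)]

def depth_gain_mag(n_exposures):
    """Approximate depth gain in magnitudes, assuming noise ~ 1/sqrt(N)."""
    return 1.25 * math.log10(n_exposures)

rng = random.Random(0)

def fake_exposure(n_pix=2000, sky=100.0, sigma=5.0):
    """Hypothetical blank-sky exposure: constant sky plus Gaussian noise."""
    return [rng.gauss(sky, sigma) for _ in range(n_pix)]

exposures = [fake_exposure() for _ in range(30)]  # ">30 times over"
stack = coadd(exposures)
noise_ratio = statistics.pstdev(exposures[0]) / statistics.pstdev(stack)
diff = difference(exposures[0], exposures[1])

print(f"noise reduction from 30 exposures: {noise_ratio:.2f}x")
print(f"approximate depth gain: {depth_gain_mag(30):.2f} mag")
```

For 30 passes the idealized gain is a little under two magnitudes; real data will do somewhat worse because seeing and sky brightness vary between scans.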

Large Scale Statistical Analysis
• Galaxy distribution has non-trivial clustering patterns
  • Reflects conditions in the early universe
• Spatial statistical tools to be run on object catalog, applying many different cuts to the data
  • Spatial power spectrum
  • Correlation functions
• These algorithms typically scale as N^2 or N^3 in the number of objects!
• Some of the analyses will partition well (likelihood), others will not (pair counts)
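
The N^2 cost of pair counting is easy to see in a brute-force version. The following is a minimal sketch, not the survey's analysis code: it bins the separations of all N(N-1)/2 pairs, which is the basic ingredient of a two-point correlation function. The function name and the half-open binning convention are choices made here for illustration.

```python
import math
from bisect import bisect_right
from itertools import combinations

def pair_counts(points, bin_edges):
    """Histogram of pair separations; visits all N*(N-1)/2 pairs.

    points    : list of coordinate tuples (any dimension)
    bin_edges : increasing list of separations; bins are [lo, hi)
    """
    counts = [0] * (len(bin_edges) - 1)
    for p, q in combinations(points, 2):
        r = math.dist(p, q)
        i = bisect_right(bin_edges, r) - 1
        if 0 <= i < len(counts):  # drop pairs outside the binned range
            counts[i] += 1
    return counts

# Four corners of a unit square: 4 sides of length 1, 2 diagonals of sqrt(2).
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
print(pair_counts(square, [0.0, 1.2, 2.0]))  # → [4, 2]
```

Every point must be compared with every other point, so the work cannot simply be split by object: pairs that straddle a partition boundary still have to be counted. That is the sense in which pair counts partition poorly compared with per-object likelihood calculations.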

Trends in Astronomy
Future dominated by detector improvements
• Moore’s Law growth in CCD capabilities
• Gigapixel arrays on the horizon
• Improvements in computing and storage will track growth in data volume
• Investment in software is critical, and growing
Figure: total area of 3m+ telescopes in the world in m^2, and total number of CCD pixels in megapixels, as a function of time. Growth over 25 years is a factor of 30 in glass, 3000 in pixels.
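
The growth factors quoted above can be turned into doubling times. This is a small sketch that assumes smooth exponential growth over the 25 years, which the slide does not state explicitly; the function name is made up for illustration.

```python
import math

def doubling_time_years(growth_factor, span_years):
    """Doubling time implied by a total growth factor over span_years,
    assuming steady exponential growth (an assumption, not a slide claim)."""
    return span_years * math.log(2) / math.log(growth_factor)

glass_years = doubling_time_years(30, 25)     # telescope collecting area
pixel_years = doubling_time_years(3000, 25)   # total CCD pixels

print(f"glass doubles every {glass_years:.1f} years")
print(f"pixels double every {pixel_years:.1f} years")
```

On these numbers the pixel count doubles roughly every two years, a Moore's-Law-like pace, while collecting area doubles about every five years.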

VO: The challenges
• Large number of new surveys
  • multi-TB in size, 100 million objects or more
  • individual archives planned, or under way
• Multi-wavelength view of the sky
  • coverage at more than 13 wavelengths in 5 years
• Size of the archived data: 40,000 square degrees is 2 trillion pixels
  • One band: 4 Terabytes
  • Multi-wavelength: 10-100 Terabytes
  • Time dimension: 10 Petabytes
• Current techniques inadequate
• Scalable hardware/networking requirements
• Transition to the new astronomy
MACHO, 2MASS, DENIS, SDSS, DPOSS, GSC-II, VISTA, COBE, MAP, NVSS, FIRST, GALEX, ROSAT, OGLE, ...
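
The archive-size figures above follow from simple arithmetic. The sketch below reproduces them under two assumptions that are not on the slide: a pixel scale of 0.5 arcsec and 2 bytes per pixel. With those values, 40,000 square degrees comes out at about 2 trillion pixels and about 4 TB per band.

```python
ARCSEC_PER_DEG = 3600

def sky_pixels(area_sq_deg, pixel_arcsec):
    """Number of pixels needed to tile an area at a given pixel scale."""
    pixels_per_deg = ARCSEC_PER_DEG / pixel_arcsec
    return area_sq_deg * pixels_per_deg ** 2

def archive_bytes(n_pixels, bytes_per_pixel, n_bands=1):
    """Uncompressed image volume for n_bands full-sky images."""
    return n_pixels * bytes_per_pixel * n_bands

PIXEL_ARCSEC = 0.5     # assumption: pixel scale, not given on the slide
BYTES_PER_PIXEL = 2    # assumption: 16-bit pixels

n_pix = sky_pixels(40_000, PIXEL_ARCSEC)
one_band_tb = archive_bytes(n_pix, BYTES_PER_PIXEL) / 1e12
five_band_tb = archive_bytes(n_pix, BYTES_PER_PIXEL, n_bands=5) / 1e12

print(f"pixels: {n_pix:.2e}")
print(f"one band: {one_band_tb:.1f} TB, five bands: {five_band_tb:.1f} TB")
```

A handful of bands lands inside the 10-100 TB multi-wavelength range quoted above; repeated epochs multiply the volume further.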