Databases@MPA, access methods and plans
With contributions from
JHU: Alex Szalay, Jan Vanderberg
MPA: Jeremy Blaizot, Jarle Brinchmann, Guinevere Kauffmann, Anja von der Linden, Ben Panter, Guo Qi, Volker Springel, Vivienne Wild
Databases @ MPA
Last year, Budapest
• Presented milli-Millennium halo merger tree database
• Requests:
  • More properties (lambda, ...): not yet done
  • Galaxies: done
  • Correlation with environment (galaxies in voids): done
  • Millennium
• Why use databases? Ask Alex.
Current status
• milli-Millennium
  • Galaxies added: merger trees, links to their parent halos
  • Density field at various smoothings
  • Updated web site (demo)
• Millennium subset
  • Subset (~2%, 10x milli-Mil) of halo and galaxy trees
  • z=0 density field
• Millennium
  • Halo trees in database (proprietary)
  • SAM galaxies under way (settle on model etc.)
  • Density fields at all z will be added: 1056964608 rows (63 x 256^3 grid cells)
• Durham
  • milli_Millennium mirror (Postgres)
  • Durham halo tree and galaxy catalogues
Other databases
• ROSAT: source catalogues and RASS photons (~100 million)
• SDSS peripherals
  • SDSS_MPA (Brinchmann, Kauffmann, Tremonti et al.)
  • MOPED (Ben Panter)
  • SDSS_PCA (Vivienne Wild et al.)
• GalICS (Jeremy Blaizot)
• HEALPix all-sky maps (Alex Szalay, Tony Banday)
  • WMAP (3-year data soon!)
  • extinction maps
  • radio maps (Bonn)
  • ROSAT background (hopefully)
Access
• Public: http://www.g-vo.org/mpasims
• Local web apps to Millennium, BESTDR3 and peripherals: http://www.g-vo.org/sdssdr3/
• Public web browser queries are limited (1 min, 10000 rows)
• Local databases + web apps are less limited
Streaming
• Ordinary query results are temporarily buffered on the server, which costs memory
• Streaming queries: faster, less limited (only a timeout)
• Access:
  • IDL (with Ben Panter)
  • wget --http-user=*** --http-password=*** -O localfile.csv "http://www.g-vo.org/sdssdr3/DBQueryStream?SQL=select * from moped..agebin"
  • GUI asking for username/password
  • Interprets CSV stream, turned into IDL components
  • TOPCAT
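The same streaming access can be scripted outside IDL. The following is a minimal Python sketch, not the site's own client. It assumes, as the wget call suggests, that the DBQueryStream endpoint takes the SQL text in a `SQL` URL parameter, uses HTTP basic authentication, and returns plain CSV. The helper names `build_stream_url`, `iter_rows` and `open_stream` are hypothetical.

```python
import csv
import io
import urllib.parse
import urllib.request

BASE_URL = "http://www.g-vo.org/sdssdr3/DBQueryStream"  # endpoint from the wget example


def build_stream_url(sql, base_url=BASE_URL):
    """Return the streaming-query URL with the SQL text percent-encoded."""
    query = urllib.parse.urlencode({"SQL": sql}, quote_via=urllib.parse.quote)
    return base_url + "?" + query


def iter_rows(text_stream):
    """Yield each CSV record as a list of strings, one record at a time."""
    for row in csv.reader(text_stream):
        if row:  # skip blank lines
            yield row


def open_stream(sql, user, password):
    """Open the query result as a text stream (assumes HTTP basic authentication)."""
    manager = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    manager.add_password(None, BASE_URL, user, password)
    opener = urllib.request.build_opener(urllib.request.HTTPBasicAuthHandler(manager))
    response = opener.open(build_stream_url(sql))
    return io.TextIOWrapper(response, encoding="utf-8", newline="")
```

Looping with `for row in iter_rows(open_stream(sql, user, password))` handles rows as they arrive, so the client never holds the full result in memory.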
Plans: Millennium
• Millennium:
  • Tune database
    • 750000000 halos
    • N x 1000000000 galaxies
    • 63 x 256^3 density field grid cells
  • More halo properties (shape, λ, ...)
  • More galaxy catalogues
    • different parameters
    • different algorithms (GalICS, Durham, ...)
  • Light cone mock catalogues
  • Galaxy spectra (+ PCA)
• Links to SDSS mirror and peripherals
• Proper metadata handling (à la SkyServer)
• "SAM online"
• Move web apps to MPA
• Use JHU services, install CasJobs
Plans: SDSS mirror + peripherals
• Make mirror web site public
• Upgrade SDSS mirror to DR4 …
• Stabilize, document, publish SDSS peripherals
• Proper metadata handling
• Links to Millennium
• Personal databases: MyDB (à la SkyServer)
• Add logos
Theory VO: spectra
• Combine theory and observations
• Example: query-by-example on theory spectra
  • Find similar spectra, and from these the actual galaxy formation history
  • Chi-squared on all stored spectra? Slow, and requires storing all of them
• Idea (not original, see HVO/JHU talks): use PCA to compress data
PCA
• Need training sample of theory spectra to create eigenspectra
• Project all spectra
• Store PCA amplitudes in DB
• Provide web service:
  • Upload (observational) spectrum (IVOA SSA/SED)
  • Project onto theory eigenspectra
  • Use amplitudes as parameters in query for "nearby" amplitudes
  • Return corresponding theory spectra
  • Return corresponding galaxy formation histories, or their halos, or their environment …
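The pipeline on this slide (train eigenspectra, project, look up nearby amplitudes) can be sketched with NumPy alone. This is an illustrative sketch under simple assumptions, not the service's implementation: spectra are rows of a complete array on a common wavelength grid, the eigenspectra come from an SVD of the mean-subtracted training set, and "nearby" means Euclidean distance in amplitude space, found by brute force rather than through a database index. The function names are hypothetical.

```python
import numpy as np


def fit_eigenspectra(training, n_components):
    """Return (mean spectrum, eigenspectra) from a (n_spectra, n_pixels) array."""
    mean = training.mean(axis=0)
    # Rows of vt are orthonormal principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(training - mean, full_matrices=False)
    return mean, vt[:n_components]


def project(spectra, mean, eigenspectra):
    """Return the PCA amplitudes of one spectrum or of a stack of spectra."""
    return (np.asarray(spectra) - mean) @ eigenspectra.T


def nearest(query_amplitudes, stored_amplitudes, k):
    """Return indices of the k stored spectra closest in amplitude space."""
    distances = np.linalg.norm(stored_amplitudes - query_amplitudes, axis=1)
    return np.argsort(distances)[:k]
```

In the setup described here the stored amplitudes would live in database columns, and `nearest` would become a query on those columns. The returned indices then lead to the matching theory spectra and formation histories.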
Issues
• Dealing with errors, gaps: "gappy PCA" (Connolly & Szalay)
• Normalization:
  • Incoming spectrum generally comes from a very different dataset and needs a common normalization
  • Incoming set will have gaps, errors
  • Ad hoc normalization possible (and works quite well)
• Indexing of complex multi-dimensional point set for quick k-nearest-neighbour search (Voronoi? See Laszlo's work)
Normalized gappy PCA
• Fit normalization factor at same time as PCA amplitudes. Model: [equation missing in source]
• Minimize (over a_i and N): [equation missing in source]
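The equations for this slide are not available in the text, so the sketch below rests on an assumed formulation that may differ from the one presented. It takes the model to be f_λ ≈ N m_λ + Σ_i a_i e_iλ, with m the mean spectrum and e_i the eigenspectra. It minimizes χ² = Σ_λ w_λ (f_λ − N m_λ − Σ_i a_i e_iλ)², with w_λ = 1/σ_λ² on good pixels and w_λ = 0 in gaps. This model is linear in N and the a_i, so a weighted least-squares solve gives both at once. The function name is hypothetical.

```python
import numpy as np


def normalized_gappy_fit(flux, sigma, mean, eigenspectra):
    """Fit N and the amplitudes a_i jointly by weighted least squares.

    Assumed model: flux ~ N * mean + sum_i a_i * eigenspectra[i].
    Pixels with non-finite flux or sigma <= 0 are treated as gaps (weight 0).
    """
    flux = np.asarray(flux, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    good = np.isfinite(flux) & np.isfinite(sigma) & (sigma > 0)
    # Square root of the chi-squared weights: 1/sigma on good pixels, 0 in gaps.
    root_w = np.zeros_like(flux)
    root_w[good] = 1.0 / sigma[good]
    # Design matrix columns: the mean spectrum, then each eigenspectrum.
    design = np.column_stack([mean, eigenspectra.T])
    rhs = np.where(good, flux, 0.0) * root_w
    solution, *_ = np.linalg.lstsq(design * root_w[:, None], rhs, rcond=None)
    return solution[0], solution[1:]
```

Masked pixels get zero weight and drop out of the fit, so no interpolation over gaps is needed before projecting.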
So far
• Ran PCA on BC03 stochastic bursts (Vivienne)
• On first GalICS+milli-Millennium spectra (Jeremy)
• Projected SDSS spectra on both
• Defined a PCA data model/schema
• Stored PCAs in database
• TOPCAT
PCA data model (RDB schema available)
milliMil-GalICS: PC1 vs PC2, Voronoi tessellation
Issues for query-by-example
• Overlap quite good, but good enough?
  • GalICS spread less than SDSS.
  • BC03 comparable with SDSS, but different slope.
• Systematics
  • Model:
    • physics very preliminary (see Blaizot & de Lucia?)
    • resolution effects
  • Preprocessing SDSS galaxies
    • Rebinning: different algorithms give comparable results
    • (slightly) wrong redshift? Can be easily simulated
  • Projection algorithm: normalization does not affect outcome
• Observational systematics: use virtual telescope (+ virtual spectrograph) to test on the theory spectra. Easier to blow up simulation than to shrink observation cloud
Comments
• Millennium database being used for science projects (Guo Qi)
• SDSS peripherals used for science projects (see Vivienne's talk, Ben Panter)
• Use of MyDB for debugging and testing (Jeremy)
• Please give comments, feedback.