Databases@MPA, access methods and plans

Databases@MPA, access methods and plans. With contributions from JHU: Alex Szalay, Jan Vanderberg; MPA: Jeremy Blaizot, Jarle Brinchmann, Guinevere Kauffmann, Anja von der Linden, Ben Panter, Guo Qi, Volker Springel, Vivienne Wild.

Presentation Transcript


  1. Databases@MPA, access methods and plans. With contributions from JHU: Alex Szalay, Jan Vanderberg; MPA: Jeremy Blaizot, Jarle Brinchmann, Guinevere Kauffmann, Anja von der Linden, Ben Panter, Guo Qi, Volker Springel, Vivienne Wild. Databases @ MPA

  2. Last year, Budapest • Presented milli-Millennium halo merger tree database • Requests: • More properties (lambda, ...) ✗ • Galaxies ✓ • Correlation with environment (galaxies in voids) ✓ • Millennium • Why use databases? Ask Alex.

  3. Current status • milli-Millennium • Galaxies added: merger trees, links to their parent halos • Density field at various smoothings • Updated web site (demo) • Millennium subset • Subset (~2%, 10x milli-Millennium) of halo and galaxy trees • z=0 density field • Millennium • Halo trees in database (proprietary) • SAM galaxies under way (settle on model, etc.) • Density fields at all z will be added: 1056964608 rows • Durham • milli-Millennium mirror (Postgres) • Durham halo tree and galaxy catalogues

  4. Other databases • ROSAT: source catalogues and RASS photons (~100 million) • SDSS peripherals • SDSS_MPA (Brinchmann, Kauffmann, Tremonti et al.) • MOPED (Ben Panter) • SDSS_PCA (Vivienne Wild et al.) • GalICS (Jeremy Blaizot) • HEALPix all-sky maps (Alex Szalay, Tony Banday) • WMAP (3-year data soon!) • extinction maps • radio maps (Bonn) • ROSAT background (hopefully)

  5. Access • Public: http://www.g-vo.org/mpasims • Local web apps to Millennium, BESTDR3 and peripherals: http://www.g-vo.org/sdssdr3/ • Public web browser queries limited (1 min, 10000 rows) • Local databases + web apps less limited

  6. Streaming • Query results temporarily buffered in server memory • Streaming queries: faster, less limited (only timeout) • Access: • IDL (with Ben Panter) • wget --http-user=*** --http-password=*** -O localfile.csv "http://www.g-vo.org/sdssdr3/DBQueryStream?SQL=select * from moped..agebin" • GUI asking for username/password • Interprets CSV stream, turned into IDL components • TOPCAT
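The streaming access described on this slide can be sketched in Python as well. This is a minimal illustration, not the IDL client mentioned above: the endpoint URL and the SQL text are copied from the wget example, while the function names, the assumption that the first CSV line is a header of column names, and the sample rows are all hypothetical.

```python
import csv
import io
from urllib.parse import quote

# Base URL of the streaming endpoint, copied from the wget example on the slide.
STREAM_URL = "http://www.g-vo.org/sdssdr3/DBQueryStream"


def build_stream_url(sql, base=STREAM_URL):
    """Return the query URL with the SQL text percent-encoded."""
    return base + "?SQL=" + quote(sql, safe="")


def iter_rows(lines):
    """Yield one dict per CSV row, taking column names from the first line.

    `lines` can be any iterable of text lines (an open file, a decoded
    HTTP response), so rows are handled as they arrive rather than being
    buffered in full. The header-line layout is an assumption.
    """
    reader = csv.reader(lines)
    header = next(reader)
    for row in reader:
        if row:  # skip blank lines
            yield dict(zip(header, row))


url = build_stream_url("select * from moped..agebin")

# Stand-in for the server response; these column names are invented.
sample = io.StringIO("bin,age\n0,0.01\n1,0.1\n")
rows = list(iter_rows(sample))
```

Percent-encoding the SQL avoids the shell-quoting problem of passing spaces and `*` in a raw URL. Authentication (the user name and password in the wget call) is left out of the sketch.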

  7. Plans: Millennium • Millennium: • Tune database • 750000000 halos • N x 1000000000 galaxies • 63 x 256^3 density field grid cells • More halo properties (shape, λ, ...) • More galaxy catalogues • different parameters • different algorithms (GalICS, Durham, ...) • Light cone mock catalogues • Galaxy spectra (+ PCA) • Links to SDSS mirror and peripherals • Proper metadata handling (à la SkyServer) • "SAM online" • Move web apps to MPA • Use JHU services, install CasJobs
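The density grid size listed on this slide can be checked against the row count quoted for the density fields on the "Current status" slide. A quick check, assuming one database row per grid cell (the slides do not state this explicitly):

```python
# 63 density grids of 256^3 cells each, as listed on the slide.
n_grids = 63
cells_per_grid = 256 ** 3
total_rows = n_grids * cells_per_grid
print(total_rows)
```

The product is 1056964608, the same number given for the planned Millennium density field rows.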

  8. Plans: SDSS mirror + peripherals • Make mirror web site public • Upgrade SDSS mirror to DR4 … • Stabilize, document, publish SDSS peripherals • Proper metadata handling • Links to Millennium • Personal databases: MyDB (à la SkyServer) • Add logos

  9. Theory VO: spectra • Combine theory and observations • Example: query-by-example on theory spectra • Find similar theory spectra and, from these, the actual galaxy formation history • Chi-squared against all stored spectra? Slow, and requires storing all of them • Idea (not original, see HVO/JHU talks): use PCA to compress data

  10. PCA • Need training sample of theory spectra to create eigenspectra • Project all spectra • Store PCA amplitudes in DB • Provide web service: • Upload (observational) spectrum (IVOA SSA/SED) • Project onto theory eigenspectra • Use amplitudes as parameters in query for "nearby" amplitudes • Return corresponding theory spectra • Return corresponding galaxy formation histories, or their halos, or their environment …
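The workflow on this slide (build eigenspectra, project, store amplitudes, query for nearby amplitudes) can be sketched with NumPy. This is an illustration under stated assumptions, not the MPA implementation: the spectra are synthetic mixtures of three invented templates, all function names are hypothetical, the "database" is an in-memory array, and the nearest-amplitude search is a brute-force scan rather than an indexed query.

```python
import numpy as np


def fit_eigenspectra(training, n_components):
    """Return (mean spectrum, eigenspectra) from a training sample.

    training: array (n_spectra, n_wavelengths), all on one wavelength grid.
    """
    mean = training.mean(axis=0)
    # Rows of vt are orthonormal directions sorted by decreasing variance.
    _, _, vt = np.linalg.svd(training - mean, full_matrices=False)
    return mean, vt[:n_components]


def project(spectra, mean, eigenspectra):
    """PCA amplitudes of each spectrum: dot products with the eigenspectra."""
    return (np.atleast_2d(spectra) - mean) @ eigenspectra.T


def nearest(amplitudes, query, k):
    """Indices of the k stored amplitude vectors closest to `query`."""
    dist = np.linalg.norm(amplitudes - query, axis=1)
    return np.argsort(dist)[:k]


# Synthetic stand-in for theory spectra: random mixtures of three smooth
# templates plus small noise (not real model output).
rng = np.random.default_rng(0)
wave = np.linspace(0.0, 1.0, 200)
templates = np.stack([np.sin(3 * wave), np.cos(5 * wave), wave ** 2])
theory = rng.normal(size=(500, 3)) @ templates + 0.01 * rng.normal(size=(500, 200))

mean, eigen = fit_eigenspectra(theory, n_components=3)
stored = project(theory, mean, eigen)  # amplitudes that would go in the DB

# "Uploaded" spectrum: theory spectrum 42 with a little extra noise.
upload = theory[42] + 0.01 * rng.normal(size=200)
query = project(upload, mean, eigen)[0]
matches = nearest(stored, query, k=5)
```

Each 200-pixel spectrum is reduced to three amplitudes, so the similarity search compares short vectors instead of running a chi-squared over full spectra. The returned indices would then key into the stored theory spectra and their formation histories.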

  11. Issues • Dealing with errors, gaps: "gappy PCA" (Connolly & Szalay) • Normalization: • incoming spectrum in general from very different dataset, needs common normalization • Incoming set will have gaps, errors • Ad hoc normalization possible (and works quite well) • Indexing of complex multi-dimensional point set for quick k-nearest-neighbours search (Voronoi? See Laszlo's work)
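The gappy PCA idea mentioned on this slide replaces the plain dot-product projection with a weighted least-squares fit, so that missing or noisy pixels do not bias the amplitudes. A minimal sketch, assuming per-pixel weights that are zero inside gaps; the function name, the synthetic eigenspectra and the filler value are hypothetical, and normalization is not handled here.

```python
import numpy as np


def gappy_project(flux, weights, mean, eigenspectra):
    """Weighted least-squares PCA amplitudes for a spectrum with gaps.

    weights: 0 where a pixel is missing, otherwise e.g. 1 / sigma**2.
    Solves min_a sum_l w_l * (flux_l - mean_l - sum_i a_i * e_il)**2.
    Masked pixels must still hold finite numbers (NaN would propagate).
    """
    # Normal equations: M a = F, with M_ij = sum_l w_l * e_il * e_jl.
    weighted = eigenspectra * weights
    m = weighted @ eigenspectra.T
    f = weighted @ (flux - mean)
    return np.linalg.solve(m, f)


rng = np.random.default_rng(1)
n_pix = 100
# Orthonormal stand-in eigenspectra from a QR decomposition (synthetic).
q, _ = np.linalg.qr(rng.normal(size=(n_pix, 3)))
eigen = q.T
mean = np.ones(n_pix)
true_a = np.array([2.0, -1.0, 0.5])
flux = mean + true_a @ eigen

weights = np.ones(n_pix)
weights[20:50] = 0.0   # a gap: these pixels carry no information
flux[20:50] = -999.0   # arbitrary filler value inside the gap

a_gappy = gappy_project(flux, weights, mean, eigen)
a_naive = (flux - mean) @ eigen.T  # plain projection, ignores the gap
```

In this noise-free example the weighted fit recovers the input amplitudes, while the plain projection is thrown off by the filler values in the gap.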

  12. Normalized gappy PCA • Fit normalization factor at same time as PCA amplitudes. Model: [equation image] • Minimize (over a_i and N): [equation image]
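The equations on this slide are not in the transcript. One form consistent with the bullet text is given below as an assumption, not as the speaker's formula: N rescales the observed spectrum f_λ, m_λ is the mean theory spectrum, e_iλ are the eigenspectra, a_i the amplitudes, and w_λ per-pixel weights that vanish in gaps. All of this notation is assumed.

```latex
N f_\lambda \approx m_\lambda + \sum_i a_i \, e_{i\lambda},
\qquad
\chi^2(a_i, N) = \sum_\lambda w_\lambda
  \Bigl( N f_\lambda - m_\lambda - \sum_i a_i \, e_{i\lambda} \Bigr)^2 .
```

Under this assumption chi-squared is quadratic in the unknowns (a_i, N), so setting its derivatives with respect to each a_i and N to zero gives one linear system that yields the normalization and the amplitudes together.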

  16. So far • Ran PCA on BC03 stochastic bursts (Vivienne) • On first GalICS+milli-Millennium spectra (Jeremy) • Projected SDSS spectra on both • Defined a PCA data model/schema • Stored PCAs in database • TOPCAT

  17. PCA data model (RDB schema available)

  20. milliMil-GalICS PC1 vs PC2 Voronoi tessellation

  21. Issues for query-by-example • Overlap quite good, but good enough? • GalICS spread less than SDSS. • BC03 comparable with SDSS, but different slope. • Systematics • Model: • physics very preliminary (see Blaizot & de Lucia?) • resolution effects • Preprocessing SDSS galaxies • Rebinning: different algorithms give comparable results • (slightly) wrong redshift? Can be easily simulated • Projection algorithm: normalization does not affect outcome • Observational systematics: use virtual telescope (+ virtual spectrograph) to test on the theory spectra. Easier to blow up simulation than to shrink observation cloud

  22. Comments • Millennium database being used for science projects (Guo Qi) • SDSS peripherals used for science projects (see Vivienne's talk, Ben Panter) • Use of MyDB for debugging and testing (Jeremy) • Please give comments, feedback.
