1 / 50

Introduction to the Millennium Database with an SQL tutorial

Introduction to the Millennium Database with an SQL tutorial. Overview. Why relational database ? Overview relational databases general Millennium DB design SQL Tutorial Science queries Tools Advanced subjects (not now). Website documentation http://www.g-vo.org/Millennium/Help.

Download Presentation

Introduction to the Millennium Database with an SQL tutorial

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Introduction to the Millennium Database with an SQL tutorial Millennium DB Tutorial

  2. Overview • Why relational database ? • Overview relational databases • general • Millennium DB design • SQL Tutorial • Science queries • Tools • Advanced subjects (not now) Millennium DB Tutorial

  3. Website documentationhttp://www.g-vo.org/Millennium/Help Millennium DB Tutorial

  4. Why use relational database ? • encapsulation of data in terms of rigorous logical model • no need to know about internals of data storage • forces one to think carefully about data structure • ANSI standard query language (SQL) for finding information one is interested in • remote filtering • speeds up path from science question to answer • facilitates communication • many implementations, commercial and open source • advanced query optimizers (indexes, clustering) Millennium DB Tutorial

  5. Relational Database concepts Millennium database design Millennium DB Tutorial

  6. Relational database stores data in relations ( = tables) Millennium DB Tutorial

  7. Tables • Tables have names • Related data values are stored in rows • Rows have columns • all the same for a given table • Columns have names and data types • Rows often have a unique identifier consisting of the values of >= 1 columns: primary key Millennium DB Tutorial

  8. Primary Key Column Column Foreign Key Columns Row Millennium DB Tutorial

  9. Foreign keys • Database can contain many tables • The set of table definitions in a database is called the schemaof the database • Tables can related by foreign keys: pointers (by value) from a row in one table to a row in another (or possibly the same) table • Why not combine these rows into one table ? • Consider storing galaxies, with info about their sub-halo as well as the FOF groups these live in.Note, a subhalo contains >=1 galaxies, a FOF halo >= 0 subhalos Millennium DB Tutorial

  10. One table: redundancy GalaxyEtc Millennium DB Tutorial

  11. Normalization FOF Galaxy SubHalo Millennium DB Tutorial

  12. Millennium database FOF DHalo Bower2006a SubHalo MPAMocks DSubHalo MField DeLucia2006a MPAHalo Millennium DB Tutorial

  13. Web browser: http://www.g-vo.org/Millenniumhttp://www.g-vo.org/MyMillennium Millennium DB Tutorial

  14. SQL Tutorial Millennium DB Tutorial

  15. SQL • Sequentiual Query Language • Filtering, combining, sub-setting of tables • Functions, procedures, aggregations • Data manipulation: insert/update/delete • A query produces tabular results, which can be used as tables again in sub-queries, or stored in a database • Table creation... Millennium DB Tutorial

  16. Table creation statement create table MPAHalo ( haloId long not null, descendantId long, -- foreign key lastProgenitorId long, -- foreign key snapnum integer, redshift real, x real,y real,z real, np integer, velDisp real, vmax real, ..., primary key (haloId) ); Millennium DB Tutorial

  17. SELECT ... FROM ... WHERE ... 1. select * from MPAHalo 2. select snapnum, redshift, np from MPAHalo 3. select * from MPAHalo where redshift = 0 Millennium DB Tutorial

  18. WHERE conditions • = <> != < > <= >= • np between 100 and 200 • name like ‘%Frenk’ • a=b and d=e • a=b or e=d • id in (1,2,3) • a is null • a is not null • exists ... (later) Millennium DB Tutorial

  19. Custom column names select snapnum as snapshotIndex , redshift as z , np as numberOfParticles from MPAHalo Millennium DB Tutorial

  20. Demo queries select * from snapshots select haloid,snapnum from MPAHalo where np = 100 select x,y from MPAHalo where z between 10 and 12 and np > 50 and snapnum = 63 Millennium DB Tutorial

  21. ORDER BY ... [ASC | DESC] select h.* from MPAHalo h order by h.snapnum desc , h.x asc Millennium DB Tutorial

  22. TOP select top 10 haloid, np from mpahalo where snapnum = 63 order by np desc Millennium DB Tutorial

  23. Aggregation: count, sum, max, min, avg, stddev select count(*) as num , max(stellarmass) as maxmass , avg(stellarmass) as avgmass from delucia2006a where snapnum = 63 and type = 1 Millennium DB Tutorial

  24. JOIN (note the aliases) select h.haloid, g.stellarMass from delucia2006a g , mpahalo h where h.np = 1000 and g.haloid = h.haloid Millennium DB Tutorial

  25. Demo: galaxies in massive halos select h.haloId, g.* from DeLucia2006a g , MPAHalo h where h.snapnum = 63 and h.np between 10000 and 11000 and g.haloId = h.haloId Millennium DB Tutorial

  26. Demo: direct progenitors of massive halos select prog.* from MPAHalo prog , MPAHalo des where des.haloId = prog.descendantId and des.np > 10000 and des.snapnum = 63 Millennium DB Tutorial

  27. GROUP BY select redshift , type , count(*) as numGal , avg(stellarMass) as m_avg , max(stellarMass) as m_max from DeLucia2006a group by redshift, type order by redshift, type Millennium DB Tutorial

  28. Sub-selects select g.galaxyId from DeLucia2006a g , (select top 10 haloId from mpahalo where snapnum = 63 order by np desc) mh where g.haloId = mh.haloId Millennium DB Tutorial

  29. Science questions as SQL Millennium DB Tutorial

  30. Return the galaxies residing in halos of mass between 10^13 and 10^14 solar masses. • Return the galaxy content at z=3 of the progenitors of a halo identified at z=0 • Return all the galaxies within a sphere of radius 3Mpc around a particular halo • Return the complete halo merger tree for a halo identified at z=0 • Find positions and velocities for all galaxies at redshift zero with B-luminosity, colour and bulge-to-disk ratio within given intervals. • Find properties of all galaxies in haloes of mass 10**14 at redshift 1 which have had a major merger (mass-ratio < 4:1) since redshift 1.5. • Find all the z=3 progenitors of z=0 red ellipticals (i.e. B-V>0.8 B/T > 0.5) • Find the descendents at z=1 of all LBG's (i.e. galaxies with SFR>10 Msun/yr) at z=3 • Make a list of all haloes at z=3 which contain a galaxy of mass >10**9 Msun which is a progenitor of BCG's in z=0 cluster of mass >10**14.5 • Find all z=3 galaxies which have NO z=0 descendant. • Return the complete galaxy merging history for a given z=0 galaxy. • Find all the z=2 galaxies which were within 1Mpc of a LBG (i.e. SFR>10Msun/yr) at some previous redshift. • Find the multiplicity function of halos depending on their environment (overdensity of density field smoothed on certain scale) • Find the dependency of halo formation times on environment (“Gao-effect”) Motivation for data model Millennium DB Tutorial

  31. 5. Find positions and velocities for all galaxies at redshift zero with B-luminosity, colour and bulge-to-disk ratio within given intervals. select x,y,z,velX, velY, velZ from DeLucia2006a where mag_b between -23 and -18 and bulgeMass >= .9*stellarMass and snapnum = 50 Millennium DB Tutorial

  32. 4. Return the complete halo merger tree for a halo identified at z=0 Millennium DB Tutorial

  33. Efficient storage of merger trees in a relational database • Goal: allow queries for the formation history of any object • No recursion possible in RDB, nor desired • Method: • depth first ordering of trees • label by rank in order • pointer to “last progenitor” below each node • all progenitors have label BETWEEN label of root AND that of last progenitor • cluster table on label Millennium DB Tutorial

  34. Merger trees Millennium DB Tutorial

  35. select prog.snapnum , prog.x , prog.y , prog.np from millimil..mpahalo des , millimil..mpahalo prog where prog.haloId between des.haloId and des.lastProgenitorId and des.haloId = 0 Millennium DB Tutorial

  36. Some more features of the merger tree data model • Leaves : • select galaxyId as leaf • from galaxies des • where galaxyId • = lastProgenitorId Branching points : select descendantId from galaxies des where descendantId != -1 group by descendantId having count(*) > 1 Millennium DB Tutorial

  37. Main branches • Roots and leaves: select des.galaxyId as rootId , min(prog.lastprogenitorid) as leafId into rootLeaf from mpagalaxies..delucia2006a des , mpagalaxies.. delucia2006a prog where des.galaxyId = 0 and prog.galaxyId between des.galaxyId and des.lastProgenitorId • Main branch select rl.rootId, b.* from rootLeaf rl , mpagalaxies..delucia2006a b where b.galaxyId between rl.rootId and rl.leafId Millennium DB Tutorial

  38. Find all halos in a subvolume of space:15 <= x <= 2020 <= y <= 255 <= z <= 10 Millennium DB Tutorial

  39. select x,y,z from mpahalo where snapnum = 63 and x between 10 and 20 and y between 20 and 30 and z between 0 and 10 Inefficient, even when indexed ! Millennium DB Tutorial

  40. x y z 15.001083 42.471325 24.673561 15.001247 58.420914 42.722874 15.002215 38.042484 29.557423 15.002735 50.487785 57.716877 15.00275320.0001778.21466 15.005095 13.637599 16.135191 15.00659322.170828 48.242783 15.01148824.824438 19.773285 15.011741 48.099907 11.500685 15.01186823.312265 27.858799 15.01306523.969515 18.883507 15.013158 56.041866 40.82894 15.014361 59.503357 45.31733 15.017322 46.257664 44.37695 15.01820227.3338959.441319 Millennium DB Tutorial

  41. Spatial indexes • Performance of finding things is improved if those things are co-located on disk: ordering, indices • Co-locating a 3D configuration of points on a 1D disk can only be done approximately • Space filling curves: Peano-Hilbert, Z-curve Millennium DB Tutorial

  42. Zones Millennium DB Tutorial

  43. Zone index • Course sampling of points in multiple dimensions allows simple multi-dimensional ordering • ix = floor(x/10Mpc)iy = floor(y/10Mpc)iz = floor(z/10Mpc) • index on (snapnum,ix,iy,iz,x,y,z,galaxyId) Millennium DB Tutorial

  44. IX IY IZ X Y Z 1 2 0 15.061804 20.891907 4.4156647 1 2 0 15.069336 23.437601 9.812217 1 2 0 15.100678 20.905642 4.613036 1 2 0 15.173968 22.36883 8.01832 1 2 0 15.194122 20.67583 4.8034463 1 2 0 15.2500305 24.246683 1.6651521 1 2 0 15.365576 23.290754 9.404872 1 2 0 15.372606 20.203691 2.0006201 1 2 0 15.524696 21.03997 4.280077 1 2 0 15.583943 22.344622 9.421347 1 2 0 15.6358385 26.785904 9.881406 1 2 0 15.66383 22.829983 7.137772 1 2 0 15.673803 26.918291 3.302736 1 2 0 15.717824 22.365341 9.221828 1 2 0 15.847992 24.700747 1.389664 1 2 0 15.883896 22.593819 7.277129 1 2 0 15.91041 26.531118 2.5693457 1 2 0 15.916905 27.137867 4.289855 1 2 0 16.047333 28.93811 5.414605 Millennium DB Tutorial

  45. Return B-band luminosity function of galaxies residing in halos of mass between 10^13 and 10^14 solar masses. select .2*floor(5*g.mag_b) as magB , count(*) as num from DeLucia2006a g , MPAHalo h where g.haloId = h.haloId and h.m_TopHat between 1000 and 10000 and h.redshift=0 group by .2*floor(5*g.mag_b) Millennium DB Tutorial

  46. 13.Find the dependency of halo formation times on environment Millennium DB Tutorial

  47. select zForm , avg(g5) as g5 , avg(g10) as g10 from MMField , ( select des.haloId, des.phkey, max(PROG.redshift) as zForm from MPAHalo PROG, MPAHalo DES where DES.snapnum = 63 and PROG.haloId between DES.haloId and DES.lastProgenitorId and prog.np >= des.np/2 and des.np between 100 and 200 group by des.haloId, des.phkey ) t where t.phkey = f.phkey and f.snapnum=63 group by zForm Millennium DB Tutorial

  48. Tools Millennium DB Tutorial

  49. Other tools • wget, UNIX/LINUX command wget "http://www.g-vo.org/Millennium?action=doQuery & SQL=select top 10 haloid,snapnum, x,y,z,np from mpahalo" • Use in R (similar in IDL) ... • TOPCAT Millennium DB Tutorial

  50. Thank you. Millennium DB Tutorial

More Related