500 likes | 656 Views
Introduction to the Millennium Database with an SQL tutorial. Overview. Why relational database ? Overview relational databases general Millennium DB design SQL Tutorial Science queries Tools Advanced subjects (not now). Website documentation http://www.g-vo.org/Millennium/Help.
E N D
Introduction to the Millennium Database with an SQL tutorial Millennium DB Tutorial
Overview • Why relational database ? • Overview relational databases • general • Millennium DB design • SQL Tutorial • Science queries • Tools • Advanced subjects (not now) Millennium DB Tutorial
Website documentationhttp://www.g-vo.org/Millennium/Help Millennium DB Tutorial
Why use relational database ? • encapsulation of data in terms of rigorous logical model • no need to know about internals of data storage • forces one to think carefully about data structure • ANSI standard query language (SQL) for finding information one is interested in • remote filtering • speeds up path from science question to answer • facilitates communication • many implementations, commercial and open source • advanced query optimizers (indexes, clustering) Millennium DB Tutorial
Relational Database concepts Millennium database design Millennium DB Tutorial
Relational database stores data in relations ( = tables) Millennium DB Tutorial
Tables • Tables have names • Related data values are stored in rows • Rows have columns • all the same for a given table • Columns have names and data types • Rows often have a unique identifier consisting of the values of >= 1 columns: primary key Millennium DB Tutorial
Primary Key Column Column Foreign Key Columns Row Millennium DB Tutorial
Foreign keys • Database can contain many tables • The set of table definitions in a database is called the schemaof the database • Tables can related by foreign keys: pointers (by value) from a row in one table to a row in another (or possibly the same) table • Why not combine these rows into one table ? • Consider storing galaxies, with info about their sub-halo as well as the FOF groups these live in.Note, a subhalo contains >=1 galaxies, a FOF halo >= 0 subhalos Millennium DB Tutorial
One table: redundancy GalaxyEtc Millennium DB Tutorial
Normalization FOF Galaxy SubHalo Millennium DB Tutorial
Millennium database FOF DHalo Bower2006a SubHalo MPAMocks DSubHalo MField DeLucia2006a MPAHalo Millennium DB Tutorial
Web browser: http://www.g-vo.org/Millenniumhttp://www.g-vo.org/MyMillennium Millennium DB Tutorial
SQL Tutorial Millennium DB Tutorial
SQL • Sequentiual Query Language • Filtering, combining, sub-setting of tables • Functions, procedures, aggregations • Data manipulation: insert/update/delete • A query produces tabular results, which can be used as tables again in sub-queries, or stored in a database • Table creation... Millennium DB Tutorial
Table creation statement create table MPAHalo ( haloId long not null, descendantId long, -- foreign key lastProgenitorId long, -- foreign key snapnum integer, redshift real, x real,y real,z real, np integer, velDisp real, vmax real, ..., primary key (haloId) ); Millennium DB Tutorial
SELECT ... FROM ... WHERE ... 1. select * from MPAHalo 2. select snapnum, redshift, np from MPAHalo 3. select * from MPAHalo where redshift = 0 Millennium DB Tutorial
WHERE conditions • = <> != < > <= >= • np between 100 and 200 • name like ‘%Frenk’ • a=b and d=e • a=b or e=d • id in (1,2,3) • a is null • a is not null • exists ... (later) Millennium DB Tutorial
Custom column names select snapnum as snapshotIndex , redshift as z , np as numberOfParticles from MPAHalo Millennium DB Tutorial
Demo queries select * from snapshots select haloid,snapnum from MPAHalo where np = 100 select x,y from MPAHalo where z between 10 and 12 and np > 50 and snapnum = 63 Millennium DB Tutorial
ORDER BY ... [ASC | DESC] select h.* from MPAHalo h order by h.snapnum desc , h.x asc Millennium DB Tutorial
TOP select top 10 haloid, np from mpahalo where snapnum = 63 order by np desc Millennium DB Tutorial
Aggregation: count, sum, max, min, avg, stddev select count(*) as num , max(stellarmass) as maxmass , avg(stellarmass) as avgmass from delucia2006a where snapnum = 63 and type = 1 Millennium DB Tutorial
JOIN (note the aliases) select h.haloid, g.stellarMass from delucia2006a g , mpahalo h where h.np = 1000 and g.haloid = h.haloid Millennium DB Tutorial
Demo: galaxies in massive halos select h.haloId, g.* from DeLucia2006a g , MPAHalo h where h.snapnum = 63 and h.np between 10000 and 11000 and g.haloId = h.haloId Millennium DB Tutorial
Demo: direct progenitors of massive halos select prog.* from MPAHalo prog , MPAHalo des where des.haloId = prog.descendantId and des.np > 10000 and des.snapnum = 63 Millennium DB Tutorial
GROUP BY select redshift , type , count(*) as numGal , avg(stellarMass) as m_avg , max(stellarMass) as m_max from DeLucia2006a group by redshift, type order by redshift, type Millennium DB Tutorial
Sub-selects select g.galaxyId from DeLucia2006a g , (select top 10 haloId from mpahalo where snapnum = 63 order by np desc) mh where g.haloId = mh.haloId Millennium DB Tutorial
Science questions as SQL Millennium DB Tutorial
Return the galaxies residing in halos of mass between 10^13 and 10^14 solar masses. • Return the galaxy content at z=3 of the progenitors of a halo identified at z=0 • Return all the galaxies within a sphere of radius 3Mpc around a particular halo • Return the complete halo merger tree for a halo identified at z=0 • Find positions and velocities for all galaxies at redshift zero with B-luminosity, colour and bulge-to-disk ratio within given intervals. • Find properties of all galaxies in haloes of mass 10**14 at redshift 1 which have had a major merger (mass-ratio < 4:1) since redshift 1.5. • Find all the z=3 progenitors of z=0 red ellipticals (i.e. B-V>0.8 B/T > 0.5) • Find the descendents at z=1 of all LBG's (i.e. galaxies with SFR>10 Msun/yr) at z=3 • Make a list of all haloes at z=3 which contain a galaxy of mass >10**9 Msun which is a progenitor of BCG's in z=0 cluster of mass >10**14.5 • Find all z=3 galaxies which have NO z=0 descendant. • Return the complete galaxy merging history for a given z=0 galaxy. • Find all the z=2 galaxies which were within 1Mpc of a LBG (i.e. SFR>10Msun/yr) at some previous redshift. • Find the multiplicity function of halos depending on their environment (overdensity of density field smoothed on certain scale) • Find the dependency of halo formation times on environment (“Gao-effect”) Motivation for data model Millennium DB Tutorial
5. Find positions and velocities for all galaxies at redshift zero with B-luminosity, colour and bulge-to-disk ratio within given intervals. select x,y,z,velX, velY, velZ from DeLucia2006a where mag_b between -23 and -18 and bulgeMass >= .9*stellarMass and snapnum = 50 Millennium DB Tutorial
4. Return the complete halo merger tree for a halo identified at z=0 Millennium DB Tutorial
Efficient storage of merger trees in a relational database • Goal: allow queries for the formation history of any object • No recursion possible in RDB, nor desired • Method: • depth first ordering of trees • label by rank in order • pointer to “last progenitor” below each node • all progenitors have label BETWEEN label of root AND that of last progenitor • cluster table on label Millennium DB Tutorial
Merger trees Millennium DB Tutorial
select prog.snapnum , prog.x , prog.y , prog.np from millimil..mpahalo des , millimil..mpahalo prog where prog.haloId between des.haloId and des.lastProgenitorId and des.haloId = 0 Millennium DB Tutorial
Some more features of the merger tree data model • Leaves : • select galaxyId as leaf • from galaxies des • where galaxyId • = lastProgenitorId Branching points : select descendantId from galaxies des where descendantId != -1 group by descendantId having count(*) > 1 Millennium DB Tutorial
Main branches • Roots and leaves: select des.galaxyId as rootId , min(prog.lastprogenitorid) as leafId into rootLeaf from mpagalaxies..delucia2006a des , mpagalaxies.. delucia2006a prog where des.galaxyId = 0 and prog.galaxyId between des.galaxyId and des.lastProgenitorId • Main branch select rl.rootId, b.* from rootLeaf rl , mpagalaxies..delucia2006a b where b.galaxyId between rl.rootId and rl.leafId Millennium DB Tutorial
Find all halos in a subvolume of space:15 <= x <= 2020 <= y <= 255 <= z <= 10 Millennium DB Tutorial
select x,y,z from mpahalo where snapnum = 63 and x between 10 and 20 and y between 20 and 30 and z between 0 and 10 Inefficient, even when indexed ! Millennium DB Tutorial
x y z 15.001083 42.471325 24.673561 15.001247 58.420914 42.722874 15.002215 38.042484 29.557423 15.002735 50.487785 57.716877 15.00275320.0001778.21466 15.005095 13.637599 16.135191 15.00659322.170828 48.242783 15.01148824.824438 19.773285 15.011741 48.099907 11.500685 15.01186823.312265 27.858799 15.01306523.969515 18.883507 15.013158 56.041866 40.82894 15.014361 59.503357 45.31733 15.017322 46.257664 44.37695 15.01820227.3338959.441319 Millennium DB Tutorial
Spatial indexes • Performance of finding things is improved if those things are co-located on disk: ordering, indices • Co-locating a 3D configuration of points on a 1D disk can only be done approximately • Space filling curves: Peano-Hilbert, Z-curve Millennium DB Tutorial
Zones Millennium DB Tutorial
Zone index • Course sampling of points in multiple dimensions allows simple multi-dimensional ordering • ix = floor(x/10Mpc)iy = floor(y/10Mpc)iz = floor(z/10Mpc) • index on (snapnum,ix,iy,iz,x,y,z,galaxyId) Millennium DB Tutorial
IX IY IZ X Y Z 1 2 0 15.061804 20.891907 4.4156647 1 2 0 15.069336 23.437601 9.812217 1 2 0 15.100678 20.905642 4.613036 1 2 0 15.173968 22.36883 8.01832 1 2 0 15.194122 20.67583 4.8034463 1 2 0 15.2500305 24.246683 1.6651521 1 2 0 15.365576 23.290754 9.404872 1 2 0 15.372606 20.203691 2.0006201 1 2 0 15.524696 21.03997 4.280077 1 2 0 15.583943 22.344622 9.421347 1 2 0 15.6358385 26.785904 9.881406 1 2 0 15.66383 22.829983 7.137772 1 2 0 15.673803 26.918291 3.302736 1 2 0 15.717824 22.365341 9.221828 1 2 0 15.847992 24.700747 1.389664 1 2 0 15.883896 22.593819 7.277129 1 2 0 15.91041 26.531118 2.5693457 1 2 0 15.916905 27.137867 4.289855 1 2 0 16.047333 28.93811 5.414605 Millennium DB Tutorial
Return B-band luminosity function of galaxies residing in halos of mass between 10^13 and 10^14 solar masses. select .2*floor(5*g.mag_b) as magB , count(*) as num from DeLucia2006a g , MPAHalo h where g.haloId = h.haloId and h.m_TopHat between 1000 and 10000 and h.redshift=0 group by .2*floor(5*g.mag_b) Millennium DB Tutorial
13.Find the dependency of halo formation times on environment Millennium DB Tutorial
select zForm , avg(g5) as g5 , avg(g10) as g10 from MMField , ( select des.haloId, des.phkey, max(PROG.redshift) as zForm from MPAHalo PROG, MPAHalo DES where DES.snapnum = 63 and PROG.haloId between DES.haloId and DES.lastProgenitorId and prog.np >= des.np/2 and des.np between 100 and 200 group by des.haloId, des.phkey ) t where t.phkey = f.phkey and f.snapnum=63 group by zForm Millennium DB Tutorial
Tools Millennium DB Tutorial
Other tools • wget, UNIX/LINUX command wget "http://www.g-vo.org/Millennium?action=doQuery & SQL=select top 10 haloid,snapnum, x,y,z,np from mpahalo" • Use in R (similar in IDL) ... • TOPCAT Millennium DB Tutorial
Thank you. Millennium DB Tutorial