1 / 17

MonetDB/SQL Meets SkyServer: the Challenges of a Scientific Database

MonetDB/SQL Meets SkyServer: the Challenges of a Scientific Database. Milena Ivanova, Niels Nes, Romulo Goncalves, Martin Kersten CWI, Amsterdam Presented at SSDBM, July 2007, Banff, Canada. 230 million object images 1 million spectra 4TB catalog data 9TB images.

Download Presentation

MonetDB/SQL Meets SkyServer: the Challenges of a Scientific Database

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. MonetDB/SQL Meets SkyServer:the Challenges of a Scientific Database Milena Ivanova, Niels Nes, Romulo Goncalves, Martin Kersten CWI, Amsterdam Presented at SSDBM, July 2007, Banff, Canada

  2. 230 million object images • 1 million spectra • 4TB catalog data • 9TB images A project to make a map of a large part of the Universe SkyServer provides public access to SDSS for astronomers, students, and wide public

  3. SkyServer Schema Vertical fragment of 100+ popular columns 446 columns >370 million rows Materialized join of Photo and Spectra

  4. Outline • MonetDB/SQL • SkyServer porting lessons • Query log lessons • Evaluation • Outlook

  5. MonetDB Background PhotoObjAll Ra BAT Dec BAT U BAT

  6. select count(*) from photoobjall; SQL XQuery function user.s3_1():void; X1:bat[:oid,:lng] := sql.bind("sys","photoobjall","objid",0); X6:bat[:oid,:lng] := sql.bind("sys","photoobjall","objid",1); X9:bat[:oid,:lng] := sql.bind("sys","photoobjall","objid",2); X13:bat[:oid,:oid] := sql.bind_dbat("sys","photoobjall",1); X8 := algebra.kunion(X1,X6); X11 := algebra.kdifference(X8,X9); X12 := algebra.kunion(X11,X9); X14 := bat.reverse(X13); X15 := algebra.kdifference(X12,X14); X16 := calc.oid(0@0); X18 := algebra.markT(X15,X16); X19 := bat.reverse(X18); X20 := aggr.count(X19); sql.exportValue(1,"sys.","count_","int",32,0,6,X20,""); end s3_1; MonetDB Kernel MonetDB Architecture MAL Tactical Optimizer MAL MonetDB Server

  7. SkyServer with MonetDB Goal: To provide SkyServer mirror with similar functionality using MonetDB Three phases: 1%, 10%, entire SDSS data set Can we • Do better in terms of performance and functionality? • Improve query processing by novel parallelism and query cracking techniques? • Extend functionality to support, e.g. LOFAR?

  8. Portability Lessons • Need for rich SQL environment (PSM) • Cast to SQL:2003 standard • Replacement of data types and operations • Specific extensions ignored or replaced • Avoid data redundancy • Auxiliary tables replaced by views:10% size reduction

  9. Spatial Search Lesson • HTM (Hierarchical Triangular Mesh) • Implemented in C++, C# • Good for point-near-point and point-in-region queries • Zones • Implemented in SQL • Good for point-near-point (x3) • Efficient for batch-oriented spatial join(x32) • Enables SQL optimizer usage

  10. Query Log Lessons • Query logs important for both application and science • Analysed 1.2M queries, August 2006 • Spatial access prevails (83%) • Small core of photo and spectro tables accessed • 64% photo, 44% spectro, 27% both

  11. Common Patterns • Limited number of query patterns • Correlation to web site interface • Most popular query (25%) SELECT top 10 p.objID, p.run, p.rerun, p.camcol, p.field, p.obj, p.type, p.ra, p.dec, p.u, p.g, p.r, p.i, p.z, p.Err_u, p.Err_g, p.Err_r, p.Err_i, p.Err_z FROM fGetNearbyObjEq(195,2.5,3) n, PhotoPrimary p WHERE n.objID = p.objID;

  12. Query coverage on the sky Query coverage on the sky Query Coverage

  13. Spatial Overlap • 24% queries overlap • Mean sequence length of 9.4, max of 6200 • Overlap and equality patterns for script-based interaction • Zoom in/zoom out patterns for manual interaction

  14. Evaluation on 1GB

  15. Evaluation on 100GB • ‘Color-cut’ for low-z quasars SELECT g, run, rerun, camcol, field, objID, FROM Galaxy WHERE ( ( g <= 22) and (u - g >= -0.27) and (u - g < 0.71) and (g - r >= -0.24) and (g - r < 0.35) and (r - i >= -0.27) and (r - i < 0.57) and (i - z >= -0.35) and (i - z < 0.7) ); • Moving asteroids SELECT objID, sqrt(power(rowv,2) + power(colv,2)) as velocity FROM PhotoObj WHERE power(rowv,2) + power(colv,2) > 50 and rowv >= 0 and colv >= 0;

  16. Status • Staircase to the sky • 1GB: done • 100GB: towards completion • Entire 4TB DR6: in progress • Web site

  17. Inspirations • Self-organization vs. hard-coded zoning • Adaptive segmentation (ICDE’08) • Adaptive replication (EDBT’08) • Results caching and reuse • Workload-driven optimization

More Related