The Researcher’s Guide to the Data Deluge: Querying a Scientific Database in just a Few Seconds

The Researcher’s Guide to the Data Deluge: Querying a Scientific Database in just a Few Seconds. Martin L. Kersten, Stratos Idreos, Stefan Manegold, Erietta Liarou (and members of the CWI database group). Science Feb’11 Data: http://www.sciencemag.org/site/special/data/


Presentation Transcript


  1. The Researcher’s Guide to the Data Deluge: Querying a Scientific Database in just a Few Seconds. Martin L. Kersten, Stratos Idreos, Stefan Manegold, Erietta Liarou (and members of the CWI database group)

  2. Science Feb’11 Data http://www.sciencemag.org/site/special/data/

  3. Science Feb’11 Data …. We have recently passed the point where more data is being collected than we can physically store. This storage gap will widen rapidly in data-intensive fields. Thus, decisions will be needed on which data to archive and which to discard. A separate problem is how to access and use these data. Many data sets are becoming too large to download. Even fields with well-established data archives, such as genomics, are facing new and growing challenges in data volume and management. And even where accessible, much data in many fields is too poorly organized to enable it to be efficiently used….

  4. Science Feb’11 Data

  5. Science Feb’11 Data

  6. Database research vision
      • Throwing away data before harvesting it is the worst ROI one can imagine.
      • The LSST (Large Synoptic Survey Telescope) budget is $100M.
      • During its ten-year survey, LSST will acquire 5.6 million 15-second images, spread over 2.8 million pointings.
      • 20 billion rows in the Object table, 3 trillion rows in the Source table.

  7. Database technology is not designed for the challenges. All sizes don’t fit.

  8. The Dawn of a new Database Era. Capture the query intent!

  9. FIVE STEPS INTO THE FUTURE
      • One-minute DBMS for real-time performance.
      • Multi-scale query processing for gradual exploration.
      • Post-processing for conveying meaningful data.
      • Query morphing to adjust for proximity results.
      • Query alternatives to cope with lack of providence.

  10. One-minute database kernels. Step 1: Do the BEST you can within a given time frame!
      • Research how to …
          • organize query evaluation around what is available at low cost
          • redesign algorithms and operators so that they adaptively avoid the expensive steps normally needed for correctness and completeness
          • stop processing after an agreed-upon time
          • continue processing upon request.
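
To make Step 1 concrete, here is a minimal Python sketch of a time-bounded aggregate, assuming the query is a plain SUM over a stream of values. The names `budgeted_sum` and `PartialSum` are hypothetical and not part of any CWI or MonetDB API. The function stops after the agreed-upon budget and hands back its state, so passing the same iterator and state again continues the computation upon request; the `clock` parameter exists only so the deadline can be simulated in tests.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterator, Optional


@dataclass
class PartialSum:
    """Running state of a time-bounded SUM/COUNT (hypothetical helper)."""
    total: float = 0.0
    count: int = 0
    done: bool = False  # True once every input row has been consumed

    def mean(self) -> Optional[float]:
        return self.total / self.count if self.count else None


def budgeted_sum(rows: Iterator[float], budget_s: float,
                 state: Optional[PartialSum] = None,
                 clock: Callable[[], float] = time.monotonic) -> PartialSum:
    """Aggregate rows until the time budget runs out or the input ends.

    The (iterator, state) pair acts as a continuation: calling again with
    the same iterator and the returned state resumes where processing stopped.
    """
    if state is None:
        state = PartialSum()
    deadline = clock() + budget_s
    for value in rows:
        state.total += value
        state.count += 1
        if clock() >= deadline:
            return state  # out of time: return what we have so far
    state.done = True  # input exhausted: the answer is now exact
    return state


if __name__ == "__main__":
    rows = iter(range(1_000_000))
    answer = budgeted_sum(rows, budget_s=0.05)  # fast first answer
    print(answer.count, answer.mean(), answer.done)
    while not answer.done:                      # continue upon request
        answer = budgeted_sum(rows, budget_s=0.05, state=answer)
    print(answer.count, answer.mean(), answer.done)
```

Stopping early on an ordered input yields a partial answer over a prefix, not an unbiased estimate; reading rows in random order is one way to make such early answers statistically meaningful.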

  11. Multi-scale query processing. Step 2: Use a staging scheme for query evaluation!
      • Research how to …
          • partition the database to produce incremental, valuable results: D => D1 union (D2.1 union (D2.2 union (D2.3 union …)))
          • avoid harmful SELECT * FROM table queries
          • break a query into a converging query sequence: Q => Q1 union Q2 => Q1 union Q2.1 union Q2.2 => Q1 union Q2.1 union Q2.2.1 union Q2.2.2 => …
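
The staging scheme of Step 2 can be sketched in the same spirit. This version assumes a random order over the rows and stages that double in size (D1, then D2.1, D2.2, ...); both choices are illustrative assumptions rather than the partitioning the authors propose, and `staged_partitions` and `multiscale_mean` are hypothetical names. Each stage unions one more partition into the answer, so the sequence of running means converges to the exact result once the last stage is processed.

```python
import random
from typing import Iterator, List, Sequence, Tuple


def staged_partitions(n: int, first: int, seed: int = 0) -> List[List[int]]:
    """Split row ids 0..n-1 into D1, D2.1, D2.2, ... whose union is D.

    Rows are shuffled once, so every prefix of the stages is a uniform random
    sample; each stage is twice as large as the previous one (an assumption).
    """
    if first < 1:
        raise ValueError("first stage must contain at least one row")
    ids = list(range(n))
    random.Random(seed).shuffle(ids)
    parts, start, size = [], 0, first
    while start < n:
        parts.append(ids[start:start + size])
        start += size
        size *= 2
    return parts


def multiscale_mean(values: Sequence[float], first: int = 1000,
                    seed: int = 0) -> Iterator[Tuple[int, float]]:
    """Yield (rows_seen, mean_so_far) after each stage; the last one is exact."""
    total, seen = 0.0, 0
    for part in staged_partitions(len(values), first, seed):
        total += sum(values[i] for i in part)  # union of all stages so far
        seen += len(part)
        yield seen, total / seen


if __name__ == "__main__":
    data = [float(x % 97) for x in range(100_000)]
    for seen, est in multiscale_mean(data, first=1_000):
        print(f"{seen:>7} rows -> mean so far {est:.3f}")
```

Doubling the stage size keeps the number of stages logarithmic in the table size, so the user sees a coarse answer almost immediately and only a handful of refinements before the exact one.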

  12. Result-set post processing. Step 3: Use meaningful compression to convey more!
      • Research how to …
          • post-process result sets statistically
          • prepare for faceted query answers
          • show the boundaries of the sort order first
          • compute min/max domain enclosures for all attributes
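
One possible reading of Step 3 is to return a compact summary instead of the full result set. The sketch below (hypothetical function `summarize`, rows modelled as Python dictionaries) computes min/max enclosures for every attribute, the boundaries of the requested sort order, and optional facet counts. It uses `heapq.nsmallest` and `heapq.nlargest` so the boundaries come out without sorting the whole result.

```python
import heapq
from collections import Counter
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


def summarize(rows: List[Row], sort_key: str, k: int = 3,
              facet: Optional[str] = None) -> Dict[str, Any]:
    """Condense a result set instead of shipping every tuple to the user.

    Assumes all rows share the same attributes and that each attribute's
    values are mutually comparable (numbers with numbers, strings with strings).
    """
    attrs = list(rows[0]) if rows else []
    summary: Dict[str, Any] = {
        "count": len(rows),
        # min/max domain enclosure for every attribute
        "enclosure": {a: (min(r[a] for r in rows), max(r[a] for r in rows))
                      for a in attrs},
        # boundaries of the sort order, without sorting the full result
        "lowest": heapq.nsmallest(k, rows, key=lambda r: r[sort_key]),
        "highest": heapq.nlargest(k, rows, key=lambda r: r[sort_key]),
    }
    if facet is not None:
        # facet counts, e.g. how many stars versus galaxies
        summary["facets"] = dict(Counter(r[facet] for r in rows))
    return summary
```

For example, `summarize(rows, sort_key="dec", k=5, facet="type")` would report the size of the answer, the range covered by each column, the five lowest and highest `dec` values, and a per-type count, which is often enough to decide whether the full result is worth fetching.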

  13. Query morphing. Step 4: Bend the search towards interesting areas!
      • Research how to …
          • explore the query expression space
          • transform a query with a small result set so that it produces relevant, nearby answers
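
Query morphing can be illustrated with a range query over (ra, dec) that returns too few rows. The sketch assumes the morph is a simple symmetric widening of the query box around its centre by a hypothetical factor `grow`; a real system might instead relax other predicates or search the query expression space more cleverly. `morph_box` and `in_box` are illustrative names only.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]              # (ra, dec)
Box = Tuple[float, float, float, float]  # (ra_lo, ra_hi, dec_lo, dec_hi)


def in_box(p: Point, box: Box) -> bool:
    ra, dec = p
    return box[0] <= ra <= box[1] and box[2] <= dec <= box[3]


def morph_box(points: Sequence[Point], box: Box, min_results: int,
              grow: float = 1.5, max_steps: int = 10) -> Tuple[Box, List[Point]]:
    """Widen a too-selective range query around its centre until it returns
    at least `min_results` points, or until `max_steps` widenings are done."""
    if box[1] <= box[0] or box[3] <= box[2] or grow <= 1.0:
        raise ValueError("need a box with positive width and grow > 1")
    hits = [p for p in points if in_box(p, box)]
    steps = 0
    while len(hits) < min_results and steps < max_steps:
        ra_c, dec_c = (box[0] + box[1]) / 2, (box[2] + box[3]) / 2
        ra_h = (box[1] - box[0]) / 2 * grow   # new half-width in ra
        dec_h = (box[3] - box[2]) / 2 * grow  # new half-height in dec
        box = (ra_c - ra_h, ra_c + ra_h, dec_c - dec_h, dec_c + dec_h)
        hits = [p for p in points if in_box(p, box)]
        steps += 1
    return box, hits
```

Returning the morphed box together with the hits lets the user see how far the query was bent, rather than silently receiving answers to a different question.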

  14. Query alternatives. Step 5: Ignore stupid questions, give hints instead!
      • Research how to …
          • find alternative queries in terms of expressiveness + performance
          • better exploit the query log for hints
      • Example: rather than answering
            SELECT * FROM PhotoObj;
        offer alternatives such as:
            -- Q1: Using the time budget. (36291322 tuples)
            SELECT ra, dec, band1, intensity1, type FROM PhotoObj;
            -- Q2: Using data statistics. (879300 tuples)
            SELECT * FROM PhotoObj
            WHERE ra BETWEEN 53 AND 54 AND dec BETWEEN 80 AND 82;
            -- Q3: Using query statistics. (899 tuples)
            SELECT * FROM PhotoObj
            WHERE ra BETWEEN 53 AND 54 AND dec BETWEEN 80 AND 82
              AND distance(ra, dec, radius) < 10;
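
A rough sketch of hint giving follows. It assumes a one-degree histogram on `ra` (data statistics) and a query log of past `ra` ranges (query statistics); when `SELECT * FROM PhotoObj` would exceed a tuple budget, it suggests the most frequently queried range whose estimated size fits. The function names, histogram granularity, and ranking policy are all hypothetical, and the sketch does not reproduce the tuple counts shown for Q1 to Q3 on the slide.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

Range = Tuple[int, int]  # half-open [lo, hi) on ra, in whole degrees


def estimate_rows(hist: Dict[int, int], rng: Range) -> int:
    """Cardinality estimate from a 1-degree equi-width histogram on ra."""
    lo, hi = rng
    return sum(hist.get(b, 0) for b in range(lo, hi))


def suggest_alternative(table: str, hist: Dict[int, int],
                        query_log: List[Range], budget: int) -> Optional[str]:
    """Hint a cheaper query instead of running SELECT * over a huge table.

    Data statistics (the histogram) estimate result sizes; query statistics
    (how often a range appears in the log) rank the candidate hints.
    """
    if sum(hist.values()) <= budget:
        return f"SELECT * FROM {table};"  # the original query is affordable
    for (lo, hi), _uses in Counter(query_log).most_common():
        if estimate_rows(hist, (lo, hi)) <= budget:
            # illustration only: `table` is assumed trusted, not user input
            return f"SELECT * FROM {table} WHERE ra >= {lo} AND ra < {hi};"
    return None  # no hint found; e.g. fall back to a time-budgeted run
```

`Counter.most_common()` breaks ties by first appearance in the log, so among equally popular ranges the one seen earliest is tried first.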

  15. The Dawn of a new Database Era. Brought to you by the CWI database research group.
