Testing the In-Memory Column Store for in-database physics analysis
Dr. Maaike Limper
About CERN
CERN - European Laboratory for Particle Physics
Support the research activities of 10 000 scientists from 110+ nationalities
Largest machine in the world, the Large Hadron Collider: 27 km, 6000+ superconducting magnets
Four main experiments: ATLAS, ALICE, CMS, LHCb
Maaike Limper - CERN
Higgs Boson discovery
4 July 2012: Scientists from ATLAS and CMS present Higgs discovery result
Plots of the invariant mass of photon-pairs produced at the LHC show a significant bump around 125 GeV …
• Operation of the Large Hadron Collider and its experiments relies on Oracle databases: conditions data, metadata, logging & monitoring data, …
• … but the data-points in these plots did not come out of a database
CERN openlab
My project: “Test the possibility of using the Oracle database for physics analysis”
“CERN openlab is a unique public-private partnership between CERN and leading ICT companies. Its mission is to accelerate the development of cutting-edge solutions to be used by the worldwide LHC community”
http://openlab.web.cern.ch
In-database physics analysis
Higgs decay to 2 photons candidate: event display from the ATLAS experiment
In-database physics analysis
Analysis queries
• Predicate filtering to quickly apply object quality-criteria
• Each analysis-specific query uses a unique combination of columns
Physics Analysis database
Separate physics-objects in separate tables
Physics-object described by hundreds of variables: wide tables!
(Plot with peaks labelled J/ψ and Ψ(3686))
The problem
• Analysis query performance is typically limited by I/O reads
• Full table scans over tables with many columns, while only a few columns are used for each specific analysis
• Combination of columns is unique for each query
• Can’t index every column!
In-Memory Column Store
• Profit from fast In-Memory reads
• Read only columns relevant for the specific analysis query
Oracle’s In-Memory Column Store provides a solution to reduce I/O read time, especially for tables with many columns
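The benefit of reading only the relevant columns can be illustrated with a back-of-the-envelope calculation. The sketch below is not taken from the talk: the row count, column count, and value width are hypothetical numbers chosen only to show how the data volume scanned scales in a row-oriented full table scan versus a column-oriented read.

```python
# Illustrative arithmetic only; all numbers below are assumptions,
# not measurements from the CERN physics-analysis database.

def bytes_scanned_row_store(n_rows: int, n_columns: int, bytes_per_value: int) -> int:
    """A full table scan over row-oriented storage touches every column of every row."""
    return n_rows * n_columns * bytes_per_value


def bytes_scanned_column_store(n_rows: int, n_columns_used: int, bytes_per_value: int) -> int:
    """A columnar read touches only the columns the query references."""
    return n_rows * n_columns_used * bytes_per_value


n_rows = 1_000_000       # hypothetical number of physics objects
n_columns = 300          # hypothetical "wide table" with hundreds of variables
n_columns_used = 5       # hypothetical number of columns one analysis query needs
bytes_per_value = 8      # assume every value is stored as an 8-byte double

row_bytes = bytes_scanned_row_store(n_rows, n_columns, bytes_per_value)
col_bytes = bytes_scanned_column_store(n_rows, n_columns_used, bytes_per_value)
reduction = row_bytes / col_bytes

print(f"row-oriented scan:    {row_bytes / 1e6:.0f} MB")
print(f"column-oriented read: {col_bytes / 1e6:.0f} MB")
print(f"reduction factor:     {reduction:.0f}x")
```

Under these assumptions the reduction factor is simply the ratio of total columns to columns used, which is why the gain is largest for wide tables.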
Compression rates
• COMPRESS FOR QUERY vs CAPACITY HIGH
• “electron”: typical physics-object data, a mixture of int, float, double
• “Event Filter”: only booleans (mostly false), best compression
• “Missing Energy”: table with floats & doubles, worst compression
Average compression rate of the dataset is 2.1 with query compression and 3.6 with capacity high: physics-objects represent the bulk of the data
17/6/2014
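The difference between the “Event Filter” and “Missing Energy” tables can be reproduced qualitatively with a general-purpose compressor. The sketch below uses Python's `zlib` as a stand-in; it is not Oracle's columnar compression, and the synthetic data (1% true flags, Gaussian-distributed doubles) is an assumption made purely for illustration.

```python
import random
import struct
import zlib


def compression_ratio(raw: bytes) -> float:
    """Uncompressed size divided by zlib-compressed size (level 9)."""
    return len(raw) / len(zlib.compress(raw, 9))


rng = random.Random(42)  # fixed seed so the example is repeatable
n_values = 100_000

# Hypothetical "Event Filter"-like column: booleans that are mostly false,
# stored one byte per flag.
flags = bytes(1 if rng.random() < 0.01 else 0 for _ in range(n_values))

# Hypothetical "Missing Energy"-like column: doubles with essentially random
# mantissa bits, packed as 8-byte IEEE 754 values.
doubles = struct.pack(f"{n_values}d", *(rng.gauss(0.0, 50.0) for _ in range(n_values)))

flag_ratio = compression_ratio(flags)
double_ratio = compression_ratio(doubles)

print(f"mostly-false booleans: {flag_ratio:.1f}x")
print(f"floating-point values: {double_ratio:.2f}x")
```

Long runs of identical values leave little information to store, whereas floating-point measurements carry close to random low-order bits, so any lossless scheme has far less redundancy to remove.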
Simple query performance
• Comparing “read from disk” vs IMC time: 1000x faster
• Comparing “read from buffer cache” vs IMC time: 40x faster
Note: 2x more memory is needed to put the data in the buffer cache than to place it in the In-Memory Column Store!
Complex query performance
With IMC only 10 s to make this plot, allowing the analyst to quickly optimize results while trying different variable combinations
• Comparing “read from disk” vs IMC time: 70x faster
• Comparing “read from buffer cache” vs IMC time: 7x faster
Conclusion
IMC’s STAR-story:
• Situation: In-database physics analysis is limited by I/O
• Task: Remove I/O bottleneck for any query using any combination of columns in a table
• Action: Use Oracle’s In-Memory Column Store
    • Take advantage of fast reads from cache
    • Columnar compression increases size of data that fits in-memory
    • Access only relevant columns and use predicate pruning to further reduce I/O
• Result: I/O bottleneck removed, real-time in-database physics analysis is now possible*
*While the Oracle database is not currently used for physics analysis, this study shows promising results using the In-Memory Column Store for in-database physics analysis
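The “predicate pruning” mentioned under Action can be sketched as follows. The slides do not describe how the database implements it, so this is a minimal illustration under an assumption: each chunk of a column keeps its minimum and maximum, and a chunk is skipped when those bounds show that no value can satisfy the predicate. The names `Chunk`, `build_chunks`, `scan_greater_than`, and the sample values are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Chunk:
    """A slice of one column plus its min/max summary (assumed layout)."""
    values: List[float]
    lo: float
    hi: float


def build_chunks(values: List[float], chunk_size: int) -> List[Chunk]:
    """Split a column into fixed-size chunks and record min/max for each."""
    chunks = []
    for start in range(0, len(values), chunk_size):
        part = values[start:start + chunk_size]
        chunks.append(Chunk(part, min(part), max(part)))
    return chunks


def scan_greater_than(chunks: List[Chunk], threshold: float) -> Tuple[List[float], int]:
    """Return values > threshold and the number of chunks actually scanned."""
    matches: List[float] = []
    scanned = 0
    for chunk in chunks:
        if chunk.hi <= threshold:
            continue  # no value in this chunk can pass the cut, so skip it
        scanned += 1
        matches.extend(v for v in chunk.values if v > threshold)
    return matches, scanned


# Hypothetical column of a physics-object variable (arbitrary units).
column = [5.0, 7.5, 3.2, 9.1, 42.0, 55.3, 12.4, 8.8, 1.1, 2.2, 3.3, 4.4]
chunks = build_chunks(column, chunk_size=4)
selected, chunks_scanned = scan_greater_than(chunks, threshold=25.0)

print(selected)                                  # values passing the cut
print(f"{chunks_scanned} of {len(chunks)} chunks scanned")
```

In this toy column only the middle chunk has a maximum above the threshold, so two of the three chunks are never read; the result is identical to filtering every value.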