Bitmap Indices for Fast End-User Physics Analysis in ROOT

Kurt Stockinger (1), Kesheng Wu (1), Rene Brun (2), Philippe Canal (3). (1) Berkeley Lab, Berkeley, USA (2) CERN, Geneva, Switzerland (3) Fermi Lab, Batavia, USA.


Presentation Transcript


  1. Bitmap Indices for Fast End-User Physics Analysis in ROOT Kurt Stockinger (1), Kesheng Wu (1), Rene Brun (2), Philippe Canal (3) (1) Berkeley Lab, Berkeley, USA (2) CERN, Geneva, Switzerland (3) Fermi Lab, Batavia, USA

  2. Contents • Introduction to Bitmap Indices • Integration of Bitmap Indices into ROOT • Support for TTree::Draw and TChain::Draw • Example Usage • Experimental Results • Index Size • Performance of Bitmap Index vs. TTreeFormula • Conclusions

  3. Bitmap Indices • Bitmap indices are efficient data structures for accelerating multi-dimensional queries: • E.g. pT > 195 AND nTracks < 4 AND muonTight1cm > 12.4 • Supported by most commercial database management systems and data warehouses • Optimized for read-only data
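To illustrate why bitmaps suit multi-dimensional queries, here is a minimal sketch in standard C++. It assumes one bitmap per predicate has already been produced (for example one for pT > 195 and one for nTracks < 4) and combines them with a word-wise AND. The names PackedBitmap, pack, bitmapAnd and countHits are hypothetical and are not part of FastBit or ROOT.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical packed bitmap: bit i is 1 if event i passes one predicate.
using PackedBitmap = std::vector<std::uint64_t>;

// Pack per-event pass/fail flags into 64-bit words.
PackedBitmap pack(const std::vector<bool>& flags) {
    PackedBitmap words((flags.size() + 63) / 64, 0);
    for (std::size_t i = 0; i < flags.size(); ++i)
        if (flags[i]) words[i / 64] |= std::uint64_t{1} << (i % 64);
    return words;
}

// Combine two predicates with AND, 64 events per machine instruction.
PackedBitmap bitmapAnd(const PackedBitmap& a, const PackedBitmap& b) {
    assert(a.size() == b.size());
    PackedBitmap out(a.size());
    for (std::size_t w = 0; w < a.size(); ++w) out[w] = a[w] & b[w];
    return out;
}

// Count the events that satisfy the combined cut.
std::size_t countHits(const PackedBitmap& bits) {
    std::size_t n = 0;
    for (std::uint64_t w : bits) n += std::bitset<64>(w).count();
    return n;
}
```

Each additional predicate in a cut costs one more pass of bitwise ANDs over the bitmaps, independent of how many attributes an event has.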

  4. Equality Encoding vs. Range Encoding [Figure: (a) list of attributes, (b) equality encoding, (c) range encoding, with cardinality 10] Range encoding is optimized for one-sided range queries, e.g. a0 <= 3
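The difference between the two encodings can be sketched in a few lines of standard C++. This is an illustrative sketch under simple assumptions (integer values in the range 0 to cardinality-1, uncompressed bitmaps); the function names are hypothetical and do not come from FastBit.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Bitmap = std::vector<bool>;

// Equality encoding: bitmap v marks the rows whose value == v.
std::vector<Bitmap> equalityEncode(const std::vector<int>& column, int cardinality) {
    std::vector<Bitmap> index(cardinality, Bitmap(column.size(), false));
    for (std::size_t row = 0; row < column.size(); ++row) {
        assert(column[row] >= 0 && column[row] < cardinality);
        index[static_cast<std::size_t>(column[row])][row] = true;
    }
    return index;
}

// Range encoding: bitmap v marks the rows whose value <= v.
std::vector<Bitmap> rangeEncode(const std::vector<int>& column, int cardinality) {
    std::vector<Bitmap> index(cardinality, Bitmap(column.size(), false));
    for (std::size_t row = 0; row < column.size(); ++row) {
        assert(column[row] >= 0 && column[row] < cardinality);
        for (int v = column[row]; v < cardinality; ++v)
            index[static_cast<std::size_t>(v)][row] = true;
    }
    return index;
}

// "a0 <= bound" on a range-encoded index: a single bitmap lookup.
Bitmap lessEqualRange(const std::vector<Bitmap>& rangeIndex, int bound) {
    return rangeIndex[static_cast<std::size_t>(bound)];
}

// The same query on an equality-encoded index: OR of bound + 1 bitmaps.
Bitmap lessEqualEquality(const std::vector<Bitmap>& eqIndex, int bound) {
    Bitmap out(eqIndex[0].size(), false);
    for (int v = 0; v <= bound; ++v)
        for (std::size_t row = 0; row < out.size(); ++row)
            if (eqIndex[static_cast<std::size_t>(v)][row]) out[row] = true;
    return out;
}
```

In this sketch the last range-encoded bitmap is all ones and therefore redundant; it is kept only to make the indexing uniform.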

  5. Bitmap Indices with Binning • Simple bitmap indices work well for low-cardinality attributes, i.e. the number of distinct values per attribute is low (< 10,000) • For high-cardinality attributes, the bitmap index is often too large to be practical, even with good compression algorithms • Solution: • Keep one bitmap per attribute range rather than one per distinct attribute value (binning) • Requires an additional step for evaluating the candidates in a bin (“Candidate Check”); see the example on the next slide

  6. Range Query on Bitmap Index with Binning [Figure: bitmap 3 XOR bitmap 4] The “candidate check” is performed on bitmap 4 to identify attribute values where x < 63
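The candidate check can be sketched as follows. This is not the scheme from the figure: it assumes equal-width, equality-encoded bins, and the struct and function names are hypothetical. Bins that lie entirely below the cut qualify outright; only rows in the bin that contains the cut value need their raw value inspected.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical binned index: bin k covers [lo + k*width, lo + (k+1)*width).
struct BinnedIndex {
    double lo;
    double width;
    std::vector<std::vector<bool>> bins;  // bins[k][row] is true if row falls in bin k
};

BinnedIndex buildIndex(const std::vector<double>& column, double lo, double width,
                       std::size_t nBins) {
    BinnedIndex idx{lo, width,
                    std::vector<std::vector<bool>>(nBins, std::vector<bool>(column.size(), false))};
    for (std::size_t row = 0; row < column.size(); ++row) {
        assert(column[row] >= lo);
        const std::size_t k = static_cast<std::size_t>((column[row] - lo) / width);
        assert(k < nBins);
        idx.bins[k][row] = true;
    }
    return idx;
}

// Evaluate "x < cut". Without the candidate check the whole edge bin is
// accepted, which yields a superset of the exact answer.
std::vector<std::size_t> lessThan(const BinnedIndex& idx, const std::vector<double>& column,
                                  double cut, bool candidateCheck) {
    assert(cut >= idx.lo && !idx.bins.empty());
    const std::size_t edge = static_cast<std::size_t>((cut - idx.lo) / idx.width);
    const std::size_t last = std::min(edge, idx.bins.size() - 1);
    std::vector<std::size_t> hits;
    for (std::size_t row = 0; row < column.size(); ++row) {
        for (std::size_t k = 0; k <= last; ++k) {
            if (!idx.bins[k][row]) continue;
            // Bins below the edge bin qualify outright; the edge bin needs the raw value.
            if (k < edge || !candidateCheck || column[row] < cut) hits.push_back(row);
            break;  // each row sits in exactly one bin
        }
    }
    return hits;
}
```

With bins of width 20 starting at 0 and the query x < 63, only rows in the bin [60, 80) require a look at the stored value; in a real system that lookup is a read from disk.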

  7. Implementation Details • FastBit: • Bitmap index software developed at Berkeley Lab • Includes a very efficient bitmap compression algorithm • Bitmap indices are integrated into ROOT to support: • TTree::Draw • TChain::Draw • Each attribute to be indexed is stored as a separate branch • The index is currently stored as a binary file

  8. Example - Build Index

      // open ROOT-file and fetch the tree to be indexed
      TFile f("data/root/data.root");
      TTree *tree = (TTree*) f.Get("tree");

      TBitmapIndex bitmapIndex;
      bitmapIndex.Init();

      // directory in which the index files are written
      char indexLocation[1024] = "/data/index/";
      bitmapIndex.ReadRootWriteIndexFile(tree, indexLocation);

      // build index for two attributes
      bitmapIndex.BuildIndex(tree, "a1", indexLocation);
      bitmapIndex.BuildIndex(tree, "a2", indexLocation);

  9. Example - TTree::Draw with Index

      // open ROOT-file and fetch the indexed tree
      TFile f("data/root/data.root");
      TTree *tree = (TTree*) f.Get("tree");

      TBitmapIndex bitmapIndex;
      bitmapIndex.Init();

      // draw a1 vs. a2 for the events selected by the two-predicate cut
      bitmapIndex.Draw(tree, "a1:a2", "a1 < 200 && a2 > 700");

  10. Performance Measurements • Compare performance of TTreeFormula with TBitmapIndex::EvaluateQuery • Do not include time for drawing histograms • Run multi-dimensional queries (cuts with multiple predicates)

  11. Experimental Setup • Software/Hardware: • Bitmap index software is implemented in C++ • Tests carried out on Linux CentOS, a 2.8 GHz Intel Pentium IV with 1 GB RAM, and hardware RAID with SCSI disks • Data: • BaBar data set: 7.6 million records with ~100 attributes each • Bitmap Indices: • 10 out of ~100 attributes indexed • 1000 equality-encoded bins • 100 range-encoded bins • Bitmap index compression algorithm: WAH (Word-Aligned Hybrid)
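For readers unfamiliar with WAH, the following is a simplified sketch of the word-aligned hybrid idea, not FastBit's implementation. It assumes 32-bit words: a literal word has its top bit clear and carries 31 bitmap bits, while a fill word has its top bit set, the next bit gives the fill value, and the low 30 bits count how many 31-bit groups the run spans. As a simplification, the last partial group is padded with zeros. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

const std::uint32_t kAllOnes  = 0x7FFFFFFFu;  // a 31-bit group of ones
const std::uint32_t kFillFlag = 0x80000000u;  // top bit: this is a fill word
const std::uint32_t kFillBit  = 0x40000000u;  // value of the repeated bit
const std::uint32_t kMaxRun   = 0x3FFFFFFFu;  // low 30 bits: run length in groups

std::vector<std::uint32_t> wahEncode(const std::vector<bool>& bits) {
    std::vector<std::uint32_t> out;
    for (std::size_t start = 0; start < bits.size(); start += 31) {
        // Gather the next 31 bits, first bit in the highest payload position.
        std::uint32_t group = 0;
        for (std::size_t j = 0; j < 31 && start + j < bits.size(); ++j)
            if (bits[start + j]) group |= 1u << (30 - j);
        if (group == 0 || group == kAllOnes) {
            const std::uint32_t fill = kFillFlag | (group ? kFillBit : 0u);
            // Extend the previous fill word if it has the same fill value and room.
            if (!out.empty() && (out.back() & ~kMaxRun) == fill &&
                (out.back() & kMaxRun) < kMaxRun) {
                ++out.back();
            } else {
                out.push_back(fill | 1u);
            }
        } else {
            out.push_back(group);  // mixed group: store verbatim as a literal
        }
    }
    return out;
}

std::vector<bool> wahDecode(const std::vector<std::uint32_t>& words, std::size_t nBits) {
    std::vector<bool> bits;
    for (std::uint32_t word : words) {
        if (word & kFillFlag) {
            const std::size_t n = static_cast<std::size_t>(word & kMaxRun) * 31;
            bits.insert(bits.end(), n, (word & kFillBit) != 0);
        } else {
            for (int j = 0; j < 31; ++j) bits.push_back(((word >> (30 - j)) & 1u) != 0);
        }
    }
    assert(bits.size() >= nBits);
    bits.resize(nBits);  // drop the zero padding of the last group
    return bits;
}
```

Long runs of identical bits, which dominate sparse bitmaps, collapse into single fill words, and because everything stays word-aligned, logical operations can work on the compressed form one word at a time.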

  12. Size of Compressed Bitmap Indices EE-BMI: equality-encoded bitmap index RE-BMI: range-encoded bitmap index

  13. Query Performance - TTreeFormula vs. Bitmap Indices Performance improvement of bitmap indices over TTreeFormula up to a factor of 10.

  14. Query Performance - TTreeFormula vs. Bitmap Indices

  15. Query Performance - TTreeFormula vs. Bitmap Indices Performance improvement of bitmap indices over TTreeFormula up to a factor of 10.

  16. Approximate Answers • For bitmap indices with binning, exact answers are obtained during the Candidate Check phase • Certain records are read from disk to check whether they fulfill the query constraint • Approximate answers are returned if the Candidate Check is omitted • The error of the approximate answer depends on the number of bins: • Note: the approximate query result includes additional events • However, no correct events are dropped • We used two different binning strategies: • Equality encoding with 1000 bins: error rate 0.1% • Range encoding with 100 bins: error rate 1%
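The quoted error rates are consistent with a simple back-of-the-envelope estimate. The slides do not define the error, so the following rests on assumptions: N events are spread roughly evenly over B bins, a one-sided range condition touches a single edge bin, and the error is the number of wrongly included events relative to N.

```latex
\[
\text{error} \;\lesssim\; \frac{N/B}{N} \;=\; \frac{1}{B},
\qquad
B = 1000 \;\Rightarrow\; 0.1\%,
\qquad
B = 100 \;\Rightarrow\; 1\%.
\]
```

Under these assumptions the over-count is bounded by the population of one bin, which is why finer binning gives a smaller error at the cost of a larger index.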

  17. Query Performance - Approximate Answers (Error 0.1- 1%) Performance improvement of bitmap indices over TTreeFormula up to a factor of 30.

  18. Query Performance - Approximate Answers (Error 0.1- 1%)

  19. Query Performance - Approximate Answers (Error 0.1- 1%) Performance improvement of bitmap indices over TTreeFormula up to a factor of 30.

  20. Conclusions • We integrated bitmap indices into ROOT to support: • TTree::Draw • TChain::Draw • Bitmap indices improve the performance of end-user analysis by up to a factor of 10 compared with TTreeFormula. • With approximate answers (0.1-1% error), the performance improvement is up to a factor of 30. • Bitmap indices are also used successfully in the STAR experiment at Brookhaven to access ROOT files with GridCollector. • Future work: • Store bitmap indices as a ROOT tree. • Integrate with PROOF to support parallel index evaluation.
