150 likes | 264 Views
An evaluation of Data Storage and Analysis With Oracle DBMS. Eric Grancher, CERN IT/DB Database Workshop, CERN July 11 th -13 th 2001. TAG data analysis. First step of the evaluation of the possibility to use advanced index features to perform simple cuts
E N D
An evaluation of Data Storage and Analysis With Oracle DBMS Eric Grancher, CERN IT/DB Database Workshop, CERN July 11th-13th 2001
TAG data analysis • First step of the evaluation of the possibility to use advanced index features to perform simple cuts • Most of the credit goes to Maciej Marczukajtis (technical student) • TAG data imported from a ALEPH PAW file
Model • Data model has 16 double attributes • Cut: “retrieve entries for which field1 > 12 and field2 < 23 and field3 > 56” • The model can be implemented as a relational table, mapping every attribute to a column
Performing the Cuts • Normal B*tree index can’t be used in an efficient way as they can’t be merged • Bitmap indexes can be merged • “how to use the bitmap indexes to perform cuts?”
Size = f(cardinality) 1 million rows 5 million rows size size cardinality cardinality
0 10 20 30 40 Binning 1 2 3 4 0 • Double values, problem with cardinality • Binning idea (c.f. Kurt Stockinger) • Purpose - reducing the cardinality. Binning function maps the whole range of values in column into bins (sub ranges) • Function used in tests [lower, upper]-> [0…number of bins]
Function based indexes • create bitmap index f_x1 on tag_data(binning(field1)) • Queries have to exactly match the index creation statements
Lower limit Upper limit Max Min 0 Selected by the second query Selected by the first query Checked against table Several ways to use the bitmap indexes Even if SQL is said “non-procedural”, the way you write the statement produces different execution plans ! • Bitmap index 1 • SELECT COUNT(PT_LEP_MC) FROM tag_data WHERE F1(PT_LEP_MC) < F1(20) AND F2(ETA_LEP_MC) > F2(0) UNION ALL SELECT COUNT(PT_LEP_MC) FROM(SELECT PT_LEP_MC, ETA_LEP_MC FROM tag_data WHERE (F1(PT_LEP_MC) <= F1(20) AND F2(ETA_LEP_MC) >= F2(0)) AND NOT (F1(PT_LEP_MC) < F1(20) AND F2(ETA_LEP_MC) > F2(0)))WHERE PT_LEP_MC < 20 AND ETA_LEP_MC > 0; ETA_LEP_MC F1 and F2 are the binningfunctions
Lower limit Upper limit Max Min 0 Selected by the inner query Checked against table Several Ways to Use the Bitmap Indexes • Bitmap index 2 • SELECT COUNT(PT_LEP_MC) FROM (SELECT PT_LEP_MC, ETA_LEP_MC FROM tag_data WHERE F1(PT_LEP_MC) <= F1(20) AND F2(ETA_LEP_MC) >= F2(0)) WHERE PT_LEP_MC < 20 AND ETA_LEP_MC > 0; ETA_LEP_MC F1 and F2 are the binningfunctions
Performance • Parallel query: several processes to perform a single statement • Check the execution plan! • For some queries, it is almost as efficient in time to perform a full table scan (but with several users !)
Size • Block size 8KB • Table size = 203.1 MB • Average bitmap index size = 24.3 MB-> more space used for the indexes than the data (not unusual for some heavily queried Oracle database)
Results Bitmap index 1: only candidates are checked against the table Bitmap index 2 all hits and candidates are checked against the table Heavily depends on the query. I/O doesn’t mean time.
Conclusion • Using bitmap index for tag data in the DBMS can provide a dramatic gain • Clear advantage in using database features when appropriate • Works efficiently with high selectivity (few records result from the query) • B*Tree indexes take a lot of space and are not efficient for multiple dimensional queries
Perspectives • Proper binning referring to the distribution of the data can improve performance of bitmap index, depends on distribution + queries • Possibility to combine bitmap and B*tree indexes • How to this fit for ESD/AOD and references/OCCI?
References & Questions RAW tests http://edmsoraweb.cern.ch:8001/cedar/doc.info?document_id=318241&version=1 TAG tests http://edmsoraweb.cern.ch:8001/cedar/project.info?proj_id=CERN-0000005796