1 / 29

Oracle Index study for Event TAG DB

Oracle Index study for Event TAG DB. M. Boschini matteo.boschini@mib.infn.it S. Della Torre stefano.dellatorre@mib.infn.it. Outline. TAG DB issue: space usage issue INDEX accounting for ~ 50% of used space... query performance Tests with random data Tests with AMS01 data

Download Presentation

Oracle Index study for Event TAG DB

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Oracle Index study for Event TAG DB M. Boschini matteo.boschini@mib.infn.it S. Della Torrestefano.dellatorre@mib.infn.it

  2. Outline TAG DB issue: space usage issue INDEX accounting for ~ 50% of used space... query performance Tests with random data Tests with AMS01 data Tests with AMS02 Cosmic data M. Boschini Oracle Index Study for STATUS DB – TIM October 2008 - CERN

  3. EVENT TAG DB Goal: test feasibility of storing in an Oracle DB TAG info for AMS02 events TAG is a 64bit number AMS02 MC has, as for now, no meaningful TAG info (0 or error bit)‏ We start from previous studies: http://ams.cern.ch/AMS/Reports/AMSnote-2000_09_05b.ps.gz http://ams.cern.ch/AMS/Reports/Computing/Apr2004/M.Boschini_LV3_DB.pdf

  4. EVENT TAG DB Test INDEX type space usage Test INDEX usage Time insert & selections What can we use ?

  5. test environment SLC 4.6 (64bit)‏ 2 dual core Intel(R) Xeon(R) CPU @ 2.00GHz Oracle 11g 64 bit we opted for Oracle11g because of product lifetime All data is fed to DB using C-OCI programs Oracle setup: user AMSDES with dedicated Bigfile tablespace overkill wrt Oracle's suggestion (>= 1 TB) ...? default TEMP TBLSpace (3 GB)‏

  6. test environment Dummy Table: RUN (NUMBER) EVENT (NUMBER) TAG (BINARY_DOUBLE)‏ TAG field will be used for indexing ...assuming queries will be mainly on TAG 10^8 records, equivalent to ~ 10 days of DAQ Test B-Tree vs BitMap Index as hinted by our previous studies...

  7. Indexes (theory)‏ Btree default index type in Oracle very space consuming. Not advised for dataware housing dataware house: write few, read many, HUGE data sample BitMap advised for dataware housing Bitmap indexes are stored as an array of zero-and-one values, with one entry for each row. Should be used only on low cardinality fields

  8. Indexes (theory)‏ BitMap From Oracle Advanced Programming White Paper, Sep. 2008 “...Conventional wisdom holds that bitmap indexes are most appropriate for columns having low distinct values—such as GENDER, MARITAL_STATUS. This assumption is not completely accurate, however. In reality, a bitmap index is always advisable for systems in which data is not frequently updated by many concurrent systems...”

  9. Tests Items to be tested are: SPACE OCCUPANCY = f(index type)‏ QUERY PERFORMANCE = f(index type)‏ Index types: B-Tree and Bitmap We started with a random generated sample We then used an AMS01 sample 40 ntuples no request on charge, pmass, pmom, lat (nothing!) AMS02 Cosmic data

  10. Disk Space TABLE SIZE = 5 GB this means that for AMS02 we'll need 500 GB B-Tree INDEX SIZE = 4.8 GB this means that for AMS02 we'd need 480 GB BitMap INDEX SIZE = 70 MB this means that for AMS02 we'd need 7 GB thus, at least for space reasons, we should use BitMap INDEX, a factor 30 smaller.

  11. when to create an INDEX After TABLE has been populated ! B-Tree INDEX creation time: 4 µsec/record mainly SORT and disk space allocation BitMap INDEX creation time: 0.5 µsec/record

  12. Theory: is an INDEX useful ? Not always... ORACLE has a built-in decision taking algorithm (EXECUTION PLAN) which decides how to actually implement a query. Execution plan tries to optimize disk accesses and CPU usage. So, e.g., if reading an index which returns “many” records takes more than X, than the index is not used. These are based, since Oracle 10.0.2i on Cost Based Optimizer and Dynamic Sampling (by default ON)‏

  13. EXECUTION PLAN PLAN_TABLE_OUTPUT --------------------------------------------------------------------------------------------------------------------------------------------- Plan hash value: 3591811347 ------------------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 36793 | 503K| 297 (1) | 00:00:04 | | 1 | TABLE ACCESS FULL | N_TAG_RAND_NOISE | 36793 | 503K| 297 (1)| 00:00:05 |

  14. EXECUTION PLAN PLAN_TABLE_OUTPUT --------------------------------------------------------------------------------------------------------------------------------------------- Plan hash value: 3591811347 ------------------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 36793 | 503K| 297 (1) | 00:00:04 | | 1 | TABLE ACCESS BY INDEX ROWID| N_TAG_RAND_NOISE | 36793 | 503K| 297 (1)| 00:00:04 | |* 2 | INDEX RANGE SCAN | TAG_RAND_IDX | 36793 | | 106 (0)| 00:00:02 |

  15. IDX usage Test To test if Index is actually used, we wrote a simple PLSQL+Perl program that Dynamically populates statistics table HOW-MANY <--> TAG_STA For each TAG_STA value, generates and analyzes EXECUTION PLAN for SELECT run, event from TABLE wher tag=TAG_STA; We thus run CARDINALITY queries to DB and analyze the Optimizer decisions...

  16. Sample1 CARDINALITY:5relative population > 5.7%

  17. Sample2 CARDINALITY:2517relative population: 6x10E-4

  18. Performance Sample1 and Sample2 both do NOT use INDEX in the select query. This is because Oracle's built-in optimization algorithm discovers that too many records will be returned, and thus decides to ignore the INDEX Thus: LOW/MEDIUM cardinality with HIGH relative population makes INDEX uneffective

  19. Sample3 CARDINALITY:1024relative population 6x10E-5

  20. Performance Sample3 uses INDEX ! This is because Oracle's builtin optimization algorithm discovers that TABLE ACCESS BY INDEX ROWID is efficient. Thus LOW/MEDIUM cardinality with “correct for Oracle” relative population makes INDEX effective !

  21. Real Data: AMS01 We then used an AMS01 sample 40 ntuples (no request on charge, pmass, pmom, ecc)‏ “scrambled” this sample to get 10^8 records Average relative cardinality = 2x10E-6 BitMap Index is USED ! Selection time no IDX: 22 sec IDX: 0.05 sec INDEX really effective

  22. Real Data: AMS02 !!! We also used AMS02 Cosmic Rays data we used only 2 2x10E8 events no request on charge, pmass, pmom, ...(nothing!) Table size: 5.5 GB  2 BitMap IDX size: 57 MB  2 just for the sake of curiosity: a UNIQUE requirement on RUN+EVENTNO creates an INDEX 5 GB big !!! Btree IDX size: 4.2 GB  2

  23. Real Data: AMS02 !!! Mean Relative cardinality: 1.8x10E-4 Max Relative cardinality: 3x10E-3 BitMap Index can be USED ! tested EXECUTION PLAN for all TAG values.

  24. Real Data: AMS02 !!! Selection time no IDX: Average Selection time: 24 sec 0.01 sec/record once retrieved IDX: Average Selection time: 0.05 sec 0.005 sec/record once retrieved Again, B-Map index is very effective.

  25. naïve usage... The most naïve way to design the DB is 1 table for each month of DAQ. This leads to ~36 tables. Most naïve way to query them all is using a UNION statement which still uses index with nearly no overhead for the SORT UNIQUE/UNION-ALL part...

  26. PLAN_TABLE_OUTPUT ------------------------------------------------------------------------------------------------------------------------------------ Plan hash value: 2624995586 ---------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 44 | 1408 | 290K (50)| 00:58:03 | | 1 | SORT UNIQUE | | 44 | 1408 | 290K (50)| 00:58:03 | | 2 | UNION-ALL | | | | | | | 3 | TABLE ACCESS BY INDEX ROWID | COSMIC_TDV | 42 | 1344 | 145K (1)| 00:29:10 | | 4 | BITMAP CONVERSION TO ROWIDS| | | | | | |* 5 | BITMAP INDEX SINGLE VALUE | COMIX_TDV | | | | | | 6 | TABLE ACCESS BY INDEX ROWID | COSMIC_TDV1 | 2 | 64 | 144K (1)| 00:28:53 | | 7 | BITMAP CONVERSION TO ROWIDS| | | | | | |* 8 | BITMAP INDEX SINGLE VALUE | COMIX_TDV1 | | | | | ---------------------------------------------------------------------------------------------- 44 rows returned out of 377,516,896 rows in total Rough estimate for SELECT on 3 years: 25 minutes...

  27. Conclusions In general, Bit-Map index is very efficient in space usage 30 times smaller than normal Btree index Index is used in queries if relative cardinality is low If so, selection time is very low. According to AMS01 and AMS02 Cosmic Rays analyzed, BitMap index can be used We could thus use it for AMS02 expected space usage: 500 GB (table)+ 6 GB (idx)‏ EventStatusTable02 files will use ~ 150 GB...

  28. AMS01 sample CARDINALITY:89860average relative population 5.7%

More Related