
I/O Trap




  1. I/O Trap • Reading existing data • Changing / updating existing records • Adding new records • Deleting records • All of these involve going to disk => the slowest device by a considerable margin • Minimise / reschedule physical I/O

  2. Issues in Database Performance • Raw read / write performance is a hardware issue => throw money at it • Performance of the DB = the engine's ability to locate data • Factors affecting speed of retrieval: • cache (sizing of objects) • access method (table scan / indexes…) • contention between processes • indirect processes (rollback, archiving…)

  3. Reading data in Oracle • 1 – determine how to access the block where the data is located • the data dictionary (DD), stored in a separate part of the DB, is read • the DD may be loaded into a cache (aka the row cache) to limit I/O activity • 2 – the DD indicates the preferred access method for the block • B-tree index, partitioning, hash clusters etc. • 3 – the search proceeds, either as a full scan or through an index, until the data is found

  4. Designing a DB for reading • Supply methods for high-precision access to data • But some queries will defeat these strategies • e.g. credit card transactions – a monthly report of scattered items • No solution => take the work off-line

  5. Changing data • Oracle makes hard work of changes • rollback data (immediate) • log files (long term) • Changed blocks are read and updated in the buffer • released to disk as the buffer is cleared • But rollback information generates the most I/O operations • In sensitive environments, simultaneous archiving (ARCHIVELOG mode) makes it worse

  6. Indexing Problems • Super-fast indexes need updating as data is changed => the DB slows down • A more complex index = a more complex update mechanism = more rollback… • As the DB's physical structure degrades, so does the index (e.g. split blocks) • Performance decreases over time • A rebuild is needed (which interferes with operations) • INDEX_STATS tells you how big the index has grown

  7. Side effects • Definition of a DB = consistency • A reading process may need an older version of the data • A private version of the data must be created • Conclusion: processes that should be read-only end up requiring writes • Oracle must carry out all the operations needed to finish executing the query against the old data

  8. Effects of read consistency • Must find an older version of the data • Must apply rollback: • find the old rollback block (I/O) • read the rollback index (I/O) • find the data (I/O) • Must roll back the data: • read the old data • reverse all the changes • Significant I/O implications + a buffer full of old versions
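The sequence above can be sketched as a toy model. The structures here (an SCN field on the row, a newest-first list of "before images") are illustrative stand-ins, not Oracle's actual block and rollback-segment formats:

```python
# Toy model of read consistency: a reader reconstructs an older row
# version by applying undo records in reverse. Structures are
# illustrative only, not Oracle's actual undo format.

def read_consistent(current_row, undo_chain, as_of_scn):
    """Roll the row back until its SCN is <= the reader's snapshot SCN."""
    row = dict(current_row)
    # Undo records are stored newest-first; each holds the prior values.
    for undo in undo_chain:
        if row["scn"] <= as_of_scn:
            break
        row.update(undo)          # apply the "before image" (extra I/O per step)
    return row

current = {"id": 1, "balance": 300, "scn": 5}
undo = [{"balance": 200, "scn": 3},   # before the scn-5 update
        {"balance": 100, "scn": 1}]   # before the scn-3 update

print(read_consistent(current, undo, as_of_scn=4))  # {'id': 1, 'balance': 200, 'scn': 3}
print(read_consistent(current, undo, as_of_scn=2))  # {'id': 1, 'balance': 100, 'scn': 1}
```

Each `row.update(undo)` step corresponds to one or more of the extra I/Os listed on the slide, which is why a long-running query against heavily updated data can be so expensive.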

  9. Conclusions • Writers and readers DO interfere with each other • The mechanisms used by Oracle to bypass locking have performance side effects • Performance comes from minimising I/O • i.e. from good access techniques: • precision of data location • physical proximity of related data (cf. caching) • However: techniques that reduce the number of I/Os tend to reduce the speed of each access!

  10. Indexing example (see figure): • Alphabetic search using a B-tree index for name = Oscar Smith • Split the table into sections and read until overshooting the letter “S”: Smith, N is the one • Then look for the page with Smith, N in its header • Then scan that page for the actual entry in the index • Read the pointer • Move to the table • Read the row
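The lookup above can be sketched with a two-level structure: a branch listing the first key of each leaf, and sorted leaf pages of (key, rowid) entries. The names, rowids and page sizes are invented for illustration:

```python
import bisect

# Toy two-level B-tree: the branch holds the first key of each leaf page,
# each leaf holds sorted (key, rowid) entries. Illustrative only.
leaves = [
    [("Adams, B", "row#12"), ("Jones, K", "row#7")],
    [("Smith, N", "row#3"), ("Smith, O", "row#41"), ("Tate, V", "row#9")],
]
branch = [leaf[0][0] for leaf in leaves]   # ["Adams, B", "Smith, N"]

def lookup(name):
    # 1. Scan the branch for the last first-key <= name: that leaf
    #    is the only one that could contain the entry.
    i = max(bisect.bisect_right(branch, name) - 1, 0)
    # 2. Scan that leaf block for the exact entry.
    for key, rowid in leaves[i]:
        if key == name:
            return rowid               # 3. the rowid points into the table
    return None

print(lookup("Smith, O"))   # row#41
```

`bisect` plays the role of scanning within a block; each level visited corresponds to one block read.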

  11. About B-tree indexes • Root and branch blocks = approx 2% of the index (small) • In frequent-hit situations, both are loaded in the data buffer all the time • Then only 2 I/Os may be required: • one to read the leaf block • one to read the table • In practice, reads in the index and in the table may each require several blocks • e.g. several similar names • the index spans two (or more) leaf blocks • each record is in a different data block

  12. Creating and Using Indexes • Important for live access, but even more so for queries over multiple tables • Value matching is a costly process in a relational DB • no pointers • the connection is made purely by comparison • one row against all other rows • If linking 2 huge tables, performance is low • All RDBs use some form of indexing • Some complexity is involved, as an index can reduce physical I/O while increasing logical I/O (i.e. CPU time)
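The "one row against all other rows" cost can be illustrated with a sketch: joining with no access structure compares every row pair, while a lookup structure on the join key (a dict standing in for an index) needs one probe per row. The table contents are invented:

```python
# Illustrative cost of value-matching joins: nested-loop comparison of
# every pair vs. one probe per row through an "index" (here a dict).

orders = [("o1", "alice"), ("o2", "bob"), ("o3", "alice")]
customers = [("alice", "London"), ("bob", "Leeds"), ("carol", "York")]

# No pointers between tables: compare each order with each customer.
comparisons = 0
nested = []
for oid, cust in orders:
    for name, city in customers:
        comparisons += 1
        if cust == name:
            nested.append((oid, city))

# With a lookup structure on the join key: one probe per order row.
index = {name: city for name, city in customers}
probes = 0
indexed = []
for oid, cust in orders:
    probes += 1
    indexed.append((oid, index[cust]))

print(comparisons, probes)   # 9 3
```

With two huge tables the nested comparison count is the product of the row counts, which is why an unindexed join between them performs so badly.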

  13. B-tree indexing • Creating an index means creating a table with X+1 columns • X = the number of columns in the index • rowid (added field) = [table block + row] • The index is then copied into consecutive blocks • PCTFREE leaves space for growth of the data (but a high value will generate many leaf blocks) • Pointers to the previous and next leaf blocks are added in the header of each block

  14. B-tree indexing (2) • Then the branch layer is built: • if the index > one block • collect the first entry + block address of each leaf block • write them into the first-level branch block (packed) • if the branch block fills up, a second level of branch blocks is started, etc. • Room is saved in branch blocks: • no forward and backward pointers in branch blocks • entries are “trimmed” to the bare minimum • first entries are omitted • See figure 6.1
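Slides 13 and 14 together can be sketched as follows: sort (key, rowid) pairs into fixed-size leaf blocks, then repeatedly take each block's first entry to build branch levels until a single root block remains. The two-entries-per-block size and the data are illustrative, far smaller than real Oracle blocks:

```python
# Toy B-tree build: pack sorted (key, rowid) entries into leaf blocks,
# then build branch levels from each block's first entry. Illustrative.

def chunk(entries, per_block):
    return [entries[i:i + per_block] for i in range(0, len(entries), per_block)]

def build_index(rows, key_col, per_block=2):
    # Leaf layer: key + rowid, sorted by key (the X+1 "columns").
    entries = sorted((row[key_col], rowid) for rowid, row in enumerate(rows))
    levels = [chunk(entries, per_block)]
    # Branch layers: first entry of each lower block, until one block holds all.
    while len(levels[-1]) > 1:
        firsts = [(blk[0][0], i) for i, blk in enumerate(levels[-1])]
        levels.append(chunk(firsts, per_block))
    return levels   # levels[0] = leaves, levels[-1] = root

rows = [{"name": n} for n in ["Eve", "Bob", "Dan", "Ann", "Carl"]]
levels = build_index(rows, "name")
print(len(levels))   # 3 levels: leaves, branch, root
```

Note how the branch levels store only first keys and block numbers, mirroring the space savings described on the slide.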

  15. Syntax • CREATE INDEX name ON table_name (field1, field2 …) PCTFREE 80; • Oracle has many utility programmes to assess the performance of indexes – use INDEX_STATS (see handout) • Practical problem: is it easy to create a new index for a large table? NO!

  16. Updating indexes • Index entries are NEVER changed in place • they are marked as deleted and re-inserted • The space made available cannot be reused until the index is rebuilt • Inserts that don’t fit split the block (rarely 50/50!) • If a block becomes empty, it is marked as free, but is never removed • Also, blocks never merge automatically

  17. Some problems • Some situations cannot be addressed with indexes • e.g. in FIFO processing (such as a queue), indexes prove counterproductive • the index may grow to absurd proportions even with a small error rate (unsuccessful processing of data) • every time a transaction is added or processed (deleted), the index must change

  18. Alternative: Bitmap indexing • Imagine the following query on a huge table: find customers living in London, with 2 cars and 3 children, occupying a 4-bed house • An index is not useful – why? • too big • if the query changes in any way => a new index is needed • maintaining a set of indexes for each possible query would just be too costly • Use a bitmap (see tables 6.1, 6.2 and 6.3)

  19. Bitmap indexes (2) • Special for data-warehouse-type DBs • Build one bitmap for each relevant parameter • Combine bitmaps using the SQL “AND” keyword • “NOT”-ing a bitmap is also possible (see tables 6.4, 6.5)
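Combining bitmaps is plain bitwise arithmetic. In this sketch Python integers stand in for the bit strings (Oracle's actual storage is compressed and split by rowid range); the predicates and row numbers are invented:

```python
# Bitmap combination: one bit per row, one bitmap per predicate.
# Python ints stand in for Oracle's compressed bitmaps (illustrative).

rows = ["r0", "r1", "r2", "r3", "r4"]

def bitmap(matches):                 # build a bitmap from matching row numbers
    bits = 0
    for i in matches:
        bits |= 1 << i
    return bits

in_london = bitmap([0, 2, 3])        # rows whose city = London
two_cars  = bitmap([2, 3, 4])        # rows with 2 cars

both = in_london & two_cars          # AND: rows matching both predicates
not_london = ~in_london & bitmap(range(len(rows)))   # NOT, masked to table size

hits = [rows[i] for i in range(len(rows)) if both >> i & 1]
print(hits)                          # ['r2', 'r3']
```

Only after the bitmaps are combined does the engine touch the table, and only for the surviving rows.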

  20. Key points to remember • What is the key advantage of a bitmap index? • What situation does it best suit? • Bitmaps can also be packed by Oracle's compression features • But their size is unpredictable – why?

  21. Example: • Table with 1,000,000 rows • Bitmap on one column that can contain one of 8 different values (e.g. city names) • Scenario 1: the data is such that all rows for the same city sit together (125,000 of each) • write the bitmap • imagine what compression can be achieved • Scenario 2: the data is such that the cities are in random order, but with the same number of each • same questions

  22. Solution • First scenario: • bitmap for the first city: 125,000 ones, with the 875,000 zeros trimmed off • size ~ 125,000 bits, or approx 15 Kbytes • full set of 8 bitmaps ≈ 122 Kbytes • Second scenario: • bitmap for the first city: a one and 7 zeros, repeated 125,000 times • size ~ 1,000,000 bits, or approx 122 Kbytes • full set ≈ 1 Mbyte • But a B-tree index on such data would be around 12 MB
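The raw sizes in the example can be recomputed directly (1 bit per row per bitmap, before Oracle's own compression; run-length compression of the clustered scenario would shrink it much further):

```python
# Recompute the raw bitmap sizes from the example: 1,000,000 rows,
# 8 city values, 1 bit per row per bitmap (before any compression).

ROWS, VALUES = 1_000_000, 8
per_value = ROWS // VALUES           # 125,000 rows per city

# Scenario 1: each city's rows are contiguous; leading/trailing zeros
# are trimmed, so each bitmap is ~125,000 significant bits.
clustered_bits = per_value
print(clustered_bits / 8 / 1024)     # ~15.3 KB per bitmap

# Scenario 2: cities in random order; there is no long run to trim,
# so each bitmap spans all 1,000,000 bits.
scattered_bits = ROWS
print(scattered_bits / 8 / 1024)     # ~122 KB per bitmap
print(VALUES * scattered_bits / 8 / 1024 / 1024)   # ~0.95 MB for all 8
```

Even the worst-case scattered layout stays an order of magnitude below the ~12 MB B-tree quoted on the slide.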

  23. Conclusion • Bitmap indexes work best when combined • They are very quick to build • up to a million rows in about 10 seconds • They work best with a limited number of distinct values + high repetition • Best way to deal with huge volumes => make a drastic selection of the interesting rows before reading the table • Warning: one entry in a bitmap = hundreds of records => locking can be crazy (OLTP systems) • => for data-warehouse-type applications (no contention)

  24. What Oracle says about it • Use an index for queries with a low hit ratio, i.e. when queries access < 2–4% of the data • Index maintenance is costly, so indexing “just in case” is silly • Analyse the type of data when deciding what kind of index to use • Do NOT put heavily-updated columns in an index • Use indexed fields in the “WHERE” clause • Queries can also be written with the NO_INDEX hint

  25. Administering indexes • Indexes degrade over time • They should stabilise around 75% efficiency, but don’t • Run the statistics: • ANALYZE INDEX name VALIDATE STRUCTURE; • ANALYZE INDEX name COMPUTE STATISTICS; • ANALYZE INDEX name ESTIMATE STATISTICS SAMPLE 1 PERCENT; • See table 6.6

  26. Partitioned tables • Partitions / partitioning / partitioned tables • For very large tables • Improved querying • Easier admin • Easier backup and recovery • The optimiser knows when partitioning is used • Partitions can also be referenced directly in SQL

  27. Creating a PT
  CREATE TABLE fred (
    id   NUMBER,
    name VARCHAR2(25),
    age  NUMBER,
    CONSTRAINT fred_pk PRIMARY KEY (id)
  )
  PARTITION BY RANGE (age) (
    PARTITION part1 VALUES LESS THAN (21),
    PARTITION part2 VALUES LESS THAN (40),
    PARTITION part3 VALUES LESS THAN (MAXVALUE)
  );

  28. Warning • The specification of a partition bound is exclusive • e.g. PARTITION BY RANGE (name) (PARTITION part1 VALUES LESS THAN (‘F’) …) implies that ‘F’ itself is excluded from part1 • MAXVALUE is a general term that picks up anything not caught so far • Works for text as well as numbers

  29. Hash partition • Only in Oracle 8i and above • Uses a numerical algorithm on the partition key to determine where to place the data • Range partitioning = consecutive values kept together • Hash = consecutive values may land in different partitions • It also spreads the data over more partitions = reduces the risk of contention

  30. What is Hash? • Imagine an 8 GB table – split into 8 × 1 GB • There is no intuitively clever way to split the data • or the obvious way is totally imbalanced • e.g. 1 partition of 7 GB + 7 of 140 MB • huge variations in performance • Randomise the breakdown of the data so the objects are of similar size: • select one column • select the number of chunks • Oracle does the rest!

  31. Mechanics of hashing • Each record is allocated to a bucket based on a key value – e.g. name = Joe • Applying the hashing function to the value Joe uniquely returns the number of the bucket where the record is located • e.g. using a prime number: • divide the KEY by a prime number • if the key is text, translate it into a numeric value first, using ASCII codes • the remainder of the division = the address on disk • if a record already sits at that address – pointer to an overflow area
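The prime-division scheme described above can be sketched directly; the choice of prime (7 buckets) and the base-256 text-to-number conversion are illustrative, not Oracle's actual hash function:

```python
# Hash placement as described on the slide: numeric key mod a prime
# gives the bucket. Text keys are first turned into a number via their
# ASCII codes. The prime (number of buckets) is illustrative.

PRIME = 7   # number of buckets / divisor

def to_number(key):
    if isinstance(key, int):
        return key
    # Treat the string's character codes as digits in base 256.
    n = 0
    for ch in key:
        n = n * 256 + ord(ch)
    return n

def bucket(key):
    return to_number(key) % PRIME    # remainder = bucket (disk address)

print(bucket("Joe"))   # 0  ("Joe" -> 4,878,181, which is divisible by 7)
print(bucket(45))      # 3  (45 % 7)
```

Collisions (two keys landing on the same remainder) are what the slide's overflow-area pointer handles.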

  32. Hash partition - SQL
  CREATE TABLE fred (
    name VARCHAR2(25) PRIMARY KEY,
    age NUMBER,
    years_abroad NUMBER
  )
  PARTITION BY HASH (age)
  PARTITIONS 2
  STORE IN (part1_fred, part2_fred);
  (the STORE IN clause is not compulsory)

  33. Sub-partitions
  CREATE TABLE fred (
    name VARCHAR2(25) PRIMARY KEY,
    age NUMBER,
    years_abroad NUMBER
  )
  PARTITION BY RANGE (years_abroad)
  SUBPARTITION BY HASH (name)
  SUBPARTITIONS 5 (
    PARTITION part1 VALUES LESS THAN (1),
    PARTITION part2 VALUES LESS THAN (3),
    PARTITION part3 VALUES LESS THAN (6),
    PARTITION part4 VALUES LESS THAN (MAXVALUE)
  );

  34. Indexing partitions • Performance requirements may mean partitioned tables should be indexed (a separate issue)
  CREATE INDEX fred_name ON fred (name)
  LOCAL (PARTITION part1, PARTITION part2, PARTITION part3, PARTITION part4);
  • LOCAL means a separate index is created for each partition of the table • The alternative is a global index holding values from different partitions • Global indexes cannot be created for hash partitions
