1 / 88

Select Operation- disk access and Indexing *Some info on slides from Dr. S. Son, U. Va

Select Operation- disk access and Indexing *Some info on slides from Dr. S. Son, U. Va. Disk access. DBs traditionally stored on disk Cheaper to store on disk than in memory Costs for: Seek time, latency, data transfer time  Disk access is page (block) oriented 2 - 4 KB page size.

Download Presentation

Select Operation- disk access and Indexing *Some info on slides from Dr. S. Son, U. Va

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Select Operation- disk access and Indexing *Some info on slides from Dr. S. Son, U. Va

  2. Disk access • DBs traditionally stored on disk • Cheaper to store on disk than in memory • Costs for: • Seek time, latency, data transfer time •  Disk access is page (block) oriented • 2 - 4 KB page size

  3. Access time • Access time is the time to randomly access a page • System initially determines if page in memory buffer (page tables, etc.) • Large disparity between disk access and memory access

  4. Select operation using table scan • If read the entire table for a select – table scan • Improvements to table scan of disk: • Parallel access • Sequential prefetch

  5. Parallel access • Linear search - all data rows read in from disk • I/O parallelism can be used (Raid) • multiple I/O read requests satisfied at the same time • stripe the data across different disks        • Problems with parallelism? • must balance disk arm load to gain maximum parallelism • requires the same total number of random I/O's, but using devices for a shorter time

  6. Sequential prefetch I/O • Retrieve one disk page after another (on same track) – (32 in DB2, varies in Oracle) • Seek time no longer a problem • Must know in advance to read 32 successive pages • Speed up of I/O by a factor of ≈10 (500 I/O's per second vs. 70)

  7. Access time • Seek time –as low as 4 ms server • Latency time –as low as 1 ms or less • Data transfer time – .4-2 ms • Solid state disks up to 100,000 I/Os per sec. – still expensive

  8. Access time for fast I/O RIO            Seq. Prefetch .004             .004                    Seek - disk arm to cylinder .001             .001                    Latency - platter to sector .0005  .016                Data transfer - Page .0055           .021                   1 page vs. 32 pages .176* seconds  .021 seconds 32 pages for both * .0055X32=.176 for 32 pages of RIO vs .021 for 32 pages of Seq. Prefetch

  9. Organizing disk space • How to store data so minimize access time if read the entire table?

  10. Disk allocation • Disk Resource Allocation for Databases (DBA has control) • Goal – contiguous sectors on disk - want data as close together as possible  to minimize seek time • No standard SQL approach, but general way to deal with allocation • Some OS allow specification of size of file and disk device

  11. Types of Files • Heap files (unordered – sequential) • Sorted files (ordered – sort key) • Hash files (hash key, hash function) • B+-trees • Storage Area Networks SAN – ERP (enterprise resource planning) and DW (data warehouses) • Storage devices configured as nodes in network – can attach/detach

  12. Tablespace Tablespace is: • Allocation medium for tables and indexes for ORACLE, DB2, etc. • Can put >1 table in a table space if accessed together • Tablespace corresponds to 1 or more OS files and can span disk devices • Usually relations cannot span disk devices

  13. DB storage structures DB Company Database Table- tspace 1 system space OS files fname1fname2 fname3 Tables Empl Dept Proj Dep EmpIndx Segments data data data data index Extents

  14. Tablespace • ORACLE DB's contain several tablespaces, including one called system - data description +  indexes + user-defined tables • default tablespace given to each user • if multiple tablespaces - better control over load balancing • can take some disk space off-line

  15. Extent • Relation composed of 1 or more extents • Extent - contiguous storage on disk • when data segment or index segment first created, given an initial extent from tablespace 10KB (5 pages) • if need more space given next contiguous extent

  16. DB storage structures DB Company Database Table- tspace 1 system space OS files fname1fname2 fname3 Tables Empl Dept Proj Dep EmpIndx Segments data data data data index Extents

  17. Extent • Can increase the size by a positive % (cannot decrease) • initial n - size of initial extent • next n - size of next • max extents - maximum number of extents • min extents - number of extents initially allocated • pct increase n - % by which next extent grows over previous one

  18. Oracle create tablespace • http://www.adp-gmbh.ch/ora/sql/create_tablespace.html

  19. Create table • Create table statement - can specify tablespace, no. of extents • When initial extent full, new extent allocated • pctfree - determine how much space in a page can be used for inserts of new rows • if pctfree =10%, inserts stop when page is 90% full • Uses another page • pctused – determines when new inserts start again • if fall below certain percentage of total, default pctused = 40%                  pctfree + pctused < 100

  20. Rows • Row layout on each disk page 1 2 3… N Row N Row N-1 … Row 1 Header info Row directory free space data rows • Header - • Row directory – row number and page byte offset • Row number is row number in page – also called slot# • Page byte offset – with varchar, row size not constant • To identify a particular row use RID (RowID) – page #, slot # [file#] slot# is number in row directory (logical #)

  21. Differences in DBMSs re: rows • ROWID can be retrieved in ORACLE but not DB2 (violates relational model rule) • ORACLE • rows can be split between pages (row record fragmentation) • Can have rows from multiple tables on same page, more info • DB2, no splitting, entire row moved to new page, need forwarding pointer

  22. Select operation using Indexes • Alternative to table scan

  23.    Why use an index?  • If use a select (or join) on the same attribute frequently • want a way to improve performance - use indexes • For example: Select from Employee where ssn = 333445555

  24. B+-tree • Most commonly used index structure type in DBs today • Based on B-tree • Good for equality and range searches • B+ tree:dynamic, adjusts gracefully under inserts and deletes. • Used to minimize disk I/O • available in DB2, ORACLE also has hash cluster, Ingres has heap structure, B-tree, isam (chain together new nodes)

  25. Index Entries (Direct search) Data Entries ("Sequence set") Structure of B+ Trees • leaf level pointers to data (RIDs) • the remaining are directory (index) nodes that point to other index nodes Fig.

  26. Example of B+Tree 10 20 40 1 2 3 10 12 20 35 40 42 50 Points to data

  27. Characteristics of B+ Tree • Order of tree (fan out) – max number of child nodes • Minimum 50% occupancy (except for root). Each node contains d/2 <= m <= d-1 entries. • Where the parameter d is the order of the tree. • Insert/delete at log F N cost; keep tree height-balanced.(F = fanout, N = # leaf pages) • Supports equality and range-searches efficiently

  28. Cost of I/O for B+-tree • One index node is one page • If tree with depth of 3, 3 I/Os to get pointer to data • Read in index node can remain in memory • likely since frequent access to upper -level nodes of actively used B+-trees

  29. B+ Trees in Practice • Typical order: between 100-200 children • Typical fill-factor: 2/3 full (66.6%) • average fanout = 133 (if 200 children) • Typical capacities: • Height 4: 1334 = 312,900,700 records • Height 3: 1333 = 2,352,637 records • Can often hold top levels in buffer pool: • Level 1 = 1 page = 8 Kbytes • Level 2 = 133 pages = 1 Mbyte • Level 3 = 17,689 pages = 133 MBytes

  30. Why B+-tree • Directory structure - retrieve range of values efficiently • search for leftmost index entry Si such that X <= Si • Index entries always in sequence by value - can use sequential prefetch on index • Index entries shorter than data rows - less I/O

  31. B+-tree • Balancing of B+-trees - insert, delete • Nodes usually not full • Utilities to reorganize to lower disk I/O • Most systems allow nodes to become depopulated- no automatic algorithm to balance • Average node below root level 71% full in active growing B+-trees

  32. Duplicate key values • Duplicate key values in index • leaf nodes have sibling pointers • but a delete of a row that has a heavily duplicated key entails a long search through the leaf-level of the B+-tree • Index compression - with multiple duplicates | header info | PrX keyval RID RID ... RID | PrX keyval RID…RID| where PrX is count of RID values

  33. Create Index    Options:         multiple columns         tablespace         storage - initial extents, etc.         percent free default = 10 % of each page left unfilled (creation) free page (1 free page for every n index pages during creation)

  34. Types of indexes (textbook) • Primary index - key field is a candidate key (must be unique) – data file ordered by key field • Clustering index - key field is not unique, data file is ordered – all records with same values on same pages • Secondary index - non-clustering index – data file not ordered • First record in the data page (or block) is called the anchor record • Non-dense index - pointer in index entry points to anchor • Dense index - pointer to every record in the file

  35. Clustering • Efficiency advantage        read in a page, get all of the rows with the same value • clustering is useful for range queries         e.g.  between keyval1 and keyval2

  36. Clustering • Can only cluster table by 1 clustering index at a time • In SQL server • creates clustered index on PK automatically if no other clustered index on table and PK nonclustered index not specified • In DB2 – • if the table is empty, rows sorted as placed on disk • subsequent insertions not clustered, must use REORG • In Oracle- • Cluster index – now available for PK in 10g • Define a cluster to create cluster index for 2 tables

  37. Please help me to remember to TURN OFF THE PROJECTOR!!

  38. Indexes vs. table scan • To illustrate the difference between table scan, secondary index (non clustered) and clustered index Assume 10 M customers, 200 cities 2KB/page, row = 100 bytes, 20 rows/page             Select *             From Customers             Where city = Birmingham 1/200 * 10M if assume selectivity = 1/200 50,000 customers in a city

  39. Rules of Thumb for I/O • Assume slightly slower times than before: • Random I/O – 160 pages/second, .00625 • Sequential prefetch I/O – 1600 pages/second, .000625 Will discuss later: • List prefetch I/O – 400 pages/second, .0025

  40. Table Scan Table Scan - read entire table If used an random I/O (RIO) – WHICH ONE WOULD NEVER DO 10,000,000/20 = 500,000 pages  500,000*RIO = 3125 Instead, it makes more sense to use: sequential prefetch (SP) read 32 pages at a time 500,000*SP = 312

  41. Clustering Index Clustering Index – • All entries for B'ham clustered on same pages • 50,000/20 = 2500 data pages (with 20 rows per page) • Assume 3 upper nodes of the tree   • Assume 1000 index entries per leaf node, read 50000/1000 = 50 index pages 3 + 50000/1000 + 50,000/20 = number of pages to access • If top 3 levels of tree in memory, count access time as 0 • Access time: (3*0) + (50*SP) + (2500*SP) = 2,550 * .000625 = 1.6

  42. Secondary Index • In the worst case 1 entry for B'ham per page • 50,000 data pages pages (10M/200) 3 + 50 + 50,000 = 50, 053 number of accesses (3*0)+(50*SP) + (50,000*RIO)=312.5 access time REALLY slow – see next slide for a better solution! Use List Prefetch instead of RIO

  43. List Prefetch – Better solution Create list of data pages to access Pages not necessarily in contiguous sequential order System orders pages to minimize disk I/O E.g. elevator algorithm for disk request scheduling Using list prefetch (LP) 0+(50*SP)+50,000*LP=125.03 access time

  44. % Free • Redo the previous calculations assuming relations created with 50% free option specified.

  45. Creating Indexes • When determining what indexes to create consider: • workload - mix of queries and frequencies of requests • 20% of requests are updates, etc. • can create lots of indexes but: • cost to create • insertions • initial load time high if a large table • index entries can become longer and longer as multiple columns included

  46. Multiple Indexes • More than one index on a relation • e.g. age – one index, class - one index, gender - one index

  47. Composite Index • One index based on more than one attribute Create Index index_name on Table (col1, col2,... coln) •    Composite index entry - values for each attribute             age, class, gender             entry in index is:  C1, C2, C3, RID

  48. Using Indexes • System must decide if to use index • What if more than one index, which one? • What if composite index?

More Related