1 / 46

Foundations of Database Systems

Foundations of Database Systems. Indexing Instructor: Zhijun Wang. DBMS and Applications. 1. Announcements. Quiz#2 will be given next week. DBMS and Applications. 2. Physical Database Design . DBMS and Applications. 3. File structures.

shiri
Download Presentation

Foundations of Database Systems

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Foundations of Database Systems Indexing Instructor: Zhijun Wang DBMS and Applications 1

  2. Announcements Quiz#2 will be given next week DBMS and Applications 2

  3. Physical Database Design DBMS and Applications 3

  4. File structures Selecting among alternative file structures is one of the most important choices in physical database design. In order to choose intelligently, you must understand characteristics of available file structures. DBMS and Applications 4

  5. Sequential Files Simplest kind of file structure Unordered: insertion order Ordered: key order Simple to maintain Provide good performance for processing large numbers of records DBMS and Applications 5

  6. Unordered Sequential File DBMS and Applications 6

  7. Ordered Sequential File DBMS and Applications 7

  8. Indexes Indexes are special data structures used to improve database performance SQL Server automatically creates an index on all primary and foreign keys Additional indexes may be assigned on other columns that are: Frequently used in WHERE clauses Used for sorting data DBMS and Applications 8

  9. Indexes SQL Server supports two kinds of indexes: Clustered index: the data are stored in the bottom level of the index and in the same order as that index Nonclustered index: the bottom level of an index contains pointers to the data Clustered indexes are faster than nonclustered indexes for updating and retrieval DBMS and Applications 9

  10. Creating an Index: By GUI in Enterprise Manager DBMS and Applications 10

  11. Hash Files Support fast access unique key value Converts a key value into a physical record address Mod function: typical hash function Divisor: large prime number close to the file capacity Physical record number: hash function plus the starting physical record number DBMS and Applications 11

  12. Example: Hash Function Calculations for StdSSN Key Assume the file capacity is 100, 97 is the biggest prime number less than to 100. Physical Record DBMS and Applications 12

  13. Hash File after Insertions DBMS and Applications 13

  14. Linear Probe Collision Handling During an Insert Operation DBMS and Applications 14

  15. Multi-Way Tree (Btrees) Files A popular file structure supported by most DBMSs. Btree provides good performance on both sequential search and key search. DBMS and Applications 15

  16. Properties of Btrees Balanced: all leaf nodes reside on the same level Bushy: the number of branches is large, 50-200 Block-oriented: each node in a tree is a block Dynamic: the shape of a Btree is changed as logic records are inserted and deleted Ubiquitous: the Btree is a widely implemented and used file strcture. DBMS and Applications 16

  17. Structure of a Btree of Height 3 DBMS and Applications 17

  18. Btree Node Containing Keys and Pointers DBMS and Applications 18

  19. Btree Insertion Examples DBMS and Applications 19

  20. Btree Deletion Examples DBMS and Applications 20

  21. B+ Tree: The Most Widely-used Index Index Entries (Direct search) Data Entries ("Sequence set") • Supports equality and range-searches efficiently. • Minimum 50% occupancy (except for root). • Each node contains d <= m <= 2d entries. • The parameter d is called the order of the tree. DBMS and Applications 21

  22. Root 33 24 14 19 39* 22* 24* 27* 38* 3* 5* 19* 20* 29* 33* 34* 2* 7* 14* 16* An Example of a B+ Tree • Search begins at root, and key comparisons direct it to a leaf. • Search for 5*, 15*, all data entries >= 24* ... • Based on the search for 15* in the appropriate leaf, we know it is not in the tree! DBMS and Applications 22

  23. Inserting a Data Entry into a B+ Tree • Find correct leaf L. • Put data entry onto L. • If L has enough space, done! • Else, must splitL (into L and a new node L2) • Redistribute entries evenly, copy upmiddle key. • Insert index entry (into parent of L) pointing to L2. • This can happen recursively • To split index node, redistribute entries evenly, but push upmiddle key. (Contrast with leaf splits.) • Splits “grow” tree; root split increases height. • Tree growth: gets wider or one level taller at top. DBMS and Applications 23

  24. Entry to be inserted in parent node. Entry to be inserted in parent node. (Note that 19 is pushed up and Only appears once in the index. Contrast this with a leaf split.) (Note that 5 is s copied up and 5 continues to appear in the leaf.) 19 5* 3* 7* 2* 8* 24 33 5 14 Inserting 8* into Example B+ Tree • Observe how minimum occupancy is guaranteed in both leaf and index pg splits. • Note difference between copy-upand push-up; be sure you understand the reasons for this. DBMS and Applications 24

  25. Root 19 24 33 5 14 39* 2* 3* 19* 20* 22* 24* 27* 38* 5* 7* 8* 29* 33* 34* 14* 16* Example B+ Tree After Inserting 8* • Notice that root was split, leading to increase in height. • In this example, we can avoid split by re-distributing entries; • however, this is usually not done in practice. DBMS and Applications 25

  26. Deleting a Data Entry from a B+ Tree • Start at root, find leaf L where entry belongs. • Remove the entry. • If L is at least half-full, done! • If L has only d-1 entries, • Try to re-distribute, borrowing from sibling (adjacent node with same parent as L). • If re-distribution fails, mergeL and sibling. • If merge occurred, must delete entry (pointing to L or sibling) from parent of L. • Merge could propagate to root, decreasing height. DBMS and Applications 26

  27. Root 19 27 33 5 14 39* 2* 3* 19* 24* 27* 29* 38* 5* 7* 8* 33* 34* 14* 16* Example Tree After (Inserting 8*, Then) Deleting 20* and 22* ... • Deleting 20* is easy. • Deleting 22* is done with re-distribution. • Notice how middle key (27) is copied up. DBMS and Applications 27

  28. ... And Then Deleting 24* Root 5 14 19 33 33 39* 22* 27* 38* 29* 33* 34* 39* 3* 19* 38* 2* 5* 7* 8* 27* 33* 34* 29* 14* 16* • Must merge leaf nodes. • Observe ‘toss’of index entry 27 (on right), and ‘pull down’of index entry 19 (below). DBMS and Applications 28

  29. Example of Non-leaf Re-distribution 2* 3* 39* 5* 7* 8* 38* 17* 18* 20* 21* 22* 27* 29* 33* 34* 14* 16* Root 22 33 17 20 5 14 • Tree is shown below during deletion of 24*. • In contrast to previous example, can re-distribute entry from left child of root to right child. DBMS and Applications 29

  30. After Re-distribution Root 17 22 33 5 13 20 2* 3* 39* 5* 7* 8* 38* 17* 18* 20* 21* 22* 27* 29* 33* 34* 14* 16* • Intuitively, entries are re-distributed by ‘pushingthrough’the splitting entry in the parent node. • It suffices to re-distribute index entry with key 20; we’ve re-distributed 17 as well for illustration. DBMS and Applications 30

  31. Cost of Operations The height of Btree dominates the number of physical record accesses operation. Logarithmic search cost Upper bound of height: log function’ Log base: minimum number of keys in a node The cost to insert a key = [the cost to locate the nearest key] + [the cost to change nodes]. DBMS and Applications 31

  32. B+Tree Provides improved performance on sequential and range searches. In a B+tree, all keys are redundantly stored in the leaf nodes. To ensure that physical records are not replaced, the B+tree variation is usually implemented. DBMS and Applications 32

  33. Index Matching Determining usage of an index for a query Complexity of condition determines match. Single column indexes: =, <, >, <=, >=, IN <list of values>, BETWEEN, IS NULL, LIKE ‘Pattern’ (meta character not the first symbol) Composite indexes: more complex and restrictive rules DBMS and Applications 33

  34. Bitmap Index Can be useful for stable columns with few values Bitmap: String of bits: 0 (no match) or 1 (match) One bit for each row Bitmap index record Column value Bitmap DBMS converts bit position into row identifier. DBMS and Applications 34

  35. Bitmap Index Example Bitmap Index on FacRank Faculty Table DBMS and Applications 35

  36. Bitmap Join Index Bitmap identifies rows of a related table. Represents a precomputed join Can define for a join column or a non-join column Typically used in query dominated environments such as data warehouses DBMS and Applications 36

  37. Summary of File Structures DBMS and Applications 37

  38. Index Selection Most important decision Difficult decision Choice of clustered and nonclustered indexes DBMS and Applications 38

  39. Clustering Index Example DBMS and Applications 39

  40. Nonclustering Index Example DBMS and Applications 40

  41. Inputs and Outputs of Index Selection DBMS and Applications 41

  42. Trade-offs in Index Selection Balance retrieval against update performance Nonclustering index usage: Few rows satisfy the condition in the query Join column usage if a small number of rows result in child table Clustering index usage: Larger number of rows satisfy a condition than for nonclustering index Use in sort merge join algorithm to avoid sorting More expensive to maintain DBMS and Applications 42

  43. Difficulties of Index Selection Application weights are difficult to specify. Distribution of parameter values needed Behavior of the query optimization component must be known. The number of choices is large. Index choices can be interrelated. DBMS and Applications 43

  44. Selection Rules Rule 1: A primary key is a good candidate for a clustering index. Rule 2: To support joins, consider indexes on foreign keys. Rule 3: A column with many values may be a good choice for a non-clustering index if it is used in equality conditions. Rule 4: A column used in highly selective range conditions is a good candidate for a non-clustering index. DBMS and Applications 44

  45. Selection Rules Rule 5: A frequently updated column is not a good index candidate. Rule 6: Volatile tables (lots of insertions and deletions) should not have many indexes. Rule 7: Stable columns with few values are good candidates for bitmap indexes if the columns appear in WHERE conditions. Rule 8: Avoid indexes on combinations of columns. Most optimization components can use multiple indexes on the same table. DBMS and Applications 45

  46. Index Creation To create the indexes, the CREATE INDEX statement can be used. The word following the INDEX keyword is the name of the index. CREATE INDEX is not part of SQL:1999. Example: DBMS and Applications 46

More Related