
CPS216: Advanced Database Systems

Learn about optimizing data access operators to enhance efficiency & performance in database systems. Topics include single-attribute indexes, B-trees, hash-based indexes, and more.

dbrunelle


Presentation Transcript


  1. CPS216: Advanced Database Systems • Notes 04: Operators for Data Access • Shivnath Babu

  2. Problem • Relation: Employee (ID, Name, Dept, …) • 10 M tuples • (Filter) Query: SELECT * FROM Employee WHERE Name = 'Bob'

  3. Solution #1: Full Table Scan • Storage: • Employee relation stored in contiguous blocks • Query plan: • Scan the entire relation, output tuples with Name = 'Bob' • Cost: • Size of each record = 100 bytes • Size of relation = 10 M × 100 bytes = 1 GB • Time at 20 MB/s = 50 s ≈ 1 minute
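The scan cost on this slide can be checked with a few lines of arithmetic. This is only a sketch using the slide's figures; the variable names are illustrative, and 1 MB is taken to mean 10^6 bytes, which is an assumption.

```python
# Back-of-the-envelope cost of a full table scan, using the slide's numbers.
NUM_TUPLES = 10_000_000              # 10 M tuples
RECORD_BYTES = 100                   # size of each record
SCAN_BYTES_PER_S = 20_000_000        # 20 MB/s, assuming 1 MB = 10**6 bytes

relation_bytes = NUM_TUPLES * RECORD_BYTES            # total size of the relation
scan_seconds = relation_bytes / SCAN_BYTES_PER_S      # sequential read time

print(f"relation size: {relation_bytes / 10**9:.1f} GB")
print(f"scan time:     {scan_seconds:.0f} s")
```

The result is 50 seconds, which the slide rounds to about one minute.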

  4. Solution #2 • Storage: • Employee relation sorted on Name attribute • Query plan: • Binary search

  5. Solution #2 • Cost: • Size of a block: 1024 bytes • Number of records per block: ⌊1024 / 100⌋ = 10 • Total number of blocks: 10 M / 10 = 1 M • Blocks accessed by binary search: log2(1 M) ≈ 20 • Total time at 20 ms per random block access: 20 × 20 ms = 400 ms
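The binary-search estimate follows the same pattern. The sketch below reproduces the slide's numbers; the 20 ms per random block read is the slide's own figure, and the variable names are illustrative.

```python
import math

BLOCK_BYTES = 1024
RECORD_BYTES = 100
NUM_TUPLES = 10_000_000
RANDOM_IO_MS = 20                     # cost of one random block read (from the slide)

records_per_block = BLOCK_BYTES // RECORD_BYTES   # whole records that fit in a block
num_blocks = NUM_TUPLES // records_per_block      # blocks in the sorted file
probes = math.ceil(math.log2(num_blocks))         # blocks touched by binary search
total_ms = probes * RANDOM_IO_MS

print(records_per_block, num_blocks, probes, total_ms)
```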

  6. Solution #2: Issues • Filters on different attributes: SELECT * FROM Employee WHERE Dept = 'Sales' • Inserts and Deletes

  7. Indexes • Data structures that efficiently evaluate a class of filter predicates over a relation • Class of filter predicates: • Single or multiple attributes (index-key attributes) • Range and/or equality predicates • (Usually) independent of physical storage of relation: • Multiple indexes per relation

  8. Indexes • Disk resident • Too large to fit in memory • Persistent • Updated when indexed relation is updated • Relation updates become costlier • Queries become cheaper

  9. Problem • Relation: Employee (ID, Name, Dept, …) • (Filter) Query: SELECT * FROM Employee WHERE Name = 'Bob' • Single-Attribute Index on Name that supports equality predicates

  10. Roadmap • Motivation • Single-Attribute Indexes: Overview • Order-based Indexes • B-Trees • Hash-based Indexes • Extensible Hashing • Linear Hashing • Multi-Attribute Indexes (Chapter 14 GMUW, May cover in future)

  11. Single Attribute Index: General Construction • [Figure: relation with attributes A and B, holding tuples (a_1, b_1), (a_i, b_i), (a_2, b_2), (a_n, b_n)]

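  12. Single Attribute Index: General Construction • [Figure: index entries a_1, a_i, a_2, a_n, each pointing to its tuple (a_1, b_1), (a_i, b_i), (a_2, b_2), (a_n, b_n) in the relation with attributes A and B] • Predicates: A = val; A > low, A < high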

  13. Exceptions • Sparse Indexes • Require specific physical layout of relation • Example: Relation sorted on indexed attribute • More efficient

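  14. Single Attribute Index: General Construction • Textbook: Dense Index • [Figure: index entries a_1, a_i, a_2, a_n, each pointing to its tuple (a_1, b_1), (a_i, b_i), (a_2, b_2), (a_n, b_n) in the relation with attributes A and B] • Predicates: A = val; A > low, A < high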

  15. Single Attribute Index: General Construction • [Figure: index entries a_1, a_i, a_2, a_n] • Predicates: A = val; A > low, A < high • How do we organize (attribute, pointer) pairs? • Idea: Use dictionary data structures • Issue: Disk resident?
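One way to picture the "dictionary of (attribute, pointer) pairs" idea is an in-memory sorted list searched with `bisect`. This is a minimal sketch, not the disk-resident structure the slide goes on to ask about; the toy relation, the use of list positions as record pointers, and the function names are all assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right

# Toy relation R(A, B); list positions stand in for record pointers (assumption).
relation = [("d", 4), ("a", 1), ("c", 3), ("b", 2), ("c", 5)]

# Dense index: one (attribute value, pointer) pair per tuple, sorted by value.
index = sorted((a, ptr) for ptr, (a, _b) in enumerate(relation))
keys = [a for a, _ptr in index]

def lookup_eq(val):
    """Return tuples with A = val."""
    i = bisect_left(keys, val)
    out = []
    while i < len(keys) and keys[i] == val:
        out.append(relation[index[i][1]])   # follow the pointer into the relation
        i += 1
    return out

def lookup_range(low, high):
    """Return tuples with low < A < high (both bounds exclusive)."""
    i = bisect_right(keys, low)
    j = bisect_left(keys, high)
    return [relation[ptr] for _a, ptr in index[i:j]]

print(lookup_eq("c"))
print(lookup_range("a", "d"))
```

The relation itself stays unsorted; only the index is ordered, which is what lets several indexes coexist on one relation.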

  16. Roadmap • Motivation • Single-Attribute Indexes: Overview • Order-based Indexes • B-Trees • Hash-based Indexes • Extensible Hashing • Linear Hashing • Multi-Attribute Indexes (Next class)

  17. B-Trees • Adaptation of search tree data structure • 2-3 trees • Supports range predicates (and equality)

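  18. Use Binary Search Tree Directly? • [Figure: binary search tree with root 71, children 32 and 83, and leaves 16, 54, 74, 92, built over the sorted keys 16, 32, 54, 71, 74, 83, 92]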

  19. Use Binary Search Tree Directly? • Store records of type <key, left-ptr, right-ptr, data-ptr> • Remember position of root • Question: will this work? • Yes • But we can do better!

  20. Use Binary Search Tree Directly? • Number of keys: 1 M ≈ 2^20 • Number of levels: log2(2^20) = 20 • Total cost of index lookup: 20 random disk I/Os • 20 × 20 ms = 400 ms • B-Tree: less than 3 random disk I/Os

  21. B-Tree vs. Binary Search Tree • B-tree node with keys k1, k2, k3, …, k40: 1 random I/O prunes tree by a factor of 40 • Binary search tree node with key k: 1 random I/O prunes tree by half
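The effect of fan-out on tree height can be sketched with a small helper. `levels_needed` is a hypothetical function written for this note; it counts how many levels a tree of a given fan-out needs before it can separate a given number of keys, and ignores caching and partially filled nodes.

```python
def levels_needed(num_keys, fanout):
    """Smallest number of levels such that fanout**levels >= num_keys."""
    levels, reachable = 0, 1
    while reachable < num_keys:
        reachable *= fanout     # each extra level multiplies the reachable keys
        levels += 1
    return levels

NUM_KEYS = 2**20                # about 1 M keys, as on the previous slide

print(levels_needed(NUM_KEYS, 2))    # binary search tree: halves per I/O
print(levels_needed(NUM_KEYS, 40))   # node with 40 keys: prunes by 40 per I/O
```

With these inputs the helper reports 20 levels for fan-out 2 and 4 for fan-out 40. The number of disk reads an actual lookup needs also depends on the real fan-out and on how many upper levels are kept in memory, neither of which the slides specify.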

  22. B-Tree Example • [Figure: keys 15, 36, 57, 63, 76, 87, 92, 100]

  23. B-Tree Example • [Figure: B-tree with root key 63; internal nodes with keys 36 and 84, 91; leaf keys 15, 36, 57, 63, 76, 87, 92, 100; one slot marked null]

  24. Meaning of Internal Node • Internal node with keys 84, 91 • First pointer: key < 84 • Second pointer: 84 ≤ key < 91 • Third pointer: 91 ≤ key
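The pointer-selection rule for an internal node maps directly onto `bisect_right`. The sketch below assumes keys are kept sorted inside the node; `child_index` is an illustrative name, not part of the course material.

```python
from bisect import bisect_right

def child_index(node_keys, search_key):
    """Index of the child pointer to follow for search_key.

    With keys [k1, ..., km], pointer 0 covers key < k1, pointer i covers
    k_i <= key < k_(i+1), and pointer m covers k_m <= key.
    """
    return bisect_right(node_keys, search_key)

node = [84, 91]
for key in (70, 84, 87, 91, 95):
    print(key, "-> pointer", child_index(node, key))
```

Using `bisect_right` rather than `bisect_left` is what sends a search key equal to a separator (84 or 91 here) to the pointer on its right, matching the "≤" on the slide.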

  25. B-Tree Example • [Figure: B-tree with root key 63; internal nodes with keys 36 and 84, 91; leaf keys 15, 36, 57, 63, 76, 87, 92, 100; one slot marked null]

  26. Meaning of Leaf Nodes • Leaf with keys 63, 76 • Pointer to record 63 • Pointer to record 76 • Pointer to next leaf

  27-30. Equality Predicates • key = 87 • [Figure, animated over four slides: lookup of key 87 in the example B-tree with root key 63; internal nodes with keys 36 and 84, 91; leaf keys 15, 36, 57, 63, 76, 87, 92, 100]
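An equality lookup descends from the root to a single leaf. The sketch below rebuilds the example tree in memory; the grouping of leaf keys into nodes is reconstructed from the internal-node keys and is an assumption, as are the dict-based node layout and the `rec<key>` strings standing in for record pointers.

```python
from bisect import bisect_right

def leaf(*keys):
    # Each leaf key is paired with a stand-in record pointer.
    return {"leaf": True, "keys": list(keys), "ptrs": [f"rec{k}" for k in keys]}

def internal(keys, children):
    return {"leaf": False, "keys": keys, "children": children}

# Example tree; leaf grouping is an assumption inferred from the separators.
root = internal([63], [
    internal([36], [leaf(15), leaf(36, 57)]),
    internal([84, 91], [leaf(63, 76), leaf(87), leaf(92, 100)]),
])

def search(node, key):
    """Return (record pointer or None, list of node key-lists visited)."""
    path = []
    while not node["leaf"]:
        path.append(node["keys"])
        node = node["children"][bisect_right(node["keys"], key)]
    path.append(node["keys"])
    if key in node["keys"]:
        return node["ptrs"][node["keys"].index(key)], path
    return None, path

print(search(root, 87))
print(search(root, 60))
```

Each node visited corresponds to one block read, so the length of the returned path is the I/O cost of the lookup in this model.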

  31-36. Range Predicates • 57 ≤ key < 95 • [Figure, animated over six slides: range scan for 57 ≤ key < 95 in the example B-tree with root key 63; internal nodes with keys 36 and 84, 91; leaf keys 15, 36, 57, 63, 76, 87, 92, 100]
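A range predicate can be evaluated by descending to the leaf that would hold the lower bound and then following next-leaf pointers until a key reaches the upper bound. The sketch below is one way to express that; the class names, the leaf grouping of the example keys, and the choice to return keys rather than records are assumptions made for illustration.

```python
from bisect import bisect_right

class Leaf:
    def __init__(self, keys):
        self.keys = keys
        self.next = None            # next-leaf pointer

class Internal:
    def __init__(self, keys, children):
        self.keys = keys
        self.children = children

# Example tree; leaf grouping is an assumption inferred from the separators.
leaves = [Leaf([15]), Leaf([36, 57]), Leaf([63, 76]), Leaf([87]), Leaf([92, 100])]
for left, right in zip(leaves, leaves[1:]):
    left.next = right               # chain the leaves left to right

root = Internal([63], [Internal([36], leaves[:2]), Internal([84, 91], leaves[2:])])

def range_scan(root, low, high):
    """Return keys with low <= key < high, in sorted order."""
    node = root
    while isinstance(node, Internal):           # descend towards the lower bound
        node = node.children[bisect_right(node.keys, low)]
    result = []
    while node is not None:                     # walk the leaf chain
        for k in node.keys:
            if k >= high:
                return result                   # past the upper bound: stop
            if k >= low:
                result.append(k)
        node = node.next
    return result

print(range_scan(root, 57, 95))
```

Only one root-to-leaf descent is needed; the rest of the work is a sequential walk along the leaf level.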

  37. General B-Trees • Fixed parameter: n • Key slots per node: n • Pointer slots per node: n + 1

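  38. B-Tree Example • n = 2 • [Figure: B-tree with root key 63; internal nodes with keys 36 and 84, 91; leaf keys 15, 36, 57, 63, 76, 87, 92, 100; one slot marked null]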

  39. General B-Trees • Fixed parameter: n • Key slots per node: n • Pointer slots per node: n + 1 • All leaves at same depth • All (key, record pointer) pairs in leaves

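  40. B-Tree Example • n = 2 • [Figure: B-tree with root key 63; internal nodes with keys 36 and 84, 91; leaf keys 15, 36, 57, 63, 76, 87, 92, 100; one slot marked null]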

  41. General B-Trees: Space related constraints • Use at least: • Root: 2 pointers • Internal: ⌈(n+1)/2⌉ pointers • Leaf: ⌊(n+1)/2⌋ pointers to data
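The minimum-occupancy rules can be tabulated for small n. The rounding brackets did not survive in this transcript, so the sketch assumes the usual textbook convention: ceiling for internal nodes and floor for leaves. `min_pointers` is an illustrative helper name.

```python
def min_pointers(n):
    """Minimum pointers in use per node type for a B-tree with n key slots.

    Assumes ceil((n+1)/2) for internal nodes and floor((n+1)/2) data
    pointers for leaves; the root needs at least 2 pointers.
    """
    internal = (n + 2) // 2     # integer form of ceil((n + 1) / 2)
    leaf = (n + 1) // 2         # integer form of floor((n + 1) / 2)
    return {"root": 2, "internal": internal, "leaf": leaf}

for n in (2, 3, 4):
    print(n, min_pointers(n))
```

For n = 2 this allows a leaf holding a single key, and for n = 3 it gives a minimum of two pointers for both internal nodes and leaves.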

  42. n = 3 • Internal node: min 1 key (15), max 3 keys (5, 15, 21) • Leaf node: min 2 keys (31, 42), max 3 keys (31, 42, 56)

  43. Leaf Nodes • n key slots • (n+1) pointer slots

  44. Leaf Nodes • n key slots: k_1, k_2, k_3, …, k_m in use; remaining slots unused • (n+1) pointer slots: pointers to record of k_1, record of k_2, …, record of k_m, plus the next-leaf pointer

  45. Leaf Nodes • n key slots: k_1, k_2, k_3, …, k_m in use; remaining slots unused • (n+1) pointer slots: pointers to record of k_1, record of k_2, …, record of k_m, plus the next-leaf pointer • Constraint: m ≥ ⌊(n+1)/2⌋

  46. Internal Nodes • n key slots • (n+1) pointer slots

  47. Internal Nodes • n key slots: k_1, k_2, k_3, …, k_m in use; remaining slots unused • (n+1) pointer slots: key < k_1; k_1 ≤ key < k_2; …; k_m ≤ key

  48. Internal Nodes • n key slots: k_1, k_2, k_3, …, k_m in use; remaining slots unused • (n+1) pointer slots: key < k_1; k_1 ≤ key < k_2; …; k_m ≤ key • Constraint: (m+1) ≥ ⌈(n+1)/2⌉

  49. Root Node • n key slots: k_1, k_2, k_3, …, k_m in use; remaining slots unused • (n+1) pointer slots: key < k_1; k_1 ≤ key < k_2; …; k_m ≤ key • Constraint: (m+1) ≥ 2

  50. Limits • Why the specific limits ⌈(n+1)/2⌉ and ⌊(n+1)/2⌋? • Why different limits for leaf and internal nodes? • Can we reduce each limit? • Can we increase each limit? • What are the implications?
