1 / 12

Databases and Information Systems 1 Part 3: Storage Structures and Indices

Databases and Information Systems 1 Part 3: Storage Structures and Indices. Prof. Dr. Stefan Böttcher Fakultät EIM, Institut für Informatik Universität Paderborn WS 2009 / 2010. Contents: database buffer storage structures indices. Database buffer - why?.

ila-farrell
Download Presentation

Databases and Information Systems 1 Part 3: Storage Structures and Indices

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Databases and Information Systems 1Part 3: Storage Structures and Indices Prof. Dr. Stefan Böttcher Fakultät EIM, Institut für Informatik Universität Paderborn WS 2009 / 2010 • Contents: • database buffer • storage structures • indices

  2. Database buffer - why? unsafe data safe data database load database in main memory on disk store DB in main memory  organized in pages disk read or write = 100.000-1.000.000 main memory accesses  optimize / reduce disk access  changes of tuples are collected in pages + page is stored on disk from time to time

  3. Database buffer - own management Log unsafe data safe data database load database in main memory on disk log and store • DBMS has own main memory management because of: • controlled page replacement (needed for recoverability) • page replacement optimized for query optimization •  may store uncommitted data on disk •  recovery uses additional Log • to be able to restore consistent data

  4. Coupling database buffer and application program database load database in main memory on disk store Communication area application programin main memory data exchange between application program and main memory part of DB Application program must not directly read or write to database in main memory  Communication area can be used to write to database in main memory

  5. Storage structures and indices secondary index primary index database

  6. Storage Structures - B-trees p = maximum number of sub-tree pointers q = given number of pointers, qp q-1 = number of keys unique tree depth at least p/2 pointers per node (except root and leaf nodes) Tree Ptr 1 Key 1 Data Ptr 1 Tree Ptr 2 Key q-1 Data Ptr q-1 Tree Ptr q Tree Ptr 1 Key 1 Data Ptr 1 Tree Ptr 2 Key q-1 Data Ptr q-1 Tree Ptr q B-Tree 1 B-Tree 2 B-Tree q B-Tree 1 B-Tree 2 B-Tree q Data Data Data Data key in sub-B-tree K < key K  key in sub-B-Tree K+1 key in sub-B-tree K < key K  key in sub-B-Tree K+1

  7. Example of B-trees • 64 bit–addresses for data, i.e. 8 Byte for each pointer • 4K = 4096 Byte per disk block (and main memory page) • 4 Byte for Integer-key  at most 2^32 values • ==> pages can store (p-1) triples of • (pointer to left sub-tree , key , pointer to data record) • plus pointer to right-most sub-tree • ==> fan-out of the tree is (4096-8) div ( 8+4+8 ) + 1 = 205 • each page can address at most 204 data records • B-tree depth items addressable at least / at most • 1 0 204 • 2 2*102 205*204  4*104 • 3 2*102*102 205*205*204  8*106 • 4 2*102*102*102 205*205*205*204  1.7*109

  8. Improvement: B+-trees - inner nodes p = maximum number of pointers q = given number of pointers, qp q-1 = number of keys unique tree depth at least p/2 pointers per node (except root and leaf nodes) Tree Ptr 1 Key 1 Data Ptr 1 Tree Ptr 2 Key q-1 Data Ptr q-1 Tree Ptr q B+-Tree 1 B+-Tree 2 B+-Tree q Data Data Key in Sub-B+-Tree K < Key K  Key in Sub-B+-Tree K+1

  9. Leaf nodes of B+-Trees • contain pointers to the data • do not contain any pointer to a sub-tree • contain a pointer to the next leaf node Tree Ptr 1 Key 1 Data Ptr 1 Tree Ptr 2 Key q-1 Data Ptr q-1 Leaf- Ptr q Data Data keys correspond with keys in the data.

  10. Root nodes of B+-trees p = maximum number of pointers minimum number of pointers: 2 (except for trivial case of less than 2 data records, where root node is leaf node) trivial case is ignored here

  11. Example of B+-trees • 64 bit–addresses for data, i.e. 8 Byte for each pointer • 4K = 4096 Byte per disk block (and main memory page) • 4 Byte for Integer-key  at most 2^32 values • ==> leaf pages can store • one pointer to next leaf page (8 Byte) and • 4096 div ( 4+8 ) = 340 (key, data pointer) - pairs • ==> fan-out 340 • internal pages can store • one pointer to last sub-tree (8 Byte) and • 4096 div ( 4+8 ) = 340 (Key, data pointer) - pairs • ==> fan-out 341 • B+-tree depth items addressable at least / at most • 1 0 340 • 2 2*170 341*340  105 • 3 2*171*170 341*341*340  4* 107 • 4 2*171*171*170 341*341*341*340  1010

  12. Storage Structures - Hashing hash-function h : key  bucket (=data container) insert: full ?  overflow container search: also in  overflow container h(key) overflow container

More Related