CS4432: Database Systems II

CS4432: Database Systems II

Index definition in SQL • Createindex name on rel (attr) (Check online for index definitions in SQL) • Drop INDEX name

Note ATTRIBUTE LIST MULTIKEY INDEX e.g., CREATEINDEX foo ON R(A,B,C)

Multi-key Index Motivation: Find records where DEPT = “Toy” AND SAL > 50k

Strategy I: • Use one index, say Dept. • Get all Dept = “Toy” records and check their salary I1

Strategy II: • Use 2 Indexes; Manipulate Pointers Toy Sal > 50k

Strategy III: • Multiple Key Index One idea: I2 I3 I1

Art Sales Toy Example 10k 15k Example Record Dept Index Salary Index 17k 21k Name=Joe DEPT=Sales SAL=15k 12k 15k 15k 19k

For which queries is this index good? Find RECs Dept = “Sales” SAL=20k Find RECs Dept = “Sales” SAL > 20k Find RECs Dept = “Sales” Find RECs SAL = 20k

Many alternate methods for indexing

Hashing key  h(key) <key> Buckets (typically 1 disk block) . . .

One example hash function • Key = ‘x1 x2 … xn’ n-byte character string • Have b buckets • Hash function : • h: add (x1 + x2 + ….. Xn) modulo b

 This may not be best function …  Read Knuth Vol. 3 if you really need to select a good function. Good hash  Expected number of function: keys/bucket is the same for all buckets

Within a bucket: • Do we keep keys sorted? • Yes, if CPU time critical • & Inserts/Deletes not too frequent

Next: example to illustrate inserts, overflows, deletes h(K)

d a e c b EXAMPLE 2 records/bucket INSERT: h(a) = 1 h(b) = 2 h(c) = 1 h(d) = 0 0 1 2 3 h(e) = 1

d maybe move “g” up EXAMPLE: deletion Delete:ef 0 1 2 3 a b d c c e f g

If < 50%, wasting space • If > 80%, overflows significant depends on how good hash function is & on # keys/bucket Rule of thumb: • Try to keep space utilization between 50% and 80% Utilization = # keys used total # keys that fit

Extensible hashing • Others … How do we cope with growth? • Overflows and reorganizations • Dynamic hashing

Extensible hashing : idea 1 (a) Use i of b bits output by hash function b h(K)  use i grows over time…. Note: enables future doubling of space ! 00110101

Extensible hashing : idea 2 (b) Hash to directory of pointers to buckets (instead of buckets directly) h(K)[i ] to bucket Note : Double space by doubling the directory ! . . . . . .

i = 2 00 01 10 11 1 1 2 1010 New directory 2 1100 Example: h(k) is 4 bits; 2 keys/bucket 1 0001 i = 1 0 1 1001 1100 Insert 1010

2 0000 0001 2 0111 2 2 Example continued i = 2 00 01 10 11 1 0001 0111 1001 1010 Insert: 0111 0000 1100

i = 3 000 001 010 011 100 101 110 111 3 1001 1001 2 1001 1010 3 1010 1100 2 Example continued 0000 2 0001 i = 2 00 01 10 11 0111 2 Insert: 1001

Extensible hashing: deletion • Merge blocks and cut directory if possible (Reverse insert procedure)

Indirection (Not bad if directory in memory) Directory doubles in size (Now it fits, now it does not) - - Summary Extensible hashing + If directory fits into main memory, then access cost is 1 IO, otherwise 2 IOs Can handle growing files - with less wasted space - with no full reorganizations +

Use what when : • Indexing : Tree-Structures vs Hashing

Indexing vs Hashing • Hashing good for probes given key e.g., SELECT … FROM R WHERE R.A = 5

Indexing vs Hashing • INDEXING (Including B Trees) good for Range Searches: e.g., SELECT FROM R WHERE R.A > 5

Reading Chapter 14 • Read • 14.3.1 and 14.3.2

The BIG picture…. • Chapters 11 & 12: Storage, records, blocks... • Chapter 13 & 14: Access Mechanisms - Indexes - B trees - Hashing - Multi key • Chapter 15 & 16: Query Processing NEXT

CS4432: Database Systems II

CS4432: Database Systems II

Presentation Transcript

DATABASE CONCEPTS

Database Operations on GPU

Compiler Concepts for Database Systems

Database Systems: Design, Implementation, and Management Ninth Edition

Mobile Database Systems

Database Security and Authorization

Object oriented Database

DB Programming

Chapter 6 DATABASES, DATA WAREHOUSES AND OLAP

Database Access using SQL

Database Management Systems

INFORMATION SYSTEMS IN BIOLOGY: FOCUS ON DATABASE

Progress Database Admin

Microsoft Access

INFO2120 – INFO2820 – COMP5138 Database Systems

Database Theory

Oracle Database Systems

Chapter 2- From Systems to Descriptive Models

Oracle Database 12c Release 1 (12.1.0.2 )

ICOM 5016 – Introduction to Database Systems