420 likes | 548 Views
C20.0046: Database Management Systems Lecture #26. Matthew P. Johnson Stern School of Business, NYU Spring, 2004. Agenda. Previously: Indices Next: Finish Indices, advanced indices Failure/recovery Data warehousing & mining Websearch Hw3 due today no extensions! 1-minute responses
E N D
C20.0046: Database Management SystemsLecture #26 Matthew P. Johnson Stern School of Business, NYU Spring, 2004 M.P. Johnson, DBMS, Stern/NYU, Sp2004
Agenda • Previously: Indices • Next: • Finish Indices, advanced indices • Failure/recovery • Data warehousing & mining • Websearch • Hw3 due today • no extensions! • 1-minute responses • Review: clustered, dense, primary, #/tbl, syntax M.P. Johnson, DBMS, Stern/NYU, Sp2004
Let’s get physical Query update User/ Application Query compiler/optimizer Query execution plan Transaction commands Record, index requests Execution engine Index/record mgr. • Transaction manager: • Concurrency control • Logging/recovery Page commands Buffer manager Read/write pages Storage manager storage M.P. Johnson, DBMS, Stern/NYU, Sp2004
BSTs • Very simple data structure in CS: BSTs • Binary Search Trees • Keep balanced • Each node ~ one item • Each node has two children: • Left subtree: < • Right subtree: >= • Can search, insert, delete in log time • log2(1MB = 220) = 20 M.P. Johnson, DBMS, Stern/NYU, Sp2004
Search for DBMS • Big improvement: log2(1MB) = 20 • Each op divides remaining range in half! • But recall: all that matters is #disk accesses • 20 is better than 220 but: Can we do better? M.P. Johnson, DBMS, Stern/NYU, Sp2004
BSTs B-trees • Like BSTs except each node ~ one block • Branching factor is >> 2 • Each access divides remaining range by, say, 300 • B-trees = BSTs + blocks • B+ trees are a variant of B-trees • Data stored only in leaves • Leaves form a (sorted) linked list • Better supports range queries • Consequences: • Much shorter depth Many fewer disk reads • Must find element within node • Trades CPU/RAM time for disk time M.P. Johnson, DBMS, Stern/NYU, Sp2004
B+ Trees • Parameter n branching factor is n+1 • Largest number s.t. one block can contain n search-key values and n+1 pointers • Each node (except root) has at least n/2 keys Keys k < 30 Keys 120<=k<240 Keys 240<=k Keys 30<=k<120 Next leaf 40 50 60 M.P. Johnson, DBMS, Stern/NYU, Sp2004
Searching a B+ Tree Select name From people Where age = 25 • Exact key values: • Start at the root • If we’re in leaf, walk through its key values; • If not, look at keys K1..Kn • If Ki <= K <= Ki+1, look in child i • Range queries: • As above • Then walk left until test fails Select name From people Where 20 <= age and age <= 30 M.P. Johnson, DBMS, Stern/NYU, Sp2004
B+ Tree Example Find the key 40 n = 4 40 80 20 < 40 60 30 < 40 40 10 15 18 20 30 40 50 60 65 80 85 90 NB: Leaf keys are sorted; data pointed to is only if clustered M.P. Johnson, DBMS, Stern/NYU, Sp2004
Clustered & unclustered B-trees Data entries Dataentries (Index File) (Data file) DataRecords Data Records CLUSTERED UNCLUSTERED
B+ trees, and, or • Assume index on a,b,c • Intuition: phone book • WHERE a = ‘x’ and b = ‘y’ • WHERE b = ‘y’ and c = ‘z’ • WHERE a = ‘a’ and c = ‘z’ • WHERE a = ‘x’ or b = ‘y’ or c = ‘z’ M.P. Johnson, DBMS, Stern/NYU, Sp2004
B+ trees and LIKE • Supports only hard-coded prefix LIKE checks • Intuition: phone book • Select * from T where a like ‘xyz%’ • Select * from T where a like ‘%xyz’ • Select * from T where a like ‘xyz%zyx%’ M.P. Johnson, DBMS, Stern/NYU, Sp2004
B-tree search efficiency • With params: • block=4k • integer = 4b, • pointer = 8b • the largest n satisfying 4n+8(n+1) <= 4096 is n=340 • Each node has 170..340 keys • assume on avg has (170+340)/2=255 • Then: • 255 rows depth = 1 • 2552 = 64k rows depth = 2 • 2553 = 16M rows depth = 3 • 2554 = 4G rows depth = 4 M.P. Johnson, DBMS, Stern/NYU, Sp2004
B-trees in practice • Most DBMSs use B-trees for most indices • Default in MySQL • Default in Oracle • Speeds up • where clauses • Some like checks • Min or max functions • joins • Limitation: fields used must • Be a prefix of indexed fields • Be ANDed together M.P. Johnson, DBMS, Stern/NYU, Sp2004
Next topic: Advanced types of indices • Spatial indices based on R-trees (R = region) • Support multi-dimensional searches on “geometry” fields • 2-d not 1-d ranges • Oracle: • MySQL: CREATE INDEX geology_rtree_idx ON geology_tab(geometry) INDEXTYPE IS MDSYS.SPATIAL_INDEX; CREATE TABLE geom (g GEOMETRY NOT NULL, SPATIAL INDEX(g)); M.P. Johnson, DBMS, Stern/NYU, Sp2004
Advanced types of indices • Inverted indices for web doc search • First, think of each webpage as a tuple • One column for every possible word • True means the word appears on the page • Index on all columns • Now can search: you’re fired • select * from T where youre=T and fired=T M.P. Johnson, DBMS, Stern/NYU, Sp2004
Advanced types of indices • Can simplify somewhat: • For each field index, delete False entries • True entries for each index become a bucket • Create “inverted index”: • One entry for each search word • Search word entry points to corresponding bucket • Bucket points to pages with its word • Amazon M.P. Johnson, DBMS, Stern/NYU, Sp2004
Advanced types of indices • Function-based indices • Speeds up WHERE upper(name)=‘BUSH’, etc. • Now supported in Oracle 8, not MySQL • Bitmap indices • Speeds up arbitrary combination of reqs • Not limited to prefixes or conjunctions • Now supported in Oracle 9, not MySQL create index on T(my_soundex(name)); create index on T(substr(DOB),4,5)); M.P. Johnson, DBMS, Stern/NYU, Sp2004
Bitmap indices • Assume table has n records • Assume F is a field with m different values • Bitmap index on F: m length-n bitstrings • One bitstring for each value of F • Each one says which rows have that value for F • Example: • n = , mF = , mG = • Q: find rows where F=50 or (F=30 and G=‘Baz’) M.P. Johnson, DBMS, Stern/NYU, Sp2004
Bitmap index search • Larger example: (age,salary) of jewelry buyers: • Bitmaps for age: • 25:100000001000, 30:000000010000, 45:01000000100, 50:001110000010, 60:000000000001, 70:000001000000, 85:000000100000 • Bitmaps for salary: • 60:110000000000, 75:001000000000, 100:000100000000, 110:000001000000, 120:000010000000, 140:000000100000, 260:000000010001, 275:000000000010, 350:000000000100, 400:000000001000 M.P. Johnson, DBMS, Stern/NYU, Sp2004
Bitmap index search • Query: find buyers of age 45-55 with salary 100-200 • Age range: 010000000100 (45) | 001110000010 (50) = 011110000110 • Bitwise or of Salary range: 000111100000 • AND together: 011110000110 & 000111100000 = 000110000000 • What does this mean? M.P. Johnson, DBMS, Stern/NYU, Sp2004
Bitmap index search • Once we have row numbers, then what? • Get rows with those numbers (How?) • Bitmap indices in Oracle: • Best for low-cardinality fields • Boolean, enum, gender • lots of 0s in our bitmaps • Compress: 000000100001 6141 • “run-length encoding” CREATE BITMAP INDEX ON T(F,G); M.P. Johnson, DBMS, Stern/NYU, Sp2004
New topic: Recovery M.P. Johnson, DBMS, Stern/NYU, Sp2004
System Failures • Each transaction has internal state • When system crashes, internal state is lost • Don’t know which parts executed and which didn’t • Remedy: use a log • A file that records each action of each xact • Trail of breadcrumbs M.P. Johnson, DBMS, Stern/NYU, Sp2004
Media Failures • Rule of thumb: Pr(hard drive has head crash within 10 years) = 50% • Simpler rule of thumb: Pr(hard drive has head crash within 1 years) = 10% • Serious problem • Soln: different RAID strategies • RAID: Redundant Arrays of Independent Disks M.P. Johnson, DBMS, Stern/NYU, Sp2004
RAID levels • RAID level 1: each disk gets a mirror • RAID level 4: one disk is xor of all others • Each bit is sum mod 2 of corresponding bits • E.g.: • Disk 1: 11110000 • Disk 2: 10101010 • Disk 3: 00111000 • Disk 4: • How to recover? M.P. Johnson, DBMS, Stern/NYU, Sp2004
Transactions • Transaction: unit of code to be executed atomically • In ad-hoc SQL • one command = one transaction • In embedded SQL • Transaction starts = first SQL command issued • Transaction ends = • COMMIT • ROLLBACK (=abort) • Can turn off/on autocommit M.P. Johnson, DBMS, Stern/NYU, Sp2004
Primitive operations of transactions • Each xact reads/writes rows or blocks: elms • INPUT(X) • read element X to memory buffer • READ(X,t) • copy element X to transaction local variable t • WRITE(X,t) • copy transaction local variable t to element X • OUTPUT(X) • write element X to disk • LOG RECORD M.P. Johnson, DBMS, Stern/NYU, Sp2004
Transaction example • Xact: Transfer $100 from savings to checking • A = A+100; • B = B-100; • READ(A,t); • t := t+100; • WRITE(A,t); • READ(B,t); • t := t-100; • WRITE(B,t) M.P. Johnson, DBMS, Stern/NYU, Sp2004
Transaction example • READ(A,t); t := t+100;WRITE(A,t); READ(B,t); t := t-100;WRITE(B,t) M.P. Johnson, DBMS, Stern/NYU, Sp2004
The log • An append-only file containing log records • Note: multiple transactions run concurrently, log records are interleaved • After a system crash, use log to: • Redo some transaction that didn’t commit • Undo other transactions that didn’t commit • Three kinds of logs: undo, redo, undo/redo • We’ll discuss only Undo M.P. Johnson, DBMS, Stern/NYU, Sp2004
Undo Logging • Log records • <START T> • transaction T has begun • <COMMIT T> • T has committed • <ABORT T> • T has aborted • <T,X,v> • T has updated element X, and its old value was v M.P. Johnson, DBMS, Stern/NYU, Sp2004
Undo-Logging Rules • U1: Changes logged (<T,X,v>) before being written to disk • U2: Commits logged (<COMMIT T>) after being written to disk • Results: • May forget we did whole xact (and so wrongly undo) • Will never forget did partial xact (and so leave) • Log-change, change, log-change, change, Commit, log-commit M.P. Johnson, DBMS, Stern/NYU, Sp2004
Undo-Logging e.g. (inputs omitted) M.P. Johnson, DBMS, Stern/NYU, Sp2004
Recovery with Undo Log • After system’s crash, run recovery manager • Decide for each xact T whether it was completed • Undo all modifications from incomplete xacts, in reverse order (why?) and abort each <START T>….<COMMIT T> yes <START T>….<ABORT T> yes <START T>…………………… no M.P. Johnson, DBMS, Stern/NYU, Sp2004
Recovery with Undo Log • Read log from the end; cases: • <COMMIT T>: mark T as completed • <ABORT T>: mark T as completed • <T,X,v>: • <START T>: ignore if T is not completed then write X=v to disk else ignore M.P. Johnson, DBMS, Stern/NYU, Sp2004
Recovery with Undo Log … … <T2,X2,v2> … … <START T5> <START T4> <T1,X1,v1> <T5,X5,v5> <T4,X4,v4> <COMMIT T5> <T3,X3,v3> <T2,X2,v2> Start: Q: Which updates areundone? Crash! M.P. Johnson, DBMS, Stern/NYU, Sp2004
Recovery with Undo Log • Note: undo commands are idempotent • No harm done if we repeat them • Q: What if system crashes during recovery? • How far back in the log do we go? • Don’t go all the way back to the start • May be very large • Better idea: use checkpointing M.P. Johnson, DBMS, Stern/NYU, Sp2004
Checkpointing • Checkpoint the database periodically • Stop accepting new transactions • Wait until all current xacts complete • Flush log to disk • Write a <CKPT> log record, flush log • Resume accepting new xacts M.P. Johnson, DBMS, Stern/NYU, Sp2004
Undo Recovery with Checkpointing … … <T1,X1,v1> … … (all completed) <CKPT> <START T2> <START T3 <START T5> <START T4> <T4,X4,v4> <T5,X5,v5> <T4,X4,v4> <COMMIT T5> <T3,X3,v3> <T2,X2,v2> other xacts During recovery, can stop at first <CKPT> xacts T2,T3,T4,T5 M.P. Johnson, DBMS, Stern/NYU, Sp2004
Non-quiescent Checkpointing • Problem: database must freeze during checkpoint • Would like to checkpoint while database is operational • Idea: non-quiescent checkpointing • Quiescent: quiet, still, at rest; inactive M.P. Johnson, DBMS, Stern/NYU, Sp2004
Next time • Next: Data warehousing mining! • For next time: reading online • Proj5 due next Thursday • no extensions! • Now: one-minute responses • Relative weight: warehousing, mining, websearch • Data mining techniques • NNs • GAs • kNN • Decision Trees M.P. Johnson, DBMS, Stern/NYU, Sp2004