370 likes | 580 Views
Advanced Topics: Indexes & Transactions. Instructor: Mohamed Eltabakh meltabakh@cs.wpi.edu. Indexes. Why Indexes. With or without indexes, the query answer should be the same Indexes are needed for efficiency and fast access of data. Without index, we check all 10,000 students.
E N D
Advanced Topics: Indexes & Transactions Instructor: Mohamed Eltabakh meltabakh@cs.wpi.edu cs3431
Indexes cs3431
Why Indexes • With or without indexes, the query answer should be the same • Indexes are needed for efficiency and fast access of data Without index, we check all 10,000 students Assume we have 10,000 students SELECT* FROMStudent WHEREsNumber = 76544357; With index, we can reach that student directly cs3431
Direct Access vs. Sequential Access SELECT* FROMStudent WHEREsNumber = 76544357; Without index, we check all 10,000 students (sequential access) With index, we can reach that student directly (direct access) cs3431
What is an Index • A index is an auxiliary file that makes it more efficient to search for a record in the data file • The index is usually specified on one field of the file • Although it could be specified on several fields • The index is stored separately from the base table • Each table may have multiple indexes Can create an index on sNumber Student Can create a second index on sName cs3431
Example: Index on sNumber Student • Index file is always sorted • Index size is much smaller than the table size • Now any query (equality or range) on sNumber can be efficiently answered (Binary search on the index) Index on sNumber
Example: Index on sName Student • Duplicates values have duplicate entries in the index • Now any query (equality or range) on sName can be efficiently answered (Binary search on the index) Index on sName
Creating an Index Create Index <name> On <tablename>(<colNames>); Student DB System knows how to: 1- create the index 2- when and how to use it Create Index sNumberIndexOn Student(sNumber); Create Index sNameIndexOn Student(SName);
Multiple Predicates 1- The best the DBMS can do is using addressIndex ‘320FL’ 2- From those tuples, check sName = ‘Dave’ Student SELECT* FROM Student WHERE address = ‘320FL’ AND sName = ‘Dave’; Create Index addessIndexOn Student(address); cs3431
Multi-Column Indexes • Columns X, Y are frequently queried together (with AND) • Each column has many duplicates • Then, consider creating a multi-column index on X, Y SELECT* FROM Student WHERE address = ‘320FL’ AND sName = ‘Dave’; Directly returns this record only Create Index nameAddOn Student(sName, address);
Using an Index • DBMS automatically figures out which index to use based on the query SELECT* FROMStudent WHEREsNumber = 76544357; Student Automatically uses SNumberIndex Create Index sNumberIndexOn Student(sNumber); Create Index sNameIndexOn Student(SName); cs3431
How Do Indexes Work? cs3431
Types of Indexes • Primary vs. Secondary • Single-Level vs. Multi-Level (Tree Structure) • Clustered vs. Non-Clustered cs3431
Primary vs. Secondary Indexes • Index on the primary key of a relation is called primary index (only one) • Index on any other column is called secondary index (can be many) • In primary index, all values are unique • In secondary indexes, values may have duplicates Student Index on SSN is a Primary Index Index on sNumberis a Secondary Index Index on sNameis a Secondary Index
Single-Level Indexes • Index is one-level sorted list • Given a value v to query • Perform a binary search in the index to find it (Fast) • Follow the link to reach the actual record Student Index on sNumber
Multi-Level Index • Build index on top of the index (can go multiple levels) • When searching for value v: • Find the largest entry ≤ v, and follow its pointer Student 2nd level 1st level cs3431 Index on sNumber
Clustered vs. Non-Clustered Assume there is index X on column C • If the records in the table are stored sorted based on C • X Clustered index • Otherwise, X Non-Clustered index • Primary index is a clustered index Student • Non-Clustered • index • Clustered index
Index Maintenance • Indexes are used in queries • But, need to be maintained when data change • Insert, update, delete • DBMS automatically handles the index maintenance • When insert new records the indexed field is added to the index • When delete records their values are deleted from the index • When update an indexed value delete the old value from index & insert the new value • There is a cost for maintaining an index, however its benefit is usually more (if used a lot) cs3431
Summary of Indexes • Indexes are auxiliary structures for efficient searching and querying • Query answer is the same with or without index • What to index depends on which columns are frequently queried (in Where clause) • Main operations Create Index <name> On <tablename>(<colNames>); Drop Index <name>; cs3431
Transactions cs3431
What is a Transaction • A set of operations on a database that are treated as one unit • Execute Allor None • Transactions have semantics at the application level • Want to reserve two seats in a flight • Transfer money from account A to account B • … • What if two users are reserving the same flight seat at the same time??? Transactions solve these problems
Transactions • By default, each SQL statement is a transaction • Can change the default behavior SQL > Start transaction; SQL > Insert …. SQL > Update … SQL > Delete .. SQL > Select … SQL> Commit | Rollback; All of these statements are now one unit (either all succeed all fail) End transaction successfully Cancel the transaction
Transaction Properties • Four main properties • Atomicity– A transaction if one atomic unit • Consistency– A transaction ensures DB is consistent • Isolation– A transaction is considered as if no other transaction was executing simultaneously • Durability– Changes made by a transaction must persist • ACID: Atomicity, Consistency, Isolation, Durability • ACID properties are enforced by the DBMS cs3431
Consistency Issue • Many users may update the data at the same time • How to ensure the result is consistent 2 1 Update T Set x = x * 3; Update T Set x = x + 2; 3 What is the right answer??? Wrong, Inconsistent data
Serial Order of Transactions • Given N concurrent transactions T1, T2, …TN • Serial order is any permutation of these transactions (N!) • T1, T2, T3, …TN • T2, T3, T1, …, TN • … • DBMS will ensure that the end-result from executing the N transactions (concurrently) matches one of the serial order execution • That is called Serializability • As if transactions are executed in serial order cs3431
Serializable Execution • Given N concurrent transactions T1, T2, …TN • DBMS will execute them concurrently (at the same time) • But, the final effect matches one of the serial order executions Update T Set x = x * 3; Update T Set x = x + 2;
Isolation Levels • Read Uncommitted • Read Committed • Repeatable Read • Serializable Gets stronger & avoids problems That is the default in DBMS cs3431
1- READ UNCOMMITTED Session 2 -------BEGIN TRANSACTION----- select color from cust where id=500; color ------ red select color from cust where id=500; color ----- blue select color from cust where id=500; color ----- blue -----------COMMIT------------ Session 1 -------BEGIN TRANSACTION----- update cust set color='blue' where id=500; -----------COMMIT------------ | | | | V Time NonRepeatable read (bad) Dirty read (bad)
2- READ COMMITTED Dirty Read Solved Session 2 -------BEGIN TRANSACTION----- select color from cust where id=500; color ------ red select color from cust where id=500; color ----- red select color from cust where id=500; color ----- blue -----------COMMIT------------ Session 1 -------BEGIN TRANSACTION----- update cust set color='blue' where id=500; -----------COMMIT------------ | | | | V Time NonRepeatable read (bad)
2- READ COMMITTED Session 2 -------BEGIN TRANSACTION----- select color from cust where id=500; color ------ red select color from cust where id=500; color ----- red select color from cust where id=500; color ----- -----------COMMIT------------ Session 1 -------BEGIN TRANSACTION----- delete cust where id=500; -----------COMMIT------------ | | | | V Time Phantom (bad)
3- REPEATABLE READ NonRepeatable Read Solved Session 2 -------BEGIN TRANSACTION----- select color from cust where id=500; color ------ red select color from cust where id=500; color ----- red select color from cust where id=500; color ----- red -----------COMMIT------------ Session 1 -------BEGIN TRANSACTION----- update cust set color='blue' where id=500; -----------COMMIT------------ | | | | V Time
3- REPEATABLE READ Phantom (For Delete) Solved Session 2 -------BEGIN TRANSACTION----- select color from cust where id=500; color ------ red select color from cust where id=500; color ----- red select color from cust where id=500; color ----- red -----------COMMIT------------ Session 1 -------BEGIN TRANSACTION----- delete cust where id=500; -----------COMMIT------------ | | | | V Time
3- REPEATABLE READ Session 2 -------BEGIN TRANSACTION----- select id from cust where color=‘blue’; id -- select id from cust where color=‘blue’; id -- select id from cust where color=‘blue’; id -- 500 -----------COMMIT------------ Session 1 -------BEGIN TRANSACTION----- Insert into cust(id, color) values (500, ‘blue’); -----------COMMIT------------ | | | | V Time Phantom Insert (bad)
4- SERIALIZABLE Phantom Solved Session 2 -------BEGIN TRANSACTION----- select id from cust where color=‘blue’; id -- select id from cust where color=‘blue’; id -- select id from cust where color=‘blue’; id -- -----------COMMIT------------ Session 1 -------BEGIN TRANSACTION----- Insert into cust(id, color) values (500, ‘blue’); -----------COMMIT------------ | | | | V Time
Summary of Transactions • Unit of work in DBMS • Either executed All or None • Ensures consistency among many concurrent transactions • Ensures persistent data once committed (using recovery techniques) • Main ACID properties • Atomicity, Consistency, Isolation, Durability cs3431
END !!! cs3431
Final Exam • Dec. 13, at 8:15am – 9:30am (75 mins) • Closed book, open sheet • Answer in the same exam sheet • Material Included • ERD • SQL (Select, Insert, Update, Delete) • Views, Triggers, Assertions • Cursors, Stored Procedures/Functions • Material Excluded • Relational Model & Algebra • Normalization Theory • ODBC/JDBC • Indexes and Transactions Friday’s Lecture (Revision + short Quiz)