250 likes | 346 Views
CS4432: Database Systems II. Course Introduction Mohamed Eltabakh. What Is a Relational Database Management System ?. Database Management System = DBMS Relational DBMS = RDBMS A collection of files that store the data But: Files that we do not directly access or read
E N D
CS4432: Database Systems II Course Introduction Mohamed Eltabakh
What Is a Relational Database Management System ? Database Management System = DBMS Relational DBMS = RDBMS • A collection of files that store the data • But: Files that we do not directly access or read • A big C program written by someone else that accesses and updates those files for you • But: Huge program containing 100s of 1000s of lines
Where are RDBMS used ? • Backend for traditional “database” applications • Backend for large Websites • Backend for Web services Airline reservation University registration Bank accounts and transactions Movie Database Hospital System
Example of a Traditional Database Application Suppose we are building a system to store the information about: • students • courses • professors • who takes what, who teaches what University registration
Can we do it without a DBMS ? Sure we can! Start by storing the data in files: students.txtcourses.txtprofessors.txt Now write C or Java programs to implement specific tasks File System Approach
Doing it without a DBMS... • Enroll “Mary Johnson” in “CSE444”: Write a C program to do the following: • Read ‘students.txt’ • Read ‘courses.txt’ • Find&update the record “Mary Johnson” • Find&update the record “CSE444” • Write “students.txt” • Write “courses.txt”
What Can Go Wrong Several drawbacks of using file systems • Data redundancy and inconsistency • Multiple file formats, duplication of information in different files • Multiple records formats within the same file • No order enforced between fields • Difficulty in accessing data • Need to write a new program to carry out each new task • No indexes, always scan the entire file • Integrity problems • Modify one file (or field in a file), and not changing the dependent fields or files • Integrity constraints (e.g., account balance > 0) become “buried” in program code rather than being stated explicitly
What Can Go Wrong • Concurrent access by multiple users • Many users need to access/update the data at the same time (concurrent access) • Uncontrolled concurrent access can lead to inconsistencies • Example: Two people are updating the same bank account at the same time • Security problems • Hard to provide user access to some, but not all, data • Recovery from crashes • While updating the data the system crashes • Maintenance problems • Hard to search for or update a field • Hard to add new fields
Connection (ODBC, JDBC) Enters a DMBS “Two tier database system” Applications Direct SQL Database server(someone else’sC program) Data files
Functionality of a DBMS The programmer sees SQL, which has two components: • Data Definition Language - DDL • Data Manipulation Language – DML Behind the scene the DBMS has: • Query Optimizer • Query Engine • Storage Management • Transaction Management (concurrency, recovery) Frontend (CS3431) Backend (CS4432)
How the Programmer Sees the DBMS Frontend • Start with DDL to create tables: • Continue with DML to populate tables: CREATE TABLE Students ( Name CHAR(30) SSN CHAR(9) PRIMARY KEY NOT NULL, Category CHAR(20) ) . . . INSERT INTO Students VALUES(‘Charles’, ‘123456789’, ‘undergraduate’) . . . .
How the Programmer Sees the DBMS Frontend • Tables: Students: Takes: Courses: “data independence” = separate logical view from physical implementation
What is Hidden ??? Backend CREATE TABLE Students ( Name CHAR(30) SSN CHAR(9) PRIMARY KEY NOT NULL, Category CHAR(20) ) . . . Creating file (Data) Updating catalog tables May create indexes Reading and updating file from disk INSERT INTO Students VALUES(‘Charles’, ‘123456789’, ‘undergraduate’) . . . . Check constraints May update indexes
Queries Frontend • Find all courses that “Mary” takes • We did not specify how to execute • We did not specify how to optimize SELECT C.nameFROM Students S, Takes T, Courses CWHERE S.name=“Mary” and S.ssn = T.ssn and T.cid = C.cid Should be happy not to do it…
Cname sid=sid cid=cid name=“Mary” Courses Takes Students What is Hidden ??? Backend Imperative query execution plan: Declarative SQL query SELECT C.name FROMStudents S, Takes T, Courses C WHERE S.name=“Mary” and S.ssn = T.ssn and T.cid = C.cid Complex cost model & many alternatives The optimizer chooses the best execution plan for a query
Transactions Frontend • Enroll “Mary Johnson” in “CSE444”: BEGIN TRANSACTION; INSERT INTO Takes SELECT Students.SSN, Courses.CID FROM Students, Courses WHERE Students.name = ‘Mary Johnson’ and Courses.name = ‘CSE444’ -- More updates here.... IF everything-went-OK THEN COMMIT; ELSE ROLLBACK If system crashes, the transaction is still either committed or aborted
Transactions • A transaction = sequence of statements that either all succeed, or all fail • Basic unit of processing • Transactions have the ACID properties: A = atomicity C = consistency I = independence (Isolation) D = durability
Transaction ACID Properties • Each transaction has a Start and End and does many things in between • “A” Atomic: Either the entire transaction is done (all its actions) or none. • “C” Consistency: A transaction must move the DB from one consistent state to another consistent state T1 T2 T3 T4
Transaction ACID Properties (Cont’d) • What about interaction • Can T2 read what T1 is writing? • Can T3 read what T1 is reading? • Can T4 read what T1 wrote? • “I” Isolation: Although running concurrently, they should appear as if they run is a certain serial order T1 T2 T3 T4
Transaction ACID Properties (Cont’d) • If T1 failed & T2 completed This means what? • T1 Rolledback & T2 Committed • “D” Durability: The effect of a committed transaction must be persistent (not lost) T1 T2 T3 T4
Transactions Backend Many transactions at the same time (Concurrency Control) Ensure the ACID properties Logging and Recovery Control
DBMS Backend Components • We will cover several of these components Chapter 1 in textbook
Topics To Be Covered… • File & System Structure Records in blocks, dictionary, buffer management,… • Indexing & Hashing B-Trees, hashing,… • Query Processing Query costs, join strategies,… • Crash Recovery Failures, stable storage,… • Concurrency Control Correctness, locks,… • Transaction Processing Logs, deadlocks,…
Database Material at WPI B, C terms CS 3431 you are here D term (alternate) CS 4432 CS 542 CS 561 CS525 MQP DSRG Grad. DB Advanced DB Selected Topics Selected DB Project DB Research at WPI Varies Any time year round DON’T TAKE!
Database Systems • The big commercial database vendors: • Oracle • IBM (with DB2) bought Informix recently • Microsoft (SQL Server) • Sybase • Some free database systems (Unix) : • Postgres • Mysql • Predator • In CS4432 we use Oracle & SimpleDB!