270 likes | 568 Views
ECE 569 Database System Engineering Fall 2004. Yanyong Zhang: www.ece.rutgers.edu/~yyzhang Course URL: www.ece.rutgers.edu/~yyzhang/fall04. About the instructor (Yanyong Zhang). Yanyong Office: Core 518 Office hours: TBD (Th 1-2:50??) Office number: 5-0608
E N D
ECE 569 Database System EngineeringFall 2004 Yanyong Zhang: www.ece.rutgers.edu/~yyzhang Course URL: www.ece.rutgers.edu/~yyzhang/fall04
About the instructor (Yanyong Zhang) • Yanyong • Office: Core 518 • Office hours: TBD (Th 1-2:50??) • Office number: 5-0608 • Email: yyzhang@ece.rutgers.edu • URL: www.ece.rutgers.edu/~yyzhang • Research interests: • distributed computing • operating systems • sensor networks
Something about the background • What is database? • a very large, integrated collection of data • Query • Transaction • A group of queries which possess the ACID (atomic, consistent, isolated, and durable) property • DBMS (DataBase Management System) • a software package designed to store and manage databases
Overview User programs Database System Application programs / Queries DBMS Software to process queries Software to access stored data Stored database definition Stored database
DBMS Overview • A database management system (DBMS) provides efficient access to large amounts of persistent data • Data models and query languages allow efficient access while hiding complexity from users • Efficient shared access requires concurrency. Transactions provide transparency to this concurrency. Application programs are easier to write. • In many cases the data is valuable. It must be protected from the effects of failure (resiliency) and sabotage (security).
Files vs. DBMS • Applications must stage large datasets between main memory and secondary storage (e.g., buffering, page-oriented access, 32-bit addressing) • Special code for different queries • Must protect data from inconsistency due to multiple concurrent users • Crash recovery • Security and access control
Why DBMS? • Data independence and efficient data access • Reduced application development time • Data integrity and security • Uniform data administration • Concurrent accesses, recovery from crashes
Data Models • A data model is a collection of concepts for describing data • A schema is a description of a particular collection of data, using a given data model. • The relational model of data is the most widely used model today • Main concept: relation, basically a table with rows and columns • Every relation has a schema, which describes the columns, or fields
Levels of Abstractions • Abstraction is used to hide complexity and allow for a separation of concerns (What vs. How). • Many views, single conceptual (logical) schema, and single physical schema • Views describe how users see the data • Conceptual schema defines logical structure • Physical schema describes the files and indexes used. View 1 View 2 View 3 Specialized view of enterprise Subschema definition language Conceptual Schema Complete model of enterprise Data definition language Physical Schema Records, pointers, indices
Example • Sample applications • Admit_patient • Make_diagnosis • Record_vital_signs • In relational data model we can express schema with following tables: • patient (name, address, balance_due, room#) • payments (name, amount, date) • vital_signs (name, pulse, bp, time) • diagnosis (patient_name, disease_name) • disease (disease_name, treatment)
Examples • Physical Level • Specify indices, e.g., CREATE INDEX room_index ON patient(room#); • Specify performance related characteristics of relations • Conceptual Level • Define tables, specifying data types for each attribute. CR CREATE TABLE patient ( name char(30), address char(100), balance_due number(6,2), room# integer, PRIMARY KEY (name));
Examples – cont’d • External Level • Define views for various purposes, e.g., CREATE VIEW doctor-view-diagnosis AS SELECT name, room#, disease_name,treatment FROM patient, diagnosis, diseases WHERE name = patient_name AND diagnosis.disease_name = disease.disease_name;
Data Independence • Applications insulated from how data is structured and stored • Logical data independence: protection from change in logical structure of data • Physical data independence: protection from changes in physical structure of data
Concurrency Control • Concurrent execution of user programs is essential for good DBMS performance • Why?? • Interleaving actions of different user programs can lead to inconsistency: e.g., check is cleared while account balance is being computed • DBMS ensures such problems don’t arise: users can pretend they are using a single-user system
Transaction: An execution of a DB program • Key concept is transaction, which is an atomic sequence of database actions • Each transaction, executed completely, must leave the DB in a consistent state if DB is consistent when the transaction begins. • Users can specify some simple integrity constraints on the data, and DBMS will enforce them • DBMS doesn’t understand the semantics of the data • Ensuring that a transaction (run alone) preserves consistency is ultimately the user’s responsibility.
Scheduling concurrent transactions • DBMS ensures that execution of {T1, T2, …, Tn} is equivalent to some serial execution T1’…Tn’. • locking scheme • Two-phase locking
Ensuring atomicity • DBMS ensures atomicity (all-or-nothing property) even if system crashes in the middle of a Xact. • Idea: keep a log (history) of all actions carried out by the DBMS while executing a set of Xacts
Structure of a DBMS • A typical DBMS has a layered architecture • The figure does not show the concurrency control and recovery component • This is one of several possible architectures; each system has its own variations. Query optimization and execution Relational operators Files and access methods Buffer management Disk space management
About the course • What will we focus on? • Relational data model • Transaction processing • DBMS design • What will we not focus on? • OO data model, etc • SQL programming • Goal • Understand DBMS design issues • Develop background for research in database area
What should you’ve know • Data structure and algorithms • Operations system knowledge • C, Unix • Background in data model and query languages recommended
What will you encounter - topics • 1. Relational Data Model (2-4) • 2. DBMS Design / Implementation (5-11) • a) File organization (5-6) • b) Access methods (7-9) • c) Query processing (10-11) • 3. Transaction Processing • a) Transaction Models (12-13) • b) Isolation (14-20) • c) Performance (21-22) • d) B-tree Synchronization (23-24) • e) Recovery (25-29)
What will you encounter - projects • Projects • Develop a client/server relational DBMS • Query processing / Physical data model / Data dictionary • Concurrency control / Recovery • Work in groups of at most 4. • You may choose groups but I must approve. • At least three members of each group should be strong C programmers. • Projects are difficult and time-consuming. • ~10K lines of codes • Use threads and RPC • Code is difficult to debug • Projects are interesting and rewarding.
Grading Policy • 3 Homework assignments (15%) • Project (45%) • Two exams (20% each) • Course URL: www.ece.rutgers.edu/~yyzhang/fall04
Database Literature • Journals • IEEE Transaction on Knowledge and Data Engineering • ACM Transactions on Database Systems • VLDB Journal • Conferences • IEEE Data Engineering Conference • ACM SIGMDO • Very Large Database (VLDB)
Vital Sign Balance Blood Diagnosed Room# Name Time Amount Treatment Name Date Address Pulse Made To From Billed Patient Disease Payment Account Example – medical database • Entities in database, the types and names of their attributes, and relationships between entities.
System Architecture DDL: Data Definition Language DML: Data Manipulation Language