Introduction to Database Systems, Chpt. 1
Instructor: Weichao Wang
Ramakrishnan & Gehrke
http://www.sigmod.org/record/issues/0606/index.html
History
• 1960s: C. Bachman (GE), network data model
• Late 1960s: IBM IMS, hierarchical data model
• 1970: E. Codd, relational model
• 1980s: SQL and IBM System R; transactions (J. Gray)
• Late 1980s-1990s: DB2, Oracle, Informix, Sybase
• 1990s: data warehouses (DW), the Internet, distributed databases
• Now: Big Data
• Turing Award and Turing test?
What Is a DBMS?
• A database is a very large, integrated collection of data.
• It models a real-world enterprise:
  • Entities (e.g., students, courses)
  • Relationships (e.g., Madonna is taking ITCS6160)
• A Database Management System (DBMS) is a software package designed to maintain and utilize databases.
Why not just OS file systems?
• Size: the data may be far larger than your main memory, or even your hard disk.
• Query processing: remember your C programs for file read/write? Now think about several terabytes of data, with a separate program for every query.
• Consistency: multiple users access the same data at the same time.
• Recovery: after a crash, is the data actually on the hard disk?
• All of these can be implemented directly on top of the OS, but then you are just designing your own DB and DBMS.
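To make the query-processing point concrete, here is a minimal Python sketch of querying a flat file with no DBMS. The file name, columns, and sample rows are hypothetical. Every new question needs its own hand-written scan function.

```python
import csv
import os
import tempfile

# Hypothetical flat file of student records: (sid, name, gpa), no header row.
ROWS = [("53666", "Jones", "3.4"), ("53688", "Smith", "3.2"), ("53650", "Smith", "3.8")]

def write_students(path):
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(ROWS)

# Query 1: names of students with gpa above a threshold (a full, hand-coded scan).
def high_gpa_names(path, threshold=3.3):
    result = []
    with open(path, newline="") as f:
        for _sid, name, gpa in csv.reader(f):
            if float(gpa) > threshold:
                result.append(name)
    return result

# Query 2: how many students have a given name (another scan, another function).
def count_by_name(path, wanted):
    with open(path, newline="") as f:
        return sum(1 for _sid, name, _gpa in csv.reader(f) if name == wanted)

path = os.path.join(tempfile.mkdtemp(), "students.csv")
write_students(path)
print(high_gpa_names(path))          # ['Jones', 'Smith']
print(count_by_name(path, "Smith"))  # 2
```

Nothing here uses an index, coordinates concurrent writers, or survives a crash in the middle of a write. Those are exactly the problems the bullets above list.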
Why Use a DBMS?
• Data independence and efficient access.
• Reduced application development time.
• Data integrity and security.
• Uniform data administration (less certain today).
• Concurrent access and recovery from crashes.
Why Study Databases?
• Shift from computation to information (application-oriented vs. data-oriented):
  • at the “low end”: a scramble to webspace
  • at the “high end”: scientific applications
• Datasets are increasing in diversity and volume:
  • digital libraries, interactive video, the Human Genome Project, the EOS project
  • ... so the need for DBMSs is exploding
• DBMS encompasses most of CS:
  • OS, languages, theory, AI, multimedia, logic
Data Models
• A data model is a collection of concepts for describing data.
• A schema is a description of a particular collection of data, using the given data model.
• The relational model of data is the most widely used model today.
  • Main concept: relation, basically a table with rows and columns.
  • Every relation has a schema, which describes the columns, or fields.
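As an illustration of "relation = schema + rows", here is a small Python sketch. The `Relation` class and its methods are hypothetical teaching code, not any library's API. It follows the set semantics of the relational model (SQL tables themselves allow duplicate rows).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Relation:
    """A relation: a schema (named, typed fields) plus a set of rows."""
    name: str
    schema: tuple                  # ((field_name, python_type), ...)
    rows: frozenset = frozenset()  # set semantics: duplicate rows collapse

    def fields(self):
        return [field for field, _ in self.schema]

    def insert(self, row):
        """Return a new Relation with `row` added, after checking it against the schema."""
        if len(row) != len(self.schema):
            raise ValueError(f"expected {len(self.schema)} fields, got {len(row)}")
        for (field, typ), value in zip(self.schema, row):
            if not isinstance(value, typ):
                raise TypeError(f"field {field!r} expects {typ.__name__}")
        return Relation(self.name, self.schema, self.rows | {tuple(row)})

students = Relation("Students", (("sid", str), ("name", str), ("login", str),
                                 ("age", int), ("gpa", float)))
students = students.insert(("53666", "Jones", "jones@cs", 18, 3.4))
students = students.insert(("53666", "Jones", "jones@cs", 18, 3.4))  # duplicate row
print(students.fields())   # ['sid', 'name', 'login', 'age', 'gpa']
print(len(students.rows))  # 1
```

The schema is what lets the system reject a row such as `("1", "x", "y", "old", 3.0)`, whose `age` is not an integer.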
Levels of Abstraction
(Figure: View 1, View 2, View 3 on top of a single Conceptual Schema, which sits on top of the Physical Schema.)
• Many views, single conceptual (logical) schema and physical schema.
  • Views describe how users see the data.
  • Conceptual schema defines logical structure.
  • Physical schema describes the files and indexes used.
• Schemas are defined using DDL; data is modified/queried using DML.
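A minimal sketch of DDL vs. DML and of several views over one conceptual schema, using Python's built-in `sqlite3` module. The view names `StudentDirectory` and `HonorRoll` are hypothetical examples of what different user groups might see.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the conceptual schema (one table) ...
conn.execute("CREATE TABLE Students (sid TEXT PRIMARY KEY, name TEXT, login TEXT,"
             " age INTEGER, gpa REAL)")
# ... and two external views; each user group sees only what it needs.
conn.execute("CREATE VIEW StudentDirectory AS SELECT name, login FROM Students")
conn.execute("CREATE VIEW HonorRoll AS SELECT sid, gpa FROM Students WHERE gpa >= 3.5")

# DML: modify and query the data.
conn.executemany("INSERT INTO Students VALUES (?, ?, ?, ?, ?)", [
    ("53666", "Jones", "jones@cs", 18, 3.4),
    ("53688", "Smith", "smith@ee", 18, 3.2),
    ("53650", "Smith", "smith@math", 19, 3.8),
])
print(conn.execute("SELECT * FROM StudentDirectory ORDER BY login").fetchall())
# [('Jones', 'jones@cs'), ('Smith', 'smith@ee'), ('Smith', 'smith@math')]
print(conn.execute("SELECT * FROM HonorRoll").fetchall())
# [('53650', 3.8)]
```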
Example: University Database
• Conceptual schema:
  • Students(sid: string, name: string, login: string, age: integer, gpa: real)
  • Courses(cid: string, cname: string, credits: integer)
  • Enrolled(sid: string, cid: string, grade: string)
• Physical schema:
  • Relations stored as unordered files.
  • Index on first column of Students.
• External schema (view):
  • Course_info(cid: string, enrollment: integer)
• Each data entry is stored only once; views are derived from the stored relations rather than stored separately.
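The university schema above can be sketched in SQLite through Python's `sqlite3`. The index name and the enrollment rows are hypothetical. One caveat: SQLite physically stores tables as B-trees, not as the "unordered files" on the slide, so only the index reflects a physical-schema choice made here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Conceptual schema
    CREATE TABLE Students (sid TEXT, name TEXT, login TEXT, age INTEGER, gpa REAL);
    CREATE TABLE Courses  (cid TEXT, cname TEXT, credits INTEGER);
    CREATE TABLE Enrolled (sid TEXT, cid TEXT, grade TEXT);

    -- Physical schema choice: an index on the first column of Students
    CREATE INDEX students_sid_idx ON Students (sid);

    -- External schema: Course_info is computed from Enrolled, never stored
    CREATE VIEW Course_info (cid, enrollment) AS
        SELECT cid, COUNT(*) FROM Enrolled GROUP BY cid;
""")

conn.executemany("INSERT INTO Enrolled VALUES (?, ?, ?)", [
    ("53666", "ITCS6160", "A"),
    ("53688", "ITCS6160", "B"),
    ("53650", "Topology112", "A"),
])
query = "SELECT * FROM Course_info ORDER BY cid"
print(conn.execute(query).fetchall())  # [('ITCS6160', 2), ('Topology112', 1)]

# Each enrollment is stored once, in Enrolled; the view reflects changes automatically.
conn.execute("INSERT INTO Enrolled VALUES ('53650', 'ITCS6160', 'A')")
print(conn.execute(query).fetchall())  # [('ITCS6160', 3), ('Topology112', 1)]
```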
Data Independence
• Applications are insulated from how data is structured and stored.
  • Logical data independence: protection from changes in the logical structure of data.
  • Physical data independence: protection from changes in the physical structure of data.
• The key is to reduce the workload and overhead of end users.
• One of the most important benefits of using a DBMS!
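A minimal sketch of both kinds of data independence in SQLite. The "application" is the hypothetical function `honor_students`, and its SQL never changes. First the physical structure changes (an index is added). Then the logical structure changes: `Students` is split into two hypothetical tables and recreated as a view under its old name.

```python
import sqlite3

def honor_students(conn):
    # The "application": it only knows the logical relation Students(sid, name, gpa).
    return conn.execute(
        "SELECT name FROM Students WHERE gpa >= 3.5 ORDER BY name").fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (sid TEXT, name TEXT, gpa REAL)")
conn.executemany("INSERT INTO Students VALUES (?, ?, ?)",
                 [("53666", "Jones", 3.4), ("53650", "Smith", 3.8), ("53700", "Lee", 3.9)])
before = honor_students(conn)

# Physical change: add an index. The application code is untouched.
conn.execute("CREATE INDEX students_gpa_idx ON Students (gpa)")
after_physical = honor_students(conn)

# Logical change: split Students into two tables, then recreate the old
# relation as a view so the application still sees Students(sid, name, gpa).
conn.executescript("""
    CREATE TABLE StudentNames  AS SELECT sid, name FROM Students;
    CREATE TABLE StudentGrades AS SELECT sid, gpa  FROM Students;
    DROP TABLE Students;
    CREATE VIEW Students (sid, name, gpa) AS
        SELECT n.sid, n.name, g.gpa
        FROM StudentNames n JOIN StudentGrades g ON n.sid = g.sid;
""")
after_logical = honor_students(conn)

print(before == after_physical == after_logical)  # True
print(after_logical)                              # [('Lee',), ('Smith',)]
```

The view gives logical independence only for reads here. Updates through such a view would need extra support (in SQLite, `INSTEAD OF` triggers).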
Structure of a DBMS
(Figure: layers from top to bottom: Query Optimization and Execution; Relational Operators; Files and Access Methods; Buffer Management; Disk Space Management; DB. These layers must consider concurrency control and recovery.)
• A typical DBMS has a layered architecture.
• The figure does not show the concurrency control and recovery components.
• This is one of several possible architectures; each system has its own variations.
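To show how the layers stack, here is a toy Python sketch. All class and function names are hypothetical. A dict stands in for the disk, and the buffer manager uses LRU replacement, one common policy chosen here for simplicity. Like the figure, it has no concurrency control or recovery.

```python
from collections import OrderedDict

class DiskSpaceManager:
    """Bottom layer: allocates pages and reads them (a dict stands in for disk)."""
    def __init__(self):
        self.pages, self.reads = {}, 0
    def allocate(self, records):
        page_id = len(self.pages)
        self.pages[page_id] = list(records)
        return page_id
    def read(self, page_id):
        self.reads += 1  # count physical reads to show the buffer's effect
        return list(self.pages[page_id])

class BufferManager:
    """Keeps up to `capacity` pages in memory, evicting the least recently used."""
    def __init__(self, disk, capacity):
        self.disk, self.capacity, self.pool = disk, capacity, OrderedDict()
    def get_page(self, page_id):
        if page_id in self.pool:
            self.pool.move_to_end(page_id)     # hit: mark most recently used
        else:
            if len(self.pool) >= self.capacity:
                self.pool.popitem(last=False)  # miss with a full pool: evict LRU page
            self.pool[page_id] = self.disk.read(page_id)
        return self.pool[page_id]

class HeapFile:
    """Files and access methods: an unordered file is just a list of page ids."""
    def __init__(self, buffer, page_ids):
        self.buffer, self.page_ids = buffer, page_ids
    def scan(self):
        for pid in self.page_ids:
            yield from self.buffer.get_page(pid)

def select(records, predicate):
    """Relational operator: selection as a streaming filter."""
    return (r for r in records if predicate(r))

# Top layer (standing in for query optimization/execution): a fixed plan, scan then select.
disk = DiskSpaceManager()
pids = [disk.allocate([("Jones", 3.4), ("Smith", 3.2)]),
        disk.allocate([("Smith", 3.8), ("Lee", 3.9)])]
students = HeapFile(BufferManager(disk, capacity=2), pids)

print(list(select(students.scan(), lambda r: r[1] > 3.5)))       # [('Smith', 3.8), ('Lee', 3.9)]
print(list(select(students.scan(), lambda r: r[0] == "Smith")))  # [('Smith', 3.2), ('Smith', 3.8)]
print(disk.reads)  # 2: the second query is served entirely from the buffer pool
```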
Transaction Management: ACID properties
• Atomicity: All actions in the Xact (transaction) happen, or none happen.
• Consistency: If each Xact is consistent, and the DB starts consistent, it ends up consistent.
• Isolation: Execution of one Xact is isolated from that of other Xacts.
• Durability: If a Xact commits, its effects persist.
• The Recovery Manager guarantees Atomicity & Durability.
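Atomicity can be observed with Python's `sqlite3`: used as a context manager, a connection commits when the block succeeds and rolls back when it raises. The account table and the `transfer` helper below are hypothetical, and an in-memory database demonstrates atomicity only, not durability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO Accounts VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

def transfer(conn, src, dst, amount, fail_midway=False):
    # `with conn:` commits if the block succeeds and rolls back if it raises.
    with conn:
        conn.execute("UPDATE Accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        if fail_midway:
            raise RuntimeError("simulated failure between the two updates")
        conn.execute("UPDATE Accounts SET balance = balance + ? WHERE name = ?", (amount, dst))

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM Accounts ORDER BY name"))

transfer(conn, "A", "B", 30)
print(balances(conn))  # {'A': 70, 'B': 80}

try:
    transfer(conn, "A", "B", 30, fail_midway=True)
except RuntimeError:
    pass
print(balances(conn))  # {'A': 70, 'B': 80}: the half-done debit was rolled back
```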
Motivation of concurrency control
• Consistency
• Isolation
• Example:
  • Two parallel transactions T1 and T2
  • Serial execution
  • Execution with interleaved actions
• Similar situations arise in OSs and in any other competition for resources.
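The T1/T2 example can be simulated deterministically, with no threads, as two schedules over one shared item X. In the sketch below, the deposit amounts and the `run` helper are hypothetical. Each transaction reads X into a private copy, adds its deposit, and writes the copy back. Interleaving the actions produces a "lost update" that no serial order could produce.

```python
def run(schedule, initial=100):
    """Execute a schedule of (txn, op) steps on one shared item X.

    "R": the transaction reads X into its private local copy.
    "W": the transaction writes (local copy + its deposit) back to X.
    """
    db = {"X": initial}
    deposits = {"T1": 10, "T2": 20}  # hypothetical amounts
    local = {}
    for txn, op in schedule:
        if op == "R":
            local[txn] = db["X"]
        elif op == "W":
            db["X"] = local[txn] + deposits[txn]
    return db["X"]

serial      = [("T1", "R"), ("T1", "W"), ("T2", "R"), ("T2", "W")]
interleaved = [("T1", "R"), ("T2", "R"), ("T1", "W"), ("T2", "W")]

print(run(serial))       # 130: both deposits applied
print(run(interleaved))  # 120: T2 overwrites T1's write (lost update)
```

A concurrency control protocol (e.g., locking) would force the interleaved schedule to behave like some serial one.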
Motivation of recovery management
• Atomicity:
  • Transactions may abort (“Rollback”).
• Durability:
  • What if the DBMS stops running? (Causes?)
• Desired behavior after the system restarts:
  • T1, T2 & T3 should be durable.
  • T4 & T5 should be aborted (effects not seen).
(Figure: transactions T1 to T5 on a timeline that ends in a crash.)
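The desired restart behavior can be sketched as a toy undo/redo pass over a log. This is a hypothetical simplification for illustration. A real recovery manager (e.g., ARIES) also relies on write-ahead logging rules, log sequence numbers, and checkpoints. The log shape and the on-disk state below are assumptions.

```python
def recover(log, disk):
    """Toy restart recovery.

    Log records: ("update", txn, key, old_value, new_value) or ("commit", txn).
    `disk` is whatever state happened to reach disk before the crash.
    """
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    state = dict(disk)
    # Redo: reapply every update of a committed transaction, in log order.
    for rec in log:
        if rec[0] == "update" and rec[1] in committed:
            _, _, key, _old, new = rec
            state[key] = new
    # Undo: roll back updates of uncommitted transactions, newest first.
    for rec in reversed(log):
        if rec[0] == "update" and rec[1] not in committed:
            _, _, key, old, _new = rec
            state[key] = old
    return state

log = [
    ("update", "T1", "a", 0, 1), ("commit", "T1"),
    ("update", "T2", "b", 0, 2), ("update", "T4", "c", 0, 4), ("commit", "T2"),
    ("update", "T3", "d", 0, 3), ("update", "T5", "e", 0, 5), ("commit", "T3"),
]  # crash here: T4 and T5 never committed
# Hypothetical disk state at the crash: some updates were flushed, others were not.
disk = {"a": 1, "b": 0, "c": 4, "d": 0, "e": 0}
print(recover(log, disk))  # {'a': 1, 'b': 2, 'c': 0, 'd': 3, 'e': 0}
```

T1 to T3 end up durable even though T2's and T3's writes never reached disk, while T4's flushed write is undone.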
Databases make these folks happy ...
• End users and DBMS vendors
• DB application programmers
  • E.g., smart webmasters
• Database administrator (DBA)
  • Designs logical/physical schemas
  • Handles security and authorization
  • Data availability, crash recovery
  • Database tuning as needs evolve
Must understand how a DBMS works!
New challenges
• From application-oriented to data-oriented
• Unstructured data
• Conflict between data and user privacy
• Data taint/trace
• Challenges caused by the cloud:
  • Storage locations
  • Indexing of encrypted data files
  • Proof of retrievability
• Mobile: compute locally, or transmit the data?
Summary
• A DBMS is used to maintain and query large datasets.
• Benefits include recovery from system crashes, concurrent access, quick application development, data integrity and security.
• Levels of abstraction give data independence.
• A DBMS typically has a layered architecture.
• DBAs hold responsible jobs and are well-paid!
• DBMS R&D is one of the broadest, most exciting areas in CS.