310 likes | 318 Views
The Relational Model (cont’d) Introduction to Disks and Storage. CS 186, Spring 2007, Lecture 3 Cow book Section 1.5, Chapter 3 (cont’d) Cow book Chapter 9 Mary Roth. Administrivia. Homework 0 due today 10 p.m.! Nathan and Erinaios posted their office hours on class homepage
E N D
The Relational Model (cont’d) Introduction to Disks and Storage CS 186, Spring 2007, Lecture 3 Cow book Section 1.5, Chapter 3 (cont’d) Cow book Chapter 9 Mary Roth
Administrivia • Homework 0 due today 10 p.m.! • Nathan and Erinaios posted their office hours on class homepage • Homework 1 available today from class web site Submit team members online Read thru description; we’ll talk more about it after today’s lecture • Questions from last time?
Outline • What we learned last time • Components of a DBMS • Relational Data Model • New stuff • Storage, Disks and Files
Review: Components of a DBMS Query Optimization and Execution Relational Operators Files and Access Methods Buffer Management Disk Space Management Today we go here… DB • A DBMS is like an ogre; it has layers
Review: Relational Data Model • Most widely used data model today. • Relations: • Schema : specifies name of relation, plus name and type of each column. • Instance : a table, with rows and columns that contain data. • SQL is a query language for relational data model • DDL: To define/modify/change schemas • DML: To query data in table. • Keys are a way to associate tuples in different relations
Let’s return to our bank… • Can we apply a relational model to our bank spreadsheet? CREATE TABLE CUSTOMERS (CustomerID INTEGER, Name VARCHAR(128), Address VARCHAR(256), AccountID INTEGER); CREATE TABLE ACCOUNTS (AccountID INTEGER, Balance Double);
Primary Keys • A set of fields is a superkey if: • No two distinct tuples can have same values in all key fields • A set of fields is a key for a relation if : • It is a superkey • No subset of the fields is a superkey • what if >1 key for a relation? • One of the keys is chosen (by DBA) to be the primary key. Other keys are called candidate keys. • e.g. • {sid, gpa} is an example of a superkey. • sid is a key for Students. • what about name? login? Students sid name login age gpa 53666 Jones jones@cs 18 3.4 53688 Smith smith@eecs 18 3.2 53650 Smith smith@math 19 3.8
Primary and Candidate Keys in SQL • Keys must be chosen and defined carefully! • They imply semantics! • What does this set of key definitions imply about students? CREATE TABLE Enrolled (sid CHAR(20) cid CHAR(20), grade CHAR(2), PRIMARY KEY(sid), UNIQUE (cid, grade)) “Students can take only one course, and no two students in a course receive the same grade.”
Primary and Candidate Keys in SQL • Better definition: CREATE TABLE Enrolled (sid CHAR(20) cid CHAR(20), grade CHAR(2), PRIMARY KEY(sid,cid)) “For a given student and course, there is a single grade.”
Foreign Keys, Referential Integrity • Foreign key:Set of fields in one relation that is used to `refer’ to a tuple in another relation. • Must correspond to the primary key of the other relation. • Like a `logical pointer’. • Plays the same role as the physical pointer in IMS • If all foreign keys in a table refer to tuples in the other, referential integrity is achieved (i.e., no dangling references.)
Foreign Keys in SQL 11111 English102 A • E.g. Only students listed in the Students relation should be allowed to enroll for courses. • sid is a foreign key referring to Students: CREATE TABLE Enrolled (sid CHAR(20),cid CHAR(20),grade CHAR(2), PRIMARY KEY(sid,cid), FOREIGN KEY(sid) REFERENCESStudents ) Enrolled Students sid cid grade sid name login age gpa 53666 Carnatic101 C 53666 Jones jones@cs 18 3.4 53666 Reggae203 B 53688 Smith smith@eecs 18 3.2 53650 Topology112 A 53650 Smith smith@math 19 3.8 53666 History105 B
Let’s return to our bank… • Can we define keys for our relations? CREATE TABLE CUSTOMERS (CustomerID INTEGER NOT NULL, Name VARCHAR(128), Address VARCHAR(256), AccountID INTEGER, PRIMARY KEY(CustomerID), FOREIGN KEY(accountid) references ACCOUNTS); CREATE TABLE ACCOUNTS (AccountID INTEGER NOT NULL, Balance Double, PRIMARY KEY (AccountID)); • Why do we need NOT NULL? • What would happen if I executed these commands in this order?
Let’s return to our bank… We’ll come back to these later… • Write a SQL query (DML) that returns the names and account balances for all customers that have an account balance > 2500. • Write a SQL query (DML) that withdraws $300 from Frodo’s account.
Disks, Memory, and Files Query Optimization and Execution Relational Operators Files and Access Methods Buffer Management Disk Space Management DB You are here…
Disks and Files • DBMS stores information on disks. • Data must be transferred to and from disk and RAM • READ: transfer data from disk to main memory (RAM). • WRITE: transfer data from RAM to disk. • READ and WRITE are expensive and must be planned carefully! • DBMS architecture is designed to minimize both
Why Not Store Everything in Main Memory? • Costs too much. For ~$300, PCConnection will sell you: • ~1GB of RAM • ~30GB of flash • ~1 TB of disk • Main memory is volatile. We want data to be saved between runs. (Obviously!)
The Storage Hierarchy Smaller, Faster • Main memory (RAM) for currently used data. • Disk for the main database (secondary storage). • Tapes for archiving older versions of the data (tertiary storage). Bigger, Slower Source: Operating Systems Concepts 5th Edition
Jim Gray’s Storage Latency Analogy: How Far Away is the Data? Andromeda 9 10 Tape /Optical 2,000 Years Robot 6 Pluto Disk 2 Years 10 1.5 hr Sacramento 100 RAM This Lecture Hall 10 10 min On Board Cache 2 On Chip Cache This Room 1 Registers My Head 1 min
Disks • Secondary storage device of choice. • Main advantage over tapes: • faster time to retrieve • random access vs.sequential. • Data is stored and retrieved in units called disk blocks or pages. • Unlike RAM, time to retrieve a disk block varies depending upon location on disk. • Therefore, relative placement of blocks on disk has major impact on DBMS performance!
Components of a Disk Tracks Arm movement Arm assembly Spindle Disk head The platters spin (say, 120 rps). The arm assembly is moved in or out to position a head on a desired track. Tracks under heads make a cylinder(imaginary!). Sector Platters Only one head reads/writes at any one time. • Block size is a multiple of sector size (which is fixed).
Accessing a Disk Page Arm movement Transfer time • Time to access (read/write) a disk block: • seek time (moving arms to position disk head on track) • rotational delay (waiting for block to rotate under head) • transfer time (actually moving data to/from disk surface) Seek time Rotational delay
Accessing a Disk Page • Seek time and rotational delay dominate. • Seek time varies between about 0.3 and 10msec • Rotational delay varies from 0 to 4msec • Transfer rate around .08msec per 8K block • Key to lower I/O cost: reduce seek/rotation delays!
Arranging Pages on Disk • `Next’ block concept: • blocks on same track, followed by • blocks on same cylinder, followed by • blocks on adjacent cylinder • Blocks in a file should be arranged sequentially on disk (by `next’), to minimize seek and rotational delay. • For a sequential scan, pre-fetchingseveral pages at a time is a big win!
Summary: Disk Space Manager • Lowest layer of DBMS software manages space on disk (using OS file system or not?). • Higher levels call upon this layer to: • allocate/de-allocate a page • read/write a page • Best if a request for a sequence of pages is satisfied by pages stored sequentially on disk! • Responsibility of disk space manager. • Higher levels don’t know how this is done, or how free space is managed. • Though they may make performance assumptions! • Hence disk space manager should do a decent job.
Buffer Management Query Optimization and Execution Relational Operators Files and Access Methods Buffer Management Disk Space Management DB • Data must be in RAM for DBMS to operate on it! • Buffer Mgr hides the fact that not all data is in RAM You are here…
Buffer Management in a DBMS DB Page Requests from Higher Levels • Buffer pool information table contains: <frame#, pageid, pin_count, dirty> BUFFER POOL disk page free frame MAIN MEMORY DISK choice of frame dictated by replacement policy
Requesting a page I need page 3 Higher level DBMS component Buf Mgr BUFFER POOL 22 disk page 3 I need page 3 3 free frames MAIN MEMORY Disk Mgr DISK 1 2 3 … 22 … 90 • If requests can be predicted (e.g., sequential scans) pages can be pre-fetchedseveral pages at a time!
Releasing a page I read page 3 and I’m done with it Higher level DBMS component Buf Mgr BUFFER POOL 22 disk page 3 free frames MAIN MEMORY Disk Mgr DISK 1 2 3 … 22 … 90
Releasing a page I wrote on page 3 and I’m done with it Higher level DBMS component Buf Mgr BUFFER POOL 22 disk page 3’ free frames 3’ MAIN MEMORY Disk Mgr DISK 3’ 1 2 3 … 22 … 90
More on Buffer Management • Requestor of page must eventually unpin it, and indicate whether page has been modified: • dirtybit is used for this. • Page in pool may be requested many times, • a pin count is used. • To pin a page, pin_count++ • A page is a candidate for replacement iff pin count == 0 (“unpinned”) • CC & recovery may entail additional I/O when a frame is chosen for replacement. • Write-Ahead Log protocol; more later!