420 likes | 424 Views
Important updates for CS3431 students: Project Phase 1 deadline is Nov 4, Homework 1 deadline is Nov 8. TAs assigned to project teams. Check course website for details.
E N D
The Relational Model Instructor: Mohamed Eltabakh meltabakh@cs.wpi.edu CS3431-B11
Announcements • Project-Phase 1 is already out on Oct. 28 and due Nov. 4 • Homework 1 is out today Nov. 1 and due Nov. 8 • TAs Assigned to Project Teams • All phases of a given team will be graded by the same TA • The list is on the course website under “Homeworks & Projects” and under blackboard system under “Project Phases” as well • Homeworks & Project Due Date • Reducing the number of HWs and project phases by one so the last due data is Dec. 6 cs3431
One Last Example of ERD !!! cs3431
What is Bad??? Age Model Make VIN Name DoB ID ColorName car buys Customer colorID Date Price contains Date takes CarMiles FeatureName Car-feature Loan Bank amount = A customer can buy many cars = A customer may take a loan to buy a specific car number
From the Previous Example • ColorId & ColorName (cause redundancy & inconsistency) • Car can have one feature (wrong cardinality) • Car-feature has one attribute (should not be an entity) • CarMiles should be attached to the car (not to the relationship) • Age should be a derived attribute • A car can be bought by one (or zero) customers (wrong arrow head) • Loan and Car are not linked together (buys should be 3-way) • Or create a new entity set “Contract” and link it to the three entity sets
Relational Model • Another model for describing your data and application requirements • Currently the most widely used model • Vendors: Oracle, Microsoft, IBM • Recent competitions • Object Oriented Model: ObjectStore, Oracle • XML Databases : native and extensions • What about ER model • ER model is the first step in the design • ER model is then translated (mapped) to relational model
Relational Model • Structure: • Relations (also called Tables) • Attributes (also called Columns or Fields)
Attributes in Relational Model • Attribute Name • Each attribute of a relation has a unique name within the relation • Attribute Domain • The set of allowed values for each attribute is called the domain of the attribute • E.g., integer, real, date, string {“Male”, “Female”}, etc. • Atomic Attributes • Attribute values are required to be atomic (primitive or derived) • Note: multivalued attributes are not atomic • Note: composite attributes are not atomic • We will see how to handle non-atomic attributes • Special Null Value • The special value null is a member of every domain • The null value means the values does not exist (unknown)
Relational Schema • Schemais the structure of the relation • Attributes A1, A2, ..., An of relation R • R=(A1,A2,...,An ) is the relations schema of R • Relational Schema of “customer” is • customer = (customer_name, customer_street, customer_city) • customer (customer_name, customer_street, customer_city)
Relational Instance • Instance is the current schema plus the current tuples in the relation • The current instance of “customer” has four tuples (rows or records) • Instances are frequently changing • E.g., inserting new tuples, deleting or updating existing tuples • Schemas may also change but not frequently
Arity & Cardinality of Relations • Arity (Degree) of a relation • Number of attributes of that relation • Cardinality of a relation • Number of tuples in that relation Arity = 3 Cardinality = 4
Relations are Unordered • Order of tuples is not important • Order of columns is not important • Think of a relation as a set (or bag)of tuples not as a list of tuples • Most current DBMSs allow bags Student Student These two relations are equivalent
Keys of Relations • Super Keys • Subset of attributes that uniquely identify each tuple • Candidate Keys • Minimal super key • Primary Keys • Choose any of the candidate keys • Represented by “underline” under the selected attributes • Set vs. Bag • If a relation has a primary key, then it is a set not a bag How do we create these relations? SQL Language Assuming the customer name is unique • customer (customer_name, customer_street, • customer_city)
Relational Query Languages (SQL Standard) • Developed by IBM (system R) in 1970s • Standard as used by many vendors (portable) • SQL (Structured Query Language) • Standards: • SQL-86 • SQL-89 (minor revision) • SQL-92 (major revision) • SQL-99 (major extensions in OO, current standard)
SQL Language • Data Definition Language (DDL) • Create tables, specifying the columns and their types of columns, the primary keys, etc. • Drop tables, add/drop columns, add/drop constraints – primary key, unique, etc. • Data Manipulation Language (DML) • Update, Insert, Delete tuples • Query the data
SQL DDL DDL allows the specification of: • The schema for each relation • The type and domain of values associated with each attribute • Integrity constraints • Will see many types of these constraints • The set of indices to be maintained for each relation • Security and authorization information for each relation • The physical storage structure of each relation on disk
DDL: Creating Tables CREATE TABLE <tableName> ( <col> <type>, <col> <type>, … <col> <type>, [CONSTRAINT <cName>] PRIMARY KEY (…), [CONSTRAINT <cName>] UNIQUE (…), [CONSTRAINT <cName>] FOREIGN KEY (…) REFERENCES <tableName> (…) );
Domain Types in SQL • String: • string: variable size string with no maximum (theoretically ) • char(n): Fixed length character string, with user-specified length n • varchar(n): Variable length character strings, with user-specified maximum length n • Numbers: • Int: Integer • Smallint (short): Small integer • float(n): Floating point number, with user-specified precision of at least n digits. • Decimal(n,d): Double-precision of n decimal digits with decimal point at position d • Bits: • Bit(n): String of bits of fixed size n (user-defined) • Bit Varying(n): String of variable-size bits of maximum n (user-defined) • Boolean: Boolean type, True or False • Date & Time: • Date: Date field with certain format • Time: Date plus time with a certain format • customer (name: string, DoB: Date, Phone: Char(10), …)
DDL ---- Creating Relations • Create “Courses” relation • Create “Students” relation CREATE TABLE Students (sid: CHAR(20), name: CHAR(20), login: CHAR(10), age: INTEGER, gpa: REAL); CREATE TABLE Courses (cid: Varchar(20), name: string, maxCredits : integer, graduateFlag: boolean); • Create “Enrolled” relation • Observe that type (domain) of each field is specified • Types are enforced by DBMS whenever tuples are added or modified CREATE TABLE Enrolled (sid: CHAR(20), cid: Varchar(20), enrollDate:date, grade: CHAR(2))
Dropping & Altering Tables • Dropping (Deleting) a table • DROP TABLE <tableName>; • Altering a table to add a column • ALTER TABLE <tableName> ADD <col> <type>; • The initial value of the column is Null • Altering a table to drop a column • ALTER TABLE <tableName> DROP COLUMN <col>; • Some DBMSs do not support dropping columns
Integrity Constraints (ICs) • IC: condition that must be Truefor any instance of the database • ICs are specified when schema is defined • ICs are checked when relations are modified • A legalinstance of a relation is one that satisfies all specified ICs. • DBMS should not allow illegal instances.
IC: Key Constraints • Two types of key constraints can be defined • Primary Key • Unique • The other candidate keys • Each relation may have only one primary key but, possibly many unique keys • Primary key attributes do not accept Null • Create “Students” relation • Create “Courses” relation CREATE TABLE Students (sid: CHAR(20) Primary Key, name: CHAR(20), login: CHAR(10) Unique, age: INTEGER, gpa: REAL); CREATE TABLE Courses (cid: Varchar(20) Primary Key, name: string, maxCredits : integer, graduateFlag: boolean);
IC: Key Constraints (Cont’d) • Another way to define Keys • Create “Students” relation • Create “Courses” relation CREATE TABLE Students (sid: CHAR(20), name: CHAR(20), login: CHAR(10), age: INTEGER, gpa: REAL, Primary Key (sid), Unique (login)); CREATE TABLE Courses (cid: Varchar(20), name: string, maxCredits : integer, graduateFlag: boolean, Primary Key (cid)); • Create “Enrolled” relation CREATE TABLE Enrolled (sid: CHAR(20), cid: Varchar(20), enrollDate: date, grade: CHAR(2), Primary Key (sid, cid, enrollDate));
Violations of Key Constraints • DBMS prevents this from happening • Example customer (name: string, DoB: Date, Phone: Char(10), address: string); (“Mike”, “10/25/1988”, “1765432123”, “NY”); Inserted (“Mike”, “5/10/1990”, “1999887665”, “CA”); Raise an error (null , “5/10/1990”, “1999887665”, “CA”); Raise an error
Violation of key constraints • Primary key violation if: • There is a row with null values for any attribute of primary key, OR • Two rows with same values for all attributes of primary key • Unique key violation if: • Two rows have the same non-null values in the unique attributes • Null values are not equal to each other • DBMS will prevent this from happening
Adding/Dropping Constraints • Adding a constraint (after creating the table) • ALTER TABLE <tableName> ADD CONSTRAINT <cName> <constraint type> … • Dropping a constraint • ALTER TABLE <tableName> DROP CONSTRAINT <cName>
Constraints Limits the Data • Create “Enrolled” relation • Create “Enrolled” relation CREATE TABLE Enrolled (sid: CHAR(20), cid: Varchar(20), enrollDate: date, grade: CHAR(2), Primary Key (sid, cid)); CREATE TABLE Enrolled (sid: CHAR(20), cid: Varchar(20), enrollDate: date, grade: CHAR(2), Primary Key (sid, cid, enrollDate)); A student can take a course only once. A student can take a course many times on different dates.
Other Constraint Types • Create “Students” relation • Create “Courses” relation CREATE TABLE Students (sid: CHAR(20), name: CHAR(20) NOT NULL, login: CHAR(10), age: INTEGER, gpa: REAL Default 0.0, Primary Key (sid), Unique (login)); CREATE TABLE Courses (cid: Varchar(20), name: string, maxCredits : integer, graduateFlag: booleanDefault False, Primary Key (cid)); NOT NULL & DEFAULT constraints
Other Constraint Types (Cont’d) • Create “Enrolled” relation CREATE TABLE Enrolled (sid: CHAR(20), cid: Varchar(20), enrollDate: date, grade: CHAR(2), Primary Key (sid, cid, enrollDate), Constraint gradeVal check grade in (‘A+’, ‘A-’, ‘B+’, ‘B-’)); Domain constraint
IC: Foreign Key Constraints(Referential Integrity ) • Create “Students” relation • Create “Courses” relation CREATE TABLE Students (sid: CHAR(20), name: CHAR(20), login: CHAR(10), age: INTEGER, gpa: REAL); CREATE TABLE Courses (cid: Varchar(20), name: string, maxCredits : integer, graduateFlag: boolean); Foreign key Foreign key • Create “Enrolled” relation CREATE TABLE Enrolled (sid: CHAR(20), cid: Varchar(20), enrollDate: date, grade: CHAR(2));
Enrolled (referencing relation) Students (referenced relation) Primary Key Foreign Key Foreign Keys, Referential Integrity • Foreign key : Set of fields in one relation that "refer" to a tuple in another relation (like a pointer) • Foreign key : • FK in referencingrelation must match PK of referenced relation. • Match = same number of columns, compatible data types (column names can be different) • The relationship is many-to-one from the “referencing” to the “referenced”
Foreign Keys in SQL • Create “Students” relation • Create “Courses” relation CREATE TABLE Students (sid: CHAR(20) Primary Key, name: CHAR(20), login: CHAR(10), age: INTEGER, gpa: REAL); CREATE TABLE Courses (cid: Varchar(20) Primary Key, name: string, maxCredits : integer, graduateFlag: boolean); • Create “Enrolled” relation CREATE TABLE Enrolled (sid: CHAR(20) Foreign Key References (Students.sid), cid: Varchar(20), enrollDate: date, grade: CHAR(2), Constraints fk_cidForeign Key cid References (Courses.cid)); sid & cid have to be already defined as primary keys Alter Table Enrolled Add Constraints fk_cidForeign Key cid References Courses(cid));
Violation of Foreign Key Constraints • Suppose we have: • FOREIGN KEY R1 (S1) REFERENCES R2 (S2) • This constraint is violated if: • If there is a value in R1.S1 which does not exist in R2.S2 • The DBMS prevents this from happening
Enforcing Referential Integrity • Consider Students and Enrolled; sid in Enrolled is a foreign key that references Students. • What if inserting in “Enrolled” values (“4545”, “History105”, “A”) • DBMS will reject it because no student with sid “4545”
Enforcing Referential Integrity (Cont’d) • Deletion: What if an Enrolled tuple is deleted? • No problem. It is allowed
Enforcing Referential Integrity (Cont’d) • Deletion: What if a Student tuple is deleted? • Cascading -- Also delete all Enrolled tuples that refer to it • No Action -- Disallow deletion of a Students tuple since there are dependent tuples • Set Default -- Set sid in Enrolled tuples that refer to it to a default sid • Set Null -- Set sid in Enrolled tuples that refer to it to a special value null, denoting `unknown’ (Not always applicable) • Same rules apply when updating Students.sid The two most common actions are: Cascading or No Action
Referential Integrity in SQL • SQL/99 supports all 4 options on deletes & updates: CREATE TABLE Enrolled (sid: CHAR(20) Foreign Key References (Students.sid), cid: Varchar(20), enrollDate: date, grade: CHAR(2), Constraints fk_cidForeign Key cid References (Courses.cid) ON DELETE CASCADE ON UPDATE NO ACTION);
Cyclic Dependencies • What if cyclic dependencies between two tables exists? • Table R has a foreign key to Table S (Say R.a references S.b) • Table S has a foreign key to Table R (Say S.c references R.d) • At insertion time: • When insert into R will be rejected because S.b does not yet exits • When insert into S will be rejected because R.d does not yet exits • Solution (Many) – The easiest is: • Disable the constraint on one table, say S • Insert the tuple into S, then insert into R • Enable the constraint on S again Alter Table S Disable Constraint <name>; Alter Table S Enable Constraint <name>;
Where do ICs Come From? • ICs are based upon semantics of real-world enterprise being described in database relations. • We can check a database instance to see if an IC is violated ? • In typical cases, DBMS should make sure all constraints are enforced • Given an instance, can we tell what are the ICs? • No, NEVER ! • An IC is a statement about all possible instances! • They need to be checked from the schema definition not from a given instance
Relational Model: Summary • Structure • Relations (Tables) • Attributes (Columns, Fields) • Constraints + Constraint Enforcement • Domain, Default, and Not Null Constraint • Key Constraint • Primary key, candidate key (unique) • Foreign Key Constraint
Inserting, Updating, Deleting Data • This is performed using Data Manipulation Language of SQL (DML) • Insertion • Insert into Students values (“1111”, …); • Deletion • Delete from Students; • Delete from Students Where sid = “1111”; • Update • Update Students Set GPA = GPA + 0.4; • Update Students Set GPA = GPA + 0.4 Where sid = “1111”;