450 likes | 776 Views
Data Models. How to structure data. What is a Data Model?. Having formed a model of the enterprise, we now need to represent the data. The data model tells us the structure of the database. Historically, three data models: Hierarchical data model Network data model Relational data model.
E N D
Data Models How to structure data
What is a Data Model? • Having formed a model of the enterprise, we now need to represent the data. • The data model tells us the structure of the database. • Historically, three data models: • Hierarchical data model • Network data model • Relational data model
Hierarchical and Network Data Models • Hierarchical and network data models have been superseded by the relational data model. • Reasons: • Lack of expressive power • E.g., one cannot express many-to-many relationships in the hierarchical model • More closely tied to the underlying implementation. Hence, less data independence. • Relational data model has a clean mathematical basis.
The Relational Model • Due to Codd. • Everything is represented as a relation in the mathematical sense. Also called tables. • A database therefore is a collection of tables, each of which has a unique name, and each of which is described by a schema. • In addition, Codd defined a data manipulation language.
Example of Schemas in the Relational Model • Example of a representation of entity sets: Student(sid,name,addr) Course(cid,title,eid) Empl(eid, ename, deptid) Dept(deptid, dname, loc) • Primary keys are underlined. • Recall that a primary key is one that uniquely identifies an entity. • An entity is a row in a table.
More Example Schemas • Relationship sets between entity sets are also represented in tables. • Example of a table corresponding to a relationship: Enrol(sid, cid, grade) • Again, a relationship is represented by a row (or a tuple) in a relation.
Relational Databases:Basic Concepts I • Attribute: • A column in a table • Domain • The set of values from which the values of an attribute are drawn. • Null value • A special value, meaning “not known” or “not applicable”. • Relation schema • A set of attribute names
Relational Databases: Basic Concepts II • Tuple • A set of values, one for each attribute in the relation scheme over which the tuple is defined, i.e. a mapping from attributes to the appropriate domains • Relation instance • A set of tuples over the scheme of the relation
Relational Databases:Basic Concepts III • Relational Database • A set of relations, each with a unique name • Normalized Relation • A relation in which every value is atomic (non-decomposable). Hence, every attribute in every tuple has a single value.
Keys • Candidate Key • A minimal set of attributes that uniquely identifies a tuple • Primary Key • The candidate key chosen as the identifying key of the relation • Alternate Key • Candidate keys which are not primary keys
Foreign Key • An attribute (or set of attributes) in table R1 which also occurs as the primary key of relation R2. • R2 is called the referenced relation. • Foreign keys are also called connection keys or reference attributes.
Integrity Rules: Entity Constraint • Entity constraint • All attributes in a primary key must be non-null. • Motivation: If the primary key uniquely identifies an entity in an entity set, then we must ensure that we have all the relevant information
Integrity Rules:Referential Integrity • Referential integrity • A database cannot contain a tuple with a value for a foreign key that does not match a primary key value in the referenced relation. • Or, a foreign key must refer to a tuple that exists. • Motivation: If referential integrity were violated, we could have relationships between entities that we do not have any information about.
Data Manipulation Languages • In order for a database to be useful, it should be possible to store and retrieve information from it. This is the role of the data manipulation language. • One of the attractions of the relational data model is that it comes with a well-defined data manipulation language.
Types of DML • Two types of data manipulation languages • Navigational (procedural) • The query specifies (to some extent) the strategy used to find the desired result e.g. relational algebra. • Non-navigational(non-procedural) • The query only specifies what data is wanted, not how to find it e.g. relational calculus.
Relational Algebra • Codd defined a number of algebraic operations for the relational model. • Unary operations take as input a single table and produce as output another table. • Binary operations take as input two tables and produce as output another table.
Unary Operations: Select • Select produces a table that only contains the tuples that satisfy a particular condition, in other words a “horizontal” subset. • Appearance: sC(R) • where C is a selection condition • and R is the relation over which the selection takes place
Example of Select Student sid name addr 123 Fred 3 Oxford 345 John 6 Hope Rd. 567 Ann 5 Garden s sid > 300(Student) yields 345 John 6 Hope Rd. 567 Ann 5 Garden
Unary Operations:Project • Project produces a table consisting of only some of the attributes. It creates a “vertical” subset. • Note that a project eliminates duplicates. • Appearance: ПA(R) • where A is a set of attributes of R • and R is the relation over which the project takes place.
Example of Project Enrol sid cid grade 123 CS51T 76 234 CS52S 50 345 CS52S 55 Пcid(Enrol) yields CS51T CS52S
Binary Operations • Two relations are (union) compatible if they have the same set of attributes. • Example, one table may represent suppliers in one country, while another table with same schema represents suppliers in another country. • For the union, intersection and set-difference operations, the relations must be compatible.
Union, Intersection, Set-difference • R1 R2 • The union is the table comprised of all tuples in R1 or R2. • R1 R2 • The intersection is the table comprised of all tuples in R1 and R2 • R1 - R2 • The set-difference between R1 and R2 is the table consisting of all tuples in R1 but not in R2.
Cartesian Product • R1 R2 • The Cartesian product is the table consisting of all tuples formed by concatenating each tuple in R1 with a tuple in R2, for all tuples in R2.
Example of a Cartesian Product R1 A B 1 x 2 y R2 C D a s b t c u R1 R2 A B C D 1 x a s 1 x b t 1 x c u 2 y a s 2 y b t 2 y c u
Natural Join • R1 R2 • Assume R1 and R2 have attributes A in common. Natural join is formed by concatenating all tuples from R1 and R2 with same values for A, and dropping the occurrences of A in R2 • R1 R2 = ПA’(sC(R1 R2)) • where C is the condition that the values for R1 and R2 are the same for all attributes in A and A’ is all attributes in R1 and R2 apart from the occurrences of A in R2. • hence, natural join is syntactic sugar
Example of a Natural Join I Course cid title eid CS51T DBMS 123 CS52S OS 345 CS52T Networking 345 CS51S ES 456 Instructor eid ename 123 Rao 345 Allen 456 Mansingh
Example of a Natural Join II Course Instructor cid title eid ename CS51T DBMS 123 Rao CS52S OS 345 Allen CS52T Net... 345 Allen CS51S ES 456 Mansingh
Division • R1 R2 • Assume that the schema for R2 is a proper subset of the one for R1. • We form the division by • Ordering the tuples in R1 so that all the tuples with the same value for the non-common attributes are grouped together. • Each group contributes a tuple to the result if the group’s values on the common attributes form a superset of the values of these attributes in R2.
Example of Division I Enrol cid sid grade CS51T 123 A CS52S 123 A CS51T 234 C CS52S 234 B CS51T 345 C CS52S 345 C Temp sid grade 123 A 234 B
Example of Division II Enrol cid sid grade CS51T 123 A CS51T 234 C CS51T 345 C CS52S 123 A CS52S 234 B CS52S 345 C Enrol Temp cid CS52S • Thus, the division gives all courses for which 123 got an A and 234 a B.
Assignment • Allows the expression to be written in parts. • Assigns the part to a temporary variable. • This variable can be used in subsequent expressions. • E.g. • sid(title = ‘DBMS’ (Enrol Course) • Could be re-written as: • r Enrol Course • sid(title = ‘DBMS’(r))
Rename Operation • Names the result of an expression. • x(A1,A2,…,An) (E) • returns the result of expression E under the name x with the attributes renamed as A1,A2,…,An. • E.g. S (Student) • Renames Student table to S.
Database Modification • Insert • r r E • e.g. • Course Course {(‘CS51T’,’DBMS’)} • Delete • r r - E • e.g. • Student Student - sid=‘1’(Student) • Update • r F1,F2,…,Fn(r) • e.g. • Enrol sid,cid,grade grade + 2 (Enrol)
Examples • Assume the following schema: Student(sid,sname,saddr) Course(cid,title,lid) Enrol(sid, cid, grade) Lecturer(lid,lname,deptname) • Query 1: Find the name of all students that have taken the course entitled ‘Expert Systems’. • Query 2: Find the titles of all courses that student ‘Mark Smith’ has done. • Query 3: Find the id of students that have enrolled in all the courses that lecturer with id. = ‘234’ has taught. • Query 4: Find the highest grade for ‘CS51T’.
Relational Calculus • A relational calculus expression defines a new relation in terms of other relations. • A tuple variable ranges over a named relation. So, its values are tuples from that relation. • Example: • Get grades for CS51T • e(Enrol) {<e.grade>: e.cid = ‘CS51T’ }
Basic Syntax for Relational Calculus Expressions • r(R),…,s(S) { <target> : predicate} • where • R,..,S are tables • r,..,s are tuple variables • target specifies the attributes of the resulting relation • predicate is a formula giving a condition that tuples must satisfy to qualify for the resulting relation.
The Predicate • Predicate is constructed from • attribute names • constants • comparison operators • logical connectives • quantified tuple variables t(R), t(R)
Examples of Relational Calculus • Example 2 • Get names and grades for students enrolled in CS51T • e(Enrol), s(Student) {<s.name, e.grade>: e.cid = ‘CS51T’ s.sid = e.sid} • In relation algebra Пcid, name( s CID =‘ CS51T’(Grade Student))
Example 3 • Give the names of all students who got at least one A. • s(Student) {<s.name>: e(Enrol) (e.grade = ‘A’ s.sid = e.sid)} • Tuple variables not mentioned in the target list must be bound in the predicate.
Example 4 • Get the names of all students who only got A’s • s(Student) {<s.name>: e(Enrol)( s.sid = e.sid e.grade = ‘A’) e2(Enrol) (s.sid = e2.sid)}
Example 5 • Get the names of all students who got an A and a B • s(Student) {<s.name>: e(Enrol) (e.grade = ‘B’ s.sid = e.sid) e2(Enrol) (e2.grade = ‘A’ s.sid = e2.sid)}
Example 6 • Get the course titles and names for the courses for which the student did not get an A • c(Course), s(Student) {<s.name, c.title>: g(Enrol) s.sid = g.sid g.cid = c.cid g.grade ‘A’}