500 likes | 857 Views
Κεφάλαιο 3. ΣΧΕΣΙΑΚΟ ΜΟΝΤΕΛΟ. DATABASE SYSTEMS: The Relational Model and Relational Database Systems. OUTLINE Informal and Formal Definition of the Model Structures, Constraints, Operations Relational Algebra Relational Calculus The languages SQL and QBE
E N D
Κεφάλαιο 3 ΣΧΕΣΙΑΚΟ ΜΟΝΤΕΛΟ
DATABASE SYSTEMS: The Relational Model and Relational Database Systems • OUTLINE • Informal and Formal Definition of the Model Structures, Constraints, Operations • Relational Algebra • Relational Calculus • The languages SQL and QBE • Views - Integrity Constraints using SQL • Normalization and Relational Database Design • Relational Database Systems
Relational Model: Informal Definition • Proposed in 1970 by E.F. Codd (“A relational model for large shared data banks”, CACM), as a theory of a database model • Spurred tremendous research in the database field and became the most popular logical data model - many relational DBMSs are today available on nearly all platforms • A relational database is a set of relations • RELATION: A table of values. Each columnin the table has a header, called an attribute (field). Each row in the table is called a tuple (record) and stands for an entity or a relationship.
Formal Definition • STRUCTURES • Only one kind: relations(which have a name) AdomainD is a set of values , D= {d1, d2 , ..., dn} e.g., DOMAIN OF NAMES = the set of all names DOMAIN of WEIGHT = the set of all weights CHAR STRINGS from 1 to 10 in length, etc. An attributeA names a property of interest in a relation and takes its values from some associated domain D(A). e.g., EMPLOYEE_NAME, WEIGHT, etc. Attributes are the column names (headers) in a relation (Notation: R.A, or R[A] where R is the relation name)
Structure Definitions (2) A relation schema Ris the name and attributes of a relation, with the underlying domains for the attributes. When obvious, the domains are ignored Notation: R(A1 , A2 , ... An) e.g., STUDENT(Name, SSN, BirthDate, Address) Thedegreee n of a relation R is the number of attributes in R Adatabase schema S is a set of relation schemas. Notation: S = {R1 , R2 , ... Rm } e.g., COMPANY = { EMPLOYEE, PROJECT, ... }
Structure Definitions (3) -- Atuple t of a relation R(A1 , A2 , ... An) is an (ordered) set of values t = <v1 , v2 , ... vn >, where each vi is an element of the domain D(Ai). -- Arelation instance r(R), simply,relation, is a set of tuples r(R) = { t1 , t2 , ... tk } alternatively, it is a subset of the Cartesian product r(R) Ì D(A1) x D(A2) x ... x D(An) -- Thecardinalityof R is the number of tuples in r(R), it is denoted by CARDR -- Arelational database is a set of relations (instances)
Characteristics of Relations • ORDERING of attributes in a relational schemais essential • ORDERING of tuples in a relationis not important • Every tuple is stored only ONCE in a relation (it is aset) • A value may appear MULTIPLE TIMES in a column and is considered ATOMIC (indivisible) - at times this is referred as First Normal Form (1-NF) relation • A special value, called NULL, is used to represent values that areinapplicable or unknownto the database • e.g, the PhoneNumber value of someone without a phone, Address value for someone who did not supply his address • Notation: component value of a tuple t, t[Ai] = vi
Constraints in the Model • CONSTRAINTS • Three kinds of inherent to the modelconstraints: KEY, ENTITY INTEGRITY, and REFERENTIAL INTEGRITY. • Three basic explicit constraints: DOMAIN, COLUMN and USER-DEFINED (some other explicit constraints, like the Functional Dependencies, will be discussed later.) • KEY CONSTRAINTS: The various keys, as defined for Entities and Relationships, hold in the Relational Model. • Note that a key is a property of the relational schema (not a property of the relation)
Inherent Constraints (1) • A set of attributes SK of a relation schema R for which each tuple in any relation instance r(R) must have unique value(s) is a superkey. That is, for distinct t1 and t2, t1[SK] ¹t2[SK] For instance, SSN of EMPLOYEE, NAME and ADDRESS of EMPLOYEE, SSN and NAME of EMPLOYEE, etc. • A candidate key K is a minimal superkey (that is, no subset of the attributes in K is a superkey). K is also called key. For instance, SSN is a candidate key for EMPLOYEE, but the combination {SSN, NAME} is not. • A primary key PK is one of the candidate keys that is agreed to serve as an identifier for the relation (primary keys are usually distinguished by underlining) For instance, SSN is the primary key of relation EMPLOYEE.
Inherent Constraints (2) • ENTITY INTEGRITY: The primary key attributes PK in a relation schema R cannot have NULL values in any tuple of a relation instance r(R). t[PK] ¹ NULL, for all t in r(R) • The reason behind the above constraint is that a primary key is used to identify a tuple in a relation. • Note that more attributes in R may be constrained to have no NULLS by explicit constraints.
Inherent Constraints (3) • REFERENTIAL INTEGRITY: These constraints involve TWO relations and are used to specify a relationship among tuples of the two relations. They are also called foreign keys. • A foreign key FK is a set of one or more attributes of a relation R1 that forms a primary key for another relation R2. A tuple t1 in r(R1) is said to reference tuple t2 in r(R2), IF t1[FK] = t2[FK] For instance, for the relation WORKING-ON the attribute SSN is a foreign key (it is the primary key of EMPLOYEE).
Explicit Constraints (1) • DOMAIN CONSTRAINTS: They are the rules defined in the domain definition and inherited by columns (attributes) based on that domain. • A domain can be defined, together with all its integrity rules (e.g., the domain of integers having all rules that apply to integers). They are usually the basic data types. • The ideal support is through strong data typing (very rare) • COLUMN CONSTRAINTS: They are additional to the domain constraints and maintain values in a column. • Column rules go beyond the rules inherited by the domain (e.g., the column of small integers or integers between 1 and 10, etc. that further restrict the domain of integers.) • In many systems, support is given with a CHECK option
Explicit Constraints (2) • USER-DEFINED CONSTRAINTS: Any integrity rule, not among the ones discussed before, is classified as user-defined. • To enforce certain business rules, integrity constraints of arbitrary complexity are required. • Such constraints are expressed either procedurally or declaratively (preferred way) • Several mechanisms can be used to implement the enforcement of such rules: stored procedures, triggers, methods (for object-oriented systems) • Generally, relational DBMS are weak in enforcing rules
The COMPANY Database in the Relational Model - Schema EMPLOYEE ( SSN, Name, BirthDate, Address, Sex, Salary, SupSSN, DNumber) DEPARTMENT ( DNumber, DName, MgrSSN, MgrStartDate) PROJECT ( PNumber, PName, Location, DNumber) DEPT_LOCATION ( DNumber, DLocation) WORKS_ON ( SSN, PNumber, HoursPW) DEPENDENT ( SSN, DependName, Sex, BirthDate, Relationship)
The COMPANY Database in the Relational Model - Instance EMPLOYEE . DEPT_LOCATION DEPARTMENT
The COMPANY Database in the Relational Model - Instance PROJECT WORKS_ON . DEPENDENT
Definition of the Model: Operations • OPERATIONS • We can distinguish them in (a) UPDATE, (b) RETRIEVAL • UPDATE operations on Relations • INSERT a tuple • DELETE a tuple • MODIFY a tuple • Integrity constraints should not be violated by the execution of any update operation. For this, updates may propagate to cause other updates automatically. • e.g., when a tuple of EMPLOYEE is deleted, all tuples in WORKING_ON which have the same value for SSN are also deleted (non-existent employees cannot work on projects!)
Operations: Relational Languages • RETRIEVAL operations on Relations • There are two flavors: • (a) RELATIONAL ALGEBRA -- somewhat procedural, tells how to compute the result • (b) RELATIONAL CALCULUS -- somewhat declarative, tells what properties the result should have • No database system supports the two flavors of retrieval languages in their pure forms. This is because the issues of “ease of use”, “convenience”, etc., play an essential role in user interaction. Yet, the languages supported in DBMSs have their roots in either relational algebra or calculus.
Relational Languages • Query languages: Allow manipulation and retrieval of datafrom a database. • Relational model supports simple, powerful QLs: • Strong formal foundation based on logic. • Allows for much optimization. • Query Languages != programming languages! • QLs not expected to be “Turing complete”. • QLs not intended to be used for complex calculations. • QLs support easy, efficient access to large data sets.
Relational Algebra Operations • RELATIONALALGEBRA • A set of operators each of which maps one or more relations into a new relation (the algebra is CLOSED). • The operators, just as in arithmetic algebra, can be nested, since the result of each operation is itself a relation. • There are two types of operators: • traditional (regular) set operators • union, intersection, difference, ... • database specific set operators • projection, selection, join, ...
Relational Algebra: Set Ops • Traditional Set Operators • Union, Intersection, Difference, Cartesian Product • For the first three operators to apply, we must have UNION COMPATIBILITY between the two operand relations. That is: R1 ( A1 , A2 , ... , An ) and R2 ( B1 , B2 , ... , Bn ) must have the same number of attributes and the domains of the corresponding attributes must be compatible, i.e., D(Ai) = D(Bi) , for i = 1, 2, ..., n • By convention, the resulting relation for these operators has the same attribute names as the first operand relation R1
Relational Algebra : Example Database Database Schema STUDENT (SName, SAge) , relation schema R INSTRUCTOR (IName, IAge) relation schema S Database Instance R (STUDENT) S (INSTRUCTOR) CARDR = 5 CARDS = 4
Relational Algebra: Set Ops (2) • UNION - Put all the tuples of two relations in one relation • Notation: RÈS • Formally: R È S = { t | t is in R or t is in S } • Example: STUDENT È INSTRUCTOR CARDR È S <= CARDR + CARDS È =
Relational Algebra: Set Ops (3) • INTERSECTION - Put the common tuples of two relations in one relation • Notation: R Ç S • Formally: R Ç S = { t | t is in R and t is in S } • Example: STUDENT Ç INSTRUCTOR CARDR Ç S <= max(CARDR , CARDS ) Ç =
Relational Algebra: Set Ops (4) • SET DIFFERENCE - Select the tuples of the first relation which are not members of the second relation • Notation: R - S • Formally: R - S = { t | t is in R and t is not in S } • Example: STUDENT - INSTRUCTOR CARDR -S <= CARDR - =
Relational Algebra: Set Ops (5) • CARTESIAN PRODUCT - Combine each tuple of one relation with each tuple of the other • Notation: R 5 S • Formally: R 5 S = { t | t is the concatenation of a tuple in R with a tuple in S } • Example: STUDENT 5 INSTRUCTOR CARDR5S = CARDR x CARDS
Relational Algebra Operators: SELECTION • SELECTION - Selects the subset of tuples of a relation that satisfy a certain condition (qualification) c , which is an arbitrary Boolean expression on the attributes of R (“horizontal” subset of R) • Notation: sc(R) or R[c] • Formally: sc(R) = { t | t is in r(R) and condition c holds for t } • Examples: sDNumber = 4(EMPLOYEE), sSalary>30000(EMPLOYEE) s(Salary>30000 AND DNumber = 4 ) OR DNumber = 5 (EMPLOYEE), EMPLOYEE [ Dnumber = 4 ], EMPLOYEE [ Salary > 30000 ]
Relational Algebra Operators: SELECTION (2) • Selection is both commutative and associative (a) sc1( sc2(R) ) = sc2 ( sc1(R) ) (b) sc1( sc2(R) ) = sc1 AND c2 (R) = sc1 , c2 (R) (c) sc1( sc2( sc3(R) ) ) ) = sc2( sc3( sc1(R) ) ) ) • Example Result: sDNumber = 4(EMPLOYEE) All Employees in department number 4
Relational Algebra Operators: PROJECTION • PROJECTION - Keeps only certain attributes (specified by a list L) and eliminates the other attributes of a relation R and also all duplicate tuples (“vertical” subset of R) • Notation: pL(R) or R[L] • Formally: pL(R) = { t[L] | t is in r(R) and L ÌR} • Example: pLocation(PROJECT), or PROJECT[Location] All Locations where projects are
Relational Algebra Operators: JOINS • There are several types of JOINs - all combining two relations to form a new one: • (theta) join, equality join, natural join, semi-join, outer join • THETA (CONDITION) JOIN: Connect tuples from two relations that match (satisfy a Boolean condition c) on certain attributes • A theta-join is equivalent to a Cartesian product followed by a selection on the condition c. • Notation: R cS or R [ c ] S • The resulting relation has ALL the attributes of R and of S
Relational Algebra Operators: THETA JOIN • Example: DEPARTMENT MgrSSN > MgrSSNDEPARTMENT All department-department combinations where the first department’s number is greater than the second’s
Relational Algebra Operators: EQUALITY JOIN • EQUALITY JOIN: Connect tuples from two relations that match (have equal values) on certain attributes. This is exactly like THETA JOIN, except that the condition c is only allowed to have equalities. • Notation: R cS or R [ c ] S • Example: WORKS_ON HoursPW = DNumberPROJECT A totally MEANINGLESS Relation
Relational Algebra Operators: NATURAL JOIN • NATURAL JOIN: Connect tuples from two relations that match (have equal values) on all common attributes. In the result, the common attributes are kept only once • Notation: R S or R [ X = X ] S • Example: DEPARTMENT DEPT_LOCATION
Relational Algebra Operators: SEMI--JOIN • SEMI-JOIN: Select the subset of one relation that joins with another. A semi-join is equivalent to a join followed by a projection. • Notation: R µcS or R <c ] S • Example: EMPLOYEE µSSN=MgrSSNDEPARTMENT Semi-joins are USEFUL in distributed database operations
Relational Algebra Operators: OUTER--JOIN • Motivation: In a regular join operation, tuples in relations R or S that do not have matching tuples in the other relation do not appear in the result. In some queries, all tuples in R (or S) must appear in the result - when no matching tuples are found, NULLs are placed for the missing attribute values. • Notation: R Å S • OUTER-JOINs are distinguished in: • Left outer join (all tuples in R appear in the result) • Right outer join (all tuples in S appear) • Full outer join (all tuples in R and S appear)
Relational Algebra Operators: DIVISION • DIVISION: Given relations R(X,Y) and S(Y), where X, Y are sets of attributes, a tuple t is a member of the division (denoted: (R / S)[X] ) IF for all tS in S there exist tR in R, such that: tR[Y] = tS [Y] and tR [X] = t [X] • Analogy with number arithmetic: The quotient q of a/b is the largest number s.t. qb <= a The quotient Q of R ¸ S is the maximal relation s.t. Q X S Í R
Relational Algebra Queries (1) • A series of queries in relational algebra are presented in the sequel, using an example relational database that involves SAILORS who RESERVE some BOATS. SAILORS (Sid, SName, Rating) BOATS (Bid, BName, Color) RESERVE (Sid, Bid, Date)
Relational Algebra Queries (2) • QUERY1: Find the names of sailors who have reserved boat number 2 ( RESERVE [Bid=2] [Sid=Sid] SAILORS ) [SName] pSName ( sBid=2RESERVE Sid=Sid SAILORS ) • QUERY2: Find the names of sailors who have reserved a red boat ( BOAT [Color=red] [Bid=Bid] RESERVE [Sid=Sid] SAILORS ) [SName] pSName ( sColor=red BOAT Bid=Bid RESERVE Sid=Sid SAILORS )
Relational Algebra Queries (3) • QUERY3: Find the colors of the boats reserved by eleni (SAILORS [SName=eleni] [Sid=Sid] RESERVE [Bid=Bid] BOATS) [Color] pColor ( sSName=eleni SAILORS Sid=Sid RESERVE Bid=Bid BOATS ) • QUERY4: Find the names of the sailors who have reserved at least one boat ( RESERVE [Sid=Sid] SAILORS ) [SName] pSName( RESERVE Sid=Sid SAILORS )
Relational Algebra Queries (4) • QUERY5: Find the names of sailors who have reserved a red or a green boat pSName ( ( sColor=red BOATS ÈsColor=green BOATS ) pBid=Bid RESERVE Sid=Sid SAILORS ) • QUERY6: Find the names of sailors who have reserved both a red and a green boat pSName ( ( pSid( sColor=red BOATS Bid=Bid RESERVE ) Ç pSid ( sColor=green BOATS Bid=Bid RESERVE ) ) Sid=Sid SAILORS )
Relational Algebra Queries (5) • QUERY7: Find the names of sailors who have reserved all boats pSName ( ( pSid, Bid RESERVE / pBid BOATS ) Sid=Sid SAILORS ) • QUERY8: Find the names and ratings of sailors who have reserved all red boats pSName, Rating ( ( pSid, Bid RESERVE / pBid (sColor=red BOATS ) ) Sid=Sid SAILORS )
Relational Algebra: Comments • There are several properties that hold in a relational algebra expression (commutatitivity, associativity, etc.) • Examples: sc1( pL (R) ) = pL( sc1(R) ) sc1( R c2S ) = sc1( R ) c2S sc1 (R È S) = sc1(R) Èsc1(S) .... • Such properties are very useful in queryoptimization
Relational Algebra: Comments • COMPLETE SET OF OPERATIONS • The set of operators {s, p, È, -, 5} is called a complete set of relational algebra operations. The implication is that ALL other operators can be described as a sequence of the above operators. • For example, the division operator can be described as: R / S = pX(R)- ( (pX (R)5 S) - R ) where X are the non-common attributes in R and S • Equivalently, it is expressed as: (R / S) [X] = R[X] - ( ( R[X] x S ) - R )[X]
Relational Algebra Competeness • There are several combinations of relational algebra operators that define a complete set. • Any Query Language equivalent to a complete set of operations is called RELATIONALLY COMPLETE • NOTE: This does not imply that the language is adequate to do all database operations (e.g., a good language must support aggregates, many forms of joins, built-in functions,...) • An interesting operator -which goes beyond the expressive power of the relational set of operators as defined by Codd- is that of transitive closure. This is a form of recursion in relational databases and is very useful in many applications.