350 likes | 505 Views
CM036 Advanced Databases. Lecture 1 : Relational Data Models. Content. 1 Relational Models, Languages and their Use in databases 2 The Relational Algebra 3 The Relational Calculus. 1.1 Relational and other data models.
E N D
CM036 Advanced Databases Lecture 1: Relational Data Models
Content 1 Relational Models, Languages and their Use in databases 2 The Relational Algebra 3 The Relational Calculus CM036 Advanced DatabasesLecture 1: Relational Models
1.1 Relational and other data models • Relational models (relational algebra, relational calculus, predicate calculus) • Hierarchical (object-relational) • Object-oriented CM036 Advanced DatabasesLecture 1: Relational Models
1.2 Relational Languages and their Use • Data manipulation language (DML) • Use: Populates, updates, and queries relational databases • Example: relational algebra, SQL DML • Data definition language (DDL) • Use: Specifies the data structures and defines the relational schema • Example: relational calculus, SQL DDL • Data control language (DCL) • Use: Specifies operation permissions, data access discipline and administrative privileges, and defines the database users • Example: SQL DCL, LDAP CM036 Advanced DatabasesLecture 1: Relational Models
2 The Relational Algebra • Proposed by Codd in 1970 as a formal data model. Describes the operations to manipulate relations. • Used to specify retrieval requests (queries). Query result is also in the form of a relation. • Relational operators transform either a simple relation or a pair of relations into another relation Relational Operations: • RESTRICT(s) and PROJECT() operations. • Set operations: UNION(), INTERSECTION(),DIFFERENCE(—), CARTESIAN PRODUCT(). • JOIN operations (⋈). • Other relational operations: DIVISION, OUTER JOIN, AGGREGATE FUNCTIONS. CM036 Advanced DatabasesLecture 1: Relational Models
2.1 RESTRICT s and PROJECT P • RESTRICToperation (called also SELECT- denoted bys ): • Selects the tuples (rows) from a relation R that satisfy a certain selection conditionc on the attributes of R: sc(R) • Resulting relation includes each tuple in R whose attribute values satisfy the condition c, i.e. ithas the same attributes as R Examples: sDNO=4(EMPLOYEE) sSALARY>30000(EMPLOYEE) s(DNO=4 AND SALARY>25000) OR DNO=5(EMPLOYEE) CM036 Advanced DatabasesLecture 1: Relational Models
PROJECToperation (denoted by ): • Keeps only certain attributes (columns) from a relation R specified in an attribute listL: L(R) • Resulting relation has only those attributes of R specified in L Example:FNAME,LNAME,SALARY(EMPLOYEE) • The PROJECT operation eliminates duplicate tuples in the resulting relation so that it remains a true set (no duplicate elements) Example:SEX,SALARY(EMPLOYEE) • If several male employees have salary 30000, only a single tuple <M, 30000> is kept in the resulting relation. CM036 Advanced DatabasesLecture 1: Relational Models
Sequence of operations • Several operations can be combined to form a relational algebra expression (query) Example:Retrieve the names and salaries of employees who work in department 4 FNAME,LNAME,SALARY (sDNO=4(EMPLOYEE) ) • Alternatively, we specify explicit intermediate relations for each step: DEPT4_EMPS sDNO=4(EMPLOYEE) R FNAME,LNAME,SALARY(DEPT4_EMPS) • Attributes can optionally be renamed in the resulting left-hand-side relation (this may be required for some operations that will be presented later): DEPT4_EMPS sDNO=4(EMPLOYEE) R(FNAME,LNAME,SALARY) FNAME,LNAME,SALARY(DEPT4_EMPS) CM036 Advanced DatabasesLecture 1: Relational Models
2.2 Set Operations • Binary operations from set theory:UNION: R1 R2,INTERSECTION: R1 R2, DIFFERENCE: R1— R2, • For , ,—, the operand relations R1(A1, A2, ..., An) and R2(B1, B2, ..., Bn) must have the same number of attributes, and the domains of corresponding attributes must be compatible; that is, dom(Ai)=dom(Bi) for i=1, 2, ..., n. This condition is called union compatibility. • The resulting relation for , , or — has the same attribute names as the first operand relation R1 (by convention). CM036 Advanced DatabasesLecture 1: Relational Models
CARTESIAN PRODUCT R(A1, A2, ..., Am, B1, ..., Bn) R1(A1, A2, ..., Am) R2(B1, ..., Bn) • A tuple t exists in R for each combination of tuples t1 from R1 and t2 from R2such that: t[A1, A2, ..., Am] = t1 and t[B1, B2, ..., Bn] = t2 • The resulting relation R has n1 + n2 columns • If R1 has n1 tuples and R2 has n2 tuples, then R will have n1*n2 tuples. CM036 Advanced DatabasesLecture 1: Relational Models
Obviously, CARTESIAN PRODUCT is useless if alone, since it generates all possible combinations. It can combine related tuples from two relations in a more informative way if followed by the appropriate RESTRICT operation Example:Retrieve a list of the names of dependents for each femaleemployee FEMALE_EMPS sSEX=‘F’(EMPLOYEE) EMPNAMES FNAME,LNAME,SSN(FEMALE_EMPS) EMP_DEPENDENTS EMPNAMES DEPENDENT ACTUAL_DEPENDENTS s SSN=ESSN(EMP_DEPENDENTS) RESULT FNAME,LNAME,DEPENDENT_NAME(ACTUAL_DEPENDENTS) CM036 Advanced DatabasesLecture 1: Relational Models
2.3 JOIN Operations • THETA JOIN: Similar to a CARTESIAN PRODUCT followed by a RESTRICT. The condition c is called a join condition. R(A1, A2, ..., Am, B1, B2, ..., Bn) R1(A1, A2, ..., Am) ⋈c R2 (B1, B2, ..., Bn) • EQUIJOIN: The join condition c includes equality comparisons involving attributes from R1 and R2. That is, c is of the form: (Ai=Bj) AND ... AND (Ah=Bk); 1<i,h<m, 1<j,k<n In the above EQUIJOIN operation: • Ai, ..., Ah are called the join attributes of R1 • Bj, ..., Bk are called the join attributes of R2 Example: Retrieve each department's name and its manager's name: T DEPARTMENT ⋈MGRSSN=SSN EMPLOYEE RESULT DNAME,FNAME,LNAME(T) CM036 Advanced DatabasesLecture 1: Relational Models
DEPT_MGRDNAME,…,MGRSSN,…LNAME,…,SSN…(DEPARTMENT ⋈MGRSSN=SSN EMPLOYEE) CM036 Advanced DatabasesLecture 1: Relational Models
NATURAL JOIN • In an EQUIJOIN R R1⋈c R2, the join attribute of R2 appear redundantly in the result relation R. In a NATURAL JOIN, the join attributes of R2 are eliminated from R. The equality is implied and there is no need to specify it. The form of the operator is R R1⋈(join attributes of R1),(join attributes of R2) R2 Example:Retrieve each employee's name and the name of the department he/she works for: T EMPLOYEE ⋈(DNO),(DNUMBER)DEPARTMENT RESULT FNAME,LNAME,DNAME(T) • If the join attributes have the same names in both relations, they need not be specified and we can write R R1 * R2. Example:Retrieve each employee's name and the name of his/her supervisor: SUPERVISOR(SUPERSSN,SFN,SLN)SN,FNAME,LNAME(EMPLOYEE) T EMPLOYEE * SUPERVISOR RESULT FNAME,LNAME,SFN,SLN(T) CM036 Advanced DatabasesLecture 1: Relational Models
Note: In the original definition of NATURAL JOIN, the join attributes were required to have the same names in both relations. • There can be a more than one set of join attributes with a different meaning between the same two relations. For example: JOIN ATTRIBUTESRELATIONSHIP EMPLOYEE.SSN= EMPLOYEE manages DEPARTMENT.MGRSSN the DEPARTMENT EMPLOYEE.DNO= EMPLOYEE works for DEPARTMENT.DNUMBER the DEPARTMENT Example:Retrieve each employee’s name and the name of the department he/she works for: T EMPLOYEE ⋈DNO=DNUMBERDEPARTMENT RESULT FNAME,LNAME,DNAME(T) CM036 Advanced DatabasesLecture 1: Relational Models
A relation can have a set of join attributes to join it with itself : JOIN ATTRIBUTESRELATIONSHIP EMPLOYEE(1).SUPERSSN= EMPLOYEE(2) supervises EMPLOYEE(2).SSN EMPLOYEE(1) • One can think of this as joining two distinct copies of the relation, although only one relation actually exists • In this case, renaming can be useful Example:Retrieve each employee’s name and the name of his/her supervisor: SUPERVISOR(SSSN,SFN,SLN) SSN,FNAME,LNAME(EMPLOYEE) T EMPLOYEE ⋈SUPERSSN = SSSNSUPERVISOR RESULT FNAME,LNAME,SFN,SLN(T) CM036 Advanced DatabasesLecture 1: Relational Models
2.4 Complete Set of Operations • All the operations discussed so far can be described as a sequence of only the operations RESTRICT, PROJECT, UNION, SETDIFFERENCE, and CARTESIAN PRODUCT. • Hence, the set {s , , ,—, } is called a complete set of relational algebra operations. Any query language equivalent to these operations is called relationally complete. • For database applications, additional operations are needed that were not part of the original relational algebra. These include: • 1. Aggregate functions and grouping. • 2. OUTER JOIN and OUTER UNION. CM036 Advanced DatabasesLecture 1: Relational Models
2.4 Additional Relational Operations • AGGREGATE FUNCTIONS Functions such as SUM, COUNT, AVERAGE, MIN, MAX are often applied to sets of values or sets of tuples in database applications <grouping attributes><function list>(R) • The grouping attributes are optional Example:Retrieve the average salary of all employees (no grouping needed): R(AVGSAL) AVERAGE SALARY(EMPLOYEE) Example:For each department, retrieve the department number, the number of employees, and the average salary: R(DNO,NUMEMPS,AVGSAL) DNOCOUNT SSN, AVERAGE SALARY(EMPLOYEE) • DNO is called the grouping attribute in the above example CM036 Advanced DatabasesLecture 1: Relational Models
OUTER JOIN • In a regular EQUIJOIN or NATURAL JOIN operation, tuples in R1 or R2that do not have matching tuples in the other relation do notappear in the result. Some queries require all tuples in R1 (or R2 or both) to appear in the result • When no matching tuples are found, nulls are placed for the missing attributes • LEFT OUTER JOIN: R1 R2 lets every tuple in R1 appear in theresult • RIGHT OUTER JOIN: R1 R2 lets every tuple in R2 appear in the result • FULL OUTER JOIN: R1 R2lets every tuple in both R1and R2appear in the result CM036 Advanced DatabasesLecture 1: Relational Models
TEMP EMPLOYEE SSN=MGRSSNDEPARTMENT • RESULT FNAME,MINIT,LNAME,DNAME(TEMP) CM036 Advanced DatabasesLecture 1: Relational Models
3 Relational Calculus • Declarative language based on predicate logic - checks what is true in the database rather then looking to get it • The main difference in comparison with the relational algebra is the introduction of variables which range over attributes or tuples • Two different variations of the relational calculus: • domain calculus - specification of formal properties for different data types, used for describing data • tuple calculus - querying and checking formal properties of stored relational data CM036 Advanced DatabasesLecture 1: Relational Models
3.1 Predicate Calculus • Vocabulary of constant names, functional attributes and predicate properties, used for describing the information • Phrases for specification of constant, functionally dependant or predicated properties build using proper vocabulary terms • Statements about the world, formulated as meaningful phrases, connected through logical connectives in logical sentences • conjunction () • disjunction () • negation () • implication () • equivalence () • Possible quantification of the sentences using existential () or universal() quantifiers, ranging over variables Example:All units registered in the database have unit leaders among the lecturers (x).(Unit (x) (y).(Lecturer(y) Leader(y,x))) CM036 Advanced DatabasesLecture 1: Relational Models
3.2 Relational DB as a Predicate Calculus model • Semantic interpretation of the calculus is given in a set, so either all type domains of the relational schema (domain calculus), or the set of all relations in the database (tuple calculus) can serve a model for it • All meaningful sentences in the model are true or false, so the relation tuples can be interpreted as truthful facts describing the world • The constraints describe logical regularities among the attribute values, so they can be expressed as logical sentences Example:All records of unit leaders have primary keys • Relation names are predicates with a place for each attribute Leader(Lno:INTEGER,Lname:CHAR,Uname:CHAR) - Lno, Lname and Uname are attributes of Leader • Data values in the relation tuples are constants from the domains Leader(2,‘Johnson’,’CS234’) - 2,Johnsonand CS234are constants forming one Leader tuple • Constraints can be stated using quantified variables over domains (x)(y,z) .(Leader(x,y,z)) - x stands for the primary key of Leader CM036 Advanced DatabasesLecture 1: Relational Models
Notes: 1. In predicate calculus sentences contain quantified variables only 2. The sub-expressions following the quantified variables form the scope of quantification; it is usually closed in round parentheses Example:The table for unit leaders has foreign keys to both the lecturers and units tables • Relation names are predicates with place for each attribute Unit(Uno:INTEGER, Uname:CHAR) - Uno and Uname are attributes of relation Unit Lecturer(Lno:INTEGER, Lname:CHAR) - Lno and Lname are attributes of relation LecturerLeader(LUno:INTEGER,Lno:INTEGER,Uno:INTEGER) - LUno, Lno and Uno are attributes of relation Leader • All foreign key constraints can be stated logically using quantified sentences, in which variables range over the respective values ( LUno,Uno,Lno).(Leader(LUno,Uno,Lno) ( Lname).(Lecturer(Lno,Lname)) ( Uname).(Unit(Uno,Uname)) ) CM036 Advanced DatabasesLecture 1: Relational Models
3.3 Relational Calculus as database query language • Queries are logical expressions, which contain non-quantified (free) variables for tuples • The free variables are placeholders for the information, we are looking for from the database relations • The logical expressions, used in the query, are its conditions which need to be met during answering the query Example:Who are the employees with salary greater than 40 000? {e.FNAME,e.LNAME |Employee(e) e.SALARY > 40 000} • An answer is a replacement of the free variables in the query with tuple attributes from the relations used to formulate the query conditions; they should make the statement of the query true in database (pattern matching) Answer:James Borg and Jennifer Wallace Free variablesMatchingvalues e.FNAMEJames Jennifer e.LNAMEBorg Wallace CM036 Advanced DatabasesLecture 1: Relational Models
Notes: 1. The question variables are always listed in front of the expression, separated by | from the question condition; 2. Question condition can contain free variables only from the list in the beginning; all other variables should be quantified • Querying related tables requires check of the corresponding keys for equality Example:Retrieve the name and address of all employees who work for the ‘Research’ department {e.FNAME,e.LNAME,e.ADDRESS | Employee(e) ( d).(Department(d) d.DNAME = ‘Research’ d.DNUMBER = e.DNO ) } CM036 Advanced DatabasesLecture 1: Relational Models
3.4 Equvalence between relational algebra and relational calculus • Relational algebra and relational calculus are equivalent with respect to their ability to formulate queries against relational databases (Codd 1972) • Relational algebra concentrates on the procedural (“how-to”)aspects and because of this it is used as an intermediate language for optimization of database queries • Relational calculus is more appropriate to specify declaratively the model properties (“what is true”), without worrying about the way it is achieved and as such it can be used as a specification or querying language The relational language Query-By-Example (QBE), which is developed by IBM during 70ties and is implemented in Paradox and Access desktop database systems is based on the relational calculus CM036 Advanced DatabasesLecture 1: Relational Models
Summary The variety of relational languages currently used to communicate with databases has different usage in database systems lifecycle • Formal languages have unambiguous syntax and clear semantics, which makes them good for specifications and conceptualisation • Formal languages hide the details of implementation and can’t give very deep insight into the real database processes • Formal languages formalize adequately only part of the database operations, which have semantics inherited from the relational model;they do not cover DCL at al. • SQL is not a programming language, but language for communication with DBMS • SQL as a mixture of DDL, DML and DCL is the ultimate choice for practical relational databases development and administration • SQL syntax is ambiguous when it comes to the order of operations,which requiresdeep knowledge about the way it is interpreted CM036 Advanced DatabasesLecture 1: Relational Models