560 likes | 575 Views
CIS 671 Introduction to Database Systems II Introduction. S. Parthasarathy. CIS 671 Course Outline. Review of basic database material (CIS 670) Object-Oriented and Object Relational Database Systems Data Warehousing and Data Mining Active Databases Introduction to Storage Structures
E N D
CIS 671Introduction to Database Systems II Introduction S. Parthasarathy Query Languages
CIS 671 Course Outline • Review of basic database material (CIS 670) • Object-Oriented and Object Relational Database Systems • Data Warehousing and Data Mining • Active Databases • Introduction to Storage Structures • Emerging Technologies Query Languages
Relations • Relation schema R or R(A1, A2, …, An) • fixed set of attributes A1, A2, …, An • each attribute Aj corresponds to exactly one of the underlying domains Dj (j = 1, 2, …, n), not necessarily distinct. Example: EMP( EMPNO, NAME, DNO, JOB, MGR, SAL, COMMISSION) note: EMPNO and MGR take values from the same domain, the set of valid employee numbers N, the number of attributes in the relation scheme is called the degree of the relation. Degree 1 unary degree 3 ternary degree 2 binary degree n n-ary Query Languages
Keys • A relation is a set of tuples. • All elements of a set are distinct. • Hence all tuples must be distinct. • There may also be a subset of the attributes with the property that values must be distinct. Such a set is called a superkey. • SK a set of attributes • t1 and t2 tuples • t1 [SK] t2 [SK] Guaranteed by the “real world”. Query Languages
Candidate Key -a minimal superkey • A set of attributes, CK is a superkey, • but no proper subset is a superkey. Example EMPNO NAME ??? Primary Key - one arbitrarily chosen candidate key. Example EMPNO Query Languages
Theoretical Properties of Relations • Tuples are unordered. • There are no tuple duplicates. • Attributes are unordered • All attribute values are atomic. • Implementations vary in terms of how closely they follow the above properties. Query Languages
Relational Database Schema S • Set of Relation Schemas S = {R1, R2, …, Rp} • Set of integrity constraints, IC • Integrity constraints will be discussed next. Query Languages
Relational Database (Instance) DB of Schema S • Set of relation instances, DB • DB = {r1, r2, …, rp} • Each rk is an instance of relation RK • The rk’s satisfy the integrity constraints, IC Query Languages
Database Example - Schema EMP( EMPNO, NAME, DNO, JOB, MGR, SAL, COMMISSION) EMPNO from domain EmployeeNumbers ... Integrity constraints to be added DEPT( DNO, DNAME, LOC) DNO from domain DepartmentNumber Integrity constraints to be added Query Languages
Database Example- Database (instance) EMP( EMPNO, NAME, DNO, JOB, MGR, SAL, COMMISSION) 1234 Smith D2 J1 4567 30K 10 2222 Jones D2 J2 4567 25K 4567 Blue D1 J4 9999 40K 9999 Green D3 J5 100K DEPT( DNO, DNAME, LOC) D1 xxx yyy D2 aaa yyy D3 bbb zzz Query Languages
Relational Integrity Constraints • Not included in Codd’s original definition of relations. • Not supported in initial commercial relational database systems. • Now supported in all major relational database systems. Query Languages
Integrity Rules:Applicable to All Databases • Key constraint • Entity integrity • Referential integrity Query Languages
The Three Integrity Rules • Key constraint Candidate keys are specified for each relation schema. Candidate key values must be unique for every tuple in any relation instance of that schema. Query Languages
The Three Integrity Rules, Continued • Entity integrity: • No attribute participating in the primary key is allowed to accept null values. Justification: we must be able to identify each tuple uniquely. Query Languages
Foreign KeysEMP( EMPNO, NAME, DNO, JOB, MGR, SAL, COMMISSION)DEPT( DNO, DNAME, LOC) DNO in EMP - should be allowed only if that DNO appears as primary key in relation DEPT. MGR in EMP - should be allowed only if that MGR appears as primary key in relation EMP. The attributes DNO and MGR called foreign keys in relation EMP. Query Languages
The Three Integrity Rules, Continued • Referential integrity: • If relation schema R1 includes a foreign key, FK, matching the primary key of relation schema R2, then a value of FK in a tuple t1 of R1 must either • Be equal to the value of PK in some tuple t2 of R2, i.e. T1 [FK] = t2 [PK], or • Be wholly null (i.e. Each attribute value participating in that FK value must be null). Note: R1 and R2need not be distinct. Justification: if some tuple t1 references some tuple t2 , then tuple t2 must exist. Query Languages
Implications of the Three Integrity Rules • System could reject the operation. • System could accept the operation, but perform some additional operations so as to reach a new legal state. What should happen if an operation on the database is about to cause the violation of one of the integrity rules? i.e. about to put the database into an “illegal” state. Query Languages
Query Languages Review Srinivasan Parthasarathy Query Languages
SQL - Parts of the Language • Data Definition Language (DDL) • create table • create index • Data Manipulation Language (DML) • select (retrieve) • update • insert • delete Query Languages
Select - Basic Form(select from where) • Cartesian product followed by select and project. • select project-list • from Cartesian-product-list • where select-condition(s) • Abstract example: Given tables R(A,B) and S(B,C) • select R.A, R.B, S.C • from R, S • where R.A > 10 BUT - Duplicates NOT eliminated. Bag vs. Set. Query Languages
CASE STUDY EXAMPLE EMP( EMPNO, … , DNO, JOB, . . .) 100 D3 electrician 200 D3 plumber 300 D3 electrician 400 D1 electrician 500 D1 plumber 600 D1 carpenter 700 D2 electrician 800 D2 carpenter 900 D2 electrician Query Languages
Select as a JOIN • Cartesian product followed by • select (“join” & “select” conditions) and project. • select project-list • from Cartesian-product-list • where join-condition • and select-condition • Abstract example: Given tables R(A,B) and S(B,C) • select R.A, R.B, S.C • from R, S • where R.B = S.B /* join condition */ • and R.A > 10 /* “select” condition */ How is this related to these relational operators? Select (), Project (),Join: Natural (*) Query Languages
Using EMP and DEPTFrom Relational Algebra to SQL EMP( EMPNO, NAME, DNO, JOB, MGR, SAL, COMMISSION)DEPT( DNO, DNAME, LOC) • List the names, employee numbers, department numbers and locations for all clerks. • select NAME, EMPNO, E.DNO, LOC • from EMP E, DEPT D • where E.DNO = D.DNO /* join condition */ • and JOB = ‘Clerk’ /* “select” condition */ • . Query Languages
Duplicates in project - must use explicit distinct List the different department numbers in theEMP table (eliminate duplicates). select distinctDNO from EMP • Specify sort order List employee number, name, and salary of employees in department 50. select EMPNO, NAME, SAL from EMP where DNO = 50 order byEMPNO Query Languages
Union List the numbers of those departments which have an employee named ‘Smith’ or are located in ‘Columbus’. select DNO from EMP where ENAME = ‘Smith’ union select DNO from DEPT where LOC = ‘Columbus’ Duplicates ARE eliminated by default. union all - leaves duplicates Query Languages
Functions and Groups • List the departments (DNO) and the average salary of each. select DNO, avg(SAL) from EMP E, DEPT D where E.DNO = D.DNO group by DNO • List the departments (DNO, DNAME) in which the average employee salary < $25,000. select DNO, DNAME from EMP E, DEPT D where E.DNO = D.DNO group by DNO, DNAME havingavg(SAL) < 25000 Query Languages
Nested Select: No analog in Relational Algebra • List names of employees in departments 25, 47 and 53. select NAME from EMP where DNO in (25, 47, 53) • List names of employees who work in departments in Ann Arbor. select NAME from EMP where DNO in ( select DNO from DEPT where LOC = ‘Ann Arbor’ ) Query Languages
Null Values • All of the following conditions are always false. • null > 25 null < 25 null = 25 null <> 25 • null >= 25 null <= 25 null = null null <> null • However we can use the following: • select NAME • from EMP • where SAL < 35000 • or SAL is null Query Languages
Q. Find the numbers of those departments that have employees who can do some job that is done by an employee in department D3. Answer: D1, D2, D3 exists, select DNO from DEPT where exists ( select * from EMP ED3 where ED3.DNO = ‘D3’ and exists ( select * from EMP EY where EY.JOB = ED3.JOB and EY.DNO = DEPT.DNO)) The order of the two “selects” does not matter. Query Languages
Ok, lets see why this works Since there is an external reference to DEPT within the second Nested select we will execute the nested select for each DEPT Tuple. Assume the DEPT table contains only three tuples corresponding to D1, D2 and D3 in that order. The first tuple we will evaluate is the DEPT.DNO = D1 The first nested select will simply highlight the tuples in department D3. The second nested select will point to tuples related to the tuple currently pointed to within the department table.
EY.EMPNO, EY.DNO, EY.JOB,..) 100 D3 electrician 200 D3 plumber 300 D3 electrician 400 D1 electrician 500 D1 plumber 600 D1 carpenter 700 D2 electrician 800 D2 carpenter 900 D2 electrician ED3.EMPNO, ED3.DNO, ED3.JOB,..) 100 D3 electrician 200 D3 plumber 300 D3 electrician 400 D1 electrician 500 D1 plumber 600 D1 carpenter 700 D2 electrician 800 D2 carpenter 900 D2 electrician select DNO from DEPT where exists ( select * from EMP ED3 where ED3.DNO = ‘D3’ and exists ( select * from EMP EY where EY.JOB = ED3.JOB and EY.DNO = DEPT.DNO))
Q. Find the numbers of those departments that have employees who can do allthe jobs that are done by an employee in department D3. Answer: D1, but not D2 EMP( EMPNO, … , DNO, JOB, . . .) 100 D3 electrician 200 D3 plumber 300 D3 electrician 400 D1 electrician 500 D1 plumber 600 D1 carpenter 700 D2 electrician 800 D2 carpenter 900 D2 electrician for all, Query Languages
Q. Find the numbers of those departments that have employees who can do allthe jobs that are done by an employee in department D3. Answer: D1,D3, but not D2 select DNO from DEPT where for all ( select * from EMP ED3 where ED3.DNO = ‘D3’ and exists ( select * from EMP EY where EY.JOB = ED3.JOB and EY.DNO = DEPT.DNO)) However no for all exists in SQL. Query Languages
Q. Find the numbers of those departments that have employees who can do allthe jobs that are done by an employee in department D3. Answer: D1,D3, but not D2 However no for all exists in SQL. Use two not exists. (see page 207 for a good list of mathematical logic operations/tricks) select DNO from DEPT where not exists ( select * from EMP ED3 where ED3.DNO = ‘D3’ and notexists ( select * from EMP EY where EY.JOB = ED3.JOB and EY.DNO = DEPT.DNO)) Query Languages
Ok, lets see why this works Since there is an external reference to DEPT within the second Nested select we will execute the nested select for each DEPT Tuple. Assume the DEPT table contains only three tuples corresponding to D1, D2 and D3 in that order. The first tuple we will evaluate is the DEPT.DNO = D1 The first nested select will simply highlight the tuples in department D3. The second nested select will point to tuples related to the tuple currently pointed to within the department table.
ED3.EMPNO, ED3.DNO, ED3.JOB,..) 100 D3 electrician 200 D3 plumber 300 D3 electrician 400 D1 electrician 500 D1 plumber 600 D1 carpenter 700 D2 electrician 800 D2 carpenter 900 D2 electrician EY.EMPNO, EY.DNO, EY.JOB,..) 100 D3 electrician 200 D3 plumber 300 D3 electrician 400 D1 electrician 500 D1 plumber 600 D1 carpenter 700 D2 electrician 800 D2 carpenter 900 D2 electrician select DNO from DEPT where not exists ( select * from EMP ED3 where ED3.DNO = ‘D3’ and notexists ( select * from EMP EY where EY.JOB = ED3.JOB and EY.DNO = DEPT.DNO))
Q12. Find the numbers of those departments that have employees who can do allthe jobs that are done by an employee in department D3. Answer: D1, but not D2 select DNO from DEPT where not exists ( select * from EMP ED3 where ED3.DNO = ‘D3’ and notexists ( select * from EMP EY where EY.JOB = ED3.JOB and EY.DNO = DEPT.DNO) ) and DNO <> ‘D3’ Eliminate department D3. Query Languages
980 900950 700800600 850 100 200 300400 500 SQL:1999 (SQL 3)Recursive Closure Q 16. List all the superiors of EMPNO 500. 600,950, 980 Q 17. List all those supervised by EMPNO 900. 700, 800, 100, 200, 300, 400 How to express these queries? Query Languages
Q 16. Given EMP (EMPNO, MGR, ...),list all the superiors of EMPNO 500.Generate SUPERIORS (EMPNO, MGR) with recursive SUPERIORS(EMPNO, MGR) as (select EMPNO, MGR from EMP where EMPNO = 500 union all select SUPERIORS.EMPNO, EMP.MGR from SUPERIORS, EMP where EMP.EMPNO = SUPERIOR.MGR) select MGR from SUPERIORS Initial table The recursion Just the superiors600 of500. 950 980 Query Languages
Q 16. List all the superiors of EMPNO 500. SUPERIORS (EMPNO, MGR) 500 600 Initial table EMP( EMPNO, MGR,...) 100 700 200 700 300 800 400 800 500 600 600 950 700 900 800 900 850 950 900 980 950 980 980 980 500 950 500 980 First addition Second addition Query Languages
Entity-Relationship (ER) Model(Peter P.-S. Chen) Review CIS 671 S. Parthasarathy Query Languages
Helpful for conceptualizing the Real World • Entity: a thing that exists • e.g. person, automobile, department, employee • Entity Set: a group of similar entities • e.g. all persons, all automobiles, all employees • Relationship: association between entities • e.g. a person is assigned to a department • Relationship Set : set of similar relationships • Attribute: property of an entity or relationship • e.g. person - name, address • Domain: set of values allowed for an attribute Query Languages
Example Employees E#, ENAME, ADDRESS Departments D#, DNAME Projects P#, PNAME Constraints (cardinality) 1. Employees may be assigned to only 1department at a time. 2. Employees may be assigned to severalprojects at once, each with an associated %time. Constraints (participation) 3. Employees must be assigned to a department. 4. Employees need not be assigned to any projects. Query Languages
D# DNAME Is In E# ENAME ADDRESS P# %TIME PNAME Complete Picture Participation Department 1 N Cardinality Employee N Assigned To M Project Query Languages
Is In %TIME Example: Relationships and Attributes 1. Cardinality N 1 Employee Department 1. Employees may be assigned to only 1department at a time. 2. N M Project Employee Assigned To %TIME 2. Employees may be assigned to severalprojects at once, each with an associated %time. Query Languages
Is In %TIME %TIME Example: Relationships and Attributes Participation N 1 Employee Department 3. Total • 3. Employees must be assigned to a department. • Total: Each entity must be included at least once in the relationship. N M Project Employee Assigned To 4.Partial • 4. Employees need not be assigned to any projects. • Partial: Each entity instance need not be included at least once in the relationship. Query Languages
LName LName FName FName E# Entity-Relationship Enhancements: Attributes 1. Simple (atomic) vs. Composite Attributes • Simple • Composite Employee Employee Employee Name Query Languages
Has Major Entity-Relationship Enhancements: Attributes 2. Single-valued vs. Multi-valuedAttributes • Multi-valued • Multi-valued as Entity Major Student Major N M Student Major_Program Query Languages
Member Of AvgSal Salary Entity-Relationship Enhancements: Attributes 3. Derived Attributes - Include in Department the average salary of the employees in the department. Employee Department Query Languages
Dependent Dependents Of Entity-Relationship Enhancements: Entities • Weak Entity Type, Identifying Relationship Type, Partial Key • E.g. Represent all the dependents of each employee given his or her name. No ID number. Problem: Names are not unique across employees. 1 N Employee Name SSN Name Query Languages