690 likes | 1.18k Views
CS157A. Lecture 11. Functional Dependency Graphs and SQL. Prof. Sin-Min Lee Department of Computer Science San Jose State University. Data Normalization.
E N D
CS157A Lecture 11 Functional Dependency Graphs and SQL Prof. Sin-Min Lee Department of Computer Science San Jose State University
Data Normalization • Primarily a tool to validate and improve a logical design so that it satisfies certain constraints that avoid unnecessary duplication of data. • The process of decomposing relations with anomalies to produce smaller, well-structured relations. • Primary Objective: Reduce Redundancy,Reduce nulls, • Improve “modify” activities: • insert, • update, • delete, • but not read • Price: degraded query, display, reporting
Functional Dependency and Keys • Functional Dependency: The value of one attribute (the determinant) determines the value of another attribute. • Candidate Key: Each non-key field is functionally dependent on every candidate key.
Functional dependency • a constraint between two attributes (columns) or two sets of columns • A B if “for every valid instance of A, that value of A uniquely determines the value of B” • or …A B if “there exists at most one value of B for every value of A”
Functional Dependencies R X Y Z • FDs defined over two sets of attributes: X, Y Ì R • Notation: X à Y reads as “X determines Y” • If X à Y, then all tuples that agree on X must also agree on Y 1 2 3 2 4 5 1 2 4 1 2 7 2 4 8 3 7 9
Functional Dependencies Graph(example) X Y Z X Y Z 1 2 3 2 4 5 1 2 4 1 2 7 2 4 8 3 7 9
R( A B C D) A B C D * 1 1 2 1 2 1 3 2 3 1 2 3 2 1 2 3 1 2 1 2 3 1 1 2 A B C D * Two Candidate Keys {AC}, {AD} Super Keys {AC}, {AD} {ABC}, {ACD} {ABD}, {ABCD}
… functional dependency • some examples • SSN Name, Address, Birthdate • VIN Make, Model, Color • note: the LHS is the determinant • so functional dependency is the technical term for determines
Candidate Keys • an attribute (or set of attributes) that uniquely identifies a row • primary key is a special candidate key • values cannot be null • e.g. • ENROLL (Student_ID, Name, Address, …) • PK = Student_ID • candidate key = Name, Address
… candidate key • a candidate key must satisfy: • unique identification. • implies that each nonkey attribute is functionally dependent on the key (for not(A B) to be true, A must occur more than once (with a different B), or A must map to more than one B in a given row) • nonredundancy • no attribute in the key can be deleted and still be unique • minimal set of columns (Simsion)
keys and dependencies EMPLOYEE1 (Emp_ID, Name, Dept_Name, Salary) determinant functional dependency
EMPLOYEE2 (Emp_ID, Course_Title, Name, Dept_Name, Salary, Date_Completed) not fully functionally dependant on the primary key
determinants & candidate keys • candidate key is always a determinant (one way to find a determinant) • determinant may or may not be a candidate key • candidate key is a determinant that uniquely identifies the remaining (nonkey) attributes • determinant may be • a candidate key • part of a composite candidate key • nonkey attribute
Introduction • Data integrity maintained by various constraints on data • Functional dependencies are application constraints that help DB model real-world entity • Join dependencies are a further constraint that help resolve some FD constraint limitations
What is SQL? • It is a language used to communicate with a database. • SQL statements are used to perform tasks such as update or retrieve data from a database. • Standard SQL commands: “Select”, “Create”, “Insert”, “Update”, “Delete”, and “Drop” can be used to accomplish almost everything that a database needs to do.
History • Originally developed by IBM at San Jose Research Laboratory (a.k.a. Almaden Research Center) • Oracle, Sybase, Microsoft SQL server are some common relational database management systems that use SQL
Table Basics • Data or information for the database are stored in tables. • Tables are uniquely identified by their names and are comprised of columns and rows. • Rows contain data for the columns.
Basic Structure • The basic structure of an SQL expression consists of three clauses: select, from, and where. • The select clause corresponds to the projection operation of the relational algebra. It is used to list attributes desired in the result. • The from clause corresponds to the Cartesian-product operation of the relational algebra. It lists the relations to be scanned in the evaluation of the expression. • The where clause corresponds to the selection predicate of the relational algebra. It consists of a predicate involving attributes of the relations that appear in the from clause.
Selecting Data • The Select statement is used to query the database and retrieve selected data that match the criteria that you specify • Example: select “column1” [,”column2”, etc] from “tablename” [where “condition”]; [ ] = optional continued…
Example: select “column1” [,”column2”, etc] from “tablename” [where “condition”]; [ ] = optional The column names that follow the select keyword determine which columns will be returned in the results. The table name that follows the keyword from specifies the table that will be queried to retrieve the desired results. cont… Selecting Data cont…
Example: select “column1” [,”column2”, etc] from “tablename” [where “condition”]; [ ] optional The where clause is optional and specifies which data values or rows will be returned or displayed, based on the criteria described on the condition. Conditional selections used in the where clause: =, >, <, >=, <=, and <> (not equal to). Selecting Data cont…
Problem: Find all loan numbers for loans made at the Perryridge branch with loan amounts greater than $1200. Answer: selectloan-number fromloan wherebranch-name = ‘Perryridge’ and amount > 1200 **SQL uses the logical connectives and, or, and not – rather than the mathematical symbols in the where clause. The where Clause
The from clause, by itself, defines a Cartesian product of the relations in the clause. Problem: For all customers who have a loan from the bank, find their names, loan numbers, and loan amount. Answer: select customer-name, borrower.loan- number, amount from borrower, loan whereborrower.loan- number = loan.loan- number The from Clause
The create table statement is used to create a new table. The data types specify what the type of data can be for that particular column. Example of simple create table statement: create table “tablename” (“column1” “data type”, “column2” “data type”, “column3” “data type”); Creating Tables
Most Common Data Types • char (size) – Fixed length character string. Size is specified in parenthesis. Max 255 bytes. • varchar (size) – Variable-length character string. Max size is specified in parenthesis. • number (size) – Number value with a max number of column digits specified in parenthesis. • date – Date value • number (size, d) – Number value with a max number of digits of “size” total, with a max number of “d” digits to the right of the decimal.
Example: Problem: Create a table for a new company that contains the following information about your new employees: first name, last name, title, age, and salary. Answer: Create table myemployees (firstname varchar(30), lastname varchar(30), title varchar (30), age number(2), salary number(8, 2)); Creating Tables cont…
The insert statement is used to insert or add a row of data into the table. Strings should be enclosed in single quotes, and numbers should not. Example 1: Insert into “tablename” (first_column, … last_column) values (first_value,…last_value); Example 2: Insert into employee (first, last, age, address, city, state) values (‘Luke’, ‘Duke’, 45, ‘2130 Boars Nest’, ‘Hazard Co.’, ‘Georgia’); Inserting into a Table
The update statement is used to update or change records that match a specified criteria. This is accomplished by carefully constructing a where clause. Example 1: update phone_book set area_code = 623 where prefix = 979; Example 2: update “tablename” set “columnname” = “newvalue” [,”nextcolumn” = “newvalue2”…] where “columnname” OPERATOR “value” [and | or “column” OPERATOR “value”]; **[ ] = optional Updating Records
The delete statement is used to delete records or rows from the table. To delete an entire row/record, enter “delete from” followed by the table name, the where clause. **If you leave off the where clause, it will delete all records. Example: delete from “tablename” where “columnname” OPERATOR “value” [and | or “column” OPERATOR “value”]; [ ] = optional Deleting Records
The drop table command is used to delete a table and all rows in the table. drop table is different from deleting all of the records in the table. Dropping the table removes the table definition as well as all of its rows. Example: Drop table “tablename” Drop a Table
The order by clause causes the tuples in the result of a query to appear in sorted order. By default, the order by clause lists items in ascending order. Example: To list in alphabetic order all customers who have a loan at the Perryridge branch: select distinctcustomer-name fromborrower, loan whereborrower.loan-number = loan.loan-number and branch-name = ‘Perryridge’ order bycustomer-name Ordering the Display of Tuples
Aggregates • Functions that operate on sets: • COUNT, SUM, AVG, MAX, MIN • Produce numbers (not tables) • Not part of relational algebra SELECT COUNT(*) FROM Professor P SELECT MAX (Salary) FROMEmployee E
Aggregates Count the number of courses taught in S2000 SELECT COUNT (T.CrsCode) FROM Teaching T WHERE T.Semester = ‘S2000’ But if multiple sections of same course are taught, use: SELECT COUNT (DISTINCT T.CrsCode) FROM Teaching T WHERE T.Semester = ‘S2000’
Aggregates: Proper and Improper Usage SELECT COUNT (T.CrsCode), T. ProfId – makes no sense (in the absence of GROUP BYclause) SELECT COUNT (*), AVG (T.Grade) – but this is OK WHERE T.Grade > COUNT (SELECT ….) – aggregate cannot be applied to result of SELECT statement
Grouping • But how do we compute the number of courses taught in S2000 per professor? • Strategy 1: Fire off a separate query for each professor: SELECT COUNT(T.CrsCode) FROM Teaching T WHERE T.Semester = ‘S2000’ AND T.ProfId = 123456789 • Cumbersome • What if the number of professors changes? Add another query? • Strategy 2: define a special grouping operator: SELECT T.ProfId, COUNT(T.CrsCode) FROM Teaching T WHERE T.Semester = ‘S2000’ GROUP BYT.ProfId
GROUP BY - Example Transcript Attributes: -student’s Id -avg grade -number of courses 1234 1234 1234 1234 1234 3.3 4 SELECT T.StudId, AVG(T.Grade), COUNT (*) FROM TranscriptT GROUP BY T.StudId
HAVING Clause • Eliminates unwanted groups (analogous to WHERE clause) • HAVING condition constructed from attributes of GROUP BY list and aggregates of attributes not in list SELECT T.StudId, AVG(T.Grade) AS CumGpa, COUNT (*) AS NumCrs FROM Transcript T WHERE T.CrsCode LIKE ‘CS%’ GROUP BY T.StudId HAVING AVG (T.Grade) > 3.5
Example • Output the name and address of all seniors on the Dean’s List SELECT S.Id, S.Name FROMStudent S, Transcript T WHERE S.Id = T.StudIdAND S.Status = ‘senior’ GROUP BY HAVING AVG (T.Grade) > 3.5 AND SUM (T.Credit) > 90 S.Id -- wrong S.Id, S.Name -- right Every attribute that occurs in SELECT clause must also occur in GROUP BY or it must be an aggregate. S.Name does not.
ORDER BY Clause • Causes rows to be output in a specified order SELECT T.StudId, COUNT (*) AS NumCrs, AVG(T.Grade) AS CumGpa FROM Transcript T WHERE T.CrsCode LIKE ‘CS%’ GROUP BY T.StudId HAVING AVG (T.Grade) > 3.5 ORDER BYDESCCumGpa, ASCStudId
Query Evaluation Strategy • Evaluate FROM: produces Cartesian product, A, of tables in FROM list • Evaluate WHERE: produces table, B, consisting of rows of A that satisfy WHERE condition • Evaluate GROUP BY: partitions B into groups that agree on attribute values in GROUP BY list • Evaluate HAVING: eliminates groups in B that do not satisfy HAVING condition • Evaluate SELECT: produces table C containing a row for each group. Attributes in SELECT list limited to those in GROUP BY list and aggregates over group • Evaluate ORDER BY: orders rows of C
Nested Queries List all courses that were not taught in S2000 SELECT C.CrsName FROM Course C WHERE C.CrsCode NOT IN (SELECT T.CrsCode --subquery FROM Teaching T WHERE T.Sem = ‘S2000’) Evaluation strategy: subquery evaluated once to produces set of courses taught in S2000. Each row (as C) tested against this set.