670 likes | 693 Views
OCL4 Oracle 10g: SQL & PL/SQL Session #2. Matthew P. Johnson CISDD, CUNY June, 2005. Agenda. Last time: FDs This time: Anomalies Normalization Then: SQL. Review examples: finding FDs. Product(name, price, category, color) name, category price category color
E N D
OCL4 Oracle 10g:SQL & PL/SQLSession #2 Matthew P. Johnson CISDD, CUNY June, 2005 Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005
Agenda • Last time: FDs • This time: • Anomalies • Normalization • Then: SQL Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005
Review examples: finding FDs • Product(name, price, category, color) name, category price category color Keys are: {name, category} • Enrollment(student, address, course, room, time) student address room, time course student, course room, time Keys are: [in class] Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005
Next topic: Anomalies • Identify anomalies in existing schema • How to decompose a relation • Boyce-Codd Normal Form (BCNF) • Recovering information from a decomposition • Third Normal Form Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005
Types of anomalies • Redundancy • Repeat info unnecessarily in several tuples • Update anomalies: • Change info in one tuple but not in another • Deletion anomalies: • Delete some values & lose other values too • Insert anomalies: • Inserting row means having to insert other, separate info / null-ing it out Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005
Example of anomalies • Redundancy: name, maddress • Update anomaly: Bill moves • Delete anom.: Bill doesn’t pay bills, lose phones lose Bill! • Insert anom: can’t insert someone without a (non-null) phone • Underlying cause: SSN-phone is many-many • Effect: partial dependency ssn name, maddress, • Whereas key = {ssn,phone} SSN Name, Mailing-address SSN Phone Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005
Decomposition by projection • Soln: replace anomalous R with projections of R onto two subsets of attributes • Projection: an operation in Relational Algebra • Corresponds to SELECT command in SQL • Projecting R onto attributes (A1,…,An) means removing all other attributes • Result of projection is another relation • Yields tuples whose fields are A1,…,An • Resulting duplicates ignored Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005
R1(A1, ..., An, B1, ..., Bm) R2(A1, ..., An, C1, ..., Cp) Projection for decomposition R(A1, ..., An, B1, ..., Bm, C1, ..., Cp) R1 = projection of R on A1, ..., An, B1, ..., Bm R2 = projection of R on A1, ..., An, C1, ..., Cp A1, ..., An B1, ..., Bm C1, ..., Cp= all attributes, usually disjoint sets R1 and R2 may (/not) be reassembled to produce original R Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005
Name SSN Mailing-address SSN Phone Michael 123 NY 123 212-111-1111 Hilary 456 DC 123 917-111-1111 Bill 789 Chappaqua 456 202-222-2222 456 914-222-2222 789 914-222-2222 789 212-333-3333 Decomposition example Break the relation into two: • The anomalies are gone • No more redundant data • Easy to for Bill to move • Okay for Bill to lose all phones Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005
name buys Person Product price name ssn Relational Model: plus FD’s Normalization: Eliminates anomalies Thus: high-level strategy Conceptual Model: Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005
Using FDs to produce good schemas • Start with set of relations • Define FDs (and keys) for them based on real world • Transform your relations to “normal form” (normalize them) • Do this using “decomposition” • Intuitively, good design means • No anomalies • Can reconstruct all (and only the) original information Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005
Decomposition terminology • Projection: eliminating certain attributes from relation • Decomposition: separating a relation into two by projection • Join: (re)assembling two relations • Whenever a row from R1 and a row from R2 have the same value for some atts A, join together to form a row of R3 • If exactly the original rows are reproduced by joining the relations, then the decomposition was lossless • We join on the attributes R1 and R2 have in common (As) • If it can’t, the decomposition was lossy Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005
Lossless Decompositions Lossless Decompositions A decomposition is lossless if we can recover: R(A,B,C) R1(B,C) R2(B,A) R’(A,B,C) should be the same as R(A,B,C) Decompose Recover R’ is in general larger than R. Must ensure R’ = R Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005
Lossless decomposition • Sometimes the data can be reproduced: • (Word, 100) + (Word, WP) (Word, 100, WP) • (Oracle, 1000) + (Oracle, DB) (Oracle, 1000, DB) • (Access, 100) + (Access, DB) (Access, 100, DB) Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005
Lossy decomposition • Sometimes it’s not: • (Word, WP) + (100, WP) (Word, 100, WP) • (Oracle, DB) + (1000, DB) (Oracle, 1000, DB) • (Oracle, DB) + (100, DB) (Oracle, 100, DB) • (Access, DB) + (1000, DB) (Access, 1000, DB) • (Access, DB) + (100, DB) (Access, 100, DB) What’swrong? Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005
Ensuring lossless decomposition • Examples: • name price, so first decomposition was lossless • category name and category price, and so second decomposition was lossy R(A1, ..., An, B1, ..., Bm, C1, ..., Cp) R1(A1, ..., An, B1, ..., Bm) R2(A1, ..., An, C1, ..., Cp) If A1, ..., An B1, ..., Bmor A1, ..., An C1, ..., Cp Then the decomposition is lossless Note: don’t need both Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005
Quick lossless/lossy example • At a glance: can we decompose into R1(Y,X), R2(Y,Z)? • At a glance: can we decompose into R1(X,Y), R2(X,Z)? Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005
Normal Forms • First Normal Form = all attributes are atomic • As opposed to set-valued • Assumed all along • Second Normal Form (2NF) • Third Normal Form (3NF) • Boyce Codd Normal Form (BCNF) • Fourth Normal Form (4NF) • Fifth Normal Form (5NF) Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005
Most important: BCNF A simple condition for removing anomalies from relations: A relation R is in BCNF if: If As Bs is a non-trivial dependency in R , then As is a superkey for R I.e.: The left side must always contain a key I.e: If a set of attributes determines other attributes, it must determine all the attributes • Codd: Ted Codd, IBM researcher, inventor of relational model, 1970 • Boyce: Ray Boyce, IBM researcher, helped develop SQL in the 1970s Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005
B’s A’s Others R1 R2 BCNF decomposition algorithm Repeat choose A1, …, Am B1, …, Bn that violates the BNCF condition //Heuristic: choose Bs as large as possible split R into R1(A1, …, Am, B1, …, Bn) and R2(A1, …, Am, [others]) continue with both R1 and R2Until no more violations Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005
Boyce-Codd Normal Form • Name/phone example is not BCNF: • {ssn,phone} is key • FD: ssn name,mailing-address holds • Violates BCNF: ssn is not a superkey • Its decomposition is BCNF • Only superkeys anything else Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005
BCNF motivation • Two big ideas: • Only a key field can determine other fields • Key values are unique • no FD-caused redundancy • Slogan: “Every FD must contain the key, the whole key and nothing but the key.” • More accurate: “Every FD must contain (on the left) a key, a whole key, and maybe other fields. Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005
BCNF Decomposition • Larger example: multiple decompositions • {Title, Year, Studio, President, Pres-Address} • FDs: • Title Year Studio • Studio President • President Pres-Address • Studio President, Pres-Address (why?) • No many-many this time • Problem cause: transitive FDs: • Title,year studio president Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005
BCNF Decomposition • Illegal: As Bs, where As don’t include key • Decompose: Studio President, Pres-Address • As = {studio} • Bs = {president, pres-address} • Cs = {title, year} • Result: • Studios(studio, president, pres-address) • Movies(studio, title, year) • Is (2) in BCNF? Is in (1) BCNF? • Key: Studio • FD: President Pres-Address • Q: Does president studio? If so, president is a key • But if not, it violates BCNF Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005
BCNF Decomposition • Studios(studio, president, pres-address) • Illegal: As Bs, where As don’t include key • Decompose: President Pres-Address • As = {president} • Bs = {pres-address} • Cs = {studio} • {Studio, President, Pres-Address} becomes • {President, Pres-Address} • {Studio, President} Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005
Roadmap • Want to remove redundancy/anomalies • Convert to BCNF • Find FDs – closure alg • Check if each FD AB is ok • If A contains a key • If not, decompose into R1(A,B), R2(A,rest) • Because AB, this will be lossless • Could check by joining R1 and R2 • Would get no rows not in original Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005
Decomposition algorithm example • R(N,O,R,P) F = {N O, O R, R N} • Key: N,P • Violations of BCNF: N O, OR, N OR • which kinds of violations are these? • Pick N OR (on board) • Can we rejoin? (on board) • What happens if we pick N O instead? • Can we rejoin? (on board) Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005
BCNF summary • BCNF decomposition is lossless • Can reproduce original by joining • Saw last time: Every 2-attribute relation is in BCNF • Final set of decomposed relations might be different depending on • Order of bad FDs chosen • Saw last time: But all results will be in BCNF Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005
BCNF summary • BCNF decomp. does not lose data • Resulting relations can be rejoined to obtain the original • In BCNF, there’s no FD-based redundancy • Values in key field are unique • Other FDs are from key fields • everything is “as compressed as possible” Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005
BCNF Review • Q: What’s required for BCNF? • Q: What’s the slogan for BCNF? • Q: Who are B & C? Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005
BCNF Review • Q: How do we fix a non-BCNF relation? • Q: If AsBs violates BCNF, what do we do? • Q: In this case, could the decomposition be lossy? • Q: Under what circumstances could a decomposition be lossy? • Q: How do we combine two relations? Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005
Design Goals • Goal for a relational database design is: • No redundancy • Lossless Join • Dependency Preservation • If we cannot achieve this, we accept one of • dependency loss • use of more expensive inter-relational methods to preserve dependencies • data redundancy due to use of 3NF • Interesting: SQL does not provide a direct way of specifying FDs other than superkeys • can specify FDs using assertions, but they are expensive to test Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005
Recap: You are here • First part of course is done: conceptual foundations • You now know: • E/R Model • Relational Model • Relational Algebra • You now know how to: • Capture part of world as an E/R model • Convert E/R models to relational models • Convert relational models to good (normal) forms • Next: • Create, update, query tables with R.A/SQL • Write SQL/DB-connected applications Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005
High-level agenda • Install Oracle • Start SQL • Lab on Oracle system info/SQL Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005
Next topic: SQL • Standard language for querying and manipulating data Structured Query Language • Many standards: ANSI SQL, SQL92/SQL2, SQL3/SQL99 • Vendors support various subsets/extensions • We’ll do Oracle’s version • Basic form (many more bells and whistles in addition): SELECT attributes FROM relations (possibly multiple, joined) WHERE conditions (selections) Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005
Data Types in SQL • Characters: • CHAR(20) -- fixed length • VARCHAR(40) -- variable length • Numbers: • BIGINT, INT, SMALLINT, TINYINT • REAL, FLOAT -- differ in precision • MONEY • Times and dates: • DATE • DATETIME -- SQL Server Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005
Table name “Tables” Attribute names Product Tuples or rows Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005
Simple SQL Query Product SELECT *FROM ProductWHERE category='Gadgets' “selection” Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005
Simple SQL Query Product SELECT PName, Price, ManufacturerFROM ProductWHERE Price > 100 “selection” and “projection” Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005
A Notation for SQL Queries Input Schema Product(PName, Price, Category, Manfacturer) SELECT Name, Price, ManufacturerFROM ProductWHERE Price > 100 (PName, Price, Manfacturer) Output Schema Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005
SQL • SQL SELECT • Sometimes called a “projection” • What goes in the WHERE clause: • x = y, x < y, x <= y, etc. • For number, they have the usual meanings • For CHAR and VARCHAR: lexicographic ordering • Expected conversion between CHAR and VARCHAR • For dates and times, what you expect Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005
SQL e.g. • Movies(Title,Year,Length,inColor,Studio,Prdcr#) • Q: How long was Star Wars (1977), in SQL? • Q: Which Fox movies are are at least 100 minutes long, in SQL? Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005
SQL e.g. • Reps(ssn, name, etc.) • Clients(ssn, name, rssn) • Q: Who are George’s clients, in SQL? • Conceptually: • PClients.name(sReps.name=“George” and Reps.ssn=rssn(Reps x Clients)) Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005
The LIKE operator • s LIKE p: pattern matching on strings • p may contain two special symbols: • _ = any single character • % = zero or more chars • Product(Name, Price, Category, Manufacturer) • Find all products whose name contains ‘gizmo’: SELECT *FROM ProductsWHERE Name LIKE '%gizmo%' Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005
The LIKE operator • Q: What it want to search for values containing a ‘%’? • PName LIKE '%%%' won’t work • Instead, must use escape chars • In C/C++/J, prepend ‘\’ • In SQL, prepend an arbitrary escape char: • PName LIKE '%x%%' ESCAPE 'x' Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005
More on escape chars • SQL: no official default escape char • In SQL*Plus: default escape char == '\' • Can set with • SQL> set escape x • Other tools, DBMSs: your mileage may vary • SQL string literals put in ' ': • 'mystring' • Single-quote literals escaped with single-quotes: • 'George''s string' Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005
Q: What about double quotes? • A: Can’t be used in place of single quotes • But can be used when Oracle would otherwise misparse your command, e.g.: • Names with spaces: • create table bad table name (a int, b int); • Reserved words as names: • create table badfieldname(from int, b int); Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005
Q: Can an escape char be an escape string? • A: No. SQL> select * from newtable where a like '%\%%' escape '\'; A B ---------- ---------- h%i there SQL> select * from newtable where a like '%\%%' escape '\\'; select * from newtable where a like '%\%%' escape '\\' * ERROR at line 1: ORA-01425: escape character must be character string of length 1 Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005
More on single-quotes • Dates with DATE: • DATE '1948-05-14' • Timestamps with TIMESTAMP: • TIMESTAMP '1948-05-14 12:00:00' Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005
Eliminating Duplicates SELECT category FROM Product Compare to: SELECTDISTINCT category FROM Product Matthew P. Johnson, OCL5, CISDD CUNY, Sept 2005