460 likes | 564 Views
C20.0046: Database Management Systems Lecture #12. M.P. Johnson Stern School of Business, NYU Spring, 2008. Agenda. Last time: nulls, grouping & aggregation Today: creating tables, modifications, etc. Null/logic review. TRUE AND UNKNOWN = ? TRUE OR UNKNOWN = ? UNKNOWN OR UNKNOWN = ?
E N D
C20.0046: Database Management SystemsLecture #12 M.P. Johnson Stern School of Business, NYU Spring, 2008 M.P. Johnson, DBMS, Stern/NYU, Spring 2008
Agenda • Last time: nulls, grouping & aggregation • Today: creating tables, modifications, etc. M.P. Johnson, DBMS, Stern/NYU, Spring 2008
Null/logic review • TRUE AND UNKNOWN = ? • TRUE OR UNKNOWN = ? • UNKNOWN OR UNKNOWN = ? • X = NULL = ? M.P. Johnson, DBMS, Stern/NYU, Spring 2008
HAVING clauses • Sometimes want to limit which rows may be grouped • Q: How many mins. of film did each rich producer make? • Old = made movies before 1930 • Q: Is HAVING necessary here? SELECT name, sum(length) total FROM Movie, MovieExec WHERE producerSsn = ssn GROUP BY name HAVING min(year) < 1930 M.P. Johnson, DBMS, Stern/NYU, Spring 2008
Review • Examples from sqlzoo.net SELECTL FROM R1, …, Rn WHERE C PL(sC(R1 x … Rn) M.P. Johnson, DBMS, Stern/NYU, Spring 2008
New topic: Modifications • Three kinds of modifications • Insertions • Deletions • Updates • Sometimes “update” used as a synonym for “modification” M.P. Johnson, DBMS, Stern/NYU, Spring 2008
Insertions General form: INSERT INTO R(A1,…., An) VALUES(v1,….,vn) Example: Insert a new purchase to the database: INSERT INTO Knights(name, britnatl, title) VALUES('Bill Gates', 'n', 'KBE') Missing attribute NULL (or other default value) M.P. Johnson, DBMS, Stern/NYU, Spring 2008
Insertions • If we’re sure we have all values in the right order, can just say: • Only do this if you’re sure of order in which the table fields were defined INSERT INTO R VALUES(v1,….,vn) INSERT INTO Knights VALUES('R. Giuliani', 'n', 'KBE'); INSERT INTO Knights VALUES('Bernard Kerik', 'n', 'CBE'); M.P. Johnson, DBMS, Stern/NYU, Spring 2008
Insertion the result of a query • Product(name, etc.) • Purchase(buyerssn, prodName, etc.) • Maybe some purchases name missing products • add those to the Product table • Subquery replaces VALUES INSERT INTO R(As) (query) M.P. Johnson, DBMS, Stern/NYU, Spring 2008
Insertion example • Product(name, listPrice, category) • Purchase(prodName, buyerName, price) • Premise: data corruption lose some Product data • every product referred to in Purchase should exist in Product, but some are missing Product Purchase M.P. Johnson, DBMS, Stern/NYU, Spring 2008
Insertion example INSERT INTO Product(name) SELECT prodName FROM Purchase WHERE prodName NOT IN (SELECT name FROM Product) Purchase Product Product’ Q: Or do we get: A: Depends on implementation! M.P. Johnson, DBMS, Stern/NYU, Spring 2008
Deletions DELETE FROM Table WHERE condition • General form: • E.g.: • As usual, WHERE can contain subqueries • Depending on the DBMS • Q: How do you delete just one row with SQL simpliciter? • Oracle has the ROWID/ROWNUM pseudo-field… INSERT INTO Knights VALUES('R. Giuliani', 'n', 'KBE'); INSERT INTO Knights VALUES('Bernard Kerik', 'n', 'CBE'); DELETE FROM Knights WHERE name = 'Bernard Kerik'; M.P. Johnson, DBMS, Stern/NYU, Spring 2008
Updates UPDATE Product SET field1 = value1, field2 = value2 WHERE condition • General form: • Example: • As usual, WHERE can contain subqueries UPDATE Product SET price = price/2 WHERE Product.name IN (SELECT product FROM Purchase WHERE Date = DATE'Oct, 25, 1999') M.P. Johnson, DBMS, Stern/NYU, Spring 2008
Next: defining schemata • So far, have done queries and data manipulation • Now doing data definition • Recall data types: • INT or INTEGER (variant: SHORTINT) • FLOAT or REAL: floating-point numbers • DOUBLE PRECISION: • DECIMAL(n,d): • E.g. decimal(5,2): five decimal digits, with the decimal point two positions from the right: e.g. 123.45 • DATE and TIME • Character strings • Fixed length: CHAR(n) • Variable length: VARCHAR(n) M.P. Johnson, DBMS, Stern/NYU, Spring 2008
Creating tables CREATE TABLE Table-name ( field1 field-type, field2 field-type, … fieldn field-type ) • Form: • E.g.: No comma! CREATE TABLE People ( name VARCHAR(30), ssn CHAR(9), age INT, city VARCHAR(30), gender CHAR(1), dob DATE ) M.P. Johnson, DBMS, Stern/NYU, Spring 2008
Default Values • Specify defaults when creating table: • The default default: NULL CREATE TABLE People ( name VARCHAR(30), ssn CHAR(9), age SHORTINT DEFAULT 100, city VARCHAR(30) DEFAULT 'New York', gender BIT(1), dob DATE DEFAULT DATE '1900-01-01' ) M.P. Johnson, DBMS, Stern/NYU, Spring 2008
Deleting and modifying schemata • Delete data, indices, schema: • Delete data and indices: • Either way, exercise extreme caution! • Add or delete attributes: DROP TABLE Person TRUNCATE TABLE Person ALTER TABLE Person ADD phone CHAR(12) Q: What’s put in the new fields? ALTER TABLE Person DROP age M.P. Johnson, DBMS, Stern/NYU, Spring 2008
Constraints in SQL simplest Constraints in SQL: • Keys • foreign keys • Attribute-level constraints • Tuple-level constraints • Global constraints: assertions The more complex the constraint, the harder it is to check and to enforce Mostcomplex M.P. Johnson, DBMS, Stern/NYU, Spring 2008
Constraints in SQL • A constraint = a property that we’d like our database to hold • The system will enforce the constraint by taking some actions: • forbid an update • or perform compensating updates M.P. Johnson, DBMS, Stern/NYU, Spring 2008
Primary Keys & Unique Keys • There is at most one PRIMARY KEY; there can be many UNIQUE • Primary key v. candidate keys • Key fields get indices automatically (why?) CREATE TABLE Product ( productID CHAR(10), name CHAR(30), category VARCHAR(20), price INT, PRIMARY KEY (productID), UNIQUE (name, category) ) M.P. Johnson, DBMS, Stern/NYU, Spring 2008
Keys Or: CREATE TABLE Product ( name CHAR(30) PRIMARY KEY, category VARCHAR(20) ) CREATE TABLE Product ( name CHAR(30), category VARCHAR(20) PRIMARY KEY (name) ) M.P. Johnson, DBMS, Stern/NYU, Spring 2008
Keys with Multiple Attributes CREATE TABLE Product ( name CHAR(30), category VARCHAR(20), price INT, PRIMARY KEY (name, category) ) M.P. Johnson, DBMS, Stern/NYU, Spring 2008
Foreign Key Constraints • prodName is a foreign key to Product(name) • name should be a key in Product • Purchase ~ Product is many-one • NB: referenced field specified with parens, not dot • Referenced and referring fields should/must be indexed (why?) Referentialintegrityin SQL CREATE TABLE Purchase ( prodName CHAR(30) REFERENCES Product(name), date DATETIME ) M.P. Johnson, DBMS, Stern/NYU, Spring 2008
Product Purchase M.P. Johnson, DBMS, Stern/NYU, Spring 2008
Foreign Key Constraints • Or: • (name, category) must be a key (primary/unique) in Product (why?) CREATE TABLE Purchase ( prodName CHAR(30), category VARCHAR(20), date DATETIME, FOREIGN KEY (prodName, category) REFERENCES Product(name, category) M.P. Johnson, DBMS, Stern/NYU, Spring 2008
What happens during updates? Types of updates: • In Purchase: insert/update • In Product: delete/update Product Purchase M.P. Johnson, DBMS, Stern/NYU, Spring 2008
What happens during updates? • SQL has three policies for maintaining referential integrity: • Reject: violating modifications (default) • Cascade: after a delete/update do a delete/update • Set-null: set foreign-key field to NULL M.P. Johnson, DBMS, Stern/NYU, Spring 2008
Constraints on attributes and tuples • Constraints on attributes: • NOT NULL -- obvious meaning... • Can combine NOT NULL with foreign key constraints • CHECK condition -- any condition on row itself • Some DBMS support subqueries here, but many don’t • Constraints on tuples • CHECK condition M.P. Johnson, DBMS, Stern/NYU, Spring 2008
Tuple-level constraint How is this different from aForeign Key? CREATE TABLE Purchase ( prodName CHAR(30) CHECK (prodName IN SELECT Product.name FROM Product), date DATETIME NOT NULL ) M.P. Johnson, DBMS, Stern/NYU, Spring 2008
General Assertions • Defined in latest SQL standard, but isn’t actually implemented • Approximated in MySQL and Oracle as stored procedures • PL/SQL in Oracle CREATE ASSERTION myAssert CHECK (NOT EXISTS( SELECT Product.name FROM Product, Purchase WHERE Product.name = Purchase.prodName GROUP BY Product.name HAVING count(*) > 200) ) M.P. Johnson, DBMS, Stern/NYU, Spring 2008
New: Indices • Important for speeding up query processing • Index on field(s) = data structure that makes searches/comparisons on those fields fast • Suppose we have a relation • Person (name, age, city) • Sequential scan of the whole Person file may take a very long time SELECT * FROM Person WHERE name = 'Waksal, Sam' M.P. Johnson, DBMS, Stern/NYU, Spring 2008
Creating Indices • Syntax: • Here: • Now searching by name is much faster • How much faster? • Log-time, say • Base-what? doesn’t matter, but say 2 • If all New Yorkers, #comparisons: • 8000000 log2(8000000) ~= 23 • (i.e., 223 ~= 8000000) CREATE INDEX index-name ON R(fields) CREATE INDEX nameIndex ON Person(name) M.P. Johnson, DBMS, Stern/NYU, Spring 2008
How do indices work? • What the data structure? • Different possibilities • 1st intuition: index on field f is an ordered list of all values in the table’s f field • each item has address (“rowid”) of its row • Where do we get the ordered list? • 2nd intuition: put all f values in a BST • searching BST takes log time (why?) • DBMSs actually use a variant: B+Tree • Later… M.P. Johnson, DBMS, Stern/NYU, Spring 2008
Creating Indices • Indexes can be useful in range queries too: CREATE INDEX ageIndex ON Person (age) SELECT * FROM Person WHERE age > 25 M.P. Johnson, DBMS, Stern/NYU, Spring 2008
Using indices • Indices can be created on multiple attributes: • Helps in: • And in: • But not in: CREATE INDEX doubleNdx ON Person (lname, fname) SELECT * FROM Person WHERE fname='Sam' AND lname = 'Waksal' SELECT * FROM Person WHERE lname='Waksal' Idea: our sorted list is sorted on age;city, not city;age SELECT * FROM Person WHERE fname='Sam' Q: In Movie, should index be on year;title or title;year? M.P. Johnson, DBMS, Stern/NYU, Spring 2008
The Index Selection Problem • Indices speed up queries dramatically, so… • Why not just index all (sequences of) fields? • how does the list/B+Tree stay up to date? • Given a workload: a set of SQL queries and their freqs • Q: What indices will speed up the workload? • Answers: • Attributes in WHERE clauses (queries) favor an index • Attributes in INSERT/UPDATE/DELETEs discourage an index • In many DBMSs: your primary/foregin key fields get indexed automatically (why?) M.P. Johnson, DBMS, Stern/NYU, Spring 2008
Next: Views • Stored relations physically exist and persist • Views are relations that don’t • in some texts, “table” = stored relation = “base table” • Basically names/references given to queries • maybe a relevant subset of a table • Employee(ssn, name, department, project, salary) • Payroll has access to Employee, others only to Developers CREATE VIEW Developers AS SELECT name, project FROM Employee WHERE department = 'Dev' M.P. Johnson, DBMS, Stern/NYU, Spring 2008
A Different View • Person(name, city) • Purchase(buyer, seller, product, store) • Product(name, maker, category) • We have a new virtual table: NYCview(buyer, seller, product, store) CREATE VIEW NYCview AS SELECT buyer, seller, product, store FROM Person, Purchase WHERE Person.city = 'New York' AND Person.name = Purchase.buyer M.P. Johnson, DBMS, Stern/NYU, Spring 2008
A Different View • Now we can query the view: CREATE VIEW NYCview AS SELECT buyer, seller, product, store FROM Person, Purchase WHERE Person.city = 'New York' AND Person.name = Purchase.buyer SELECT name, NYCview.store FROM NYCview, Product WHERE NYCview.product = Product.name AND Product.category = 'Camera' M.P. Johnson, DBMS, Stern/NYU, Spring 2008
What happens when we query a view? SELECT name, NYCview.store FROM NYCview, Product WHERE NYCview.product = Product.name AND Product.category = 'Camera' SELECT name, Purchase.store FROM Person, Purchase, Product WHERE Person.city = 'New York' AND Person.name = Purchase.buyer AND Purchase.poduct = Product.name AND Product.category = 'Camera' M.P. Johnson, DBMS, Stern/NYU, Spring 2008
Can rename view fields CREATE VIEW NYCview(NYCbuyer, NYCseller, prod, store) AS SELECT buyer, seller, product, store FROM Person, Purchase WHERE Person.city = 'New York' AND Person.name = Purchase.buyer M.P. Johnson, DBMS, Stern/NYU, Spring 2008
Types of Views • Views discussed here: • Used in databases • Computed only on-demand – slow at runtime • Always up to date • Sometimes talk about “materialized” views • Used in data warehouses • Pre-computed offline – fast at runtime • May have stale data • Maybe more later… M.P. Johnson, DBMS, Stern/NYU, Spring 2008
Updating Views How to insert a tuple into a table that doesn’t exist? Employee(ssn, name, department, project, salary) CREATE VIEW Developers AS SELECT name, project FROM Employee WHERE department = 'Development' If we make the following insertion: INSERT INTO Developers VALUES('Bill', ‘Word') It becomes: INSERT INTO Employee(ssn, name, dept, project, sal) VALUES(NULL, 'Bill', NULL, 'Word', NULL) M.P. Johnson, DBMS, Stern/NYU, Spring 2008
Non-Updatable Views • Person(name, city) • Purchase(buyer, seller, product, store) • How can we add the following tuple to the view? ('NYC', 'The Wiz') • We don’t know the name of the person who made the purchase • cannot set to NULL (why?) CREATE VIEW CityStore AS SELECT Person.city, Purchase.store FROM Person, Purchase WHERE Person.name = Purchase.buyer M.P. Johnson, DBMS, Stern/NYU, Spring 2008
Finally: R.A./SQL has limitations • Can easily find Alice’s direct subordinates: • But: find all of King’s underlings • Cannot compute “transitive closure” • Cannot express in R.A./SQL! • SQL is not “Turing-Complete” M.P. Johnson, DBMS, Stern/NYU, Spring 2008
Midterm • Know all steps of project so far: • Draw E/R from dataset description • Convert E/R to relations • Normalize badly designed relations • Indices: • Why important • How BSTs work • Constraints • Why important • Basic concepts • SQL! M.P. Johnson, DBMS, Stern/NYU, Spring 2008