330 likes | 485 Views
C20.0046: Database Management Systems Lecture #3. Matthew P. Johnson Stern School of Business, NYU Spring, 2005. Admin. Textbooks? This afternoon. Agenda. Last time: E/R models, some design issues This time: More design “carving at the joints” Redundancy
E N D
C20.0046: Database Management SystemsLecture #3 Matthew P. Johnson Stern School of Business, NYU Spring, 2005 M.P. Johnson, DBMS, Stern/NYU, Spring 2005
Admin • Textbooks? • This afternoon M.P. Johnson, DBMS, Stern/NYU, Spring 2005
Agenda • Last time: E/R models, some design issues • This time: More design “carving at the joints” • Redundancy • Whether an element should be an attribute or entity set • Replacing a relationships with entity sets • Constraints • Identifying & specifying key attributes to an entity set • Recognizing other types of single-valued constraints • Representing referential integrity constraints • Identifying & representing general constraints • Weak entity sets M.P. Johnson, DBMS, Stern/NYU, Spring 2005
Review • Multiplicity review: • Square-of? (e.g., (3,9)) • Cube-of? (e.g., (-3,-27)) • Wife-of? • Wife-of-in-certain-other-cultures? M.P. Johnson, DBMS, Stern/NYU, Spring 2005
Design Principles • Faithfulness • Simplicity • Avoiding redundancy • Choice of relationships • Picking elements M.P. Johnson, DBMS, Stern/NYU, Spring 2005
Owned-by Owns Movies Ownings Studios Simplicity • Einstein: Theories should be as simple as possible, but not simpler. • Use as few elements as possible • Minimum required relations • No unnecessary attributes (will you be using this attribute?) • Eliminate “spinning wheels” • Example: how can we simplify this? M.P. Johnson, DBMS, Stern/NYU, Spring 2005
Name Own Studios Movies Length Address StudioName Name Avoiding redundancy • Say everything exactly once • Minimize database storage requirements • More important: prevent possible update errors • simplest but not only e.g.: modify data one place but not the other – more later • Example: Spot the redundancy M.P. Johnson, DBMS, Stern/NYU, Spring 2005
Avoiding redundancy • Say everything exactly once • Minimize database storage requirements • More important: prevent possible update errors • simplest but not only e.g.: modify data one place but not the other – more later • Example: Spot the redundancy Name Own Studios Movies Length Address Phone StudioName Name Redundancy: Movies “knows” the studio two ways M.P. Johnson, DBMS, Stern/NYU, Spring 2005
Name Length Studio SAddress SPhone Pulp Fiction … Miramax NYC 212-… Sylvia … Miramax NYC 212-… Jay & Sil. Bob … Miramax NYC 212-… … Spot more redundancy Length SPhone Name Movies SAddress StudioName Different redundancy: studio info listed for every movie! M.P. Johnson, DBMS, Stern/NYU, Spring 2005
Enrolls Students Courses Assist TA-of TAs Don’t add relships that are implied Suppose each course again has <=1 TA Q: Is the following good design? A: If TAs other than the course’s TA can help students, then yes; if not, then no: we can connect Students and TAs by going through Courses; redundant! M.P. Johnson, DBMS, Stern/NYU, Spring 2005
Correct E/R models may contain loops name category • Person plays multiple roles: • employee of company • buyer of product name price makes Company Product stockprice buys employs Person name ssn address M.P. Johnson, DBMS, Stern/NYU, Spring 2005
More design Q: What’s wrong with this design? • Repeating TA names & IDs – redundant • TA is not TAing any course now lose TA’s data! • TA should get its own ES Course-ID CName Enrolls Students Courses TA-Email TA-Name TA-ID A: M.P. Johnson, DBMS, Stern/NYU, Spring 2005
Opposite problem: Entity or attribute? • Some E/Rs improved by removing entities • Can convert Entity E into attributes of F if • R:FE is many-one • one-one counts because special case • Attributes for E are independent of each other • knowing one att val doesn’t tell us another att val • Then • remove E • add all attributes of E to F M.P. Johnson, DBMS, Stern/NYU, Spring 2005
Course-ID CName Enrolls Students Courses Room TA-Name Entity attribute Course-ID CName Enrolls Students Courses Room Assists TA TA-Name M.P. Johnson, DBMS, Stern/NYU, Spring 2005
CName CID Room TA-Name DBMS 46 123 Howard DBMS 46 123 Wesley … Convert TA entity again? Course-ID Enrolls • No! Multiple TAs allowed • Violates condition (1) • Redundant course data Students Courses CName Assists Room TA TA-Name M.P. Johnson, DBMS, Stern/NYU, Spring 2005
CName TA-Name TA-ID TA-Color DBMS Ralph 678 Green A.Soft. Ralph 678 Green … Convert TA entity again? Course-ID • No! TA has dependent fields • Violates condition (2) • How can it tell? • Redundant TA data Enrolls Students Courses CName Assists Room TA TA-Name TA-ID TA-Favorite-Color M.P. Johnson, DBMS, Stern/NYU, Spring 2005
Entity or attributes? • Should student address be an entity or an attribute? • If student may have multiple addresses, must be entity • campus address, permanent address • attributes cannot be set-valued • If we need to examine structure of address, must be entity • find all students from NYS but not NYC • If attribute, then it’s probably a simple string • no structure! • NB: this choice is a microcosm of entire miniworld • (much) power of a DB comes from the structure imposed on the data M.P. Johnson, DBMS, Stern/NYU, Spring 2005
Larger example DB design • Application: library database. Authors have written books about various subjects; different libraries in the system may carry these books. • Entities (with attributes in parentheses): • Authors (ssn, name, phone, birthdate) • Books (ISDN, title) • Subjects (sname, sid) • Libraries (lname) • Relationships [associating entities in square brackets]: • Wrote-on [Authors, Subjects] • Cover [Libraries, Subjects] • On [Books, Subjects] M.P. Johnson, DBMS, Stern/NYU, Spring 2005
E/R of DB design Name ssn phone birthdate Author ISBN wrote-on On Book SName Subject Title Carries LName Library M.P. Johnson, DBMS, Stern/NYU, Spring 2005
Poor initial design • First design is a poor model of this system • Some info not captured: • How many copies does a lib. have of a given book? • What edition of a book does the library have? • Design problems: • no direct relship associating authors and books • no direct relship associating libraries and books • Common queries complex and difficult/expensive • What libraries carry books by a given author? • What books has a given author written? • Who is the author of a given book? M.P. Johnson, DBMS, Stern/NYU, Spring 2005
Larger example DB design 2 • Application: library database as before • Entities (with attributes in parentheses): • Authors (ssn, name, phone, birthdate) • Books (ISDN, title) • Subjects (sname, sid) • Libraries (lname) • Relations [associating entities in square brackets] (attributes in parentheses): • Wrote [Authors, Books] • Carries [Libraries, Books] (quantity, edition) • On [Books , Subjects] M.P. Johnson, DBMS, Stern/NYU, Spring 2005
E/R of improved DB design Name ssn phone birthdate Author SName wrote On Subject ISBN Book • Rule of thumb: often queried together make closely connected Edition Title Carries Quantity LName Library M.P. Johnson, DBMS, Stern/NYU, Spring 2005
Next topic: Constraints • Review: programmer-defined rules stating what should always be true about consistent databases • Restrictions on data: • Keys (e.g. SSNs uniquely identify people) • Single value constraints (e.g. everyone has 1 father) • Referential Integrity (e.g. person’s record refers to father father must exist) • Domain constraints (e.g. gender in M/F, age in 0..150) • General constraints (e.g. no more than 10 customers per sales rep) • Can’t infer constraints from data • may hold “accidentally” • they are a part of the schema M.P. Johnson, DBMS, Stern/NYU, Spring 2005
E/R keys • Uniquely identifies entity in ES • Attribute or set of attributes • Two entities cannot agree on all key attributes • These attributes determine all others • Every ES should have a key • possibly including all attributes • Primary key attributes underlined • More than one possible key: • Candidate keys, primary key • Practical tip: create intentional key attribute • E.g. SSN, course-id, employee-id, etc. • SSN likely shorter than (name,address) • Prevents quasi-redundancy Person name ssn address M.P. Johnson, DBMS, Stern/NYU, Spring 2005
Single-valued constraints • “at most one” value • sharp arrows • E.g. attributes: could be null or one • Many-one relationships: the “one” part is single-valued. • Can think of key atts as (non-null) single-valued Assists Course TA M.P. Johnson, DBMS, Stern/NYU, Spring 2005
Referential integrity • “Exactly one value” • NOT NULL attributes • Relationships • Non-null value refers to entity that exists • Refer to entity with foreign key • HTML analogy: no broken links • Programming analogy: no dangling pointers • Ways of handling deletion: • Prevent deletion as long as referrer exist • Enforce deletion of all referrers Taught Course Instructor M.P. Johnson, DBMS, Stern/NYU, Spring 2005
Referential integrity – E/R e.g. Enrolls • Insertion – must refer to existing entity • Suppose need to add • course: “Oracle” • instructor: MPJ • Q: Which order? • Q: What if relship were exactly-exactly, say, M(Hs,Ws)? • i.e., referential integrity in both directions? • A: Put both inserts in one xact – later Students Courses Taught Instructor M.P. Johnson, DBMS, Stern/NYU, Spring 2005
Other kinds of constraints • Domain constraints • E.g. date: must be after 1980 • Enumerated type: grades A through F, no E • No specific E/R notation: mention with attribute or relationship • General constraints: • A class may have no more than 100 students; a student may not have more than 6 courses: Enroll Students <=100 <=6 Courses M.P. Johnson, DBMS, Stern/NYU, Spring 2005
Next topic: Weak entity sets • Definition: • Some or all key attributes belong to another ES • Why: • An entity set is part of a hierarchy (not ISA) • Connecting entity sets • The key consists of • 0, 1 or more of its own attributes • Key attributes of entity sets from supporting relationships M.P. Johnson, DBMS, Stern/NYU, Spring 2005
Conditions of Supporting relationships • Supporting relationship R:EF • R is many-one (E-F) (or one-one) • R is binary • Referential integrity from E to F • a rounded arrow • Those atts supplied to E are thekey attributes of F • F itself may be weak • Another entity set G, and so on recursively R A1 F E A2 M.P. Johnson, DBMS, Stern/NYU, Spring 2005
Requirements for weak entity sets • For several supporting relships from E to F • Keys of each F role appear as foreign key of E • Other many-one relationships • Not necessarily supporting From Purchases A1 People A2 By At-store Stores A3 M.P. Johnson, DBMS, Stern/NYU, Spring 2005
Weak entity sets • Example: Hierarchy – species & genus • Idea: species name unique per genus only Species Belongs-to Genus name name M.P. Johnson, DBMS, Stern/NYU, Spring 2005
Next time • We’ll finish E/R models and begin the relational model • Read chapter 3 through section 3.4 • Info on project, hw likely posted soon M.P. Johnson, DBMS, Stern/NYU, Spring 2005