450 likes | 573 Views
CS411 Database Systems. 03: Relational Model. Announcements. How to view . pptx files: http ://www.microsoft.com/Downloads/details.aspx?familyid=048DC840-14E1-467D-8DCA-19D2A8FD7485&displaylang=en Problem Set 1 is out, due in one week Hand in in SC 2120 ( not in class ); no late submission
E N D
CS411Database Systems 03: Relational Model
Announcements • How to view .pptx files: http://www.microsoft.com/Downloads/details.aspx?familyid=048DC840-14E1-467D-8DCA-19D2A8FD7485&displaylang=en • Problem Set 1 is out, due in one week • Hand in in SC 2120 (not in class); no late submission • Review (re)grading policies! • Homework points are cumulative, not per problem set • This problem set is small • Might possibly have 3 or 5 rather than 4 • Problem Set 2 is ready, will go out in one week • More I2CS students: Bowe Bell + Howell, Microsoft, GE Healthcare • Job openings in a Microsoft DB group • Google science data • Sun to buy MySQLhttp://www.nytimes.com/2008/01/17/technology/17sun.html?_r=2&th=&adxnnl=1&emc=th&adxnnlx=1200586263-y9veTexTjcrFtRWZpOlZdQ&oref=slogin&oref=slogin • Survey: why are you taking this class?
Job announcement Our group is involved in developing a fault tolerant, highly scalable and highly available structured data store. At this point in time our group is expanding rapidly and to this end needs to hire at a fast pace. We are interested in undergrad, MS, and PhD students. A sample job profile can be found at: http://members.microsoft.com/careers/search/details.aspx?JobID=40392323-71E3-4D8B-82F5-C11CB1D3D75E&start=1&interval=10&SortCol=DatePosted Resumes can be forwarded to rsinha@microsoft.com .
Let’s examine some ER schemasEvery one is differentWhat is “good” depends on the applicationThen we’ll introduce the relational model
DB Modeling & Implementation Ideas Database Model (E/R, ODL) Relational Schema Physical storage Complex file organization and index structures. Diagrams (E/R) Tables: column names: attributes rows: tuples
ER Model vs. Relational Model • Relational model • has just a single concept: relation • world is represented with a collection of tables • well-suited for efficient manipulations on computers • ER model is richer • entities, relations, attributes, etc. • well-suited for capturing app. requirements • not so well-suited for computer implementation • (does not even have operations on its structures)
A Relation Relation (table) name Attributes (columns) Products: Name Price Category Manufacturer Gizmo $19.99 gadgets GizmoWorks PowerGizmo$29.99 gadgets GizmoWorks SingleTouch $149.99 photography Canon MultiTouch $203.99 household Hitachi tuples (rows)
Domains • Each attribute has a type, often called its domain • Integer • String • Date • Real • … • Domains are generally not (i.e., only recently sometimes) user-definable, e.g., “map” • The type must be atomic (why? see later)
Schemas vs. instances (very important, make sure you knowthe difference)
A Relation Relation (table) name Attributes (columns) Products: Name Price Category Manufacturer Gizmo $19.99 gadgets GizmoWorks PowerGizmo$29.99 gadgets GizmoWorks SingleTouch $149.99 photography Canon MultiTouch $203.99 household Hitachi tuples (rows)
The relation’s schema Relation (table) name Attributes (columns) Products: Name Price Category Manufacturer
The relation’s instance(sometimes called its data, rows, tuples) Gizmo $19.99 gadgets GizmoWorks PowerGizmo$29.99 gadgets GizmoWorks SingleTouch $149.99 photography Canon MultiTouch $203.99 household Hitachi tuples (rows)
The schema of a relation is the relation name plus attribute names and their domains. Typically written like this: Product(Name, Price, Category, Manufacturer) The schema of a database is a finite set of relation schemas. Can write like this: Product(Name, Price, Category, Manufacturer), Vendor(Name, Address, Phone),… .
Instances (in math-speak) Given a relational schema = R (A1 ,…, Ak), an instance of the relation is a finite set of k-tuples where the first value of each tuple is in the domain of A1 and … and the kth value of each tuple is from the domain of Ak Example instance in mathspeak: {(a11, …, a1k), (a21, …, a2k), … , (an1, … ank)} Database schema = R1(…), …, Rn(…)DBinstance= n relations, R1, ..., Rn
There are three kinds of updates to a database instance. 1) add a tuple 2) delete a tuple 3) modify an attribute value in a tuple Data updates are frequent. Schema updates are relatively rare and painful. Why?
Schemas and Instances • Analogy with programming languages: • Schema = type • Instance = value • Important distinction: • Database Schema = stable over long periods of time • Database Instance = changes constantly, as data is inserted/updated/deleted
How should we talk about relations (that is, represent them)?
More mathspeak: a relation is a finite subset of the cartesian product of the domains Tuple= element of string x int x string x string E.g. t = (gizmo, 19, gadgets, GizmoWorks) Relation = subset of string x int x string x string • Order in the tuple is important ! • (gizmo, 19, gadgets, GizmoWorks) • (gizmo, 19, GizmoWorks, gadgets) • No attributes in this version, just domains
More mathspeak: a relation is a set of functions • Fix the set of attributes A • A = {name , price, category, manufacturer} • A tuple = function t: ADomains • Relation = set of tuples • Order in a tuple is notimportant! • Attribute names areimportant! {name gizmo, price 19, category gadgets, manufacturer gizmoWorks}
We will switch back and forth between these two definitions of relations • Positional tuples, without attribute names • Relational schemas with attribute names
Now the fun part: translating from an ER diagram to the relational model boring
Basic cases • entity set E = relation with attributes of E • relationship R = relation with attributes being keys of related entity sets + attributes of R • Special cases • combining two relations • translating weak entity sets • translating is-a relationships and subclasses
Announcements • Student ACM • Problem set 1 and the first several parts of the course project are out (part 0 is due on Tuesday, just like problem set 1); document your assumptions • Adjusted hand-in requirements to be 3:15 PM deadline
name category name price makes Company Product Stock price buys employs Person name ssn address
Entity Sets to Relations name category price Product Product: Name Category Price gizmo gadgets $19.99
Relationships to Relations price name category Start Year name makes Company Product Stock price Relation Makes (watch out for attribute name conflicts) Product-name Product-category Company-name Start-year gizmo gadgets gizmoWorks 1963
Likes 2 1 Favorite Buddies Likes(drinker, beer) husband wife Favorite(drinker, beer) Married Buddies(name1, name2) Married(husband, wife) name name addr manf Drinkers Beers
Sometimes it is best to combine two relations price name category Start Year name makes Company Product Stock price No need for Makes. Just modify Product: name category price StartYearcompanyName gizmo gadgets $19.99 1963 gizmoWorks
When is it ok to combine? It is OK to combine the relation for an entity set E with the relation R for a many-one relationship from E to another entity set. Drinkers(name, addr) Favorite(drinker, beer) Drinkers(name, addr, favoriteBeer) (Why wouldn’t this be ok if each drinker could have several favorite beers?)
Redundancy Combining a many-to-many relationship causes trouble Drinkers(name, addr) Likes(drinker, beer) name addr beer Sally 123 Maple Bud Sally 123 Maple Miller
Handling Weak Entity Sets affiliation Team University sport number name Relation Team: Sport Number Affiliated University mud wrestling 15 Montezuma State U. - need all the attributes in the key for Team. - don’t need a separate relation for Affiliation. (why?) - do want a separate relation for University. (why?)
The relation for a weak entity set must include all the attributes for its key (including those belonging to other entity sets), as well as its own non-key attributes. A supporting (double-diamond) relationship is redundant and yields no relation.
At becomes part of Logins name name Logins At Hosts createTime Hosts(hostName) Logins(loginName, hostName, createTime) At(loginName, hostName)
Translating Subclass Entities Product ageGroup topic Platforms required memory isa isa Educational Product Software Product isa isa Educational-method Educ-software Product
Option #1: the OO Approach 4 tables: each object can only belong to a single table Product(name, price, category, manufacturer) EducationalProduct( name, price, category, manufacturer, ageGroup, topic) SoftwareProduct( name, price, category, manufacturer, platforms, requiredMemory) EducationalSoftwareProduct( name, price, category, manufacturer, ageGroup, topic, platforms, requiredMemory) All names are distinct
Option #2: the E/R Approach Product(name, price, category, manufacturer) EducationalProduct( name, ageGroup, topic) SoftwareProduct( name, platforms, requiredMemory) No need for a relation EducationalSoftwareProduct Unless, it has a specialized attribute: EducationalSoftwareProduct(name, educational-method) Same name may appear in several relations
Option #3: The Null Value Approach Have one table: Product ( name, price, manufacturer, age-group, topic, platforms, required-memory, educational-method) Some values in the table will be NULL, meaning that the attribute not make sense for the specific product. Too many meanings for NULL
Translating Subclass Entities: The Rules Three approaches: • Object-oriented : each entity belongs to exactly one class; create a relation for each class, with all its attributes. • E/R style : create one relation for each subclass, with only the key attribute(s) and attributes attached to that E.S.; entity represented in all relations to whose subclass/E.S. it belongs. • Use nulls : create one relation; entities have null in attributes that don’t belong to them.
Example Beers name manf isa Ales color
Object Oriented Beers name manf isa name manf Bud Anheuser-Busch Beers name manf color Summerbrew Pete’s dark Ales Ales color
E/R Style Beers name manf isa name manf Bud Anheuser-Busch Summerbrew Pete’s Beers name color Summerbrew dark Ales Ales color
Using Nulls Beers name manf isa Ales color name manf color Bud Anheuser-Busch NULL Summerbrew Pete’s dark Beers
Comparisons • O-O approach good for queries like “find the color of ales made by Pete’s.” • Just look in Ales relation. • E/R approach good for queries like “find all beers (including ales) made by Pete’s.” • Just look in Beers relation. • Using nulls saves space unless there are lots of attributes that are usually null.
Translation Review • Basic cases • entity to table, relation to table • selecting attributes based on keys • Special cases • many-one relation can be merged • merging many-many is dangerous • translating weak entity sets • translating isa hierarchy • 3 choices, with trade-offs