420 likes | 531 Views
Week 10 October 31. Extended ERD Data Normalization. Problems with ER modeling. Fan traps - Pathway between two entities is ambiguous Chasm traps - Pathway does not exist between certain entity occurrences Inheritance - An entity receives its attributes from a class of attributes.
E N D
Week 10October 31 Extended ERD Data Normalization
Problems with ER modeling • Fan traps - Pathway between two entities is ambiguous • Chasm traps - Pathway does not exist between certain entity occurrences • Inheritance - An entity receives its attributes from a class of attributes Extended Entity Relationship (ERR) modeling
Connection Trap: Fan Trap What products belong to which product categories? Which products have restricted use aboard a plane? Merchandise Lines Merchandise_line Description Product Categories Product_category Merchandise_line Classify Have Products Product_code Description Merchandise_line
Connection Trap: Fan Trap What products belong to which product categories? Which products have restricted use aboard a plane? Merchandise Lines Merchandise_line Description Product Categories Product_category Merchandise_line Classify Have Products Product_code Description Merchandise_line To satisfy these queries, we need to form a relationship
Connection Trap: Chasm Trap What products belong to the same merchandise line? Which products require a UL listing? Product Categories Product_category Merchandise_line Merchandise Lines Merchandise_line Description UL_listing Classify Have Products Product_code Description Product_category To satisfy these queries, we need to form a relationship Known: What merchandise lines are composed of what products
EER modelingSuperclass and Subclass Entity Types • Superclass - Higher order of classification or categorization • Subclass - A member of a superclass that provides specification Electronic Merchandise Music and Videos Superclasses Audio Visual CD Receiver Cassette attributes of CD attributes of cassette decks attributes of receivers
EER modelingSuperclass and Subclass Entity Types • Specialization - top-down • Maximizing differences between members by identifying distinguishing characteristics • Generalization - bottom-up General Electronic Merchandise Audio Visual CD Receiver Cassette Specific attributes of CD attributes of cassette decks attributes of receivers
EER modelingSuperclass and Subclass Entity Types • Specialization - top-down • Generalization - bottom-up • Minimizing differences between entities by identifying common features General Electronic Merchandise Audio Visual CD Receiver Cassette Specific attributes of CD attributes of cassette decks attributes of receivers
EER modelingAttribute Inheritance CDP-325 (Sony CD Changer) Attributes common to all audio merchandise are inherited General Electronic Merchandise Audio Visual CD Receiver Cassette Specific attributes of CD attributes of cassette decks attributes of receivers
EER Diagram Product_code Prod_descip 1 M Manufacturers Products Sells 1 Disjoint constraint Produces Superclass/subclass d 1 CD Receiver Cassette Db range Watts Flutter
Constraints • Disjoint (d, o) • Entity can be a member of only one of the subclasses of specialization • Under non-disjoint, an entity can be a member of more than one subclass of specialization • Participation (partial or total) • Total - every entity in the superclass must be a member of a subclass in specialization • Partial - An entity need not belong to any of the subclasses of specialization
Data Normalization The process of decomposing complex data structures into simple relations according to a set of dependency rules.McFadden and Hoffer
Data Normalization • The purpose of normalization is to produce a stable set of relations that is a faithful model of the operations of the enterprise. • Achieve a design that is highly flexible • Reduce redundancy • Ensure that the design is free of certain update, insertion and deletion anomaliesCatherine Richardo, 1990
Normalization 1NF 2NF Progressively putting the relation into a higher normal form 3NF BCNF 4NF
0000 000 0000 0 John Smith 10001 Order No. Stereos To Go Invoice 6 15 99 Date: / / Go, Hogs 0000-000-0000-0 Account No. John Smith Customer: 2036-26 Street 1/05 Address: Sacramento CA 95819 City State Zip Code 6 18 99 Date Shipped: / / Product Item Product Description/Manufacturer Price Qty Code Number SAGX730 Pioneer Remote A/V Receiver 1 1 1 1 56995 35995 39995 2 AT10 Cervwin Vega Loudspeakers CDPC725 Sony Disc-Jockey CD Changer 3 4 5 132985 10000 10306 153291 Subtotal Shipping & Handling Sales Tax Total
File-Based System Invoice Program Invoices Customer Orders Customer Account Program Account Customer Accounts Report File Customer Mailings Program Customer Mailing List Mailing List File
Data Redundancy • Customer Order File • PO number • Customer account number • Customer name, address, city, state, zip code • Order date • Product code, product description, price, unit • Customer Account File • Account Number • Customer name, mailing address, city, state, zip code • Customer Mailing List File • Customer name, mailing address, city, state, zip code
Unnormalized Relation (Invoice_number, Invoice_date, Date_delivered, Cust_account Cust_name Cust_addr Cust_city Cust_state Zip_code, Item1 Item1_descrip Item1_qty Item1_price, Item2 Item2_descrip Item2_qty Item2_price, . . . , Item7 Item7_descrip Item7_qty Item7_price) How would a program process the data to recreate the invoice?
First Normal Form (1NF) • A relation is in first normal form if and only if every attribute is single-valued for each tuple. • Remove all repeating groups • Create a flat file
Unnormalized to 1NF (Invoice_number, Invoice_date, Date_delivered, Cust_account Cust_name Cust_addr Cust_city Cust_state Zip_code, Item1, Item1_descrip, Item1_qty, Item1_price, Item2, Item2_descrip, Item2_qty, Item2_price, . . . , Item7, Item7_descrip, Item7_qty, Item7_price) Repeating groups A flat file places all the data of a transaction into a single record. This is reminiscent of a COBOL or BASIC program processing a single transaction with one read statement.
Unnormalized to 1NF (Invoice_number, Invoice_date, Date_delivered, Cust_account, Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code, Item, Item_descrip, Item_qty, Item_price) Nominated group of attributes to serve as the key (form a unique combination) • Eliminate the repeating groups. • Each row retains data for one item. • If a person bought 5 items, we would have five tuples
10001 123456 John Smith ••• SAGX730 Pioneer Remote A/V Rec 1 569.95 10001 123456 John Smith ••• AT10 Cerwin Vega Loudspeakers 1 359.95 10001 123456 John Smith ••• CDPC725 Sony Disc Jockey CD 1 399.95 10001 123456 John Smith ••• S/H Shipping 1 100.00 10001 123456 John Smith ••• Tax Sales Tax 1 103.06 1NF Flat File Invoice number Account number Customer name Item Quantity Item Price Item Description Account number
Second Normal Form (2NF) • A relation is in second normal form if and only if it is in first normal form and the nonkey attributes are fully functionally dependent on the key. • Full functional dependency • B is functionally dependent on A if each value of A is associated with exactly one value of B Attribute A Attribute B Determinant
2NF • Second normal form applies to relations with composite keys(i.e., a primary key composed of two or more attributes) • A relation with a single attribute primary key is automatically in at least 2NF
From 1NF to 2NF (Invoice_number, Invoice_date, Date_delivered, Cust_account, Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code, Item, Item_descrip, Item_qty, Item_price) What attribute(s) can be used to uniquely identify a tuple? If the primary key consisted of invoice_number and item (i.e., composite key), we would need to remove the partial dependencies.
From 1NF to 2NF (Invoice_number, Invoice_date, Date_delivered, Cust_account, Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code, Item, Item_descrip, Item_qty, Item_price) Using Invoice number and Item as the key... Some of the attributes are dependent upon invoice_number for their values and others on item. In either case, they are not functionally dependent on the entire key.
From 1NF to 2NF Which attributes are functionally dependent on which keys? Invoice_date, Date_delivered, Cust_account, Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code, tem_descrip, Item_qty, Item_price ? Invoice_number Vs. Item
From 1NF to 2NF (Invoice_number, Invoice_date, Date_delivered, Cust_account, Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code) (Item, Item_descrip, Item_qty, Item_price) Is this unique by itself? What happens if the item is purchased more than once?
From 1NF to 2NF (Invoice_number, Invoice_date, Date_delivered, Cust_account, Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code) Partial dependency (Invoice_number,Item, Item_descrip, Item_qty, Item_price) Composite key (forms a unique combination)
From 1NF to 2NF (Invoice_number, Invoice_date, Date_delivered, Cust_account, Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code) (Invoice_number,Item, Item_qty, Item_price) (Item, Item_descrip)
From 1NF to 2NF In contrast... (Invoice_number, Invoice_date, Date_delivered, Cust_account Cust_name Cust_addr Cust_city Cust_state Zip_code, Item Item_descrip Item_qty Item_price) If the primary key consisted of invoice_number and Invoice_date (i.e., composite key), we would NOT have partial dependencies. Thus, the relation would be in 2NF.
Third Normal Form (3NF) • A relation is in third normal form if it is in second normal form and no nonkey attribute is transitively dependent on the key. • Remove transitive dependencies • “Each nonkey attribute must depend upon the key, the whole key, and nothing but key.”Kent, 1978
From 2NF to 3NF (Invoice_number, Invoice_date, Date_delivered, Cust_account, Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code) (Invoice_number,Item, Item_qty, Item_price) (Item, Item_descrip) Which attributes are dependent on others? Is there a problem?
Transitive Dependencies and Anomalies • Insertion anomalies • To add a new row, all customer (name, address, city, state, zip code, phone) and products (description) must be consistent with previous entries • Deletion anomalies • By deleting a row, a customer or product may cease to exist • Modification anomalies • To modify a customer’s or product’s data in one row, all modifications must be carried out to all others
Insertion and Modification AnomaliesFor example… Insert a new Panasonic product Product_code Manufacturer_name DVD-A110 Panasonic PV-4210 Panasonic PV-4250 Panasonic CT-32S35 PAN Inconsistency DVD-A110 Panasonic PV-4210 PanaSonic PV-4250 Pana Sonic CT-32S35 PAN Change all Panasonic products’ manufacturer name to “Panasonic USA”
Deletion AnomalyFor Example… 4377182 John Smith Sacramento CA 95831 4398711 Arnold S Davis CA 95691 4578461 Gray Davis Sacramento CA 95831 4873179 Lisa Carr Reno NV 89557 By deleting customer Arnold S, we would also be deleting Davis, California.
Invoice_number Invoice_date Date_delivered Cust_account Cust_name Cust_addr Cust_city Cust_state Zip_code Item Item_descrip Invoice_number+Item Item_qty Item_price Transitive Dependencies • A condition where A, B, C are attributes of a relation such that if A B and B C, then C is transitively dependent on A via B (provided that A is not functionally dependent on B or C).
Why Should City and State Be Separated from Customer Relation? • City and state are dependent on zip code for their values and not the customer’s identifier (i.e., key). Zip_code City, State • Otherwise, Cust_account Cust_addr, Zip_code City, StateIn which case, you have transitive dependency.
3NF Invoice Relation (Invoice_number, Invoice_date, Date_delivered, Cust_account) Customer Relation (Cust_account, Cust_name, Cust_addr, Zip_code) Zip_code Relation (Zip_code, City, State) Invoice_items Relation (Invoice_number, Item, Item_qty, Item_price) Items Relation (Item, Item_descrip)
3NF Invoice Relation (Invoice_number, Invoice_date, Date_delivered, Cust_account) Customer Relation (Cust_account, Cust_name, Cust_addr, Zip_code) Zip_code Relation (Zip_code, City, State) Invoice_items Relation (Invoice_number, Item, Item_qty, Item_price) Items Relation (Item, Item_descrip) Manufacturers Relation (Manuf_code, Manuf_name) Since the Items relation contains the manufacturer’s name in the description, a separate Manufacturers relation can be created
First to Third Normal Form(1NF - 3NF) • 1NF: A relation is in first normal form if and only if every attribute is single-valued for each tuple (remove the repeating or multi-value attributes and create a flat file) • 2NF: A relation is in second normal form if and only if it is in first normal form and the nonkey attributes are fully functionally dependent on the key (remove partial dependencies) • 3NF: A relation is in third normal form if it is in second normal form and no nonkey attribute is transitively dependent on the key (remove transitive dependencies)