Class Agenda (04/06/2006 and 04/11/2006)

Class Agenda(04/06/2006 and 04/11/2006) • Discuss use of Visio for ERDs • Learn concepts and ERD notation for data generalization • Introduce concepts of normalization

What is generalization? • Generalization is the concept that some entities share attributes. • Example: • Imagine a university human resource system with an entity called “employee”. • An employee has attributes such as name, address, SSN, birth date, educational level, etc. • Different types of entities have other information. • For example, a “faculty” entity has grant type. • A “classified” entity stores data about employment level. • An “administrative” entity stores data about contract type.

Generalization vocabulary • An entity “supertype” is an entity whose instances have attributes that are common to one or more entity “subtypes”. • Attributes that are shared by all entities (including the identifier) are associated with the supertype. • An entity “subtype” is an entity whose instances inherit some common attributes from an entity supertype. An entity “subtype” adds other attributes to those inherited from the supertype. • A subgrouping of the entities in an entity type which has attributes that are distinct from those in other subgroupings. • Generalization allows an analyst to understand more fully the relationships between entities.

What are the generalization constraints? • Constraints are limitations depicted on a data model. • The most common types of constraints are: • Completeness. Addresses the question of whether an instance of a supertype must also be a member of at least one subtype. • Result is either required or not required. • Specified on the ERD with the letter “C” under the arrow pointing to the supertype. • Disjointness. Addresses the question of whether an instance of a supertype may simultaneously be a member of two or more subtypes. • Result is specified as disjoint or overlap. • Specified on the ERD with the letter “D” under the arrow pointing to the supertype.

Database Normalization • What will you know about database normalization? • Define normalization. • Know the vocabulary of normalization. • Understand the process of normalization. • What will you be able to do? • Be able to identify the characteristics of each normal form. • Be able to tell whether or not a data model is in third normal form. • Possibly be able to use normalization to assist you in the design of a database.

Review of database design goals • Protect the integrity of the data. • Reduce data redundancy. • Prevent data anomalies. • Provide for change. • Prevent inflexible data structures. • Anticipate changes. • Provide complete data for decision making.

What are data anomalies? • An anomaly is a potential error or inconsistency in the data. • Data anomalies are most frequently caused by the implementation of M:N relationships. • A M:N relationship can be implemented through a “false” 1:M relationship.

Example of a “False” 1:m Relationship Order 12100 5613 04-10-2006 03-24-2006 200 net30 4.99 11 12100 7816 04-10-2006 03-24-2006 200 net30 45.89 15 12100 5613 05-15-2006 03-24-2006 200 net30 4.78 50 12250 4512 12-02-2005 03-22-2006 231 cod 9.99 87 12250 5622 12-02-2005 03-22-2006 231 cod 5.70 25 Product 5613 tumbler 12 oz ea inventory 50 452 Sacramento South 9162234551 5613 tumbler 12 oz ea inventory 50 455 Sacramento North 9167817273 7816 food processor ea inventory 15 455 Sacramento North 9167817273 5622 paper rm supplies 25 452 Sacramento South 9162244551 4512 glass pitcher ea inventory 75 488 Reno 7753314551

Potential anomalies in the example • Insertion anomaly: Can’t add “some” of a row; must have all the key attributes. Example - Suppose we need to add a new warehouse? • Deletion anomaly: Lose some relevant data when deleting other data. Example - What happens to the Reno warehouse information (name and phone#) when we delete item #4512? • Update anomaly: Must update more than one row when one piece of data changes. Examples - What happens if the telephone number at the Sacramento North warehouse changes? What happens if the date the purchase order was placed is entered incorrectly and must be updated?

Other problems with “false” 1:m relationships • What happens when the database design grows or changes? • How do you add new data attributes? • What about keeping track of the buyer for an order? • Or the manager for a warehouse?

What is normalization? • Normalization is a formal, process-oriented approach to data modeling. • Normalization is the process of: • examining groups of data attributes; • splitting them into appropriate entities; • identifying the relationships between the entities; and • identifying appropriate primary and foreign keys.

Normalization process • Some refer to this as the “bottom-up” form of database design. • Contrast with the more intuitive “top-down” approach we have been using. • The results from the normalization process are stable, flexible entities. The results from the intuitive approach should be the same.

Two methods of applying normalization • Use it to help in designing a database. • Normalization starts with a single entity. • Normalization breaks that entity into a series of additional entities. • More entities are discovered and named during the process. • Entities are linked during the process. • Use it to validate the design of a database. • Identify entities from the meaning of the data. • Create conceptual and logical data models. • Apply the rules of normalization to ensure a stable, non-redundant design.

Vocabulary for normalization • A “functional dependency" is a relationship between attributes in which one attribute or group of attributes determines the value of another. • A “determinant” is an attribute that, once known, can determine the value of another attribute.

Examples of functional dependencies and determinants • A social security number determines your name and address. SSN  name, address. • A vehicle id number determines the make and model of a car. VIN  make, model. • Name and address are “functionally dependent” on SSN. • SSN “determines” name and address. • Functional dependency diagram format: • CrsNum  CrsDescription, CrsCredits • ZipCode  City, State • PatID, TrtID, LocID, TrtDateTime  TstResults

Normalization process • Normalization is accomplished in stages. A “normal form” is a state (level of completeness) of a data model. • Unnormalized data: A data model that has not been normalized. It contains repeating groups and is not a stable model. • Unnormalized data is essentially one entity. The system under analysis is categorized as a single entity.

Unnormalized data for grade report exercise • Semester • Year • Student Name • Student Address • Student City • Student State • Student Zip Code • Student ID • Student College • Student Major • Student Minor • Student Year • Course ID • Course Title • Course Instructor • Course Credits • Grade What attributes might be needed that aren’t visible on the grade report? Group all attributes in one “big” entity. Identify a primary key for the entity. Maybe studentID for this one.

First Normal Form • First normal form: Remove repeating groups. • A repeating group is an attribute or group of attributes that can have more than one value for an instance of an entity. If it is a single attribute, we have been calling it a “multi-valued” attribute. • To get a data model into first normal form: • Identify repeating groups and place them as separate entities in the model. • Identify a primary key for the repeating group. The key may be concatenated. • Create the relationships between entities. • Divide m:n relationships with appropriate intersection entities.

Second Normal Form • Second normal form: Remove partial functional dependencies. • A partial functional dependency is a situation in which one or more non-key attributes are functionally dependent on part, but not all, of the primary key. • Partial functional dependencies occur only with concatenated keys. • Examples of partial functional dependencies: • PatID, TrtID, LocID, TrtDateTime  PatName, TstResults, TrtType, TrtDescription, LocName • CourseID, StudentID  CourseTitle, Grade • Which entities developed in first normal form for the grade report have concatenated keys?

Third normal form • Third normal form: Remove transitive dependencies. • A transitive dependency occurs when a non-key attribute is functionally dependent on one or more non-key attributes. • Third normal form examines entities with single primary keys and removes the “floating” or transitive dependencies. • It may be possible to have attributes that are determined by other attributes, rather than by the primary key. They must be removed into entities with appropriate primary keys.

Summary of normalization process • Examine and evaluate the logical data model for effectiveness. • Find the repeating groups and put the model into first normal form. Identify primary key fields for any new entities. Relate entities with foreign keys. • Find the functional dependencies. Identify the partial functional dependencies and put the model into second normal form. Identify primary key fields for any new entities. Relate entities with foreign keys. • Find the transitive dependencies and put the model into third normal form. Identify primary key fields for any new entities. Relate entities with foreign keys.

Goal of normalization A set of entities where each attribute in each entity is dependent on the primary key, the whole primary key, and nothing but the primary key.

Class Agenda (04/06/2006 and 04/11/2006)

Class Agenda (04/06/2006 and 04/11/2006)

Presentation Transcript

Semi-Indexing Semi-Structured Data (in tiny space)

Class Agenda (04/03 and 04/08)

UN/CEFACT ORGANISATION

vti_encoding:SR|utf8-nl

vti_encoding:SR|utf8-nl

October 04, 2006

Project Progress Overview

vti_encoding:SR|utf8-nl

vti_encoding:SR|utf8-nl

vti_encoding:SR|utf8-nl

vti_encoding:SR|utf8-nl

vti_encoding:SR|utf8-nl

Tuesday, September 04, 2006

vti_encoding:SR|utf8-nl

vti_encoding:SR|utf8-nl

vti_encoding:SR|utf8-nl

vti_encoding:SR|utf8-nl

Agenda – 04/18/2006 and 04/20/2006

MNM Fatal 2006-04