1 / 95

Lecture 3

Lecture 3. This lecture will introduce More aspects of database design Modelling and transformations Normalisation Some examples of modelling And an excursion into the land of Non Structured Data. Data Structure Diagrams. Requirements Analysis. Conceptual Design. Physical Design.

rgirouard
Download Presentation

Lecture 3

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Lecture 3 This lecture will introduce • More aspects of database design • Modelling and transformations • Normalisation • Some examples of modelling • And an excursion into the land of Non Structured Data

  2. Data Structure Diagrams Requirements Analysis ConceptualDesign Physical Design Implementation Design

  3. Conceptual Design -Create an enterprise (total) model - Normalise user views - Integrate user views - Create a database -Review

  4. Modelling User Requirements DATA Entity Relationship Modelling (Semantic) Normalisation (Minimality) - meaning the minimum number of attributes to define the Primary Key PROCESS Data Flow Analysis Procedure Formation Functional Analysis

  5. Well Structured Data • Advantages • Better integration of applications through shared access to common data. • More adaptable systems because files and databases will follow the natural structures inherent in data rather than application oriented relationships. • Processing should be simpler if data is held in its simplest least redundant form.

  6. Data Focussed Development 1.Develop broad data model and broad function model. 2. Partition data model by function. 3. Develop detailed data model by each functional partition. 4. Develop the detailed procedure models by functional partition.

  7. Primary Key Selection A data item or element that describes one entity type. STUDENT RESULT StudentNo Unit Code Year Semester Grade S1 CSE4230 2003 1 N S1 CSE4230 2003 2 HD • PRIMARY KEY or IDENTIFIER • Some or all of the attributes describing an entity type serve to uniquely identify each entity instance.

  8. Value Sets or Domains • Each attribute type supports one entity type and is often based on an underlying domain or value set. • EMPLOYEE • EmpNo Emp_Gender Emp_Name Emp_Hire_Date • E1 Female Jones 20/6/1989 • DEPENDENT • EmpnoFamily_Name First_Name Gender Birthdate • E1 Jones Amelia Female23/6/1986

  9. Synonyms or Aliases • Be careful to differentiate between things and their names. • ITEM = PART = PRODUCT ??? • SYNONYM or ALIAS • The same object called different names by different people. • This problem concerns entity types, relationship types and attribute types.

  10. Homonyms • Different things called the same name by different people ?? • EMPLOYEE START DATE • In the Personnel area this may mean the date an employee started with the company. • In a given department this may mean the date an employee started with the department. • This problem also concerns entity types, relationship types and attribute types.

  11. Design and Data Structures Data structures are the bricks and mortar that hold databases together. Data structures (for the ANSI/SPARC standard) are defined in the internal model level and implemented in the physical data organisation. Data structures are often hidden from the application programmer, since they are primarily used by the DBMS and Operating System. Understanding data structures is important for performance reasons, to improve program design and allow easier communication with DBMS specialists.

  12. Goals of Relational Design What Relations should exist and what Attributes should they contain? Avoid Redundancy if possible - minimise storage space Avoid Anomalies Avoid Nulls (but be careful) Avoid Joins which produce spurious rows

  13. Some Assumptions • A group of attributes has a natural “inherent” structure • This structure is independent of the way the data is used • Normalisation (meaning : to be able to access complete, clear and correct data - to and from tables) • introduced by E.F. Codd together with relational database • theory • originally Codd defined three normal forms • later expanded to Boyce-Codd and fourth and fifth • normal forms

  14. A Sample Data Structure PRIME _MINISTER PM_NAME BIRTH_YR YRS_SERVED DEATH_AGE STATE_BORN STATE_REP MARRIAGE PM_NAME SPOUSE_NAME MAR_YR PM_AGE NR_CHILDREN MINISTRY MIN_NR PM_NAME PARTY DAY_COMM MTH_COMM YR_COMM

  15. Normalisation Formal measure of why one grouping of attributes may be better than another Each Normal Form requires that a Relation satisfies criteria for that form and this eliminates a different kind of redundancy Normalised Relations will remain consistent following database operations and will store each fact only once Database operations applied to unnormalised relations may lead to anomalies PM_NAME BIRTH_YR YRS_SERVED DEATH_AGE STATE_BORN STATE_REP

  16. Some Anomalies Relation : ASSIGN Person-Id Project_budget Project Time Spent on Project S75 32 P1 7 S75 40 P2 8 S79 32 P1 4 S79 27 P3 1 S80 40 P2 5 - 17 P4 - Null Values are considered to be anomalies

  17. Anomalies Person-Id Project_budget Project Time Spent on Project S75 32 P1 7 S75 40 P2 8 S79 32 P1 4 S79 27 P3 1 S80 40 P2 5 - 17 P4 - Insertion Anomaly add row ASSIGN values( ‘S85’,35,’P1’,9) - two conflicting budgets for P1 S75 32 P1 7 Deletion AnomalyS79 32 P1 4 delete row ASSIGN (‘S79’,27,’P3’,11) - removes project budget for P3 S79 27 P3 1

  18. Functional Dependencies - the values of one set of attributes affect the values of another attributes Y X The value of X determines the value of Y The value of Y depends on the value of X The simplest case is 1 attribute determines another single attribute AND the Primary Key value determines the values of the non key attributes in each row.

  19. Functional Dependencies Project Project Budget Person-Id Project Time Spent on Project Functional Dependency Diagram Project Project Budget

  20. Functional Dependencies • Two attributes are FUNCTIONALLY DEPENDENT if a value forONEattribute specifies a SINGLEvalue for the other attribute GradeRate of Pay Driver 265.00 Crane Driver 310.95 Keyboard Operator 215.55 Systems Analyst 450.00 The Rate of Pay is functionally dependent on Grade or, Grade determines Rate of Pay or, grade----> Rate of Pay. Grade is a DETERMINANT

  21. Functional Dependencies Some Terms: Primary Key Candidate Key Foreign Key Concatenated Key Non-Ambiguous Keys: An attribute (or attributes) of a table is the Primary Key for that table if 1. All attributes in the table are functionally dependent on the Primary Key AND 2. Where the Primary Key is a collection of attributes (concatenated key), no subcollection of the attributes used also exhibits the functional dependence property

  22. Functional Dependencies Translation: 1. The primary key will identify a given row (unique) 2. No portion of the primary key can be an independent primary key PM_NAME BIRTH_YR YRS_SERVED DEATH_AGE STATE_BORN STATE_REP

  23. Minimal Numbers of FD’s Or, rationalise (minimise) the number of tables in the physical database Employee Department Employee determines Department Department Location Department determines Location Employee Location Employee determines Location This FD is redundant. It is known as a ‘Transitive Dependency’ and would be excluded from the design model. (E-R diagram)

  24. Normalisation 1st Normal Form Repeating groups must not occur Unit Student No. Result Name CSE3180 9142717 C Wilson 9131618 D Renoir 9077184 P Gilbey CSE2138 8967384 N Breton 8737980 P Balzac CSE9020 9142717 P Wilson 6932475 HD Gilbey 1st Normal Form

  25. Normalisation Corrected Table The stored data must be complete Unit StudentNo Result Name CSE3180 9142717 C Wilson CSE3180 9131618 D Renoir CSE3180 9077184 P Gilbey CSE2138 8967384 N Breton CSE2138 8737980 P Balzac CSE9020 9142717 P Wilson CSE9020 6932475 HD Gilbey Formally expressed as: Results(unit code, studentno,result,name

  26. Normalisation 2nd Normal Form Designed to eliminate Update Anomalies (4 categories) Examples 1. Update Where a Description alteration requires total file scan and alterations made 2. Inconsistent Data Possibility of a code having more than one description 3. Additions Condition where a row cannot be fully identified e.g. subject code only 4. Deletions If subject Code CSE9020 is deleted, location of associated data is not possible

  27. Normalisation 3rd Normal Form Final Stage ---> 3rd Normal Form Watch for Determinants Primary Keys are Determinants Candidate Keys are Determinants A table is in 3rd Normal Form (3NF) if (i) It is in 2nd Normal Form and (ii) Determinants are Candidate Keys

  28. Normalisation As a check, review 1. Update 2. Inconsistent Data 3. Additions 4. Deletions for Conditions and Difficulties of access (and retrieval)

  29. Normalisation Review A relation is in 3NF if, and only if, the NONKEY ATTRIBUTES are (1)Mutually Independent (2)Fully Dependent on the Total Primary Key Or: A relation is in 3NF if, and only if, for the life of the database, each row consists of a Primary Key Value which identifies some Entity instance, together with a set of Zero or more mutually independent attribute values which describe the Entity in some way. (Date, C.J) Or: Each field must represent a fact about the key, the Whole Key and Nothing but the Key. (Kent, W)

  30. Normalisation - Examples 1. 1NF or First Normal Form Rule: Each Row MUST CONTAIN the same number of columns Example: Course Instructor table class code lecturer tutor tutor tutor C3576 Doe,J Jones,R Smith,V Ng,K K567 Nguyen,L B6745 Fabbri,M Ong,W R3289 Pratt,W Archer,V Barrat,N Thenumber of columns (attributes) is not consistent Create a table of TUTORS with Class Codes

  31. Normalisation - Examples 2NF: 2 criteria (1) The table must be in 1NF (2) every attribute which is NOT part of the Primary key must be functionally dependent on the Primary Key Building Room Seats No. of Levels A 214 85 4 A 242A 25 4 B 213 135 6 The number of levels is dependent on Building, not Building + Room. Create another table - ‘building table’

  32. Solution 1 table : Accommodation(building, room, seats) Another table : Building(building, levels)

  33. Normalisation - Examples 3NF : A table is in 3rd Normal Form if(1) the table is in 2NF and (2) every attribute which is NOT part of the Primary Key is functionally dependent ONLY on the key - meaning it is not dependent on any other non-key attribute

  34. Normalisation - Examples This table is NOT in 3NF: ClassCode Lecturer Lecturer’s Office Phone C3567 Doe,J 101 Bldg A 32412 K4567 Fabbri,M 312 Bldg B 31523 B7645 Nguyen,L 209 Bldg F 31518 R3289 Pratt,W 406 Bldg E 32581 ClassCode Lecturer Lecturer Lecturer’s Office Phone C3567 Doe,J Doe,J 101 Bldg A 32412 K4567 Fabbri,M Fabbri,M 312 Bldg B 31523 B7645 Nguyen,L Nguyen,L 209 Bldg F 31518 R3289 Pratt,W Pratt,W 406 Bldg E 32581

  35. Normalisation Some Thoughts: • Databases should be constructed so that data occurs only ONCE. • Redundant or repetitive data may produce inconsistencies in the database during update e.g. a person’s address is in 1 table and row only, and other rows in other tables have keys referring to it. (known as referenced and referencing keys/tables) • A quick design test - ‘If a row is deleted, will valuable data be lost ?’ If YES, remove the valuable information and place it in another table

  36. NormalisationDiagrammatic Functional Dependencies supplier name supplier number status p.k city Notice that these attributes are dependencies of the primary key part name color mass city city assets part number p.k This attribute IS NOT

  37. Other Normalisation Forms Other forms of Normalisation: 4th Normal Form 4NF A row should NOT contain two or more independent multivalued facts about an entity. Facts are independent of each other when there is no direct connection between them. Generally occurs in a many to many relationship Example: A book publishing company can have many authors producing many books The books would (normally) be sold from many booksellers stores

  38. Other Normalisation Forms 5th Normal Form A table is in 5th Normal Form when its data can no longer be decomposed into smaller tables without each row having the same Primary Key

  39. MultiValued Dependencies MVD’s Functional Dependencies cannot express all cases of associations between attributes Consider PERSON and SKILL(s) A PERSON may have one or many SKILLS A SKILL may be owned by many PERSONS PERSON --- > SKILL SKILL ----> PERSON Neither of these expressions cover the above statements

  40. MultiValued Dependencies MVD’s An MVD defines that an ATTRIBUTE is always associated with a given set of values of another ATTRIBUTE PERSON --->> SKILL Interpretation: a single value of a set of attributes determines a set of values of other attributes AND this set of values is the same, independent of any other dependencies of the first named attributes PERSON ---->> skill_id, date_acquired

  41. Advanced Examples 4NF, 5NF • SALES_AREAS Sales_Rep Customer Product_Class Joe BH Store Cosmetics Joe BH Store Haberdashery Helen MaxiStores Cosmetics Helen MaxiStores Clothing Helen MaxiStores Shoes Joe 8-Up Cosmetics Joe 8-Up Haberdashery Helen BH Store Cosmetics Helen BH Store Clothing Helen BH Store Shoes

  42. Advanced Examples 4NF, 5NF In this table, there are MVD’s Sales_Rep --->> Customer Sales_Rep --->> Product_Class FOURTH NORMAL FORM: A relation is in 4NF if, for any multi-values dependency X --->> Y, there are no attributes other than those in X and Y In the table , Sales_Rep --->> Customer, BUT also --->> Product_Class Solution: Decompose SALES_AREAS into:Sales_Rep(Sales_rep, Product_Class) Areas(Sales_Rep, Customer)

  43. Advanced Examples 4NF, 5NF FIFTH NORMAL FORM: Restrictions on Dependencies (also illustrates a Business Rule) BH Store does not sell Haberdashery MaxiStores do not sell Cosmetics Thus the SALES_AREA cannot be decomposed as shown. It can be decomposed into Sales_Rep(Sales_Rep, Product_Class) Areas( Sales_Rep, Customer) Sells(Customer,Product_Sold)

  44. Advanced Examples 4NF, 5NF IF not all Sales_Reps of a given Product_Class can sell that Product_Class to a store, then the resultant table, based on the original table, is Sales_Rep Customer Product_Class Joe BH Store Cosmetics Helen MaxiStores Clothing Helen MaxiStores Shoes Joe 8-Up Cosmetics Joe 8-Up Haberdashery Helen BH Store Clothing Helen BH Store Shoes This relation cannot be decomposed and is in 5NF

  45. In Practice --- 1.Oracle insists that modelling to 3rd normal form is adequate for all commercial databases 2. Since the emergence of the 4th and 5th forms, database software (in the case of Oracle this is PL/SQL) is able to provide the processing necessary to accommodate special or extended control over data in database tables 3. Part of the development of a database is focussed on the presence and understanding of ‘Business Rules’ 4. There will be some examples of Business Rules in a later lecture 5. You will need to develop Rules for your database as part of its development and control

  46. Stages in Data Modelling - 1 1. Analyse User Requirements - User Views 2. Aggregate Functions - Highlight Synonyms 3. Prepare first cut of Entities 4. Determine Cardinality of Entities • Develop E-R Diagrams 5. Determine Relationships 1:1, 1:M, M:N 6. Introduce Attributes - Data Structures (use Natural Mapping) 7. Analyse Candidate Keys for Primary, Foreign and Access key functions 8. Determine success of relationships 9. Review (if necessary) the E-R Diagram

  47. Stages in Data Modelling - 2 • Select and Test the Primary and ForeignKeys for each table • Select Candidate keys for access • Normalise • Test Normalisation

  48. Stages in Data Modelling - 3 • Review E-R Diagram and Data Structure Diagrams if necessary • Examine Data Structures/E-R Diagrams against original views, requirements e.g. report contents • Obtain confirmation and acceptance from User(s) • Move to Physical Design Considerations and, don’t forget the documentation !

  49. Data Modelling - Some Guidelines • Do not prematurely combine entities into tables • Concentrate on Access Mechanisms which can be shared among requests • Deviate from the model in a responsible manner (i.e. get user acceptance, enthusiasm) • Use table/view and attribute names which closely reflect the data model names (Natural Mapping) • Do not define multiple attributes as one composite attribute in a table

  50. Database Design Guidelines • Watch data types for each attribute, especially complex keys • Develop an ‘architecture’ to support ‘Business Rules’ for procedures, integrity and access • Look for developing technology trends • Determine the reason/purpose of the proposed database

More Related