1 / 41

Entity Modelling Tips

Entity Modelling Tips. Quick Intro. Working with Oracle since 1986 Oracle DBA - OCP Oracle7, 8, 9, 10 Oracle DBA of the Year – 2002 Oracle ACE Director ANSI/ISO Standards Committee - SQL Regular Presenter at Oracle Conferences Consultant and Trainer

Download Presentation

Entity Modelling Tips

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Entity Modelling Tips

  2. Quick Intro Working with Oracle since 1986 Oracle DBA - OCP Oracle7, 8, 9, 10 Oracle DBA of the Year – 2002 Oracle ACE Director ANSI/ISO Standards Committee - SQL Regular Presenter at Oracle Conferences Consultant and Trainer Technical Editor for a number of Oracle texts UK Oracle User Group Director Member of IOUC Day job – Tradba Ltd

  3. What is an Entity • Oxford English Dictionary • Entity : Noun.1 Thing • Bloom’s Taxonomy • Thing : Whatever is or maybe an object of thought • James Martin’s definitions of IT terms • Object : A real-world entity

  4. Relationships and the Time Dimension Building Society Scenario • Each and every manager manages one and only one branch • Each and every branch is managed by one and only one manager • Could (should?) be implemented as one entity (table) • But what about the time dimension? • How could the relationship ‘marriage’ be modelled? Manager Branch

  5. Why Normalise? • Normalising usually implies splitting data apart into separate tables • This means data often has to be joined back together • Why not keep the data in one table? • Attempt 1 • Store department details in line with employee information in emp_dept emp_dept EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO DNAME LOC ----- ------ --------- ---- --------- ---- ---- ------ ---------- -------- 7782 CLARK MANAGER 7839 09-JUN-81 2450 10 ACCOUNTING NEW YORK 7839 KING PRESIDENT 17-NOV-81 5000 10 ACCOUNTING NEW YORK 7934 MILLER CLERK 7782 23-JAN-82 1300 10 ACCOUNTING NEW YORK 7566 JONES MANAGER 7839 02-APR-81 2975 20 RESEARCH DALLAS 7902 FORD ANALYST 7566 03-DEC-81 3000 20 RESEARCH DALLAS 7876 ADAMS CLERK 7788 12-JAN-83 1100 20 RESEARCH DALLAS 7369 SMITH CLERK 7902 17-DEC-80 800 20 RESEARCH DALLAS 7788 SCOTT ANALYST 7566 09-DEC-82 3000 20 RESEARCH DALLAS 7521 WARD SALESMAN 7698 22-FEB-81 1250 500 30 SALES CHICAGO 7844 TURNER SALESMAN 7698 08-SEP-81 1500 0 30 SALES CHICAGO 7499 ALLEN SALESMAN 7698 20-FEB-81 1600 300 30 SALES CHICAGO 7900 JAMES CLERK 7698 03-DEC-81 950 30 SALES CHICAGO 7698 BLAKE MANAGER 7839 01-MAY-81 2850 30 SALES CHICAGO 7654 MARTIN SALESMAN 7698 28-SEP-81 1250 1400 30 SALES CHICAGO

  6. The emp_dept table • Primary key is empno 1. No way to add a new department without an employee 2. Changing a department name or location would require multiple rows to be updated • Risk of inconsistencies 3. Deleting the last employee in a department would remove the department information

  7. Attempt 2 – The dept_emp table • Store employee details in line with departmental information • Requires an indeterminate number of columns • Eight columns for each employee in the department • Key is deptno • Cannot add a new employee without a department • Could easily have duplicate employees • Would need to know that TURNER is employee number 2 in department 30 • Suppose one department was much bigger than the rest • Leads to lots of columns with most rows having many null values DEPTNO DNAME LOC EMPNO1 ENAME1 JOB1 MGR1 HIREDATE1 SAL1 COMM1 EMPNO2 ... ------ ---------- -------- ------ --------- --------- ---- --------- ----- ----- ------ 10 ACCOUNTING NEW YORK 7782 CLARK MANAGER 7839 09-JUN-81 2450 7839 ... 20 RESEARCH DALLAS 7566 JONES MANAGER 7939 02-APR-81 2975 7902 ... 30 SALES CHICAGO 7521 WARD SALESMAN 7698 22-FEB-81 1250 500 7844 ...

  8. Many to Many Relationships • Cannot be directly implemented as two tables • Number of foreign key columns is indeterminate • Similar to problems shown on previous slide • Link employees to projects by inserting foreign empno columns into project table • If a project has 3000 employees it would require 3000 empno columns • One project may have many more employees than others • Results in many null values for most rows • Link projects to employees by inserting foreign proj_id columns into emp table • If an employee has 50 projects it would require 50 empno columns • One employee may be on many more projects than others • Results in many null values for most rows EMP PROJECT

  9. Many to Many Relationships (continued) PROJ_ID P_DESC EMPNO1 EMPNO2 EMPNO3 EMPNO4 EMPNO5 EMPNO6 EMPNO7 EMPNO8 EMPNO8 ... ------- ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ 1 ARTS 7566 7900 7942 7369 7654 7902 7876 7482 7321 ... 2 DESIGN 7600 7942 3 IT 7988 7600 7788 : : : : : • Many columns – possibly many NULLs • Cannot add an employee to a project if all columns are full • Difficult to handle change of EMPNO EMPNO ENAME PROJ_ID1 PROJ_ID2 PROJ_ID3 PROJ_ID4 PROJ_ID5 PROJ_ID6 PROJ_ID7 ... ----- ------ -------- -------- -------- -------- -------- -------- -------- ------- 7399 COX 3 5 9 8 2 12 14 ... 7942 MILLS 1 7001 CARVER 4 3 : : : : • Cannot add a project to an employee if all columns are full • Difficult to handle change of PROJ_ID

  10. Many to Many Relationships (continued) EMP PROJECT PROJ_EMP • The link entity , PROJ_EMP, has a primary key of PROJ_ID,EMPNO and foreign keys PROJ_ID (referencing PROJECT) and EMPNO (referencing EMP) • Easy to make changes to data EMPNO ENAME ----- ------ 7399 COX 7942 MILLS 7001 CARVER : : PROJ_ID EMPNO ------- ------ 1 7942 1 7566 1 7900 1 7942 1 7369 1 7654 1 7902 1 7876 1 7482 1 7321 2 7600 2 7942 3 7988 3 7600 3 7788 : : PROJ_ID P_DESC ------- ------ 1 ARTS 2 DESIGN 3 IT : :

  11. FanTraps • Arise from M:1 and 1:M relationships through one entity Subsidiary Company Department Employee • Diagram give rise to three tables : Department Company Employee

  12. Data Patterns in Fan Traps • Green is employee number E7 • In which department does Green work? • The possibilities fan out into D1, D2 and D3 from C2 E1 E2 E3 E4 E5 E6 E7 E8 D1 D2 D1 D2 D3 C1 C2

  13. Significant and non-Significant Fan Traps • Fixing the model is easy • The hierarchy is changed Subsidiary Company Department Employee

  14. Non-Significant Fan Traps • A very similar construct, but gives no problem • There is no specific relationship between Stock and Sales Office • Business rules are that a Sales Office can receive ANY of the Stock from its Warehouse • The fan trap is non-significant Warehouse Stock Sales Office

  15. Resolving Relationships • Key requirement • Which jockey was riding which horse in which race? Horse Jockey Race

  16. Resolving Relationships (continued) Race Horse R1 H1 R1 H2 R3 H1 R2 H3 R4 H3 Jockey Horse J1 H1 J1 H2 J3 H1 J3 H3 Horse Horse/Jockey Race/Horse Horse H1 H2 H3 Race Jockey Race Jockey R1 J1 R1 J3 R3 J1 R2 J3 R4 J3 Jockey J1 J2 J3 Race R1 R2 R3 R4 Race/Jockey

  17. Chasm Traps • Hospital Example – existing observed data pattern Hospital Clinic Doctor Leeds Black Orange Yellow Brown GreenBlue Red Pink Eye Nose Throat • Gives rise to the following relationships Clinic Hospital Doctor

  18. Chasm Traps (continued) • Doctors can be tracked through to their hospital

  19. Chasm Traps (continued) • But another doctor called White (doctor number 500) is discovered • He works at Leeds but is not attached to a specific clinic • Gives rise to a chasm trap – there is no route through from doctor to hospital via clinic • The situation is now better represented as : • This shows a potential chasm trap Clinic Hospital Doctor

  20. Chasm Traps (continued) • Model fixed by adding a relationship between doctor and hospital Clinic Hospital Doctor • Results in an extra foreign key column in the doctor table

  21. Recursive Relationships An employee may manage one or more other employees An employee may be managed by one and only one manager Emp EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO ----- ------ --------- ---- --------- ---- ---- ------ 7782 CLARK MANAGER 7839 09-JUN-81 2450 10 7839 KING PRESIDENT 17-NOV-81 5000 10 7934 MILLER CLERK 7782 23-JAN-82 1300 10 7566 JONES MANAGER 7839 02-APR-81 2975 20 7902 FORD ANALYST 7566 03-DEC-81 3000 20 7876 ADAMS CLERK 7788 12-JAN-83 1100 20 7369 SMITH CLERK 7902 17-DEC-80 800 20 7788 SCOTT ANALYST 7566 09-DEC-82 3000 20 7521 WARD SALESMAN 7698 22-FEB-81 1250 500 30 7844 TURNER SALESMAN 7698 08-SEP-81 1500 0 30 7499 ALLEN SALESMAN 7698 20-FEB-81 1600 300 30 7900 JAMES CLERK 7698 03-DEC-81 950 30 7698 BLAKE MANAGER 7839 01-MAY-81 2850 30 7654 MARTIN SALESMAN 7698 28-SEP-81 1250 1400 30

  22. Arcs • Each instance of A is related to one instance of either B or C or D • Cannot take part in more than one of the relationships • Allows different physical designs • Known as EXCLUSIVE-OR relationship B A C D

  23. Arcs – Alternative Table Designs A1 A2 • Table A1 has a column for each foreign key • Leads to lots of Null values • Need complex integrity checking to avoid more than one foreign key column having a non-null value in each row • New foreign key columns needed if extra relationships are added to the arc • Table A2 has only one column of foreign key values plus a column to identify which parent table is related to the row • No null values – easier integrity checking • More complex joins (need to equate two columns (rel and id)) • Greater flexibility if new relationships added • No need to add columns to A2 • Keys in B,C and D must be of the same datatype or

  24. Arcs – More Alternative Table Designs • A1 could be separated out into three tables • One for each relationship • Advantages • No need to scan the full A table when looking for rows in A of a particular type • Disadvantages • No easy way of viewing all rows of type A • Need to union A1, A2 and A3 • Effectively this is a form of subtyping (of A) A1 B A2 C A3 D

  25. Arcs – Even More Alternative Table Designs • B,C and D could be regarded as similar entities if their data is similar • Combine them into one table (Z) • Effectively collapsing subtypes into their supertype • Advantages • No need for separate foreign key columns in the A table • Simple joins • Disadvantages • Harder to distinguish between rows in A of type B, C and D • Bigger data sets to process • B, C and D must have keys of same datatype Z B A C D

  26. Cyclic Relationships Key : Sta,D Key : Sta • Leads to redundant data State District Street City Key : Sta,D,C,Str Key : Sta,D,C

  27. Cyclic Relationships (continued) Primary key of street table Additional foreign key from direct relationship with state Inconsistencies can arise due to redundant relationship Foreign key via state, district, and city

  28. Cyclic Relationships (continued) • Inconsistency easy to spot • The two Sta# columns in Street are not equal • But what if surrogate keys are used? District State Street City

  29. Surrogate Keys • Not easy to see that key value Sur_C5 has an inconsistency with Sta4 in the Sta# column • The surrogate key Sur_C5 represents C1, D1, Sta3 Street City Surrogate key – unique for all cities

  30. Sub-Typing Methods • Consider the following diagram • Has a supertype consisting of two subtypes • Hourly-paid employees and salaried employees EMP SAL HOURLY • How could this be physically implemented?

  31. Alternative 1 • One table • Either fill in salary field or hourly field – but not both • Lots of NULLs (either Sal_code or Hourly_rate must be NULL) EMP • Advantage • Easy design • Disadvantage • Wasted space • Need to write triggers to preserve integrity

  32. Alternative 2 • Two tables • One for hourly paid employees, one for salaried employees EMP_SAL EMP_HOUR • Advantage • Easy design (no NULLs) • Disadvantage • No easy way to work with all employees • Best used when applications do not use both tables

  33. Alternative 3 • Three tables • One for common attributes (supertype) and two subtypes PK(empno) EMP EMP_HOUR EMP_SAL • Advantage • Collection of common attributes • Disadvantage • No easy way to know if named employee is salaried or hourly paid

  34. Alternative 4 • Four tables • Same as alternative 3 but with EMP_TYPE to allow more generic constructs EMP_TYPE EMP EMP_HOUR EMP_SAL • Advantage • Easy to write applications against • Disadvantage • E_code is redundant data in EMP

  35. Alternative 4 • Additional tables needed if extra types introduced EMP EMP_SAL EMP_HOUR EMP_COMM

  36. Alternative 5 • Five Tables – Generic solution • Advantage • Easy to modify structure (could classify by different criteria e.g. job type) • Users can add types and change their definition without DBA intervention • Disadvantage • Beyond the comprehension of mortals • Slow EMP EMP_TYPE EMP_TYPE_ATTR EMP_ATTR ATTRIBUTE

  37. Structure of the Five Tables EMP_TYPE E_code PK E_Type_Desc NN EMP Empno PK E_code FK(EMP_TYPE) Last_name NN First_name NN Still put all common attributes here ATTRIBUTES A_Code PK A_Desc NN EMP_ATTR Empno PK, FK(EMP) A_Code PK, FK(ATTRIBUTE) Value NN E_Type FK(EMP,FK(EMP_TYPE_ATTR) This makes sure that EMP has correct attributes for type EMP_TYPE_ATTR Empno PK, FK(EMP) This declares what attributes A_Code PK, FK(ATTRIBUTE) are valid for each type

  38. Weak Entities • What is a weak entity • Any occurrence (row in the table) must have a parent • For example a clinic MUST be in a hospital Unique key (H#,C#) Hospital (0,n) has (1,1) • The key of hospital becomes part of the key for clinic (H#) • C# alone cannot identify a row • New clinics cannot be added without a value for H# Clinic

  39. Example of Non-Weak Entity • Clinic can be identified by C# • H# is not part of the primary key • H# is merely a foreign key • Possible to add a clinic not linked to hospital Hospital (0,n) has (0,1) 106 Clinic Unique key

  40. Transitive Dependencies • Suppliers have ‘city’ addresses • Each city has a status (relevant importance) • Status is transitively dependent on Supplier Id Supplier Id City Status

  41. Nutwood Hospital – Entity Modelling Exercise • The hospital runs clinic sessions during which patients can book appointments to see their doctors • Only one doctor is present at each appointment • Each clinic session is held in a single clinic • Patients may also undergo operations • Only one doctor will perform each operation • A series of operations are performed during one theatre session • Each theatre session is held entirely within one operating theatre. • Doctors may be authorized to work in one or more theatres

More Related