730 likes | 920 Views
DAMA, 2001 December. ORMvER. What’s Wrong With ER Modeling ?. Gordon C. Everest Carlson School of Management University of Minnesota. Problems and Solutions. ORMvER. OBJECTIVES FOR THIS PRESENTATION:
E N D
DAMA, 2001 December. ORMvER What’s Wrong With ER Modeling? Gordon C. Everest Carlson School of Management University of Minnesota
Problems and Solutions ORMvER OBJECTIVES FOR THIS PRESENTATION: • Show several PROBLEMS with ER modeling schemes,(actually, any “record-based” modeling scheme). • Identify the ROOT CAUSE of the problem To stop there would be irresponsible, so… • Show you a better way – a SOLUTION using Object Role Modeling (ORM) • NOT asking you to abandon what you have learned about data modeling and are doing in practice • BUT to defer thinking in terms of entity records, andto begin doing data modeling at a richer, more conceptual level
Data Modeling DMOD BEFORE WE CAN ANSWER THAT: Why Do Data Modeling? How do we do Data Modeling? Why do we need Normalization? What is the Dominant Data Modeling Scheme today? What’s Wrong with ER Modeling?
Database DesignObjective: (WHAT we are trying to do) DMOD TO ACCURATELY AND COMPLETELY MODEL SOME PORTION OF THE REAL WORLD UNIVERSE OF DISCOURSE (UoD) OF INTEREST TO SOME ORGANIZATION OR COMMUNITY OF USERS.
Logical Database DesignObjective, Principles, Benefits DMOD • OBJECTIVE of LOGICAL DATABASE DESIGN:TO ACCURATELY AND COMPLETELY MODEL SELECTED PORTIONS OF THE REAL WORLD OF INTEREST TO A COMMUNITY OF USERS. • USERS (COLLECTIVELY) WILL ALWAYS KNOW MOREABOUT A DATA STRUCTURE THAN THE SYSTEM KNOWS, OR THAN COULD BE DEFINED TO THE SYSTEM. • WHAT IS NOT FORMALLY DEFINED TO THE SYSTEM,THE SYSTEM CANNOT MANAGE . . . THE USERS MUST! • THEREFORE, NEED TO CAPTURE RICH SEMANTICSWITH COMPREHENSIVE DATA MODELING and DEFINITION, INCLUDING INTEGRITY CONSTRAINTS AND OPERATIONS. • FOR ==> GREATER QUALITY & RELIABILITY IN DATA • ==> GREATER USER CONFIDENCE. • ==> HIGHER USER / DEVELOPER EFFICIENCY • Let the ‘system’ do it!
Purpose of Data Modeling (WHY we do it)DUAL, CONFLICTING PURPOSES DRIVE THE PROCESS: DMOD USER • Facilitate Human Communication, Understanding, & Validation • capture and present meaning, the semantics of a model • direct representation of only essential model semantics • PRESENTATION CHARACTERISTICS: • scoping and presenting subparts of a Model • unfolding presentation at different levels of abstraction or detail • visual prominence in proportion to semantic importance • SECONDARY: • Basis for Implementation - defining & creating a Database • complete in all the necessary details • construction/generation able to be fully automated SCHEMA DATABASE
Modeling(Re).present.(ation) DMOD Knowledge externalized, formalized, shared. Knowledge in the head (mental models) Knowledge in the world Reality MODEL MODELING PROCESS present Re.present What drives or guides the process?
The Modeling Process DMOD MODELING SCHEME Context Constructs Composition Constraints METHODOLOGY: Steps/Tasks + Milestones + Deliverables + Real World Universe of Discourse perception selection/filtering MODELING PROCESS REPRESENTATIONAL FORMS: Narrative, Graphical Diagram, Formal Language Statements (the Syntax) MODEL
A Data Modeling “Scheme” DMOD DEFINES the: • Context • Constructs (ENTITIES, OBJECTS) • Collections, Compositions, Connections (RELATIONSHIPS) • Constraints, Characteristics WE LOOK FOR IN THE “REAL” WORLD UoD or Domain of Interest and USE IN BUILDING A DATA MODEL.
Data Modeling Constructs DMOD What to look for: Relative emphasis differentiates Data Modeling approaches RELATIONSHIP ENTITY (OBJECT) [ FOREIGN KEY ] IDENTIFIER ATTRIBUTE characteristics characteristics
Student-Course Database - Table Diagram Diagram of the Schema: DMOD COURSE Course# Title Description Credits COURSE OFFERING Course# Year Term Section Building Room Days Time Start Control Enrollment Instructor SSN STUDENT Student ID Name Address Major GPA REGISTRATION Course ID Student ID Grade INSTRUCTOR SSN LastName FirstName Address Phone Dept LEGEND: ENTITY NAME (upper case) Identifier (bold face) Attributes (not bold face) Foreign Key Identifier M:1 relationship What if you move the arrow head to the other end of the arc?
Student-Course Database – Populated DMOD Actualinstances of data values: STUDENT: StudentID Name Major GPA COURSE: Course# Title Credits COURSE OFFERING: CRSO# Course# Year Term Sect Room InstrID Enroll REGISTRATION: CRSO# StudentIDGrade INSTRUCTOR: InstrID Name Dept Secondary (Composite) Key
IDENTIFIER ATTRIBUTE ATTRIBUTE ... Data Modeling – Schema DiagramTHINKING ABOUT ATTRIBUTES: DMOD Record-Based: ENTITY
identifier Attribute1 Attribute2 Attribute3 ENTITY1 relationship> ENTITY2 ENTITY IDENTIFIER ATTRIBUTE ATTRIBUTE ... Attribute Essentials of ER Modeling / Diagramming DMOD RELATIONSHIP ENTITY1 ENTITY2 1 M Adding Attributes, omitting the Diamond: ENTITY1 ============ IDentifier 1 --------------------- Attribute 1.1 Attribute 1.2 Attribute 1.3 : ENTITY2 ============ IDentifier 2 --------------------- Attribute 2.1 Attribute 2.2 Attribute 2.3 ForeignID 1 :
ORMvER What’s wrong with ER Modeling? ________
ER / Record-based Modeling DMOD VALUE DOMAIN CLUSTERING of ATTRIBUTES into RECORDS/RELATIONS • NOT a necessary or desirable first step • gets us into trouble: if too much, must decompose to normalize VALUE DOMAIN VALUE DOMAIN VALUE DOMAIN ... roles ID ATTRIBUTES . . . TABLE: X A B C D
X A B C Record-based DesignWHAT SEMANTICS ARE PRESUMED BY THE FOLLOWING RECORD STRUCTURE? ORMvER • What does it say aboutX ? • What does it say aboutA? • What does it say about the relationshipX – A? • What does it say about the relationshipA – B? There are at least 14 distinct semantic statements you can make in answering these questions! • Do we know it is in Third Normal Form (3NF)? How?
X A B C Record-based DesignWHAT DOES IT SAY ABOUT X ? ORMvER
X A B C Record-based DesignWHAT DOES IT SAY ABOUT A ? ORMvER
X A B C Record-based DesignWHAT DOES IT SAY ABOUT THE RELATIONSHIP X–A ? ORMvER
X A B C A D ... Record-based DesignREPRESENTING THE RELATIONSHIP X–A ? ORMvER N
X A B C Record-based DesignWHAT DOES IT SAY ABOUT THE RELATIONSHIP A–B ? ORMvER
X A B C B A? ... A ... Record-based DesignREPRESENTING COMPLEX RELATIONSHIPS AMONG X, A, & B . ORMvER ? Separately consider the relationshipbetween A and B. What if it is many-to-many? What if other information is functionally dependent on A–B ?
X Y A B C Record-based Design - Compound KeyWHAT IS PRESUMED BY THE FOLLOWING RECORD STRUCTURE? ORMvER
Major Data Modeling Schemes DMOD Everest-DM-4p.121. (1) SINGLE FILE (E-A) • FLAT FILE “TABLE” • HIERARCHICAL - nested repeating groups e.g., COBOL (M) MULTIFILE (E-R → E-A-R) • NETWORK - hierarchical records • RELATIONAL (E-A-[R]) - flat records (O) NO FILE (O-R) (No Clustering of Data Items into Records) • NIAM/“Binary” Modeling • ORM (Object-Role Modeling - Halpin) RECORD-BASED (Clustered Data Items)
Data Modeling Schemes DMOD CLASSIFIED by Degree of Clustering: • No clustering • NIAM/ORM - Nijssen, Halpin • Clustering to One Level => Atomic Data Values • Relational Modeling - Codd • ER Modeling - Chen • Extended ER (EER) - Teorey • Information Engineering (IE) – Clive Finkelstein -> James Martin • Oracle (Designer*2000) - Barker • IDEF1X - Appleton, US Gov’t, ERwin (tool), Bruce (book) • • Nested Objects • Hierarchical data structure (single file; COBOL) • CODASYL Network (ANSI NDL) • Nested Relations • Semantic Object Modeling (SOM) – Kroenke, Salsa (tool) • Object Modeling (UML) – Rational Rose (tool) • ANSI SQL:1999
=> semantic/ OBJECT models Data Modeling Schemes – Clustered DMOD • single file • nested repeating groups • implicit hierarchical relationships HIERARCHIC special case • multifile, hierarchical record • defined relationships NETWORK • Focus on E & R, hidden record structure • Usually flat records [optionally with attributes] • Defined relationships (general M:N) • Usually restricted to binary relationships ER • Multifile; flat records only • Relationships as foreign keys so no M:N relationships RELATIONAL
Taxonomy of “Clusterered” Data Structures DMOD SingleFile MultipleFiles Clustered SINGLE FLATFILE (“TABLE”) RELATIONAL(“TABLES”) Flat Intra-RecordStructure HIERARCHICALFILE (CODASYL)NETWORK Nested
Stages of Data Modeling DMOD CONCEPTUAL USER Start at the highest Conceptual Level! Domain Knowledge ER CLUSTERED “LOGICAL” • ORM • Objects • Obj. ID’s • Roles/Relships • (Fnl. Dep) • NO clustering • => NO “attributes” Attribs in Records MultiValued, Nested - - - - - -> Ternaries - - - - - -> M:N - - - - - - - - - -> Normalized (2,3,4) Relationships - - -> w/attributes Sub/SupTypes RELATIONAL PHYSICAL Flat (1NF) Binary only 1:Many only Primary Keys Foreign Keys • Implementation in/for a DBMS • Denormalize • (for performance) • + triggers, stored • procedures SCHEMA DATABASE
Data Modeling - Representation Stages DMOD USER A SECOND CUT: • Conceptual (ORMHALPIN/NIJSSEN SUMMFULTON UDMCDMTG) • only what the user knows or needs to know • functional dependencies fully represented • Elementary Facts - no clustering of “attributes” into “records” • Clustered(ERCHEN EERTEOREY SDMMcLEOD SOMKROENKE SQL:99ANSI UML) • identifiers (attributes or dependent relationships) • keep: M:N, ternary relationships, super/subtypes, attributed relationships, multi-valued items/rgroups • “Logical” (RELATIONALCODD SQLANSI ) • flat files/tables; – stored identifiers; – 3NF (decompose) • resolve: M:N, ternary, super/subtype relationships • foreign keys to represent relationships • Denormalize(Recluster) - for performance • Physical (IMPLEMENTATION in a DBMS) • triggers, stored procedures, user code torepresent and enforce semantics beyond the DBMS. NEW SCHEMA DATABASE
Data Modeling Schemes - ER DMOD • ENTITIES, that have ATTRIBUTES, and participate in RELATIONSHIPS. • Originated with Peter Chen, 1976, TODS (1:1) • Notation has evolved, many variations • Drop diamond; attributes inside entity box or suppressed. • No standard syntax notation (but similar semantics) • Common: attributes clustered into entity records. • Most popular today • Weak entity - Association entity - • Relationship naming: one name, direction unstated,thus ambiguous; need direction (>) or rule (eg. left to right). M 1 works in EMPLOYEE DEPT … … EmpNo UnitNo EmpName Name
EMPLOYEE EmpNo (#) EmpName (*) Address (o) Data Modeling Schemes - Oracle DMOD • In Oracle Designer*2000 tool (R. Barker, A-W, 1990) • A flavor of ER modeling • ENTITY in rounded box; optionally ATTRIBUTES inside • ATTRIBUTE flags: # - [part of] identifier* - mandatory o – optional • RELATIONSHIPS: - binary only - two names at end from which to be read- optional ---, mandatory —, many - identifying ———, fixed ——— works in DEPT employs
Data Modeling Schemes - IE DMOD • Information Engineering (1970’s) • Due to Clive Finkelstein, adapted by James Martin • Used in several tools: IEF, IEW/ADW/Cool, ICES, … • Widely used, many variations, no single standard • ENTITIES: in boxes, optionally with ATTRIBUTES, in or out • RELATIONSHIPS: - usually binary only - many ——— , at most one ——— - optional ——— (at the “other” end) - mandatory, at least one ——— EMPLOYEE DEPT
EMPLOYEE DEPT EmpNo EmpName SS# (AK1) Address (O) UnitNo (FK) DeptNo DeptName : Data Modeling Schemes – IDEF1X DMOD • U.S. Air Force/Defense (1970’s), Appleton eXtensions • NIST (U.S. Govt) standard – 1993; revised in IDEF1X97; IEEE - 1998 • Book by T. Bruce, 1992; Used in ERwin (now from CA), Visio, … • Widely used in and for U.S. Govt work, some outside • Some Relational restrictions: Foreign Keys, thus no M:N • “Unnecessarily complex, confusing, and forgettable” - Halpin • ENTITY: independent - , dependent - • ATTRIBUTE flags: - Alternate Key - (AKi), Foreign Key - (FK) - optional (O) – mandatory is default • RELATIONSHIPS: - binary only, “child” ——— (may be arbitrary) - First Name always read toward the child - identifying —— , non identifying ------ “cardinality” on child: P - one or more, Z - zero or one, n - exactly n----- Parent is optional (some allow many parents) employs/works in
Forming a Relational Data Structure RELSQL Some rules: • Define a TABLE or “Relation” for each Entity type • Types of Entities: base/reference, dependent (“weak”), association/intersection, event/transaction • Assumes mutually exclusive (non-overlapping) populations • SINGLE-VALUED ITEMS (“flat” tables) • If multivalued or nested repeating group of items,put into a separate table • IDENTIFIER for every table (entity “integrity”) • FOREIGN IDENTIFIERS to represent all relationships 1:M - stored in the child / dependent entity 1:1 - should probably merge into one table M:N - must introduce an association/intersection table • NORMALIZE to second and third normal form • important for good design • but not enforced by RDBMS... WHY?
X A … Functional Dependency in RelationshipsBasis for Database Normalization. RELSQL A f (X) is functionally dependent on X A determines X A A is dependent on X, and the Relationship is exclusive on A, multiple on X. Clustered into a Record/table for entity of X: There can only be oneA for each X . There can be multiple Xs for a given A . There can be differentAs for the Xs .
Database Normalization RELSQL Start with ENTITIES, their IDENTIFIERS (unique keys)and their ATTRIBUTE FIELDS (facts about each entity).i.e., start with data items clustered into records/tables. PROBLEM: we may do it wrong; cluster too much; some items in the wrong place, which can lead to redundancy & update anomalies. Any Flat File is a Relation, but… not all Relations are “well-formed.” • NORMALIZATION is the test • a set of rules to perform internal validation of a data model • Record DECOMPOSITION is the remedy. • Removing attributes from the entity record, and placing them in a different, often a new entity record (1) First Normal Form: no multivalued items or rgroups. (2) Second Normal Form: no partial dependencies. (3) Third Normal Form: no transitive dependencies. “Every non-key data item must be single-valued, and dependent upon the key, the whole key, and nothing but the key… so help me Codd.”
EMPLOYEE# EMPNAME SKILL PROFICIENCY … BOSSNAME DEPT# DEPTNAME Anomalies RELSQL Resulting from (clues to) poor database design: • DEPTNAME and BOSSNAME stored redundantly • if EMPLOYEE moves to another DEPT#, DEPTNAME and BOSSNAME would also change, needing update. • If a DEPTNAME (or BOSSNAME) for a DEPT changes, must updateall occurrences, else inconsistency. • To delete a DEPT you must also delete all its EMPLOYEEs (unless null foreign keys allowed!) • If you delete the last EMPLOYEE in a DEPT, you also delete that DEPT (unless null keys allowed!…multiple?) • No place to insert a DEPT# and its DEPTNAME, if there are no EMPLOYEEs there.
Summary of all Normal Forms RELSQL GIVEN: • a set of attributes, clustered into tables/records with identifiers • all functional dependencies on the attributes • No multi-valued, non-key attributes (1NF) • No partial dependencies on non-key attributes (2NF) • No transitive dependencies in non-key attributes (3NF) • No partial or transitive dependencies within any key (EKNF, BCNF), i.e., consider all candidate keys. • No multiple, independent multi-valued attributes in the same table (4NF) • No join dependencies, i.e., a relation can be reconstructed without loss of information by joining some of its projections (5NF). • No more than one table with the same key (“minimal”). • No transitive dependencies across tables (“optimal”). NOTE: number order is artificial, i.e., there is no necessary sequence to the normal forms.
X A X A B X A B X A B Normalization – Testing your Understanding RELSQL Assuming that A is single valued with respect to X (i.e. 1NF). GIVEN: Could you have a violation of: (if not, why not?) 2NF? 3NF? 4NF? 2NF? 3NF? 4NF? 2NF? 3NF? 4NF? What does this diagram mean? How does this differ from diagram above, if any? MUST DISTINGUISH THE PRIMARY KEY .
EMPLOYEE PROJECT EMPL-ID PROJ-ID Representing a M:N Relationship DMOD Another Pattern: • If you cannot store multiple Projects (or Project IDs) in an Employee record, or multiple Employees (or Employee IDs) in a Project record (as is the case in a Relational Database), then … EMPLOYEE PROJECT you must introduce an “Intersection Entity” between them to represent the Many-to-Many Relationship. • The Intersection Entity also provides the place to store additional attributes of the relationshipe.g., Hours Worked, Rate of Pay, … What is the problem with this representation? N
Representing a Ternary Relationship DMOD While we can develop a consistent notation for binary relationships, ternary relationships are a problem. EMPLOYEE SKILL PROFICIENCY • If one of the entities is single valued, is it really ternary? Or “attributed” binary? • What lends uniqueness to each instance of the relationship? • How to verbalize the relationship? Which order? • How to represent Multiplicity / Exclusivity ? • How to represent Dependency? Must have all 3?
What’s Wrong with ER Modeling? ORMvER I will show you still a more excellent way – PAUL, I Cor 12.31 N
X A B C X A X B X C Record-based Design BWHAT DOES THIS “RECORD” REPRESENT? ORMvER Design minimal records with at most one non-key domain. Now what do these “records” represent?Perhaps Codd was right in naming it a _________! Avoids spurious associations, e.g., A – B …Could there be any violations of normal forms? What about the representation of the entityX ? What if A is related to other “entities”?
X A B C X A X B X C Transform Record-based (ER) DesignTO REALLY REPRESENT THE ENTITY DOMAINS ORMvER A Object Role Model: X B C
IDENTIFIER ATTRIBUTE ATTRIBUTE ... Data Modeling THINKING ABOUT ATTRIBUTES: ORMvER Record-Based (ER): ENTITY ENTITY ENTITY Object-Role (ORM): ENTITY ENTITY (id) ENTITY ENTITY ENTITY ENTITIES have ATTRIBUTES / DESCRIPTORS by playing roles in relationships with other entities.
Record-Based Modeling ORMvER GIVEN TWO FACTS (conceptually): • one about the CITY a PERSON lives in • another about the CITY a PERSON works in ASSUME: • every person has to live and work in a city • each person can live and work in only one city at a time • not interested in anything more about persons or cities EXAMPLE: • Gordon Everest lives in Falcon Heights and * works in Minneapolis DIAGRAM A CONCEPTUAL DATA MODEL • to represent this information (a database to contain these facts)
Record-Based Data Model for PERSON lives in / works in a CITY ORMvER • What is the entity and what is the attribute? • Would it make any sense to say(to a novice layperson - a user): • CITY was an "attribute" of PERSON? • Doing more than is necessary at the conceptual level PERSON • cannot have CITY and CITY as attributes of PERSON • column/attribute name reflects " entity + role " • CITY as an entity/object is lost (not its own table) • what if there is a CITY where no one lives or works • some add concept of a DOMAIN PersonID [key] LiveCity WorkCity
lives in works in Object-Role Model for PERSON lives in / works in CITY ORMvER FORML language statements: • PERSON lives in CITY • Every PERSON lives in some CITY • Each PERSON lives in at most one CITY • ... for works in PERSON (id) CITY (name) FACT
Record-Based Modelingfor an additional fact. ORMvER • A PERSON makes sales calls in multiple CITIES DIAGRAM the extended conceptual data model • can you add an attribute "SalesCallCities" to PERSON? FLAT Record-Based Modeling is even worse: • create a new table SALESCALLS with a compound key • Is this a real entity in the conceptual view? EXTEND THE OBJECT-ROLE DATA MODEL