310 likes | 410 Views
CEIE 685 Database Design. Instructor Mohan Venigalla http://mason.gmu.edu/~mvenigal. Agenda/Topics 3/9/06. Midterm after the break Closure on keys and normalization BCNF DKNF Relationship traps Steps in database design Theory Learn them by implementing (Access workshop).
E N D
CEIE 685Database Design Instructor Mohan Venigalla http://mason.gmu.edu/~mvenigal CEIE 685 Database Design
Agenda/Topics3/9/06 • Midterm after the break • Closure on keys and normalization • BCNF • DKNF • Relationship traps • Steps in database design • Theory • Learn them by implementing (Access workshop) CEIE 685 Database Design
Closure on Normalization • Normalization 1-3rd NF - Summary • 1NF – removes (more like, masks) repeated groups by: • placing a single value for the cell at which each row and column intersect • 2NF – removes partial dependencies on the primary keys • 3NF – removes transitive dependencies on primary key • There is more to normalization • Boyce-Codd Normal Form (BCND) – removes remaining anomalies form the tables CEIE 685 Database Design
Stages of Database Application Cycle Database Planning Adapted from:Connolly and Begg, Database Systems: A Practical Approach to Design, Implementation and Management System Definition Requirements Collection Conceptual Database Design DBMS Selection (Optional) Application Design Logical Database Design Physical Database Design Prototyping Implementation Data Conversion & Loading Testing Operational Maintenance CEIE 685 Database Design
Database Design • Three main phases • Conceptual database design • Conceptual representation • i.e. identification of entities, relationships and attributes • Logical database design • Translate conceptual representation to logical structure • Includes designing the tables and • Normalization • Physical database design • Decide how logical structure is physically implemented • DBMS specific CEIE 685 Database Design
Definitions • Conceptual db design: • The process of constructing a model of the information used in an enterprise, independent of all physical considerations. • Logical db design: • The process of constructing a model of the information used in an enterprise based on a specific data model, but dependent of a particular DBMS and other physical considerations • Physical db design: • The process of producing a description of the implementation of database on secondary storage; it describes: • The base relations, file organizations, and indexes used to achieve efficient access to the data. • Associated integrity constraints and security measures CEIE 685 Database Design
Critical Factors for Successful Design • Work interactively with users as much as possible • Follow a structured methodology throughout the data modeling process • Employ data driven approach • Imagination is only good to some extent! • Incorporate structural and integrity constraints into the data models • Combine conceptualization, normalization, and transaction validation techniques into the data modeling methodology • Use diagrams to represent as much of the data models as possible • Use a database design language (DBDL) to represent additional data semantics that cannot easily be represented in a diagram • Build a data dictionary to supplement the data model diagrams and the DBDL; and • Be willing to repeat steps CEIE 685 Database Design
Design Phase IConceptual Database Design CEIE 685 Database Design
I - Conceptual Database Design • Major steps • Identify entity types • Identify relationship types • Identify and associate attributes with entity or relationship types (ER Model/Data Model) • Determine attribute domains • Determine candidate and primary keys attributes • Consider the use of enhanced modeling concepts (optional) • Check model redundancy • Validate local conceptual model against user transactions • Review local conceptual model with user CEIE 685 Database Design
Conceptual Design – 2 • Identity entity type • What are the tables/entities going to be? • Their names and aliases, and descriptions • A conceptual outline/description of the tables/entities • Identify relationship types • How will the ER diagram look like? • How are the tables related to each other? • Will there be any relationship traps? • What will be the multiplicity constraints of the relationships? • Identify and associate attributes with entity or relationship type • Simple/composite attributes • Single/multi-valued attributes • Decisions on derived attributes – need to store them or a part of the data “processing” • What are the potential problems? CEIE 685 Database Design
Conceptual Design – 3 • Determine attribute domains • What will be the range of values? • Male – M, or 01; Female – F or 02? • Coded data, say, 1 thru 9. Then, how to represent “Not Applicable?” Ans:____ • Numbered data, say, min 1 and max 9 • What will be the allowable set within this range of values? • What will be the sizes and formats of attributes? • Determine candidate and primary key attributes • Identify a set of candidate keys • Then select the primary key based on these guidelines: • The candidate key with minimal set of attributes • The candidate key that is LEAST likely to change its value • The candidate key with fewest of characters • The candidate key with smallest max value • The candidate key that is the easiest to use – from the users’ point of view! • Consider the use of Enhanced ER modeling concepts (Optional) • Aggregation/composition • Specialization/generalization CEIE 685 Database Design
Conceptual Design – 4 • Check model for redundancy • Reexamine one-to-one relationships • Two tables, identical setup of, candidate keys, primary keys and attributes with almost same cardinality – likely duplicates • Remove redundant relationships • Merge identical tables and remove duplicate attributes • Validate local conceptual model against user transactions • Transaction pathways are much like the arrows we have used to identify the “flow” of data among entities • Recall the snack-food store example • Will the user agree with your interpretation? • Review local conceptual model with the user • Sit with the users and explain to them your conceptual model • Seek their inputs and modify the model as necessary! CEIE 685 Database Design
Design Phase IILogical Database Design CEIE 685 Database Design
II - Logical Database Design • Two stages • Building and validating local logical model • Building and validating global logical model • Stage 1: Local logical model • Remove features not compatible with the relational model (optional step) • Derive relations for local logical model • Validate relations using normalization • Validate relations against user transactions • Define integrity constraints • Review local logical data model with the user CEIE 685 Database Design
Logical Database Design – 2 • Stage 2: Build and validate global data models • Merge local logical data models into global model • Validate global logical data model • Check for future growth • Review global logical data model with the users • Local vs. global (Stage 1 vs. Stage 2): • The local model represents different (or individual) user views • What payroll sees vs. what the accounting sees • The global model represents ALL user views • What will be the requirements of all departments combined? CEIE 685 Database Design
Local Logical db Design • Remove features incompatible with relational model. Here are some no-nos • Many-to-many binary relationship types • Many-to-may recursive relationship types • 1:1 recursion is a OK • Complex relationships • Typically, a relationship connecting three or more entities • Multi-valued attributes • Derive relations for local logical data model • In this step we actually derive tables to represent entities, relationships, and attributes defined in the user view. • To derived or decided here include • Strong and weak entity type • Binary relationships of multiplicity 1:*, 1:1 • Mandatory or voluntary participation on both sides of 1:1 relationship • 1:1 recursive relationships CEIE 685 Database Design
Local Logical db Design – 2 • Validate relations using normalization • Identify functional dependencies • Identify primary key(s) • Derive 1NF, 2NF, 3NF and higher orders, as fit • 1NF – removes repeating groups • 2NF – removes partial dependencies on the primary key • 3NF – removes transitive dependencies on the primary key • BCNF – eliminate all anomalies • Define Integrity constraints • Five types • Required data • Attribute domain constraints • Entity integrity • Referential integrity • Enterprise constraint CEIE 685 Database Design
Integrity Constraints • Required data • Some attributes must always contain a valid value – no nulls • Example: every employee has a staffno and a position title! • If missing – do not enter it into the database, period! • Attribute domain constraints • What do you mean you want to enter a “N/A” code for the attribute “Gender”? • Entity integrity • NO, I repeat, NO records should have a null in the primary key CEIE 685 Database Design
Integrity Constraints – 2 • Referential integrity: • When the related or same attribute values exist in two or more relations • Example: • PROJECTS is an entity EMPLOYEES is another. EMPLOYEE-PROJECT assignment is another entity. When a project is completed, in table EMPLOYEE-PROJECT there may not be any entries for completed projects (they are deleted) • For referential integrity, there is a parent relation and there is/are one or more child relations linked by an attribute set called foreign key! • Participation of the child relation in the relationship is • Mandatory – then nulls are NOT allowed for foreign key • Optional – then nulls are allowed CEIE 685 Database Design
Referential Integrity Constraints • Six possible cases: • (When) Inserting a tuple into child relation • For mandatory participation • The foreign key has to have a valid value • That valid value must also exist in parent relation • For optional participation • The parent relation may or may not have an equivalent value in the parent • The foreign key can also have a null • Deleting a tuple from child relation • Will not affect the parent relation at all • Updating a foreign key of child tuple • See how it operates for case (1) above. Update is much similar • Insert tuple in to parent relation • Won’t affect the referential integrity with the child CEIE 685 Database Design
Referential Integrity Constraints - 2 • Six possible cases (continued): • (When) Deleting a tuple from parent relation - Houston, we have a problem!! • Need to be absolutely careful here! • Question – what happens? • Answer:__________________________ • Several strategies – let the system to the following • NO ACTION: What is it?_____________________ • CASCADE: What is it?______________________ • SET NULL: What is it? _____________________ • SET DEFAULT: What is it?__________________ • NO CHECK: What is it? ____________________ • (When) Updating the primary key of the parent tuple • Referential integrity is lost!! CEIE 685 Database Design
Integrity Constraints – 3 • (This item is after the referential integrity constraint – which was covered in more depth) • Enterprise constraints • Also called business rules • Updates to entities • Constrained by enterprise rules governing the “real world” transactions • Example: No person should be managing more than 5 projects at any given time. • Documentation • Documentation of all the integrity constraints is an important step in the logical design process!! CEIE 685 Database Design
Global Logical db Design • This is the stage 2 in the Logical design • (4 steps in this stage – see the slide earlier) • Merge local logical data models into global model • Some typical tasks • Review the names and contents of entities/relations and their candidate keys • Review the names and contents of relationships and foreign keys • Merge entities/relations from the local models • Include, without merging, entities/relations unique to each local data model • Merge relationships/foreign keys form the local data models • Include (without merging) relationships/foreign keys • Check for missing entities/relations and relationships/foreign keys • Check foreign keys • Check integrity constraints • Draw global ER diagram • Update documentation CEIE 685 Database Design
Global Logical db Design – 2 • Validate global logical data model • Check for future growth • Are there any significant changes likely in the foreseeable future? • How can the global logical data model can accommodate these changes? • Review global logical data model with users CEIE 685 Database Design
Design Phase IIIPhysical Database Design CEIE 685 Database Design
III - Physical Database Design • Stage 1: Translate global data model for target DBMS • Design base relations • Design representation of derived data • Design enterprise constraints • Stage 2: Design physical representation • Analyze transactions • Choose file organizations • Choose indexes • Estimate disk space requirements • Stage 3: Design user views • Stage 4: Design security mechanisms • Stage 5: Consider the introduction of controlled redundancy • Stage 6: monitor and tune the operational system CEIE 685 Database Design
Translating Global Data • Design base relations • The name of the relation • A list of simple attributes (in parenthesis) • The primary key, and, where appropriate, alternate keys and foreign keys • A list of derived attributes and how they should be computed • Referential integrity constraints for any foreign keys identified • Design representation of the derived data • How will you represent the derived or computed data • Design enterprise constraints • Can be implemented as a part of SQL • Example: CONSTRAINT StaffNotManagingTooManyProjects CHECK (NOT EXISTS (SELECT blah blah)) CEIE 685 Database Design
Design of Physical Representation • Objectives: • To determine optimal file organizations to store the base relations and the indexes • How do you measure efficiency? • Transaction throughput • Response times – elapsed time for completion of single transaction • Disk storage • Have to understand system resources • Memory • CPU • Disk I/O • Networking CEIE 685 Database Design
Design of Physical Representation – 2 • Analyze Transactions • Why? To identify performance criteria such as: • The transactions that are run frequently will have a significant impact on the performance • The transactions that are critical to the operation of business • Map all transaction paths to relations (a cross-tab of sorts) • Determine which relations are most frequently accessed by transactions • Analyze data usage of selected transactions that involve these relations CEIE 685 Database Design
Design Physical Representation – 3 • Choose file organizations • Objective: to determine an efficient file organization for each base relation • Methods: • Heap • Hash • Indexed Sequential Access Method (ISAM) • B+-tree • Clusters • Choose indexes • Index Candidates are the attributes that are used most often: • For join operations or search operations • To access the tuples in a relation in order of that attribute • Estimate disk space requirements CEIE 685 Database Design
Other Stages in Physical Design • Stage 3: Design user views • Draw the GUI (look and feel) • Stage 4: Design Security measures • System security • Data security • User level access • Stage 5: Consider introduction of controlled redundancy (advanced topic in design) • Stage 6 Monitor and tune the operational system CEIE 685 Database Design