350 likes | 627 Views
Flawless Logical to Physical Data Model Transformations. Bert Scalzo, PhD. Database Domain Expert Bert.Scalzo@Quest.com. About the Author …. Database Domain Expert & Product Architect for Quest Software Oracle Background:
E N D
Flawless Logical to Physical Data Model Transformations Bert Scalzo, PhD. Database Domain Expert Bert.Scalzo@Quest.com
About the Author … Database Domain Expert & Product Architect for Quest Software Oracle Background: • Worked with Oracle databases for over two decades (starting with version 4) • Work history includes time at both “Oracle Education” and “Oracle Consulting” Academic Background: • Several Oracle Masters certifications • BS, MS and PhD in Computer Science • MBA (general business) • Several insurance industry designations Key Interests: • Data Modeling • Database Benchmarking • Database Tuning & Optimization • "Star Schema" Data Warehouses • Oracle on Linux – and specifically: RAC on Linux Articles for: • Oracle’s Technology Network (OTN) • Oracle Magazine, • Oracle Informant • PC Week (eWeek) Articles for: • Dell Power Solutions Magazine • The Linux Journal • www.linux.com • www.orafaq.com
Books by Author … Coming in 2008 …
Agenda • Purpose • Identify issues arising from over reliance on modeling tools • Illustrate Top 10 most common modeling issues faced when transforming data models from logical (conceptual) to physical • Describe how to correctly identify these issues • Explain why these issues are serious problems • Provide Best Practices to resolve these issues • Overview • Primary, Unique and Foreign Keys • Inheritance (i.e. super/sub-types) • Relationship Dependencies • Normalization/Denormalization
World of Data Modeling … • Identify all data & relationships - E/R (Entity/Rel’ship) diagrams - Database independent view • Business Rules • Focus=Effectiveness • Bus. Analyst • Data Architect • Data Analyst Logical Data Modeling (.TXL file) • DBA • DB Developer • DB Architect • Database platform specific • Reverse engineer existing DB • Create/Update DB from model • Focus=Efficiency Physical Data Modeling (.TXP file) Toad Data Modeler synchronizes data models from all levels into a single tool
10 Most Common Logical to Physical Data Modeling Transformation Issues Here we go…
1. Many to Many Relationships • You can NOT physically implement many to many relationships • You may potentially miss multiple key business requirements • Many modeling tools will attempt to automatically resolve this
1. Resolution • Need to accurately model true business requirements yourself in logical… Intersection or Bridging Entity may have its own: • Attributes • Unique ID’s • Relationships • Lookup • Parent/Child
2. Avoid Partial Unique Keys • You SHOULD NOT physically implement partial unique keys • You may invalidate or corrupt key business requirements • Some Databases (i.e. Oracle) can surprise you on how works Toad Data Modeler will prevent Logically Wrong Only an issue for composite unique keys • Modeling tool generates initial physical database design • Modeler/Architect/DBA often incorrectly modifies design • Change column to allow Null to reduce constraints Database will allow unintended results (see next slide)
Issue 3: Avoid CandidateKey Loss • You SHOULD NOT lose candidate or alternate unique ID’s • You may potentially miss multiple key business requirements All these alternate or candidate keys exist due to some business requirements
3. Effect of Incorrect Change • Modeling tool generates initial physical database design: Index Count = 4 • Modeler/Architect/DBA often incorrectly modifies design • Remove indexes to increase efficiency, but now allow bad data Eliminated indexes to increase efficiency at the cost of business requirements!!!
4. Avoid Surrogate Key Loss Only an issue when replacing primary key with surrogate/artificial key • DO NOT lose unique ID’s when convert to surrogate keys • May potentially miss multiple key business requirements • This is actually a special (extended) case of prior issue
4. Design Generated by Tool • Note that the two FK’s are part of the PK
Issue 4: Effect of Incorrect Change • Modeler/Architect/DBA often incorrectly modifies design Wrong – lost alternate key Right – has new & old keys
5. Avoid Partial Foreign Keys Referential integrity requires that for each row of a child table, the value in the foreign keymatches a value in a parent key. Only an issue for mandatory, composite foreign keys • DO NOT physically implement partial foreign keys • May invalidate or corrupt key business requirements • Some Databases (i.e. Oracle) can surprise you on how works
5. Effect of Incorrect Change • Modeling tool generates initial physical database design • Modeler/Architect/DBA often incorrectly modifies design Toad Data Modeler will prevent Oracle Concepts: Partially null composite foreign keys are permitted. If any column of a composite foreign key is null, then the non-null portions of the key do not have to match any corresponding portion of a parent key. That mean’s no RI check!!! (same issue as unique keys)
6. Avoid Indirect Foreign Keys Note there is no business requirement to relate Entity_1 to Entity_3 • You SHOULD NOT physically implement implied FK relationships • You may potentially enforce invalid business requirements • You may needlessly add additional performance overhead
6. Effect of Incorrect Change • Modeler/Architect/DBA often incorrectly modifies design Superfluous FK and not a true business requirement
7. Avoid Bogus Foreign Keys Referential integrity requires that for each row of a child table, the value in the foreign keymatches a value in a parent key. Many conceptual (logical) modeling tools will not permit you to construct questionable scenarios since the relationship lines implicitly reflect the association. The physical details are not really known until implementation, which is exposed during the physical modeling process. But there the tools generally permit DBA’s to apply their insight to make things better …
7. Effect of Incorrect Change • Modeling tool generates initial physical database design • Modeler/Architect/DBA often incorrectly modifies design Toad Data Modeler will prevent A foreign key pointing to a non primary or unique key, what does that mean???
8. Problematic Relationships • Many relationships CAN be logically modeled and physically implemented… • BUT should they be??? • Example 1 • Example 2 I’m a peon, I manage no one (recurse no bottom) I’m the CEO, I have no boss (recurse no top)
8. Problematic Relationships • Example 3 Which came first? (Circular Logic)
8. Problematic Relationships • Example 4: Should you perform “Unification” of FK’s??? Keep Both or Combine?
9. Using Normal Forms • “Normalization” is often quoted, but generally not very well understood. Some quick facts: • Goal is to minimize data redundancy in order to lessen the likelihood of inconsistent data • Side effect of reducing data storage needs • But is this important given today’s cheap disk… • Useful primarily in OLTP and ODS database designs • Normal forms are cumulative (e.g. 2NF includes 1NF) • Easy jingle to remember: • “Depends on the key, the whole key, and nothing but the key – so help me Codd”
9. Common NF Violations 1NF – Attributes are all single valued, there are no repeating groups or arrays 2NF – All non-identifying attributes are dependent on the entity's entire unique identifier (only applies when have compound unique ID’s) 3NF – A non-identifying attribute CAN NOT be dependent upon another non-identifying attribute
10: Super and Sub Types How should you physically implement super and sub types? There are three valid options …
10. Option 1 - One Big Table • Generate Parent = Y and Generate Children = N • Requires Discriminator attribute (e.g. Account Type) • Violates third normal form (… nothing but the key …) • PRO: Easy to code against, just one big table … • CON: All child columns optional, so need table check constraint
10. Option 2 - Table per Sub Type • Results in N-1 tables • Gen. Parent = N, Gen. Children = Y, Inherit All Attributes = Y • PROS: All child columns implemented as expected • CON: Two tables to code against …
10. Option 3 - Table per Super and Sub Type • Results in N tables • Gen. Parent = Y, Gen. Children = Y, Inherit Only Primary Attr. = Y • NOT RECOMMENDED: Just Plain Overkill
Logical/Conceptual to Physical Transformation • Check List: • Verify everything with the business analysts and end users • Verify everything with the business analysts and end users • Verify everything with the business analysts and end users • Use your software’s model checking utilities and/or reports • Every entity must have unique identifier (as per Chen) • Resolve many-to-many relationships (cannot be built) • Double check isolated entities (i.e. no relationships) • Look for very common, generic modeling patterns • Use your software’s generate physical model utility • NOTE – Generated physical model will require DBA review …
Refining the Physical Model • Check List: • Verify that nothing was lost in translation from logical to physical • Add table(s) required for implementation, but not modeled • eg. Lookup tables • Use your software’s model checking utilities and/or reports • Every table should have primary key • Add foreign key relationship meta-data • Add indexes to support data access needs (lots of work) • Use your software’s generate SQL or DDL script utility • REVIEW THE SCRIPT! • Never just run SQL without looking at it
Parting Thoughts ??? • Data Modeling tools do not automatically = good design • Must do complete business analysis • Must do adequate Conceptual -> Physical transformation • Must add required physical meta-data (tuning & insight) • Many of the worst DB’s built result from failure to do the above • There are many other modeling issues – this was just a start … • Breaking models into sub-models • Round-trip Engineering: • Conceptual -> Physical Model compare and sync • Physical Model -> Database compare and sync • Repository-based collaborative modeling • Horizontal and Vertical Partitioning • Data Warehousing (Star Schema design) • Object-Relational Mapping • etc, etc, etc …
Thank you • Please offer any questions or comments • Remember: • Toad Data Modeler – data modeling for the rest of us • Robust, yet Inexpensive • Both easy to learn & use • www.quest.com/toad_data_modeler • Modeling White Papers • www.quest.com/documents/list.aspx?SearchOff=true&ContentTypeID=1&prod=306 • Data Modeling: Common Mistakes and Their Impact • Data Modeling: It's Really All About the Relationships • Data Modeling: Reality Requires Super and Sub Types