1 / 41

Logical Database Design

Logical Database Design. Chapter 4. Agenda. Evolution of Data Models Chapter 1 pgs 25 – 28 Chapter 9 pgs 409 – 418 Relational Database Model Transforming ERDs into Relations Referential Integrity Normalization. The Evolution of Data Models. Hierarchical Network Relational

pearl
Download Presentation

Logical Database Design

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Logical Database Design Chapter 4

  2. Agenda • Evolution of Data Models • Chapter 1 pgs 25 – 28 • Chapter 9 pgs 409 – 418 • Relational Database Model • Transforming ERDs into Relations • Referential Integrity • Normalization

  3. The Evolution of Data Models • Hierarchical • Network • Relational • Object oriented* • Multi-dimensional • NoSQL

  4. Hierarchical Data Model • Each parent can have many children • Each child has only one parent • Tree defined by path that traces parent segments to child segments, beginning from the left • Hierarchical path • Ordered sequencing of segments tracing hierarchical structure

  5. Problem: Child with Multiple Parents

  6. Hierarchical Data Model, cont… • Advantages • Database security • Performance, efficiency • Data independence • Disadvantages • Complex implementation • Structural dependence • Complex applications programming and use • Lack of standards

  7. Network Data Model • Created to: • Represent complex M:M data relationships • Child can have many parents • Impose a database standard • Resembles hierarchical model • Collection of records in 1:M relationships • Sets • Implement relationships • Composed of: • Owner • Member

  8. Network Data Model, cont… • Advantages • Handles more relationship types • Conformance to standards • Disadvantages • System complexity • Lack of (popular) product support

  9. Relational Data Model • Developed by Codd (IBM) in 1970 • Represents data in the form of tables • Based on mathematical theory • 3 Components: • relational database structure • relational rules (integrity) • relational operations (manipulation)

  10. Relational Data Model • Advantages • Structural independence • Improved conceptual simplicity • Easier database design, implementation, management, and use • Ad hoc query capability • Mathematical foundation • Disadvantages • Hardware and system software overhead • Can facilitate poor design and implementation • May promote “islands of information” problems

  11. Relational Theory Components • Relational Database Structures • Rules of Relations • Relational Operators

  12. 1. Relational Database Structure • Relations, Tuples, Attributes • Tables, Rows, Columns • Files, Records, Fields • Primary Key must be designated • Foreign Keys must be designated for relationships CLASS TABLE

  13. 2. Rules of Relations • Relation names must be unique • Entries in columns are atomic (single valued) • Entries in column are from same domain • Each row is unique • Ordering is insignificant CLASS Table

  14. 2. Rules of Relations, con’t... • Data in tables should be added, updated, and deleted without errors • avoid inconsistency ==> referential integrity • insertion • update • Deletion • avoid anomalies ==> normalization • insertion • update • deletion

  15. 3. Relational Operators * • Relational Algebra • UNION (+) • INTERSECTION • DIFFERENCE (-) • PRODUCT (x) • SELECT (tuples) • PROJECT (attributes) • JOIN (PRODUCT, SELECT, PROJECT) *Diagram adapted from Hyperion presentation, http://infolab.stanford.edu/infoseminar/Archive/FallY99/russakovskii-slides/sld001.htm

  16. Converting ERD to Relational Model • Represent entities as relations • Represent relationships as either: • foreign keys in relations • new relations • Provide sample data • Normalize relations

  17. Representing Entities as Tables • Each entity converted to a relational table • attributes become columns • primary key must be designated • regular entities have atomic keys • associative entities have composite keys • subtype entities have same key as supertype • example entity instances are rows of table

  18. ERD Example Problem Revisited • A company sells products to customers • Customer requests generate orders • Orders may consist of many ordered products • Products may be contained on many orders, or no orders at all

  19. ERD Example Converted to Tables

  20. Representing Relationships • 1:1 • merge attributes into single table; • OR create foreign key (FK) in either relation • 1:M • create foreign key (FK) in relation on “many” side of relationship • M:M • should’ve been eliminated on ERD!!! • create new relation with PKs of related entities as (1) concatenated PK, and (2) FKs in new relation

  21. Referential Integrity • Maintains consistency between data in related tables • For every value of a foreign key there must be an existing primary key with that value • Create rules/constraints for: • insertion of foreign keys • update and deletion of primary keys

  22. Adding Referential Integrity Constraints (PK) D:R, U:C (PK) (FK) D:R, U:C (PK) (PK) (FK) (FK) D:R, U:C (PK)

  23. Adding Referential Integrity Constraints, cont…

  24. Normalization • Convert complex relations into simpler relations • Why? • Ensures relations conform to rules • Ensures relation contains facts about one “theme” • Reveals/corrects redundancies, errors, ambiguities in data model • Only a simple check IF good data model exists • Normal Forms • state of a relation • rids relations of potential anomalies

  25. Order ID Order Date Customer ID Customer Name Customer Address Normalization, con’t... • Insertion Anomalies • are experienced when we attempt to store a value for one field but cannot because the value of another field is unknown • e.g., cannot add a new customer’s information until an order number is ready to be entered (PK)

  26. Order ID Order Date Customer ID Customer Name Customer Address Normalization, con’t... • Deletion Anomalies • are experienced when a value for one field we wish to keep is unexpectedly removed when a value for another field is deleted • e.g., cannot delete the sole order for a customer without deleting the only copy of the customer’s information also (PK)

  27. Order ID Order Date Customer ID Customer Name Customer Address Normalization, con’t... • Update Anomalies • are experienced when changes to multiple records of a table are needed to effect an update to a single value of a field • e.g., cannot completely update a customer’s address without changing it for every order placed by that customer (PK)

  28. Steps in Normalization

  29. Normalization, con’t... • Every attribute is dependent on: • the key (1NF) • the WHOLE key (2NF) • and nothing but the key (3NF)

  30. 1NF • The table is a relation • Primary key • No repeating values or groups • only atomic values • All column values from same domain • To correct: • define new (usually associative) entity

  31. 2NF • 1NF + No partial functional dependencies • (Full) Functional dependency • when the value of one attribute can be determined based on the value of another attribute • Partial functional dependency • when a non-key attribute is functionally dependent on a part of the PK • Already in 2NF if: • PK is NOT concatenated • Relation contains no non-key attributes • To correct: • Decompose into 2 or more relations (if not already) • one with original (concatenated) key + attributes • one (or more) with the “depended on” partial key as PK +attributes

  32. 3NF • 2NF + No transitive dependencies • Transitive dependency • a functional dependency between 2 non-key attributes • when a non-key attribute is functionally dependent on another non-key attribute • Already in 3NF if: • only 0 or 1 non-key attributes in relation • To correct: • Decompose into 2 or more relations (if not already) • one with original PK + attributes • one (or more) with “depended on” non-key attribute as PK + attributes

  33. Complex Data Landscape NOTE: This diagram is for effect ONLY—it is incomplete (e.g., no MDDB, no OODB) AND contains some inaccuracies

  34. Multidimensional Data Model • Data represented as cube > Modeling and analyzing of data across dimensions • Cube depicts business measures analyzed by dimensions • Optimized for decision-making vs. transaction processing • Data storage • Pre-aggregation • De-normalization • Basis for data warehouses

  35. Multidimensional Data Model, cont… • Advantages • Faster performance for decision-based queries • Read-only • Summary data • Data viewed in many ways/dimensions • Data analyzed over time • Easier analyses by decision-makers • Visual interfaces • Typically point/click, drag/drop • Disadvantages • ETL • Data redundancy

  36. Big Data • Big Data = more than you're able to effectively process • Influenced by Mobile, Social Networking, Web analytics, RFID, Atmospheric, Medical Research data, … • Traditional RDMS Problems • Transaction-focus • Requires schema  maintenance issue • ACID-focus • Requires locks, db constraints, joins  performance & availability issues • "Relatively" small amounts of operational data • Exceptions require complex, $ actions  scalability issue Reference: http://www.slideshare.net/dondemsak/intro-to-big-data-and-nosql

  37. Big Data Solutions • Hadoop • Good for storing and retrieving large amounts of (mostly unstructured) data • HDFS: data distributor • MapReduce: request distributor • Columnar Databases • Good for data warehouses, analytics • Files contain all column values vs. all row values • NewSQL Databases • Good when high scalability needed with relational DBMSs • Re-written, highly optimized storage engines • "Sharding" (i.e. horizontal partitioning across multiple DB instances) • Distributed query processing • In-memory databases • NoSQL Databases…

  38. Big Data Solutions, cont… • NoSQL Databases • Focus (in most cases): • Scalability via Physical Distribution of data • No fixed schema • "Individual query systems" instead of SQL • Support for semi- and un-structured data • Some provide consistency • Apache's Hbase • Amazon's DynamoDB • Most provide "eventual consistency" • Google's BigTable • Facebook's Cassandra • Amazon's SimpleDB • Use a variety of data models…

  39. Big Data/NoSQL, cont… • NoSQL Physical Data Models • Column (Family) Store • Key Value Store • Document Store • Advantages • Highly scalable • Support for semi- and un-structured data • Data Model (schema) does not have to be defined up-front • Many are open source • Cloud options available (e.g., Amazon's SimpleDB) • Disadvantages • No common query language • Data inconsistency • Reliance on client applications for data validation, consistency, etc…

  40. Summary • Data Models • Hierarchical • Network • Multidimensional • NoSQL • ACID, BASE • Relational Model Components • Structure • Rules • Manipulation • Transforming ERDs to Relations • Representing entities and relationships • understand foreign keys • Referential Integrity • understand RI constraints • Normalization • Purpose • understand anomalies • 3 normal forms

  41. Next Time… • Physical DB Design • Chapter 5: Physical Database Design Pages 206 – 210 (Physical Design Process) Pages 210 – 213 (Choosing Field Data Types & Integrity Control) Pages 213 – 217 (Record Denormalization) Pages 220 – 226 (File Organizations) Pages 228 – 230 (Indexes) • Assignment 3: Logical DB Design ***DUE 2/20***

More Related