Logical Database Design

Logical Database Design Chapter 4

Agenda • Evolution of Data Models • Chapter 1 pgs 25 – 28 • Chapter 9 pgs 409 – 418 • Relational Database Model • Transforming ERDs into Relations • Referential Integrity • Normalization

The Evolution of Data Models • Hierarchical • Network • Relational • Object oriented* • Multi-dimensional • NoSQL

Hierarchical Data Model • Each parent can have many children • Each child has only one parent • Tree defined by path that traces parent segments to child segments, beginning from the left • Hierarchical path • Ordered sequencing of segments tracing hierarchical structure

Problem: Child with Multiple Parents

Hierarchical Data Model, cont… • Advantages • Database security • Performance, efficiency • Data independence • Disadvantages • Complex implementation • Structural dependence • Complex applications programming and use • Lack of standards

Network Data Model • Created to: • Represent complex M:M data relationships • Child can have many parents • Impose a database standard • Resembles hierarchical model • Collection of records in 1:M relationships • Sets • Implement relationships • Composed of: • Owner • Member

Network Data Model, cont… • Advantages • Handles more relationship types • Conformance to standards • Disadvantages • System complexity • Lack of (popular) product support

Relational Data Model • Developed by Codd (IBM) in 1970 • Represents data in the form of tables • Based on mathematical theory • 3 Components: • relational database structure • relational rules (integrity) • relational operations (manipulation)

Relational Data Model • Advantages • Structural independence • Improved conceptual simplicity • Easier database design, implementation, management, and use • Ad hoc query capability • Mathematical foundation • Disadvantages • Hardware and system software overhead • Can facilitate poor design and implementation • May promote “islands of information” problems

Relational Theory Components • Relational Database Structures • Rules of Relations • Relational Operators

1. Relational Database Structure • Relations, Tuples, Attributes • Tables, Rows, Columns • Files, Records, Fields • Primary Key must be designated • Foreign Keys must be designated for relationships CLASS TABLE

2. Rules of Relations • Relation names must be unique • Entries in columns are atomic (single valued) • Entries in column are from same domain • Each row is unique • Ordering is insignificant CLASS Table

2. Rules of Relations, con’t... • Data in tables should be added, updated, and deleted without errors • avoid inconsistency ==> referential integrity • insertion • update • Deletion • avoid anomalies ==> normalization • insertion • update • deletion

3. Relational Operators * • Relational Algebra • UNION (+) • INTERSECTION • DIFFERENCE (-) • PRODUCT (x) • SELECT (tuples) • PROJECT (attributes) • JOIN (PRODUCT, SELECT, PROJECT) *Diagram adapted from Hyperion presentation, http://infolab.stanford.edu/infoseminar/Archive/FallY99/russakovskii-slides/sld001.htm

Converting ERD to Relational Model • Represent entities as relations • Represent relationships as either: • foreign keys in relations • new relations • Provide sample data • Normalize relations

Representing Entities as Tables • Each entity converted to a relational table • attributes become columns • primary key must be designated • regular entities have atomic keys • associative entities have composite keys • subtype entities have same key as supertype • example entity instances are rows of table

ERD Example Problem Revisited • A company sells products to customers • Customer requests generate orders • Orders may consist of many ordered products • Products may be contained on many orders, or no orders at all

ERD Example Converted to Tables

Representing Relationships • 1:1 • merge attributes into single table; • OR create foreign key (FK) in either relation • 1:M • create foreign key (FK) in relation on “many” side of relationship • M:M • should’ve been eliminated on ERD!!! • create new relation with PKs of related entities as (1) concatenated PK, and (2) FKs in new relation

Referential Integrity • Maintains consistency between data in related tables • For every value of a foreign key there must be an existing primary key with that value • Create rules/constraints for: • insertion of foreign keys • update and deletion of primary keys

Adding Referential Integrity Constraints (PK) D:R, U:C (PK) (FK) D:R, U:C (PK) (PK) (FK) (FK) D:R, U:C (PK)

Adding Referential Integrity Constraints, cont…

Normalization • Convert complex relations into simpler relations • Why? • Ensures relations conform to rules • Ensures relation contains facts about one “theme” • Reveals/corrects redundancies, errors, ambiguities in data model • Only a simple check IF good data model exists • Normal Forms • state of a relation • rids relations of potential anomalies

Order ID Order Date Customer ID Customer Name Customer Address Normalization, con’t... • Insertion Anomalies • are experienced when we attempt to store a value for one field but cannot because the value of another field is unknown • e.g., cannot add a new customer’s information until an order number is ready to be entered (PK)

Order ID Order Date Customer ID Customer Name Customer Address Normalization, con’t... • Deletion Anomalies • are experienced when a value for one field we wish to keep is unexpectedly removed when a value for another field is deleted • e.g., cannot delete the sole order for a customer without deleting the only copy of the customer’s information also (PK)

Order ID Order Date Customer ID Customer Name Customer Address Normalization, con’t... • Update Anomalies • are experienced when changes to multiple records of a table are needed to effect an update to a single value of a field • e.g., cannot completely update a customer’s address without changing it for every order placed by that customer (PK)

Steps in Normalization

Normalization, con’t... • Every attribute is dependent on: • the key (1NF) • the WHOLE key (2NF) • and nothing but the key (3NF)

1NF • The table is a relation • Primary key • No repeating values or groups • only atomic values • All column values from same domain • To correct: • define new (usually associative) entity

2NF • 1NF + No partial functional dependencies • (Full) Functional dependency • when the value of one attribute can be determined based on the value of another attribute • Partial functional dependency • when a non-key attribute is functionally dependent on a part of the PK • Already in 2NF if: • PK is NOT concatenated • Relation contains no non-key attributes • To correct: • Decompose into 2 or more relations (if not already) • one with original (concatenated) key + attributes • one (or more) with the “depended on” partial key as PK +attributes

3NF • 2NF + No transitive dependencies • Transitive dependency • a functional dependency between 2 non-key attributes • when a non-key attribute is functionally dependent on another non-key attribute • Already in 3NF if: • only 0 or 1 non-key attributes in relation • To correct: • Decompose into 2 or more relations (if not already) • one with original PK + attributes • one (or more) with “depended on” non-key attribute as PK + attributes

Complex Data Landscape NOTE: This diagram is for effect ONLY—it is incomplete (e.g., no MDDB, no OODB) AND contains some inaccuracies

Multidimensional Data Model • Data represented as cube > Modeling and analyzing of data across dimensions • Cube depicts business measures analyzed by dimensions • Optimized for decision-making vs. transaction processing • Data storage • Pre-aggregation • De-normalization • Basis for data warehouses

Multidimensional Data Model, cont… • Advantages • Faster performance for decision-based queries • Read-only • Summary data • Data viewed in many ways/dimensions • Data analyzed over time • Easier analyses by decision-makers • Visual interfaces • Typically point/click, drag/drop • Disadvantages • ETL • Data redundancy

Big Data • Big Data = more than you're able to effectively process • Influenced by Mobile, Social Networking, Web analytics, RFID, Atmospheric, Medical Research data, … • Traditional RDMS Problems • Transaction-focus • Requires schema  maintenance issue • ACID-focus • Requires locks, db constraints, joins  performance & availability issues • "Relatively" small amounts of operational data • Exceptions require complex, $ actions  scalability issue Reference: http://www.slideshare.net/dondemsak/intro-to-big-data-and-nosql

Big Data Solutions • Hadoop • Good for storing and retrieving large amounts of (mostly unstructured) data • HDFS: data distributor • MapReduce: request distributor • Columnar Databases • Good for data warehouses, analytics • Files contain all column values vs. all row values • NewSQL Databases • Good when high scalability needed with relational DBMSs • Re-written, highly optimized storage engines • "Sharding" (i.e. horizontal partitioning across multiple DB instances) • Distributed query processing • In-memory databases • NoSQL Databases…

Big Data Solutions, cont… • NoSQL Databases • Focus (in most cases): • Scalability via Physical Distribution of data • No fixed schema • "Individual query systems" instead of SQL • Support for semi- and un-structured data • Some provide consistency • Apache's Hbase • Amazon's DynamoDB • Most provide "eventual consistency" • Google's BigTable • Facebook's Cassandra • Amazon's SimpleDB • Use a variety of data models…

Big Data/NoSQL, cont… • NoSQL Physical Data Models • Column (Family) Store • Key Value Store • Document Store • Advantages • Highly scalable • Support for semi- and un-structured data • Data Model (schema) does not have to be defined up-front • Many are open source • Cloud options available (e.g., Amazon's SimpleDB) • Disadvantages • No common query language • Data inconsistency • Reliance on client applications for data validation, consistency, etc…

Summary • Data Models • Hierarchical • Network • Multidimensional • NoSQL • ACID, BASE • Relational Model Components • Structure • Rules • Manipulation • Transforming ERDs to Relations • Representing entities and relationships • understand foreign keys • Referential Integrity • understand RI constraints • Normalization • Purpose • understand anomalies • 3 normal forms

Next Time… • Physical DB Design • Chapter 5: Physical Database Design Pages 206 – 210 (Physical Design Process) Pages 210 – 213 (Choosing Field Data Types & Integrity Control) Pages 213 – 217 (Record Denormalization) Pages 220 – 226 (File Organizations) Pages 228 – 230 (Indexes) • Assignment 3: Logical DB Design ***DUE 2/20***

Logical Database Design