CS 543 Lecture 1: Review of DBMS

CS 543Lecture 1: Review of DBMS January 23, 2012

Basic Definitions • Database: • A collection of related data. • Data: • Known facts that can be recorded and have an implicit meaning. • Mini-world: • Some part of the real world about which data is stored in a database. For example, student registration, grades and transcripts at a university. • Database Management System (DBMS): • A software package/system to facilitate the creation and maintenance of a computerized database. • Database System: • The DBMS software together with the data itself. Sometimes, the application programs and interfaces are also included.

Data Models • Data Model: • A set of concepts to describe the structure of a database, the operations for manipulating the data, and the constraints that the data should follow. • Data Model Structure and Constraints: • Data Model constructs define the database structure • Data model constructs often include: data elements and their data types (often called attributes); grouping of related elements into entities (also called objects or records or tuples); and relationships among entities • Constraints specify restrictions on the stored data; the data that satisfies the constraints is called valid data

Relational Model Concepts • A Relation is a mathematical concept based on the ideas of sets • The model was first proposed by Dr. E.F. Codd of IBM Research in 1970 in the following paper: • "A Relational Model for Large Shared Data Banks," Communications of the ACM, June 1970 • The above paper caused a major revolution in the field of database management and earned Dr. Codd the coveted ACM Turing Award

Informal Definitions • Informally, a relation looks like a table of values. • A relation typically contains a set of rows. • The data elements in each row represent certain facts that correspond to a real-world entity or relationship • In the formal model, rows are called tuples • Each column has a column header that gives an indication of the meaning of the data items in that column • In the formal model, the column header is called an attribute name (or just attribute)

Informal Definitions • Key of a Relation: • Each row has a value of a data item (or set of items) that uniquely identifies that row in the table • Called the key • In the STUDENT table, SSN is the key • Sometimes row-ids or sequential numbers are assigned as keys to identify the rows in a table • Called artificial key or surrogate key

Formal Definitions - Summary • Formally, • Given R(A1, A2, .........., An) • r(R) dom (A1) X dom (A2) X ....X dom(An) • R(A1, A2, …, An) is the schema of the relation • R is the name of the relation • A1, A2, …, An are the attributes of the relation • r(R): a specific state (or "value" or “population”) of relation R – this is a set of tuples (rows) • r(R) = {t1, t2, …, tn} where each ti is an n-tuple • ti = <v1, v2, …, vn> where each vjelement-ofdom(Aj)

Definition Summary

Relational Integrity Constraints • Constraints are conditions that must hold on all valid relation states. • There are three main types of constraints in the relational model: • Key constraints • Entityintegrity constraints • Referential integrity constraints • Another implicit constraint is the domain constraint • Every value in a tuple must be from the domain of its attribute (or it could be null, if allowed for that attribute)

Overview of SQL • SQL is a standard, comprehensive language, based on the relational model • SQL includes capabilities for many functions: • DDL statements for creating schemas and specifying data types and constraints • Statements for specifying database retrievals • Statements for modifying the database • Statements for specifying views, triggers, and assertions (see Chapter 5) • Many other capabilities

Summary of SQL Queries • A query in SQL can consist of up to six clauses, but only the first two, SELECT and FROM, are mandatory. The clauses are specified in the following order:SELECT <attribute list>FROM <table list>[WHERE <condition>][GROUPBY <grouping attribute(s)>][HAVING <group condition>][ORDER BY <attribute list>] ;

ER Model Concepts • Entities and Attributes • Entities are specific objects or things in the mini-world that are represented in the database. • For example the EMPLOYEE John Smith, the Research DEPARTMENT, the ProductX PROJECT • Attributes are properties used to describe an entity. • For example an EMPLOYEE entity may have the attributes Name, SSN, Address, Sex, BirthDate • A specific entity will have a value for each of its attributes. • For example a specific employee entity may have Name='John Smith', SSN='123456789', Address ='731, Fondren, Houston, TX', Sex='M', BirthDate='09-JAN-55‘ • Each attribute has a value set (or data type) associated with it – e.g. integer, string, subrange, enumerated type, …

Relationships and Relationship Types • A relationship relates two or more distinct entities with a specific meaning. • For example, EMPLOYEE John Smith works on the ProductX PROJECT, or EMPLOYEE Franklin Wong manages the Research DEPARTMENT. • Relationships of the same type are grouped or typed into a relationship type. • For example, the WORKS_ON relationship type in which EMPLOYEEs and PROJECTs participate, or the MANAGES relationship type in which EMPLOYEEs and DEPARTMENTs participate. • The degree of a relationship type is the number of participating entity types. • Both MANAGES and WORKS_ON are binary relationships.

Constraints on Relationships • Constraints on Relationship Types • (Also known as ratio constraints) • Cardinality Ratio (specifies maximum participation) • One-to-one (1:1) • One-to-many (1:N) or Many-to-one (N:1) • Many-to-many (M:N) • Existence Dependency Constraint (specifies minimum participation) (also called participation constraint) • zero (optional participation, not existence-dependent, partial) • one or more (mandatory participation, existence-dependent, total)

Structural Constraints • 2 options: • 1. Combine notation • (Min, Max) • 2. Separate notation • One for cardinality ratio • One for participation constraints

Relational Database Design by ER- and EER-to-Relational Mapping • ER-to-Relational Mapping Algorithm • Step 1: Mapping of Regular Entity Types • Step 2: Mapping of Weak Entity Types • Step 3: Mapping of Binary 1:1 Relation Types • Step 4: Mapping of Binary 1:N Relationship Types. • Step 5: Mapping of Binary M:N Relationship Types. • Step 6: Mapping of Multivalued attributes. • Step 7: Mapping of N-ary Relationship Types. • Mapping EER Model Constructs to Relations • Step 8: Options for Mapping Specialization or Generalization. • Step 9: Mapping of Union Types (Categories).

Functional Dependencies • Functional dependencies (FDs) • Are used to specify formal measures of the "goodness" of relational designs • And keys are used to define normal forms for relations • Are constraints that are derived from the meaning and interrelationships of the data attributes • A set of attributes X functionallydetermines a set of attributes Y if the value of X determines a unique value for Y

Inference Rules for FDs (1) • Given a set of FDs F, we can infer additional FDs that hold whenever the FDs in F hold • Armstrong's inference rules: • IR1. (Reflexive) If Y subset-of X, then X -> Y • IR2. (Augmentation) If X -> Y, then XZ -> YZ • (Notation: XZ stands for X U Z) • IR3. (Transitive) If X -> Y and Y -> Z, then X -> Z • IR1, IR2, IR3 form a sound and complete set of inference rules • These are rules hold and all other rules that hold can be deduced from these

Inference Rules for FDs (2) • Some additional inference rules that are useful: • IR4. Decomposition: If X -> YZ, then X -> Y and X -> Z • IR5. Union: If X -> Y and X -> Z, then X -> YZ • IR6.Psuedotransitivity: If X -> Y and WY -> Z, then WX -> Z • The last three inference rules, as well as any other inference rules, can be deduced from IR1, IR2, and IR3 (completeness property)

Closure • Closure of a set F of FDs is the set F+ of all FDs that can be inferred from F • Closure of a set of attributes X with respect to F is the set X+ of all attributes that are functionally determined by X • X+ can be calculated by repeatedly applying IR1, IR2, IR3 using the FDs in F

Practical Use of Normal Forms • Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties • The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect • The database designers need not normalize to the highest possible normal form • (usually up to 3NF, BCNF or 4NF) • Denormalization: • The process of storing the join of higher normal form relations as a base relation—which is in a lower normal form

First Normal Form • Disallows • composite attributes • multivalued attributes • nested relations; attributes whose values for an individual tuple are non-atomic • Considered to be part of the definition of relation

Second Normal Form • A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key • R can be decomposed into 2NF relations via the process of 2NF normalization

Third Normal Form • A relation schema R is in third normal form (3NF) if it is in 2NF and no non-prime attribute A in R is transitively dependent on the primary key • R can be decomposed into 3NF relations via the process of 3NF normalization • NOTE: • In X -> Y and Y -> Z, with X as the primary key, we consider this a problem only if Y is not a candidate key. • When Y is a candidate key, there is no problem with the transitive dependency . • E.g., Consider EMP (SSN, Emp#, Salary ). • Here, SSN -> Emp# -> Salary and Emp# is a candidate key.

CS 543 Lecture 1: Review of DBMS