5.26k likes | 5.27k Views
A comprehensive review of the principles of database systems, including E-R diagrams, relational model, SQL, integrity constraints, normalization, and more.
E N D
Temple University – CIS Dept.CIS331– Principles of Database Systems V. Megalooikonomou Review of cis331 material (based on notes by C. Faloutsos at CMU)
General Overview • E-R diagrams • Relational model – SQL, QBE • Integrity Constraints • Functional Dependencies & Normalization • Physical Design; Indexing • Transaction processing • Advanced topics • Distributed Databases • OO- and OR-DBMSs • Query processing/optimization • Data Mining
Overview • Concepts • Entities • Relationships • Attributes • Mapping constraints: cardinalities • Keys • Specialization/Generalization • Aggregation • Reduction of E-R diagrams to tables
The Tools Entities (‘entity sets’) N M Relationships (‘rel. sets’) and mapping constraints P attributes
STUDENT N c-id takes M 1 N has s-id SECTION COURSE N teaches 1 INSTRUCTOR Example
E-R Diagrams • Rectangles represent entity sets • Diamonds represent relationship sets • Lines link attributes to entity sets and entity sets to relationship sets • Ellipses represent attributes • Double ellipses represent multivalued attributes • Dashed ellipses denote derived attributes • Underline indicates primary key attributes (will study later)
manages 1 EMPLOYEE N More details • Entity sets of a relationship need not be distinct • self-relationships – example ? • Usually different “roles” are indicated by labeling the lines that connect diamonds to rectangles
Cardinality Constraints • We express cardinality constraints by drawing either a directed line (), signifying “one,” or an undirected line (—), signifying “many,” between the relationship set and the entity set. • E.g.: One-to-one relationship: • A customer is associated with at most one loan via the relationship borrower • A loan is associated with at most one customer via borrower
More details • Binary relationships • 3-way and k-way relationships? N M EMPLOYEE TOOL uses P PROJECT
More details - attributes • superkey: a set of one or more attributes whose values uniquely determine each entity (eg., (ssn, address) ) • candidate key: a minimal super key (eg., ssn; employee#) • primary key: a cand. key, chosen by DBA • multivalued or set-valued attributes (eg., ‘dependents’ for EMPLOYEE) • derived attributes (eg., 15% tip)
eg., students: part time (#credit-hours) and full time (major) Extended ER features: Specialization name STUDENT ssn IS-A major FT-STUDENT PT-STUDENT #credits
treat a relationship as an ‘abstract’ entity allows relationships between relationships rarely used Aggregation
E-R Design Decisions • The use of an attribute or entity set to represent an object • Whether a real-world concept is best expressed by an entity set or a relationship set • Use of a ternary relationship versus a pair of binary relationships • Use of a strong or weak entity set • Use of specialization/generalization – contributes to modularity in the design • Use of aggregation – can treat the aggregate entity set as a single unit without concern for the details of its internal structure
Overview • concepts • Entities • Relationships • Attributes • Specialization/Generalization • Aggregation • Reduction of E-R diagrams to tables
Overview • concepts • Entities • Relationships • Attributes • Specialization/Generalization • Aggregation • turning E-R diagrams to tables
Summary • E-R Diagrams: a powerful, user-friendly tool for data modeling: • Entities (strong, weak) • Attributes (primary keys, discriminators, derived, multivalued) • Relationships (1:1, 1:N, N:M; multi-way) • Generalization/Specialization; Aggregation
attribute (strong) entity set weak entity set relationship set identifying rel. set for weak entity Summary - cont’d multivalued attribute derived attribute
A primary key discriminator IS-A A total generalization M l’:h’ N l:h cardinalities cardinalities with limits Summary - cont’d (e.t.c. - see book for alternative notations)
Conclusions • E-R Diagrams: a powerful, user-friendly tool for data modeling
Data-files catalog Overview of a DBMS Naïve user casual user DBA DML parser DDL parser DML precomp. trans. mgr buffer mgr
Overview – Relational model • history • concepts • Formal query languages • relational algebra • rel. tuple calculus • rel. domain calculus
Concepts • Database: a set of relations (= tables) • rows: tuples • columns: attributes (or keys) • superkey, candidate key, primary key
Overview • history • concepts • Formal query languages • relational algebra • rel. tuple calculus • rel. domain calculus
Formal query languages • How do we collect information? • Eg., find ssn’s of people in cis331 • (recall: everything is a set!) • One solution: Relational algebra, i.e., set operators (procedural language) • Q1: Which operators?? • Q2: What is a minimal set of operators?
FUNDAMENTALRelational operators • selection • projection • cartesian product MALE x FEMALE • set union • set difference R - S R U S
Relational ops • Surprisingly, they are enough, to help us answer almost any query we want!! • derived operators, for convenience • set intersection • join (theta join, equi-join, natural join) • ‘rename’ operator • division
Joins • very popular: natural join: RS • like equi-join, but it drops duplicate columns: STUDENT(ssn, name, address) TAKES(ssn, cid, grade)
Overview - rel. algebra • fundamental operators • derived operators • joins etc • rename • division • examples
Extended Relational-Algebra-Operations • Generalized Projection • Outer Join • Aggregate Functions
Modification of the Database • The content of the database may be modified using the following operations: • Deletion • Insertion • Updating • All these operations are expressed using the assignment operator
Views • … if it is not desirable for all users to see the entire logical model (i.e., all the relations stored in the database) • A person who needs to know a customer’s loan number but has no need to see the loan amount can see the relation: customer-name, loan-number(borrower loan) • Any relation that is not of the conceptual model but is made visible to a user as a “virtual relation” is called a view.
General Overview - rel. model • history • concepts • Formal query languages • relational algebra • rel. tuple calculus • rel. domain calculus
Overview - detailed • Relational tuple calculus • Why do we need it? • Details • Examples • Equivalence with rel. algebra • More examples; ‘safety’ of expressions • Rel. domain calculus + QBE
Motivation • Q: weakness of rel. algebra? • A: procedural • describes the steps (i.e., ‘how’) • … still useful, for query optimization though
Solution: rel. calculus • describes what we want • two equivalent flavors: • ‘tuple’ calculus and • ‘domain’ calculus • basis for SQL and QBE, resp.
Rel. tuple calculus (RTC) • first order logic ‘Give me tuples ‘t’, satisfying predicate P – e.g.:
..or Tuple Relational Calculus • A nonprocedural query language, where each query is of the form {t | P (t) } • It is the set of all tuples t such that predicate P is true for t • t is a tuple variable, t [A] denotes the value of tuple t on attribute A • t r denotes that tuple t is in relation r • P is a formula similar to that of the predicate calculus
Examples • find all student records output tuple of type ‘STUDENT’
Examples • (selection) find student record with ssn=123
Examples • (selection) find student record with ssn=123
CLASS c-id c-name units cis331 d.b. 2 cis321 o.s. 2 TAKES SSN c-id grade 123 cis331 A 234 cis331 B Reminder: our Mini-U db
More examples • join: find names of students taking cis351
More examples • join: find names of students taking cis351 join projection selection
Safety of expressions • FORBIDDEN: It has infinite output!! • Instead, always use
Safety of expressions • Possible to write tuple calculus expressions that generate infinite relations, e.g., {t | t r } results in an infinite relation if the domain of any attribute of relation r is infinite • To guard against the problem, we restrict the set of allowable expressions to safe expressions. • An expression {t | P (t) }in the tuple relational calculus is safe if every component of t appears in one of the relations, tuples, or constants that appear in P
General Overview - rel. model • history • concepts • Formal query languages • relational algebra • rel. tuple calculus • rel. domain calculus
Overview - detailed • rel. tuple calculus • why? • details • examples • equivalence with rel. algebra • more examples; ‘safety’ of expressions • rel. domain calculus + QBE
Safety of expressions • FORBIDDEN: It has infinite output!! • Instead, always use