230 likes | 334 Views
CS405G: Introduction to Database Systems. Final Review. Database Design. E-R model. E-R model Entities Attributes Relationships. Database Design. 8/21/2014. 8/21/2014. 5. 5. From E-R Diagram to Relations. Relations Schemas Converting E-R diagram to relations Keys Super keys
E N D
CS405G: Introduction to Database Systems Final Review
Database Design Jinze Liu @ University of Kentucky
E-R model • E-R model • Entities • Attributes • Relationships
Database Design 8/21/2014 8/21/2014 5 5
From E-R Diagram to Relations • Relations • Schemas • Converting E-R diagram to relations • Keys • Super keys • Candidate keys • Primary keys • Relational integrity constraints
Key Constraints • Superkey: • (Uniqueness constraints) A set of attributes where no two distinct tuples can have the same values • Every relation has at least one superkey: • The set of all attributes. • Key: A minimal superkey • Uniqueness constraint (superkey) • Minimum Constraint • No attribute can be removed and still satisfy the uniqueness constraints. Jinze Liu @ University of Kentucky
Relational Integrity Constraints Constraints are conditions that must hold on all valid relation instances. There are four main types of constraints: Domain constraints The value of a attribute must come from its domain Key constraints Entity integrity constraints Referential integrity constraints 8/21/2014 8/21/2014 8 8
Database Normalization • Functional Dependency • Functional Closure • Keys • Redefined • Based on functional dependency • DB Norm Form • 1st, 2nd, 3rd, BCNF
Database Query Luke Huan Univ. of Kansas
Relational Algebra and SQL • Relational algebra • SQL query • SFW • Group by …, Having • Subqueries • Relationship between R.A. and SQL
RelOp RelOp Relational algebra A language for querying relational databases based on operators: • Core set of operators: • Selection, projection, cross product, union, difference, and renaming • Additional, derived operators: • Join, natural join, intersection, etc. • Compose operators to make complex queries Jinze Liu @ University of Kentucky
Summary of core operators σpR πLR RXS RS R- S ρS(A1, A2, …)R • Selection: • Projection: • Cross product: • Union: • Difference: • Renaming: • Does not really add “processing” power Jinze Liu @ University of Kentucky
Summary of derived operators RpS RS RS • Join: • Natural join: • Intersection: Jinze Liu @ University of Kentucky
Selection: σpR Projection: πLR Cross product: RXS Join: RpS Natural join: RS Union: RUS Difference: R-S Intersection: R∩S Monotone Monotone Monotone Monotone Monotone Monotone Monotone w.r.t. R; non-monotone w.r.t S Monotone Classification of relational operators Jinze Liu @ University of Kentucky
Update Operations on Relations • Update operations • INSERT a tuple. • DELETE a tuple. • MODIFY a tuple. • Constraints should not be violated in updates Jinze Liu @ University of Kentucky
Basic queries: SFW statement • SELECT A1, A2, …, AnFROM R1, R2, …, RmWHERE condition; • Also called an SPJ (select-project-join) query • (almost) Equivalent to relational algebra query π A1, A2, …, An (σ condition (R1XR2 X … X Rm)) Luke Huan Univ. of Kansas
Semantics of SFW • SELECT E1, E2, …, EnFROM R1, R2, …, RmWHERE condition; • For each t1 in R1: For each t2 in R2: … … For each tm in Rm: If condition is true over t1, t2, …, tm: Compute and output E1, E2, …, En as a row • t1, t2, …, tm are often called tuple variables • Not 100% correct, we will see Luke Huan Univ. of Kansas
Operational semantics of GROUPBY SELECT … FROM … WHERE … GROUP BY …; • Compute FROM • Compute WHERE • Compute GROUPBY: group rows according to the values of GROUPBY columns • Compute SELECT for each group • For aggregation functions with DISTINCT inputs, first eliminate duplicates within the group • Number of groups = number of rows in the final output Jinze Liu @ University of Kentucky
Database Design Jinze Liu @ University of Kentucky
physical data organization • Storage hierarchy (DC vs. Pluto) !count I/O’s • Disk geometry: three components of access cost; random vs. sequential I/O • Data layout • Record layout (handling variable-length fields, NULL’s) • Block layout (NSM, PAX) ! inter-/intra-record locality • Access paths • Primary versus secondary indexes • Tree-based indexes: ISAM, B+-tree !Again, reintroduce redundancy to improve performance ! Fundamental trade-off: query versus update cost
Performance Issues on Indexes • Indexes • ISAM • B+ Tree • Metrics • Storage • IO-costs • Operations • Single value query & range query • Insertion and deletion
Query Processing Implementation • Typical Query Processings • Selection • Join • Set operations. • Typical approaches • Sequential scans in unsorted database • Sorted database • What are the tradeoffs.