130 likes | 138 Views
CSE444: REVIEW. CSE444 in one slide. Logical : E/R diagram normalized relations Physical : files, buffering, and indexes Logical : Relational algebra and SQL Physical : join methods, optimization CC/R choices affect correctness, performance
E N D
CSE444 in one slide • Logical: E/R diagramnormalized relations • Physical: files, buffering, and indexes • Logical: Relational algebra and SQL • Physical: join methods, optimization • CC/R choices affect correctness, performance • Object Databases extend, improve relations • … OODBMS, ORDBMS are converging? • … OQL, SQL3 are converging?
The E/R model • Entities vs. relationships • Attributes for entities and relationships • Keys and weak entities (no foreign keys!!) • Cardinality constraints • Participation (key) constraints • Translation to relations: • Entity becomes relation • Relationship becomes relation iff it is many-many • Other relationships: key of “parent” goes with “child”
FDs and Normal Forms • Determine Candidate Keys (CKs) • A field is prime iff it is in some CK • 1NF: All relations are in 1NF • 2NF: For each FD XA: • A is prime OR X is not a proper subset of any CK • 3NF: For each FD XA: • X is a superkey OR A is prime OR A X (trivial) • BCNF: For each FD XA: • X is a superkey OR A X (trivial) • Normalize by decomposing R (ABC) to R1(AB) and R2 (BC), where BC violates a NF
Physical DB Design • File organizations: heapfile, sorted file, hashed file • Disks: speed and physical characteristics • Buffer pool can vary in size; frame holds page • Replacement policy choice is important • Indexes can speed up data access: • Clustered vs. unclustered • Primary vs. secondary • Dense vs. sparse • B+ Trees are the most common: log(N) searches
Queries: Logical Aspects • Relational Algebra • , , , , • Joins kind of important too!!! • General join • Equijoin • Natural join • Transformations for optimization • SQL • Declarative • Relational calculus + algebra + grouping, etc. • Computationally incomplete • Query execution: relational algebra operations
Queries: Physical Aspects • Implementation of relational algebra • File scan vs. index lookup vs. binary search • Exact-match vs. range queries • Impact of clustered vs. non-clustered indices • Join methods: • Block-Oriented Nested Loops (good idea, if buffers avail.) • Index Nested Loops (good idea, if index avail.) • Sort-Merge (good idea, if sorted or needs to be sorted) • Query optimization: cost estimates hard!!! • Large space of physical, logical alternatives • Prune space by considering only left-deep plans • Enables pipelined execution
Concurrency Control and Recovery • ACID transactions enforce CC&R • Strict 2PL “guarantees” serializability (C, I) • Deadlocks possible (detect and abort a XACT) • Phantom tuples possible (fix with index locking) • ARIES guarantees XACTS are A, D • Write-ahead logging • Log UNDO actions to allow complete replaying of history • Recovery phases: • Analyze (rebuild main memory structures) • REDO (rebuild data in buffer frames) • UNDO (abort uncommitted XACTS)
Object Databases • ORDBMS (e.g. Oracle 8i) vs. OODBMS (e.g. O2) • Structural aspects • Collection-valued fields • OIDs • Freely composable type constructors • Behavioral aspects (methods) • Adds new dimensions to RDBMS problems • SQL3, OQL standards are emerging
Current and Future DBMS Issues • New applications yield new techniques • New techniques yield new applications • Some “new” applications: • Data warehousing • On-line analytical processing (OLAP) • Data mining • Distributed data • Heterogeneous data and data integration • Scientific/sequential/ordered data • Partial or approximate query answers
Current and Future Issues (cont.) • Active DBs: rule management (ICs and triggers) • Real-time DBMS • Web-based DBMS • XML and semi-structured data • Spatial and high-dimensional data (lots of columns) • Special-purpose DBMSs • Digital Libraries • Geographic Information Systems • etc…..
Current and Future Issues (cont.) • Some “new” techniques: • New kinds of indices • Improved B Trees • Faster aggregation algorithms • New QP algorithms • Better optimization techniques • Data broadcasting • Generic data models • Faster sorting algorithms • New query languages • Deductive DBMSs
Current and Future Issues (cont.) • Object databases • New algebras • Query cost estimation • New locking and commit protocols • Main-memory databases • CC/R techniques for non-relational settings • DBMS interfaces, visualization tools • DBMS development tools • etc…. • BOTTOM LINE: Lots of opportunities for jobs, research, development, and fun !!!