450 likes | 544 Views
Overview of relational dbs. background. basics. Highly structured Schema based - we can leverage this to address volume Semantics SQL App Middleware Users Structure Tables/relations, rows/tuples , columns/attributes User defined data types PKs and FKs Null or not null
E N D
Overview of relational dbs background
basics • Highly structured • Schema based - we can leverage this to address volume • Semantics • SQL • App • Middleware • Users • Structure • Tables/relations, rows/tuples, columns/attributes • User defined data types • PKs and FKs • Null or not null • Triggers as a catch-all integrity constraint • Normalization for formal table minimization • Uses • Bank checks • Insurance claims • Credit card payments
nosql • Cluster based broad distribution • Semi structured • More flexible access of data • Hierarchical • Similar structure
Relational DBs:Formally understood • Set theoretic • Originally defined with an algebra, with Selection, Projection, Join, and Union/Difference/Intersection • Declarative calculus that is based on the algebra and supports large grained queries • Clean implementation spec • Unambiguous optimization - with its own algebra of query parse tree transformations
Semantics are in queries • Relational algebra compliant • Queries written in declarative calculus • Set-oriented • But at least Programmers tend to follow PK/FK pairs, and infer semantics from attribute names and associations in tuples • Query results are legal tables (Views)
Also we get(good and bad) • Fixed size tuples for easy row-optimization • 2P transactions • Table, Row distribution • Two language based, with lowest common denominator semantics • Security • Checkpointing • Powerful query optimizers
Object-relational DBs • This runs somewhat counter to NoSQL trends - we make the data types even more complex • We make domains out of type constructors • Object IDs • A row can be a tuple - or an object, with an object ID and a tuple, making all relational DBs also O-R
Object-oriented DBs • No tuple rows • Blend SQL and the app language • This avoids lowest common denominator semantics • These bombed, as relational DBs were not O-O • And they are tough to optimize
The relational algebra and calculus: the heart of relational DBs … SQL
The big 3: • Selection and projection are unary ops • Join is binary • Selection is based on a formula and returns a table that contains all tuples from a given table where the formula is valid • Projection returns a table consisting of a subset of attributes from a given table, with dupes removed • Join creates tuples with attributes from two given tables, where a specific attribute in one matches a specific attribute in another (often a PK, FK pair)
Algebraic closure • Any relational algebra operation returns a legal derived table • The set operators are also part of the algebra • From a formal perspective, the join operator is not a minimal operator, and is therefore represented as a cross product followed by a selection (where the PK equals the FK) • Note that joins are symmetric
Joins can be generalized • Complex join conditions • Non-equi joins • A “natural” join is based on matching all attributes with equal names in both tables • “Outer” join creates null-packed tuples when tuples on the left do not match any on the right; there is also a right outer join
The calculus • It is a tuple calculus, not a domain calculus • SQL is equivalent • Select From Where • The part after the Where is declarative • A tuple calculus (SQL) • Notice that the variables are indeed tuples • Note that set operators often act on tables that are being created in the query
SQL CREATE PROCEDURE test() BEGIN DECLARE sql_error TINYINT DEFAULT FALSE; DECLARE CONTINUE HANDLER FOR SQLEXCEPTION SET sql_error = TRUE; START TRANSACTION; INSERT INTO invoices VALUES (115, 34, 'ZXA-080', '2012-01-18', 14092.59, 0, 0, 3, '2012-04-18', NULL); INSERT INTO invoice_line_items VALUES (115, 1, 160, 4447.23, 'HW upgrade'); INSERT INTO invoice_line_items VALUES (115, 2, 167, 9645.36, 'OS upgrade'); IF sql_error = FALSE THEN COMMIT; SELECT 'The transaction was committed.'; ELSE ROLLBACK; SELECT 'The transaction was rolled back.'; END IF; END//
More • IN operator is “element of” • EXISTS • Nesting • FOR ALL • FOR SOME • Putting computations in the SELECT clause • COUNT, SUM, AVG, MAX, MIN operators
Stored programs • Stored procedures (can be called by an application) • Stored functions (can be called by an SQL program) • Triggers (tied to an operation like INSERT) • Events (tied to a clock)
Variables • DECLARE statement • SET statement • DEFAULT statement • INTO (from a SELECT clause)
Example… CREATE PROCEDURE test() BEGIN DECLARE max_invoice_total DECIMAL(9,2); DECLARE min_invoice_total DECIMAL(9,2); DECLARE percent_difference DECIMAL(9,4); DECLARE count_invoice_id INT; DECLARE vendor_id_var INT; SET vendor_id_var = 95; SELECT MAX(invoice_total), MIN(invoice_total), COUNT(invoice_id) INTO max_invoice_total, min_invoice_total, count_invoice_id FROM invoices WHERE vendor_id = vendor_id_var;
Example, continued SET percent_difference = (max_invoice_total - min_invoice_total) / min_invoice_total * 100; SELECT CONCAT('$', max_invoice_total) AS 'Maximum invoice', CONCAT('$', min_invoice_total) AS 'Minimum invoice', CONCAT('%', ROUND(percent_difference, 2)) AS 'Percent difference', count_invoice_id AS 'Number of invoices'; END//
Domain types – chapter 8 • Character • Integers • Reals • Date • Time • Large object, BLOB and CLOB • 2D vector spatial types • Enumerated
ACID Transactions • Atomic: Either all of a transaction or None of it affects the database • Consistent: When a transaction ends, the database obeys all constraints • Isolated: Two running transactions cannot pass values to each other, via the database or other data store • Durable: Once a transaction has “committed”, its updates are permanent
Atomicity • Use a local log to store a transaction’s partial result • If a transaction does something illegal, toss out the log
Consistent • Check constraints in phase 1 • Some are immediate, like domains • Others don’t have to be true until the commit point, like FKs
Isolated • Transactions commit in a linear order • Serializability is enforced • Results become available only after atomic commit point
Durable • Database has one state and it is in nonvolatile storage • Keep checkpoints and transaction logs
Deadlock • Loops of transactions wait on each other • Detection: use time-outs • Prevention: use “waits for” graph
Stored programs • Stored procedures (can be called by an application) • Stored functions (can be called by an SQL program) • Triggers (tied to an operation like INSERT) • Events (tied to a clock)
Variables • DECLARE statement • SET statement • DEFAULT statement • INTO (from a SELECT clause)
Another view of transactions • Prevents • Lost updates from one of two transactions • Dirty reads when a transaction reads an uncommitted value • Nonrepeatable reads in one transaction because the value gets updated in between • Phantom reads are when a subset of updated rows are simultaneously updated by another transaction
Continued… • Options • Serializable isolates transactions completely and is the highest level of protection • Read uncommitted lets our four problems occur – no locks • Read committed prevents dirty reads • Repeatable read is the default and it means that a transaction will always read a given value the same because the values are locked
Deadlock • Detect by closing transactions that have been open a long time • Use the lowest acceptable locking level • Try to do heavy update transactions when database can be completely reserved
Stored programs • Stored procedures (can be called by an application) • Stored functions (can be called by an SQL program) • Triggers (tied to an operation like INSERT) • Events (tied to a clock)
The DB Design Process • Start with an entity model • Map to tables • Create PKs and FKs • Create other constraints • Normalize tables
Our focus: normalization • Goals • Minimize redundant data • Minimize “update anomalies”
Functional and Multivalued Dependencies • FD • We say that ai FD-> aj • Or “ai functionally determines aj” • MVD-> • We say that ai MVD-> aj • Or “ai multivalued determines aj” • Note: the right side of an FD or an MVD can be a set of attributes
First 3 normal forms • First (1NF) The value stored at the intersection of each row and column must be a scalar value, and a table must not contain any repeating columns. • Second (2NF) Every non-key column must depend on the entire primary key. • Third (3NF) Every non-key column must depend only on the primary key.
NF3 fixed and NF4 • Boyce-Codd (BCNF) A non-key column can’t be dependent on another non-key column. • Fourth (4NF) A table must not have more than one multivalued dependency, where the primary key has a one-to-many relationship to non-key columns.
3NF: remove transitive dependencies Customer ID Address ZIP 18 112 First 80304 17 123 Ash 80303 16 123 Ash 80303
3NF, continued Break into two tables: Customer ID Address Address Zip
4NF: Separate pairs of MVDs Mothers_PhoneFathers_PhoneChild_Name Break into: Mothers_PhoneChild_Name 3030000000 Sue 3031111111 Sue AndFathers_PhoneChild_Name 3032222222 Sue 3033333333 Sue Note: both fields needed for PK
Tradeoffs • “Decomposition” makes it harder to misunderstand the database schema • But Decomposition create narrow tables that might not correspond to forms in the real world • And Decomposition leads to extra joins • One solution is to pre-join data