170 likes | 290 Views
Goodbye, Spring 08 CS411!. Enjoy your DB internships, jobs, and insights. Today’s agenda. W hen Homework 5 is due, and why CS 411 course project wins major prize! A new contest you can enter your CS 411 project in! The conflict final exam May 8: contact me for location
E N D
Goodbye, Spring 08 CS411! Enjoy your DB internships, jobs, and insights
Today’s agenda • When Homework 5 is due, and why • CS 411 course project wins major prize! • A new contest you can enter your CS 411 project in! • The conflict final exam May 8: contact me for location • Two, no, three demos of prizewinning projects! • What you need to know for the final
CS 411 project a winner in the 2008 Computing Habitat contest “At the beginning of this semester, my team heard your suggestion [about revising] the course project and entering it in the 2008 Computing Habitat Programming Competition. Because the team project Illinigroup was for improving student life, we decided to enter the competition. “At the end, my team Chuck Woolerywon three prizes: Best use of data analytics Best web application Partial winning of Most useful for students. …”
2007 Computing Habitat contest gave out big prizes “Prizes for the competition included $2,500 for the first place winner, $1,250 for the second place winner and $750 for the third place winner. A Judge's prize of $500 is awarded to any entrant at the discretion of the judges. In addition to cash prizes, winning entries are made a permanent part of the computing habitat.”
Exam creation methodology • Read three years of past final exams, now posted on our web site • Then wrote my own • I did not want to reuse any questions • But topics are the same, except… • We did security, MySQL page layout, concurrency control • We didn’t do pointer swizzling, repeating fields, non-relational models • Our exam will include dependency theory and exclude SQL (except integrity constraint enforcement), relational algebra, and the ER model
There is one major difference between the exam and your homework • Can’t ask easy look-in-the-book questions on homework. Examples: • What is the purpose of S2PL? • Why don’t people always put their schemas in BCNF (two reasons)? • Give an example of a SQL injection attack. • Why don’t people like to use a FORCE buffer management policy? • Exam has lots and lots of these • Currently 40 T/F and 20 short answer, plus 3 ordinary homework-style problems • May shorten it
What you need to know for the final Know several ways to enforce integrity constraints in SQL and forviews (including triggers, referential dependencies, etc.), and what happens if they are violated (but we won't write trigger code) know 3nf, bcnf, 4nf, how to get relations into them, why you would pickone over the other, what to aim for;how to determine whether something is a key, given some dependencies;how to use Armstrong's Axioms and compute a closure;know what lossless joins are and what "dependency preserving" means Know how to carry out a SQL injection attack and XSRF attack,know how to guard against them,know when you need to worry about them; know how data pages of a database are laid out and how to access,update, and delete a tuple on them; know that the wait time for a disk page read is seek plus latency plustransfer, and their relative magnitudes; how to refer to a tuple (e.g., if you need to point to it from an index), and why it is done that way; Know the major types of indexes are (hashed, B-trees) how B-trees work, for insert/delete, and how big they will be for agiven number of records (dense or sparse);don't need to know the same thing for hash tables ; make reasonable decisions on whether to choose a B-tree or hashed(extensible or linear) index for an attribute, or no index; understand dense/sparse indexes and why you pick one over the other; Know the major ways to do joins (hash join, hybrid hash join, merge join, indexed join, and nested loop join) and when to pick one over the other Be able to come up with a good abstract query plan for a query with no more than 3 joins (plus some selects and projects and SELECT UNIQUE, possibly a simple GROUP BY),come up with a good physical plan for a query no more complex than that,figure out how much memory you need to carry out your physical plan,estimate the result sizes for each node in the query. know how to pick a join order; be able to make the pipeline versus materialize decision; Given a schedule, you need to be able to saywhether it is serializable and why,make it serializable if it is not,say whether it is two phase and why,make it two phase if it is not,say whether it is S2PL schedule and why,make it S2PL if it is not,say whether there can be a deadlock, and how,say whether there can be cascading rollback, and how;figure out what will happen if there is a crash at each point in theschedule, for undo, redo, and undo/redo logging, know what ACID means and why it matters. know what lock modes are -- the most important ones and typical other ones;know about quiescent and nonquiescent checkpoints: why do we do them, what special properties do they have, how are they tricky;what the phantom problem is, when it can occur, and how to avoid it;what dirty reads are (both kinds);what to do about media failures;why we have page locks, tuple locks, and relation locks; know about steal/no steal and force/no force, and how that interactswith concurrency control choices.
The tail end of SQL: integrity constraints Know several ways to enforce integrity constraints in SQL and forviews (including triggers, referential dependencies, etc.) Know what happens if they are violated (but we won't write trigger code)
Dependency theory know 3nf, bcnf, 4nf, how to get relations into them, why you would pickone over the other, what to aim for; how to determine whether something is a key, given some dependencies; how to use Armstrong's Axioms and compute a closure; know what lossless joins are and what "dependency preserving" means
Security Know how to carry out a SQL injection attack and XSRF attack, know how to guard against them, know when you need to worry about them;
Data storage know how data pages of a database are laid out and how to access,update, and delete a tuple on them; know that the wait time for a disk page read is seek plus latency plustransfer, and their relative magnitudes; Know how to refer to a tuple (e.g., if you need to point to it from an index), and why it is done that way;
Indexing Know the major types of indexes are (hashed, B-trees) how B-trees work, for insert/delete, and how big they will be for agiven number of records (dense or sparse); Don't need to know the same thing for hash tables ; Be able to make reasonable decisions on whether to choose a B-tree or hashed (extensible or linear) index for an attribute, or no index; Understand dense/sparse indexes and why you pick one over the other;
How to do joins Know the major ways to do joins (hash join, hybrid hash join, merge join, indexed join, and nested loop join) and when to pick one over the other
Query optimization Be able to come up with a good abstract query plan for a query with no more than 3 joins (plus some selects and projects and SELECT UNIQUE, possibly a simple GROUP BY), Come up with a good physical plan for a query no more complex than that, Figure out how much memory you need to carry out your physical plan, Estimate the result sizes for each node in the query. Know how to pick a join order; Be able to make the pipeline versus materialize decision;
Concurrency control & recovery Given a schedule, you need to be able to say:whether it is serializable and why,make it serializable if it is not,say whether it is two phase and why,make it two phase if it is not,say whether it is a S2PL schedule and why,make it S2PL if it is not,say whether there can be a deadlock, and how,say whether there can be cascading rollback, and how;figure out what will happen if there is a crash at each point in the schedule, for undo, redo, and undo/redo logging,
More concurrency control Know what ACID means and why it matters. Knowwhat lock modes are -- the most important ones and typical other ones; Know about quiescent and nonquiescent checkpoints: why do we do them, what special properties do they have, how are they tricky; Know what the phantom problem is, when it can occur, and how to avoid it; Know what dirty reads are (both kinds); Know what to do about media failures; Know why we have page locks, tuple locks, and relation locks;
Interactions between concurrency control, buffer management, recovery, and query optimization Know about steal/no steal and force/no force, and how that interactswith concurrency control choices and query plans