COP5725 Advanced Database Systems. Final Review. Spring 2014. Final Exam. Time: Tuesday 04/29/2014, 10am to 12pm. Venue: LOV 103, in-class exam. Closed book/notes, but you may bring one cheat sheet (A4, double-sided). Plan your strategy well.
COP5725 Advanced Database Systems Final Review Spring 2014
Final Exam • Time: Tuesday 04/29/2014, 10am to 12pm • Venue: LOV 103, in-class exam • Closed book/notes, but you may bring one cheat sheet (A4, double-sided) • Plan your strategy well • No calculators or other electronic devices • Laptops, iPads, smartphones, etc. are prohibited • Any form of cheating on the examination will result in a grade of zero and will be reported to the university
Final Exam • Bring your FSU ID to attend the final exam • 30% of your final score • Coverage • All materials taught in class and in the textbook, from data storage and representation through LSH • Seven required reading papers
Format • One set of true/false questions with brief answers • e.g., The MapReduce model is a better model for large-scale data than parallel databases • Answer: False. Because …… • Short-answer questions • e.g., What is the nested-loop join? What is the complexity of this join algorithm? • Several more questions • e.g., Dynamic programming for optimal join order selection • 100 points • I believe you have enough time (120 minutes)
Suggested Method for Study • Go over the lecture slides and study the textbook • Reread the required reading papers • Work independently on problems in HW/lectures/exercises in the textbook • For any practice problem, work it out before looking at the solutions • Questions? • Office hours (me and TA) • Discuss with people in the class
Advanced DB Systems (architecture diagram; components shown) • Inputs: User/Web Forms/Applications/DBA issuing queries, transactions, and DDL commands • Query Parser, Query Rewriter, Query Optimizer, Query Executor • Transaction Manager, Concurrency Control, Logging & Recovery • DDL Processor • Records, Indexes, Lock Tables • Buffer Manager (main-memory buffer: data, indexes, log, etc.) • Storage Manager (storage: data, metadata, indexes, log, etc.)
And Many More Behind the Scenes • The next one is you!
Data Storage and Representation • Memory Hierarchy • Speed vs. Size vs. Cost • Disk • Latency = seek + rotation + transfer • I/O cost • Random I/O vs. Sequential I/O • Data Representation in RDB Systems • Database Addresses • Pointer swizzling • Record Modification • Row Store vs. Column Store
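As a rough illustration of the latency formula on this slide, the sketch below adds up seek, rotational, and transfer time and compares random with sequential reads. The function names and parameters are hypothetical. The model assumes an average rotational delay of half a revolution and that a sequential run pays the seek and rotation only once.

```python
def block_access_ms(seek_ms, rpm, transfer_mb_per_s, block_kb):
    """Latency of one block access = seek + rotation + transfer (in ms)."""
    rotation_ms = 0.5 * 60_000 / rpm  # assumption: half a revolution on average
    transfer_ms = block_kb / 1024 / transfer_mb_per_s * 1000
    return seek_ms + rotation_ms + transfer_ms


def random_read_ms(n_blocks, seek_ms, rpm, transfer_mb_per_s, block_kb):
    """Random I/O: every block pays seek + rotation + transfer."""
    return n_blocks * block_access_ms(seek_ms, rpm, transfer_mb_per_s, block_kb)


def sequential_read_ms(n_blocks, seek_ms, rpm, transfer_mb_per_s, block_kb):
    """Sequential I/O (simplified): one seek and rotation, then pure transfer."""
    rotation_ms = 0.5 * 60_000 / rpm
    transfer_ms = block_kb / 1024 / transfer_mb_per_s * 1000
    return seek_ms + rotation_ms + n_blocks * transfer_ms
```

For a made-up disk with a 10 ms seek, 6000 rpm, 100 MB/s transfer rate, and 1 MB blocks, one access takes about 25 ms; reading four blocks costs about 100 ms with random I/O versus about 55 ms sequentially.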
Indexing • What indexing is and the different types of indices • B/B+ Trees • Inverted Index and Boolean Queries • Query optimization • Multidimensional Indices and Queries • kd-tree • quad-tree • R-tree • Bitmap Index
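To make the inverted index and Boolean query bullets concrete, here is a minimal sketch. The helper names and the whitespace tokenizer are assumptions for illustration only. It reads "query optimization" here as processing the shortest postings lists first, which is one common heuristic and not necessarily the only one covered in class.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to a sorted postings list of document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():  # assumption: whitespace tokenizer
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def intersect(p1, p2):
    """Merge-style intersection of two sorted postings lists."""
    i, j, out = 0, 0, []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            out.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return out


def boolean_and(index, terms):
    """Answer t1 AND t2 AND ..., intersecting the shortest lists first."""
    postings = sorted((index.get(t, []) for t in terms), key=len)
    if not postings:
        return []
    result = postings[0]
    for p in postings[1:]:
        result = intersect(result, p)
    return result
```

Starting with the shortest list keeps every intermediate result no larger than that list, so later intersections have less to scan.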
Query Processing • Logical vs. Physical Operators • Iterator model • Materialization vs. pipelining • One-pass algorithms • Nested-loop join • …… • Two-pass algorithms • Sort based • Hash based • Index based algorithms
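The iterator model, pipelining, and the nested-loop join can be sketched with Python generators, as below. This is a toy in-memory illustration with hypothetical names, not how a real executor is written. The cost function assumes the block-based variant in which the outer relation is read in chunks of M-1 buffer blocks and the inner relation is scanned once per chunk.

```python
from math import ceil


def scan(table):
    """Leaf operator: produce one tuple at a time."""
    for row in table:
        yield row


def select(rows, pred):
    """Pipelined selection: passes tuples through without materializing."""
    for row in rows:
        if pred(row):
            yield row


def nested_loop_join(outer, inner, pred):
    """Tuple-based nested-loop join; `inner` must be re-iterable (e.g. a list)."""
    for r in outer:
        for s in inner:
            if pred(r, s):
                yield r + s


def block_nested_loop_cost(b_outer, b_inner, m):
    """Disk I/Os: read the outer once, scan the inner once per outer chunk."""
    chunks = ceil(b_outer / (m - 1))
    return b_outer + chunks * b_inner
```

Because each operator only pulls tuples from its child on demand, `nested_loop_join(select(scan(R), p), S, q)` never stores the selection result. With a 100-block outer relation, a 1000-block inner relation, and M = 11 buffers, the cost function gives 100 + 10 × 1000 = 10,100 I/Os.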
Query Optimization • Algebraic Laws • Rule Based Optimization • Heuristic rules for selection • Cost Based Optimization • Dynamic programming • Size Estimation
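A minimal sketch of dynamic programming for join order selection follows. It makes several simplifying assumptions that are not from the slides: only left-deep plans are considered, a plan's cost is the sum of its intermediate result sizes (the final result is excluded because it is the same for every order), and result sizes are estimated from hypothetical per-pair selectivities, with a missing pair treated as a cross product.

```python
from itertools import combinations
from math import prod


def subset_size(rels, card, sel):
    """Estimated size of joining `rels`: cardinalities times pair selectivities."""
    size = prod(card[r] for r in rels)
    for a, b in combinations(sorted(rels), 2):
        size *= sel.get((a, b), 1.0)  # no predicate: cross product
    return size


def best_left_deep_order(card, sel):
    """Return (cost, order) of the cheapest left-deep join order."""
    names = sorted(card)
    # best[subset] = (cost, join order) of the cheapest plan for that subset
    best = {frozenset([r]): (0.0, (r,)) for r in names}
    for k in range(2, len(names) + 1):
        for combo in combinations(names, k):
            subset = frozenset(combo)
            candidates = []
            for r in combo:  # r is the relation joined last
                rest = subset - {r}
                cost, order = best[rest]
                # a multi-relation `rest` is an intermediate result we pay for
                inter = subset_size(rest, card, sel) if len(rest) > 1 else 0.0
                candidates.append((cost + inter, order + (r,)))
            best[subset] = min(candidates)
    return best[frozenset(names)]
```

For example, with made-up cardinalities `{"R": 1000, "S": 100, "T": 10}` and selectivities `{("R", "S"): 0.01, ("S", "T"): 0.1}`, joining S and T first yields an intermediate result of about 100 tuples, so the sketch picks the order S, T, R. Selectivity keys must list the two relation names in sorted order.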
MapReduce • What is MapReduce • General ideas • Map • Reduce • Combiner: local aggregation for optimization • Distributed File Systems • RDB vs. MapReduce • Relational Algebra in MapReduce
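The map, reduce, and combiner ideas on this slide can be illustrated with a small single-process simulation of word count. The `run_job` driver is a hypothetical stand-in for a real framework; it only mimics the data flow (map per input split, optional local combine, shuffle by key, reduce) and counts how many key-value pairs would cross the network.

```python
from collections import defaultdict


def map_fn(_key, line):
    """Map: emit (word, 1) for every word in the line."""
    for word in line.split():
        yield word, 1


def reduce_fn(word, counts):
    """Reduce: sum the counts for one word (also usable as the combiner)."""
    yield word, sum(counts)


def run_job(splits, mapper, reducer, combiner=None):
    """Simulate one job; returns (result dict, number of pairs shuffled)."""
    shuffled = defaultdict(list)
    sent = 0
    for split in splits:  # one simulated map task per input split
        local = defaultdict(list)
        for key, value in split:
            for k, v in mapper(key, value):
                local[k].append(v)
        for k, values in local.items():
            # the combiner aggregates locally before anything is shuffled
            pairs = combiner(k, values) if combiner else [(k, v) for v in values]
            for k2, v2 in pairs:
                shuffled[k2].append(v2)
                sent += 1
    result = {}
    for k in sorted(shuffled):
        for k2, v2 in reducer(k, shuffled[k]):
            result[k2] = v2
    return result, sent
```

Reusing the reducer as the combiner works here because summation is associative and commutative. On the two splits `[(0, "a b a")]` and `[(1, "b a")]`, both runs return `{"a": 3, "b": 2}`, but the combiner cuts the shuffled pairs from 5 to 4.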
Data Mining • Data Mining and Knowledge Discovery from Data • Frequent Pattern Mining • Association rules • Closed patterns and maximal patterns • Apriori algorithms • Finding Similar Patterns • Shingles • Jaccard similarity and Minhashing • Locality sensitive hashing
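For the similarity topics at the end of this slide, the sketch below strings together character shingles, exact Jaccard similarity, minhash signatures, and a banding test in the style of locality sensitive hashing. All names and parameters are illustrative assumptions: shingles are hashed with CRC32, the hash family is (a·x + b) mod p with randomly drawn coefficients, and the shingle sets are assumed to be non-empty.

```python
import random
import zlib

PRIME = 4294967311  # a prime just above 2**32, the range of CRC32 values


def shingles(text, k):
    """Set of all length-k character substrings of `text`."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def jaccard(a, b):
    """Exact Jaccard similarity |A ∩ B| / |A ∪ B|."""
    return len(a & b) / len(a | b)


def minhash_signature(items, num_hashes=100, seed=0):
    """One minimum per hash function h(x) = (a*x + b) mod PRIME."""
    rng = random.Random(seed)  # same seed gives the same hash functions
    coeffs = [(rng.randrange(1, PRIME), rng.randrange(0, PRIME))
              for _ in range(num_hashes)]
    ids = [zlib.crc32(s.encode("utf-8")) for s in items]
    return [min((a * x + b) % PRIME for x in ids) for a, b in coeffs]


def estimate_similarity(sig1, sig2):
    """Fraction of positions where two signatures agree."""
    return sum(x == y for x, y in zip(sig1, sig2)) / len(sig1)


def lsh_candidate(sig1, sig2, bands):
    """Candidate pair if the signatures agree on every row of some band."""
    rows = len(sig1) // bands
    return any(sig1[i * rows:(i + 1) * rows] == sig2[i * rows:(i + 1) * rows]
               for i in range(bands))
```

The fraction of agreeing signature positions approximates the Jaccard similarity, with the estimate tightening as `num_hashes` grows. Both signatures must be built with the same seed so that they use the same hash functions.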