CC5212-1 Procesamiento Masivo de Datos, Otoño 2014
Aidan Hogan
aidhog@gmail.com
Wrap-Up
Course Marking
• 45% for Weekly Labs (~3% a lab!)
• 35% for Final Exam
• 20% for Small Class Project
Final Exam (35%)
• Next Tuesday, 9am
• Room to be confirmed
• Goal: test your understanding of concepts
• Coding covered by labs/project
• No syntax-writing questions!
• But there will be design and syntax-reading questions
• Max. three hours (it won’t take that long!)
• Not marking you on English
• If really stuck, write in Spanish!
• Four questions (marked on best three) …
The following is not a legally binding agreement. It is just a helpful guide to what’s important.
Q1: Distributed Systems (Slides)
Slides:
• [02] MDP-02-Intro-Dist-Sys-20140317.pptx
• [03] MDP-03-Fallacies-CAP-20140324.pptx
• [04] MDP-04-Consensus-Paxos-20140331.pptx
e.g., [02, S. 3–7] = Slide deck [02], slides 3 to 7
Names per the homepage: http://aidanhogan.com/teaching/cc5212-1/
Slides indicated are only a guide!
Q1: Distributed Systems (Topics)
Possible Topics:
• Advantages/disadvantages of a distributed system [02, S. 3–7]
• Five distributed system design goals [02, S. 9–11]
• Distributed architectures (P2P vs. C–S, Fat/Thin, n-Tier, etc.) [02, S. 14–31]
• Java RMI (high-level) [02, S. 47–59]
• Eight fallacies of distributed computing [03, S. 12–20]
• Consensus basics (fail-stop vs. Byzantine, synchronous vs. asynchronous, goals) [04, S. 04–22]
• Consensus protocols (2PC, 3PC, Paxos) [04, S. 25–55]
• CAP theorem will appear in Q4
Q2: GFS (HDFS) / MapReduce (Hadoop)
Slides:
• [05] MDP-05-DFS-MapReduce-20140407.pptx
• [06] MDP-06-Hadoop-20140414.pptx
• [07] MDP-07-Pig-20140421.pptx
Q2: GFS (HDFS) / MapReduce (Hadoop)
Possible Topics:
• Google File System (reads, writes, fault-tolerance) [05, S. 11–27]
• MapReduce (incl. design question) [05, S. 36–46; 06, S. 11–16]
• HDFS/Hadoop (architecture) [06, S. 18–20]
• Pig (high-level, give result from input and script) [07]
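As a reminder of the shape a MapReduce design answer usually takes, here is a minimal single-process sketch of word count. It is not Hadoop code and is not taken from the slides: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and the "shuffle" is an in-memory dictionary standing in for what the framework would do across machines.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key (done by the framework in practice)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, counts):
    """Reduce: sum the partial counts collected for one word."""
    return word, sum(counts)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over an iterable of lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    # Keys are processed in sorted order, mirroring sorted reducer input.
    return dict(reduce_phase(word, counts) for word, counts in sorted(grouped.items()))
```

For a design question, the useful part is naming the key and value emitted by the map and stating what the reduce does with each group of values.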
Q3: Information Retrieval (Slides)
Slides:
• [08] MDP-08-Search-20140428.pptx
• [09] MDP-09-Ranking-20140505.pptx
Q3: Information Retrieval (Topics)
Possible Topics:
• Crawling (high-level multi-threading, (D)DoS, robots.txt, sitemap, distribution, bow-tie) [08, S. 18–32]
• Inverted indexes (data structure, normalisation, Heaps’ law, Zipf’s law, Elias encoding, etc.) [08, S. 36–51]
• Ranking (relevance vs. importance, TF-IDF, Vector Space Model, etc.) [09, S. 09–31]
• PageRank (concept, random surfer, calculation) [09, S. 35–56]
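For the PageRank calculation, the following is a minimal power-iteration sketch of the random-surfer model. It is an illustration under assumptions rather than the formulation from the slides: the damping factor of 0.85, the fixed iteration count, the treatment of pages without out-links (their rank is spread evenly over all pages), and the function name `pagerank` are all choices made here. Every link target is assumed to appear as a key in the graph.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Approximate PageRank for a dict mapping each page to its list of out-links."""
    nodes = list(graph)
    n = len(nodes)
    # Start from the uniform distribution: the surfer is equally likely anywhere.
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # With probability (1 - damping) the surfer jumps to a random page.
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            out_links = graph[v]
            if out_links:
                # Otherwise the surfer follows one of v's out-links uniformly.
                share = damping * rank[v] / len(out_links)
                for w in out_links:
                    new_rank[w] += share
            else:
                # Assumption: a page with no out-links spreads its rank over all pages.
                share = damping * rank[v] / n
                for w in nodes:
                    new_rank[w] += share
        rank = new_rank
    return rank
```

With the graph `{"a": ["b", "c"], "b": ["c"], "c": ["a"]}`, page `c` ends up ranked above `b`, because `c` receives rank from both `a` and `b` while `b` only receives half of `a`'s.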
Q4: NoSQL and Querying (Slides)
Slides:
• [03] MDP-03-Fallacies-CAP-20140324.pptx
• [10] MDP-10-Intro-to-NoSQL-20140512.pptx
• [11] MDP-11-BigTable+Cassandra.pptx
Q4: NoSQL and Querying (Topics)
Possible Topics:
• CAP theorem [03, S. 23–39] (note: out of order)
• The Database Landscape [10, S. 10]
• Key–Value stores (data model, operations, distribution, consistent hashing, replication, Dynamo, Merkle trees) [10, S. 18–38]
• Document stores (high-level) [10, S. 44–45]
• Tabular/column-families (data model, Bigtable, sorting, tablets, column families, SSTables, writes, reads, compactions, hierarchy, bloom filters) [11, S. 17–36]
• Graph databases (high-level) [11, S. 45–52]
• Cassandra (high-level) [11, S. 61–69]
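To make consistent hashing concrete, here is a minimal hash-ring sketch. It is illustrative only and not the scheme of any particular store: the class name `HashRing`, the use of MD5 to place points on the ring, and the default of 100 virtual points per node are assumptions made for this example.

```python
import bisect
import hashlib


def _hash(key):
    """Map a string to a large integer position on the ring (MD5 is an arbitrary choice)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical consistent-hash ring with virtual points per node."""

    def __init__(self, nodes=(), replicas=100):
        self.replicas = replicas
        self._ring = []  # sorted list of (position, node) pairs
        for node in nodes:
            self.add(node)

    def add(self, node):
        """Place `replicas` virtual points for the node on the ring."""
        for i in range(self.replicas):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove(self, node):
        """Drop all virtual points of the node; other points are untouched."""
        self._ring = [(pos, n) for pos, n in self._ring if n != node]

    def lookup(self, key):
        """Return the node owning the first point clockwise from the key's position."""
        if not self._ring:
            raise KeyError("ring is empty")
        i = bisect.bisect_right(self._ring, (_hash(key), "")) % len(self._ring)
        return self._ring[i][1]
```

The property worth remembering for the exam: when a node is added, a key either stays where it was or moves to the new node, so only a fraction of the keys is remapped.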
Final Exam (35%) Recap
• Next Tuesday, 9am
• Room to be confirmed
• Goal: test your understanding of concepts
• Coding covered by labs/project
• No syntax-writing questions!
• But there will be design and syntax-reading questions
• Max. three hours (it won’t take that long!)
• Not marking you on English
• If really stuck, write in Spanish!
• Four questions (marked on best three) …