Join CS5423 to delve into foundational models, relational database internals, and advanced topics like query optimization, data mining, and more. The course includes readings, presentations, projects, and a final exam.
CS5423 Principles of Database Systems: Introduction
Welcome to CS5423 • Course website: http://www.cs.okstate.edu/~eakbas/db5423.html • Time: Wednesday, 4:30-7:10pm • Venue: MSCS 310 • Please go over the syllabus carefully before taking the class!
Welcome to CS5423! • Instructor • Prof. Esra Akbas http://www.cs.okstate.edu/~akbas • Office hours: • Monday: 10:30am-12:00pm • Wednesday: 3:00-4:30pm • Or by appointment • Office: MSCS 216 • Email: eakbas@okstate.edu • Research interests: • Databases, data mining, information/social network and graph analysis • TA • Habib BoloorchiTabrizi • Office: MSCS 312 • Email: hboloor AT okstate DOT edu
The Goal of CS5423! • Reflection on the foundations: • the foundational models, representations, systems, and techniques for relational database systems, by way of readings and lectures • Projection on the outlook: • what are the next advanced database systems? • by way of reading and presenting the classics and the state of the art, and by way of doing projects!
The Contents of CS5423! • Relational Database Internals • Fundamentals for relational databases • Data storage and representation • Advanced indexing • Query processing and execution • Query optimization • …… • Advanced Database Topics • Parallel/Distributed databases (MapReduce) • Data mining (selected topics) • Data on the Web • ……
Welcome to CS5423! • Textbook • Database Systems: The Complete Book, 2nd edition • Hector Garcia-Molina, Jeff Ullman, and Jennifer Widom • Recommended reading • Fundamentals of Database Systems, 7th edition, by Elmasri and Navathe. ISBN: 9780133970777, 2016. • Database Management Systems, 3rd edition, by Raghu Ramakrishnan and Johannes Gehrke • Readings in Database Systems, 5th edition, by P. Bailis, J. Hellerstein, and M. Stonebraker (http://www.redbook.io) • The Web • Prerequisites • CS4433: Introduction to Database Systems • CS3423: Data Structures and Algorithms • Good programming skills
Welcome to CS5423! • Components of the course • One lecture every week • Two assignments (15%) • A series of papers to be read and summarized (15%) • A one- to two-page summary per paper • Paper presentation (10%) • Each student will present one paper related to the project in class for 20(?) minutes • Semester-long group project (35%), at most 2 students per group • Research-flavor • Implementation-flavor • Final exam (25%)
Paper Summaries • Milestone papers in database systems • Every paper will be assigned early on Canvas • A one- to two-page summary includes • What is the problem? • Why is this problem important, difficult, and worthy of a thorough study? • What are the innovative ideas and technical merits? • Comments on the experimental evaluations, if any • Any drawbacks and potential improvements? • Summarize based on your own understanding. Verbatim copying from the paper results in low scores • The contents of the papers may be tested in the final exam!
Paper Presentation • Every student will have a chance to select one paper to present in class • The paper should be closely related to the project you are conducting • The slides (pptx/ppt/pdf) should be sent to the instructor at least one day prior to the class in which you will be presenting • The organization of the slides should follow the requirements for the paper summary • 20(?)-minute presentation and 5-10 minutes of Q&A • Students will sign up for presentations in the near future
Project • Theme: choose either of the two • Research-flavor: • find an interesting, nontrivial data management problem and propose a novel and effective solution to it • Implementation-flavor: • find interesting methods/algorithms in a data management paper, implement them, and perform experimental studies • Teamwork: a group of one or two students (but no more!) • The project is partitioned into multiple milestones, each of which requires deliverables • Pay attention to the workload!
Multi-stage Project • Group formation (0%) • Project proposal (10%) • What do I want to do? • Literature survey (20%) • What is the state of the art? • Status report (10%) • What have I achieved thus far? • Presentation/demo, source code, software, and final report (60%) • Dude, these are deliverables!
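To make the arithmetic behind these percentages concrete, here is a minimal sketch of how the project stage weights and the course component weights could combine into one overall score. The weights come from the slides; the function name, dictionary keys, and sample scores are hypothetical, and the actual grading policy is whatever the instructor and syllabus specify.

```python
# Weights taken from the slides; all names and sample scores are illustrative.
COURSE_WEIGHTS = {
    "assignments": 0.15,
    "paper_summaries": 0.15,
    "paper_presentation": 0.10,
    "project": 0.35,
    "final_exam": 0.25,
}

PROJECT_WEIGHTS = {
    "proposal": 0.10,
    "literature_survey": 0.20,
    "status_report": 0.10,
    "final_deliverables": 0.60,  # presentation/demo, code, software, report
}


def weighted_score(scores, weights):
    """Return the weighted sum of 0-100 scores; every component must be present."""
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same components")
    return sum(scores[name] * weights[name] for name in weights)


if __name__ == "__main__":
    # Hypothetical student: first roll the project stages up into one score...
    project = weighted_score(
        {"proposal": 90, "literature_survey": 80,
         "status_report": 100, "final_deliverables": 85},
        PROJECT_WEIGHTS,
    )
    # ...then plug that score into the course-level breakdown.
    overall = weighted_score(
        {"assignments": 90, "paper_summaries": 80,
         "paper_presentation": 100, "project": project, "final_exam": 70},
        COURSE_WEIGHTS,
    )
    print(round(project, 1), round(overall, 1))
```

The final deliverables carry 60% of the project, and the project carries 35% of the course, so that single stage accounts for roughly a fifth of the overall score under this reading.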
Implementation Project • Topics: • Choose a research paper published in one of the following conferences/journals in or after 2008, implement it, and carry out experimental studies related to its idea • Conferences: SIGMOD, VLDB, ICDE, KDD, ICDM, WWW, CIKM, IEEE Big Data, ... • Journals: TODS, VLDB Journal, TKDD, ... • Workload (in C/C++, Java, Python) • Experimental studies on real/synthetic data • Expectation • Source code, software, detailed readmes and scripts, and a final report • Soundness, Repeatability, Completeness of datasets and experimental studies, Efficiency, Effectiveness, Scalability ……
Research Project • Topics: • A state-of-the-art data management or data mining problem in your research area • Workload • Problem definition, algorithm design and analysis, implementation, experimental studies • Your innovative ideas! • Expectation • A conference-quality (potentially publishable) paper • Source code, software, detailed readmes and scripts
Is This Course Suitable For Me? • Prerequisites • Introduction to database systems • Relational model, relational algebra, relational design, SQL, B/B+ tree, hashing, transaction management, crash recovery…… • Data structures and algorithms • Difference between a stack and a queue? • Worst-case complexity of insertion/deletion in red-black trees? • Dijkstra's algorithm for shortest-path computation • Set cover is NP-complete • ……. • Feeling comfortable with programming (a lot)
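As a quick self-check on the first data-structures question, the sketch below contrasts the two removal orders: a stack is last-in, first-out, while a queue is first-in, first-out. The function names are hypothetical and chosen only for illustration; the code uses a plain Python list as the stack and `collections.deque` as the queue.

```python
from collections import deque


def stack_order(items):
    """Push every item, then pop until empty: last in, first out (LIFO)."""
    stack = []
    for item in items:
        stack.append(item)           # push onto the top
    removed = []
    while stack:
        removed.append(stack.pop())  # pop from the top
    return removed


def queue_order(items):
    """Enqueue every item, then dequeue until empty: first in, first out (FIFO)."""
    queue = deque()
    for item in items:
        queue.append(item)              # enqueue at the back
    removed = []
    while queue:
        removed.append(queue.popleft())  # dequeue from the front
    return removed


if __name__ == "__main__":
    arrivals = ["a", "b", "c"]
    print(stack_order(arrivals))  # reverse of arrival order
    print(queue_order(arrivals))  # same as arrival order
```

If the difference in output order here is not immediately obvious, reviewing an undergraduate data structures course before the semester starts is probably worthwhile.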
Why Are You Taking this Course? • The relational DBMS was invented in the early 1970s and is now a mature industry worth 50+ billion • What are we still working on? • Database • http://www.youtube.com/watch?v=Q2GMtIuaNzU • Big data • http://www.youtube.com/watch?v=LrNlZ7-SMPk • Are you more interested in being • An IT guru at Goldman Sachs or Boeing? • A system developer at Oracle or Google? • A data scientist at Facebook or Uber? • A DB pro or researcher at Microsoft Research or IBM Research? • A professor exploring the most exciting and fastest-growing area in CS?
In Science – Turing Awardees The ACM A.M. Turing Award is an annual prize given by the Association for Computing Machinery (ACM) to an individual selected for contributions "of lasting and major technical importance to the computer field". • Charles Bachman, 1973 • Known for his work in the early development of database management systems. • Edgar Codd, 1981 • Invented the relational model (RM), the theoretical basis for relational databases and relational database management systems. • James Gray, 1998 • For seminal contributions to database and transaction processing research. • Michael Stonebraker, 2014 • His research and products are central to many relational database systems.
The Grand Challenges of Data Management • What is the ultimate advanced DB? • Data of all sorts, prevalent on the Web! • What have you been searching for lately? • Is what you search for what you want? • New challenges naturally arise • structured vs. unstructured data • querying vs. analysis vs. mining vs. learning • closed “base” vs. the open Web