1 / 126

CSE 190D Database System Implementation

CSE 190D Database System Implementation. Spring 2019 Arun Kumar. What is this course about? Why take it?. 1. IBM’s Watson beats humans in Jeapordy!. How did Watson achieve that?. Watson devoured LOTS of data!. 2. “Structured” data with Google search results. How does Google know that?.

kimberly
Download Presentation

CSE 190D Database System Implementation

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. CSE 190DDatabase System Implementation Spring 2019 Arun Kumar

  2. What is this course about? Why take it?

  3. 1. IBM’s Watson beats humans in Jeapordy!

  4. How did Watson achieve that?

  5. Watson devoured LOTS of data!

  6. 2. “Structured” data with Google search results

  7. How does Google know that?

  8. Google also devours LOTS of data!

  9. 3. Amazon’s “spot-on” recommendations

  10. How does Amazon know that?

  11. You guessed it! LOTS and LOTS of data! Analysis

  12. And innumerable “traditional” applications

  13. Large-scale data management systems are the cornerstone of many digital applications, both modern and traditional

  14. The Age of “Big Data”/“Data Science”

  15. Data data everywhere, All the wallets did shrink! Data data everywhere, Nor any moment to think?

  16. CSE 190 will get you thinking about the fundamentals of database systems How are large structured datasets stored and organized? How are “queries” handled? How to make the system faster? Deeper and more recent issues

  17. And now for the (boring) logistics …

  18. Course Administrivia • Lectures: MWF 12:00-12:50pm, WLH 2212 • Attending ALL lectures is mandatory! • Instructor: Arun Kumar; arunkk@eng.ucsd.edu • Office hours: Wed 3-4pm, 3218 CSE/EBU3b • TA: Costas Zarifis • Piazza: TBD • CSE190D on TritonEd for project submissions • Bring iClicker to all lectures; register it on TritonEd! http://cseweb.ucsd.edu/classes/sp19/cse190-d/

  19. Prerequisites • CSE 132A or equivalent DB course necessary • Email the instructor with justification for waivers! • CSE 120 will be helpful but is not a prerequisite • C++ is needed for course projects; check course webpage for resources http://cseweb.ucsd.edu/classes/sp19/cse190-d/

  20. Grading • Midterm Exam:15% • Date: Wednesday, May 1, in-class • Final Exam:35%(cumulative) • Date: TBD, Room TBD • Project 1 (Buffer Manager): 15% • Project 2 (B+ Tree Index): 25% • Surprise Quizzes (in-class with iclicker): 10% http://cseweb.ucsd.edu/classes/sp19/cse190-d/

  21. Course Projects • BadgerDB: an RDBMS “skeleton”in C++ • Project 1: Buffer Manager • Implement “clock algorithm” for buffer replacement • Project 2: B+ Tree Index • Implement data structure and insert/update/delete ops • Teams of 2 or 1 only; no sharing of code across teams! The University has strict rules on plagiarism. • No late days http://cseweb.ucsd.edu/classes/sp19/cse190-d/

  22. Course Outline How are large structured datasets stored and organized? How are “queries” handled? How to make the system faster? Deeper and more recent issues Storage, file layout, and buffer management Indexing, sorting, relational operator implementation, and query processing Query optimization Transactions, Big Data, NoSQL, etc. http://cseweb.ucsd.edu/classes/sp19/cse190-d/

  23. The primary focus will be the relational data model and Relational Database Management Systems (RDBMS)

  24. Relational model in a nutshell Basically, Relation:Table :: Pilot:Driver (okay, a bit more) The model formalizes “operations” to manipulate relations

  25. Relational model in a nutshell Invented by E. F. Codd in 1970s at IBM Research in CA

  26. Relational DBMS in a nutshell A software system to implement the relational model, i.e., enable users to manage data stored as relations

  27. Relational DBMS in a nutshell First RDBMSs: System R (IBM) and Ingres (Berkeley) in 1970s A rare photo of the original System R manual Mike Stonebraker won the Turing Award in 2015!

  28. Relational DBMS in a nutshell RDBMS software is now a USD 20+ billions/year industry; many open source RDBMSs also exist People still start companies about what are basically RDBMSs!

  29. Course Textbook Prescribed: “Database Management Systems” 3rd Edition Raghu Ramakrishnan and Johannes Gehrke Aka The “Cow Book” Which cow are you? Optional: “Database Systems: The Complete Book” H.G. Molina, J.D. Ullman, and J. Widom

  30. Course Schedule

  31. General Dos and Do NOTs • Do: • Raise your hand before asking questions during lectures • Participate in class discussions and on Piazza • Bring iClicker to every lecture • Use “CSE190D:” as subject prefix for all emails to me/TA • Do NOT: • Use laptops, tablets, mobile phones, or any other electronic devices during lectures • Use email as primary communication mechanism for doubts/questions instead of Office Hours • Record or quote the instructor’s anecdotes out of class! ☺

  32. Questions?

  33. Recap: Relational Model

  34. Integers Real … Relational Model: Basic Terms What is a Relation? What are Attributes? What are Domains? What is Arity? Ratings A glorified table! These things The mathematical “domains” for the attributes Number of attributes

  35. Relational Model: Basic Terms What are Tuples? What is Cardinality? Ratings These things Number of tuples

  36. Referring to “tuples”: Two notations Without using attribute names (positional/sequence) Using attribute names (named/set) Ratings (R) t[1] = 3.5 t.NumStars = 3.5

  37. Relational Model: Basic Terms What is Schema? Ratings The relation name, and the name and logical descriptions of the attributes (including domains) Aka “metadata”

  38. Relational Model: Basic Terms What is an Instance? Ratings Instance 1 Instance 2 A given relation populated with a set of tuples (loose analogy: schema:instance::type:value in PL)

  39. Relational Model: Basic Terms Ratings What is a Relational Database? Users Movies A collection of relations; similarly, schema vs. instance

  40. “Write Operations” on a Relation • Insert • Add tuples to a relation • Delete • Remove tuples from a relation (typically based on “predicate” matches, e.g., “NumStars <= 4.5” • Modify • Logically, deletes + inserts, but typically implemented as in-place updates to a relation instance

  41. “Read Operations” on a Relation • “Select” • Select all tuples from Ratings with “UserID == 19” • “Project” • Select only Director attribute from Movies • “Aggregate” • Select Average of all NumStars in Ratings And a few more formal “algebraic” operations …

  42. Recap: Relational Algebra

  43. Basic Relational Operations Select Project Rename Cross Product (aka Cartesian Product) Set Operations: Union Set Difference

  44. Select Ratings (R) Example: Get all ratings with 4.0 or more stars Select “Operator” “Selection condition/predicate”

  45. Select R Schema preserved Subset of tuples (satisfying selection condition) R’

  46. Complex Selection Conditions R R’

  47. Basic Relational Operations Select Project Rename Cross Product (aka Cartesian Product) Set Operations: Union Set Difference

  48. Project Ratings (R) Example: Get all MovieIDs “Projection list”

  49. Project R R’ Schema reduced (to projection list) Tuple values “deduplicated” (slightly different semantics in SQL)

More Related