1 / 39

C20.0046: Database Management Systems Lecture #1

C20.0046: Database Management Systems Lecture #1. M.P. Johnson Stern School of Business, NYU Spring, 2008. What Is a Database?. A large, integrated collection of data which models a real-world enterprise: Entities students, courses, instructors, TAs Relationships

mikko
Download Presentation

C20.0046: Database Management Systems Lecture #1

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. C20.0046: Database Management SystemsLecture #1 M.P. Johnson Stern School of Business, NYU Spring, 2008 M.P. Johnson, DBMS, Stern/NYU, Spring 2008

  2. What Is a Database? • A large, integrated collection of data • which models a real-world enterprise: • Entities • students, courses, instructors, TAs • Relationships • Hillary is currently taking C20.0046 • Barack is currently teaching C20.0046 • John is currently TA-ing C20.0046 but took it last semester • A Database Management System (DBMS)is a software package that stores and manages DBs M.P. Johnson, DBMS, Stern/NYU, Spring 2008

  3. Databases are everywhere: non-web • criminal/terrorist: TIA • NYPD’s CompStat • Tracking crime stats by precinct • airline bookings • Retailers: Wal-Mart, etc. • when to re-order, purchase patterns, data-mining • Genomics M.P. Johnson, DBMS, Stern/NYU, Spring 2008

  4. Databases are everywhere: web • retail: Amazon, etc. • data-mining: Page You Made • search engines • searchable DBs: IMDB, tvguide.com, etc. • Web2.0 sites: • flickr = images + tags • CMS systems (Wikis, blog & forum software, etc.) M.P. Johnson, DBMS, Stern/NYU, Spring 2008

  5. Databases involved in ordering a pizza? • Pizza Hut’s DB • Credit card records • CC  approval by credit agencies • phone company’s records • (“Pull his LUDs, Lenny.”) • Caller ID • Error-checking, anticrime M.P. Johnson, DBMS, Stern/NYU, Spring 2008

  6. Your wallet is full of DB records… • Driver’s license • Credit cards • NYUCard • Medical insurance card • Social security card • Money (serial numbers) • Photos (ids on back) • Etc… “You may not be interested in databases, but databases are interested in you.” - Trotsky M.P. Johnson, DBMS, Stern/NYU, Spring 2008

  7. Example of a Traditional DB App • Suppose we build a system • We store: • checking accounts • savings accounts • account holders • state of each of each person’s accounts M.P. Johnson, DBMS, Stern/NYU, Spring 2008

  8. Can we do without a DBMS? Sure! Start by storing the data in files: checking.txt savings.txt customers.txt Now write C or Java programs to implement specific tasks… M.P. Johnson, DBMS, Stern/NYU, Spring 2008

  9. Doing it without a DBMS... • Transfer $100 from George’s savings to checking: Write a C program to do the following: • Read savings.txt • Find&update the line w/“George” • balance -= 100 • Write savings.txt • Read checking.txt • Find&update the line w/“George” • balance += 100 • Write checking.txt M.P. Johnson, DBMS, Stern/NYU, Spring 2008

  10. Problems without an DBMS... 1. System crashes: • Same problem even if reordered • High-volume  (Rare  frequent) 2. Simultaneous access by many users • George and Dick visit ATMs at same time • Lock checking.txt before each use–what is the problem? • Read savings.txt • Find&update the line w/ “George.” • Write savings.txt • Read checking.txt • Find&update the line w/ “George” • Write checking.txt CRASH! M.P. Johnson, DBMS, Stern/NYU, Spring 2008

  11. Problems without a DBMS... 3. Large data sets (100s of GBs, or TBs, …) • No indices • Finding “George” in huge flatfile is expensive • Modifications intractable without better data structures • “George”  “Georgie” is very expensive • Deletions are very expensive M.P. Johnson, DBMS, Stern/NYU, Spring 2008

  12. Problems without an DBMS... 5. Security? • File system may lack security features • File system security may be coarse 6. Application programming interface (API)? • Interfaces, interoperability 7. How to query the data? M.P. Johnson, DBMS, Stern/NYU, Spring 2008

  13. In homebrew system, must support • failover/rovery • concurrent use • deal with large datasets? • security • interop? • querying in what? •  DBMS as application • Q: How does a DBMS solve these problems? • A: See third part of course, but for now… M.P. Johnson, DBMS, Stern/NYU, Spring 2008

  14. One big issue: Transaction processing • Grouping of several queries (or other DB operations) into one transaction • ACID test properties • Atomicity • all or nothing • Consistency • constraints on relationships • Isolation • concurrency control • simulated solipsism • Durability • Crash recovery M.P. Johnson, DBMS, Stern/NYU, Spring 2008

  15. Atomicity & Durability • Avoiding inconsistent state • A DBMS prevents this outcome • xacts are all or nothing • One simple idea: log progress of and plans for each xact • Durability: changes stay made (with log…) • Atomicity: entire xact is committed at once M.P. Johnson, DBMS, Stern/NYU, Spring 2008

  16. Isolation • Many users  concurrent execution • Disk access is slow (compared to CPU) •  don’t waste CPU – keep running • Interweaving actions of different user programs •  but can lead to inconsistency: • e.g., two programs simultaneously withdraw from the same account • For each user, should look like a single-user system • Simulated solipsism M.P. Johnson, DBMS, Stern/NYU, Spring 2008

  17. Isolation • Contrast with a file in two Notepads • Strategy: ignore multiple users • whichever saves last wins • first save is overwritten • Contrast with a file in two Words • Strategy: blunt isolation • One can edit • To the other it’s read-only M.P. Johnson, DBMS, Stern/NYU, Spring 2008

  18. Consistency • Each xact (on a consistent DB) must leave it in a consistent state • can define integrity constraints • checks that the defined claims about the data • Only xacts obeying them are allowed M.P. Johnson, DBMS, Stern/NYU, Spring 2008

  19. A level up: data models • Any DBMS uses a data model: collection of concepts for describing data • Relational data model: basically universal • Oracle, DB2, SQLServer, other SQL DBMSs • Relations: table of rows & columns • a rel’s schema defines its fields • Though some have OO extensions… M.P. Johnson, DBMS, Stern/NYU, Spring 2008

  20. Data Schemas • Schema: description of partic set of data, using some data model • “Physical schema” • Physical files on disk • Schema • Set of relations/tables, with structure • Views (“external schema”) • Virtual tables generated for user types M.P. Johnson, DBMS, Stern/NYU, Spring 2008

  21. Schema e.g.: college registrar • Schema: • Students(ssn: string, name: string, login: string, age: int, gpa: real) • Courses(cid: string, cname: string, credits: int) • Enrolled(sid:string, cid:string, grade: string) • Physical schema: • Relations stored as unordered text files. • Indices on first column of each rel • Views: • My_courses(cname: string, grade: string, credits: int) • Course_info(ssn: string, name: string, status: string) M.P. Johnson, DBMS, Stern/NYU, Spring 2008

  22. How the programmer sees the DBMS • Start with SQL DDL to create tables: • Continue with SQL to populate tables: CREATE TABLE Students ( Name CHAR(30) SSN CHAR(9) PRIMARY KEY NOT NULL, Category CHAR(20) ); INSERT INTO Students VALUES('Hillary', '123456789', 'undergraduate'); M.P. Johnson, DBMS, Stern/NYU, Spring 2008

  23. How the programmer sees the DBMS Takes: Students: • Ultimately files, but complex Courses: M.P. Johnson, DBMS, Stern/NYU, Spring 2008

  24. Querying: Structured Query Language • Find all the students who have taken C20.0046: • Find all the students who C20.0046 previously: • Find the students’ names: SELECT SSN FROM Takes WHERE CID='C20.0046'; SELECT SSN FROM Takes WHERE CID='C20.0046' AND Semester='Fall, 2005'; SELECT Name FROM Students, Takes WHERE Students.SSN=Takes.SSN AND CID='C20.0046' AND Semester='Fall, 2005'; M.P. Johnson, DBMS, Stern/NYU, Spring 2008

  25. Database Industry • Relational databases are based on set theory • Commercial DBMSs: Oracle, IBM’s DB2, Microsoft’s SQL Server, etc. • Opensource: MySQL, PostgreSQL, etc. • DBAs manage these • Programmers write apps (CRUD, etc.) • XML (“semi-structured data”) also important M.P. Johnson, DBMS, Stern/NYU, Spring 2008

  26. The Study of DBMS • Primary aspects: • Data modeling • SQL • DB programming • DBMS implementation • This course covers all four (tho less of #4) • Also will look at some more advanced areas • XML, websearch, column-oriented DBs, RAID, RegExs, MapReduce… M.P. Johnson, DBMS, Stern/NYU, Spring 2008

  27. Course outline • Database design: • Entity/Relationship models • Modeling constraints • The relational model: • Relational algebra • Transforming E/R models to relational schemas • SQL • DDL & query language M.P. Johnson, DBMS, Stern/NYU, Spring 2008

  28. Course outline • Programming for databases • Some DB implementation • Indexes, sorting, xacts • Advanced topics… • May change as course progresses • partly in response to audience • Also “current events” • Slashdot/whatever, Database Blog, etc. M.P. Johnson, DBMS, Stern/NYU, Spring 2008

  29. Textbook: Database Management Systems • by Raghu Ramakrishnan & Johannes Gehrke • 3 edition (August 14, 2002) • Available: • NYU bookstore • Amazon/BN (may be cheaper) • Amazon.co.uk (may be cheaper still) • Links on class page • Difficult but good M.P. Johnson, DBMS, Stern/NYU, Spring 2008

  30. SQL Readings • Many SQL references available online • Good online (free) SQL tutorials include: • A Gentle Introduction to SQL (http://sqlzoo.net/) • SQL for Web Nerds (http://philip.greenspun.com/sql/) M.P. Johnson, DBMS, Stern/NYU, Spring 2008

  31. Communications • M. P. Johnson • mjohnson@stern • Office hours: after class • To receive class mail: • Activate account: http://start.stern.nyu.edu • Forward mail: http://simon.stern.nyu.edu M.P. Johnson, DBMS, Stern/NYU, Spring 2008

  32. Communications • Web page: http://pages.stern.nyu.edu/~mjohnson/dbms/ • syllabus • course policies • antecedent courses • Blackboard web site • Some materials will be available here • Discussion board • send general-interest messages here to benefit all • Go to http://sternclasses.nyu.edu • Click on C20.0046 M.P. Johnson, DBMS, Stern/NYU, Spring 2008

  33. Grading • Prerequisites: • Light programming experience • A bit of “mathematical maturity” • Interest in IT/CS • Requirements & base score: • Homework 15%: O(3) • Project: 30% - see below… • Midterm (closed book/notes): 20% • Final (closed book/notes; likely 2hrs in class): 25% • Class participation/pop-quizzes: 10% • Stern Curve • Consistent class attendance is required • Absences will seriously affect your total grade. • Final score = base score – 2n-1 • where n = # missed quizzes (if n>0) M.P. Johnson, DBMS, Stern/NYU, Spring 2008

  34. The Project: design end-to-end DB web app • data model • Identify entities & their relationships •  relations • creation of DB in Oracle/MySQL • Population with real(alistic) data • web app for accessing/modifying data • Identification of “interesting” questions & actions • Produce DBMS interface • Work in pairs (/threes) • Choose topic on your own • Start forming your group today! M.P. Johnson, DBMS, Stern/NYU, Spring 2008

  35. Collaboration model • Homework and exams done individually • Project done with your team members only, though can in general use any tools • Non-cited use of others’ problem solutions, code, etc. = plagiarism • See Stern’s stern academic honesty policy • Contact me if you’re at all unclear before a particular case • Cite any materials used if you’re at all unclear after M.P. Johnson, DBMS, Stern/NYU, Spring 2008

  36. On-going Feedback • Don’t be afraid to ask questions • Some parts will be abstract/mathematical • Topic selection will be partly based on student interest M.P. Johnson, DBMS, Stern/NYU, Spring 2008

  37. So what is this course about, really? • Languages: SQL (some XML …) • Data modeling • Some theory! (rereading…) • Functional dependencies, normal forms • e.g., how to find most efficient schema for data • Some DBMS implementation (algs & data structs) • Algorithms and data structures (in the latter part) • e.g., indices make data much faster to find – how? • Lots of DB implementation and hacking for the project M.P. Johnson, DBMS, Stern/NYU, Spring 2008

  38. For next time • Get the book • Skim chapter 1 • Start reading chapter 2 M.P. Johnson, DBMS, Stern/NYU, Spring 2008

  39. For right now/tonight: email survey • Send to SOMEWHERE • name • previous cs/is/math/logic courses • previous programming experience • career plans: programmer, DBA, MBA, etc. • why taking class/what you’re interested in learning about M.P. Johnson, DBMS, Stern/NYU, Spring 2008

More Related