400 likes | 539 Views
C20.0046: Database Management Systems Lecture #1. Matthew P. Johnson Stern School of Business, NYU Spring, 2004. Personnel. Instructor: Matthew P. Johnson KMEC 8-176, mjohnson@stern.nyu.edu Office hours: KMC 8-176, Th 12:15-2:15
E N D
C20.0046: Database Management SystemsLecture #1 Matthew P. Johnson Stern School of Business, NYU Spring, 2004 M.P. Johnson, DBMS, Stern/NYU, Sp2004
Personnel • Instructor: Matthew P. Johnson • KMEC 8-176, mjohnson@stern.nyu.edu • Office hours: KMC 8-176, Th 12:15-2:15 • please visit! • Tutor/TF/grader: Matthew P. Johnson! M.P. Johnson, DBMS, Stern/NYU, Sp2004
Communications • Web page: http://www.columbia.edu/~mpj9/dbms • syllabus • course policies • may move in the future… • Blackboard web site • Some materials will be available here • Discussion board • send general-interest messages here to benefit all! • Go to http://sternclasses.nyu.edu • Click on C20.0046 M.P. Johnson, DBMS, Stern/NYU, Sp2004
Acknowledgements • Thanks to Ramesh, Ullman, et al., Raghu and Johannes, Dan Suciu, Arthur Keller, David Kuijt for course materials • See classpage for other related, antecedent DBMS courses M.P. Johnson, DBMS, Stern/NYU, Sp2004
What Is a Database? • A very large, integrated collection of data. • Models real-world enterprise. • Entities (e.g., students, courses, instructors, TAs) • Relationships (e.g., Joe is taking C20.0046) • George is currently taking C20.0046 • Dick is currently teaching C20.0046 • Condi is currently TA-ing C20.0046 but took it last semester • A Database Management System (DBMS)is a software package designed to store and manage databases. M.P. Johnson, DBMS, Stern/NYU, Sp2004
Databases are everywhere • Example: Ordering a pizza • Databases involved? • Pizza Hut’s DB • stores previous orders by customer • stores previous credit cards used • Credit card records • huge databases of (attempted) purchases • location, date, amount, parties • phone company’s records • Local Usage Details (“Pull his LUDs, Lenny.”) • Caller ID • ensures reported address matches destination M.P. Johnson, DBMS, Stern/NYU, Sp2004
Your wallet is full of DB records • Driver’s license • Credit cards • NYUCard • Medical insurance card • Social security card • Gym membership • Money (serial numbers) • Maybe even photos M.P. Johnson, DBMS, Stern/NYU, Sp2004
Databases are everywhere • Q: Websites backed by DBMSs? • retail: Amazon, etc. • data-mining: Page You Made • search engines: Google, etc. • directories: Internic, etc. • searchable DBs: IMDB, tvguide.com, etc. • Q: Non-web examples of DBMSs? • criminal/terrorist: TIA • airline bookings • NYPD’s CompStat • all serious crime stats by precinct • Retailers: Wal-Mart, etc. • when to re-order, purchase patterns, data-mining • Genomics! M.P. Johnson, DBMS, Stern/NYU, Sp2004
Example of a Traditional DB App Suppose we are building a system to store the information about: • checking accounts • savings accounts • account holders • state of each of each person’s accounts M.P. Johnson, DBMS, Stern/NYU, Sp2004
Can we do it without a DBMS? Sure we can! Start by storing the data in files: checking.txt savings.txt customers.txt Now write C or Java programs to implement specific tasks M.P. Johnson, DBMS, Stern/NYU, Sp2004
Doing it without a DBMS... • Transfer $100 from George’s savings to checking: Write a C program to do the following: • Read ‘savings.txt’ • Find&update the record “George” • balance -= 100 • Write ‘savings.txt’ • Read ‘checking.txt’ • Find&update the record “George” • balance += 100 • Write ‘checking.txt’ M.P. Johnson, DBMS, Stern/NYU, Sp2004
Problems without an DBMS... 1. System crashes: • Q: What is the problem ? • A: George lost his $100 • Same problem even if reordered 2. Simultaneous access by many users • George and Dick visit ATMs at same • Lock checking.txt before each use– what is the problem? • Read ‘savings.txt’ • Find&update the rec “George.” • Write ‘savings.txt’ • Read ‘checking.txt’ • Find&update the rec “George” • Write ‘checking.txt’ CRASH ! M.P. Johnson, DBMS, Stern/NYU, Sp2004
Problems without an DBMS... 3. Large data sets (say 50GB) • Why is this a problem? • No indices • Finding “George” in huge flatfile is expensive • Modifications intractable without better data structures • “George” “Georgie” is very expensive • Deletions are very expensive M.P. Johnson, DBMS, Stern/NYU, Sp2004
Problems without an DBMS... 5. Security? • File system may be insecure • File system security may be coarse 6. Application programming interface (API)? • suppose need other apps to access DB 7. How to interact with other DBMSs? M.P. Johnson, DBMS, Stern/NYU, Sp2004
General problems to solve • In building our own system, many Qs arise: • how do we store the data? (file organization, etc.) • how do we query the data? (write programs…) • make sure that updates don’t mess things up? • leave the DB “consistent” • provide different views on the data? • e.g., ATM user’s view v. bank teller’s view • how do we deal with crashes? • Too hard! Go buy a DBMS! • Q: How does a DBMS solve these problems? • A: See third part of course M.P. Johnson, DBMS, Stern/NYU, Sp2004
Big issue: Transaction processing • Grouping of several queries (or other database actions) into one transaction • ACID properties • Atomicity • all or nothing • Consistency • constraints on relationships • Isolation • concurrency control • Simulated solipsim • Durability • Crash recovery M.P. Johnson, DBMS, Stern/NYU, Sp2004
Atomicity & Durability • Saw how George lost $100 with makeshift DBMS • DBMS prevents this outcome • xacts are all or nothing • One idea: Keep a log (history) of all actions in set of xacts • Durability: Use log to redo or undo certain ops in crash recovery • Atomicity: don’t really commit changes until end • Then, all at once M.P. Johnson, DBMS, Stern/NYU, Sp2004
Isolation • Concurrent execution is essential for performance. • Frequent, slow disk accesses • don’t waste CPU – keep running • Interleaving actions of different user programs • can lead to inconsistency: • e.g., two programs simultaneously withdraw from the same account • DBMS ensures such problems don’t arise: • users can pretend they are using a single-user system. M.P. Johnson, DBMS, Stern/NYU, Sp2004
Isolation • Contrast with a file in two Notepads • Strategy: ignore multiple users • whichever saves last wins • first save is overwritten • Contrast with a file in two Words • Strategy: blunt isolation • One can edit • To the other it’s read-only M.P. Johnson, DBMS, Stern/NYU, Sp2004
Consistency • Each xant (on a consistent DB) must leave it in a consistent state • can define integrity constraints • checks the defined claims about the data remain true M.P. Johnson, DBMS, Stern/NYU, Sp2004
Data Models • Any DBMS uses a data model: collection of concepts for describing data • Schema: description of partic set of data, using some data model • Relational data model: most widely used (by far) data model • Oracle, DB2, SQLServer, other SQL DBMSs • main concept: relation ~ table of rows & columns • a rel’s schema defines its fields M.P. Johnson, DBMS, Stern/NYU, Sp2004
Example: university database • Conceptual schema: • Students(ssn: string, name: string, login: string, age: int, gpa: real) • Courses(cid: string, cname: string, credits: int) • Enrolled(sid:string, cid:string, grade: string) • Physical schema: • Relations stored as unordered text files. • Indices on first column of each rel • External Schema (View): • Course_info(ssn: string, name: string) • My_courses(cname: string, grade: string) M.P. Johnson, DBMS, Stern/NYU, Sp2004
How the programmer sees the DBMS • Start with DDL to create tables: • Continue with DML to populate tables: CREATE TABLE Students ( Name CHAR(30) SSN CHAR(9) PRIMARY KEY NOT NULL, Category CHAR(20) ) . . . INSERT INTO Students VALUES(‘Howard, ‘123456789’, ‘undergraduate’) . . . . M.P. Johnson, DBMS, Stern/NYU, Sp2004
How the programmer sees the DBMS • Tables: • Still implemented as files, but behind the scenes can be quite complex Takes: Students: Courses: “data independence” = separate logical view from physical implementation M.P. Johnson, DBMS, Stern/NYU, Sp2004
Querying: Structured Query Language • Find all the students who have taken C20.0046: • SELECT SSN FROM Takes WHERE CID=“C20.0046” • Find all the students who C20.0046 last fall: • SELECT SSN FROM Takes WHERE CID=“C20.0046” AND Semester=“Fall, 2003” • Find the students’ names: • SELECT Name FROM Students, Takes WHERE Students.SSN=Takes.SSN AND CID=“C20.0046” AND Semester=“Fall, 2003” • Query processor does this efficiently. M.P. Johnson, DBMS, Stern/NYU, Sp2004
Database Industry • Relational databases are a great success of theoretical ideas. • based on most “theoretical” type of math there is: set theory • DBMS companies are among the largest software companies in the world. • Oracle, IBM (with DB2), Microsoft (SQL Server, Microsoft Access), Sybase. • Also opensource: MySQL, Postgres, etc. • $20B+ industry. • XML (“semi-structured data”) also important • New lingua franca for exchanging data M.P. Johnson, DBMS, Stern/NYU, Sp2004
The Study of DBMS • Several aspects: • Modeling and design of databases • DBMS programming: querying and update • DBMS implementation • This course covers all three • though more time on first two • Also will look at some more advanced areas • XML, data-mining, LDAP? M.P. Johnson, DBMS, Stern/NYU, Sp2004
Databases are used by • DB app programmers • desktop app programmers • web developers • Database administrators (DBAs) • design schemas • security/authorization • crash recovery • tuning • better paid than programmers! • Everyone else (perhaps indirectly) M.P. Johnson, DBMS, Stern/NYU, Sp2004
Course outline • Database design: • Entity/Relationship models • Modeling constraints • The relational model: • Relational algebra • Transforming E/R models to relational schemas • SQL • Views and triggers M.P. Johnson, DBMS, Stern/NYU, Sp2004
Outline (Continued) • Connecting to a database from a programming language • Storage and indexing • Transactions • XML • Advanced topics • May change as course progresses • partly in response to audience M.P. Johnson, DBMS, Stern/NYU, Sp2004
Textbook • Database Systems: The Complete Book • Hector Garcia-Molina, Jeffrey D. Ullman, Jennifer D. Widom • 1st Edition (2001) • Available: • NYU bookstore • Amazon/BN (direct links on classpage) • Amazon.co.uk (total is/was about $20 less) • First two chapters in PDF on classpage M.P. Johnson, DBMS, Stern/NYU, Sp2004
SQL Readings • Optional reference: SQL in a Nutshell • Online (free) SQL tutorials include: • A Gentle Introduction to SQL (http://sqlzoo.net/) • SQL for Web Nerds (http://philip.greenspun.com/sql/) M.P. Johnson, DBMS, Stern/NYU, Sp2004
Grading • Prerequisites: Programming experience • presumably C/C++/Java • Work & Grading: • Homework 30%: O(4) • Project: 30% - see below. • Midterm (closed book/notes): 15% • Final (closed book/notes): 20% • Class participation: 5% • Stern Curve • Class attendance is required • Absences will affect your total grade M.P. Johnson, DBMS, Stern/NYU, Sp2004
The Project: design end-to-end DB app • data model • Identify entities (and fields), relationships • Identify resulting relations (tables) • creation of DB in Oracle • Insertion of real(alistic) data • (web) app for accessing/modifying data • Identification of “interesting” questions to ask • Production of DBMS interface • Work in pairs (start forming now) • Choose topic on your own: previous e.g.s online • Start forming your group today! M.P. Johnson, DBMS, Stern/NYU, Sp2004
Collaboration • Homework and exams done individually • Project done with your team members only • Non-cited use of others’ problem solutions, code, etc. = plagiarism • See Stern’s stern academic honesty policy • Contact me if you’re at all unclear before a particular case • Cite any materials used if you’re at all unclear after a particular case M.P. Johnson, DBMS, Stern/NYU, Sp2004
On-going Feedback • Don’t wait until the end-of-semester course evals to complain or give feedback on how to improve course. (It’s too late for you then!) • Come see me early on during my office hours • or send me email with your concerns • “We’re in touch, so you be in touch.” M.P. Johnson, DBMS, Stern/NYU, Sp2004
Summary • DBMS used to maintain, query large datasets. • Benefits include recovery from system crashes, concurrent access, data integrity, security, and quick application development. • Database skills are critical in financial services, marketing and other business areas! M.P. Johnson, DBMS, Stern/NYU, Sp2004
So what is this course about, really ? A bit of everything ! • Languages: SQL, XPath, XQuery • Data modeling • Some theory! • Functional dependencies, normal forms • e.g., how to find most efficient schema for data • Algorithms and data structures (in the third part) • e.g., indices make data much faster to find – but how? • Lots of implementation and hacking for the project • Business DBMS examples/cases • Most importantly: how to meet real-world needs M.P. Johnson, DBMS, Stern/NYU, Sp2004
For next time • Get the book • Read chapters 1, 2.1-2.2 M.P. Johnson, DBMS, Stern/NYU, Sp2004
For right now: written survey • name • previous cs/is/math/logic courses • previous programming experience • career plans: programmer, DBA, MBA, etc. • why taking class • any religious holidays during class M.P. Johnson, DBMS, Stern/NYU, Sp2004