400 likes | 602 Views
C20.0046: Database Management Systems Lecture #1. Matthew P. Johnson Stern School of Business, NYU Spring, 2005. Personnel. Instructor: Matthew P. Johnson mjohnson@stern Office hours: tba, 8-171, KMC please visit! Tutor/TF/grader: tba….
E N D
C20.0046: Database Management SystemsLecture #1 Matthew P. Johnson Stern School of Business, NYU Spring, 2005 M.P. Johnson, DBMS, Stern/NYU, Spring 2005
Personnel • Instructor: Matthew P. Johnson • mjohnson@stern • Office hours: tba, 8-171, KMC • please visit! • Tutor/TF/grader: tba… M.P. Johnson, DBMS, Stern/NYU, Spring 2005
Communications • Web page: http://pages.stern.nyu.edu/~mjohnson/dbms/ • syllabus • course policies • may move in the future… • Blackboard web site • Some materials will be available here • Discussion board • send general-interest messages here to benefit all! • Go to http://sternclasses.nyu.edu • Click on C20.0046 M.P. Johnson, DBMS, Stern/NYU, Spring 2005
Acknowledgements • Thanks to Ramesh, Ullman, et al., Raghu and Johannes, Dan Suciu, Arthur Keller, David Kuijt for course materials • See classpage for other related, antecedent DBMS courses M.P. Johnson, DBMS, Stern/NYU, Spring 2005
What Is a Database? • A very large, integrated collection of data. • Models real-world enterprise. • Entities • students, courses, instructors, TAs • Relationships • George is currently taking C20.0046 • Dick is currently teaching C20.0046 • Condi is currently TA-ing C20.0046 but took it last semester • A Database Management System (DBMS)is a software package designed to store and manage databases. M.P. Johnson, DBMS, Stern/NYU, Spring 2005
Databases are everywhere: Ordering a pizza • Databases involved? • Pizza Hut’s DB • stores previous orders by customer • stores previous credit cards used • Credit card records • huge databases of (attempted) purchases • location, date, amount, parties • Got approved by credit-report companies • phone company’s records • Local Usage Details (“Pull his LUDs, Lenny.”) • Caller ID • ensures reported address matches destination M.P. Johnson, DBMS, Stern/NYU, Spring 2005
Your wallet is full of DB records • Driver’s license • Credit cards • NYUCard • Medical insurance card • Social security card • Gym membership • Money (serial numbers) • Maybe even photos (ids on back) M.P. Johnson, DBMS, Stern/NYU, Spring 2005
Databases are everywhere • Q: Websites backed by DBMSs? • retail: Amazon, etc. • data-mining: Page You Made • search engines: Google, etc. • directories: Internic, etc. • searchable DBs: IMDB, tvguide.com, etc. • Q: Non-web examples of DBMSs? • criminal/terrorist: TIA • airline bookings • NYPD’s CompStat • all serious crime stats by precinct • Retailers: Wal-Mart, etc. • when to re-order, purchase patterns, data-mining • Genomics! M.P. Johnson, DBMS, Stern/NYU, Spring 2005
Example of a Traditional DB App Suppose we are building a system to store the information about: • checking accounts • savings accounts • account holders • state of each of each person’s accounts M.P. Johnson, DBMS, Stern/NYU, Spring 2005
Can we do it without a DBMS? Sure we can! Start by storing the data in files: checking.txt savings.txt customers.txt Now write C or Java programs to implement specific tasks M.P. Johnson, DBMS, Stern/NYU, Spring 2005
Doing it without a DBMS... • Transfer $100 from George’s savings to checking: Write a C program to do the following: • Read savings.txt • Find&update the record “George” • balance -= 100 • Write savings.txt • Read checking.txt • Find&update the record “George” • balance += 100 • Write checking.txt M.P. Johnson, DBMS, Stern/NYU, Spring 2005
Problems without an DBMS... 1. System crashes: • Q: What is the problem ? • A: George lost his $100 • Same problem even if reordered 2. Simultaneous access by many users • George and Dick visit ATMs at same • Lock checking.txt before each use–what is the problem? • Read savings.txt • Find&update the rec “George.” • Write savings.txt • Read checking.txt • Find&update the rec “George” • Write checking.txt CRASH ! M.P. Johnson, DBMS, Stern/NYU, Spring 2005
Problems without an DBMS... 3. Large data sets (say 500GB) • Why is this a problem? • No indices • Finding “George” in huge flatfile is expensive • Modifications intractable without better data structures • “George” “Georgie” is very expensive • Deletions are very expensive M.P. Johnson, DBMS, Stern/NYU, Spring 2005
Problems without an DBMS... 5. Security? • File system may be insecure • File system security may be coarse 6. Application programming interface (API)? • suppose need other apps to access DB 7. How to interact with other DBMSs? M.P. Johnson, DBMS, Stern/NYU, Spring 2005
General problems to solve • In building our own system, many Qs arise: • how do we store the data? (file organization, etc.) • how do we query the data? (write programs…) • make sure that updates don’t mess things up? • leave the DB “consistent” • provide different views on the data? • e.g., ATM user’s view v. bank teller’s view • how do we deal with crashes? • Too hard! Go buy a DBMS! • Q: How does a DBMS solve these problems? • A: See third part of course M.P. Johnson, DBMS, Stern/NYU, Spring 2005
Big issue: Transaction processing • Grouping of several queries (or other database actions) into one transaction • ACID properties • Atomicity • all or nothing • Consistency • constraints on relationships • Isolation • concurrency control • Simulated solipsism • Durability • Crash recovery M.P. Johnson, DBMS, Stern/NYU, Spring 2005
Atomicity & Durability • Saw how George lost $100 with makeshift software • A DBMS prevents this outcome • xacts are all or nothing • One idea: Keep a log (history) of all actions in set of xacts • Durability: Use log to redo or undo certain ops in crash recovery • Atomicity: don’t really commit changes until end • Then, all at once M.P. Johnson, DBMS, Stern/NYU, Spring 2005
Isolation • Concurrent execution is essential for performance. • Frequent, slow disk accesses • don’t waste CPU – keep running • Interleaving actions of different user programs • can lead to inconsistency: • e.g., two programs simultaneously withdraw from the same account • DBMS ensures such problems don’t arise: • users can pretend they are using a single-user system. M.P. Johnson, DBMS, Stern/NYU, Spring 2005
Isolation • Contrast with a file in two Notepads • Strategy: ignore multiple users • whichever saves last wins • first save is overwritten • Contrast with a file in two Words • Strategy: blunt isolation • One can edit • To the other it’s read-only M.P. Johnson, DBMS, Stern/NYU, Spring 2005
Consistency • Each xant (on a consistent DB) must leave it in a consistent state • can define integrity constraints • checks the defined claims about the data remain true M.P. Johnson, DBMS, Stern/NYU, Spring 2005
Data Models • Any DBMS uses a data model: collection of concepts for describing data • Schema: description of partic set of data, using some data model • Relational data model: most widely used (by far) data model • Oracle, DB2, SQLServer, other SQL DBMSs • main concept: relation ~ table of rows & columns • a rel’s schema defines its fields M.P. Johnson, DBMS, Stern/NYU, Spring 2005
Example: university database • Conceptual schema: • Students(ssn: string, name: string, login: string, age: int, gpa: real) • Courses(cid: string, cname: string, credits: int) • Enrolled(sid:string, cid:string, grade: string) • Physical schema: • Relations stored as unordered text files. • Indices on first column of each rel • External Schema (View): • Course_info(ssn: string, name: string) • My_courses(cname: string, grade: string) M.P. Johnson, DBMS, Stern/NYU, Spring 2005
How the programmer sees the DBMS • Start with DDL to create tables: • Continue with DML to populate tables: CREATE TABLE Students ( Name CHAR(30) SSN CHAR(9) PRIMARY KEY NOT NULL, Category CHAR(20) ) . . . INSERT INTO Students VALUES(‘Howard, ‘123456789’, ‘undergraduate’) . . . . M.P. Johnson, DBMS, Stern/NYU, Spring 2005
How the programmer sees the DBMS • Tables: • Still implemented as files, but behind the scenes can be quite complex Takes: Students: Courses: “data independence” = separate logical view from physical implementation M.P. Johnson, DBMS, Stern/NYU, Spring 2005
Querying: Structured Query Language • Find all the students who have taken C20.0046: • SELECT SSN FROM Takes WHERE CID=‘C20.0046’ • Find all the students who C20.0046 last fall: • SELECT SSN FROM Takes WHERE CID=‘C20.0046’ AND Semester=‘Fall, 2003’ • Find the students’ names: • SELECT Name FROM Students, Takes WHERE Students.SSN=Takes.SSN AND CID=‘C20.0046’ AND Semester=‘Fall, 2003’ • Query processor does this efficiently. M.P. Johnson, DBMS, Stern/NYU, Spring 2005
Database Industry • Relational databases are a great success of theoretical ideas. • based on most “theoretical” type of math there is: set theory • DBMS companies are among the largest software companies in the world. • Oracle, IBM (with DB2), Microsoft (SQL Server, Microsoft Access), etc. • Also opensource: MySQL, PostgreSQL, etc. • $20B+ industry. • XML (“semi-structured data”) also important • New lingua franca for exchanging data M.P. Johnson, DBMS, Stern/NYU, Spring 2005
The Study of DBMS • Several aspects: • Modeling and design of databases • DBMS programming: querying and update • DBMS implementation • This course covers all three • though more time on first two • Also will look at some more advanced areas • XML, data-mining, LDAP? M.P. Johnson, DBMS, Stern/NYU, Spring 2005
Databases are used by • DB app programmers • desktop app programmers • web developers • Database administrators (DBAs) • design schemas • security/authorization • crash recovery • tuning • better paid than programmers! • Everyone else (perhaps indirectly) • “You may not be interested in databases, but databases are interested in you.” - Trotsky M.P. Johnson, DBMS, Stern/NYU, Spring 2005
Course outline • Database design: • Entity/Relationship models • Modeling constraints • The relational model: • Relational algebra • Transforming E/R models to relational schemas • SQL • Views and triggers M.P. Johnson, DBMS, Stern/NYU, Spring 2005
Outline (Continued) • Connecting to a database from a programming language • Storage and indexing • Transactions • XML • Advanced topics • May change as course progresses • partly in response to audience M.P. Johnson, DBMS, Stern/NYU, Spring 2005
Textbook • Database Management Systems • by Raghu Ramakrishnan, Johannes Gehrke • 3 edition (August 14, 2002) Available: • NYU bookstore • Amazon/BN (may be cheaper) • Amazon.co.uk (may be cheaper still) • Links on class page M.P. Johnson, DBMS, Stern/NYU, Spring 2005
SQL Readings • Optional reference: SQL in a Nutshell • Online (free) SQL tutorials include: • A Gentle Introduction to SQL (http://sqlzoo.net/) • SQL for Web Nerds (http://philip.greenspun.com/sql/) M.P. Johnson, DBMS, Stern/NYU, Spring 2005
Grading • Prerequisites: Programming experience • presumably C/C++/Java • Work & Grading: • Homework 30%: O(4) • Project: 30% - see below. • Midterm (closed book/notes): 15% • Final (closed book/notes): 20% • Class participation: 5% • Stern Curve • Class attendance is required • Absences will seriously affect your total grade M.P. Johnson, DBMS, Stern/NYU, Spring 2005
The Project: design end-to-end DB app • data model • Identify entities (and fields), relationships • Identify resulting relations (tables) • creation of DB in Oracle/Mysql • Insertion of real(alistic) data • (web) app for accessing/modifying data • Identification of “interesting” questions to ask • Production of DBMS interface • Work in pairs/threes (start forming now) • Choose topic on your own: previous e.g.s online • Start forming your group today! M.P. Johnson, DBMS, Stern/NYU, Spring 2005
Collaboration • Homework and exams done individually • Project done with your team members only • Non-cited use of others’ problem solutions, code, etc. = plagiarism • See Stern’s stern academic honesty policy • Contact me if you’re at all unclear before a particular case • Cite any materials used if you’re at all unclear after a particular case M.P. Johnson, DBMS, Stern/NYU, Spring 2005
On-going Feedback • Don’t wait until the end-of-semester course evals to complain or give feedback on how to improve course. (It’s too late for you then!) • Come see me early on during my office hours • or send me email with your concerns • “We’re in touch, so you be in touch.” M.P. Johnson, DBMS, Stern/NYU, Spring 2005
Summary • DBMS used to maintain, query large datasets. • Benefits include recovery from system crashes, concurrent access, data integrity, security, and quick application development. • Database skills are critical in financial services, marketing and other business areas! M.P. Johnson, DBMS, Stern/NYU, Spring 2005
So what is this course about, really? A bit of everything! • Languages: SQL, XPath, XQuery • Data modeling • Some theory! • Functional dependencies, normal forms • e.g., how to find most efficient schema for data • Algorithms and data structures (in the third part) • e.g., indices make data much faster to find – but how? • Lots of implementation and hacking for the project • Business DBMS examples/cases • Most importantly: how to meet real-world needs M.P. Johnson, DBMS, Stern/NYU, Spring 2005
For next time • Get the book • Read chapters 1 & 2 M.P. Johnson, DBMS, Stern/NYU, Spring 2005
For right now: written survey • name • previous cs/is/math/logic courses • previous programming experience • career plans: programmer, DBA, MBA, etc. • why taking class M.P. Johnson, DBMS, Stern/NYU, Spring 2005