310 likes | 549 Views
CS145: Intro to Database Management Systems. Lecture 1: Course Overview. “data is the future” – my cab driver in Pittsburgh. Outline. Introduction Administrative stuff What is a database and why do we use it? Summary. Big Data Landscape… Infrastructure is Changing. New tech.
E N D
CS145: Intro to Database Management Systems Lecture 1: Course Overview
“data is the future” – my cab driver in Pittsburgh
Outline • Introduction • Administrative stuff • What is a database and why do we use it? • Summary
Big Data Landscape… Infrastructure is Changing New tech. Same Principles. http://www.bigdatalandscape.com/
Why should you study databases? Mercenary: make more $$$ • Startups need DB talent right away = low employee # • Massive industry… Intellectual: • Science: data poor to data rich • No idea how to handle the data! • Fundamental ideas to/from all of CS: • systems, theory, AI, logic, stats, analysis…. Many great computer systems ideas started in DB.
What this course is (and is not) Discuss fundamentals of data management • How to query databases, design databases, build applications with them. • Not how to be a DBA or how to tune Oracle 12g. We won’t get to cover the principles of how to build database management systems. see 245, 345, and 346.
Who we are… Instructor (me) Chris Ré (sounds like Ray) • Faculty in the InfoLab • Research: theory of data processing, statistical analytics, and machine reading. • chrismre@cs.stanford.edu • Office hours: MW 10-11 in Gates 433
Course Assistants (CAs) ! Remember: CAs are people (students) too!
Joy Kim Angela Gong Sam Keller Kevin McKenzie Curran Kaushik VienDinh Duong Michael Fitzpatrick Firas Abuzaid Vishnu Sundaresan Raven Jiang Gina Pai Yifei Huang Patrick Harvey
Communication w/ Course Staff • Piazza, • Course mailing list, • Office hours, and • By appointment! All are (or will be soon) listed on the course page!
Course Logistics cs145.stanford.edu
Course Elements This class is semi-flipped: • Learn from your classmates! • Some classes are flipped, some are not… • The Red F is your guide! • Attendance (10%) Lectures or Videos per week • Videos and Slides.
Lectures Lecture slides cover essential material • You can (almost) always watch Jennifer instead! • Database Systems and Locking are new this time. Try to cover same thing in many ways: Lecture, lecture notes, homework, exams (no shock) • Attendance makes your life easier… • 8 lectures mandatory…must attend GUEST LECTURES!
Graded Elements Attendance (10%) – 8 Classes. Problem Sets & EdX Questions (20%) • You can retake EdX until you get a perfect score. Programming project (20%) • Auction base. Up now! midterm & final exam (20%/30% of grade) All but the final assignment are due on Monday before class.
What is expected from you • Attend lectures • If you don’t it’s at your own peril • Be active • Ask questions, post comments on forums • Do programming and homework projects • Start early and be honest. • Study for tests and exams.
What is a DBMS? • A large, integrated collection of data • Models a real-world enterprise • Entities (e.g., Students, Courses) • Relationships (e.g.,Alice is enrolled in 145) A Database Management System (DBMS) is a piece of software designed to store and manage databases
A Motivating, Running Example • Consider building a course management system (CMS): • students • courses • professors • who takes what • who teaches what Entities Relationships
Data models • A data model is a collection of concepts for describing data • A schema is a description of a particular collection of data, using the given data model • The relational model of data is the most widely used model today • Main Concept: relation: essentially, a table • Every relation has a schema describing types, etc.
“Relational databases form the bedrock of western civilization” – Bruce Lindsay, IBM Research
Modeling the CMS • Logical Schema • Students(sid: string, name: string, gpa: float) • Courses(cid: string, cname: string, credits: int) • Enrolled(sid: string, cid: string, grade: string) Relations Students Courses Enrolled
Other Schemata… • Physical Schema: describes data layout • Relations as unordered files • Some data in sorted order (index) • Logical Schema: Previous slide • External Schema: (Views) • Course_info(cid: string, enrollment: integer) • Derived from other tables for “authorized users” Administrators Applications
Data independence • Applications do not need to worry about how the data is structured and stored Logical data independence protection from changes in the logical structure of the data Physical data independence is protection from the physical layout changes NB: One of the most important reasons to use a DBMS
Challenges with Many Users CMS application serves 1000s+ of users • Security: Different users, different roles • Performance: Need to provide concurrent access • Consistency: Concurrency can lead to update problems • Disk/SSD access is slow, DBMS hide the latency by doing more CPU work concurrently DBMS allows user to write programs as if they were the only user.
Transactions • Key concept is a transaction: an atomic sequence of db actions (reads/writes) • Transactions leave the DB in a consistent state • Users may write integrity constraints, e.g., each course is assigned to exactly one room • But, DBMS does not understand the real semantics of the data – consistency burden is still on the user! Atomicity: An action either completes entirely or not at all
Scheduling concurrent transactions • DBMS ensures that execution of {T1, … Tn} is equivalent to some serial execution • Locking: Before reading or writing transaction reqs a lock from DBMS, holds until the end • Idea: If Ti writes an item x and Tjreads x then Ti, Tjconflict • only one winner gets the lock. • loser is blocked until winner finishes What if Ti asks for X before Tj and Tj asks for Y before Ti? Deadlock! One is aborted…
Ensuring Atomicity • DBMS ensures atomicity all-or-nothing property – even if a transaction crashes! Idea: Keep a log of all writes the DB does Write-ahead log (WAL): Before a change is made, the corresponding log entry is forced to disk Idea: After a crash, partially executed transactions are undone using the log NB: Thanks to WAL, if log entry not present – then its not applied to the DB
More details about the log • The following actions are in the log: • Ti writes an object: old value and new value • Ti commits/aborts • Log records chained by Xact ID so easy undo • Log is on “stable” storage All log maintenance and concurrency handled transparently by DBMS
Friends of Databases (people made happy) • End users and DBMS vendors • Reduces cost and makes money • DB application programmers • e.g., smart webmasters • Database administrators (DBA) • Designs logical/physical schema • Handles security/authorizatino • Tuning, crash recovery, and more… Must understand DB internals
Summary of DBMS • DBMS used to maintain, query, and manage large datasets. • Provides concurrency, recovery from crashes, quick application development, integrity, and security • Key abstractions give independence • DBMS R&D is one of the broadest, most exciting fields in CS. Fact!