This course covers the fundamentals of database systems, including data modeling, database design, and data organization. Students will learn how to use a DBMS to store, manage, and retrieve data from databases. Topics include entity-relationship modeling, relational model, and database design.
About the course – Administrivia • Instructor: • George Kollios, gkollios@cs.bu.edu MCS 279, Wed 1:00-2:30 and Thu 1:30-3:00 • Home Page: • http://www.cs.bu.edu/fac/gkollios/db10 Check frequently! Syllabus, schedule, assignments, announcements…
Textbook Raghu Ramakrishnan and Johannes Gehrke, "Database Management Systems", McGraw-Hill, Third Edition. 2002.
Grading CS460 • Homeworks: 20% • Midterm: 20% • Final: 30% • Projects: 30% • Implement a Web application using a DBMS • Modify/improve an existing database system (PostgreSQL)
Grading CS660 • Homeworks: 15% • Midterm: 15% • Final: 30% • Projects: 30% • Participation in Seminars and Survey: 10%
What is a Database System? • Database: A very large collection of related data • Models a real world enterprise: • Entities, i.e. objects (e.g., teams, games / students, courses) • Relationships (e.g., The Celtics are playing in the Final!) • Even active components (e.g. “business logic”) • DBMS (Database Management System): A software package/system that can be used to store, manage and retrieve data from databases • Database System: DBMS + data (+ applications)
Why Study Databases?? • Shift from computation to information • Always true for corporate computing • More and more true in the scientific world • and of course, the Web • DBMS encompasses much of CS in a practical discipline • OS, languages, theory, AI, logic
Why Databases?? • Why not store everything on flat files: use the file system of the OS, cheap/simple… Name, Course, Grade John Smith, CS112, B Mike Stonebraker, CS234, A Jim Gray, CS560, A John Smith, CS560, B+ ………………… • Yes, but not scalable…
Problem 1 • Data redundancy and inconsistency • Multiple file formats, duplication of information in different files Name, Course, Email, Grade John Smith, js@cs.bu.edu, CS112, B Mike Stonebraker, ms@cs.bu.edu, CS234, A Jim Gray, CS560, jg@cs.bu.edu, A John Smith, CS560, js@cs.bu.edu, B+ Why is this a problem? • Wasted space • Potential inconsistencies (multiple formats, John Smith vs Smith J.)
Problem 2 • Data retrieval: • Find the students who took CS560 • Find the students with GPA > 3.5 For every query we need to write a program! • We need the retrieval to be: • Easy to write • Efficient to execute
Problem 3 • Data Integrity • No support for sharing: • Prevent simultaneous modifications • No coping mechanisms for system crashes • No means of preventing data entry errors (checks must be hard-coded in the programs) • Security problems • Database systems offer solutions to all the above problems
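To make "for every query we need to write a program" concrete, here is a minimal Python sketch of the hand-written program that one of the queries above needs when the data lives in a flat file. The file contents are the records from the flat-file slide; the function name `students_in_course` is an assumption made for illustration.

```python
import csv
import io

# Flat-file contents, copied from the example records on the earlier slide.
FLAT_FILE = """Name,Course,Grade
John Smith,CS112,B
Mike Stonebraker,CS234,A
Jim Gray,CS560,A
John Smith,CS560,B+
"""


def students_in_course(text, course):
    """Scan every record and keep the names of students enrolled in `course`."""
    reader = csv.DictReader(io.StringIO(text))
    return [row["Name"] for row in reader if row["Course"] == course]


print(students_in_course(FLAT_FILE, "CS560"))
```

A different question (for example, students with GPA > 3.5) would need another function, plus code to turn letter grades into numbers. That per-query effort is what a query language removes.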
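As a rough illustration of how a DBMS can take over data-entry checks that would otherwise be hard-coded in every program, the sketch below declares the rules once in the schema. It uses SQLite through Python's standard `sqlite3` module; the table name `enrollment`, the list of allowed grades, and the helper `try_insert` are assumptions, not part of the course material.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The rules live in the schema, not in each application program.
conn.execute(
    """CREATE TABLE enrollment (
           name   TEXT NOT NULL,
           course TEXT NOT NULL,
           grade  TEXT CHECK (grade IN ('A', 'A-', 'B+', 'B', 'B-', 'C', 'F'))
       )"""
)
conn.execute("INSERT INTO enrollment VALUES ('Jim Gray', 'CS560', 'A')")


def try_insert(name, course, grade):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO enrollment VALUES (?, ?, ?)", (name, course, grade)
        )
        return True
    except sqlite3.IntegrityError:
        return False


print(try_insert("John Smith", "CS560", "B+"))  # valid row
print(try_insert("John Smith", "CS112", "Z"))   # grade outside the allowed set
```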
Data Organization • Two levels of data modeling • Conceptual or Logical level: describes data stored in database, and the relationships among the data. type customer = record name : string; street : string; city : integer; end; • Physical level: describes how a record (e.g., customer) is stored. • Also, External (View) level: application programs hide details of data types. Views can also hide information (e.g., salary) for security purposes.
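The view level can be sketched with an SQL view that leaves out the salary column. This is only an illustration using SQLite via Python's `sqlite3` module; the `employee` table, its rows, and the view name `employee_public` are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Logical level: the full record, including the sensitive column.
    CREATE TABLE employee (name TEXT, department TEXT, salary INTEGER);
    INSERT INTO employee VALUES ('Alice', 'Sales', 50000);
    INSERT INTO employee VALUES ('Bob', 'Support', 45000);

    -- View level: applications query this and never see the salary.
    CREATE VIEW employee_public AS
        SELECT name, department FROM employee;
    """
)

cursor = conn.execute("SELECT * FROM employee_public ORDER BY name")
columns = [description[0] for description in cursor.description]
rows = cursor.fetchall()
print(columns)
print(rows)
```

How the rows are laid out on disk (the physical level) is not visible in either the table or the view definition.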
View of Data A logical architecture for a database system
Database Schema • Similar to types and variables in programming languages • Schema – the structure of the database • e.g., the database consists of information about a set of customers and accounts and the relationship between them • Analogous to type information of a variable in a program • Physical schema: database design at the physical level • Logical schema: database design at the logical level
Data Organization • Data Models: a framework for describing • data • data relationships • data semantics • data constraints • Entity-Relationship model • We will concentrate on Relational model • Other models: • object-oriented model • semi-structured data models, XML
Entity-Relationship Model Example of schema in the entity-relationship model
Entity Relationship Model (Cont.) • E-R model of real world • Entities (objects) • E.g. customers, accounts, bank branch • Relationships between entities • E.g. Account A-101 is held by customer Johnson • Relationship set depositor associates customers with accounts • Widely used for database design • Database design in E-R model usually converted to design in the relational model (coming up next) which is used for storage and processing
Relational Model • Example of tabular data in the relational model (the column headers are the attributes)
customer-id | customer-name | customer-street | customer-city | account-number
192-83-7465 | Johnson | Alma | Palo Alto | A-101
019-28-3746 | Smith | North | Rye | A-215
192-83-7465 | Johnson | Alma | Palo Alto | A-201
321-12-3123 | Jones | Main | Harrison | A-217
019-28-3746 | Smith | North | Rye | A-201
Data Organization • Data Storage Where can data be stored? • Main memory • Secondary memory (hard disks) • Optical storage (DVDs) • Tertiary storage (tapes) • Move data? Determined by buffer manager • Mapping data to files? Determined by file manager
Database Architecture (data organization) [Diagram components: DBA, DDL Commands, DDL Interpreter, Storage Manager (File Manager, Buffer Manager), Secondary Storage (Metadata, Schema, Data)]
Data retrieval • Queries: Query = declarative data retrieval; it describes what data to retrieve, not how to retrieve it • Ex. "Give me the students with GPA > 3.5" vs. "Scan the student file and retrieve the records with gpa > 3.5" • Why? • Easier to write • Efficient to execute (why?)
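The contrast between the two phrasings can be sketched side by side. The procedural version spells out the scan; the declarative version only states the condition and leaves the access method to the DBMS (here SQLite via Python's `sqlite3`). The student records and the `student` table layout are invented for illustration.

```python
import sqlite3

# Hypothetical student records: (sid, name, gpa).
students = [(101, "Ann", 3.9), (102, "Bob", 3.2), (103, "Eve", 3.7)]


def scan(records, threshold):
    """Procedural: say HOW, i.e. visit each record and test it."""
    result = []
    for _sid, name, gpa in records:
        if gpa > threshold:
            result.append(name)
    return result


# Declarative: say WHAT is wanted; the system picks how to find it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (sid INTEGER PRIMARY KEY, name TEXT, gpa REAL)")
conn.executemany("INSERT INTO student VALUES (?, ?, ?)", students)
declarative = [
    row[0]
    for row in conn.execute("SELECT name FROM student WHERE gpa > 3.5 ORDER BY sid")
]

print(scan(students, 3.5))
print(declarative)
```

Because the SQL text does not fix an algorithm, the system is free to use an index on `gpa` if one exists, which is one answer to the "why?" above.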
Data retrieval [Diagram: a Query enters the Query Processor, where the Query Optimizer produces a Plan for the Query Evaluator, which accesses the Data] • Query Optimizer • “compiler” for queries (aka “DML Compiler”) • Plan ~ Assembly Language Program • Optimizer does better with declarative queries: • 1. Algorithmic Query (e.g., in C) ⇒ 1 Plan to choose from • 2. Declarative Query (e.g., in SQL) ⇒ n Plans to choose from
SQL • SQL: widely used (declarative) non-procedural language • E.g. find the name of the customer with customer-id 192-83-7465: select customer.customer-name from customer where customer.customer-id = '192-83-7465' • E.g. find the balances of all accounts held by the customer with customer-id 192-83-7465: select account.balance from depositor, account where depositor.customer-id = '192-83-7465' and depositor.account-number = account.account-number • Procedural languages: C++, Java, relational algebra
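The two example queries can be tried out with SQLite through Python's `sqlite3` module. This is a sketch under a few assumptions: hyphens in the slide's identifiers are replaced by underscores (a hyphen is not valid in an unquoted SQL name), the customer and depositor rows follow the relational-model example, and the account balances are invented because the slides do not give them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (customer_id TEXT PRIMARY KEY, customer_name TEXT);
    CREATE TABLE account (account_number TEXT PRIMARY KEY, balance INTEGER);
    CREATE TABLE depositor (customer_id TEXT, account_number TEXT);

    INSERT INTO customer VALUES
        ('192-83-7465', 'Johnson'), ('019-28-3746', 'Smith'), ('321-12-3123', 'Jones');
    -- Balances are made-up values for illustration.
    INSERT INTO account VALUES
        ('A-101', 500), ('A-201', 900), ('A-215', 700), ('A-217', 750);
    INSERT INTO depositor VALUES
        ('192-83-7465', 'A-101'), ('192-83-7465', 'A-201'),
        ('019-28-3746', 'A-215'), ('019-28-3746', 'A-201'),
        ('321-12-3123', 'A-217');
    """
)

cid = "192-83-7465"

# Query 1: name of the customer with the given id.
name = conn.execute(
    "SELECT customer.customer_name FROM customer WHERE customer.customer_id = ?",
    (cid,),
).fetchone()[0]

# Query 2: balances of all accounts held by that customer (a join).
balances = sorted(
    row[0]
    for row in conn.execute(
        "SELECT account.balance FROM depositor, account "
        "WHERE depositor.customer_id = ? "
        "AND depositor.account_number = account.account_number",
        (cid,),
    )
)

print(name, balances)
```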
Data retrieval: Indexing • How can we quickly answer the query: “Find the student with SID = 101”? • One approach is to scan the student table, check every student, and return the one with id=101… very slow for large databases • Any better idea? 1st: Keep the student records sorted on SID and do a binary search… but updates are expensive. 2nd: Use a dynamic search tree!! It allows insertions, deletions and updates while keeping the records sorted. In databases we use the B+-tree (a multiway search tree). 3rd: Use a hash table. Much faster for exact match queries… but it cannot support range queries. (Also, special hashing schemes are needed for dynamic data)
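The three lookup strategies (full scan, binary search over sorted records, hash table) can be compared in a few lines of Python. This is an in-memory sketch only: the record contents are invented, and a real DBMS works on disk pages rather than Python lists and dicts.

```python
from bisect import bisect_left, bisect_right

# Hypothetical student records, kept sorted by SID.
records = [(sid, f"student{sid}") for sid in (100, 101, 110, 120, 130, 150)]
sids = [sid for sid, _ in records]


def linear_scan(sid):
    """Check every record: O(n) comparisons."""
    for record in records:
        if record[0] == sid:
            return record
    return None


def binary_search(sid):
    """Exploit the sort order: O(log n) comparisons."""
    i = bisect_left(sids, sid)
    if i < len(sids) and sids[i] == sid:
        return records[i]
    return None


# Hash index: expected O(1) for exact matches, but no ordering information.
hash_index = {record[0]: record for record in records}


def range_query(low, high):
    """Sorted data also answers range queries; a hash table cannot."""
    return records[bisect_left(sids, low):bisect_right(sids, high)]


print(linear_scan(101), binary_search(101), hash_index[101])
print(range_query(101, 120))
```

Inserting into the sorted list costs O(n) because later records must shift, which is the update problem that motivates a dynamic search tree.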
B+Tree Example (B=4) • Root: [100] • Internal nodes: [30] and [120, 150, 180] • Leaves: [3, 5, 11] [30, 35] [100, 101, 110] [120, 130] [150, 156, 179] [180, 200]
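A search over this example tree can be sketched as follows. The node arrangement is an assumption reconstructed from the key values on the slide (root [100]; internal nodes [30] and [120, 150, 180]; six leaves), and the sketch covers lookup only, with no insertion, deletion, or node splitting. The class name `Node` and function `search` are illustrative.

```python
from bisect import bisect_right


class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children  # None marks a leaf


# Assumed layout of the example tree.
root = Node(
    [100],
    [
        Node([30], [Node([3, 5, 11]), Node([30, 35])]),
        Node(
            [120, 150, 180],
            [
                Node([100, 101, 110]),
                Node([120, 130]),
                Node([150, 156, 179]),
                Node([180, 200]),
            ],
        ),
    ],
)


def search(node, key):
    """Return (found, path), where path lists the keys of each node visited."""
    path = []
    while node.children is not None:
        path.append(node.keys)
        # Keys equal to a separator are stored in the subtree to its right.
        node = node.children[bisect_right(node.keys, key)]
    path.append(node.keys)
    return key in node.keys, path


print(search(root, 101))
```

Every lookup visits one node per level, so the cost grows with the height of the tree rather than with the number of records.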
Database Architecture (data retrieval) [Diagram components: User, DB Programmer, DBA, Query, Code w/ embedded queries, DDL Commands, DML Precompiler, DDL Interpreter, Query Processor (Query Optimizer, Query Evaluator), Storage Manager (File Manager, Buffer Manager), Secondary Storage (Metadata, Schema, Statistics, Indices, Data)]
Data Integrity: Transaction processing • Why must concurrent access to data be managed? John and Jane withdraw $50 and $100 from a common account… Initial balance $300. Final balance=? It depends… • Jane: 1. get balance 2. if balance > $100 3. balance = balance - $100 4. update balance • John: 1. get balance 2. if balance > $50 3. balance = balance - $50 4. update balance
Data Integrity: Recovery • System crashes… Transfer $50 from account A ($100) to account B ($200) 1. Get balance for A 2. If balanceA > $50 3. balanceA = balanceA – 50 4. Update balanceA in database 5. Get balance for B 6. balanceB = balanceB + 50 7. Update balanceB in database • Recovery management
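Why "it depends" can be seen by replaying the two withdrawals under different step orders. The sketch below is a deterministic simulation, not real concurrency: "read" stands for step 1 and "update" for steps 2 to 4 of each person's procedure, and the function name `run` and the schedules are assumptions made for illustration.

```python
def run(schedule, balance=300):
    """Replay the steps in the given order; each user works on a private copy."""
    amounts = {"Jane": 100, "John": 50}
    local = {}
    for user, step in schedule:
        if step == "read":
            local[user] = balance  # step 1: get balance
        elif step == "update":
            if local[user] > amounts[user]:  # step 2: check the private copy
                balance = local[user] - amounts[user]  # steps 3-4: write back
    return balance


serial = [("Jane", "read"), ("Jane", "update"), ("John", "read"), ("John", "update")]
jane_lost = [("Jane", "read"), ("John", "read"), ("Jane", "update"), ("John", "update")]
john_lost = [("Jane", "read"), ("John", "read"), ("John", "update"), ("Jane", "update")]

print(run(serial))     # both withdrawals applied
print(run(jane_lost))  # John overwrites Jane's update
print(run(john_lost))  # Jane overwrites John's update
```

Only the serial order leaves $150; in the interleaved orders one withdrawal is silently lost, which is the anomaly a transaction manager is there to prevent.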
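The danger is a failure after the update to A but before the update to B. Wrapping all the steps in one transaction makes them all-or-nothing. The sketch below uses SQLite via Python's `sqlite3`, where `with conn:` commits on success and rolls back if an exception escapes; the `crash` flag only simulates a failure inside the transaction and stands in for, but is not the same as, the log-based recovery a DBMS performs after a real crash. The table name `account` and the function `transfer` are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER);
    INSERT INTO account VALUES ('A', 100), ('B', 200);
    """
)


def balances():
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))


def transfer(src, dst, amount, crash=False):
    """Move `amount` from src to dst as one atomic unit."""
    with conn:  # commit if the block finishes, roll back on an exception
        (src_balance,) = conn.execute(
            "SELECT balance FROM account WHERE id = ?", (src,)
        ).fetchone()
        if src_balance > amount:
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
            )
            if crash:
                raise RuntimeError("simulated failure between the two updates")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
            )


try:
    transfer("A", "B", 50, crash=True)
except RuntimeError:
    pass
after_crash = balances()  # the half-finished transfer was undone

transfer("A", "B", 50)
after_transfer = balances()
print(after_crash, after_transfer)
```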
Database Architecture [Diagram components: User, DB Programmer, DBA, Query, Code w/ embedded queries, DDL Commands, DML Precompiler, DDL Interpreter, Query Processor (Query Optimizer, Query Evaluator), Storage Manager (Transaction Manager, Recovery Manager, File Manager, Buffer Manager), Secondary Storage (Metadata, Schema, Integrity Constraints, Statistics, Indices, Data)]
Outline • 1st half of the course: application-oriented • How to develop database applications: User + DBA • 2nd part of the course: system-oriented • Learn the internals of a relational DBMS (developer for Oracle..)