CS 430 Database Theory
Winter 2005
Lecture 1: Introduction
What’s a Database
• “Collection” of “related” “data”
• Contains data about some aspect of the “real world”
  • Refers to a Universe of Discourse (UoD)
• A “logically coherent” collection of data
• Has a “specific purpose”
Typical Characteristics of Databases (1 of 3)
• “Large”
  • Typically bigger than a spreadsheet
  • May be very large
  • Example: IRS tax return database
    • About 200M returns per year, 5-year retention
    • About 1K-10K bytes per return (guess)
    • About 1-10 terabytes (without overhead)
• Shared
  • More than a single user and a single application
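The IRS estimate above is simple arithmetic, and a short sketch makes the steps explicit. The constant names are illustrative, and treating a terabyte as 10^12 bytes is an assumption of this sketch rather than something the slide states.

```python
# Back-of-the-envelope size estimate for the IRS example on the slide.
RETURNS_PER_YEAR = 200_000_000   # about 200M returns per year (from the slide)
RETENTION_YEARS = 5              # 5-year retention (from the slide)
BYTES_PER_RETURN_LOW = 1_000     # 1K bytes per return (the slide's guess)
BYTES_PER_RETURN_HIGH = 10_000   # 10K bytes per return (the slide's guess)
TERABYTE = 10**12                # decimal terabyte; an assumption of this sketch


def estimate_terabytes(bytes_per_return: int) -> float:
    """Raw data size in terabytes, ignoring indexes and other overhead."""
    total_returns = RETURNS_PER_YEAR * RETENTION_YEARS
    return total_returns * bytes_per_return / TERABYTE


low_tb = estimate_terabytes(BYTES_PER_RETURN_LOW)
high_tb = estimate_terabytes(BYTES_PER_RETURN_HIGH)
print(f"Estimated raw size: {low_tb:g} to {high_tb:g} TB")
```

One billion retained returns at 1K to 10K bytes each gives the 1 to 10 terabyte range quoted on the slide.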
Typical Characteristics (2 of 3)
• Structured
  • More than a simple flat table
• Self-describing
  • Contains metadata (data about data) describing the data contained in the database
  • Metadata is maintained separately from the applications that use and manipulate the data
  • Has a catalog, which is a “database” of the metadata
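As a rough illustration of "self-describing", the sketch below asks a database for its own metadata. It uses SQLite through Python's built-in sqlite3 module only because that is self-contained; the course assignments use MySQL, whose catalog is queried differently. The `student` and `course` tables are hypothetical.

```python
import sqlite3

# In-memory SQLite database; a stand-in for the MySQL server used in the course.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("CREATE TABLE course (id INTEGER PRIMARY KEY, title TEXT NOT NULL)")

# sqlite_master is SQLite's catalog: a table that describes the other tables.
catalog_rows = conn.execute(
    "SELECT name, sql FROM sqlite_master WHERE type = 'table' ORDER BY name"
).fetchall()
table_names = [name for name, _sql in catalog_rows]

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk)
student_columns = [
    (row[1], row[2]) for row in conn.execute("PRAGMA table_info(student)")
]

print(table_names)       # names of the tables recorded in the catalog
print(student_columns)   # column names and declared types for student
conn.close()
```

The application never hard-codes the schema here; it reads the table and column definitions back from the database itself.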
Typical Characteristics (3 of 3)
• Supports multiple views of the data
  • Different users and applications can view the data differently
• ACID properties
  • Atomicity – Atomic transactions (updates are all or nothing)
  • Consistency – Enforces integrity constraints
  • Isolation – Transactions are isolated from each other
  • Durability – Data from completed transactions is never lost
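Atomicity and consistency can be seen in a small sketch. It again uses SQLite via Python's sqlite3 module as a stand-in for the course's MySQL; the `account` table, the `transfer` helper, and the balances are all hypothetical. A transfer that would violate the CHECK constraint is rolled back in full, including the credit that had already been applied.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"  # integrity constraint
)
with conn:
    conn.executemany(
        "INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 50)]
    )


def transfer(conn, source, target, amount):
    """Move amount from source to target; return True if the transaction committed."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, target),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return the current balances as a dict keyed by account name."""
    return dict(conn.execute("SELECT name, balance FROM account"))


first = transfer(conn, "alice", "bob", 30)    # within alice's balance
second = transfer(conn, "alice", "bob", 500)  # would make alice negative
final = balances(conn)
print(first, second, final)
```

In the second call the credit to bob succeeds before the debit to alice fails, yet bob's balance is unchanged afterwards: the update was all or nothing.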
A Little History of Databases (1 of 3)
• Mid to late 1960s – First databases
  • Applications
    • Maintain parts data for Lunar Lander
    • Airline reservations
  • Multiple data models
    • Hierarchical, Network, Inverted File System
• Early to mid 1970s – Relational data model
  • Edgar Codd – Father of the relational database
  • Basis for SQL (Structured Query Language)
History (2 of 3)
• 1979 – Oracle Version 2
  • Initial version (numbered 2 as a marketing decision)
  • Incomplete and slow
• Early to mid 1980s – IBM DB2 Version 1
  • Used to define the SQL standard
• Late 1980s – Object-oriented databases
  • Created to manage data for “non-traditional” applications
History (3 of 3)
• 1990s – Object-relational databases
  • Pioneered by Michael Stonebraker
• Today
  • Dominant technology: Relational DBMS (RDBMS)
    • Oracle, MS SQL Server, IBM DB2, …
    • MySQL, PostgreSQL, …
  • OO capabilities being added to RDBMS
  • New: Object-relational mapping software
    • Tries to handle the “impedance mismatch” between RDBMS and OO programming languages
Database Applications (1 of 2)
• Traditional
  • Business applications
    • Personnel, accounting, ...
    • Student and course data
  • Traditional data types
    • Numbers, strings, dates
  • Data warehousing
    • Large “historical” databases for analytic support
  • Manufacturing control
    • Real-time issues
Database Applications (2 of 2)
• Non-traditional
  • Image and video
  • GIS (Geographic Information Systems)
  • Engineering
    • CAD (Computer Aided Drafting or Design)
  • Time series
    • Stock market data
  • Full text search
  • Environmental and remote sensing
Database Management System (DBMS)
• Software that manages and/or facilitates
  • Data definition
    • E.g. creating and maintaining the catalog
  • Data construction
    • E.g. loading data into the database
  • Data manipulation
    • Applications retrieving and updating the database
  • Data sharing
    • ACID properties
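The first three functions listed above map onto familiar SQL statements. The sketch below is one possible illustration, using SQLite through Python's sqlite3 module in place of the MySQL server used for the assignments; the `part` table and its rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Data definition: describe the structure; the DBMS records it in its catalog.
conn.execute(
    "CREATE TABLE part ("
    " part_no INTEGER PRIMARY KEY,"
    " description TEXT NOT NULL,"
    " quantity INTEGER NOT NULL)"
)

# Data construction: load an initial set of rows into the database.
rows = [(1, "bolt", 500), (2, "nut", 750), (3, "washer", 1200)]
with conn:
    conn.executemany("INSERT INTO part VALUES (?, ?, ?)", rows)

# Data manipulation: an application updates and then retrieves data.
with conn:
    conn.execute(
        "UPDATE part SET quantity = quantity - 100 WHERE part_no = ?", (1,)
    )
low_stock = conn.execute(
    "SELECT description, quantity FROM part"
    " WHERE quantity < 800 ORDER BY part_no"
).fetchall()

print(low_stock)
conn.close()
```

Data sharing, the fourth function, is not shown: it needs several concurrent connections, which an in-memory SQLite database does not demonstrate well.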
DBMS In Context
[Figure: Users/Programmers send Application Programs and External Queries to the Database System. Within it, the DBMS Software (Query Processing and Application Program Interface, then Access/Update of Stored Data) sits on top of the Metadata Catalog and The Data.]
Source: Elmasri and Navathe, Figure 1.1, Page 6
Database People (Actors) (1 of 2)
• Data Administrator
  • Responsible for correctness of the data
• Database Administrator
  • Configures the DBMS, manages data storage, tunes DBMS performance
• Database Designer
  • Designs the database
• All three of these may be the same person or group of people
Database People (2 of 2)
• Application Analysts and Developers
  • Responsible for analyzing, designing, building, and maintaining database applications
• End Users
  • Use the database to accomplish useful work
Why use a DBMS? (1 of 2)
• Manage redundancy
  • If the same data is stored in multiple places and not periodically reconciled, the copies will eventually become inconsistent
• Access control
  • Not all users can view and/or update all the data
• Persistent storage of program data
  • Rather than having to implement your own DBMS internal to your application
Why use a DBMS? (2 of 2)
• Efficiency
  • DBMS vendors have done a lot of work to make their products work efficiently
  • Mixed blessing (see “Why not to use a DBMS?”)
• Enforce integrity constraints
  • Defined and enforced once
• Share data
  • Among multiple applications, GUIs, users
• ACID properties
  • Difficult to implement correctly
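"Defined and enforced once" can be sketched as follows: the constraints live in the schema, so every application that writes to the table is checked the same way. SQLite via Python's sqlite3 module stands in for the course's MySQL, and the `course` and `enrollment` tables, the `try_enroll` helper, and the sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves foreign keys off by default
conn.execute("CREATE TABLE course (code TEXT PRIMARY KEY)")
conn.execute(
    "CREATE TABLE enrollment ("
    " student TEXT NOT NULL,"
    " course_code TEXT NOT NULL REFERENCES course(code),"
    " grade INTEGER CHECK (grade BETWEEN 0 AND 100))"
)
with conn:
    conn.execute("INSERT INTO course VALUES ('CS430')")


def try_enroll(student, course_code, grade):
    """Attempt an insert; the DBMS, not this function, decides whether it is valid."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO enrollment VALUES (?, ?, ?)",
                (student, course_code, grade),
            )
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


results = [
    try_enroll("ann", "CS430", 95),   # satisfies every constraint
    try_enroll("bob", "CS999", 80),   # no such course: foreign key violation
    try_enroll("cy", "CS430", 150),   # grade out of range: CHECK violation
]
stored = conn.execute("SELECT COUNT(*) FROM enrollment").fetchone()[0]
print(results, stored)
```

The calling code contains no validation logic of its own, which is the point: a second application using the same tables would be held to the same rules without repeating them.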
Why not to use a DBMS?
• Learning curve
  • “It takes four years to learn to be an Oracle DBA”
• Overhead costs (time and space)
  • Generality
  • Concurrency and transactions
  • Multiple application and user access
  • Complex data structures
• Rule of thumb: Using an RDBMS doubles the space required for the data (e.g. versus a text file)
Course Administration
• Course web site
  • http://faculty.cs.wwu.edu/reedyc/CS_430_Winter_2005
• Email: Chris.Reedy@wwu.edu
• Textbook
  • Elmasri, Navathe, Fundamentals of Database Systems, Fourth Edition
• Assignments
  • Use MySQL
    • Most convenient form of access?
  • Get hands dirty:
    • Design a database
    • Create database and load the data
    • Write a database application
Course Outline (1 of 2)
• Introduction to Databases
  • Chapters 1 and 2
• Introduction to Data Modeling
  • Chapter 3 (partial)
• Relational Data Model, Algebra, and Calculus
  • Chapters 5 and 6
• Functional Dependencies and Normalization
  • Chapters 10 and 11 (partial)
Course Outline (2 of 2)
• SQL Database Programming
  • Chapters 8 and 9
• Entity-Relationship Modeling
  • More of chapters 3, 4, and 7
• Overview: What’s inside a DBMS?
  • CS530, Chapters 13-19
• Overview of additional topics
  • Object-Oriented and Object-Relational DBMSs
  • XML in Databases