2.27k likes | 5.49k Views
What is a database. A database is any organized collection of data. Some examples of databases you may encounter in your daily life are: a telephone book T.V. Guide airline reservation system motor vehicle registration records papers in your filing cabinet
E N D
What is a database A database is any organized collection of data.Some examples of databases you may encounter in your daily life are: • a telephone book • T.V. Guide • airline reservation system • motor vehicle registration records • papers in your filing cabinet • files on your computer hard drive.
Data vs. information: What is the difference? • What is data? • Data can be defined in many ways. Information science defines data as unprocessed information. • What is information? • Information is data that have been organized and communicated in a coherent and meaningful manner. • Data is converted into information, and information is converted into knowledge. • Knowledge; information evaluated and organized so that it can be used purposefully.
Why do we need a database? • Keep records of our: • Clients • Staff • Volunteers • To keep a record of activities and interventions; • Keep sales records; • Develop reports; • Perform research • Longitudinal tracking
What Is a Database System? • A database system is a software system which supports the definition and use of a database. • A Database Management System (DBMS)is a software system designed to store, manage, and facilitate access todatabases. • A database management system (DBMS) such as Access, FileMaker, Lotus Notes, Oracle or SQL Server which provides you with the software tools you need to organize that data in a flexible manner. It includes tools to add, modify or delete data from the database, ask questions (or queries) about the data stored in the database and produce reports summarizing selected contents
Database System Applications • DBMS contains information about a particular enterprise • Collection of interrelated data • Set of programs to access the data • An environment that is both convenient and efficient to use • Database Applications: • Banking: all transactions • Airlines: reservations, schedules • Universities: registration, grades • Sales: customers, products, purchases • Online retailers: order tracking, customized recommendations • Manufacturing: production, inventory, orders, supply chain • Human resources: employee records, salaries, tax deductions • Databases touch all aspects of our lives
What is the purpose of a database management system? Is to transform Data Information Knowledge Action
Purpose of Database Systems (Cont..) • In the early days, database applications were built directly on top of file systems • Drawbacks of using file systems to store data: • Data redundancy and inconsistency • Multiple file formats, duplication of information in different files • Difficulty in accessing data • Need to write a new program to carry out each new task • Data isolation — multiple files and formats • Integrity problems • Integrity constraints (e.g. account balance > 0) become “buried” in program code rather than being stated explicitly • Hard to add new constraints or change existing ones
Purpose of Database Systems (Cont.) • Drawbacks of using file systems (cont.) • Atomicity of updates • Failures may leave database in an inconsistent state with partial updates carried out • Example: Transfer of funds from one account to another should either complete or not happen at all • Concurrent access by multiple users • Concurrent accessed needed for performance • Uncontrolled concurrent accesses can lead to inconsistencies • Example: Two people reading a balance and updating it at the same time • Security problems • Hard to provide user access to some, but not all, data • Database systems offer solutions to all the above problems
Why Use a DBMS? • Data independence and efficient access. • Reduced application development time. • Data integrity and security. • Uniform data administration. • Concurrent access, recovery from crashes.
Types of Databases • Non-relational databases Non-relational databases place information in field categories that we create so that information is available for sorting and disseminating the way we need it. The data in a non-relational database, however, is limited to that program and cannot be extracted and applied to a number of other software programs, or other database files within a school or administrative system. The data can only be "copied and pasted.“ Example: a spread sheet • Relational databases In relational databases, fields can be used in a number of ways (and can be of variable length), provided that they are linked in tables. It is developed based on a database model that provides for logical connections among files (known as tables) by including identifying data from one table in another table
Query Optimization and Execution Relational Operators Files and Access Methods Buffer Management Disk Space Management DB These layers must consider concurrency control and recovery Structure of a DBMS • A typical DBMS has a layered architecture. • The figure does not show the concurrency control and recovery components. • Each database system has its own variations.
View of Data An architecture for a database system
Why Use Models? • Models can be useful when we want to examine or manage part of the real world • The costs of using a model are often considerably lower than the costs of using or experimenting with the real world itself • Examples: • airplane simulator • nuclear power plant simulator • flood warning system • model of US economy • model of a heat reservoir • map
Data Model A data model consists of notations for expressing: • data structures • integrity constraints • operations
FLIGHT-SCHEDULE DEPT-AIRPORT FLIGHT# AIRPORT-CODE FLIGHT# AIRLINE WEEKDAY PRICE 101 delta mo 156 101 atl 545 american we 110 912 cph 912 scandinavian fr 450 545 lax 242 usair mo 231 Data Model - Data Structures All data models have notation for defining: • attribute types • entity types • relationship types
FLIGHT-SCHEDULE DEPT-AIRPORT FLIGHT# AIRPORT-CODE FLIGHT# AIRLINE WEEKDAY PRICE 101 delta mo 156 101 atl 545 american we 110 912 cph 912 scandinavian fr 450 545 lax 242 usair mo 231 242 bos Data Model - Constraints Constraints express rules that cannot be expressed by the data structures alone: • Static constraints apply to database state • Dynamic constraints apply to change of database state • E.g., “All FLIGHT-SCHEDULE entities must have precisely one DEPT-AIRPORT relationship
FLIGHT-SCHEDULE DEPT-AIRPORT FLIGHT# AIRPORT-CODE FLIGHT# AIRLINE WEEKDAY PRICE 101 delta mo 156 101 atl 545 american we 110 912 cph 912 scandinavian fr 450 545 lax 242 usair mo 231 242 bos 97 delta tu 258 97 atl Data Model - Operations Operations support change and retrieval of data: • insert FLIGHT-SCHEDULE(97, delta, tu, 258); insert DEPT-AIRPORT(97, atl); • select FLIGHT#, WEEKDAY from FLIGHT-SCHEDULE where AIRLINE=‘delta’;
FLIGHT-SCHEDULE DEPT-AIRPORT FLIGHT# AIRLINE WEEKDAY PRICE FLIGHT# AIRPORT-CODE 101 delta mo 156 101 atl 545 american we 110 912 cph 912 scandinavian fr 450 545 lax 242 usair mo 231 242 bos Keys and Identifiers Keys (or identifiers) are uniqueness constraints • A key on FLIGHT# in FLIGHT-SCHEDULE will force all FLIGHT#’s to be unique in FLIGHT-SCHEDULE • Consider the following keys on DEPT-AIRPORT: FLIGHT# AIRPORT-CODE FLIGHT# AIRPORT-CODE FLIGHT# AIRPORT-CODE FLIGHT# AIRPORT-CODE
DEPT-AIRPORT FLIGHT-SCHEDULE FLIGHT# AIRPORT-CODE FLIGHT# AIRLINE WEEKDAY PRICE 101 atl 101 delta mo 156 912 cph 545 american we 110 545 lax 912 scandinavian fr 450 242 bos 242 usair mo 231 Integrity and Consistency • Integrity: does the model reflect reality well? • Consistency: is the model without internal conflicts? • a FLIGHT# in FLIGHT-SCHEDULE cannot be null because it models the existence of an entity in the real world • a FLIGHT# in DEPT-AIRPORT must exist in FLIGHT-SCHEDULE because it doesn’t make sense for a non-existing FLIGHT-SCHEDULE entity to have a DEPT-AIRPORT
Data Models • A collection of tools for describing • Data • Data relationships • Data semantics • Data constraints • Relational model • The relational model of datais the most widely used model today. • Main concept: relation, basically a table with rows and columns. • Every relation has a schema, which describes the columns, or fields. • Entity-Relationship data model (mainly for database design) No database system is based on the model • Object-based data models (Object-oriented and Object-relational) • Semi structured data model (XML) • Other older models: • Network model • Hierarchical model
Relational Model • Commercial systems include: ORACLE, DB2, SYBASE, INFORMIX, INGRES, SQL Server. • Dominates the database market on all platforms • Example of tabular data in the relational model Attributes
flight-schedule customer flight# customer# customer name p p reservation flight# date customer# Relational Model - Integrity Constraints • Keys • Primary Keys • Entity Integrity • Referential Integrity
Relational Model - Operations • Powerful set-oriented query languages • Relational Algebra: procedural; describes how to compute a query; operators like JOIN, SELECT, PROJECT • Relational Calculus: declarative; describes the desired result, e.g. SQL, QBE • insert, delete, and update capabilities
reservation customer customer-name customer# flight# date customer# _c _c .P .P LEO Relational Model - Operations • tuple calculus example (SQL) select flight#, date from reservation R, customer C where R.customer#=C.customer# and customer-name=‘LEO’; • algebra example (ISBL) ((reservation join customer) where customer-name=‘LEO’) [flight#, date];
Data Modeling DATABASE SYSTEM • REALITY • structures • processes MODEL data modeling • The model represents a perception of structures of reality • The data modeling process is to fix a perception of structures of reality and represent this perception • In the data modeling process we select aspects and we abstract
Process Modeling DATABASE SYSTEM process modeling • REALITY • structures • processes MODEL • The use of the model reflects processes of reality • Processes may be represented by programs with embedded database queries and updates • Processes may be represented by ad-hoc database queries and updates at run-time
Database Design The purpose of database design is to create a database which • is a model of structures of reality • supports queries and updates modeling processes of reality • runs efficiently
Instances and Schemas external schema1 external schema2 external schema3 • Instance – the actual content of the database at a particular point in time conceptual schema • external schema: • use of data • conceptual schema: • meaning of data • internal schema: • storage of data internal schema database
View 1 View 2 View 3 Conceptual Schema Physical Schema DB Example: University Database • Conceptual schema: • Students(sid: string, name: string, login: string, age: integer, gpa:real) • Courses(cid: string, cname:string, credits:integer) • Enrolled(sid:string, cid:string, grade:string) • External Schema (View): • Course_info(cid:string,enrollment:integer) • Physical schema: • Relations stored as unordered files. • Index on first column of Students.
View 1 View 2 View 3 Conceptual Schema Physical Schema DB Levels of Abstraction Users • Views describe how users see the data. Application programs hide details of data types. • Conceptual schema defines logical structure • Physical schema describes how a record (e.g., customer) is stored. Describes the files and indexes used.
View 1 View 2 View 3 Conceptual Schema Physical Schema DB Data Independence • Applications insulated from how data is structured and stored. • Logical data independence: Protection from changes in logical structure of data. • Physical data independence: Protection from changes in physical structure of data. The ability to modify the physical schema without changing the logical schema • Applications depend on the logical schema • In general, the interfaces between the various levels and components should be well defined so that changes in some parts do not seriously influence others.
Selecting a Database Management System Database management systems (or DBMSs) can be divided into two categories -- desktop databases and server databases. • Generally speaking, desktop databases are oriented toward single-user applications and reside on standard personal computers (hence the term desktop). • Server databases contain mechanisms to ensure the reliability and consistency of data and are geared toward multi-user applications.
Database Users Users are differentiated by the way they expect to interact with the system • Application programmers – interact with system through DML calls • Sophisticated users – form requests in a database query language • Specialized users – write specialized database applications that do not fit into the traditional data processing framework • Naïve users – invoke one of the permanent application programs that have been written previously • Examples, people accessing database over the web, bank tellers, clerical staff
Database Administrator • Coordinates all the activities of the database system • has a good understanding of the enterprise’s information resources and needs. • Database administrator's duties include: • Storage structure and access method definition • Schema and physical organization modification • Granting users authority to access the database • Backing up data • Monitoring performance and responding to changes • Database tuning
Functionality of a DBMS The programmer sees SQL, which has two components: • Data Definition Language - DDL • Data Manipulation Language - DML • query language Behind the scenes the DBMS has: • Query engine • Query optimizer • Storage management • Transaction Management (concurrency, recovery)
Data Definition Language (DDL) • Specification notation for defining the database schema Example: create tableaccount (account_numberchar(10), branch_name char(10), balanceinteger) • DDL compiler generates a set of tables stored in a data dictionary • Data dictionary contains metadata (i.e., data about data) • Database schema • Data storage and definition language • Specifies the storage structure and access methods used • Integrity constraints • Domain constraints • Referential integrity (e.g. branch_name must correspond to a valid branch in the branch table) • Authorization
Data Manipulation Language (DML) • Language for accessing and manipulating the data organized by the appropriate data model • DML also known as query language • Two classes of languages • Procedural – user specifies what data is required and how to get those data • Declarative (nonprocedural) – user specifies what data is required without specifying how to get those data • SQL is the most widely used query language
Transactions: ACID Properties • Key concept is a transaction: asequence of database actions (reads/writes). • DBMS ensures atomicity(all-or-nothing property) even if system crashes in the middle of a Xact. • Each transaction, executed completely, must take the DB between consistent states or must not run at all. • DBMS ensures that concurrent transactions appear to run in isolation. • DBMS ensures durability of committed Xacts even if system crashes. • Idea: Keep a log(history) of all actions carried out by the DBMS while executing a set of Xacts: • Before a change is made to the database, the corresponding log entry is forced to a safe location. • After a crash, the effects of partially executed transactions are undone using the log. Effects of committed transactions are redone using the log.
How the Programmer Sees the DBMS • Start with DDL to create tables: • Continue with DML to populate tables: CREATE TABLE Students ( Name CHAR(30) SSN CHAR(9) PRIMARY KEY NOT NULL, Category CHAR(20) ) . . . INSERT INTO Students VALUES(‘Charles’, ‘123456789’, ‘undergraduate’) . . . .
How the Programmer Sees the DBMS • Tables: Students: Takes: Courses: Still implemented as files, but behind the scenes can be quite complex “data independence” = separate logical view from physical implementation
Transactions • Enroll “Mary Johnson” in “CSE444”: BEGIN TRANSACTION; INSERT INTO Takes SELECT Students.SSN, Courses.CID FROM Students, Courses WHERE Students.name = ‘Mary Johnson’ and Courses.name = ‘CSE444’ -- More updates here.... IF everything-went-OK THEN COMMIT; ELSE ROLLBACK If system crashes, the transaction is still either committed or aborted
Advantages of a DBMS • Data independence • Efficient data access • Data integrity & security • Data administration • Concurrent access, crash recovery • Reduced application development time • So why not use them always? • Expensive/complicated to set up & maintain • This cost & complexity must be offset by need • General-purpose, not suited for special-purpose tasks (e.g. text search!)
Do not use a DBMS when Use a DBMS when this is important • the initial investment in hardware, software, and training is too high • the generality a DBMS provides is not needed • the overhead for security, concurrency control, and recovery is too high • data and applications are simple and stable • real-time requirements cannot be met by it • multiple user access is not needed • persistent storage of data • centralized control of data • control of redundancy • control of consistency and integrity • multiple user support • sharing of data • data documentation • data independence • control of access and security • backup and recovery