420 likes | 691 Views
DB and DBMS. Adapted from Silberschatz, Korth and Sudarshan. Summary. What is a Database (DB)? Data Models DB Schemas and Instances Database Management System (DBMS) DB Design Centralized DBMS Architecture DBMS Users and Administrators Advanced DBMS Architectures.
E N D
DB and DBMS Adapted from Silberschatz, Korth and Sudarshan
Summary • What is a Database (DB)? • Data Models • DB Schemas and Instances • Database Management System (DBMS) • DB Design • Centralized DBMS Architecture • DBMS Users and Administrators • Advanced DBMS Architectures
What is a Database (DB)? • A very large, integrated collection of data • Models real-world enterprise • Set of Entities (e.g., Student, Course) • Relationships (e.g., Student is Taking Course) • Associations (e.g., Madonna is Taking CS564)
(Logical) Data Models • A collection of tools, at a high abstraction level, for describing • Data manipulation • Data definition, data updating, data querying • Data relationships • Data constraints • Examples of data models • Entity-Relationship (ER) • Relational • Object-based (Object-oriented and Object-relational) • Semistructured (XML) • Other older models: • Network model • Hierarchical model
ER Model • Models an enterprise as a collection of entities and relationships • Entity: a “thing” or “object” in the enterprise that is distinguishable from other objects • Described by a set of attributes • Relationship: an association among several entities • Represented diagrammatically by an entity-relationship diagram:
ER Model (Cont.) • The ER model is static • It does not support operations and data querying • Other weakness • Atomic attributes
Relational Model Attributes • Example of tabular data in the relational model
Relational Model • Weaknesses • Atomic columns • Does not explicitly support relationships
Relational Data Manipulation Language (DML) • Language for defining, updating and querying the data organized as relational tables • SQL is the most widely used DML language • Data Definition Language (DDL) • Data Updating Language • Data Query Language • Declarative: user specifies what data is required without specifying how to get those data • Select … From … Where … paradigm • Procedural Language • User specifies how (control) to get data
SQL DDL • Specification notation for defining the database Example: create tableaccount(account-number primary keychar(13),balancereal) • Data constraints • Domain constraints (char, real) • Key integrity (primary key) • Referential integrity (references) • Assertions check(balance > 0.0) • Authorization
SQL DDL: Assertion Example • Every loan has at least one borrower who maintains an account with a minimum balance or $1000.00 create assertion balance_constraint check (not exists ( select * from loanwhere not exists ( select * from borrower, depositor, accountwhere loan.loan_number = borrower.loan_numberand borrower.customer_name = depositor.customer_nameand depositor.account_number = account.account_numberand account.balance >= 1000)))
SQL DDL: Authorization Forms of authorization on parts of the database: • Read - allows reading, but not modification of data. • Insert - allows insertion of new data, but not modification of existing data. • Update - allows modification, but not deletion of data. • Delete - allows deletion of data. Forms of authorization to modify the database schema (covered in Chapter 8): • Index - allows creation and deletion of indices. • Resources - allows creation of new relations. • Alteration - allows addition or deletion of attributes in a relation. • Drop - allows deletion of relations.
SQL Data Updating Language • Insert • insert Into account values (‘1951-150737-7’, 2500.00) • Update • update account set balance = 2600.00 where account-number = ‘1951-150737-7’ • Delete • delete from account where account-number = ‘1951-150737-7’
SQL Query Language • Examples • Find the name of the customer with customer-id 192-83-7465select customer.customer_namefrom customerwherecustomer.customer_id = ‘192-83-7465’ • Find the balances of all accounts held by the customer with customer-id 192-83-7465selectaccount.balancefromdepositor, accountwheredepositor.customer_id = ‘192-83-7465’ anddepositor.account_number = account.account_number • Application programs generally access databases through one of • Language extensions – host languages -- to allow embedded SQL • SQL procedural languages • Application program interface (e.g., ODBC/JDBC) which allow SQL queries to be sent to a database
SQL Procedural Languages • MS Transact SQL • Oracle PL/SQL • Standard?
Procedural Constructs • For loop • Permits iteration over all results of a query • Example: find total of all balances at the Perryridge branchdeclare bal real default 0.0;for r as select balance from accountwhere branch_name = ‘Perryridge’do set bal = bal + r.balanceend for
Object Model • ODMG standard • ODL: Object Definition Language • Persistent classes • Class attributes • Non-atomic attributes • Class relationships • Class methods • Class constraints • Class repositories • OQL: Object Query Language • Object querying • Declarative
Object-Relational (OR) Model • Extend the relational data model by including object orientation and constructs to deal with added data types • Allow attributes of tuples to have complex types, including non-atomic values such as nested relations • Preserve relational foundations, in particular the declarative access to data, while extending modeling power • Provide upward compatibility with existing relational languages
OR Model: Non-1NF Relation • Example: library information system • Each book has • title, • a set of authors, • Publisher, and • a set of keywords
OR Model: Structured Types and Inheritance • Structured types can be declared and used in SQL create type Name as(firstnamevarchar(20),lastname varchar(20))final create type Address as (street varchar(20),city varchar(20),zipcode varchar(20)) not final • Note: final and not final indicate whether subtypes can be created • Structured types can be used to create tables with composite attributes create table customer ( name Name, address Address, dateOfBirthdate) • Dot notation used to reference components: name.firstname
OR Model: Structured Types (cont.) create typeCustomerTypeas ( name Name, address Address, dateOfBirth date, <instance_methods)> ) not final • Can then create a table whose rows are a user-defined type create tablecustomerofCustomerType
OR Model: Methods • Can add a method declaration with a structured type methodageOnDate (onDatedate) returns integer • Method body is given separately createinstance methodageOnDate (onDatedate) returns integer forCustomerType begin returnonDate - self.dateOfBirth; end • We can now find the age of each customer: selectname.lastname, ageOnDate (current_date) fromcustomer
OR Model: Inheritance • Suppose that we have the following type definition for people create typePerson(name varchar(20),address varchar(20)) [not final] • Using inheritance to define the student and teacher types create typeStudentunder Person(degree varchar(20),department varchar(20))create type Teacherunder Person(salary integer,department varchar(20)) • Subtypes can redefine methods by using overriding method in place of method in the method declaration
OR Model: Array and Multiset Types • Example of array and multiset declaration: create type Publisher as(namevarchar(20),branch varchar(20))create type Book as (title varchar(20),author-array varchar(20) array [10],pub-date date,publisher Publisher,keyword-set varchar(20) multiset ) create table books ofBook
OR Model: Querying Collection-Valued Attributes • To find all books that have the word “database” as a keyword select titlefrom bookswhere ‘database’ in (unnest(keyword-set )) • We can access individual elements of an array by using indices • E.g.: If we know that a particular book has three authors, we could write select author-array[1], author-array[2], author-array[3]from bookswhere title = `Database System Concepts’
Comparison of O-O and O-R Databases • Relational systems • simple data types, powerful query languages, high protection • Persistent-programming-language-based OODBs • complex data types, integration with programming language, high performance • Object-relational systems • complex data types, powerful query and procedural languages, high protection • Performance? • It depends on the DBMS Administrator • Note: Many real systems blur these boundaries • E.g. persistent programming language built as a wrapper on a relational database offers first two benefits, but may have poor performance
XML: Extensible Markup Language • Defined by the WWW Consortium (W3C) • Originally intended as a document markup language not a database language • The ability to specify new tags, and to create nested tag structures made XML a great way to exchange data, not just documents • XML has become the basis for all new generation data interchange formats. • A wide variety of tools is available for parsing, browsing and querying XML documents/data • Example of XML DBMS • Tamino
Schemas and Instances • Schema – the structure of the database at a certain abstraction level • Types of schema • Logical schema: database design according to an underlying logical data model • Example of an ER schema: the database consists of information about an entity set customer and another entity set account and the relationship between them • Other types of schema: relational schema, object schema, OR schema • Relational and OR schemas: SQL DDL • Physical schema: database design at the physical level • Example: the set of customers is an indexed-sequential data file • Instance – the actual content of the database at a particular point in time • Physical Data Independence – the ability to modify the physical schema without changing the logical schema • A fundamental requirement for database management systems • Applications depend on the logical schema • In general, the interfaces between the various levels and components should be well defined so that changes in some parts do not seriously influence others
Database Management System (DBMS) • DBMS is a software that supports • Database • Set of programs to access the database • An environment that is both convenient and efficient to use • Benefits of using DBMS • No data redundancy • No duplication of information in different files • Make data access easy • No need to write a new program to carry out each new task • It becomes easy to guarantee integrity constraints • Integrity constraints (e.g. account balance > 0) become “buried” in program code rather than being stated explicitly • Soft to add new constraints or change existing ones
DBMS (Cont.) • Support for the Transaction concept • Atomicity of updates • Failures do not leave database in an inconsistent state since that partial updates are not carried out • Example: Transfer of funds from one account to another should either complete or not happen at all • Controlled concurrent access by multiple users • Uncontrolled concurrent accesses can lead to inconsistencies • Example: Two people reading a balance and updating it at the same time • Support for Security • Soft to provide user access control to all data • Access Efficiency • Query optimization
DBMS (Cont.) • Families of DBMSs • Relational DBMSs • MS SQL Server 2000 (?), Oracle 7 • Object-relational DBMSs • Oracle 8-10, IBM DB/2
Database Design The process of designing the general structure of the database • Conceptual Design – Deciding on the database schema at a higher abstraction level than relational or object-relational schemas • ER schema, Object schema • Logical Design – Deciding on the logical database schema implemented by DBMSs • R schema, OR schema • Physical Design – Deciding on the physical layout of the R(OR) database • Mappings • Conceptual schema logical schema • Case tools • Manual task • Logical schema physical schema • DBMS
Database Design (Cont.) OO Database Design R Database Design Object schema ER schema This mapping is a big issue R schema OR schema Physical OR schema Physical R schema
Query Processing 1. Parsing and translation 2. Optimization 3. Evaluation
Query Processing (Cont.) • Alternative ways of evaluating a given query • Equivalent expressions • Different algorithms for each operation • Cost difference between a good and a bad way of evaluating a query can be enormous • Need to estimate the cost of operations • Depends critically on statistical information about relations which the database must maintain • Need to estimate statistics for intermediate results to compute cost of complex expressions
Transaction Management • A transaction is a collection of operations that performs a single logical function in a database application • Transaction-management component ensures that the database remains in a consistent (correct) state despite system failures (e.g., power failures and operating system crashes) and transaction failures • Concurrency-control manager controls the interaction among the concurrent transactions, to ensure the consistency of the database • Recovery manager assures atomicity of updates
DBMS Users Users are differentiated by the way they expect to interact with the system • Application programmers – interact with system through DML calls • Sophisticated users – form requests in a database query language • Specialized users – write specialized database applications that do not fit into the traditional data processing framework • Naïve users – invoke one of the permanent application programs that have been written previously • Examples, people accessing database over the web, bank tellers, clerical staff
Database Administrator • Coordinates all the activities of the database system; the database administrator has a good understanding of the enterprise’s information resources and needs • Database administrator's duties include • Schema definition • Storage structure and access method definition • Schema and physical organization modification • Granting user authority to access the database • Specifying integrity constraints • Acting as liaison with users • Monitoring performance and responding to changes in requirements
Advanced DBMS Architectures • Client-server • Two-tier • Three-tier • Parallel (multi-processor) • Distributed • Data Grid