Chapter 1: Introduction and Basic concepts ( [S] chp. 1) • Purpose of Database Systems • View of Data • Data Models • Data Definition Language • Data Manipulation Language • Transaction Management • Storage Management • Database Administrator • Database Users • Overall System Structure
DATABASE DEFINITION • A database represents some aspect of the real world, sometimes called the mini-world or the Universe of Discourse (UoD). • A database is a logically coherent collection of data with some inherent meaning. • A random assortment of data cannot correctly be referred to as a database. • A database is designed, built, and populated with data for a specific purpose. It has an intended group of users and some preconceived applications in which these users are interested.
What Is a Database? • A very large, integrated collection of data. • Models a real-world enterprise. • Entities (e.g., students, courses) • Relationships (e.g., Madonna is taking CS564) • A Database Management System (DBMS) is a software package designed to store and manage databases.
Database Management System (DBMS) • Collection of interrelated data • Set of programs to access the data • DBMS provides an environment that is both convenient and efficient to use. • Database Applications: • Banking: all transactions • Airlines: reservations, schedules • Universities: registration, grades • Sales: customers, products, purchases • Manufacturing: production, inventory, orders, supply chain • Human resources: employee records, salaries, tax deductions • Databases touch all aspects of our lives
Purpose of Database System • In the early days, database applications were built on top of file systems • Drawbacks of using file systems to store data: • Data redundancy and inconsistency • Multiple file formats, duplication of information in different files • Difficulty in accessing data • Need to write a new program to carry out each new task • Data isolation — multiple files and formats • Integrity problems • Integrity constraints (e.g. account balance > 0) become part of program code • Hard to add new constraints or change existing ones
Purpose of Database Systems (Cont.) • Drawbacks of using file systems (cont.) • Atomicity of updates • Failures may leave the database in an inconsistent state with partial updates carried out • E.g. a transfer of funds from one account to another should either complete or not happen at all • Concurrent access by multiple users • Concurrent access is needed for performance • Uncontrolled concurrent accesses can lead to inconsistencies • E.g. two people reading a balance and updating it at the same time • Security problems • Database systems offer solutions to all the above problems
Why Use a DBMS? • Separation of the Data definition and the Program • Abstraction into a simple model • Data independence and efficient access. • Reduced application development time – ad-hoc queries • Data integrity and security. • Uniform data administration. • Concurrent access, recovery from crashes. • Support for multiple different views
Why Study Databases? • Shift from computation to information • at the "low end": scramble to webspace (a mess!) • at the "high end": scientific applications • Datasets increasing in diversity and volume. • Digital libraries, interactive video, Human Genome project, EOS project • ... need for DBMS exploding • DBMS encompasses most of CS • OS, languages, theory, "AI", multimedia, logic
Levels of Abstraction • Many views, a single conceptual (logical) schema and a physical schema. • Views describe how users see the data. • The conceptual schema defines the logical structure. Sometimes we distinguish between the conceptual level and the logical level. • The physical schema describes the files and indexes used. [Figure: View 1, View 2 and View 3 on top of the Conceptual Schema, which sits on top of the Physical Schema] • Schemas are defined using DDL (Data Definition Language) • Data is modified/queried using DML (Data Manipulation Language)
Levels of Abstraction • Physical level: describes how a record (e.g., customer) is stored. • Logical level: describes the data stored in the database, and the relationships among the data. type customer = record name : string; street : string; city : string; end; • View level: application programs hide details of data types. Views can also hide information (e.g., salary) for security purposes.
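The view level can be sketched with Python's built-in sqlite3 module: a view exposes only some of a table's columns, so a user who is given access only to the view never sees salary. The employee table, its columns, and its rows are hypothetical, chosen to mirror the slide's salary example.

```python
import sqlite3

# Hypothetical logical-level table in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, dept TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [("Johnson", "Sales", 50000), ("Smith", "Audit", 62000)],
)

# View level: this view omits the salary column.
conn.execute("CREATE VIEW employee_public AS SELECT name, dept FROM employee")


def visible_columns(relation):
    """Return the column names a user sees when selecting * from `relation`.

    `relation` is one of the fixed names above, never user input.
    """
    cursor = conn.execute(f"SELECT * FROM {relation}")
    return [column[0] for column in cursor.description]


print(visible_columns("employee"))         # ['name', 'dept', 'salary']
print(visible_columns("employee_public"))  # ['name', 'dept']
```

In a full DBMS, the security benefit comes from granting users privileges on the view but not on the underlying table; SQLite has no GRANT statement, so this sketch only shows the column hiding.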
Instances and Schemas • Similar to types and variables in programming languages • Schema – the logical structure of the database • e.g., the database consists of information about a set of customers and accounts and the relationship between them • Analogous to the type information of a variable in a program • Physical schema: database design at the physical level • Logical schema: database design at the logical level • Instance – the actual content of the database at a particular point in time • Analogous to the value of a variable
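The type/variable analogy can be made concrete in a few lines of Python: a frozen dataclass plays the role of the schema, and a set of its values plays the role of an instance. The Account fields, account numbers, and balances are hypothetical.

```python
from dataclasses import dataclass, replace


# Schema: the record type, fixed at design time (hypothetical fields).
@dataclass(frozen=True)
class Account:
    account_number: str
    balance: int


# Instance: the data stored at one point in time (hypothetical values).
instance_t0 = frozenset({Account("A-101", 500), Account("A-215", 700)})


def withdraw(instance, number, amount):
    """Return the next instance; the schema (the Account type) is unchanged."""
    return frozenset(
        replace(a, balance=a.balance - amount) if a.account_number == number else a
        for a in instance
    )


instance_t1 = withdraw(instance_t0, "A-101", 100)
print(sorted(instance_t1, key=lambda a: a.account_number))
```

Updates change the instance constantly, while the schema changes rarely and only through an explicit redesign.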
Data Models • A collection of modeling tools for describing • data • data relationships • data semantics • data constraints • Entity-Relationship model • Relational model • Other models: • object-oriented model • semi-structured data models (XML) • Older models: network model and hierarchical model
Entity-Relationship Model Example of schema in the entity-relationship model
Entity Relationship Model (Cont.) • E-R model of real world • Entities (objects) • E.g. customers, accounts, bank branch • Relationships between entities • E.g. Account A-101 is held by customer Johnson • Relationship set depositor associates customers with accounts • Widely used for database design • Database design in E-R model usually converted to design in the relational model (coming up later) which is used for storage and processing
Relational Model • Example of tabular data in the relational model (each column is an attribute):
customer-id   customer-name   customer-street   customer-city   account-number
192-83-7465   Johnson         Alma              Palo Alto       A-101
019-28-3746   Smith           North             Rye             A-215
192-83-7465   Johnson         Alma              Palo Alto       A-201
321-12-3123   Jones           Main              Harrison        A-217
019-28-3746   Smith           North             Rye             A-201
Physical (Storage) schema decisions • Mapping of entities to files (OS files) • Data representation and encoding (compression) • Access methods (Direct, Hashing, Indexed) • Which indexes to maintain • Clustering of records • OS/DBMS issues (buffer management)
External (View) schema decisions • Which entities to present/filter • Data representation and encoding (compression) • Programming language dependent issues • Changes to names, order of attributes • Derived (computed) fields and joined tables
Two views derived from the example database (*) Not relational…
Data Independence • Physical Data Independence – the ability to modify the physical schema without changing the application programs • Applications depend on the logical schema • The DBA may change the physical level (tuning) without affecting applications • The DBMS automatically makes the required adjustments, and application programs are not changed (queries may need to be recompiled and optimized…) • Logical Data Independence – the ability to modify the logical schema without changing the application programs • Applications depend on the logical schema via the views • Can be supported only on a limited basis (when the change does not affect the views the applications use)
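Physical data independence can be sketched with sqlite3: the DBA adds an index, which is a physical-level change, and the application's query text and results stay the same. Table, column, and index names and the balances are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_number TEXT, balance INTEGER)")
conn.executemany(
    "INSERT INTO account VALUES (?, ?)",
    [("A-101", 500), ("A-215", 700), ("A-217", 750)],  # hypothetical balances
)

# The application's query refers only to the logical schema (table and columns).
QUERY = ("SELECT account_number FROM account "
         "WHERE balance > 600 ORDER BY account_number")

before = conn.execute(QUERY).fetchall()

# Physical-level tuning by the DBA: add an index on balance.
conn.execute("CREATE INDEX account_balance_idx ON account (balance)")

after = conn.execute(QUERY).fetchall()
print(before == after)  # True: same answer, unchanged application code

# SQLite re-plans the query; the plan may now use the index
# (the exact plan text depends on the SQLite version).
for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY):
    print(row)
```

The recompilation mentioned on the slide happens inside SQLite here: a statement prepared before a schema change is re-prepared automatically.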
Data Definition Language (DDL) • Specification notation for defining the database schema • E.g. create table account (account-number char(10), balance integer) • DDL compiler generates a set of tables stored in a data dictionary • Data dictionary contains metadata (i.e., data about data) • database schema • Data storage and definition language • language in which the storage structure and access methods used by the database system are specified • Usually an extension of the data definition language
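Running the slide's DDL statement through sqlite3 shows the data dictionary in action: SQLite records each schema object in its sqlite_master catalog table, and PRAGMA table_info exposes the column metadata. The hyphenated names are written with underscores because unquoted SQL identifiers cannot contain hyphens.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The slide's DDL statement, with underscores instead of hyphens.
conn.execute("CREATE TABLE account (account_number CHAR(10), balance INTEGER)")

# SQLite's data dictionary: one row per schema object in sqlite_master.
for obj_type, name, sql in conn.execute("SELECT type, name, sql FROM sqlite_master"):
    print(obj_type, name, sql)

# Column-level metadata rows are (cid, name, declared type, notnull, default, pk).
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(account)")]
print(columns)  # [('account_number', 'CHAR(10)'), ('balance', 'INTEGER')]
```

The catalog layout shown is SQLite-specific; other systems expose similar metadata through their own system catalogs (for example the SQL-standard INFORMATION_SCHEMA views).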
Data Manipulation Language (DML) • Language for accessing and manipulating the data organized by the appropriate data model • A declarative DML is also known as a query language • Two classes of languages • Procedural – user specifies what data is required and how to get those data (DML) • Nonprocedural – user specifies what data is required without specifying how to get those data (Query language) • SQL is the most widely used query language
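The two classes can be contrasted in Python: the first function spells out the scan-and-filter loop (how), while the second hands SQLite a declarative query (what) and leaves the access path to the DBMS. The account rows and the threshold are hypothetical.

```python
import sqlite3

ROWS = [("A-101", 500), ("A-215", 700), ("A-102", 400)]  # hypothetical data


def accounts_over_procedural(records, threshold):
    """Procedural: say HOW to get the data (scan every record, test, collect)."""
    result = []
    for number, balance in records:
        if balance > threshold:
            result.append(number)
    result.sort()
    return result


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_number TEXT, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", ROWS)


def accounts_over_declarative(threshold):
    """Nonprocedural: say WHAT is wanted; the DBMS decides how to find it."""
    cursor = conn.execute(
        "SELECT account_number FROM account WHERE balance > ? ORDER BY account_number",
        (threshold,),
    )
    return [row[0] for row in cursor]


print(accounts_over_procedural(ROWS, 450))  # ['A-101', 'A-215']
print(accounts_over_declarative(450))       # ['A-101', 'A-215']
```

Both return the same answer; only the declarative version lets the DBMS switch to an index or another plan without the program changing.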
SQL • SQL: widely used non-procedural language • E.g. find the name of the customer with customer-id 192-83-7465: select customer.customer-name from customer where customer.customer-id = '192-83-7465' • E.g. find the balances of all accounts held by the customer with customer-id 192-83-7465: select account.balance from depositor, account where depositor.customer-id = '192-83-7465' and depositor.account-number = account.account-number • Application programs generally access databases through one of • Language extensions to allow embedded SQL • Application program interfaces (e.g. ODBC/JDBC), which allow SQL queries to be sent to a database
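Both example queries can be run from an application program through an API. The sketch below uses Python's sqlite3 module, which follows the DB-API 2.0 interface (PEP 249), as a stand-in for ODBC/JDBC. Customer and depositor rows come from the relational-model table earlier in this chapter; the account balances are hypothetical because the slides do not list them, and ORDER BY is added only to make the result order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id TEXT PRIMARY KEY, customer_name TEXT,
                           customer_street TEXT, customer_city TEXT);
    CREATE TABLE account (account_number TEXT PRIMARY KEY, balance INTEGER);
    CREATE TABLE depositor (customer_id TEXT, account_number TEXT);
""")
conn.executemany("INSERT INTO customer VALUES (?, ?, ?, ?)", [
    ("192-83-7465", "Johnson", "Alma", "Palo Alto"),
    ("019-28-3746", "Smith", "North", "Rye"),
    ("321-12-3123", "Jones", "Main", "Harrison"),
])
conn.executemany("INSERT INTO depositor VALUES (?, ?)", [
    ("192-83-7465", "A-101"), ("019-28-3746", "A-215"), ("192-83-7465", "A-201"),
    ("321-12-3123", "A-217"), ("019-28-3746", "A-201"),
])
# Balances are hypothetical; the slides do not list them.
conn.executemany("INSERT INTO account VALUES (?, ?)",
                 [("A-101", 500), ("A-215", 700), ("A-201", 900), ("A-217", 750)])

CUSTOMER_ID = "192-83-7465"

# Query 1: the customer's name (the id is passed as a bound parameter).
name = conn.execute(
    "SELECT customer.customer_name FROM customer WHERE customer.customer_id = ?",
    (CUSTOMER_ID,),
).fetchone()[0]

# Query 2: balances of all the customer's accounts (join depositor with account).
balances = [row[0] for row in conn.execute(
    "SELECT account.balance FROM depositor, account "
    "WHERE depositor.customer_id = ? "
    "AND depositor.account_number = account.account_number "
    "ORDER BY account.balance",
    (CUSTOMER_ID,),
)]

print(name, balances)  # Johnson [500, 900]
```

Passing the id as a bound parameter (`?`) rather than pasting it into the SQL string is the usual API practice; it avoids quoting errors and SQL injection.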
Database Users • Users are differentiated by the way they expect to interact with the system • Application programmers – interact with system through DML calls • Sophisticated users – form requests in a database query language • Specialized users – write specialized database applications that do not fit into the traditional data processing framework • Naïve users – invoke one of the permanent application programs that have been written previously • E.g. people accessing database over the web, bank tellers, clerical staff
Database Administrator • Coordinates all the activities of the database system; the database administrator has a good understanding of the enterprise’s information resources and needs. • Database administrator's duties include: • Schema definition • Storage structure and access method definition • Schema and physical organization modification • Granting user authority to access the database • Specifying integrity constraints • Acting as liaison with users • Monitoring performance and responding to changes in requirements
Structure of a DBMS • A typical DBMS has a layered architecture. [Figure: layers from top to bottom: Query Optimization and Execution, Relational Operators, Files and Access Methods, Buffer Management, Disk Space Management, DB. These layers must consider concurrency control and recovery.] • The figure does not show the concurrency control and recovery components. • This is one of several possible architectures; each system has its own variations.
THE TRANSACTION CONCEPT • Transfer money from account A to account B:
Begin Transaction
SUBTRACT 100 FROM A
ADD 100 TO B
CRASH!
End Transaction
• Abort, Commit, Rollback
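The Begin/End Transaction bracket maps onto SQL's BEGIN, COMMIT and ROLLBACK. This sketch opens the sqlite3 connection with isolation_level=None so that the module issues no implicit BEGIN and the only transaction boundaries are the ones in the code. The "crash" is a hypothetical exception raised between the two updates; a real crash would instead be undone by the DBMS during log-based recovery.

```python
import sqlite3

# isolation_level=None: no implicit transactions; BEGIN/COMMIT are explicit.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 500), ("B", 200)])


class SimulatedCrash(Exception):
    """Hypothetical failure injected between the two updates."""


def transfer(amount, crash=False):
    """Move `amount` from A to B atomically: both updates or neither."""
    conn.execute("BEGIN")
    try:
        conn.execute("UPDATE account SET balance = balance - ? WHERE name = 'A'", (amount,))
        if crash:
            raise SimulatedCrash("failure after debiting A")
        conn.execute("UPDATE account SET balance = balance + ? WHERE name = 'B'", (amount,))
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")  # undo the partial update on A
        raise


def balances():
    return dict(conn.execute("SELECT name, balance FROM account"))
```

Calling `transfer(100, crash=True)` raises SimulatedCrash and leaves both balances at their starting values (A = 500, B = 200); calling `transfer(100)` commits and moves the money (A = 400, B = 300).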
The concurrency concept • AGENT 1 and AGENT 2 each run: READ #SEATS; #SEATS = #SEATS - 1; WRITE #SEATS • When their steps interleave so that both agents read before either writes, the second write overwrites the first: LOST UPDATE! • Solution: Two-Phase Locking
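The lost update can be replayed deterministically in Python, and then prevented by making each agent hold an exclusive lock from its read until after its write. A Python threading.Lock stands in for a DBMS lock manager here; with one lock held per agent, all acquisitions come before all releases, which is the simplest case of two-phase locking. The starting seat count is hypothetical.

```python
import threading

db = {"seats": 10}  # hypothetical shared seat counter


def lost_update_schedule():
    """Replay an interleaving in which both agents read before either writes."""
    db["seats"] = 10
    seen_by_agent1 = db["seats"]       # AGENT 1: READ #SEATS
    seen_by_agent2 = db["seats"]       # AGENT 2: READ #SEATS
    db["seats"] = seen_by_agent1 - 1   # AGENT 1: WRITE #SEATS
    db["seats"] = seen_by_agent2 - 1   # AGENT 2: WRITE #SEATS, overwriting agent 1
    return db["seats"]


def locked_bookings(agents):
    """Each agent holds an exclusive lock from its read until after its write."""
    db["seats"] = 10
    lock = threading.Lock()

    def agent():
        with lock:  # acquire before reading, release only after writing
            current = db["seats"]
            db["seats"] = current - 1

    threads = [threading.Thread(target=agent) for _ in range(agents)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return db["seats"]


print(lost_update_schedule())  # 9: two seats sold, one decrement lost
print(locked_bookings(2))      # 8: both decrements survive
```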
Storage Management • The storage manager is a program module that provides the interface between the low-level data stored in the database and the application programs and queries submitted to the system. • The storage manager is responsible for the following tasks: • interaction with the file manager • efficient storing, retrieving and updating of data
Concurrency Control • Concurrent execution of user programs is essential for good DBMS performance. • Because disk accesses are frequent and relatively slow, it is important to keep the CPU humming by working on several user programs concurrently. • Interleaving actions of different user programs can lead to inconsistency: e.g., a check is cleared while the account balance is being computed. • The DBMS ensures such problems don't arise: users can pretend they are using a single-user system.
Transaction Management • A transaction is a collection of operations that performs a single logical function in a database application • Transaction-management component ensures that the database remains in a consistent (correct) state despite system failures (e.g., power failures and operating system crashes) and transaction failures. • Concurrency-control manager controls the interaction among the concurrent transactions, to ensure the consistency of the database.
Transaction: An Execution of a DB Program • The key concept is the transaction, which is an atomic sequence of database actions (reads/writes). • Each transaction, executed completely, must leave the DB in a consistent state if the DB is consistent when the transaction begins. • Users can specify some simple integrity constraints on the data, and the DBMS will enforce these constraints. • Beyond this, the DBMS does not really understand the semantics of the data (e.g., it does not understand how the interest on a bank account is computed). • Thus, ensuring that a transaction (run alone) preserves consistency is ultimately the user's responsibility!
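A simple integrity constraint such as the earlier "account balance > 0" example can be declared once in the schema with a CHECK clause, and the DBMS then rejects any update that would violate it, with no test in application code. This sqlite3 sketch uses a hypothetical account row; SQLite reports the violation as sqlite3.IntegrityError.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The constraint lives in the schema (data dictionary), not in application code.
conn.execute(
    "CREATE TABLE account ("
    " account_number TEXT PRIMARY KEY,"
    " balance INTEGER CHECK (balance > 0))"
)
conn.execute("INSERT INTO account VALUES ('A-101', 500)")  # hypothetical row
conn.commit()


def withdraw(number, amount):
    """Attempt a withdrawal; return True if the DBMS accepted it."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE account_number = ?",
                (amount, number),
            )
        return True
    except sqlite3.IntegrityError:  # raised when the CHECK constraint fails
        return False


def balance(number):
    return conn.execute(
        "SELECT balance FROM account WHERE account_number = ?", (number,)
    ).fetchone()[0]


print(withdraw("A-101", 600), balance("A-101"))  # False 500
print(withdraw("A-101", 100), balance("A-101"))  # True 400
```

As the slide notes, the DBMS only enforces what is declared: it can reject a negative balance, but it cannot tell whether the program computed the withdrawal amount correctly.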
Scheduling Concurrent Transactions • The DBMS ensures that execution of {T1, ... , Tn} is equivalent to some serial execution T1' ... Tn'. • Before reading/writing an object, a transaction requests a lock on the object, and waits till the DBMS gives it the lock. All locks are released at the end of the transaction. (Strict 2PL locking protocol.) • Idea: If an action of Ti (say, writing X) affects Tj (which perhaps reads X), one of them, say Ti, will obtain the lock on X first and Tj is forced to wait until Ti completes; this effectively orders the transactions. • What if Tj already has a lock on Y and Ti later requests a lock on Y, while Tj is still waiting for X? (Deadlock!) Ti or Tj is aborted and restarted!
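The locking rules and the deadlock case can be sketched as a tiny lock table. The class and method names are hypothetical and only exclusive locks are modeled. Deadlock is detected with a wait-for graph (an edge Ti → Tj means Ti waits for a lock Tj holds), which is one common technique alongside timeouts and prevention schemes such as wait-die. Choosing the requesting transaction as the victim is an arbitrary policy for this sketch.

```python
class LockTable:
    """Hypothetical exclusive-lock manager with wait-for-graph deadlock detection."""

    def __init__(self):
        self.holder = {}     # object -> transaction holding its exclusive lock
        self.waits_for = {}  # transaction -> transaction whose lock it awaits

    def request(self, txn, obj):
        """Return 'granted', 'wait' or 'deadlock' for txn's request on obj."""
        owner = self.holder.get(obj)
        if owner is None or owner == txn:
            self.holder[obj] = txn
            self.waits_for.pop(txn, None)  # txn is no longer blocked
            return "granted"
        self.waits_for[txn] = owner
        if self._in_cycle(txn):
            del self.waits_for[txn]  # txn is chosen as the victim to abort
            return "deadlock"
        return "wait"

    def _in_cycle(self, start):
        # Each blocked transaction waits for exactly one other, so following
        # the chain of waits either returns to `start` (a cycle) or ends.
        seen = set()
        current = self.waits_for.get(start)
        while current is not None and current not in seen:
            if current == start:
                return True
            seen.add(current)
            current = self.waits_for.get(current)
        return False

    def release_all(self, txn):
        """Strict 2PL: release all of txn's locks at once, at commit or abort."""
        for obj in [o for o, t in self.holder.items() if t == txn]:
            del self.holder[obj]
        self.waits_for.pop(txn, None)


locks = LockTable()
print(locks.request("T1", "X"))  # granted
print(locks.request("T2", "Y"))  # granted
print(locks.request("T1", "Y"))  # wait: T1 waits for T2
print(locks.request("T2", "X"))  # deadlock: T2 would wait for T1
locks.release_all("T2")          # abort T2, freeing Y
print(locks.request("T1", "Y"))  # granted
```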
The importance of the Data Dictionary • Contains all definitions: DDL (logical schema), Views definition, Physical schema definitions including Indexing and clustering information, Integrity constraints, security rules, stored procedures (SQL) • Essential for query parsing and optimization • Contains other important documentation and programs (regulations, standards, codes, etc.) • There are companies who sell Data Dictionary tools as a separate product!
DATABASE UTILITIES • Logical Design and Data-Dictionary Tools • Loading • Physical Design and File reorganization • Backup / Restore / Recovery • Performance Monitoring and Tuning
Application Architectures • Two-tier architecture: E.g. client programs using ODBC/JDBC to communicate with a database • Three-tier architecture: E.g. web-based applications, and applications built using “middleware”
DBMS TYPES • Hierarchical – Pre-historic – IMS • Network – Historic – IDMS, ADABAS; led to Object-Oriented • RELATIONAL – Current – 95% of the market – Oracle, Informix, SQL Server, Progress, IBM DB2, etc. • Object-Oriented – Current – a lot of hype but a very narrow market, mainly CAD and engineering – Objectivity, Versant, Jasmine • Object-Relational – Current / Future – SQL3, Informix UDO, Oracle 9, IBM DB2 • XML – not much commercial success as a database, in spite of much research • Cloud and NoSQL databases
Database systems: a brief time line • Pre-1960s: 1945 – magnetic tapes developed (the first medium to allow searching). 1957 – first commercial computer installed. 1959 – McGee proposed the notion of generalized access to electronically stored data. • The '60s: 1961 – the first generalized DBMS, GE's Integrated Data Store (IDS), designed by Bachman. • The '70s – database technology experienced rapid growth. 1970 – the relational model is developed by Ted Codd, an IBM research fellow. 1971 – CODASYL Database Task Group Report. 1975 – the ACM Special Interest Group on Management of Data organized the first SIGMOD international conference. 1976 – the Entity-Relationship (ER) model is introduced by Chen. • The '80s – DBMSs developed for personal computers (dBASE, Paradox, etc.). 1983 – an ANSI/SPARC survey revealed that more than 100 relational systems had been implemented by the beginning of the '80s.
Database systems: a brief time line (cont.) • 1985 – Preliminary SQL standard published. Business world influenced by "Fourth Generation Languages". • Trends in the '80s: extendable database systems: object-oriented DBMSs, client-server architecture for distributed databases. • The '90s • Demand for extending DBMS capabilities to meet new applications. • Emergence of commercial object-oriented DBMSs. • Demand for exploiting massively parallel processors (MPPs). • Total victory by the relational model • SQL3 • Object-relational systems. • The '00s • The emergence of XML and the integration of XML and relational databases • Web databases, search engines, Semantic Web • Cloud and NoSQL databases
Databases make these folks happy ... • End users and DBMS vendors • DB application programmers • E.g. smart webmasters • Database administrator (DBA) • Designs logical /physical schemas • Handles security and authorization • Data availability, crash recovery • Database tuning as needs evolve Must understand how a DBMS works!
Summary • DBMS used to maintain, query large datasets. • Benefits include recovery from system crashes, concurrent access, quick application development, data integrity and security. • Levels of abstraction give data independence. • A DBMS typically has a layered architecture. • DBAs hold responsible jobs and are well-paid! • DBMS R&D is one of the broadest, most exciting areas in CS. • Advanced databases course at the graduate level