340 likes | 590 Views
DBMS Software. Week 7 Conceptual Architecture - Ch 4 Software Architecture - Ch 4 Toolkit - Ch 11,12,13 Kernel Ch 14, 15, 16, 17. ANSI/SPARC Architecture. Problem of Data independence In ‘classical systems’ Application programs ‘know’ about how and where data is stored Problems when
E N D
DBMS Software • Week 7 • Conceptual Architecture - Ch 4 • Software Architecture - Ch 4 • Toolkit - Ch 11,12,13 • Kernel Ch 14, 15, 16, 17
ANSI/SPARC Architecture • Problem of Data independence • In ‘classical systems’ • Application programs ‘know’ about how and where data is stored • Problems when • additional data stored but not of relevance to prog • data reorganised for efficiency but no change in content
ANSI/SPARC (2) • Two distinctions • between the global ‘logical’ data model and the way the data is stored in physical data storage on disk • isolates issues of efficient storage such as indexes, replication of data for fast access and backup • between local application programs and the global data model • isolates functional areas from concern for the full conceptual model - remember ISIS has 250 tables
ANSI/SPARC (3) • External view - defines VIEWS which bring together a subset of the full model for a specific functional area - e.g. enrolment • Conceptual model - full ‘logical’ model of the organisation • Physical - storage of data in underlying files using the appropriate file structures, and allocated to storage units
ANSI/SPARC example • Lecturer view of ISIS • those students which she teaches (restriction) • relevant data items (projection) • denormalised (eg. name and award name by module) • Full model is 250 tables • Physical data storage • duplicates key files for fast retrieval • builds indexes based on common queries
DBMS Software Components • User programs • user developed programs (eg. as VB Macros, PLSQL procedures, PHP scripts,stored Queries) • DBMS Toolkit • tools to help user build systems and user programs • Interface • language and protocol for communication between User programs and Kernel • Kernel • core Database functions - data storage and retrieval • Operating System - file storage
Interface • User programs need to communicate with the Kernel to • update the Schema (Data Definition Language) • update the Factbase (Data Manipulation Language) • Interface must handle: • sending requests to the Kernel e.g. with SQL • receiving results from the Kernel • linking to Kernel, handling errors (e.g ODBC,JDBC) • Receiving results is tricky: • send whole relation in some serialized format • send pointer to temporary table and get each row
Interface (2) • Scripting languages such as PL/SQL (Oracle’s application language) and PHP (a Web server language) send SQL and provide routines for access to the results e.g. • the notion of a ‘CURSOR’ a pointer to the next row of a table to be returned • Access can automate this connection so that a table located on a remote server ‘appears’ to be a local table.
Kernel functions • CRUD - Create, Read, Update, Destroy • at Factbase and Schema level • Data Dictionary - Schema storage • Transaction Management - run-unit completion • Concurrency Control - multiple users • Recovery - backup and restore • Authorisation - users, passwords, areas • Data Communications ( or by OS) • Data integrity - foreign keys etc. • Import/Export, Monitoring
Schema table definitions integrity constraints views (stored queries) access control Stored in Data Dictionary DD is a Database itself, the System Catalog Manipulated with DDL and DCL Factbase tables Manipulated with DML Interface with operating system for file access Transaction control concurrency locking Kernel (2)
Toolkit - Application Development Tools • E.g. Access • Graphical User Interface (GUI) • QBE to generate SQL queries • Form and report definitions • VB Macros for user programs • Natural Language Interface • translation from natural language to SQL
User programs • SQL command line interface • SQLPlus for Oracle, similar for MySQL • Pre-compiler (COBOL..) • Program contains SQL statements which need to be compiled into calls to Kernel functions • Scripts (PHP,Perl) • Generation of SQL calls at runtime
CASE tools • Select SSADM, Rational Rose, Oracle Designer 2000 • allow developers to develop schema from ER model
Admin tools • Database software control • starting and stopping Kernel • backup and restore • Reconstruct indexes • Access control • create users, user groups, • assign user access rights to tables • Import and Export of data and schema • Schema migration - major schema change • monitoring and tuning
Kernel operations • Underlying file types • Indexing • Transactions • Locks • Rollback • Query optimisation
Physical organisation • Secondary storage (on disk) divided into blocks - say 16K in size • For a given file system, blocks have control info(index to records in block..) and free space as well as user records • structured blocks are called ‘pages’ • Data is transferred between main memory and secondary storage in pages • Pages can be held in memory (cached) for speed of access
Indexes in SQL • In Access, just mark a field as indexed • in SQL • CREATE INDEX enameInd on EMP(Ename); • drop index • DROP INDEX • Extra indexes speed retrieval but slow down updates (because Index structures must be updated too)
Transactions • ‘Transaction’ in a DBMS means an elemental unit of work which is either completed in full or fails totally • DB Transactions are not the same as a user transaction such as ‘place an order’ • Each transaction has • begin transaction • some work - reads and writes to the db • either COMMIT (make changes permanent) • or ROLLBACK (destroy all evidence of work)
ACID • A Atomicity • Transaction is indivisible unit of work - can’t partly succeed • C Consistancy • Transaction (failed or completed) must leave DB in a consistant state • I Isolation • Each transaction must appear to run in isolation to any other transactions • D Durability • Work done must be permanent
Implementing ACID • A Atomicity • system must be able to undo work if transaction fails - e.g. due to failed integrity constraint • system could record the before state of an record and restore these when rollback required • C Consistancy • fail if an integrity or transaction constraint violated • I Isolation • concurrency problem >> • D Durability • DB backup and recovery, transaction logging
Concurrency Control • If transactions could be serialised - executed one at a time - each would execute in Isolation • but this would slow the system down - most of the time is spent waiting for disk access • If not prevented, interaction between transactions can cause anomolies
T1 bot read(x) x=x+1 write(x) commit if x=2 at start, it should be 4 at end T2 bot read(x) x=x+1 write(x) commit but its only 3! Lost Update
T1 bot read(x) x=x+1 write(x) abort x should be 3 T2 bot read(x) x=x+1 write(x) commit but it is 4! Dirty Read
T1 bot read(x) read(x) commit T1 sees different values of x during its execution T2 bot read(x) x=x+1 write(x) commit Inconsistent Read
x+y must = 100 T1 bot read(y) read(x) ? x+y=90 commit T2 bot read(y) read(x) x=x -10 write(x) y=y+10 write(y) commit now x+y = 100 again Ghost update
Scheduling • A schedule is a sequence of operations (reads or writes) from multiple transactions • Reads and writes are assumed to be atomic themselves • The Scheduler tries to create a schedule which preserves Isolation • Serial schedule puts one transaction after another, but this will be ineffficient • Need to find equivalent, shorter schedules
Pessimistic/Optimistic • Pessimistic • assume transactions will interact and prevent it • Locking • Timestamping • Optimistic • assume transactions will NOT interact but take action if they do
Locking • Read lock (shared lock) • any number can read but no one can write • readers need counting • Write lock (exclusive lock) • no one else can read or write • Unlock - drop the lock • Transaction waits if resource already locked • Locks held in DB - lock table • Lock granularity • best if only a record is locked but lock table large
Two-phase locking • Discipline on transactions to ensure schedule is serialiable • Growing phase: • Transaction must acquire all its locks in one phase • Lock level can be escalated ( read > write) • Shrinking phase • Transaction must release all its locks in the second phase • Lock level can reduce (write > read) • Can’t acquire a lock after releasing a lock
T1 bot wlock(x) read(x) x=x+1 write(x) unlock(x) commit T2 bot wlock(x) wait wait wait read(x) x=x+1 write(x) commit Lost Update with Locking
T1 bot wlock(x) read(x) x=x+1 write(x) unlock(x) abort x should be 3 but it is still 4 ! T2 bot wlock(x) wait wait wait read(x) x=x+1 write(x) unlock(x) commit Dirty Read with locking
Strict Two-phase locking • Locks can only be released after commit • Deadlock • T1 wlocks(x) and waits to wlock(y) • T2 wlocks(y) and waits to wlock(x)
Deadlock • Deadlock • T1 wlocks(x) and waits to wlock(y) • T2 wlocks(y) and waits to wlock(x) • Approaches • timeout - lock expires so transaction aborts • deadlock detection - identifing deadlocks and killing one transaction