540 likes | 554 Views
Delve into the organized world of Chapter 8 on Database Management Systems (DBMS) in this comprehensive guide. Learn about database fundamentals, file systems, three-schema architecture, ACID properties, and more. Understand database integrity, normalization, operators, and Oracle data types. Discover the advantages and disadvantages of DBMS, Oracle Database Server Architecture, and the role of a Database Administrator. Explore models like Hierarchical, Network, and Relational, as well as Codd's rules and RDBMS products.
E N D
Management Information SystemComprehensive and Organized Chapter 8 Database Management System (DBMS)
Chapter Highlights • Database Fundamentals • File system and database system • Three schema architecture • ACID properties of database • Models of database management system • Codd’s rules • Object Oriented database • NoSQL Database • Database integrity using keys • Normalization of tables • Database operators and Oracle data types • Advantages and disadvantages of DBMS • Oracle Database Server Architecture • Database Administrator • Two tier, Three tier and Service Oriented Architecture
Database Fundamentals What is a DBMS? • DBMS consists of a collection of inter-related data and a set of programs to access the data. • The collection of data is usually referred to as database which contains information about one particular enterprise. Goal of DBMS • The primary goal of a DBMS is to provide an environment that is both convenient and efficient to use in retrieving and storing database information. • DBMS is the interface between the user of application programs on one hand and the database on the other
Three Schema Architecture • Objective of this architecture is to separate each user’s view of the database from the way it is physically represented. • Three schema architecture defines DBMS schemas at three levels In other cases, the developer may be unsure of the efficiency of the algorithm, the adaptability of an OS. • In these, and many other situations, prototyping paradigm is the best approach. Prototype serves as a first system
ACID Properties of Database ACID Properties of Database In the context of transaction processing, the acronym ACID refers to the four key properties of a transaction: Atomicity, Consistency, Isolation, and Durability. Atomicity • All changes to data are performed as if they are a single operation. That is, all the changes are performed, or none of them are. • Example: an application that transfers funds from one account to another, the atomicity property ensures that, if a debit is made successfully from one account, the corresponding credit is made to the other account. Consistency • Data is in a consistent state when a transaction starts and when it ends. • Example: an application that transfers funds from one account to another, the consistency property ensures that the total value of funds in both the accounts is the same at the start and end of each transaction
ACID Properties of Database Isolation • The intermediate state of a transaction is invisible to other transactions. As a result, transactions that run concurrently appear to be serialized. • Example: an application that transfers funds from one account to another, the isolation property ensures that another transaction sees the transferred funds in one account or the other, but not in both, nor in neither. Durability • After a transaction successfully completes, changes to data persist and are not undone, even in the event of a system failure. • Example: an application that transfers funds from one account to another, the durability property ensures that the changes made to each account will not be reversed.
DBMS Models – Hierarchical Model At the top of hierarchy there is only one entity which is called Root. Here department is the root and children are course and student. Great grandparent is the root of the structure. GET UNIQUE, GET NEXT student within departments etc. IMS DB/DC is an example of Hierarchical database
DBMS Models – Network Model This model is more flexible than Hierarchical model. In the network model, entities are organized in a graph, in which some entities can be accessed through several path. Examples of Network database, are, IDS, IDMS, Total, Adabas, DMS2
DBMS Models – Relational Model The Relational Model was the first theoretically founded and well thought out Data Model, proposed by E. F. Codd in 1970, then a researcher at IBM. It has been the foundation of most database software and theoretical database research ever since.
RDBMS Products RDBMS Products • Examples of popular RDBMS products are • Oracle Database 12c, • DB2of IBM, • SQL Server of Microsoft, • open source PostgreSQL, • recently introduced Amazon Aurora of Amazon.com which is compatible with open source RDBMS MySQL
Relational Model and Codd’s rules Codd proposed 13 rules to test DBMS concept against his relational model. Codd’s rules defines the features that a DBMS is required to possess in order to become a Relational Database System (RDBMS). Till now there is hardly any commercial products that strictly follow all the 13 rules of Codd. Rule 0: This rule states that a system to qualify as an RDBMS, it must be able to manage database entirely through the relational capabilities. Rule 1: Information rule - All information (including metadata) is to be represented as data stored in cells of a table Rule 2: Guaranteed Access - Each unique piece of data should be accessible by table name + primary key (row) + attribute (column) Rule 3: Systematic treatment of null - Null may mean missing data, not applicable, no value should be handled consistently - not zero or blank. Primary key cannot be null.
Relational Model and Codd’s rules Rule 4: Active Online Catalogue - Database dictionary (catalogue) to have description of database catalogue to be governed by same rules as a rest of database. Same query language can be used on catalogue as on application database. • Top level DB Hierarchy consists of catalogs, each of which consists of schema, namely, SQL, views, etc. First connect the database with inputs like username and id and when connected the default catalogue and schema are set up for connection. • DBMS catalogue stores the following information: • No of tuples in the relation • No of block in the containing tuples of relation • Size of a tuple of relation in bytes • No of distinct values that appear in the relation r for attribute A
Relational Model and Codd’s rules Rule 5: Powerful language - Well defined language to provide all manners of access to data e.g. Structured Query Language (SQL) Rule 6: View Updation Rule - All views that are theoretically updatable should be updatable. View or a 'Virtual Table', is temporarily derived from base tables. If a view is formed as Join of 3 tables, changes to view should be reflected in base table Rule 7: Relational level operators - Insert, Update, Delete as well as set operators namely, Union, Intersection, Minus should be supported Rule 8: Physical Data Independence - Physical storage should not matter to system. If some files supporting table was renamed or moved from one disk to other, it should not affect application Rule 9: Logical data independence - If there is a change in the logical structure (table structure) of the database, the user view of the data should not change.
Relational Model and Codd’s rules Rule 10: Integrity independence - Database should be able to enforce its own integrity rather than using other program's rule - provision of filter to correct data. There are three types of integrity, namely, a) Entity integrity where primary key is not null and each tuple of the table is uniquely identified b) Referential integrity consistency among tuple by referring the primary key as foreign key in another table c) Domain integrity to enforce the range of specified values of a particular attribute in a table. Rule 11: Distribution independence - A database should work properly regardless of its distribution across network as well as remotely. Rule 12: Non subversion rule - If low level access is allowed to a system it should not be able to subvert or bypass integrity rules to change data. This may be achieved by locking /encryption.
Object Oriented Database • The object oriented database model is considered to be a high end technology framework dedicated to the new generation multimedia web based applications. An object comprises of data that describes the attributes of an entity coupled with the operations that can be performed upon the data. • This encapsulationcapability allows the object oriented model to more easily handle complex types of data (graphics, pictures, voice, and text) than other database structures. • The object oriented model also supports inheritance that is new objects can be automatically created by replicating some or all of the characteristics of one or more parent objects.
Object Oriented Database Advantages and Disadvantages Advantages • OOD facilitates reduced maintenance, reusability of source program code, real world modeling, and improved reliability and flexibility. • It has the ability to reduce some of the major expenses associated with systems, such as maintenance and development of programming code. • In view of this, it is economic to adopt OOD approach. • Some of the benefits of the object-oriented approach are listed here: Reduced Maintenance Real-World Modeling Improved Reliability and Flexibility High Code Reusability:
Object Oriented Database Advantages and Disadvantages Disadvantages • Object-oriented Development may not be a complete solution: • Object-oriented Development is not a technology: • Object-oriented Development is not yet completely accepted by major vendors: • Scarce Expertise • Lack of standards: • Locking at object level may impact performance: Many • Lack of universal data model: There is no universally agreed data model for an OODBMS, and most models lack a theoretical foundation. OODBMS Products: Examples of OODBMS are Db4o, Smalltalk, Cache used for computer aided design, geographic information system
Distributed Database and Centralized Database Large commercial databases may exist in two different topologies. • Centralized- where the database is physically in one location and users typically use an Internet connection to access it. Large banks, insurance companies use centralized databases. • Distributed- Where the database is in many locations often you have a national or international company and customers tend to regularly interact with a local branch. For example: Google uses Big-Table a distributed DBMS as searching tends to be by users in a particular region of the world. • In both cases the database "looks" like one database.
Distributed Database and Centralized Database Centralized • A single database maintained in one location, usually based on relational model. • Generally managed by a database administrator and local team. • Access via a communications network • LAN • WAN • Terminals provide distributed access Examples • Some major banks do all their processing on a mainframe, in some cases in a different country. • Clients may use several branches, and online banking for transactions. • Airline reservation systems need to be centralized to avoid double bookings.
Distributed Database and Centralized Database Distributed database A single logical database that is spread physically across computers in multiple locations connected through data communications link (WAN). • Most processing is local • Need for local ownership of data • Data sharing require • In distributed database users perceive as if they are working with a single corporate database Example Google: Use a DBMS called Bigtable. (Note it is not a Relational Database).
Super key, Candidate key, Primary key, Foreign key • Super key - Non minimal candidate key, Set of one or more attributes that are taken collectively and can identify all other attributes uniquely. • Candidate key - If a relationship schema has more than one key, each is called a candidate key, and one of the candidate keys arbitrarily is assessed primary key. A candidate key is a minimal set of attributes necessary to identify a tuple; this is also called a minimal super key • Unique key - Since one table can have at the most one primary key, to avoid duplicates in other database columns, unique key can be defined • Primary key - Primary key is minimal candidate key, There should at least one primary key for a table, Primary key columns do not accept NULL value, Primary key columns do not accept duplicate values • Foreign key - A foreign key is an attribute or combination of attributes in a relation whose value matches a primary key in another relation. The table in which foreign key is created is called as dependent table.
Database NormalizationUnnormalized table Unnormalized table has repeating groups, non atomic values in cells
Database NormalizationFirst Normal Form As per the rules of 1 NF, each row should be uniquely identified, the value of each cell in the table should be atomic and there should be no repeating group in the table.
Database NormalizationSecond Normal Form A table to be in 2nd Normal Form, a) it should be in 1 NF b) the non key attributes should be functionally dependent on the key attribute.
Database NormalizationThird Normal Form A table to be in 3rd Normal Form, a) it should be in 2 NF b) the non key attributes should be non transitively dependent on the key attribute. 3rdNormal form is achieved by removing the attribute namely Supplier_Location from the Supplier table, which is not directly dependent on the primary key, and creating another two tables namely, Supplier_Location Table and Location_MasterTable
Database NormalizationBoyce Codd Normal Form A table is in BCNF if every determinant is a candidate key. A table can be in 3NF but not in BCNF. This occurs when a non key attribute is a determinant of a key attribute. The dependency diagram may look like the one below:
Database NormalizationFourth Normal Form This specifies that the relation should be in 1st, 2nd and 3rd Normal form and it attempts to rationalize the multi-valued dependencies in a relation. • For example, STAFF_NAME, PROJECT_ID, PASTIME. • Here, the staff can be associated with multiple projects and can have multiple pastime. • Therefore, this can be decomposed into the following two relations: 1. STAFF_PROJECT: STAFF_NAME, PROJECT_ID 2. STAFF_PASTIME: STAFF_NAME, PASTIME
Database NormalizationFifth Normal Form A table is said to be in the 5NF a) if it is in 4th normal form b) if and only if every non-trivial join dependency in it is implied by the candidate keys. For example: a table containing SALESMAN_ID, PRODUCT_ID and BRAND_ID can be split into three tables: a) SALESMAN_PRODUCT: SALESMAN_ID, PRODUCT_ID b) PRODUCT_BRAND: PRODUCT_ID, BRAND_ID c) SALESMAN_BRAND: SALESMAN_ID, BRAND_ID
Relational Algebra • Relational algebra provides a formal foundation for relational model operation forming a basis for implementing and optimizing queries that have been incorporated into the SQL for RDBMS. • Relational algebra is the integral part of relational database model and a procedural query language, which takes instances of relations as input and yields instances of relations as output. • The fundamental operations of relational algebra are as follows: • Select (σ) • Project (∏) • Union (U) • Set different ( - ) • Cartesian product (×) • Rename (ρ)
Relational Algebra Applications of relational algebra can be explained through usage of the following two operators a) σ or sigma for Select operations b) ∏ or pi for Project operations σ (Sigma) select certain rows from the table discarding the other rows Notation:σpredicate(Relation Table Name) Example: Selection of products with sales value above Rs. 8000 from table ‘products’: σsales>8000(products) ∏ (Pi) selects certain columns from the table discarding the other columns Notation: ∏(selected attributes) (Relation Table Name) Project operation can also be used in conjunction with select operation. Notation: ∏(selected attributes) (σpredicate(Relation Table Name)) Example: Projecting product names and quantity of products sold above Rs. 8000 from table ‘products’ ∏ product_name, product_qty(σsales>8000(products))
Relational Calculus Relational Calculus or Tuple Relational Calculus provides what information the result should contain whereas Relational Algebra defines a set of operations for the relational model. • Generalized notation of tuple calculus is: { t | cond(t) } • If the earlier example of relational algebra has to be expressed in terms of tuple calculus, the following expression can be considered: • {t.product_name, t.product_qty | products(t) and t.sales>8000} • Condition products(t) provides that range relation of tuple variable t is products
Structured Query Language (SQL) • SQL is used primarily to access and retrieve data from table(s). SQLs can be embedded in a computer program or Programming Language SQL (PL/SQL) can be used for manipulation in the database stored in the computer. • Originally, SQL was called SEQUEL (for Structured English Query Language) and was designed and implemented at IBM research as the interface for an experimental relational database system called SYSTEM R. • SQL is now the standard language for commercial relational DBMSs. • A joint effort by ANSI and ISO has led to a standard version of SQL (ANSI 1986), called SQL86 or SQL1, later revised and expanded to SQL92 or SQL2 (Elmasri & Navathe, 2000).
SQL Classification • Data Definition Language (DDL): This is used to create, alter, drop a table from the database. Example: CREATE, ALTER, DROP • Data Query Language (DQL): Using these commands, it is possible to retrieve data from the table and displaying the list on the screen. Example: SELECT. (It is often included in DML). • This is a widely used SQL command and typical syntax is: SELECT <ATTRIBUTE NAME> FROM <TABLE NAME> or SELECT * FROM <TABLE NAME> for listing all the attributes and their values. • Data Manipulation Language (DML): For inserting record or tuple, or to change, modify value of attribute(s) in a table. Example: INSERT, DELETE, UPDATE • Data Control Language (DCL): With these commands, it is possible to restrict access of users. This is made to ensure security of the database. Example: GRANT, REVOKE • Transaction Control Language (TCL): Using these commands, it is possible to ensure durability of the transaction carried out in the system or to reinstate the transaction in the previously executed state in case of any eventuality: COMMIT, SAVEPOINT, and ROLLBACK
Oracle Data Type • For declaring different variables, following data types are commonly used in Oracle SQL
Oracle Database Server Architecture Oracle Server: Oracle server containsseveral files, processes, and memory structures. Instance: Every running Oracle database is associated with an Oracle instance. When a database is started on a database server (regardless of the type of computer), Oracle allocates a memory area called the System Global Area (SGA) and starts one or more Oracle processes. This combination of the SGA and the Oracle processes is called an Oracle instance. User Process: During execution of Pro*C program or SQL*Plus by the user, the OS creates a client process (termed as a user process). Server Process: Oracle DB creates server processes to handle the requests of client processes connected to the instance. A client process always communicates with a database through a separate server process. Server processes created on behalf of a database application can perform one or more of the following tasks, namely, parse and run SQL statements, execute PL/SQL code, read data blocks from data files into the database buffer cache
Oracle Database Server Architecture Mandatory Background Processes • Program Global Area or Process Global Area (PGA): PGA is a memory area (RAM) storing data and control information for a single process. The PGA is allocated at the time of creation of a process and deallocated when the process is terminated. • System Global Area (SGA) The most important memory structure in Oracle is SGA. Summarily, it is a large part of memory that all the Oracle background processes access. SGA consists of Buffer cache (store SQL statements), Shared pool (comprises of SQL library cache for parsing SQL statement and data dictionary cache), Redo log buffer (every change written here), Large Pool (I/O server process and backup) , Java Pool (holds Java execution code)
Oracle Database Server Architecture • PMON (Process Monitor Process): PMONwatches the user processes in the database to make sure that they work correctly. • SMON (System Monitor): The usage and function of this Oracle background process is twofold. First, in the event of an instance failure – when the memory structures and processes that comprise the Oracle instance cannot continue to run – the SMON process handles recovery from that instance failure. Second, the SMON process handles disk space management issues on the database by taking smaller fragments of space and “coalescing” them, or piecing them together. • DBWR (Database Writer Process): This background process handles all data blocks that write to disk. It works in conjunction with the Oracle database buffer cache memory structure. • LGWR (Log Writer Process): This background process handles the writing of redo log entries from the redo log buffer to online redo log files on disk. • RECO (Optional): The recoverer process. In Oracle databases using the distributed option, this background process handles the resolution of distributed transactions against the database. • ARCH (Optional): The archiver process.
Database Administration A database administrator (DBA) is an IT professional responsible for the installation, configuration, upgrading, administration, monitoring, maintenance and security of databases in an organization. A summary of DBA responsibilities is furnished here: • Incorrect calculation • Incorrect data edits & ineffective data Architecting the design of central database, distributed database or object oriented database and specify the protocols of communications across the databases for DDBMS • Preparation of data dictionary, installation of database on server, maintaining database, documentation • Establishing database procedures and standards, like data security, recovery procedures • Password and user identification • Data access and manipulation procedures • Backup procedure, what state the backup refers to, for example, hot-backup or cold-backup, frequency or schedule of backup • Plans and implements recovery processes • Software usability by end users & obsolete software • Archival of data and determining physical storage requirements, that is, volume of data to be stored, anticipated growth, average size of records etc. • Troubleshoots with problems regarding the databases, applications and development tools • Responsibility for the physical database design • Manage sharing resources among applications • Administers all database schema objects, e.g., tables, clusters, index, views etc. • Enforces database constraints to maintain integrity
Database Architecture – Two Tier • During the late 70s, large mainframe system catering to multiple users were based on the concept of 2 tier architecture. • It was alternatively known as client-server architecture implemented in the mid range systems where the entire database was loaded on the server and client desktop had the applications for carrying out transactions processing ultimately updating the database on the server. • But the serious limitations of such architecture were a) controlling version was a challenge b) limited scalability as it supports limited number of users c) it did not provide the facility to get connected through Internet d) in other words there was no remote login or connectivity excepting for dedicated leased telecom lines, which was quite expensive. e) Besides, it did not support the concept of centralized database system. f) For example, earlier Oracle Version 9.0 database was used in the server and client based screen designer - Designer 2000 installed on the desktop.
Database Architecture – Three Tier Three tier architecture included the webserver apart from the database and application servers. This architecture supported the concept of centralized application system allowing concurrent users from different geographic locations to access the system. It has improved scalability and integrity, enhanced security and reusabiity of component objects. The schematic diagram of two tier architecture is shown