Understanding Persistent Data Management Systems

Lecture 15 Persistence Database Management Systems

Resources so far • We studied: • How memory is managed within the virtual memory space of a process. • How messages can be exchanged between processes, using inter-process communication

Storage resource • Storage remains unchanged beyond the end of the process execution. • RTE provides services to processes to support persistent data storage. • Persistent: the data survives after the process with which it was created has ended.

Persistent storage • Can be shared among several processes • We will discuss two types of persistent storage services: • The file system • Database services.

File System (FS) Services • The file system provides the following interface to processes: • Map names to storage locations: • Storage location=storage on disk. • FS maps names ("/usr/local/file.ext") to concrete locations on disk. • Interface through system calls:open(name), create(name), delete(name);queries getfiles(name-pattern). • FS organizes space of file names in hierarchical structure of embedded directories.

File System Services • Access through streams: • Stream interface to read and writea file as a stream of bytes. • FS Allows to seek within the file, read its content and write new content. • Allocationof space / automatic increase storage space (in writes).

File System (FS) Services (interface) • Locking files: • Allows multiple processes to synchronize access on shared files. • One can lock a whole file or just a segment of bytes within the content of a file.

File Systems • FS provides abstract interface of system calls (open, read, write, seek etc) • implemented by specific drivers which know how to interact with specific hardware devices. • Distributed FS: • process can request access to files stored on a remote machine: • Local FS in RTE passes the system calls (open, read, etc) to FS-server on server • Distributed file systems (for example, NFS and SAMBA) have complex locking mechanisms.

Database Management System (DBMS) • Another way to store persistent data. • DBMS is present as a service. • Processes connect through inter-process communication protocol (over TCP). • DBMS manages storage, often by accessing FS on its side.

File systems vs. DBMS - differences • Data Model • FS is very simple: a stream of bytes with the open/close, read/write and lock operation. • DBMS defines a rich data model, manages serialization (encoding/decoding). • Programmer must define the specific data model for the data he is interested in storing.

File systems vs. DBMS • Data Independence – • DBMSs provide an abstract interface for data storage/access. • Programmer does not need to define in which file, at which offset data is to be stored. • One can query the DBMS by content (e.g., "retrieve all content that satisfies certain criteria").

File systems vs. DBMS • Efficient concurrency • DBMSs are built to support thousands of concurrent users. • Ensure data is kept consistent • Provide efficient management of such high concurrency.

File systems vs. DBMS • Reduced application development time • DBMSs allow data manipulation using a simple API. • DBMS manages storage, query optimization, concurrency and integrity management.

DBMS Services • Security management: • Verify that users are authorized to access data. • Specific rights can be granted for each part of the data and each group of users.

DBMS Services • Session and Transaction management: • Users can request the execution of data transactions. • Transaction = complex data operation - read and modify many different objects • Viewed from the outside as a single atomic operation- completely succeeds or not performed.

DBMS Services • Query optimization and execution: • clients interact with DBMS by complex queries • DBMS execute queries most efficiently • Powerful query compilation: relational database servers.“Relational” == “tabular”.

Back-end services • not directly visible to user: • File and Access methods. • Buffer management. • Disk space management.

Transaction: • Unitof work performed against a database management system • Treated in a coherent and reliable way independentlyof other transactions. • A transaction is atomic, consistent, isolated, durable (ACID). • Composed of independent units of work, each reading and/or writing information to DB

Transaction • Provide "all-or-nothing" - work units must complete or take no effect whatsoever. • Must be isolated from other transactions • Results must agree with existing constraints in database. • If completed successfully transactions must be committed to the storage.

Transaction pattern • A transaction is usually issued to the DB in a language like SQL, using a pattern similar to the following: • Begin transaction. • Executedata manipulations and queries. • If no errors occur then committransaction. • If errors occur rollbackthe transaction.

Transactions resources • To implement transactions, the RTE employs the following logic: • Identify resources that will be accessed during the transaction. • SQL: full tables or parts of tables (rows). • Acquire locks on all resources before transaction starts.

Transactions safety • Perform transaction. • Modifications performed on snapshot in memory. • isolation: data in transaction is not affected by other threads. intermediate modifications not committed are not visible to other threads.

Transactions safety • Transaction committed: modifications are merged to persistent storage • If transaction is rollbacked: modifications are deleted and the storage is not modified.

Data Models • Data model = which data values can be sent for storage • Example: Object Oriented method: • Primitive data types (int, bool, char etc) • Complex data types • References, methods etc. • Dominates the programming world. • DBMS - relational model – dominates DB world.

Relational Model • "Relation" = "table","relational" = "based on tables“… • Logical representation of information. • Database normalization - is the process of organizing the columns (attributes) and tables (relations) of a DB to reduce data redundancy and improve data integrity.

Case Study: A Data Model for Colleges • Data model of an academic college. • Different departments, each specializing in teaching a certain domain. • Computer Science / Physics department • Department has a department head and a geographical location. • Students may be enrolled in one department and registered to courses given by it.

building blocks • Relation - set of records of same fields. header of a table • Our example: students, departments, etc. • Attribute (columns) - field within a relation(data type, name) • Our example: student name, department head,

building blocks • Domain - restriction of data types • our example: student ID is 9 digits 0-9 each. • Records - (or tuple), single row within a relation. • Integrity constraints - constraint on attribute with regard to all records • our example: Student ID is unique.

Relational Algebra • Selection - select specific rows from a relation. • Projection - projects specific columns • Set operations - union, intersection, cross-product, set-difference • on relations that have the same structure (same attributes) • Join - cross product followed by selection and projection. • Renaming operation - change name of columns.

Example: select Selection: Projection:

Join example: consider the tables

Cross join

Inner join

Properties • Take existing relations as arguments and return new relations as value • Columns of tables in DB are restricted to being simple values • Columns must be primitive data types • No embedded tables or array. • Complex data - primary and foreign keys.

SQL (Structured Query Language) • Query language - interface with relational DBMS • DDL: Data Definition Language define schemas, relations and domains. • DML: Data Manipulation Language - queries, insertions, updates and deletions

Client interacts with SQL server by sending SQL queries (TCP protocol) • Receiving as answers a result-set. • The client requests the rows of the result as needed (result may be large). • Communication protocol between an SQL server and its clients is statefuland session oriented.

Keys • subset of attributes of relation which uniquely identify rows. • Example: relation for students with attributes name, birthday, address and Social Security Number (SSN). • Name is not a key (two students could have the same name). SSN is a key. • Several attributes as a key: courses (course name, department, semester, lecturer) course name is not a key / but the pair (course name, semester) could be.

Database Normalization and Foreign Keys • minimize duplication of information (also known as "Don't Repeat Yourself" (DRY) • Duplication= data appear twice in the domain of the problem modeled. • Example: relation student (ID, Name, Course) a student is enrolled to 3 courses = 3 records duplicating the ID and Name

Solution: student data will be stored only in a single relation. • Use keys to remove data repetition • use a single attribute SSN to refer to a student.

Join • Retrieve the information on the student given a tuple from the courses relation, we need to operate a join operation on the SSN attribute (which is shared by the 2 relations) • The LEFT JOIN keyword returns all rows from the left table (table_name1), even if there are no matches in the right table (table_name2).

Foreign key • In student relation, SSN is a key. • In course relation, StudentSSN is not a key • Several courses taken by a single student • StudentSSNrefers to the key of another table. • It is foreign key in the course relation - refers to the student relation.

Data Integrity Constraints • Cannot create a tuple in the course relation that refers to a student that does not exist. • New row in course relation: check foreign key exists in student relation. • Update SSN field of course relation: check new StudentSSNexists in student relation. • Delete row in student relation: verify no rows in course that refer to it • Update row in student relation: verify no rows in course that refer to old value.

When defining a data model on a server, user declares which constraints are part of the model: • keys and foreign keys. • DBMS server enforces constraints and verifies that insert, update, delete operations – have no conflict with constraints.

NULL • belongs to all domains • indicateslack of knowledge- we do not know what its value is • example (1 = NULL) is false and (1 != NULL) is also false! (NULL = NULL) is also false. • compare NULL with value, result is always false.

Indexing • Representing the records in a relation in a data structure to allow efficient queries with respect to certain attributes. • B-Tree / hash-tables • query efficient • cost time when inserting or deleting data. • Implicit indexing - employed by DBMS for attributes defined as unique (keys).

Design Steps

Step A – Identify simple correlation among attributes • A Student has a name and ID. • ignore Courses and Department for now - not simple - hold correlation to other attributes as well: • Department has a name, location and head. • Course has a name.

Step B – Identify unique attributes • Unique attribute – the most important feature. • We wish to have at least one attribute in a relation that is unique • Primary Key - a unique attribute that must have a value (NULL is not allowed). • Student relation: ID = primary key. • Courses relation = artificial course number. department ID.

Any relation that should be later queried efficiently must have a primary key. • in rare cases, no need to define primary key. • example: logs stored in DB, only insertions are performed occasionally based on a time range. • no requirement to uniquely identify a single row of a log - so there is no need to define a key on such a table.

Understanding Persistent Data Management Systems

Understanding Persistent Data Management Systems

Presentation Transcript

Lecture 15

Lecture 15

Lecture #15

Lecture 15

Lecture 15

Lecture 15

Lecture 15

Lecture 15

Lecture 15

Lecture #15

Lecture 15

Lecture #15

Lecture 15

LECTURE 15

Lecture 15

Lecture 15

Lecture 15

Lecture 15

Lecture 15

Lecture 15

Lecture 15

Lecture 15