330 likes | 441 Views
Multimedia Information Systems. CS 4570. Outlines. Introduction to DMBS Relational database and SQL B + - tree index structure. Database Management System (DBMS). Collection of interrelated data Set of programs to access the data DBMS contains information about a particular enterprise
E N D
Multimedia Information Systems CS 4570
Outlines • Introduction to DMBS • Relational database and SQL • B+- tree index structure
Database Management System (DBMS) • Collection of interrelated data • Set of programs to access the data • DBMS contains information about a particular enterprise • DBMS provides an environment that it both convenient and efficient to use
Purpose of Database Systems • To overcome the disadvantages of the typical file–processing systems: • Data redundancy and inconsistency • Difficulty in accessing data • Data isolation multiple files and formats • Integrity problems • Atomicity of updates • Concurrent access by multiple users • Security problems
Data and Data Models • Data is really the “given facts” • A data model is a collection of tools for describing: • Data • Data relationships • Data semantics • Data constraints • Object-based logical models: • object-oriented model, semantic model… • Record-based logical models: • relational model (e.g. SQL, DB2)…
Entity-Relationship (ER) Model • A database can be modeled as: • a collection of entities, • relationships among entities
Relational Databases • A relational database is a database that is perceived be its users as a collection of tables. ( and nothing but tables ).
Relation Instance • The current values (relation instance) of a relation are specified by a table. • An element t of r is a tuple; represented by a row in a table. tuple attribute
Structured Query Language (SQL) • SQL is based on set and relational operations with certain modifications and enhancements • A typical SQL query has the form: selectA 1 , A 2 , ..., A n fromr 1 , r 2 , ..., r m whereP • Ais represent attributes • ris represent relations • P is a predicate. • The result of an SQL query is a relation.
Banking Example branch(branch-name, branch-city, assets) customer(customer-name, customer-street, customer-city) account(branch-name, account-number, balance) loan(branch-name, loan-number, amount) depositor(customer-name, account-number) borrower(customer-name, loan-number)
The Select Clause • The select clause corresponds to the projection operation of the relational algebra. It is used to list the attributes desired in the result of a query. • Find the names of all branches in the loan relation selectbranch-name fromloan • An asterisk in the select clause denotes”all attributes” select * fromloan
The Select Clause(Cont.) • SQL allows duplicates in relations as well as in query results. • To force the elimination of duplicates, insert the keyword distinct after select. Find the names of all branches in the loan relation, and remove duplicates selectdistinctbranch-name fromloan • The keyword all specifies that duplicates not be removed. Selectallbranch-name fromloan
The Select Clause(Cont.) • The select clause can contain arithmetic expressions involving the operators,+,-,*,/,and operating on constants or attributes of tuples. • The query: selectbranch-name,loan-number,amount * 100 fromloan would return a relation which is the same as the loan relation, except that the attribute amount is multiplied by 100
borrower (customer-name, loan-number) loan (branch-name, loan-number, amount) borrower × loan ( borrower.customer-name, borrower.loan-number, loan.branch-name, loan.loan-number, loan.amount) The from Clause • The from clause corresponds to the Cartesian product operation of the relational algebra. It lists the relations to be scanned in the evaluation of the expression. • Find the Cartesian product borrower loan select * fromborrower, loan • Find the name and loan number of all customers having a loan at the Perryridge branch. selectdistinctcustomer-name,borrower.loan-number fromborrower,loan whereborrower.loan-number=loan.loan-numberand branch-name=“Perryridge”
The where Clause • The where clause corresponds to the selection predicate of the relational algebra. It consists of a predicate involving attributes of the relations that appear in the from clause. • Find all loan numbers for loans made at the Perryridge branch with loan amounts greater than $1200. selectloan-number from loan wherebranch-name=“Perryridge” and amount>1200 • SQL uses the logical connectives and, or, and not. It allows the use of arithmetic expressions as operands to the comparison operators.
The where Clause(Cont.) • SQL includes a between comparison operator in order to simplify where clauses that specify that a value be less than or equal to some value and greater than or equal to some other value. • Find the loan number of those loans with loan amounts between $90,000 and $100,000(that is, $90,000 and $100,000) selectloan-number fromloan where amountbetween 90000 and 100000
Example Query • Find all branches that have greater assets than some branch located in Brooklyn. selectdistinctT.branch-name frombranchasT, branchasS whereT.assets > S.assetsand S.branch-city =“Brooklyn”
B+ -Tree Index Files B+ -Tree indices are an alternative to indexed-sequential files. • Disadvantage of indexed-sequential files: performance degrades as file grows, both for index lookups and for sequential scans through the data. • Advantage of B+ -Tree index files: automatically reorganizes itself with small, local, changes, in the face of insertions and deletions. • Disadvantage of B+ -Trees: extra insertion and deletion overhead, space overhead. • Advantages of B+ -Trees outweigh disadvantages, and they are used extensively.
B+ -Tree Index Files (Cont.) A B+ -Tree is a rooted tree satisfying the following properties: • All paths from root to leaf are of the same length • Each node that is not a root or a leaf has between n/ 2 and n children • A leaf node has between (n-1)/2 and n-1 values • Special cases:if the root is not a leaf it has at least 2 children; if the root is a leaf (that is there are no other nodes in the tree) it can have between 0 and (n-1) values.
B+ -Tree Node Structure • Typical node • Ki are the search-key values • Pi are pointers to children (for non-leaf nodes) or pointers to records or buckets of records (for leaf nodes). • The search-keys in a node are ordered K1 < K2 < K3 < ...
Leaf Nodes in B+ -Trees Properties of a leaf node: • For i = 1, 2, . . . , n-1, i either points to a file record with search-key value Ki , or to a bucket of pointers to file records, each record having search-key value Ki . Only need bucket structure if search-key does not form a primary key. • If Li , Lj are leaf nodes and i < j , Li‘s search-key values are less than Lj’s search-key values • Pn points to next leaf node in search-key order
Non-Leaf Nodes in B+ -Trees • The nonleaf nodes form a multi-level sparse index on the leaf nodes. For a non-leaf node with n pointers: • All the search-keys in the subtree to which P1 points are less than K1 • For 2 i n-1 all the search-key in the subtree to which Pi points have values greater than or equal to Ki-1 and less than Ki • All the search-keys in the subtree to which Pn points are greater than or equal to Kn-1
Queries on B+-Trees (Cont.) • In processing a query, a path is traversed in the tree from the root to some leaf node. • If there are K search-key values in the file, the path is no longer than log n/ 2 (K ). • A node is generally the same size as a disk block, typically 4 kilobytes, and n is typically around 100 (40 bytes per index entry). • With 1 million search key values and n = 100, at most log50 (1, 000, 000) = 4 nodes are accessed in a lookup. • Contrast this with a balanced binary tree with 1 million search key values --- around 20 nodes are accessed in a lookup • above difference is significant since every node access may need a disk I/O, costing around 30 millisecond!
Updates on B+-Trees: Insertion (Cont.) • Splitting a node: • take the n (search-key value, pointer) pairs (including the one being inserted) in sorted order. Place the first n/ 2 in the original node, and the rest in a new node. • let the new node be p, and let k be the least key value in p. Insert (k, p) in the parent of the node being split. If the parent is full, split it and propagate the split further up. • The splitting of nodes proceeds upwards till a node that is not full is found. In the worst case the root node may be split increasing the height of the tree by 1.
Self-Practice • Construct a B+-tree for the following set of key values: (2, 3, 5, 7, 11, 17, 19, 23, 29, 31) with n = 6