700 likes | 795 Views
PMIT-6102 Advanced Database Systems. By- Jesmin Akhter Assistant Professor, IIT, Jahangirnagar University. Lecture 02 Overview of Relational DBMS. Outline. Overview of Relational DBMS Structure of Relational Databases Relational Algebra. Why Relational DBMS.
E N D
PMIT-6102Advanced Database Systems By- JesminAkhter Assistant Professor, IIT, Jahangirnagar University
Outline • Overview of Relational DBMS • Structure of Relational Databases • Relational Algebra
Why Relational DBMS • Most of the distributed database technology has been developed using the relational model • Very simple model. • Often a good match for the way we think about our data. • Example of a Relation: account (account-number, branch-name, balance)
Relational Design Simplest approach (not always best): convert each Entity Set to a relation and each relationship to a relation. Entity Set Relation Entity Set attributes become relational attributes. Becomes: account (account-number, branch-name, balance) branch-name account-number balance account
Relational Model • Table = relation. • Column headers = attributes. • Row = tuple • Relation schema = name(attributes) + other structure info.,e.g., keys, other constraints. Example: Account (account-number, branch-name, balance) • Order of attributes is arbitrary, but in practice we need to assume the order given in the relation schema. • Relation instance is current set of rows for a relation schema. • Database schema = collection of relation schemas. Account
Basic Structure • Formally, given sets D1, D2, …. Dn a relation r is a subset of D1 x D2 x … x DnThus a relation is a set of n-tuples (a1, a2, …, an) where each ai Di • Example: if customer-name = {Jones, Smith, Curry, Lindsay}customer-street = {Main, North, Park}customer-city = {Harrison, Rye, Pittsfield}Then r = { (Jones, Main, Harrison), (Smith, North, Rye), (Curry, North, Rye), (Lindsay, Park, Pittsfield)} is a relation over customer-name, customer-street, customer-city
Relational Data Model Set theoretic Domain — set of values like a data type n-tuples (V1,V2,...,Vn) s.t., V1 D1, V2 D2,...,VnDn Tuples = members of a relation inst. Arity = number of domains Components = values in a tuple Domains — corresp. with attributes Cardinality = number of tuples Relation as table Rows = tuples Columns = components Names of columns = attributes Set of attribute names = schema REL (A1,A2,...,An) A1 A2 A3 ... An a1 a2 a3 an b1 b2 a3 cn a1 c3 b3 bn . . . x1 v2 d3 wn Attributes C a r d i n a l i t y Tuple Component Arity
Relation: Example Name address tel # 5 3 7 Cardinality of domain Domains N A T N1 A1 T1 N2 A2 T2 N3 A3 T3 N4 T4 N5 T5 T6 T7 Domain of Relation N A T N1 A1 T1 N1 A1 T2 N1 A1 T3 . . . N1 A1 T7 N1 A2 T1 N1 A3 T1 N2 A1 T1 Arity 3 Cardinality <=5x3x7 of relation Attribute Component Tuple Domain
Attribute Types • Each attribute of a relation has a name • The set of allowed values for each attribute is called the domain of the attribute • Attribute values are (normally) required to be atomic, that is, indivisible • E.g. multivalued attribute values are not atomic • E.g. composite attribute values are not atomic • The special value null is a member of every domain
Relation Schema • A1, A2, …, Anare attributes • R = (A1, A2, …, An ) is a relation schema E.g. Customer-schema = (customer-name, customer-street, customer-city) • r(R) is a relation on the relation schema R E.g. customer (Customer-schema)
Relation Instance • The current values (relation instance) of a relation are specified by a table • An element t of r is a tuple, represented by a row in a table attributes (or columns) customer-name customer-street customer-city Jones Smith Curry Lindsay Main North North Park Harrison Rye Rye Pittsfield tuples (or rows) customer
Relations are Unordered • Order of tuples is irrelevant (tuples may be stored in an arbitrary order) • E.g. account relation with unordered tuples
Database • A database consists of multiple relations • Information about an enterprise is broken up into parts, with each relation storing one part of the information E.g.: account : stores information about accountsdepositor : stores information about which customer owns which account customer : stores information about customers • Storing all information as a single relation such as bank(account-number, balance, customer-name, ..)results in • repetition of information (e.g. customer own two account) • the need for null values (e.g. represent a customer without an account) • Normalization theory deals with how to design relational schemas
The branch Relation The customer Relation Account Relation The depositor Relation
borrowerRelation • The Loan Relation
E-R Diagram for the Banking Enterprise Total Participation mean Every account must be related via account-branch to some branch Arrow from account-branch to branch mean Each account is for a single branch
Keys • A super key of an entity set is a set of one or more attributes whose values uniquely determine each entity. • A candidate key of an entity set is a minimal super key • Customer-id is candidate key of customer • account-number is candidate key of account • Although several candidate keys may exist, one of the candidate keys is selected to be the primary key.
Determining Keys from E-R Sets • Strong entity set. The primary key of the entity set becomes the primary key of the relation. • Weak entity set. The primary key of the relation consists of the union of the primary key of the strong entity set and the discriminator of the weak entity set. • Relationship set. The union of the primary keys of the related entity sets becomes a super key of the relation. • For binary many-to-one relationship sets, the primary key of the “many” entity set becomes the relation’s primary key. • For one-to-one relationship sets, the relation’s primary key can be that of either entity set. • For many-to-many relationship sets, the union of the primary keys becomes the relation’s primary key
Query Languages • Language in which user requests information from the database. • Categories of languages • Procedural • User instructs the system to perform a sequence of operations on the database to compute the desired result. • non-procedural • User describes the desired information without giving a specific procedure for obtaining that information. • “Pure” languages: • Relational Algebra • Tuple Relational Calculus • Domain Relational Calculus • Pure languages form underlying basis of query languages that people use.
Relational Algebra • Procedural language • Six basic operators • select • project • union • set difference • Cartesian product • rename • The operators take two or more relations as inputs and give a new relation as a result.
Select Operation – Example A B C D • Relation r 1 5 12 23 7 7 3 10 • A=B ^ D > 5(r) A B C D 1 23 7 10
Select Operation • Notation: p(r) • p is called the selection predicate • Defined as: p(r) = {t | t rand p(t)} Where p is a formula in propositional calculus consisting of terms connected by : (and), (or), (not)Each term is one of: <attribute> op <attribute> or <constant> where op is one of: =, , >, . <.
Example of selection: branch-name = “Perryridge” (loan) branch-name=“Perryridge”(loan)
Project Operation – Example • Relation r: A B C 10 20 30 40 1 1 1 2 A C A C • A,C (r) 1 1 1 2 1 1 2 Duplicate rows removed =
Project Operation • Notation:A1, A2, …,Ak (r) where A1, A2 are attribute names and r is a relation name. • The result is defined as the relation of k columns obtained by erasing the columns that are not listed • Duplicate rows removed from result, since relations are sets • E.g. To eliminate the branch-name attribute of accountaccount-number, balance (account)
Union Operation – Example • Relations r, s: A B A B 1 2 1 2 3 s r r s: A B 1 2 1 3
Union Operation • Notation: r s • Defined as: r s = {t | t r or t s} • For r s to be valid. 1. r,s must have the same arity (same number of attributes) 2. The attribute domains must be compatible (e.g., 2nd column of r deals with the same type of values as does the 2nd column of s) • E.g. to find all customers with either an account or a loancustomer-name (depositor) customer-name (borrower)
Names of All Customers Who Have Either a Loan or an Account Union Operation customer-name (depositor) customer-name (borrower)
Set Difference Operation • Notation r – s • Defined as: r – s = {t | t rand t s} • Set differences must be taken between compatible relations. • r and s must have the same arity • attribute domains of r and s must be compatible
Set Difference Operation – Example • Relations r, s: A B A B 1 2 1 2 3 s r r – s: A B 1 1
Cartesian-Product Operation • Notation r x s • Defined as: r x s = {t q | t r and q s} • Assume that attributes of r(R) and s(S) are disjoint. (That is, R S = ). • If attributes of r(R) and s(S) are not disjoint, then renaming must be used.
Cartesian-Product Operation-Example A B C D E Relations r, s: 1 2 10 10 20 10 a a b b r s r x s: A B C D E 1 1 1 1 2 2 2 2 10 10 20 10 10 10 20 10 a a b b a a b b
Composition of Operations • Can build expressions using multiple operations • Example: A=C(r x s) • r x s • A=C(r x s) A B C D E 1 1 1 1 2 2 2 2 10 10 20 10 10 10 20 10 a a b b a a b b A B C D E 10 10 20 a a b 1 2 2
Rename Operation • Allows us to refer to a relation by more than one name. Example: x (E) returns the expression E under the name X If a relational-algebra expression E has arityn, then x(A1, A2, …, An)(E) returns the result of expression E under the name X, and with the attributes renamed to A1, A2, …., An.
Banking Example branch (branch-name, branch-city, assets) customer (customer-name, customer-street, customer-city) account (account-number, branch-name, balance) loan (loan-number, branch-name, amount) depositor (customer-name, account-number) borrower (customer-name, loan-number)
Example Queries • Find all loans of over $1200 • amount> 1200 (loan) loan • Find the loan number for each loan of an amount greater than $1200 • loan-number (amount> 1200 (loan))
Example Queries • Find the names of all customers who have a loan, an account, or both, from the bank • customer-name (borrower) customer-name (depositor) • Find the names of all customers who have a loan and an • account at bank. • customer-name (borrower) customer-name (depositor)
Example Queries • Find the names of all customers who have a loan at the Perryridge branch. customer-name (branch-name=“Perryridge” (borrower.loan-number = loan.loan-number(borrower x loan))) • Find the names of all customers who have a loan at the Perryridge branch but do not have an account at any branch of the bank. customer-name (branch-name = “Perryridge” (borrower.loan-number = loan.loan-number(borrower x loan))) – customer-name(depositor)
Result of branch-name = “Perryridge” (borrower loan) customer-name (branch-name = “Perryridge” (borrower.loan-number = loan.loan-number(borrower x loan))) customer-name (branch-name = “Perryridge” (borrower.loan-number = loan.loan-number(borrower x loan))) – customer-name(depositor)
Customers With An Account But No Loan customer-name(depositor)- customer-name(borrower)
Example Queries Find the largest account balance • Rename account relation as d • The query is: balance(account) - account.balance(account.balance < d.balance (account x rd (account))) account.balance(account.balance < d.balance(account x rd (account)) (account x rd (account) Duplicate removed Result - =
Formal Definition • Let E1 and E2 be relational-algebra expressions; the following are all relational-algebra expressions: • E1 E2 • E1 - E2 • E1 x E2 • p (E1), P is a predicate on attributes in E1 • s(E1), S is a list consisting of some of the attributes in E1 • x(E1), x is the new name for the result of E1
Additional Operations We define additional operations that do not add any power to the relational algebra, but that simplify common queries. • Set intersection • Natural join • Division • Assignment
Set-Intersection Operation • Notation: r s • Defined as: • rs ={ t | trandts } • Assume: • r, s have the same arity • attributes of r and s are compatible • Note: rs = r - (r - s)
Set-Intersection Operation - Example • Relation r, s: • r s A B A B 1 2 1 2 3 r s A B 2
Natural-Join Operation • Notation: r s • Let r and s be relations on schemas R and S respectively. Then, r s is a relation on schema R S obtained as follows: • Consider each pair of tuplestr from r and ts from s. • If tr and ts have the same value on each of the attributes in RS, add a tuplet to the result, where • t has the same value as tr on r • t has the same value as ts on s • Example: R = (A, B, C, D) S = (E, B, D) • Result schema = (A, B, C, D, E) • rs is defined as:r.A, r.B, r.C, r.D, s.E (r.B = s.Br.D = s.D (r x s))
r s Natural Join Operation – Example • Relations r, s: B D E A B C D 1 3 1 2 3 a a a b b 1 2 4 1 2 a a b a b r s A B C D E 1 1 1 1 2 a a a a b