910 likes | 1.09k Views
This chapter discusses the structure of relational databases, the basics of relational algebra, and the concepts of relations, attributes, and operations. It also covers the use of query languages such as Relational Algebra.
E N D
Chapter 2: Relational Model • Structure of Relational Databases • Relational Algebra
Basic Structure • Formally, given sets D1, D2, …. Dn a relation r is a subset of D1 x D2 x … x Dn • Thus, a relation is a set of tuples (a1, a2, …, an) where each ai Di • Example: cust-name = {Jones, Smith, Curry, Lindsay}cust-street = {Main, North, Park}cust-city = {Harrison, Rye, Pittsfield} r = {(Jones, Main, Harrison), (Smith, North, Rye), (Curry, North, Rye), (Lindsay, Park, Pittsfield)} r is a relation over cust-name x cust-street x cust-city
Relations are Unordered • In this (mathematical) context, since a relation is a set, the order of tuples is irrelevant and may be thought of as arbitrary. • In a real DBMS, tuple order is typically very important and not arbitrary.
Table vs. Relation • The Table: • The Relation: • { (A-101,Downtown,500), (A-102,Perryridge,400), (A-201,Brighton,900), : • (A-305,Round Hill,350) }
Attribute Types • Each attribute of a relation has a name. • The set of allowed values for each attribute is called the domain of the attribute. • Attribute values are required to be atomic, that is, indivisible (note how this differs from ER modeling). • Multi-valued attribute values are not atomic. • Composite attribute values are not atomic.
The Evil Value “Null” • The special value null is an implicit member of every domain. • Tuples can have a null value for some of their attributes. • A null value can be interpreted in several ways: • Value is unknown • Value does not exist • Value is known and exists, but just hasn’t been entered yet • The null value causes complications in the definition of many operations. • We shall consider their effect later.
Relation Schema • Let A1, A2, …, An be attributes. Then R = (A1, A2, …, An ) is a relation schema. Customer-schema = (customer-name, customer-street, customer-city) • Sometimes referred to as a relational schema or relational scheme. • Let R be a relation schema. Then r(R) is a relation variable. customer (Customer-schema)
Database • A database consists of multiple relations. account - account informationdepositor - depositor information, i.e., who deposits into which accounts customer - customer information • Storing all information as a single relation is possible, as in: bank(account-number, balance, customer-name, ..) • This results in: • Repetition of information (e.g. two customers own an account) • The need for null values (e.g. represent a customer without an account). • Normalization (Chapter 7) deals with how to design relational schemas.
E-R Diagramfor the Banking Enterprise • The (modified) E-R diagram from the banking enterprise:
Relational Schemesfor the Banking Enterprise • The following relational schemes result: branch (branch-name, branch-city, assets) customer (customer-name, customer-street, customer-city) account (account-number, branch-name, balance) loan (loan-number, branch-name, amount) depositor (customer-name, account-number) borrower (customer-name, loan-number)
Query Languages • Language in which user requests information from the database. • Categories of languages • procedural • non-procedural • “Pure” languages: • Relational Algebra (primarily procedural) • Tuple Relational Calculus (non-procedural) • Domain Relational Calculus (non-procedural) • Pure languages form underlying basis of “real” query languages.
Relational Algebra • Procedural language (primarily) • Six basic operators: • select • project • union • set difference • cartesian product • rename
Relational Algebra • Each operator takes one or more relations as input and gives a new relation as a result. • Each operation defines: • Requirements or constraints on its’ parameters. • Attributes in the resulting relation, including their types and names. • Which tuples will be included in the result.
Select Operation – Example • Relation r A B C D 1 5 12 23 7 7 3 10 • A=B ^ D > 5 (r) A B C D 1 23 7 10
Select Operation • Notation: p(r) where p is a selection predicateand r is a relation (or more generally, a relational algebra expression). • Defined as: p(r) = {t | t rand p(t)} where p is a formula in propositional logic consisting of termsconnected by: (and), (or), (not), and where each term can involve the comparison operators: =, , >, , <,
Select Operation, Cont. • Example of selection:branch-name=“Perryridge”(account) • Logically, one can think of selection as performing a table scan, but technically this may or may not be the case, i.e., an index may be used.
Project Operation – Example • Relation r: A B C 10 20 30 40 1 1 1 2 • A,C (r) A C A C 1 1 1 2 1 1 2 =
Project Operation • Notation:A1, A2, …, Ak (r) where A1, A2 are attribute names and r is a relation. • The result is defined as the relation of k columns obtained by erasing the columns that are not listed. • Duplicate rows are removed from result, since relations are sets. • Example: To eliminate the branch-name attribute of account:account-number, balance (account) • Note, however, that account is not actually modified.
Union Operation – Example • Relations r, s: A B A B 1 2 1 2 3 s r r s A B 1 2 1 3
Union Operation • Notation: r s • Defined as: r s = {t | t r or t s} • For r s to be valid: • r,s must have the same arity (same number of attributes) • The attribute domains must be compatible (e.g., 2nd attribute of r deals with “the same type of values” as does the 2nd attribute of s) • Example: find all customers with either an account or a loan customer-name (depositor) customer-name (borrower)
Set Difference Operation • Relations r, s: A B A B 1 2 1 2 3 s r r – s A B 1 1
Set Difference Operation, Cont. • Notation r – s • Defined as: r – s = {t | t rand t s} • Set differences must be taken between compatible relations. • r and s must have the same arity • attribute domains of r and s must be compatible • Note that there is no requirement that the attribute names be the same. • So what about attributes names in the result? • Similarly for union.
Cartesian-Product Operation A B C D E • Relations r, s: 1 2 10 10 20 10 a a b b r s r x s: A B C D E 1 1 1 1 2 2 2 2 10 10 20 10 10 10 20 10 a a b b a a b b
Cartesian-Product Operation, Cont. • Notation r x s • Defined as: r x s = {tq | t r and q s} • The attributes of r and s are assumed to be disjoint, i.e., that R S = . • If the attributes of r and s are not disjoint, then renaming must be used. • Each attributes name has its originating relations name as a prefix. • If r and s are the same relation, then the rename operation can be used.
A B C D E 1 1 1 1 2 2 2 2 10 10 20 10 10 10 20 10 a a b b a a b b A B C D E 10 20 20 a a b 1 2 2 Composition of Operations • Expressions can be built using multiple operations r x s A=C(r x s)
Rename Operation • The rename operator allows the results of an expression to be renamed. • Example: x (E) returns the expression E under the name X • If a relational-algebra expression E has arity n, then x(A1, A2, …, An)(E) returns the result of expression E under the name X, and with the attributes renamed to A1, A2, …., An.
Formal Definition • A basic expression in relational algebra consists of one of the following: • A relation in the database • A constant relation • Let E1 and E2 be relational-algebra expressions. Then the following are all also relational-algebra expressions: • E1 E2 • E1 - E2 • E1 x E2 • p (E1), P is a predicate on attributes in E1 • s(E1), S is a list consisting of attributes in E1 • x(E1), x is the new name for the result of E1
Banking Example • Recall the relational schemes from the banking enterprise: branch (branch-name, branch-city, assets) customer (customer-name, customer-street, customer-city) account (account-number, branch-name, balance) loan (loan-number, branch-name, amount) depositor (customer-name, account-number) borrower (customer-name, loan-number)
Example Queries • Find all loans of over $1200 (a bit ambiguous). • amount > 1200 (loan) • Find the loan number for each loan with an amount greater than $1200. • loan-number (amount> 1200 (loan))
Example Queries • Find the names of all customers who have a loan, an account, or both. • customer-name (borrower) customer-name (depositor) • Find the names of all customers who have a loan and an account. • customer-name (borrower) customer-name (depositor)
Example Queries • Find the names of all customers who have a loan at the Perryridge branch. customer-name (branch-name=“Perryridge” (borrower.loan-number = loan.loan-number(borrower x loan))) • Notes: • The two selections could have been combined into one. • The selection on branch-name could have been applied to loan first, as shown on the next slide…
Example Queries • Alternative - Find the names of all customers who have a loan at the Perryridge branch. • customer-name(loan.loan-number = borrower.loan-number( borrower x branch-name = “Perryridge”(loan))) • Notes: • What are the implications of doing the selection first? • How does a non-Perryridge borrower get eliminated? • Couldn’t the amount and the branch-name attributes be eliminated from loan after the selection? • What would be the implications?
Example Queries • Find the names of all customers who have a loan at the Perryridge branch but no account at any branch of the bank. customer-name (branch-name = “Perryridge” (borrower.loan-number = loan.loan-number (borrower x loan))) – customer-name(depositor)
Example Queries • Find the largest account balance. • Requires a cartesian product between account and itself, resulting in ambiguity of attribute names. • Resolved by renaming one instance of the account relation as d. balance(account)– account.balance(account.balance < d.balance(account x rd(account)))
Additional Operations • The following operations do not add any power to relational algebra, but simplify common queries. • Set intersection • Natural join • Theta join • Division • Assignment • All of the above can be defined in terms of the six basic operators.
Set-Intersection Operation • Notation: r s • Defined as: rs = { t | trandts } • Assume: • r, s have the same arity • attributes of r and s are compatible • Note: rs = r - (r - s)
A B A B 1 2 1 2 3 A B 2 Set-Intersection Operation, Cont. • Relation r, s: • r s r s
Natural-Join Operation • Notation: r s • Let r and s be relations on schemas R and S respectively. • r s is a relation that: • Has all attributes in R S • For each pair of tuples tr and ts from r and s, respectively, if tr and ts have the same value on all attributes in RS, add a tuple t to the result, where: • t has the same value as tr on attributes in R • t has the same value as ts on attributes in S
Natural-Join Example • Relational schemes for relations r and s, respectively: R = (A, B, C, D) S = (E, B, D) • Resulting schema for rs : (A, B, C, D, E) • rs is defined as:r.A, r.B, r.C, r.D, s.E (r.B = s.B r.D = s.D (r x s))
Natural Join Example • Relations r, s: • Contents of r s: B D E A B C D 1 3 1 2 3 a a a b b 1 2 4 1 2 a a b a b r s A B C D E 1 1 1 1 2 a a a a b
Natural Join – Another Example • Find the names of all customers who have a loan at the Perryridge branch. Original Expression: Using the Natural Join Operator: • Specifying the join explicitly makes it look nicer, plus it helps the query optimizer. customer-name (branch-name=“Perryridge” (borrower.loan-number = loan.loan-number(borrower x loan))) customer-name(branch-name = “Perryridge”(borrower loan))
Theta-Join Operation • Notation: r θ s • Let r and s be relations on schemas R and S respectively. • Then, r θ s is a relation that: • Has all attributes in R S including duplicates. • For each pair of tuples tr and ts from r and s, respectively, if θ evaluates to true for tr and ts, then add a tuple t to the result, where: • t has the same value as tr on attributes in R • t has the same value as ts on attributes in S • rθ s is defined as:θ (r x s)
Theta-Join Example • Example: R = (A, B, C, D) S = (E, B, D) • Resulting schema: (r.A, r.B, r.C, r.D, s.E, s.B, s.D)
Theta Join – Another Example • Consider the following relational schemes: Exam = (Exam#, Average) Score = (SS#, Exam#, Grade) • Find the SS#s for those students who scored less than average on some exam. Along with the SS# list the exam#, exam average, and the students’ grade. • Score.SS#, Exam.Exam#, Exam.Average, Score.Grade ( • Exam Exam.Exam# = Score.Exam# Exam.Average > Score.Grade Score)
Division Operation • Notation: • Suited to queries that include the phrase “for all” • Let r and s be relations on schemas R and S respectively where S R, and let the attributes of R and S be: R = (A1, …, Am, B1, …, Bn) S = (B1, …, Bn) The result of r s is a relation on schema R – S = (A1, …, Am) where: r s = { t | t R-S(r) u s ( tu r ) } r s
Division – Example #1 Relations r, s: r s: A B B A 1 2 1 2 3 1 1 1 3 4 6 1 2 s r
Division – Example #2 Relations r, s: r s: D E A B C A B C D E a a a b 1 1 a a a a a a a a a a b a b a b b 1 1 1 1 3 1 1 1 A B C s r
Division – Example #3 Relations r, s: r s: A B C D E B D A C E a a a a a a a a a a b a b a b b 1 1 1 1 3 1 1 1 a a a b 1 1 s r
Division Operation (Cont.) • Let r(R) and s(S) be relations, and let S R. Then: r s = R-S (r) – R-S ( (R-S(r) x s) – R-S,S(r)) To see why: • R-S,S(r) simply reorders attributes of r • R-S(R-S(r) x s) – R-S,S(r)) gives those tuples t in R-S(r) such that for some tuple u s, tu r. • Property: • Let q = r s • Then q is the largest relation satisfying q x s r