390 likes | 512 Views
Chapter 2: Relational Algebra. -- Introduction to database principles Maoying Wu ( bi203.sjtu@gmail.com ) March 6, 2013. Relation terminology. Learning strategies. Know how => know why 适当的不求甚解. After-class assignment 1. Install MySQL on your own computer
E N D
Chapter 2: Relational Algebra -- Introduction to database principles Maoying Wu (bi203.sjtu@gmail.com) March 6, 2013
Learning strategies • Know how => know why • 适当的不求甚解
After-class assignment 1 • Install MySQL on your own computer • Linux: from source package or LAMP • Windows: from rebuilt package or XAMP • Learn some basic knowledge on HTML/PHP and B/S development • Review of C programming skills
DML (数据操作语言)Data Manipulation Language • Language for accessing and manipulating the data organized by appropriate data model • aka, query language (查询语言) • Two classes of languages • procedural (过程性): specifies what data is required and how to get these data • declarative (声明性): specifies what data is required without knowing how to get these data • SQL: most widely used query language
DDL (数据定义语言)Data-defintion Language • Specified notation for defining the database schema • Example: create table account( account_no char(10), balance integer) • DDL compiler generates a set of tables stored in a data dictionary (数据字典) • Data dictionarycontains metadata (元数据) • Database schema • Data storage and definition language • specifies the storage structure and access methods (存储结构和访问方法) • Integrity constraints • domain constraints (类型约束) • referential integrity (参考完整性) • assertions (声明约束) • Authorization (权限)
SQL (结构化查询语言)Structural Query Language • Principal language to describe and manipulate relational databases • SQL-99 standard • Two aspects: • Data-Definition language (DDL) for declaring database schemas • Data-Manipulation language (DML) for querying and modifying
SQL: 3 kinds of relations • tables (表):relations stored in the database allowing modification as well queries • views (视图): relations defined by a computation, constructed when needed • temporary tables (临时表): constructed by SQL processor when performing queries.
Relational Algebra -- Operations on relations, construct new relations from old ones
Four classes of operations • set operations: union, intersection, and difference • removing: “selection” eliminates some rows (tuples), and “projection” eliminates some columns • relation combination: Cartesian product, join • renaming: changing the relation schema, i.e., the names of the attributes and/or the name of relation itself.
Outlines • Procedural language (过程性语言) • Six basic operators (基本操作) • Select: σ (选择) • Project: Π (投影) • Union: ᴜ (并) • Set difference: - (差) • Cartesian product: × (笛卡尔积) • rename: ρ (重命名) • The operators take one or two relations as inputs and produce a new relation as a result
Select Operator (选择) • Relation r • σ (A=B)Ʌ(D>5) (r)
select operator • Notation: σ p(r) • p is called the selection predicate (选择谓词) • Defined as • p is a formula in propositional calculus (命题演算) consisting of terms connected by: ˄(and), ˅(or), ¬(not) • Each term can be one of • <attribute> op (<attribute> or <constant>) • where op can be: =, ≠, ≤, ≥, >, < • Example of selection: • σ name=“Charlie” (account)
Project operator: example • Relation r: • ΠA,C(r):
Project operator (投影) • Notation: ΠA1,A2,…, Ak(r) • A1, …, Ak are the attribute names and r is a relation name • The result is defined as the relation of k columns obtained by erasing the columns that are not selected • Duplicate rows are removed from result, since relation are sets • Example: To eliminate the sname attribute of relation exam: • Πsid, score(exam)
union operator: example • relation r, s: • r ᴜ s:
union operator • Notation: r ᴜ s • Defined as: • For union operation to be valid, • r, s must have the same arity (same number of attributes) • the attribute domains must be compatible (each column of the two operands should have the same data type) • Example: find all students with A or Not pass
set difference operator • Relation r, s • r – s
set difference operator • Notation: r – s • Defined as: • Set differences must be taken between compatible relations • r and s must have the same arity • attribute domains of r and s must be compatible
Cartesian product - example • Relation r, s: • r × s:
Cartesian product • Notation: r ×s • Defined as: • Assume that attributes of r(R) and s(S) are disjoint. That is, R Ʌ S = ø • If attributes of r(R) and s(S) are not disjoint, then rename must be used.
combination of operations • a single expression can contain multiple operations • Example: • r × s • σA=C(r × s)
Renaming operator • Renaming operator allows us to name, and therefore to refer to the results of relational-algebra expression (关系代数表达式) • Allows us to refer to a relation by more than one name. • Example: • ρX(E) returns the expression E under the name X • If a relational algebra expression E has arity n, then ρX(A1,A2,…, An)(E) returns the result of expression E under the name X, and with the attributes renamed to A1, A2, …, An
Banking example • branch(branch_name, branch_city, assets) • customer(customer_name, customer_street, customer_city) • account(account_number, branch_name, balance) • loan(loan_number, branch_name, amount) • depositor(customer_name, account_number) • borrower(customer_name, loan_number)
Examples • Find loans over $1200: • σamount>1200(loan) • Find the loan numbers for loans with amount over 1200: • Πloan_number(σamount>1200(loan)) • Find the customer names for all who have a loan, an account, or both, from the bank: • Πcustomer_name(borrower) ᴜ Πcustomer_name(depositor)
Example queries • Find the names of all customers who have a loan at the Shanghai branch: Πcustomer_name(σbranch_name=“Shanghai”(σborrower.loan_number=loan.loan_number(borrower×loan))) • Find the names of all customers who have a loan at the Shanghai branch but do not have an account at any branch of the bank: Πcustomer_name(σbranch_name=“Shanghai”(σborrower.loan_number=loan.loan_number(borrower×loan)))- Πcustomer_name(depositor)
Two different strategies • find the names of all customers who have a loan at the Shanghai branch (1) Πcustomer_name(σbranch_name=“Shanghai”(σborrower.loan_number=loan.loan_number(borrower×loan))) (2) Πcustomer_name(σborrower.loan_number=loan.loan_number(σbranch_name=“Shanghai”(loan))×borrower))
Example queries • find the largest account balance • Strategy: • Find those balances that are not the largest (a) rename account as d so that we can compare each account balance with all others • use set difference to find those account balances that were not found in the previous step Πbalance(account) – Πaccount.balance(σaccount.balance < d.balance(account ×ρd(account)))
Formal definition • A basic expression in the relational algebra consists of either one of the following: • A relation in the database • A constant relation • Let E1 and E2 be relational-algebra expressions; the following are all relational-algebra expressions: • E1∪E2 (并) • E1 – E2 (差) • E1 x E2 (笛卡尔积) • σp(E1), p is a predicate on attributes in E1 (选择) • Πs(E1), s is a list of some of the attributes in E1 (投影) • ρx(E1), X is the new name for the result of E1 (更名)
Additional operations • Set intersection (交) • Natural join (自然连接) • Division (除) • Assignment (赋值) • We must keep in mind that additional operations do NOT add anything to relational algebra, but that simplify some common queries
set intersection - example • Relation r, s • r∩s:
Set intersection • Notation: r∩s • Defined as: • Assume: • r, s have the same arity • attributes of r and s are compatible • Note: r∩s = r – (r-s)
Natural join • Notation: • Let r and s be relations on schemas R and S, respectively. Then is a relation on schema R ∪S obtained as follows: • Consider each pair of tuplestr from r and ts from s. • If tr and ts have the same value on each of the attributes in R∩S, add a tuple t to the result, where t has the same value as tr on r and t has the same value as ts on s • Example: • R= (A, B, C, D) • S= (E, B, D) • Result schema = (A, B, C, D, E)
Natural join - example • Relation r, s: • :
Division operator (除法) • Notation: r ÷s • Suitable for queries that include the phrase “for all” • Let r and s be relations on schemas R and S, where • R = (A1, …, Am, B1, …, Bn) • S = (B1, …, Bn) • The result of r ÷s is a relation on schema R – S = (A1, A2, …, Am) • where tu means the concatenation of tuples t and u to produce a single tuple
Division example 1 • r ÷s:
Division example 2 • r ÷s:
Assignment operator • The assignment operator (←) provides a convenient way to expression complex queries • write query as a sequential program consisting of a series of assignments • followed by an expression whose value is displayed as a result of the query • Assignment must always be made to a temporary relation variable • Example: write r÷s as • temp1 ← ΠR-S(r) • temp2 ← ΠR-S((temp1×s) - ΠR-S,S(r) • result = temp1 – temp2