2.02k likes | 2.15k Views
Temple University – CIS Dept. CIS616– Principles of Database Systems. V. Megalooikonomou Relational Model (based on notes by Silberchatz,Korth, and Sudarshan and notes by C. Faloutsos at CMU). Overview. history concepts Formal query languages relational algebra rel. tuple calculus
E N D
Temple University – CIS Dept.CIS616– Principles of Database Systems V. Megalooikonomou Relational Model (based on notes by Silberchatz,Korth, and Sudarshan and notes by C. Faloutsos at CMU)
Overview • history • concepts • Formal query languages • relational algebra • rel. tuple calculus • rel. domain calculus
History • before: records, pointers, sets etc • introduced by E.F. Codd (1923-2003) in 1970 • revolutionary!!! • first systems: 1977-8 (System R; Ingres) • Turing award in 1981
Concepts • Database: a set of relations (= tables) • rows: tuples • columns: attributes (or keys) • superkey, candidate key, primary key
Example Database:
Example: cont’d Database: k-th attribute (Dk domain) rel. schema (attr+domains) tuple
Example: cont’d rel. schema (attr+domains) instance
Example: cont’d • Di: the domain of the I-th attribute (eg., char(10) • Formally: an instance is a subset of (D1 x D2 x …x Dn) rel. schema (attr+domains) instance
Example: cont’d • superkey (eg., ‘ssn , name’): determines record • cand. key (eg., ‘ssn’, or ‘st#’): minimal superkey (no subset of it is a superkey) • primary key: one of the cand. keys
Another example • Example: if Customer-name = {Jones, Smith, Curry, Lindsay}Customer-street = {Main, North, Park}Customer-city = {Harrison, Rye, Pittsfield} Then r = { (Jones, Main, Harrison), (Smith, North, Rye), (Curry, North, Rye), (Lindsay, Park, Pittsfield)} is a relation over Customer-name x Customer-street x Customer-city
Relations, tuples • Relation Schema: • A1, A2, …, Anare attributes • R = (A1, A2, …, An ) is a relation schema E.g. Customer-schema = (customer-name, customer-street, customer-city) • r(R) is a relation on the relation schema R E.g. customer (Customer-schema) • Relations are unordered • Order of tuples is irrelevant (tuples may be stored in an arbitrary order)
Database • A database consists of multiple relations • Information about an enterprise is broken up into parts, with each relation storing one part of the informationE.g.: account : stores information about accountsdepositor : stores information about which customer owns which account customer : stores information about customers • Storing all information as a single relation such as bank(account-number, balance, customer-name, ..)results in • repetition of information (e.g. two customers own an account) • the need for null values (e.g. represent a customer without an account) • Normalization theory (discuss later) deals with how to design relational schemas
Overview • history • concepts • Formal query languages • relational algebra • rel. tuple calculus • rel. domain calculus
Formal query languages • How do we collect information? • Eg., find ssn’s of people in cis331 • (recall: everything is a set!) • One solution: Relational algebra, i.e., set operators (procedural language) • Q1: Which operators?? • Q2: What is a minimal set of operators?
Relational operators • . • . • . • set union U • set difference ‘-’
Example: • Q: find all students (part or full time) • A: PT-STUDENT union FT-STUDENT
Observations: • two tables are ‘union compatible’ if they have the same attributes (i.e., same arity: number of attributes and same ‘domains’) • Q: how about intersection ? U
Observations: • A: redundant: • STUDENT intersection STAFF = STUDENT - (STUDENT - STAFF) STAFF STUDENT
Relational operators • . • . • . • set union • set difference ‘-’ U
Other operators? • E.g., find all students on ‘Main street’ • A: ‘selection’
Other operators? • Notice: selection (and rest of operators) expect tables, and produce tables • --> can be cascaded!! • For selection, in general:
Selection - examples • Find all ‘Smiths’ on ‘Forbes Ave’ ‘condition’ can be any boolean combination of ‘=‘, ‘>’, ‘>=‘, ...
Relational operators • selection • . • . • set union • set difference R - S R U S
Relational operators • selection picks rows - how about columns? • A: ‘projection’ - eg.: finds all the ‘ssn’ - removing duplicates
Relational operators Cascading: ‘find ssn of students on ‘forbes ave’
Relational operators • selection • projection • . • set union • set difference R - S R U S
Relational operators Are we done yet? Q: Give a query we can not answer yet!
TAKES SSN c-id grade 123 cis331 A 234 cis331 B Relational operators A: any query across two or more tables, eg., ‘find names of students in cis351’ Q: what extra operator do we need?? A: surprisingly, the cartesian product is enough!
Cartesian product • E.g., dog-breeding: MALE x FEMALE • gives all possible couples = x
TAKES SSN c-id grade 123 cis331 A 234 cis331 B so what? • Eg., how do we find names of students taking cis351?
Ssn Name Address ssn cid grade 123 smith main str 123 cis331 A 234 jones forbes ave 123 cis331 A 123 smith main str 234 cis331 B 234 jones forbes ave 234 cis331 B Cartesian product • A:
Cartesian product Ssn Name Address ssn cid grade 123 smith main str 123 cis331 A 234 jones forbes ave 123 cis331 A 123 smith main str 234 cis331 B 234 jones forbes ave 234 cis331 B
Ssn Name Address ssn cid grade 123 smith main str 123 cis331 A 234 jones forbes ave 123 cis331 A 123 smith main str 234 cis331 B 234 jones forbes ave 234 cis331 B
FUNDAMENTALRelational operators • selection • projection • cartesian product MALE x FEMALE • set union • set difference R - S R U S
Relational ops • Surprisingly, they are enough, to help us answer almost any query we want!! • derived operators, for convenience • set intersection • join (theta join, equi-join, natural join) • ‘rename’ operator • division
Joins • Equijoin:
Cartesian product • A: Ssn Name Address ssn cid grade 123 smith main str 123 cis331 A 234 jones forbes ave 123 cis331 A 123 smith main str 234 cis331 B 234 jones forbes ave 234 cis331 B
Joins • Equijoin: • theta-joins: generalization of equi-join - any condition
Joins • very popular: natural join: RS • like equi-join, but it drops duplicate columns: STUDENT(ssn, name, address) TAKES(ssn, cid, grade)
Joins • nat. join has 5 attributes Ssn Name Address ssn cid grade 123 smith main str 123 cis331 A 234 jones forbes ave 123 cis331 A 123 smith main str 234 cis331 B 234 jones forbes ave 234 cis331 B equi-join: 6
Natural Joins - nit-picking • if no attributes in common between R, S: • nat. join -> cartesian product:
Overview - rel. algebra • fundamental operators • derived operators • joins etc • rename • division • examples
rename op. • Q: why? • A: • Shorthand (BEFORE can be a relational algebra expression) • self-joins; … • for example, find the grand-parents of ‘Tom’, given PC(parent-id, child-id)
rename op. • PC(parent-id, child-id)
rename op. • first, WRONG attempt: • (why? how many columns?) • Second WRONG attempt:
rename op. • we clearly need two different names for the same table - hence, the ‘rename’ op.
Overview - rel. algebra • fundamental operators • derived operators • joins etc • rename • division • examples
Division • Rarely used, but powerful. • Suited for queries that include the phrase “for all” • Example: find suspicious suppliers, i.e., suppliers that supplied all the parts in A_BOMB
Division • Observations: ~reverse of cartesian product • It can be derived from the 5 fundamental operators (!!) • How?