410 likes | 421 Views
ECE 569 Database System Engineering Fall 2004. Yanyong Zhang www.ece.rutgers.edu/~yyzhang Course URL www.ece.rutgers.edu/~yyzhang/fall04. Today’s topic . Relational data model Different data models exist: relational, network, hierarchical, object-oriented Powerful, simple, declarative
E N D
ECE 569 Database System EngineeringFall 2004 Yanyong Zhang www.ece.rutgers.edu/~yyzhang Course URL www.ece.rutgers.edu/~yyzhang/fall04
Today’s topic • Relational data model • Different data models exist: relational, network, hierarchical, object-oriented • Powerful, simple, declarative • Relational algebra
Overview of Database Design • Conceptual design (ER model is used at this stage) • What are the entities and relationships in the enterprise? • What information about these entities and relationships should we store in the database? • What are the integrity constraints or business rules that hold • A database ‘schema’ in the ER model can be represented pictorially • Can map an ER diagram into a relational schema
Entity-Relationship model • To describe the conceptual scheme. • An entity is a thing that exists and is distinguishable. • Entity sets consist of a group of all “similar” entities. • Each entity has certain attributes. • A relationship among entity sets is an ordered list of entity sets. • Relationship set: collection of similar relationships • An n-ary relationship set R relates n entity sets E1, …, En; each relationship in R involves entities e1 E1, …, en En • Same entity set could participate in different relationship sets, or in different “roles” in same set
Example of entity-relationship model Date Payment Balance Account Made To Amount Billed Pulse Name Vital Sign From Room# Time Patient Address Blood Diagnosed Treatment Disease Name
Data Model • A data model is a mathematical formalism with two parts: • A notation for describing data • A set of operations used to manipulate that data
Why Study the Relational Model? • Most widely used model • Vendors: IBM, informix, Microsoft,Oracle,Sybase,etc. • “legacy systems” in older models • e.g., IBM’s IMS • Recent competitor: object-oriented model • ObjectStore, Versant, Ontos • A synthesis emerging: object-relational model • Informix Universal server, UniSQL, O2, Oracle, DB2
Relational Database: Definitions • Relational database: a set of relations • Relation: made up of 2 parts • Instance: a table, with rows and columns • # Rows = cardinality, # columns = degree • Schema: specifies name of relation, plus name and type of each column • e.g., students(sid:string, name:string, login:string,age:integer, gpa:real) • Can think of a relation as a set of rows or tuples (i.e., all rows are distinct)
Set-theoretic notion of a relation • A domainD is a set of values • colors = {red, blue, green} • age = set of positive integers less than 20 • last_name = {a-zA-Z}+ • A relation over domains D1,D2,…,Dn is a subset of D1D2D3…Dn. • Example • R(colors, age) {(red,1), (blue,1),(green,1), (red,2), (blue,2),(green,2), …}
Set-theoretic notion of a relation (cont) • We are only interested in finite relations. • The members of a relation are called tuples. • Each relation that is a subset of the product of k domains is k-degree. • A relation can be represented as a table, where each row is a tuple and each column corresponds to one domain. PERSON (NAME, AGE)
Set-theoretic notion of a relation (cont) • Because a relation is a set of tuples, the order of tuples in the table is insignificant (but the order of domains does matter).
An alternative definition – set of mappings • A relation schema R = {A1,A2,…,An} (no order imposed on attributes) • A relation instance of R is a finite set of mappings {1, 2, …, m} where 2: R D where D = null dom(A1) dom(A2)… dom(An). • Each tuple i maps each attribute in tuple i to a value. • Note that attributes must be given names that are unique in a relation schema. • Under this definition, all three instances above are equivalent.
Keys • A set S of attributes is a candidate key for R if • No semantically correct instance of R can include two tuples such that [s] = [s], and • No proper subset of S satisfies (a) • A set of attributes satisfying (a) but not necessarily (b) is a superkey. • A relation may have multiple candidate keys. One can be designated as the primary key.
Example • Schema for a relation of degree four patient(name,address,balance_due,room#) • An instance of patient with three tuples that satisfies the key constraint (name is a candidate key) No other candidate key is possible (assuming that this is a semantically correct instance of patient).
Example – Healthcare DB patient(name, address, balance_due, room#) payments(name, amount, date) vital_signs(name, pulse, blood, date, time) diagnosis(patient_name, disease_name) disease(disease_name, treatment) treats(patient, doctor)
Relational Query Language • A major strength of the relational model: supports simple, powerful querying of data • Queries can be written intuitively, and the DBMS is responsible for efficient evaluation • Key: precise semantics for relational queries • Allows the optimizer to extensively re-order operations, and still ensure that the answer does not change
The SQL Query Language • Developed by IBM (system R) in the 1970s • Need for a standard since it is used by many vendors • Standards • SQL-86 • SQL-89 (minor revision) • SQL-92 (major revision) • SQL-99 (major extensions, current standard)
The SQL Query Language • To find all 18 year old students, we can write • To find just names and logins, replace the first line: select * from student S where s.age=18 select s.name, s.login
Logical DB design • Representing entity-relationship diagrams • An entity set E can be represented by a relation whose relation scheme consists of all the attributes of the entity set. • Each tuple of the relation represents one entity in the current instance of E. • A relationship R among E1, E2, …, Ek is represented by a relation whose relation scheme consists of the attributes in the keys for each of E1, E2, …, Ek.
Relational Query language • Query language: Allow manipulation and retrieval of data from a database • Relational model supports simple, powerful QLs: • Strong formal foundation based on logic • Allows for much optimization • Query language != programming language
Formal Relational Query Languages • Two mathematical query languages form the basis for “real” languages (e.g., SQL) and for implementation: • Relational algebra: more operational, very useful for representing execution plans • Relational calculus: let users describe what they want, rather than how to compute it (Non-operational, declarative)
Relational Algebra • A set of five basic operations that map one or more relations to a new relation. • Union: R S t tR or tS • The set of tuples in R or S or both • The degrees of R and S must match S R T R T
Relational Algebra (cont) • Difference: R - S t tR and tS • The set of tuples in R, but not in S • The degrees of R and S must match R - T S R T
Relational Algebra (cont) • Cartesian Product: R X S (a1, a2, …, ar+s) (a1, a2, …, ar) R and (ar+1, ar+2, …, ar+s) S • R x S is of degree r+s where r=degree(R) and s=degree(s) • a tuple t is in R x S if its first r components match those of a tuple in R and its last s components from a tuple in S R.A R.B R.C S.D S.E S R T R x S
Relational Algebra (cont) • Projection: i1, i2, …, im (R) (ai1, ai2, …, aim) (a1, a2, …, an) R • i1, i2, …, im (R) is of degree m • A tuple t in the result is obtained by removing and/or rearranging the attributes of a tuple of R S R 2,1 (T) T
Relational Algebra (cont) • Selection: F(R) t t R and F(t) • F(R) is of the same degree as R • All tuples in R that satisfy predicate F are included in result • F is a formula involving • Constants or components of the tuple – component i is denoted $I • Comparison operators, <, =, >, , , • Logical operators and, or, not $1=d or $2=$3 (T) T
Examples • Find the names of all patients with a balance of more than $10,000. • Find the names of all patients that have a pulse less than 50 or have been diagnosed with hypertension.
Examples (cont) • Find all patients whose treatment includes the “application of leeches” • Print the names of all pairs of patients that occupy the same hospital room and have been diagnosed with the same disease.
Examples (cont) • Print the name of the patient with the highest pulse recorded today • Find all patients that had pulse measured today • Find the set of all ‘losers’, i.e., those whose pulse is less than or equal to that of some other patient • Subtract (b) from (a) and project out patient name
Examples (cont) • List all patients that suffer from at least one illness that no other patient suffers from
Additional Operations • Intersection: R S t tR and tS • The set of tuples in both R and S • The degrees of R and S must match • R S = R – (R – S) S R R – S R – ( R – S)
Additional Operations (cont) • Theta-join: R ij S $i $r+j(R S) where r = degree(R) and <, =, >, , , • When is ‘=‘ operation is called equijoin. • Find patient with highest pulse
Additional Operations (cont) • Natural Join: R S • An equijoin for all attributes that R and S have in common. • The shared columns originating in S are projected out • R S i1, i2, …, im R.A1 = S. A1 and … and R.Ak = S.Ak (R x S) where i1, i2, … im is the list of all attributes of R x S, in order, except the attributes S.A1, …, S.Ak. • Example: Names of patients being treated with leeches.
Sample Queries • Consider the following set of relations frequents (drinker, bar) serves (bar, beer) likes (drinker, beer) • List the drinkers that frequent at least one bar that serves a beer they like • Construct the relation should_visit(drinker,bar) consisting of all tuples <d, b> where bar b serves a beer that drinker d likes.
Sample Queries (cont) • List the drinkers that frequent only bars that serve some beer they like. (Assume every drinker frequents at least one bar.) • Print the drinkers that frequent no bar that serves a beer that they like
Questions on Healthcare DB • Retrieve all pairs of patients that shared a room at the same time • List all patients that finished paying their bill before one of their roomates began paying theirs. • Print the names of all pairs of patients that occupy the same hospital room and have been diagnosed with the same disease.
Components of Data-Intensive Systems • Three separate types of functionalities • Data management • Application logic • Presentation • The system architecture determines whether these three components reside on a single system (“tier”) or are distributed across several tiers
Single-tier architecture • All functionality combined into a single tier, usually on a mainframe • Users access through dumb terminals • Advantages: • Easy maintenance and administration • Disadvantages: • Today, users expect graphical user interface • Centralized computation of all of them is too much for a central system
Client-Server Architecture • Thin client: work division • Client implements only the graphical user interface • Server implements business logic and data management • Thick client: • work division • Client implements both the graphical user interface and the business logic • Server implements data management • Disadvantages: • No central place to update the business logic • Security issues: server needs to trust clients • Scalability: does not scale more than several 100s of clients
The three-tier architecture Client program (web browser) Application Server (Tomcat, Apache) Database System (DB2)