1 / 41

ECE 569 Database System Engineering Fall 2004

ECE 569 Database System Engineering Fall 2004. Yanyong Zhang www.ece.rutgers.edu/~yyzhang Course URL www.ece.rutgers.edu/~yyzhang/fall04. Today’s topic . Relational data model Different data models exist: relational, network, hierarchical, object-oriented Powerful, simple, declarative

Download Presentation

ECE 569 Database System Engineering Fall 2004

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. ECE 569 Database System EngineeringFall 2004 Yanyong Zhang www.ece.rutgers.edu/~yyzhang Course URL www.ece.rutgers.edu/~yyzhang/fall04

  2. Today’s topic • Relational data model • Different data models exist: relational, network, hierarchical, object-oriented • Powerful, simple, declarative • Relational algebra

  3. Overview of Database Design • Conceptual design (ER model is used at this stage) • What are the entities and relationships in the enterprise? • What information about these entities and relationships should we store in the database? • What are the integrity constraints or business rules that hold • A database ‘schema’ in the ER model can be represented pictorially • Can map an ER diagram into a relational schema

  4. Entity-Relationship model • To describe the conceptual scheme. • An entity is a thing that exists and is distinguishable. • Entity sets consist of a group of all “similar” entities. • Each entity has certain attributes. • A relationship among entity sets is an ordered list of entity sets. • Relationship set: collection of similar relationships • An n-ary relationship set R relates n entity sets E1, …, En; each relationship in R involves entities e1 E1, …, en En • Same entity set could participate in different relationship sets, or in different “roles” in same set

  5. Example of entity-relationship model Date Payment Balance Account Made To Amount Billed Pulse Name Vital Sign From Room# Time Patient Address Blood Diagnosed Treatment Disease Name

  6. Data Model • A data model is a mathematical formalism with two parts: • A notation for describing data • A set of operations used to manipulate that data

  7. Why Study the Relational Model? • Most widely used model • Vendors: IBM, informix, Microsoft,Oracle,Sybase,etc. • “legacy systems” in older models • e.g., IBM’s IMS • Recent competitor: object-oriented model • ObjectStore, Versant, Ontos • A synthesis emerging: object-relational model • Informix Universal server, UniSQL, O2, Oracle, DB2

  8. Relational Database: Definitions • Relational database: a set of relations • Relation: made up of 2 parts • Instance: a table, with rows and columns • # Rows = cardinality, # columns = degree • Schema: specifies name of relation, plus name and type of each column • e.g., students(sid:string, name:string, login:string,age:integer, gpa:real) • Can think of a relation as a set of rows or tuples (i.e., all rows are distinct)

  9. Set-theoretic notion of a relation • A domainD is a set of values • colors = {red, blue, green} • age = set of positive integers less than 20 • last_name = {a-zA-Z}+ • A relation over domains D1,D2,…,Dn is a subset of D1D2D3…Dn. • Example • R(colors, age) {(red,1), (blue,1),(green,1), (red,2), (blue,2),(green,2), …}

  10. Set-theoretic notion of a relation (cont) • We are only interested in finite relations. • The members of a relation are called tuples. • Each relation that is a subset of the product of k domains is k-degree. • A relation can be represented as a table, where each row is a tuple and each column corresponds to one domain. PERSON (NAME, AGE)

  11. Set-theoretic notion of a relation (cont) • Because a relation is a set of tuples, the order of tuples in the table is insignificant (but the order of domains does matter).  

  12. An alternative definition – set of mappings • A relation schema R = {A1,A2,…,An} (no order imposed on attributes) • A relation instance of R is a finite set of mappings {1, 2, …, m} where 2: R  D where D = null  dom(A1)  dom(A2)…  dom(An). • Each tuple i maps each attribute in tuple i to a value. • Note that attributes must be given names that are unique in a relation schema. • Under this definition, all three instances above are equivalent.

  13. Keys • A set S of attributes is a candidate key for R if • No semantically correct instance of R can include two tuples    such that [s] = [s], and • No proper subset of S satisfies (a) • A set of attributes satisfying (a) but not necessarily (b) is a superkey. • A relation may have multiple candidate keys. One can be designated as the primary key.

  14. Example • Schema for a relation of degree four patient(name,address,balance_due,room#) • An instance of patient with three tuples that satisfies the key constraint (name is a candidate key) No other candidate key is possible (assuming that this is a semantically correct instance of patient).

  15. Example – Healthcare DB patient(name, address, balance_due, room#) payments(name, amount, date) vital_signs(name, pulse, blood, date, time) diagnosis(patient_name, disease_name) disease(disease_name, treatment) treats(patient, doctor)

  16. Example instance of students relation

  17. Relational Query Language • A major strength of the relational model: supports simple, powerful querying of data • Queries can be written intuitively, and the DBMS is responsible for efficient evaluation • Key: precise semantics for relational queries • Allows the optimizer to extensively re-order operations, and still ensure that the answer does not change

  18. The SQL Query Language • Developed by IBM (system R) in the 1970s • Need for a standard since it is used by many vendors • Standards • SQL-86 • SQL-89 (minor revision) • SQL-92 (major revision) • SQL-99 (major extensions, current standard)

  19. The SQL Query Language • To find all 18 year old students, we can write • To find just names and logins, replace the first line: select * from student S where s.age=18 select s.name, s.login

  20. Logical DB design • Representing entity-relationship diagrams • An entity set E can be represented by a relation whose relation scheme consists of all the attributes of the entity set. • Each tuple of the relation represents one entity in the current instance of E. • A relationship R among E1, E2, …, Ek is represented by a relation whose relation scheme consists of the attributes in the keys for each of E1, E2, …, Ek.

  21. Relational Query language • Query language: Allow manipulation and retrieval of data from a database • Relational model supports simple, powerful QLs: • Strong formal foundation based on logic • Allows for much optimization • Query language != programming language

  22. Formal Relational Query Languages • Two mathematical query languages form the basis for “real” languages (e.g., SQL) and for implementation: • Relational algebra: more operational, very useful for representing execution plans • Relational calculus: let users describe what they want, rather than how to compute it (Non-operational, declarative)

  23. Relational Algebra • A set of five basic operations that map one or more relations to a new relation. • Union: R  S  t  tR or tS  • The set of tuples in R or S or both • The degrees of R and S must match S R T R  T

  24. Relational Algebra (cont) • Difference: R - S  t  tR and tS  • The set of tuples in R, but not in S • The degrees of R and S must match R - T S R T

  25. Relational Algebra (cont) • Cartesian Product: R X S   (a1, a2, …, ar+s) (a1, a2, …, ar) R and (ar+1, ar+2, …, ar+s)  S  • R x S is of degree r+s where r=degree(R) and s=degree(s) • a tuple t is in R x S if its first r components match those of a tuple in R and its last s components from a tuple in S R.A R.B R.C S.D S.E S R T R x S

  26. Relational Algebra (cont) • Projection: i1, i2, …, im (R)   (ai1, ai2, …, aim) (a1, a2, …, an) R • i1, i2, …, im (R) is of degree m • A tuple t in the result is obtained by removing and/or rearranging the attributes of a tuple of R S R 2,1 (T) T

  27. Relational Algebra (cont) • Selection: F(R)  t  t R and F(t) • F(R) is of the same degree as R • All tuples in R that satisfy predicate F are included in result • F is a formula involving • Constants or components of the tuple – component i is denoted $I • Comparison operators, <, =, >, , ,  • Logical operators and, or, not $1=d or $2=$3 (T) T

  28. Examples • Find the names of all patients with a balance of more than $10,000. • Find the names of all patients that have a pulse less than 50 or have been diagnosed with hypertension.

  29. Examples (cont) • Find all patients whose treatment includes the “application of leeches” • Print the names of all pairs of patients that occupy the same hospital room and have been diagnosed with the same disease.

  30. Examples (cont) • Print the name of the patient with the highest pulse recorded today • Find all patients that had pulse measured today • Find the set of all ‘losers’, i.e., those whose pulse is less than or equal to that of some other patient • Subtract (b) from (a) and project out patient name

  31. Examples (cont) • List all patients that suffer from at least one illness that no other patient suffers from

  32. Additional Operations • Intersection: R  S  t  tR and tS  • The set of tuples in both R and S • The degrees of R and S must match • R  S = R – (R – S) S R R – S R – ( R – S)

  33. Additional Operations (cont) • Theta-join: R ij S  $i  $r+j(R  S) where r = degree(R) and    <, =, >, , ,   • When  is ‘=‘ operation is called equijoin. • Find patient with highest pulse

  34. Additional Operations (cont) • Natural Join: R S • An equijoin for all attributes that R and S have in common. • The shared columns originating in S are projected out • R S   i1, i2, …, im  R.A1 = S. A1 and … and R.Ak = S.Ak (R x S) where i1, i2, … im is the list of all attributes of R x S, in order, except the attributes S.A1, …, S.Ak. • Example: Names of patients being treated with leeches.

  35. Sample Queries • Consider the following set of relations frequents (drinker, bar) serves (bar, beer) likes (drinker, beer) • List the drinkers that frequent at least one bar that serves a beer they like • Construct the relation should_visit(drinker,bar) consisting of all tuples <d, b> where bar b serves a beer that drinker d likes.

  36. Sample Queries (cont) • List the drinkers that frequent only bars that serve some beer they like. (Assume every drinker frequents at least one bar.) • Print the drinkers that frequent no bar that serves a beer that they like

  37. Questions on Healthcare DB • Retrieve all pairs of patients that shared a room at the same time • List all patients that finished paying their bill before one of their roomates began paying theirs. • Print the names of all pairs of patients that occupy the same hospital room and have been diagnosed with the same disease.

  38. Components of Data-Intensive Systems • Three separate types of functionalities • Data management • Application logic • Presentation • The system architecture determines whether these three components reside on a single system (“tier”) or are distributed across several tiers

  39. Single-tier architecture • All functionality combined into a single tier, usually on a mainframe • Users access through dumb terminals • Advantages: • Easy maintenance and administration • Disadvantages: • Today, users expect graphical user interface • Centralized computation of all of them is too much for a central system

  40. Client-Server Architecture • Thin client: work division • Client implements only the graphical user interface • Server implements business logic and data management • Thick client: • work division • Client implements both the graphical user interface and the business logic • Server implements data management • Disadvantages: • No central place to update the business logic • Security issues: server needs to trust clients • Scalability: does not scale more than several 100s of clients

  41. The three-tier architecture Client program (web browser) Application Server (Tomcat, Apache) Database System (DB2)

More Related