1 / 35

Lecture 2: Relational Model

Lecture 2: Relational Model. Aug. 30 2006 ChengXiang Zhai. Most slides are adapted from Kevin Chang’s lecture slides. Data Model. Mathematical formalism providing: notation & structure for describing data set of operations used to manipulate the data

zuzana
Download Presentation

Lecture 2: Relational Model

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Lecture 2: Relational Model Aug. 30 2006 ChengXiang Zhai Most slides are adapted from Kevin Chang’s lecture slides

  2. Data Model • Mathematical formalism providing: • notation & structure for describing data • set of operations used to manipulate the data • A way of organizing a collection of facts pertaining to a system being managed • A way of thinking about the world and the phenomena • Fundamental “paradigm” of data management

  3. The State of the World: Pre Rel. Model • Network/hierarchical DBMS, 1960’s • IDS network DBMS: Bachman at GE, 1961 • IMS hierarchical DBMS: IBM in 1968 – Still exists today! • CODASYL approach to data management, 1960’s • CODASYL: Conf. Of Data System Languages, set up by US DOD, to standardize software applications • COBOL (comm. bus. oriented lang.) defined by CODASYL • ruled the business data processing world • incorporating prior data definition languages • DBTG (Database Task Group, under CODASYL), 1971 • closely aligned with COBOL • DBTG Report would standardize network model (Bachman got Turing award in 1973 for the network model)

  4. Network Model: DBTG Report • Network DB: • a collection of records • record = collection of fields • similar to an entity in E-R model • records connected by binary, many-to-one links • similar to binary relationships in E-R model • simulate one-to-one, many-to-many by many-to-one

  5. Network Model: Implementation • Student record linked to enrollment record • A lot of linkage pointers • ring-structured ptrs implements many-one links • Data manipulation is thus navigational Johnson … CS001 A+ CS308 B

  6. DBTG Query Example SQL: select name from student where dept = “CS” DBTG: student.dept = “CS”; findany student using dept; while DB-status = 0 do begin get student; print (student.name); findduplicate student using dept; end

  7. DBTG Query Example: Predicates SQL: select name from student where dept = “CS” and grade <= “C” DBTG: student.dept = “CS”; findany student using dept; while DB-status = 0 do begin get student; if student.grade <= “C” print (student.name); findduplicate student using dept; end

  8. DBTG Query Example: Navigation SQL: select E.grade from student S, enrollment E where S.name = “Johnson” and E.id = S.id DBTG: student.name = “Johnson”; findany student using name; find first enrollment within StudentEnroll while DB-status = 0 do begin get enrollment; print (enrollment.grade); findnext enrollment within StudentEnroll; end

  9. What’s Wrong?

  10. What’s Wrong? • Data dependence: Application program highly depends on the data representation • As a result, • Programming is difficult and complex • Application can become incorrect once there’s a change in data representation • Just like programming in assembly languages (as opposed to high-level programming languages)

  11. Data Dependence • Ordering dependence • Applications may rely on a particular ordering of the stored data • Indexing dependence • Applications may rely on the availability of certain indices, but indices are semantically redundant and only necessary for “optimization” • Access path dependence • Applications would hard code access paths to data, so would rely on the continued existence of the used access paths

  12. Codd’s Seminal Paper • Concepts of data independence and declarative queries • argues for more natural data-centric modeling • argues for declarative queries • Complete definition of relational model • data structure based on relations • algebra for manipulating data • Formal theoretic notions • expressive power, redundancy, and consistency • starting research on “database theories”

  13. DBMS: Levels of Abstraction?? • Physical implementation • storage structures, access methods • Logical data model • conceptual data structure and manipulation • Views • different portions of the database • ?? Who should see each different level?

  14. Relational Model Provide Independence? • Ordering dependence? • Indexing dependence? • Access path dependence?

  15. A Relational Model of Data • Relations • given sets S1, S2, …, Sn (not distinct) • relation R is a subset of the Cartesian product S1 x S2 x … x Sn • Sj is jth domain of R, n is degree of R • Relations as tables • each row represents an n-tuple of R • ordering of rows is immaterial • all rows are distinct • ordering of columns is significant • label each column with the name of the corresponding domain

  16. Data Manipulation • Relational algebra • operations • Relational calculus • semantics in terms of logics • Essential Beauty of Relational Model: Named Relations Expressible Relations

  17. Operations on Relations • Usual set operations: • since relations are sets of homogeneous tuples

  18. Operations on Relations: Relation Deriving • Permutation • interchange the columns of an n-ary relation • Projection • select columns and remove any duplication in the rows • Join • selectively combining tuples in two relations • as a “class” of new relations that losslessly take some columns from either source relations-- why such a def suffices for here? • Composition • join two relations and remove join columns • Restriction • filter one relation with another

  19. Algebra: Questions • What’s missing in this set of operators? • Is it minimal? • How is it different from “current” algebra?

  20. The Algebra Primitives: NOW • Selection • Projection • Set union • Set difference • Cartesian product • Renaming

  21. Redundancy and Consistency • Redundancy • redundant if something can be “derived” from others • foundation: what operations allowed in “derivation” • Consistency • data snapshot must satisfy some constraints • Set the landscape for database theory research • e.g., normal forms; normalization

  22. What’s Good about the Relational Model? • Simplicity! • Mathematically complete data model • Declarative query languages • queries can be automatically compiled, executed, and optimized without resorting to low-level programming

  23. Unexpected Benefits • Client-server architecture • SQL request/response enables high-level, compact exchange between clients and server • clients: input and output, application logics • server: data processing • Parallel processing: relations in and out • pipeline: piping the output of one op into the next • partition: N op-clones, each processes 1/N input • Graphical user interfaces • relations fits the spreadsheet (table) metaphor

  24. The Rising of Relational Model • Codd’s paper in 1970 • resistance even within IBM • First implementations, 1973 • System R at IBM San Jose Lab • INGRES at UC.Berkeley • The “Great Debate” in 1975 SIGMOD conf. • Codd got Turing award in 1981

  25. The Great Debate (1975, SIGMOD) • COBOL/CODASYL  Relational • too mathematical (to understand) • Relational  COBOL/CODASYL • too complicated (to program)

  26. Relational Model/System Impact • Codd’s paper published in 1970 • First implementations, 1973-- • System R at IBM San Jose Lab, 1974-1978 • INGRES at UC.Berkeley, 1973-1977 • System R influence: • IBM DB2 • HP ALLBASE • Oracle: started from published spec. of System R • INGRES: • member later funded Sybase • evolved into Microsoft SQL server by buying code from Sybase

  27. What Have Changed Over the Years? • Changed by implementations • First implementations (System R and INGRES) started 1973, 74 • Changed over time

  28. What Have Changed Over the Years? • Columns identified by position instead of names • ordering of columns was significant • names (as role.domain) were simply “user interface” • Rows were distinct-- set semantics of relation • SQL: “select distinct” to eliminate duplicates

  29. What Have Changed Over the Years? • Semantically defined non-algebraic join • Codd’s joins can be “plural” • joins are now defined “operationally”

  30. What Have Changed Over the Years? • Non-simple domains: i.e. complex objects • allowed only built-in data types • new: object-relational DB, multimedia DB • Generations of relations: temporal aspect • temporal databases • e.g.: query GPA at the end of year 2000

  31. Problems with the Relational Model • Too simple? • unable to handle beyond business data processing? • Data is often hierarchical/complex in nature • normalization is unnatural decomposition of data for storage, to be assembled by joins at query time

  32. Network/Hierarchical Model Coming Back? • Web is a huge navigational database! • XML is both navigational and hierarchical <student> <name>John Smith</name> <dept>CS</dept> <enrollments> <enrollment> <course>CS311</course> <grade>A+</grade> </enrollment> … … // more enrollments </enrollment> <student>

  33. All About Paradigm Shift • Bachman: offline sequential access --> online navigation This revolution in thinking is changing the programmer from a stationery viewer of objects passing before him in core into mobile navigator who is able to probe and traverse a database at will. The Programmer as Navigator, 1973 Turing Lecture • Codd: navigation over records & links --> declaration over relations The most important motivation for this research work is to provide a sharp and clear boundary between the logical and physical aspects of data management. … We call this data independence objective. Relational Databases: A Practical Foundation for Productivity, 1981 Turing Award Lecture What will be the next paradigm shift?

  34. What You Should Know • Relational model, relational algebra • What is the fundamental difference between the relational model and the previous data models? • What are some major advantages of the relational model? • What are some limitations of the relational model?

  35. Carry Away Messages • Raise important research questions • See deficiencies in the current state of the world (data dependency) • Propose a change to the world that would address some of the deficiencies (declarative queries) • Leverage principled/mathematical tools (relational algebra)

More Related