360 likes | 517 Views
Lecture 2: Relational Model. Aug. 30 2006 ChengXiang Zhai. Most slides are adapted from Kevin Chang’s lecture slides. Data Model. Mathematical formalism providing: notation & structure for describing data set of operations used to manipulate the data
E N D
Lecture 2: Relational Model Aug. 30 2006 ChengXiang Zhai Most slides are adapted from Kevin Chang’s lecture slides
Data Model • Mathematical formalism providing: • notation & structure for describing data • set of operations used to manipulate the data • A way of organizing a collection of facts pertaining to a system being managed • A way of thinking about the world and the phenomena • Fundamental “paradigm” of data management
The State of the World: Pre Rel. Model • Network/hierarchical DBMS, 1960’s • IDS network DBMS: Bachman at GE, 1961 • IMS hierarchical DBMS: IBM in 1968 – Still exists today! • CODASYL approach to data management, 1960’s • CODASYL: Conf. Of Data System Languages, set up by US DOD, to standardize software applications • COBOL (comm. bus. oriented lang.) defined by CODASYL • ruled the business data processing world • incorporating prior data definition languages • DBTG (Database Task Group, under CODASYL), 1971 • closely aligned with COBOL • DBTG Report would standardize network model (Bachman got Turing award in 1973 for the network model)
Network Model: DBTG Report • Network DB: • a collection of records • record = collection of fields • similar to an entity in E-R model • records connected by binary, many-to-one links • similar to binary relationships in E-R model • simulate one-to-one, many-to-many by many-to-one
Network Model: Implementation • Student record linked to enrollment record • A lot of linkage pointers • ring-structured ptrs implements many-one links • Data manipulation is thus navigational Johnson … CS001 A+ CS308 B
DBTG Query Example SQL: select name from student where dept = “CS” DBTG: student.dept = “CS”; findany student using dept; while DB-status = 0 do begin get student; print (student.name); findduplicate student using dept; end
DBTG Query Example: Predicates SQL: select name from student where dept = “CS” and grade <= “C” DBTG: student.dept = “CS”; findany student using dept; while DB-status = 0 do begin get student; if student.grade <= “C” print (student.name); findduplicate student using dept; end
DBTG Query Example: Navigation SQL: select E.grade from student S, enrollment E where S.name = “Johnson” and E.id = S.id DBTG: student.name = “Johnson”; findany student using name; find first enrollment within StudentEnroll while DB-status = 0 do begin get enrollment; print (enrollment.grade); findnext enrollment within StudentEnroll; end
What’s Wrong? • Data dependence: Application program highly depends on the data representation • As a result, • Programming is difficult and complex • Application can become incorrect once there’s a change in data representation • Just like programming in assembly languages (as opposed to high-level programming languages)
Data Dependence • Ordering dependence • Applications may rely on a particular ordering of the stored data • Indexing dependence • Applications may rely on the availability of certain indices, but indices are semantically redundant and only necessary for “optimization” • Access path dependence • Applications would hard code access paths to data, so would rely on the continued existence of the used access paths
Codd’s Seminal Paper • Concepts of data independence and declarative queries • argues for more natural data-centric modeling • argues for declarative queries • Complete definition of relational model • data structure based on relations • algebra for manipulating data • Formal theoretic notions • expressive power, redundancy, and consistency • starting research on “database theories”
DBMS: Levels of Abstraction?? • Physical implementation • storage structures, access methods • Logical data model • conceptual data structure and manipulation • Views • different portions of the database • ?? Who should see each different level?
Relational Model Provide Independence? • Ordering dependence? • Indexing dependence? • Access path dependence?
A Relational Model of Data • Relations • given sets S1, S2, …, Sn (not distinct) • relation R is a subset of the Cartesian product S1 x S2 x … x Sn • Sj is jth domain of R, n is degree of R • Relations as tables • each row represents an n-tuple of R • ordering of rows is immaterial • all rows are distinct • ordering of columns is significant • label each column with the name of the corresponding domain
Data Manipulation • Relational algebra • operations • Relational calculus • semantics in terms of logics • Essential Beauty of Relational Model: Named Relations Expressible Relations
Operations on Relations • Usual set operations: • since relations are sets of homogeneous tuples
Operations on Relations: Relation Deriving • Permutation • interchange the columns of an n-ary relation • Projection • select columns and remove any duplication in the rows • Join • selectively combining tuples in two relations • as a “class” of new relations that losslessly take some columns from either source relations-- why such a def suffices for here? • Composition • join two relations and remove join columns • Restriction • filter one relation with another
Algebra: Questions • What’s missing in this set of operators? • Is it minimal? • How is it different from “current” algebra?
The Algebra Primitives: NOW • Selection • Projection • Set union • Set difference • Cartesian product • Renaming
Redundancy and Consistency • Redundancy • redundant if something can be “derived” from others • foundation: what operations allowed in “derivation” • Consistency • data snapshot must satisfy some constraints • Set the landscape for database theory research • e.g., normal forms; normalization
What’s Good about the Relational Model? • Simplicity! • Mathematically complete data model • Declarative query languages • queries can be automatically compiled, executed, and optimized without resorting to low-level programming
Unexpected Benefits • Client-server architecture • SQL request/response enables high-level, compact exchange between clients and server • clients: input and output, application logics • server: data processing • Parallel processing: relations in and out • pipeline: piping the output of one op into the next • partition: N op-clones, each processes 1/N input • Graphical user interfaces • relations fits the spreadsheet (table) metaphor
The Rising of Relational Model • Codd’s paper in 1970 • resistance even within IBM • First implementations, 1973 • System R at IBM San Jose Lab • INGRES at UC.Berkeley • The “Great Debate” in 1975 SIGMOD conf. • Codd got Turing award in 1981
The Great Debate (1975, SIGMOD) • COBOL/CODASYL Relational • too mathematical (to understand) • Relational COBOL/CODASYL • too complicated (to program)
Relational Model/System Impact • Codd’s paper published in 1970 • First implementations, 1973-- • System R at IBM San Jose Lab, 1974-1978 • INGRES at UC.Berkeley, 1973-1977 • System R influence: • IBM DB2 • HP ALLBASE • Oracle: started from published spec. of System R • INGRES: • member later funded Sybase • evolved into Microsoft SQL server by buying code from Sybase
What Have Changed Over the Years? • Changed by implementations • First implementations (System R and INGRES) started 1973, 74 • Changed over time
What Have Changed Over the Years? • Columns identified by position instead of names • ordering of columns was significant • names (as role.domain) were simply “user interface” • Rows were distinct-- set semantics of relation • SQL: “select distinct” to eliminate duplicates
What Have Changed Over the Years? • Semantically defined non-algebraic join • Codd’s joins can be “plural” • joins are now defined “operationally”
What Have Changed Over the Years? • Non-simple domains: i.e. complex objects • allowed only built-in data types • new: object-relational DB, multimedia DB • Generations of relations: temporal aspect • temporal databases • e.g.: query GPA at the end of year 2000
Problems with the Relational Model • Too simple? • unable to handle beyond business data processing? • Data is often hierarchical/complex in nature • normalization is unnatural decomposition of data for storage, to be assembled by joins at query time
Network/Hierarchical Model Coming Back? • Web is a huge navigational database! • XML is both navigational and hierarchical <student> <name>John Smith</name> <dept>CS</dept> <enrollments> <enrollment> <course>CS311</course> <grade>A+</grade> </enrollment> … … // more enrollments </enrollment> <student>
All About Paradigm Shift • Bachman: offline sequential access --> online navigation This revolution in thinking is changing the programmer from a stationery viewer of objects passing before him in core into mobile navigator who is able to probe and traverse a database at will. The Programmer as Navigator, 1973 Turing Lecture • Codd: navigation over records & links --> declaration over relations The most important motivation for this research work is to provide a sharp and clear boundary between the logical and physical aspects of data management. … We call this data independence objective. Relational Databases: A Practical Foundation for Productivity, 1981 Turing Award Lecture What will be the next paradigm shift?
What You Should Know • Relational model, relational algebra • What is the fundamental difference between the relational model and the previous data models? • What are some major advantages of the relational model? • What are some limitations of the relational model?
Carry Away Messages • Raise important research questions • See deficiencies in the current state of the world (data dependency) • Propose a change to the world that would address some of the deficiencies (declarative queries) • Leverage principled/mathematical tools (relational algebra)