370 likes | 521 Views
CS511 Design of Database Management Systems. Lecture 02: Codd’s Relational Model Kevin C. Chang. Announcements. Comments/questions in class: please ask for and use a microphone to speak up please tell us your name first. Contributions of this Paper?. Contributions of this Paper?.
E N D
CS511Design of Database Management Systems Lecture 02: Codd’s Relational Model Kevin C. Chang
Announcements • Comments/questions in class: • please ask for and use a microphone to speak up • please tell us your name first
Contributions of this Paper? • Concepts of data independence and declarative queries • argues need for more natural data-centric modeling • argues declarative queries • Complete definition of relational model • data structure based on relations • algebra for manipulating data • Formal theoretic notions • expressive power, redundancy, and consistency • starting research on “database theories”
Data Model • Mathematical formalism providing: • notation & structure for describing data • set of operations used to manipulate that data • A way of organizing a collection of facts pertaining to a system being managed • A way of thinking about the world and the phenomena • Fundamental “paradigm” of data management
DBMS: Levels of Abstraction?? • Physical implementation • storage structures, access methods • Logical data model • conceptual data structure and manipulation • Views • different portions of the database • ?? Who should see each different level?
The State of the World: COBOL Camp • Network/hierarchical DBMS, 1960’s • IDS network DBMS: Bachman at GE, 1961 • IMS hierarchical DBMS: IBM in 1968 – Still exists today! # • CODASYL approach to data management, 1960’s • CODASYL: Conf. Of Data System Languages, set up by US DOD, to standardize software applications • COBOL (comm. bus. oriented lang.) defined by CODASYL • ruled the business data processing world • incorporating prior data definition languages • DBTG (Database Task Group, under CODASYL), 1971 • closely aligned with COBOL • DBTG Report would standardize network model • Bachman (for network model) got Turing award in 1973
The State of the World: Relational camp • Codd’s paper in 1970 • resistance even within IBM • First implementations, 1973 • System R at IBM San Jose Lab • INGRES at UC.Berkeley • The “Great Debate” in 1975 SIGMOD conf. • Codd got Turing award in 1981
Arguments Against the Other Side? • COBOL/CODASYL Relational • too mathematical (to understand) • Relational COBOL/CODASYL • too complicated (to program)
All About Paradigm Shift • Bachman: offline sequential access --> online navigation This revolution in thinking is changing the programmer from a stationery viewer of objects passing before him in core into mobile navigator who is able to probe and traverse a database at will. The Programmer as Navigator, 1973 Turing Lecture • Codd: navigation over records & links --> declaration over relations The most important motivation for this research work is to provide a sharp and clear boundary between the logical and physical aspects of data management. … We call this data independence objective. Relational Databases: A Practical Foundation for Productivity, 1981 Turing Award Lecture
Network Model: DBTG Report • Network DB: • a collection of records • record = collection of fields • similar to an entity in E-R model • records connected by binary, many-to-one links • similar to binary relationships in E-R model • simulate one-to-one, many-to-many by many-to-one
Network Model: Implementation # • Student record linked to enrollment record • A lot of pointer linkage • ring-structured ptrs implements many-one links • Data manipulation is thus navigational Johnson … CS001 A+ CS308 B
DBTG Query Example select name from student where dept = “CS” student.dept = “CS”; findany student using dept; while DB-status = 0 do begin get student; print (student.name); findduplicate student using dept; end
DBTG Query Example: Predicates select name from student where dept = “CS” and grade >= “A-” student.dept = “CS”; findany student using dept; while DB-status = 0 do begin get student; if student.grade >= “A-” print (student.name); findduplicate student using dept; end
DBTG Query Example: Navigation # select E.grade from student S, enrollment E where S.name = “Johnson” and E.id = S.id student.name = “Johnson”; findany student using name; find first enrollment within StudentEnroll while DB-status = 0 do begin get enrollment; print (enrollment.grade); findnext enrollment within StudentEnroll; end
What’s Wrong? • Mix presentation and access in modeling • very much like HTML today :-) • Difficult and complex to program • Data dependence
Data Dependence • Ordering dependence • stored ordering should be independent of access & presentation • Indexing dependence • indices should be redundant and for “optimization” • Access path dependence • access paths to data should not be hard-coded
Relational Model Provide Independence? • Ordering dependence? • Indexing dependence? • Access path dependence?
Relations • Definition • given sets S1, S2, …, Sn (not distinct) • relation R is a subset of the Cartesian product S1 x S2 x … x Sn • Sj is jth domain of R, n is degree of R • Properties • each row represents an n-tuple of R • ordering of rows is immaterial • all rows are distinct • ordering of columns is significant • label each column with the name of the corresponding domain
Data Manipulation • Relational algebra • operations • Relation calculus • semantics in terms of logics • Essential Beauty of Relational Model: Named Relations Expressible Relations
Operations on Relations • Usual set operations: • since relations are sets of homogeneous tuples
Operations on Relations: Relation Deriving • Permutation • interchange the columns of an n-ary relation • Projection • select columns and remove any duplication in the rows • Join • selectively combining tuples in two relations • as a “class” of new relations that losslessly take some columns from either source relations-- why such a def suffices for here? • Composition • join two relations and remove join columns • Restriction • filter one relation with another
Algebra: Questions • What’s missing in this set of operators? • Is it minimal? • How is it different from “current” algebra?
The Algebra Primitives: NOW • Selection • Projection • Set union • Set difference • Cartesian product • Renaming
Redundancy and Consistency • Redundancy • redundant if something can be “derived” from others • foundation: what operations allowed in “derivation” • Consistency • data snapshot must satisfy some constraints • Set the landscape for database theory research • e.g., normal forms; normalization
What’s Good about the Relational Model? • Simplicity! • Mathematically complete data model • Declarative query languages • queries can be automatically compiled, executed, and optimized without resorting to programming
Unexpected Benefits • Client-server architecture • SQL request/response enables high-level, compact exchange between clients and server • clients: input and output, application logics • server: data processing • Parallel processing: relations in and out • pipeline: piping the output of one op into the next • partition: N op-clones, each processes 1/N input • Graphical user interfaces • relations fits the spreadsheet (table) metaphor
What Have Changed Over the Years? • Changed by implementations • First implementations (System R and INGRES) started 1973, 74 • Changed over time
What Have Changed Over the Years? • Columns identified by position instead of names • ordering of columns was significant • names (as role.domain) were simply “user interface” • Rows were distinct-- set semantics of relation • SQL: “select distinct” to eliminate duplicates
What Have Changed Over the Years? • Semantically defined non-algebraic join • Codd’s joins can be “plural” • joins are now defined “operationally”
What Have Changed Over the Years? • Non-simple domains: i.e. complex objects • allowed only built-in data types • new: object-relational DB, multimedia DB • Generations of relations: temporal aspect • temporal databases • e.g.: query GPA at the end of year 2000
Problems with the Relational Model • Too simple? • unable to handle beyond business data processing? • Data is often hierarchical/complex in nature • normalization is unnatural decomposition of data for storage, to be assembled by joins at query time
Network/Hierarchical Model Coming Back? • Web is a huge navigational database! • XML is both navigational and hierarchical <student> <name>John Smith</name> <dept>CS</dept> <enrollments> <enrollment> <course>CS311</course> <grade>A+</grade> </enrollment> … … // more enrollments </enrollment> <student>
Relational System Influence • Codd’s paper published in 1970 • First implementations, 1973-- • System R at IBM San Jose Lab, 1974-1978 • INGRES at UC.Berkeley, 1973-1977 • System R influence: • IBM DB2 • HP ALLBASE • Oracle: started from published spec. of System R • INGRES: • member later funded Sybase • evolved into Microsoft SQL server by buying code from Sybase
What’s Next? • System R: • one of the first implementations of relational DBMS • IBM San Jose Research Labs