Evaluating Relational Theory
Delivering end-user programming
General-purpose data modelling
CS319 Theory of Databases

Generalities of DBs: themes of the module
• Two views of impact of databases …
• … can view the DBMS
  • as a program generator for the end-user
    • cf. current research on end-user programming
  • as a means to record persistent real-world state
    • cf. current research on virtual reality
• Key issue: Is it possible to align paradigms for programming and general-purpose data modelling?

Characteristics of electronic data 1970 (1)
• “Abstract model of the entire corpus of operational data”
• Separation between persistent & transient data sharper
  • file vs executing program
• Isolation of persistent data more complete
  • changes to persistent data initiated by human action
  • persistent data accessed through text interfaces
• Electronic data storage & management rare, expensive
  • ‘intelligent’ interpretation of electronic state by human
  • no direct connection between environment and data

Modern context for general data modelling
[Diagram; only the label “Programs” survives]

Characteristics of electronic data 1970 (2)
• “Abstract model of the entire corpus of operational data”
• Demands of the abstract model in 1970 quite low …
  • small volumes of data, modest performance
  • limited levels of volatility and automation tolerated
• Today is very different, BUT – subject to viewing human agency as a metaphor for any agency, the key issues to be addressed by a classical database are still vital
• Any DB modelling paradigm must handle 70s problems

Evaluating Relational Query Languages
Relational theory strengths and limitations
From relational theory to practice

Perspectives on evaluating relational theory
• Briefly review original motivation for relational theory
• with particular attention to relational query languages … and their suitability for end-user programming
• general discussion of issues of principle
• review SQL to expose the relation to normal practice
• review sqleddi to reveal issues in bad current practice

Relational query languages
• Informally can identify the notion of a pure (as distinct from a commercial) relational query language (RQL)
• Queries in pure RQLs have precisely the abstract mathematical functionality identified by Codd
• Languages such as ISBL and EDDI come closest to this
• Commercial RQLs have the expressive power of pure RQLs plus additional features for practical use

Strengths of pure relational query languages
• Pure relational query languages provide …
  • wide range of abstract queries for record-based data
  • excellent mathematical semantics
  • physical and logical data independence
  • relative simplicity for the human user (cf. general PL)
  • scope for automatic optimisation
  • a declarative rather than a procedural emphasis

Limitations of pure relational query languages 1
• Pure relational query languages DON’T provide …
  • computational completeness: can’t compute the transitive closure of a relation
    • For instance, consider the table:
      BIRTHS(NAME,MOTHER,FATHER,YEAR,GENDER)
      & try to write a relational query to list ALL ancestors
  • an intrinsic procedural interpretation (cf. a simple functional programming language)
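
The ancestors problem can be sketched in a general-purpose language. The following is a minimal Python illustration, not part of the module materials: the sample rows and the helper names `parent_pairs` and `ancestors` are hypothetical. Each pass of the loop performs one more join of the parent relation with the pairs found so far. A fixed relational expression contains a fixed number of joins, whereas the loop repeats until no new pairs appear, and that unbounded repetition is the step a pure RQL cannot express.

```python
# Hypothetical sample rows for BIRTHS(NAME, MOTHER, FATHER, YEAR, GENDER).
BIRTHS = {
    ("Ann", "Beth", "Carl", 1990, "F"),
    ("Beth", "Dora", "Ed", 1965, "F"),
    ("Carl", "Fay", "Gus", 1960, "M"),
    ("Dora", "Hana", "Ian", 1940, "F"),
}


def parent_pairs(births):
    """Project BIRTHS onto (child, parent) pairs: the union of two projections."""
    pairs = set()
    for name, mother, father, _year, _gender in births:
        pairs.add((name, mother))
        pairs.add((name, father))
    return pairs


def ancestors(births):
    """Transitive closure of the parent relation, iterated to a fixpoint."""
    parents = parent_pairs(births)
    closure = set(parents)
    while True:
        # One more join step: (x, y) in closure and (y, z) in parents give (x, z).
        step = {(x, z) for (x, y) in closure for (y2, z) in parents if y == y2}
        if step <= closure:  # nothing new was found, so the closure is complete
            return closure
        closure |= step


anc = ancestors(BIRTHS)
print(sorted(a for (person, a) in anc if person == "Ann"))
```

With this sample data the loop needs two join steps beyond the initial parent pairs; a deeper family tree would need more, which is why no single fixed query covers every case.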

Limitations of pure relational query languages 2
• Pure relational query languages DON’T provide …
  • procedural elements of the DDL (e.g. update, create)
  • aggregate functions
  • “syntactic sugar” to appeal to a naïve user’s intuition
  • presentation features (e.g. ordering, forms)
  • access features (e.g. GUIs for users, PL interfaces)

Exposing the limitations of RQLs
• Exploration of this theme is an exercise for the reader …
• Can appreciate the similarities and natural discrepancies between a pure RQL and a commercial RQL by reviewing the features of SQL (see handout associated with sql.ppt)
• Can expose the pathological discrepancies between SQL and a pure RQL by studying the relationship between EDDI, SQL0 and TOYSQL (see worksheets 5 and 6, and the slides BeyondSQL0.ppt accessible via the CS233 website)

B: Pathologies in standard SQL 1
• To run the EDDI interpreter consult CS233 Worksheet 5
• 3 evaluation conventions in EDDI reflect a pure RQL
  • no duplicate rows
  • strict type checking on domains and attributes
  • use of natural join
• Can change these via the Uneddifying Interface
• See Worksheet 6 Questions 3-6 for illustration
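
Two of the three conventions (no duplicate rows, natural join) can be illustrated outside EDDI. The sketch below is plain Python and makes no claim about how EDDI itself is implemented: the relations `emp` and `dept`, and the helpers `rel` and `natural_join`, are hypothetical. A relation is modelled as a set of rows, so duplicates cannot occur, and the join matches rows on every attribute name the two relations share. Type checking on domains is not modelled here.

```python
def rel(*rows):
    """Build a relation as a set of immutable rows; duplicate rows collapse."""
    return {frozenset(row.items()) for row in rows}


def natural_join(r, s):
    """Join rows that agree on all shared attribute names, keeping one copy of each."""
    result = set()
    for row_r in r:
        for row_s in s:
            a, b = dict(row_r), dict(row_s)
            shared = a.keys() & b.keys()
            if all(a[k] == b[k] for k in shared):
                result.add(frozenset({**a, **b}.items()))
    return result


# The third row repeats the first, so the relation holds only two rows.
emp = rel(
    {"name": "Ann", "dept": "CS"},
    {"name": "Bob", "dept": "Maths"},
    {"name": "Ann", "dept": "CS"},
)
dept = rel({"dept": "CS", "floor": 2}, {"dept": "Maths", "floor": 3})

joined = natural_join(emp, dept)
for row in sorted(sorted(r) for r in joined):
    print(dict(row))
```

Each joined row carries the shared attribute `dept` once, which is the behaviour that distinguishes a natural join from a join that keeps both copies of the matched column.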

B: Pathologies in standard SQL 2
• To run the SQL0 interpreter consult CS233 Worksheet 6
• SQL0 (a subset of SQL) respects all three evaluation conventions
• BUT Standard SQL violates all 3 evaluation conventions:
  • allows duplicate rows - implements two types of selection: SELECT DISTINCT and SELECT
  • dispenses with type checking on attributes
  • uses “unnatural” join
• Consequence: logical flaws & obscure semantics [HD]
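
The duplicate-row and join behaviour can be observed in any SQL engine. The sketch below uses SQLite through Python's standard `sqlite3` module as a convenient stand-in for standard SQL, which is an assumption: dialects differ in detail. The tables `emp` and `dept` and their contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, dept TEXT)")
conn.execute("CREATE TABLE dept (dept TEXT, floor INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?)",
    [("Ann", "CS"), ("Bob", "CS"), ("Cy", "Maths")],
)
conn.executemany("INSERT INTO dept VALUES (?, ?)", [("CS", 2), ("Maths", 3)])

# Plain SELECT keeps duplicate rows; SELECT DISTINCT gives the set-based result.
plain = conn.execute("SELECT dept FROM emp ORDER BY dept").fetchall()
distinct = conn.execute("SELECT DISTINCT dept FROM emp ORDER BY dept").fetchall()
print(plain)     # [('CS',), ('CS',), ('Maths',)]
print(distinct)  # [('CS',), ('Maths',)]

# A join on an explicit condition keeps both copies of the matched column,
# so the result has four columns where a natural join would have three.
cursor = conn.execute("SELECT * FROM emp JOIN dept ON emp.dept = dept.dept")
cols = [d[0] for d in cursor.description]
print(len(cols))
conn.close()
```

The plain projection is a bag rather than a relation in Codd's sense, which is the first of the three violations listed above.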

B: Pathologies in standard SQL 3
• Issue: How to implement standard SQL using EDDI?
• Worksheet 6 provides the context for discussing this …
• BeyondSQL0.ppt explains in more detail how such an implementation can be carried out and highlights the problematic consequences of the poor design of standard SQL where implementation is concerned

Meta-agenda raised by RQLs
• RQLs went a long way to resolving the data modelling challenges for end-user programming in the 70s
• BUT (arguably) the solution they offer has its limitations in respect of modern requirements …
  • Some limitations clearly stem from a failure to be faithful to the principles of Codd’s relational theory
  • Is it conceivable that some limitations stem from the limitations of mathematical theory itself?
• Will return to these themes later in the module …

Modelling ‘real-world’ state
Modelling state in computer science:
State as ‘the current state of affairs’
Modelling state for and with state-change

Modelling state 1
• Modelling state is fundamental to computer science
• Important and confusing distinction between
  • modelling the current state of affairs
  • modelling to support state-changing activities
• File system or database content = persistent storage
  • … represents a current state of affairs
• Object-oriented analysis from use-cases
  • … reflects the manner in which state is to be changed

Modelling state 2
• Philosophical issues raised by this distinction …
• Commonly argued that perception of state is mediated by the goal of our interaction with it
• This is consistent with the dominant emphasis in CS on
  • “state as specified only in the context of a behaviour”
  • cf. state in a finite state machine, or procedural program
• Objects, ADTs, functional programs etc are viewed as
  • abstractions to support the specification of state-change

Modelling state 3
• Philosophical issues raised by this distinction …
• NB description of state ≠ description of state-change
• A key reason for this is that assuming the environment is stable, we can describe recipes for action that make no explicit reference to current state
• Consider e.g. running up familiar stairs ‘automatically’
• Internal memory of the recipe ≠ memory of the stairs

Modelling state 4
• Philosophical issues raised by this distinction …
• Relational DBs definitely aspire to model state itself
  • e.g. with the aim of supporting ‘open-ended’ queries
• Relational queries are viewed as interrogating state NOT as changing state
• Adding a view as making a new observation of a state
• Not all OO traditions approve of the emphasis on supporting state-change (e.g. Simula, anti-use-case)

Modelling state 5
• Philosophical issues raised by this distinction …
• State bound up with agency and modes of observation:
  • use of RDBs in a timetabling exercise
  • spreadsheets
  • persistence as presumed absence of agency
  • entity-relationship modelling

Modelling state 6
• Entity-relationship modelling makes use of diagrams:
• Note that ER diagrams:
  • represent state not state change
  • supply direct metaphorical representation of state: entities, relationships, attributes = nodes of graph
• Metaphor: there is a perceived correspondence between the features of the diagram and the state that it represents – an experiential not formal issue
• In practice, the metaphor is hard to sustain …

Modelling state 7
• Actual relationship between relation schemes and real-world observables is very subtle …
• … by way of illustration, consider additional constraints to which real-world relations may be subject, not FDs:
• … relations may be subject to data dependencies that motivate 4NF and 5NF … cf. ‘The Connection Trap’
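
One way to see the kind of dependency at stake is to split a three-way relation into its pairwise projections and join them back. The sketch below is a textbook-style supplier/part/project illustration in Python, not taken from the slides: the relation `SUPPLIES` and its rows are hypothetical. Rejoining the projections yields a tuple that was never in the original, so the pairwise facts alone do not determine the three-way fact unless an extra constraint holds on the real-world data.

```python
# Hypothetical relation SUPPLIES(supplier, part, project).
SUPPLIES = {
    ("S1", "P1", "J2"),
    ("S1", "P2", "J1"),
    ("S2", "P1", "J1"),
}

# The three binary projections of the relation.
sp = {(s, p) for (s, p, j) in SUPPLIES}  # supplier supplies part
pj = {(p, j) for (s, p, j) in SUPPLIES}  # part is used in project
js = {(j, s) for (s, p, j) in SUPPLIES}  # project is served by supplier

# Join the three projections back together on their shared attributes.
rejoined = {
    (s, p, j)
    for (s, p) in sp
    for (p2, j) in pj
    if p == p2 and (j, s) in js
}

spurious = rejoined - SUPPLIES
print(spurious)  # {('S1', 'P1', 'J1')}
```

Every original tuple survives the round trip, but the extra tuple shows that inferring the ternary relationship from the binary ones is only safe when the data happens to satisfy a suitable dependency.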