460 likes | 475 Views
This module explores generalities on databases, entity-relationship models, relational models, commercial languages, technical issues, optimization, integrity constraints, database design, future trends, object-oriented models, extended relational systems, and SQL. Recommended reading includes Date's "Database Primer" and "Relational Database Writings." Guest lecturer Dr. Hugh Darwen discusses the relational model and SQL with upcoming lecture schedules detailed. Learn about data management problems, database uses in various industries, and the importance of database management systems in addressing data issues.
E N D
Theory of Databases • Module lecturer: Dr Meurig Beynon, Room 3.15 • Module Content • Generalities on databases • Database models: Theory and Practice • Entity-relationship models • Relational models: algebraic and logical • Commercial relational languages CS319 Theory of Databases
CS319 Theory of Databases • Module Content (cont.) • Technical Issues and Algorithms • Optimisation • Integrity constraints • Database design • Future Trends: Theory and Practice • Object-oriented models • Extended relational systems • The Relational Model and SQL CS319 Theory of Databases
Books on Classical Relational Databases • Date, C.J. A Database Primer • Date, C.J. An Introduction to Database Systems (Vol. 1) Addison-Wesley 6th edition, 1995 • Korth, H.F. and Silberschatz Database System Concepts McGraw-Hill 2nd edition 1991 • Ullman, J.D. Principles of Database Systems 2nd edition Computer Science Press 1982 CS319 Theory of Databases
Books on more general database issues • Ullman, J.D. Principles of Database & Knowledge-base Systems Vols 1 & 2 Computer Science Press 1988 • Brown, A.W. Object-oriented Databases: Applications in Software Engineering McGraw-Hill International Series in Engineering, 1991 • Gray, P.D. Logic, Algebra & Databases, Ellis-Horwood Series, Computers and their Applications 29 • Kent, William Data and Reality: basic assumptions in data processing reconsidered, North-Holland 1987 CS319 Theory of Databases
Books on Relational Theory in its application • Date, C.J. Relational Database Writings 1985-1989 Addison-Wesley [and other collections in this series] • Hugh Darwen and C.J.Date Foundations for Object/Relational Databases: The Third Manifesto 1998 Harlow, Addison-Wesley Longman • Dr Hugh Darwenwas an IBM consultant on databases: • he is our guest lecturer in weeks 18 and 19 • NOTE: he will give 4 lectures in week 18 thus: • Monday February 21th at 2pm and 5pm in CS1.04 • Thursday February 24th at 2pm to 5pm in CS1.04 • He will also give 4 lectures in week 19at similar times CS319 Theory of Databases
CS319 Schedule for 2005 • Dr Hugh Darwenwas an IBM consultant on databases: • he is our guest lecturer in weeks 18 and 19 • NOTE: the lectures in week 18will be on The Relational Model and SQL thus: • Monday February 21st at 2pm and 5pm in CS1.04 • Thursday February 24th at 2pm and 5pm in CS1.04 • the lectures in week 19will be on • Temporal Data and the Relational Model thus: • Monday February 28th at 2pm and 5pm in CS1.04 • Thursday March 3rd at 2pm and 5pm in CS1.04 CS319 Theory of Databases
DB generalities : What is a database? • Chris Date: • Database = computer-based record keeping system • R.W. Engles: "A Tutorial on DB Organisation" (1974) • collection of stored operational data used by the applications system of a particular enterprise • enterprise: hospital, university, bank, company etc • operational data: • data on products, accounts, patients etc • typically persistent cf conventional program IO data CS319 Theory of Databases
DB generalities : Why use a database? • Case-study: Banking (after Korth &Silberschatz Chap. 1 • ? How to meet needs using a traditional file-processing system supported by a conventional OS • Files: permanent records of customers, accounts • Applications programs (APs): enable user to modify files • to credit or debit an account • to add a new account • to find the balance in an account • to generate monthly statements • APs written by systems programmers as required • new requirements new files + new programs CS319 Theory of Databases
Original context for data modelling 1 • 1970s style applications • unsophisticated computer users • batch mode interaction • modest response times • no visualisation or GUI • modest expectations for ease-of-use • programming perceived as technical • simple computing infrastructure and environment • no PC, web etc • no live feeds of data • textual interaction the norm CS319 Theory of Databases
Original context for data modelling 2 • 1970s style applications • Business context • simple business model, limited automation, access etc • low volume of data • not initially distributed • Computing context • - existing/emerging DB proposals unconvincing • - computers not very powerful • - human and computing resources very expensive CS319 Theory of Databases
Summary of issues for data management • Problems of data management for file systems • that DBs were originally intended to address: • Data redundancy and inconsistency • Difficulty in accessing data • Data isolation • Concurrent access anomalies • Security problems • Integrity problems CS319 Theory of Databases
DB generalities: Issues for data management • For many commercial applications (as in 1970s-style enterprises above) good solution is offered by a database management system (DBMS). • A DBMS is an unconventional OS operating over a structured file system. CS319 Theory of Databases
Generalities of DBs: the DBMS concept • Motivating idea: • devise an abstract model of the entire corpus of operational data that simplifies the data processing activity, so that • simple queries can be handled without writing new application programs • where applications programs must be written, the task of accessing and manipulating operational data consistently and efficiently is greatly simplified CS319 Theory of Databases
Generalities of DBs: themes of the module • In this module, primarily concerned with a particular data model ("the relational database") that has proved very successful in meeting the needs of many commercial enterprises involved in data processing. • For other applications, such as interactive design, different principles seem to be required. For instance, human interaction and intervention in data processing for design is of the essence. In the latter part of the course, will consider alternative approaches (e.g. object-oriented databases, deductive databases). CS319 Theory of Databases
Generalities of DBs: themes of the module • There is a management aspect to the use of DBs: the issues above can to some extent be addressed either by manual processes or through automation. In some cases (e.g. security), it isn't clear which is most effective. Hardest to automate in areas where essentially new data and new modes of organising data are being generated dynamically. • Relational databases are arguably associated with applications of a particular kind, where the business processes are especially stable and ritualised. CS319 Theory of Databases
Generalities of DBs: themes of the module • Two views of impact of databases … • … can view the DBMS • as a program generator for the end-user • cf. current research on end-user programming • as a means to record persistent real-world state • cf. current research on virtual reality • Key issue: Is it possible to align paradigms for programming and general-purpose data modelling? CS319 Theory of Databases
DB generalities: the ingredients of a database • Data • integrated • shared • possibly distributed • Hardware • primary storage + secondary storage • Software • database management system: DBMS • protects users from hardware level detail • serves the needs of all users CS319 Theory of Databases
DB generalities: the ingredients of a database • Users • end-user: • non-specialist accessing data via a query language • naïve user accessing data via a special-purpose interface performs data retrieval and update (extend / modify) • applications programmer: • writes programs that use the DB by embedding queries to the DB in a HLL • develops interfaces for the naïve user CS319 Theory of Databases
DB generalities: the ingredients of a database • Users • Database Administrator (DBA): • responsible for overall control • decides what data is to be stored • designs the conceptual scheme • used to represent the operational data • implements authorisation checks • decides strategy for backup and recovery • monitors performance • oversees modification to suit user requirements CS319 Theory of Databases
DB generalities: data abstraction in a database • Data abstraction addresses issues of use, design, management and implementation in a database • The data model serves to describe in a formal manner the way in which data is viewed at three different levels of abstraction: • physical level • conceptual level • view level CS319 Theory of Databases
DB generalities: data abstraction in a database • physical level: • how is the data actually represented in the hardware? • bits, bytes • conceptual level: • what meaningful relationships are expressed by the physical data? • entities, and relationships between entities • view level: • what particular relationships are required by users? • more abstract because partial typically very high-level knowledge constitutes the view CS319 Theory of Databases
DB generalities: data abstraction in a database • Illustrating data abstraction: • Data base stores the date of birth of a client as a bit string. When we identify the senior citizens, we find all clients aged over 65. • Representations at different levels of abstraction • conceptual date of birth of a client • physical the bit string that records this information • view refers to age, which isn't stored in the DB. CS319 Theory of Databases
DB generalities: data abstraction in a database DESIGN & MANAGEMENT USE IMPLEMENTATION CS319 Theory of Databases
DB generalities: data abstraction in a database • Internal and external translation schemas serve • to protect the conceptual model from change when physical organisation changes or new views are required • to protect the user from a need to change views and the database programmer from a need to alter the physical organisation if and when the conceptual model is modified CS319 Theory of Databases
DB generalities: data abstraction in a database • The DBA conceives the database in terms of the conceptual model. • Users and application programs access the physical data via the conceptual model. • physical data independence: protecting the conceptual model from change when the physical organisation changes • logical data independence: protecting the user from the need to change views when the conceptual model changes CS319 Theory of Databases
DB generalities: data abstraction in a database • Analogous to the way in which the programmer is protected from low-level issues by using ADTs: • programAbstract Data TypeADT implementation • physical data independence • cf. change in an ADT implementation • logical data independence • cf. use of old program over enhanced ADT CS319 Theory of Databases
Recall - Generalities of DBs: the DBMS concept • Motivating idea: • devise an abstract model of the entire corpus of operational data that simplifies the data processing activity, so that • simple queries can be handled without writing new application programs • where applications programs must be written, the task of accessing and manipulating operational data consistently and efficiently is greatly simplified CS319 Theory of Databases
Characteristics of electronic data 1970 (1) • “Abstract model of the entire corpus of operational data” • Separation between persistent & transient data sharper • file vs executing program • Isolation of persistent data more complete • changes to persistent data initiated by human action • persistent data accessed through text interfaces • Electronic data storage & management rare, expensive • ‘intelligent’ interpretation of electronic state by human • no direct connection between environment and data CS319 Theory of Databases
Modern context for general data modelling Programs CS319 Theory of Databases
Characteristics of electronic data 1970 (2) • “Abstract model of the entire corpus of operational data” • Demands of the abstract model in 1970 quite low … • small volumes of data, modest performance • limited levels of volatility and automation tolerated • Today is very different, BUT – subject to viewing human agency as a metaphor for any agency, the key issues to be addressed by a classical database are still vital • Any DB modelling paradigm must handle 70s problems CS319 Theory of Databases
DB generalities: data models for a database • Many different paradigms have been proposed for developing abstract data models for databases • There are two principal kinds of abstract data model: • object-based models • record-based models • The earliest database systems were record-based • - this reflects the file system culture that they displaced CS319 Theory of Databases
DB generalities: data models for a database • Object-based models • The main models in this category are • entity-relationship models • object-oriented data models • Others include semantic and functional data models. • E-R model widely used to model data abstractly • OO model gaining acceptance in practice: effectively represents data + operations on data. CS319 Theory of Databases
DB generalities: data models for a database • Record-based Logical Models • Used at the conceptual and view levels. Specify both • overall logical structure of the database • higher-level description of the implementation. • Record-based because uses records in fixed-format of several types. This simplifies implementation: cf. trend towards richness and variety in structures used to implement OODBs CS319 Theory of Databases
DB generalities: data models for a database • Varieties of record-based logical model • hierarchical model • records & links organised as a family of trees • network model • records & links organised as a family of graphs • relational model • uses tables to record relationships between data CS319 Theory of Databases
DB generalities: data models for a database • Physical Data Models • There are also models of data at the lowest level of abstraction, concerned with physical organisation. These are not our primary concern in this module. • Relevant issues for relational databases include: • are data tables stored using hashing? • how are data tables indexed? • how are entries in data tables encoded and ordered? • what algorithms are used to retrieve and update? CS319 Theory of Databases
DB generalities: classical database features • Instances and Schemes • State of a DB changes over time: distinguish structure of DB from current state as defined by the data in it. • overall design of DB = database scheme • current content of DB = instance of the DB • Useful analogy with procedural variables: • database scheme type definition for variable • instance of database value of the variable CS319 Theory of Databases
DB generalities: classical database features • Classical DBs support data abstraction at many levels • The three levels of abstraction at which the database can be viewed are modelled by appropriate schemes. • physical scheme at the lowest level • conceptual scheme at intermediate level • several subschemes (possibly user-defined) • at the highest level (also called views of the DB) CS319 Theory of Databases
DB generalities: classical database features • Data Definition Language (DDL) • database scheme is defined using a DDL • compiling the DDL description creates a Data Dictionary • the storage and access methods used by the DB • are specified in a storage and definition language • Implementation details for storage are usually hidden from users CS319 Theory of Databases
DB generalities: classical database features • Data Manipulation Language (DML) • data manipulation means accessing DB to • retrieve, insert, delete, or modify data • most common use of DML is for data retrieval: informally described as "querying the DB" • retrieval component of DML = query language • (and by abuse, sometimes use term ‘query language’ as synonym for DML) CS319 Theory of Databases
DB generalities: classical database features • Varieties of Data Manipulation Language • There is a tension between • efficiency at physical level • intelligibility / ease of use at higher level • Have both procedural and non-procedural DMLs • procedural: requires knowledge of data implementation • non-procedural: need only specify what data is needed CS319 Theory of Databases
DB generalities: classical database features • Data Manipulation Languages for typical data models • object-based, hierarchical, network models have procedural DMLs • user can take explicit responsibility for optimising queries, but needs knowledge of data organisation • relational models use non-procedural DMLs • can formulate queries without knowledge of data organisation, but implementation has to be optimised CS319 Theory of Databases
DB generalities: classical database features • Database Manager (as program module!) • … interfaces between low-level data in DB and application programs & user queries. • Large volumes of data (relative to available technology) • gigabytes thousand megabytes = 1 billion bytes [!] • terabytes million megabytes = 1 trillion bytes • Requires auxiliary storage media, such as disk, CD etc. • Optimisation is primarily concerned with eliminating • data transfers between main and auxiliary memory. CS319 Theory of Databases
DB generalities: classical database features • Functions of the DB manager program module • query processing interacting with the file manager modules doing actual operations on physical data • integrity enforcement checking that data in the DB conforms to specified constraints • security enforcement ensuring that authorisation is given for access to data • backup and recovery coping with failure, and recovery to consistent DB state • concurrency control ensuring that simultaneous transactions do not interfere. CS319 Theory of Databases
DB generalities: classical database features • Role of the Database Administrator • ... functions performed by the DBA .… • scheme definition using DDL • specification of storage structure and access method • modification of schemes and physical storage as required • granting authorisation for data access • specifying integrity constraints CS319 Theory of Databases
DB generalities: classical database features • Overall system structure • Processing components • file manager • database manager • query processor • DML precompiler (to process DML embedded in APs) • DDL compiler • Data structures • data files actual content of db • data dictionary meta-data • indices auxiliary files to assist fast access CS319 Theory of Databases
… to follow Will review features of a DB system with reference to Ingres and Quel CS319 Theory of Databases