Databases and Information Systems 4 Richard Cooper (rich@dcs) and Tony Printezis (tony@dcs) DBIS4 - RLC
The Fundamental Problem • Database Systems have been very successful in providing good support for managing data which is fairly large and fairly complex • What happens when: • the data gets very much larger • the data gets very much more complex
Contents of Course • Week 1 (Richard) • Introduction • Overview of RDB/ORDB/OODB • Week 2 (Richard) • Orthogonal Persistence • Object Oriented Database Systems
Contents of Course 2 • Week 3 (Tony) • Java Object Serialization • The PJama API • Week 4 (Tony) • Object Caching and Object Faulting • Pointer Swizzling
Contents of Course 3 • Week 5 (Tony) • Garbage Collection - Disk Behaviour • Object Promotion • Week 6 (Tony) • Object Eviction • Orthogonal Persistence for Java
Contents of Course 4 • Week 7 (Tony) • Store Organisation • Garbage Collection • Week 8 (Richard) • Object Query Languages • Transaction Models
Contents of Course 5 • Week 9 (Richard) • Transaction Models for Multi-Site Databases • Schema Evolution • Week 10 • Specialised Indexing (Ela) • XML (Richard)
Assumptions about Database Use As database systems evolved, it was assumed that: 1. There was a central data store with lots of distributed users. 2. The data was relatively simple (largely alphanumeric). 3. The data was regular and complete. 4. There was a lot of data, but there was also an implicit limit to its size. 5. The users were either consumers or specialised creators.
The Real World • Now we have: • data all over the place • in all kinds of structures • much of it is text • even more of it is graphical or aural • vast amounts of it • some of it is missing or is structured differently in different places • users with various kinds of interest/involvement
When Data is Small • You can get away with: • non-linear algorithms • hand-crafted code and data • an ad hoc structure • implicit rules and informal conventions
When Data Gets Large • You must have: • linear or (better still) incremental algorithms • systematic code and data management • regular structures, frameworks and tools to support them • explicit, visible and interpretable rules
When Data is also Long-Lived • We have the hardware to keep data for a very long time • and there are often laws forcing us to do so • However, long-lived data tends to change: • new data is added • it is restructured • the software expected to handle it evolves • Can you still read a ten-year-old floppy disk?
When Data is also Heterogeneous • Information systems increasingly must bring together data: • of different kinds (numeric and multimedia) • produced separately (e.g. in merged companies) • for different purposes • using different technologies • as though it had all been designed to work together
Large, Long-Lived, Heterogeneous and Unstoppable • Because the data supports continuous operations: • utilities, banking, airlines, public service • You cannot stop such systems, even when: • you want to change the hardware or software • you want to change your database • you want to change the application • there are hardware or software failures • there are operations which require exclusive access
This is the Reality we Live With • There are lots of examples: • shared scientific data (e.g. genomic data) • e-business • governmental systems and health-care data • computer aided design and manufacturing • geographic information systems • etc., etc.
And There Are Many More Media for Data Access • Not just a private network, but also: • the internet • digital television • mobile devices • etc., etc.
How to Cope 1 • Software Re-use: • not just small libraries such as the Java APIs • but large components, such as: • databases, payroll packages, GUI packages, etc. • Standardised Frameworks • CORBA, DCOM, EJB, .NET, XML
How to Cope 2 • Generate code rather than write it • since much code is repetitious and can be generated from • a high-level notation or by reflecting over data • Work incrementally • revolution is never affordable • plan and resource a route for the transition • remember the users!
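As a small illustration of generating code by reflecting over data, the following Python sketch derives a `CREATE TABLE` statement from a dataclass definition. The `Employee` class, the `create_table_sql` function and the type mapping are hypothetical and were chosen only for this example.

```python
from dataclasses import dataclass, fields

# Hypothetical mapping from Python field types to SQL column types.
SQL_TYPES = {int: "INTEGER", str: "VARCHAR(35)", float: "REAL"}

@dataclass
class Employee:  # hypothetical record type acting as the "high-level notation"
    ename: str
    age: int
    salary: float

def create_table_sql(cls) -> str:
    """Generate a CREATE TABLE statement by reflecting over a dataclass's fields."""
    columns = ", ".join(f"{f.name} {SQL_TYPES[f.type]}" for f in fields(cls))
    return f"CREATE TABLE {cls.__name__} ({columns});"

print(create_table_sql(Employee))
# CREATE TABLE Employee (ename VARCHAR(35), age INTEGER, salary REAL);
```

If a field is added to the class, the generated schema changes with it, and no SQL has to be written by hand.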
The Fundamental Coping Device • Effective high-level and complex standards for representing: • data (relations not enough) • applications (regular, strict languages needed) • distributed systems (CORBA, etc.) • processes (UML, business processes, etc.) • etc., etc.
But also ... • It may be necessary to create new storage techniques to fit new data structures • It will be necessary to invent new storage structures to manage the new complexity • There is a need for work at both • the implementation level and • the usability level
Lecture 2 • New Requirements on DB Functions • Why Relations Won't Do • Extending Relations: • Historical and Deductive Databases • Object Relational Databases • Oracle Objects, SQL3, etc. • Object Oriented Databases • intro only
New Applications with New Requirements 1. CAD, CAE, CIM 2. Computer Aided Software Engineering 3. Office Information Systems 4. Geographic Information Systems 5. Hypermedia Systems • The data is large, often graphical and complex, and multiple versions are required
Requirements which carry over from Traditional Applications • Efficient access to large amounts of data • Recovery mechanisms • Security mechanisms • Data independence • Distribution of data
Requirements Modified by the New Applications I • Transactions • in traditional applications, these are short - milliseconds to book a seat • in novel applications, they may be long - hours or days to edit a design • in traditional applications they are competitive - don't book the same seat twice • in novel applications they may be co-operative e.g. collaboration on design development
Requirements Modified by the New Applications II • Integrity Constraints are much more important • as the data is more semantically complex • some of the semantics is best expressed as constraints • User Interfaces play a greater rôle • the data is manageable only if appropriately visualised • complex operations must be made usable
Requirements Modified by the New Applications III • Data is organised differently:
                       Trad. Apps    Novel Apps
    Number of objects  Large         Small
    Number of types    Small         Large
    Object size        Small         Large/Huge
New Requirements Made by the New Applications I • Complex Data Structures • Just sets of records won't do • Object identity easier than primary keys • Implicit references easier than foreign keys • First Normal Form is a Killer! • Multimedia Data Types
New Requirements Made by the New Applications II • The Database must hold Code • to hold complex derived data • to hold "active values" • Multiple Versions • We only want one bank account record at any time • But many alternative designs • Building configurations becomes a problem
Can We Go On Using Relational DBMS? Only with increased mapping problems. The relational model (RM) has only two ways of relating two pieces of data: • They are in the same record. • They are in two records connected by a foreign key.
The Semantic Poverty of the RM • The former is used for: • grouping attributes • 1-1 relationships • compound attributes • connecting keys of M-N relationships • The latter is used for: • multi-valued attributes • sub-typing • one-many relationships
Other Problems with RDBs • You can't do recursive queries • e.g. "Return all the ancestors of X" • Nor is there much support for constraints • e.g. "All employees earn less than their boss" • You can't add new operations • e.g. "Return the volume of a building" • Impedance mismatch • if you use a programming language, it has a different data model from SQL's
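Without recursion in the query language, "return all the ancestors of X" has to be driven from a host programming language, one generation per query. The results also have to be converted from SQL rows into the host language's own structures, which is the impedance mismatch in miniature. Below is a minimal sketch using Python's built-in `sqlite3` module. The `Parent(child, parent)` table and the `ancestors` function are hypothetical.

```python
import sqlite3

def ancestors(conn: sqlite3.Connection, person: str) -> set[str]:
    """Collect all ancestors of `person` by issuing one non-recursive query per generation."""
    found: set[str] = set()
    frontier = {person}
    while frontier:
        next_frontier: set[str] = set()
        for p in frontier:
            # Each query only looks one level up; the host-language loop supplies the recursion.
            for (parent,) in conn.execute("SELECT parent FROM Parent WHERE child = ?", (p,)):
                if parent not in found:
                    found.add(parent)
                    next_frontier.add(parent)
        frontier = next_frontier
    return found

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Parent (child TEXT, parent TEXT)")
conn.executemany("INSERT INTO Parent VALUES (?, ?)",
                 [("jill", "jane"), ("jane", "john"), ("john", "george")])
print(sorted(ancestors(conn, "jill")))  # ['george', 'jane', 'john']
```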
Three Approaches for Progress • Start with a traditional DBMS and extend its modelling power (Object-Relational System) • or start with a rich data model and add DBMS facilities (Object Oriented DBMS) • or start with a programming language and add DBMS facilities (Persistent Programming Language) • Manifesto Wars
The Third-Generation Database System Manifesto I Three tenets: • Besides traditional data management services, third generation DBMSs will provide support for richer object structures and rules • Third generation DBMSs must subsume second generation systems • Third generation DBMSs must be open to other subsystems
The Third-Generation Database System Manifesto II Thirteen Propositions: • Rich type system • Inheritance • Functions/encapsulation • OIDs only if no primary key • Rules (triggers and constraints) are important • The query language should be central to all access • Manual & automatic collections • Update through views • Performance and data model should be kept separate • Multiple programming languages • SQL is the de facto standard • Persistent extension of languages is good • Network communication through queries and results
The Object Oriented System Manifesto I Mandatory Features: • Complex objects • Object identity • Encapsulation • Types and classes • Inheritance • Late binding • Ad hoc querying • Extensibility • Persistence • Efficient storage • Concurrency • Recovery • Computational completeness • Disagreement: • Integrity constraints • DB admin tools • Views • Schema evolution tools
The Object Oriented System Manifesto II Optional Features: • Multiple inheritance • Type checking • Distribution • Design transactions • Versions • Open Choices: • Programming paradigm • Type system • Uniformity
The Third Manifesto • The relational model is still important and OO features should be orthogonal to it • Likes: • relations • relational algebra up front • integrity constraints • multiple and single inheritance • computational completeness • static type checking • Dislikes: SQL, object IDs and null values
Two Extensions of RDBMS • Historical DBMS • keep all past states of the database • Deductive DBMS • derived data as well as base data • uses a language like Prolog to add the derived data
Historical DBMS • Old records are kept when they are deleted, so that queries like "give the balance on 1/10/88" can be answered • Records have two extra fields: creation and deletion dates • delete sets the deletion field • insert sets the creation field • update sets the deletion field of the old record and creates a new record • Two notions of time: when the data is valid and when it is entered
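The creation/deletion-date scheme can be sketched in a few lines of Python. This is a minimal in-memory illustration that tracks only one time line: the date each change is recorded. It does not separate valid time from entry time. `HistoricalTable`, `Version`, `as_of` and the account fields are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class Version:
    key: str
    balance: int
    created: date                   # set by insert
    deleted: Optional[date] = None  # set by delete (or by update, on the old version)

class HistoricalTable:
    """Keeps every version of every record instead of overwriting or discarding it."""

    def __init__(self) -> None:
        self.versions: list[Version] = []

    def _current(self, key: str) -> Version:
        # Assumes the key currently exists; raises StopIteration otherwise.
        return next(v for v in self.versions if v.key == key and v.deleted is None)

    def insert(self, key: str, balance: int, when: date) -> None:
        self.versions.append(Version(key, balance, created=when))

    def delete(self, key: str, when: date) -> None:
        self._current(key).deleted = when   # the record is kept, only marked as deleted

    def update(self, key: str, balance: int, when: date) -> None:
        self.delete(key, when)              # close the old version...
        self.insert(key, balance, when)     # ...and open a new one

    def as_of(self, key: str, when: date) -> Optional[int]:
        """Return the value that was current on `when`, e.g. the balance on 1/10/88."""
        for v in self.versions:
            if v.key == key and v.created <= when and (v.deleted is None or when < v.deleted):
                return v.balance
        return None

t = HistoricalTable()
t.insert("acc1", 100, date(1988, 1, 1))
t.update("acc1", 250, date(1988, 6, 1))
t.delete("acc1", date(1989, 1, 1))
print(t.as_of("acc1", date(1988, 10, 1)))  # 250
```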
Deductive DBMS (DDB) A DDB is made up of two kinds of component: • facts are simple base assertions, i.e. records:
    father( jane, john ).
    mother( jill, jane ).
• rules are ways of deriving more facts:
    grandfather( C, G ) :- parent( C, P ), father( P, G ).
    parent( C, P ) :- father( C, P ).
    etc.
• Queries are rules with variables to be filled in:
    grandfather( X, john )?    i.e. who are john's grandchildren?
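The rules can be evaluated bottom-up over the facts. This Python sketch hand-translates the `parent` and `grandfather` rules into set operations. The `mother`-based `parent` rule is an assumption about what "etc." covers. A real DDB would evaluate arbitrary rules generically, for example by applying them repeatedly until no new facts appear, rather than through hand-written code.

```python
# Base facts as (child, parent) tuples: father(jane, john) is read as "john is jane's father",
# which is the reading the grandfather rule requires.
father = {("jane", "john")}
mother = {("jill", "jane")}

# parent(C, P) :- father(C, P).
# parent(C, P) :- mother(C, P).   (assumed to be covered by "etc.")
parent = father | mother

# grandfather(C, G) :- parent(C, P), father(P, G).   (a join on P)
grandfather = {(c, g) for (c, p) in parent for (p2, g) in father if p == p2}

# Query grandfather(X, john)?  i.e. who are john's grandchildren?
print(sorted(x for (x, g) in grandfather if g == "john"))  # ['jill']
```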
Object-Relational Databases • Also known as: • Extended relational databases • Complex object databases • Main features • get rid of First Normal Form • add methods to tables • Main examples • Oracle 8/8i onwards, SQL3, Informix
The Main Additions to RDBs • User defined abstract data types • Row types so that one value can include a nested complex value • Collection types for domains • Inclusion of user-defined functions defined on types • Inheritance • Multimedia data types and large objects
SQL3 (Evolving Standard) • This is a massive extension to SQL and has: • computational completeness • row types • user-defined types • user-defined procedures, functions and operators • type constructors for arrays, sets, lists and multisets • support for large objects - BLOBs and CLOBs • recursion
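Recursion reached the published standard as the `WITH RECURSIVE` common table expression in SQL:1999, the standard that grew out of SQL3. SQLite implements it, so the "ancestors of X" query can be run from Python's built-in `sqlite3` module. The `Parent(child, parent)` table is a hypothetical example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Parent (child TEXT, parent TEXT)")
conn.executemany("INSERT INTO Parent VALUES (?, ?)",
                 [("jill", "jane"), ("jane", "john"), ("john", "george")])

# The recursive CTE starts from X's parents and repeatedly joins one generation further up.
# UNION (rather than UNION ALL) discards duplicates, so cyclic data cannot loop forever.
ANCESTORS = """
WITH RECURSIVE ancestor(name) AS (
    SELECT parent FROM Parent WHERE child = ?
    UNION
    SELECT p.parent FROM Parent AS p JOIN ancestor AS a ON p.child = a.name
)
SELECT name FROM ancestor ORDER BY name
"""

print([row[0] for row in conn.execute(ANCESTORS, ("jill",))])  # ['george', 'jane', 'john']
```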
Row Types in SQL3 • A row type is a sequence of field name/type pairs, i.e. the type of a row of a table • In SQL3 it can also be the domain of a column:
    create table Branch (
        branchNo integer,
        address row( street varchar(20), city varchar(20) ) );
• Row types can be named:
    create row type EmpRT( Ename varchar(35), age integer );
    create table Employee of type EmpRT;
User-Defined Types (UDTs) in SQL3 • These are a means of defining new domain types in SQL3, e.g.:
    create type StaffNumberType as varchar(5) final;
• More generally a UDT is an abstract data type with: • (non First Normal Form) fields • constructor methods • observer and mutator (get and set) methods • general methods
UDT Example
    create type PersonType as (
        private dateOfBirth date,
        public fName varchar(15) not null,
        public lName varchar(15) not null,
        function age(p PersonType) returns integer
            return /* code to calculate age */
        end )
    ref is system generated   -- see later
    instantiable              -- if not, only subtypes are instantiable
    not final;                -- can have subtypes
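The body of `age` is elided in the example. The Python sketch below is one plausible reading, written in Python rather than SQL: a private date of birth sits behind public name fields and an `age` method. The attribute names follow the SQL type. The `today` parameter is a hypothetical addition that keeps the result reproducible.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

@dataclass
class PersonType:
    fName: str
    lName: str
    # The leading underscore marks the field as private by convention only (Python does not
    # enforce it); repr=False keeps it out of the printed representation.
    _dateOfBirth: date = field(repr=False)

    def age(self, today: Optional[date] = None) -> int:
        """Age in whole years: subtract one if this year's birthday has not happened yet."""
        today = today or date.today()
        before_birthday = (today.month, today.day) < (self._dateOfBirth.month, self._dateOfBirth.day)
        return today.year - self._dateOfBirth.year - int(before_birthday)

p = PersonType("Jane", "Smith", date(1970, 6, 15))
print(p.age(date(2000, 6, 14)))  # 29
print(p.age(date(2000, 6, 15)))  # 30
```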
Subtypes and Supertypes • Given a type, we can create a subtype, e.g.:
    create type StaffType under PersonType as (
        staffNo varchar(6), etc.
• This works by creating an extra attribute which refers to a PersonType value • This also works at the table level:
    create table Manager under Staff ( MgrStartDate date );
• This creates a Manager table that duplicates all the columns of Staff and adds MgrStartDate; every manager record appears in both tables
References • In SQL3 it is possible to set up OID-style references. • In the UDT example we declared that PersonType has system-generated references, so we can do:
    create table Branch (
        branchNo integer,
        address addressType,
        manager ref(PersonType)
        ..... )
• Here, the value of manager is a system-generated OID
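An OID-style reference differs from a foreign key in that the store hands out system-generated identifiers with no user meaning. A `ref` column holds one of these identifiers rather than a primary-key value. The Python sketch below is only an analogy. `ObjectStore`, `new` and `deref` are hypothetical names, not an SQL3 API.

```python
import itertools
from typing import Any

class ObjectStore:
    """Assigns each stored object a system-generated OID, independent of its attribute values."""

    def __init__(self) -> None:
        self._next_oid = itertools.count(1)
        self._objects: dict[int, Any] = {}

    def new(self, obj: Any) -> int:
        oid = next(self._next_oid)
        self._objects[oid] = obj
        return oid  # this OID is what a ref(...) column would hold

    def deref(self, oid: int) -> Any:
        return self._objects[oid]

store = ObjectStore()
manager_ref = store.new({"fName": "Ann", "lName": "Lee"})  # a PersonType value
branch = {"branchNo": 1, "manager": manager_ref}           # manager ref(PersonType)

# Changing the person's attributes does not break the reference, unlike a natural key.
store.deref(manager_ref)["lName"] = "Brown"
print(store.deref(branch["manager"])["lName"])  # Brown
```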
Collection Types • SQL3 supports four collection types:
    ARRAY: one-dimensional fixed-length array
    LIST: ordered and allows duplicates
    SET: unordered and does not allow duplicates
    MULTISET: unordered and allows duplicates
• E.g. if PersonType has an attribute:
    nextOfKin set(PersonType)
• The following makes sense:
    select fName, lName, count(nextOfKin)
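The four collection kinds differ only in whether order and duplicates matter, and Python's built-in equivalents make the difference visible. The mapping used here (a tuple for a fixed-length array, then a list, a set, and `collections.Counter` for a multiset) is an analogy, not SQL3 syntax.

```python
from collections import Counter

kin = ["Ann", "Bob", "Ann"]       # the same next of kin recorded twice

as_array = tuple(kin)             # ARRAY: fixed length, indexed by position
as_list = list(kin)               # LIST: ordered, duplicates kept
as_set = set(kin)                 # SET: unordered, duplicates removed
as_multiset = Counter(kin)        # MULTISET: unordered, duplicates counted

print(len(as_list), len(as_set), sum(as_multiset.values()))  # 3 2 3
print(as_list == ["Bob", "Ann", "Ann"])                      # False: order matters
print(as_multiset == Counter(["Bob", "Ann", "Ann"]))         # True: order ignored
```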
Triggers • Triggers are pieces of code which act when some event occurs. Each trigger defines: • the event and whether to act before or after it occurs • whether to operate on each row or only once per statement • what to do
    create trigger MailNewStaffNextOfKin
    after insert on Staff
    referencing new row as ST
    for each row
    begin
        insert into StaffToMail values (
            select P.name, P.address
            from Person P
            where ST.nextOfKin[1] = ST.staffNo )
    end
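SQLite's trigger dialect differs from SQL3's: it uses `NEW` instead of a `referencing` clause and has no collection types. It does share the same event, timing, per-row and action structure, so the idea can be tried from Python's `sqlite3` module. The `Staff`, `Person` and `StaffToMail` tables below are simplified, hypothetical versions in which next of kin is stored as a plain person number.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Person (personNo INTEGER PRIMARY KEY, name TEXT, address TEXT);
CREATE TABLE Staff (staffNo TEXT PRIMARY KEY, nextOfKin INTEGER REFERENCES Person);
CREATE TABLE StaffToMail (name TEXT, address TEXT);

-- Event: after insert on Staff; granularity: each row; action: queue a letter.
CREATE TRIGGER MailNewStaffNextOfKin
AFTER INSERT ON Staff
FOR EACH ROW
BEGIN
    INSERT INTO StaffToMail (name, address)
    SELECT P.name, P.address FROM Person AS P WHERE P.personNo = NEW.nextOfKin;
END;
""")

conn.execute("INSERT INTO Person VALUES (1, 'Ann Lee', '1 High St')")
conn.execute("INSERT INTO Staff VALUES ('S21', 1)")
print(conn.execute("SELECT * FROM StaffToMail").fetchall())  # [('Ann Lee', '1 High St')]
```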