This article evaluates the OODB and 3GDB Manifestos, discussing the motivation, design principles, and features of object-oriented approaches in databases. It also raises issues and controversies surrounding the OODB approach and explores the challenges of consistency, schema evolution, and OOP for end-user programming.
Why Not Relational? Evaluating the Manifestos The future of databases CS319 Theory of Databases
Evaluating the Database manifestos • A closer look at the OODB Manifesto • + commentary from 3GDB Manifesto • Motivation for OODB revisited • (see intro to 3G Manifesto) • rich objects • (e.g. different abstractions, multi-media etc) • design databases • (e.g. 1-off structures, evolutionary character) • behavioural elements (e.g. program code) • better interfaces
‘Object’-oriented approaches in general sense • E-R modelling: • objects as ‘organising real-world observations’ • generic classes + methods to transform state • look at the example O2 DB extract • objects substituted for relation tables • methods to get the information hiding • inheritance to share methods
Features of ‘object’-oriented approaches 1 • Grouping of real-world observables by locality and by existence dependency … conspicuous omission is the concept of functional dependency cf. the way in which observations are organised across objects • General themes: • tension with mathematical abstraction • e.g. non-procedural flavour of set not list • not ‘variables with an identity’
Features of ‘object’-oriented approaches 2 • Parody of the OODB position ... • OOP can solve the problems of HLL programming • indeed can achieve end-user-programming • DBs are an aberration that came about because programming used to be much more difficult; now that it's easy, need to break down the conceptual barrier between queries and APs • [possible subagenda: why need for DB anyway?]
Principal points in the OODB Manifesto 1 • complex objects • need for orderings etc where not present in relations • [A2.1] any constructor should apply to any object • BUT does this rule out non-1NF relational models? • constructors not orthogonal in the relational model • object identity • [A2.2] object sharing and object updates
Principal points in the OODB Manifesto 2 • encapsulation • data part + procedure part • [A2.3] in PL context, use information hiding • data part is part of the implementation • in classical DB context, is data structure hidden? • e.g. APs may use the file structure, queries don't • ambiguity: is relation table part of interface? • in OO setting use info hiding paradigm for DB: • can gain access to tuples etc only as objects
Principal points in the OODB Manifesto 3 • encapsulation (cont.) • [A2.5] PS why bother encapsulating ad hoc queries?! • philosophy seems to be OOP is easy enough anyway • cf [3GM 1.3 especially p36]: • “encapsulated operators not enough”
Principal points in the OODB Manifesto 3 • types and classes • [A2.4] Controversy in OOP itself re type vs class • class more of run-time notion • Issue • does the user or the system maintain classes? • cf design context: 1-off structures / environments • class type rectangle vs class of universal interest
Principal points in the OODB Manifesto 4 • class or type hierarchies: inheritance • see the illustrative example from O2 • [A2.5] have many varieties of inheritance .... • overriding, overloading and late binding • one display method interpreted differently by objects • e.g. display person, bitmap, graph • need late binding for this: • decide which display to invoke at run-time
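The late-binding point on this slide can be illustrated with a minimal Python sketch. The class names follow the slide's "person, bitmap, graph" example, but the attributes, the `Displayable` base class and the `display_all` helper are assumptions made for illustration; they are not taken from the O2 example the slide refers to.

```python
from abc import ABC, abstractmethod


class Displayable(ABC):
    """Hypothetical common supertype: every object offers one `display` method."""

    @abstractmethod
    def display(self) -> str:
        ...


class Person(Displayable):
    def __init__(self, name: str) -> None:
        self.name = name

    def display(self) -> str:
        return f"Person: {self.name}"


class Bitmap(Displayable):
    def __init__(self, width: int, height: int) -> None:
        self.width = width
        self.height = height

    def display(self) -> str:
        return f"Bitmap {self.width}x{self.height}"


class Graph(Displayable):
    def __init__(self, edges: list) -> None:
        self.edges = edges

    def display(self) -> str:
        return f"Graph with {len(self.edges)} edge(s)"


def display_all(objects: list) -> list:
    # The caller names only `display`; which implementation runs is decided
    # at run time from the class of each object (late binding).
    return [obj.display() for obj in objects]
```

Calling `display_all([Person("Ada"), Bitmap(640, 480), Graph([("a", "b")])])` invokes three different `display` bodies through one call site, which is the behaviour the slide says must be resolved at run-time.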
Principal points in the OODB Manifesto 5 • persistence • PLs can be more DB-like: data can survive process execution • computational completeness • [A2.7] PLs with persistence gives character of DB • why live with restricted computational power of SQL? • ideally resource complete … [next step for OO?]
Principal points in the OODB Manifesto 6 • extensibility • user defines own types • + no distinction user/system-defined • [A2.9-2.11] wish list: • large DBs, concurrency, recovery ... • NB gives no evidence to suggest OO handles these!
Issues raised by the OODB approach 1 • consistency of update: how to maintain FDs? • A proposal: active data values + rules [3GM 1.5, p36] • This is a problematic programming style • flexibility in schema evolution • object redesign is non-trivial • a database is quite unlike a program: • is representation of state, not a state-transition model • is OOP really that easy? • evidence against OOP for end-user programming • ... still needs the guru?
Issues raised by the OODB approach 2 • OO modelling doesn't link real-world observations and computer model as simply as relational DB design • what is the counterpart of the query language? • [3GM 3.2 p37] can't return to navigation! • can't by-pass the optimiser • schema evolution makes problematic • [3GM p38] "arguments against navigation are compelling & some programmers just need educating" • invidious arguments from efficiency [3GM 2.4]
Issues raised by the OODB approach 3 • Disadvantages of OODBs • no formal semantics • loss of relational simplicity • navigational queries • no general query language • lack of support for dynamic processes • Brown, A.W., Object-oriented DBs: Applications in S/W Engineering
Appraisal of the 3G DB Manifesto • Philosophy of the 3G DB manifesto based on • Extended relational models • [3GM] we can get there from where we are ... you can't throw away the benefits of relational DB theory, can exploit it ... • NB Date and Darwen: • relations with objects as data elements • not objects in place of relations • so that relation = interrelationship amongst objects
Concerns re 3G DB manifesto • Approach to issues seems unsatisfactory in many ways: • emphasises pragmatism: it attaches too much importance to whether the solution is realistic NOW • suggests no need to change, just subsume, things • presumes that only evolutionary change is required • A proposal commercially not academically motivated?
Concerns re 3G DB manifesto • A proposal commercially not academically motivated? • doesn't clarify what principles matter in relational DBs • buries any pretensions to a good underlying theory • cf Codd and Kent's concern for: • understanding what a good data model is • A database is not interesting just as a utility • Theory of DBs is concerned with fundamental issues for data modelling that are profoundly relevant to the design of PLs & data representation beyond computers
… this is a good point at which to revisit the case for the relational model, recognising the potential need to generalise what relational theory offers … • whyrel.ppt (slide 34)
Where is the future of databases? 1 • A personal perspective • 1. Tension between DB as real-world model vs program generator • Good way of real-world modelling • ? good way of programming • This was the thesis of Simula (1967), BUT • OOP doesn't deliver on this front?
Where is the future of databases? 2 • 1. Tension between DB as real-world model vs program generator • Contrast association of observables in RDB and OODB • Compare with agent-oriented modelling perspective: • model what each agent observes • model what each agent can act to change • DB as defining real-world STATES • programming as defining BEHAVIOURS • program constructs are about TRANSITIONS
Where is the future of databases? 3 • 2. Functional dependency • FDs ... seem to have an ambivalent role on the fringe of the relational theory [Kent: Data & Reality p138] • Fundamental to RDB design: powerful link content-form • ? Idea not sufficiently general in the RDB context • e.g. consider 4NF, 5NF • e.g. relationships within a record
Where is the future of databases? 4 • 2. Functional dependency • ? Idea not sufficiently general in the RDB context • e.g. consider 4NF • value in one set of columns determines the set of possible values in another set of columns etc • e.g. relationships within tuple Kent: Data & Reality p111 • (Emp, DoB, Spouse, Spouse_DoB, Wedding_Date) • Issues • This is information re employee (the primary key) • Can find that E's S_DoB is .... but not (E'S)'s DoB • Model doesn't know W_D concerns relationship
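To make the "relationships within tuple" issue concrete, here is a small Python sketch that tests whether a functional dependency holds in a table shaped like Kent's record. The `fd_holds` helper and all the sample rows are invented for illustration; only the attribute names come from the slide.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if every pair of rows agreeing on `lhs` also agrees on `rhs`."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault records the first value seen for this key and
        # returns it on later encounters, so a mismatch means a violation.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical data in the shape (Emp, DoB, Spouse, Spouse_DoB, Wedding_Date).
employees = [
    {"Emp": "E1", "DoB": "1960-01-01", "Spouse": "S1",
     "Spouse_DoB": "1962-02-02", "Wedding_Date": "1985-06-01"},
    {"Emp": "E2", "DoB": "1970-03-03", "Spouse": "S2",
     "Spouse_DoB": "1971-04-04", "Wedding_Date": "1995-07-01"},
]

# The key determines everything, so this holds trivially:
emp_determines_spouse_dob = fd_holds(employees, ["Emp"], ["Spouse_DoB"])

# The dependency inside the record (Spouse -> Spouse_DoB) also holds in the
# data, but nothing in the schema declares it: the model only knows Emp as key.
spouse_determines_dob = fd_holds(employees, ["Spouse"], ["Spouse_DoB"])

# A conflicting row shows what a violation looks like.
conflicting = employees + [dict(employees[0], Spouse_DoB="1999-09-09")]
violated = fd_holds(conflicting, ["Emp"], ["Spouse_DoB"])
```

The check can confirm that `Spouse` determines `Spouse_DoB` in the stored rows, yet the single-table schema gives no way to state that this fact is about the spouse rather than the employee, which is the gap the slide points to.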
Where is the future of databases? 5 • 2. Functional dependency • ? Idea not sufficiently general in the RDB context • Relational tables can serve as enumerated functions • so … • Why not functions returning non-tabular structures? • Why not functions that can't be tabulated? • Important semantic distinctions: compare • student determines slot in project timetable • student determines supervisor • student determines project mark
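The contrast between a table used as an enumerated function and a function that cannot be tabulated can be sketched as follows. Everything here is hypothetical: the `as_function` helper, the student and supervisor names, and the `grade_band` thresholds are assumptions chosen only to illustrate the distinction.

```python
def as_function(rows, key, value):
    """Turn a two-column view of a table into a lookup function.

    Raises ValueError if the table does not define a function,
    i.e. if `key` does not functionally determine `value`.
    """
    mapping = {}
    for row in rows:
        k, v = row[key], row[value]
        if mapping.setdefault(k, v) != v:
            raise ValueError(f"{key} does not determine {value} for {k!r}")
    return mapping.__getitem__


# Invented sample table: student determines supervisor.
supervision = [
    {"student": "alice", "supervisor": "Dr X"},
    {"student": "bob", "supervisor": "Dr Y"},
]
supervisor_of = as_function(supervision, "student", "supervisor")


def grade_band(mark: float) -> str:
    """A dependency over a continuous domain: it can be computed for any
    mark but cannot be stored as a finite table of (mark, band) rows.
    The thresholds are illustrative, not taken from the slides."""
    if mark >= 70:
        return "first"
    if mark >= 50:
        return "pass"
    return "fail"
```

Both `supervisor_of("alice")` and `grade_band(71.5)` express "X determines Y", but only the first is an enumerated relation; the second is the kind of function the slide suggests a generalised data model should also accommodate.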
Where is the future of databases? 6 • 3. Dependencies between observations are viewpoint dependent • atomicity of data • indivisibility of association between observables • are BOTH influenced by who you are, and what you're doing • must distinguish between rules, triggers and constraints
Where is the future of databases? 7 • 3. Dependencies between observations are viewpoint dependent • must distinguish between rules, triggers and constraints • cf. observations about a game of cricket include • dependencies that declare indivisibility: • boundary is scored as ball crosses rope • event-driven action: • when ball is received batsman plays shot • constraint: • always at most 4 of the batting side on the field
Where is the future of databases? 8 • 3. Dependencies between observations are viewpoint dependent • must distinguish between rules, triggers and constraints • expert systems, deductive databases, ad hoc triggering, prototyping tools, hypercard, spreadsheets ... • ... all use these powerful mechanisms, but have no satisfactory theoretical data modelling foundation
Where is the future of databases? 9 • 4. "Computers are only good for logic?!" [HD] • logic is emphatically not about state • - need variables with identity for state • Many great mathematicians contributed to formalising mathematics "... unfortunately, they also died." [BC-S] • logical variables don't have identity … • cf HD mode of reference to data (cf nested relations, atomicity of data) is not a logical concept • cf HD - what is the ROBIN attribute if not the identifier of a BIRD object?
Where is the future of databases? 10 • 4. "Computers are only good for logic?!" [HD] • logic isn’t always an appropriate medium for knowledge representation [Mensa] • This is the basis of a very significant philosophical argument in AI: the logicist vs. the non-logicist position • A Mensa problem (slides 37-45) illustrates the logicist view of knowledge as rational in an extreme form …
Looking to the future 1 • Emphasis of modern computing: • metaphor not symbolic representation • metaphor: the form reflects the content • e.g. metaphor is behind virtual reality • cf. a postscript file and the image it defines: • - the image is a metaphor for the thing itself • sensory elements are involved in metaphor • cf. no good objective criteria by which the user can choose the "right" way to represent some given piece of data (p6)
Looking to the future 2 • Crystal Ball Gazing ... • key ideas of relational DBs will be taken over & generalised away from relational algebras • the emphasis will shift from representation to metaphor: "database as a real-world model" • a new focus for foundations will emerge, more general than classical logic
Where is the future of databases? 11 • 5. Where next? • database is about generating views for different agents • database is a generator of metaphors [cf virtual reality] • technology / medium dependent: if computers could only generate smells would we have relational DBs? • in general (e.g. concurrent engineering) • no guaranteed consistent view, hence conflicts • + need to represent outside framework of logic
Where is the future of databases? 12 • 5. Where next? • … in general, no guaranteed consistent view, hence conflicts • + need to represent outside framework of logic • Classical DB suits where there is sharing + consensus • BUT harder to represent cooperation than consensus • cf individual idiosyncratic representations + many different perspectives on data • [cf. Brooks: No Silver Bullet - “the essence of software development”] • Compare seminars and books: book is a milestone, seminars are elusive, incomplete, but fundamentally just as important and more primary
Where is the future of databases? 13 • 5. Where next? • … in general, no guaranteed consistent view, hence conflicts • + need to represent outside framework of logic • Consensus operates in many ways at many levels: • agreement about experimental outcomes • language and ritual assigns meaning • object and domain identification • essential milestones in design "progress"
Where is the future of databases? 14 • 6. Technical concepts being used to support • Definitive scripts to express FDs between observations via metaphor • Agents + redefinitions to model changes of state • Functions in underlying algebra encapsulate + displace tables in RDB • Modes of definition of variables, different agent viewpoints • ... from experiment to theory aspect • see http://www.dcs.warwick.ac.uk/modelling/
project_table_LHS_FD is project(current_table, makestrlist(FDs[current_FD][1]));
project_table_RHS_FD is project(current_table, [FDs[current_FD][2]]);
pattern_duplicate_rows is index_duplicated(tail(project_table_LHS_FD));
newcol is transformcol(makelistcol(project_table_RHS_FD), pattern_duplicate_rows);
newtable is apply_current_FD_current_table(current_table, newcol);

Listing 1: Observables and dependencies in the TLJ construal
An observation-oriented model of the testing lossless join algorithm (constructed using tkeden)
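For readers unfamiliar with the procedure that Listing 1 models, the following Python sketch implements the standard tableau (chase) test for a lossless join. It is not a translation of the tkeden script: the function name, the representation of FDs as `(lhs, rhs)` pairs of attribute tuples, and the symbol encoding are all assumptions made for this illustration. One pass over a single FD corresponds loosely to what the listing expresses: project on the left-hand side, find rows that agree there, and rewrite the right-hand-side column.

```python
def is_lossless_join(attributes, decomposition, fds):
    """Tableau test: True if the decomposition has a lossless join under fds.

    attributes    -- list of attribute names of the original relation
    decomposition -- list of sets of attribute names
    fds           -- list of (lhs, rhs) pairs, each a tuple of attribute names
    """
    col = {attr: j for j, attr in enumerate(attributes)}
    # One row per component: the distinguished symbol "a" where the component
    # contains the attribute, otherwise a symbol unique to that row and column.
    table = [
        ["a" if attr in comp else (i, attr) for attr in attributes]
        for i, comp in enumerate(decomposition)
    ]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Group rows that agree on every left-hand-side column.
            groups = {}
            for row in table:
                key = tuple(row[col[x]] for x in lhs)
                groups.setdefault(key, []).append(row)
            for rows in groups.values():
                for attr in rhs:
                    j = col[attr]
                    symbols = {row[j] for row in rows}
                    if len(symbols) > 1:
                        # Equate the symbols, preferring the distinguished one,
                        # and rename them everywhere in this column.
                        target = "a" if "a" in symbols else min(symbols)
                        for row in table:
                            if row[j] in symbols:
                                row[j] = target
                        changed = True
    # Lossless iff some row consists only of distinguished symbols.
    return any(all(v == "a" for v in row) for row in table)
```

For example, with attributes `["A", "B", "C"]` and the single FD `(("A",), ("B",))`, the decomposition `[{"A", "B"}, {"A", "C"}]` passes the test, whereas `[{"A", "B"}, {"B", "C"}]` does not.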
End of the module
Logic and Commonsense Knowledge 1 • A logical (?) problem [taken from a MENSA publication] • The Captain of the darts team needs 72 to win. Before throwing a dart, he remarks that (coincidentally) 72 is the product of the ages of his three daughters. After throwing one dart, he remarks that (coincidentally) the score for the dart he has just thrown is the sum of the ages of his daughters. Fred, his opponent, observes at this point that he doesn't know the ages of the Captain's daughters. "I'll give you a clue", says the Captain. "My eldest daughter is called Vanessa." "I see", says Fred. "Now I know their ages." • Exercise in inference: What were their ages?
Logic and Commonsense Knowledge 2 • There is much domain knowledge and convention that is - or might be - relevant to the solution • ages are integers • ages are positive • ages are restricted to a plausible range of values • “knowing their ages” actually means “knowing the abstract set of ages” (in Mensa-speak) …
Logic and Commonsense Knowledge 3 • … “knowing their ages” means “knowing the abstract set of ages” • when Fred observes that he doesn't know their ages, he refers to knowing the set of ages, and not to being able to associate an age with any particular daughter who might turn up at the darts match. • even when Fred says "Now I know their ages", were one or more of the daughters to turn up at the darts match, much more domain knowledge would be required to identify their ages.
Logic and Commonsense Knowledge 4 • correct use of "eldest" presupposes that there is only one eldest daughter • what can be scored with one dart is restricted • There are also many conventions of the problem … • For instance, who's doing the reasoning? • if Fred said "Now I know (the set of) their ages" before he knew that the eldest daughter was called Vanessa, would we know their ages?
Logic and Commonsense Knowledge 5 • In any case: • why should we attach any significance to Fred's observation that he doesn't know their ages? As will emerge ... we are meant to suppose that he is very clever, can be sure that everyone else is also equally clever, and has taken full account of all the available information, but he might just be too lazy, ignorant or drunk to be able to factorise 72, or not realise the significance of such factorisation.
Logic and Commonsense Knowledge 6 • Solution to the problem • Because Fred doesn't know the ages before he knows that the Captain has an eldest daughter, we know that the value of the first dart is some number v such that xyz=72 and x+y+z=v has more than one solution set {x,y,z}. • The possible sets of factors of 72 are • {1,1,72}, {1,2,36}, {1,3,24}, {1,4,18}, {1,6,12}, {1,8,9}, • {2,2,18}, {2,3,12}, {2,4,9}, {2,6,6}, {3,3,8}, {3,4,6}
Logic and Commonsense Knowledge 7 • Solution to the problem • The possible sets of factors of 72 are • {1,1,72}, {1,2,36}, {1,3,24}, {1,4,18}, {1,6,12}, {1,8,9}, • {2,2,18}, {2,3,12}, {2,4,9}, {2,6,6}, {3,3,8}, {3,4,6} • These are the associated sums of factors; they correspond to the value of the first dart: • {1,1,72}: 74, {1,4,18}: 23, • {1,2,36}: 39, {1,3,24}: 28, {1,6,12}: 19, {1,8,9}: 18 • {2,2,18}: 22, {2,3,12}: 17, {2,4,9}: 15, {2,6,6}: 14 • {3,3,8}: 14, {3,4,6}: 13 • The only relevant information here is that there is just one way in which two distinct sets of ages generate the same sum viz. {2,6,6}: 14, {3,3,8}: 14
Logic and Commonsense Knowledge 8 • Solution to the problem • there is just one way in which two distinct sets of ages generate the same sum viz. {2,6,6}: 14, {3,3,8}: 14 • If we know that there is an eldest daughter, this rules out the possibility that their set of ages is {2,6,6}, so Vanessa is 8 etc. • Some interesting irrelevant information might have played a role in getting the answer had the problem been more subtle. For instance: {1,1,72} & {1,4,18} are impossible because of the constraints on the value of the first dart, whilst {1,2,36} is implausible if the girls really have the same mother.
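The purely logical part of this solution can be checked mechanically. The Python sketch below enumerates the age sets, groups them by sum, and applies the "unique eldest daughter" clue. The function names are invented for illustration, and the sketch deliberately encodes only the conventions the slides identify as relevant (positive integer ages, ages treated as an unordered set); it ignores the dart-score constraint, which the slide above notes is not needed.

```python
from collections import defaultdict
from itertools import combinations_with_replacement


def age_triples(product=72):
    """All multisets {x, y, z} of positive integers with x*y*z == product,
    returned as sorted tuples (x <= y <= z)."""
    return [
        t
        for t in combinations_with_replacement(range(1, product + 1), 3)
        if t[0] * t[1] * t[2] == product
    ]


def solve(product=72):
    # Group candidate age sets by their sum (the value of the first dart).
    by_sum = defaultdict(list)
    for triple in age_triples(product):
        by_sum[sum(triple)].append(triple)
    # Fred cannot decide, so the sum must be shared by more than one set.
    ambiguous = [ts for ts in by_sum.values() if len(ts) > 1]
    # "Eldest daughter" presupposes the maximum age occurs exactly once.
    return [t for ts in ambiguous for t in ts if t.count(max(t)) == 1]


solutions = solve()
```

Running this gives twelve candidate triples, a single ambiguous sum (14), and `solutions == [(3, 3, 8)]`, in agreement with the slides. Everything the code takes for granted, such as integer ages or Fred reasoning perfectly, is exactly the background knowledge that the surrounding slides argue is doing the real work.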
Logic and Commonsense Knowledge 9 • Moral: real-world inference is not abstract logic but situated reasoning in which many incidental observations about the nature of the world determine what can be inferred. Such inference uses premises that are acts of faith. • Data modelling techniques need to be suitable for this ...
Logic and Commonsense Knowledge 10 • I am at a conference in the Netherlands. • I arrive late at night and hardly notice where my room is. • Next morning, I notice that my room is on the top floor. • I walk down to breakfast thinking about my talk later on. • After breakfast I meet two other delegates X and Y. • We get in the lift to return to our rooms.
Logic and Commonsense Knowledge 11 • X presses the button for floor 3. • Y says he is on the floor above X, and selects floor 4. • Since the top button is selected, I don’t press a button. • We talk as we ascend. The lift stops. The door opens. • The floor numbers aren’t clearly marked. • I say to X – ‘this must be floor 3’ – he gets out.
Logic and Commonsense Knowledge 12 • Y and I carry on talking. • When the lift next stops, the floor is still unclear. • I say to Y ‘X is on the floor below you; this is your floor’. • Y gets out. I think something is not quite right. • I think ‘is this the top floor?’ and ‘should I get out?’. • I’m unsure, but notice that the button for floor 4 is still lit.
Logic and Commonsense Knowledge 13 • I proceed to the top floor which is the next floor, floor 4. • When I get out of the lift, I can’t find my room. • There’s no room where my room is on floor 4. • I walk down to floor 3, and pass Y on his way to floor 4. • When I reach floor 3, I meet X coming up from floor 2 … • How did I manage to get all 3 of us to the wrong floor?