SBA (Stack-Based Approach) and SBQL (Stack-Based Query Language). Presentation prepared for OMG Object Database Technology Working Group, OMG TECHNICAL MEETING, Anaheim, CA, USA, September 25-29, 2006 by Prof. Kazimierz Subieta
SBA (Stack-Based Approach) and SBQL (Stack-Based Query Language) Presentation prepared for OMG Object Database Technology Working Group, OMG TECHNICAL MEETING, Anaheim, CA, USA, September 25-29, 2006 by Prof. Kazimierz Subieta, Polish-Japanese Institute of Information Technology, Warsaw, Poland subieta@pjwstk.edu.pl http://www.ipipan.waw.pl/~subieta SBA/SBQL pages: http://www.sbql.pl
What is SBA and SBQL? • SBA is a conceptual frame for developing O-O database query/programming languages • Query languages are programming languages. • SBQL is a model query language according to SBA. • It has the same role and meaning as object algebras, but it is formally sound and much more universal. • SBA/SBQL deal with various data models and all imaginable and reasonable query constructs. • Abstract implementation is the basic paradigm of formal specification of semantics.
General architecture of query processing • Actually, we do not fix the architecture • It can be similar to SQL or ODMG architectures (server-side query processing, ODBC, ADO or JDBC style, queries embedded in popular programming languages) • It can be similar to Oracle PL/SQL (programs integrated with queries, client-side query processing) • Shifting query processing and optimization to the client side: • Lower workload for the server, hence better overall performance. • More flexibility for query optimization.
Detailed client-server architecture [Figure: client and server components connected by a network.]
Software development environment (editor, debugger, etc.)
Client: • Parser of queries and programs • Syntactic tree of a query/program • Strong type checker • Optimization by rewriting • Optimization by indices • Interpreter of queries & programs • Static ENVS • ENVS • Static QRES • QRES • Volatile (non-shared) objects • Local metabase
Network
Server: • Register of views • Register of indices • Object manager • Metabase of persistent objects • Processing persistent abstractions (views, stored procedures, triggers) • Administration • Transactions • Persistent (shared) objects
Object model and database schema • … are inevitable parts of a query language. • The application programmer must be aware of what the database contains and how it is organized. • Usually, an object model and a database schema language are presented at the beginning of a given specification, cf. ODMG. • The model involves such concepts as types, classes and interfaces, joined into a coherent whole as a schema language, cf. ODL. • However, the concepts are difficult, especially types. • Introducing them at the beginning usually results in inconsistencies. • Hence, we must first understand the semantics of a query language on the ground of an abstract object store model. • First, establish what the semantics of a query language is, then define the corresponding type system.
SBA semantics of QL-s – general point of view • Query - all syntactically correct queries • State - all states (not only database states) • Result - all possible query results. • Semantics of any query is a function that maps State → Result • Closure property assumes that a state and a result are sets of objects • In SBA a state contains objects (but not only objects) and a result never contains objects • Closure property is conceptual nonsense.
What is State? • State includes all data or programming features that can influence the result of some query, in particular: • Database state • Local objects used in queries on the client side • Computer and software environment (e.g. date, time) • Libraries, procedures, functions, classes, views, etc. • State also includes structures that determine the run-time environment of computations. • In SBA there is one such structure: environment stack (ENVS) - an extended and modified call stack. • state = object store + ENVS
Is ENVS a purely implementation notion? • No. The environment stack is a conceptual notion. • ENVS makes it possible to specify precisely the semantics of query languages, … • … the mechanisms of classes, roles, static and dynamic inheritance, ... • … (recursive) procedures, parameter passing, database views,... • … etc. • In SBA we deal with ENVS on an abstract level. We are not interested in its physical implementation. • Implementation can be different, introducing many optimizations. • Usually ENVS is a client-side data structure stored in main memory. • The main roles of ENVS: determining scopes for names and binding names occurring in queries.
What is Result? • A query can return any stored or computed value. • For instance, query 2+2 returns 4. • A query can return references (OID, file name, memory address, etc.). • For instance, query Person returns references to person objects. • Queries can return nested complex values consisting of atomic values, references, names, structure constructors and collection constructors. • SBQL queries never return objects. • Objects are stored within the object store only.
Query result stack, QRES • Temporary and final query results are accumulated on the query result stack, QRES. • QRES is a client-side structure stored in main memory. • QRES must be prepared to store in a single section any complex query result. • QRES is not a component of State • … because the result of a new query does not depend on the previous QRES state. • In SBA precise specification of the QRES mechanism is fundamental.
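Example of QRES state [Figure: a stack of four sections.]
Sections shown: • 15 • i17 • struct{ x(i61), y(i93) } • bag{ struct{ n("Doe"), s(i9) }, struct{ n("Poe"), s(i14) }, struct{ n("Lee"), s(i18) } }
The section at the top is the only visible stack section; all sections between it and the bottom are invisible.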
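The role of QRES can be illustrated with a minimal Python sketch, assuming QRES is a plain list used as a stack and queries are tiny tuples such as `("+", 2, 2)`. The names `qres` and `eval_query` are hypothetical and not part of SBQL; only literals and `+` are handled.

```python
# Hypothetical sketch of QRES: a plain list used as a stack of query results.
qres = []

def eval_query(q):
    """Evaluate a tiny query tree; the result is left on top of QRES."""
    if isinstance(q, (int, float, str)):      # a literal returns itself
        qres.append(q)
    elif q[0] == "+":                         # algebraic operator: ENVS is not involved
        eval_query(q[1])                      # result of the left subquery goes on QRES
        eval_query(q[2])                      # result of the right subquery goes on top
        right = qres.pop()
        left = qres.pop()
        qres.append(left + right)             # temporary results are replaced by the sum
    else:
        raise ValueError(f"unsupported query: {q!r}")

eval_query(("+", 2, 2))
result = qres.pop()   # the final result; QRES is empty again afterwards
```

After the final `pop` the stack is empty, which matches the remark that a new query does not depend on the previous QRES state.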
Total internal identification • Each database or program entity, which could be separately retrieved, updated, inserted, deleted, authorized, indexed, protected, locked, should possess a unique internal identifier. • We are not interested in the form and meaning of internal identifiers. • Unique internal identifiers should be assigned to all components of objects, including atomic ones. • The principle makes it possible to make references and pointers to all possible entities, thus to avoid conceptual problems with binding, scoping, updating, deleting, parameter passing, and other functionalities that require references as query primitives. • ODMG does not follow the idea. • ODMG „literals” (components of objects) have no identifiers. • I consider this a fundamental conceptual flaw.
Object relativism • If some object O1 can be defined, then object O2 having O1 as a component can also be defined. • No limitations concerning the number of hierarchy levels of objects. • Objects on any hierarchy level should be treated uniformly. • An atomic object (having no attributes) should be allowed as a regular data structure. • Object relativism implies the relativism of corresponding query capabilities. • There is no need for attributes, sub-attributes, etc. - all are objects too. • The idea radically reduces a database model and cuts the size of the specification of query languages, the size of implementation, and the size of documentation. • It strongly supports query optimization and strong typing.
Abstract Object Store Models • A component of State is an object store. • To define the semantics of a query language we have to define an object store precisely, but on the abstract level. • Because various object models introduce a lot of incompatible notions, SBA assumes some family of object store models which are enumerated M0, M1, M2 and M3. • M0 covers relational, nested-relational and XML-oriented databases. M0 assumes hierarchical objects and binary links between objects. • Advanced store models introduce classes and static inheritance (M1), object roles and dynamic inheritance (M2), and encapsulation (M3). • All the models are served by SBQL. • These store models are pivots - they can be extended and modified, depending on features that one would like to cover.
Notions common to store models • Internal object identifier (OID) • Uniquely identifies an object in the store. • Assigned automatically, no external meaning. • Used as a reference or a pointer to an object. • External object name • Usually bears some external semantics of an object, e.g. Person, Customer. • Explicitly assigned by a database designer, programmer, etc. • It is usually not unique, e.g. many objects named Person. • Atomic object value • Cannot be subdivided into smaller parts • E.g. 2, 3.14, “Doe”, “Hello, World!”. • The size is not constrained – from 1 bit to gigabytes. • So far we neglect types (we deal with types later).
M0: Complex Objects and Pointer Links • I - a set of internal identifiers • N - a set of external names • V - a set of atomic values • An object is a triple < i, n, φ >, where i is the object ID, n the object name and φ the object value: • < i, n, v > - atomic object • < i1, n, i2 > - pointer object • < i, n, T > - complex object, T is a set of objects • R ⊆ I - start identifiers • No record, tuple, array, set, etc. constructors in the model: essentially all of them are collections of objects. • External names are not unique: modeling collections (bags). • Uniform treatment of relational, nested relational, etc. databases.
M0 object store - example
Objects:
< i1, Emp, { < i2, name, "Doe" >, < i3, sal, 2500 >, < i4, worksIn, i17 > } >
< i5, Emp, { < i6, name, "Poe" >, < i7, sal, 2000 >, < i8, worksIn, i22 > } >
< i9, Emp, { < i10, name, "Lee" >, < i11, sal, 900 >, < i12, address, { < i13, city, "Rome" >, < i14, street, "Boogie" >, < i15, house#, 13 > } >, < i16, worksIn, i22 > } >
< i17, Dept, { < i18, dname, "Trade" >, < i19, loc, "Paris" >, < i20, loc, "London" >, < i21, employs, i1 > } >
< i22, Dept, { < i23, dname, "Ads" >, < i24, loc, "Rome" >, < i25, employs, i5 >, < i26, employs, i9 > } >
Start identifiers: i1, i5, i9, i17, i22
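One possible in-memory encoding of such a store, as a hedged Python sketch: the classes `Obj` and `Ref` and the helper `index` are illustrative assumptions, not part of SBA, and only two of the objects from the example are included.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Ref:
    """A pointer value: the OID of another object (hypothetical helper)."""
    oid: str

@dataclass(frozen=True)
class Obj:
    """An M0 object <i, n, v>: v is atomic, a Ref, or a tuple of sub-objects."""
    oid: str
    name: str
    value: object

def index(objects):
    """Map every OID, including those of nested sub-objects, to its object."""
    store = {}
    for o in objects:
        store[o.oid] = o
        if isinstance(o.value, tuple):
            store.update(index(o.value))
    return store

roots = [
    Obj("i1", "Emp", (Obj("i2", "name", "Doe"), Obj("i3", "sal", 2500),
                      Obj("i4", "worksIn", Ref("i17")))),
    Obj("i17", "Dept", (Obj("i18", "dname", "Trade"), Obj("i19", "loc", "Paris"),
                        Obj("i20", "loc", "London"), Obj("i21", "employs", Ref("i1")))),
]
store = index(roots)
start_ids = [o.oid for o in roots]

# Follow Doe's worksIn pointer and read the department name.
dept = store[store["i4"].value.oid]
dname = next(s.value for s in dept.value if s.name == "dname")
```

Because every sub-object has its own OID in `store`, any component can be referenced directly, in line with the total internal identification principle.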
M0 object store – graphical view [Figure: the example M0 store drawn as a graph. Emp objects i1, i5, i9 and Dept objects i17, i22 are shown with their subobjects; worksIn and employs pointers are drawn as arrows between Emp and Dept objects.]
A relational database in M0
Relational schema: Emp( name, sal, worksIn )
Relation Emp:
name | sal  | worksIn
Doe  | 2500 | Production
Poe  | 2000 | Sales
Lee  | 2000 | Sales
Model M0:
Objects:
< i1, Emp, { < i2, name, "Doe" >, < i3, sal, 2500 >, < i4, worksIn, "Production" > } >,
< i5, Emp, { < i6, name, "Poe" >, < i7, sal, 2000 >, < i8, worksIn, "Sales" > } >,
< i9, Emp, { < i10, name, "Lee" >, < i11, sal, 2000 >, < i12, worksIn, "Sales" > } >
Start identifiers: i1, i5, i9
• A similar mapping can be applied to hierarchical DB, nested relational DB, XML, RDF, …
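The mapping can be sketched in a few lines of Python. The function `relation_to_m0` and the tuple encoding `(oid, name, value)` are assumptions made for illustration; identifiers are generated sequentially so that they coincide with the ones used on the slide.

```python
from itertools import count

def relation_to_m0(rel_name, columns, rows):
    """Turn each tuple into a complex object whose sub-objects are its fields."""
    ids = count(1)                      # sequential OIDs: i1, i2, ...
    objects, start_ids = [], []
    for row in rows:
        oid = f"i{next(ids)}"           # OID of the object representing the tuple
        fields = [(f"i{next(ids)}", col, val) for col, val in zip(columns, row)]
        objects.append((oid, rel_name, fields))
        start_ids.append(oid)           # every tuple object is a start identifier
    return objects, start_ids

objects, start_ids = relation_to_m0(
    "Emp", ["name", "sal", "worksIn"],
    [("Doe", 2500, "Production"), ("Poe", 2000, "Sales"), ("Lee", 2000, "Sales")],
)
```

All three objects share the name Emp, which is how M0 models a collection without a separate relation or set constructor.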
Environment Stack, ENVS • ENVS is also known as the call stack. • For query processing we modified and generalized it: • ENVS is used for binding objects that are stored at a server, hence ENVS contains references to objects rather than object values. • The same object can be referenced from different stack sections. • For collections the binding is macroscopic: for instance, if Emp is bound, the binding returns many references. • In PLs the stack usually has two incarnations: static (compile time) and dynamic (run-time). • Because database objects are always dynamically bound, some properties of a static stack must be shifted to a dynamic stack. • We deal with the static stack when we consider strong typing. • Besides the classical roles of the stack, SBA gives it many new roles, in particular processing non-algebraic operators.
Naming, scoping, binding • SBA is based on the naming, scoping and binding paradigm: • Every name occurring in a query is bound to run-time program or database entities, according to the actual scope for the name. • Binding is substituting a name occurring in a query by a run-time program entity (or entities). • This concerns all names, in particular: • Names of persistent or volatile objects, subobjects (attributes), pointers, procedures, functions, methods, views, parameters. • Names of entities from the computer or software environment • Any auxiliary names that are defined and used in queries • ENVS presents a universal scoping and binding mechanism. • No name occurring in a query can be bound otherwise. • ENVS stores binders, i.e. pairs n(r), where n ∈ N, r ∈ Result.
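A minimal sketch of binding on ENVS, assuming the stack is a Python list of sections and a binder n(r) is the pair `(n, r)`. The function name `bind` follows the slides; returning an empty list for an unbound name is an assumption of this sketch.

```python
def bind(envs, name):
    """Search ENVS from the top; return all values bound to `name`
    in the first section that contains such a binder."""
    for section in reversed(envs):
        hits = [value for n, value in section if n == name]
        if hits:
            return hits
    return []   # assumption: an unbound name yields an empty result

# Bottom section: binders to the start objects of the database.
envs = [
    [("Emp", "i1"), ("Emp", "i5"), ("Emp", "i9"), ("Dept", "i17"), ("Dept", "i22")],
]
emps = bind(envs, "Emp")                         # macroscopic binding: many references

envs.append([("name", "i10"), ("sal", "i11")])   # a new scope on top of the stack
name_ref = bind(envs, "name")                    # found in the top section
still_emps = bind(envs, "Emp")                   # falls through to the bottom section
```

The search stops at the first section holding a matching binder, which is what gives names their scope.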
Opening a new section of ENVS (1) • In PLs opening a new scope on ENVS is caused by entering a new procedure (function, method) or entering a new block. • Respectively, removing the scope is performed when the control leaves the body of the procedure/block. • To these classical situations we add a new one. • It is the essence of SBA. The idea is that some query operators (called non-algebraic) behave on the stack similarly to program blocks. • In the SBQL query: Emp where ( name = “Poe” and sal > 1000 ) the part ( name = “Poe” and sal > 1000 ) behaves as a program block executed in an environment consisting of the interior of an Emp object. • Binding concerns also names name and sal. • Hence, we push on ENVS a section with the interior of the currently processed Emp object (next slide).
Opening a new section of ENVS (2) • Query: Emp where (name = "Poe" and sal > 1000); the part after where is the condition.
Initial ENVS state (one section): Emp(i1) Emp(i5) Emp(i9) Dept(i17) Dept(i22)
bind( Emp ) = {i1, i5, i9}
ENVS during evaluation of the condition for the third Emp object (top section first):
name(i10) sal(i11) address(i12) worksIn(i16) : interior of the third Emp object
Emp(i1) Emp(i5) Emp(i9) Dept(i17) Dept(i22)
bind( name ) = i10; bind( sal ) = i11
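The behaviour of where as a non-algebraic operator can be sketched in Python as follows. The store is a simplified fragment of the earlier example (only name and sal are kept), and the helpers `bind`, `nested`, `deref` and `where` are illustrative assumptions rather than the actual SBQL implementation.

```python
# store: OID -> (name, value); a complex value is a list of sub-object OIDs.
store = {
    "i1": ("Emp", ["i2", "i3"]),   "i2": ("name", "Doe"),  "i3": ("sal", 2500),
    "i5": ("Emp", ["i6", "i7"]),   "i6": ("name", "Poe"),  "i7": ("sal", 2000),
    "i9": ("Emp", ["i10", "i11"]), "i10": ("name", "Lee"), "i11": ("sal", 900),
}
envs = [[("Emp", "i1"), ("Emp", "i5"), ("Emp", "i9")]]   # initial (database) section

def bind(name):
    """Return the references bound to `name` in the topmost matching section."""
    for section in reversed(envs):
        hits = [ref for n, ref in section if n == name]
        if hits:
            return hits
    return []

def nested(oid):
    """Interior of a complex object as a list of binders (name, oid)."""
    _, value = store[oid]
    return [(store[sub][0], sub) for sub in value] if isinstance(value, list) else []

def deref(oid):
    """Replace a reference to an atomic object by the stored value."""
    return store[oid][1]

def where(refs, condition):
    selected = []
    for ref in refs:
        envs.append(nested(ref))      # open a section with the object's interior
        try:
            if condition():           # the condition is evaluated like a block
                selected.append(ref)
        finally:
            envs.pop()                # close the section when the condition is done
    return selected

# Emp where (name = "Poe" and sal > 1000)
result = where(
    bind("Emp"),
    lambda: deref(bind("name")[0]) == "Poe" and deref(bind("sal")[0]) > 1000,
)
```

The result is a list of references, not objects, and ENVS returns to its initial single section once the operator finishes.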
Function nested – computing object’s interior • Function nested acts on an object reference and returns its interior as a set of binders. For instance, for the object i9 Emp with subobjects i10 name "Lee", i11 sal 900, i12 address (i13 city "Rome", i14 street "Boogie", i15 house# 13) and i16 worksIn: nested( i9 ) = { name( i10 ), sal( i11 ), address( i12 ), worksIn( i16 ) } • The result of nested is then pushed onto ENVS.
Generalization of function nested • In general, it can be applied to any element of Result. • For a complex object <i, n, { <i1, n1,...>, <i2, n2,...>, ... , <ik, nk,...> }> it holds: nested( i ) = { n1(i1), n2(i2), ... , nk(ik) } • The case is illustrated on the previous slide. • If i is an identifier of a pointer object <i, n, i1>, and the object store contains the object <i1, n1, ... >, then nested( i ) = { n1(i1) } • This accomplishes navigation according to a pointer. • For a binder n(x) it holds: nested( n(x) ) = { n(x) } • This follows from the understanding of auxiliary names introduced in queries. • For a structure, nested returns the union of the results of nested applied to the elements of the structure: nested( struct{ x1, x2, ... } ) = nested(x1) ∪ nested(x2) ∪ ... • For other arguments nested returns the empty set.
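The case analysis above can be written down as a short Python sketch. The classes `Ref`, `Binder` and `Struct` and the dictionary layout of `store` are assumptions chosen for illustration; the store holds a simplified fragment of the earlier example (the address subobject of i9 is omitted).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Ref:
    oid: str            # a reference (OID)

@dataclass(frozen=True)
class Binder:
    name: str           # a binder n(x)
    value: object

@dataclass(frozen=True)
class Struct:
    items: tuple        # struct{ x1, x2, ... }

# store: OID -> (name, value); value is atomic, a Ref (pointer) or a list of OIDs.
store = {
    "i9": ("Emp", ["i10", "i11", "i16"]),
    "i10": ("name", "Lee"),
    "i11": ("sal", 900),
    "i16": ("worksIn", Ref("i22")),
    "i22": ("Dept", ["i23"]),
    "i23": ("dname", "Ads"),
}

def nested(x):
    if isinstance(x, Ref):
        _, value = store[x.oid]
        if isinstance(value, list):               # complex object: binders to sub-objects
            return {Binder(store[sub][0], Ref(sub)) for sub in value}
        if isinstance(value, Ref):                # pointer object: binder to its target
            return {Binder(store[value.oid][0], value)}
        return set()                              # atomic object: empty interior
    if isinstance(x, Binder):                     # a binder is its own interior
        return {x}
    if isinstance(x, Struct):                     # union over the elements
        return set().union(*(nested(item) for item in x.items))
    return set()                                  # any other argument
```

For example, `nested(Ref("i16"))` yields the single binder Dept(i22), which is how the dot operator navigates along a worksIn pointer.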
Definition of Result for SBQL • Any atomic value belongs to Result. • Any reference (OID) belongs to Result. • If x belongs to Result, then any binder n(x) belongs to Result. • If x1, x2, x3, ... belong to Result, then the structure struct{ x1, x2, x3, ... } belongs to Result. • In contrast to typical structures, we do not assume that all elements of a structure must be named. • Empty structures are not allowed. • If x1, x2, x3, ... belong to Result, then the bag bag{ x1, x2, x3, ... } and the sequence sequence{ x1, x2, x3, ... } belong to Result. • bag and sequence are collection constructors. • Other collection constructors are possible.
Summing up: what have we defined so far? • We know precisely what an object store, atomic object, complex object, pointer object and collection are. • We know precisely how the environment stack ENVS is constructed, what it is for, what binding is, and how a new section on the stack is constructed (binders, function nested). • We know precisely what a query result and the query result stack QRES are. • Abstract implementation of a query language has the form of the recursive procedure eval (evaluation of a query). • This is all the semantic equipment needed to define SBQL and its abstract implementation for the M0 store model. • For details see http://www.sbql.pl
Examples of SBQL queries for M0 • Get references of departments for employee named Doe: (Emp where name = "Doe").worksIn.Dept • Get names of departments together with their average salaries: (Dept join avg(employs.Emp.sal) as avgsal) . (dname, avgsal) • Names and cities for employees working in the department managed by Kim: (Dept where (boss.Emp.name) = "Kim").employs.Emp.(name, if exists(Address) then Address.city else "No address") • Get departments employing a professional for any job in the company: Dept where ∀ distinct(Emp.job) as j (∃ employs.Emp (j = job)) • Names and salaries of employees earning more than their bosses: (Emp where sal > (worksIn.Dept.boss.Emp.sal)).(name, sal)
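To make the first query concrete, here is a hedged Python sketch that evaluates (Emp where name = "Doe").worksIn.Dept over a small store. The helpers `bind`, `nested`, `where` and `dot`, the `Ref` marker class and the store layout are assumptions of this sketch; a real SBQL engine works on parsed query trees with QRES rather than on Python lambdas.

```python
class Ref(str):
    """Marks a stored value as a pointer (the OID of the target object)."""

# store: OID -> (name, value); value is atomic, a Ref, or a list of sub-object OIDs.
store = {
    "i1": ("Emp", ["i2", "i4"]),  "i2": ("name", "Doe"), "i4": ("worksIn", Ref("i17")),
    "i5": ("Emp", ["i6", "i8"]),  "i6": ("name", "Poe"), "i8": ("worksIn", Ref("i22")),
    "i17": ("Dept", ["i18"]),     "i18": ("dname", "Trade"),
    "i22": ("Dept", ["i23"]),     "i23": ("dname", "Ads"),
}
envs = [[("Emp", "i1"), ("Emp", "i5"), ("Dept", "i17"), ("Dept", "i22")]]

def bind(name):
    for section in reversed(envs):
        hits = [ref for n, ref in section if n == name]
        if hits:
            return hits
    return []

def nested(oid):
    _, value = store[oid]
    if isinstance(value, list):                  # complex object
        return [(store[sub][0], sub) for sub in value]
    if isinstance(value, Ref):                   # pointer object: binder to the target
        return [(store[value][0], str(value))]
    return []

def where(refs, cond):
    out = []
    for r in refs:
        envs.append(nested(r))                   # open the scope of the current object
        if cond():
            out.append(r)
        envs.pop()
    return out

def dot(refs, right):
    out = []
    for r in refs:
        envs.append(nested(r))
        out.extend(right())                      # results of the right subquery are merged
        envs.pop()
    return out

doe = where(bind("Emp"), lambda: store[bind("name")[0]][1] == "Doe")
depts = dot(dot(doe, lambda: bind("worksIn")), lambda: bind("Dept"))
```

Here where and dot differ only in how they combine partial results; both open and close an ENVS section for every element of the left-hand result.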
M1: Classes and static inheritance • Classes, methods and inheritance require extension of M0. • Classes have two incarnations: as pieces of source code and as run-time database entities. • Usually programming languages deal with classes as second-class citizens, i.e. in the source code only. • In our model we are (so far) not interested in this point of view. • We deal with it when we consider static binding and strong typing. • In the M1 store model classes are first-class entities storing invariant properties of their objects, i.e. methods (but not only). • Hence in our model classes are objects too, connected with their member objects by a special relationship. • Classes are also connected with other classes by another relationship known as inheritance.
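Classes as objects in M1 [Figure: class objects, their member objects and the links between them.]
Classes: • i40 PersonClass: i41 age (...code...), ... • i50 EmpClass (inherits from PersonClass): i51 changeSal (...code...), i52 netSal (...code...), ...
Objects: • i1 Person (member of PersonClass): i2 name "Doe", ... • i5 Emp (member of EmpClass): i6 name "Poe", i7 sal 2000, i8 worksIn, ... • i9 Emp (member of EmpClass): i10 name "Lee", i11 sal 900, i16 worksIn, ...
The worksIn pointers lead to objects i33 and i22.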
SBQL semantics for M1 • Changes concern only ENVS and non-algebraic operators • When a non-algebraic operator processes an object <i, …>, which is a member of a class <iC1, …>, which inherits from a class <iC2, …>, etc., then ENVS is augmented (starting from the top) by nested(i), nested(iC1), nested(iC2), … up to the most general class. • When a non-algebraic operator finishes processing the object <i, …>, all these sections are removed from ENVS.
ENVS states (top section first):
Before processing the object <i, …>: previous ENVS state
During processing the object <i, …>: nested(i), nested(iC1), nested(iC2), ….., previous ENVS state
After processing the object <i, …>: previous ENVS state
Example: Processing an object in M1 • Query: (Emp where name = "Poe") . (name, netSal, age) • ENVS during processing the subquery after the dot (top section first):
name(i6) sal(i7) worksIn(i8) … : nested(i5), internals of the currently processed Poe’s object
changeSal(i51) netSal(i52) ... : nested(i50), internals of EmpClass
age(i41) ... : nested(i40), internals of PersonClass
… Person(i1) ... Emp(i5) Emp(i9) ... : binders to database objects
The first three sections are pushed by the dot.
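The way class sections participate in binding can be sketched in Python. The dictionaries `interior`, `member_of` and `inherits_from` and the helper `sections_for` are hypothetical; the sketch covers single inheritance only and leaves method invocation aside.

```python
# Hypothetical fragment of an M1 store.
interior = {                       # OID -> binders of the object's interior
    "i5":  [("name", "i6"), ("sal", "i7"), ("worksIn", "i8")],
    "i50": [("changeSal", "i51"), ("netSal", "i52")],      # EmpClass
    "i40": [("age", "i41")],                               # PersonClass
}
member_of = {"i5": "i50"}          # object -> its class
inherits_from = {"i50": "i40"}     # class -> its superclass (single inheritance only)

def sections_for(oid):
    """Sections to push, listed from the top of ENVS downwards."""
    sections = [interior[oid]]
    cls = member_of.get(oid)
    while cls is not None:         # walk up to the most general class
        sections.append(interior[cls])
        cls = inherits_from.get(cls)
    return sections

def bind(envs, name):
    for section in reversed(envs):
        hits = [ref for n, ref in section if n == name]
        if hits:
            return hits
    return []

envs = [[("Person", "i1"), ("Emp", "i5"), ("Emp", "i9")]]   # database section
pushed = sections_for("i5")
envs.extend(reversed(pushed))      # most general class goes deepest
bound = {n: bind(envs, n) for n in ("name", "netSal", "age")}
del envs[-len(pushed):]            # remove the sections after processing the object
```

Because the object's own interior sits above the class sections, a name defined in the object or in a subclass is found before one of the same name inherited from a more general class.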
Some peculiarities of M1 • Binding and processing methods: • Invocation of a method means that a new section (activation record) is additionally pushed on top of ENVS. • The section contains parameters of the method (evaluated previously), its local environment and a return track. • Some peculiarities connected with encapsulation. • A problem - multiple inheritance: • M1 allows for multiple inheritance, but it offers no solution for name conflicts. • This is a general problem, not specific to M1. • Another problem - collections: • They violate object-oriented principles such as substitutability and the open-closed principle (reuse, conceptual continuation). • Possible solutions require specific extensions of M1.
Examples of SBQL queries for M1 - schema [Figure: UML-like class diagram.]
Person[0..*]: name, birthYear, age(); nested class Address[0..1]: city, street, house#
Emp[0..*] (subclass of Person): e#, job[1..*], sal[0..1], changeSal(newSal), netSal()
Dept[0..*]: d#, dname, loc[1..*], budget()
Pointers between Emp and Dept: worksIn, employs[1..*], boss, manages[0..1]
UML-like, but: • Cardinalities assigned to all database entities • Nested classes • Pointers rather than association roles
Examples of SBQL queries for M1 • Get names of departments and the average age of their employees (inheritance of the method age): Dept . (dname, avg(employs.Emp.age)) • Get employees that for sure live in the cities where their departments are located (inheritance of Address): Emp where Address as a ((worksIn.Dept.loc) as l (a.city = l)) • For each employee get name and the percent of the annual budget of his/her department that is consumed by his/her sal: Emp . (name, (((if exists(sal) then sal else 0) as s) . ((s * 12 * 100)/(worksIn.Dept.budget)))) • For each employee having no salary, set the salary to the minimal salary in his/her department: for each (Emp where not exists(sal)) as e do e.changeSal( min(e.worksIn.Dept.employs.Emp.sal) );
M2: Dynamic roles and dynamic inheritance • The object model with dynamic object roles removes essential conceptual drawbacks of the classical static inheritance. • The idea is that an object during its life can acquire and lose its roles without changing its identity. • Object’s business semantics depends on a currently considered role. • SBQL is the first (and only) QL dealing with dynamic roles. • Dynamic object roles and dynamic inheritance require extension of M1 and extension of the semantics of non-algebraic operators. [Figure: a Person object with its dynamic roles: Employee, Club-member, Patient, Student, Student, Dog-owner, Tax-payer.]
Example of the M2 store model [Figure: classes, person objects and role objects with the links between them.]
Classes: • i40 PersonClass: i41 age (...code...), ... • i50 EmpClass: i51 changeSal (...code...), i52 netSal (...code...), ... • i60 StudentClass: i61 avgScore (...code...), ...
Role objects: • i13 Emp: i14 sal 2500, i15 worksIn • i16 Emp: i17 sal 1500, i18 worksIn • i19 Student: i20 studentNo 223344, i21 faculty "Physics"
The worksIn pointers lead to objects i127 and i128.
Person objects: • i1 Person: i2 name "Doe", i3 born 1948 • i4 Person: i5 name "Poe", i6 born 1975 • i7 Person: i8 name "Lee", i9 born 1951
Link types in the figure: is member of; inherits from; dynamically inherits from.
SBQL semantics for M2 • Changes concern only ENVS and non-algebraic operators • The order of sections of roles and classes on ENVS is determined by a simple rule (cf. full description of SBA/SBQL). • Some new operators dealing with roles (dynamic cast, has role). • Example: (Emp where name = "Lee") . (sal, born, age)
ENVS during processing the subquery after the dot (top section first):
sal(i17) worksIn(i18) : properties of the currently processed Emp role
changeSal(i51) netSal(i52) ... : properties of the EmpClass
name(i8) born(i9) : properties of the Person super-role of the Emp role
age(i41) ... : properties of the PersonClass
... Person(i1) Person(i4) Person(i7) Emp(i13) Emp(i16) Student(i19) ... : database section
The first four sections are pushed by the dot.
Examples of SBQL queries for M2 • Get employees older than 60 who live in Warsaw (dynamic inheritance of the attribute Address and static inheritance of the method age): Emp where age > 60 and ∃ Address (city = "Warsaw") • For each person get name and the sum of all income (salary and scholarships): (Person as p) . (p.name, sum(bag(0, ((Student)p).scholarship, ((Emp)p).sal))) • Get students who live in the same city as the city of their school: Student where ∃ Address (city = (studiesAt.School.city)) • Get name, faculty and school name for each person studying at two or more faculties: (((Person as p) join ((((Student)p) group as s))) where count(s) ≥ 2) . (p.name, s.(faculty, (studiesAt.School.name)))
Conclusions • To make a high-quality standard for object-oriented databases, the specification of semantics is a must, … • … to avoid the fate of the SQL-99 and ODMG standards, perceived as loose recommendations rather than technical specifications. • SBA offers a unique method of constructing query languages and specifying their semantics. • SBA is a holistic database theory; it does not give up any (even the most advanced) feature of current practical O-O database QL/PL. • Efficiency has been proven by several implementations. • The new standardization activity should not trust the currently well-known concepts concerning O-O query languages. • IMO: limited, imprecise, immature, inconsistent. • If the standard follows them, its quality will remain wishful thinking. • So far SBA has no serious competing approach.
10 unique qualities of SBA/SBQL for a new O-O database standard • Orthogonal syntax, full compositionality of queries. • Universal formal semantics based on abstract implementation. • Computational universality, advanced data structures, integration with PL constructs. • Strong typing of advanced O-O queries and programs. • Several advanced implementations, with more pending. • Fully transparent O-O virtual updatable views. • Strong potential for query optimization. • All O-O notions treated formally and uniformly. • Sound and manageable metamodel. • The potential for distributed query processing.