
A Data Model for Multidatabases: Don’t Integrate, Coordinate!


  1. A Data Model for Multidatabases: Don’t Integrate, Coordinate! • John Mylopoulos, Department of Computer Science, University of Toronto • Luciano Serafini and Fausto Giunchiglia, Department of Computer Science, University of Trento

  2. A Motivating Example • Consider a company database: Cust(name,addr,phone), Sales(custName,prod#,price,amount,date), Prod(prod#,name,price,inStock) • Each salesperson leaving on a trip downloads parts of the Prod, Sales and Cust relations. During the trip, they update customer and sales information. • Each of these databases evolves autonomously from the original, and there is no global manager. However, we’d like to enforce coordination rules, such as: “Updates to a customer address must be propagated to other databases.”

  3. Managing Data in Multidatabases • In such a context, it makes sense to assume that the databases participating in a multidatabase coalition may be disconnected for part of the time. Unavailability is not a failure, but a fact of life. • Nevertheless, we’d like to be able to perform some forms of (soft) constraint enforcement as well as weak forms of distributed query processing. • In addition, we’d like our model to rest on a logical foundation, much like the Relational Model (e.g., [Reiter84]).

  4. Outline • The rest of the talk covers the following topics: • Coordination rules; • Correspondence rules and query processing; • A formal semantics for the multidatabase problem; • Directions for further research.

  5. Coordination Rules • These are soft inter-database constraints. They are checked every time one of the relevant databases is updated, and are enforced through some protocol; for example: • Master:Cust(n,x) and Paolo:Cust(n,y) --> x=y propagate last /* the latest address is propagated to the other database */ • Master:Prod(p#,p) and Paolo:Prod(p#,p’) --> p=p’ propagate (Master->Paolo) /* when the prices disagree, the Master copy’s price is always propagated to the other database, never the other way around */
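The first rule above can be read as a violation check followed by a repair. The sketch below is a minimal illustration, assuming each copy of Cust is held in memory as a Python dict from customer name to an (address, last-update time) pair; the timestamps, the dict layout and the function names are hypothetical, since the slides only say that the latest address is propagated.

```python
# Hypothetical in-memory copy of Cust: name -> (addr, last-update timestamp).
# The timestamp is an assumption needed to decide which update is the "last" one.
Cust = dict[str, tuple[str, int]]


def violations(master: Cust, paolo: Cust) -> list[str]:
    """Names n with Master:Cust(n,x) and Paolo:Cust(n,y) but x != y."""
    shared = master.keys() & paolo.keys()
    return sorted(n for n in shared if master[n][0] != paolo[n][0])


def propagate_last(master: Cust, paolo: Cust) -> None:
    """Restore x = y by copying the most recently updated address to the other copy."""
    for n in violations(master, paolo):
        latest = max(master[n], paolo[n], key=lambda entry: entry[1])
        master[n] = latest
        paolo[n] = latest


master = {"Rossi": ("Via Roma 1", 10), "Bianchi": ("Via Po 2", 5)}
paolo = {"Rossi": ("Via Verdi 3", 12)}
print(violations(master, paolo))  # ['Rossi']
propagate_last(master, paolo)
print(master["Rossi"])  # ('Via Verdi 3', 12)
```

Customers present in only one copy (Bianchi here) are left alone, because the rule only constrains names that satisfy both Master:Cust and Paolo:Cust.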

  6. Expressions are Relative • All expressions appearing in a coordination rule are relative to one of the participating databases, e.g., Master:Cust(n,x), Paolo:Cust(n,y) • Expressions with no associated database are shorthands; e.g., in the rule M:Cust(n,x) and P:Cust(n,y) --> x=y, ‘x=y’ is shorthand for ‘M:x=y and P:x=y’

  7. More Coordination Rules • Luciano:TravelB=x and Paolo:TravelB=y and Fausto:TravelB=z --> x+y+z<=15MLit equi-distribute /* if their total budget t = x+y+z exceeds 15MLit, reduce each budget by (t - 15MLit)/3 */ • Master:Prod(p#,n) and Paolo:Prod(p#,n’) --> n=n’ undo /* no updates allowed to product names */ • Master:Prod(p#,.) == Paolo:Prod(p#,.) propagate (Master->Paolo) /* propagation of values always proceeds from the Master to Paolo’s database */
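The equi-distribute comment on the travel-budget rule amounts to a small arithmetic repair: when the combined budget exceeds 15MLit, every participant gives up the same share of the excess. This is a sketch only; the function name, the dict representation and the sample budgets are hypothetical.

```python
def equi_distribute(budgets: dict[str, float], limit: float) -> dict[str, float]:
    """Bring sum(budgets) back down to limit by cutting every budget by the same amount."""
    total = sum(budgets.values())
    if total <= limit:
        return dict(budgets)  # constraint already holds; nothing to restore
    cut = (total - limit) / len(budgets)  # (t - 15MLit)/3 for three participants
    return {name: amount - cut for name, amount in budgets.items()}


# Budgets in MLit (million lire); the figures are made up for illustration.
budgets = {"Luciano": 7.0, "Paolo": 6.0, "Fausto": 5.0}
print(equi_distribute(budgets, 15.0))
# {'Luciano': 6.0, 'Paolo': 5.0, 'Fausto': 4.0}
```

Equal cuts are simple but can push a small budget below zero when the excess is large; the slides do not say how that case should be handled.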

  8. Restoration Modality • The last part of each coordination rule (e.g., “propagate last” or “undo” in the examples on the previous slides) describes the restoration modality of the rule, i.e., the means by which the rule is enforced when it is violated. • For example: • Propagate last -- propagate the last update; • Undo -- undo the last update; • Equi-distribute -- restore a numerical sum constraint by reducing each of the participating variables by the same amount; • Propagate (A -> B) -- update B to make it consistent with A; • ...
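Three of these modalities can be sketched for the simplest rule shape, A:x = B:x on a single replicated value. The sketch assumes, hypothetically, that each database keeps a history list of its value (so that undo has something to roll back to) and that the database responsible for the violating update is known; none of these names come from the slides.

```python
from enum import Enum


class Modality(Enum):
    PROPAGATE_LAST = "propagate last"
    UNDO = "undo"
    PROPAGATE_A_TO_B = "propagate (A -> B)"


def restore(a: list[str], b: list[str], last_writer: str, modality: Modality) -> None:
    """Re-establish the rule A:x = B:x after a violating update.

    a and b are value histories (the last element is the current value);
    last_writer is "A" or "B", the database that made the violating update.
    """
    if a[-1] == b[-1]:
        return  # rule satisfied, nothing to restore
    if modality is Modality.PROPAGATE_LAST:
        src, dst = (a, b) if last_writer == "A" else (b, a)
        dst.append(src[-1])
    elif modality is Modality.UNDO:
        (a if last_writer == "A" else b).pop()  # roll back the violating update
    elif modality is Modality.PROPAGATE_A_TO_B:
        b.append(a[-1])  # A wins regardless of which database wrote last


# "No updates allowed to product names": Paolo renamed a product, so undo it.
names_master, names_paolo = ["Laptop"], ["Laptop", "Notebook"]
restore(names_master, names_paolo, "B", Modality.UNDO)
print(names_paolo)  # ['Laptop']
```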

  9. Enforcement Protocol • The enforcement protocol can be characterized along (at least) two dimensions: • When rules are enforced (ASAP, periodically, ...); • What force they have over the participating databases (constraints, guidelines, suggestions, ...). • The protocol includes an optional restoration action, but also one or more follow-up actions, e.g., amend a coordination rule, delete a rule, add more rules, ... • It is important to stress that, since the databases are assumed to be autonomous, a database may refuse to enforce a coordination rule in particular cases, and/or the rule may be amended to reflect a new coordination arrangement.

  10. Acquaintances • Each database has zero or more acquaintances; these are other databases with which it can share coordination rules. • Each database keeps track of its acquaintances through the Acq relation: Acq(name,eAddr,owner,startDate) • In general, there is no central coordination, no global schema, and no database knows all the participating databases.

  11. Coordination Rules, Again • Each coordination rule is expressed locally, using the database names found in the Acq relation. • Since the same database may be known by different names in different databases, the same coordination rule takes slightly different forms depending on the database with respect to which it is expressed. • Coordination rules for a given database are stored in the CR relation: CR(name,expr,startDate)

  12. Cooperation Rules • These are coordination rules involving the two special relations Acq and CR. • For example, we can say that A and B have the same acquaintances: • A:Acq(x,y,z) == B:Acq(x,y,z) propagate (B -> A) • or that A has all the coordination rules of B: • A:CR(nm,e) == B:CR(nm,e) propagate last
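Because Acq and CR are ordinary relations, a cooperation rule can be enforced with ordinary SQL. The sketch below assumes, for illustration only, that each participating database is a SQLite database (two in-memory connections stand in for two autonomous sites), and it uses the full four-column Acq schema even though the rule above abbreviates Acq to three arguments; `new_db` and `propagate` are hypothetical helpers.

```python
import sqlite3

# Column lists follow the Acq and CR schemas on the slides.
COLUMNS = {
    "Acq": ("name", "eAddr", "owner", "startDate"),
    "CR": ("name", "expr", "startDate"),
}


def new_db() -> sqlite3.Connection:
    """A fresh in-memory database holding the two special relations."""
    conn = sqlite3.connect(":memory:")
    for table, cols in COLUMNS.items():
        conn.execute(f"CREATE TABLE {table} ({', '.join(cols)})")
    return conn


def propagate(src: sqlite3.Connection, dst: sqlite3.Connection, table: str) -> None:
    """Enforce dst:table == src:table by overwriting dst's rows (propagate (src -> dst))."""
    cols = ", ".join(COLUMNS[table])
    marks = ", ".join("?" * len(COLUMNS[table]))
    rows = src.execute(f"SELECT {cols} FROM {table}").fetchall()
    with dst:  # one transaction: commit on success, roll back on error
        dst.execute(f"DELETE FROM {table}")
        dst.executemany(f"INSERT INTO {table} ({cols}) VALUES ({marks})", rows)


A, B = new_db(), new_db()
with B:
    B.execute("INSERT INTO Acq VALUES (?, ?, ?, ?)",
              ("Paolo", "paolo@example.org", "Paolo", "2000-03-01"))
# A:Acq(...) == B:Acq(...) propagate (B -> A)
propagate(B, A, "Acq")
print(A.execute("SELECT name, owner FROM Acq").fetchall())  # [('Paolo', 'Paolo')]
```

Overwriting the whole relation is the bluntest reading of propagate (B -> A); an incremental version would ship only the rows changed since the last synchronization.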

  13. Correspondences • A correspondence relation specifies how the symbols used in one database translate to symbols used in another database. • A correspondence relation is defined at three levels: • Constant to constant, e.g., ‘one’ --> ‘uno’; or CAN$1.00 --> US$0.65 • Relation to relation, e.g., Cust --> Customer • Relational attribute to relational attribute, e.g., name(Cust) --> nm(Customer)
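The three levels of correspondence can be pictured as three lookup tables applied to a simple selection. This is a sketch under stated assumptions: the table names, the `translate` helper, and the choice to pass unmapped symbols through unchanged are all hypothetical; the entries themselves are the examples from the slide.

```python
# Hypothetical correspondence from DBi's vocabulary to DBj's, one table per level.
RELATIONS = {"Cust": "Customer"}       # relation to relation
ATTRIBUTES = {("Cust", "name"): "nm"}  # attribute to attribute, keyed by relation
CONSTANTS = {"one": "uno"}             # constant to constant


def translate(relation: str, bindings: dict[str, str]) -> tuple[str, dict[str, str]]:
    """Translate a selection such as Cust(name='one') into DBj's symbols.

    Symbols without a correspondence pass through unchanged (an assumption).
    """
    attrs = {
        ATTRIBUTES.get((relation, attr), attr): CONSTANTS.get(value, value)
        for attr, value in bindings.items()
    }
    return RELATIONS.get(relation, relation), attrs


print(translate("Cust", {"name": "one", "phone": "555-0100"}))
# ('Customer', {'nm': 'uno', 'phone': '555-0100'})
```

A table lookup suits symbolic constants such as ‘one’ --> ‘uno’; numeric conversions like the currency example are better expressed as functions over the whole domain.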

  14. Correspondence Rules • Correspondences between databases i and j are defined in terms of two possibly multi-valued and/or partial functions rij and rji. • Note that rij and rji need not be symmetric, i.e., they need not be inverses of each other: in general, rji(rij(d)) ≠ d (...the “change bureau” phenomenon...) • For example, consider DBi containing length measurements in meters and DBj in kilometers. One can have • rij(x) = roundToClosestK(x), e.g., rij(653)=1, rij(453)=0 • rji(x) = x*1000, e.g., rji(1)=1000, so that rji(rij(653)) = 1000 ≠ 653
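The meters/kilometers example can be checked directly. The sketch assumes non-negative integer lengths and reads roundToClosestK as rounding halves upward (the slide does not say how ties are broken); the function names are hypothetical.

```python
def r_ij(meters: int) -> int:
    """DBi (meters) -> DBj (kilometers): round to the closest kilometer, halves up."""
    return (meters + 500) // 1000


def r_ji(km: int) -> int:
    """DBj (kilometers) -> DBi (meters)."""
    return km * 1000


print(r_ij(653), r_ij(453), r_ji(1))  # 1 0 1000
print(r_ji(r_ij(653)))                # 1000, not 653: the round trip loses information
print(r_ij(r_ji(1)))                  # 1: this direction happens to round-trip
```

Integer arithmetic is used instead of Python's built-in `round`, which rounds halves to the nearest even number and would map 500 m to 0 km.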

  15. Local vs. Global Queries • Local queries are evaluated by the DBMS managing the local database. • A local-global query expressed in DBi involves only terms used in Li, but is translated using correspondence rules so that it can be evaluated with respect to DBi and all the databases in the transitive closure of the acquaintance relationship. • A global-local query involves a general wff that mentions several databases. Such a query is evaluated by evaluating each local query of the form i:f(x1,...,xn) with respect to DBi. • Finally, global-global queries involve a general expression in which each local query is evaluated as a local-global query.
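The scope of a local-global query, DBi plus the transitive closure of the acquaintance relationship, is plain graph reachability. The sketch below is a simplification: it assumes a single dict that records every database's acquaintances, whereas in the model no database knows the whole graph, so a real evaluation would forward the (translated) query hop by hop. The graph and function name are hypothetical.

```python
from collections import deque


def acquaintance_closure(acq: dict[str, set[str]], start: str) -> set[str]:
    """All databases reachable from start by following Acq entries (breadth-first)."""
    seen = {start}
    frontier = deque([start])
    while frontier:
        db = frontier.popleft()
        for other in acq.get(db, set()):
            if other not in seen:
                seen.add(other)
                frontier.append(other)
    return seen


# Hypothetical acquaintance graph: Master knows Paolo, Paolo knows Luciano, and so on.
acq = {"Master": {"Paolo"}, "Paolo": {"Luciano"}, "Luciano": {"Fausto"}}
print(sorted(acquaintance_closure(acq, "Master")))
# ['Fausto', 'Luciano', 'Master', 'Paolo']
```

The `seen` set also makes the traversal terminate when acquaintances are mutual, which is the common case.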

  16. Query Evaluation • Queries can be evaluated in different modes: • Immediately; • ASAP -- as soon as all relevant databases are connected (may be a long wait...); • At time T -- evaluation is done at a particular time; • Subscriptively -- the query is evaluated periodically. • Global queries are obviously harder to evaluate in the absence of guarantees of connectivity among the databases of the multidatabase.
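The ASAP mode can be sketched as a queue of deferred queries that is re-examined whenever a database reconnects. Everything here is hypothetical: the class names, the representation of a query as a zero-argument callable, and the assumption that connectivity events arrive as `connect`/`disconnect` calls. The time-based and subscriptive modes would add a clock and are not shown.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class PendingQuery:
    needs: frozenset[str]            # databases that must be reachable
    evaluate: Callable[[], object]   # runs the query once they are


@dataclass
class AsapScheduler:
    connected: set[str] = field(default_factory=set)
    pending: list[PendingQuery] = field(default_factory=list)
    answers: list[object] = field(default_factory=list)

    def submit(self, query: PendingQuery) -> None:
        self.pending.append(query)
        self._run_ready()

    def connect(self, db: str) -> None:
        self.connected.add(db)
        self._run_ready()

    def disconnect(self, db: str) -> None:
        self.connected.discard(db)

    def _run_ready(self) -> None:
        """Evaluate every pending query whose databases are all connected right now."""
        still_waiting = []
        for query in self.pending:
            if query.needs <= self.connected:
                self.answers.append(query.evaluate())
            else:
                still_waiting.append(query)
        self.pending = still_waiting


s = AsapScheduler()
s.submit(PendingQuery(frozenset({"Master", "Paolo"}), lambda: "answer"))
s.connect("Master")
print(s.answers)  # []
s.connect("Paolo")
print(s.answers)  # ['answer']
```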

  17. A Data Model for Multidatabases • A Multidatabase system consists of one or more databases and a set of coordination rules. • Available operations include: • Add or delete a database (as an acquaintance); • Update or query a database; • Add, delete or update a coordination rule. • Each database shares coordination and correspondence rules only with databases it is acquainted with.

  18. Formal Semantics for MDM • A model in Local Model Semantics (hereafter LMS) is a pair MDB = <{DBi},{rij}ij>, where • DBi is a relational database (à la [Reiter84]) over schema Li; the domain of values of the database, Domi, is assumed to be finite. • rij is a binary relation over Domi × Domj which defines correspondences between values in the domains of databases DBi and DBj.

  19. Local Satisfiability • MDB |= i:f iff DBi |= f • A local query on a database i is an open formula f(x1,...,xn) in the language Li. • The result of a local query i:f(x1,...,xn) is the set of tuples (d1,...,dn) ∈ Domi × ... × Domi such that DBi |= f(d1,...,dn)
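Because Domi is finite, the result of a local query can be computed, at least in principle, by enumerating Domi × ... × Domi and keeping the tuples that satisfy f. The sketch assumes, hypothetically, that DBi is a dict of ground facts read under the closed-world assumption and that the open formula f is given as a Python predicate; a real DBMS would evaluate f through relational algebra instead of this exponential enumeration.

```python
from itertools import product
from typing import Callable

# Hypothetical DBi: ground facts per relation, read under the closed-world assumption.
DB_I = {"Cust": {("Rossi", "Via Roma 1"), ("Bianchi", "Via Po 2")}}
DOM_I = {"Rossi", "Bianchi", "Via Roma 1", "Via Po 2"}  # finite domain Domi


def local_query(db: dict, dom: set, arity: int, f: Callable[..., bool]) -> set[tuple]:
    """All (d1, ..., dn) in Domi x ... x Domi such that DBi |= f(d1, ..., dn)."""
    return {t for t in product(dom, repeat=arity) if f(db, *t)}


# f(x) = exists y. Cust(x, y): "x is the name of some customer"
def has_address(db: dict, x: str) -> bool:
    return any(name == x for name, _ in db["Cust"])


print(sorted(local_query(DB_I, DOM_I, 1, has_address)))
# [('Bianchi',), ('Rossi',)]
```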

  20. Global Formulas • You can build them up using local formulas of the form i:f(x) and the inter-database connectives and, or, not, --> (implication), foralli, existsi (quantifiers over the domain Domi) • Note that these are different from the local database connectives. For instance: • A --> B is not logically equivalent to not(A) or B; • Quantification is always done with respect to the domain of one database, and we write foralli x A(x)

  21. Satisfiability for and and forall • MDB |= A and B iff MDB |= A and MDB |= B • MDB |= foralli x.A(x) iff for all d ∈ Domi, MDB |= A[x/d i] • where A[x/d i] is obtained by substituting each free occurrence of x in A that lies in the context of database j with the values in rij(d) • Note that if A(x) contains expressions local to a database DBj other than DBi, then these expressions have to be satisfied with respect to all values of Domj that correspond to values of Domi, i.e., MDB |= foralli x j:f(x) iff for all b ∈ Domj such that b ∈ rij(a) for some a ∈ Domi, DBj |= f(b)
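The last condition, MDB |= foralli x j:f(x), can be checked mechanically when rij is given as a function returning the set of corresponding values (empty where rij is partial). The sketch reuses the meters/kilometers setting from the correspondence-rules slide; the relation Short in DBj and all function names are hypothetical.

```python
from typing import Callable, Iterable


def forall_i_local_j(
    dom_i: Iterable[int],
    r_ij: Callable[[int], set[int]],
    f_j: Callable[[int], bool],
) -> bool:
    """MDB |= forall_i x. j:f(x): f must hold in DBj for every b in r_ij(a), a in Domi."""
    return all(f_j(b) for a in dom_i for b in r_ij(a))


# Hypothetical instance: DBi stores lengths in meters, DBj in kilometers.
def r_ij(meters: int) -> set[int]:
    # Single-valued here; a partial r_ij could return an empty set.
    return {(meters + 500) // 1000}


SHORT_IN_J = {0, 1}  # DBj facts: Short(0), Short(1)


def is_short(km: int) -> bool:
    return km in SHORT_IN_J


print(forall_i_local_j({653, 453}, r_ij, is_short))        # True
print(forall_i_local_j({653, 453, 2400}, r_ij, is_short))  # False: 2400 m maps to 2 km
```

A value a with an empty rij(a) imposes no condition at all, which is how partiality of rij shows up in the semantics.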

  22. A Shorthand Notation • In the coordination rules shown earlier, quantification is not local to any database. As before, we interpret this to mean that the quantification holds in every database mentioned within its scope, i.e., M:C(n,x) and P:C’(n,y) --> x=y means forallM n,x,y [M:C(n,x) and P:C’(n,y) --> x=y] and forallP n,x,y [M:C(n,x) and P:C’(n,y) --> x=y] • Of course, if the domains involved are isomorphic, this expansion is not necessary.

  23. Proof Theory • To be added by Luciano

  24. Soundness and Completeness • State the S&C theorem, also the theorem that generalizes Reiter’s result.

  25. Related Work • There has been much related research on replicated databases, i.e., distributed databases that replicate some of their data on different nodes of the distributed system. • A distributed, replicated database is coherent if the replicated data are consistent at all times. • There are many proposals for distributed, replicated database control with relaxed coherency. • Relaxed coherency means that replicated data are allowed to diverge temporarily (bounded relaxed coherency) or possibly forever (unbounded relaxed coherency).

  26. Relaxed Coherency Schemes • Update (preferably all) copies: • ROWA -- read one, write all; • ROWA available -- read one, write all available nodes; • Update selected copies: • Primary site -- one site stores the master copy; • Quorum protocols -- pick a subset of nodes to be updated, read from several nodes; • Bounded coherence -- update eventually; • Epidemic algorithms -- propagate updates through a spreading activation algorithm. [Ceri91], [Beuter96], [Nicola99]
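The quorum idea can be made concrete with version-numbered copies: if every read contacts R nodes and every write W nodes with R + W > N, each read quorum overlaps the latest write quorum, so the highest version seen is the newest value. This is a generic, single-threaded textbook-style sketch with equal node weights and no concurrency control, not a scheme taken from [Ceri91], [Beuter96] or [Nicola99]; the class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    version: int = 0
    value: object = None


class QuorumStore:
    """N copies; reads contact R nodes, writes W nodes, with R + W > N."""

    def __init__(self, n: int, r: int, w: int) -> None:
        if not (1 <= r <= n and 1 <= w <= n) or r + w <= n:
            raise ValueError("need 1 <= R, W <= N and R + W > N")
        self.replicas = [Replica() for _ in range(n)]
        self.r, self.w = r, w

    def _check(self, nodes: list[int], size: int) -> None:
        if len(set(nodes)) < size:
            raise ValueError(f"quorum needs {size} distinct nodes, got {sorted(set(nodes))}")

    def read(self, nodes: list[int]) -> object:
        """Return the newest value among a read quorum."""
        self._check(nodes, self.r)
        return max((self.replicas[i] for i in nodes), key=lambda rep: rep.version).value

    def write(self, value: object, read_nodes: list[int], write_nodes: list[int]) -> None:
        """Learn the current version from a read quorum, then install version + 1 on a write quorum."""
        self._check(read_nodes, self.r)
        self._check(write_nodes, self.w)
        version = 1 + max(self.replicas[i].version for i in read_nodes)
        for i in write_nodes:
            self.replicas[i] = Replica(version, value)


store = QuorumStore(n=5, r=3, w=3)
store.write("a", read_nodes=[0, 1, 2], write_nodes=[0, 1, 2])
store.write("b", read_nodes=[2, 3, 4], write_nodes=[2, 3, 4])
print(store.read([0, 1, 3]))  # b  (node 3 holds the newer version)
```

ROWA is the special case R = 1, W = N; lowering W lets writes succeed while some nodes are unreachable, at the price of larger read quorums.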

  27. Discussion • Much of this work is relevant to the implementation of the proposed Multidatabase model. • The key difference of our proposal is that it is based on a local notion of inconsistency, assumes no global schema, no global coordination, and treats coordination rules as soft constraints of variable force. • Moreover, data replication is a special case of situations where coordination is useful.

  28. Research Problems • A formal semantics for the Multidatabase Model, as sketched in the previous slides. • Efficient global query processing techniques (exploiting parallelism). • A formal transaction model for coordination rules, supporting ‘soft’ enforcement mechanisms. • Efficient implementation techniques for enforcing coordination rules with a range of protocols. • Extending all of the above to multidatabases that involve data sources other than relational databases (such as OODBs, websites, ...). • ...more...

  29. References • [Beuter96] Beuter, T., Dadam, P., “Principles of Replication Control in Distributed Database Systems”, Informatik Forschung und Technik 11(4), 203-212, 1996 (in German). • [Ceri91] Ceri, S., Houtsma, M., Keller, A., Samarati, P., “A Classification of Update Methods for Replicated Databases”, Technical Report STAN-CS-91-1392, Stanford University, October 1991. • [Nicola99] Nicola, M., “Performance Evaluation of Distributed, Replicated, and Wireless Information Systems”, Dissertation, RWTH Aachen, Report No. 99-10, Fachgruppe Informatik, RWTH Aachen, 1999. • [Reiter84] Reiter, R., “A Logical Reconstruction of the Relational Model”, in Brodie, M., Mylopoulos, J., Schmidt, J. (eds.), On Conceptual Modelling: Perspectives from Artificial Intelligence, Databases and Programming Languages, Springer-Verlag, 1984. • [Ozsu99] Özsu, M. T., Valduriez, P., “Principles of Distributed Database Systems”, 2nd Edition, Prentice Hall, 1999.
