Towards a new approach to Document Collaboration
A Concurrency Control mechanism for XML databases
Jan Hidders, U. Antwerp, Belgium
Stijn Dekeyser, USQ, Australia
Contents
Part I
• Asynchronous collaborative work
  • cvs, change tracking, cscw, ems
• Synchronous collaborative work
• A new approach: use cases and clients
• Discussion
Part II
• Introduction: XML
• Running Example
• Classic CC methods
• Data Models
  • User Data Model
  • Scheduler Data Model
• Path Lock Schemes
  • Propagation (PL-PROP)
  • Satisfiability (PL-SAT)
• Schedulers
  • Commit scheduler
  • Conflict scheduler
Part I
Asynchronous collaborative work
• CVS
  • Version repository + concurrency
  • Text, line-based
  • Human intervention
• Change Tracking
  • MS Office, OOo et al.
  • Human management + intervention
Asynchronous collaborative work
• CSCW & EMS
  • Many different systems
  • Docs distributed wholly at all sites
  • Messages update docs
  • Human intervention needed
Synchronous collaborative work
• CSCW & EMS
• XML-enabled RDBMS
  • Traditional table locking
• Native XML databases
  • Document-based locking
A new approach: use cases
• Document Authoring
  • May move section while updating
• CMS
  • XForms causes overwrite
• Design & art
  • May update same parts!
• Programming
  • Change function name globally
A new approach: servers
• Implementation:
  • Native or XML-enabled db
  • Use of Path Locks in transactions
  • Document collaboration protocol (dcp)
• Type:
  • Enhanced web server, e.g. in Apache
  • or P2P implementation, e.g. in Word
• Extra features:
  • Access Control (Elena Ferrari et al.)
    • Elements contain user rights
  • Version management (Epic)
    • Elements contain version information
A new approach: clients
• General purpose XML editors
  • E.g. Epic or XMLSpy
• Specific purpose XML editors
  • E.g. Autocad or Excel
• Issues:
  • Intelligently query section to be updated
  • Commit when possible
  • Refresh content
Part II
Introduction: XML
• XML is an evolution of document language technology
  • 1969: GML (Generalized MarkUp L)
  • 1974: SGML (Standard …), Goldfarb
  • 1986: ISO standard
  • 1989: HTML, Berners-Lee, Berglund, Cailliau at CERN
• XML is much simpler than SGML (10% of spec)
• Now: much more data stored as XML
• Enter the XML-DBMS age…
Running example (1/2)
<document id="0">
  <person id="1" age="55">
    <name>Peter</name>
    <addr>Parklane 7</addr>
    <child>
      <person id="3" age="22">
        <name>John</name>
        <addr>Unistreet 1</addr>
        <hobby>swimming</hobby>
        <hobby>cycling</hobby>
      </person>
    </child>
    <child>
      <person id="4" age="7">
        <name>David</name>
        <addr>Parklane 7</addr>
      </person>
    </child>
  </person>
  <person id="2" age="43">
    <name>Mary</name>
    <addr>Parklane 7</addr>
    <hobby>painting</hobby>
  </person>
</document>
Running example (2/2)
Queries:
• /document/person//hobby
• //child//hobby
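To make the two queries concrete, here is a minimal sketch that evaluates them on the running example with Python's standard `xml.etree.ElementTree`. The attributes are written without separating commas so the document parses as well-formed XML, and the helper name `hobbies` is only illustrative. ElementTree paths are relative to the element they are called on, so both queries are rewritten relative to the `<document>` root.

```python
import xml.etree.ElementTree as ET

DOC = """
<document id="0">
  <person id="1" age="55">
    <name>Peter</name><addr>Parklane 7</addr>
    <child>
      <person id="3" age="22">
        <name>John</name><addr>Unistreet 1</addr>
        <hobby>swimming</hobby><hobby>cycling</hobby>
      </person>
    </child>
    <child>
      <person id="4" age="7">
        <name>David</name><addr>Parklane 7</addr>
      </person>
    </child>
  </person>
  <person id="2" age="43">
    <name>Mary</name><addr>Parklane 7</addr>
    <hobby>painting</hobby>
  </person>
</document>
"""

root = ET.fromstring(DOC)  # root is the <document> element


def hobbies(path):
    """Return the text of every hobby element selected by a relative path."""
    return [h.text.strip() for h in root.findall(path)]


# /document/person//hobby, written relative to <document>
all_hobbies = hobbies("./person//hobby")
# //child//hobby, written relative to <document>
child_hobbies = hobbies(".//child//hobby")

print(all_hobbies)
print(child_hobbies)
```

Only the second query is restricted to hobbies below a child element, which is why Mary's hobby appears in the first result but not in the second.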
Classic CC methods (1/3)
Table locking
• How: on update, the whole table is locked
  • Precludes phantoms
  • XML: parent-child relation in 1(*) table
• Example:
  • Query: //child//hobby
  • Update that should be allowed: change a hobby element not occurring under child
  • Not possible when the entire table is locked
Classic CC methods (2/3)
Predicate locking
• How
  • Locks in the form of a predicate: name=“person”
  • Predicate indicates what has been read
• Example:
  • Query: /document/person//hobby
  • Update (ok): create person under root element
  • Update (~ok): create hobby under this person
  • Neither is possible, since the first predicate locks all persons under the root
Classic CC methods (3/3)
Hierarchical locking
• How
  • Lock granule -> intention lock on ancestors
  • Change granule X -> exclusive lock on X
Tree locking
• How
  • Lock node -> lock parent of node
  • Add node under X -> exclusive lock on X
And ... query //A//B requires shared locks on the entire tree
User Data Model (1/2)
Data Model
• (XPath-tree) Tree with labelled nodes
• NB: we ignore ordering of children
Path Expressions
• Sequence of tag names and wild-cards (*)
• Separated by / (child) and // (descendants)
• Examples:
  • person/child
  • *//person/child
User Data Model (2/2)
Query
• Q(n,p): yields the set of nodes which are reachable from n via path expression p
  • Q(n,*//hobby)
Addition
• A(n,a): add node with name a under n
  • A(n,hobby)
  • Fails if n is not there, yields new node
Deletion
• D(n): delete n
  • Fails if n has children
Commit
• C(): end of transaction
Node-correctness: Thou shalt only use nodes which you have obtained via an addition or via a query.
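The operations of the user data model can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class `Node`, the helpers `parse` and `descendants`, and the `alive` flag are assumptions made for this example. It reads `a//b` as "an a-child, then a b-node at any depth below it", which is how the read lock propagation rules of the PL propagation scheme treat `//`.

```python
import re


class Node:
    """A labelled tree node; children are unordered, as in the data model."""

    def __init__(self, label, parent=None):
        self.label, self.parent = label, parent
        self.children, self.alive = [], True


def parse(p):
    """'*//person/child' -> [('/', '*'), ('//', 'person'), ('/', 'child')]."""
    tokens = re.split(r"(//|/)", p)
    return [("/", tokens[0])] + list(zip(tokens[1::2], tokens[2::2]))


def descendants(n):
    for c in n.children:
        yield c
        yield from descendants(c)


def Q(n, p):
    """Nodes reachable from n via the relative path expression p."""
    current = [n]
    for sep, name in parse(p):
        found = []
        for m in current:
            for c in (m.children if sep == "/" else descendants(m)):
                if name in ("*", c.label) and c not in found:
                    found.append(c)
        current = found
    return current


def A(n, a):
    """Add a node named a under n; fails if n is not there."""
    if not n.alive:
        raise ValueError("n is not there")
    child = Node(a, n)
    n.children.append(child)
    return child


def D(n):
    """Delete n; fails if n has children (or was already deleted)."""
    if not n.alive or n.children:
        raise ValueError("n cannot be deleted")
    if n.parent is not None:
        n.parent.children.remove(n)
    n.alive = False


root = Node("root")
document = A(root, "document")
person = A(document, "person")
hobby = A(person, "hobby")
print([n.label for n in Q(root, "*//hobby")])
```

Note that every node handed to `Q`, `A` or `D` here was itself obtained from an earlier `A`, in line with node-correctness.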
Scheduler’s Data Model (1/2)
• Instance Graph
  • Acyclic graph with labelled nodes
  • Nodes labelled with a delete set:
    • Identifiers of transactions which deleted the node
• Actual Instance
  • Subgraph of the instance graph formed by the nodes with an empty delete set
  • Is always an XPath-tree
Scheduler’s Data Model (2/2)
Query
• Q(n,p): yields the set of nodes which are, in the actual instance, reachable from n via path expression p
Addition
• A(n,a): adds node with name a under n
  • Empty delete set
  • Fails if n is not in the actual instance
Deletion
• D(n): add transaction to the delete set of n
  • Fails if n has children in the actual instance
Commit
• C(): delete nodes with the transaction in their delete set
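The delete-set bookkeeping can be sketched as below. This is an illustrative reading of the slide under stated assumptions: the names `SNode`, `InstanceGraph` and `actual_children` are hypothetical, transactions are identified by plain strings, and roll-back is left out.

```python
class SNode:
    def __init__(self, label, parent=None):
        self.label, self.parent = label, parent
        self.children = []
        self.delete_set = set()  # ids of transactions that deleted this node


class InstanceGraph:
    def __init__(self):
        self.root = SNode("root")

    def actual_children(self, n):
        """Children of n that belong to the actual instance."""
        return [c for c in n.children if not c.delete_set]

    def add(self, n, a):
        """A(n,a): fails if n is not in the actual instance."""
        if n.delete_set:
            raise ValueError("n is not in the actual instance")
        child = SNode(a, n)  # starts with an empty delete set
        n.children.append(child)
        return child

    def delete(self, t, n):
        """D(n) by transaction t: mark only, the node stays in the graph."""
        if self.actual_children(n):
            raise ValueError("n has children in the actual instance")
        n.delete_set.add(t)

    def commit(self, t, n=None):
        """C() by transaction t: physically drop the nodes t deleted."""
        n = self.root if n is None else n
        n.children = [c for c in n.children if t not in c.delete_set]
        for c in n.children:
            self.commit(t, c)


g = InstanceGraph()
doc = g.add(g.root, "document")
h = g.add(doc, "hobby")
g.delete("t1", h)
print(len(doc.children), len(g.actual_children(doc)))
```

Until `t1` commits, the deleted node remains in the instance graph but is no longer part of the actual instance.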
PL propagation scheme (1/3)
Read Locks
• rl(n, p)
  • e.g. rl(n12, //a//b)
Required read locks:
• For Q(n,p) request rl(n,p) and apply read lock propagation
• Read lock propagation:
  • rl(n, a/p) -> rl(n', p) if n' is an a-child of n
  • rl(n, */p) -> rl(n', p) if n' is a child of n
  • rl(n, a//p) -> rl(n', *//p) and rl(n', p) if n' is an a-child of n
  • rl(n, *//p) -> rl(n', *//p) and rl(n', p) if n' is a child of n
• Recomputing propagation on update is very easy (!)
PL propagation scheme (2/3)
Example: propagation of the read lock *//child//hobby from the doc. root
[Figure: the instance tree of the running example (doc. root, document, person, child, age, name, addr and hobby nodes), with each node annotated by the read locks propagated to it, such as *//child//hobby, child//hobby and *//hobby.]
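The four propagation rules can be sketched in a few lines. This is a minimal illustration under stated assumptions: `Node`, `split_head`, `propagate` and `paths_at` are hypothetical names, path expressions are kept as strings, and the tree is a cut-down version of the running example.

```python
import re


class Node:
    def __init__(self, label, *children):
        self.label, self.children = label, list(children)


def split_head(p):
    """'a//b/c' -> ('a', '//', 'b/c'); a single step 'a' -> ('a', None, None)."""
    head, sep, rest = re.match(r"([^/]+)(//|/)?(.*)", p).groups()
    return head, sep, (rest if sep else None)


def propagate(n, p, locks=None):
    """Return all read locks (node, path) implied by rl(n, p)."""
    locks = set() if locks is None else locks
    if (n, p) in locks:
        return locks
    locks.add((n, p))
    head, sep, rest = split_head(p)
    if sep is None:  # single-step lock: nothing to push down
        return locks
    for c in n.children:
        if head in ("*", c.label):
            propagate(c, rest, locks)  # rl(n', p)
            if sep == "//":
                propagate(c, "*//" + rest, locks)  # rl(n', *//p)
    return locks


def paths_at(locks, n):
    return {p for (m, p) in locks if m is n}


hobby = Node("hobby")
child = Node("child", Node("person", hobby))
person = Node("person", Node("name"), child)
document = Node("document", person)
root = Node("doc-root", document)

locks = propagate(root, "*//child//hobby")
print(sorted(paths_at(locks, child)))
```

Each node receives at most a handful of suffixes of the original path expression, which matches the O(|p|·|G|) bound on the number of locks for a query.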
PL propagation scheme (3/3)
Write Locks
• wl(n, a) or wl(n, *)
Required write locks:
• For A(n,a) request:
  • wl(n, a)
• For D(n) request:
  • wl(n, *) if n exists
  • wl(n', a) if n is an a-child of n'
Conflict rules:
• wl(n, *) and wl(n, a) conflict with rl(n, *) and rl(n, a)
• All others do not (... write-write conflicts?)
Number of locks:
• updates: O(1)
• queries: O(|p|·|G|)
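A possible reading of the conflict rule, as a hedged sketch: a write lock conflicts only with a single-step read lock on the same node, and only when the two labels agree or one of them is the wild-card. The case wl(n, a) against rl(n, b) with a ≠ b is treated as non-conflicting here, which is an assumption rather than something the slide states explicitly. Nodes are represented by plain identifiers such as "n1", and all function names are illustrative.

```python
def write_locks_for_add(n, a):
    """A(n,a) requires wl(n, a)."""
    return {(n, a)}


def write_locks_for_delete(n, parent, label):
    """D(n) requires wl(n, *) and wl(n', a), where n is an a-child of n'."""
    return {(n, "*"), (parent, label)}


def conflicts(wl, rl):
    """True if write lock wl = (node, label) conflicts with read lock rl = (node, path)."""
    (wn, wa), (rn, rp) = wl, rl
    if wn != rn or "/" in rp:  # different node, or not a single-step read lock
        return False
    return wa == "*" or rp == "*" or wa == rp


held_read_locks = {("n1", "hobby"), ("n1", "child//hobby")}
needed = write_locks_for_add("n1", "hobby")
blocked = any(conflicts(w, r) for w in needed for r in held_read_locks)
print(blocked)
```

Because propagation has already pushed the read locks down to the nodes they concern, the check is purely local to one node.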
Path-lock satisfiability scheme
Read locks
• rl(n, p) (see PL-Prop)
• For Q(n,p) only rl(n,p) is necessary
Write locks
• wl(n,a) and wl(n,*)
• For A(n,a) and D(n) the same write locks are necessary as with PL-Prop
Conflict rules
• wl(n,a) and wl(n,*) conflict with rl(n',p) if
  • there is a path from n' to n with label-list L and
  • L/a (or L/*) satisfies the path expression p
Number of locks:
• updates: O(1)
• queries: O(1)
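The satisfiability test can be sketched as follows, assuming relative path expressions and reading `a//b` as "a, then b at any depth below". The names `Node`, `parse`, `satisfies`, `label_list` and `conflicts` are hypothetical, and a `*` in the label list (coming from wl(n,*)) is taken to match any step.

```python
import re


class Node:
    def __init__(self, label, parent=None):
        self.label, self.parent = label, parent


def parse(p):
    """'*//child//hobby' -> [('/', '*'), ('//', 'child'), ('//', 'hobby')]."""
    tokens = re.split(r"(//|/)", p)
    return [("/", tokens[0])] + list(zip(tokens[1::2], tokens[2::2]))


def satisfies(labels, steps):
    """True if the label list matches the whole path expression."""
    if not steps:
        return not labels
    (sep, name), rest = steps[0], steps[1:]

    def ok(label):
        return name == "*" or label == "*" or label == name

    if sep == "/":
        return bool(labels) and ok(labels[0]) and satisfies(labels[1:], rest)
    # '//': the step may match at any depth
    return any(ok(l) and satisfies(labels[i + 1:], rest)
               for i, l in enumerate(labels))


def label_list(ancestor, n):
    """Labels on the path from ancestor (exclusive) down to n, or None."""
    labels = []
    while n is not ancestor:
        if n is None:
            return None
        labels.append(n.label)
        n = n.parent
    return labels[::-1]


def conflicts(wl, rl):
    (n, a), (m, p) = wl, rl
    L = label_list(m, n)
    return L is not None and satisfies(L + [a], parse(p))


root = Node("doc-root")
document = Node("document", root)
person1 = Node("person", document)
child = Node("child", person1)
person3 = Node("person", child)

read_lock = (root, "*//child//hobby")
print(conflicts((person3, "hobby"), read_lock))  # hobby below a child
print(conflicts((person1, "hobby"), read_lock))  # hobby not below a child
```

With this check, adding a hobby directly under a top-level person does not conflict with the query //child//hobby, which is the update that table locking could not allow.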
Commit scheduler
• Transactions make requests for operations:
  • Query, Addition, Deletion, Commit, Roll-back
• Scheduler accepts a request only if
  • the operation does not fail, and
  • the required locks do not conflict with existing locks
• On Commit, the locks disappear of
  • the committing transaction, and
  • the nodes deleted by the transaction
• On Roll-back, the locks disappear of
  • the transaction being rolled back
Conflict Scheduler
• Scheduler has a Dependency Graph (DG):
  • arrow t1 --> t2 if a lock of an operation of transaction t1 conflicts with a lock of a preceding operation of t2
• Scheduler accepts a request only if
  • the operation does not fail, and
  • no cycles appear in the DG
• A commit of t1 is not accepted if arrows depart from t1 in the DG
• A roll-back of t1 leads to a roll-back of t2 if t2 --> t1 in the DG
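The dependency-graph rules can be sketched as below. This is an illustrative sketch, not the authors' scheduler: the class `DependencyGraph` and its method names are assumptions, transactions are plain strings, and lock handling is left out so that only the DG bookkeeping is shown.

```python
from collections import defaultdict


class DependencyGraph:
    def __init__(self):
        self.out = defaultdict(set)  # t1 -> {t2, ...} for arrows t1 --> t2

    def reaches(self, src, dst):
        """Depth-first search: is there a directed path from src to dst?"""
        stack, seen = [src], set()
        while stack:
            t = stack.pop()
            if t == dst:
                return True
            if t not in seen:
                seen.add(t)
                stack.extend(self.out.get(t, ()))
        return False

    def add_arrow(self, t1, t2):
        """Record t1 --> t2; refuse (return False) if a cycle would appear."""
        if t1 == t2:
            return True  # a transaction does not depend on itself
        if self.reaches(t2, t1):
            return False
        self.out[t1].add(t2)
        return True

    def can_commit(self, t):
        """A commit of t is accepted only if no arrows depart from t."""
        return not self.out.get(t)

    def rollback_set(self, t1):
        """Transactions that must also roll back when t1 rolls back."""
        doomed, changed = {t1}, True
        while changed:
            changed = False
            for t, targets in self.out.items():
                if t not in doomed and targets & doomed:
                    doomed.add(t)
                    changed = True
        return doomed - {t1}


dg = DependencyGraph()
first = dg.add_arrow("t2", "t1")   # t2 conflicts with an earlier operation of t1
second = dg.add_arrow("t3", "t2")
third = dg.add_arrow("t1", "t3")   # would close the cycle t1 -> t3 -> t2 -> t1
print(first, second, third, dg.can_commit("t2"), dg.rollback_set("t1"))
```

Applying the roll-back rule repeatedly makes the cascade transitive: here a roll-back of t1 takes t2 with it, and t2 in turn takes t3.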
Conclusion and Further Research
• Commit and conflict schedulers guarantee serialisability
• Complexity is determined by the size of the document / instance
• Is the order of children a problem?
  • simulation
  • write-write conflicts
• What about the relocation of subtrees?
  • Identity of nodes to be taken into account?
• No use / knowledge of (entire) instance?
  • Use of DataGuide
  • Instance Independent CC for SSD