Everything you always wanted to know about Updates in MonetDB/XQuery but were afraid to ask
Peter Boncz (CWI)
Sjoerd Mullender: update actions
Jens Teubner: XQUF parsing
Niels Nes: logging
Stefan Manegold: the rest
Overview
• XQuery Update Facility (XQUF)
  • semantics & the update tape
• Updatable XML storage in BATs
  • maintaining order in an array without O(N) cost
• Snapshot Isolation
  • why we want it, how we got it
• Concurrency Control
  • optimistic, with "abort convoys"
• Durability
  • physical logging
• Conclusion & Future Challenges
XQuery Update Facility (XQUF)
• January 2006, first proposal
• Internal primitives:
  • upd:insertBefore
  • upd:insertAfter
  • upd:insertInto
  • upd:insertIntoAsLast
  • upd:insertAttributes
  • upd:delete
  • upd:replaceValue
  • upd:rename
• Pending update list concept
  • upd:applyUpdates
Example
insert
  <item id="{id}">
    <location>Brazil</location>
    <quantity>200</quantity>
    <name>XML in a nutshell</name>
    <payment>Credit Card, Personal check</payment>
    <shipping>Will ship internationally</shipping>
    <incategory category="category1"/>
  </item>
as last into fn:doc("xmark.xml")/site/regions/samerica
Semantics
let $root := doc("foo.xml")
for $i in (1,2,3)
return (do insert <x>{$i}</x> as first into $root,
        do insert <y>{$i}</y> as first into $root)
• We need to
  • define an execution order, and
  • enforce it
The Update Tape
update = sequence (int, node, node/str, node/str)
fn:delete()        (DELETE, node, nil, nil)
fn:insert_*()      (INSERT, tgt-node, tgt-level, expr-node)
fn:set-attr()      (ATTR, node, qn, val)
fn:unset-attr()    (ATTR, node, qn, nil)
fn:set-text()      (TEXT, node, val, nil)
fn:set-pi()        (PI, node, ins-val, arg-val)
fn:set-comment()   (COMMENT, node, val, nil)
• ( element construction ), which combines updates, enforces the correct order of the update tape.
• The Pathfinder compiler automatically inserts a call to fn:update(item*) on the result of every update query.
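The tape idea can be illustrated with a small Python sketch. It rests on assumptions: the tuple layout follows the slide (kind, node, arg, arg), but the `Node` class, `apply_tape`, the string opcodes, and the rule "apply strictly in tape order" are hypothetical simplifications for illustration, not the MonetDB/XQuery implementation. Only DELETE, INSERT (as first) and ATTR are modelled.

```python
from dataclasses import dataclass, field

# Hypothetical opcodes; the slide only names the tuple kinds.
DELETE, INSERT, ATTR = "DELETE", "INSERT", "ATTR"

@dataclass(eq=False)  # eq=False: nodes are compared by identity
class Node:
    name: str
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)
    parent: "Node | None" = None

def apply_tape(tape):
    """Apply update tuples strictly in tape order (assumed semantics)."""
    for kind, node, a, b in tape:
        if kind == INSERT:            # (INSERT, tgt-node, tgt-level, expr-node)
            b.parent = node           # tgt-level (a) is ignored in this sketch
            node.children.insert(0, b)    # models 'as first into'
        elif kind == DELETE:          # (DELETE, node, nil, nil)
            node.parent.children.remove(node)
            node.parent = None
        elif kind == ATTR:            # (ATTR, node, qn, val); val None = unset
            if b is None:
                node.attrs.pop(a, None)
            else:
                node.attrs[a] = b

def demo():
    """Build the tape for the 'for $i in (1,2,3)' example and apply it."""
    root = Node("root")
    tape = []
    for i in (1, 2, 3):
        tape.append((INSERT, root, 0, Node(f"x{i}")))
        tape.append((INSERT, root, 0, Node(f"y{i}")))
    tape.append((ATTR, root, "state", "updated"))
    apply_tape(tape)
    return [c.name for c in root.children], root.attrs
```

Because every insert is "as first", the last tuple on the tape ends up first in the document; fixing the tape order is what makes the result deterministic.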
XPath Accelerator [SIGMOD02]
Node-based relational encoding of XQuery's data model
<a> <b> <c> <d/> <e/> </c> </b> <f> <g/> <h> <i/> <j/> </h> </f> </a>
[Figure: example document tree with the ancestor, following, preceding, and descendant regions marked]
XML Storage Revisited
post = pre + size - level
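The identity post = pre + size - level can be checked on the example document with a short Python sketch. The numbering conventions are assumptions made here: pre and post both start at 0, the root has level 0, and size counts descendants only. The `encode` function and the tuple-based tree literal are illustrative names, not part of MonetDB/XQuery.

```python
def encode(tree):
    """tree = (name, [children]); returns rows (name, pre, size, level, post)."""
    rows = []
    post_counter = 0

    def visit(node, level):
        nonlocal post_counter
        name, children = node
        pre = len(rows)
        rows.append(None)                 # reserve the slot at position pre
        for child in children:
            visit(child, level + 1)
        size = len(rows) - pre - 1        # number of descendants
        rows[pre] = (name, pre, size, level, post_counter)
        post_counter += 1                 # post rank is assigned on the way out

    visit(tree, 0)
    return rows

# The example document <a><b><c><d/><e/></c></b><f><g/><h><i/><j/></h></f></a>
DOC = ("a", [("b", [("c", [("d", []), ("e", [])])]),
             ("f", [("g", []), ("h", [("i", []), ("j", [])])])])

ROWS = encode(DOC)
```

The identity holds because the nodes that finish before a node v are exactly its descendants (size) plus the nodes before it in document order that are not its ancestors (pre - level).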
Updates: Mission Impossible?
<a> <b> <c> <d/> <e/> </c> </b> <f> <g/> <h> <i/> <j/> </h> </f> </a>
INSERT SUBTREE I:
• ancestor nodes: SIZE + |I|
• following nodes: PRE + |I|
• size(following) = O(N): killer (?)
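To make the cost concrete, here is a hedged Python sketch of a naive insert into dense arrays indexed by pre. The function name `naive_insert` and the two-array layout (size, level) are assumptions for illustration; the point is only to count how many existing tuples are touched.

```python
def naive_insert(size, level, parent, sub_size, sub_level):
    """Insert a subtree as the last child of `parent` (a pre value).

    size/level are dense lists indexed by pre. Returns a pair:
    (number of ancestors whose SIZE grows, number of tuples whose PRE shifts).
    """
    n_new = len(sub_size)
    at = parent + size[parent] + 1            # pre of the first inserted node
    # ancestors-or-self of the parent: their subtree interval covers `parent`
    ancestors = [p for p in range(parent + 1) if p + size[p] >= parent]
    for p in ancestors:
        size[p] += n_new                      # SIZE + |I|
    shifted = len(size) - at                  # these get PRE + |I|
    base = level[parent] + 1
    size[at:at] = sub_size                    # list insertion moves the tail
    level[at:at] = [base + l for l in sub_level]
    return len(ancestors), shifted

# size and level columns of the example document (pre = list index)
SIZE = [9, 3, 2, 0, 0, 4, 0, 2, 0, 0]
LEVEL = [0, 1, 2, 3, 3, 1, 2, 2, 3, 3]

# insert a single new leaf as last child of <c> (pre = 2)
TOUCHED = naive_insert(SIZE, LEVEL, 2, [0], [0])
```

In this small case three ancestors get a new size and five following tuples move; in a large document the shifted tail is proportional to the document size.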
XML Storage Revisited
post = pre + size - level
• Allow holes
• Define logical pages
• rid = pre.swizzle( )
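A minimal Python sketch of logical pages with holes follows. Everything named here is an assumption for illustration: the tiny page size, the `PagedTable` class, and `map_pid` as a plain list from logical page number to physical page number. It is not the MIL implementation, only the idea that a new page changes the page map while existing tuples stay where they are.

```python
PAGE = 4  # tuples per logical page; tiny on purpose (assumption)

class PagedTable:
    def __init__(self, values):
        pad = (-len(values)) % PAGE
        # None marks a hole, i.e. an unused rid-tuple
        self.rid_table = list(values) + [None] * pad
        # logical page number -> physical page number
        self.map_pid = list(range(len(self.rid_table) // PAGE))

    def swizzle(self, pre):
        """Translate a logical position (pre) into a physical one (rid)."""
        return self.map_pid[pre // PAGE] * PAGE + pre % PAGE

    def insert_page_after(self, logical_page, values):
        """Append a physical page and splice it into the logical order."""
        assert len(values) <= PAGE
        phys = len(self.rid_table) // PAGE
        self.rid_table += list(values) + [None] * (PAGE - len(values))
        self.map_pid.insert(logical_page + 1, phys)

    def in_document_order(self):
        out = []
        for pre in range(len(self.map_pid) * PAGE):
            value = self.rid_table[self.swizzle(pre)]
            if value is not None:          # skip holes
                out.append(value)
        return out

def demo():
    t = PagedTable(list("abcdefgh"))
    t.insert_page_after(0, ["x", "y"])     # two new nodes between d and e
    return t
```

After the insert, the first eight physical slots are untouched; only `map_pid` changed, and the new page carries two holes that later inserts could fill.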
XML Storage Revisited
Update-friendly
• rid-table is append-only
• rid-tuples may be unused
• rid = autoincrement column
MonetDB:
• rid not stored but computed (virtual oid)
• allows positional lookup/join
Not stored, so no need to update it either
XML Storage Revisited
Update-friendly
• rid-table is append-only
• rid-tuples may be unused
• rid = autoincrement column
Updatable document collection:
• pf:add-doc(URI, docname, perc>0)
• pf:add-doc(URI, docname, collname, perc>0)
• pre := nid.leftfetchjoin(nid_rid).swizzle(map_pid)
Read-only document collection:
• pf:add-doc(URI, docname, 0)
• pf:add-doc(URI, docname, collname, 0)
• NID = RID = PRE
• pre := nid.leftfetchjoin(nid_rid).swizzle(map_pid) = FREE!!
Snapshot Isolation
• Versus 2-phase locking (2PL) == full serializability
• Why not 2PL for XML:
  • lock semantics much more complex than in the relational case (order matters!!)
  • node-level locking in staircase join?? (now 10 cycles/node…)
• Why Snapshot Isolation:
  • great for read-queries, great for ll_scj (runs unmodified)
  • quite strong. Better than repeatable read. Oracle/Postgres do it.
• Problem with Snapshot Isolation:
  • in XQuery, it is unknown at compile-time what to snapshot (fn:doc(..))
Snapshot Isolation
[Figure: Read Query 1, Read Query 2, Update Query]
• Isolation by Shadow Paging (copy-on-write mmap)
  • rid/pre delete/insert + attr-replace
  • Touch one byte per physical page: *addr = *addr;
  • MMU traps, OS replaces page by a copy
  • we would like to replace the master copy once, not all client copies
[Figure: Read Query 1, Read Query 2, and Update Query, shown in successive steps labeled "Isolate-page" and "Master-update"]
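The copy-on-write behaviour can be tried out from Python, whose standard `mmap` module offers `ACCESS_COPY` (a private, copy-on-write mapping). This is only a sketch of the mechanism: the temporary file stands in for a master BAT, and the function name `cow_demo` is made up for illustration.

```python
import mmap
import os
import tempfile

def cow_demo():
    """Write through a private mapping and show the master stays untouched."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"A" * mmap.PAGESIZE)          # one page of "master" data
        master = mmap.mmap(fd, 0, access=mmap.ACCESS_READ)
        private = mmap.mmap(fd, 0, access=mmap.ACCESS_COPY)

        private[0] = private[0]     # the '*addr = *addr' trick: touch the page
        private[0:1] = b"B"         # the update lands in the private copy only

        seen_by_master = master[0:1]
        seen_by_private = private[0:1]
        with open(path, "rb") as f:
            on_disk = f.read(1)

        private.close()
        master.close()
        return seen_by_master, seen_by_private, on_disk
    finally:
        os.close(fd)
        os.remove(path)
```

The write to the private mapping is never propagated to the file, so a committing transaction still has to apply its changes to the master explicitly.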
Durability
• Masters become dirty
  • no time to flush them during query
  • log all changes to a WAL
  • = log all tuples that changed = entire pages
• Recovery:
  • after a crash, we do not know whether dirty pages got saved
  • solution: overwrite tables with values from the WAL
• Checkpointing Thread:
  • every 5 minutes, if 'many' changes occurred, checkpoint
  • memory-mapped BATs are sync()-ed; only dirty pages get written
  • checkpoint locks collection, halts query processing
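Recovery by overwriting can be sketched in a few lines of Python. This is a simplification under stated assumptions: records are logged per tuple rather than per page, the WAL is an in-memory list of JSON lines, and `log_update` and `recover` are hypothetical names. The property it illustrates is that replaying after-images is idempotent, so it does not matter which dirty pages reached disk before the crash.

```python
import json

def log_update(wal, table, pos, value):
    """Append a physical log record (the after-image) to the WAL."""
    wal.append(json.dumps({"table": table, "pos": pos, "value": value}))

def recover(tables, wal):
    """Overwrite table slots with the values from the WAL, in log order."""
    for line in wal:
        rec = json.loads(line)
        tables[rec["table"]][rec["pos"]] = rec["value"]
    return tables

def demo():
    wal = []
    log_update(wal, "size", 0, 10)
    log_update(wal, "size", 1, 4)
    # Simulated crash: only the first change reached the on-disk table.
    on_disk = {"size": [10, 3, 2]}
    once = recover({"size": list(on_disk["size"])}, wal)
    twice = recover({"size": list(once["size"])}, wal)
    return once, twice
```

Replaying the log a second time yields the same tables, which is why recovery can blindly overwrite without knowing what was saved.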
The Update Sequence
• Execute Query
  • build update tape
  • queries get isolated copies of a document (VM copy-on-write mmap)
• Prepare Intensional Updates
  • execute update tape
  • does not modify masters (except append-only tables)
• Commit Phase (locked phase, per doc-collection)
  • precommit
    • detect conflicts (not the size-ancestors)
    • write WAL (globally locked)
    • read master-size-ancestors, use delta, log result
  • update master tables
    • isolate first! Only then update masters.
  • update index structures
Many more Issues Solved
• Indexing and Updates
  • Runtime QN NID mapping, with hash table
    • read-only: not a hash, but keep sorted & persistent
    • keep INS + DEL deltas to commit without changing the hash table
  • Runtime NID ATTR hash table
    • isolation loses you MonetDB dynamic hash table reuse
    • share an old copy, exploit append-mostly
[Figure labels: Concurrency; Updates, Checkpoint, Shredding, Query, Shredding, Updates]
• Conflicting Updates
  • detect conflicting queries:
    • look at RID page numbers and attr-IDs
  • reacting to conflicts:
    • abort query + automatic restart
    • run CONVOY of 5 next update queries serially
• ACID properties on the Meta Level
  • Shredding a new doc into a collection Query
  • Shredding a new doc into a collection Update
  • Using a collection Deleting/adding documents
  • Meta Querying Deleting/adding documents
• Allocating New Pages and NIDS
  • Offload shredding interference with freelist
  • Unlocked access to private pages
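The conflict check on RID page numbers and attr-IDs can be sketched as a small optimistic validator in Python. All names (`Txn`, `Committer`, `try_commit`) and the commit-counter scheme are assumptions for illustration; the convoy (running the next 5 update queries serially) is not modelled, only abort and restart.

```python
from dataclasses import dataclass, field

@dataclass
class Txn:
    name: str
    start: int                                  # commit counter at snapshot time
    pages: set = field(default_factory=set)     # RID page numbers written
    attrs: set = field(default_factory=set)     # attr-IDs written

class Committer:
    def __init__(self):
        self.clock = 0
        self.history = []                       # (commit time, pages, attrs)

    def begin(self, name):
        return Txn(name, self.clock)

    def try_commit(self, txn):
        """Commit unless a transaction that committed after txn's snapshot
        wrote one of the same pages or attributes."""
        for when, pages, attrs in self.history:
            if when > txn.start and (pages & txn.pages or attrs & txn.attrs):
                return False                    # conflict: caller must restart
        self.clock += 1
        self.history.append((self.clock, set(txn.pages), set(txn.attrs)))
        return True

def demo():
    c = Committer()
    t1, t2, t3 = c.begin("t1"), c.begin("t2"), c.begin("t3")
    t1.pages = {3}
    t2.pages = {3, 7}
    t3.pages, t3.attrs = {9}, {42}
    results = [c.try_commit(t1), c.try_commit(t2), c.try_commit(t3)]
    retry = c.begin("t2-restart")               # automatic restart on a new snapshot
    retry.pages = {3, 7}
    results.append(c.try_commit(retry))
    return results
```

Here t2 aborts because t1 committed a write to page 3 after t2 took its snapshot; the restarted t2 sees t1's result and commits.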
Snapshot Isolation
• 2PL (++) 375 transactions / 5 minutes = 1.25 transactions/sec
Conclusions
• It works! Reasonable/good performance!
  • transaction management as a module extension outside a kernel works
  • identified VM primitives that databases really need
• Future work:
  • Test on XML update benchmark TPoX (DB2: 700 trans/second)
  • Packed Memory Arrays: alternative for page remapping?
    • page remapping is technically O(N)
• Engineering:
  • support for value-indexing (does PF support it already?)
  • asynchronous WAL writing to boost throughput
  • port MIL to C primitives; port C primitives to Monet5