Peter Boncz (CWI) • Sjoerd Mullender: update actions • Jens Teubner: XQUF parsing • Niels Nes: logging

Everything you always wanted to know about Updates in MonetDB/XQuery but were afraid to ask
Peter Boncz (CWI) • Sjoerd Mullender: update actions • Jens Teubner: XQUF parsing • Niels Nes: logging • Stefan Manegold: the rest

Overview
• XQuery Update Facility (XQUF)
  • semantics & the update tape
• Updatable XML storage in BATs
  • maintaining order in an array without O(N) cost
• Snapshot Isolation
  • why we want it, how we got it
• Concurrency Control
  • optimistic, with “abort convoys”
• Durability
  • physical logging
• Conclusion & Future Challenges

XQuery Update Facility (XQUF)
• January 2006, first proposal
• Internal primitives:
  • upd:insertBefore
  • upd:insertAfter
  • upd:insertInto
  • upd:insertIntoAsLast
  • upd:insertAttributes
  • upd:delete
  • upd:replaceValue
  • upd:rename
• Pending update list concept
  • upd:applyUpdates

Example
insert
  <item id="{id}">
    <location>Brazil</location>
    <quantity>200</quantity>
    <name>XML in a nutshell</name>
    <payment>Credit Card, Personal check</payment>
    <shipping>Will ship internationally</shipping>
    <incategory category="category1"/>
  </item>
as last into fn:doc("xmark.xml")/site/regions/samerica

Semantics
let $root := doc("foo.xml")
for $i in (1,2,3)
return (do insert <x>{$i}</x> as first into $root,
        do insert <y>{$i}</y> as first into $root)
• We need to
  • define an execution order, and
  • enforce it

The Update Tape
update = sequence ( int, node, node/str, node/str)
fn:delete() → (DELETE, node, nil, nil)
fn:insert_*() → (INSERT, tgt-node, tgt-level, expr-node)
fn:set-attr() → (ATTR, node, qn, val)
fn:unset-attr() → (ATTR, node, qn, nil)
fn:set-text() → (TEXT, node, val, nil)
fn:set-pi() → (PI, node, ins-val, arg-val)
fn:set-comment() → (COMMENT, node, val, nil)
• Element construction, which combines updates, will enforce the correct order of the update tape.
• The Pathfinder compiler automatically inserts a call to fn:update(item*) on the result of all update queries.
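As a rough illustration of the tape idea, the sketch below models an update as a 4-tuple and applies a tape strictly in order to a toy list of child nodes. The names `Update`, `build_tape` and `apply_tape` are hypothetical, and the INSERT tuple carries a position string ("first") instead of the tgt-level field listed above; this is a simplification for the example, not the MonetDB/XQuery implementation.

```python
from typing import List, NamedTuple, Optional


class Update(NamedTuple):
    """One tape entry: (kind, node, arg1, arg2). Hypothetical toy model."""
    kind: str                    # "INSERT" or "DELETE" in this sketch
    node: str                    # target node, a plain string here
    arg1: Optional[str] = None   # assumption: insert position ("first")
    arg2: Optional[str] = None   # assumption: the node to insert


def build_tape() -> List[Update]:
    """Emit the updates of the 'Semantics' example in iteration order."""
    tape = []
    for i in (1, 2, 3):
        tape.append(Update("INSERT", "root", "first", f"<x>{i}</x>"))
        tape.append(Update("INSERT", "root", "first", f"<y>{i}</y>"))
    return tape


def apply_tape(tape: List[Update], children: List[str]) -> List[str]:
    """Apply every update in tape order to the child list of 'root'."""
    result = list(children)
    for upd in tape:
        if upd.kind == "INSERT" and upd.arg1 == "first":
            result.insert(0, upd.arg2)   # 'as first into': prepend
        elif upd.kind == "DELETE":
            result.remove(upd.node)      # drop the named child
        else:
            raise ValueError(f"unsupported update: {upd}")
    return result


if __name__ == "__main__":
    print(apply_tape(build_tape(), []))
```

Because every insert is "as first", applying the tape in order leaves the children in reverse emission order, which is why a defined and enforced execution order matters.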

XPath Accelerator [SIGMOD02]
Node-based relational encoding of XQuery's data model
Example document: <a> <c> <d/> <e/> </c> <f> <g/> <h> <j/> </h> </f> </a>
[Figure: the ancestor, following, preceding and descendant regions of the example document]

XML Storage Revisited post = pre + size - level
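The relation post = pre + size - level can be checked on the small example document from the XPath Accelerator slide. The sketch below is an illustration only: the function names are made up for this example, and it uses Python's standard `xml.etree.ElementTree` rather than anything from MonetDB/XQuery.

```python
import xml.etree.ElementTree as ET

DOC = "<a><c><d/><e/></c><f><g/><h><j/></h></f></a>"


def encode(xml_text):
    """Return one (tag, pre, size, level, post) row per element.

    pre   = document-order rank, size = number of descendants,
    level = depth below the root, post = pre + size - level.
    """
    rows = []

    def visit(elem, level):
        pre = len(rows)
        rows.append(None)                 # reserve the slot for this node
        for child in elem:
            visit(child, level + 1)
        size = len(rows) - pre - 1        # everything added since is a descendant
        rows[pre] = (elem.tag, pre, size, level, pre + size - level)

    visit(ET.fromstring(xml_text), 0)
    return rows


def postorder_tags(xml_text):
    """Independent post-order traversal, used to cross-check the formula."""
    out = []

    def visit(elem):
        for child in elem:
            visit(child)
        out.append(elem.tag)

    visit(ET.fromstring(xml_text))
    return out


if __name__ == "__main__":
    for row in encode(DOC):
        print(row)
```

Sorting the rows by the computed post value gives the same tag order as the independent post-order traversal.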

Updates: Mission Impossible?
INSERT SUBTREE I into <a> <c> <d/> <e/> </c> <f> <g/> <h> <j/> </h> </f> </a>
• ancestor nodes: SIZE + |I|
• following nodes: PRE + |I|
• size(following) = O(N) → killer (?)
[Figure: the ancestor, following, preceding and descendant regions affected by the insert]

XML Storage Revisited
post = pre + size - level
• Allow holes
• Define logical pages
• rid = pre.swizzle( )
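One way to picture logical pages is a small page map that translates the logical page of a pre value into a physical page of the append-only rid-table. The sketch below is an assumption-laden toy: `PAGE_BITS`, `swizzle` and `insert_page` are names invented for this example, and a real page would hold far more than four tuples.

```python
PAGE_BITS = 2                    # assumption: 4 tuples per logical page
PAGE_SIZE = 1 << PAGE_BITS


def swizzle(pre, page_map):
    """Translate a logical position (pre) into a physical position (rid).

    page_map[i] is the physical page that holds logical page i.
    """
    logical_page = pre >> PAGE_BITS
    offset = pre & (PAGE_SIZE - 1)
    return (page_map[logical_page] << PAGE_BITS) | offset


def insert_page(page_map, logical_pos):
    """Splice a fresh physical page into the logical order.

    The new page is appended physically (the rid-table is append-only);
    only the page map changes, no existing tuple is moved.
    """
    new_physical = len(page_map)
    page_map.insert(logical_pos, new_physical)
    return new_physical


if __name__ == "__main__":
    page_map = [0, 1, 2]
    print(swizzle(5, page_map))      # node at pre 5 before the insert
    insert_page(page_map, 1)         # make room in front of logical page 1
    print(page_map, swizzle(9, page_map))
```

After the insert, the tuple that used to be reached via pre 5 keeps its rid and is now reached via pre 9; the new page starts out as a run of holes that an inserted subtree can fill. Note that `list.insert` is still linear in the number of pages, not in the number of tuples.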

XML Storage Revisited
Update-friendly
• rid-table is append-only
• rid-tuples may be unused
• rid = autoincrement column
MonetDB:
• rid not stored but computed (virtual oid)
• allows positional lookup/join
Not stored → no need to update it either

XML Storage Revisited
Update-friendly
• rid-table is append-only
• rid-tuples may be unused
• rid = autoincrement column
Updatable document collection:
• pf:add-doc(URI, docname, perc>0)
• pf:add-doc(URI, docname, collname, perc>0)
• pre := nid.leftfetchjoin(nid_rid).swizzle(map_pid)
Read-only document collection:
• pf:add-doc(URI, docname, 0)
• pf:add-doc(URI, docname, collname, 0)
• NID = RID = PRE
• pre := nid.leftfetchjoin(nid_rid).swizzle(map_pid) = FREE!!

Snapshot Isolation
• Versus 2-phase locking (2PL) == full serializability
• Why not 2PL for XML:
  • lock semantics much more complex than in the relational case (order matters!!)
  • node-level locking in staircase join?? (now 10 cycles/node…)
• Why Snapshot Isolation:
  • great for read-queries, great for ll_scj (runs unmodified)
  • quite strong. Better than repeatable read. Oracle/Postgres do it.
• Problem with Snapshot Isolation:
  • in XQuery, it is unknown at compile-time what to snapshot (fn:doc(..))

Snapshot Isolation
[Figure: Read Query 1, Read Query 2 and an Update Query, with the steps Isolate-page and Master-update]
• Isolation By Shadow Paging (copy-on-write mmap)
  • rid/pre delete/insert + attr-replace
  • Touch one byte per physical page: *addr = *addr;
  • MMU traps, OS replaces page by a copy
  • we would like to replace the master copy once, not all client copies
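The copy-on-write trick can be tried out with Python's standard `mmap` module, whose `ACCESS_COPY` mode gives a private mapping whose writes never reach the file. The sketch below is only an analogue of the slide's C idiom `*addr = *addr;`; the function names and the temporary file standing in for a master table are assumptions for this example.

```python
import mmap
import os
import tempfile


def isolate_and_update(path, offset, value):
    """Map a file copy-on-write, touch every page, then change one byte.

    Returns the private view as bytes; the file itself is left untouched.
    """
    with open(path, "r+b") as f:
        view = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_COPY)
        try:
            # Touch one byte per page so each page gets a private copy.
            for addr in range(0, len(view), mmap.PAGESIZE):
                view[addr] = view[addr]
            view[offset] = value          # the actual (isolated) update
            return bytes(view[:])
        finally:
            view.close()


def demo():
    """Return (first byte of the private copy, first byte of the master)."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"A" * (2 * mmap.PAGESIZE))   # stand-in master table
        private = isolate_and_update(path, 0, ord("B"))
        with open(path, "rb") as f:
            master = f.read()
        return private[:1], master[:1]
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

The update is visible only in the private view. Propagating it to the master once, rather than to every client copy, is the part this sketch leaves out.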

Durability
• Masters become dirty
  • no time to flush them during query
  • log all changes to a WAL
  • = log all tuples that changed = entire pages
• Recovery:
  • after a crash, we do not know whether dirty pages got saved
  • solution: overwrite tables with values from the WAL
• Checkpointing Thread:
  • every 5 minutes, if ‘many’ changes occurred, checkpoint
  • memory-mapped BATs are sync()-ed → only dirty pages get written
  • checkpoint locks collection, halts query processing
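The log-whole-pages and overwrite-on-recovery idea can be sketched in a few lines. Everything here is an assumption for illustration: the JSON-lines log format, the dictionary standing in for a table, and the names `log_page` and `recover` are not taken from MonetDB.

```python
import json
import os
import tempfile


def log_page(wal_path, page_no, page):
    """Append the full new contents of one page to the write-ahead log."""
    with open(wal_path, "a", encoding="utf-8") as wal:
        wal.write(json.dumps({"page": page_no, "data": page}) + "\n")
        wal.flush()
        os.fsync(wal.fileno())    # the record must be durable before commit


def recover(table, wal_path):
    """Overwrite pages with the values from the WAL, in log order.

    After a crash we cannot tell which dirty pages reached the disk, so
    every logged page is simply written again (the last record wins).
    """
    restored = dict(table)
    with open(wal_path, encoding="utf-8") as wal:
        for line in wal:
            record = json.loads(line)
            restored[record["page"]] = record["data"]
    return restored


def demo():
    fd, wal_path = tempfile.mkstemp(suffix=".wal")
    os.close(fd)
    try:
        on_disk = {0: [1, 2], 1: [3, 4]}   # possibly stale state after a crash
        log_page(wal_path, 1, [3, 99])
        log_page(wal_path, 1, [7, 99])
        return recover(on_disk, wal_path)
    finally:
        os.remove(wal_path)


if __name__ == "__main__":
    print(demo())
```

Replaying whole pages makes recovery idempotent: running it twice gives the same table, at the price of a larger log.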

The Update Sequence
• Execute Query
  • build update tape
  • queries get isolated copies of a document (VM copy-on-write mmap)
• Prepare Intensional Updates
  • execute update tape
  • does not modify masters (except append-only tables)
• Commit Phase (locked phase, per doc-collection)
  • precommit
  • detect conflicts (not the size-ancestors)
  • write WAL (globally locked)
  • read master-size-ancestors, use delta, log result
  • update master tables
  • isolate first! Only then update masters.
  • update index structures

Many more Issues Solved
[Figure labels: Concurrency; Updates ↔ Checkpoint; Shredding ↔ Query; Shredding ↔ Updates]
• Indexing and Updates
  • Runtime QN → NID mapping, with hash table
  • read-only: not a hash, but keep sorted & persistent
  • keep INS + DEL deltas to commit without changing the hash table
  • Runtime NID → ATTR hash table
  • isolation loses you MonetDB dynamic hash table reuse
  • share an old copy, exploit append-mostly
• Conflicting Updates
  • detect conflicting queries:
    • look at RID page numbers and attr-IDs
  • reacting to conflicts:
    • abort query + automatic restart
    • run CONVOY of 5 next update queries serially
• ACID properties on the Meta Level
  • Shredding a new doc into a collection ↔ Query
  • Shredding a new doc into a collection ↔ Update
  • Using a collection ↔ Deleting/adding documents
  • Meta Querying ↔ Deleting/adding documents
• Allocating New Pages and NIDs
  • Offload shredding interference with freelist
  • Unlocked access to private pages
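Conflict detection on RID page numbers and attr-IDs amounts to intersecting write sets. The sketch below shows that check under simplifying assumptions: `WriteSet`, `conflicts` and `can_commit` are hypothetical names, and the restart and convoy logic is reduced to a comment.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class WriteSet:
    """What one update query touched (hypothetical structure)."""
    pages: FrozenSet[int]    # RID page numbers it modified
    attrs: FrozenSet[int]    # attribute IDs it modified


def conflicts(a: WriteSet, b: WriteSet) -> bool:
    """Two queries conflict if they share a page or an attribute ID."""
    return bool(a.pages & b.pages) or bool(a.attrs & b.attrs)


def can_commit(mine: WriteSet, committed_since_snapshot: Iterable[WriteSet]) -> bool:
    """Optimistic check at commit time.

    If this returns False the query would be aborted and restarted; the
    slides additionally run a convoy of the next 5 update queries serially.
    """
    return not any(conflicts(mine, other) for other in committed_since_snapshot)


if __name__ == "__main__":
    t1 = WriteSet(frozenset({3, 4}), frozenset())
    t2 = WriteSet(frozenset({4}), frozenset({17}))
    t3 = WriteSet(frozenset({9}), frozenset({42}))
    print(can_commit(t2, [t1]), can_commit(t3, [t1, t2]))
```

Checking at page granularity is coarse, so two queries that touch different nodes on the same page are still reported as conflicting.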

Snapshot Isolation
• 2PL (++): 375 transactions/5 minutes = 1.25 transactions/sec

Conclusions
• It works! Reasonable/good performance!
  • transaction management as a module extension outside a kernel works
  • identified VM primitives that databases really need
• Future work:
  • Test on XML update benchmark TPoX (DB2: 700 trans/second)
  • Packed Memory Arrays: alternative for page remapping?
  • page remapping is technically O(N)
• Engineering:
  • support for value-indexing (does PF support it already?)
  • asynchronous WAL writing to boost throughput
  • port MIL to C primitives; port C primitives to Monet5
