1 / 21

Efficient Range Queries in Non-blocking k-ary Search Trees

This paper presents a solution to efficiently store keys in a dynamic data structure supporting insertion, deletion, and range queries in non-blocking k-ary search trees. It introduces a range query algorithm that can determine if a key was added or removed during traversal. The algorithm extends the data structure by adding a dirty bit to each leaf, allowing for faster checking during traversal. Experimental results compare the performance of different k-values and other data structures.

matthewrosa
Download Presentation

Efficient Range Queries in Non-blocking k-ary Search Trees

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Range Queries in Non-blockingk-ary Search Trees Trevor Brown Hillel Avni

  2. Problem Statement Want to store keys in a dynamic data structure supporting insertion, deletion, and: RangeQuery(a, b): returns all keys of the data structure in range [a, b]

  3. Previous Solutions Software transactional memory Locks Persistence root pointer e e Insert(c) b b f a d d c

  4. k-ary Search Tree (k-ST) Add or remove keys by replacing node(s) Related to persistent data structures [Brown, Helga]

  5. The Range Query Algorithm RangeQuery(a, b): Traverse the tree, skipping sub-trees which cannot contain a key in [a, b] During this traversal, save a pointer to each leaf that contains a key in [a, b] […] Problem: how to efficiently tell if a key was added or removed during this traversal?

  6. Extending the Data Structure Add a dirty bit to each leaf Each leaf has its dirty bit set just before it is replaced Consequence: If a leaf's dirty bit is not set, then it has not been replaced

  7. The Range Query Algorithm RangeQuery(a, b): Traverse the tree, skipping sub-trees which cannot contain a key in [a, b] During this traversal, save a pointer to each leaf that contains a key in [a, b] After this traversal, check the dirty bits of these leaves, one by one If no dirty bit is set, then return “the result” Otherwise, retry Reading dirty bit is farfaster than re-traversing

  8. Example: RangeQuery(3, 14) RangeQuery sees 4 is dirty… Retry! 8, 13, 25 2, 3, 5 8, 9, 12 14, 19, 21 29, 35 1 4 5, 7 13 15, 16, 18 23, 24 3, 4 Insert(3) Saved pointers

  9. Retrying RangeQuery(3, 14) Success! Return the result… 8, 13, 25 2, 3, 5 8, 9, 12 14, 19, 21 29, 35 1 4 5, 7 13 15, 16, 18 23, 24 3, 4 Saved pointers

  10. Ctrie Taking a “snapshot:” atomically replace root Old tree no longer changes Future searches and updates copy nodes from the old tree [Prokopec, Bronson, Bagwell, Odersky] root pointer | | | | | | 00 01 01 10 10 11 11 … … … …

  11. When our Algorithm is Good When workloads contain range queries over small ranges (i.e., where snapshots are bad) Example: database applications such asairline database of flights When it might not be Very large ranges increase the chance that a range query will have to retry Our experiments explore how much this matters In extreme cases Ctrieor Snap might be better

  12. Experiment: compare performance of k-ST: k=16, 32, 64 Snap Ctrie Java’s Concurrent Skip List (SL) NOT LINEARIZABLE!

  13. Experiment Throughput vs. number of concurrent threads Each thread repeatedly chooses a random operation (Search, Insert, Delete, RangeQuery) with arguments chosen uniformly randomly in [0, 10^6) Each experiment ran with a fixed amount of memory, for a fixed, sufficiently long amount of time

  14. Hardware Intel 4-chip, 40-core, 80-thread Sun 2-chip, 16-core, 128-thread

  15. Many queries with small ranges 50% search, 5% insert, 5% delete, 40% range query size 100 Throughput (millions) 64-ST 32-ST 16-ST SL Ctrie Snap Number of threads

  16. Many queries with bigger ranges 50% search, 5% insert, 5% delete, 40% range query size 10,000 Throughput (hundred thousands) 64-ST 32-ST 16-ST SL Ctrie Snap Number of threads

  17. Few queries (with small ranges) 59% search, 20% insert, 20% delete, 1% range query size 100 64-ST Throughput (ten millions) 32-ST 16-ST SL Ctrie Snap Number of threads

  18. Throughput versus P(range query) Throughput (ten millions) 1:10000 operations is RQ KST64 KST32 KST16 SL Ctrie Snap Probability of range query

  19. Throughput versus arity 20i-20d-1r-size100 Throughput (ten millions) 5i-5d-40r-size100 20i-20d-1r-size10000 5i-5d-40r-size10000 Degree of tree

  20. Conclusion Provably correct algorithm Searches can ignore concurrent updates Although dirty bits invalidate range queries,they do not invalidate searches Range queries are invisible No CAS, don’t change data structure Avoids excessive duplication of nodes Appears to be practical when workloads contain queries over small ranges

  21. Future work Adding balance (a, b)-tree, chromatic tree, relaxed AVL tree Wait-freedom?

More Related