
Speeding up STL Set/Map Usage in C++ Applications SIPEW 2008

Dibyendu Das, Madhavi Valluri, Michael Wong, Chris Cambly (dibyendu.das@in.ibm.com, mvalluri@us.ibm.com, michaelw@ca.ibm.com, ccambly@ca.ibm.com). Software/Systems Tech Group, Rational.



Presentation Transcript


  1. Dibyendu Das, Madhavi Valluri, Michael Wong, Chris Cambly • dibyendu.das@in.ibm.com, mvalluri@us.ibm.com, michaelw@ca.ibm.com, ccambly@ca.ibm.com • Software/Systems Tech Group, Rational • Speeding up STL Set/Map Usage in C++ Applications • SIPEW 2008 • SPEC CPU 2006

  2. An idea and implementation • A way to speed up SPEC CPU 2006 dealII • That can work for all compiler vendors • Without violating C++ Standard Library rules • A small increase in memory usage that does not change cache behavior • IBM's P5+/P6 show ~20% improvement • Delivered in IBM's xlC C++ compiler V10.1

  3. C++ Standard Template Library and Generic Programming • A better data structure can provide the best speed gain • Generic programming is about lifting common algorithms and data structures • The C++ Standard Template Library unifies algorithms with data structures, glued together by iterators • Effectively matches any algorithm with any data structure through the abstraction of iterators • Universally supplied by all C++ compiler vendors • vector, deque, list, set, map • With limits on performance and memory usage • Written by the best C++ programmers to be reusable and composable

  4. The right tool for the right time • Which data structures are used in each SPEC CPU 2006 C++ benchmark?

  5. Which data structure to choose • Depends on how to balance the cost of lookups, erasures, insertions, copies, and traversals (++/--) • Found that dealII slows down due to long traversal time (++/-- is costly) for set<> • In the traditional binary search tree implementation of set<>/map<> • Which is optimized for a mixed combination of insertions and erasures, then some lookups and traversals, then maybe more insertions, etc.

  6. What are we allowed to do? • Can't change the data structure in SPEC CPU benchmarks • However, we are allowed to alter the underlying vendor implementation of libraries if we can sense how data is used • Sometimes usage patterns are indeed chaotic • Sometimes they are more organized • Setup through insertion • Lookup to find information • Traversal for doing something applicable to many elements • Reorganize to a more suitable set, then return to lookup

  7. Details of normal set implementation • As a balanced binary tree known as a red-black tree • O(log n) for insertion and deletion • O(log n) for lookups (find) • O(1) amortized cost for traversals via ++/-- iterators • A set<int> iSet as a red-black tree

  8. What does O(1) amortized cost for ++/-- mean? • A set<int> iSet as a red-black tree • Starting with sitr = iSet.begin() • Advance (++sitr) will put it on node (2) after 1 link • Advance again will put it on node (5) after 2 links • Advance again will put it on node (7) after 1 link • Advance again will put it on node (8) after 1 link • Advance again will put it on node (11) after 3 links • Advance again will put it on node (12) after 2 links • Advance again will put it on node (14) after 1 link • Advance again will put it on node (15) after 1 link • Total is 12 links after 9 traversals = 1.3 links/traversal
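
The link counting on this slide can be reproduced with a small experiment. The sketch below is only an illustration: it uses a plain, unbalanced binary search tree rather than a real red-black tree, and the names `Node`, `insert`, `successor`, `walk`, and `Stats` are hypothetical, not taken from any vendor library. It counts every pointer followed while stepping from the smallest to the largest element with the classic in-order successor rule.

```cpp
#include <cassert>
#include <cstddef>

// Plain binary search tree node (no balancing; illustrative names only).
struct Node {
    int key;
    Node* left = nullptr;
    Node* right = nullptr;
    Node* parent = nullptr;
    explicit Node(int k) : key(k) {}
};

// Insert a key without rebalancing; returns the (possibly new) root.
Node* insert(Node* root, int key) {
    Node* z = new Node(key);
    if (!root) return z;
    Node* cur = root;
    for (;;) {
        Node*& child = (key < cur->key) ? cur->left : cur->right;
        if (!child) { child = z; z->parent = cur; return root; }
        cur = child;
    }
}

Node* leftmost(Node* n) {
    while (n->left) n = n->left;
    return n;
}

// In-order successor, the way a classic tree iterator's operator++ finds it.
// `links` is incremented once for every pointer that is followed.
Node* successor(Node* n, std::size_t& links) {
    if (n->right) {                       // go right once, then left as far as possible
        n = n->right; ++links;
        while (n->left) { n = n->left; ++links; }
        return n;
    }
    Node* p = n->parent; ++links;         // otherwise climb until we arrive from a left child
    while (p && n == p->right) { n = p; p = p->parent; ++links; }
    return p;
}

struct Stats { std::size_t advances = 0; std::size_t links = 0; };

// Step from the smallest to the largest element, counting advances and links.
Stats walk(Node* root) {
    Stats s;
    if (!root) return s;
    Node* last = root;
    while (last->right) last = last->right;
    for (Node* n = leftmost(root); n != last; n = successor(n, s.links)) ++s.advances;
    return s;
}

void destroy(Node* n) {
    if (!n) return;
    destroy(n->left);
    destroy(n->right);
    delete n;
}
```

Because each tree edge is descended at most once and climbed at most once during such a walk, the total number of links stays below twice the number of nodes, which is why the cost per `++` is O(1) only on average: individual steps can still follow several links.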

  9. Our Implementation • Add a doubly-linked list on top of the red-black tree • Using _Next and _Prev pointers to the next sorted tree node in non-decreasing and non-increasing order respectively • Now ++/-- operations are exactly Θ(1) • Insert and delete each incur an added O(1) cost, still within the O(log n) required by the C++ Standard • Copy adds O(1) for every copied node
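
A minimal sketch of the idea, under loud assumptions: this is not the xlC library code. It threads a doubly-linked list through a plain, unbalanced binary search tree, and `TNode`, `ThreadedSet`, `next`, `prev`, and `head` are hypothetical names standing in for the slide's `_Next` and `_Prev`. Rebalancing is left out because tree rotations do not change the sorted order of the nodes, so they would not touch the list.

```cpp
#include <cassert>
#include <vector>

struct TNode {
    int key;
    TNode* left = nullptr;
    TNode* right = nullptr;
    TNode* next = nullptr;  // in-order successor (stands in for _Next)
    TNode* prev = nullptr;  // in-order predecessor (stands in for _Prev)
    explicit TNode(int k) : key(k) {}
};

struct ThreadedSet {
    TNode* root = nullptr;
    TNode* head = nullptr;  // smallest element, i.e. where begin() would point

    ThreadedSet() = default;
    ThreadedSet(const ThreadedSet&) = delete;             // copying is not part of this sketch
    ThreadedSet& operator=(const ThreadedSet&) = delete;

    // Usual O(height) descent, plus O(1) list splicing. Returns false on duplicates.
    bool insert(int key) {
        TNode* parent = nullptr;
        TNode* cur = root;
        while (cur) {
            if (key == cur->key) return false;
            parent = cur;
            cur = (key < cur->key) ? cur->left : cur->right;
        }
        TNode* z = new TNode(key);
        if (!parent) { root = head = z; return true; }
        if (key < parent->key) {
            // z becomes the immediate predecessor of its parent
            parent->left = z;
            z->next = parent;
            z->prev = parent->prev;
            if (z->prev) z->prev->next = z; else head = z;
            parent->prev = z;
        } else {
            // z becomes the immediate successor of its parent
            parent->right = z;
            z->prev = parent;
            z->next = parent->next;
            if (z->next) z->next->prev = z;
            parent->next = z;
        }
        return true;
    }

    // Traversal follows exactly one link per element.
    std::vector<int> sorted() const {
        std::vector<int> out;
        for (TNode* n = head; n; n = n->next) out.push_back(n->key);
        return out;
    }

    ~ThreadedSet() {
        for (TNode* n = head; n; ) { TNode* nx = n->next; delete n; n = nx; }
    }
};
```

The splice works because a new leaf attached as a left child has no nodes between it and its parent in sorted order, and likewise for a right child, so only the parent and one neighbour need updating.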

  10. New in IBM xlC 10.1 compiler • Just released in June 2008 with many new features • Enabled through a compiler-defined flag • -D __IBM_FAST_SET_MAP_ITERATOR • The default is to not enable this behavior • The entire application must be compiled with this flag, or erroneous behavior can result

  11. Results of our implementation • dealII, xalancbmk, and omnetpp all use set and map • Only dealII and xalancbmk will benefit • omnetpp's use of set is cold • In peak mode (-O5 with profile directed feedback enabled) • Verified that there is no cache effect

  12. Other work and future investigation • All commercial implementations use some form of red-black tree • No commercial implementation uses a doubly-linked list to augment the red-black tree • Some research uses a B-tree • But it slows deletion compared to RB trees • Advised the dealII author to switch to a sorted vector instead of an associative container
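
The sorted-vector alternative mentioned on this slide can be sketched with standard algorithms only. This is a generic illustration, not dealII code; `make_sorted_unique`, `contains`, and `sum_all` are hypothetical helper names. It fits the usage pattern where elements are inserted in a setup phase and afterwards only looked up and traversed.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Setup phase: collect values in any order, then sort and drop duplicates once.
std::vector<int> make_sorted_unique(std::vector<int> values) {
    std::sort(values.begin(), values.end());
    values.erase(std::unique(values.begin(), values.end()), values.end());
    return values;
}

// Lookup phase: O(log n) binary search over contiguous memory.
bool contains(const std::vector<int>& sorted, int key) {
    return std::binary_search(sorted.begin(), sorted.end(), key);
}

// Traversal phase: ++ on a vector iterator is a plain pointer increment.
long long sum_all(const std::vector<int>& sorted) {
    long long total = 0;
    for (int v : sorted) total += v;
    return total;
}
```

The trade-off is that inserting into or erasing from the middle of the vector after setup costs O(n), so this only pays off when modifications are rare compared with lookups and traversals.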

  13. BACKUP

  14. Insertion • Insert a node _Z into the red-black tree • If it is placed to the left of a node _Y • Then update the links of RB_Prev(_Y), _Y, and _Z

  15. Erasure • Delete node 5 • Need to modify the _Left, _Right, and _Parent pointers • Increment and decrement only need to follow 1 link instead of multiple links
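
On the list side, erasing a node is a constant-time splice. The sketch below shows only that part and is an assumption about how it could look: the tree surgery on `_Left`, `_Right`, and `_Parent` and any rebalancing are omitted, and `LNode` and `unlink` are hypothetical names.

```cpp
#include <cassert>

// List view of a tree node; the tree pointers are left out of this sketch.
struct LNode {
    int key;
    LNode* prev = nullptr;  // stands in for _Prev
    LNode* next = nullptr;  // stands in for _Next
    explicit LNode(int k) : key(k) {}
};

// Splice z out of the sorted list in O(1).
// head and tail are updated when z was the first or last element.
void unlink(LNode*& head, LNode*& tail, LNode* z) {
    assert(z != nullptr);
    if (z->prev) z->prev->next = z->next; else head = z->next;
    if (z->next) z->next->prev = z->prev; else tail = z->prev;
    z->prev = nullptr;
    z->next = nullptr;
}
```

After the splice, the former neighbours of the erased node point at each other, so a later `++` or `--` across the gap still follows a single link.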

  16. Copy • Using = in C++ creates a copy • Allocate new nodes and copy the contents from the source tree to the destination tree • Scan from the first to the last node of the new tree in sorted order • Set up the _Prev and _Next pointers • This scan follows multiple links per step, using the original increment and decrement • Requires additional O(1) amortized time for every copied node
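
The copy step can be sketched as two passes: clone the tree shape, then walk the new tree in sorted order and thread the list. This is an illustration under assumptions, not the library's code: it uses a recursive in-order walk in place of the original increment operator the slide mentions, and `CNode`, `clone`, `relink`, and `destroy` are hypothetical names.

```cpp
#include <cassert>

struct CNode {
    int key;
    CNode* left;
    CNode* right;
    CNode* prev;  // stands in for _Prev
    CNode* next;  // stands in for _Next
};

// Pass 1: allocate new nodes and copy keys and tree shape; list links start empty.
CNode* clone(const CNode* src) {
    if (!src) return nullptr;
    CNode* l = clone(src->left);
    CNode* r = clone(src->right);
    return new CNode{src->key, l, r, nullptr, nullptr};
}

// Pass 2: in-order walk over the new tree, linking each node to the one
// visited just before it. `last` tracks the most recently visited node.
void relink(CNode* n, CNode*& head, CNode*& last) {
    if (!n) return;
    relink(n->left, head, last);
    n->prev = last;
    n->next = nullptr;
    if (last) last->next = n; else head = n;
    last = n;
    relink(n->right, head, last);
}

void destroy(CNode* n) {
    if (!n) return;
    destroy(n->left);
    destroy(n->right);
    delete n;
}
```

Each node is visited once in the second pass and gets a constant number of pointer writes, which matches the slide's claim of O(1) amortized extra work per copied node.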
