
Chameleon Automatic Selection of Collections

This research explores optimizing data storage in collections by automatically selecting the most efficient implementation based on space-time metrics, reducing collection bloat and runtime degradation in applications.



Presentation Transcript


  1. Chameleon: Automatic Selection of Collections. Ohad Shacham, Martin Vechev, Eran Yahav (Tel Aviv University; IBM T.J. Watson Research Center). Presented by: Yingyi Bu

  2. Collections. Set: HashSet, LinkedSet, ArraySet, LazySet. Map: HashMap, LinkedMap, ArrayMap, LazyMap. List: LinkedList, ArrayList, LazyList. • Abstract data types • Many implementations • Different space/time tradeoffs • Incompatible selection might lead to • runtime degradation • space bloat (wasted space)

  3. Collection Bloat. Collection bloat is unjustified space overhead for storing data in collections. Example: List s = new ArrayList(); s.add(1); Bloat for s is 9.

  4. Collection Bloat. Collection bloat is a serious problem in practice: observed to occupy 90% of the heap in real-world applications. Hard to detect and fix. Accumulation: death by a thousand cuts. Correction: need to correlate bloat to program code. How to pick the right implementation? Minimize bloat, but without degrading running time.

  5. Our Vision. Programmer declares the ADT to be used: Set s = new Set(); Programmer defines what metric to optimize, e.g. space-time. Runtime automatically selects the implementation based on the metric. Online: detect application usage of Set. Online: select appropriate implementation of Set (HashSet, ArraySet, LinkedSet, …).

  6. This Work. Programmer defines the implementation to be used: Set s = new HashSet(); Programmer defines what metric to optimize: space-time product, where space = bloat. Runtime suggests an implementation based on the metric. Online: automatically detect application usage of HashSet(). Online: automatically suggest an alternative to HashSet(). Offline: programmer modifies the program accordingly, e.g. Set s = new ArraySet();

  7. How Can We Calculate Bloat? Data structure bloat = Occupied Data − Used Data. Example: List s = new ArrayList(); s.add(1); Bloat for s is 9, because the default backing array has 10 slots and only 1 is used.

  8. How to Detect Collection Bloat? Each collection maintains a field for used data. The language runtime can find out the actually occupied data. Bloat = Occupied Data − Used Data. Solution: the garbage collector computes bloat online by reading the used-data fields from collections. Low overhead: can work online in production.
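The formula Bloat = Occupied Data − Used Data can be illustrated with a minimal sketch. `TinyArrayList` is a hypothetical stand-in for an array-backed list (it is not the JDK's `ArrayList`), its default capacity of 10 is an assumption, and bloat is counted in element slots rather than bytes.

```java
import java.util.Arrays;

// Hypothetical array-backed list, used only to illustrate the bloat formula.
final class TinyArrayList {
    private Object[] elementData = new Object[10]; // occupied slots (assumed default capacity)
    private int size = 0;                          // used slots

    void add(Object value) {
        if (size == elementData.length) {
            // Grow by doubling; the growth policy is an assumption of this sketch.
            elementData = Arrays.copyOf(elementData, elementData.length * 2);
        }
        elementData[size++] = value;
    }

    int usedSlots() { return size; }

    int occupiedSlots() { return elementData.length; }

    // Bloat = Occupied Data - Used Data, measured in slots here.
    int bloat() { return occupiedSlots() - usedSlots(); }
}

final class BloatDemo {
    public static void main(String[] args) {
        TinyArrayList s = new TinyArrayList();
        s.add(1);
        System.out.println("Bloat for s is " + s.bloat()); // prints "Bloat for s is 9"
    }
}
```

In the slides the collector reads these two quantities itself; here the collection exposes them through accessors only to keep the sketch in plain Java.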

  9. Semantic Maps: How Collections Communicate Information to GC. Includes size and pointers to actual data fields. Allows for trivial support of custom collections. (Diagram: each collection class, e.g. ArrayList and HashMap, has a semantic map that identifies for the GC its used-data field, such as int size or elementCount, and its occupied-data field, such as Object[] array or elementData.)
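One way to picture a semantic map is as a small descriptor that says where a collection type keeps its used-data count and its backing storage. The sketch below models that idea in plain Java under loud assumptions: `SemanticMap`, `SemanticMapRegistry`, and `MyRingBuffer` are hypothetical names, and a real implementation would live inside the collector and read object fields directly rather than call lambdas.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.ToIntFunction;

// Hypothetical descriptor: how to read used and occupied slots for one collection type.
final class SemanticMap<T> {
    final ToIntFunction<T> used;
    final ToIntFunction<T> occupied;

    SemanticMap(ToIntFunction<T> used, ToIntFunction<T> occupied) {
        this.used = used;
        this.occupied = occupied;
    }
}

// Hypothetical custom collection that becomes measurable by registering a map for it.
final class MyRingBuffer {
    final Object[] slots = new Object[16];
    int count;
}

// Stand-in for the collector's table of semantic maps, keyed by collection class.
final class SemanticMapRegistry {
    private final Map<Class<?>, SemanticMap<?>> maps = new HashMap<>();

    <T> void register(Class<T> type, SemanticMap<T> map) {
        maps.put(type, map);
    }

    // Returns occupied - used for a registered type, or -1 when no map is known.
    @SuppressWarnings("unchecked")
    <T> int bloatOf(T collection) {
        SemanticMap<T> map = (SemanticMap<T>) maps.get(collection.getClass());
        if (map == null) {
            return -1;
        }
        return map.occupied.applyAsInt(collection) - map.used.applyAsInt(collection);
    }
}

final class SemanticMapDemo {
    public static void main(String[] args) {
        SemanticMapRegistry registry = new SemanticMapRegistry();
        registry.register(MyRingBuffer.class,
                new SemanticMap<MyRingBuffer>(b -> b.count, b -> b.slots.length));
        MyRingBuffer buffer = new MyRingBuffer();
        buffer.count = 3;
        System.out.println(registry.bloatOf(buffer)); // prints 13
    }
}
```

Supporting a custom collection then costs one `register` call, which is the sense in which the slide calls custom-collection support trivial.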

  10. Example: Collections Bloat in TVLA

  11. Example: Collections Bloat in TVLA

  12. Example: Collections Bloat in TVLA Lower bound for bloat

  13. Fixing Bloat. Must correlate all bloat stats to a program point. Need trace information. Remember: do not want to degrade time.

  14. Correlating Code and Bloat

      public final class ConcreteKAryPredicate extends ConcretePredicate {
          …
          public void modify() {
              …
              values = HashMapFactory.make(this.values);
          }
          …
      }

      public class GenericBlur extends Blur {
          …
          public void blur(TVS structure) {
              …
              Map invCanonicName = HashMapFactory.make(structure.nodes().size());
              …
          }
      }

      public class HashMapFactory {
          public static Map make(int size) {
              return new HashMap(size);
          }
      }

      Chart labels: Ctx1 40%, Ctx2 11%, Ctx3 5%, Ctx4 7%, Ctx5 5%, Ctx6 3%, Ctx7 7%, Ctx8 3%
      • Aggregate bloat potential per allocation context
      • Done by the garbage collector
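When every map is created through a shared factory, the allocation site alone is identical for all callers, so a useful context needs at least the factory frame plus its caller. The following is a library-level sketch of capturing such a context with the JDK's `StackWalker` (Java 9 or later). It is an approximation for illustration, not the mechanism used in the slides, and `AllocationContext` and `DemoMapFactory` are hypothetical names.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class AllocationContext {
    // Returns up to `depth` caller frames formatted as "Class:line;Class:line".
    static String capture(int depth) {
        List<String> frames = StackWalker.getInstance().walk(stream -> stream
                .skip(1)        // skip capture() itself
                .limit(depth)   // keep only the nearest callers
                .map(f -> f.getClassName() + ":" + f.getLineNumber())
                .collect(Collectors.toList()));
        return String.join(";", frames);
    }
}

// Hypothetical factory in the style of the slide's HashMapFactory.
final class DemoMapFactory {
    static Map<Object, Object> make(int size) {
        // Depth 2 = this factory frame plus the frame that called the factory.
        String ctx = AllocationContext.capture(2);
        System.out.println("allocated in context " + ctx);
        return new HashMap<>(size);
    }
}

final class AllocationContextDemo {
    public static void main(String[] args) {
        DemoMapFactory.make(4);
    }
}
```

Walking the stack on every allocation is expensive, which is consistent with the slides attributing the correction-phase overhead to the cost of getting context.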

  15. Trace Information. Track collection usage in the library: distribution of operations and distribution of size, aggregated per allocation context. ctx1: Size = 7, Get = 3, Add = 9, … ctx2: Size = 1, Contains = 100, Insert = 1, … ctx3: Size = 103, Contains = 10041, Insert = 140, Remove = 20, … ctxi: …
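The per-context counters above could be kept in a structure like the following minimal sketch. `ContextStats` and `TraceProfile` are hypothetical names, and the choice to record one size sample per collection instance is an assumption; the slides do not say when sizes are sampled.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical statistics for all collections allocated in one context.
final class ContextStats {
    final Map<String, Long> opCounts = new TreeMap<>();
    long instances;
    long totalSize;
    int maxSize;

    // Counts one invocation of an operation such as "contains" or "insert".
    void recordOp(String op) {
        opCounts.merge(op, 1L, Long::sum);
    }

    // Records one size sample for a collection instance (sampling point is assumed).
    void recordSize(int size) {
        instances++;
        totalSize += size;
        maxSize = Math.max(maxSize, size);
    }

    long count(String op) {
        return opCounts.getOrDefault(op, 0L);
    }

    double avgSize() {
        return instances == 0 ? 0.0 : (double) totalSize / instances;
    }
}

// Aggregates statistics per allocation context, identified here by a string key.
final class TraceProfile {
    private final Map<String, ContextStats> byContext = new HashMap<>();

    ContextStats forContext(String ctx) {
        return byContext.computeIfAbsent(ctx, k -> new ContextStats());
    }
}

final class TraceDemo {
    public static void main(String[] args) {
        TraceProfile profile = new TraceProfile();
        ContextStats ctx2 = profile.forContext("ctx2");
        for (int i = 0; i < 100; i++) {
            ctx2.recordOp("contains");
        }
        ctx2.recordOp("insert");
        ctx2.recordSize(1);
        System.out.println(ctx2.opCounts + " maxSize=" + ctx2.maxSize);
        // prints "{contains=100, insert=1} maxSize=1"
    }
}
```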

  16. But how to choose the new Collection? Rule engine: user-defined rules. Input: heap and trace statistics per context. Output: suggested collection for that context. Rules are based on trace and heap information, for example: HashMap: #contains < X ∧ CollmaxSize < Y → ArrayMap; HashMap: #contains < X ∧ CollmaxSize < Y+10 ∧ %liveHeap > Z → ArrayMap; HashMap: maxSize < X → ArrayMap; LinkedList: NoListOp → ArrayList; …
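A rule engine of this shape can be sketched as a list of (source type, condition, suggestion) triples evaluated against per-context statistics. Everything named here is hypothetical (`CtxStats`, `Rule`, `RuleEngine`), the first-match-wins policy is an assumption, and the thresholds 1000 and 16 merely stand in for the unspecified X and Y.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical per-context inputs: trace statistics plus a heap statistic.
final class CtxStats {
    final String collectionType;
    final long contains;          // number of contains() calls observed
    final int maxSize;            // largest size observed
    final double liveHeapPercent; // share of live heap held by this context

    CtxStats(String collectionType, long contains, int maxSize, double liveHeapPercent) {
        this.collectionType = collectionType;
        this.contains = contains;
        this.maxSize = maxSize;
        this.liveHeapPercent = liveHeapPercent;
    }
}

// One user-defined rule: sourceType : condition -> suggestion.
final class Rule {
    final String sourceType;
    final Predicate<CtxStats> condition;
    final String suggestion;

    Rule(String sourceType, Predicate<CtxStats> condition, String suggestion) {
        this.sourceType = sourceType;
        this.condition = condition;
        this.suggestion = suggestion;
    }
}

final class RuleEngine {
    private final List<Rule> rules = new ArrayList<>();

    RuleEngine add(Rule rule) {
        rules.add(rule);
        return this;
    }

    // Returns the suggestion of the first matching rule (assumed policy), if any.
    Optional<String> suggest(CtxStats stats) {
        for (Rule rule : rules) {
            if (rule.sourceType.equals(stats.collectionType) && rule.condition.test(stats)) {
                return Optional.of(rule.suggestion);
            }
        }
        return Optional.empty();
    }
}

final class RuleEngineDemo {
    public static void main(String[] args) {
        // Illustrative thresholds only: X = 1000, Y = 16.
        RuleEngine engine = new RuleEngine()
                .add(new Rule("HashMap", s -> s.contains < 1000 && s.maxSize < 16, "ArrayMap"));
        System.out.println(engine.suggest(new CtxStats("HashMap", 100, 1, 7.0)));
        // prints "Optional[ArrayMap]"
    }
}
```

Keeping rules as data rather than hard-coded branches matches the slide's point that the rules are user defined.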

  17. Overall Picture. (Diagram: the program, the rules, and the semantic maps feed the semantic profiler, which produces a potential report and recommendations. Per-context statistics such as ctx1: Size = 7, Get = 3, Add = 9 and ctx2: Size = 1, Contains = 100, Insert = 1 are the input to the rule engine.)

  18. Correct Collection Bloat: Typical Usage. Step 1 (automatic): profile for bloat without context; low overhead, can run in production; if a problem is detected, go to step 2. Step 2 (automatic; prior to Chameleon this was a very hard manual step): combine heap information with trace information per context; can switch automatically to step 2 from step 1; higher overhead than step 1. Step 3 (automatic): suggest fixes to the user based on rules. Step 4 (manual): programmer applies the suggested fixes.

  19. Chameleon on TVLA. Sample recommendations: 1: HashMap:tvla...HashMapFactory:31;tvla.core.base.BaseTVS:50 replace with ArrayMap … 4: ArrayList:BaseHashTVSSet:112;tvla...base.BaseHashTVSSet:60 set initial capacity. (Chart residue: Potential, Operations, and Size panels per context; statistics shown: Max 15, 26, 7, 7; Avg 11.33, 6.31, 4.8, 4.8; Stddev 1.36, 5.05, 1.17, 1.17.)

  20. Implementation. Built on top of IBM's JVM. Modifications to the parallel mark-and-sweep GC: modular changes, readily applicable to other GCs. Modifications to collection libraries. Runtime overhead: detection phase negligible; correction phase ~2x, due to the cost of getting context (can use PCC, probabilistic calling context, by Bond & McKinley).

  21. Experimental Results – Memory

  22. Experimental Results – Time

  23. Related Work • Large volume of work on SETL • Automatic data structure selection in SETL [Schonberg et al., POPL'79] • SETL representation sublanguage [Dewar et al., TOPLAS'79] • … • Bloat • The Causes of Bloat, The Limits of Health [Mitchell and Sevitsky, OOPSLA'07]

  24. Summary • Collection selection is a real problem • Runtime penalty • Bloat • Chameleon integrates trace and heap information to choose a collection implementation • based on predefined rules • Using Chameleon, the footprint of several applications was reduced • never degrading running time, often improving it • First step towards automatic collection selection as part of the runtime system
